Paper available here: https://arxiv.org/abs/2110.15734
Abstract: Linear and semidefinite programming (LP, SDP), regularisation through basis pursuit (BP) and Lasso have seen great success in mathematics, statistics, data science, computer-assisted proofs and learning. The success of LP is traditionally attributed to the fact that it is “in P” for rational inputs. On the other hand, in his list of problems for the 21st century, S. Smale calls for “[Computational] models which process approximate inputs and which permit round-off computations”. Indeed, since, for example, the exponential function has no exact finite representation and floating-point arithmetic must approximate any rational number without a finite binary expansion (such as 1/10), inexact input is a daily encounter. The model allowing inaccurate input of arbitrary precision, which we call the extended model, leads to extended versions of fundamental problems such as: “Are LP and other aforementioned problems in P?” The same question can be asked for an extended version of Smale’s 9th problem on the list of mathematical problems for the 21st century: “Is there a polynomial time algorithm over the real numbers which decides the feasibility of the linear system of inequalities Ax ≥ b, and if so, outputs a feasible candidate?” One can thus pose this problem in the extended model. Similarly, the optimisation problems BP, SDP and Lasso, where the task is to output a solution to a specified precision, can likewise be posed in the extended model, also considering randomised algorithms. We will collectively refer to these problems as the extended Smale’s 9th problem, which we settle in both the negative and the positive, yielding two surprises: (1) In mathematics, sparse regularisation, statistics, and learning, one successfully computes with non-computable functions. (2) In order to mathematically characterise this phenomenon, one needs an intricate complexity theory for, seemingly paradoxically, non-computable functions.
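For reference, the problems named in this abstract take the following standard forms (a sketch using the usual conventions; the paper's exact normalisations and notation may differ):

```latex
% Smale's 9th problem (feasibility): given A and b, decide whether
\exists\, x \in \mathbb{R}^n \ \text{such that} \ Ax \geq b .
% Basis pursuit denoising with noise level \delta \geq 0:
\min_{x \in \mathbb{R}^n} \ \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \leq \delta .
% Lasso with regularisation parameter \lambda > 0:
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\,\|Ax - y\|_2^2 + \lambda \|x\|_1 .
```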
Paper available here: https://arxiv.org/abs/2312.11425
Abstract: The arrival of AI techniques in computations, with the potential for hallucinations and non-robustness, has made trustworthiness of algorithms a focal point. However, the trustworthiness of many classical approaches is not well understood. This is the case for feature selection, a classical problem in the sciences, statistics, machine learning, etc. Here, the LASSO optimisation problem is standard. Despite its widespread use, it has not been established when the output of algorithms attempting to compute support sets of minimisers of LASSO in order to do feature selection can be trusted. In this paper we establish that no (randomised) algorithm that works on all inputs can determine the correct support sets (with probability >1/2) of minimisers of LASSO when reading approximate input, regardless of precision and computing power. However, we define a LASSO condition number and design an efficient algorithm that, provided the input data is well-posed (has a finite condition number), computes these support sets in time polynomial in the dimensions and the logarithm of the condition number. For ill-posed inputs the algorithm runs forever and hence never produces a wrong answer. Furthermore, the algorithm computes an upper bound for the condition number when this is finite. Finally, for any algorithm defined on an open set containing a point with infinite condition number, there is an input for which the algorithm will either run forever or produce a wrong answer. Our impossibility results stem from generalised hardness of approximation — within the Solvability Complexity Index (SCI) hierarchy framework — that generalises the classical phenomenon of hardness of approximation.
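To make the discontinuity at the heart of this impossibility result concrete, here is a minimal one-dimensional sketch (illustrative only; it is not the paper's algorithm or its condition number), showing that the support set of a LASSO minimiser can change under an arbitrarily small perturbation of the input:

```python
# Minimal illustration (not the paper's algorithm or condition number): the support
# set of a LASSO minimiser can be discontinuous in the input data, so any method
# reading only approximate input can be forced to guess near such points.
import numpy as np

def lasso_1d(y, lam):
    """Exact minimiser of 0.5*(x - y)**2 + lam*|x| (soft thresholding)."""
    return np.sign(y) * max(abs(y) - lam, 0.0)

def support(x, tol=0.0):
    """Indices of the nonzero entries of x."""
    return {i for i, xi in enumerate(np.atleast_1d(x)) if abs(xi) > tol}

lam = 1.0
for eps in [0.0, 1e-12]:
    y = lam + eps                      # the input, known only approximately
    x_star = lasso_1d(y, lam)
    print(f"y = lam + {eps:.0e}: minimiser = {x_star:.2e}, support = {support(x_star)}")
# The support jumps from empty at y = lam to {0} at y = lam + eps for every eps > 0,
# so no reading of y to finite precision can decide the support near y = lam.
```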
Paper available here: https://arxiv.org/abs/2312.11429
Abstract: In Part I, we defined a LASSO condition number and developed an algorithm — for computing support sets (feature selection) of the LASSO minimisation problem — that runs in polynomial time in the number of variables and the logarithm of the condition number. The algorithm is trustworthy in the sense that if the condition number is infinite, the algorithm will run forever and never produce an incorrect output. In this Part II article, we demonstrate how finite precision algorithms (for example, algorithms using floating-point arithmetic) will fail on open sets when the condition number is large — but still finite. This augments Part I's result: if an algorithm takes inputs from an open set that includes at least one point with an infinite condition number, it cannot compute the correct support set for every input in that set; on some input it either runs forever or produces a wrong answer. Hence, for any finite precision algorithm working on open sets for the LASSO problem with random inputs, our LASSO condition number — as a random variable — will estimate the probability of success/failure of the algorithm. We show that a finite precision version of our algorithm works on traditional Gaussian data for LASSO with high probability. The algorithm is trustworthy: specifically, in the random cases where it fails, it does not produce an output. Finally, we demonstrate classical random ensembles for which the condition number will be large with high probability, and hence where any finite precision algorithm on open sets will fail. We show numerically how commercial software fails on these cases.
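A rough sketch of the trust pattern described above: a routine that either certifies a support set or abstains, never returning a wrong answer. The solver (plain ISTA), the margin test and all names below are illustrative assumptions, not the paper's algorithm or its condition number.

```python
# Sketch of the "certify or abstain" pattern (hypothetical helpers, not the paper's
# algorithm): a finite-precision routine that either returns a support set it is
# confident in, or returns None ('I don't know'), but never a wrong answer.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, iters=5000, tol=1e-10):
    """Plain ISTA for 0.5*||Ax - y||^2 + lam*||x||_1 (illustrative only)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x_new = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def trusted_support(A, y, lam, margin):
    """Return the support only if every entry is clearly zero or clearly nonzero."""
    x = lasso_ista(A, y, lam)
    if np.any((np.abs(x) > 0) & (np.abs(x) < margin)):
        return None                         # abstain rather than risk a wrong answer
    return set(np.flatnonzero(np.abs(x) >= margin))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20)) / np.sqrt(50)   # well-behaved Gaussian data
x0 = np.zeros(20); x0[:3] = 1.0
y = A @ x0 + 0.01 * rng.standard_normal(50)
print(trusted_support(A, y, lam=0.05, margin=1e-3))   # expected: {0, 1, 2}
```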
Paper available here: https://arxiv.org/abs/2109.06098
Abstract: The unprecedented success of deep learning (DL) makes it unchallenged when it comes to classification problems. However, it is well established that the current DL methodology produces universally unstable neural networks (NNs). The instability problem has caused an enormous research effort — with a vast literature on so-called adversarial attacks — yet there has been no solution to the problem. Our paper addresses why there has been no solution to the problem, as we prove the following mathematical paradox: any training procedure based on training neural networks for classification problems with a fixed architecture will yield neural networks that are either inaccurate or unstable (if accurate) — despite the provable existence of both accurate and stable neural networks for the same classification problems. The key is that the stable and accurate neural networks must have variable dimensions depending on the input; in particular, having variable dimensions is a necessary condition for stability.
Our result points towards the paradox that accurate and stable neural networks exist; however, modern algorithms do not compute them. This raises the question: if the existence of neural networks with desirable properties can be proven, can one also find algorithms that compute them? There are cases in mathematics where provable existence implies computability, but will this be the case for neural networks? The contrary is true: we demonstrate that neural networks can provably exist as approximate minimisers of standard optimisation problems with standard cost functions, yet no randomised algorithm can compute them with probability better than 1/2.
Paper available here: https://academic.oup.com/imamat/advance-article/doi/10.1093/imamat/hxad027/7323594?login=false
Abstract: We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ‘democratization of AI’ agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI’s relevant latent space. In the case of deep learning neural networks, we show that a one-neuron attack is possible—a modification to the weights and bias associated with a single neuron—revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state-of-the-art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.
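The one-neuron idea can be sketched on synthetic latent features as follows (a toy illustration assuming Gaussian latent vectors and a linear read-out; it is not the paper's construction): a single added ReLU neuron aligned with the trigger's latent vector fires on the trigger but, with high probability, on none of the unseen validation points.

```python
# Toy sketch in the spirit of the one-neuron stealth attack described above
# (synthetic latent features and a linear read-out; not the paper's construction).
import numpy as np

rng = np.random.default_rng(1)
d, n_classes, n_val = 200, 10, 1000

W_out = rng.standard_normal((n_classes, d)) * 0.1   # original read-out layer
latents_val = rng.standard_normal((n_val, d))       # validation set (unknown to attacker)
latent_trigger = rng.standard_normal(d)             # attacker's trigger, in latent space

def predict(latents, attack=None):
    logits = latents @ W_out.T
    if attack is not None:
        w, theta, target, kappa = attack
        # One added neuron: ReLU(w . latent - theta), routed to the target class.
        logits[:, target] += kappa * np.maximum(latents @ w - theta, 0.0)
    return logits.argmax(axis=1)

# The neuron fires only when the latent vector is strongly aligned with the trigger,
# which is very unlikely for unrelated validation points in high dimension.
w = latent_trigger / np.linalg.norm(latent_trigger)
theta = 0.9 * np.linalg.norm(latent_trigger)
attack = (w, theta, 3, 100.0)                        # desired class 3, large gain

before = predict(latents_val)
after = predict(latents_val, attack)
print("validation predictions changed:", int(np.sum(before != after)))       # expected: 0
print("trigger class before:", predict(latent_trigger[None, :])[0])
print("trigger class after :", predict(latent_trigger[None, :], attack)[0])  # expected: 3
```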
Paper available here: https://arxiv.org/abs/2309.03665
Abstract: Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple, generic and generalisable framework in which key behaviours observed in practical systems arise with high probability — notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier’s decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, adding noise during training or testing is therefore an inefficient way to eradicate or detect adversarial examples, and more demanding adversarial training is required.
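The geometric mechanism can be illustrated with a linear classifier in high dimension (a minimal sketch, not the paper's framework): a perturbation just above the margin along the normal direction flips the label, while random perturbations of much larger norm have only a component of order norm/sqrt(d) along that direction and essentially never do.

```python
# Minimal illustration of the margin phenomenon described above, using a linear
# classifier in high dimension (a sketch, not the paper's framework).
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
w = np.zeros(d); w[0] = 1.0          # decision rule: sign of the first coordinate
x = np.zeros(d); x[0] = 0.05         # correctly classified point with a small margin

def label(z):
    return np.sign(z @ w)

# Tiny adversarial perturbation: a step of size just above the margin, along -w.
x_adv = x - 0.06 * w
print("adversarial step of norm 0.06:", label(x), "->", label(x_adv))   # label flips

# Much larger random perturbations: their component along w is ~ norm / sqrt(d),
# so they almost never cross the decision boundary.
flips = 0
for _ in range(1000):
    delta = rng.standard_normal(d)
    delta /= np.linalg.norm(delta)            # random direction, norm 1.0 >> 0.06
    flips += label(x + delta) != label(x)
print("label flips under 1000 random unit-norm perturbations:", int(flips))  # typically 0
```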
Paper available here: https://arxiv.org/abs/2309.07072
Abstract: In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider the classical distribution-agnostic framework and algorithms that minimise empirical risk, potentially subject to some weight regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.
Paper available here: https://api.repository.cam.ac.uk/server/api/core/bitstreams/ad36b554-74da-474d-84c1-78ffa111960b/content
Abstract: The purpose of this paper is twofold. The first is to point out that the property of uniform recovery, meaning that all sparse vectors are recovered, does not hold in many applications where compressed sensing is successfully used. This includes fields like magnetic resonance imaging (MRI), nuclear magnetic resonance, computerized tomography, electron tomography, radio interferometry, helium atom scattering, and fluorescence microscopy. We demonstrate that for natural compressed sensing matrices involving a level-based reconstruction basis (e.g., wavelets), the number of measurements required to recover all s-sparse signals for reasonable s is excessive. In particular, uniform recovery of all s-sparse signals is quite unrealistic. This realization explains why the restricted isometry property (RIP) is insufficient for explaining the success of compressed sensing in various practical applications. The second purpose of the paper is to introduce a new framework based on a generalized RIP and a generalized nullspace property that fit the applications where compressed sensing is used. We demonstrate that the shortcomings previously used to prove that uniform recovery is unreasonable no longer apply if we instead ask for structured recovery that is uniform only within each of the levels. To examine this phenomenon, a new tool, termed the “restricted isometry property in levels” (RIPL), is described and analyzed. Furthermore, we show that with certain conditions on the RIPL, a form of uniform recovery within each level is possible. Fortunately, recent theoretical advances made by Li and Adcock demonstrate the existence of large classes of matrices that satisfy the RIPL. Moreover, such matrices are used extensively in applications such as MRI. Finally, we conclude the paper by providing examples that demonstrate the optimality of the results obtained.
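For reference, the standard definitions behind the sparsity-in-levels model and the RIPL are as follows (the paper's notation may differ slightly):

```latex
% Sparsity in levels: with level boundaries 0 = M_0 < M_1 < \dots < M_r = N and local
% sparsities s = (s_1,\dots,s_r), a vector x \in \mathbb{C}^N is (s, M)-sparse in levels if
|\,\mathrm{supp}(x) \cap \{M_{k-1}+1, \dots, M_k\}\,| \le s_k , \qquad k = 1, \dots, r .
% RIP in levels: \delta_{s,M}(A) is the smallest \delta \ge 0 such that
(1-\delta)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\delta)\,\|x\|_2^2
% holds for all (s, M)-sparse x.
```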
See here: https://bath-numerical-analysis.github.io/categories/2023/seminar_10_01
Abstract: Instability and hallucinations are the Achilles’ heel of modern AI and a paradox, with training algorithms finding unstable and hallucinating neural networks (NNs) despite the provable existence of stable and accurate NNs. This prompts the fundamental question – can one build trustworthy AI? This is now becoming a delicate question with connections to the regulation of AI technology in high-risk areas. In this talk we discuss how this question is linked to recent results on the extended Smale’s 9th problem, from the list of mathematical problems for the 21st century, and newly discovered phase transitions in optimisation. In particular, we will discuss how these results link to the difficulty of creating verification algorithms that can verify the validity of the output of modern AI. Indeed, we demonstrate that, in general, it is impossible to check whether AI based on neural networks hallucinates or not. The immediate corollary is that trustworthy AI can only have a weaker form of trust – the ability to say ‘I don’t know’.
See here: https://aimath.org/pastworkshops/compproofsvrep.pdf for details and the resulting problem 5 studied by the participants
Participant list: https://aimath.org/cgi-bin/showparticipants2.prl?workshop=889&mode=participantlist
Abstract: In this talk we discuss how the problem of building trustworthy AI is linked to recent results on the extended Smale’s 9th problem, from the list of mathematical problems for the 21st century, and newly discovered phase transitions in optimisation. In particular, we will discuss how these results link to the difficulty of creating verification algorithms that can verify the validity of the output of modern AI.
Abstract: Instability is the Achilles’ heel of AI and a paradox, with typical training algorithms unable to recover stable neural networks (NNs). Hence the fundamental question: can one find algorithms that compute stable and accurate NNs? If not, what are the foundational barriers we encounter across machine learning? These questions are linked to recent results on the extended Smale’s 9th problem, which uncover new phase transitions in optimisation and yield barriers to the computation of NNs.