The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically at a small scale, the ID is very large, as the data are affected by measurement errors. At large scale, the ID can also be erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that for distances smaller than the correct scale the density of the data is constant. Since to estimate the density it is necessary to know the ID, this condition is imposed self-consistently. We illustrate the usefulness and robustness of this procedure by benchmarks on artificial and real-world datasets.
Linear causal disentanglement is a recent method in causal representation learning to describe a collection of observed variables via latent variables with causal dependencies between them. It can be viewed as a generalization of both independent component analysis and linear structural equation models. We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts, each given by an intervention on a latent variable. We show that one perfect intervention on each latent variable is sufficient and in the worst case necessary to recover parameters under perfect interventions, generalizing previous work to allow more latent than observed variables. We give a constructive proof that computes parameters via a coupled tensor decomposition. For soft interventions, we find the equivalence class of latent graphs and parameters that are consistent with observed data, via the study of a system of polynomial equations. Our results hold assuming the existence of non-zero higher-order cumulants, which implies non-Gaussianity of variables.
We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-accuracy trade-off for the diffusion models, which is a trade-off relationship between the speed and accuracy of data generation in diffusion models. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is introduced by the conservative force in stochastic thermodynamics and the geodesic of space by the 2-Wasserstein distance in optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for the diffusion models with different noise schedules such as the cosine schedule, the conditional optimal transport, and the optimal transport.
Quantum machine learning models based on parametrized quantum circuits, also called quantum neural networks (QNNs), are considered to be among the most promising candidates for applications on near-term quantum devices. Here we explore the expressivity and inductive bias of QNNs by exploiting an exact mapping from QNNs with inputs $x$ to classical perceptrons acting on $x \otimes x$ (generalised to complex inputs). The simplicity of the perceptron architecture allows us to provide clear examples of the shortcomings of current QNN models, and the many barriers they face to becoming useful general-purpose learning algorithms. For example, a QNN with amplitude encoding cannot express the Boolean parity function for $n\geq 3$, which is but one of an exponential number of data structures that such a QNN is unable to express. Mapping a QNN to a classical perceptron simplifies training, allowing us to systematically study the inductive biases of other, more expressive embeddings on Boolean data. Several popular embeddings primarily produce an inductive bias towards functions with low class balance, reducing their generalisation performance compared to deep neural network architectures which exhibit much richer inductive biases. We explore two alternate strategies that move beyond standard QNNs. In the first, we use a QNN to help generate a classical DNN-inspired kernel. In the second we draw an analogy to the hierarchical structure of deep neural networks and construct a layered non-linear QNN that is provably fully expressive on Boolean data, while also exhibiting a richer inductive bias than simple QNNs. Finally, we discuss characteristics of the QNN literature that may obscure how hard it is to achieve quantum advantage over deep learning algorithms on classical data.
In this paper, we propose a new numerical strategy for the stabilization of evolution systems. The method is based on the methodology given by Ammari, Nicaise and Pignotti in "Stabilization by switching time-delay, Asymptot. Anal., 83 (2013), 263--283". This method is then implemented in 1D by suitable numerical approximation techniques. Numerical experiments complete this study to confirm the theoretical announced results.
The Lippmann--Schwinger--Lanczos (LSL) algorithm has recently been shown to provide an efficient tool for imaging and direct inversion of synthetic aperture radar data in multi-scattering environments [17], where the data set is limited to the monostatic, a.k.a. single input/single output (SISO) measurements. The approach is based on constructing data-driven estimates of internal fields via a reduced-order model (ROM) framework and then plugging them into the Lippmann-Schwinger integral equation. However, the approximations of the internal solutions may have more error due to missing the off diagonal elements of the multiple input/multiple output (MIMO) matrix valued transfer function. This, in turn, may result in multiple echoes in the image. Here we present a ROM-based data completion algorithm to mitigate this problem. First, we apply the LSL algorithm to the SISO data as in [17] to obtain approximate reconstructions as well as the estimate of internal field. Next, we use these estimates to calculate a forward Lippmann-Schwinger integral to populate the missing off-diagonal data (the lifting step). Finally, to update the reconstructions, we solve the Lippmann-Schwinger equation using the original SISO data, where the internal fields are constructed from the lifted MIMO data. The steps of obtaining the approximate reconstructions and internal fields and populating the missing MIMO data entries can be repeated for complex models to improve the images even further. Efficiency of the proposed approach is demonstrated on 2D and 2.5D numerical examples, where we see reconstructions are improved substantially.
Determining the complexity of computing Gr\"{o}bner bases is an important problem both in theory and in practice, and for that the solving degree plays a key role. In this paper, we study the solving degrees of affine semi-regular sequences and their homogenized sequences. Some of our results are considered to give mathematically rigorous proofs of the correctness of methods for computing Gr\"{o}bner bases of the ideal generated by an affine semi-regular sequence. This paper is a sequel of the authors' previous work and gives additional results on the solving degrees and important behaviors of Gr\"obner basis computation. We also define the generalized degree of regularity for a sequence of homogeneous polynomials. For the homogenization of an affine semi-regular sequence, we relate its generalized degree of regularity with its maximal Gr\"{o}bner basis degree (i.e., the solving degree of the homogenized sequence). The definition of a generalized (cryptographic) semi-regular sequence is also given, and it derives a new cryptographic assumption to estimate the security of cryptosystems and signature schemes. From our experimental observation, we raise a conjecture and some questions related to this generalized semi-regularity. These new definitions and our results provide a theoretical formulation of (somehow heuristic) discussions done so far in the cryptographic community.
In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generation of different length and abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer \emph{hallucinations} overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
Efficiently enumerating all the extreme points of a polytope identified by a system of linear inequalities is a well-known challenge issue.We consider a special case and present an algorithm that enumerates all the extreme points of a bisubmodular polyhedron in $\mathcal{O}(n^4|V|)$ time and $\mathcal{O}(n^2)$ space complexity, where $ n$ is the dimension of underlying space and $V$ is the set of outputs. We use the reverse search and signed poset linked to extreme points to avoid the redundant search. Our algorithm is a generalization of enumerating all the extreme points of a base polyhedron which comprises some combinatorial enumeration problems.
Threshold selection is a fundamental problem in any threshold-based extreme value analysis. While models are asymptotically motivated, selecting an appropriate threshold for finite samples is difficult and highly subjective through standard methods. Inference for high quantiles can also be highly sensitive to the choice of threshold. Too low a threshold choice leads to bias in the fit of the extreme value model, while too high a choice leads to unnecessary additional uncertainty in the estimation of model parameters. We develop a novel methodology for automated threshold selection that directly tackles this bias-variance trade-off. We also develop a method to account for the uncertainty in the threshold estimation and propagate this uncertainty through to high quantile inference. Through a simulation study, we demonstrate the effectiveness of our method for threshold selection and subsequent extreme quantile estimation, relative to the leading existing methods, and show how the method's effectiveness is not sensitive to the tuning parameters. We apply our method to the well-known, troublesome example of the River Nidd dataset.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that absent explanation humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrate that our theory quantitatively matches participants' predictions of the AI.