This paper introduces a collection of four data sets, similar to Anscombe's Quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four data sets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. The paper includes a mathematical summary of each data set, as well as directed acyclic graphs that depict the relationships between the variables. Despite the fact that the statistical summaries and visualizations for each data set are identical, the true causal effect differs, and estimating it correctly requires knowledge of the data-generating mechanism. These example data sets can help practitioners gain a better understanding of the assumptions underlying causal inference methods and emphasize the importance of gathering more information beyond what can be obtained from statistical tools alone. The paper also includes R code for reproducing all figures and provides access to the data sets themselves through an R package named quartets.
Fano varieties are basic building blocks in geometry - they are `atomic pieces' of mathematical shapes. Recent progress in the classification of Fano varieties involves analysing an invariant called the quantum period. This is a sequence of integers which gives a numerical fingerprint for a Fano variety. It is conjectured that a Fano variety is uniquely determined by its quantum period. If this is true, one should be able to recover geometric properties of a Fano variety directly from its quantum period. We apply machine learning to the question: does the quantum period of X know the dimension of X? Note that there is as yet no theoretical understanding of this. We show that a simple feed-forward neural network can determine the dimension of X with 98% accuracy. Building on this, we establish rigorous asymptotics for the quantum periods of a class of Fano varieties. These asymptotics determine the dimension of X from its quantum period. Our results demonstrate that machine learning can pick out structure from complex mathematical data in situations where we lack theoretical understanding. They also give positive evidence for the conjecture that the quantum period of a Fano variety determines that variety.
Conditional graph entropy is known to be the minimal rate for a natural functional compression problem with side information at the receiver. In this paper we show that it can be formulated as an alternating minimization problem, which gives rise to a simple iterative algorithm for numerically computing (conditional) graph entropy. This also leads to a new formula which shows that conditional graph entropy is part of a more general framework: the solution of an optimization problem over a convex corner. In the special case of graph entropy (i.e., unconditioned version) this was known due to Csisz\'ar, K\"orner, Lov\'asz, Marton, and Simonyi. In that case the role of the convex corner was played by the so-called vertex packing polytope. In the conditional version it is a more intricate convex body but the function to minimize is the same. Furthermore, we describe a dual problem that leads to an optimality check and an error bound for the iterative algorithm.
Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g.\ SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
With the increasing availability of large scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that enable to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.
In this paper, we propose a Riemannian Acceleration with Preconditioning (RAP) for symmetric eigenvalue problems, which is one of the most important geodesically convex optimization problem on Riemannian manifold, and obtain the acceleration. Firstly, the preconditioning for symmetric eigenvalue problems from the Riemannian manifold viewpoint is discussed. In order to obtain the local geodesic convexity, we develop the leading angle to measure the quality of the preconditioner for symmetric eigenvalue problems. A new Riemannian acceleration, called Locally Optimal Riemannian Accelerated Gradient (LORAG) method, is proposed to overcome the local geodesic convexity for symmetric eigenvalue problems. With similar techniques for RAGD and analysis of local convex optimization in Euclidean space, we analyze the convergence of LORAG. Incorporating the local geodesic convexity of symmetric eigenvalue problems under preconditioning with the LORAG, we propose the Riemannian Acceleration with Preconditioning (RAP) and prove its acceleration. Additionally, when the Schwarz preconditioner, especially the overlapping or non-overlapping domain decomposition method, is applied for elliptic eigenvalue problems, we also obtain the rate of convergence as $1-C\kappa^{-1/2}$, where $C$ is a constant independent of the mesh sizes and the eigenvalue gap, $\kappa=\kappa_{\nu}\lambda_{2}/(\lambda_{2}-\lambda_{1})$, $\kappa_{\nu}$ is the parameter from the stable decomposition, $\lambda_{1}$ and $\lambda_{2}$ are the smallest two eigenvalues of the elliptic operator. Numerical results show the power of Riemannian acceleration and preconditioning.
We study a subspace constrained version of the randomized Kaczmarz algorithm for solving large linear systems in which the iterates are confined to the space of solutions of a selected subsystem. We show that the subspace constraint leads to an accelerated convergence rate, especially when the system has structure such as having coherent rows or being approximately low-rank. On Gaussian-like random data, it results in a form of dimension reduction that effectively improves the aspect ratio of the system. Furthermore, this method serves as a building block for a second, quantile-based algorithm for the problem of solving linear systems with arbitrary sparse corruptions, which is able to efficiently exploit partial external knowledge about uncorrupted equations and achieve convergence in difficult settings such as in almost-square systems. Numerical experiments on synthetic and real-world data support our theoretical results and demonstrate the validity of the proposed methods for even more general data models than guaranteed by the theory.
Model selection aims to identify a sufficiently well performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible bias when the cross-validation estimates of predictive performance are marred by excessive noise. In finite data regimes, cross-validated estimates can encourage the statistician to select one model over another when it is not actually better for future data. While this bias remains negligible in the case of few models, when the pool of candidates grows, and model selection decisions are compounded (as in forward search), the expected magnitude of selection-induced bias is likely to grow too. This paper introduces an efficient approach to estimate and correct selection-induced bias based on order statistics. Numerical experiments demonstrate the reliability of our approach in both estimating selection-induced bias and quantifying the degree of over-fitting along compounded model selection decisions, with specific application to forward search. This work represents a light-weight alternative to more computationally expensive approaches to correcting selection-induced bias, such as nested cross-validation and the bootstrap. Our approach rests on several theoretic assumptions, and we provide a diagnostic to help understand when these may not be valid, and when to fall back on safer, albeit more computationally expensive approaches. The accompanying code facilitates its practical implementation and fosters further exploration in this area.
In this paper we establish limit theorems for power variations of stochastic processes controlled by fractional Brownian motions with Hurst parameter $H\leq 1/2$. We show that the power variations of such processes can be decomposed into the mix of several weighted random sums plus some remainder terms, and the convergences of power variations are dominated by different combinations of those weighted sums depending on whether $H<1/4$, $H=1/4$, or $H>1/4$. We show that when $H\geq 1/4$ the centered power variation converges stably at the rate $n^{-1/2}$, and when $H<1/4$ it converges in probability at the rate $n^{-2H}$. We determine the limit of the mixed weighted sum based on a rough path approach developed in \cite{LT20}.
The generation of curves and surfaces from given data is a well-known problem in Computer-Aided Design that can be approached using subdivision schemes. They are powerful tools that allow obtaining new data from the initial one by means of simple calculations. However, in some applications, the collected data are given with noise and most of schemes are not adequate to process them. In this paper, we present some new families of binary univariate linear subdivision schemes using weighted local polynomial regression. We study their properties, such as convergence, monotonicity, polynomial reproduction and approximation and denoising capabilities. For the convergence study, we develop some new theoretical results. Finally, some examples are presented to confirm the proven properties.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that absent explanation humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrate that our theory quantitatively matches participants' predictions of the AI.