We study matrix sensing, which is the problem of reconstructing a low-rank matrix from a few linear measurements. It can be formulated as an overparameterized regression problem, which can be solved by factorized gradient descent when starting from a small random initialization. Linear neural networks, and in particular matrix sensing by factorized gradient descent, serve as prototypical models of non-convex problems in modern machine learning, where complex phenomena can be disentangled and studied in detail. Much research has been devoted to studying special cases of asymmetric matrix sensing, such as asymmetric matrix factorization and symmetric positive semi-definite matrix sensing. Our key contribution is the introduction of a continuous differential equation that we call the $\textit{perturbed gradient flow}$. We prove that the perturbed gradient flow converges quickly to the true target matrix whenever the perturbation is sufficiently bounded. The dynamics of gradient descent for matrix sensing can be reduced to this formulation, yielding a novel convergence proof for asymmetric matrix sensing with factorized gradient descent. Compared to directly analyzing the dynamics of gradient descent, the continuous formulation allows bounding key quantities by considering their derivatives, often simplifying the proofs. We believe the general proof technique may be useful in other settings as well.
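As a concrete illustration of the setup, the following is a minimal NumPy sketch of asymmetric matrix sensing solved by factorized gradient descent from a small random initialization. The Gaussian measurement ensemble, problem sizes, step size, and initialization scale are illustrative choices, not the ones analyzed in the work.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 2, 800                    # matrix size, true rank, number of measurements
alpha, eta, steps = 1e-3, 0.02, 1500    # initialization scale, step size, iterations

# Ground-truth low-rank matrix and random Gaussian measurement matrices A_k,
# scaled so that the measurement operator is approximately an isometry.
M_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
b = np.einsum('kij,ij->k', A, M_star)   # measurements b_k = <A_k, M*>

# Small random initialization of the asymmetric factors U, V.
U = alpha * rng.standard_normal((n, r))
V = alpha * rng.standard_normal((n, r))

for _ in range(steps):
    residual = np.einsum('kij,ij->k', A, U @ V.T) - b   # A(U V^T) - b
    G = np.einsum('k,kij->ij', residual, A)             # gradient of the loss w.r.t. U V^T
    U, V = U - eta * G @ V, V - eta * G.T @ U            # factorized gradient step

print("relative error:", np.linalg.norm(U @ V.T - M_star) / np.linalg.norm(M_star))
```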
We present a semi-Lagrangian characteristic mapping method for the incompressible Euler equations on a rotating sphere. The numerical method uses a spatio-temporal discretization of the inverse flow map generated by the Eulerian velocity as a composition of sub-interval flows formed by $C^1$ spherical spline interpolants. This approximation technique has the capacity to resolve sub-grid scales generated over time without increasing the spatial resolution of the computational grid. The numerical method is analyzed and validated using standard test cases, yielding third-order accuracy in the supremum norm. Numerical experiments illustrating the unique resolution properties of the method are performed and demonstrate the ability to reproduce the forward energy cascade at sub-grid scales by upsampling the numerical solution.
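The characteristic-map idea can be conveyed in a much simpler setting. The sketch below is a heavily simplified 1D toy, assuming linear periodic advection and linear interpolation in place of the spherical $C^1$ splines and Euler dynamics of the method: it advances the inverse flow map semi-Lagrangially on a coarse grid and then recovers sub-grid detail of the advected field by evaluating the composed map on a finer grid.

```python
import numpy as np

Nc, Nf, dt, T = 64, 1024, 1e-3, 1.0                    # coarse grid, fine grid, time step, final time
xc = np.linspace(0.0, 1.0, Nc, endpoint=False)
xf = np.linspace(0.0, 1.0, Nf, endpoint=False)

u = lambda x: 0.5 + 0.1 * np.sin(2 * np.pi * x)        # prescribed smooth velocity field
f0 = lambda x: np.exp(-0.5 * ((np.mod(x, 1.0) - 0.5) / 0.01) ** 2)  # initial data below coarse resolution

def interp_periodic(xq, xg, yg):
    """Linear interpolation of periodic data (xg, yg) at query points xq."""
    return np.interp(np.mod(xq, 1.0), xg, yg, period=1.0)

# Inverse flow map stored as a displacement on the coarse grid: X(x, t) = x + d(x, t).
d = np.zeros(Nc)
for _ in range(int(round(T / dt))):
    # Semi-Lagrangian step: trace characteristics back one step and compose
    # with the stored map, X^{n+1}(x) = X^n(x - dt * u(x)).
    xdep = xc - dt * u(xc)
    d = interp_periodic(xdep, xc, d) - dt * u(xc)

# Upsampling: evaluate the composed map on the fine grid and pull back the
# initial condition, recovering detail the coarse grid alone cannot represent.
X_fine = xf + interp_periodic(xf, xc, d)
f_fine = f0(X_fine)
print("peak of the advected field sampled on the fine grid:", f_fine.max())
```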
Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.
Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved through decomposition of a $p \times p$ covariance matrix into the sum of two components. Through a latent factor representation, these components can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a $p \times k$ factor loadings matrix and its transpose. If $k \ll p$, this defines a sparse factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.
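To make the decomposition concrete, here is a small NumPy sketch of the static latent factor representation, with dimensions, loadings, and idiosyncratic variances chosen arbitrarily for illustration: data are simulated from the model and the implied covariance $\Lambda\Lambda^\top + \mathrm{diag}(\psi)$ is compared with the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 12, 3, 5000                      # observed dimension, number of factors, sample size

Lambda = rng.standard_normal((p, k))       # p x k factor loadings matrix
psi = rng.uniform(0.5, 1.5, size=p)        # idiosyncratic variances

# Latent factor representation: y_i = Lambda f_i + e_i,
# with f_i ~ N(0, I_k) and e_i ~ N(0, diag(psi)).
f = rng.standard_normal((n, k))
e = rng.standard_normal((n, p)) * np.sqrt(psi)
y = f @ Lambda.T + e

# Implied covariance: Sigma = Lambda Lambda^T + diag(psi), a sparse
# factorisation when k << p; compare with the empirical covariance.
Sigma = Lambda @ Lambda.T + np.diag(psi)
Sigma_hat = np.cov(y, rowvar=False)
print("relative Frobenius error:", np.linalg.norm(Sigma_hat - Sigma) / np.linalg.norm(Sigma))
```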
In this paper, we consider an inverse space-dependent source problem for a time-fractional diffusion equation. To deal with the ill-posedness of the problem, we transform it into an optimal control problem with total variation (TV) regularization. In contrast to the classical Tikhonov model incorporating $L^2$ penalty terms, the inclusion of a TV term proves advantageous for reconstructing solutions that exhibit discontinuities or piecewise constancy. The control problem is approximated by a fully discrete scheme, and convergence results are provided within this framework. Furthermore, a linearized primal-dual iterative algorithm is proposed to solve the discrete control model based on an equivalent saddle-point reformulation, and several numerical experiments are presented to demonstrate the efficiency of the algorithm.
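For illustration only, the sketch below applies a standard primal-dual (Chambolle-Pock type) iteration to a generic 1D TV-regularized least-squares problem, which conveys the saddle-point reformulation $\min_u \max_{|p|\le\lambda} \langle Du, p\rangle + \tfrac{1}{2}\|u-g\|_2^2$ behind such algorithms. It is not the linearized algorithm for the fully discrete inverse source problem; the difference operator, regularization weight, and step sizes are assumptions of the toy setting.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, tau, sigma, iters = 200, 0.5, 0.25, 0.25, 500

# Piecewise-constant signal plus noise: a setting where TV beats an L^2 penalty.
x = np.where(np.arange(n) < n // 2, 1.0, -1.0) + np.where(np.arange(n) > 3 * n // 4, 2.0, 0.0)
g = x + 0.3 * rng.standard_normal(n)

D = lambda u: np.diff(u, append=u[-1])       # forward differences (Neumann boundary)
Dt = lambda p: -np.diff(p, prepend=0.0)       # adjoint of D (negative backward difference)

u, u_bar, p = g.copy(), g.copy(), np.zeros(n)
for _ in range(iters):
    p = np.clip(p + sigma * D(u_bar), -lam, lam)         # dual ascent + projection onto |p| <= lam
    u_new = (u - tau * Dt(p) + tau * g) / (1.0 + tau)     # proximal step for (1/2)||u - g||^2
    u_bar = 2 * u_new - u                                 # extrapolation
    u = u_new

print("TV of noisy input     :", np.abs(np.diff(g)).sum())
print("TV of reconstruction  :", np.abs(np.diff(u)).sum())
```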
In this article, we focus on the error that is committed when computing the matrix logarithm using the Gauss--Legendre quadrature rules. These formulas can be interpreted as Pad\'e approximants of a suitable Gauss hypergeometric function. Empirical observation tells us that the convergence of these quadratures becomes slow when the matrix is not close to the identity matrix, thus suggesting the use of an inverse scaling and squaring approach to obtain a matrix with this property. The novelty of this work is the introduction of error estimates that can be used to select a priori both the number of Legendre points needed to obtain a given accuracy and the number of inverse scaling and squaring steps to be performed. We include some numerical experiments to show the reliability of the estimates introduced.
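The quadrature itself is easy to state: $\log(A)=\int_0^1 (A-I)\,[t(A-I)+I]^{-1}\,dt$, and applying an $m$-point Gauss--Legendre rule to this integral gives the approximation in question. The following sketch combines the rule with inverse scaling and squaring; the number of nodes and of square-root steps is fixed by hand here rather than chosen by the a priori estimates of the article.

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def logm_gauss_legendre(A, num_nodes=8, sqrt_steps=4):
    """log(A) via Gauss-Legendre quadrature of log(A) = int_0^1 (A-I)[t(A-I)+I]^{-1} dt,
    preceded by `sqrt_steps` inverse scaling and squaring steps (repeated square roots)."""
    I = np.eye(A.shape[0])
    B = A.copy()
    for _ in range(sqrt_steps):              # bring the matrix close to the identity
        B = sqrtm(B)
    nodes, weights = np.polynomial.legendre.leggauss(num_nodes)
    t = 0.5 * (nodes + 1.0)                  # map [-1, 1] -> [0, 1]
    w = 0.5 * weights
    E = B - I
    L = sum(wi * E @ np.linalg.inv(ti * E + I) for ti, wi in zip(t, w))
    return (2 ** sqrt_steps) * L             # undo the square roots: log(A) = 2^s log(A^{1/2^s})

A = np.array([[4.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 2.0]])
print("max abs difference vs scipy logm:", np.max(np.abs(logm_gauss_legendre(A) - logm(A))))
```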
Fairness holds a pivotal role in the realm of machine learning, particularly when it comes to addressing groups categorised by sensitive attributes, e.g., gender or race. Prevailing algorithms in fair learning predominantly hinge on accessibility to, or estimations of, these sensitive attributes, at least during the training process. We design a single group-blind projection map that aligns the feature distributions of both groups in the source data, achieving (demographic) group parity, without requiring values of the protected attribute for individual samples, either in the computation of the map or in its use. Instead, our approach utilises the feature distributions of the privileged and unprivileged groups in a broader population and the essential assumption that the source data are an unbiased representation of that population. We present numerical results on synthetic and real data.
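As a purely illustrative check of what distribution-level group parity means (this is not the projection-map construction): with access only to group-level samples from the broader population, and never per-individual attribute values at application time, the demographic disparity can be quantified as a distance between the two groups' distributions, which a parity-achieving map should drive towards zero. The group distributions below are made-up Gaussians.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)

# Population-level feature (or score) samples for the two groups; no individual
# in the data to be transformed needs a recorded sensitive attribute.
scores_privileged   = rng.normal(loc=1.0, scale=1.0, size=10_000)
scores_unprivileged = rng.normal(loc=0.0, scale=1.2, size=10_000)

# Demographic (group) parity at the distribution level: both groups should
# receive the same distribution of outcomes, so this distance should be ~0.
gap = wasserstein_distance(scores_privileged, scores_unprivileged)
print(f"group parity gap (Wasserstein-1 distance): {gap:.3f}")
```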
Bidirectional typing is a discipline in which the typing judgment is decomposed explicitly into inference and checking modes, allowing one to control the flow of type information in typing rules and to specify algorithmically how they should be used. Bidirectional typing has been fruitfully studied and bidirectional systems have been developed for many type theories. However, the formal development of bidirectional typing has until now been kept confined to specific theories, with general guidelines remaining informal. In this work, we give a generic account of bidirectional typing for a general class of dependent type theories. This is done by first giving a general definition of type theories (or equivalently, a logical framework), for which we define declarative and bidirectional type systems. We then show, in a theory-independent fashion, that the two systems are equivalent. This equivalence is then exploited to establish the decidability of typing for weakly normalizing theories, yielding a generic type-checking algorithm that has been implemented in a prototype and used in practice with many theories.
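To convey the inference/checking decomposition in miniature, here is a toy bidirectional checker for a simply typed lambda calculus, far from the dependent, logical-framework setting of the work, and with names and syntax invented for the example: lambdas are checked against an expected arrow type, applications infer, and an annotation switches from checking back to inference.

```python
from dataclasses import dataclass

# Types: a base type and arrow (function) types.
@dataclass(frozen=True)
class Base:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

# Terms: variables and applications infer, lambdas check, and an annotation
# lets a checkable term be used where a type must be inferred.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Ann:
    term: object
    type: object

def infer(ctx, t):
    """Inference mode: the type is an output of the judgment."""
    if isinstance(t, Var):
        return ctx[t.name]
    if isinstance(t, Ann):
        check(ctx, t.term, t.type)
        return t.type
    if isinstance(t, App):
        fun_ty = infer(ctx, t.fun)
        assert isinstance(fun_ty, Arrow), "applied term is not a function"
        check(ctx, t.arg, fun_ty.dom)        # the argument is checked against the domain
        return fun_ty.cod
    raise TypeError(f"cannot infer a type for {t}; add an annotation")

def check(ctx, t, ty):
    """Checking mode: the type is an input of the judgment."""
    if isinstance(t, Lam):
        assert isinstance(ty, Arrow), "lambda checked against a non-arrow type"
        check({**ctx, t.var: ty.dom}, t.body, ty.cod)
        return
    assert infer(ctx, t) == ty, "type mismatch"   # mode switch: infer, then compare

nat = Base("Nat")
ident = Ann(Lam("x", Var("x")), Arrow(nat, nat))  # (\x. x) : Nat -> Nat
print(infer({}, ident))
```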
We study step-wise time approximations of non-linear hyperbolic initial value problems. The technique used here is a generalization of the minimizing movements method, using two time-scales: one for velocity, the other (potentially much larger) for acceleration. The main applications come from elastodynamics, namely so-called generalized solids undergoing large deformations. The evolution follows an underlying variational structure exploited by step-wise minimisation. We show for a large family of (elastic) energies that the introduced scheme is stable, allowing for non-linearities of highest order. If the highest-order terms can be assumed to be linear, we show that the limit solutions are regular and that the minimizing movements scheme converges with optimal linear rate. Thus this work extends numerical time-stepping minimization methods to the realm of hyperbolic problems.
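For intuition, the following is a single-time-scale sketch of a minimizing-movements step for a toy hyperbolic problem, a fixed-end chain of nonlinear springs chosen only for the example: each time step minimizes an incremental functional whose Euler-Lagrange equation is an implicit discretization of $\ddot u = -\nabla E(u)$. The two-time-scale construction and the class of energies treated in the work are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

N, tau, steps = 20, 0.02, 200            # degrees of freedom, time step, number of steps

def energy(u):
    """Nonlinear (quartic) elastic energy of a fixed-end chain of springs."""
    stretch = np.diff(np.concatenate(([0.0], u, [0.0])))
    return np.sum(0.5 * stretch**2 + 0.25 * stretch**4)

# Initial displacement and a flat history (zero initial velocity).
x = np.linspace(0.0, 1.0, N + 2)[1:-1]
u_prev = 0.5 * np.sin(np.pi * x)
u_curr = u_prev.copy()

for _ in range(steps):
    # Minimizing movement: the inertial term penalises deviation from the
    # free-flight prediction 2*u_curr - u_prev, so the minimiser satisfies
    # (u_next - 2*u_curr + u_prev) / tau^2 = -grad E(u_next)   (implicit step).
    target = 2 * u_curr - u_prev
    functional = lambda u: np.sum((u - target) ** 2) / (2 * tau**2) + energy(u)
    u_next = minimize(functional, u_curr, method="L-BFGS-B").x
    u_prev, u_curr = u_curr, u_next

print("final elastic energy:", energy(u_curr))
```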
Over the last decade, approximating functions in infinite dimensions from samples has gained increasing attention in computational science and engineering, especially in computational uncertainty quantification. This is primarily due to the relevance of functions that are solutions to parametric differential equations in various fields, e.g. chemistry, economics, engineering, and physics. While acquiring accurate and reliable approximations of such functions is inherently difficult, current benchmark methods exploit the fact that such functions often belong to certain classes of holomorphic functions to get algebraic convergence rates in infinite dimensions with respect to the number of (potentially adaptive) samples $m$. Our work focuses on providing theoretical approximation guarantees for the class of $(\boldsymbol{b},\varepsilon)$-holomorphic functions, demonstrating that these algebraic rates are the best possible for Banach-valued functions in infinite dimensions. We establish lower bounds using a reduction to a discrete problem in combination with the theory of $m$-widths, Gelfand widths and Kolmogorov widths. We study two cases, known and unknown anisotropy, in which the relative importance of the variables is known and unknown, respectively. A key conclusion of our paper is that in the latter setting, approximation from finite samples is impossible without some inherent ordering of the variables, even if the samples are chosen adaptively. Finally, in both cases, we demonstrate near-optimal, non-adaptive (random) sampling and recovery strategies which achieve rates close to those of the lower bounds.
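To fix ideas about the sampling-and-recovery setting (this is not the lower-bound argument): a holomorphic function of finitely many parametric variables with decaying anisotropy weights is approximated from $m$ non-adaptive random samples by least squares on a fixed polynomial basis. The truncation dimension, weights, target function, and basis below are illustrative assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
d, degree, m, m_test = 4, 2, 300, 2000          # truncated dimension, coordinate degree, samples

# Anisotropy weights: later variables matter less (the sequence b is summable).
b = np.array([2.0 ** (-j) for j in range(1, d + 1)])
f = lambda Y: 1.0 / (2.0 - Y @ b)               # holomorphic on [-1, 1]^d (denominator stays positive)

# Tensor-product monomial basis of coordinate degree <= `degree`.
indices = list(product(range(degree + 1), repeat=d))
design = lambda Y: np.stack([np.prod(Y ** np.array(nu), axis=1) for nu in indices], axis=1)

# Non-adaptive random samples and least-squares recovery.
Y_train = rng.uniform(-1.0, 1.0, size=(m, d))
coef, *_ = np.linalg.lstsq(design(Y_train), f(Y_train), rcond=None)

Y_test = rng.uniform(-1.0, 1.0, size=(m_test, d))
err = np.sqrt(np.mean((design(Y_test) @ coef - f(Y_test)) ** 2))
print(f"root-mean-square test error with m={m} samples: {err:.2e}")
```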
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
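Shepard's law states that generalization decays exponentially with distance in a psychological similarity space. Below is a minimal sketch of how such similarities could be turned into a predicted choice over candidate labels; the two-dimensional feature vectors standing in for explanation embeddings and the Luce-style choice rule are illustrative assumptions, not the fitted model of the study.

```python
import numpy as np

def shepard_similarity(x, y, sensitivity=1.0):
    """Shepard's universal law: generalization decays exponentially with
    distance in a psychological similarity space."""
    return np.exp(-sensitivity * np.linalg.norm(x - y))

# Illustrative embeddings: the explanation the participant would themselves give
# for each candidate label, and the AI's saliency-map explanation (all made up).
own_explanations = {"cat": np.array([1.0, 0.2]), "dog": np.array([0.1, 1.0])}
ai_explanation = np.array([0.8, 0.3])

# Luce-choice prediction of which label the participant believes the AI chose,
# comparing the AI explanation with the participant's own hypothetical explanations.
sims = {label: shepard_similarity(ai_explanation, e) for label, e in own_explanations.items()}
total = sum(sims.values())
for label, s in sims.items():
    print(f"P(participant predicts the AI says '{label}') = {s / total:.2f}")
```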