Saddle points constitute a crucial challenge for first-order gradient descent algorithms. In notions of classical machine learning, they are avoided for example by means of stochastic gradient descent methods. In this work, we provide evidence that the saddle points problem can be naturally avoided in variational quantum algorithms by exploiting the presence of stochasticity. We prove convergence guarantees and present practical examples in numerical simulations and on quantum hardware. We argue that the natural stochasticity of variational algorithms can be beneficial for avoiding strict saddle points, i.e., those saddle points with at least one negative Hessian eigenvalue. This insight that some levels of shot noise could help is expected to add a new perspective to notions of near-term variational quantum algorithms.
One of the fundamental results in quantum foundations is the Kochen-Specker (KS) theorem, which states that any theory whose predictions agree with quantum mechanics must be contextual, i.e., a quantum observation cannot be understood as revealing a pre-existing value. The theorem hinges on the existence of a mathematical object called a KS vector system. While many KS vector systems are known, the problem of finding the minimum KS vector system in three dimensions has remained stubbornly open for over 55 years. In this paper, we present a new method based on a combination of a Boolean satisfiability (SAT) solver and a computer algebra system (CAS) to address this problem. Our approach shows that a KS system in three dimensions must contain at least 24 vectors. Our SAT+CAS method is over 35,000 times faster at deriving the previously known lower bound of 22 vectors than the prior CAS-based searches. More importantly, we generate certificates that allow verifying our results without trusting either the SAT solver or the CAS. The increase in efficiency is due to the fact we are able to exploit the powerful combinatorial search-with-learning capabilities of SAT solvers, together with the CAS-based isomorph-free exhaustive method of orderly generation of graphs. To the best of our knowledge, our work is the first application of a SAT+CAS method to a problem in the realm of quantum foundations and the first lower bound in the minimum Kochen-Specker problem with a computer-verifiable proof certificate.
Computed Tomography (CT) is a prominent example of Imaging Inverse Problem (IIP), highlighting the unrivalled performances of data-driven methods in degraded measurements setups like sparse X-ray projections. Although a significant proportion of deep learning approaches benefit from large supervised datasets to directly map experimental measurements to medical scans, they cannot generalize to unknown acquisition setups. In contrast, fully unsupervised techniques, most notably using score-based generative models, have recently demonstrated similar or better performances compared to supervised approaches to solve IIPs while being flexible at test time regarding the imaging setup. However, their use cases are limited by two factors: (a) they need considerable amounts of training data to have good generalization properties and (b) they require a backward operator, like Filtered-Back-Projection in the case of CT, to condition the learned prior distribution of medical scans to experimental measurements. To overcome these issues, we propose an unsupervised conditional approach to the Generative Latent Optimization framework (cGLO), in which the parameters of a decoder network are initialized on an unsupervised dataset. The decoder is then used for reconstruction purposes, by performing Generative Latent Optimization with a loss function directly comparing simulated measurements from proposed reconstructions to experimental measurements. The resulting approach, tested on sparse-view CT using multiple training dataset sizes, demonstrates better reconstruction quality compared to state-of-the-art score-based strategies in most data regimes and shows an increasing performance advantage for smaller training datasets and reduced projection angles. Furthermore, cGLO does not require any backward operator and could expand use cases even to non-linear IIPs.
To quantify uncertainties in inverse problems of partial differential equations (PDEs), we formulate them into statistical inference problems using Bayes' formula. Recently, well-justified infinite-dimensional Bayesian analysis methods have been developed to construct dimension-independent algorithms. However, there are three challenges for these infinite-dimensional Bayesian methods: prior measures usually act as regularizers and are not able to incorporate prior information efficiently; complex noises, such as more practical non-i.i.d. distributed noises, are rarely considered; and time-consuming forward PDE solvers are needed to estimate posterior statistical quantities. To address these issues, an infinite-dimensional inference framework has been proposed based on the infinite-dimensional variational inference method and deep generative models. Specifically, by introducing some measure equivalence assumptions, we derive the evidence lower bound in the infinite-dimensional setting and provide possible parametric strategies that yield a general inference framework called the Variational Inverting Network (VINet). This inference framework can encode prior and noise information from learning examples. In addition, relying on the power of deep neural networks, the posterior mean and variance can be efficiently and explicitly generated in the inference stage. In numerical experiments, we design specific network structures that yield a computable VINet from the general inference framework. Numerical examples of linear inverse problems of an elliptic equation and the Helmholtz equation are presented to illustrate the effectiveness of the proposed inference framework.
We consider a general optimization problem of minimizing a composite objective functional defined over a class of probability distributions. The objective is composed of two functionals: one is assumed to possess the variational representation and the other is expressed in terms of the expectation operator of a possibly nonsmooth convex regularizer function. Such a regularized distributional optimization problem widely appears in machine learning and statistics, such as proximal Monte-Carlo sampling, Bayesian inference and generative modeling, for regularized estimation and generation. We propose a novel method, dubbed as Moreau-Yoshida Variational Transport (MYVT), for solving the regularized distributional optimization problem. First, as the name suggests, our method employs the Moreau-Yoshida envelope for a smooth approximation of the nonsmooth function in the objective. Second, we reformulate the approximate problem as a concave-convex saddle point problem by leveraging the variational representation, and then develope an efficient primal-dual algorithm to approximate the saddle point. Furthermore, we provide theoretical analyses and report experimental results to demonstrate the effectiveness of the proposed method.
In this paper, we propose efficient quantum algorithms for solving nonlinear stochastic differential equations (SDE) via the associated Fokker-Planck equation (FPE). We discretize the FPE in space and time using two well-known numerical schemes, namely Chang-Cooper and implicit finite difference. We then compute the solution of the resulting system of linear equations using the quantum linear systems algorithm. We present detailed error and complexity analyses for both these schemes and demonstrate that our proposed algorithms, under certain conditions, provably compute the solution to the FPE within prescribed $\epsilon$ error bounds with polynomial dependence on state dimension $d$. Classical numerical methods scale exponentially with dimension, thus, our approach, under the aforementioned conditions, provides an \emph{exponential speed-up} over traditional approaches.
In this paper we give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates. In this problem, the goal is to partition the input into $k$ sets by choosing $k$ centers such that the maximum distance from any data point to the closest center is minimized. It is known that it is NP-hard to get a better than $2$ approximation for this problem. While in many applications the input may naturally be modeled as a graph, all prior works on $k$-center problem in dynamic settings are on metrics. In this paper, we give a deterministic decremental $(2+\epsilon)$-approximation algorithm and a randomized incremental $(4+\epsilon)$-approximation algorithm, both with amortized update time $kn^{o(1)}$ for weighted graphs. Moreover, we show a reduction that leads to a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem, with worst-case update time that is within a factor $k$ of the state-of-the-art upper bound for maintaining $(1+\epsilon)$-approximate single-source distances in graphs. Matching this bound is a natural goalpost because the approximate distances of each vertex to its center can be used to maintain a $(2+\epsilon)$-approximation of the graph diameter and the fastest known algorithms for such a diameter approximation also rely on maintaining approximate single-source distances.
We consider the problem of learning control policies in discrete-time stochastic systems which guarantee that the system stabilizes within some specified stabilization region with probability~$1$. Our approach is based on the novel notion of stabilizing ranking supermartingales (sRSMs) that we introduce in this work. Our sRSMs overcome the limitation of methods proposed in previous works whose applicability is restricted to systems in which the stabilizing region cannot be left once entered under any control policy. We present a learning procedure that learns a control policy together with an sRSM that formally certifies probability~$1$ stability, both learned as neural networks. We show that this procedure can also be adapted to formally verifying that, under a given Lipschitz continuous control policy, the stochastic system stabilizes within some stabilizing region with probability~$1$. Our experimental evaluation shows that our learning procedure can successfully learn provably stabilizing policies in practice.
A software product line models the variability of highly configurable systems. Complete exploration of all valid configurations (the configuration space) is infeasible as it grows exponentially with the number of features in the worst case. In practice, few representative configurations are sampled instead, which may be used for software testing or hardware verification. Pseudo-randomness of modern computers introduces statistical bias into these samples. Quantum computing enables truly random, uniform configuration sampling based on inherently random quantum physical effects. We propose a method to encode the entire configuration space in a superposition and then measure one random sample. We show the method's uniformity over multiple samples and investigate its scale for different feature models. We discuss the possibilities and limitations of quantum computing for uniform random sampling regarding current and future quantum hardware.
Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data is limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits to real world settings by analyzing data from mice performing a sensory decision-making task.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.