We introduce a new stochastic algorithm to locate the index-1 saddle points of a function $V:\mathbb R^d \to \mathbb R$, with $d$ possibly large. This algorithm can be seen as a saddle-point analogue of stochastic gradient descent, which is a natural stochastic process for locating local minima. It relies on two ingredients: (i) the concentration properties on index-1 saddle points of the first eigenmodes of the Witten Laplacian (associated with $V$) on $1$-forms and (ii) a probabilistic representation of a partial differential equation involving this differential operator. Numerical examples on simple molecular systems illustrate the efficacy of the proposed approach.
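As a point of reference for the object being sought (not the paper's algorithm), the following minimal sketch checks the defining conditions of an index-1 saddle point: vanishing gradient and a Hessian with exactly one negative eigenvalue. The toy potential and candidate point are placeholders.

```python
import numpy as np

def is_index1_saddle(grad, hess, x0, tol=1e-8):
    """Check the defining conditions of an index-1 saddle point of V:
    grad V(x0) = 0 and the Hessian at x0 has exactly one negative eigenvalue."""
    eigvals = np.linalg.eigvalsh(hess(x0))
    return np.linalg.norm(grad(x0)) < tol and np.sum(eigvals < -tol) == 1

# Toy potential V(x, y) = x^2 - y^2: the origin is an index-1 saddle point.
grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])
hess = lambda p: np.diag([2.0, -2.0])
print(is_index1_saddle(grad, hess, np.zeros(2)))  # True
```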
We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T[1..n]$, build within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(\delta\log\frac{n}{\delta})$ from the SLP within $O(g+g_{lc})$ space and in $O(n\log g)$ time, where $\delta$ is the substring complexity measure of $T$. Finally, we show how to build the LZ parse of $T$ from such an LCG within $O(g_{lc})$ space and in time $O(z\log^2 n \log^2(n/z))$. All our results hold with high probability.
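For reference, here is a minimal sketch of one standard formulation of the LZ parse, computed directly on the plain text in quadratic time. The point of the paper is to produce this parse from the SLP without decompressing $T$, which this toy code does not attempt.

```python
def lz_parse(T):
    """Greedy LZ parse: each phrase is the longest prefix of T[i:] that has an
    occurrence starting before position i (overlaps allowed), else one fresh character."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        length = 0
        # Extend while T[i:i+length+1] still has an occurrence starting before i.
        while i + length < n and T.find(T[i:i + length + 1], 0, i + length) != -1:
            length += 1
        if length == 0:
            phrases.append(T[i])          # fresh character
            i += 1
        else:
            phrases.append(T[i:i + length])
            i += length
    return phrases

print(lz_parse("abababbb"))  # ['a', 'b', 'abab', 'bb'] -> z = 4 phrases
```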
Binary codes of length $n$ may be viewed as subsets of vertices of the Boolean hypercube $\{0,1\}^n$. The ability of a linear error-correcting code to recover erasures is connected to the influences of particular monotone Boolean functions. These functions provide insight into the role that particular coordinates play in a code's erasure repair capability. In this paper, we consider directly the influences of coordinates of a code. We describe a family of codes, called codes with minimum disjoint support, for which all influences may be determined. As a consequence, we find the influences of repetition codes and certain distinct weight codes. Computing influences is typically circumvented by appealing to the transitivity of the automorphism group of the code. Some of the codes considered here fail to meet the transitivity conditions required for these standard approaches, yet we can compute their influences directly.
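The influences in question are the standard ones for a Boolean function $f:\{0,1\}^n \to \{0,1\}$, namely $I_i(f) = \Pr_x[f(x) \neq f(x \oplus e_i)]$ under the uniform distribution. Below is a brute-force sketch for small $n$, illustrated on 3-bit majority as a placeholder monotone function (not one of the code-derived functions studied in the paper).

```python
from itertools import product

def influence(f, n, i):
    """I_i(f) = Pr_{x uniform in {0,1}^n}[ f(x) != f(x with bit i flipped) ]."""
    count = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        count += f(x) != f(tuple(y))
    return count / 2 ** n

maj3 = lambda x: int(sum(x) >= 2)                   # 3-bit majority, a monotone function
print([influence(maj3, 3, i) for i in range(3)])    # [0.5, 0.5, 0.5]
```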
We propose MNPCA, a novel non-linear generalization of (2D)$^2$PCA, a classical linear method for the simultaneous dimension reduction of both rows and columns of a set of matrix-valued data. MNPCA is based on optimizing over separate non-linear mappings on the left and right singular spaces of the observations, essentially amounting to the decoupling of the two sides of the matrices. We develop a comprehensive theoretical framework for MNPCA by viewing it as an eigenproblem in reproducing kernel Hilbert spaces. We study the resulting estimators on both the population and sample levels, deriving their convergence rates and formulating a coordinate representation to allow the method to be used in practice. Simulations and a real data example demonstrate MNPCA's good performance relative to its competitors.
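For orientation, here is a minimal sketch of the linear baseline (2D)$^2$PCA that MNPCA generalizes: each centered observation is projected on the left and right onto the top eigenvectors of the row- and column-covariance matrices, respectively. Data and dimensions are placeholders; this is not MNPCA itself.

```python
import numpy as np

def two_directional_2dpca(X, p, q):
    """(2D)^2PCA: reduce m x n matrix observations to p x q scores U^T X_i Z,
    where U, Z hold top eigenvectors of the row- and column-covariance matrices."""
    Xc = X - X.mean(axis=0)
    G_row = np.einsum('kij,klj->il', Xc, Xc) / len(X)   # m x m row covariance
    G_col = np.einsum('kji,kjl->il', Xc, Xc) / len(X)   # n x n column covariance
    U = np.linalg.eigh(G_row)[1][:, -p:]                # top-p eigenvectors
    Z = np.linalg.eigh(G_col)[1][:, -q:]                # top-q eigenvectors
    return np.einsum('ji,kjl,lm->kim', U, Xc, Z)        # N x p x q scores

X = np.random.default_rng(0).normal(size=(50, 8, 6))    # 50 observations of size 8 x 6
print(two_directional_2dpca(X, p=3, q=2).shape)         # (50, 3, 2)
```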
We consider the problem $(\rm P)$ of exactly fitting an ellipsoid (centered at $0$) to $n$ standard Gaussian random vectors in $\mathbb{R}^d$, as $n, d \to \infty$ with $n / d^2 \to \alpha > 0$. This problem is conjectured to undergo a sharp transition: with high probability, $(\rm P)$ has a solution if $\alpha < 1/4$, while $(\rm P)$ has no solutions if $\alpha > 1/4$. So far, only the trivial bound $\alpha > 1/2$ is known to imply the absence of solutions, while the sharpest results on the positive side assume $\alpha \leq \eta$ (for $\eta > 0$ a small constant) to prove that $(\rm P)$ is solvable. In this work we study universality between this problem and a so-called "Gaussian equivalent", for which the same transition can be rigorously analyzed. Our main results are twofold. On the positive side, we prove that if $\alpha < 1/4$, there exists an ellipsoid fitting all the points up to a small error, and that the lengths of its principal axes are bounded above and below. On the other hand, for $\alpha > 1/4$, we show that achieving small fitting error is not possible if the length of the ellipsoid's shortest axis does not approach $0$ as $d \to \infty$ (and in particular there does not exist any ellipsoid fit whose shortest axis length is bounded away from $0$ as $d \to \infty$). To the best of our knowledge, our work is the first rigorous result characterizing the expected phase transition in ellipsoid fitting at $\alpha = 1/4$. In a companion non-rigorous work, the first author and D. Kunisky give a general analysis of ellipsoid fitting using the replica method of statistical physics, which inspired the present work.
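To make the problem concrete: an ellipsoid fit is a symmetric positive definite matrix $S$ with $x_i^\top S x_i = 1$ for all $i$, and each such constraint is linear in $S$. The toy sketch below (illustrating only the formulation, not the paper's analysis) builds this linear system for small $n, d$ and inspects the least-squares solution and its spectrum; the principal semi-axis lengths are $\lambda_j^{-1/2}$ for the eigenvalues $\lambda_j$ of $S$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 20                       # toy sizes; the regime of interest is n ~ alpha * d^2
X = rng.normal(size=(n, d))         # n standard Gaussian vectors in R^d

# Each constraint x_i^T S x_i = 1 reads <x_i x_i^T, S> = 1, linear in S.
A = np.stack([np.outer(x, x).ravel() for x in X])         # n x d^2 design matrix
s, *_ = np.linalg.lstsq(A, np.ones(n), rcond=None)
S = 0.5 * (s.reshape(d, d) + s.reshape(d, d).T)           # symmetrize

fit_error = np.max(np.abs(np.einsum('ij,jk,ik->i', X, S, X) - 1.0))
eigs = np.linalg.eigvalsh(S)
print(fit_error, eigs.min(), eigs.max())  # axis lengths are eigs**(-1/2) when S > 0
```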
The computation of a matrix function $f(A)$ is an important task in scientific computing, appearing in machine learning, network analysis and the solution of partial differential equations. In this work, we use only matrix-vector products $x\mapsto Ax$ to approximate functions of sparse matrices, as well as matrices with a similar structure, such as sparse matrices $A$ themselves or matrices whose entries exhibit a decay property similar to that of matrix functions. We show that when $A$ is a sparse matrix with an unknown sparsity pattern, techniques from compressed sensing can be used under natural assumptions. Moreover, if $A$ is a banded matrix, then certain deterministic matrix-vector products can efficiently recover the large entries of $f(A)$. We describe an algorithm for each of the two cases and give an error analysis based on decay bounds for the entries of $f(A)$. We finish with numerical experiments showing the accuracy of our algorithms.
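As one illustration of the banded case, a classical deterministic probing idea (in the spirit of Curtis-Powell-Reid coloring) recovers a matrix of bandwidth $b$ exactly from $2b+1$ matrix-vector products; whether this coincides with the paper's algorithm for $f(A)$ is not claimed, and the test matrix below is a placeholder that is exactly banded rather than merely decaying.

```python
import numpy as np

def recover_banded(matvec, n, b):
    """Recover a matrix with bandwidth b (A[i, j] = 0 if |i - j| > b) from
    2b + 1 products with 'comb' probing vectors: within any row's band, columns
    in the same residue class mod (2b + 1) never overlap."""
    k = 2 * b + 1
    A = np.zeros((n, n))
    for c in range(k):
        v = np.zeros(n)
        v[c::k] = 1.0                       # indicator of columns j with j % k == c
        y = matvec(v)                       # one black-box product A @ v
        for j in range(c, n, k):
            lo, hi = max(0, j - b), min(n, j + b + 1)
            A[lo:hi, j] = y[lo:hi]          # rows where column j is the only active one
    return A

rng = np.random.default_rng(0)
n, b = 12, 2
B = np.triu(np.tril(rng.normal(size=(n, n)), b), -b)            # random banded test matrix
print(np.allclose(recover_banded(lambda v: B @ v, n, b), B))    # True
```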
We consider the problem of approximating the solution to $A(\mu) x(\mu) = b$ for many different values of the parameter $\mu$. Here we assume $A(\mu)$ is large, sparse, and nonsingular, with a nonlinear dependence on $\mu$. Our method is based on a companion linearization derived from an accurate Chebyshev interpolation of $A(\mu)$ on the interval $[-a,a]$, $a \in \mathbb{R}$. The solution to the linearization is approximated in a preconditioned BiCG setting for shifted systems, where the Krylov basis matrix is formed once. This process leads to a short-term recurrence method, where one execution of the algorithm produces the approximation to $x(\mu)$ for many different values of the parameter $\mu \in [-a,a]$ simultaneously. In particular, this work proposes one algorithm which applies a shift-and-invert preconditioner exactly, as well as an algorithm which applies the preconditioner inexactly. The competitiveness of the algorithms is illustrated with large-scale problems arising from a finite element discretization of a Helmholtz equation with parameterized material coefficient. The software used in the simulations is publicly available online, and thus all our experiments are reproducible.
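Here is a minimal sketch of the interpolation ingredient only: matrix-valued Chebyshev coefficients of $A(\mu)$ on $[-a,a]$ from samples at Chebyshev points. The companion linearization and the preconditioned shifted BiCG recurrence are not reproduced, and the toy parameter dependence is a placeholder.

```python
import numpy as np

def chebyshev_coeffs(A_of_mu, a, k):
    """Coefficients C_0..C_k of the Chebyshev interpolant
    A(mu) ~ sum_j C_j T_j(mu / a) on [-a, a], from k + 1 Chebyshev points."""
    theta = (np.arange(k + 1) + 0.5) * np.pi / (k + 1)
    nodes = a * np.cos(theta)                            # Chebyshev points of the first kind
    samples = np.array([A_of_mu(mu) for mu in nodes])    # shape (k+1, d, d)
    C = []
    for j in range(k + 1):
        w = np.cos(j * theta) * (1.0 if j == 0 else 2.0) / (k + 1)
        C.append(np.tensordot(w, samples, axes=1))       # weighted sum of samples
    return C

# Toy parameter dependence A(mu) = A0 + mu * A1 + sin(mu) * A2 (placeholder matrices).
rng = np.random.default_rng(0)
A0, A1, A2 = rng.normal(size=(3, 4, 4))
A = lambda mu: A0 + mu * A1 + np.sin(mu) * A2
C = chebyshev_coeffs(A, a=2.0, k=10)
mu = 0.7
T = np.polynomial.chebyshev.chebvander(mu / 2.0, 10)[0]  # T_0..T_10 evaluated at mu / a
print(np.linalg.norm(sum(t * c for t, c in zip(T, C)) - A(mu)))  # small interpolation error
```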
In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which uses generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ for the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of the resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
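For readers less familiar with the terminology, the standard objects contrasted above can be summarized as follows (with $(\mathcal F_t)$ a filtration, $\mathcal P$ the composite null, and $\tau$ ranging over stopping times); by Ville's inequality, a test (super)martingale satisfies $\Pr(\exists t: M_t \ge 1/\alpha) \le \alpha$, which is how such processes are inverted into $(1-\alpha)$ confidence sequences:
$$
\begin{aligned}
\text{test (super)martingale for } P:&\quad M_0 = 1,\ \ M_t \ge 0,\ \ \mathbb E_P[M_t \mid \mathcal F_{t-1}] \le M_{t-1},\\
\text{e-process for } \mathcal P:&\quad E_t \ge 0 \ \text{ and }\ \mathbb E_P[E_\tau] \le 1 \ \text{ for every } P \in \mathcal P \text{ and stopping time } \tau,\\
(1-\alpha)\text{ confidence sequence for } \mu:&\quad \Pr\bigl(\forall t \ge 1:\ \mu \in C_t\bigr) \ge 1-\alpha.
\end{aligned}
$$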
We study the numerical integration of functions from isotropic Sobolev spaces $W_p^s([0,1]^d)$ using finitely many function evaluations within randomized algorithms, aiming for the smallest possible probabilistic error guarantee $\varepsilon > 0$ at confidence level $1-\delta \in (0,1)$. For spaces consisting of continuous functions, non-linear Monte Carlo methods with optimal confidence properties have already been known, and in a few cases even linear methods succeed in this respect. In this paper we promote a new method called stratified control variates (SCV) and use it to show that linear methods already achieve optimal probabilistic error rates in the high smoothness regime, without the need to adjust algorithmic parameters to the uncertainty $\delta$. We also analyse a version of SCV in the low smoothness regime, where $W_p^s([0,1]^d)$ may contain functions with singularities. Here, we observe a polynomial dependence of the error on $\delta^{-1}$, which cannot be avoided for linear methods. This is worse than what is known to be possible using non-linear algorithms, where only a logarithmic dependence on $\delta^{-1}$ occurs if the algorithm is tuned to a specific value of $\delta$.
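As a one-dimensional toy illustration of the idea of combining stratification with a control variate (not the SCV construction of the paper, which is formulated for $W_p^s([0,1]^d)$): split the domain into strata, integrate a simple local approximation exactly, and spend the random samples only on the residual.

```python
import numpy as np

def stratified_control_variate(f, m, rng):
    """Estimate int_0^1 f(x) dx.  Split [0, 1] into m strata; on each stratum use
    the linear interpolant of f at the stratum endpoints as a control variate,
    integrate the control exactly (trapezoidal rule), and estimate only the
    residual f - control with one uniform sample per stratum."""
    edges = np.linspace(0.0, 1.0, m + 1)
    h = 1.0 / m
    fe = f(edges)
    control_integral = 0.5 * h * np.sum(fe[:-1] + fe[1:])   # exact integral of the control
    u = edges[:-1] + h * rng.random(m)                       # one sample in each stratum
    g_u = np.interp(u, edges, fe)                            # control evaluated at samples
    return control_integral + h * np.sum(f(u) - g_u)

rng = np.random.default_rng(0)
f = lambda x: np.exp(x)
print(stratified_control_variate(f, m=64, rng=rng), np.e - 1.0)  # estimate vs exact value
```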
We expound on some known lower bounds of the quadratic Wasserstein distance between random vectors in $\mathbb{R}^n$, with an emphasis on affine transformations that have been used in manifold learning of data in Wasserstein space. In particular, we give concrete lower bounds for rotated copies of random vectors in $\mathbb{R}^2$ with uncorrelated components by computing the Bures metric between the covariance matrices. We also derive upper bounds for compositions of affine maps, which yield a fruitful variety of diffeomorphisms applied to an initial data measure. We apply these bounds to various distributions, including those lying on a 1-dimensional manifold in $\mathbb{R}^2$, and illustrate the quality of the bounds. Finally, we give a procedure for mimicking handwritten digit or alphabet datasets that can be applied in a manifold learning framework.
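The covariance-based bound referred to here is of the Gelbrich type: for random vectors $X, Y$, $W_2^2(X,Y) \ge \|\mathbb E X - \mathbb E Y\|^2 + B^2(\Sigma_X, \Sigma_Y)$, where $B$ is the Bures metric. Below is a small sketch evaluating the Bures term for a diagonal covariance and a rotated copy; the matrices are placeholders.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD square root via the eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_squared(A, B):
    """Squared Bures metric: tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    rA = psd_sqrt(A)
    return np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))

Sigma = np.diag([4.0, 1.0])                       # covariance with uncorrelated components
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# The Bures term; for centered vectors this lower-bounds W_2^2 of the rotated pair.
print(bures_squared(Sigma, R @ Sigma @ R.T))
```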
We consider Gibbs distributions, which are families of probability distributions over a discrete space $\Omega$ with probability mass function of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H( \omega ) \in \{0 \} \cup [1, n]$. The partition function is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters of these distributions are the log partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the counts $c_x = |H^{-1}(x)|$. These are correlated with system parameters in a number of physical applications and sampling algorithms. Our first main result is an algorithm to estimate the counts $c_x$ using roughly $\tilde O( \frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{\varepsilon^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we also develop algorithms to compute the partition function $Z$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and using $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions.
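To fix the notation on a toy example where everything can be enumerated, take $\Omega = \{0,1\}^m$ with $H(\omega)$ the Hamming weight, so that $c_x = \binom{m}{x}$ and $Z(\beta)$, $q$ can be computed exactly. The paper's algorithms, in contrast, estimate these quantities from samples of $\mu_\beta$ alone; this brute force only serves as a ground-truth check on a space small enough to enumerate.

```python
import numpy as np
from itertools import product
from math import comb

m = 10                                  # toy size; Omega = {0,1}^m, H = Hamming weight
beta_min, beta_max = 0.0, 2.0

def Z(beta):
    """Partition function Z(beta) = sum over Omega of exp(beta * H(omega))."""
    return sum(np.exp(beta * sum(w)) for w in product((0, 1), repeat=m))

q = np.log(Z(beta_max) / Z(beta_min))           # log partition ratio
counts = {x: comb(m, x) for x in range(m + 1)}  # c_x = |H^{-1}(x)| = binom(m, x)
print(round(q, 3), counts)
```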