Approximate message passing (AMP) has emerged as an effective iterative paradigm for solving high-dimensional statistical problems. However, prior AMP theory -- which focused mostly on high-dimensional asymptotics -- fell short of predicting the AMP dynamics when the number of iterations surpasses $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Built upon a new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP with an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need for a subsequent refinement stage (as conjectured recently by \citet{celentano2021local}); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratios.
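To make the iterative paradigm concrete, the following is a minimal numpy sketch of the standard AMP template for $\mathbb{Z}_2$ synchronization under a spiked Wigner model, with a tanh denoiser and the usual Onsager correction. The dimension, signal-to-noise ratio, and iteration count are illustrative choices, not the paper's settings.

```python
import numpy as np

# Toy spiked Wigner model Y = (lam/n) x* x*^T + W, with a +/-1 signal x*.
rng = np.random.default_rng(0)
n, lam, n_iter = 2000, 2.0, 50

x_star = rng.choice([-1.0, 1.0], size=n)    # +/-1 ground-truth signal
G = rng.normal(size=(n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)                  # symmetric Wigner noise, variance 1/n
Y = (lam / n) * np.outer(x_star, x_star) + W

f = np.tanh                                 # denoiser for +/-1 signals
x = rng.normal(size=n)                      # initialization independent of Y
x_prev = np.zeros(n)
for _ in range(n_iter):
    b = np.mean(1.0 - f(x) ** 2)            # Onsager coefficient (1/n) sum_i f'(x_i)
    x, x_prev = Y @ f(x) - b * f(x_prev), x

overlap = abs(f(x) @ x_star) / n            # alignment of the estimate with x*
```

With $\lambda = 2$ above the recovery threshold, the overlap grows from the $O(n^{-1/2})$ level of the random initialization to a macroscopic value within a few dozen iterations.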
The ParaOpt algorithm was recently introduced as a time-parallel solver for optimal-control problems with a terminal-cost objective, and convergence results have been presented for the linear diffusive case with implicit-Euler time integrators. We reformulate ParaOpt for tracking problems and provide generalized convergence analyses for both objectives. We focus on linear diffusive equations and prove convergence bounds that are generic in the time integrators used. For large problem dimensions, ParaOpt's performance depends crucially on having a good preconditioner to solve the arising linear systems. For the case where ParaOpt's cheap, coarse-grained propagator is linear, we introduce diagonalization-based preconditioners inspired by recent advances in the ParaDiag family of methods. These preconditioners not only lead to a weakly-scalable ParaOpt version, but are themselves invertible in parallel, making maximal use of available concurrency. They have proven convergence properties in the linear diffusive case that are generic in the time discretization used, similarly to our ParaOpt results. Numerical results confirm that the iteration count of the iterative solvers used for ParaOpt's linear systems becomes constant in the limit of an increasing processor count. The paper is accompanied by a sequential MATLAB implementation.
A novel algorithm is proposed for quantitative comparisons between compact surfaces embedded in three-dimensional Euclidean space. The key idea is to identify those objects with the associated surface measures and compute a weak distance between them using the Fourier transform on the ambient space. In particular, the inhomogeneous Sobolev norm of negative order for a difference between two surface measures is evaluated via the Plancherel theorem, which amounts to approximating a weighted integral norm of smooth data on the frequency space. This approach offers several advantages, including high accuracy due to fast-converging numerical quadrature rules, acceleration by the nonuniform fast Fourier transform, and parallelization on many-core processors. In numerical experiments, the 2-sphere, which is an example whose Fourier transform is explicitly known, is compared with its icosahedral discretization, and it is observed that the piecewise linear approximations converge to the smooth object at a quadratic rate, up to a small truncation error.
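Since the sphere's Fourier transform is explicitly known, the weighted frequency-space integral can be illustrated directly. The following numpy sketch evaluates the negative-order Sobolev distance between the surface measures of two spheres by reducing the radially symmetric integral to one dimension; the truncation radius, grid size, and $s = 2$ are illustrative choices, and the normalizing constant tied to the Fourier-transform convention is dropped.

```python
import numpy as np

def sphere_ft(r, k):
    # Closed-form Fourier transform of the surface measure of a radius-r sphere:
    # sigma_hat(|xi| = k) = 4*pi*r*sin(r*k)/k = 4*pi*r^2 * sinc(r*k/pi).
    return 4 * np.pi * r**2 * np.sinc(r * k / np.pi)

def h_neg_s_dist(r1, r2, s=2.0, K=200.0, N=200_000):
    # ||sigma_1 - sigma_2||_{H^{-s}}^2 proportional to the radial integral
    # int_0^K 4*pi*k^2 (1 + k^2)^{-s} |sigma_hat_1 - sigma_hat_2|^2 dk.
    k, dk = np.linspace(0.0, K, N, retstep=True)
    diff2 = (sphere_ft(r1, k) - sphere_ft(r2, k)) ** 2
    integrand = 4 * np.pi * k**2 * (1.0 + k**2) ** (-s) * diff2
    return np.sqrt(np.sum(integrand) * dk)   # simple Riemann sum

d_same = h_neg_s_dist(1.0, 1.0)   # identical spheres: distance exactly 0
d_near = h_neg_s_dist(1.0, 1.01)  # nearby radii: small distance
d_far  = h_neg_s_dist(1.0, 1.5)   # distant radii: larger distance
```

The $(1+k^2)^{-s}$ weight makes the integral converge despite the slow $O(1/k)$ decay of the surface measures' transforms, which is what makes the weak distance computable by quadrature.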
Motivated by recent works on streaming algorithms for constraint satisfaction problems (CSPs), we define and analyze oblivious algorithms for the Max-$k$AND problem. This generalizes the definition by Feige and Jozeph (Algorithmica '15) of oblivious algorithms for Max-DICUT, a special case of Max-$2$AND. Oblivious algorithms round each variable with probability depending only on a quantity called the variable's bias. For each oblivious algorithm, we design a so-called "factor-revealing linear program" (LP) which captures its worst-case instance, generalizing one of Feige and Jozeph for Max-DICUT. Then, departing from their work, we perform a fully explicit analysis of these (infinitely many!) LPs. In particular, we show that for all $k$, oblivious algorithms for Max-$k$AND provably outperform a special subclass of algorithms we call "superoblivious" algorithms. Our result has implications for streaming algorithms: Generalizing the result for Max-DICUT of Saxena, Singer, Sudan, and Velusamy (SODA'23), we prove that certain separation results hold between streaming models for infinitely many CSPs: for every $k$, $O(\log n)$-space sketching algorithms for Max-$k$AND known to be optimal in $o(\sqrt n)$-space can be beaten in (a) $O(\log n)$-space under a random-ordering assumption, and (b) $O(n^{1-1/k} D^{1/k})$ space under a maximum-degree-$D$ assumption. Even in the previously-known case of Max-DICUT, our analytic proof gives a fuller, computer-free picture of these separation results.
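The oblivious-rounding template is simple to state in code. The following sketch implements bias-based rounding for Max-DICUT (the special case of Max-$2$AND mentioned above): each variable's bias is computed from its out- and in-degrees, and each variable is rounded independently with a probability that depends only on that bias. The star instance and the threshold rounding rule are hypothetical illustrations, not constructions from the paper.

```python
import numpy as np

def bias(edges, n):
    # A variable's bias depends only on its out- vs. in-degree.
    out_deg, in_deg = np.zeros(n), np.zeros(n)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    tot = out_deg + in_deg
    return np.divide(out_deg - in_deg, tot, out=np.zeros(n), where=tot > 0)

def oblivious_dicut_value(edges, n, p_of_bias, trials=500, seed=0):
    # Round each variable to 1 independently with probability p(bias);
    # return the average fraction of satisfied (u=1, v=0) edges.
    rng = np.random.default_rng(seed)
    p = p_of_bias(bias(edges, n))
    vals = []
    for _ in range(trials):
        x = rng.random(n) < p
        vals.append(sum(bool(x[u]) and not x[v] for u, v in edges) / len(edges))
    return float(np.mean(vals))

# Hypothetical star instance: every edge leaves vertex 0, so it is fully satisfiable.
edges = [(0, i) for i in range(1, 11)]
# A simple threshold rule: set a variable to 1 iff its bias is positive.
val = oblivious_dicut_value(edges, 11, lambda b: (b > 0).astype(float))
```

On the star instance the threshold rule recovers the optimal assignment, since the source has bias $+1$ and every sink has bias $-1$.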
Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations such as Spath's clusterwise regression that attempt to find clusters of data that yield low regression error on a supervised target. We believe that clusterwise regression is just a single vertex of a largely unexplored design space of supervised clustering models. In this article, we define a generalized optimization framework for predictive clustering that admits different cluster definitions (arbitrary point assignment, closest center, and bounding box) and both regression and classification objectives. We then present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization in this generalized framework. To alleviate scalability concerns for large datasets, we also provide highly scalable greedy algorithms inspired by the Majorization-Minimization (MM) framework. Finally, we demonstrate the ability of our models to uncover different interpretable discrete cluster structures in data by experimenting with four real-world datasets.
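To give a feel for the clusterwise-regression vertex of this design space, here is a minimal greedy alternating sketch in the spirit of Spath's algorithm: assign each point to the linear model that fits it best, then refit each model by least squares. This is an illustrative local-search heuristic, not the MILP formulation or the MM-based algorithms of the article, and the two-regime toy data is hypothetical.

```python
import numpy as np

def clusterwise_regression(X, y, k, iters=30, seed=0):
    # Greedy alternating scheme: reassign points to their best-fitting
    # linear model, then refit each model on its assigned points.
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(y), 1))])     # append intercept column
    W = rng.normal(size=(k, Xb.shape[1]))         # k random initial models
    labels = np.zeros(len(y), dtype=int)
    for _ in range(iters):
        resid = (Xb @ W.T - y[:, None]) ** 2      # per-point error under each model
        labels = resid.argmin(axis=1)             # reassignment step
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xb.shape[1]:         # enough points to refit
                W[j], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
    return labels, W

# Toy data with two linear regimes.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(200, 1))
y = np.where(x[:, 0] > 0, 3 * x[:, 0] + 1, -2 * x[:, 0]) + 0.05 * rng.normal(size=200)
labels, W = clusterwise_regression(x, y, k=2)
```

Both steps monotonically decrease the total squared error, so the heuristic cannot do worse than a single global least-squares line, though unlike the MILP approach it offers no global-optimality guarantee.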
Motivated by applications in distributed storage, distributed computing, and homomorphic secret sharing, we study communication-efficient schemes for computing linear combinations of coded symbols. Specifically, we design low-bandwidth schemes that evaluate the weighted sum of $\ell$ coded symbols in a codeword $\pmb{c}\in\mathbb{F}^n$, when we are given access to $d$ of the remaining components in $\pmb{c}$. Formally, suppose that $\mathbb{F}$ is a field extension of $\mathbb{B}$ of degree $t$. Let $\pmb{c}$ be a codeword in a Reed-Solomon code of dimension $k$, and suppose our task is to compute the weighted sum of $\ell$ coded symbols. In this paper, for some $s<t$, we provide an explicit scheme that performs this task by downloading $d(t-s)$ sub-symbols in $\mathbb{B}$ from $d$ available nodes, whenever $d\geq \ell|\mathbb{B}|^s-\ell+k$. In many cases, our scheme outperforms previous schemes in the literature. Furthermore, we provide a characterization of evaluation schemes for general linear codes. Then in the special case of Reed-Solomon codes, we use this characterization to derive a lower bound for the evaluation bandwidth.
Profile likelihoods are rarely used in geostatistical models due to the computational burden imposed by repeated decompositions of large variance matrices. Accounting for uncertainty in covariance parameters can be highly consequential in geostatistical models, as some covariance parameters are poorly identified; the problem is severe enough that the differentiability parameter of the Mat\'ern correlation function is typically treated as fixed. The problem is compounded in anisotropic spatial models, as there are two additional parameters to consider. In this paper, we make the following contributions: (1) we develop a methodology for profile likelihoods in Gaussian spatial models with the Mat\'ern family of correlation functions, including anisotropic models; this methodology adopts a novel reparametrization for the generation of representative points and uses GPUs for parallel computation of profile likelihoods in the software implementation. (2) We show that the profile likelihood of the Mat\'ern shape parameter is often quite flat but still identifiable; in particular, it can usually rule out very small values. (3) Simulation studies and applications to real data examples show that profile-based confidence intervals for covariance parameters and regression parameters have superior coverage to traditional Wald-type confidence intervals.
Private inference refers to a two-party setting in which one party has a model (e.g., a linear classifier), the other has data, and the model is to be applied over the data while safeguarding the privacy of both parties. In particular, models in which the weights are quantized (e.g., to 1 or -1) have gained increasing attention lately, due to their benefits in efficient, private, or robust computations. Traditionally, private inference has been studied from a cryptographic standpoint, which suffers from high complexity and degraded accuracy. More recently, Raviv et al. showed that in quantized models, an information theoretic tradeoff exists between the privacy of the two parties, and presented a scheme based on a combination of Boolean and real-valued algebra that attains this tradeoff. Both the scheme and the respective bound required the computation to be done exactly. In this work we show that by relaxing the requirement for exact computation, one can break the information theoretic privacy barrier of Raviv et al. and provide better privacy at the same communication costs. We provide a scheme for such approximate computation, bound its error, show its improved privacy, and devise a respective lower bound for some parameter regimes.
We present a new approach for computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm, our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation, such as CountSketch and Johnson-Lindenstrauss projection. Specifically, while our method admits guarantees that exactly match linear sketching for dense vectors, it yields significantly lower error for sparse vectors with limited overlap between non-zero entries. Such vectors arise in many applications involving sparse data. They are also important in increasingly popular dataset search applications, where inner product sketches are used to estimate data covariance, conditional means, and other quantities involving columns in unjoined tables. We complement our theoretical results by showing that our approach empirically outperforms existing linear sketches and unweighted hashing-based sketches for sparse vectors.
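For reference, here is a minimal numpy implementation of CountSketch, one of the linear-sketching baselines mentioned above, used to estimate an inner product; this is the baseline the paper's Weighted MinHash approach improves upon for sparse vectors, not the paper's own method. The vectors, sketch size, and number of repetitions are illustrative.

```python
import numpy as np

def countsketch(v, m, seed):
    # Linear sketch: hash each coordinate to one of m buckets with a random
    # sign. When two vectors share the same hash and sign functions, the
    # inner product of their sketches is an unbiased estimate of <a, b>.
    rng = np.random.default_rng(seed)
    h = rng.integers(m, size=len(v))          # bucket assignment
    s = rng.choice([-1.0, 1.0], size=len(v))  # random signs
    out = np.zeros(m)
    np.add.at(out, h, s * v)
    return out

# Sparse vectors with partially overlapping support; true inner product = 25.
a = np.zeros(1000); a[:50] = 1.0
b = np.zeros(1000); b[25:75] = 1.0
ests = [countsketch(a, 128, t) @ countsketch(b, 128, t) for t in range(300)]
est = float(np.mean(ests))                    # averaging reduces the variance
```

A single sketch's error scales with $\|a\|\,\|b\|/\sqrt{m}$ regardless of sparsity, which is precisely the regime where the hashing-based approach above yields lower error.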
In Bayesian inference, the approximation of integrals of the form $\psi = \mathbb{E}_{F}[l(X)] = \int_{\mathcal{X}} l(\mathbf{x})\, d F(\mathbf{x})$ is a fundamental challenge. Such integrals are crucial for evidence estimation, which is important for various purposes, including model selection and numerical analysis. The existing strategies for evidence estimation are classified into four categories: deterministic approximation, density estimation, importance sampling, and vertical representation (Llorente et al., 2020). In this paper, we show that the Riemann sum estimator due to Yakowitz (1978) can be used in the context of nested sampling (Skilling, 2006) to achieve an $O(n^{-4})$ rate of convergence, faster than the rate given by the usual ergodic central limit theorem. We provide a brief overview of the literature on Riemann sum estimators and the nested sampling algorithm and its connections to vertical likelihood Monte Carlo. We provide theoretical and numerical arguments to show how merging these two ideas may result in improved and more robust estimators for evidence estimation, especially in higher dimensional spaces. We also briefly discuss the idea of simulating the Lorenz curve, which avoids the problem of intractable $\Lambda$ functions, essential for the vertical representation and nested sampling.
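The connection between nested sampling and Riemann sums can be seen in a toy 1-D example. The sketch below runs textbook nested sampling (Skilling, 2006) with the deterministic prior-volume assignment $X_i = e^{-i/N}$, accumulating the evidence as a rectangle (Riemann) sum over shrinking prior volumes; constrained-prior draws are generated by simple rejection. The likelihood, prior, and all tuning constants are illustrative, and this does not implement the paper's improved estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05
logL = lambda t: -(t - 0.5) ** 2 / (2 * sigma**2)  # Gaussian log-likelihood
Z_true = sigma * np.sqrt(2 * np.pi)                # evidence under a U(0,1) prior
                                                   # (truncation mass is negligible)
N, T = 100, 700                                    # live points, iterations
live = rng.uniform(0, 1, N)
Z, X_prev = 0.0, 1.0
for i in range(1, T + 1):
    worst = int(np.argmin(logL(live)))             # lowest-likelihood live point
    L_i = np.exp(logL(live[worst]))
    X_i = np.exp(-i / N)                           # deterministic prior volume
    Z += (X_prev - X_i) * L_i                      # Riemann-sum increment
    X_prev = X_i
    while True:                                    # refresh the worst point by
        cand = rng.uniform()                       # rejection from the prior
        if logL(cand) > logL(live[worst]):
            live[worst] = cand
            break
Z += X_prev * np.exp(logL(live)).mean()            # remaining live-point mass
```

The accumulated sum is a Riemann approximation of $Z = \int_0^1 L(X)\, dX$ along the (Lorenz-curve-like) map from prior volume to likelihood, which is the object the paper's faster-converging estimator targets.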
Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability of learning good user and item embeddings by exploiting the collaborative signals from high-order neighbors. Like other GCN models, GCN-based recommendation models also suffer from the notorious over-smoothing problem - when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulting in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent; however, we argue that they overlook an important factor behind over-smoothing in recommendation: high-order neighboring users who share no common interests with a given user can also be involved in that user's embedding learning during the graph convolution operation. As a result, multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. Each subgraph consists of users with similar interests and their interacted items. To form the subgraphs, we design an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user features and graph structure. In this way, our model avoids propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model gains performance improvements from stacking more layers and significantly outperforms state-of-the-art GCN-based recommendation models.
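The over-smoothing phenomenon described above is easy to reproduce numerically. The sketch below repeatedly applies LightGCN-style propagation (the symmetrically normalized adjacency $D^{-1/2} A D^{-1/2}$, with no feature transforms) to random embeddings on a hypothetical random graph, and tracks how the spread between node embeddings shrinks with depth; this illustrates the problem IMP-GCN addresses, not the IMP-GCN model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, layers = 30, 8, 12
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)                     # undirected edges
np.fill_diagonal(A, 1.0)                   # self-loops
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))    # D^{-1/2} A D^{-1/2} propagation

E = rng.normal(size=(n, d))                # random initial node embeddings
spread = [E.std(axis=0).mean()]            # across-node spread per feature
for _ in range(layers):
    E = A_hat @ E                          # one propagation layer
    spread.append(E.std(axis=0).mean())
```

Because the non-leading eigenvalues of $\hat{A}$ have modulus below 1, the embeddings collapse toward a rank-one subspace as layers stack, making nodes with very different neighborhoods nearly indistinguishable - the behavior that restricting convolution to interest-based subgraphs is designed to prevent.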