Low-rank matrix approximation (LRMA) is one of the central concepts in machine learning, with applications in dimension reduction, de-noising, multivariate statistical methodology, and many other areas. A recent extension of LRMA is called low-rank matrix completion (LRMC). It solves the LRMA problem when some observations are missing and is especially useful for recommender systems. In this paper, we consider an element-wise weighted generalization of LRMA. The resulting weighted low-rank matrix approximation (WLRMA) technique covers LRMC as a special case with binary weights. WLRMA has many applications. For example, it is an essential component of GLM optimization algorithms in which an exponential family is used to model the entries of a matrix and the matrix of natural parameters admits a low-rank structure. We propose an algorithm for solving the weighted problem, as well as two acceleration techniques. Further, we develop a non-SVD modification of the proposed algorithm that can handle extremely high-dimensional data. We compare the performance of all the methods on a small simulation example as well as a real-data application.
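As a concrete illustration of the element-wise weighted problem, the sketch below runs a classical EM/impute-style iteration for weighted low-rank approximation: alternate a truncated SVD with a weight-based blend of the data and the current fit. It is only a minimal numpy sketch under assumed names and a toy weight matrix, not necessarily the algorithm or the accelerations proposed in the paper.

```python
import numpy as np

def weighted_low_rank(X, W, rank, n_iter=100):
    """EM/impute-style iteration for element-wise weighted low-rank
    approximation; W in [0, 1], with binary W recovering matrix completion."""
    Z = W * X                                     # start with the weighted data
    for _ in range(n_iter):
        # best unweighted rank-r approximation of the current surrogate matrix
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # keep the data where the weight is high, the fit where it is low
        Z = W * X + (1.0 - W) * low_rank
    return low_rank

# toy usage with a random rank-5 matrix and continuous weights
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))
W = rng.uniform(size=A.shape)
A_hat = weighted_low_rank(A, W, rank=5)
print("relative error:", np.linalg.norm(A_hat - A) / np.linalg.norm(A))
```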
Majority-SAT is the problem of determining whether an input $n$-variable formula in conjunctive normal form (CNF) has at least $2^{n-1}$ satisfying assignments. Majority-SAT and related problems have been studied extensively in various AI communities interested in the complexity of probabilistic planning and inference. Although Majority-SAT has been known to be PP-complete for over 40 years, the complexity of a natural variant has remained open: Majority-$k$SAT, where the input CNF formula is restricted to have clause width at most $k$. We prove that for every $k$, Majority-$k$SAT is in P. In fact, for any positive integer $k$ and rational $\rho \in (0,1)$ with bounded denominator, we give an algorithm that can determine whether a given $k$-CNF has at least $\rho \cdot 2^n$ satisfying assignments, in deterministic linear time (whereas the previous best-known algorithm ran in exponential time). Our algorithms have interesting positive implications for counting complexity and the complexity of inference, significantly reducing the known complexities of related problems such as E-MAJ-$k$SAT and MAJ-MAJ-$k$SAT. At the heart of our approach is an efficient method for solving threshold counting problems by extracting sunflowers found in the corresponding set system of a $k$-CNF. We also show that the tractability of Majority-$k$SAT is somewhat fragile. For the closely related GtMajority-SAT problem (where we ask whether a given formula has greater than $2^{n-1}$ satisfying assignments) which is known to be PP-complete, we show that GtMajority-$k$SAT is in P for $k\le 3$, but becomes NP-complete for $k\geq 4$. These results are counterintuitive, because the ``natural'' classifications of these problems would have been PP-completeness, and because there is a stark difference in the complexity of GtMajority-$k$SAT and Majority-$k$SAT for all $k\ge 4$.
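To make the threshold question concrete, the brute-force reference check below counts satisfying assignments of a small CNF and compares the count against $\rho \cdot 2^n$. It runs in exponential time and does not implement the deterministic linear-time algorithm described above; the clause encoding is an illustrative choice.

```python
from itertools import product

def at_least_rho_fraction(clauses, n, rho=0.5):
    """Exponential-time reference check: does the CNF over variables 1..n have
    at least rho * 2^n satisfying assignments? A clause is a tuple of nonzero
    ints, with a negative integer denoting a negated literal."""
    count = 0
    for bits in product((False, True), repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count >= rho * 2**n

# toy usage: (x1 or x2) and (not x1 or x3) has 4 of 8 satisfying assignments
print(at_least_rho_fraction([(1, 2), (-1, 3)], n=3))   # True
```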
This paper introduces the R package drpop to flexibly estimate total population size from incomplete lists. Total population estimation, also called capture-recapture, is an important problem in many biological and social sciences. A typical dataset consists of incomplete lists of individuals from the population of interest along with some covariate information. The goal is to estimate the number of unobserved individuals and, equivalently, the total population size. drpop flexibly models heterogeneity using the covariate information, under the assumption that two lists are conditionally independent given covariates. This can be a much weaker assumption than the full marginal independence often required by classical methods. Moreover, drpop can incorporate complex and high-dimensional covariates and, unlike other popular methods, does not require parametric models. In particular, our estimator is doubly robust and has fast convergence rates even under flexible non-parametric setups. drpop provides the user with the flexibility to choose the model for estimating intermediate parameters and returns the estimated population size, confidence intervals, and other related quantities. In this paper, we illustrate the applications of drpop in different scenarios and present some performance summaries.
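For context, the snippet below computes the classical two-list Lincoln-Petersen estimate, which relies on the full marginal independence assumption that drpop is designed to relax; it is a numpy illustration of that baseline only, not of drpop's covariate-adjusted doubly robust estimator (drpop itself is an R package), and the toy capture probabilities are made up.

```python
import numpy as np

def lincoln_petersen(on_list1, on_list2):
    """Classical two-list estimator under marginal independence:
    N_hat = n1 * n2 / m, where m is the overlap between the two lists."""
    n1 = int(np.sum(on_list1))
    n2 = int(np.sum(on_list2))
    m = int(np.sum(on_list1 & on_list2))
    return n1 * n2 / m

# toy usage: true population of 800 with capture probabilities 0.6 and 0.5;
# in practice only individuals appearing on at least one list are observed
rng = np.random.default_rng(0)
seen1 = rng.random(800) < 0.6
seen2 = rng.random(800) < 0.5
observed = seen1 | seen2
print(lincoln_petersen(seen1[observed], seen2[observed]))   # close to 800
```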
Some properties of generalized convexity for sets and for functions are identified for the reliability polynomials of two dual minimal networks. A method for approximating the reliability polynomials of two dual minimal networks is developed based on their mutual complementarity properties. The approximating objects are quadratic spline functions, constructed using both interpolation conditions and shape knowledge. It is proved that the approximants preserve the shape properties of the exact reliability polynomials. Numerical examples and simulations demonstrate the performance of the algorithm in terms of low complexity, small error, and shape preservation. Possibilities for increasing the accuracy of the approximation are discussed.
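The sketch below shows plain quadratic spline interpolation with SciPy on a sample reliability polynomial (the standard 2-out-of-3 system reliability $R(p) = 3p^2 - 2p^3$, chosen only for illustration); the construction described above additionally enforces shape constraints and exploits the complementarity of the dual networks, which this generic interpolant does not.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# interpolation nodes on [0, 1] for an example reliability polynomial
p_nodes = np.linspace(0.0, 1.0, 9)
R = lambda p: 3 * p**2 - 2 * p**3            # 2-out-of-3 system reliability

spline = make_interp_spline(p_nodes, R(p_nodes), k=2)   # quadratic spline
p_fine = np.linspace(0.0, 1.0, 201)
print("max interpolation error:", np.max(np.abs(spline(p_fine) - R(p_fine))))
```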
Optimal transport (OT) naturally arises in a wide range of machine learning applications but may often become the computational bottleneck. Recently, one line of work has proposed to solve OT approximately by searching for the \emph{transport plan} in a low-rank subspace. However, the optimal transport plan is often not low-rank, which tends to yield large approximation errors. For example, when Monge's \emph{transport map} exists, the transport plan is full rank. This paper concerns the computation of the OT distance with adequate accuracy and efficiency. We propose a novel approximation for OT in which the transport plan is decomposed into the sum of a low-rank matrix and a sparse one. We theoretically analyze the approximation error. An augmented Lagrangian method is then designed to efficiently compute the transport plan.
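For context, the snippet below computes an entropic-regularized transport plan with standard Sinkhorn iterations; this is a common baseline for approximate OT, not the low-rank-plus-sparse decomposition or the augmented Lagrangian method proposed here, and the toy cost matrix is illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iter=500):
    """Entropic-regularized OT: alternating scalings of K = exp(-C/reg) yield
    a dense plan P whose row/column sums match a and b (approximately)."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# toy usage: two empirical distributions with a squared-distance cost
rng = np.random.default_rng(0)
x, y = rng.standard_normal(60), rng.standard_normal(50)
a, b = np.full(60, 1 / 60), np.full(50, 1 / 50)
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn(a, b, C)
print("approximate OT cost:", np.sum(P * C))
```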
We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty on the consensus constraint -- the latter being instrumental to obtaining distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity level, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample-rate and convergence-rate scalings.
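The numpy sketch below simulates the penalized formulation on a single machine: each agent keeps its own copy of the parameter, takes a gradient step on its local least-squares loss plus the quadratic consensus penalty (using only neighbors' copies), and then soft-thresholds. Names, step size, and penalty parameter are illustrative assumptions, not the tuning analyzed in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_distributed_lasso(A_list, y_list, adjacency, lam, gamma,
                                n_iter=500):
    """Proximal gradient on sum_i [local LASSO loss] plus
    (gamma/2) * sum over edges ||x_i - x_j||^2; row i of X is agent i's copy."""
    m, d = len(A_list), A_list[0].shape[1]
    X = np.zeros((m, d))
    # crude step size: largest local Lipschitz constant + consensus curvature bound
    L = max(np.linalg.norm(A.T @ A, 2) / len(y) for A, y in zip(A_list, y_list))
    step = 1.0 / (L + 2 * gamma * adjacency.sum(1).max())
    for _ in range(n_iter):
        X_new = np.empty_like(X)
        for i in range(m):
            A, y = A_list[i], y_list[i]
            grad = A.T @ (A @ X[i] - y) / len(y)                            # local loss
            grad += gamma * (adjacency[i].sum() * X[i] - adjacency[i] @ X)  # consensus
            X_new[i] = soft_threshold(X[i] - step * grad, step * lam)
        X = X_new
    return X

# toy usage: 4 agents on a ring graph sharing a 3-sparse ground truth
rng = np.random.default_rng(0)
x_true = np.zeros(30); x_true[:3] = 1.0
A_list = [rng.standard_normal((25, 30)) for _ in range(4)]
y_list = [A @ x_true + 0.01 * rng.standard_normal(25) for A in A_list]
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
X_hat = penalized_distributed_lasso(A_list, y_list, ring, lam=0.05, gamma=2.0)
print("max disagreement across agents:", np.abs(X_hat - X_hat.mean(0)).max())
```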
The recovery of sparse data is at the core of many applications in machine learning and signal processing. While such problems can be tackled using $\ell_1$-regularization as in the LASSO estimator and in the Basis Pursuit approach, specialized algorithms are typically required to solve the corresponding high-dimensional non-smooth optimization for large instances. Iteratively Reweighted Least Squares (IRLS) is a widely used algorithm for this purpose due to its excellent numerical performance. However, while existing theory guarantees convergence of this algorithm to the minimizer, it does not provide a global convergence rate. In this paper, we prove that a variant of IRLS converges with a global linear rate to a sparse solution, i.e., with a linear error decrease occurring immediately from any initialization, provided the measurements fulfill the usual null space property assumption. We support our theory with numerical experiments showing that our linear rate captures the correct dimension dependence. We anticipate that our theoretical findings will lead to new insights for many other use cases of the IRLS algorithm, such as low-rank matrix recovery.
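A minimal sketch of an IRLS iteration for the equality-constrained problem $\min \|x\|_1$ subject to $Ax = y$ is given below: each step solves a weighted least-squares problem in closed form and then shrinks a smoothing parameter. The smoothing schedule and toy data are illustrative choices, not necessarily the specific variant whose global linear rate is established here.

```python
import numpy as np

def irls_sparse_recovery(A, y, n_iter=100, eps=1.0):
    """IRLS for min ||x||_1 s.t. Ax = y: reweight by (x_i^2 + eps^2)^{-1/2}
    and solve the resulting weighted least-squares problem in closed form."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]          # least-norm initialization
    for _ in range(n_iter):
        D = np.diag(np.sqrt(x**2 + eps**2))           # inverse of the weights
        # minimizer of sum_i w_i x_i^2 subject to Ax = y
        x = D @ A.T @ np.linalg.solve(A @ D @ A.T, y)
        eps = max(0.85 * eps, 1e-8)                   # slowly shrink smoothing
    return x

# toy usage: recover a 5-sparse vector from 40 Gaussian measurements
rng = np.random.default_rng(0)
n, m, s = 100, 40, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x_hat = irls_sparse_recovery(A, A @ x_true)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```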
To avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been extensive development of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lie on a lower-dimensional manifold; thus, the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data are often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. In this way, the preparatory step of dimension reduction, which introduces distortions into the data, can be avoided altogether.
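To convey the moving least-squares idea in its simplest form, the sketch below fits a local polynomial with Gaussian weights around each query point in one dimension and evaluates it there; the method described above instead builds local coordinate frames and projects onto an approximating $d$-dimensional manifold in $\mathbb{R}^n$, which this toy example does not attempt.

```python
import numpy as np

def mls_value(t, x_data, y_data, degree=2, h=0.15):
    """Moving least-squares estimate at t: weighted local polynomial fit
    with Gaussian weights of scale h, evaluated at the query point."""
    w = np.exp(-((x_data - t) / h) ** 2)
    V = np.vander(x_data - t, degree + 1, increasing=True)    # local monomial basis
    WV = w[:, None] * V
    coeffs = np.linalg.solve(V.T @ WV, WV.T @ y_data)         # weighted normal equations
    return coeffs[0]                                          # value at the query point

# toy usage: denoise samples scattered near a smooth curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)
t_grid = np.linspace(0.05, 0.95, 50)
y_mls = np.array([mls_value(t, x, y) for t in t_grid])
print("max error vs. true curve:", np.max(np.abs(y_mls - np.sin(2 * np.pi * t_grid))))
```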
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, for which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
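A minimal example of a sketch-based low-rank approximation in the spirit described above (random sketch, orthonormalize, project, truncate) is shown below; it is a generic randomized rangefinder under assumed parameter names, not necessarily one of the specific structure-preserving algorithms in the suite.

```python
import numpy as np

def sketch_low_rank(A, rank, oversample=10, seed=0):
    """Sketch Y = A @ Omega with a Gaussian test matrix, orthonormalize it,
    project A onto that range, and truncate to the requested rank."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sketch
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U[:, :rank]) * s[:rank], Vt[:rank]   # factors of the approximation

# toy usage: a 500 x 400 matrix of exact rank 30
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 30)) @ rng.standard_normal((30, 400))
L, R = sketch_low_rank(A, rank=30)
print("relative error:", np.linalg.norm(L @ R - A) / np.linalg.norm(A))
```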
In this paper, we study the optimal convergence rate for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
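For reference, the snippet below implements the basic (centralized, primal) form of Nesterov's accelerated gradient scheme on a smooth convex toy problem; the paper applies this type of acceleration to the dual of the affinely constrained distributed problem, which this sketch does not reproduce.

```python
import numpy as np

def nesterov_gradient(grad, x0, step, n_iter=200):
    """Accelerated gradient for a smooth convex function: take the gradient
    step at an extrapolated point, then update the momentum term."""
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum extrapolation
        x_prev, x = x, y - step * grad(y)          # gradient step at y
    return x

# toy usage: least squares with step 1/L, L the largest eigenvalue of A^T A
rng = np.random.default_rng(0)
A, b = rng.standard_normal((80, 30)), rng.standard_normal(80)
step = 1.0 / np.linalg.norm(A.T @ A, 2)
x_star = nesterov_gradient(lambda x: A.T @ (A @ x - b), np.zeros(30), step)
print("gradient norm at the output:", np.linalg.norm(A.T @ (A @ x_star - b)))
```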