Given a matrix $M\in \mathbb{R}^{m\times n}$, the low-rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ of the form $UV^\top$, with $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$, from only a few observed entries indexed by a set $\Omega\subseteq [m]\times [n]$. In particular, we examine an approach that is widely used in practice: the alternating minimization framework. Jain, Netrapalli, and Sanghavi [JNS13] showed that if $M$ has incoherent rows and columns, then alternating minimization provably recovers $M$ from a number of observed entries that is nearly linear in $n$. While the sample complexity has subsequently been improved [GLZ17], these analyses require the alternating minimization steps to be computed exactly. This hinders the development of more efficient algorithms and fails to reflect practical implementations of alternating minimization, where the updates are usually performed approximately in favor of efficiency. In this paper, we take a major step towards a more efficient and error-robust alternating minimization framework. To this end, we develop an analytical framework for alternating minimization that can tolerate a moderate amount of error caused by approximate updates. Moreover, our algorithm runs in time $\widetilde O(|\Omega| k)$, which is nearly linear in the time needed to verify the solution, while preserving the sample complexity. This improves upon all prior known alternating minimization approaches, which require $\widetilde O(|\Omega| k^2)$ time.
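For concreteness, here is a minimal NumPy sketch of the classical alternating minimization loop with exact per-row least-squares updates, the baseline that the error-robust, approximate-update framework above relaxes; the function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

def altmin_complete(M, mask, k, iters=25, seed=0):
    """Alternating minimization for matrix completion with exact least-squares updates.

    M    : (m, n) array; values outside the observed set are ignored
    mask : (m, n) boolean array, True exactly on the observed entries Omega
    k    : target rank
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    for _ in range(iters):
        # Fix V and solve a small least-squares problem for each row of U,
        # using only that row's observed entries.
        for i in range(m):
            obs = mask[i]
            if obs.any():
                U[i], *_ = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)
        # Fix U and update each row of V symmetrically.
        for j in range(n):
            obs = mask[:, j]
            if obs.any():
                V[j], *_ = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)
    return U, V

# Toy usage: recover a random rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(M.shape) < 0.6
U, V = altmin_complete(M, mask, k=2)
print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # small relative error
```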
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary Gaussian process with values in a separable Hilbert space $\mathcal{H}_1$, and let $G:\mathcal{H}_1 \to \mathcal{H}_2$ be an operator acting on $X_k$. Under suitable conditions on the operator $G$ and the temporal and cross-sectional correlations of $\{X_k\}_{k \in \mathbb{Z}}$, we derive a central limit theorem (CLT) for the normalized partial sums of $\{G[X_k]\}_{k \in \mathbb{Z}}$. To prove a CLT for the Hilbert space-valued process $\{G[X_k]\}_{k \in \mathbb{Z}}$, we employ techniques from the recently developed infinite-dimensional Malliavin-Stein framework. In addition, we provide quantitative and continuous-time versions of the derived CLT. In a series of examples, we recover and strengthen limit theorems for a wide array of statistics relevant to functional data analysis, and we present a novel limit theorem in the framework of neural operators as an application of our result.
For a field $\mathbb{F}$ and integers $d$ and $k$, a set of vectors of $\mathbb{F}^d$ is called $k$-nearly orthogonal if its members are non-self-orthogonal and every $k+1$ of them include an orthogonal pair. We prove that for every prime $p$ there exists a positive constant $\delta = \delta (p)$, such that for every field $\mathbb{F}$ of characteristic $p$ and for all integers $k \geq 2$ and $d \geq k^{1/(p-1)}$, there exists a $k$-nearly orthogonal set of at least $d^{\delta \cdot k^{1/(p-1)}/ \log k}$ vectors of $\mathbb{F}^d$. In particular, for the binary field we obtain a set of $d^{\Omega( k /\log k)}$ vectors, and this is tight up to the $\log k$ term in the exponent. For comparison, the best known lower bound over the reals is $d^{\Omega( \log k / \log \log k)}$ (Alon and Szegedy, Graphs and Combin., 1999). The proof combines probabilistic and spectral arguments.
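To make the definition concrete, the following brute-force sketch checks, over the binary field, whether a given set of vectors is $k$-nearly orthogonal (every vector is non-self-orthogonal and every $k+1$ vectors contain an orthogonal pair); it only verifies the definition on small inputs and is unrelated to the probabilistic and spectral arguments in the proof.

```python
import numpy as np
from itertools import combinations

def is_k_nearly_orthogonal(vectors, k):
    """Check the definition over F_2: <v, v> != 0 (mod 2) for every v, and among any
    k+1 vectors some pair is orthogonal (mod 2). Brute force; small inputs only."""
    V = np.asarray(vectors, dtype=int) % 2
    if any(int(v @ v) % 2 == 0 for v in V):          # non-self-orthogonality = odd weight
        return False
    for subset in combinations(range(len(V)), k + 1):
        if all(int(V[i] @ V[j]) % 2 != 0 for i, j in combinations(subset, 2)):
            return False                              # found k+1 pairwise non-orthogonal vectors
    return True

# Example: the standard basis of F_2^d is k-nearly orthogonal for every k >= 1.
print(is_k_nearly_orthogonal(np.eye(5, dtype=int), k=2))  # True
```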
This paper outlines a methodology for constructing a geometrically smooth interpolatory curve in $\mathbb{R}^d$, $d\ge 2$, applicable to oriented and flattenable points. The construction involves four essential components: local functions, blending functions, redistributing functions, and gluing functions. The resulting curve possesses favorable attributes, including $G^2$ geometric smoothness, locality, the absence of cusps, and no self-intersection. Moreover, the algorithm is adaptable to various scenarios, such as preserving convexity, interpolating sharp corners, and ensuring sphere preservation. Numerous numerical examples substantiate the efficacy of the proposed method and offer a practical demonstration of its capabilities.
We define the relative fractional independence number of a graph $G$ with respect to another graph $H$ as $$\alpha^*(G|H)=\max_{W}\frac{\alpha(G\boxtimes W)}{\alpha(H\boxtimes W)},$$ where the maximum is taken over all graphs $W$, $G\boxtimes W$ is the strong product of $G$ and $W$, and $\alpha$ denotes the independence number. We give a non-trivial linear program to compute $\alpha^*(G|H)$ and discuss some of its properties. We show that $\alpha^*(G|H)\geq \frac{X(G)}{X(H)} \geq \frac{1}{\alpha^*(H|G)},$ where $X(G)$ can be the independence number, the zero-error Shannon capacity, the fractional independence number, the Lov\'{a}sz number, or Schrijver's or Szegedy's variants of the Lov\'{a}sz number of $G$. This inequality is the first explicit non-trivial upper bound on the ratio of these invariants for two arbitrary graphs, and it can also be used to obtain upper or lower bounds on the invariants themselves. As explicit applications, we present new upper bounds on the ratio of the zero-error Shannon capacities of two Cayley graphs and compute new lower bounds on the Shannon capacity of certain Johnson graphs (yielding the exact value of their Haemers number). Moreover, we show that $\alpha^*(G|H)$ can be used to give a stronger version of the well-known No-Homomorphism Lemma.
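As a small illustration of the quantity inside the maximum, the sketch below evaluates $\alpha(G\boxtimes W)/\alpha(H\boxtimes W)$ for a single candidate $W$ by brute force using networkx; it is meant only to make the definition concrete and is not the linear program mentioned above (function names are illustrative, and the exponential-time independence number computation is viable only for very small graphs).

```python
import networkx as nx

def independence_number(G):
    """alpha(G), computed as the clique number of the complement (exponential time)."""
    return max(len(c) for c in nx.find_cliques(nx.complement(G)))

def product_ratio(G, H, W):
    """alpha(G ⊠ W) / alpha(H ⊠ W) for one candidate W; alpha*(G|H) is the
    maximum of this ratio over all graphs W."""
    return (independence_number(nx.strong_product(G, W))
            / independence_number(nx.strong_product(H, W)))

# Example: G = C5, H = K3, W = C5 gives alpha(C5 ⊠ C5)/alpha(K3 ⊠ C5) = 5/2.
C5, K3 = nx.cycle_graph(5), nx.complete_graph(3)
print(product_ratio(C5, K3, C5))  # 2.5
```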
We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x}, {\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent ${O}(d^{2})$ bound of \cite{gajjar2023active}. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting where $f$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
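As background for readers unfamiliar with the sampling primitive, the following is a minimal NumPy sketch of statistical leverage score sampling applied to the plain linear regression case; it illustrates the general technique for actively choosing which points to label, not the paper's algorithm for single index models, and all names are illustrative.

```python
import numpy as np

def leverage_scores(X):
    """Statistical leverage scores: squared row norms of an orthonormal basis of col(X)."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def leverage_sample(X, y, m, seed=0):
    """Sample m rows with probability proportional to leverage, reweighting rows so
    that the subsampled least-squares objective is unbiased."""
    rng = np.random.default_rng(seed)
    p = leverage_scores(X)
    p /= p.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])            # importance weights
    return X[idx] * w[:, None], y[idx] * w

# Toy usage: label only m = 200 of n = 5000 points, then fit on the reweighted subsample.
rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
Xs, ys = leverage_sample(X, y, m=200)
w_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(np.linalg.norm(w_hat - w_true))        # close to the noise level
```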
We propose a volumetric formulation for computing the Optimal Transport problem defined on surfaces in $\mathbb{R}^3$, a problem that arises in optics, computer graphics, and other computational disciplines. Instead of directly tackling the original problem on the surface, we define a new Optimal Transport problem on a thin tubular region, $T_{\epsilon}$, adjacent to the surface. This extension offers enhanced flexibility and simplicity for numerical discretization on Cartesian grids. The Optimal Transport mapping and potential function computed on $T_{\epsilon}$ are consistent with those of the original problem on the surface. We demonstrate that, with the proposed volumetric approach, simple and straightforward numerical methods can be used to solve the Optimal Transport problem on surfaces such as the unit sphere $\Gamma = \mathbb{S}^2$ and the $2$-torus.
We consider the construction of maximal families of polynomials over the finite field $\mathbb{F}_q$, all having the same degree $n$ and a nonzero constant term, in which the GCD of any two polynomials has degree $d$ with $1 \le d\le n$. The motivation for this problem lies in a recent construction of subspace codes based on cellular automata. More precisely, the minimum distance of such subspace codes relates to the maximum degree $d$ of the pairwise GCDs in this family of polynomials. Hence, characterizing the maximal families of such polynomials is equivalent to determining the maximum cardinality of the corresponding subspace codes for a given minimum distance. We first show a lower bound on the cardinality of such families, and then focus on the specific case $d=1$, where we characterize the maximal families of polynomials over the binary field $\mathbb{F}_2$. Our findings prompt several further open questions, which we plan to address in an extended version of this work.
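To make the combinatorial object concrete, here is a small brute-force sketch over $\mathbb{F}_2$ that encodes binary polynomials as integer bitmasks and greedily grows a family of degree-$n$ polynomials with nonzero constant term whose pairwise GCDs have degree exactly $d$; it is only a naive illustration of the definition as stated above, not the construction or the characterization from the paper.

```python
def deg(p):
    """Degree of a GF(2) polynomial encoded as an integer bitmask (-1 for the zero polynomial)."""
    return p.bit_length() - 1

def gf2_mod(a, b):
    """Remainder of a modulo b over GF(2) (b != 0)."""
    while deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def gf2_gcd(a, b):
    """Euclidean algorithm for GF(2) polynomials."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def greedy_family(n, d):
    """Greedily collect degree-n binary polynomials with nonzero constant term
    whose pairwise GCDs all have degree exactly d (naive; small n only)."""
    family = []
    for p in range(1 << n, 1 << (n + 1)):      # all polynomials of degree exactly n
        if p & 1 == 0:                          # constant term must be nonzero
            continue
        if all(deg(gf2_gcd(p, q)) == d for q in family):
            family.append(p)
    return family

# Example: a greedily built (not necessarily maximal) family for n = 6, d = 1 over F_2.
print([bin(p) for p in greedy_family(6, 1)])
```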
We consider optimization problems of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. A recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
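For intuition, the sketch below implements the standard two-point randomized-smoothing zeroth-order gradient estimator and a plain zeroth-order SGD loop built on it; this is a generic baseline of the kind such results improve upon with stochastic recursive (variance-reduced) estimators, not the proposed algorithm itself, and all names are illustrative.

```python
import numpy as np

def zo_gradient(F, x, delta, xi, rng):
    """Two-point zeroth-order estimate of the gradient of the delta-smoothed objective
    f_delta(x) = E_w[ f(x + delta * w) ], with w uniform in the unit ball."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                                    # uniform direction on the sphere
    return (d / (2 * delta)) * (F(x + delta * u, xi) - F(x - delta * u, xi)) * u

def zo_sgd(F, sample_xi, x0, delta=0.1, lr=0.01, steps=2000, seed=0):
    """Plain zeroth-order SGD using the two-point estimator (no variance reduction)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        xi = sample_xi(rng)
        x -= lr * zo_gradient(F, x, delta, xi, rng)
    return x

# Toy usage on a nonsmooth stochastic objective F(x; xi) = ||x - xi||_1, xi ~ N(0, 0.01 I).
d = 10
F = lambda x, xi: np.abs(x - xi).sum()
sample_xi = lambda rng: 0.1 * rng.standard_normal(d)
print(np.linalg.norm(zo_sgd(F, sample_xi, x0=np.ones(d))))   # roughly approaches the minimizer at 0
```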
Matrix sketching, which aims to approximate a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$, given as a stream of $N$ row vectors, by a much smaller sketch matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}$ with $\ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$ formed by the input vectors of the most recent $N$ time units. However, despite recent efforts, whether the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound is achievable on sliding windows has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of DS-FD across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments on both synthetic and real-world datasets, validating our theoretical claims and confirming the effectiveness of DS-FD in practice.
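For reference, here is a compact NumPy sketch of the standard (non-sliding-window) Frequent Directions algorithm that DS-FD builds on; it recomputes an SVD per row for clarity rather than efficiency, assumes $\ell \le d$, and is not the DS-FD algorithm itself.

```python
import numpy as np

def frequent_directions(A, ell):
    """Standard Frequent Directions sketch of the rows of A (shape N x d, ell <= d).
    Returns B (ell x d) with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    d = A.shape[1]
    B = np.zeros((ell, d))
    for a in A:
        B[-1] = a                                      # the last row is always zero before insertion
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))  # shrink by the smallest squared singular value
        B = s[:, None] * Vt                            # last row becomes zero again
    return B

# Toy usage: sketch a 2000 x 50 stream with ell = 10 rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50))
B = frequent_directions(A, ell=10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A, 'fro')**2
print(err)  # guaranteed <= 1/ell = 0.1
```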
We study the stochastic bandit problem with a ReLU neural network structure. We show that an $\tilde{O}(\sqrt{T})$ regret guarantee is achievable for bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this setting, we propose an OFU-ReLU algorithm that achieves this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that, once the ReLU parameters are learned sufficiently accurately during the exploration stage, we can exploit the piecewise-linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space. To remove the dependence on unknown model parameters, we also design an OFU-ReLU+ algorithm based on a batching strategy, which provides the same theoretical guarantee.
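To illustrate the second stage, here is a minimal LinUCB-style routine of the kind that would be run in the transformed feature space after exploration; it is a generic UCB linear bandit sketch, not the OFU-ReLU algorithm, and the interface (contexts, reward) and constants are illustrative.

```python
import numpy as np

def linucb(contexts, reward, T, alpha=1.0, lam=1.0):
    """Generic LinUCB: at each round pick the arm with the largest optimistic reward
    estimate under a ridge-regression model of the (transformed) features.

    contexts(t) -> (n_arms, d) array of feature vectors; reward(t, x) -> observed reward."""
    d = contexts(0).shape[1]
    A = lam * np.eye(d)                     # regularized Gram matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(T):
        X = contexts(t)
        theta = np.linalg.solve(A, b)       # ridge estimate of the parameter
        A_inv = np.linalg.inv(A)
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        x = X[int(np.argmax(X @ theta + alpha * bonus))]   # optimism in the face of uncertainty
        r = reward(t, x)
        A += np.outer(x, x)
        b += r * x
        total += r
    return total

# Toy usage: 5 fixed arms with linear rewards plus Gaussian noise.
rng = np.random.default_rng(1)
d, n_arms = 8, 5
theta_star = rng.standard_normal(d)
arms = rng.standard_normal((n_arms, d))
print(linucb(lambda t: arms, lambda t, x: float(x @ theta_star) + 0.1 * rng.normal(), T=500))
```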