Despite many applications, dimensionality reduction in the $\ell_1$-norm is much less understood than in the Euclidean norm. We give two new oblivious dimensionality reduction techniques for the $\ell_1$-norm which improve exponentially over prior ones: 1. We design a distribution over random matrices $S \in \mathbb{R}^{r \times n}$, where $r = 2^{\tilde O(d/(\varepsilon \delta))}$, such that given any matrix $A \in \mathbb{R}^{n \times d}$, with probability at least $1-\delta$, simultaneously for all $x$, $\|SAx\|_1 = (1 \pm \varepsilon)\|Ax\|_1$. Note that $S$ is linear, does not depend on $A$, and maps $\ell_1$ into $\ell_1$. Our distribution provides an exponential improvement on the previous best known map of Wang and Woodruff (SODA, 2019), which required $r = 2^{2^{\Omega(d)}}$, even for constant $\varepsilon$ and $\delta$. Our bound is optimal, up to a polynomial factor in the exponent, given a known $2^{\sqrt d}$ lower bound for constant $\varepsilon$ and $\delta$. 2. We design a distribution over matrices $S \in \mathbb{R}^{k \times n}$, where $k = 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, such that given any $q$-mode tensor $A \in (\mathbb{R}^{d})^{\otimes q}$, one can estimate the entrywise $\ell_1$-norm $\|A\|_1$ from $S(A)$. Moreover, $S = S^1 \otimes S^2 \otimes \cdots \otimes S^q$ and so given vectors $u_1, \ldots, u_q \in \mathbb{R}^d$, one can compute $S(u_1 \otimes u_2 \otimes \cdots \otimes u_q)$ in time $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, which is much faster than the $d^q$ time required to form $u_1 \otimes u_2 \otimes \cdots \otimes u_q$. Our linear map gives a streaming algorithm for independence testing using space $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, improving the previous doubly exponential $(\varepsilon^{-1} \log d)^{q^{O(q)}}$ space bound of Braverman and Ostrovsky (STOC, 2010).
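The speed-up in the second construction rests on the tensor-product identity $(S^1 \otimes S^2)(u_1 \otimes u_2) = (S^1 u_1) \otimes (S^2 u_2)$, which lets one sketch each mode separately. The following minimal sketch verifies this identity numerically; the dense Cauchy matrices are only illustrative stand-ins for the construction above, not the actual distribution over sketching matrices.

```python
import numpy as np

# Minimal check of the tensor identity (S1 (x) S2)(u1 (x) u2) = (S1 u1) (x) (S2 u2),
# which is what allows S(u1 (x) ... (x) uq) to be applied without forming the d^q tensor.
# The Cauchy sketching matrices below are illustrative stand-ins, not the paper's construction.
rng = np.random.default_rng(0)
d, k = 50, 8                      # ambient dimension and (hypothetical) sketch size per mode

S1 = rng.standard_cauchy((k, d))  # Cauchy entries are the classical choice for l1 sketches
S2 = rng.standard_cauchy((k, d))
u1, u2 = rng.standard_normal(d), rng.standard_normal(d)

# Fast route: sketch each mode separately, then take the small outer product (k^2 entries).
fast = np.outer(S1 @ u1, S2 @ u2).ravel()

# Slow route: form the full d^2 rank-one tensor and apply the Kronecker-product sketch.
slow = np.kron(S1, S2) @ np.outer(u1, u2).ravel()

assert np.allclose(fast, slow)
print("max |fast - slow|:", np.max(np.abs(fast - slow)))
```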
We present a $(1+\frac{k}{k+2})$-approximation algorithm for the Maximum $k$-dependent Set problem on bipartite graphs for any $k\ge1$. For a graph with $n$ vertices and $m$ edges, the algorithm runs in $O(k m \sqrt{n})$ time and improves upon the previously best-known approximation ratio of $1+\frac{k}{k+1}$ established by Kumar et al. [Theoretical Computer Science, 526: 90--96 (2014)]. Our proof also indicates that the algorithm retains its approximation ratio when applied to the (more general) class of K\"{o}nig-Egerv\'{a}ry graphs.
We show that a constant-factor approximation of the shortest and closest lattice vector problems in any norm can be computed in time $2^{0.802\, n}$. This contrasts with the corresponding $2^n$-time, (gap-)SETH-based lower bounds for these problems, which apply even for small constant approximation factors. For both problems, $\mathrm{SVP}$ and $\mathrm{CVP}$, we reduce to the case of the Euclidean norm. A key technical ingredient in that reduction is a twist on Milman's construction of an $M$-ellipsoid, which approximates any symmetric convex body $K$ by an ellipsoid $\mathcal{E}$ such that $2^{\varepsilon n}$ translates of a constant scaling of $\mathcal{E}$ suffice to cover $K$, and vice versa.
In this article, we investigate the spectral behavior of random features kernel matrices of the type ${\bf K} = \mathbb{E}_{{\bf w}} \left[\sigma\left({\bf w}^{\sf T}{\bf x}_i\right)\sigma\left({\bf w}^{\sf T}{\bf x}_j\right)\right]_{i,j=1}^n$, with nonlinear function $\sigma(\cdot)$, data ${\bf x}_1, \ldots, {\bf x}_n \in \mathbb{R}^p$, and random projection vector ${\bf w} \in \mathbb{R}^p$ having i.i.d. entries. In a high-dimensional setting where the number of data points $n$ and their dimension $p$ are both large and comparable, we show, under a Gaussian mixture model for the data, that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. (zero-mean and unit-variance) entries of ${\bf w}$, and depends on $\sigma(\cdot)$ only via its (generalized) Gaussian moments $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma'(z)]$ and $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma''(z)]$. As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1,0,1\}$. The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features such as random Fourier features, with $b$ the number of bits needed to store full-precision values. Moreover, our experiments on real data show that these substantial gains in computation and storage come with somewhat improved performance compared to state-of-the-art random features compression/quantization methods.
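To make the flavor of the construction concrete, the following sketch builds random features in which both the projection matrix and the nonlinearity take values in $\{-1,0,1\}$; the sparsity level and the dead-zone threshold are arbitrary placeholders rather than the data-dependent tuning described above.

```python
import numpy as np

# Illustrative sketch of ternary random features: both the projection matrix W and the
# nonlinearity output take values in {-1, 0, 1}. The sparsity level and the dead-zone
# threshold below are arbitrary placeholders, NOT the paper's data-dependent tuning.
rng = np.random.default_rng(0)
n, p, N = 200, 64, 512            # samples, data dimension, number of random features

X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # normalized data for simplicity

# Ternary projection vectors: each entry is -1, 0, or +1 (with hypothetical probabilities).
W = rng.choice([-1, 0, 1], size=(N, p), p=[0.25, 0.5, 0.25])

def sigma_ternary(t, s=0.5):
    # Ternary nonlinearity: sign with a dead zone of (hypothetical) half-width s.
    return np.sign(t) * (np.abs(t) > s)

Z = sigma_ternary(X @ W.T)        # feature map: in principle no multiplications are needed,
K_hat = (Z @ Z.T) / N             # since W and sigma(.) only take values in {-1, 0, 1}

print("approximate kernel matrix shape:", K_hat.shape)
```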
Langevin Monte Carlo (LMC) is a popular Bayesian sampling method. For log-concave target distributions, the method converges exponentially fast, up to a controllable discretization error. However, the method requires the evaluation of a full gradient in each iteration, and for a problem on $\mathbb{R}^d$ this amounts to $d$ partial-derivative evaluations per iteration. The cost is high when $d\gg1$. In this paper, we investigate how to enhance computational efficiency by applying random coordinate descent (RCD) to LMC. The theory has two sides. (1) By naively applying RCD to LMC, one replaces the full gradient by a randomly selected directional derivative in each iteration. Although the per-iteration cost is reduced, the total number of iterations needed to achieve a preset error tolerance increases, so ultimately there is no computational gain. (2) We then incorporate variance reduction techniques, such as SAGA (stochastic average gradient) and SVRG (stochastic variance reduced gradient), into RCD-LMC. We prove that the cost is reduced compared with the classical LMC, and that in the underdamped case convergence is achieved with the same number of iterations, while each iteration requires merely one directional derivative. This means we obtain the best possible computational cost in the underdamped-LMC framework.
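The following sketch illustrates the naive variant of point (1): an overdamped RCD-LMC iteration on a toy Gaussian target in which the full gradient is replaced by $d$ times a single randomly chosen partial derivative (an unbiased estimator); the SAGA/SVRG corrections of point (2) are not included, and the target, step size, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch of the naive RCD-LMC variant of point (1) above (overdamped, no variance
# reduction): each iteration replaces the full gradient of the potential U by
# d * (partial derivative along one uniformly random coordinate), an unbiased estimator.
# The SAGA/SVRG-corrected versions of point (2) add a correction term on top of this update.
rng = np.random.default_rng(0)
d, tau, n_iter = 5, 2e-3, 200_000

A = np.diag(np.linspace(1.0, 5.0, d))     # toy log-concave target N(0, A^{-1}), U(x) = x^T A x / 2

def partial_U(x, i):
    # i-th partial derivative of U: the only derivative evaluated in this iteration
    return A[i] @ x

x = rng.standard_normal(d)
samples = np.empty((n_iter, d))
for k in range(n_iter):
    i = rng.integers(d)                   # pick one coordinate uniformly at random
    grad_est = np.zeros(d)
    grad_est[i] = d * partial_U(x, i)     # unbiased surrogate for the full gradient
    x = x - tau * grad_est + np.sqrt(2.0 * tau) * rng.standard_normal(d)
    samples[k] = x

# The empirical coordinate variances should roughly match 1/diag(A), up to discretization
# bias and Monte Carlo error.
print("empirical:", samples[n_iter // 2:].var(axis=0).round(2))
print("target:   ", (1.0 / np.diag(A)).round(2))
```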
An orthogonal representation of a graph $G$ over a field $\mathbb{F}$ is an assignment of a vector $u_v \in \mathbb{F}^t$ to every vertex $v$ of $G$, such that $\langle u_v,u_v \rangle \neq 0$ for every vertex $v$ and $\langle u_v,u_{v'} \rangle = 0$ whenever $v$ and $v'$ are adjacent in $G$. The locality of the orthogonal representation is the largest dimension of a subspace spanned by the vectors associated with a closed neighborhood in the graph. We introduce a novel graph parameter, called the local orthogonality dimension, defined for a given graph $G$ and a given field $\mathbb{F}$, as the smallest possible locality of an orthogonal representation of $G$ over $\mathbb{F}$. We investigate the usefulness of topological methods for proving lower bounds on the local orthogonality dimension. We prove that graphs for which topological methods imply a lower bound of $t$ on their chromatic number have local orthogonality dimension at least $\lceil t/2 \rceil +1$ over every field, strengthening a result of Simonyi and Tardos on the local chromatic number. We show that for certain graphs this lower bound is tight, whereas for others, the local orthogonality dimension over the reals is equal to the chromatic number. More generally, we prove that for every complement of a line graph, the local orthogonality dimension over $\mathbb{R}$ coincides with the chromatic number. This strengthens a recent result by Daneshpajouh, Meunier, and Mizrahi, who proved that the local and standard chromatic numbers of these graphs are equal. As another extension of their result, we prove that the local and standard chromatic numbers are equal for some additional graphs, from the family of Kneser graphs. We also show an $\mathsf{NP}$-hardness result for the local orthogonality dimension and present an application of this graph parameter to the index coding problem from information theory.
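As a concrete illustration of the definitions (and not of the topological lower-bound machinery above), the following snippet checks that a given assignment of real vectors is an orthogonal representation and computes its locality by taking the maximum rank over closed neighborhoods; networkx is used purely as a convenient way to store the graph.

```python
import numpy as np
import networkx as nx

# Given an assignment of real vectors to the vertices of G, check that it is an orthogonal
# representation over R and compute its locality, i.e. the largest dimension of the span of
# the vectors living on a closed neighborhood N[v].
def locality(G, vectors, tol=1e-9):
    for v in G:
        assert abs(np.dot(vectors[v], vectors[v])) > tol, "self inner product must be nonzero"
    for u, v in G.edges:
        assert abs(np.dot(vectors[u], vectors[v])) <= tol, "adjacent vectors must be orthogonal"
    return max(
        np.linalg.matrix_rank(np.array([vectors[u] for u in list(G[v]) + [v]]))
        for v in G
    )

# Example: the 4-cycle with a 2-dimensional orthogonal representation, whose locality is 2.
G = nx.cycle_graph(4)
vectors = {0: np.array([1.0, 0.0]), 2: np.array([1.0, 0.0]),
           1: np.array([0.0, 1.0]), 3: np.array([0.0, 1.0])}
print("locality:", locality(G, vectors))   # prints 2
```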
Given a point set $P$ in the plane, we seek a subset $Q\subseteq P$ whose convex hull gives a smaller and thus simpler representation of the convex hull of $P$. Specifically, let $cost(Q,P)$ denote the Hausdorff distance between the convex hulls $\mathcal{CH}(Q)$ and $\mathcal{CH}(P)$. Then, given a value $\varepsilon>0$, we seek the smallest subset $Q\subseteq P$ such that $cost(Q,P)\leq \varepsilon$. We also consider the dual version: given an integer $k$, we seek the subset $Q\subseteq P$ with $|Q|\leq k$ which minimizes $cost(Q,P)$. For these problems, when $P$ is in convex position, we respectively give an $O(n\log^2 n)$ time algorithm and an $O(n\log^3 n)$ time algorithm, where the latter running time holds with high probability. When there is no restriction on $P$, we show that the problem can be reduced to all-pairs shortest paths (APSP) in an unweighted directed graph, yielding an $O(n^{2.5302})$ time algorithm when minimizing $k$ and an $O(\min\{n^{2.5302}, kn^{2.376}\})$ time algorithm when minimizing $\varepsilon$, using prior results for APSP. Finally, we show that our near-linear time algorithms for the convex position case give 2-approximations for the general case.
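For intuition about the objective (and not as one of the algorithms above), the following brute-force snippet evaluates $cost(Q,P)$: since $Q\subseteq P$ implies $\mathcal{CH}(Q)\subseteq\mathcal{CH}(P)$, the Hausdorff distance is attained at a vertex of $\mathcal{CH}(P)$ and equals the largest distance from such a vertex to $\mathcal{CH}(Q)$. The use of scipy and shapely is an assumption made purely for convenience.

```python
import numpy as np
from scipy.spatial import ConvexHull
from shapely.geometry import MultiPoint, Point

# Brute-force evaluation of cost(Q, P): because Q is a subset of P, CH(Q) is contained in
# CH(P), so the Hausdorff distance between the two hulls equals the largest distance from a
# vertex of CH(P) to the convex hull of Q.
def cost(Q, P):
    hull_Q = MultiPoint([tuple(q) for q in Q]).convex_hull     # CH(Q) as a shapely geometry
    verts_P = np.asarray(P)[ConvexHull(P).vertices]            # vertices of CH(P)
    return max(hull_Q.distance(Point(*p)) for p in verts_P)    # 0 for vertices inside CH(Q)

rng = np.random.default_rng(1)
P = rng.standard_normal((100, 2))
Q = P[rng.choice(len(P), size=10, replace=False)]
print("cost(Q, P) =", cost(Q, P))
```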
We study the problem of maximizing the probability that (i) an electric component or financial institution $X$ does not default before another component or institution $Y$, and (ii) $X$ and $Y$ default jointly, within the class of all random variables $X,Y$ with given univariate continuous distribution functions $F$ and $G$, respectively, and show that the maximization problems correspond to finding copulas maximizing the mass of the endograph $\Gamma^\leq(T)$ and the graph $\Gamma(T)$ of $T=G \circ F^-$, respectively. After providing simple, copula-based proofs for the existence of copulas attaining the two maxima $\overline{m}_T$ and $\overline{w}_T$, we generalize the obtained results to the case of general (not necessarily monotonic) transformations $T:[0,1] \rightarrow [0,1]$ and derive simple and easily calculable formulas for $\overline{m}_T$ and $\overline{w}_T$ involving the distribution function $F_T$ of $T$ (interpreted as a random variable on $[0,1]$). The latter are then used to characterize all non-decreasing transformations $T:[0,1] \rightarrow [0,1]$ for which $\overline{m}_T$ and $\overline{w}_T$ coincide. A strongly consistent estimator for the maximum probability that $X$ does not default before $Y$ is derived and proven to be asymptotically normal under very mild regularity conditions. Several examples and graphics illustrate the main results and disprove some seemingly natural conjectures.
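The correspondence between $P(Y\le X)$ and the copula mass of the endograph of $T=G\circ F^-$ can be checked numerically; the sketch below does so for the independence copula with exponential marginals, both chosen purely for concreteness (here both estimates should be close to $1/3$).

```python
import numpy as np
from scipy import stats

# Monte Carlo check of the stated correspondence: with X = F^-(U), Y = G^-(V) and T = G o F^-,
# the event {Y <= X} coincides with {V <= T(U)}, so P(X does not default before Y) equals the
# copula mass of the endograph of T. The independence copula and exponential marginals are
# chosen purely for concreteness; both estimates below should be close to 1/3.
rng = np.random.default_rng(0)
N = 10**6
F, G = stats.expon(scale=1.0), stats.expon(scale=2.0)   # continuous marginals of X and Y

X, Y = F.rvs(size=N, random_state=rng), G.rvs(size=N, random_state=rng)
lhs = np.mean(Y <= X)                                   # P(X does not default before Y)

U, V = rng.uniform(size=N), rng.uniform(size=N)         # a sample of the independence copula
rhs = np.mean(V <= G.cdf(F.ppf(U)))                     # mass of the endograph of T = G o F^-

print(round(lhs, 4), round(rhs, 4))
```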
With the rapid development of data collection techniques, complex data objects that do not lie in a Euclidean space are frequently encountered in new statistical applications. The Fr\'echet regression model (Petersen & M\"uller, 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fr\'echet regression to achieve two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors, and to provide a tool for data visualization for Fr\'echet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean $(X,Y)$ into one for Euclidean $X$ and metric space-valued $Y$. The basic idea is to first map the metric space-valued random object $Y$ to a real-valued random variable $f(Y)$ using a class of functions, and then apply classical SDR to the transformed data. If the class of functions is sufficiently rich, then we are guaranteed to uncover the Fr\'echet SDR space. We show that such a class, which we call an ensemble, can be generated by a universal kernel. We establish the consistency and asymptotic convergence rate of the proposed methods. The finite-sample performance of the proposed methods is illustrated through simulation studies for several commonly encountered metric spaces, including the Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrate the data visualization aspect of our method by exploring the human mortality distribution data across countries and by studying the distribution of hematoma density.
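The ensemble idea can be sketched as follows (illustrative choices throughout, not the exact estimator proposed above): responses on the unit sphere are mapped to real variables through Gaussian-kernel functions $f_j(Y)=\exp(-d(Y,y_j)^2/(2h^2))$ at anchor points $y_j$, sliced inverse regression is applied to each pair $(X, f_j(Y))$, and the resulting candidate matrices are aggregated.

```python
import numpy as np

# Schematic sketch of the ensemble idea: responses live on the unit sphere with geodesic
# distance, each ensemble member maps Y to f_j(Y) = exp(-d(Y, y_j)^2 / (2 h^2)) for an anchor
# point y_j, sliced inverse regression (SIR) is run on (X, f_j(Y)), and the candidate matrices
# are summed before extracting the leading eigenvector(s). All tuning choices are illustrative.
rng = np.random.default_rng(0)
n, p = 1000, 10
beta = np.zeros(p); beta[0] = 1.0                    # true SDR direction (first coordinate)

X = rng.standard_normal((n, p))
theta = X @ beta + 0.2 * rng.standard_normal(n)      # Y depends on X only through beta^T X
Y = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])   # points on the unit sphere

def geodesic(a, b):
    return np.arccos(np.clip(a @ b, -1.0, 1.0))

def sir_matrix(X, t, n_slices=10):
    # SIR candidate matrix for a real-valued response t (X already has ~identity covariance).
    Xc = X - X.mean(axis=0)
    M = np.zeros((X.shape[1], X.shape[1]))
    for s in np.array_split(np.argsort(t), n_slices):
        m = Xc[s].mean(axis=0)
        M += (len(s) / len(t)) * np.outer(m, m)
    return M

anchors = Y[rng.choice(n, size=20, replace=False)]   # anchor points y_j (arbitrary choice)
h = 0.5                                              # illustrative bandwidth
M = sum(sir_matrix(X, np.exp(-np.array([geodesic(y, a) for y in Y]) ** 2 / (2 * h ** 2)))
        for a in anchors)

est = np.linalg.eigh(M)[1][:, -1]                    # estimated SDR direction
print("alignment with the true direction:", round(abs(float(est @ beta)), 3))
```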
Given an undirected $n$-vertex planar graph $G=(V,E,\omega)$ with non-negative edge weight function $\omega:E\rightarrow \mathbb R$ and given an assigned label to each vertex, a vertex-labeled distance oracle is a data structure which for any query consisting of a vertex $u$ and a label $\lambda$ reports the shortest path distance from $u$ to the nearest vertex with label $\lambda$. We show that if there is a distance oracle for undirected $n$-vertex planar graphs with non-negative edge weights using $s(n)$ space and with query time $q(n)$, then there is a vertex-labeled distance oracle with $\tilde{O}(s(n))$ space and $\tilde{O}(q(n))$ query time. Using the state-of-the-art distance oracle of Long and Pettie, our construction produces a vertex-labeled distance oracle using $n^{1+o(1)}$ space and query time $\tilde O(1)$ at one extreme, $\tilde O(n)$ space and $n^{o(1)}$ query time at the other extreme, as well as such oracles for the full tradeoff between space and query time obtained in their paper. This is the first non-trivial exact vertex-labeled distance oracle for planar graphs and, to our knowledge, for any interesting graph class other than trees.
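For contrast, a naive vertex-labeled distance oracle can be tabulated with one multi-source Dijkstra per label, using $O(n \cdot \#\text{labels})$ space; the snippet below (using networkx for convenience) has nothing to do with the reduction above and is only meant to make the query semantics concrete.

```python
import networkx as nx

# Naive baseline: for each label, run a multi-source Dijkstra from all vertices carrying that
# label and tabulate the answers. Uses O(n * #labels) space; queries are table lookups.
def build_naive_oracle(G, labels):
    # labels: dict mapping each vertex to its label
    table = {}
    for lam in set(labels.values()):
        sources = [v for v, l in labels.items() if l == lam]
        table[lam] = nx.multi_source_dijkstra_path_length(G, sources, weight="weight")
    return lambda u, lam: table[lam].get(u, float("inf"))

# Tiny example on a weighted path 0 - 1 - 2 with labels a, b, a.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2.0), (1, 2, 3.0)])
query = build_naive_oracle(G, {0: "a", 1: "b", 2: "a"})
print(query(1, "a"))   # nearest "a"-labeled vertex to 1 is vertex 0, at distance 2.0
```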
In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there has been vast development in recent years of linear and nonlinear dimension reduction techniques. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the ambient dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. In this way, the preparatory step of dimension reduction, which induces distortions in the data, can be avoided altogether.
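A single, much-simplified projection step in the spirit of the moving least-squares approach can be sketched as follows (an illustration of the idea only, with arbitrary weight and degree choices, not the iterative procedure analyzed above): a weighted local PCA around the query point yields a $d$-dimensional tangent coordinate system, the normal components of the neighbors are fit by a weighted quadratic in the tangent coordinates, and the query is projected onto this local polynomial graph.

```python
import numpy as np

# Much-simplified single projection step in the spirit of a moving least-squares manifold
# approximation: a weighted local PCA gives a d-dimensional tangent frame, the normal
# components are fit by a weighted quadratic in the tangent coordinates, and the query point
# is projected onto that local graph. All tuning constants are illustrative choices.
rng = np.random.default_rng(0)
n_amb, n_pts, d, h = 10, 2000, 1, 0.4     # ambient dim, #samples, manifold dim, weight scale

t = rng.uniform(0, 2 * np.pi, n_pts)      # noisy samples of a unit circle embedded in R^10
clean = np.zeros((n_pts, n_amb)); clean[:, 0], clean[:, 1] = np.cos(t), np.sin(t)
data = clean + 0.03 * rng.standard_normal((n_pts, n_amb))

def mls_project(x, data, d, h):
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / h**2)        # localized weights around x
    mu = (w[:, None] * data).sum(0) / w.sum()
    C = (w[:, None] * (data - mu)).T @ (data - mu) / w.sum()   # weighted covariance
    vecs = np.linalg.eigh(C)[1]
    U, V = vecs[:, -d:], vecs[:, :-d]                          # tangent / normal directions
    tang, norm = (data - mu) @ U, (data - mu) @ V              # local coordinates
    design = np.hstack([np.ones((len(data), 1)), tang, tang**2])           # degree-2 basis
    coef = np.linalg.lstsq(design * np.sqrt(w)[:, None],
                           norm * np.sqrt(w)[:, None], rcond=None)[0]      # weighted fit
    tx = (x - mu) @ U
    return mu + U @ tx + V @ (np.hstack([[1.0], tx, tx**2]) @ coef)

def dist_to_circle(p):                    # distance to the underlying clean manifold
    return np.hypot(np.linalg.norm(p[:2]) - 1.0, np.linalg.norm(p[2:]))

x_noisy = data[0]
print("before:", round(dist_to_circle(x_noisy), 4),
      "after:", round(dist_to_circle(mls_project(x_noisy, data, d, h)), 4))
```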