We present a new systematic approach to constructing spherical codes in dimensions $2^k$, based on Hopf foliations. Using the fact that the sphere $S^{2n-1}$ is foliated by manifolds $S_{\cos\eta}^{n-1} \times S_{\sin\eta}^{n-1}$, $\eta\in[0,\pi/2]$, we distribute points in dimension $2^k$ via a recursive algorithm built from a basic construction in $\mathbb{R}^4$. Our procedure outperforms some current constructive methods in several small-distance regimes and strikes a compromise between achieving a large number of codewords for a given minimum distance and effective constructiveness with low encoding computational cost. Bounds for the asymptotic density are derived and compared with other constructions. The encoding process has storage complexity $O(n)$ and time complexity $O(n \log n)$. We also propose a sub-optimal decoding procedure which does not require storing the codebook and has time complexity $O(n \log n)$.
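For concreteness, the foliation underlying the construction can be sampled directly. The sketch below (a minimal illustration, not the paper's recursive algorithm) places points on a few leaves $S^1_{\cos\eta} \times S^1_{\sin\eta}$ of $S^3 \subset \mathbb{R}^4$; the layer spacing and per-layer counts are placeholder choices.

```python
import numpy as np

def torus_layer(eta, m1, m2):
    """Points on the leaf S^1_{cos(eta)} x S^1_{sin(eta)} of S^3 in R^4.

    m1, m2: number of equally spaced angles on each circle factor.
    """
    a = 2 * np.pi * np.arange(m1) / m1
    b = 2 * np.pi * np.arange(m2) / m2
    A, B = np.meshgrid(a, b, indexing="ij")
    return np.stack([np.cos(eta) * np.cos(A),
                     np.cos(eta) * np.sin(A),
                     np.sin(eta) * np.cos(B),
                     np.sin(eta) * np.sin(B)], axis=-1).reshape(-1, 4)

# Foliate S^3 by a few eta-layers (placeholder spacing and counts).
code = np.vstack([torus_layer(eta, 8, 8)
                  for eta in np.linspace(0.2, np.pi / 2 - 0.2, 4)])
assert np.allclose(np.linalg.norm(code, axis=1), 1.0)  # all points on S^3
```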
A locally testable code (LTC) is an error-correcting code that has a property tester. The tester reads $q$ bits that are randomly chosen, and rejects words with probability proportional to their distance from the code. The parameter $q$ is called the locality of the tester. LTCs were initially studied as important components of PCPs, and since then the topic has evolved on its own. High-rate LTCs could be useful in practice: before attempting to decode a received word, one can save time by first quickly testing whether it is close to the code. An outstanding open question has been whether there exist "$c^3$-LTCs", namely LTCs with *c*onstant rate, *c*onstant distance, and *c*onstant locality. In this work we construct such codes based on a new two-dimensional complex which we call a left-right Cayley complex. This is essentially a graph which, in addition to vertices and edges, also has squares. Our codes can be viewed as a two-dimensional version of (one-dimensional) expander codes, where the codewords are functions on the squares rather than on the edges.
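For intuition, a left-right Cayley complex over a group $G$ with two generating sets $A$ and $B$ has the elements of $G$ as vertices, edges $\{g, ag\}$ and $\{g, gb\}$, and squares $\{g, ag, gb, agb\}$. The toy Python enumeration below illustrates the square structure over the symmetric group $S_3$; the actual construction requires expanding Cayley graphs, which this tiny example does not provide.

```python
from itertools import permutations

# Toy left-right Cayley complex over S_3: squares are {g, ag, gb, agb}.
G = list(permutations(range(3)))

def mul(p, q):
    """Compose permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

# Symmetric generating sets of transpositions (placeholder choices).
A = [(1, 0, 2), (2, 1, 0)]
B = [(0, 2, 1), (2, 1, 0)]

squares = {frozenset({g, mul(a, g), mul(g, b), mul(mul(a, g), b)})
           for g in G for a in A for b in B}
print(len(G), "vertices,", len(squares), "squares")
```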
We prove the existence of Reed-Solomon codes of any desired rate $R \in (0,1)$ that are combinatorially list-decodable up to a radius approaching $1-R$, which is the information-theoretic limit. This is established by starting with the full-length $[q,k]_q$ Reed-Solomon code over a field $\mathbb F_q$ that is polynomially larger than the desired dimension $k$, and "puncturing" it by including $k/R$ randomly chosen codeword positions. Our puncturing result is more general and applies to any code with large minimum distance: we show that a random rate-$R$ puncturing of an $\mathbb F_q$-linear "mother" code whose relative distance is close enough to $1-1/q$ is list-decodable up to a radius approaching the $q$-ary list-decoding capacity bound $h_q^{-1}(1-R)$. In fact, for large $q$, or under a stronger low-bias assumption on the mother code, we prove that the threshold rate for list-decodability with a specific list size (and, more generally, for any "local" property) of the random puncturing approaches that of fully random linear codes. Thus, all current (and future) list-decodability bounds shown for random linear codes extend automatically to random puncturings of any low-bias (or large-alphabet) code. This can be viewed as a general derandomization result applicable to random linear codes. To obtain our conclusion about Reed-Solomon codes, we establish some hashing properties of field trace maps that allow us to reduce the list-decodability of an RS code to that of its associated trace (dual-BCH) code, and then apply our puncturing theorem to the latter. Our approach implies, essentially for free, optimal rate list-recoverability of punctured RS codes as well.
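The puncturing operation itself is simple to state in code. The sketch below is a toy illustration with placeholder parameters (a field far smaller than the polynomially large fields the proof requires): it evaluates a full-length Reed-Solomon code over a prime field at $n = k/R$ randomly retained positions.

```python
import numpy as np

# Random puncturing of a full-length Reed-Solomon code over GF(p), p prime.
p, k, R = 101, 5, 0.25
n = round(k / R)                      # punctured block length

rng = np.random.default_rng(0)
positions = rng.choice(p, size=n, replace=False)  # retained evaluation points

def rs_encode(msg):
    """Evaluate the degree-<k message polynomial at the retained positions."""
    vals = np.zeros(n, dtype=np.int64)
    for c in reversed(msg):           # Horner's rule mod p
        vals = (vals * positions + int(c)) % p
    return vals

codeword = rs_encode(rng.integers(p, size=k))
print(codeword)
```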
Solutions to many partial differential equations satisfy certain bounds or constraints. For example, the density and pressure are positive for equations of fluid dynamics, and in the relativistic case the fluid velocity is upper bounded by the speed of light, etc. As widely realized, it is crucial to develop bound-preserving numerical methods that preserve such intrinsic constraints. Exploring provably bound-preserving schemes has attracted much attention and has been actively studied in recent years. This, however, remains a challenging task for many systems, especially those involving nonlinear constraints. Based on some key insights from geometry, we systematically propose an innovative and general framework, referred to as geometric quasilinearization (GQL), which paves a new and effective way for studying bound-preserving problems with nonlinear constraints. The essential idea of GQL is to equivalently transform all nonlinear constraints into linear ones by properly introducing free auxiliary variables. We establish the fundamental principle and general theory of GQL via the geometric properties of convex regions, and propose three simple and effective methods for constructing GQL. We apply the GQL approach to a variety of partial differential equations, and demonstrate its effectiveness and remarkable advantages for studying bound-preserving schemes through diverse challenging examples and applications which cannot be easily handled by direct or traditional approaches.
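A standard illustrative instance from the bound-preserving literature (our notation, not necessarily the paper's) is the compressible Euler system with conservative variables $U=(\rho,\mathbf m,E)$: the nonlinear admissible set admits an equivalent description by constraints that are linear in $U$, at the price of a free auxiliary vector $\mathbf v_*$,
$$\Big\{U:\ \rho>0,\ E-\tfrac{|\mathbf m|^2}{2\rho}>0\Big\}
=\Big\{U:\ \rho>0,\ E-\mathbf m\cdot\mathbf v_*+\tfrac{\rho|\mathbf v_*|^2}{2}>0 \ \ \forall\,\mathbf v_*\in\mathbb R^d\Big\},$$
since minimizing the linear expression over $\mathbf v_*$ (attained at $\mathbf v_*=\mathbf m/\rho$) recovers the nonlinear internal-energy constraint.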
Analytical understanding of how low-dimensional latent features reveal themselves in large-dimensional data is still lacking. We study this by defining a linear latent feature model with additive noise constructed from probabilistic matrices, and analytically and numerically computing the statistical distributions of pairwise correlations and eigenvalues of the correlation matrix. This allows us to resolve the latent feature structure across a wide range of data regimes set by the number of recorded variables, observations, latent features and the signal-to-noise ratio. We find a characteristic imprint of latent features in the distribution of correlations and eigenvalues and provide an analytic estimate for the boundary between signal and noise even in the absence of a clear spectral gap.
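The setting is easy to reproduce numerically. The sketch below (with Gaussian placeholders standing in for the probabilistic matrices of the model, and placeholder sizes) simulates $X = WF + \sigma\,\mathrm{noise}$ and computes the eigenvalues of the correlation matrix, in which the $r$ latent features separate from the noise bulk.

```python
import numpy as np

# Minimal simulation of a linear latent feature model with additive noise.
# n variables, t observations, r latent features; all choices are placeholders.
rng = np.random.default_rng(1)
n, t, r, sigma = 200, 1000, 3, 0.5
W = rng.normal(size=(n, r))          # feature loadings
F = rng.normal(size=(r, t))          # latent feature activations
X = W @ F + sigma * rng.normal(size=(n, t))

C = np.corrcoef(X)                   # pairwise correlations of the n variables
evals = np.linalg.eigvalsh(C)
print(np.sort(evals)[-5:])           # top eigenvalues: r signal, rest noise bulk
```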
We present a deterministic $(1+\varepsilon)$-approximate maximum matching algorithm that uses $\mathsf{poly}(1/\varepsilon)$ passes in the semi-streaming model, solving the long-standing open problem of breaking the exponential barrier in the dependence on $1/\varepsilon$. Our algorithm exponentially improves on the well-known randomized $(1/\varepsilon)^{O(1/\varepsilon)}$-pass algorithm from the seminal work of McGregor~[APPROX05] and on the recent deterministic algorithm by Tirodkar with the same pass complexity~[FSTTCS18]. Up to polynomial factors in $1/\varepsilon$, our work matches the state-of-the-art deterministic $(\log n / \log \log n) \cdot (1/\varepsilon)$-pass algorithm by Ahn and Guha~[TOPC18], which is allowed a dependence on the number of nodes $n$. Our result also makes progress on Open Problem 60 at sublinear.info. Moreover, we design a general framework that simulates our approach for the streaming setting in other models of computation. This framework requires access to an algorithm computing a maximal matching and an algorithm for processing disjoint connected components of size $\mathsf{poly}(1/\varepsilon)$. Instantiating our framework in $\mathsf{CONGEST}$ yields a $\mathsf{poly}(\log{n}, 1/\varepsilon)$-round algorithm for computing a $(1+\varepsilon)$-approximate maximum matching. In terms of the dependence on $1/\varepsilon$, this result exponentially improves on the state-of-the-art result by Lotker, Patt-Shamir, and Pettie~[LPSP15]. Our framework leads to the same quality of improvement in the context of the Massively Parallel Computation model as well.
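For concreteness, the first primitive the framework assumes is any maximal-matching subroutine. A generic sequential stand-in (not the streaming or $\mathsf{CONGEST}$ implementation) is the classical greedy procedure below, whose output is maximal and hence a $1/2$-approximation to maximum matching.

```python
def greedy_maximal_matching(edges):
    """Greedy maximal matching over a stream of edges."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

print(greedy_maximal_matching([(0, 1), (1, 2), (2, 3), (3, 0)]))  # [(0, 1), (2, 3)]
```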
The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications in statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders their applicability in practice. Inspired by the Huber contamination model, we propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows $\varepsilon$ outlier mass to be removed from each contaminated distribution. Our formulation amounts to a highly regular optimization problem that lends itself better to analysis than previously considered frameworks. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing the characterization of optimal perturbations, regularity, duality, and statistical estimation and robustness results. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the benefits of our framework via applications to generative modeling with contaminated datasets.
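In the discrete case, removing outlier mass can be phrased as a small linear program. The sketch below is a simplified partial-transport proxy, not the paper's exact $\mathsf{W}_p^\varepsilon$ formulation or its dual-based solver: it transports $1-\varepsilon$ total mass without exceeding either marginal.

```python
import numpy as np
from scipy.optimize import linprog

def robust_ot_lp(mu, nu, C, eps):
    """Partial-OT proxy: min <Pi, C> s.t. Pi >= 0, Pi 1 <= mu,
    Pi^T 1 <= nu, and total transported mass equal to 1 - eps."""
    m, n = C.shape
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1        # row sums <= mu
    for j in range(n):
        A_ub[m + j, j::n] = 1                 # column sums <= nu
    res = linprog(C.ravel(), A_ub=A_ub, b_ub=np.concatenate([mu, nu]),
                  A_eq=np.ones((1, m * n)), b_eq=[1 - eps], bounds=(0, None))
    return res.fun

mu = np.full(4, 0.25)
nu = np.array([0.1, 0.2, 0.3, 0.4])
C = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))  # |x - y| costs
print(robust_ot_lp(mu, nu, C, eps=0.1))
```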
A family of lattice packings of $ n $-dimensional cross-polytopes ($ \ell_1 $ balls) is constructed by using the notion of Sidon sets in finite Abelian groups. The resulting density exceeds that of any prior construction by a factor of at least $ 2^{ \Theta( \frac{ n }{ \ln n } ) } $ in the asymptotic regime $ n \to \infty $.
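As a toy illustration of the combinatorial ingredient (only the Sidon set, not the lattice construction built on it), the greedy sketch below finds a Sidon set in $\mathbb{Z}_N$, i.e., a set whose nonzero pairwise differences are all distinct.

```python
def greedy_sidon(N):
    """Greedily build a Sidon set in Z_N: all nonzero differences distinct."""
    S, diffs = [], set()
    for x in range(N):
        new = {(x - s) % N for s in S} | {(s - x) % N for s in S}
        if len(new) == 2 * len(S) and not (new & diffs):
            S.append(x)
            diffs |= new
    return S

print(greedy_sidon(101))  # a Sidon set in Z_101 (Mian-Chowla-style greedy)
```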
Maximally recoverable local reconstruction codes (MR LRCs for short) have received great attention in the last few years, and various constructions have been proposed in the literature. The main focus of this topic is to construct MR LRCs over small fields. An $(N=nr,r,h,\delta)$-MR LRC is a linear code over a finite field $\mathbb{F}_\ell$ of length $N$, whose codeword symbols are partitioned into $n$ local groups each of size $r$. Each local group can repair any $\delta$ erasures, and there are $h$ further global parity checks to provide fault tolerance against more global erasure patterns. MR LRCs deployed in practice have a small number of global parities, such as $h=O(1)$. In this parameter setting, all previous constructions require field size $\ell =\Omega_h (N^{h-1-o(1)})$, and it remains challenging to improve this bound. In this paper, via subspace direct sum systems, we present a construction of MR LRCs with field size $\ell= O(N^{h-2+\frac1{h-1}-o(1)})$. In particular, for the most interesting cases $h=2,3$, we improve previous constructions by either reducing the field size or removing constraints. In addition, we offer some constructions of MR LRCs for larger global parity $h$ whose field sizes are incomparable with known upper bounds. The main technique used in this paper is the notion of a subspace direct sum system, which we introduce. Interestingly, subspace direct sum systems are equivalent to $\mathbb{F}_q$-linear codes over extension fields. Based on various constructions of subspace direct sum systems, we are able to construct several classes of MR LRCs.
An improved Singleton-type upper bound is presented for the list decoding radius of linear codes, in terms of the code parameters $[n,k,d]$ and the list size $L$. $L$-MDS codes are then defined as codes that attain this bound (under a slightly stronger notion of list decodability), with $1$-MDS codes corresponding to ordinary linear MDS codes. Several properties of such codes are presented; in particular, it is shown that the $2$-MDS property is preserved under duality. Finally, explicit constructions for $2$-MDS codes are presented through generalized Reed-Solomon (GRS) codes.
The Variational Auto-Encoder (VAE) is one of the most widely used unsupervised machine learning models. Although the default choice of a Gaussian distribution for both the prior and the posterior is mathematically convenient and often leads to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal VAE ($\mathcal{N}$-VAE) in low dimensions on other data types.
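A minimal sketch of the hyperspherical latent (sampling only, with placeholder parameters standing in for encoder outputs; requires SciPy >= 1.11 for `vonmises_fisher`):

```python
import numpy as np
from scipy.stats import vonmises_fisher

# Sample a vMF latent on S^2 (latent dimension 3); in an S-VAE the decoder
# would consume z, and (mu, kappa) would come from the encoder.
mu = np.array([0.0, 0.0, 1.0])        # mean direction (unit vector)
kappa = 10.0                          # concentration parameter
z = vonmises_fisher(mu, kappa).rvs(5, random_state=0)
print(np.linalg.norm(z, axis=1))      # all samples lie on the unit sphere
```

A full $\mathcal{S}$-VAE additionally needs a reparameterizable vMF sampler and the vMF-vs-uniform KL term; the snippet only illustrates the latent geometry.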