A long line of work in the past two decades or so established close connections between several different pseudorandom objects and applications. These connections essentially show that an asymptotically optimal construction of one central object will lead to asymptotically optimal solutions to all the others. However, despite considerable effort, previous works can get close but still lack one final step to achieve truly asymptotically optimal constructions. In this paper we provide the last missing link, thus simultaneously achieving explicit, asymptotically optimal constructions and solutions for various well studied extractors and applications, that have been the subjects of long lines of research. Our results include: Asymptotically optimal seeded non-malleable extractors, which in turn give two source extractors for asymptotically optimal min-entropy of $O(\log n)$, explicit constructions of $K$-Ramsey graphs on $N$ vertices with $K=\log^{O(1)} N$, and truly optimal privacy amplification protocols with an active adversary. Two source non-malleable extractors and affine non-malleable extractors for some linear min-entropy with exponentially small error, which in turn give the first explicit construction of non-malleable codes against $2$-split state tampering and affine tampering with constant rate and \emph{exponentially} small error. Explicit extractors for affine sources, sumset sources, interleaved sources, and small space sources that achieve asymptotically optimal min-entropy of $O(\log n)$ or $2s+O(\log n)$ (for space $s$ sources). An explicit function that requires strongly linear read once branching programs of size $2^{n-O(\log n)}$, which is optimal up to the constant in $O(\cdot)$. Previously, even for standard read once branching programs, the best known size lower bound for an explicit function is $2^{n-O(\log^2 n)}$.
This note is an attempt to unconditionally prove the existence of weak one way functions (OWF). Starting from a provably intractable decision problem $L_D$ (whose existence is nonconstructively assured from the well-known discrete time-hierarchy theorem from complexity theory), we construct another intractable decision problem $L\subseteq \{0,1\}^*$ that has its words scattered across $\{0,1\}^\ell$ at a relative frequency $p(\ell)$, for which upper and lower bounds can be worked out. The value $p(\ell)$ is computed from the density of the language within $\{0,1\}^\ell$ divided by the total word count $2^\ell$. It corresponds to the probability of retrieving a yes-instance of a decision problem upon a uniformly random draw from $\{0,1\}^\ell$. The trick to find a language with known bounds on $p(\ell)$ relies on switching from $L_D$ to $L_0:=L_D\cap L'$, where $L'$ is an easy-to-decide language with a known density across $\{0,1\}^*$. In defining $L'$ properly (and upon a suitable G\"odel numbering), the hardness of deciding $L_D\cap L'$ is inherited from $L_D$, while its density is controlled by that of $L'$. The lower and upper approximation of $p(\ell)$ then let us construct an explicit threshold function (as in random graph theory) that can be used to efficiently and intentionally sample yes- or no-instances of the decision problem (language) $L_0$ (however, without any auxiliary information that could ease the decision like a polynomial witness). In turn, this allows to construct a weak OWF that encodes a bit string $w\in\{0,1\}^*$ by efficiently (in polynomial time) emitting a sequence of randomly constructed intractable decision problems, whose answers correspond to the preimage $w$.
We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function, significantly improving upon the depth requirements of the best previously known bounds for networks with linearly-bounded width. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
Storage codes on graphs are an instance of \emph{codes with locality}, which are used in distributed storage schemes to provide local repairability. Specifically, the nodes of the graph correspond to storage servers, and the neighbourhood of each server constitute the set of servers it can query to repair its stored data in the event of a failure. A storage code on a graph with $n$-vertices is a set of $n$-length codewords over $\field_q$ where the $i$th codeword symbol is stored in server $i$, and it can be recovered by querying the neighbours of server $i$ according to the underlying graph. In this work, we look at binary storage codes whose repair function is the parity check, and characterise the tradeoff between the locality of the code and its rate. Specifically, we show that the maximum rate of a code on $n$ vertices with locality $r$ is bounded between $1-1/n\lceil n/(r+1)\rceil$ and $1-1/n\lceil n/(r+1)\rceil$. The lower bound on the rate is derived by constructing an explicit family of graphs with locality $r$, while the upper bound is obtained via a lower bound on the binary-field rank of a class of symmetric binary matrices. Our upper bound on maximal rate of a storage code matches the upper bound on the larger class of codes with locality derived by Tamo and Barg. As a corollary to our result, we obtain the following asymptotic separation result: given a sequence $r(n), n\geq 1$, there exists a sequence of graphs on $n$-vertices with storage codes of rate $1-o(1)$ if and only if $r(n)=\omega(1)$.
We design a novel algorithm for optimal transport by drawing from the entropic optimal transport, mirror descent and conjugate gradients literatures. Our algorithm is able to compute optimal transport costs with arbitrary accuracy without running into numerical stability issues. The algorithm is implemented efficiently on GPUs and is shown empirically to converge more quickly than traditional algorithms such as Sinkhorn's Algorithm both in terms of number of iterations and wall-clock time in many cases. We pay particular attention to the entropy of marginal distributions and show that high entropy marginals make for harder optimal transport problems, for which our algorithm is a good fit. We provide a careful ablation analysis with respect to algorithm and problem parameters, and present benchmarking over the MNIST dataset. The results suggest that our algorithm can be a useful addition to the practitioner's optimal transport toolkit. Our code is open-sourced at //github.com/adaptive-agents-lab/MDOT-PNCG .
The classical way of extending an $[n, k, d]$ linear code $\C$ is to add an overall parity-check coordinate to each codeword of the linear code $\C$. The extended code, denoted by $\overline{\C}$ and called the standardly extended code of $\C$, is a linear code with parameters $[n+1, k, \bar{d}]$, where $\bar{d}=d$ or $\bar{d}=d+1$. This extending technique is one of the classical ways to construct a new linear code with a known linear code and a way to study the original code $\C$ via its extended code $\overline{\C}$. The standardly extended codes of some families of binary linear codes have been studied to some extent. However, not much is known about the standardly extended codes of nonbinary codes. For example, the minimum distances of the standardly extended codes of the nonbinary Hamming codes remain open for over 70 years. The first objective of this paper is to introduce the nonstandardly extended codes of a linear code and develop some general theory for extended linear codes. The second objective is to study the extended codes of several families of linear codes, including cyclic codes, projective two-weight codes and nonbinary Hamming codes. Many families of distance-optimal linear codes are obtained with the extending technique.
Graph theory and enumerative combinatorics are two branches of mathematical sciences that have developed astonishingly over the past one hundred years. It is especially important to point out that graph theory employs combinatorial techniques to solve key problems of characterization, construction, enumeration and classification of an enormous set of different families of graphs. This paper describes the construction of two classes of bigeodetic blocks using balanced incomplete block designs (BIBDs). On the other hand, even though graph theory and combinatorics have a close relationship, the opposite problem, that is, considering certain graph constructions when solving problems of combinatorics is not common, but possible. The construction of the second class of bigeodetic blocks described in this paper represents an example of how graph theory could somehow give a clue to the description of a problem of existence in combinatorics. We refer to the problem of existence for biplanes. A connection between the mentioned construction, the Bruck-Ryser-Chowla theorem and the problem of existence for biplanes is considered.
Information diagram and the I-measure are useful mnemonics where random variables are treated as sets, and entropy and mutual information are treated as a signed measure. Although the I-measure has been successful in machine proofs of entropy inequalities, the theoretical underpinning of the ``random variables as sets'' analogy has been unclear until the recent works on mappings from random variables to sets by Ellerman (recovering order-$2$ Tsallis entropy over general probability space), and Down and Mediano (recovering Shannon entropy over discrete probability space). We generalize these constructions by designing a mapping which recovers the Shannon entropy (and the information density) over general probability space. Moreover, it has an intuitive interpretation based on the arrival time in a Poisson process, allowing us to understand the union, intersection and difference between (sets corresponding to) random variables and events. Cross entropy, KL divergence, and conditional entropy given an event, can be obtained as set intersections. We propose a generalization of the information diagram that also includes events, and demonstrate its usage by a diagrammatic proof of Fano's inequality.
The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$, simple dynamic programming computes their edit distance exactly in $O(n^2)$ time, which is also the best possible (up to subpolynomial factors) assuming the Strong Exponential Time Hypothesis (SETH). The last few decades have seen tremendous progress in edit distance approximation, where the runtime has been brought down to subquadratic, near-linear, and even sublinear at the cost of approximation. In this paper, we study the dynamic edit distance problem, where the strings change dynamically as the characters are substituted, inserted, or deleted over time. Each change may happen at any location of either of the two strings. The goal is to maintain the (exact or approximate) edit distance of such dynamic strings while minimizing the update time. The exact edit distance can be maintained in $\tilde{O}(n)$ time per update (Charalampopoulos, Kociumaka, Mozes; 2020), which is again tight assuming SETH. Unfortunately, even with the unprecedented progress in edit distance approximation in the static setting, strikingly little is known regarding dynamic edit distance approximation. Utilizing the off-the-shelf tools, it is possible to achieve an $O(n^{c})$-approximation in $n^{0.5-c+o(1)}$ update time for any constant $c\in [0,\frac16]$. Improving upon this trade-off remains open. The contribution of this work is a dynamic $n^{o(1)}$-approximation algorithm with amortized expected update time of $n^{o(1)}$. In other words, we bring the approximation-ratio and update-time product down to $n^{o(1)}$. Our solution utilizes an elegant framework of precision sampling tree for edit distance approximation (Andoni, Krauthgamer, Onak; 2010).
The absence of unknown timing information about the microphones recording start time and the sources emission time presents a challenge in several applications, including joint microphones and sources localization. Compared with traditional optimization methods that try to estimate unknown timing information directly, low rank property (LRP) contains an additional low rank structure that facilitates a linear constraint of unknown timing information for formulating corresponding low rank structure information, enabling the achievement of global optimal solutions of unknown timing information with suitable initialization. However, the initialization of unknown timing information is random, resulting in local minimal values for estimation of the unknown timing information. In this paper, we propose a combined low rank approximation method to alleviate the effect of random initialization on the estimation of unknown timing information. We define three new variants of LRP supported by proof that allows unknown timing information to benefit from more low rank structure information. Then, by utilizing the low rank structure information from both LRP and proposed variants of LRP, four linear constraints of unknown timing information are presented. Finally, we use the proposed combined low rank approximation algorithm to obtain global optimal solutions of unknown timing information through the four available linear constraints. Experimental results demonstrate superior performance of our method compared to state-of-the-art approaches in terms of recovery rate (the number of successful initialization for any configuration), convergency rate (the number of successfully recovered configurations), and estimation errors of unknown timing information.
A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives could be seen equivalent to low-order approximations of MEC. Code and pre-trained models are available at //github.com/xinliu20/MEC.