Sequences of numbers (natural numbers, integers, or rationals) of level $k \in \mathbb{N}$ have been defined in \cite{Fra05,Fra-Sen06} as the sequences which can be computed by deterministic pushdown automata of level $k$. This definition has been extended to sequences of {\em words} indexed by {\em words} in \cite{Sen07,Fer-Mar-Sen14}. We characterise here the sequences of level 3 as the compositions of two HDT0L-systems. Two applications are derived: (i) the sequences of rational numbers of level 3 are characterised by polynomial recurrences; (ii) the equality problem for sequences of rational numbers of level 3 is decidable.
Let $\Omega = [0,1]^d$ be the unit cube in $\mathbb{R}^d$. We study the problem of how efficiently, in terms of the number of parameters, deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$, with error measured in the $L_p(\Omega)$ norm. This problem is important when studying the application of neural networks in a variety of fields, including scientific computing and signal processing, and has previously been solved only when $p=q=\infty$. Our contribution is to provide a complete solution for all $1\leq p,q\leq \infty$ and $s > 0$ for which the corresponding Sobolev or Besov space compactly embeds into $L_p$. The key technical tool is a novel bit-extraction technique which gives an optimal encoding of sparse vectors. This enables us to obtain sharp upper bounds in the non-linear regime where $p > q$. We also provide a novel method for deriving $L_p$-approximation lower bounds based upon VC-dimension when $p < \infty$. Our results show that very deep ReLU networks significantly outperform classical methods of approximation in terms of the number of parameters, but that this comes at the cost of parameters which are not encodable.
Given two $n$-element structures, $\mathcal{A}$ and $\mathcal{B}$, which can be distinguished by a sentence of $k$-variable first-order logic ($\mathcal{L}^k$), what is the minimum $f(n)$ such that there is guaranteed to be a sentence $\phi \in \mathcal{L}^k$ with at most $f(n)$ quantifiers, such that $\mathcal{A} \models \phi$ but $\mathcal{B} \not \models \phi$? We will present various results related to this question obtained by using the recently introduced QVT games. In particular, we show that when we limit the number of variables, there can be an exponential gap between the quantifier depth and the quantifier number needed to separate two structures. Through the lens of this question, we will highlight some difficulties that arise in analysing the QVT game and some techniques which can help to overcome them. We also show, in the setting of the existential-positive fragment, how to lift quantifier depth lower bounds to quantifier number lower bounds. This leads to almost tight bounds.
Assuming the Exponential Time Hypothesis (ETH), a result of Marx (ToC'10) implies that there is no $f(k)\cdot n^{o(k/\log k)}$ time algorithm that can solve 2-CSPs with $k$ constraints (over a domain of arbitrarily large size $n$) for any computable function $f$. This lower bound is widely used to show that certain parameterized problems cannot be solved in time $f(k)\cdot n^{o(k/\log k)}$ (assuming the ETH). The purpose of this note is to give a streamlined proof of this result.
We provide a simple $(1-O(\frac{1}{\sqrt{k}}))$-selectable Online Contention Resolution Scheme for $k$-uniform matroids against a fixed-order adversary. If $A_i$ and $G_i$ denote the set of selected elements and the set of realized active elements among the first $i$ (respectively), our algorithm selects with probability $1-\frac{1}{\sqrt{k}}$ any active element $i$ such that $|A_{i-1}| + 1 \leq (1-\frac{1}{\sqrt{k}})\cdot \mathbb{E}[|G_i|]+\sqrt{k}$. This implies a $(1-O(\frac{1}{\sqrt{k}}))$ prophet inequality against fixed-order adversaries for $k$-uniform matroids that is considerably simpler than previous algorithms [Ala14, AKW14, JMZ22]. We also prove that no OCRS can be $(1-\Omega(\sqrt{\frac{\log k}{k}}))$-selectable for $k$-uniform matroids against an almighty adversary. This guarantee is matched by the (known) simple greedy algorithm that accepts every active element with probability $1-\Theta(\sqrt{\frac{\log k}{k}})$ [HKS07].
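The selection rule above can be sketched as a short simulation. This is a minimal illustration under assumptions that the text does not state: each element $i$ is active independently with a known probability, these probabilities sum to at most $k$, and all function and variable names are hypothetical.

```python
import math
import random


def run_ocrs(active_probs, k, rng):
    """Simulate one pass of the threshold rule described in the text.

    active_probs[i] is the assumed probability that element i is active.
    The names used here are illustrative and not taken from the paper.
    """
    keep = 1.0 - 1.0 / math.sqrt(k)   # acceptance probability 1 - 1/sqrt(k)
    expected_active = 0.0             # E[|G_i|], maintained as a prefix sum
    selected = []                     # A_i: indices selected so far
    for i, p in enumerate(active_probs):
        expected_active += p
        if rng.random() >= p:
            continue                  # element i is not active
        # Condition from the text: |A_{i-1}| + 1 <= (1 - 1/sqrt(k)) E[|G_i|] + sqrt(k)
        budget = keep * expected_active + math.sqrt(k)
        if len(selected) + 1 <= budget and rng.random() < keep:
            selected.append(i)
    return selected


if __name__ == "__main__":
    print(run_ocrs([0.5] * 8, 4, random.Random(0)))
```

Under the assumption that the activation probabilities sum to at most $k$, the right-hand side of the condition never exceeds $(1-\frac{1}{\sqrt{k}})\cdot k+\sqrt{k}=k$, so the sketch never selects more than $k$ elements.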
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes. Special cases of this problem are well known: with complete knowledge of class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed. In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes -- a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under \textit{maximum mean discrepancy} (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM-generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.
A set of vectors $S \subseteq \mathbb{R}^d$ is $(k_1,\varepsilon)$-clusterable if there are $k_1$ balls of radius $\varepsilon$ that cover $S$. A set of vectors $S \subseteq \mathbb{R}^d$ is $(k_2,\delta)$-far from being clusterable if there are at least $k_2$ vectors in $S$, with all pairwise distances at least $\delta$. We propose a probabilistic algorithm to distinguish between these two cases. Our algorithm reaches a decision by only looking at the extreme values of a scalar-valued hash function, defined by a random field, on $S$; hence, it is especially suitable in distributed and online settings. An important feature of our method is that the algorithm is oblivious to the number of vectors: in the online setting, for example, the algorithm stores only a constant number of scalars, which is independent of the stream length. We introduce random field hash functions, which are a key ingredient in our paradigm. Random field hash functions generalize locality-sensitive hashing (LSH). In addition to the LSH requirement that ``nearby vectors are hashed to similar values'', our hash function also guarantees that the ``hash values are (nearly) independent random variables for distant vectors''. We formulate necessary conditions for the kernels which define the random fields applied to our problem, as well as a measure of kernel optimality, for which we provide a bound. Then, we propose a method to construct kernels which approximate the optimal one.
We construct and analyze finite element approximations of the Einstein tensor in dimension $N \ge 3$. We focus on the setting where a smooth Riemannian metric tensor $g$ on a polyhedral domain $\Omega \subset \mathbb{R}^N$ has been approximated by a piecewise polynomial metric $g_h$ on a simplicial triangulation $\mathcal{T}$ of $\Omega$ having maximum element diameter $h$. We assume that $g_h$ possesses single-valued tangential-tangential components on every codimension-1 simplex in $\mathcal{T}$. Such a metric is not classically differentiable in general, but it turns out that one can still attribute meaning to its Einstein curvature in a distributional sense. We study the convergence of the distributional Einstein curvature of $g_h$ to the Einstein curvature of $g$ under refinement of the triangulation. We show that in the $H^{-2}(\Omega)$-norm, this convergence takes place at a rate of $O(h^{r+1})$ when $g_h$ is an optimal-order interpolant of $g$ that is piecewise polynomial of degree $r \ge 1$. We provide numerical evidence to support this claim.
We explore the maximum likelihood degree of a homogeneous polynomial $F$ on a projective variety $X$, $\mathrm{MLD}_F(X)$, which generalizes the concept of Gaussian maximum likelihood degree. We show that $\mathrm{MLD}_F(X)$ is equal to the count of critical points of a rational function on $X$, and give different geometric characterizations of it via topological Euler characteristic, dual varieties, and Chern classes.
We study the convergence of specific inexact alternating projections for two non-convex sets in a Euclidean space. The $\sigma$-quasioptimal metric projection ($\sigma \geq 1$) of a point $x$ onto a set $A$ consists of the points in $A$ whose distance to $x$ is at most $\sigma$ times the minimal distance $\mathrm{dist}(x,A)$. We prove that quasioptimal alternating projections, when one or both projections are quasioptimal, converge locally and linearly for super-regular sets with transversal intersection. The theory is motivated by the successful application of alternating projections to low-rank matrix and tensor approximation. We focus on two problems -- nonnegative low-rank approximation and low-rank approximation in the maximum norm -- and develop fast alternating-projection algorithms for matrices and tensor trains based on cross approximation and acceleration techniques. The numerical experiments confirm that the proposed methods are efficient and suggest that they can be used to regularise various low-rank computational routines.
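In symbols, the definition of the quasioptimal projection can be written as follows. The notation $\Pi_A^{\sigma}$ is an assumption introduced here for illustration and is not taken from the source; $\|\cdot\|$ denotes the Euclidean norm.
\[
\Pi_A^{\sigma}(x) = \bigl\{\, a \in A \;:\; \|x - a\| \le \sigma \cdot \mathrm{dist}(x,A) \,\bigr\},
\qquad
\mathrm{dist}(x,A) = \inf_{a \in A} \|x - a\|, \qquad \sigma \ge 1 .
\]
For $\sigma = 1$ this reduces to the usual metric projection of $x$ onto $A$, so larger values of $\sigma$ correspond to more inexact projections.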
Given an undirected, possibly weighted $n$-vertex graph $G=(V,E)$ and a set $\mathcal{P}\subseteq V^2$ of pairs, a subgraph $S=(V,E')$ is called a ${\cal P}$-pairwise $\alpha$-spanner of $G$ if for every pair $(u,v)\in\mathcal{P}$ we have $d_S(u,v)\leq\alpha\cdot d_G(u,v)$. The parameter $\alpha$ is called the stretch of the spanner, and its size overhead is defined as $\frac{|E'|}{|{\cal P}|}$. A surprising connection was recently discussed between the additive stretch of $(1+\epsilon,\beta)$-spanners and the hopbound of $(1+\epsilon,\beta)$-hopsets. A long sequence of works showed that if the spanner/hopset has size $\approx n^{1+1/k}$ for some parameter $k\ge 1$, then $\beta\approx\left(\frac1\epsilon\right)^{\log k}$. In this paper we establish a new connection to the size overhead of pairwise spanners. In particular, we show that if $|{\cal P}|\approx n^{1+1/k}$, then a ${\cal P}$-pairwise $(1+\epsilon)$-spanner must have size at least $\beta\cdot |{\cal P}|$ with $\beta\approx\left(\frac1\epsilon\right)^{\log k}$ (a near matching upper bound was recently shown in \cite{ES23}). We also extend the connection between pairwise spanners and hopsets to the large stretch regime, by showing nearly matching upper and lower bounds for ${\cal P}$-pairwise $\alpha$-spanners. In particular, we show that if $|{\cal P}|\approx n^{1+1/k}$, then the size overhead is $\beta\approx\frac k\alpha$. A source-wise spanner is a special type of pairwise spanner, for which ${\cal P}=A\times V$ for some $A\subseteq V$. A prioritized spanner is also given a ranking of the vertices $V=(v_1,\dots,v_n)$, and is required to provide improved stretch for pairs containing higher ranked vertices. By using a sequence of reductions, we improve on the state-of-the-art results for source-wise and prioritized spanners.
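The definitions of a pairwise spanner and of its size overhead can be checked directly on small instances. The sketch below only verifies the definition and is not a spanner construction from the paper. It assumes vertices are numbered $0,\dots,n-1$ and edges are given as weighted triples; all function names are hypothetical.

```python
import heapq


def shortest_dists(n, edges, source):
    """Dijkstra on an undirected weighted graph given as (u, v, w) triples."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def is_pairwise_spanner(n, graph_edges, spanner_edges, pairs, alpha):
    """Check d_S(u, v) <= alpha * d_G(u, v) for every pair (u, v) in pairs."""
    for u, v in pairs:
        d_g = shortest_dists(n, graph_edges, u)[v]
        d_s = shortest_dists(n, spanner_edges, u)[v]
        if d_s > alpha * d_g:
            return False
    return True


def size_overhead(spanner_edges, pairs):
    """Size overhead |E'| / |P| as defined in the text."""
    return len(spanner_edges) / len(pairs)


if __name__ == "__main__":
    # Unit-weight 4-cycle; the candidate spanner drops the edge (3, 0).
    cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
    path = cycle[:3]
    pairs = [(0, 3), (0, 2)]
    print(is_pairwise_spanner(4, cycle, path, pairs, 3))  # stretch 3 suffices
    print(is_pairwise_spanner(4, cycle, path, pairs, 2))  # stretch 2 does not
    print(size_overhead(path, pairs))
```

In this toy instance the pair $(0,3)$ has $d_G=1$ and $d_S=3$, so the path is a pairwise $3$-spanner but not a pairwise $2$-spanner, and its size overhead is $3/2$.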