We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with arbitrary polynomial link function at a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}\, d$, where the constant $C(q)$ depends only on the degree of $\sigma_*$, regardless of information exponent; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
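As a toy illustration of the minibatch-reuse mechanism (not the paper's algorithm; the architecture, link function, and all hyperparameters below are illustrative assumptions), the following sketch trains a two-layer network with SGD while using each fresh minibatch for two consecutive gradient steps:

```python
# Minimal sketch: SGD on a two-layer network for a single-index target,
# where each minibatch is reused for two consecutive gradient steps.
import numpy as np

rng = np.random.default_rng(0)
d, width, batch, steps, lr = 64, 128, 256, 2000, 0.05

theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
sigma_star = lambda z: z**3 - 3*z                 # He_3 link: information exponent p = 3

W = rng.standard_normal((width, d)) / np.sqrt(d)  # first layer
a = rng.standard_normal(width) / np.sqrt(width)   # second layer

for t in range(steps):
    X = rng.standard_normal((batch, d))
    y = sigma_star(X @ theta)
    for _ in range(2):                            # reuse the same minibatch twice
        act = np.tanh(X @ W.T)                    # (batch, width)
        resid = act @ a - y                       # squared-loss residuals
        grad_a = act.T @ resid / batch
        grad_W = ((resid[:, None] * a) * (1 - act**2)).T @ X / batch
        a -= lr * grad_a
        W -= lr * grad_W

# overlap of the learned first-layer directions with the hidden index theta
print(np.max(np.abs(W @ theta) / np.linalg.norm(W, axis=1)))
```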
A global approximation method of Nystr\"om type is explored for the numerical solution of a class of nonlinear integral equations of the second kind. The cases of smooth and weakly singular kernels are both considered. In the former case the method uses a Gauss-Legendre rule, whereas in the latter it resorts to a product rule based on Legendre nodes. Stability and convergence are proved in functional spaces equipped with the uniform norm, and several numerical tests are given to show the good performance of the proposed method. An application to the interior Neumann problem for the Laplace equation with nonlinear boundary conditions is also considered.
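A minimal Nyström sketch on an assumed smooth model problem (the kernel, right-hand side, and iteration scheme below are illustrative, not the paper's test cases): discretize $u(x) = f(x) + \int_{-1}^{1} k(x,y,u(y))\,dy$ on Gauss-Legendre nodes and solve the resulting nonlinear system by fixed-point iteration.

```python
# Nystrom method with a Gauss-Legendre rule for a smooth nonlinear kernel,
# solved by Picard (fixed-point) iteration on the discrete system.
import numpy as np

n = 32
nodes, weights = np.polynomial.legendre.leggauss(n)

k = lambda x, y, u: 0.25 * np.cos(x - y) * np.sin(u)  # smooth, contractive in u
u_true = lambda x: np.exp(x)                           # hypothetical exact solution
# manufacture f so that u_true solves the discrete equation
f = u_true(nodes) - (k(nodes[:, None], nodes[None, :], u_true(nodes)) @ weights)

u = np.zeros(n)
for _ in range(100):
    u_new = f + k(nodes[:, None], nodes[None, :], u) @ weights
    if np.max(np.abs(u_new - u)) < 1e-12:
        break
    u = u_new

print(np.max(np.abs(u - u_true(nodes))))               # error at the nodes
```

The Picard iteration converges here because the kernel's Lipschitz constant in $u$ times the total quadrature weight is $0.25 \cdot 2 = 0.5 < 1$.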
In this note we propose a new algorithm for checking whether two counting functions on a free monoid $M_r$ of rank $r$ are equivalent modulo a bounded function. The previously known algorithm has time complexity $O(n)$ for all ranks $r>2$; however, in the case $r=2$ its complexity was estimated only as $O(n^2)$. Here we apply a new approach, based on explicit basis expansion and weighted rectangle summation, which allows us to construct a much simpler algorithm with time complexity $O(n)$ for any $r\geq 2$.
We provide, for each natural number $n$ and each class among $D_n(\Sigma^0_1)$, $\bar D_n(\Sigma^0_1)$ and $D_{2n+1}(\Sigma^0_1)\oplus\bar D_{2n+1}(\Sigma^0_1)$, a regular language whose associated omega-power is complete for this class.
We consider the simultaneously fast and in-place computation of the Euclidean polynomial modular remainder $R(X) \equiv A(X) \mod B(X)$ with $A$ and $B$ of respective degrees $n$ and $m \le n$. However, fast algorithms for this usually come at the expense of (potentially large) extra temporary space. To remain in-place, a further issue is to avoid storing the whole quotient $Q(X)$ such that $A=BQ+R$. If the multiplication of two polynomials of degree $k$ can be performed with $M(k)$ operations and $O(k)$ extra space, and if it is allowed to use the input space of $A$ or $B$ for intermediate computations, provided $A$ and $B$ are restored to their initial states after the completion of the remainder computation, we here propose an in-place algorithm (that is, with its extra required space reduced to $O(1)$ only) using at most $O(n/m\, M(m)\log(m))$ arithmetic operations if $M(m)$ is quasi-linear, or $O(n/m\, M(m))$ otherwise. We also propose variants that compute, still in-place and with the same kind of complexity bounds, the over-place remainder $A(X) \equiv A(X) \mod B(X)$, the accumulated remainder $R(X) \mathrel{+}= A(X) \mod B(X)$, and the accumulated modular multiplication $R(X) \mathrel{+}= A(X)C(X) \mod B(X)$. To achieve this, we develop techniques for Toeplitz matrix operations whose output is also part of the input. Fast and in-place accumulating versions are obtained for the latter, and thus for convolutions, and then used for polynomial remaindering. This is realized via further reductions to accumulated polynomial multiplication, for which fast in-place algorithms have recently been developed.
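To illustrate only the space-saving idea of never storing the whole quotient (this is the quadratic schoolbook reduction, not the paper's fast algorithm), each quotient coefficient can be computed, used, and discarded in turn:

```python
# Schoolbook in-place remainder: overwrite A with A mod B, keeping only one
# quotient coefficient at a time, so Q(X) as a whole is never stored.
# Arithmetic over the rationals purely for illustration.
from fractions import Fraction

def inplace_remainder(A, B):
    """A, B: coefficient lists, lowest degree first; A is overwritten."""
    n, m = len(A) - 1, len(B) - 1
    lead_inv = Fraction(1) / B[m]
    for i in range(n - m, -1, -1):
        q = A[i + m] * lead_inv          # one quotient coefficient, O(1) space
        if q:
            for j in range(m + 1):
                A[i + j] -= q * B[j]     # zeroes out A[i+m]
    # A[0:m] now holds R(X); the top n-m+1 slots are zero

A = [Fraction(c) for c in [1, 2, 3, 4, 5]]   # A(X) = 5X^4 + 4X^3 + 3X^2 + 2X + 1
B = [Fraction(c) for c in [1, 1, 1]]         # B(X) = X^2 + X + 1
inplace_remainder(A, B)
print(A[:2])                                 # remainder 4X + 2
```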
Consider an operator that takes the Fourier transform of a discrete measure supported in $\mathcal{X}\subset[-\frac 12,\frac 12)^d$ and restricts it to a compact $\Omega\subset\mathbb{R}^d$. We provide lower bounds for its smallest singular value when $\Omega$ is either a ball or cube of radius $m$, under different types of geometric assumptions on $\mathcal{X}$. We first show that if distances between points in $\mathcal{X}$ are lower bounded by a $\delta$ that is allowed to be arbitrarily small, then the smallest singular value is at least $Cm^{d/2} (m\delta)^{\lambda-1}$, where $\lambda$ is the maximum number of elements in $\mathcal{X}$ contained within any ball or cube of an explicitly given radius. This estimate communicates a localization effect of the Fourier transform. While it is sharp, the smallest singular value behaves better than expected for many $\mathcal{X}$, including dilations of a generic set by a parameter $\delta$. We next show that if there is an $\eta$ such that, for each $x\in\mathcal{X}$, the set $\mathcal{X}\setminus\{x\}$ locally consists of at most $r$ hyperplanes whose distances to $x$ are at least $\eta$, then the smallest singular value is at least $C m^{d/2} (m\eta)^r$. For dilations of a generic set by $\delta$, the lower bound becomes $C m^{d/2} (m\delta)^{\lceil (\lambda-1)/d\rceil}$. The appearance of a $1/d$ factor in the exponent indicates that, compared to worst-case scenarios, the condition number of nonharmonic Fourier transforms is better than expected for typical sets and improves with higher dimensionality.
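A quick numerical check of the clustered regime (assumptions: $d=1$, $\Omega=[-m,m]$ discretized by a Riemann sum, and a node set containing a cluster of three $\delta$-separated points, so $\lambda=3$ for a suitably small radius):

```python
# Smallest singular value of the restricted nonharmonic Fourier operator
# c -> sum_x c_x e^{-2 pi i w x}, w in [-m, m], for clustered nodes in [-1/2, 1/2).
import numpy as np

m, delta = 20.0, 0.02
X = np.array([0.0, delta, 2 * delta, 0.3])       # a 3-point cluster plus one far node
w = np.linspace(-m, m, 4000)
dw = w[1] - w[0]

F = np.exp(-2j * np.pi * np.outer(w, X)) * np.sqrt(dw)  # quadrature-weighted columns
s = np.linalg.svd(F, compute_uv=False)
# compare against the shape of the bound C m^{d/2} (m delta)^{lambda-1}, lambda = 3
print(s[-1], np.sqrt(m) * (m * delta) ** 2)
```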
\v{C}ech persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $OT_{p}$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of $\mathbb{R}^{d}$. Under this manifold hypothesis, we show that convergence with respect to the $OT_{p}$ metric happens exactly when $p > m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $\alpha$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis.
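An empirical sketch using the gudhi library (with its optional POT backend for Wasserstein distances; the sampling setup is an assumption): alpha-complex diagrams serve as a computational proxy for \v{C}ech PDs of samples from a circle, an $m=1$ submanifold.

```python
# Compare OT_p distances between H_1 persistence diagrams of two samples
# from (a noisy) circle, for p on both sides of m = 1.
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

def diagram(n_pts, rng):
    t = rng.uniform(0, 2 * np.pi, n_pts)
    pts = np.column_stack([np.cos(t), np.sin(t)]) + 0.01 * rng.standard_normal((n_pts, 2))
    st = gudhi.AlphaComplex(points=pts).create_simplex_tree()
    st.persistence()                                 # compute the diagram
    return st.persistence_intervals_in_dimension(1)  # H_1 intervals

rng = np.random.default_rng(0)
D1, D2 = diagram(500, rng), diagram(2000, rng)
for p in [1.0, 2.0]:        # convergence is expected exactly when p > m = 1
    print(p, wasserstein_distance(D1, D2, order=p, internal_p=2))
```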
We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posterior-based estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the Dirichlet-Laplacian eigenbasis and Mat\'ern process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study.
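A conjugate-formula sketch for an assumed one-dimensional toy version of the problem, with forward operator the inverse Dirichlet Laplacian on $(0,1)$ (diagonal in the sine eigenbasis) and a Gaussian series prior; the posterior is then Gaussian coefficient by coefficient.

```python
# Observe Y = Af + n^{-1/2} W coefficientwise: y_k = lam_k f_k + eps_k / sqrt(n),
# with prior f_k ~ N(0, tau_k^2); the posterior on each f_k is Gaussian.
import numpy as np

rng = np.random.default_rng(0)
K, n, alpha = 200, 10_000, 1.0
k = np.arange(1, K + 1)

lam = 1.0 / (np.pi * k) ** 2            # eigenvalues of A = (-Laplacian)^{-1}
tau2 = k ** (-2 * alpha - 1.0)          # prior variances, smoothness alpha

f_true = k ** (-2.0) * np.sin(k)        # a fixed truth (illustrative coefficients)
y = lam * f_true + rng.standard_normal(K) / np.sqrt(n)

post_var = tau2 / (n * lam**2 * tau2 + 1.0)
post_mean = n * lam * tau2 * y / (n * lam**2 * tau2 + 1.0)

print(np.linalg.norm(post_mean - f_true))   # l2 estimation error of the posterior mean
```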
Expected values weighted by the inverse of a multivariate density or, equivalently, Lebesgue integrals of regression functions with multivariate regressors arise in various areas of application, including the estimation of average treatment effects, nonparametric estimation in random coefficient regression models, and deconvolution in Berkson errors-in-variables models. The frequently used nearest-neighbor and matching estimators suffer from bias problems in multiple dimensions. By using polynomial least squares fits on each cell of the $K^{\text{th}}$-order Voronoi tessellation for sufficiently large $K$, we develop novel modifications of nearest-neighbor and matching estimators which again converge at the parametric $\sqrt{n}$-rate under mild smoothness assumptions on the unknown regression function and without any smoothness conditions on the unknown density of the covariates. We stress that, in contrast to competing methods for correcting the bias of matching estimators, our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters. We complement the upper bounds with appropriate lower bounds derived from information-theoretic arguments, which show that some smoothness of the regression function is indeed required to achieve the parametric rate. Simulations illustrate the practical feasibility of the proposed methods.
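For concreteness, here is a sketch of the baseline nearest-neighbor/matching estimator that the proposed method modifies (the $K^{\text{th}}$-order Voronoi correction itself is not implemented; the regression function and design below are illustrative): each $Y_i$ is weighted by the Monte Carlo volume of the Voronoi cell of $X_i$.

```python
# Estimate the Lebesgue integral of m over [0,1]^2 from (X_i, Y_i) by
# weighting each Y_i with the approximate volume of its Voronoi cell.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n, d = 2000, 2
X = rng.beta(2, 5, size=(n, d))              # covariate density neither smooth nor known
m = lambda x: np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2
Y = m(X) + 0.1 * rng.standard_normal(n)

U = rng.uniform(size=(200_000, d))           # uniform probe points in [0,1]^2
_, idx = cKDTree(X).query(U)                 # nearest data point = Voronoi cell
vol = np.bincount(idx, minlength=n) / len(U) # Monte Carlo cell volumes

print((vol * Y).sum(), 1 / 3)                # estimate vs. the true integral 1/3
```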
The convex dimension of a $k$-uniform hypergraph is the smallest dimension $d$ for which there is an injective mapping of its vertices into $\mathbb{R}^d$ such that the set of $k$-barycenters of all hyperedges is in convex position. We completely determine the convex dimension of complete $k$-uniform hypergraphs, which settles an open question by Halman, Onn and Rothblum, who solved the problem for complete graphs. We also provide lower and upper bounds for the extremal problem of estimating the maximal number of hyperedges of $k$-uniform hypergraphs on $n$ vertices with convex dimension $d$. To prove these results, we restate them in terms of affine projections that preserve the vertices of the hypersimplex. More generally, we provide a full characterization of the projections that preserve its $i$-dimensional skeleton. In particular, we obtain a hypersimplicial generalization of the linear van Kampen-Flores theorem: for each $n$, $k$ and $i$ we determine onto which dimensions the $(n,k)$-hypersimplex can be linearly projected while preserving its $i$-skeleton. Our results have direct interpretations in terms of $k$-sets and $(i,j)$-partitions, and are closely related to the problem of finding large convexly independent subsets in Minkowski sums of $k$ point sets.
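A small sketch of the underlying definition (the embedding below is an arbitrary illustrative choice, not an optimal one): a mapping of the vertices witnesses convex dimension at most $d$ exactly when every $k$-barycenter is a vertex of the barycenters' convex hull.

```python
# Test whether a given embedding puts all k-barycenters of the complete
# k-uniform hypergraph in convex position.
import numpy as np
from itertools import combinations
from scipy.spatial import ConvexHull

def in_convex_position(points):
    pts = np.asarray(points)
    return len(ConvexHull(pts).vertices) == len(pts)

rng = np.random.default_rng(1)
n, k, d = 6, 3, 3
V = rng.standard_normal((n, d))              # one candidate embedding
bary = np.array([V[list(e)].mean(axis=0) for e in combinations(range(n), k)])
print(in_convex_position(bary))              # True iff this embedding witnesses dim <= d
```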
We consider so-called $N$-fold integer programs (IPs) of the form $\max\{c^T x : Ax = b,\ \ell \leq x \leq u,\ x \in \mathbb Z^{nt}\}$, where $A \in \mathbb Z^{(r+sn)\times nt}$ consists of $n$ arbitrary matrices $A^{(i)} \in \mathbb Z^{r\times t}$ on a horizontal line and $n$ arbitrary matrices $B^{(j)} \in \mathbb Z^{s\times t}$ on a diagonal line. Several recent works design fixed-parameter algorithms for $N$-fold IPs by taking as parameters the numbers of rows and columns of the $A$- and $B$-matrices, together with the largest absolute value $\Delta$ over their entries. These advances provide fast algorithms for several well-studied combinatorial optimization problems on strings, on graphs, and in machine scheduling. In this work, we extend this research by proposing algorithms that additionally harness a partition structure of the submatrices $A^{(i)}$ and $B^{(j)}$, where row indices of non-zero entries do not overlap between any two sets in the partition. Our main result is an algorithm for solving any $N$-fold IP in time $nt\log(nt)L^2(S_A)^{O(r+s)}(p_Ap_B\Delta)^{O(rp_Ap_B+sp_Ap_B)}$, where $p_A$ and $p_B$ are the sizes of the largest sets in such partitions of the $A^{(i)}$ and $B^{(j)}$, respectively, $S_A$ is the number of parts in the partition of $A = (A^{(1)},\dots, A^{(n)})$, and $L = \log(\|u - \ell\|_\infty)\cdot \log(\max_{x:\ell \leq x \leq u} |c^Tx|)$ is a measure of the input. We show that these new structural parameters are naturally small in high-multiplicity scheduling problems, such as makespan minimization on related and unrelated machines, with and without release times, the Santa Claus objective, and the weighted sum of completion times. In essence, we obtain algorithms that are exponentially faster than previous works by Knop et al. (ESA 2017) and Eisenbrand et al./Kouteck{\'y} et al. (ICALP 2018) in terms of the number of job types.
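A minimal sketch of the $N$-fold block structure itself (toy random blocks, assumed data): the $A^{(i)}$ occupy a horizontal line of the constraint matrix and the $B^{(j)}$ a diagonal line.

```python
# Assemble the (r + s*n) x (t*n) constraint matrix of an N-fold IP from its blocks.
import numpy as np

def nfold_matrix(As, Bs):
    """As: n matrices of shape (r, t); Bs: n matrices of shape (s, t)."""
    n = len(As)
    r, t = As[0].shape
    s = Bs[0].shape[0]
    M = np.zeros((r + s * n, t * n), dtype=int)
    for i, (A, B) in enumerate(zip(As, Bs)):
        M[:r, i*t:(i+1)*t] = A                        # horizontal line of A-blocks
        M[r + i*s : r + (i+1)*s, i*t:(i+1)*t] = B     # diagonal line of B-blocks
    return M

rng = np.random.default_rng(0)
As = [rng.integers(-1, 2, size=(2, 3)) for _ in range(4)]  # r = 2, t = 3
Bs = [rng.integers(-1, 2, size=(1, 3)) for _ in range(4)]  # s = 1
print(nfold_matrix(As, Bs).shape)                          # (2 + 1*4, 3*4) = (6, 12)
```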