We quantify the minimax rate for a nonparametric regression model over a convex function class $\mathcal{F}$ with bounded diameter. We obtain a minimax rate of ${\varepsilon^{\ast}}^2\wedge\mathrm{diam}(\mathcal{F})^2$ where \[\varepsilon^{\ast} =\sup\{\varepsilon>0:n\varepsilon^2 \le \log M_{\mathcal{F}}^{\operatorname{loc}}(\varepsilon,c)\},\] where $M_{\mathcal{F}}^{\operatorname{loc}}(\cdot, c)$ is the local metric entropy of $\mathcal{F}$ and our loss function is the squared population $L_2$ distance over our input space $\mathcal{X}$. In contrast to classical works on the topic [cf. Yang and Barron, 1999], our results do not require functions in $\mathcal{F}$ to be uniformly bounded in sup-norm. In addition, we prove that our estimator is adaptive to the true point, and to the best of our knowledge this is the first such estimator in this general setting. This work builds on the Gaussian sequence framework of Neykov [2022] using a similar algorithmic scheme to achieve the minimax rate. Our algorithmic rate also applies with sub-Gaussian noise. We illustrate the utility of this theory with examples including multivariate monotone functions, linear functionals over ellipsoids, and Lipschitz classes.
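As an illustration of how the critical radius $\varepsilon^{\ast}$ defined above could be evaluated numerically, the following minimal sketch solves $n\varepsilon^2 \le \log M_{\mathcal{F}}^{\operatorname{loc}}(\varepsilon,c)$ by bisection. The names `critical_radius`, `minimax_rate` and `toy_log_local_entropy` are hypothetical, and the entropy profile $\varepsilon^{-1/\alpha}$ is an assumed toy example, not a quantity derived in the paper. The sketch also assumes that the log local entropy is non-increasing in $\varepsilon$.

```python
import math


def critical_radius(n, log_local_entropy, lo=1e-12, hi=1e6, iters=200):
    """Approximate sup{eps > 0 : n * eps**2 <= log_local_entropy(eps)}.

    Assumption: log_local_entropy is non-increasing in eps, so the
    feasible set is an interval (0, eps*] and bisection applies.
    """
    def feasible(eps):
        return n * eps ** 2 <= log_local_entropy(eps)

    if not feasible(lo):
        return 0.0  # no feasible eps inside the search bracket
    if feasible(hi):
        return hi   # the supremum lies beyond the search bracket
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def minimax_rate(n, log_local_entropy, diam):
    """Return min(eps*^2, diam^2), the form of the rate stated in the text."""
    eps_star = critical_radius(n, log_local_entropy)
    return min(eps_star ** 2, diam ** 2)


def toy_log_local_entropy(eps, alpha=1.0):
    # Hypothetical entropy profile eps^(-1/alpha); an assumption for illustration only.
    return eps ** (-1.0 / alpha)


if __name__ == "__main__":
    # With alpha = 1 the equation n * eps^2 = 1 / eps gives eps* = n^(-1/3).
    print(critical_radius(1000, toy_log_local_entropy))
    print(minimax_rate(1000, toy_log_local_entropy, diam=10.0))
```

Under the toy profile with $\alpha = 1$, the balance $n\varepsilon^2 = \varepsilon^{-1}$ gives $\varepsilon^{\ast} = n^{-1/3}$, so the sketch can be checked against a closed form.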
In this paper, we derive improved uniform error bounds for the long-time dynamics of the $d$-dimensional $(d=2,3)$ nonlinear space fractional sine-Gordon equation (NSFSGE). The nonlinearity strength of the NSFSGE is characterized by $\varepsilon^2$ where $0<\varepsilon \le 1$ is a dimensionless parameter. The second-order time-splitting method is applied to the temporal discretization and the Fourier pseudo-spectral method is used for the spatial discretization. To obtain the explicit relation between the numerical errors and the parameter $\varepsilon$, we introduce the regularity compensation oscillation technique to the convergence analysis of fractional models. Then we establish the improved uniform error bounds $O\left(\varepsilon^2 \tau^2\right)$ for the semi-discretization scheme and $O\left(h^m+\varepsilon^2 \tau^2\right)$ for the full-discretization scheme up to the long time at $O(1/\varepsilon^2)$. Further, we extend the time-splitting Fourier pseudo-spectral method to the complex NSFSGE as well as the oscillatory complex NSFSGE, and the improved uniform error bounds for them are also given. Finally, extensive numerical examples in two and three dimensions are provided to support the theoretical analysis. The differences in dynamic behaviors between the fractional sine-Gordon equation and the classical sine-Gordon equation are also discussed.
We describe a `discretize-then-relax' strategy to globally minimize integral functionals over functions $u$ in a Sobolev space subject to Dirichlet boundary conditions. The strategy applies whenever the integral functional depends polynomially on $u$ and its derivatives, even if it is nonconvex. The `discretize' step uses a bounded finite element scheme to approximate the integral minimization problem with a convergent hierarchy of polynomial optimization problems over a compact feasible set, indexed by the decreasing size $h$ of the finite element mesh. The `relax' step employs sparse moment-sum-of-squares relaxations to approximate each polynomial optimization problem with a hierarchy of convex semidefinite programs, indexed by an increasing relaxation order $\omega$. We prove that, as $\omega\to\infty$ and $h\to 0$, solutions of such semidefinite programs provide approximate minimizers that converge in a suitable sense (including in certain $L^p$ norms) to the global minimizer of the original integral functional if it is unique. We also report computational experiments showing that our numerical strategy works well even when technical conditions required by our theoretical analysis are not satisfied.
We analyze the Schr\"odingerisation method for quantum simulation of a general class of non-unitary dynamics with inhomogeneous source terms. The Schr\"odingerisation technique, introduced in \cite{JLY22a,JLY23}, transforms any linear ordinary and partial differential equations with non-unitary dynamics into a system under unitary dynamics via a warped phase transformation that maps the equations into a higher dimension, making them suitable for quantum simulation. This technique can also be applied to these equations with inhomogeneous terms modeling source or forcing terms or boundary and interface conditions, and discrete dynamical systems such as iterative methods in numerical linear algebra, through extra equations in the system. Difficulty arises with the presence of inhomogeneous terms since they can change the stability of the original system. In this paper, we systematically study, both theoretically and numerically, the important issue of recovering the original variables from the Schr\"odingerized equations, even when the evolution operator contains unstable modes. We show that even with unstable modes, one can still construct a stable scheme, yet to recover the original variable one needs to use suitable data in the extended space. We analyze and compare both the discrete and continuous Fourier transforms used in the extended dimension, and derive corresponding error estimates, which allow one to use the more appropriate transform for specific equations. We also provide a smoother initialization for the Schr\"odingerized system to gain higher order accuracy in the extended space. We homogenize the inhomogeneous terms with a stretch transformation, making it easier to recover the original variable. Our recovering technique also provides a simple and generic framework to solve general ill-posed problems in a computationally stable way.
We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are described by a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that such learnability can be ruled out for a large class of elementary functions.
We study the optimal rate of convergence in periodic homogenization of the viscous Hamilton-Jacobi equation $u^\varepsilon_t + H(\frac{x}{\varepsilon},Du^\varepsilon) = \varepsilon \Delta u^\varepsilon$ in $\mathbb R^n\times (0,\infty)$ subject to a given initial datum. We prove that $\|u^\varepsilon-u\|_{L^\infty(\mathbb R^n \times [0,T])} \leq C(1+T) \sqrt{\varepsilon}$ for any given $T>0$, where $u$ is the viscosity solution of the effective problem. Moreover, we show that the $O(\sqrt{\varepsilon})$ rate is optimal for a natural class of $H$ and a Lipschitz continuous initial datum, both theoretically and through numerical experiments. It remains an interesting question to investigate whether the convergence rate can be improved when $H$ is uniformly convex. Finally, we propose a numerical scheme for the approximation of the effective Hamiltonian based on a finite element approximation of approximate corrector problems.
We consider the additive version of the matrix denoising problem, where a random symmetric matrix $S$ of size $n$ has to be inferred from the observation of $Y=S+Z$, with $Z$ an independent random matrix modeling a noise. For prior distributions of $S$ and $Z$ that are invariant under conjugation by orthogonal matrices we determine, using results from first and second order free probability theory, the Bayes-optimal (in terms of the mean square error) polynomial estimators of degree at most $D$, asymptotically in $n$, and show that as $D$ increases they converge towards the estimator introduced by Bun, Allez, Bouchaud and Potters in [IEEE Transactions on Information Theory {\bf 62}, 7475 (2016)]. We conjecture that this optimality holds beyond strictly orthogonally invariant priors, and provide partial evidence of this universality phenomenon when $S$ is an arbitrary Wishart matrix and $Z$ is drawn from the Gaussian Orthogonal Ensemble, a case motivated by the related extensive rank matrix factorization problem.
Let $1<t<n$ be integers, where $t$ is a divisor of $n$. An R-$q^t$-partially scattered polynomial is an $\mathbb F_q$-linearized polynomial $f$ in $\mathbb F_{q^n}[X]$ that satisfies the condition that for all $x,y\in\mathbb F_{q^n}^*$ such that $x/y\in\mathbb F_{q^t}$, if $f(x)/x=f(y)/y$, then $x/y\in\mathbb F_q$; $f$ is called scattered if this implication holds for all $x,y\in\mathbb F_{q^n}^*$. Two polynomials in $\mathbb F_{q^n}[X]$ are said to be equivalent if their graphs are in the same orbit under the action of the group $\Gamma L(2,q^n)$. For $n>8$ only three families of scattered polynomials in $\mathbb F_{q^n}[X]$ are known: $(i)$~monomials of pseudoregulus type, $(ii)$~binomials of Lunardon-Polverino type, and $(iii)$~a family of quadrinomials defined in [1,10] and extended in [8,13]. In this paper we prove that the polynomial $\varphi_{m,q^J}=X^{q^{J(t-1)}}+X^{q^{J(2t-1)}}+m(X^{q^J}-X^{q^{J(t+1)}})\in\mathbb F_{q^{2t}}[X]$, $q$ odd, $t\ge3$ is R-$q^t$-partially scattered for every value of $m\in\mathbb F_{q^t}^*$ and $J$ coprime with $2t$. Moreover, for every $t>4$ and $q>5$ there exist values of $m$ for which $\varphi_{m,q}$ is scattered and new with respect to the polynomials mentioned in $(i)$, $(ii)$ and $(iii)$ above. The related linear sets are of $\Gamma L$-class at least two.
A constraint satisfaction problem (CSP), $\textsf{Max-CSP}(\mathcal{F})$, is specified by a finite set of constraints $\mathcal{F} \subseteq \{[q]^k \to \{0,1\}\}$ for positive integers $q$ and $k$. An instance of the problem on $n$ variables is given by $m$ applications of constraints from $\mathcal{F}$ to subsequences of the $n$ variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the $(\gamma,\beta)$-approximation version of the problem for parameters $0 \leq \beta < \gamma \leq 1$, the goal is to distinguish instances where at least $\gamma$ fraction of the constraints can be satisfied from instances where at most $\beta$ fraction of the constraints can be satisfied. In this work we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family $\mathcal{F}$ and every $\beta < \gamma$, we show that either a linear sketching algorithm solves the problem in polylogarithmic space, or the problem is not solvable by any sketching algorithm in $o(\sqrt{n})$ space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, and in particular the case of $q=k=2$, where we get a dichotomy, and the case when the satisfying assignments of the constraints of $\mathcal{F}$ support a distribution on $[q]^k$ with uniform marginals. Prior to this work, other than sporadic examples, the only systematic classes of CSPs that were analyzed considered the setting of Boolean variables $q=2$, binary constraints $k=2$, singleton families $|\mathcal{F}|=1$ and only considered the setting where constraints are placed on literals rather than variables.
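To make the definition of a $\textsf{Max-CSP}(\mathcal{F})$ instance concrete, here is a minimal brute-force sketch that computes the maximum fraction of satisfiable constraints on a tiny instance. It is exponential in $n$ and is not one of the sketching algorithms discussed above. The function names and the Max-CUT triangle instance are hypothetical, and values are taken in $\{0,\dots,q-1\}$ rather than $[q]$ purely for convenience.

```python
from itertools import product


def max_csp_value(n, q, constraints):
    """Maximum fraction of constraints satisfied by any assignment.

    constraints: list of (f, idxs) pairs, where f maps a k-tuple over
    range(q) to 0 or 1 and idxs is a tuple of k variable indices.
    Brute force over all q**n assignments, so only for tiny instances.
    """
    best = 0
    for assignment in product(range(q), repeat=n):
        satisfied = sum(
            f(tuple(assignment[i] for i in idxs)) for f, idxs in constraints
        )
        best = max(best, satisfied)
    return best / len(constraints)


def cut(x):
    # Max-CUT constraint (q = k = 2): satisfied when the two endpoints differ.
    return int(x[0] != x[1])


def gap_label(value, gamma, beta):
    """Classify a value for the (gamma, beta)-approximation version."""
    if value >= gamma:
        return "at least gamma"
    if value <= beta:
        return "at most beta"
    return "in the gap (no promise)"


if __name__ == "__main__":
    # Hypothetical instance: Max-CUT on a triangle with 3 variables.
    triangle = [(cut, (0, 1)), (cut, (1, 2)), (cut, (0, 2))]
    value = max_csp_value(3, 2, triangle)
    print(value, gap_label(value, gamma=0.9, beta=0.7))
```

On the triangle, any two-coloring leaves at least one edge uncut, so the value is $2/3$ and the instance falls on the "at most $\beta$" side for $\beta = 0.7$.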
A uniform $k$-{\sc dag} generalizes the uniform random recursive tree by picking $k$ parents uniformly at random from the existing nodes. It starts with $k$ ``roots''. Each of the $k$ roots is assigned a bit. These bits are propagated by a noisy channel. The parents' bits are flipped with probability $p$, and a majority vote is taken. When all nodes have received their bits, the $k$-{\sc dag} is shown without identifying the roots. The goal is to estimate the majority bit among the roots. We identify the threshold for $p$ as a function of $k$ below which the majority rule among all nodes yields an error $c+o(1)$ with $c<1/2$. Above the threshold the majority rule errs with probability $1/2+o(1)$.
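The propagation process described above can be sketched as a small simulation. This is only an illustration under stated assumptions: parents are drawn with replacement, each parent bit is flipped independently with probability $p$, and $k$ is odd so that the majority vote has no ties. The function names are hypothetical and none of this reproduces the threshold analysis.

```python
import random


def simulate_k_dag(n, k, p, root_bits, rng):
    """Return the bits of all n nodes of a noisy uniform k-dag.

    Assumptions (not fixed by the text): parents are sampled with
    replacement, flips are independent, and k is odd.
    """
    assert k % 2 == 1 and len(root_bits) == k and n >= k
    bits = list(root_bits)
    for _ in range(n - k):
        # Pick k parents uniformly at random among the existing nodes.
        parents = [rng.randrange(len(bits)) for _ in range(k)]
        # Each parent bit passes through a channel that flips it with probability p.
        received = [bits[j] ^ (rng.random() < p) for j in parents]
        # The new node takes the majority of the k received bits.
        bits.append(int(sum(received) > k // 2))
    return bits


def majority(bits):
    # Majority bit; an exact tie is resolved as 0 (an arbitrary choice).
    return int(2 * sum(bits) > len(bits))


def error_rate(n, k, p, trials, seed=0):
    """Empirical frequency with which the all-node majority misses the root majority."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        roots = [rng.randrange(2) for _ in range(k)]
        bits = simulate_k_dag(n, k, p, roots, rng)
        errors += majority(bits) != majority(roots)
    return errors / trials


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.45):
        print(p, error_rate(n=2000, k=3, p=p, trials=200))
```

Running `error_rate` for several values of $p$ gives a rough empirical picture of how the majority rule degrades with noise; the printed numbers depend on the seed and on the sampling assumptions listed above.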
In a regression model with multiple response variables and multiple explanatory variables, if the difference of the mean vectors of the response variables for different values of explanatory variables is always in the direction of the first principal eigenvector of the covariance matrix of the response variables, then it is called a multivariate allometric regression model. This paper studies the estimation of the first principal eigenvector in the multivariate allometric regression model. A class of estimators that includes conventional estimators is proposed based on weighted sum-of-squares matrices of regression sum-of-squares matrix and residual sum-of-squares matrix. We establish an upper bound of the mean squared error of the estimators contained in this class, and the weight value minimizing the upper bound is derived. Sufficient conditions for the consistency of the estimators are discussed in weak identifiability regimes under which the difference between the largest and second largest eigenvalues of the covariance matrix decays asymptotically and in ``large $p$, large $n$'' regimes, where $p$ is the number of response variables and $n$ is the sample size. Several numerical results are also presented.