We prove various approximation theorems with polynomials whose coefficients with respect to the Bernstein basis of a given order are all integers. In the extreme, we draw the coefficients from the set $\{ \pm 1\}$ only. We show that for any Lipschitz function $f:[0,1] \to [-1,1]$ and for any positive integer $n$, there are signs $\sigma_0,\dots,\sigma_n \in \{\pm 1\}$ such that $$\left |f(x) - \sum_{k=0}^n \sigma_k \, \binom{n}{k} x^k (1-x)^{n-k} \right | \leq \frac{C (1+|f|_{\mathrm{Lip}})}{1+\sqrt{nx(1-x)}} ~\mbox{ for all } x \in [0,1].$$ These polynomial approximations are not constrained by saturation of Bernstein polynomials, and we show that higher accuracy is indeed achievable for smooth functions: If $f$ has a Lipschitz $(s{-}1)$st derivative, then accuracy of order $O(n^{-s/2})$ is achievable with $\pm 1$ coefficients provided $\|f \|_\infty < 1$, and accuracy of order $O(n^{-s})$ is achievable with unrestricted integer coefficients. Our approximations are constructive in nature.
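For intuition, here is a minimal Python sketch of a $\pm 1$ Bernstein-form approximation: it picks each sign greedily so that the partial sum tracks $f$ at the node $x = k/n$, where $B_{n,k}$ is largest. The greedy rule and the test function are illustrative assumptions, not the constructive scheme analyzed in the paper.

```python
import numpy as np
from scipy.special import comb

def bernstein_basis(n, k, x):
    """B_{n,k}(x) = C(n,k) x^k (1 - x)^(n - k)."""
    return comb(n, k) * x**k * (1.0 - x)**(n - k)

def pm1_bernstein_approx(f, n, grid):
    """Greedy sign selection (an illustrative heuristic, not the paper's
    construction): choose sigma_k in {+1, -1} so that the partial sum stays
    close to f at the node x = k/n, where B_{n,k} peaks."""
    signs = np.empty(n + 1)
    approx = np.zeros_like(grid)
    for k in range(n + 1):
        xk = k / n
        residual = f(xk) - np.interp(xk, grid, approx)
        signs[k] = 1.0 if residual >= 0.0 else -1.0
        approx += signs[k] * bernstein_basis(n, k, grid)
    return signs, approx

f = lambda x: 0.5 * np.sin(2 * np.pi * x)   # Lipschitz, maps [0,1] into [-1,1]
grid = np.linspace(0.0, 1.0, 513)
signs, approx = pm1_bernstein_approx(f, n=200, grid=grid)
print("max error:", np.max(np.abs(f(grid) - approx)))
```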
There has been a long-standing interest in computing diverse solutions to optimization problems. Motivated by the reallocation of governmental institutions in Sweden, in 1995 J. Krarup posed the problem of finding $k$ edge-disjoint Hamiltonian circuits of minimum total weight, called the peripatetic salesman problem (PSP). Since then, researchers have investigated the complexity of finding diverse solutions to spanning trees, paths, vertex covers, matchings, and more. Unlike the PSP, which places a constraint on the total weight of the solutions, recent work has focused on finding diverse solutions that are all optimal. However, sometimes the space of exact solutions may be too small to achieve sufficient diversity. Motivated by this, we initiate the study of obtaining sufficiently diverse, yet approximately optimal, solutions to optimization problems. Formally, given an integer $k$, an approximation factor $c$, and an instance $I$ of an optimization problem, we aim to obtain a set of $k$ solutions to $I$ that (a) are all $c$-approximately optimal for $I$ and (b) maximize the diversity of the $k$ solutions. Finding such solutions, therefore, requires a better understanding of the global landscape of the optimization function. We show that, given any metric on the space of solutions, and with the diversity measure given by the sum of pairwise distances between solutions, this problem can be solved by combining ideas from dispersion and multicriteria optimization. We first provide a general reduction to an associated budget-constrained optimization (BCO) problem, where one objective function is to be maximized (minimized) subject to a bound on the second objective function. We then prove that bi-approximations to the BCO can be used to give bi-approximations to the diverse approximately optimal solutions problem with only a small overhead.
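As a toy illustration of the dispersion ingredient (assuming a pool of $c$-approximate solutions is already available, which is what the BCO reduction is meant to supply), a greedy farthest-point selection maximizes the sum-of-pairwise-distances diversity; the pool, metric, and greedy rule below are illustrative, not the paper's reduction.

```python
import numpy as np

def greedy_diverse_subset(candidates, dist, k):
    """Pick k solutions from a pool of (already c-approximate) candidates,
    greedily maximizing the sum of pairwise distances.  A standard
    farthest-point-style dispersion heuristic, shown only to illustrate
    the diversity objective."""
    chosen = [0]                              # start from an arbitrary candidate
    while len(chosen) < k:
        best, best_gain = None, -1.0
        for i in range(len(candidates)):
            if i in chosen:
                continue
            gain = sum(dist(candidates[i], candidates[j]) for j in chosen)
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return [candidates[i] for i in chosen]

# Example: "solutions" are 0/1 incidence vectors, diversity is Hamming distance.
rng = np.random.default_rng(1)
pool = [rng.integers(0, 2, size=20) for _ in range(50)]
hamming = lambda a, b: int(np.sum(a != b))
diverse = greedy_diverse_subset(pool, hamming, k=5)
print(sum(hamming(a, b) for a in diverse for b in diverse) // 2)
```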
We present a short, information-theoretic proof of the Kac--Bernstein theorem: for any independent random variables $X$ and $Y$, if $X+Y$ and $X-Y$ are independent, then $X$ and $Y$ are normally distributed.
Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate multifidelity approximation as a modified stochastic bandit and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. These results are then extended to the case of vector-valued responses, where we show that the algorithm remains efficient without requiring the estimation of high-dimensional parameters. The main advantage of our approach is that we require neither a hierarchical model structure nor \textit{a priori} knowledge of statistical information (e.g., correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates for querying each model. Numerical experiments are provided at the end to support our theoretical findings.
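A minimal sketch of the plain (non-adaptive) explore-then-commit template that AETC builds on, assuming generic reward-based arms; the paper's algorithm instead chooses the exploration length adaptively from estimated conditional mean-squared errors and per-model costs, which are not modeled here.

```python
import numpy as np

def explore_then_commit(pull, n_arms, explore_rounds, horizon):
    """Uniformly explore each arm `explore_rounds` times, then commit to the
    arm with the best empirical mean for the remaining budget.  (A plain ETC
    sketch; AETC picks the exploration length adaptively from estimated
    conditional mean-squared errors and model costs.)"""
    rewards = []
    means = np.zeros(n_arms)
    for a in range(n_arms):
        samples = [pull(a) for _ in range(explore_rounds)]
        means[a] = np.mean(samples)
        rewards.extend(samples)
    best = int(np.argmax(means))
    rewards.extend(pull(best) for _ in range(horizon - n_arms * explore_rounds))
    return best, float(np.sum(rewards))

# Example with Gaussian arms.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.45]
pull = lambda a: rng.normal(true_means[a], 1.0)
print(explore_then_commit(pull, n_arms=3, explore_rounds=50, horizon=1000))
```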
Given a point set $P\subset \mathbb{R}^d$, the kernel density estimate of $P$ is defined as \[ \overline{\mathcal{G}}_P(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2} \] for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$. This subset $Q$ is called a coreset. The main technique in this work is to construct a $\pm 1$ coloring of the point set $P$ via discrepancy theory, for which we leverage Banaszczyk's theorem. When $d>1$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$, as opposed to the best-known previous result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. This is the first result to break the $\sqrt{\log}$ barrier, even in the case $d=2$.
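To fix ideas, here is a sketch of the halving scheme behind discrepancy-based coresets: color the points $\pm 1$, keep one color class, and repeat until the target size is reached. A uniformly random coloring is used below purely as a placeholder for the Banaszczyk-type coloring constructed in the paper.

```python
import numpy as np

def kde(points, x):
    """Gaussian kernel density estimate of `points` evaluated at a query x."""
    return np.mean(np.exp(-np.sum((x - points) ** 2, axis=1)))

def halving_coreset(points, target_size, rng=None):
    """Repeatedly 2-color the points and keep one color class until the set
    is small.  The random coloring stands in for the low-discrepancy
    (Banaszczyk-type) coloring used in the actual construction."""
    rng = np.random.default_rng() if rng is None else rng
    Q = points
    while len(Q) // 2 >= target_size:
        colors = rng.choice([-1, 1], size=len(Q))
        Q = Q[colors == 1] if np.sum(colors == 1) >= len(Q) // 2 else Q[colors == -1]
    return Q

rng = np.random.default_rng(0)
P = rng.normal(size=(4096, 2))
Q = halving_coreset(P, target_size=256, rng=rng)
x = np.array([0.3, -0.1])
print(kde(P, x), kde(Q, x))   # the coreset KDE approximates the full KDE
```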
This paper studies the approximation capacity of ReLU neural networks with norm constraints on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergence of regression using norm-constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve the optimal rate of learning probability distributions when the discriminator is a properly chosen norm-constrained neural network.
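As one concrete reading of a "norm constraint on the weights", the sketch below rescales the layers of a two-layer ReLU network so that the product of their spectral norms stays within a budget; the choice of norm and the rescaling rule are illustrative assumptions, not the paper's exact function class.

```python
import numpy as np

def relu_net(x, W1, b1, W2):
    """Two-layer ReLU network  x -> W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

def project_to_norm_budget(W1, W2, budget):
    """Rescale the weight matrices so that ||W2||_2 * ||W1||_2 <= budget,
    one common way to impose a norm constraint on the weights (biases are
    left untouched in this illustrative sketch)."""
    prod = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)
    if prod <= budget:
        return W1, W2
    scale = np.sqrt(budget / prod)
    return W1 * scale, W2 * scale

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(64, 8)), np.zeros(64), rng.normal(size=(1, 64))
W1, W2 = project_to_norm_budget(W1, W2, budget=5.0)
print(relu_net(rng.normal(size=8), W1, b1, W2))
```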
We study the complexity of approximating the multimarginal optimal transport (MOT) distance, a generalization of the classical optimal transport distance, considered here between $m$ discrete probability distributions, each supported on $n$ points. First, we show that the standard linear programming (LP) representation of the MOT problem is not a minimum-cost flow problem when $m \geq 3$. This negative result implies that some combinatorial algorithms, e.g., the network simplex method, are not suitable for approximating the MOT problem, while the worst-case complexity bound for the deterministic interior-point algorithm remains $\tilde{O}(n^{3m})$. We then propose two simple and \textit{deterministic} algorithms for approximating the MOT problem. The first algorithm, which we refer to as the \textit{multimarginal Sinkhorn} algorithm, is a provably efficient multimarginal generalization of the Sinkhorn algorithm. We show that it achieves a complexity bound of $\tilde{O}(m^3n^m\varepsilon^{-2})$ for a tolerance $\varepsilon \in (0, 1)$. This provides the first \textit{near-linear time} complexity guarantee for approximating the MOT problem and matches the best known complexity bound for the Sinkhorn algorithm in the classical OT setting when $m = 2$. The second algorithm, which we refer to as the \textit{accelerated multimarginal Sinkhorn} algorithm, achieves acceleration by incorporating an estimate sequence and attains a complexity bound of $\tilde{O}(m^3n^{m+1/3}\varepsilon^{-4/3})$. This bound is better than that of the first algorithm in terms of $1/\varepsilon$, and better than that of the accelerated alternating minimization algorithm~\citep{Tupitsa-2020-Multimarginal} in terms of $n$. Finally, we compare our new algorithms with the commercial LP solver \textsc{Gurobi}. Preliminary results on synthetic data and real images demonstrate the effectiveness and efficiency of our algorithms.
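For reference, a minimal sketch of cyclic Sinkhorn scaling for an entropically regularized MOT problem with $m=3$ marginals; the cost tensor, regularization level, and fixed iteration count are illustrative, and the paper's algorithm adds a rounding step and an explicit stopping rule to obtain its complexity bound.

```python
import numpy as np

def multimarginal_sinkhorn(C, marginals, eta=0.05, iters=500):
    """Entropic-regularized MOT with m = 3 marginals via cyclic scaling
    (an illustrative sketch, not the full algorithm of the paper)."""
    K = np.exp(-C / eta)                       # Gibbs kernel, shape (n, n, n)
    u = [np.ones(s) for s in K.shape]
    for _ in range(iters):
        for i in range(3):
            scaled = np.einsum('abc,a,b,c->abc', K, u[0], u[1], u[2])
            marg = scaled.sum(axis=tuple(j for j in range(3) if j != i))
            u[i] *= marginals[i] / marg        # match the i-th marginal
    return np.einsum('abc,a,b,c->abc', K, u[0], u[1], u[2])

rng = np.random.default_rng(0)
n = 10
C = rng.random((n, n, n))                      # cost tensor
r = [np.full(n, 1.0 / n) for _ in range(3)]    # uniform marginals
P = multimarginal_sinkhorn(C, r)
print(float(np.sum(P * C)))                    # approximate MOT value
```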
This paper primarily addresses the problem of linear representation, invertibility, and construction of the compositional inverse for non-linear maps over finite fields. Although there is a vast literature on the invertibility of polynomials and the construction of inverses of permutation polynomials over $\mathbb{F}$, this paper explores a new approach using the dual map defined through the Koopman operator. This yields a linear representation of the non-linear map, which translates the map's non-linear compositions into a linear-algebraic framework. The linear representation, defined over the space of functions, naturally defines a notion of linear complexity for non-linear maps, which can be viewed as a measure of the computational complexity associated with such maps. The framework of linear representation is then extended to parameter-dependent maps over $\mathbb{F}$, and conditions for the parametric invertibility of such maps are established, leading to a construction of a parametric inverse map (under composition). It is shown that the framework extends to multivariate maps over $\mathbb{F}^n$: conditions for the invertibility of such maps are established, and the inverse is constructed using the linear representation. Further, the problem of linear representation of a group generated by a finite set of permutation maps over $\mathbb{F}^n$ under composition is also solved by extending the theory of linear representation of a single map.
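To make the composition-operator idea concrete, here is a toy sketch over the full function space of a small field: the Koopman operator $g \mapsto g \circ \phi$ is a $0/1$ matrix in the basis of point values, so composition of maps becomes matrix multiplication and invertibility of $\phi$ corresponds to invertibility of the matrix. The field size and the particular maps are illustrative; this full $q \times q$ operator is only the simplest instance of the linear representations developed in the paper.

```python
import numpy as np

def koopman_matrix(phi, q):
    """Matrix of the composition (Koopman) operator g -> g o phi on functions
    {0,...,q-1} -> F_q, viewed as vectors of values: row x has a single 1 in
    column phi(x), so (K @ g)[x] = g[phi(x)]."""
    K = np.zeros((q, q), dtype=int)
    for x in range(q):
        K[x, phi(x)] = 1
    return K

q = 7                                    # a toy field F_7 for illustration
phi = lambda x: (pow(x, 5, q) + 2) % q   # x -> x^5 + 2, a permutation of F_7
psi = lambda x: (3 * x + 1) % q

K_phi, K_psi = koopman_matrix(phi, q), koopman_matrix(psi, q)
# Composition of maps becomes matrix multiplication (in reversed order),
# and phi is invertible exactly when its Koopman matrix is invertible.
assert np.array_equal(koopman_matrix(lambda x: psi(phi(x)), q), K_phi @ K_psi)
print(np.linalg.matrix_rank(K_phi) == q)
```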
The scope of this paper is the analysis and approximation of an optimal control problem related to the Allen-Cahn equation. A tracking functional is minimized subject to the Allen-Cahn equation using distributed controls that satisfy point-wise control constraints. First- and second-order necessary and sufficient conditions are proved. The lowest-order discontinuous Galerkin (in time) scheme is considered for the approximation of the control-to-state and adjoint-state mappings. Under a suitable restriction on the maximum sizes of the temporal and spatial discretization parameters $k$ and $h$, respectively, in terms of the parameter $\epsilon$ that describes the thickness of the interface layer, a priori estimates are proved with constants depending polynomially on $1/\epsilon$. Unlike previous works on the uncontrolled Allen-Cahn problem, our approach does not rely on the construction of an approximation of the spectral estimate, and as a consequence our estimates are valid under the low regularity assumptions imposed by the optimal control setting. These estimates are also valid in cases where the solution and its discrete approximation do not satisfy uniform space-time bounds independent of $\epsilon$. These estimates, together with a suitable localization technique via the second-order condition (see \cite{Arada-Casas-Troltzsch_2002,Casas-Mateos-Troltzsch_2005,Casas-Raymond_2006,Casas-Mateos-Raymond_2007}), allow us to prove error estimates for the difference between local optimal controls and their discrete approximations, as well as between the associated state and adjoint-state variables and their discrete approximations.
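As a space-free toy of the time discretization, the sketch below applies the lowest-order dG-in-time scheme (which coincides with implicit Euler in this reduced setting) to the Allen-Cahn ODE with a given control; the Newton solve, parameter values, and the absence of spatial discretization are illustrative simplifications.

```python
import numpy as np

def allen_cahn_dg0(u0, control, eps, k, steps):
    """Lowest-order dG-in-time (implicit Euler) stepping for the Allen-Cahn
    ODE  u' + (u^3 - u)/eps^2 = g,  a space-free toy of the controlled PDE;
    each step solves its nonlinear equation with a few Newton iterations."""
    u, traj = u0, [u0]
    for m in range(steps):
        g = control(m * k)
        v = u                       # Newton for  v + k (v^3 - v)/eps^2 = u + k g
        for _ in range(20):
            F = v + k * (v**3 - v) / eps**2 - u - k * g
            dF = 1.0 + k * (3 * v**2 - 1.0) / eps**2
            v -= F / dF
        u = v
        traj.append(u)
    return np.array(traj)

print(allen_cahn_dg0(u0=0.2, control=lambda t: 0.0, eps=0.5, k=0.01, steps=5))
```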
We introduce Monte-Carlo Attention (MCA), a randomized approximation method for reducing the computational cost of self-attention mechanisms in Transformer architectures. MCA exploits the fact that the importance of each token in an input sequence varies with its attention score; thus, some degree of error can be tolerated when encoding tokens with low attention. Using approximate matrix multiplication, MCA applies different error bounds when encoding input tokens, so that those with low attention scores are computed with relaxed precision, whereas errors for salient elements are minimized. MCA can operate in parallel with other attention optimization schemes and does not require model modification. We study the theoretical error bounds and demonstrate that MCA reduces attention complexity (in FLOPS) for various Transformer models by up to 11$\times$ on GLUE benchmarks without compromising model accuracy.
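A minimal sketch of the standard Monte-Carlo (sampling-based) approximate matrix multiplication primitive that such schemes build on, with importance sampling of column/row pairs; the per-token error-bound allocation described in the abstract is not modeled here, and the norm-proportional sampling rule shown is the generic one.

```python
import numpy as np

def mc_matmul(A, B, s, rng=None):
    """Approximate A @ B by sampling s column/row pairs with probabilities
    proportional to their norms and rescaling (standard randomized AMM;
    MCA additionally adapts the budget per token from attention scores)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=s, p=p)
    scale = 1.0 / (s * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

rng = np.random.default_rng(0)
A, B = rng.normal(size=(64, 512)), rng.normal(size=(512, 64))
approx = mc_matmul(A, B, s=128, rng=rng)
print(np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B))   # relative error
```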
In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been extensive development of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high dimensionality can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate a $d$-dimensional $C^{m+1}$ smooth submanifold $\mathcal{M}$ of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. In this way, the preparatory step of dimension reduction, which introduces distortions to the data, can be avoided altogether.
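A minimal sketch of a moving least-squares projection in the simplest setting (a one-dimensional manifold in $\mathbb{R}^2$): weighted local PCA supplies a local coordinate system, and a weighted polynomial fit in that system defines the projection. The bandwidth, degree, and single-pass coordinate choice are illustrative simplifications of the procedure analyzed in the paper.

```python
import numpy as np

def mls_project(x, data, h, degree=2):
    """Project a query point x onto a locally fitted polynomial graph: a toy
    curve-in-R^2 version of the moving least-squares projection (the paper's
    procedure handles general d << n and refines the coordinate choice)."""
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / h**2)       # Gaussian weights
    mean = (w[:, None] * data).sum(axis=0) / w.sum()
    diff = data - mean
    cov = (w[:, None, None] * np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0)
    eigvals, eigvecs = np.linalg.eigh(cov)
    t_dir, n_dir = eigvecs[:, -1], eigvecs[:, 0]              # tangent / normal
    t, y = diff @ t_dir, diff @ n_dir                         # local coordinates
    coeffs = np.polyfit(t, y, degree, w=np.sqrt(w))           # weighted LS fit
    t0 = (x - mean) @ t_dir
    return mean + t0 * t_dir + np.polyval(coeffs, t0) * n_dir

# Noisy samples near the unit circle (a 1-dimensional manifold in R^2).
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 2000)
data = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(2000, 2))
p = mls_project(np.array([1.1, 0.1]), data, h=0.3)
print(np.linalg.norm(p))   # close to 1, i.e., near the circle
```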