In this work, we study a global quadrature scheme for analytic functions on compact intervals based on function values on arbitrary grids of quadrature nodes. In practice it is not always possible to sample functions at optimal nodes with a low-order Lebesgue constant. Therefore, we go beyond classical interpolatory quadrature by lowering the degree of the polynomial approximant and by applying auxiliary mapping functions that map the original quadrature nodes to more suitable fake nodes. More precisely, we investigate the combination of the Kosloff Tal-Ezer map and least-squares approximation (KTL) for numerical quadrature: a careful selection of the mapping parameter $\alpha$ ensures high accuracy of the approximation and, at the same time, an asymptotically optimal ratio between the degree of the polynomial and the spacing of the grid. We investigate the properties of this KTL quadrature, focusing on the symmetry of the quadrature weights, the limit relations as $\alpha$ converges to $0^{+}$ and $1^{-}$, and the computation of the quadrature weights in the standard monomial and Chebyshev bases with the help of a cosine transform. Numerical tests on equispaced nodes show that some static choices of the map's parameter improve on the results of the composite trapezoidal rule, while a dynamic approach achieves greater stability and faster convergence, even when the sampling nodes are perturbed. From a computational point of view, the proposed method is practical and can be implemented in a simple and efficient way.
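To make the construction concrete, here is a minimal Python sketch of a KTL-type rule on $[-1,1]$. It is an illustration under assumed defaults (node count, degree, $\alpha$), not the paper's implementation: equispaced samples are assigned to Kosloff Tal-Ezer "fake" nodes, a low-degree least-squares fit in the fake variable is built, and the resulting rule's weights are assembled from the moments $\int_{-1}^{1} T_k(S(x))\,dx$, which are computed here with a fine Gauss-Legendre rule for simplicity (the paper obtains the weights via a cosine transform).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def ktl_weights(n_nodes=40, degree=20, alpha=0.95):
    """Sketch of KTL quadrature weights on [-1, 1] (illustrative parameters)."""
    x = np.linspace(-1.0, 1.0, n_nodes)                       # original equispaced nodes
    S = lambda t: np.sin(0.5*np.pi*alpha*t) / np.sin(0.5*np.pi*alpha)  # KTE map
    V = C.chebvander(S(x), degree)                            # Chebyshev basis at fake nodes
    g, gw = np.polynomial.legendre.leggauss(200)
    m = C.chebvander(S(g), degree).T @ gw                     # moments of T_k(S(.))
    # least-squares coefficients are pinv(V) @ f, so the rule is w^T f with:
    return x, np.linalg.pinv(V).T @ m

x, w = ktl_weights()
runge = lambda t: 1.0 / (1.0 + 25.0 * t**2)
print(w @ runge(x), 0.4 * np.arctan(5.0))                     # approximation vs exact integral
```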
We develop an a posteriori error analysis for a novel quantity of interest (QoI) for evolutionary partial differential equations (PDEs). Specifically, the QoI is the first time at which a functional of the solution to the PDE achieves a threshold value signifying a particular event; it differs from classical QoIs, which are modeled as bounded linear functionals. We use Taylor's theorem and adjoint-based analysis to derive computable and accurate error estimates for linear parabolic and hyperbolic PDEs. In particular, the heat equation and the linearized shallow water equations (SWE) are used for the parabolic and hyperbolic cases, respectively. Numerical examples illustrate the accuracy of the error estimates.
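A toy illustration of the Taylor step for the parabolic case follows. It is a sketch under stated assumptions: a fine-grid solve stands in for the adjoint-based error representation of the functional, and the grid sizes and threshold are illustrative. The QoI is the first time $Q(t)=\int_0^1 u\,dx$ drops to a threshold, and linearizing $Q$ about the numerical event time converts the functional error into an event-time error estimate.

```python
import numpy as np

def solve_heat(nx, nt, T):
    """Explicit FD solve of u_t = u_xx on [0,1], u(0,t)=u(1,t)=0, u(x,0)=sin(pi x).
    Returns times and Q(t) = int_0^1 u dx (trapezoid rule; boundary values vanish)."""
    x = np.linspace(0.0, 1.0, nx + 1)
    dx, dt = 1.0 / nx, T / nt
    u = np.sin(np.pi * x)
    Q = np.empty(nt + 1)
    Q[0] = dx * u.sum()
    for k in range(nt):
        u[1:-1] += dt / dx**2 * (u[2:] - 2.0*u[1:-1] + u[:-2])
        Q[k + 1] = dx * u.sum()
    return np.linspace(0.0, T, nt + 1), Q

theta, T = 0.3, 0.2
t, Qc = solve_heat(nx=10,  nt=20000, T=T)     # coarse solution
_, Qf = solve_heat(nx=160, nt=20000, T=T)     # fine stand-in for the adjoint-based estimate

k = np.argmax(Qc <= theta)                                    # first step below the threshold
th = np.interp(theta, [Qc[k], Qc[k-1]], [t[k], t[k-1]])       # numerical event time
dQ = (Qc[k] - Qc[k-1]) / (t[k] - t[k-1])                      # slope Q'(t) near the event
est = -(np.interp(th, t, Qf) - theta) / dQ                    # Taylor-based error estimate
t_true = np.log(2.0 / (np.pi * theta)) / np.pi**2             # exact event time for this problem
print(th, th + est, t_true)                                   # corrected time approaches t_true
```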
We introduce a simple diagnostic test for assessing the overall or partial goodness of fit of a linear regression. We propose to evaluate the sensitivity of the regression coefficients with respect to changes in the marginal distribution of the covariates by comparing the so-called higher-order least squares estimates with the usual least squares estimates. In spite of its simplicity, this strategy is extremely general and powerful, applying also to high-dimensional settings. Specifically, we show that it allows us to distinguish between confounded and unconfounded predictor variables, as well as to determine ancestor variables in linear structural equation models, assuming some non-Gaussianity. Thus, we provide a test for partial goodness of fit.
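The sensitivity idea can be illustrated with a toy simulation. Caveat: the reweighted estimator below (weights $x^2$, i.e. a third-/fourth-moment slope $\sum x_i^3 y_i / \sum x_i^4$) is an illustrative stand-in for a higher-order comparison, not the paper's exact HOLS construction. The point it demonstrates is the diagnostic logic: when the linear model is correct and unconfounded, changing the effective marginal distribution of the covariate leaves the slope estimate unchanged; under hidden confounding, the two estimates diverge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def slope(x, y, w=None):
    """(Weighted) least squares slope for a single centered covariate."""
    w = np.ones_like(x) if w is None else w
    return np.sum(w * x * y) / np.sum(w * x * x)

for confounded in (False, True):
    h = rng.standard_normal(n)                                  # hidden variable
    eps = rng.standard_normal(n)
    x = rng.exponential(1.0, n) - 1.0 + (h if confounded else 0.0)  # non-Gaussian covariate
    y = 2.0 * x + eps + (3.0 * h if confounded else 0.0)
    b_ols = slope(x, y)
    b_hi = slope(x, y, w=x**2)    # "higher-order" reweighting (illustrative stand-in)
    print(f"confounded={confounded}: OLS={b_ols:.3f}, reweighted={b_hi:.3f}, "
          f"gap={abs(b_ols - b_hi):.3f}")   # gap near 0 only in the unconfounded case
```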
Thread pooling is a common programming idiom in which a fixed set of worker threads is maintained to execute tasks concurrently. The workers repeatedly pick tasks and execute them to completion. Each task is sequential, with possibly recursive code, and tasks communicate over shared memory. Executing a task can lead to more new tasks being spawned. We consider the safety verification problem for thread-pooled programs. We parameterize the problem with two parameters: the size of the thread pool and the number of context switches for each task. The size of the thread pool determines the number of workers running concurrently. The number of context switches determines how many times a worker can be swapped out while executing a single task; as in many verification problems for multithreaded recursive programs, context bounding is important for decidability. We show that the safety verification problem for thread-pooled, context-bounded, Boolean programs is EXPSPACE-complete, even if the size of the thread pool and the context bound are given in binary. Our main result, the EXPSPACE upper bound, is derived using a sequence of new succinct encoding techniques of independent language-theoretic interest. In particular, we give a polynomial-time construction of the downward closures of languages accepted by succinct pushdown automata as doubly succinct nondeterministic finite automata. While there are explicit doubly exponential lower bounds on the size of nondeterministic finite automata accepting the downward closure, our result shows that these automata can be compressed. We show that thread pooling significantly reduces computational power: in contrast, if only the context bound is given in binary but there is no thread pooling, the safety verification problem becomes 3EXPSPACE-complete.
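For readers unfamiliar with the idiom being verified, here is a minimal Python sketch of thread pooling as described above: a fixed pool of workers, a shared task queue, tasks that touch shared memory, and tasks that spawn further tasks. The pool size and recursion depth are arbitrary illustrative choices.

```python
import queue, threading

tasks = queue.Queue()
counter, lock = [0], threading.Lock()        # shared memory touched by tasks

def task(depth):
    with lock:
        counter[0] += 1
    if depth > 0:                            # executing a task may spawn new tasks
        tasks.put(lambda: task(depth - 1))
        tasks.put(lambda: task(depth - 1))

def worker():
    while True:
        t = tasks.get()
        if t is None:                        # poison pill: shut the worker down
            break
        t()                                  # run the picked task to completion
        tasks.task_done()

POOL_SIZE = 4                                # size of the thread pool
workers = [threading.Thread(target=worker) for _ in range(POOL_SIZE)]
for w in workers: w.start()
tasks.put(lambda: task(3))
tasks.join()                                 # wait until all spawned tasks are done
for _ in workers: tasks.put(None)
for w in workers: w.join()
print("tasks executed:", counter[0])         # 2**4 - 1 = 15
```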
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where samples from the underlying loss distribution are available one at a time. We cast the UBSR estimation problem as a root-finding problem and propose stochastic approximation-based estimation schemes. We derive non-asymptotic bounds on the estimation error as a function of the number of samples. We also consider the problem of UBSR optimization within a parameterized class of random variables. We propose a stochastic gradient descent-based algorithm for UBSR optimization and derive non-asymptotic bounds on its convergence.
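A minimal sketch of the root-finding view, with hedged conventions (the sign convention, loss function $\ell$, level $\lambda$, step sizes, and clipping are illustrative choices, not the paper's scheme): for a loss variable $L$, an increasing loss function $\ell$, and a level $\lambda$, UBSR is the root $t^*$ of $g(t) = \mathbb{E}[\ell(L - t)] - \lambda$, and a Robbins-Monro iteration tracks it from one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(1)
ell, lam = np.exp, 1.0                 # illustrative loss function and risk level

t = 0.0
for k in range(1, 200_001):
    L = rng.normal(1.0, 0.5)                       # one loss sample at a time
    g = ell(L - t) - lam                           # noisy estimate of E[ell(L - t)] - lam
    t += (1.0 / k) * np.clip(g, -5.0, 5.0)         # Robbins-Monro step, clipped for stability

# For ell = exp, lam = 1 and L ~ N(mu, sigma^2), the root is the entropic
# risk t* = mu + sigma^2 / 2, here 1 + 0.125 = 1.125.
print(t)
```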
The best polynomial approximation and the Chebyshev approximation are both important in numerical analysis. Traditionally, the best approximation has been regarded as superior to the Chebyshev approximation, because it is optimal in the uniform norm. However, as noticed by Trefethen \cite{Trefethen11sixmyths,Trefethen2020} for functions with algebraic singularities, it is not always superior to the latter; recently, Wang \cite{Wang2021best} proved this in theory. In this paper, we find that for functions with logarithmic singularities, the pointwise errors of the Chebyshev approximation are smaller than those of the best approximation of the same degree, except in very narrow boundary layers. The pointwise error of the Chebyshev series truncated at degree $n$ is $O(n^{-\kappa})$, with $\kappa = \min\{2\gamma+1, 2\delta + 1\}$, but is worse by one power of $n$ in narrow boundary layers near the weakly singular endpoints. Theorems are given to explain this effect.
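An illustrative numerical check of the interior rate (a sketch under assumptions: Chebyshev interpolation stands in for the truncated Chebyshev series, and the test function and degrees are arbitrary choices): $f(x) = (1-x)\log(1-x)$ has a logarithmic singularity at $x = 1$ with $\gamma = 1$, so the pointwise error away from the endpoint should be $O(n^{-\kappa})$ with $\kappa = 2\gamma + 1 = 3$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: (1.0 - x) * np.log1p(-x)       # log1p(-x) = log(1 - x)
xs = np.linspace(-0.99, 0.99, 2001)           # test grid away from the boundary layer
prev = None
for n in (32, 64, 128, 256):
    c = C.chebinterpolate(f, n)               # interpolation nodes lie strictly inside (-1, 1)
    err = np.max(np.abs(C.chebval(xs, c) - f(xs)))
    print(n, err, None if prev is None else prev / err)
    prev = err
# if the O(n^-3) interior rate holds, successive error ratios approach 2**3 = 8
```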
In this paper we study anisotropic consensus-based optimization (CBO), a multi-agent metaheuristic derivative-free optimization method capable of globally minimizing nonconvex and nonsmooth functions in high dimensions. CBO is based on stochastic swarm intelligence, and inspired by consensus dynamics and opinion formation. Compared to other metaheuristic algorithms like particle swarm optimization, CBO is of a simpler nature and therefore more amenable to theoretical analysis. By adapting a recently established proof technique, we show that anisotropic CBO converges globally with a dimension-independent rate for a rich class of objective functions under minimal assumptions on the initialization of the method. Moreover, the proof technique reveals that CBO performs a convexification of the optimization problem as the number of agents goes to infinity, thus providing an insight into the internal CBO mechanisms responsible for the success of the method. To motivate anisotropic CBO from a practical perspective, we further test the method on a complicated high-dimensional benchmark problem, which is well understood in the machine learning literature.
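The anisotropic CBO dynamics are short enough to state in code. The following is a minimal sketch with illustrative parameter values (whether a single run reaches the global minimizer is parameter-dependent): agents drift toward a Gibbs-weighted consensus point and diffuse componentwise, which is what "anisotropic" refers to, with noise in each coordinate scaled by that coordinate's distance to the consensus.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 20, 200                                  # dimension, number of agents
lam, sigma, alpha, dt = 1.0, 0.8, 50.0, 0.01    # illustrative parameters

f = lambda X: np.sum(X**2 - 10.0*np.cos(2*np.pi*X) + 10.0, axis=1)   # Rastrigin, min 0 at 0

X = rng.uniform(-3.0, 3.0, (N, d))
for _ in range(2000):
    fx = f(X)
    w = np.exp(-alpha * (fx - fx.min()))        # Gibbs weights (shifted for stability)
    v = w @ X / w.sum()                         # consensus point v_alpha
    D = X - v
    # drift toward consensus + componentwise (anisotropic) diffusion
    X += -lam * dt * D + sigma * np.sqrt(dt) * D * rng.standard_normal((N, d))

print("f at consensus point:", f(v[None, :])[0])
```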
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
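The basic diagnostic behind such experiments can be sketched as follows. For residual updates $x_{k+1} = x_k + f(x_k; W_k)$, a neural ODE limit requires the trained weights to look like samples of a smooth function of $k/L$, i.e. the increments $\|W_{k+1}-W_k\|$ should shrink like $1/L$ as the depth $L$ grows. Hedged: `weights_by_depth` is a hypothetical container of trained per-layer weights, and the synthetic data below only exercises the measurement, it is not a training result.

```python
import numpy as np

def depth_scaling(weights_by_depth):
    """Report weight norms and layer-to-layer increments as depth L grows."""
    for L, Ws in sorted(weights_by_depth.items()):
        norm = max(np.linalg.norm(W) for W in Ws)
        incr = max(np.linalg.norm(Ws[k+1] - Ws[k]) for k in range(L - 1))
        print(f"L={L:4d}  max weight norm={norm:.3f}  max increment={incr:.4f}")
    # ODE-like regime: increments ~ 1/L; an SDE-like regime would show ~ 1/sqrt(L)

# toy check: weights drawn from a smooth depth profile plus O(1/L) noise
rng = np.random.default_rng(3)
smooth = lambda s: np.sin(2.0 * np.pi * s)
weights_by_depth = {
    L: [smooth(k / L) * np.ones((4, 4)) + rng.normal(0.0, 1.0 / L, (4, 4))
        for k in range(L)]
    for L in (16, 64, 256)
}
depth_scaling(weights_by_depth)
```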
In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, the field of linear and nonlinear dimension-reduction techniques has developed rapidly in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$-smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a nonlinear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which introduces distortions into the data, can be avoided altogether.
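A heavily simplified two-step sketch of a moving least-squares projection (illustrative, not the paper's full scheme): local weighted PCA supplies an approximate tangent frame, then a weighted polynomial fit of the normal components defines the local approximating manifold onto which the point is projected. The kernel width `h`, the Gaussian weights, and the cross-term-free polynomial features (fine for the $d=1$ demo) are all assumptions of this sketch.

```python
import numpy as np

def mls_project(p, cloud, d, h, deg=2):
    """Project p onto a local degree-`deg` MLS surface fitted to `cloud`."""
    w = np.exp(-np.sum((cloud - p)**2, axis=1) / h**2)     # local weights
    mu = w @ cloud / w.sum()                               # weighted centroid
    Xc = cloud - mu
    Cov = (Xc * w[:, None]).T @ Xc                         # weighted covariance
    _, eigvec = np.linalg.eigh(Cov)
    T, Nf = eigvec[:, -d:], eigvec[:, :-d]                 # tangent / normal frames
    t = Xc @ T                                             # local tangent coordinates
    V = np.hstack([t**k for k in range(deg + 1)])          # simple poly features (no cross terms)
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(V * sw, (Xc @ Nf) * sw, rcond=None)
    t0 = (p - mu) @ T
    V0 = np.hstack([t0**k for k in range(deg + 1)])
    return mu + T @ t0 + Nf @ (V0 @ coef)                  # projected point

# toy usage: noisy unit circle (d = 1) embedded in R^2
rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 2.0*np.pi, 2000)
cloud = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0.0, 0.03, (2000, 2))
q = mls_project(np.array([1.1, 0.1]), cloud, d=1, h=0.3)
print(q, np.linalg.norm(q))    # the projection should lie near the unit circle
```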
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU nonlinearity is the expected transformation of a stochastic regularizer which randomly applies the identity or zero map to a neuron's input. The GELU nonlinearity weights inputs by their magnitude, rather than gating inputs by their sign as in ReLUs. We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
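In closed form, the expectation described above weights the input by the standard Gaussian CDF, $\mathrm{GELU}(x) = x\,\Phi(x)$, since the zero-or-identity regularizer keeps the input with probability $\Phi(x)$. A short sketch of the exact form and the fast tanh approximation:

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    """Exact GELU: x * Phi(x) with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    """Fast tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0/np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4.0, 4.0, 401)
print(np.max(np.abs(gelu(x) - gelu_tanh(x))))   # the two agree to roughly 1e-3
```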
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
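The flavor of the surrogate can be illustrated by minimizing an objective of the form $R_n(\theta) + C\sqrt{\mathrm{Var}_n(\mathrm{loss})/n}$, the mean-plus-variance-penalty behavior such robust formulations exhibit. Hedged: the logistic loss, the constant $C$, the synthetic data, and the BFGS solver below are illustrative choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, d, Creg = 500, 5, 2.0
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.3 * rng.standard_normal(n))

def losses(theta):
    return np.log1p(np.exp(-y * (X @ theta)))            # per-sample logistic loss

erm = lambda th: losses(th).mean()                        # standard empirical risk
var_reg = lambda th: losses(th).mean() + Creg * np.sqrt(losses(th).var() / n)

theta_erm = minimize(erm, np.zeros(d), method="BFGS").x
theta_var = minimize(var_reg, np.zeros(d), method="BFGS").x
# the variance-regularized fit trades a slightly higher training mean
# for a lower training variance of the loss
print("ERM:      mean", losses(theta_erm).mean(), "var", losses(theta_erm).var())
print("var-reg:  mean", losses(theta_var).mean(), "var", losses(theta_var).var())
```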