We study the uniform approximation of echo state networks with randomly generated internal weights. These models, in which only the readout weights are optimized during training, have achieved empirical success in learning dynamical systems. Recent results showed that echo state networks with ReLU activation are universal. In this paper, we give an alternative construction and prove that universality holds for general activation functions. Specifically, our main result shows that, under certain conditions on the activation function, there exists a sampling procedure for the internal weights such that the echo state network can approximate any continuous causal time-invariant operator with high probability. In particular, for ReLU activation, we give explicit constructions of these sampling procedures. We also quantify the approximation error of the constructed ReLU echo state networks for sufficiently regular operators.
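For readers less familiar with the model class, the following is a minimal NumPy sketch of an echo state network: the internal weights are sampled randomly and kept fixed, and only the linear readout is fitted. The reservoir size, spectral-radius rescaling, and ridge regression below are conventional illustrative choices, not the sampling procedures constructed in the paper.

```python
import numpy as np

def esn_fit_readout(u, y, n_res=200, rho=0.9, ridge=1e-6, seed=0, act=np.tanh):
    """Echo state network sketch: random fixed internal weights, trained readout.

    u : (T, d_in) input sequence, y : (T, d_out) targets. The spectral-radius
    rescaling and ridge regression are standard heuristics, not the paper's
    explicit sampling construction.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_res, u.shape[1]))
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # heuristic echo-state scaling

    # Drive the reservoir; these internal weights are never trained.
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = act(W @ x + W_in @ u_t)
        states[t] = x

    # Only the readout is optimized (ridge regression).
    W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ y)
    return states @ W_out, W_out
```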
We study the sharp interface limit of the stochastic Cahn-Hilliard equation with cubic double-well potential and additive space-time white noise $\epsilon^{\sigma}\dot{W}$, where $\epsilon>0$ is an interfacial width parameter. We prove that, for a sufficiently large scaling constant $\sigma >0$, the stochastic Cahn-Hilliard equation converges to the deterministic Mullins-Sekerka/Hele-Shaw problem as $\epsilon\rightarrow 0$. The convergence is shown in suitable fractional Sobolev norms as well as in the $L^p$-norm for $p\in (2, 4]$ in spatial dimension $d=2,3$. This generalizes the existing result for space-time white noise to dimension $d=3$ and improves the existing results for smooth noise, which were so far limited to $p\in \left(2, \frac{2d+8}{d+2}\right]$ in spatial dimension $d=2,3$. As a byproduct of the analysis of the stochastic problem with space-time white noise, we identify minimal regularity requirements on the noise which allow convergence to the sharp interface limit in the $\mathbb{H}^1$-norm and also provide improved convergence estimates for the sharp interface limit of the deterministic problem.
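For orientation, a commonly used scaling of this equation (the paper's precise form, constants, and boundary conditions may differ) is
$$\partial_t u_\epsilon \;=\; \Delta\bigl(-\epsilon\,\Delta u_\epsilon + \epsilon^{-1} f(u_\epsilon)\bigr) \;+\; \epsilon^{\sigma}\dot{W}, \qquad f(u)=F'(u),\quad F(u)=\tfrac{1}{4}\,(u^{2}-1)^{2},$$
so that $F$ is the double-well potential, $f$ is its cubic derivative, and $\epsilon$ measures the interfacial width.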
The Sinkhorn algorithm is a numerical method for the solution of optimal transport problems. Here, I give a brief survey of this algorithm, with a strong emphasis on its geometric origin: it is natural to view it as a discretization, by standard methods, of a non-linear integral equation. In the appendix, I also provide a short summary of an early result of Beurling on product measures, directly related to the Sinkhorn algorithm.
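As a concrete point of reference, here is a minimal NumPy implementation of the Sinkhorn iteration for entropically regularized discrete optimal transport; the regularization parameter and iteration count are illustrative choices.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=500):
    """Sinkhorn iteration for entropically regularized optimal transport.

    C : (m, n) cost matrix; a, b : source/target histograms summing to 1.
    Alternately rescales rows and columns of the Gibbs kernel so that the
    coupling diag(u) K diag(v) matches the prescribed marginals.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # enforce column marginals
        u = a / (K @ v)                   # enforce row marginals
    return u[:, None] * K * v[None, :]    # approximate optimal coupling
```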
Most existing neural network-based approaches for solving stochastic optimal control problems via the associated backward dynamic programming principle rely on the ability to simulate the underlying state variables. However, in some problems this simulation is infeasible, leading to a discretization of the state variable space and the need to train one neural network for each data point. This approach becomes computationally inefficient when dealing with large state variable spaces. In this paper, we consider a class of such stochastic optimal control problems and introduce an effective solution based on multitask neural networks. To train our multitask neural network, we introduce a novel scheme that dynamically balances learning across tasks. Through numerical experiments on real-world derivatives pricing problems, we show that our method outperforms state-of-the-art approaches.
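The abstract does not spell out the architecture or balancing rule; the PyTorch sketch below shows one generic hard-parameter-sharing multitask network together with an illustrative loss-balancing heuristic, as an assumption-laden reading of the idea rather than the paper's method.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Hard parameter sharing: one shared trunk, one small head per task
    (e.g., per point of the discretized state variable space). A generic
    sketch, not the paper's architecture."""
    def __init__(self, d_in, n_tasks, width=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                   nn.Linear(width, width), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(width, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.trunk(x)
        return torch.cat([head(h) for head in self.heads], dim=-1)

def balanced_loss(task_losses, prev_losses, alpha=1.0):
    """Illustrative dynamic balancing: up-weight tasks whose loss is shrinking
    slowly relative to a previous snapshot (not the paper's exact scheme)."""
    losses = torch.stack(task_losses)
    weights = torch.softmax(alpha * losses / (prev_losses + 1e-12), dim=0).detach()
    return (weights * losses).sum()
```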
Physics-informed neural networks (PINNs), rooted in deep learning, have emerged as a promising approach for solving partial differential equations (PDEs). By embedding the physical information described by PDEs into feedforward neural networks, PINNs are trained as surrogate models that approximate solutions without the need for labeled data. Nevertheless, even though PINNs have shown remarkable performance, they can face difficulties, especially when dealing with equations whose solutions change rapidly. These difficulties include slow convergence, susceptibility to becoming trapped in local minima, and reduced solution accuracy. To address these issues, we propose a binary structured physics-informed neural network (BsPINN) framework, which employs a binary structured neural network (BsNN) as its neural network component. By leveraging a binary structure that reduces inter-neuron connections compared with fully connected neural networks, BsPINNs capture the local features of solutions more effectively and efficiently. These features are particularly crucial for learning rapidly changing solutions. In a series of numerical experiments solving the Burgers equation, the Euler equation, the Helmholtz equation, and a high-dimensional Poisson equation, BsPINNs exhibit faster convergence and higher accuracy than PINNs. From these experiments, we find that BsPINNs alleviate the over-smoothing caused by adding hidden layers to PINNs and prevent the decline in accuracy due to the non-smoothness of PDE solutions.
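To make the "embedding of physical information" concrete, here is a standard PyTorch sketch of the PINN residual for the viscous Burgers equation; it illustrates the generic PINN loss only, not the binary structured (BsNN) architecture itself, and the viscosity value is an illustrative choice.

```python
import math
import torch

def burgers_residual(model, x, t, nu=0.01 / math.pi):
    """PDE residual u_t + u*u_x - nu*u_xx for the viscous Burgers equation.

    `model` maps (x, t) -> u. The mean of the squared residual over collocation
    points gives the physics part of the PINN loss; boundary and initial terms
    are added analogously. Generic PINN machinery, not the BsNN structure.
    """
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_x, = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)
    u_t, = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)
    u_xx, = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)
    return u_t + u * u_x - nu * u_xx
```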
We develop a numerical method for the Westervelt equation, an important equation in nonlinear acoustics, in a form where the attenuation is represented by a class of operators that are non-local in time. A semi-discretisation in time based on the trapezoidal rule and A-stable convolution quadrature is stated and analysed. Existence and regularity analysis of the continuous equations informs the stability and error analysis of the semi-discrete system. The error analysis includes consideration of the singularity at $t = 0$, which is addressed by a correction in the numerical scheme. Extensive numerical experiments confirm the theory.
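For context, convolution quadrature weights based on the trapezoidal rule can be computed with a scaled FFT; the NumPy sketch below follows the standard recipe for a transfer function $K(s)$ and does not include the correction at $t = 0$ analysed in the paper.

```python
import numpy as np

def cq_weights_trapezoidal(K, h, N):
    """Convolution quadrature weights w_0, ..., w_N for the trapezoidal rule:
    K(delta(z)/h) = sum_n w_n z^n with delta(z) = 2(1-z)/(1+z).

    K must be analytic in the right half-plane and accept NumPy arrays
    (e.g. K = lambda s: s**(-0.5)); the weights are obtained by approximating
    the Cauchy integral on a circle of radius rho < 1 via FFT.
    """
    L = 2 * (N + 1)
    rho = 1e-8 ** (1.0 / L)                      # contour radius (accuracy heuristic)
    zeta = rho * np.exp(2j * np.pi * np.arange(L) / L)
    vals = K(2.0 * (1.0 - zeta) / (1.0 + zeta) / h)
    w = np.fft.fft(vals) / L * rho ** (-np.arange(L, dtype=float))
    return w.real[: N + 1]                       # real for kernels with K(conj s) = conj K(s)
```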
We study the complexity (that is, the weight of the multiplication table) of the elliptic normal bases introduced by Couveignes and Lercier. We give an upper bound on the complexity of these elliptic normal bases, and we analyze the weight of some special vectors related to the multiplication table of those bases. This analysis leads us to some perspectives on the search for low complexity normal bases from elliptic periods.
The need to Fourier transform data sets with irregular sampling is shared by various domains of science. This is the case, for example, in astronomy or seismology. Iterative methods have been developed that yield approximate solutions. Here, an exact solution to the problem for band-limited periodic signals is presented. The exact spectrum can be deduced from the spectrum of the non-equispaced data through the inversion of a Toeplitz matrix. The result applies to data of any dimension. The method also provides an excellent approximation for non-periodic band-limited signals. It reaches very high dynamic ranges ($10^{13}$ in double precision), which depend on the regularity of the sampling.
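The linear-algebraic core of this idea can be sketched as follows: writing the band-limited periodic signal as a finite Fourier series and forming the normal equations of the non-equispaced Fourier matrix yields a Toeplitz system for the exact coefficients. The function below is an illustrative reading of that idea, not the paper's exact algorithm.

```python
import numpy as np

def exact_spectrum(t, s, K, period=1.0):
    """Recover Fourier coefficients c_k, |k| <= K, of a band-limited periodic
    signal from irregular samples s_j = s(t_j).

    (A^H A)_{kl} = sum_j exp(2i*pi*(l-k)*t_j/period) depends only on l - k,
    so the normal-equation matrix is Toeplitz; A^H s is the spectrum of the
    non-equispaced data. Requires len(t) >= 2K + 1 samples.
    """
    k = np.arange(-K, K + 1)
    A = np.exp(2j * np.pi * np.outer(t, k) / period)   # non-equispaced Fourier matrix
    toeplitz_matrix = A.conj().T @ A
    rhs = A.conj().T @ np.asarray(s, dtype=complex)
    return np.linalg.solve(toeplitz_matrix, rhs)       # exact band-limited coefficients
```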
Comparisons of frequency distributions often invoke the concept of shift to describe directional changes in properties such as the mean. In the present study, we sought to define shift as a property in and of itself. Specifically, we define distributional shift (DS) as the concentration of frequencies away from the discrete class having the greatest value (e.g., the right-most bin of a histogram). We derive a measure of DS using the normalized sum of exponentiated cumulative frequencies. We then define relative distributional shift (RDS) as the difference in DS between two distributions, revealing the magnitude and direction by which one distribution is concentrated toward lesser or greater discrete classes relative to another. We find that RDS is closely related to popular measures that, while based on the comparison of frequency distributions, do not explicitly consider shift. While RDS provides a useful complement to other comparative measures, DS allows shift to be quantified as a property of individual distributions, similar in concept to a statistical moment.
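As a worked illustration, here is one plausible reading of the DS and RDS definitions in Python; the exact normalization used in the paper may differ from the min-max normalization assumed below.

```python
import numpy as np

def distributional_shift(freqs, base=np.e):
    """DS: concentration of frequencies away from the greatest discrete class,
    via a normalized sum of exponentiated cumulative frequencies.

    Normalized here to [0, 1]: 1 if all mass sits in the lowest class, 0 if all
    mass sits in the highest class (an assumed normalization).
    """
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()
    cum = np.cumsum(p)                 # cumulative relative frequencies
    n = len(p)
    s = np.sum(base ** cum)
    s_min = (n - 1) + base             # all mass in the right-most class
    s_max = n * base                   # all mass in the left-most class
    return (s - s_min) / (s_max - s_min)

def relative_distributional_shift(f1, f2):
    """RDS: difference in DS between two distributions over the same classes."""
    return distributional_shift(f1) - distributional_shift(f2)
```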
Neural networks with quadratic decision functions have been introduced as alternatives to standard neural networks with affine linear decision functions. They are advantageous when the objects to be identified have compact basic geometries such as circles or ellipses. In this paper we investigate the use of such ansatz functions for classification. In particular, we test and compare the algorithm on the MNIST dataset for the classification of handwritten digits and for the classification of subspecies. We also show that the implementation can be based on the neural network structures of the software packages TensorFlow and Keras, respectively.
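As an indication of how such an ansatz can be realized in Keras, the custom layer below uses a diagonal quadratic decision function $z_j = \sum_i a_{ij} x_i^2 + \sum_i w_{ij} x_i + b_j$, which can carve out circles and axis-aligned ellipses; the paper's exact parametrization may differ.

```python
import tensorflow as tf

class QuadraticDense(tf.keras.layers.Layer):
    """Dense layer with (diagonal) quadratic decision functions.

    z_j = sum_i a_ij * x_i**2 + sum_i w_ij * x_i + b_j. A sketch of the idea;
    a full quadratic form would also include cross terms x_i * x_k.
    """
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.a = self.add_weight(name="a", shape=(d, self.units), initializer="glorot_uniform")
        self.w = self.add_weight(name="w", shape=(d, self.units), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(self.units,), initializer="zeros")

    def call(self, x):
        z = tf.matmul(tf.square(x), self.a) + tf.matmul(x, self.w) + self.b
        return self.activation(z)
```

Such a layer can be dropped into a standard `tf.keras.Sequential` classifier in place of a `Dense` layer.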
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm that balances the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
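For clarity, the conditional utilization rates as defined above reduce to simple accuracy differences; how the single-modality accuracies are obtained (e.g., by masking one input or retraining) is an implementation choice not fixed in the abstract.

```python
def conditional_utilization_rates(acc_both, acc_m1_only, acc_m2_only):
    """u(m1 | m2): accuracy gain from adding modality m1 on top of m2 alone,
    and vice versa. A balanced model has u(m1 | m2) close to u(m2 | m1)."""
    u1_given_2 = acc_both - acc_m2_only   # gain from modality 1 given modality 2
    u2_given_1 = acc_both - acc_m1_only   # gain from modality 2 given modality 1
    return u1_given_2, u2_given_1
```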