We introduce a probability distribution, combined with an efficient sampling algorithm, for the weights and biases of fully-connected neural networks. In a supervised learning context, no iterative optimization or gradient computation of internal network parameters is needed to obtain a trained network. The sampling is based on the idea of random feature models. However, instead of a data-agnostic distribution, e.g., a normal distribution, we use both the input and the output training data to sample shallow and deep networks. We prove that sampled networks are universal approximators. For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decreases like $1/\sqrt{n}$ in the number of neurons $n$. Our sampling scheme is invariant to rigid body transformations and scaling of the input data, which implies that many popular pre-processing techniques are not required. In numerical experiments, we demonstrate that sampled networks achieve accuracy comparable to iteratively trained ones, but can be constructed orders of magnitude faster. Our test cases involve a classification benchmark from OpenML, sampling of neural operators to represent maps in function spaces, and transfer learning using well-known architectures.
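A minimal sketch of the data-driven sampling idea, in our own illustrative form rather than the paper's exact scheme: each hidden neuron's weight and bias are derived from a pair of training points, so that the neuron's activation transitions between those points, and only the linear readout is fitted by least squares.

import numpy as np

def sample_shallow_network(X, y, n_neurons, seed=0):
    rng = np.random.default_rng(seed)
    # Pick pairs of training points; each pair defines one neuron's
    # weight direction and bias (data-driven, not a Gaussian draw).
    i, j = rng.integers(0, len(X), (2, n_neurons))
    d = X[j] - X[i]
    norms = np.maximum(np.linalg.norm(d, axis=1), 1e-12)
    W = d / (norms**2)[:, None]       # weights point from X[i] towards X[j]
    b = -np.sum(W * X[i], axis=1)     # bias places the transition at X[i]
    H = np.tanh(X @ W.T + b)          # hidden features on the training set
    # Only the linear readout is "trained", via least squares.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda Xnew: np.tanh(Xnew @ W.T + b) @ beta

# Usage: f = sample_shallow_network(X_train, y_train, 512); y_hat = f(X_test)

The exact scaling and placement rules (and the extension to deep networks) are the subject of the paper; the point of the sketch is that no gradient step on W or b ever occurs.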
Dynamical models described by ordinary differential equations (ODEs) are a fundamental tool in the sciences and engineering. Exact reduction aims at producing a lower-dimensional model in which each macro-variable can be directly related to the original variables, and it is thus a natural step towards the model's formal analysis and mechanistic understanding. We present an algorithm which, given a polynomial ODE model, computes a longest possible chain of exact linear reductions of the model such that each reduction refines the previous one, thus giving the user control over the level of detail preserved by the reduction. This significantly generalizes the existing approaches, which compute only the reduction of the lowest dimension subject to an approach-specific constraint. The algorithm reduces finding exact linear reductions to a question about representations of finite-dimensional algebras. We provide an implementation of the algorithm, demonstrate its performance on a set of benchmarks, and illustrate the applicability via case studies. Our implementation is freely available at https://github.com/x3042/ExactODEReduction.jl
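A toy example of an exact linear reduction, ours for concreteness:
$$x_1' = (x_1 + x_2)^2, \qquad x_2' = (x_1 + x_2)^2 \quad\Longrightarrow\quad y := x_1 + x_2 \ \text{satisfies}\ y' = 2y^2,$$
so the linear map $(x_1, x_2) \mapsto x_1 + x_2$ yields a closed one-dimensional model. A chain of reductions, as computed by the algorithm, would refine such a coarse macro-variable step by step until the full model is recovered.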
By conceiving physical systems as 3D many-body point clouds, geometric graph neural networks (GNNs), such as SE(3)/E(3) equivalent GNNs, have showcased promising performance. In particular, their effective message-passing mechanics make them adept at modeling molecules and crystalline materials. However, current geometric GNNs only offer a mean-field approximation of the many-body system, encapsulated within two-body message passing, thus falling short in capturing intricate relationships within these geometric graphs. To address this limitation, tensor networks, widely employed by computational physics to handle manybody systems using high-order tensors, have been introduced. Nevertheless, integrating these tensorized networks into the message-passing framework of GNNs faces scalability and symmetry conservation (e.g., permutation and rotation) challenges. In response, we introduce an innovative equivariant Matrix Product State (MPS)-based message-passing strategy, through achieving an efficient implementation of the tensor contraction operation. Our method effectively models complex many-body relationships, suppressing mean-field approximations, and captures symmetries within geometric graphs. Importantly, it seamlessly replaces the standard message-passing and layer-aggregation modules intrinsic to geometric GNNs. We empirically validate the superior accuracy of our approach on benchmark tasks, including predicting classical Newton systems and quantum tensor Hamiltonian matrices. To our knowledge, our approach represents the inaugural utilization of parameterized geometric tensor networks.
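To make the scalability point concrete, here is a minimal (non-equivariant) sketch of why an MPS helps: contracting an MPS-factored many-body coupling with k neighbor feature vectors costs linear rather than exponential work in k. The core shapes and bond dimensions below are illustrative assumptions, and the equivariance machinery of the paper is omitted.

import numpy as np

def mps_message(cores, features):
    # cores[t]: (chi_left, d, chi_right) MPS core for neighbor slot t;
    # features[t]: length-d feature vector of neighbor t.
    # Sequential contraction costs O(k * d * chi^2), versus the O(d^k)
    # a dense order-k coupling tensor would require.
    v = np.ones(1)  # boundary vector; chi_left of the first core is 1
    for core, f in zip(cores, features):
        v = np.einsum('l,ldr,d->r', v, core, f)
    return v        # message vector of size chi_right of the last core

# Toy usage: 4 neighbors, feature dimension 8, bond dimension 5
rng = np.random.default_rng(0)
chis = [1, 5, 5, 5, 6]
cores = [rng.standard_normal((chis[t], 8, chis[t + 1])) for t in range(4)]
feats = [rng.standard_normal(8) for _ in range(4)]
print(mps_message(cores, feats).shape)  # (6,)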
Residual connections have been proposed as an architecture-based inductive bias to mitigate the problem of exploding and vanishing gradients and to increase task performance in both feed-forward and recurrent networks (RNNs) when trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, effective residual connections are those that (i) result in network dynamics at the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties. In addition, we demonstrate how our results can be extended to non-linear residuals and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs.
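One illustrative form such a weakly coupled residual update can take (our sketch; the paper's exact parameterization may differ):

import numpy as np

def wcrnn_step(h, x, W, U, alpha=1.0, eps=0.01):
    # Residual branch alpha*h plus a weakly (eps-) coupled recurrent
    # nonlinearity. For small eps the Jacobian stays close to alpha*I,
    # so |alpha| controls the Lyapunov exponents: alpha = 1 places the
    # dynamics near the edge of chaos, |alpha| < 1 yields fading memory.
    return alpha * h + eps * np.tanh(W @ h + U @ x)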
We consider a time-fractional subdiffusion equation with a Caputo derivative in time, a general second-order elliptic spatial operator, and a right-hand side that is non-smooth in time. The presence of the latter may lead to locking problems in the time-stepping procedure recently introduced in [2,4]. Hence, a generalized version of the residual barrier is proposed to rectify the issue. We also consider related alternatives to this generalized algorithm and, furthermore, show that this new residual barrier may be useful in the case of a negative reaction coefficient.
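For reference, the Caputo derivative of order $\alpha \in (0,1)$ used in such subdiffusion models is the standard one,
$$\partial_t^{\alpha} u(t) = \frac{1}{\Gamma(1-\alpha)} \int_0^t (t-s)^{-\alpha}\, \partial_s u(s)\, ds,$$
so the model problem has the form $\partial_t^{\alpha} u + \mathcal{L} u = f$, with $\mathcal{L}$ a second-order elliptic operator in space and $f$ non-smooth in time.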
Multiscale dynamical systems, modeled by high-dimensional stiff ordinary differential equations (ODEs) with wide-ranging characteristic timescales, arise across diverse fields of science and engineering, but their numerical solvers often encounter severe efficiency bottlenecks. This paper introduces DeePODE, a method that combines a global multiscale sampling scheme with deep-neural-network fitting to handle multiscale systems. DeePODE's primary contribution is to address the multiscale challenge of efficiently uncovering representative training sets by combining the Monte Carlo method and the ODE system's intrinsic evolution without suffering from the ``curse of dimensionality''. The DeePODE method is validated on multiscale systems from diverse areas, including a predator-prey model, a power system oscillation, a battery electrolyte auto-ignition, and turbulent flames. Our method exhibits strong generalization capabilities to unseen conditions, highlighting the power of deep learning in modeling intricate multiscale dynamical processes across science and engineering domains.
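A schematic of the sampling idea as we read it (our simplification; in practice a stiff integrator would replace the explicit Euler step):

import numpy as np

def generate_multiscale_dataset(rhs, lo, hi, n_mc, n_steps, dt, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 (Monte Carlo): sample states uniformly over a phase-space box,
    # which covers the state space without a grid (no curse of dimensionality).
    states = rng.uniform(lo, hi, size=(n_mc, len(lo)))
    X, Y = [], []
    # Step 2 (evolution): push the samples along the ODE flow so the dataset
    # also covers the regions trajectories actually visit.
    for _ in range(n_steps):
        dx = np.array([rhs(s) for s in states])
        X.append(states.copy()); Y.append(dx)  # train the net to map state -> dx/dt
        states = states + dt * dx               # explicit Euler, for the sketch only
    return np.concatenate(X), np.concatenate(Y)

# A deep network fit on (X, Y) then serves as a fast surrogate for the
# expensive stiff right-hand side.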
Sensory perception originates from the responses of sensory neurons, which react to a collection of sensory signals linked to various physical attributes of a single perceptual object. Unraveling how the brain extracts perceptual information from these neuronal responses is a pivotal challenge in both computational neuroscience and machine learning. Here we introduce a statistical mechanical theory in which perceptual information is first encoded in the correlated variability of sensory neurons and then reformatted into the firing rates of downstream neurons. Applying this theory, we illustrate the encoding of motion direction using neural covariance and demonstrate high-fidelity direction recovery by spiking neural networks. Networks trained under this theory also show enhanced performance in classifying natural images, achieving higher accuracy and faster inference speed. Our results challenge the traditional view of neural covariance as a secondary factor in neural coding, highlighting its potential influence on brain function.
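A toy two-neuron illustration of the covariance-coding idea (our construction, not the paper's model): the stimulus direction sets the orientation of the response covariance, not the mean, and can be read back from trial-to-trial variability.

import numpy as np

rng = np.random.default_rng(0)
theta = 0.7                                   # stimulus direction to encode
u = np.array([np.cos(theta), np.sin(theta)])
# The signal lives in the covariance: variance is elongated along u,
# while the mean response carries no information (it is zero).
Sigma = np.eye(2) + 4.0 * np.outer(u, u)
r = rng.multivariate_normal(np.zeros(2), Sigma, size=5000)  # trial responses
# Decode: the leading eigenvector of the empirical covariance recovers theta.
evals, evecs = np.linalg.eigh(np.cov(r.T))
v = evecs[:, -1]
theta_hat = np.arctan2(v[1], v[0]) % np.pi
print(theta, theta_hat)                       # ~0.7, up to the sign of the axis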
The structure of a network has a major effect on dynamical processes on that network. Many studies of the interplay between network structure and dynamics have focused on models of phenomena such as disease spread, opinion formation and change, coupled oscillators, and random walks. In parallel to these developments, there have been many studies of wave propagation and other spatially extended processes on networks. These latter studies consider metric networks, in which the edges are associated with real intervals. Metric networks give a mathematical framework to describe dynamical processes that include both temporal and spatial evolution of some quantity of interest -- such as the concentration of a diffusing substance or the amplitude of a wave -- by using edge-specific intervals that quantify distance information between nodes. Dynamical processes on metric networks often take the form of partial differential equations (PDEs). In this paper, we present a collection of techniques and paradigmatic linear PDEs that are useful to investigate the interplay between structure and dynamics in metric networks. We start by considering a time-independent Schr\"odinger equation. We then use both finite-difference and spectral approaches to study the Poisson, heat, and wave equations as paradigmatic examples of elliptic, parabolic, and hyperbolic PDE problems on metric networks. Our spectral approach is able to account for degenerate eigenmodes. In our numerical experiments, we consider metric networks with up to about $10^4$ nodes and about $10^4$ edges. A key contribution of our paper is to increase the accessibility of studying PDEs on metric networks. Software that implements our numerical approaches is available at https://gitlab.com/ComputationalScience/metric-networks.
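As a single-edge building block of the finite-difference approach (our minimal sketch, not the repository's code), the heat equation on one metric edge looks like an ordinary 1D problem; the network structure enters only through how the endpoint (node) values are coupled across edges.

import numpy as np

def heat_step_on_edge(u, dx, dt, kappa=1.0):
    # One explicit finite-difference step of u_t = kappa * u_xx on a single
    # metric edge; u[0] and u[-1] are the node values at the edge endpoints.
    # On a full metric network, the node values are updated jointly across
    # all incident edges via continuity and Kirchhoff (flux-balance)
    # conditions, which this single-edge sketch omits.
    un = u.copy()
    un[1:-1] = u[1:-1] + kappa * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    return un

# Note: stability of this explicit scheme requires dt <= dx**2 / (2 * kappa).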
To obtain strong convergence rates of numerical schemes, an overwhelming majority of existing works impose a global monotonicity condition on the coefficients of SDEs. By contrast, most SDEs arising in applications do not have globally monotone coefficients. As a recent breakthrough, the authors of [Hutzenthaler, Jentzen, Ann. Probab., 2020] presented a perturbation theory for stochastic differential equations (SDEs), which is crucial to recovering strong convergence rates of numerical schemes in a non-globally monotone setting. However, only a convergence rate of order $1/2$ was obtained there for time-stepping schemes such as a stopped increment-tamed Euler-Maruyama (SITEM) method. The aforementioned work raised, as an open problem, the natural question of whether a convergence rate higher than $1/2$ can be obtained when higher-order schemes are used. The present work addresses this problem. To this end, we develop new perturbation estimates that are able to reveal the order-one strong convergence of numerical methods. As a first application of the newly developed estimates, we establish the expected order-one pathwise uniformly strong convergence of the SITEM method for SDEs driven by additive noise and for second-order SDEs driven by multiplicative noise, both with non-globally monotone coefficients. As a second application, we propose and analyze a positivity-preserving explicit Milstein-type method for the Lotka-Volterra competition model driven by multi-dimensional noise, with a pathwise uniformly strong convergence rate of order one recovered under mild assumptions. These results are new and significantly improve the existing theory. Numerical experiments are also provided to confirm the theoretical findings.
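For orientation, here is the simpler drift-tamed Euler-Maruyama scheme of Hutzenthaler, Jentzen, and Kloeden, shown purely for illustration; the SITEM method analyzed above additionally tames the whole increment and stops the scheme on exploding paths.

import numpy as np

def tamed_euler_maruyama(mu, sigma, x0, T, n, seed=0):
    # Tamed EM for dX = mu(X) dt + sigma(X) dW with possibly superlinearly
    # growing drift: the taming factor 1/(1 + h*|mu|) prevents the explosion
    # of moments that the classical explicit Euler scheme suffers from.
    rng = np.random.default_rng(seed)
    h = T / n
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(h), size=x.shape)
        drift = mu(x)
        x = x + h * drift / (1.0 + h * np.linalg.norm(drift)) \
              + sigma(x) * dw  # elementwise, i.e., diagonal noise in this sketch
    return x

# Example with a superlinear drift: mu = lambda x: -x**3, sigma = lambda x: 1.0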
In fitting continuous bounded data, the generalized beta distribution (and several of its variants) and the two-parameter Kumaraswamy (KW) distribution are the two most prominent univariate continuous models that come to mind. These two rival probability models share several common features, and selecting between them in a practical situation can be of great interest. Consequently, in this paper we discuss various methods of selection between the generalized beta proposed by Libby and Novick (1982) (LNGB) and the KW distribution, including criteria based on the probability of correct selection, which is an improvement over the likelihood-ratio-statistic approach, and criteria based on pseudo-distance measures. We obtain an approximation for the probability of correct selection under the hypotheses $H_{\mathrm{LNGB}}$ and $H_{\mathrm{KW}}$, and select the model that maximizes it. Our proposal is further appealing in that we provide the comparison study for the LNGB distribution, which subsumes both the classical beta and exponentiated generators (see, for details, Cordeiro et al. 2014; Libby and Novick 1982) and which can be a natural competitor of the two-parameter KW distribution in an appropriate scenario.
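For reference, the two-parameter KW distribution on $(0,1)$ has the closed-form cdf and pdf
$$F(x; a, b) = 1 - (1 - x^{a})^{b}, \qquad f(x; a, b) = a b\, x^{a-1} (1 - x^{a})^{b-1}, \qquad 0 < x < 1,\ a, b > 0,$$
and this closed-form cdf is precisely what makes it a tractable rival to beta-type models such as the LNGB.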
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
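Spelling out the definition above as a sketch (for two modalities; the accuracies are assumed to come from evaluating the trained model with the respective modality masked out):

def conditional_utilization_rate(acc_both, acc_other_alone):
    # u(m1 | m2) = accuracy using both modalities minus accuracy using
    # m2 alone: the gain from giving the model access to m1 on top of m2.
    return acc_both - acc_other_alone

# Example: acc({image, audio}) = 0.91, acc({audio}) = 0.86
# => u(image | audio) = 0.05. A large gap between u(image | audio) and
# u(audio | image) is the imbalance that signals greedy reliance on
# one modality.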