In this work, we are concerned with a Fokker-Planck equation related to the nonlinear noisy leaky integrate-and-fire model for biological neural networks that are structured by synaptic weights and equipped with a Hebbian learning rule. The equation contains a small parameter $\varepsilon$ separating the time scales of the learning and reacting behavior of the neural system, and an asymptotic limit model can be derived by letting $\varepsilon\to 0$, where the microscopic quasi-static states and the macroscopic evolution equation are coupled through the total firing rate. To handle the endowed flux-shift structure and the multi-scale dynamics in a unified framework, we propose a numerical scheme for this equation that is mass conservative, unconditionally positivity preserving, and asymptotic preserving. We provide extensive numerical tests to verify the scheme's properties and carry out a set of numerical experiments to investigate the model's learning ability and explore the solution's behavior when the neural network is excitatory.
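As a minimal illustration of the conservation and positivity ingredients (not the paper's actual scheme, which additionally handles the flux shift at the firing threshold and the $\varepsilon$ time-scale separation), the sketch below takes one upwind finite-volume step for a generic 1D Fokker-Planck equation $p_t = (vp)_v + D\,p_{vv}$ with no-flux boundaries; all grids and parameter values are illustrative assumptions:

```python
def fp_step(p, vgrid, dv, dt, D=1.0):
    """One explicit finite-volume step for p_t = (v p)_v + D p_vv with
    no-flux boundaries. The flux form conserves total mass exactly, and
    upwinding the drift keeps every cell average nonnegative under a CFL
    restriction on dt."""
    n = len(p)
    J = [0.0] * (n + 1)  # probability flux at cell interfaces; 0 at the ends
    for i in range(1, n):
        vh = 0.5 * (vgrid[i - 1] + vgrid[i])     # interface coordinate
        u = -vh                                   # advective velocity of the drift
        adv = u * (p[i - 1] if u > 0 else p[i])   # upwind value
        J[i] = adv - D * (p[i] - p[i - 1]) / dv
    # conservation form: p_t = -J_v
    return [p[i] - dt / dv * (J[i + 1] - J[i]) for i in range(n)]
```

Because the cell updates telescope, the total mass $\sum_i p_i\,\Delta v$ is conserved to round-off, while the upwind/CFL combination keeps every cell nonnegative.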
Determining process-structure-property linkages is one of the key objectives in materials science, and uncertainty quantification plays a critical role in understanding both process-structure and structure-property linkages. In this work, we seek to learn a distribution of microstructure parameters that are consistent in the sense that the forward propagation of this distribution through a crystal plasticity finite element model (CPFEM) matches a target distribution on materials properties. This stochastic inversion formulation infers a distribution of acceptable/consistent microstructures, as opposed to a deterministic solution, which expands the range of feasible designs in a probabilistic manner. To solve this stochastic inverse problem, we employ a recently developed uncertainty quantification (UQ) framework based on push-forward probability measures, which combines techniques from measure theory and Bayes' rule to define a unique and numerically stable solution. This approach requires making an initial prediction using an initial guess for the distribution on model inputs and solving a stochastic forward problem. To reduce the computational burden in solving both stochastic forward and stochastic inverse problems, we combine this approach with a machine learning (ML) Bayesian regression model based on Gaussian processes and demonstrate the proposed methodology on two representative case studies in structure-property linkages.
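To make the push-forward construction concrete, the following toy sketch (a hedged illustration, not the paper's CPFEM setup) replaces the forward model with an affine map $Q(\lambda)=2\lambda+1$ so that the predicted density is known analytically, and draws samples from the updated, consistent density by rejection sampling with the ratio $\pi_{\mathrm{obs}}(Q(\lambda))/\pi_{\mathrm{pred}}(Q(\lambda))$:

```python
import math, random

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) / sd) ** 2 / 2.0) / (sd * math.sqrt(2.0 * math.pi))

def consistent_inversion(n=20000, seed=0):
    """Toy data-consistent inversion in the push-forward framework:
    update pi_init by the ratio pi_obs(Q(lam)) / pi_pred(Q(lam)) via
    rejection sampling. Here Q(lam) = 2*lam + 1 stands in for the
    expensive forward model, the initial guess is N(0,1), the target on
    Q is N(2,1), and the predicted density is then N(1,4) analytically
    (so no density estimation is needed in this sketch)."""
    rng = random.Random(seed)
    M = 3.0  # upper bound on the density ratio (its maximum is 2*exp(1/6))
    accepted = []
    for _ in range(n):
        lam = rng.gauss(0.0, 1.0)        # sample the initial density
        q = 2.0 * lam + 1.0              # push through the forward map
        r = normal_pdf(q, 2.0, 1.0) / normal_pdf(q, 1.0, 2.0)
        if rng.random() < r / M:
            accepted.append(lam)
    return accepted  # distributed as the updated (consistent) density
```

Pushing the accepted samples back through $Q$ reproduces the target $N(2,1)$, which is exactly the consistency requirement; in the paper's setting the predicted density is not analytic and is estimated from forward-model (or GP-surrogate) evaluations.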
The well-known stochastic SIS model in epidemiology, characterized by its high nonlinearity, has a unique positive solution that takes values in a bounded domain and exhibits a range of dynamical behaviors. However, approximation methods that preserve the positivity and long-time behaviors of the stochastic SIS model, while very important, are still lacking. In this paper, based on a logarithmic transformation, we propose a novel explicit numerical method for a stochastic SIS epidemic model whose coefficients violate the global monotonicity condition; the method preserves the positivity of the original stochastic SIS model. We show the strong convergence of the numerical method and derive a convergence rate of order one. Moreover, the extinction of the exact solution of the stochastic SIS model is reproduced. Numerical experiments are given to illustrate the theoretical results and demonstrate the efficiency of our algorithm.
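The positivity mechanism can be sketched as follows (parameter values and the exact SDE form are illustrative assumptions, not the paper's analysis): apply explicit Euler-Maruyama to $X=\log I$ and map back by $I=\exp(X)$, so every iterate is strictly positive by construction.

```python
import math, random

def simulate_sis_log_em(beta=0.5, mu=0.2, gamma=0.1, sigma=0.03,
                        N=100.0, I0=10.0, T=1.0, steps=1000, seed=0):
    """Explicit Euler-Maruyama on X = log(I) for a stochastic SIS model
        dI = [beta*I*(N-I) - (mu+gamma)*I] dt + sigma*I*(N-I) dW,
    with the drift of X obtained from Ito's formula. Exponentiating X
    back guarantees I_n > 0 at every step."""
    rng = random.Random(seed)
    dt = T / steps
    x = math.log(I0)
    path = [I0]
    for _ in range(steps):
        s = N - math.exp(x)
        # Ito drift of log(I): b(I)/I - 0.5 * (sigma*(N-I))**2
        drift = beta * s - (mu + gamma) - 0.5 * (sigma * s) ** 2
        x += drift * dt + sigma * s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(x))
    return path
```

Note that the transformation enforces positivity without any step-size restriction, which is the appeal of such explicit transformed schemes.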
We consider a statistical inverse learning problem, where the task is to estimate a function $f$ based on noisy point evaluations of $Af$, where $A$ is a linear operator. The function $Af$ is evaluated at i.i.d. random design points $u_n$, $n=1,\dots,N$, generated by an unknown general probability distribution. We consider Tikhonov regularization with general convex and $p$-homogeneous penalty functionals and derive concentration rates of the regularized solution to the ground truth measured in the symmetric Bregman distance induced by the penalty functional. We derive concrete rates for Besov norm penalties and numerically demonstrate the correspondence with the observed rates in the context of X-ray tomography.
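For intuition, the quadratic case $p=2$ of the penalized problem reduces to classical Tikhonov regularization; the sketch below (illustrative only, not the Besov-penalty setting of the paper) minimizes $\|Af-y\|^2 + \alpha\|f\|^2$ by plain gradient descent:

```python
def tikhonov(A, y, alpha, iters=5000, lr=0.1):
    """Gradient descent for quadratic Tikhonov regularization, the p = 2
    special case of a convex p-homogeneous penalty. A is a list of rows
    of the (discretized) forward operator; y holds the noisy point
    evaluations of A f."""
    m, d = len(A), len(A[0])
    f = [0.0] * d
    for _ in range(iters):
        # residual r = A f - y
        r = [sum(A[i][k] * f[k] for k in range(d)) - y[i] for i in range(m)]
        # gradient of ||A f - y||^2 + alpha ||f||^2
        grad = [2.0 * sum(A[i][k] * r[i] for i in range(m)) + 2.0 * alpha * f[k]
                for k in range(d)]
        f = [f[k] - lr * grad[k] for k in range(d)]
    return f
```

For general $p$-homogeneous penalties (e.g. Besov norms with $p\neq 2$) there is no closed-form solution and proximal or primal-dual methods replace this closed quadratic structure.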
Considering a probability distribution over parameters is known to be an efficient strategy for learning a neural network with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor in its own right, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights. Our work leverages a recent analysis from the PAC-Bayesian framework that yields tight generalization bounds and learning procedures for the expected output value of such an aggregation, which is given by an analytical expression. While the combinatorial nature of the latter has been circumvented by approximations in previous works, we show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach. This leads us to a peculiar bound minimization learning algorithm for binary activated neural networks, where the forward pass propagates probabilities over representations instead of activation values. A stochastic counterpart of this new training scheme, which scales to wider architectures, is also proposed.
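The dynamic programming idea can be sketched as follows (a hedged illustration under simplifying assumptions, not the paper's exact construction): for weights $w\sim\mathcal{N}(\mu, I)$, each binary unit fires $+1$ with probability $\Phi(\mu\cdot h/\|h\|)$ given the previous representation $h$, and a narrow layer of width $d$ admits only $2^d$ hidden states, so the exact distribution over representations can be propagated layer by layer.

```python
import itertools, math

def phi(t):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def expected_output(x, layers, out_w):
    """Exact expectation of a binary activated network with N(mu, I)
    weight distributions, by dynamic programming over the 2^d hidden
    states of each (narrow) layer. `layers` holds the weight-mean
    matrices (rows = units); `out_w` is the mean linear output layer."""
    # dist maps hidden-state tuples (entries in {-1, +1}) to probabilities
    dist = {tuple(x): 1.0}
    for W in layers:
        new = {}
        for h, p in dist.items():
            norm = math.sqrt(sum(v * v for v in h))
            # independent Bernoulli firing probabilities, one per unit
            probs = [phi(sum(w * v for w, v in zip(row, h)) / norm) for row in W]
            for s in itertools.product((-1, 1), repeat=len(W)):
                q = p
                for si, pi in zip(s, probs):
                    q *= pi if si == 1 else 1.0 - pi
                new[s] = new.get(s, 0.0) + q
        dist = new
    return sum(p * sum(w * v for w, v in zip(out_w, h)) for h, p in dist.items())
```

The cost grows exponentially in the layer width but only linearly in depth, which is why exact computation remains tractable for deep but narrow architectures.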
This paper introduces a notion of $\varepsilon$-weakened robustness for analyzing the reliability and stability of deep neural networks (DNNs). Unlike conventional robustness, which focuses on the "perfect" safe region in the absence of adversarial examples, $\varepsilon$-weakened robustness focuses on the region where the proportion of adversarial examples is bounded by a user-specified $\varepsilon$. A smaller $\varepsilon$ means a smaller chance of failure. Under this robustness definition, we can give conclusive results for the regions that conventional robustness ignores. We prove that the $\varepsilon$-weakened robustness decision problem is PP-complete and give a statistical decision algorithm with a user-controllable error bound. Furthermore, we derive an algorithm to find the maximum $\varepsilon$-weakened robustness radius. The time complexity of our algorithms is polynomial in the dimension and size of the network, so they are scalable to large real-world networks. We also show the potential application of $\varepsilon$-weakened robustness in analyzing quality issues.
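A statistical decision of this kind can be sketched as follows (sample sizes, the $L_\infty$ ball, and the Hoeffding-based error control are illustrative assumptions, not the paper's algorithm): estimate the fraction of adversarial points in the ball by Monte Carlo sampling, with the sample size chosen so the estimate is accurate except with a user-controlled probability.

```python
import math, random

def eps_weakened_robust(model, center, radius, eps, delta=1e-3, gap=0.05, seed=0):
    """Sampling-based decision sketch for eps-weakened robustness:
    estimate the proportion of adversarial points in an L_inf ball and
    compare it to eps. The Hoeffding bound picks n so the estimate is
    within `gap` of the truth except with probability `delta`.
    `model(x)` returns True when x is classified like the clean input."""
    rng = random.Random(seed)
    n = math.ceil(math.log(2.0 / delta) / (2.0 * gap ** 2))
    ref = model(center)
    adv = sum(1 for _ in range(n)
              if model([c + rng.uniform(-radius, radius) for c in center]) != ref)
    return adv / n < eps
```

The sample size depends only on `delta` and `gap`, not on the input dimension, which is what makes such statistical certificates scale to large networks.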
In numerical simulations of complex flows with discontinuities, it is necessary to use nonlinear schemes. The spectrum of the scheme used has a significant impact on the resolution and stability of the computation. Based on the approximate dispersion relation method, we combine the corresponding spectral property with the dispersion relation preservation proposed by De and Eswaran (J. Comput. Phys. 218 (2006) 398-416) and propose a quasi-linear dispersion relation preservation (QL-DRP) analysis method, through which the group velocity of a nonlinear scheme can be determined. In particular, we derive the group velocity property when a high-order Runge-Kutta scheme is used and compare the performance of different time schemes with QL-DRP. The rationality of the QL-DRP method is verified by a numerical simulation and the discrete Fourier transform method. To further evaluate how well a nonlinear scheme captures the group velocity, new hyperbolic equations are designed. The validity of QL-DRP and the group velocity preservation of several schemes are investigated using two examples: the equation for one-dimensional wave propagation and the new hyperbolic equations. The results show that the QL-DRP method integrated with high-order time schemes can determine the group velocity of nonlinear schemes and evaluate their performance reasonably and efficiently.
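The linear spectral analysis that such methods build on can be sketched as follows (the central-difference example is illustrative; QL-DRP's contribution is extending this kind of analysis to nonlinear schemes via the approximate dispersion relation): applying a linear difference stencil to a Fourier mode yields its modified wavenumber, and the group velocity is the derivative of its real part.

```python
import cmath, math

def modified_wavenumber(stencil, kh):
    """Modified wavenumber k'h of a linear first-derivative stencil
    {offset: coefficient} at reduced wavenumber kh, from its symbol on
    the Fourier mode exp(i k x): k'h = -i * sum_j c_j exp(i j kh)."""
    return -1j * sum(c * cmath.exp(1j * j * kh) for j, c in stencil.items())

def group_velocity(stencil, kh, dkh=1e-6):
    """Numerical group velocity (in units of the advection speed):
    d Re(k'h) / d(kh). A DRP-type analysis asks that this stay close to 1
    over a wide band of wavenumbers."""
    f = lambda x: modified_wavenumber(stencil, x).real
    return (f(kh + dkh) - f(kh - dkh)) / (2.0 * dkh)

central2 = {-1: -0.5, 1: 0.5}  # second-order central difference
```

For the central stencil the modified wavenumber is $\sin(kh)$, so the group velocity $\cos(kh)$ degrades to $-1$ at the grid cutoff $kh=\pi$, the classic motivation for dispersion-relation-preserving designs.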
We consider the numerical solution of the Cauchy problem for an evolutionary equation with memory in which the kernel of the integral term is of difference (convolution) type. The computational challenge is the need to work with the approximate solution at all previous points in time. In this paper, the nonlocal problem is transformed into a local one: a loosely coupled system with additional ordinary differential equations is solved. This approach is based on approximating the difference kernel by a sum of exponentials. Stability estimates for the solution with respect to the initial data and the right-hand side of the corresponding Cauchy problem are obtained. Two-level weighted schemes with a convenient computational implementation are constructed and investigated. The theoretical considerations are supplemented by results of the numerical solution of the integrodifferential equation when the kernel is a stretched exponential function.
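The localization trick can be sketched on a model equation (the scalar equation, the explicit Euler integrator, and all parameters are illustrative; the paper constructs and analyzes two-level weighted schemes): with $k(t)=\sum_j a_j e^{-b_j t}$, the auxiliary variables $w_j(t)=\int_0^t a_j e^{-b_j(t-s)}u(s)\,ds$ satisfy local ODEs, so no history needs to be stored.

```python
def solve_memory_ode(a, b, u0=1.0, T=1.0, steps=2000):
    """For u'(t) = -int_0^t k(t - s) u(s) ds with
    k(t) = sum_j a_j exp(-b_j t), introduce
    w_j(t) = int_0^t a_j exp(-b_j (t - s)) u(s) ds. The nonlocal problem
    becomes the local system
        u' = -sum_j w_j,   w_j' = -b_j w_j + a_j u,   w_j(0) = 0,
    integrated here with explicit Euler."""
    dt = T / steps
    u = u0
    w = [0.0] * len(a)
    for _ in range(steps):
        du = -sum(w)                                  # u' from auxiliary variables
        w = [wj + dt * (aj * u - bj * wj) for wj, aj, bj in zip(w, a, b)]
        u += dt * du
    return u
```

Each step costs $O(J)$ for $J$ exponential terms, instead of the $O(n)$ history sum a direct quadrature of the memory integral would require at step $n$.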
Contention-based wireless channel access methods like CSMA and ALOHA paved the way for the rise of the Industrial Internet of Things (IIoT). However, to cope with increasing demands for reliability and throughput, several mostly TDMA-based protocols such as IEEE 802.15.4 and its extensions were proposed. Nonetheless, many of these IIoT protocols still require contention-based communication, e.g., for slot allocation and broadcast transmission. In many cases, subtle but hidden patterns characterize this secondary traffic. Present contention-based protocols are unaware of these hidden patterns and therefore cannot exploit this information. Especially in dense networks, they often do not provide sufficient reliability for primary traffic; e.g., they are unable to allocate transmission slots in time. In this paper, we propose QMA, a contention-based multiple access scheme based on Q-learning, which dynamically adapts transmission times to avoid collisions by learning patterns in the contention-based traffic. QMA is designed to be resource-efficient and targets small embedded devices. We show that QMA solves the hidden node problem without the additional overhead of RTS/CTS messages and verify the behavior of QMA in the FIT IoT-LAB testbed. Finally, QMA's scalability is studied by simulation, where it is used for GTS allocation in IEEE 802.15.4 DSME. Results show that QMA considerably increases reliability and throughput in comparison with CSMA/CA, especially in networks with a high load.
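The core Q-learning idea can be sketched as follows (reward values, hyperparameters, and the single-node setting are assumptions for illustration, not QMA's actual design): a node keeps one Q-value per candidate transmission slot, is rewarded for collision-free transmissions and penalized for collisions, and epsilon-greedily learns to avoid slots occupied by a hidden periodic pattern in the competing traffic.

```python
import random

def qma_sketch(pattern, slots=8, rounds=500, alpha=0.1, explore=0.1, seed=0):
    """Stateless Q-learning over transmission slots: +1 reward for a
    collision-free transmission, -1 for a collision with the hidden
    periodic `pattern` of competing traffic."""
    rng = random.Random(seed)
    Q = [0.0] * slots
    for t in range(rounds):
        if rng.random() < explore:
            a = rng.randrange(slots)                   # explore a random slot
        else:
            a = max(range(slots), key=lambda s: Q[s])  # exploit the best slot
        busy = pattern[t % len(pattern)]               # slot taken by other traffic
        reward = -1.0 if a == busy else 1.0
        Q[a] += alpha * (reward - Q[a])                # running-average update
    return Q
```

After training, the slot occupied by the hidden pattern carries a negative Q-value and is avoided, without any explicit signaling between the nodes.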
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
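The idea of a learnable discrete distribution over edges can be sketched in a deliberately simplified form (the loss below is a stand-in for the GCN inner problem of the bilevel program, and all names and values are illustrative): each candidate edge $(i,j)$ carries a Bernoulli parameter $\theta_{ij}$, and since this toy expected loss is linear in $\theta$, its gradient is available in closed form.

```python
def learn_edge_distribution(labels, iters=200, lr=0.05):
    """Learn Bernoulli edge probabilities by gradient descent on the
    expected loss E[L] = sum_e theta_e * c_e, where c_e = +1 penalizes
    an edge between differently labeled nodes and c_e = -1 rewards one
    between same-labeled nodes (a toy surrogate for the GCN training
    loss). Probabilities are clamped to (0.01, 0.99)."""
    n = len(labels)
    theta = {(i, j): 0.5 for i in range(n) for j in range(i + 1, n)}
    for _ in range(iters):
        for (i, j) in theta:
            c = 1.0 if labels[i] != labels[j] else -1.0  # d E[L] / d theta_ij
            theta[(i, j)] = min(0.99, max(0.01, theta[(i, j)] - lr * c))
    return theta
```

In the full bilevel setting the inner GCN loss is not linear in $\theta$, so the outer gradient must instead be estimated from sampled graphs, e.g. with score-function or hypergradient estimators.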