We consider the 3D stochastic Navier-Stokes equation on the torus. Our main result concerns the temporal and spatio-temporal discretisation of a local strong pathwise solution. We prove optimal convergence rates for the energy error with respect to convergence in probability, that is, convergence of order 1 in space and of order (up to) 1/2 in time. The result holds up to the possible blow-up of the (time-discrete) solution. Our approach is based on discrete stopping times for the (time-discrete) solution.
We study a non-local optimal control problem involving a linear, bond-based peridynamics model. In addition to establishing existence and uniqueness of solutions to our problem, we investigate their behavior as the horizon parameter $\delta$, which controls the degree of nonlocality, approaches zero. We then study a finite element-based discretization of this problem, its convergence, and the so-called asymptotic compatibility as the discretization parameter $h$ and the horizon parameter $\delta$ tend to zero simultaneously.
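To make the local limit $\delta \to 0$ concrete, here is a minimal sketch under assumptions that are not taken from the abstract: a 1D bond-based operator $L_\delta u(x) = \frac{3}{\delta^3}\int_{-\delta}^{\delta} (u(x+s) - u(x))\,ds$ with a constant kernel, whose scaling $3/\delta^3$ is chosen so that $L_\delta u \to u''$. The helper `nonlocal_laplacian` is hypothetical and is not the model or discretization used in the paper.

```python
import numpy as np

def nonlocal_laplacian(u, x, delta, m=2000):
    """Midpoint-rule approximation of the (assumed) 1D operator
    L_delta u(x) = (3 / delta**3) * int_{-delta}^{delta} (u(x + s) - u(x)) ds.
    The constant kernel 3 / delta**3 is an illustrative choice: it is the
    scaling for which L_delta u -> u'' as delta -> 0."""
    ds = 2.0 * delta / m
    s = -delta + (np.arange(m) + 0.5) * ds   # midpoints of m subintervals
    integral = np.sum(u(x + s) - u(x)) * ds
    return 3.0 / delta**3 * integral

x0 = 0.7
exact = -np.sin(x0)                          # u'' for u = sin
for delta in [0.5, 0.1, 0.01]:
    approx = nonlocal_laplacian(np.sin, x0, delta)
    print(f"delta={delta:5.2f}  |L_delta u - u''| = {abs(approx - exact):.2e}")
```

For smooth $u$ the gap shrinks like $\delta^2$, which is the kind of local limit that asymptotic compatibility asks a discretization to preserve.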
This paper is devoted to the study of Bingham flow with variable density. We propose a local bi-viscosity regularization of the stress tensor based on a Huber smoothing step. Our computational approach is based on a second-order, divergence-conforming discretization of the Huber-regularized Bingham constitutive equations, coupled with a discontinuous Galerkin scheme for the mass density. We exploit the properties of the divergence-conforming and discontinuous Galerkin formulations to incorporate upwind discretizations that stabilize the formulation. The stability of both the continuous problem and the fully discrete scheme is analyzed. Further, a semismooth Newton method is proposed for solving the resulting fully discrete system of equations at each time step. Finally, several numerical examples are presented that illustrate the main features of the problem and the properties of the numerical scheme.
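The "bi-viscosity" reading of a Huber step can be illustrated pointwise. The sketch below assumes a Huber-type form that appears in the Bingham literature: the nonsmooth term $g\,D/|D|$ is replaced by $g\gamma D/\max(g, \gamma|D|)$, with $D$ the symmetric velocity gradient and $\gamma$ a large regularization parameter. The abstract does not give the authors' exact formula, so the function name and parameters `mu`, `g`, `gamma` are illustrative.

```python
import numpy as np

def huber_bingham_stress(grad_v, mu, g, gamma):
    """Huber-regularized Bingham stress (assumed form):
        sigma = 2*mu*D + g*gamma*D / max(g, gamma*|D|),
    with D = (grad_v + grad_v^T) / 2 and |D| the Frobenius norm.
    If gamma*|D| <= g, the plastic part is gamma*D (a second, large viscosity);
    otherwise it equals the exact Bingham term g*D/|D|."""
    D = 0.5 * (grad_v + grad_v.T)
    norm_D = np.linalg.norm(D)               # Frobenius norm for a 2D array
    plastic = g * gamma * D / max(g, gamma * norm_D)
    return 2.0 * mu * D + plastic

mu, g, gamma = 1.0, 2.0, 1e3
shear = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
for scale in [1e-5, 1.0]:                    # nearly rigid vs. yielded region
    grad_v = scale * shear
    D = 0.5 * (grad_v + grad_v.T)
    plastic = huber_bingham_stress(grad_v, mu, g, gamma) - 2.0 * mu * D
    print(scale, np.linalg.norm(plastic))    # stays below g, saturates at g
```

The `max` makes the regularization local: each point switches between the high-viscosity branch and the Bingham branch according to its own strain rate, and the resulting piecewise-smooth map is the kind of nonlinearity a semismooth Newton method handles.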
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term ``(little) q-function''. This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a ``q-learning'' theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
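For intuition about the "explicitly computable Gibbs density" case, the sketch below assumes the usual entropy-regularized form $\pi(a \mid x) \propto \exp(q(x,a)/\gamma)$, where $\gamma$ is the temperature. It uses a hypothetical q-function that is quadratic in the action, $q(x,a) = -\tfrac12 (a - kx)^2$, for which the Gibbs measure is exactly Gaussian, $N(kx, \gamma)$. The grid normalization stands in for the non-explicit case. Neither `k` nor this q-function comes from the paper.

```python
import numpy as np

def gibbs_policy_on_grid(q, x, actions, temperature):
    """Density pi(a|x) proportional to exp(q(x, a) / temperature),
    normalized numerically on a uniform action grid."""
    logits = q(x, actions) / temperature
    logits -= logits.max()                   # avoid overflow in exp
    weights = np.exp(logits)
    da = actions[1] - actions[0]
    return weights / (weights.sum() * da)

k, temperature, x = 1.5, 0.2, 2.0
q = lambda x, a: -0.5 * (a - k * x) ** 2     # hypothetical quadratic q-function
actions = np.linspace(-5.0, 10.0, 30001)
pi = gibbs_policy_on_grid(q, x, actions, temperature)
da = actions[1] - actions[0]
mean = np.sum(actions * pi) * da
var = np.sum((actions - mean) ** 2 * pi) * da
print(mean, var)  # should match the closed form N(k*x, temperature)
```

When a closed form like this exists, the policy can be sampled directly from $q$. Otherwise a separate actor is needed, which matches the split between algorithms that the abstract describes.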
We propose a collocation method based on multivariate polynomial splines over triangulations or tetrahedralizations for the numerical solution of partial differential equations. We start with a detailed explanation of the method for the Poisson equation and then extend the study to second-order elliptic PDEs in non-divergence form. We show that the numerical solution approximates the exact PDE solution very well. We then present extensive numerical experiments to demonstrate the performance of the method in 2D and 3D. In addition, we compare the method with the existing multivariate spline methods in \cite{ALW06} and \cite{LW17} and show that it produces similar, and sometimes more accurate, approximations more efficiently.
We describe a decisional attack against a version of the PLWE problem in which the samples are taken from a certain proper subring of large dimension of the cyclotomic ring $\mathbb{F}_q[x]/(\Phi_{p^k}(x))$ with $k>1$, in the case where $q\equiv 1\pmod{p}$ but $\Phi_{p^k}(x)$ is not totally split over $\mathbb{F}_q$. Our attack exploits the fact that the roots of $\Phi_{p^k}(x)$ over suitable extensions of $\mathbb{F}_q$ have zero trace, and its success probability is overwhelming as a function of the number of input samples. An implementation in Maple and some examples of our attack are also provided.
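The zero-trace property can be checked by hand on a toy instance. The parameters $p = 3$, $k = 2$, $q = 7$ are our illustrative choice: $7 \equiv 1 \pmod 3$, but $7$ has order 3 modulo 9, so $\Phi_9$ is not totally split over $\mathbb{F}_7$. The sketch verifies that $\Phi_9(x) = x^6 + x^3 + 1 = (x^3 - 2)(x^3 - 4)$ over $\mathbb{F}_7$ and that both cubics are irreducible. Each root $\alpha \in \mathbb{F}_{7^3}$ therefore has trace equal to minus the $x^2$-coefficient of its minimal polynomial, which is $0$. This illustrates only the algebraic fact, not the attack or the authors' Maple code.

```python
def poly_mul_mod(a, b, q):
    """Multiply polynomials given as coefficient lists (index = degree) mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def cyclotomic_prime_power(p, k):
    """Phi_{p^k}(x) = sum_{i=0}^{p-1} x^(i * p^(k-1)), as a coefficient list."""
    step = p ** (k - 1)
    coeffs = [0] * ((p - 1) * step + 1)
    for i in range(p):
        coeffs[i * step] = 1
    return coeffs

p, k, q = 3, 2, 7                     # toy parameters: q = 1 mod p, not 1 mod p^k
phi = cyclotomic_prime_power(p, k)    # x^6 + x^3 + 1
# 2 and 4 are the roots of Phi_3(y) = y^2 + y + 1 in F_7, so substitute y = x^3
f1 = [(-2) % q, 0, 0, 1]              # x^3 - 2
f2 = [(-4) % q, 0, 0, 1]              # x^3 - 4
assert poly_mul_mod(f1, f2, q) == phi

# A cubic with no root in F_7 is irreducible; 2 and 4 are not cubes mod 7
cubes = {pow(a, 3, q) for a in range(q)}
assert 2 not in cubes and 4 not in cubes

# Tr(alpha) over F_7 = -(coefficient of x^2 in the minimal polynomial of alpha)
traces = [(-f[2]) % q for f in (f1, f2)]
print(traces)  # [0, 0]
```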
We propose a spectral collocation method to approximate the exact boundary control of the wave equation in a square domain. The idea is to introduce a suitable approximate control problem that we solve in the finite-dimensional space of polynomials of degree N in space. We prove that we can choose a sequence of discrete controls depending on the parameter N associated with the approximate control problem in such a way that they converge, as N goes to infinity, to a control of the continuous wave equation. Unlike other numerical approximations proposed in the literature, this one does not require regularization techniques and can be easily adapted to other equations and systems where the controllability of the continuous model is known. The method is illustrated with several examples in 1-d and in 2-d, the latter on a square domain. We also give numerical evidence of the highly accurate approximation inherent to spectral methods.
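A basic ingredient of spectral collocation is the differentiation matrix on Chebyshev-Gauss-Lobatto points. The sketch below follows the well-known `cheb` construction from Trefethen's *Spectral Methods in MATLAB*, translated to NumPy, and shows the rapid error decay behind the "highly accurate approximation" mentioned above. It is a generic building block, not the authors' control-approximation scheme.

```python
import numpy as np

def cheb(N):
    """Chebyshev-Gauss-Lobatto points x_j = cos(j*pi/N), j = 0..N, and the
    (N+1)x(N+1) collocation differentiation matrix D, so that D @ u(x)
    is the derivative of the degree-N polynomial interpolant of u."""
    if N == 0:
        return np.zeros((1, 1)), np.array([1.0])
    j = np.arange(N + 1)
    x = np.cos(np.pi * j / N)
    c = np.where((j == 0) | (j == N), 2.0, 1.0) * (-1.0) ** j
    dX = x[:, None] - x[None, :]              # dX[i, k] = x_i - x_k
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))               # diagonal chosen so rows sum to 0
    return D, x

for N in [4, 8, 16]:
    D, x = cheb(N)
    err = np.max(np.abs(D @ np.sin(x) - np.cos(x)))
    print(N, err)   # for analytic u the error decays geometrically in N
```

The diagonal is fixed by requiring that $D$ differentiate constants exactly (zero row sums), which is numerically more robust than using the closed-form diagonal entries.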
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate that this thesis will appeal to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
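The claim that residual networks are discretisations can be seen directly. A stack of residual blocks $y \leftarrow y + h f(y)$ is the explicit Euler method for $dy/dt = f(y)$. In this sketch the vector field $f(y) = \tanh(Wy + b)$ uses fixed random weights, a hypothetical stand-in for a trained layer. As depth grows with $h = T/\text{depth}$, the network output converges to the ODE solution, which we compute with a fine classical RK4 reference.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))      # fixed random weights (hypothetical layer)
b = rng.normal(size=2)

def vector_field(y):
    """One-layer 'neural' vector field f(y) = tanh(W y + b)."""
    return np.tanh(W @ y + b)

def resnet(y0, depth, T=1.0):
    """Residual blocks y <- y + h f(y), i.e. explicit Euler with h = T / depth."""
    h = T / depth
    y = y0.copy()
    for _ in range(depth):
        y = y + h * vector_field(y)
    return y

def rk4_reference(y0, steps=2000, T=1.0):
    """Accurate reference solution of dy/dt = f(y) using classical RK4."""
    h = T / steps
    y = y0.copy()
    for _ in range(steps):
        k1 = vector_field(y)
        k2 = vector_field(y + 0.5 * h * k1)
        k3 = vector_field(y + 0.5 * h * k2)
        k4 = vector_field(y + h * k3)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

y0 = np.array([1.0, -0.5])
ref = rk4_reference(y0)
for depth in [10, 100, 1000]:
    print(depth, np.linalg.norm(resnet(y0, depth) - ref))  # shrinks roughly like 1/depth
```

Sharing one weight matrix across all blocks is what makes the "infinitely deep ResNet" picture an ODE. Untied weights correspond to a time-dependent vector field.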
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can study in detail the inductive bias of architectures, hyperparameters, and optimizers.
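Criticality is easiest to see for ReLU networks at initialization. With zero biases and weights drawn from $N(0, C_W/n)$, the squared norm of the preactivations is multiplied by roughly $C_W/2$ per layer, so $C_W = 2$ (He initialization) keeps signals of order one while other values make them vanish or explode geometrically. The sketch uses forward signal norms as a proxy for the gradient behaviour discussed above. Width, depth, and seed are arbitrary illustrative choices.

```python
import numpy as np

def preactivation_ratio(c_w, width=500, depth=30, seed=0):
    """Ratio |z_L|^2 / |z_0|^2 of preactivation norms in a deep ReLU MLP with
    weights drawn from N(0, c_w / width) and zero biases."""
    rng = np.random.default_rng(seed)
    z0 = rng.normal(size=width)
    z = z0
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(c_w / width), size=(width, width))
        z = W @ np.maximum(z, 0.0)    # ReLU, then the next linear layer
    return np.sum(z**2) / np.sum(z0**2)

for c_w in [1.0, 2.0, 3.0]:
    # expected scaling per layer is about c_w / 2: vanishing, critical, exploding
    print(c_w, preactivation_ratio(c_w))
```

At finite width, the critical ratio still fluctuates from run to run, and these fluctuations grow with depth/width. This is the finite-width correction that the aspect-ratio analysis above quantifies.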
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be large. We therefore propose an alternative approach to constructing estimators that greatly reduces this error. Through a combination of theoretical analysis and numerical testing, the proposed estimators are shown to be unbiased, consistent, and robust. Moreover, we compare the classical and the proposed estimators in their power to estimate the causal quantities. The comparison spans a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is substantial when the causal effects are accounted for correctly.
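The cost of ignoring confounding can be shown on simulated data. The abstract does not specify the proposed estimators, so this sketch substitutes a standard technique, inverse-propensity weighting (IPW) with the true propensity (an idealization), and compares it with the naive difference in means. In the simulation, a hypothetical credit score `x` drives both the lender's decision `t` and the repayment `y`, and the true effect is `tau = 1.0`. No real data is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 200_000, 1.0                          # sample size, true causal effect
x = rng.normal(size=n)                         # confounder (e.g. a credit score)
propensity = 1.0 / (1.0 + np.exp(-x))          # higher score -> more likely approved
t = rng.random(n) < propensity                 # lender's decision (treatment)
y = tau * t + 3.0 * x + rng.normal(size=n)     # repayment depends on both t and x

# Naive estimator: ignores that treated borrowers have higher x on average
naive = y[t].mean() - y[~t].mean()
# IPW estimator: reweights each group by the inverse probability of its decision
ipw = np.mean(t * y / propensity) - np.mean((~t) * y / (1.0 - propensity))
print(f"naive={naive:.3f}  ipw={ipw:.3f}  truth={tau}")
```

In practice the propensity must be estimated, for example by a classifier, and extreme weights often need trimming or a doubly robust correction.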
Since their development, deep neural networks have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite these achievements, designing and training neural networks remain challenging and unpredictable processes. To lower the technical barriers for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. The review then focuses on the major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep neural networks. Next, it reviews the major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, their compatibility with major deep learning frameworks, and their extensibility with user-designed modules. The paper concludes with the problems that arise when HPO is applied to deep learning, a comparison of optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
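One practical aspect of defining a value range is choosing the sampling scale. The sketch runs random search over a learning rate sampled log-uniformly and a dropout rate sampled uniformly. It uses a hypothetical toy objective in place of actually training a network, and the ranges are illustrative assumptions. It also shows why the log scale matters: with linear sampling over $[10^{-5}, 10^{-1}]$, only about 1% of draws fall below $10^{-3}$.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-5, 1e-1                         # hypothetical learning-rate range

def sample_log_uniform(n):
    """Sample uniformly in log10 space, then map back to the original scale."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=n)

def toy_validation_loss(lr, dropout):
    """Hypothetical stand-in for training; best near lr = 1e-3, dropout = 0.3."""
    return (np.log10(lr) + 3.0) ** 2 + 4.0 * (dropout - 0.3) ** 2

# Random search: 50 trials over (learning rate, dropout)
trials = [(lr, rng.uniform(0.0, 0.5)) for lr in sample_log_uniform(50)]
best_lr, best_dropout = min(trials, key=lambda p: toy_validation_loss(*p))

# Share of samples below 1e-3 under log-uniform vs. linear-uniform sampling
frac_log = np.mean(sample_log_uniform(10_000) < 1e-3)           # about 0.5
frac_lin = np.mean(rng.uniform(low, high, size=10_000) < 1e-3)  # about 0.01
print(best_lr, best_dropout, frac_log, frac_lin)
```

Random search is the simplest baseline among the optimization algorithms such a review covers. Model-based methods such as Bayesian optimization replace the independent draws with proposals informed by earlier trials.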