An implicit Euler finite-volume scheme for a parabolic reaction-diffusion system modeling biofilm growth is analyzed and implemented. The system consists of a degenerate-singular diffusion equation for the biomass fraction, which is coupled to a diffusion equation for the nutrient concentration, and it is solved in a bounded domain with Dirichlet boundary conditions. By transforming the biomass fraction to an entropy-type variable, it is shown that the numerical scheme preserves the lower and upper bounds of the biomass fraction. The existence and uniqueness of a discrete solution and the convergence of the scheme are proved. Numerical experiments in one and two space dimensions illustrate, respectively, the rate of convergence in space of our scheme and the temporal evolution of the biomass fraction and the nutrient concentration.
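To make the time-stepping idea concrete, here is a minimal Python sketch of one ingredient of such a method: an implicit Euler finite-volume step for a scalar degenerate diffusion equation. The diffusion coefficient D(u) = u, the zero-flux boundary treatment, and all constants are illustrative assumptions; the coupling to the nutrient equation and the entropy-variable transformation described in the abstract are not reproduced.

```python
# Minimal 1D sketch of an implicit Euler finite-volume step for a degenerate
# diffusion equation u_t = (D(u) u_x)_x, with an assumed coefficient D(u) = u
# (degenerate at u = 0). Illustration of the time-stepping idea only; not the
# paper's coupled biofilm scheme or its entropy-variable formulation.
import numpy as np
from scipy.optimize import fsolve

N, L, dt = 50, 1.0, 1e-3                  # cells, domain length, time step
dx = L / N
x = (np.arange(N) + 0.5) * dx             # cell centres
u = 0.5 * np.exp(-50 * (x - 0.5) ** 2)    # initial biomass-like profile in (0, 1)

def D(u):
    return u                               # assumed degenerate diffusion coefficient

def residual(u_new, u_old):
    # fluxes at interior faces, arithmetic mean of D; zero flux at the boundary
    Dface = 0.5 * (D(u_new[:-1]) + D(u_new[1:]))
    flux = Dface * (u_new[1:] - u_new[:-1]) / dx
    div = np.zeros_like(u_new)
    div[:-1] += flux / dx
    div[1:] -= flux / dx
    return u_new - u_old - dt * div        # implicit Euler residual

for _ in range(100):                       # 100 implicit time steps
    u = fsolve(residual, u, args=(u,))

print(u.min(), u.max())                    # stays (approximately) within [0, 1] in this toy run
```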
We introduce a nonparametric graphical model for discrete node variables based on additive conditional independence. Additive conditional independence is a three-way statistical relation that shares similar properties with conditional independence by satisfying the semi-graphoid axioms. Based on this relation, we build an additive graphical model for discrete variables that does not suffer from the restrictions of a parametric model such as the Ising model. We develop an estimator of the new graphical model via penalized estimation of the discrete version of the additive precision operator and establish the consistency of the estimator in the ultrahigh-dimensional setting. Along with these methodological developments, we also exploit the properties of discrete random variables to uncover a deeper relation between additive conditional independence and conditional independence than was previously known. The new graphical model reduces to a conditional independence graphical model under certain sparsity conditions. We conduct simulation experiments and an analysis of an HIV antiretroviral therapy data set to compare the new method with existing ones.
This paper deals with a special type of Lyapunov function, namely the solution of Zubov's equation. Such a function can be used to characterize the domain of attraction for systems of ordinary differential equations. We derive and prove an integral-form solution to Zubov's equation. For numerical computation, we develop two data-driven methods: one based on the integration of an augmented system of differential equations, and the other based on deep learning. The former is effective for systems with a relatively low state-space dimension, while the latter is developed for high-dimensional problems. The deep learning method is applied to a New England 10-generator power system model. We prove that a neural network approximation exists for the Lyapunov function of power systems such that the approximation error is a cubic polynomial of the number of generators. The error convergence rate as a function of n, the number of neurons, is also proved.
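As a rough illustration of the first (augmented-ODE) method, the sketch below evaluates a Zubov-type Lyapunov function V(x0) = 1 - exp(-∫ Psi(phi(t; x0)) dt) by appending one extra state that accumulates the integral along the trajectory. The scalar toy system and the weight Psi(x) = x^2 are assumptions made for illustration, not the paper's formulation or its power-system example.

```python
# Augmented-ODE evaluation of a Zubov-type Lyapunov function on a toy system.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    x, w = y
    dx = -x + x ** 3              # toy scalar system with domain of attraction (-1, 1)
    dw = x ** 2                   # accumulate Psi(x) = x^2 along the trajectory
    return [dx, dw]

def zubov_value(x0, T=50.0):
    sol = solve_ivp(rhs, (0.0, T), [x0, 0.0], rtol=1e-8, atol=1e-10)
    return 1.0 - np.exp(-sol.y[1, -1])

for x0 in [0.0, 0.5, 0.9, 0.99]:
    print(x0, zubov_value(x0))    # values approach 1 towards the boundary of attraction
```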
We study Hybridizable Discontinuous Galerkin (HDG) discretizations for a class of nonlinear interior elliptic boundary value problems posed in curved domains where both the source term and the diffusion coefficient are nonlinear. We consider the cases where the nonlinear diffusion coefficient depends on the solution and on the gradient of the solution. To sidestep the need for curved elements, the discrete solution is computed on a polygonal subdomain that is not assumed to interpolate the true boundary, giving rise to an unfitted computational mesh. We show that, under mild assumptions on the source term and the computational domain, the discrete systems are well posed. Furthermore, we provide a priori error estimates showing that the discrete solution has the optimal order of convergence as long as the distance between the curved boundary and the computational boundary remains of the same order of magnitude as the mesh parameter.
Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but it is stylized and disregards computational efficiency. The extent to which it represents gradient descent is an open question in the theory of deep learning. The current paper studies this question. Viewing gradient descent as an approximate numerical solution to the initial value problem of gradient flow, we find that the degree of approximation depends on the curvature around the gradient flow trajectory. We then show that, over deep neural networks with homogeneous activations, gradient flow trajectories enjoy favorable curvature, suggesting they are well approximated by gradient descent. This finding allows us to translate an analysis of gradient flow over deep linear neural networks into a guarantee that gradient descent efficiently converges to a global minimum almost surely under random initialization. Experiments suggest that over simple deep neural networks, gradient descent with conventional step sizes is indeed close to gradient flow. We hypothesize that the theory of gradient flows will unravel mysteries behind deep learning.
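The viewpoint described here, gradient descent as a forward Euler numerical solution of the gradient flow ODE, can be illustrated on a toy quadratic loss; the objective, step size, and horizon below are assumptions, not the paper's setting.

```python
# Gradient descent compared against an accurately integrated gradient flow.
import numpy as np
from scipy.integrate import solve_ivp

A = np.diag([1.0, 10.0])                  # toy quadratic loss L(w) = 0.5 * w^T A w
grad = lambda w: A @ w

w0 = np.array([1.0, 1.0])
eta, steps = 0.05, 40

# Gradient descent = forward Euler discretization of the gradient flow
w_gd = w0.copy()
for _ in range(steps):
    w_gd = w_gd - eta * grad(w_gd)

# High-accuracy gradient flow trajectory at the same total "time" eta * steps
sol = solve_ivp(lambda t, w: -grad(w), (0.0, eta * steps), w0, rtol=1e-10, atol=1e-12)
w_flow = sol.y[:, -1]

print("GD:  ", w_gd)
print("flow:", w_flow)
print("gap: ", np.linalg.norm(w_gd - w_flow))   # small when curvature * step size is small
```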
Certain applications that analyze damping effects require the solution of quadratic eigenvalue problems (QEPs). We use refined isogeometric analysis (rIGA) to solve quadratic eigenproblems. rIGA discretization, while conserving desirable properties of maximum-continuity isogeometric analysis (IGA), reduces the interconnection between degrees of freedom by adding low-continuity basis functions. This connectivity reduction in rIGA's algebraic system results in faster matrix LU factorizations when using multifrontal direct solvers. We compare computational costs of rIGA versus those of IGA when employing Krylov eigensolvers to solve quadratic eigenproblems arising in 2D vector-valued multifield problems. For large problem sizes, the eigencomputation cost is governed by the cost of LU factorization, followed by costs of several matrix-vector and vector-vector multiplications, which correspond to Krylov projections. We minimize the computational cost by introducing C^0 and C^1 separators at specific element interfaces for our rIGA generalizations of the curl-conforming Nedelec and divergence-conforming Raviart-Thomas finite elements. Let p be the polynomial degree of basis functions; the LU factorization is up to O((p-1)^2) times faster when using rIGA compared to IGA in the asymptotic regime. Thus, rIGA theoretically improves the total eigencomputation cost by O((p-1)^2) for sufficiently large problem sizes. Yet, in practical cases of moderate-size eigenproblems, the improvement rate deteriorates as the number of computed eigenvalues increases because of multiple matrix-vector and vector-vector operations. Our numerical tests show that rIGA accelerates the solution of quadratic eigensystems by O(p-1) for moderately sized problems when we seek to compute a reasonable number of eigenvalues.
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to a reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free, provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simplicity of implementation, low memory requirements, and computational and sample complexities independent of the number of states. We further present an equivalent no-regret online-learning interpretation.
The recent statistical finite element method (statFEM) provides a coherent statistical framework for synthesising finite element models with observed data. By embedding uncertainty inside the governing equations, finite element solutions are updated to give a posterior distribution which quantifies all sources of uncertainty associated with the model. However, to incorporate all sources of uncertainty, one must integrate over the uncertainty associated with the model parameters, which is the well-known forward problem of uncertainty quantification. In this paper, we make use of Langevin dynamics to solve the statFEM forward problem, studying the utility of the unadjusted Langevin algorithm (ULA), a Metropolis-free Markov chain Monte Carlo sampler, to build a sample-based characterisation of this otherwise intractable measure. Due to the structure of the statFEM problem, these methods are able to solve the forward problem without explicit full PDE solves, requiring only sparse matrix-vector products. ULA is also gradient-based and hence scales to high numbers of degrees of freedom. Leveraging the theory behind Langevin-based samplers, we provide theoretical guarantees on sampler performance, demonstrating convergence, for both the prior and the posterior, in the Kullback-Leibler divergence and in the Wasserstein-2 distance, with further results on the effect of preconditioning. Numerical experiments demonstrating the efficacy of the sampler are provided for both the prior and the posterior, and an accompanying Python package is included.
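For reference, the unadjusted Langevin algorithm itself is a short recursion; the sketch below samples a small Gaussian target with it. In the statFEM setting the log-density gradient would involve only sparse matrix-vector products, but the toy precision matrix, step size, and sample counts here are purely illustrative.

```python
# Unadjusted Langevin algorithm (ULA) on a toy Gaussian target N(0, P^{-1}).
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[2.0, 0.5], [0.5, 1.0]])     # toy precision matrix (assumed)

def grad_log_pi(x):
    return -P @ x                           # gradient of the log-density, up to a constant

def ula(x0, step, n_samples):
    x = np.array(x0, dtype=float)
    out = np.empty((n_samples, x.size))
    for k in range(n_samples):
        noise = rng.standard_normal(x.size)
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * noise
        out[k] = x
    return out

samples = ula([3.0, -3.0], step=0.05, n_samples=20000)
print(np.cov(samples[5000:].T))             # approaches P^{-1}, up to discretization bias
print(np.linalg.inv(P))
```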
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
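The ResNet/neural-ODE link referred to here can be illustrated with a toy forward pass: when residual updates are scaled by 1/L and the weights vary smoothly with depth, the network output converges as the depth L grows. The weight function and activation below are assumptions chosen only to make this limit visible, not a trained network from the paper's experiments.

```python
# Toy ResNet forward pass as a forward Euler discretization of an ODE.
import numpy as np

def resnet_forward(x, L):
    for l in range(L):
        t = l / L                                   # "layer time" in [0, 1)
        W = np.array([[np.cos(t), -np.sin(t)],      # weights = smooth function of depth (assumed)
                      [np.sin(t),  np.cos(t)]])
        x = x + (1.0 / L) * np.tanh(W @ x)          # residual update scaled by 1/L
    return x

x0 = np.array([1.0, 0.5])
for L in [10, 100, 1000]:
    print(L, resnet_forward(x0, L))                 # outputs converge as L grows (ODE limit)
```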
Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure. Although faster non-AR models were recently reported, they may be prohibitively complicated due to the use of a distilling training method and the blend of other disparate training criteria. This study proposes a non-AR neural source-filter waveform model that can be directly trained using spectrum-based training criteria and the stochastic gradient descent method. Given the input acoustic features, the proposed model first uses a source module to generate a sine-based excitation signal and then uses a filter module to transform the excitation signal into the output speech waveform. Our experiments demonstrated that the proposed model generates waveforms at least 100 times faster than the AR WaveNet and that the quality of its synthetic speech is close to that of speech generated by the AR WaveNet. Ablation test results showed that both the sine-wave excitation signal and the spectrum-based training criteria were essential to the performance of the proposed model.
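A minimal sketch of the source-module idea, generating a sine-based excitation signal from a frame-level F0 contour, is given below. The sampling rate, frame shift, amplitudes, and noise level are assumed values, and the neural filter module that would transform this excitation into the output waveform is omitted.

```python
# Sine-based excitation from a frame-level F0 contour (source-module sketch).
import numpy as np

fs, frame_shift = 16000, 80            # 16 kHz audio, 5 ms frame shift (assumed)
f0 = np.array([0, 0, 120, 125, 130, 128, 0, 0], dtype=float)   # Hz; 0 marks unvoiced frames

f0_up = np.repeat(f0, frame_shift)     # upsample F0 to the sample level
voiced = f0_up > 0

# Integrate the instantaneous frequency to obtain the phase of the sine excitation
phase = 2 * np.pi * np.cumsum(f0_up / fs)
excitation = np.where(voiced, 0.1 * np.sin(phase), 0.0)

# Add a small noise component; unvoiced regions become noise-only
rng = np.random.default_rng(0)
excitation = excitation + 0.003 * rng.standard_normal(excitation.size)

print(excitation.shape)                # this signal would be fed to the neural filter module
```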
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
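As a reminder of the building block applied to the dual problem, the sketch below runs Nesterov's accelerated gradient method on a toy smooth, strongly convex quadratic. The objective and constants are illustrative assumptions; the distributed execution, the dual construction, and the dependence on the spectral gap of the interaction matrix are not reproduced here.

```python
# Nesterov's accelerated gradient method on a smooth, strongly convex quadratic.
import numpy as np

A = np.diag([1.0, 50.0])                 # toy objective f(x) = 0.5 * x^T A x (assumed)
grad = lambda x: A @ x
L_smooth, mu = 50.0, 1.0                 # smoothness and strong-convexity constants
kappa = L_smooth / mu

x = y = np.array([1.0, 1.0])
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # momentum for the strongly convex case
for _ in range(100):
    x_next = y - grad(y) / L_smooth      # gradient step from the extrapolated point
    y = x_next + beta * (x_next - x)     # momentum extrapolation
    x = x_next

print(x)                                  # contracts at roughly (1 - 1/sqrt(kappa)) per iteration
```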