
We develop an error estimator for neural network approximations of PDEs. The proposed approach is based on the dual weighted residual (DWR) estimator. It is designed to serve as a stopping criterion that guarantees the accuracy of the solution independently of the design of the neural network training. The result is illustrated with computational examples for the Laplace and Stokes problems.
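
For a linear model problem, the idea behind a DWR estimate can be written as a short identity. This is a standard textbook form, not taken from the abstract. The bilinear form $a$, right-hand side $f$, linear goal functional $J$, function space $V$, network approximation $u_N \in V$, and adjoint solution $z$ are notation assumed here for illustration.

$$
a(u,\varphi) = f(\varphi) \quad \forall \varphi \in V, \qquad a(\varphi, z) = J(\varphi) \quad \forall \varphi \in V,
$$

$$
J(u) - J(u_N) = a(u - u_N, z) = f(z) - a(u_N, z) =: \rho(u_N)(z).
$$

The first equality uses the adjoint problem with $\varphi = u - u_N$, and the second uses the primal problem with $\varphi = z$. A trained network generally satisfies no Galerkin orthogonality, so the residual $\rho(u_N)$ is weighted by $z$ itself. In practice $z$ is unknown and has to be approximated, so the computed quantity is an estimate rather than the exact error.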

Related content

Estimation/Estimator


Estimation/Estimator · Optimizer · Tractable · Performer · Principle

September 24, 2021

Entropic estimation of optimal transport maps

Aram-Alexandre Pooladian,Jonathan Niles-Weed

from arxiv, 30 pages, 9 figures

We develop a computationally tractable method for estimating the optimal map between two distributions over $\mathbb{R}^d$ with rigorous finite-sample guarantees. Leveraging an entropic version of Brenier's theorem, we show that our estimator -- the barycentric projection of the optimal entropic plan -- is easy to compute using Sinkhorn's algorithm. As a result, unlike current approaches for map estimation, which are slow to evaluate when the dimension or number of samples is large, our approach is parallelizable and extremely efficient even for massive data sets. Under smoothness assumptions on the optimal map, we show that our estimator enjoys comparable statistical performance to other estimators in the literature, but with much lower computational cost. We showcase the efficacy of our proposed estimator through numerical examples. Our proofs are based on a modified duality principle for entropic optimal transport and on a method for approximating optimal entropic plans due to Pal (2019).
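
The estimator described above, the barycentric projection of an entropic plan computed by Sinkhorn's algorithm, can be illustrated with a minimal sketch. It assumes one-dimensional samples with uniform weights, the cost $\frac{1}{2}|x-y|^2$, and a fixed iteration count. The function name `entropic_map` and the parameters `eps` and `iters` are hypothetical, and the map is evaluated only at the source samples. It uses plain Python lists for self-containment; a practical implementation would use an array library and log-domain updates for small `eps`.

```python
import math


def entropic_map(xs, ys, eps=0.1, iters=500):
    """Barycentric projection of the entropic plan between samples xs and ys (1-D)."""
    n, m = len(xs), len(ys)
    # Gibbs kernel for the quadratic cost 0.5 * (x - y)^2
    K = [[math.exp(-0.5 * (x - y) ** 2 / eps) for y in ys] for x in xs]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Sinkhorn scaling: match row sums to 1/n, then column sums to 1/m
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    mapped = []
    for i in range(n):
        # Row i of the plan, up to the scaling u[i], which cancels in the ratio
        row = [K[i][j] * v[j] for j in range(m)]
        # Conditional mean of y given x_i under the entropic plan
        mapped.append(sum(w * y for w, y in zip(row, ys)) / sum(row))
    return mapped


if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [x + 1.0 for x in xs]
    print(entropic_map(xs, ys))
```

Each mapped value is a weighted average of the target samples, so it always lies between the smallest and largest `ys`. Entropic smoothing pulls the values at the edges toward the center, and this bias shrinks as `eps` decreases.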

Kernelization · Reproducing kernel Hilbert space · Estimation/Estimator · Markov chain · Policy evaluation

September 24, 2021

Optimal policy evaluation using kernel-based temporal difference methods

Yaqi Duan,Mengdi Wang,Martin J. Wainwright

We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD) estimate; in the population limit of infinite data, it corresponds to the fixed point of a projected Bellman operator defined by the associated reproducing kernel Hilbert space. The estimator itself is obtained by computing the projected fixed point induced by a regularized version of the empirical operator; due to the underlying kernel structure, this reduces to solving a linear system involving kernel matrices. We analyze the error of this estimate in the $L^2(\mu)$-norm, where $\mu$ denotes the stationary distribution of the underlying Markov chain. Our analysis imposes no assumptions on the transition operator of the Markov chain, but rather only conditions on the reward function and population-level kernel LSTD solutions. We use empirical process theory techniques to derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator, as well as the instance-dependent variance of the Bellman residual error. In addition, we prove minimax lower bounds over sub-classes of MRPs, which shows that our rate is optimal in terms of the sample size $n$ and the effective horizon $H = (1 - \gamma)^{-1}$. Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error. Notably, it is only parametric and near-parametric problems that can ever achieve the worst-case cubic scaling.
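
The reduction to a linear system can be sketched as follows, under assumptions not stated in the abstract: $n$ observed transitions $(x_i, r_i, x_i')$, a kernel $k$ with feature map $\phi(x) = k(\cdot, x)$, a regularization parameter $\lambda > 0$, and an estimate of the form $\hat V = \sum_j \alpha_j k(\cdot, x_j)$. The paper's exact normalization and any centering of rewards may differ.

$$
\frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\big(\hat V(x_i) - \gamma \hat V(x_i') - r_i\big) + \lambda \hat V = 0
$$

Substituting the expansion of $\hat V$ and matching the coefficients of each $\phi(x_i)$ gives

$$
K_{ij} = k(x_i, x_j), \qquad K'_{ij} = k(x_i', x_j), \qquad (K - \gamma K' + n\lambda I)\,\alpha = r,
$$

where $r = (r_1, \dots, r_n)^\top$. Any $\alpha$ solving this system yields a solution of the fixed-point equation above.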

Approximation error · Approximation · ReLU · Networking · Width

September 24, 2021

Deep Network Approximation for Smooth Functions

Jianfeng Lu,Zuowei Shen,Haizhao Yang,Shijun Zhang

This paper establishes the (nearly) optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.

Estimation/Estimator · Approximation · Performer · tuning · Regularization term

September 24, 2021

A posteriori error estimates via equilibrated stress reconstructions for contact problems approximated by Nitsche's method

Daniele Antonio Di Pietro,Ilaria Fontana,Kyrylo Kazymyrenko

We present an a posteriori error estimate based on equilibrated stress reconstructions for the finite element approximation of a unilateral contact problem with weak enforcement of the contact conditions. We start by proving a guaranteed upper bound for the dual norm of the residual. This norm is shown to control the natural energy norm up to a boundary term, which can be removed under a saturation assumption. The basic estimate is then refined to distinguish the different components of the error, and is used as a starting point to design an algorithm including adaptive stopping criteria for the nonlinear solver and automatic tuning of a regularization parameter. We then discuss an actual way of computing the stress reconstruction based on the Arnold-Falk-Winther finite elements. Finally, after briefly discussing the efficiency of our estimators, we showcase their performance on a panel of numerical tests.

Monte Carlo · Learned · Time step · Reducible · SC

September 23, 2021

The Seven-League Scheme: Deep learning for large time step Monte Carlo simulations of stochastic differential equations

Shuaiqiang Liu,Lech A. Grzelak,Cornelis W. Oosterlee

from arxiv, 26 pages

We propose an accurate data-driven numerical scheme to solve Stochastic Differential Equations (SDEs), by taking large time steps. The SDE discretization is built up by means of a polynomial chaos expansion method, on the basis of accurately determined stochastic collocation (SC) points. By employing an artificial neural network to learn these SC points, we can perform Monte Carlo simulations with large time steps. Error analysis confirms that this data-driven scheme results in accurate SDE solutions in the sense of strong convergence, provided the learning methodology is robust and accurate. With a method variant called the compression-decompression collocation and interpolation technique, we can drastically reduce the number of neural network functions that have to be learned, so that computational speed is enhanced. Numerical experiments confirm a high-quality strong convergence error when using large time steps, and the novel scheme outperforms some classical numerical SDE discretizations. Some applications, here in financial option valuation, are also presented.

Estimation/Estimator · PCA · Statistic · Sparse · Multivariate Gaussian distribution

September 23, 2021

Sparse PCA: A New Scalable Estimator Based On Integer Programming

Kayhan Behdin,Rahul Mazumder

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and can be solved to global optimality, leading to estimators that are known to enjoy optimal statistical properties. However, current MIP algorithms for SPCA are unable to scale beyond instances with a thousand features or so. In this paper, we propose a new estimator for SPCA which can be formulated as a MIP. Different from earlier work, we make use of the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator. We establish statistical guarantees for our proposed estimator in terms of estimation error and support recovery. We propose a custom algorithm to solve the MIP which is significantly more scalable than off-the-shelf solvers; and demonstrate that our approach can be much more computationally attractive compared to earlier exact MIP-based approaches for the SPCA problem. Our numerical experiments on synthetic and real datasets show that our algorithms can address problems with up to 20000 features in minutes; and generally result in favorable statistical properties compared to existing popular approaches for SPCA.

Optimizer · Performer · Feasible · Controller · Quadratic programming

September 22, 2021

Recursive Feasibility Guided Optimal Parameter Adaptation of Differential Convex Optimization Policies for Safety-Critical Systems

Hardik Parwana,Dimitra Panagou

Quadratic programs (QPs) that enforce control barrier functions (CBFs) have become popular for safety-critical control synthesis, in part due to their ease of implementation and constraint specification. The construction of valid CBFs, however, is not straightforward, and for arbitrarily chosen parameters of the QP, the system trajectories may enter states at which the QP either eventually becomes infeasible or fails to achieve the desired performance. In this work, we pose the control synthesis problem as a differential policy whose parameters are optimized for performance over a time horizon at a high level, thus resulting in a bi-level optimization routine. In the absence of knowledge of the set of feasible parameters, we develop a Recursive Feasibility Guided Gradient Descent approach for updating the parameters of the QP so that the new solution performs at least as well as the previous solution. By considering the dynamical system as a directed graph over time, this work presents a novel way of optimizing the performance of a QP controller over a time horizon for multiple CBFs by (1) using the gradient of its solution with respect to its parameters by employing sensitivity analysis, and (2) backpropagating these as well as system dynamics gradients to update parameters while maintaining feasibility of the QPs.

Machine Learning · Jacobian matrix · Jacobi · ONCE · Estimation/Estimator

September 22, 2021

An automatic differentiation system for the age of differential privacy

Dmitrii Usynin,Alexander Ziller,Moritz Knolle,Daniel Rueckert,Georgios Kaissis

from arxiv, 8 pages

We introduce Tritium, an automatic differentiation-based sensitivity analysis framework for differentially private (DP) machine learning (ML). Optimal noise calibration in this setting requires efficient Jacobian matrix computations and tight bounds on the L2-sensitivity. Our framework achieves these objectives by relying on a functional analysis-based method for sensitivity tracking, which we briefly outline. This approach interoperates naturally and seamlessly with static graph-based automatic differentiation, which enables order-of-magnitude improvements in compilation times compared to previous work. Moreover, we demonstrate that optimising the sensitivity of the entire computational graph at once yields substantially tighter estimates of the true sensitivity compared to interval bound propagation techniques. Our work naturally befits recent developments in DP such as individual privacy accounting, aiming to offer improved privacy-utility trade-offs, and represents a step towards the integration of accessible machine learning tooling with advanced privacy accounting systems.

Optimizer · Reducible · Approximation · Controller · Principle

June 29, 2020

Differential Dynamic Programming Neural Optimizer

Guan-Horng Liu,Tianrong Chen,Evangelos A. Theodorou

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameters over existing methods. We show that the algorithm is competitive against state-of-the-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.

Networking · MoDELS · Neural Networks · Latent variable/Hidden variable · Continuity

October 3, 2018

Neural Ordinary Differential Equations

Ricky T. Q. Chen,Yulia Rubanova,Jesse Bettencourt,David Duvenaud

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
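
The forward pass described above, a network that parameterizes the derivative of the hidden state with the output obtained from an ODE solver, can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation. A fixed-step Runge-Kutta (RK4) integrator stands in for the black-box solver, the dynamics are a single tanh layer with hand-set weights, and the names `rk4_solve` and `make_dynamics` are hypothetical. Training and backpropagation through the solver are not shown.

```python
import math


def rk4_solve(f, h0, t0, t1, steps=100):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step RK4."""
    h, t = list(h0), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f([x + 0.5 * dt * k for x, k in zip(h, k1)], t + 0.5 * dt)
        k3 = f([x + 0.5 * dt * k for x, k in zip(h, k2)], t + 0.5 * dt)
        k4 = f([x + dt * k for x, k in zip(h, k3)], t + dt)
        # Weighted average of the four slope estimates
        h = [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(h, k1, k2, k3, k4)]
        t += dt
    return h


def make_dynamics(W, b):
    """Return f(h, t) = tanh(W h + b), a one-layer network for the derivative."""
    def f(h, t):
        return [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                for row, bias in zip(W, b)]
    return f


if __name__ == "__main__":
    # Hypothetical weights; in a trained model these would be learned
    f = make_dynamics([[0.0, -1.0], [1.0, 0.0]], [0.0, 0.0])
    # The "output of the network" is the hidden state at the final time
    print(rk4_solve(f, [1.0, 0.0], 0.0, 1.0))
```

Changing `steps` trades accuracy for speed, a fixed-step analogue of the precision and speed trade-off mentioned in the abstract, which an adaptive solver would manage through its error tolerance.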