The discrete gradient methods are integrators designed to preserve invariants of ordinary differential equations. From a formal series expansion of a subclass of these methods, we derive conditions for arbitrarily high order. We derive specific results for the average vector field discrete gradient, from which we get P-series methods in the general case, and B-series methods for canonical Hamiltonian systems. Higher order schemes are presented, and their applications are demonstrated on the H\'enon-Heiles system and a Lotka-Volterra system, and on both the training and integration of a pendulum system learned from data by a neural network.
We consider a Johnson-N\'ed\'elec FEM-BEM coupling, which is a direct and non-symmetric coupling of finite and boundary element methods, in order to solve interface problems for the magnetostatic Maxwell's equations with the magnetic vector potential ansatz. In the FEM-domain, equations may be non-linear, whereas they are exclusively linear in the BEM-part to guarantee the existence of a fundamental solution. First, the weak problem is formulated in quotient spaces to avoid resolving to a saddle point problem. Second, we establish in this setting well-posedness of the arising problem using the framework of Lipschitz and strongly monotone operators as well as a stability result for a special type of non-linearity, which is typically considered in magnetostatic applications. Then, the discretization is performed in the isogeometric context, i.e., the same type of basis functions that are used for geometry design are considered as ansatz functions for the discrete setting. In particular, NURBS are employed for geometry considerations, and B-Splines, which can be understood as a special type of NURBS, for analysis purposes. In this context, we derive a priori estimates w.r.t. h-refinement, and point out to an interesting behavior of BEM, which consists in an amelioration of the convergence rates, when a functional of the solution is evaluated in the exterior BEM-domain. This improvement may lead to a doubling of the convergence rate under certain assumptions. Finally, we end the paper with a numerical example to illustrate the theoretical results, along with a conclusion and an outlook.
Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies in settings with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s) and period(s). We show that the standard SC estimator is generally biased under random assignment. We propose a Modified Unbiased Synthetic Control (MUSC) estimator that guarantees unbiasedness under random assignment and derive its exact, randomization-based, finite-sample variance. We also propose an unbiased estimator for this variance. We document in settings with real data that under random assignment, SC-type estimators can have root mean-squared errors that are substantially lower than that of other common estimators. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected.
Many force-gradient explicit symplectic integration algorithms have been designed for the Hamiltonian $H=T (\mathbf{p})+V(\mathbf{q})$ with a kinetic energy $T(\mathbf{p})=\mathbf{p}^2/2$ in the existing references. When the force-gradient operator is appropriately adjusted as a new operator, they are still suitable for a class of Hamiltonian problems $H=K(\mathbf{p},\mathbf{q})+V(\mathbf{q})$ with \emph{integrable} part $K(\mathbf{p},\mathbf{q}) = \sum_{i=1}^{n} \sum_{j=1}^{n}a_{ij}p_ip_j+\sum_{i=1}^{n} b_ip_i$, where $a_{ij}=a_{ij}(\textbf{q})$ and $b_i=b_i(\textbf{q})$ are functions of coordinates $\textbf{q}$. The newly adjusted operator is not a force-gradient operator but is similar to the momentum-version operator associated to the potential $V$. The newly extended (or adjusted) algorithms are no longer solvers of the original Hamiltonian, but are solvers of slightly modified Hamiltonians. They are explicit symplectic integrators with time reversibility and time symmetry. Numerical tests show that the standard symplectic integrators without the new operator are generally poorer than the corresponding extended methods with the new operator in computational accuracies and efficiencies. The optimized methods have better accuracies than the corresponding non-optimized methods. Among the tested symplectic methods, the two extended optimized seven-stage fourth-order methods of Omelyan, Mryglod and Folk exhibit the best numerical performance. As a result, one of the two optimized algorithms is used to study the orbital dynamical features of a modified H\'{e}non-Heiles system and a spring pendulum. These extended integrators allow for integrations in Hamiltonian problems, such as the spiral structure in self-consistent models of rotating galaxies and the spiral arms in galaxies.
Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
In this work, we investigate various approaches that use learning from training data to solve inverse problems, following a bi-level learning approach. We consider a general framework for optimal inversion design, where training data can be used to learn optimal regularization parameters, data fidelity terms, and regularizers, thereby resulting in superior variational regularization methods. In particular, we describe methods to learn optimal $p$ and $q$ norms for ${\rm L}^p-{\rm L}^q$ regularization and methods to learn optimal parameters for regularization matrices defined by covariance kernels. We exploit efficient algorithms based on Krylov projection methods for solving the regularized problems, both at training and validation stages, making these methods well-suited for large-scale problems. Our experiments show that the learned regularization methods perform well even when there is some inexactness in the forward operator, resulting in a mixture of model and measurement error.
We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\Phi: \mathbb{R}^d \rightarrow \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \Phi$ and $\nabla^2 \Phi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p\|\nabla\Phi(x(t))\|^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of~\citet{Monteiro-2013-Accelerated}, it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the $p$-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis.
In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. At the same time, we present an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a momentum variance-reduced technique. Moreover, we introduce a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. Specifically, we prove that BGPO achieves the sample complexity of $\tilde{O}(\epsilon^{-4})$ for finding $\epsilon$-stationary point only requiring one trajectory at each iteration, and VR-BGPO reaches the best known sample complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which also only requires one trajectory at each iteration. In particular, by using different Bregman divergences, our methods unify many existing policy optimization algorithms and their new variants such as the existing (variance-reduced) policy gradient algorithms and (variance-reduced) natural policy gradient algorithms. Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.
Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics system, learning molecular fingerprints, predicting protein interface, and classifying diseases require a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures, like the dependency tree of sentences and the scene graph of images, is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with arbitrary depth. Although the primitive GNNs have been found difficult to train for a fixed point, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful learning with them. In recent years, systems based on variants of graph neural networks such as graph convolutional network (GCN), graph attention network (GAT), gated graph neural network (GGNN) have demonstrated ground-breaking performance on many tasks mentioned above. In this survey, we provide a detailed review over existing graph neural network models, systematically categorize the applications, and propose four open problems for future research.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.