In this paper we introduce a procedure for identifying optimal methods within parametric families of numerical schemes for initial value problems in partial differential equations. The procedure maximizes accuracy by adaptively computing optimal parameters that minimize a defect-based estimate of the local error at each time-step. Viable refinements are proposed to reduce the computational overhead of solving the optimization problem and to maintain the conservation properties of the original methods. We apply the new strategy to recently introduced families of conservative schemes for the Korteweg-de Vries equation and for a nonlinear heat equation. Numerical tests demonstrate the improved efficiency of the new technique in comparison with existing methods.
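The per-step parameter selection can be sketched in miniature. The code below applies the idea to a scalar ODE with the one-parameter theta-method family; the defect definition and all names are illustrative assumptions, not the paper's actual conservative PDE schemes or estimator.

```python
def theta_step(y, h, f, theta):
    # One step of the theta-method y_{n+1} = y_n + h*((1-theta)*f(y_n) + theta*f(y_{n+1})),
    # solved by fixed-point iteration (adequate for this mild scalar test).
    yn1 = y + h * f(y)
    for _ in range(50):
        yn1 = y + h * ((1 - theta) * f(y) + theta * f(yn1))
    return yn1

def defect(y, yn1, h, f):
    # A cheap defect-based local error surrogate: how far the step's
    # average slope is from the vector field at the interval midpoint.
    return abs((yn1 - y) / h - f(0.5 * (y + yn1)))

def adaptive_theta_step(y, h, f, thetas):
    # At each step, pick the family parameter minimizing the defect estimate.
    best = min(thetas, key=lambda th: defect(y, theta_step(y, h, f, th), h, f))
    return theta_step(y, h, f, best), best
```

For the linear test problem $y' = -y$ this defect vanishes exactly at $\theta = 1/2$ (the implicit midpoint rule), so the adaptive selection recovers the most accurate member of the family.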
We consider a Johnson-N\'ed\'elec FEM-BEM coupling, i.e., a direct and non-symmetric coupling of finite and boundary element methods, in order to solve interface problems for the magnetostatic Maxwell equations with the magnetic vector potential ansatz. In the FEM domain the equations may be non-linear, whereas they are required to be linear in the BEM part to guarantee the existence of a fundamental solution. First, the weak problem is formulated in quotient spaces to avoid resorting to a saddle point problem. Second, we establish well-posedness of the arising problem in this setting using the framework of Lipschitz and strongly monotone operators, together with a stability result for a special type of non-linearity that is typically considered in magnetostatic applications. Then, the discretization is performed in the isogeometric context, i.e., the same type of basis functions used for geometry design serve as ansatz functions for the discrete setting. In particular, NURBS are employed for the geometry, and B-splines, which can be understood as a special case of NURBS, for analysis purposes. In this context, we derive a priori estimates w.r.t. h-refinement and point out an interesting behavior of BEM, namely an improvement of the convergence rates when a functional of the solution is evaluated in the exterior BEM domain. Under certain assumptions, this improvement can double the convergence rate. Finally, we close the paper with a numerical example that illustrates the theoretical results, along with a conclusion and an outlook.
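The B-spline ansatz functions mentioned above are defined by the Cox-de Boor recursion; a minimal sketch for a single univariate basis function follows (illustrative only; NURBS are weighted rational combinations of these, and the paper works with multivariate tensor-product spaces).

```python
def bspline_basis(i, p, knots, t):
    # Cox-de Boor recursion for the i-th B-spline basis function of degree p.
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] != knots[i]:  # guard against repeated knots
        left = (t - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, knots, t)
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = (knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1]) * bspline_basis(i + 1, p - 1, knots, t)
    return left + right
```

On an open knot vector the basis functions form a partition of unity on the parametric domain, which is one reason they are convenient both for geometry description and for analysis.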
High-order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non-convex and discontinuous nature of this problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods for high-order clustering: the \emph{high-order Lloyd algorithm} (HLloyd) and \emph{high-order spectral clustering} (HSC). Convergence guarantees and statistical optimality are established for the proposed procedures under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterize the statistical-computational trade-off for achieving high-order exact clustering across three signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a "singular-value-gap-free" error bound in tensor estimation, which differ substantially from the matrix spectral analyses in the literature. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.
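The Lloyd-type alternating updates at the heart of this approach can be sketched in the simplest two-way (matrix) case: alternately reassign row and column clusters to best fit the current block means. This is an illustrative reduction; the paper's HLloyd operates on higher-order tensors with a spectral (HSC) initialization, and all names below are ours.

```python
import random

def block_means(A, r, c, kr, kc):
    # Mean of each (row-cluster, column-cluster) block of the matrix A.
    s = [[0.0] * kc for _ in range(kr)]
    n = [[0] * kc for _ in range(kr)]
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            s[r[i]][c[j]] += v
            n[r[i]][c[j]] += 1
    return [[s[a][b] / n[a][b] if n[a][b] else 0.0 for b in range(kc)] for a in range(kr)]

def lloyd_2way(A, kr, kc, iters=20, seed=0):
    # Alternating Lloyd updates: reassign each row (then column) to the
    # cluster whose current block means fit it best in least squares.
    rng = random.Random(seed)
    r = [rng.randrange(kr) for _ in A]
    c = [rng.randrange(kc) for _ in A[0]]
    for _ in range(iters):
        M = block_means(A, r, c, kr, kc)
        r = [min(range(kr), key=lambda a: sum((A[i][j] - M[a][c[j]]) ** 2
             for j in range(len(A[0])))) for i in range(len(A))]
        M = block_means(A, r, c, kr, kc)
        c = [min(range(kc), key=lambda b: sum((A[i][j] - M[r[i]][b]) ** 2
             for i in range(len(A)))) for j in range(len(A[0]))]
    return r, c
```

As with ordinary k-means, a random initialization can land in a poor local optimum, which is exactly why the paper pairs the Lloyd iterations with a spectral initialization.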
An explicit numerical method is developed for a class of time-changed stochastic differential equations whose coefficients are H\"older continuous in the time variable and are allowed to grow super-linearly in the state variable. The strong convergence of the method on a finite time interval is proved and the convergence rate is obtained. Numerical simulations are provided and are in line with the theoretical results.
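A standard way to keep an explicit scheme stable under super-linear drift growth is taming, in which the drift increment is damped at each step. The sketch below shows this idea for an ordinary (not time-changed) SDE, which is an assumption of ours for illustration; the concrete drift and diffusion are likewise illustrative.

```python
import math
import random

def tamed_euler(x0, T, n, drift, diff, rng):
    # Tamed explicit Euler: the drift increment f(x)*h is replaced by
    # f(x)*h / (1 + h*|f(x)|), so each drift increment has magnitude < 1
    # even when f grows super-linearly (e.g. f(x) = x - x^3).
    h = T / n
    x = x0
    for _ in range(n):
        f = drift(x)
        dW = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over [t, t+h]
        x = x + f * h / (1.0 + h * abs(f)) + diff(x) * dW
    return x
```

For a cubic double-well drift the untamed explicit Euler scheme is known to diverge for unlucky Brownian paths, whereas the tamed iterates remain bounded.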
In this work, we investigate various approaches that learn from training data to solve inverse problems, following a bi-level learning approach. We consider a general framework for optimal inversion design, where training data can be used to learn optimal regularization parameters, data fidelity terms, and regularizers, thereby resulting in superior variational regularization methods. In particular, we describe methods to learn optimal $p$ and $q$ norms for ${\rm L}^p-{\rm L}^q$ regularization and methods to learn optimal parameters for regularization matrices defined by covariance kernels. We exploit efficient algorithms based on Krylov projection methods for solving the regularized problems, at both the training and validation stages, making these methods well-suited for large-scale problems. Our experiments show that the learned regularization methods perform well even when there is some inexactness in the forward operator, resulting in a mixture of model and measurement error.
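The bi-level structure can be sketched in its simplest form: an outer loop picks a regularization hyper-parameter to minimize reconstruction error on training pairs, while the inner problem is a variational regularization solve. The sketch below learns a single Tikhonov parameter for a diagonal forward operator (a deliberately tiny stand-in for learning $p$, $q$, and kernel parameters with Krylov solvers, as in the paper; all names are ours).

```python
def tikhonov_diag(a, b, lam):
    # Inner problem: closed-form Tikhonov solution for a diagonal
    # forward operator, componentwise x_j = a_j*b_j / (a_j^2 + lam).
    return [aj * bj / (aj * aj + lam) for aj, bj in zip(a, b)]

def learn_lambda(a, pairs, grid):
    # Outer (bi-level) problem: pick the lambda whose reconstructions
    # are closest in least squares to the known training solutions.
    def loss(lam):
        return sum(sum((xr - xt) ** 2
                       for xr, xt in zip(tikhonov_diag(a, b, lam), xtrue))
                   for b, xtrue in pairs)
    return min(grid, key=loss)
```

With noise-free training data the validation loss is monotone in the regularization strength, so the learned parameter sits at the small end of the grid; with noisy data an interior value wins, which is the regime where learning pays off.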
We prove new optimality results for adaptive mesh refinement algorithms for non-symmetric, indefinite, and time-dependent problems by proposing a generalization of quasi-orthogonality which follows directly from the inf-sup stability of the underlying problem. This completely removes a central technical difficulty in modern proofs of optimal convergence of adaptive mesh refinement algorithms and leads to simple optimality proofs for the Taylor-Hood discretization of the stationary Stokes problem, a finite-element/boundary-element discretization of an unbounded transmission problem, and an adaptive time-stepping scheme for parabolic equations. The main technical tools are new stability bounds for the $LU$-factorization of matrices, together with a recently established connection between quasi-orthogonality and matrix factorization.
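The algorithms whose optimality is analyzed here follow the standard solve-estimate-mark-refine loop; the marking step is typically Dörfler marking, sketched below (illustrative only, not taken from the paper).

```python
def dorfler_mark(indicators, theta):
    # Doerfler (bulk) marking: select a set of elements whose squared
    # error indicators carry at least a theta-fraction of the total
    # estimated error, greedily from largest to smallest.
    order = sorted(range(len(indicators)), key=lambda i: -indicators[i])
    total = sum(e * e for e in indicators)
    acc, marked = 0.0, []
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += indicators[i] ** 2
    return marked
```

Marked elements are then refined and the loop repeats; the optimality proofs referenced above concern the convergence rate of the error with respect to the number of elements this loop generates.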
Fully implicit Runge-Kutta (IRK) methods have many desirable properties as time integration schemes in terms of accuracy and stability, but high-order IRK methods are not commonly used in practice with numerical PDEs due to the difficulty of solving the stage equations. This paper introduces a theoretical and algorithmic preconditioning framework for solving the systems of equations that arise from IRK methods applied to linear numerical PDEs (without algebraic constraints). This framework also naturally applies to discontinuous Galerkin discretizations in time. Under quite general assumptions on the spatial discretization that yield stable time integration, the preconditioned operator is proven to have condition number bounded by a small, order-one constant, independent of the spatial mesh and time-step size, and with only weak dependence on the number of stages/polynomial order; for example, the preconditioned operator for 10th-order Gauss IRK has condition number less than two, independent of the spatial discretization and time step. The new method can be used with arbitrary existing preconditioners for backward Euler-type time stepping schemes, and is amenable to the use of three-term recursion Krylov methods when the underlying spatial discretization is symmetric. The new method is demonstrated to be effective on various high-order finite-difference and finite-element discretizations of linear parabolic and hyperbolic problems, achieving fast, scalable solution at up to 10th-order accuracy. It consistently outperforms existing block preconditioning approaches, and in several cases it achieves 4th-order accuracy using Gauss integration with roughly half the number of preconditioner applications and wallclock time required by standard diagonally implicit RK methods.
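To make the "stage equations" concrete: for a linear problem $y' = \lambda y$, an $s$-stage IRK step requires solving the coupled system $(I - h\lambda A)k = \lambda y \mathbf{1}$, where $A$ is the Butcher matrix. The sketch below solves this directly for the 2-stage (4th-order) Gauss method on a scalar problem; for a PDE each scalar entry becomes a spatial operator, which is the large coupled system the paper's preconditioner targets (the sketch itself contains no preconditioning).

```python
import math

def gauss2_step(y, h, lam):
    # One step of 2-stage (4th-order) Gauss IRK for y' = lam*y.
    # Stage system (I - h*lam*A) k = lam*y*[1,1], solved directly (2x2 Cramer).
    s3 = math.sqrt(3.0)
    A = [[0.25, 0.25 - s3 / 6.0], [0.25 + s3 / 6.0, 0.25]]  # Butcher matrix
    m = [[1.0 - h * lam * A[0][0], -h * lam * A[0][1]],
         [-h * lam * A[1][0], 1.0 - h * lam * A[1][1]]]
    r0 = r1 = lam * y
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    k1 = (r0 * m[1][1] - m[0][1] * r1) / det
    k2 = (m[0][0] * r1 - r0 * m[1][0]) / det
    return y + h * 0.5 * (k1 + k2)  # weights b = [1/2, 1/2]

def integrate(y0, T, n, lam):
    y = y0
    for _ in range(n):
        y = gauss2_step(y, T / n, lam)
    return y
```

Halving the step size reduces the error by roughly a factor of $2^4 = 16$, confirming the 4th-order accuracy of the 2-stage Gauss method.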
Fully implicit Runge-Kutta (IRK) methods have many desirable accuracy and stability properties as time integration schemes, but high-order IRK methods are not commonly used in practice with large-scale numerical PDEs because of the difficulty of solving the stage equations. This paper introduces a theoretical and algorithmic framework for solving the nonlinear equations that arise from IRK methods (and discontinuous Galerkin discretizations in time) applied to nonlinear numerical PDEs, including PDEs with algebraic constraints. Several new linearizations of the nonlinear IRK equations are developed, offering faster and more robust convergence than the often-considered simplified Newton, as well as an effective preconditioner for the true Jacobian if exact Newton iterations are desired. Inverting these linearizations requires solving a set of block $2\times 2$ systems. Under quite general assumptions, it is proven that the condition number of the preconditioned $2\times 2$ operator is bounded by a small constant close to one, independent of the spatial discretization, spatial mesh, and time step, and with only weak dependence on the number of stages or integration accuracy. Moreover, the new method is built using the same preconditioners needed for backward Euler-type time stepping schemes, and so can be readily added to existing codes. The new methods are applied to several challenging fluid flow problems, including the compressible Euler and Navier-Stokes equations, and the vorticity-streamfunction formulation of the incompressible Euler and Navier-Stokes equations. Up to 10th-order accuracy is demonstrated using Gauss IRK, while in all cases 4th-order Gauss IRK requires roughly half the number of preconditioner applications as required by standard SDIRK methods.
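The simplified Newton baseline mentioned above freezes the Jacobian at the step's start, so one linearization is factored once and reused for every iteration and stage. A minimal scalar sketch for 2-stage Gauss IRK on $y' = f(y)$ follows (illustrative only; the paper develops better linearizations and a block $2\times 2$ preconditioned solver for PDEs).

```python
import math

def gauss2_nonlinear_step(y, h, f, fp, iters=8):
    # One step of 2-stage Gauss IRK for y' = f(y). The stage equations
    # k_i = f(y + h*sum_j A[i][j]*k_j) are solved by simplified Newton:
    # the Jacobian J = f'(y) is frozen at the step's start, so the
    # 2x2 matrix (I - h*J*A) is formed (and "factored") only once.
    s3 = math.sqrt(3.0)
    A = [[0.25, 0.25 - s3 / 6.0], [0.25 + s3 / 6.0, 0.25]]
    J = fp(y)
    m = [[1.0 - h * J * A[0][0], -h * J * A[0][1]],
         [-h * J * A[1][0], 1.0 - h * J * A[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    k = [f(y), f(y)]  # initial stage guess
    for _ in range(iters):
        F = [k[i] - f(y + h * (A[i][0] * k[0] + A[i][1] * k[1])) for i in range(2)]
        d0 = (F[0] * m[1][1] - m[0][1] * F[1]) / det
        d1 = (m[0][0] * F[1] - F[0] * m[1][0]) / det
        k = [k[0] - d0, k[1] - d1]
    return y + h * 0.5 * (k[0] + k[1])
```

For the mildly nonlinear test $y' = -y^3$ with $y(0)=1$, whose exact solution is $y(t) = (1+2t)^{-1/2}$, a few frozen-Jacobian iterations per step suffice and the composed scheme retains its 4th-order accuracy.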
Interpretation of Deep Neural Network (DNN) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in approximate dynamic programming. In this vein, we propose a new variant of DDP that accepts batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters compared with existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
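On a linear-quadratic problem, DDP's backward pass reduces exactly to the discrete Riccati recursion, which already exhibits the stage-wise (layer-wise) feedback policies the abstract refers to. The sketch below is this scalar reduction, as an assumption-laden illustration rather than the paper's algorithm.

```python
def lqr_feedback(a, b, q, r, qf, T):
    # Backward pass: on dynamics x_{t+1} = a*x_t + b*u_t with stage cost
    # q*x^2 + r*u^2 and terminal cost qf*x^2, DDP reduces to the discrete
    # Riccati recursion, yielding one feedback gain per stage ("layer").
    P = qf
    gains = []
    for _ in range(T):
        K = (a * b * P) / (r + b * b * P)
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
        gains.append(K)
    return list(reversed(gains))  # gains[t] is K_t

def rollout(x0, a, b, q, r, qf, gains):
    # Forward pass: apply the feedback policy u_t = -K_t * x_t
    # and accumulate the total cost along the trajectory.
    x, cost = x0, 0.0
    for K in gains:
        u = -K * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + qf * x * x
```

The feedback gains stabilize the (here unstable, $a = 1.2$) dynamics, and the closed-loop cost is far below that of the zero-control trajectory; in the training analogy, each stage corresponds to a network layer.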
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of the optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and that explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
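The "sample nodes with minimal variance using gradient information" idea is, in its simplest form, importance sampling with probabilities proportional to per-node gradient magnitudes. A toy scalar, one-sample version is sketched below (our illustration; it ignores the embedding approximation variance that the paper's decoupled strategy also handles).

```python
def estimator_variance(g, p):
    # Variance of the one-sample unbiased estimator g_i / (n * p_i),
    # i ~ p, of the full-batch mean (1/n) * sum_i g_i (scalar gradients).
    n = len(g)
    mean = sum(g) / n
    second = sum(pi * (gi / (n * pi)) ** 2 for gi, pi in zip(g, p))
    return second - mean ** 2

def min_variance_probs(g):
    # Sampling proportional to |g_i| minimizes this variance
    # (a consequence of the Cauchy-Schwarz inequality).
    z = sum(abs(gi) for gi in g)
    return [abs(gi) / z for gi in g]
```

For nonnegative scalar gradients the adaptive probabilities drive the variance all the way to zero (every scaled sample equals the mean), while uniform sampling is dominated by the largest gradient; in the multi-dimensional GNN setting the reduction is partial but the same principle applies.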
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
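The local smoothing underlying DRS replaces a non-smooth $f$ by the Gaussian smoothing $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$, whose gradient admits a Monte-Carlo estimator. A minimal scalar sketch using the symmetric two-point form is below (our illustration of the smoothing step only, not the full distributed algorithm).

```python
import random

def smoothed_grad(f, x, gamma, m, rng):
    # Monte-Carlo estimate of the gradient of the Gaussian smoothing
    # f_gamma(x) = E[f(x + gamma*Z)], Z ~ N(0,1), for scalar x, via the
    # symmetric two-point estimator (f(x+g*z) - f(x-g*z)) / (2*g) * z.
    total = 0.0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        total += (f(x + gamma * z) - f(x - gamma * z)) / (2.0 * gamma) * z
    return total / m
```

For the non-smooth test function $f(x) = |x|$ at a point away from the kink, the estimate concentrates around the true slope $1$; the smoothing parameter $\gamma$ trades bias near the kink against estimator variance.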