Total generalization variation (TGV) is a very powerful and important regularization for various inverse problems and computer vision tasks. In this paper, we proposed a semismooth Newton based augmented Lagrangian method to solve this problem. The augmented Lagrangian method (also called as method of multipliers) is widely used for lots of smooth or nonsmooth variational problems. However, its efficiency usually heavily depends on solving the coupled and nonlinear system together and simultaneously, which is very complicated and highly coupled for total generalization variation. With efficient primal-dual semismooth Newton methods for the complicated linear subproblems involving total generalized variation, we investigated a highly efficient and competitive algorithm compared to some efficient first-order method. With the analysis of the metric subregularities of the corresponding functions, we give both the global convergence and local linear convergence rate for the proposed augmented Lagrangian methods.
In this paper, an upwind GFDM is developed for the coupled heat and mass transfer problems in porous media. GFDM is a meshless method that can obtain the difference schemes of spatial derivatives by using Taylor expansion in local node influence domains and the weighted least squares method. The first-order single-point upstream scheme in the FDM/FVM-based reservoir simulator is introduced to GFDM to form the upwind GFDM, based on which, a sequential coupled discrete scheme of the pressure diffusion equation and the heat convection-conduction equation is solved to obtain pressure and temperature profiles. This paper demonstrates that this method can be used to obtain the meshless solution of the convection-diffusion equation with a stable upwind effect. For porous flow problems, the upwind GFDM is more practical and stable than the method of manually adjusting the influence domain based on the prior information of the flow field to achieve the upwind effect. Two types of calculation errors are analyzed, and three numerical examples are implemented to illustrate the good calculation accuracy and convergence of the upwind GFDM for heat and mass transfer problems in porous media, and indicate the increase of the radius of the node influence domain will increase the calculation error of temperature profiles. Overall, the upwind GFDM discretizes the computational domain using only a point cloud that is generated with much less topological constraints than the generated mesh, but achieves good computational performance as the mesh-based approaches, and therefore has great potential to be developed as a general-purpose numerical simulator for various porous flow problems in domains with complex geometry.
We introduce a new constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using the proximity of one single old policy as the first trust region as done by prior works, we propose to form a second trust region through the construction of another virtual policy that represents a wide range of past policies. We then enforce the new policy to stay closer to the virtual policy, which is beneficial in case the old policy performs badly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during the optimization process. Our proposed method, dubbed as Memory-Constrained Policy Optimization (MCPO), is examined on a diverse suite of environments including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as the Subsampled Newton and Newton Sketch, which can efficiently construct stochastic Hessian estimates for many tasks. Despite using second-order information, these existing methods do not exhibit superlinear convergence, unless the stochastic noise is gradually reduced to zero during the iteration, which would lead to a computational blow-up in the per-iteration cost. We address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme enjoys local $Q$-superlinear convergence with a non-asymptotic rate of $(\Upsilon\sqrt{\log (t)/t}\,)^{t}$, where $\Upsilon$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the iteration, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit with a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still enjoys a superlinear convergence~rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
Stein variational gradient descent (SVGD) is a general-purpose optimization-based sampling algorithm that has recently exploded in popularity, but is limited by two issues: it is known to produce biased samples, and it can be slow to converge on complicated distributions. A recently proposed stochastic variant of SVGD (sSVGD) addresses the first issue, producing unbiased samples by incorporating a special noise into the SVGD dynamics such that asymptotic convergence is guaranteed. Meanwhile, Stein variational Newton (SVN), a Newton-like extension of SVGD, dramatically accelerates the convergence of SVGD by incorporating Hessian information into the dynamics, but also produces biased samples. In this paper we derive, and provide a practical implementation of, a stochastic variant of SVN (sSVN) which is both asymptotically correct and converges rapidly. We demonstrate the effectiveness of our algorithm on a difficult class of test problems -- the Hybrid Rosenbrock density -- and show that sSVN converges using three orders of magnitude fewer gradient evaluations of the log likelihood than its stochastic SVGD counterpart. Our results show that sSVN is a promising approach to accelerating high-precision Bayesian inference tasks with modest-dimension, $d\sim\mathcal{O}(10)$.
This paper makes the first attempt to apply newly developed upwind GFDM for the meshless solution of two-phase porous flow equations. In the presented method, node cloud is used to flexibly discretize the computational domain, instead of complicated mesh generation. Combining with moving least square approximation and local Taylor expansion, spatial derivatives of oil-phase pressure at a node are approximated by generalized difference operators in the local influence domain of the node. By introducing the first-order upwind scheme of phase relative permeability, and combining the discrete boundary conditions, fully-implicit GFDM-based nonlinear discrete equations of the immiscible two-phase porous flow are obtained and solved by the nonlinear solver based on the Newton iteration method with the automatic differentiation, to avoid the additional computational cost and possible computational instability caused by sequentially coupled scheme. Two numerical examples are implemented to test the computational performances of the presented method. Detailed error analysis finds the two sources of the calculation error, roughly studies the convergence order thus find that the low-order error of GFDM makes the convergence order of GFDM lower than that of FDM when node spacing is small, and points out the significant effect of the symmetry or uniformity of the node collocation in the node influence domain on the accuracy of generalized difference operators, and the radius of the node influence domain should be small to achieve high calculation accuracy, which is a significant difference between the studied hyperbolic two-phase porous flow problem and the elliptic problems when GFDM is applied.
In this article we suggest two discretization methods based on isogeometric analysis (IGA) for planar linear elasticity. On the one hand, we apply the well-known ansatz of weakly imposed symmetry for the stress tensor and obtain a well-posed mixed formulation. Such modified mixed problems have been already studied by different authors. But we concentrate on the exploitation of IGA results to handle also curved boundary geometries. On the other hand, we consider the more complicated situation of strong symmetry, i.e. we discretize the mixed weak form determined by the so-called Hellinger-Reissner variational principle. We show the existence of suitable approximate fields leading to an inf-sup stable saddle-point problem. For both discretization approaches we prove convergence statements and in case of weak symmetry we illustrate the approximation behavior by means of several numerical experiments.
The instrumental variable method is widely used in the health and social sciences for identification and estimation of causal effects in the presence of potentially unmeasured confounding. In order to improve efficiency, multiple instruments are routinely used, leading to concerns about bias due to possible violation of the instrumental variable assumptions. To address this concern, we introduce a new class of g-estimators that are guaranteed to remain consistent and asymptotically normal for the causal effect of interest provided that a set of at least $\gamma$ out of $K$ candidate instruments are valid, for $\gamma\leq K$ set by the analyst ex ante, without necessarily knowing the identities of the valid and invalid instruments. We provide formal semiparametric efficiency theory supporting our results. Both simulation studies and applications to the UK Biobank data demonstrate the superior empirical performance of our estimators compared to competing methods.
The Koopman operator is beneficial for analyzing nonlinear and stochastic dynamics; it is linear but infinite-dimensional, and it governs the evolution of observables. The extended dynamic mode decomposition (EDMD) is one of the famous methods in the Koopman operator approach. The EDMD employs a data set of snapshot pairs and a specific dictionary to evaluate an approximation for the Koopman operator, i.e., the Koopman matrix. In this study, we focus on stochastic differential equations, and a method to obtain the Koopman matrix is proposed. The proposed method does not need any data set, which employs the original system equations to evaluate some of the targeted elements of the Koopman matrix. The proposed method comprises combinatorics, an approximation of the resolvent, and extrapolations. Comparisons with the EDMD are performed for a noisy van der Pol system. The proposed method yields reasonable results even in cases wherein the EDMD exhibits a slow convergence behavior.
Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation for multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods with exact solution of the coarse-grid system. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of exact coarse solver, we establish an elegant identity for characterizing two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.
Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our codes are publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.