In this paper, we develop a high order residual distribution (RD) method for solving steady state conservation laws in the novel Hermite weighted essentially non-oscillatory (HWENO) framework recently developed in [24]. In particular, we design a high order HWENO integration for the integrals of the source term and fluxes based on the point values of the solution and its spatial derivatives, and we adapt the principles of residual distribution schemes to obtain steady state solutions. Two advantages of the novel HWENO framework were shown in [24]. First, compared with the traditional HWENO framework, the proposed method does not need additional auxiliary equations to update the derivatives of the unknown variable; it computes them directly from the current point values of the solution and its previous spatial derivatives, which saves storage and CPU time and thereby improves the computational efficiency of the traditional HWENO framework. Second, compared with the traditional WENO method, the reconstruction stencil of HWENO methods is more compact, the boundary treatment is simpler, and the numerical errors are smaller on the same grid. Thus, the scheme remains compact as the order of accuracy increases, in contrast to the scheme proposed by Chou and Shu in [11]. Extensive numerical experiments for one- and two-dimensional scalar and system problems confirm the high order accuracy and good quality of our scheme.
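The compactness advantage of Hermite-based reconstruction can be illustrated outside the paper's setting: because both point values and derivatives are available, a cubic Hermite interpolant on just two points reaches the accuracy that ordinary polynomial interpolation needs a four-point stencil for. A minimal sketch (the function names and the test function are illustrative, not from the paper):

```python
import numpy as np

def hermite_cubic(x0, x1, u0, u1, du0, du1, x):
    """Cubic Hermite interpolant on [x0, x1] built from point values and
    first derivatives at the two endpoints only (a two-point stencil)."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*u0 + h*h10*du0 + h01*u1 + h*h11*du1

# Fourth-order accuracy from just two points, since derivatives are available.
approx = hermite_cubic(0.0, 0.5, np.sin(0.0), np.sin(0.5),
                       np.cos(0.0), np.cos(0.5), 0.25)
err = abs(approx - np.sin(0.25))
```

The interpolation error scales as $h^4$, so halving the cell size reduces the error roughly sixteenfold; this is the mechanism behind the "more compact stencil at the same accuracy" claim.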
This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two classes of solutions is bounded provided that the objective has a strongly-convex-strongly-concave payoff and Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.
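As a minimal illustration of primal-dual dynamics for a strongly-convex-strongly-concave payoff (without the decision-dependent distribution that is the subject of the paper), simultaneous gradient descent-ascent on a toy quadratic converges to the unique saddle point:

```python
import numpy as np

# Toy payoff f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2: strongly convex in x,
# strongly concave in y, with unique saddle point (0, 0).
def grad_x(x, y): return x + 2*y       # ascent direction for the min player
def grad_y(x, y): return 2*x - y       # descent direction for the max player

x, y, eta = 3.0, -2.0, 0.05
for _ in range(2000):
    # Simultaneous update: descend in x, ascend in y.
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
```

For this quadratic, the iteration matrix has spectral radius below one for small step sizes, so the iterates spiral into the saddle point; the paper's algorithms handle the harder stochastic, decision-dependent case.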
We establish summability results for coefficient sequences of Wiener-Hermite polynomial chaos expansions for countably-parametric solutions of linear elliptic and parabolic divergence-form partial differential equations with Gaussian random field inputs. The novel proof technique developed here is based on analytic continuation of parametric solutions into the complex domain. It differs from previous works that used bootstrap arguments and induction on the differentiation order of solution derivatives with respect to the parameters. The present holomorphy-based argument allows a unified, "differentiation-free" sparsity analysis of Wiener-Hermite polynomial chaos expansions in various scales of function spaces. The analysis also implies corresponding results for posterior densities in Bayesian inverse problems subject to Gaussian priors on uncertain inputs from function spaces. Our results furthermore yield dimension-independent convergence rates of various constructive high-dimensional deterministic numerical approximation schemes such as single-level and multi-level versions of anisotropic sparse-grid Hermite-Smolyak interpolation and quadrature in both forward and inverse computational uncertainty quantification.
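In one dimension, a Wiener-Hermite expansion and its coefficients can be computed directly; the sketch below (illustrative only, not the paper's infinite-dimensional setting) uses the orthogonality $\mathbb{E}[He_j(X)He_k(X)] = k!\,\delta_{jk}$ of probabilists' Hermite polynomials under $X \sim N(0,1)$:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Wiener-Hermite coefficients of f: c_k = E[f(X) He_k(X)] / k!, with the
# expectation computed by Gauss-Hermite quadrature.
nodes, weights = hermegauss(40)           # quadrature for weight exp(-x^2/2)
weights = weights / np.sqrt(2 * np.pi)    # renormalize to the N(0,1) density

f = np.exp                                # f(X) = e^X has c_k = e^{1/2} / k!
K = 12
coeffs = np.array([
    np.sum(weights * f(nodes) * hermeval(nodes, np.eye(k + 1)[k])) / factorial(k)
    for k in range(K)
])

# Evaluate the truncated chaos expansion at a point.
approx = hermeval(0.7, coeffs)
```

The factorial decay of these coefficients for analytic $f$ is the one-dimensional shadow of the summability results the paper proves for countably-parametric PDE solutions.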
This contribution focuses on the development of Model Order Reduction (MOR) for one-way coupled steady state linear thermomechanical problems in a finite element setting. We apply Proper Orthogonal Decomposition (POD) to compute the reduced basis space. For the evaluation of the modal coefficients, we use two different methodologies: one based on Galerkin projection (G) and one based on Artificial Neural Networks (ANN). We compare POD-G and POD-ANN in terms of relevant features, including errors and computational efficiency. In this context, both physical and geometrical parametrizations are considered. We also carry out a validation of the Full Order Model (FOM) based on customized benchmarks in order to provide a complete computational pipeline. The proposed framework is applied to a relevant industrial problem: the investigation of thermomechanical phenomena arising in blast furnace hearth walls. Keywords: Thermomechanical problems, Finite element method, Proper orthogonal decomposition, Galerkin projection, Artificial neural network, Geometric and physical parametrization, Blast furnace.
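The POD step itself is a singular value decomposition of a snapshot matrix, with modal coefficients obtained by projection; a minimal sketch on synthetic snapshots (the data and dimensions are illustrative, not from the blast furnace application):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic snapshot matrix: each column is a "full order" solution for one
# parameter value; the solutions here lie in a 3-dimensional subspace.
U_true = np.linalg.qr(rng.standard_normal((200, 3)))[0]
snapshots = U_true @ rng.standard_normal((3, 50))

# POD: the left singular vectors of the snapshot matrix form the reduced basis.
U, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
r = int(np.sum(sigma > 1e-10 * sigma[0]))   # truncate at the numerical rank
basis = U[:, :r]

# Modal coefficients of a new solution by orthogonal (Galerkin-style) projection;
# the POD-ANN variant would instead regress these coefficients on the parameters.
u_new = U_true @ rng.standard_normal(3)
coeffs = basis.T @ u_new
u_rb = basis @ coeffs
rel_err = np.linalg.norm(u_new - u_rb) / np.linalg.norm(u_new)
```

In practice the truncation rank is chosen from the decay of the singular values (retained "energy"), trading accuracy against the size of the reduced model.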
For the general class of residual distribution (RD) schemes, including many finite element (such as continuous/discontinuous Galerkin) and flux reconstruction methods, an approach to construct entropy conservative/dissipative semidiscretizations by adding suitable correction terms has been proposed by Abgrall (J.~Comp.~Phys. 372: pp. 640--666, 2018). In this work, the correction terms are characterized as solutions of certain optimization problems and are adapted to the SBP-SAT framework, focusing on discontinuous Galerkin methods. Novel generalizations to entropy inequalities, multiple constraints, and kinetic energy preservation for the Euler equations are developed and tested in numerical experiments. For all of these optimization problems, explicit solutions are provided. Additionally, the correction approach is applied for the first time to obtain a fully discrete entropy conservative/dissipative RD scheme. Here, the application of the deferred correction (DeC) method for the time integration is essential. This paper can be seen as describing a systematic method to construct structure-preserving discretizations, at least for the considered examples.
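In its simplest algebraic form, Abgrall's correction adds a multiple of the centered entropy variables to the nodal residuals so that a prescribed discrete entropy balance holds, without disturbing conservation. A minimal sketch of that algebra only (variable names and the target value are illustrative, not a full scheme):

```python
import numpy as np

def entropy_correction(r, v, psi):
    """Correct nodal residuals r so that the discrete entropy balance
    sum(v * r_corrected) equals the target psi.

    The correction alpha * (v - mean(v)) leaves sum(r) unchanged (the added
    terms sum to zero), so conservation is preserved.
    """
    dv = v - v.mean()
    alpha = (psi - np.dot(v, r)) / np.dot(dv, dv)
    return r + alpha * dv

r = np.array([0.3, -0.1, 0.5])       # uncorrected residuals
v = np.array([1.0, 2.0, -0.5])       # entropy variables at the nodes
psi = 0.25                           # target entropy flux difference
r_new = entropy_correction(r, v, psi)
```

The denominator is $\sum_i (v_i - \bar v)^2 \ge 0$, which is why the correction is well defined whenever the entropy variables are not all equal.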
The goal of this paper is to reduce the total complexity of gradient-based methods for two classes of problems: affine-constrained composite convex optimization and bilinear saddle-point structured non-smooth convex optimization. Our technique is based on a double-loop inexact accelerated proximal gradient (APG) method for minimizing the summation of a non-smooth but proximable convex function and two smooth convex functions with different smoothness constants and computational costs. Compared to the standard APG method, the inexact APG method can reduce the total computation cost if one smooth component has higher computational cost but a smaller smoothness constant than the other. With this property, the inexact APG method can be applied to approximately solve the subproblems of a proximal augmented Lagrangian method for affine-constrained composite convex optimization and the smooth approximation for bilinear saddle-point structured non-smooth convex optimization, where the smooth function with a smaller smoothness constant has significantly higher computational cost. Thus it can reduce total complexity for finding an approximately optimal/stationary solution. This technique is similar to the gradient sliding technique in the literature. The difference is that our inexact APG method can efficiently stop the inner loop by using a computable condition based on a measure of stationarity violation, while the gradient sliding methods need to pre-specify the number of iterations for the inner loop. Numerical experiments demonstrate significantly higher efficiency of our methods over an optimal primal-dual first-order method and the gradient sliding methods.
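The APG backbone of the method is, in its exact form, the classical accelerated proximal gradient (FISTA) iteration; a minimal sketch on a small lasso instance (the paper's inexact inner solves and computable stopping condition are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
lam_reg = 0.5
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad 0.5*||Ax-b||^2

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x = np.zeros(20)
y = x.copy()
tk = 1.0
for _ in range(500):
    # Proximal gradient step at the extrapolated point y.
    x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam_reg / L)
    # Nesterov momentum update.
    t_new = (1 + np.sqrt(1 + 4 * tk**2)) / 2
    y = x_new + ((tk - 1) / t_new) * (x_new - x)
    x, tk = x_new, t_new
```

The inexact variant in the paper replaces the exact proximal gradient step by an inner loop whose termination is governed by a computable stationarity-violation measure, which is where the complexity savings come from.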
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
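The ResNet-ODE correspondence in its idealized form is just the explicit Euler method: with residual updates scaled by 1/depth, the network output converges to an ODE flow as depth grows. A toy sketch with a fixed (untrained) residual map, illustrating the limit whose validity for trained weights the paper questions:

```python
import numpy as np

def resnet_flow(x0, depth):
    """Residual iterates x_{l+1} = x_l + (1/depth) * tanh(x_l): these are
    explicit Euler steps for dx/dt = tanh(x) on [0, 1]."""
    x = x0
    for _ in range(depth):
        x = x + np.tanh(x) / depth
    return x

def rk4_reference(x, steps=2000):
    """Accurate reference solution of dx/dt = tanh(x) on [0, 1] via RK4."""
    h = 1.0 / steps
    for _ in range(steps):
        k1 = np.tanh(x)
        k2 = np.tanh(x + h * k1 / 2)
        k3 = np.tanh(x + h * k2 / 2)
        k4 = np.tanh(x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

x0 = 0.5
ref = rk4_reference(x0)
err_shallow = abs(resnet_flow(x0, 10) - ref)     # O(1/depth) Euler error
err_deep = abs(resnet_flow(x0, 1000) - ref)
```

The paper's point is that trained weights need not behave like such a smooth depth-indexed function, in which case this Euler-to-ODE picture breaks down.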
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution, but can fail significantly otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we demonstrate the effectiveness of our method compared with state-of-the-art counterparts.
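The core mechanism, learning nonnegative sample weights that suppress the statistical dependence between features, can be sketched on a toy confounded dataset; this is an illustrative weighted-covariance minimization under assumed synthetic data, not the paper's algorithm:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
# Two groups with opposite spurious feature correlation; under uniform weights
# the net dependence is positive, and a suitable reweighting can cancel it.
nA, nB = 40, 20
zA, zB = rng.standard_normal(nA), rng.standard_normal(nB)
X = np.vstack([
    np.column_stack([zA,  zA + 0.1 * rng.standard_normal(nA)]),
    np.column_stack([zB, -zB + 0.1 * rng.standard_normal(nB)]),
])
n = nA + nB

def dependence(theta):
    """Squared weighted covariance of the two features, with sample weights
    w = softmax(theta) so that w >= 0 and sum(w) = 1."""
    w = np.exp(theta - theta.max())
    w /= w.sum()
    Xc = X - w @ X                        # weighted centering
    return ((Xc[:, 0] * w) @ Xc[:, 1]) ** 2

res = minimize(dependence, np.zeros(n), method="L-BFGS-B")
```

In the paper this idea is applied jointly with deep feature learning; here the point is only that reweighting alone can drive the measured feature dependence far below its uniform-weight value.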
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
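The local smoothing behind DRS can be sketched with the standard Gaussian-smoothing gradient estimator: $f_\gamma(x) = \mathbb{E}[f(x+\gamma U)]$ is differentiable even for non-smooth $f$, with $\nabla f_\gamma(x) = \mathbb{E}[(f(x+\gamma U) - f(x))\,U]/\gamma$. An illustrative Monte Carlo check on the non-smooth $\ell_1$ norm (not the paper's distributed algorithm):

```python
import numpy as np

def smoothed_grad(f, x, gamma, n_samples, rng):
    """Monte Carlo estimate of grad f_gamma(x), where f_gamma(x) = E[f(x + gamma*U)]
    with U ~ N(0, I), using grad f_gamma(x) = E[(f(x + gamma*U) - f(x)) * U] / gamma."""
    U = rng.standard_normal((n_samples, x.size))
    fx = f(x)
    vals = np.array([f(x + gamma * u) for u in U])
    return ((vals - fx)[:, None] * U).mean(axis=0) / gamma

rng = np.random.default_rng(3)
f = lambda v: np.abs(v).sum()            # non-smooth objective
x = np.array([0.5, -0.5])
g = smoothed_grad(f, x, gamma=0.05, n_samples=100000, rng=rng)
# Away from the kink, the smoothed gradient approaches sign(x) = (1, -1).
```

Choosing $\gamma$ trades the smoothing bias against the Lipschitz constant of $\nabla f_\gamma$, which is the source of the $d^{1/4}$ factor in the DRS rate.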
The Normalized Cut (NCut) objective function, widely used in data clustering and image segmentation, quantifies the cost of graph partitioning in a way that favors balanced partitionings: clusters or segments of comparable size receive lower values than unbalanced ones. However, this bias is so strong that NCut avoids any singleton partitions, even when vertices are very weakly connected to the rest of the graph. Motivated by the B\"uhler-Hein family of balanced cut costs, we propose the family of Compassionately Conservative Balanced (CCB) Cut costs, which are indexed by a parameter that can be used to strike a compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced. We show that CCB-Cut minimization can be relaxed into an orthogonally constrained $\ell_{\tau}$-minimization problem that coincides with the problem of computing Piecewise Flat Embeddings (PFE) for one particular index value, and we present an algorithm for solving the relaxed problem by iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ). Using images from the BSDS500 database, we show that image segmentation based on CCB-Cut minimization provides better accuracy with respect to ground truth and greater variability in region size than NCut-based image segmentation.
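The classical spectral relaxation of NCut, which the $\ell_\tau$ relaxation here generalizes, reduces to a generalized eigenproblem $Lv = \lambda Dv$ on the graph Laplacian. A minimal sketch on a two-clique toy graph (illustrative, not the IRRQ algorithm):

```python
import numpy as np
from scipy.linalg import eigh

# Two cliques of size n joined by one weak bridge edge.
n = 10
W = np.zeros((2 * n, 2 * n))
W[:n, :n] = 1.0
W[n:, n:] = 1.0
np.fill_diagonal(W, 0.0)
W[0, n] = W[n, 0] = 0.1                  # weak connection between the cliques

D = np.diag(W.sum(axis=1))               # degree matrix
L = D - W                                # unnormalized graph Laplacian
# Relaxed NCut: second-smallest generalized eigenvector of L v = lam * D v.
vals, vecs = eigh(L, D)
fiedler = vecs[:, 1]
labels = fiedler > 0                     # sign of the relaxed indicator
```

Thresholding the second generalized eigenvector recovers the two cliques; the CCB-Cut family replaces the quadratic (Rayleigh quotient) form with an $\ell_\tau$ analogue to soften the balance bias.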
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
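The dual approach can be sketched on a toy consensus problem: minimize $\sum_i \tfrac12 (x_i - a_i)^2$ subject to agreement along the edges of a path graph, encoded by an incidence matrix $A$. Accelerated gradient ascent on the concave quadratic dual touches the data only through edge-wise differences, hence is implementable with neighbor communication; the graph and data below are illustrative:

```python
import numpy as np

# Path graph on 4 nodes; each row of A is one edge constraint x_i - x_j = 0.
A = np.array([[1., -1., 0., 0.],
              [0., 1., -1., 0.],
              [0., 0., 1., -1.]])
a = np.array([1.0, 3.0, 5.0, 7.0])       # local data: f_i(x) = 0.5*(x - a_i)^2

# Dual function d(lam) = min_x 0.5*||x - a||^2 + lam @ (A @ x), with minimizer
# x*(lam) = a - A.T @ lam and gradient grad d(lam) = A @ x*(lam).
ew = np.linalg.eigvalsh(A @ A.T)         # spectrum of the dual Hessian A A^T
Lsm, mu_sc = ew[-1], ew[0]               # smoothness / strong concavity constants
beta = (np.sqrt(Lsm / mu_sc) - 1) / (np.sqrt(Lsm / mu_sc) + 1)

lam = np.zeros(3)
lam_prev = lam.copy()
for _ in range(300):
    y = lam + beta * (lam - lam_prev)    # Nesterov momentum
    x = a - A.T @ y                      # local primal updates
    lam_prev, lam = lam, y + (A @ x) / Lsm   # ascent step on the dual
x = a - A.T @ lam

# At the optimum every node agrees on the network average of the a_i.
```

The condition number of $AA^\top$ is governed by the spectral gap of the graph, which is exactly the additional cost relative to the centralized rate that the paper quantifies.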