The Poisson pressure solve resulting from the spectral element discretization of the incompressible Navier-Stokes equation requires fast, robust, and scalable preconditioning. In the current work, a parallel scaling study of Chebyshev-accelerated Schwarz and Jacobi preconditioning schemes is presented, with special focus on GPU architectures, such as OLCF's Summit. Convergence properties of the Chebyshev-accelerated schemes are compared with alternative methods, such as low-order preconditioners combined with algebraic multigrid. Performance and scalability results are presented for a variety of preconditioner and solver settings. The authors demonstrate that Chebyshev-accelerated-Schwarz methods provide a robust and effective smoothing strategy when using $p$-multigrid as a preconditioner in a Krylov-subspace projector. At the same time, optimal preconditioning parameters can vary for different geometries, problem sizes, and processor counts. This variance motivates the development of an autotuner to optimize solver parameters on-line, during the course of production simulations.
We consider here a cell-centered finite difference approximation of the Richards equation in three dimensions, averaging for interface values the hydraulic conductivity $K=K(p)$, a highly nonlinear function, by arithmetic, upstream, and harmonic means. The nonlinearities in the equation can lead to changes in soil conductivity over several orders of magnitude and discretizations with respect to space variables often produce stiff systems of differential equations. Fully implicit time discretization is provided by backward Euler one-step formula; the resulting nonlinear algebraic system is solved by an inexact Newton Armijo-Goldstein algorithm, requiring the solution of a sequence of linear systems involving Jacobian matrices. We prove some new results concerning the distribution of the Jacobians eigenvalues and the explicit expression of their entries. Moreover, we explore some connections between the saturation of the soil and the ill-conditioning of the Jacobians. The information on eigenvalues justifies the effectiveness of some preconditioner approaches which are widely used in the solution of the Richards equation. We propose a new software framework to experiment with scalable and robust preconditioners suitable for efficient parallel simulations at very large scales. Performance results on a literature test case show that our framework is very promising in the advance towards realistic simulations at extreme scale.
We provide a global convergence proof of the recently proposed sequential homotopy method with an inexact Krylov--semismooth-Newton method employed as a local solver. The resulting method constitutes an active-set method in function space. After discretization, it allows for efficient application of Krylov-subspace methods. For a certain class of optimal control problems with PDE constraints, in which the control enters the Lagrangian only linearly, we propose and analyze an efficient, parallelizable, symmetric positive definite preconditioner based on a double Schur complement approach. We conclude with numerical results for a badly conditioned and highly nonlinear benchmark optimization problem with elliptic partial differential equations and control bounds. The resulting method is faster than using direct linear algebra for the 2D benchmark and allows for the parallel solution of large 3D problems.
Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large datasets and high-dimensional models. A standard approach to mitigate this complexity consists in using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction of multipliers (ADMM) optimization algorithm. These methods appear to provide empirically state-of-the-art performance but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations.
Motivated by a wide range of real-world problems whose solutions exhibit boundary and interior layers, the numerical analysis of discretizations of singularly perturbed differential equations is an established sub-discipline within the study of the numerical approximation of solutions to differential equations. Consequently, much is known about how to accurately and stably discretize such equations on \textit{a priori} adapted meshes, in order to properly resolve the layer structure present in their continuum solutions. However, despite being a key step in the numerical simulation process, much less is known about the efficient and accurate solution of the linear systems of equations corresponding to these discretizations. In this paper, we discuss problems associated with the application of direct solvers to these discretizations, and we propose a preconditioning strategy that is tuned to the matrix structure induced by using layer-adapted meshes for convection-diffusion equations, proving a strong condition-number bound on the preconditioned system in one spatial dimension, and a weaker bound in two spatial dimensions. Numerical results confirm the efficiency of the resulting preconditioners in one and two dimensions, with time-to-solution of less than one second for representative problems on $1024\times 1024$ meshes and up to $40\times$ speedup over standard sparse direct solvers.
This work is devoted to design and study efficient and accurate numerical schemes to approximate a chemo-attraction model with consumption effects, which is a nonlinear parabolic system for two variables; the cell density and the concentration of the chemical signal that the cell feel attracted to. We present several finite element schemes to approximate the system, detailing the main properties of each of them, such as conservation of cells, energy-stability and approximated positivity. Moreover, we carry out several numerical simulations to study the efficiency of each of the schemes and to compare them with others classical schemes.
We propose a state redistribution method for high order discontinuous Galerkin methods on curvilinear embedded boundary grids. State redistribution relaxes the overly restrictive CFL condition that results from arbitrarily small cut cells and explicit time stepping. Thus, the scheme can take time steps that are proportional to the size of cells in the background grid. The discontinuous Galerkin scheme is stabilized by postprocessing the numerical solution after each stage or step of an explicit time stepping method. This is done by temporarily merging the small cells into larger, possibly overlapping neighborhoods using a special weighted inner product. Then, the numerical solution on the neighborhoods is returned to the base grid in a conservative fashion. The advantage of this approach is that it uses only basic mesh information that is already available in many cut cell codes and does not require complex geometric manipulations. Finally, we present a number of test problems that demonstrate the encouraging potential of this technique for applications on curvilinear embedded geometries. Numerical experiments reveal that our scheme converges with order $p+1$ in $L_1$ and between $p$ and $p+1$ in $L_\infty$ for problems with smooth solutions. We also demonstrate that state redistribution is capable of capturing shocks.
Linear kinetic transport equations play a critical role in optical tomography, radiative transfer and neutron transport. The fundamental difficulty hampering their efficient and accurate numerical resolution lies in the high dimensionality of the physical and velocity/angular variables and the fact that the problem is multiscale in nature. Leveraging the existence of a hidden low-rank structure hinted by the diffusive limit, in this work, we design and test the angular-space reduced order model for the linear radiative transfer equation, the first such effort based on the celebrated reduced basis method (RBM). Our method is built upon a high-fidelity solver employing the discrete ordinates method in the angular space, an asymptotic preserving upwind discontinuous Galerkin method for the physical space, and an efficient synthetic accelerated source iteration for the resulting linear system. Addressing the challenge of the parameter values (or angular directions) being coupled through an integration operator, the first novel ingredient of our method is an iterative procedure where the macroscopic density is constructed from the RBM snapshots, treated explicitly and allowing a transport sweep, and then updated afterwards. A greedy algorithm can then proceed to adaptively select the representative samples in the angular space and form a surrogate solution space. The second novelty is a least-squares density reconstruction strategy, at each of the relevant physical locations, enabling the robust and accurate integration over an arbitrarily unstructured set of angular samples toward the macroscopic density. Numerical experiments indicate that our method is effective for computational cost reduction in a variety of regimes.
We introduce a new class of preconditioners to enable flexible GMRES to find a least-squares solution, and potentially the pseudoinverse solution, of large-scale sparse, asymmetric, singular, and potentially inconsistent systems. We develop the preconditioners based on a new observation that generalized inverses (i.e., $\boldsymbol{A}^{g}\in\{\boldsymbol{G}\mid\boldsymbol{A}\boldsymbol{G}\boldsymbol{A}=\boldsymbol{A}\}$) enable the preconditioned Krylov subspaces to converge in a single step. We then compute an approximate generalized inverse (AGI) efficiently using a hybrid incomplete factorization (HIF), which combines multilevel incomplete LU with rank-revealing QR on its final Schur complement. We define the criteria of $\epsilon$-accuracy and stability of AGI to guarantee the convergence of preconditioned GMRES for consistent systems. For inconsistent systems, we fortify HIF with iterative refinement to obtain HIFIR, which allows accurate computations of the null-space vectors. By combining the two techniques, we then obtain a new solver, called PIPIT, for obtaining the pseudoinverse solutions for systems with low-dimensional null spaces. We demonstrate the robustness of HIF and HIFIR and show that they improve both accuracy and efficiency of the prior state of the art by orders of magnitude for systems with up to a million unknowns.
We consider a least-squares variational kernel-based method for numerical solution of second order elliptic partial differential equations on a multi-dimensional domain. In this setting it is not assumed that the differential operator is self-adjoint or positive definite as it should be in the Rayleigh-Ritz setting. However, the new scheme leads to a symmetric and positive definite algebraic system of equations. Moreover, the resulting method does not rely on certain subspaces satisfying the boundary conditions. The trial space for discretization is provided via standard kernels that reproduce the Sobolev spaces as their native spaces. The error analysis of the method is given, but it is partly subjected to an inverse inequality on the boundary which is still an open problem. The condition number of the final linear system is approximated in terms of the smoothness of the kernel and the discretization quality. Finally, the results of some computational experiments support the theoretical error bounds.
Network embedding aims to learn low-dimensional representations of nodes in a network, while the network structure and inherent properties are preserved. It has attracted tremendous attention recently due to significant progress in downstream network learning tasks, such as node classification, link prediction, and visualization. However, most existing network embedding methods suffer from the expensive computations due to the large volume of networks. In this paper, we propose a $10\times \sim 100\times$ faster network embedding method, called Progle, by elegantly utilizing the sparsity property of online networks and spectral analysis. In Progle, we first construct a \textit{sparse} proximity matrix and train the network embedding efficiently via sparse matrix decomposition. Then we introduce a network propagation pattern via spectral analysis to incorporate local and global structure information into the embedding. Besides, this model can be generalized to integrate network information into other insufficiently trained embeddings at speed. Benefiting from sparse spectral network embedding, our experiment on four different datasets shows that Progle outperforms or is comparable to state-of-the-art unsupervised comparison approaches---DeepWalk, LINE, node2vec, GraRep, and HOPE, regarding accuracy, while is $10\times$ faster than the fastest word2vec-based method. Finally, we validate the scalability of Progle both in real large-scale networks and multiple scales of synthetic networks.