This paper explores a family of generalized sweeping preconditioners for Helmholtz problems with a non-overlapping checkerboard partition of the computational domain. The domain decomposition procedure relies on high-order transmission conditions and cross-point treatments, but it cannot scale with the number of subdomains without an efficient preconditioning technique. With the proposed approach, existing sweeping preconditioners, such as the symmetric Gauss-Seidel and parallel double sweep preconditioners, can be applied to checkerboard partitions with different sweeping directions (e.g. horizontal and diagonal). Several directions can be combined thanks to the flexible version of GMRES, allowing for the rapid transfer of information between the different zones of the computational domain and thereby accelerating the convergence of the final iterative solution procedure. Several two-dimensional finite element results are presented to study and compare the sweeping preconditioners, and to illustrate their performance on cases of increasing complexity.
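For orientation, a schematic (not taken from the paper, which formulates the sweeps on the interface system arising from the transmission conditions): if the subdomain blocks are ordered along a sweeping direction and the assembled system is split as $A = D + L + U$, with $D$ block-diagonal over subdomains and $L$, $U$ the strictly lower and upper block parts, the symmetric Gauss-Seidel preconditioner reads
\[
M_{\mathrm{SGS}} = (D + L)\, D^{-1}\, (D + U),
\]
so that applying $M_{\mathrm{SGS}}^{-1}$ amounts to a forward sweep over the subdomains followed by a backward sweep; choosing a different ordering of the blocks (e.g. horizontal versus diagonal) changes the direction in which information propagates during each sweep.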
Solving the normal equations corresponding to large sparse linear least-squares problems is an important and challenging task. For very large problems, an iterative solver is needed, and, in general, a preconditioner is required to achieve good convergence. In recent years, a number of preconditioners have been proposed. These are largely serial, and reported results demonstrate that none of the commonly used preconditioners for the normal equations matrix is capable of solving all sparse least-squares problems. Our interest is thus in designing new preconditioners for the normal equations that are efficient, robust, and can be implemented in parallel. Our proposed preconditioners can be constructed efficiently and algebraically, without any knowledge of the problem and without any assumption on the least-squares matrix except that it is sparse. We exploit the structure of the symmetric positive definite normal equations matrix and use the concept of algebraic local symmetric positive semi-definite splittings to introduce two-level Schwarz preconditioners for least-squares problems. The condition number of the preconditioned normal equations is shown to be theoretically bounded independently of the number of subdomains in the splitting. This upper bound can be adjusted using a single parameter $\tau$ that the user can specify. We discuss how the new preconditioners can be implemented on top of the PETSc library using only 150 lines of Fortran, C, or Python code. Problems arising from practical applications are used to compare the performance of the proposed new preconditioner with that of other preconditioners.
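As a hedged illustration of the kind of PETSc-based workflow described above (this is not the authors' code: the two-level Schwarz preconditioner with algebraic SPSD splittings is replaced here by PETSc's standard one-level additive Schwarz, and the matrix is a random placeholder), a minimal petsc4py sketch of preconditioned CG on the normal equations could look like:

```python
# Minimal sketch: solve A^T A x = A^T b with petsc4py, CG + one-level additive
# Schwarz (PCASM). The paper's two-level preconditioner additionally builds an
# algebraic coarse space; sizes, density, and tolerances below are illustrative.
import numpy as np
import scipy.sparse as sp
from petsc4py import PETSc

m, n = 200, 50                                   # tall, sparse least-squares matrix
A_sci = (sp.random(m, n, density=0.05, random_state=0) + sp.eye(m, n)).tocsr()
A = PETSc.Mat().createAIJ(size=(m, n),
                          csr=(A_sci.indptr, A_sci.indices, A_sci.data))
A.assemble()

AtA = A.transposeMatMult(A)                      # explicitly form the SPD normal matrix

b = A.createVecLeft(); b.setRandom()
rhs = A.createVecRight()
A.multTranspose(b, rhs)                          # rhs = A^T b

ksp = PETSc.KSP().create()
ksp.setOperators(AtA)
ksp.setType(PETSc.KSP.Type.CG)
ksp.getPC().setType(PETSc.PC.Type.ASM)           # one-level Schwarz baseline
ksp.setTolerances(rtol=1e-8)
ksp.setFromOptions()

x = AtA.createVecRight()
ksp.solve(rhs, x)
print("iterations:", ksp.getIterationNumber(), "residual norm:", ksp.getResidualNorm())
```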
AC optimal power flow (AC-OPF) problems will need to be solved more frequently in the future to maintain stable and economic power system operation. To tackle this challenge, a deep neural network-based voltage-constrained approach (DeepOPF-V) is proposed to solve AC-OPF problems with high computational efficiency. Its unique design predicts the voltages of all buses and then uses them to reconstruct the remaining variables without solving non-linear AC power flow equations. A fast post-processing step is developed to enforce the box constraints. The effectiveness of DeepOPF-V is validated by simulations on the IEEE 118-bus and 300-bus systems and a 2000-bus test system. Compared with existing studies, DeepOPF-V achieves a computational speedup of up to four orders of magnitude with a comparable optimality gap and feasibility of the solution.
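For context on the reconstruction idea (a standard statement of the AC power flow relations, not a detail taken from the paper): once the voltage magnitudes $V_i$ and angles $\theta_i$ at all buses are predicted, the remaining variables follow from direct, non-iterative evaluation; with $\theta_{ij}=\theta_i-\theta_j$,
\[
P_i = V_i \sum_{j} V_j\,\big(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\big),\qquad
Q_i = V_i \sum_{j} V_j\,\big(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\big),
\]
so active and reactive injections (and hence generator set-points) are recovered by evaluating these sums rather than by iteratively solving the nonlinear power flow equations.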
Decoupling the biharmonic eigenvalue problem into Laplacians may generate spurious eigenvalues on non-convex domains. To overcome this difficulty, we adopt a recently developed mixed method, which decomposes the biharmonic equation into three Poisson equations and still recovers the original solution. Using this idea, we design an efficient biharmonic eigenvalue algorithm that contains only Poisson solvers. With this approach, the eigenfunctions are confined to the correct space, so spurious modes on non-convex domains are avoided. A priori error estimates for both eigenvalues and eigenfunctions on quasi-uniform meshes are obtained; in particular, a convergence rate of $\mathcal{O}({h}^{2\alpha})$ (where $0<\alpha<\pi/\omega$ and $\omega > \pi$ is the angle of the reentrant corner) is proved for the linear finite element. Surprisingly, numerical evidence demonstrates an $\mathcal{O}({h}^{2})$ convergence rate on quasi-uniform meshes with the regular refinement strategy, even on non-convex polygonal domains.
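For orientation, a standard statement of the difficulty (not taken from the paper): with simply supported boundary conditions the biharmonic eigenvalue problem
\[
\Delta^2 u = \lambda u \ \text{in}\ \Omega, \qquad u = \Delta u = 0 \ \text{on}\ \partial\Omega,
\]
is often decoupled via $w=-\Delta u$ into the two Poisson problems $-\Delta u = w$ and $-\Delta w = \lambda u$ with $u = w = 0$ on $\partial\Omega$. On convex domains the two formulations coincide, but on domains with reentrant corners the decoupled formulation works in a strictly larger space than $H^2(\Omega)\cap H^1_0(\Omega)$ and can therefore admit extra, spurious eigenpairs, which is the issue the method above is designed to avoid.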
Recent advances show that neural networks embedded with physics-informed priors significantly outperform vanilla neural networks in learning and predicting the long-term dynamics of complex physical systems from noisy data. Despite this success, there has been only limited study of how to optimally combine physics priors to improve predictive performance. To tackle this problem, we unpack and generalize recent innovations into individual inductive bias segments. As such, we are able to systematically investigate all possible combinations of inductive biases, of which existing methods are a natural subset. Using this framework we introduce Variational Integrator Graph Networks - a novel method that unifies the strengths of existing approaches by combining an energy constraint, high-order symplectic variational integrators, and graph neural networks. We demonstrate, across an extensive ablation, that the proposed unifying framework outperforms existing methods in data efficiency and predictive accuracy, across both single- and many-body problems studied in the recent literature. We empirically show that the improvements arise because high-order variational integrators combined with a potential energy constraint induce coupled learning of generalized position and momentum updates, which can be formalized via the Partitioned Runge-Kutta method.
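As a hedged illustration of the lowest-order member of the integrator family involved (the paper combines higher-order symplectic variational integrators with graph networks and an energy constraint; the spring potential below is only a stand-in for a learned model), a Störmer-Verlet / velocity-Verlet step shows how position and momentum updates are coupled through the potential gradient:

```python
import numpy as np

def velocity_verlet_step(q, p, grad_potential, mass, dt):
    """One step of the second-order, symplectic Stoermer-Verlet integrator.

    q, p           : generalized positions and momenta, shape (n_bodies, dim)
    grad_potential : callable returning dV/dq (e.g. the gradient of a learned network)
    """
    p_half = p - 0.5 * dt * grad_potential(q)              # half kick from the potential
    q_next = q + dt * p_half / mass                        # drift with updated momentum
    p_next = p_half - 0.5 * dt * grad_potential(q_next)    # second half kick
    return q_next, p_next

# Example: V(q) = 0.5 * k * |q|^2 standing in for a learned potential energy.
k, mass, dt = 1.0, 1.0, 0.05
grad_V = lambda q: k * q
q, p = np.array([[1.0, 0.0]]), np.array([[0.0, 0.5]])
for _ in range(200):
    q, p = velocity_verlet_step(q, p, grad_V, mass, dt)
```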
This work develops optimal preconditioners for the discrete H(curl) and H(div) problems on two- and three-dimensional hypersurfaces by nodal auxiliary space preconditioning [R. Hiptmair, J. Xu: SIAM J. Numer. Anal. 45, 2483-2509 (2007)]. In particular, on unstructured triangulated surfaces, we develop fast and user-friendly preconditioners for the edge and face element discretizations of curl-curl and grad-div problems based on inverting several discrete surface Laplacians. The proposed preconditioners lead to efficient iterative methods for computing harmonic tangential vector fields on discrete surfaces. Numerical experiments on hypersurfaces are presented to test the performance of these surface preconditioners.
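Schematically (for orientation only; this is the flat-space form of the Hiptmair-Xu construction, and the paper's contribution is its extension to triangulated surfaces), the nodal auxiliary space preconditioner for an edge-element curl-curl matrix has the form
\[
B_{\mathrm{curl}} \;=\; S \;+\; \Pi\, L_{\mathrm{vec}}^{-1}\, \Pi^{\top} \;+\; G\, L^{-1}\, G^{\top},
\]
where $S$ is a pointwise smoother, $G$ the discrete gradient, $\Pi$ the nodal interpolation onto edge elements, and $L$, $L_{\mathrm{vec}}$ scalar and componentwise vector nodal Laplacians; in the surface setting these Laplacian solves are replaced by the discrete surface Laplacians mentioned above.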
Recently, physics-informed neural networks (PINNs) have offered a powerful new paradigm for solving problems relating to differential equations. Compared to classical numerical methods, PINNs have several advantages, for example their ability to provide mesh-free solutions of differential equations and their ability to carry out forward and inverse modelling within the same optimisation problem. Whilst promising, a key limitation to date is that PINNs have struggled to accurately and efficiently solve problems with large domains and/or multi-scale solutions, which is crucial for their real-world application. Multiple significant and related factors contribute to this issue, including the increasing complexity of the underlying PINN optimisation problem as the problem size grows and the spectral bias of neural networks. In this work we propose a new, scalable approach for solving large problems relating to differential equations called Finite Basis PINNs (FBPINNs). FBPINNs are inspired by classical finite element methods, where the solution of the differential equation is expressed as the sum of a finite set of basis functions with compact support. In FBPINNs, neural networks are used to learn these basis functions, which are defined over small, overlapping subdomains. FBPINNs are designed to address the spectral bias of neural networks by using separate input normalisation over each subdomain, and to reduce the complexity of the underlying optimisation problem by using many smaller neural networks in a parallel divide-and-conquer approach. Our numerical experiments show that FBPINNs are effective in solving both small and larger, multi-scale problems, outperforming standard PINNs in both accuracy and required computational resources, potentially paving the way to the application of PINNs on large, real-world problems.
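As a hedged sketch of the FBPINN-style solution ansatz in one dimension (not the authors' implementation; the subdomain layout, window function, and placeholder networks below are illustrative, and the PDE-residual training loop is omitted), the solution is a blend of window-weighted subdomain networks, each fed with input normalised to its own subdomain:

```python
# Sketch of a sum of compactly supported, window-weighted subdomain networks.
import numpy as np

subdomains = [(0.0, 0.6), (0.4, 1.2), (1.0, 1.6)]        # overlapping intervals

def window(x, a, b):
    """Smooth bump with compact support on (a, b)."""
    t = np.clip((x - a) / (b - a), 0.0, 1.0)
    return np.where((t > 0) & (t < 1), np.exp(-1.0 / (t * (1 - t) + 1e-12)), 0.0)

def tiny_net(x, params):
    """Placeholder one-hidden-layer network standing in for a trained subdomain PINN."""
    W1, b1, W2, b2 = params
    return np.tanh(x[:, None] * W1 + b1) @ W2 + b2

def fbpinn_u(x, nets):
    num, den = np.zeros_like(x), np.zeros_like(x)
    for (a, b), params in zip(subdomains, nets):
        xn = 2.0 * (x - a) / (b - a) - 1.0                # per-subdomain input normalisation
        w = window(x, a, b)
        num += w * tiny_net(xn, params)
        den += w
    return num / np.maximum(den, 1e-12)                   # partition-of-unity style blend

rng = np.random.default_rng(0)
nets = [(rng.normal(size=8), rng.normal(size=8), rng.normal(size=8), 0.0)
        for _ in subdomains]
u = fbpinn_u(np.linspace(0.0, 1.6, 200), nets)
```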
Linearized polynomials have attracted a lot of attention because of their applications in both geometric and algebraic areas. Let $q$ be a prime power, $n$ be a positive integer, and $\sigma$ be a generator of $\mathrm{Gal}(\mathbb{F}_{q^n}\colon\mathbb{F}_q)$. In this paper we provide closed formulas for the coefficients of a $\sigma$-trinomial $f$ over $\mathbb{F}_{q^n}$ which ensure that the dimension of the kernel of $f$ equals its $\sigma$-degree, that is, that $f$ is a linearized polynomial with maximum kernel. As a consequence, we present explicit examples of linearized trinomials with maximum kernel and characterize those having $\sigma$-degree $3$ and $4$. Our techniques rely on the tools developed in [24]. Finally, we apply these results to investigate a class of rank metric codes introduced in [8] and to construct quasi-subfield polynomials and cyclic subspace codes, obtaining new explicit constructions related to the conjecture posed in [37].
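For readers outside the area, the objects involved are (a standard definition, not specific to this paper): a $\sigma$-linearized polynomial of $\sigma$-degree $k$ over $\mathbb{F}_{q^n}$ is
\[
f(x) \;=\; a_0 x + a_1 x^{\sigma} + \cdots + a_k x^{\sigma^{k}}, \qquad a_i \in \mathbb{F}_{q^n},\ a_k \neq 0,
\]
which defines an $\mathbb{F}_q$-linear map on $\mathbb{F}_{q^n}$ whose kernel satisfies $\dim_{\mathbb{F}_q}\ker f \le k$; the polynomial $f$ has maximum kernel when equality holds, and it is a $\sigma$-trinomial when exactly three of the coefficients $a_i$ are nonzero.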
We present the class of Hida-Mat\'ern kernels, the canonical family of covariance functions over the entire space of stationary Gauss-Markov processes. It extends the Mat\'ern kernels by allowing flexible construction of priors over processes with oscillatory components. Any stationary kernel, including the widely used squared-exponential and spectral mixture kernels, is either directly within this class or is an appropriate asymptotic limit, demonstrating the generality of this class. Taking advantage of its Markovian nature, we show how to represent such processes as state space models using only the kernel and its derivatives. In turn, this allows us to perform Gaussian process inference more efficiently and sidestep the usual computational burdens. We also show how exploiting special properties of the state space representation enables improved numerical stability in addition to further reductions in computational complexity.
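As a concrete instance of the state space construction (a standard example given for orientation, not the paper's general recipe): the Mat\'ern kernel with smoothness $3/2$, $k(\tau)=\sigma^2(1+\lambda|\tau|)e^{-\lambda|\tau|}$ with $\lambda=\sqrt{3}/\ell$, is the stationary covariance of the linear SDE
\[
\mathrm{d}\begin{pmatrix} f(t) \\ f'(t)\end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ -\lambda^2 & -2\lambda \end{pmatrix}
\begin{pmatrix} f(t) \\ f'(t)\end{pmatrix}\mathrm{d}t
+ \begin{pmatrix} 0 \\ 1 \end{pmatrix}\mathrm{d}\beta(t),
\]
with white-noise spectral density $q = 4\sigma^2\lambda^3$, so Gaussian process inference reduces to Kalman filtering and smoothing with cost linear in the number of time points.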
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. It makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this issue by using the SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI, without relying on an iterative process in practice, and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
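The key identity can be sketched as follows (schematic, for a symmetric matrix with eigenvalues $\lambda_1 > \lambda_2 > \dots > 0$; this is an orientation aid, not the paper's full derivation): the analytic gradient of an eigenvector with respect to the matrix entries contains factors $1/(\lambda_i-\lambda_j)$, which blow up when eigenvalues are close; for $\lambda_i > \lambda_j > 0$ this factor admits the expansion
\[
\frac{1}{\lambda_i - \lambda_j} \;=\; \frac{1}{\lambda_i}\sum_{k=0}^{\infty}\Big(\frac{\lambda_j}{\lambda_i}\Big)^{k},
\]
and truncating the series corresponds to the gradient that a finite number of power iteration steps would deliver, while avoiding the explicit iterative deflation and its accumulated error.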
Latent Dirichlet Allocation (LDA) is a popular topic model. Given that the input corpus of LDA algorithms consists of millions to billions of tokens, the LDA training process is very time-consuming, which may prevent the usage of LDA in many scenarios, e.g., online services. GPUs have benefited modern machine learning algorithms and big data analysis as they can provide high memory bandwidth and computation power. Therefore, many frameworks, e.g. TensorFlow, Caffe, and CNTK, support the use of GPUs for accelerating popular data-intensive machine learning algorithms. However, we observe that existing LDA solutions on GPUs are not satisfactory. In this paper, we present CuLDA_CGS, a GPU-based efficient and scalable approach to accelerate large-scale LDA problems. CuLDA_CGS is designed to efficiently solve LDA problems at high throughput. To this end, we first carefully design a workload partition and synchronization mechanism to exploit the benefits of multiple GPUs. Then, we offload the LDA sampling process to each individual GPU, optimizing it from the sampling algorithm, parallelization, and data compression perspectives. Evaluations show that CuLDA_CGS outperforms state-of-the-art LDA solutions by a large margin (up to 7.3X) on a single GPU, and achieves an extra 3.0X speedup on 4 GPUs. The source code is publicly available at https://github.com/cuMF/CuLDA_CGS.
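For readers unfamiliar with the underlying sampler, a hedged CPU reference sketch of collapsed Gibbs sampling (CGS) for LDA is shown below; this is the sequential algorithm that CuLDA_CGS parallelizes and optimizes on GPUs, not the authors' GPU code, and the hyperparameters and corpus format are illustrative:

```python
# Reference collapsed Gibbs sampler for LDA; docs is a list of lists of word ids in [0, V).
import numpy as np

def lda_cgs(docs, V, K=10, alpha=0.1, beta=0.01, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
    z = []                                               # topic assignment per token
    for d, doc in enumerate(docs):                       # random initialisation of counts
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                              # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z = k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())         # sample a new topic
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```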