We present a novel solver technique for the anisotropic heat flux equation, aimed at the high level of anisotropy seen in magnetic confinement fusion plasmas. Such problems pose two major challenges: (i) discretization accuracy and (ii) efficient implicit linear solvers. We address both of these challenges simultaneously by constructing a new finite element discretization with excellent accuracy properties, tailored to a novel solver approach based on algebraic multigrid (AMG) methods designed for advective operators. We pose the problem in a mixed formulation, introducing the heat flux as an auxiliary variable and discretizing both the temperature and the auxiliary field in a discontinuous Galerkin space. The resulting block matrix system is then reordered and solved using an approach in which two advection operators are inverted using AMG solvers based on approximate ideal restriction (AIR), which is particularly efficient for upwind discontinuous Galerkin discretizations of advection. To ensure that the advection operators are non-singular, in this paper we restrict our attention to open (acyclic) magnetic field lines. We demonstrate the proposed discretization's superior accuracy over other discretizations of anisotropic heat flux, achieving errors $1000\times$ smaller at an anisotropy ratio of $10^9$, while also demonstrating fast convergence of the proposed iterative solver in highly anisotropic regimes where other diffusion-based AMG methods fail.
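To make the mixed formulation concrete, one plausible reading (a sketch with assumed notation: $T$ the temperature, $\mathbf{b}$ the unit vector along the magnetic field, $\kappa_\parallel$ the parallel conductivity, and $q$ the auxiliary parallel heat flux) writes the anisotropic diffusion operator $-\nabla\cdot(\kappa_\parallel\,\mathbf{b}\mathbf{b}^\top\nabla T)$ as a first-order system of two advection-type equations along $\mathbf{b}$:
\[
q - \kappa_\parallel\,\mathbf{b}\cdot\nabla T = 0, \qquad -\nabla\cdot(q\,\mathbf{b}) = f,
\]
each of which is the kind of advective operator that AIR-based AMG is designed to invert.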
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do not directly call upon AI tools. We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition. It has already been demonstrated that ML can make the choice well, but here we show how the SHAP tool for explainability can be used to inform new heuristics of a size and complexity similar to those of the human-designed heuristics currently in common use in symbolic computation.
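As an illustration of this kind of pipeline (not the paper's exact setup; the features, labels, and model below are hypothetical placeholders), a minimal sketch with scikit-learn and the shap package might look like:
\begin{verbatim}
# Minimal sketch: explaining a variable-ordering classifier with SHAP.
# Features, labels, and model choice are hypothetical placeholders.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(500, 4)        # 500 problems, 4 polynomial features
y = np.random.randint(0, 3, 500)  # label: fastest of 3 variable orderings

model = RandomForestClassifier(n_estimators=100).fit(X, y)

# SHAP attributes each prediction to individual features; features with
# large mean |SHAP| are candidates on which to build a readable heuristic.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
\end{verbatim}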
Time-continuous Volterra equations valued in $\mathbb{R}$ with nonnegative resolvent kernels enjoy two basic monotonicity properties. The first is that, for suitably given signals, any two solution curves do not intersect. The second is that the solutions to the autonomous equations are monotone. The so-called CM-preserving schemes (Comm. Math. Sci., 2021, 19(5), 1301-1336) have been proposed to preserve the complete monotonicity property, and thus these monotonicity properties, but they are restricted to uniform meshes. In this work, through an analogue of the convolution on nonuniform meshes, we introduce the concept of ``right complementary monotone'' (R-CMM) kernels at the discrete level for nonuniform meshes, an analogue of the CM-preserving property that is much more flexible. We prove that the discrete solutions preserve these two monotonicity properties whenever the discretized kernel satisfies the R-CMM property. Technically, our proofs rely heavily on the resolvent kernels.
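For concreteness, under one common convention the scalar equations in question take the form
\[
u(t) + (a * u)(t) = g(t), \qquad (a * u)(t) = \int_0^t a(t-s)\,u(s)\,\mathrm{d}s,
\]
with the resolvent kernel $r$ of $a$ defined through $r + a * r = a$, so that $u = g - r * g$; nonnegativity of $r$ is what drives the monotonicity properties above. (This is a sketch of the standard linear setting; the precise assumptions and signal classes are as stated in the paper.)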
While lightweight Vision Transformer (ViT) frameworks have made tremendous progress in image super-resolution, their uni-dimensional self-attention modeling and homogeneous aggregation scheme limit the effective receptive field (ERF), preventing it from covering more comprehensive interactions across both the spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on the dense-interaction principle, which can simultaneously model pixel interactions along both the spatial and channel dimensions, mining the potential correlations across the omni-axis (i.e., spatial and channel). Coupled with mainstream window-partitioning strategies, OSA achieves superior performance within compelling computational budgets. Second, a multi-scale interaction scheme is proposed to mitigate the sub-optimal ERF (i.e., premature saturation) of shallow models, which facilitates local propagation and meso-/global-scale interactions, yielding an omni-scale aggregation building block. Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95 dB@Urban100 $\times 4$ with only 792K parameters). Our code is available at \url{//github.com/Francis0625/Omni-SR}.
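A minimal PyTorch sketch of the underlying idea of attending along both axes (an illustration of the omni-axis principle only, not the OSA block's actual design):
\begin{verbatim}
# Sketch: self-attention applied along the spatial axis and the channel axis.
# Illustrates omni-axis interaction; not the paper's OSA implementation.
import torch

def attention(q, k, v):
    # Scaled dot-product attention over the last two dims: (..., n, d).
    w = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    return w @ v

x = torch.randn(2, 64, 16 * 16)  # (batch, channels C, spatial tokens N)
xs = x.transpose(1, 2)           # tokens as rows for spatial attention

spatial = attention(xs, xs, xs).transpose(1, 2)  # N x N interactions
channel = attention(x, x, x)                     # C x C interactions
y = spatial + channel            # naive fusion of both axes, for illustration
\end{verbatim}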
In this article, we derive fast and robust parallel-in-time preconditioned iterative methods for the all-at-once linear systems arising from the discretization of time-dependent PDEs. The discretization we employ is based on a Runge--Kutta method in time, for which the development of parallel solvers is an emerging research area in the literature on numerical methods for time-dependent PDEs. By making use of the classical theory of block matrices, one can derive a preconditioner for the systems considered. The block structure of the preconditioner allows for parallelism in the time variable, as long as one is able to provide an optimal solver for the system of the stages of the method. We thus propose a preconditioner for the latter system based on a singular value decomposition (SVD) of the (real) Runge--Kutta matrix $A_{\mathrm{RK}} = U \Sigma V^\top$. Supposing $A_{\mathrm{RK}}$ is invertible, we prove that the spectrum of the stage system preconditioned by our SVD-based preconditioner is contained in the right half of the unit circle, under suitable assumptions on the matrix $U^\top V$ (the assumptions are well posed due to the polar decomposition of $A_{\mathrm{RK}}$). We demonstrate the numerical efficiency of our SVD-based preconditioner by solving the stage systems arising from the discretization of the heat equation and the Stokes equations with sequential time-stepping. Finally, we provide numerical results for the all-at-once approach on both problems, showing the speed-up achieved on a parallel architecture.
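As a small illustration of the decomposition underlying the preconditioner (only the factorization step, not the full stage solver; the 2-stage Gauss method is used as a concrete example):
\begin{verbatim}
# Sketch: SVD and polar factor of a (real) Runge-Kutta matrix.
# Uses the 2-stage Gauss-Legendre Butcher matrix as a concrete example.
import numpy as np

s3 = np.sqrt(3.0)
A_rk = np.array([[0.25, 0.25 - s3 / 6],
                 [0.25 + s3 / 6, 0.25]])  # 2-stage Gauss method (order 4)

U, sigma, Vt = np.linalg.svd(A_rk)        # A_rk = U @ diag(sigma) @ Vt
UtV = U.T @ Vt.T                          # the matrix U^T V in the analysis
W = U @ Vt                                # orthogonal polar factor of A_rk

assert np.allclose(A_rk, U @ np.diag(sigma) @ Vt)
assert np.allclose(A_rk, W @ (Vt.T @ np.diag(sigma) @ Vt))  # polar decomp.
\end{verbatim}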
Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either support only limited parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology to efficiently accelerate NTT in various settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state of the art.
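For readers unfamiliar with the transform itself, a reference software NTT is sketched below ($O(n^2)$ for clarity; toy parameters $q = 17$, $n = 8$, $\omega = 9$). It illustrates the modular arithmetic being accelerated, not the paper's in-cache hardware design:
\begin{verbatim}
# Reference software NTT sketch (O(n^2) for clarity; toy parameters).
# q = 17 is prime, omega = 9 is a primitive 8th root of unity mod 17.
q, n, omega = 17, 8, 9

def ntt(a):
    # Evaluate the polynomial a at the powers of omega, mod q.
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(A):
    # Inverse transform: powers of omega^{-1}, scaled by n^{-1} mod q.
    w_inv, n_inv = pow(omega, -1, q), pow(n, -1, q)
    return [n_inv * sum(A[j] * pow(w_inv, i * j, q) for j in range(n)) % q
            for i in range(n)]

# Polynomial product via pointwise multiplication in the NTT domain
# (zero-padding keeps the cyclic convolution from wrapping around):
a, b = [1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 0, 0, 0, 0]
c = intt([x * y % q for x, y in zip(ntt(a), ntt(b))])
\end{verbatim}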
We investigate discretizations of a geometrically nonlinear elastic Cosserat shell with nonplanar reference configuration, originally introduced by B\^irsan, Ghiba, Martin, and Neff in 2019. The shell model includes curvature terms up to order 5 in the shell thickness, which are crucial to reliably simulate high-curvature deformations such as near-folds or creases. The original model is generalized to shells that are not homeomorphic to a subset of $\mathbb{R}^2$. For this, we replace the originally planar parameter domain by an abstract two-dimensional manifold, and verify that the hyperelastic shell energy and the three-dimensional reconstruction are invariant under changes of the local coordinate systems. This general approach makes it possible to determine the elastic response even of non-orientable surfaces like the M\"obius strip and the Klein bottle. We discretize the model with a geometric finite element method and, using the fact that geometric finite elements are $H^1$-conforming, prove that the discrete shell model has a solution. Numerical tests then show the general performance and versatility of the model and discretization method.
Sorting is a fundamental algorithmic pre-processing technique which often allows data to be represented more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, [Cotumaccio, Prezza, SODA'21] showed how to extend these ideas to arbitrary automata by means of partial co-lex orders. That work showed that a co-lex order of minimum width (thus optimizing search query times) on a deterministic finite automaton (DFA) can be computed in $O(m^2 + n^{5/2})$ time, $m$ being the number of transitions and $n$ the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in $O(mn)$ and $O(n^2\log n)$ time, respectively, and an algorithm sorting acyclic DFAs in $O(m\log n)$ time. Within these running times, all algorithms also compute a smallest chain partition of the partial order (required to index the DFA). We present experimental results showing that an optimized implementation of the $O(n^2\log n)$-time algorithm exhibits nearly linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.
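As a toy illustration of the order at play (on strings rather than automaton states; not the paper's sorting algorithms): co-lex order compares strings read right-to-left, i.e., it is the lexicographic order of the reversed strings.
\begin{verbatim}
# Sketch: co-lexicographic (co-lex) order on a toy set of strings.
# Co-lex order = lexicographic order of the reversed strings.
words = ["banana", "ananas", "bandana", "cabana"]
colex_sorted = sorted(words, key=lambda w: w[::-1])
print(colex_sorted)  # ['cabana', 'bandana', 'banana', 'ananas']
\end{verbatim}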
Algebraic multigrid (AMG) methods are among the most efficient solvers for linear systems of equations and are widely used for the solution of problems stemming from the discretization of Partial Differential Equations (PDEs). The most severe limitation of AMG methods is their dependence on parameters that need to be fine-tuned. In particular, the strong threshold parameter is the most relevant, since it governs the construction of the successively coarser grids needed by AMG methods. We introduce a novel Deep Learning algorithm that minimizes the computational cost of the AMG method when used as a finite element solver. We show that our algorithm requires minimal changes to any existing code. The proposed Artificial Neural Network (ANN) tunes the value of the strong threshold parameter by interpreting the sparse matrix of the linear system as a black-and-white image and exploiting a pooling operator to transform it into a small multi-channel image. We show experimentally that the pooling successfully reduces the computational cost of processing a large sparse matrix and preserves the features needed for the regression task at hand. We train the proposed algorithm on a large dataset containing problems with a highly heterogeneous diffusion coefficient, defined in different three-dimensional geometries and discretized with unstructured grids, as well as linear elasticity problems with a highly heterogeneous Young's modulus. When tested on problems with coefficients or geometries not present in the training dataset, our approach reduces the computational time by up to 30%.
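A minimal sketch of the kind of pooling step described above (a hypothetical implementation for illustration; the paper's actual operator and channel choices may differ):
\begin{verbatim}
# Sketch: pooling a large sparse matrix into a small multi-channel "image".
# Hypothetical pre-processing for the ANN; channels chosen for illustration.
import numpy as np
import scipy.sparse as sp

def pool_sparse(A, out=32):
    # Bucket the nonzeros of A into an out x out grid.
    # Channel 0: nonzero counts; channel 1: sums of absolute values.
    A = A.tocoo()
    img = np.zeros((2, out, out))
    r = A.row * out // A.shape[0]
    c = A.col * out // A.shape[1]
    np.add.at(img[0], (r, c), 1.0)
    np.add.at(img[1], (r, c), np.abs(A.data))
    return img

A = sp.random(10_000, 10_000, density=1e-4, format="csr")
image = pool_sparse(A)  # small multi-channel image fed to the network
\end{verbatim}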
Under-approximations of reachable sets and tubes have been receiving growing research attention due to their important roles in control synthesis and verification. Available under-approximation methods for continuous-time linear systems typically assume the ability to compute transition matrices and their integrals exactly, which is not feasible in general, and/or suffer from high computational costs. In this note, we attempt to overcome these drawbacks for a class of linear time-invariant (LTI) systems, proposing a novel method to under-approximate finite-time forward reachable sets and tubes that utilizes approximations of the matrix exponential and its integral. In particular, we consider the class of continuous-time LTI systems with an identity input matrix and with initial and input values belonging to full-dimensional sets that are affine transformations of closed unit balls. The proposed method yields computationally efficient under-approximations of reachable sets and tubes, when implemented using zonotopes, with first-order convergence guarantees in the sense of the Hausdorff distance. To illustrate its performance, we implement our approach in three numerical examples, where linear systems of dimensions ranging between 2 and 200 are considered.
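As a small illustration of the objects involved (a plain forward-propagation step for a zonotope under LTI dynamics; not the paper's under-approximation scheme, and with hypothetical dynamics):
\begin{verbatim}
# Sketch: propagating a zonotope {c + G @ xi : |xi|_inf <= 1} through x' = A x.
# Illustrative only; the dynamics are hypothetical and no under-approximation
# error handling is included.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -0.5]])  # hypothetical LTI dynamics
c = np.array([1.0, 0.0])                  # zonotope center
G = 0.1 * np.eye(2)                       # zonotope generator matrix

dt = 0.1
Phi = expm(A * dt)                  # transition matrix (approximated in practice)
c_next, G_next = Phi @ c, Phi @ G   # a linear map of a zonotope is a zonotope
\end{verbatim}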
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment on devices with low memory resources or in applications with strict latency requirements. A natural thought is therefore to perform model compression and acceleration in deep networks without significantly decreasing model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recent techniques developed for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, after which the other techniques are introduced. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks. We then go through a few very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
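As a concrete instance of the first scheme, a minimal sketch of magnitude-based weight pruning (a standard, framework-agnostic example; not tied to any specific method in the survey):
\begin{verbatim}
# Sketch: magnitude-based weight pruning, one simple instance of the
# parameter-pruning scheme (illustrative, framework-agnostic).
import numpy as np

def prune(W, sparsity=0.9):
    # Zero out the smallest-magnitude weights, keeping the top (1 - sparsity).
    thresh = np.quantile(np.abs(W), sparsity)
    return W * (np.abs(W) >= thresh)

W = np.random.randn(256, 256)   # a dense layer's weight matrix
W_pruned = prune(W)             # ~90% of entries set to zero
\end{verbatim}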