Efficient multiple precision linear numerical computation libraries such as MPLAPACK are critical in dealing with ill-conditioned problems. Specifically, there are optimization methods for matrix multiplication, such as the Strassen algorithm and the Ozaki scheme, which can be used to speed up computation. For complex matrix multiplication, the 3M method can also be used, which requires only three multiplications of real matrices, instead of the 4M method, which requires four multiplications of real matrices. In this study, we extend these optimization methods to arbitrary precision complex matrix multiplication and verify the possible increase in computation speed through benchmark tests. The optimization methods are also applied to complex LU decomposition using matrix multiplication to demonstrate that the Ozaki scheme can be used to achieve higher computation speeds.
Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
Stochastic optimization methods have been hugely successful in making large-scale optimization problems feasible when computing the full gradient is computationally prohibitive. Using the theory of modified equations for numerical integrators, we propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow. Analyzing a modified stochastic differential equation can reveal qualitative insights about the associated optimization method. Here, we study mean-square stability of the modified equation in the case of stochastic coordinate descent.
Bayesian binary regression is a prosperous area of research due to the computational challenges encountered by currently available methods either for high-dimensional settings or large datasets, or both. In the present work, we focus on the expectation propagation (EP) approximation of the posterior distribution in Bayesian probit regression under a multivariate Gaussian prior distribution. Adapting more general derivations in Anceschi et al. (2023), we show how to leverage results on the extended multivariate skew-normal distribution to derive an efficient implementation of the EP routine having a per-iteration cost that scales linearly in the number of covariates. This makes EP computationally feasible also in challenging high-dimensional settings, as shown in a detailed simulation study.
Recently, a stability theory has been developed to study the linear stability of modified Patankar--Runge--Kutta (MPRK) schemes. This stability theory provides sufficient conditions for a fixed point of an MPRK scheme to be stable as well as for the convergence of an MPRK scheme towards the steady state of the corresponding initial value problem, whereas the main assumption is that the initial value is sufficiently close to the steady state. Initially, numerical experiments in several publications indicated that these linear stability properties are not only local, but even global, as is the case for general linear methods. Recently, however, it was discovered that the linear stability of the MPDeC(8) scheme is indeed only local in nature. Our conjecture is that this is a result of negative Runge--Kutta (RK) parameters of MPDeC(8) and that linear stability is indeed global, if the RK parameters are nonnegative. To support this conjecture, we examine the family of MPRK22($\alpha$) methods with negative RK parameters and show that even among these methods there are methods for which the stability properties are only local. However, this local linear stability is not observed for MPRK22($\alpha$) schemes with nonnegative Runge-Kutta parameters.
Applying parallel-in-time algorithms to multiscale Hamiltonian systems to obtain stable long time simulations is very challenging. In this paper, we present novel data-driven methods aimed at improving the standard parareal algorithm developed by Lion, Maday, and Turinici in 2001, for multiscale Hamiltonian systems. The first method involves constructing a correction operator to improve a given inaccurate coarse solver through solving a Procrustes problem using data collected online along parareal trajectories. The second method involves constructing an efficient, high-fidelity solver by a neural network trained with offline generated data. For the second method, we address the issues of effective data generation and proper loss function design based on the Hamiltonian function. We show proof-of-concept by applying the proposed methods to a Fermi-Pasta-Ulum (FPU) problem. The numerical results demonstrate that the Procrustes parareal method is able to produce solutions that are more stable in energy compared to the standard parareal. The neural network solver can achieve comparable or better runtime performance compared to numerical solvers of similar accuracy. When combined with the standard parareal algorithm, the improved neural network solutions are slightly more stable in energy than the improved numerical coarse solutions.
We consider the minimal thermodynamic cost of an individual computation, where a single input $x$ is mapped to a single output $y$. In prior work, Zurek proposed that this cost was given by $K(x\vert y)$, the conditional Kolmogorov complexity of $x$ given $y$ (up to an additive constant which does not depend on $x$ or $y$). However, this result was derived from an informal argument, applied only to deterministic computations, and had an arbitrary dependence on the choice of protocol (via the additive constant). Here we use stochastic thermodynamics to derive a generalized version of Zurek's bound from a rigorous Hamiltonian formulation. Our bound applies to all quantum and classical processes, whether noisy or deterministic, and it explicitly captures the dependence on the protocol. We show that $K(x\vert y)$ is a minimal cost of mapping $x$ to $y$ that must be paid using some combination of heat, noise, and protocol complexity, implying a tradeoff between these three resources. Our result is a kind of "algorithmic fluctuation theorem" with implications for the relationship between the Second Law and the Physical Church-Turing thesis.
Determining the number of factors in high-dimensional factor modeling is essential but challenging, especially when the data are heavy-tailed. In this paper, we introduce a new estimator based on the spectral properties of Spearman sample correlation matrix under the high-dimensional setting, where both dimension and sample size tend to infinity proportionally. Our estimator is robust against heavy tails in either the common factors or idiosyncratic errors. The consistency of our estimator is established under mild conditions. Numerical experiments demonstrate the superiority of our estimator compared to existing methods.
The proximal Galerkin finite element method is a high-order, low iteration complexity, nonlinear numerical method that preserves the geometric and algebraic structure of bound constraints in infinite-dimensional function spaces. This paper introduces the proximal Galerkin method and applies it to solve free boundary problems, enforce discrete maximum principles, and develop scalable, mesh-independent algorithms for optimal design. The paper leads to a derivation of the latent variable proximal point (LVPP) algorithm: an unconditionally stable alternative to the interior point method. LVPP is an infinite-dimensional optimization algorithm that may be viewed as having an adaptive barrier function that is updated with a new informative prior at each (outer loop) optimization iteration. One of the main benefits of this algorithm is witnessed when analyzing the classical obstacle problem. Therein, we find that the original variational inequality can be replaced by a sequence of semilinear partial differential equations (PDEs) that are readily discretized and solved with, e.g., high-order finite elements. Throughout this work, we arrive at several unexpected contributions that may be of independent interest. These include (1) a semilinear PDE we refer to as the entropic Poisson equation; (2) an algebraic/geometric connection between high-order positivity-preserving discretizations and certain infinite-dimensional Lie groups; and (3) a gradient-based, bound-preserving algorithm for two-field density-based topology optimization. The complete latent variable proximal Galerkin methodology combines ideas from nonlinear programming, functional analysis, tropical algebra, and differential geometry and can potentially lead to new synergies among these areas as well as within variational and numerical analysis.
We introduce new control-volume finite-element discretization schemes suitable for solving the Stokes problem. Within a common framework, we present different approaches for constructing such schemes. The first and most established strategy employs a non-overlapping partitioning into control volumes. The second represents a new idea by splitting into two sets of control volumes, the first set yielding a partition of the domain and the second containing the remaining overlapping control volumes required for stability. The third represents a hybrid approach where finite volumes are combined with finite elements based on a hierarchical splitting of the ansatz space. All approaches are based on typical finite element function spaces but yield locally mass and momentum conservative discretization schemes that can be interpreted as finite volume schemes. We apply all strategies to the inf-sub stable MINI finite-element pair. Various test cases, including convergence tests and the numerical observation of the boundedness of the number of preconditioned Krylov solver iterations, as well as more complex scenarios of flow around obstacles or through a three-dimensional vessel bifurcation, demonstrate the stability and robustness of the schemes.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.