We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
Simulation-based inference (SBI) is constantly in search of more expressive algorithms for accurately inferring the parameters of complex models from noisy data. We present consistency models for neural posterior estimation (CMPE), a new free-form conditional sampler for scalable, fast, and amortized SBI with generative neural networks. CMPE combines the advantages of normalizing flows and flow matching methods into a single generative architecture: It essentially distills a continuous probability flow and enables rapid few-shot inference with an unconstrained architecture that can be tailored to the structure of the estimation problem. Our empirical evaluation demonstrates that CMPE not only outperforms current state-of-the-art algorithms on three hard low-dimensional problems but also achieves competitive performance in a high-dimensional Bayesian denoising experiment and in estimating a computationally demanding multi-scale model of tumor spheroid growth.
The disjoint paths logic, FOL+DP, is an extension of First-Order Logic (FOL) with the extra atomic predicate $\mathsf{dp}_k(x_1,y_1,\ldots,x_k,y_k),$ expressing the existence of internally vertex-disjoint paths between $x_i$ and $y_i,$ for $i\in\{1,\ldots, k\}$. This logic can express a wide variety of problems that escape the expressibility potential of FOL. We prove that for every proper minor-closed graph class, model-checking for FOL+DP can be done in quadratic time. We also introduce an extension of FOL+DP, namely the scattered disjoint paths logic, FOL+SDP, where we further consider the atomic predicate $s{\sf -sdp}_k(x_1,y_1,\ldots,x_k,y_k),$ demanding that the disjoint paths are at distance greater than some fixed value $s$ from one another. Using the same technique, we prove that model-checking for FOL+SDP can be done in quadratic time on classes of graphs with bounded Euler genus.
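To make the semantics of the $\mathsf{dp}_k$ predicate concrete, the following is a minimal brute-force sketch in Python. It is not the quadratic-time model-checking algorithm of the abstract: it runs in exponential time and is meant only for tiny graphs. The function name `disjoint_paths_exist`, the adjacency-dictionary input format, and the reading of "internally vertex-disjoint" as "no vertex of one path is an inner vertex of another" are assumptions made for illustration.

```python
def disjoint_paths_exist(adj, pairs):
    """Brute-force check of dp_k on an undirected graph (hypothetical helper).

    adj   : dict mapping each vertex to an iterable of its neighbours
    pairs : list of (x_i, y_i) terminal pairs
    Assumed semantics: no vertex lying on one path may be an inner
    vertex of another path, so inner vertices must avoid every terminal.
    """
    terminals = {v for pair in pairs for v in pair}

    def extend(i, blocked):
        # blocked holds the inner vertices of the paths chosen for pairs 0..i-1
        if i == len(pairs):
            return True
        x, y = pairs[i]

        def walk(v, inner):
            # inner holds the inner vertices of the current partial path
            if v == y:
                return extend(i + 1, blocked | inner)
            for w in adj[v]:
                if w == y:
                    if walk(w, inner):
                        return True
                elif w not in terminals and w not in blocked and w not in inner:
                    if walk(w, inner | {w}):
                        return True
            return False

        return walk(x, frozenset())

    return extend(0, frozenset())


# A 4-cycle a-b-c-d-a and the complete graph K4 on the same vertices.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
k4 = {v: [w for w in "abcd" if w != v] for v in "abcd"}

print(disjoint_paths_exist(cycle, [("a", "c")]))
print(disjoint_paths_exist(cycle, [("a", "c"), ("b", "d")]))
print(disjoint_paths_exist(k4, [("a", "c"), ("b", "d")]))
```

On the 4-cycle the two diagonal pairs cannot be linked simultaneously, because every route from `a` to `c` passes through a terminal of the other pair, whereas in $K_4$ the two direct edges suffice.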
Uncertainty quantification (UQ) to detect samples with large expected errors (outliers) is applied to reactive molecular potential energy surfaces (PESs). Three methods - Ensembles, Deep Evidential Regression (DER), and Gaussian Mixture Models (GMM) - were applied to the H-transfer reaction between ${\it syn-}$Criegee and vinyl hydroxyperoxide. The results indicate that ensemble models provide the best results for detecting outliers, followed by GMM. For example, from a pool of 1000 structures with the largest uncertainty, the detection quality for outliers is $\sim 90$ \% and $\sim 50$ \%, respectively, if 25 or 1000 structures with large errors are sought. On the contrary, the limitations of the statistical assumptions of DER greatly impacted its prediction capabilities. Finally, a structure-based indicator was found to be correlated with large average error, which may help to rapidly classify new structures into those that provide an advantage for refining the neural network.
We propose a variational autoencoder (VAE)-based model for building forward and inverse structure-property linkages, a problem of paramount importance in computational materials science. Our model systematically combines VAE with regression, linking the two models through a two-level prior conditioned on the regression variables. The regression loss is optimized jointly with the reconstruction loss of the variational autoencoder, learning microstructure features relevant for property prediction and reconstruction. The resultant model can be used for both forward and inverse prediction, i.e., for predicting the properties of a given microstructure as well as for predicting the microstructure required to obtain given properties. Since the inverse problem is ill-posed (one-to-many), we derive the objective function using a multi-modal Gaussian mixture prior, enabling the model to infer multiple microstructures for a target set of properties. We show that for forward prediction, our model is as accurate as state-of-the-art forward-only models. Additionally, our method enables direct inverse inference. We show that the microstructures inferred using our model achieve the desired properties reasonably accurately, avoiding the need for expensive optimization loops.
Mesh degeneration is a bottleneck for fluid-structure interaction (FSI) simulations and for shape optimization via the method of mappings. In both cases, an appropriate mesh motion technique is required. The choice is typically based on heuristics, e.g., the solution operators of partial differential equations (PDE), such as the Laplace or biharmonic equation. The latter in particular, which shows good numerical performance for large displacements, is expensive. Moreover, from a continuous perspective, choosing the mesh motion technique is to a certain extent arbitrary and has no influence on the physically relevant quantities. Therefore, we consider approaches inspired by machine learning. We present a hybrid PDE-NN approach, where the neural network (NN) serves as parameterization of a coefficient in a second-order nonlinear PDE. We ensure existence of solutions for the nonlinear PDE by the choice of the neural network architecture. Moreover, we present an approach where a neural network corrects the harmonic extension such that the boundary displacement is not changed. In order to avoid technical difficulties in coupling finite element and machine learning software, we work with a splitting of the monolithic FSI system into three smaller subsystems. This allows us to solve the mesh motion equation in a separate step. We assess the quality of the learned mesh motion technique by applying it to an FSI benchmark problem. In addition, we discuss generalizability and computational cost of the learned mesh motion operators.
We present a stochastic method for efficiently computing the solution of time-fractional partial differential equations (fPDEs) that model anomalous diffusion problems of the subdiffusive type. After discretizing the fPDE in space, the ensuing system of fractional linear equations is solved resorting to a Monte Carlo evaluation of the corresponding Mittag-Leffler matrix function. This is accomplished through the approximation of the expected value of a suitable multiplicative functional of a stochastic process, which consists of a Markov chain whose sojourn times in every state are Mittag-Leffler distributed. The resulting algorithm is able to calculate the solution at conveniently chosen points in the domain with high efficiency. In addition, we present how to generalize this algorithm in order to compute the complete solution. For several large-scale numerical problems, our method showed remarkable performance in both shared-memory and distributed-memory systems, achieving nearly perfect scalability up to 16,384 CPU cores.
We consider two categories of termination problems of quantum programs with nondeterminism: 1) Is an input of a program terminating with probability one under all schedulers? If not, how can a scheduler be synthesized to evidence the nontermination? 2) Are all inputs terminating with probability one under their respective schedulers? If yes, a further question asks whether there is a scheduler that forces all inputs to be terminating with probability one, together with how to synthesize it; otherwise, how can an input be provided to refute the universal termination? For the effective verification of the first category, we over-approximate the reachable set of quantum program states by the reachable subspace, whose algebraic structure is a linear space. On the other hand, we study the set of divergent states from which the program terminates with probability zero under some scheduler. The divergent set has an explicit algebraic structure. Exploiting these structures, we address the decision problem by a necessary and sufficient condition, i.e., the disjointness of the reachable subspace and the divergent set. Furthermore, the scheduler synthesis is completed in exponential time. For the second category, we reduce the decision problem to the existence of an invariant subspace, from which the program terminates with probability zero under all schedulers. The invariant subspace is characterized by linear equations. The states on that invariant subspace are evidence of the nontermination. Furthermore, the scheduler synthesis is completed by seeking a pattern of finite schedulers that forces all inputs to be terminating with positive probability. The repetition of that pattern yields the desired universal scheduler that forces all inputs to be terminating with probability one. All the problems in the second category are shown to be solvable in polynomial time.
We propose a novel data-driven linear inverse model, called Colored-LIM, to extract the linear dynamics and diffusion matrix that define a linear stochastic process driven by Ornstein-Uhlenbeck colored noise. The Colored-LIM is a new variant of the classical linear inverse model (LIM), which relies on the white-noise assumption. Similar to LIM, the Colored-LIM approximates the linear dynamics from a finite realization of a stochastic process and then solves for the diffusion matrix based on, for instance, a generalized fluctuation-dissipation relation, which can be done by solving a system of linear equations. The main difficulty is that in practice, the colored-noise process can hardly be observed, while it is correlated with the stochastic process of interest. Nevertheless, we show that the local behavior of the correlation function of the observable encodes the dynamics of the stochastic process and the diffusive behavior of the colored noise. In this article, we review the classical LIM and develop Colored-LIM with a mathematical background and rigorous derivations. In the numerical experiments, we examine the performance of both LIM and Colored-LIM. Finally, in the appendices, we discuss some false attempts to build a linear inverse model for colored-noise-driven processes and investigate the potential misuse of LIM and its consequences.
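As a point of reference for the two-step procedure described above, the following is a minimal sketch of the classical, white-noise LIM in the scalar case. It is not the Colored-LIM of the abstract. It assumes the standard lag-covariance estimator $L = \tau^{-1}\log\big(C(\tau)/C(0)\big)$ and the scalar fluctuation-dissipation relation $2LC(0) + Q = 0$. The function names `simulate_ou` and `classical_lim_1d` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def simulate_ou(a, q, dt, n, seed=0):
    """Sample dx = -a x dt + sqrt(q) dW exactly on a grid of step dt (white-noise forcing)."""
    rng = random.Random(seed)
    phi = math.exp(-a * dt)                          # one-step decay factor
    sd = math.sqrt(q * (1.0 - phi * phi) / (2.0 * a))  # exact one-step noise std
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + sd * rng.gauss(0.0, 1.0))
    return x


def classical_lim_1d(x, dt):
    """Scalar classical LIM: estimate drift L and diffusion Q from one realization."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    c0 = sum(v * v for v in xc) / n                               # lag-0 covariance
    c1 = sum(xc[k] * xc[k + 1] for k in range(n - 1)) / (n - 1)   # lag-dt covariance
    L = math.log(c1 / c0) / dt   # step 1: linear dynamics from the lag covariance
    Q = -2.0 * L * c0            # step 2: fluctuation-dissipation, 2*L*c0 + Q = 0
    return L, Q


# Illustrative parameters (assumptions): true drift -1.0 and true diffusion 1.0.
series = simulate_ou(a=1.0, q=1.0, dt=0.1, n=200_000, seed=0)
L_hat, Q_hat = classical_lim_1d(series, dt=0.1)
print(L_hat, Q_hat)
```

In the matrix case the same two steps would use a matrix logarithm and a Lyapunov-type linear system. When the forcing is colored rather than white, this white-noise estimator is no longer appropriate, which is the situation the Colored-LIM is designed to address.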
We present a parallel algorithm for the $(1-\epsilon)$-approximate maximum flow problem in capacitated, undirected graphs with $n$ vertices and $m$ edges, achieving $O(\epsilon^{-3}\text{polylog} n)$ depth and $O(m \epsilon^{-3} \text{polylog} n)$ work in the PRAM model. Although near-linear time sequential algorithms for this problem have been known for almost a decade, no parallel algorithms that simultaneously achieved polylogarithmic depth and near-linear work were known. At the heart of our result is a polylogarithmic depth, near-linear work recursive algorithm for computing congestion approximators. Our algorithm involves a recursive step to obtain a low-quality congestion approximator followed by a "boosting" step to improve its quality which prevents a multiplicative blow-up in error. Similar to Peng [SODA'16], our boosting step builds upon the hierarchical decomposition scheme of R\"acke, Shah, and T\"aubig [SODA'14]. A direct implementation of this approach, however, leads only to an algorithm with $n^{o(1)}$ depth and $m^{1+o(1)}$ work. To get around this, we introduce a new hierarchical decomposition scheme, in which we only need to solve maximum flows on subgraphs obtained by contracting vertices, as opposed to vertex-induced subgraphs used in R\"acke, Shah, and T\"aubig [SODA'14]. In particular, we are able to directly extract congestion approximators for the subgraphs from a congestion approximator for the entire graph, thereby avoiding additional recursion on those subgraphs. Along the way, we also develop a parallel flow-decomposition algorithm that is crucial to achieving polylogarithmic depth and may be of independent interest.
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high-resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher-resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high-resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.