We study the limitations and fast-forwarding of quantum algorithms for linear ordinary differential equation (ODE) systems with a particular focus on non-quantum dynamics, where the coefficient matrix in the ODE is not anti-Hermitian or the ODE is inhomogeneous. On the one hand, for generic homogeneous linear ODEs, by proving worst-case lower bounds, we show that quantum algorithms suffer from computational overheads due to two types of ``non-quantumness'': real part gap and non-normality of the coefficient matrix. We then show that homogeneous ODEs in the absence of both types of ``non-quantumness'' are equivalent to quantum dynamics, and reach the conclusion that quantum algorithms for quantum dynamics work best. We generalize our results to the inhomogeneous case and find that existing generic quantum ODE solvers cannot be substantially improved. To obtain these lower bounds, we propose a general framework for proving lower bounds on quantum algorithms that are amplifiers, meaning that they amplify the difference between a pair of input quantum states. On the other hand, we show how to fast-forward quantum algorithms for solving special classes of ODEs which leads to improved efficiency. More specifically, we obtain quadratic improvements in the evolution time $T$ for inhomogeneous ODEs with a negative semi-definite coefficient matrix, and exponential improvements in both $T$ and the spectral norm of the coefficient matrix for inhomogeneous ODEs with efficiently implementable eigensystems, including various spatially discretized linear evolutionary partial differential equations. We give fast-forwarding algorithms that are conceptually different from existing ones in the sense that they neither require time discretization nor solving high-dimensional linear systems.
We present a categorical theory of the composition methods in finite model theory -- a key technique enabling modular reasoning about complex structures by building them out of simpler components. The crucial results required by the composition methods are Feferman-Vaught-Mostowski (FVM) type theorems, which characterize how logical equivalence behaves under composition and transformation of models. Our results are developed by extending the recently introduced game comonad semantics for model comparison games. This level of abstraction allow us to give conditions yielding FVM type results in a uniform way. Our theorems are parametric in the classes of models, logics and operations involved. Furthermore, they naturally account for the positive existential fragment, and extensions with counting quantifiers of these logics. We also reveal surprising connections between FVM type theorems, and classical concepts in the theory of monads. We illustrate our methods by recovering many classical theorems of practical interest, including a refinement of a previous result by Dawar, Severini, and Zapata concerning the 3-variable counting logic and cospectrality. To highlight the importance of our techniques being parametric in the logic of interest, we prove a family of FVM theorems for products of structures, uniformly in the logic in question, which cannot be done using specific game arguments.
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do not directly call upon AI tools. We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition. It has already been demonstrated that ML can make the choice well, but here we show how the SHAP tool for explainability can be used to inform new heuristics of a size and complexity similar to those human-designed heuristics currently commonly used in symbolic computation.
Obtaining guarantees on the convergence of the minimizers of empirical risks to the ones of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. A particular attention will therefore be given to include unbounded spaces and non-convex cost functions that might also be unbounded. This work identifies a set of assumptions allowing to describe a regime that seem to govern the concentration in many estimation problems, where the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric space with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.
We study multivariate Gaussian statistical models whose maximum likelihood estimator (MLE) is a rational function of the observed data. We establish a one-to-one correspondence between such models and the solutions to a nonlinear first-order partial differential equation (PDE). Using our correspondence, we reinterpret familiar classes of models with rational MLE, such as directed (and decomposable undirected) Gaussian graphical models. We also find new models with rational MLE. For linear concentration models with rational MLE, we show that homaloidal polynomials from birational geometry lead to solutions to the PDE. We thus shed light on the problem of classifying Gaussian models with rational MLE by relating it to the open problem in birational geometry of classifying homaloidal polynomials.
In this work, we establish the Freidlin--Wentzell large deviations principle (LDP) of the stochastic Cahn--Hilliard equation with small noise, which implies the one-point LDP. Further, we give the one-point LDP of the spatial finite difference method (FDM) for the stochastic Cahn--Hilliard equation. Our main result is the convergence of the one-point large deviations rate function (LDRF) of the spatial FDM, which is about the asymptotical limit of a parametric variational problem. The main idea for proving the convergence of the LDRF of the spatial FDM is via the $\Gamma$-convergence of objective functions, which relies on the qualitative analysis of skeleton equations of the original equation and the numerical method. In order to overcome the difficulty that the drift coefficient is not one-side Lipschitz, we use the equivalent characterization of the skeleton equation of the spatial FDM and the discrete interpolation inequality to obtain the uniform boundedness of the solution to the underlying skeleton equation. This plays an important role in deriving the $\Gamma$-convergence of objective functions.
Uncertainty quantification in image restoration is a prominent challenge, mainly due to the high dimensionality of the encountered problems. Recently, a Bayesian uncertainty quantification by optimization (BUQO) has been proposed to formulate hypothesis testing as a minimization problem. The objective is to determine whether a structure appearing in a maximum a posteriori estimate is true or is a reconstruction artifact due to the ill-posedness or ill-conditioness of the problem. In this context, the mathematical definition of having a ``fake structure" is crucial, and highly depends on the type of structure of interest. This definition can be interpreted as an inpainting of a neighborhood of the structure, but only simple techniques have been proposed in the literature so far, due to the complexity of the problem. In this work, we propose a data-driven method using a simple convolutional neural network to perform the inpainting task, leading to a novel plug-and-play BUQO algorithm. Compared to previous works, the proposed approach has the advantage that it can be used for a wide class of structures, without needing to adapt the inpainting operator to the area of interest. In addition, we show through simulations on magnetic resonance imaging, that compared to the original BUQO's hand-crafted inpainting procedure, the proposed approach provides greater qualitative output images. Python code will be made available for reproducibility upon acceptance of the article.
Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more efficient, with methods that range from changing the architecture design, all the way to developing dedicated domain-specific accelerators. In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search. Finally, we perform a case study by applying the surveyed optimizations on Gemmini, the open-source, full-stack DNN accelerator generator, and we show how each of these approaches can yield improvements, compared to previous benchmark results on Gemmini. Among other things, we find that a full-stack co-design approach with the aforementioned methods can result in up to 88.7x speedup with a minimal performance degradation for Transformer inference.
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper we first propose Greedy NTPs (GNTPs), an extension to NTPs addressing their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining orders of magnitude more efficient models. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions, by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models. Source code, datasets, and supplementary material are available online at //github.com/uclnlp/gntp.