The advent of quantum computers, operating on entirely different physical principles and abstractions from those of classical digital computers, sets forth a new computing paradigm that can potentially result in game-changing efficiencies and computational performance. Specifically, the ability to simultaneously evolve the state of an entire quantum system leads to quantum parallelism and interference. Despite these prospects, opportunities to bring quantum computing to bear on problems of computational mechanics remain largely unexplored. In this work, we demonstrate how quantum computing can indeed be used to solve representative volume element (RVE) problems in computational homogenisation with polylogarithmic complexity of $\mathcal{O}((\log N)^c)$, compared to $\mathcal{O}(N^c)$ in classical computing. Thus, our quantum RVE solver attains exponential acceleration with respect to classical solvers, bringing concurrent multiscale computing closer to practicality. The proposed quantum RVE solver combines conventional algorithms such as a fixed-point iteration for a homogeneous reference material and the Fast Fourier Transform (FFT). However, the quantum computing reformulation of these algorithms requires a fundamental paradigm shift and a complete rethinking and overhaul of the classical implementation. We employ or develop several techniques, including the Quantum Fourier Transform (QFT), quantum encoding of polynomials, classical piecewise Chebyshev approximation of functions, and an auxiliary algorithm for implementing the fixed-point iteration, and show that an efficient implementation of RVE solvers on quantum computers is indeed possible. We additionally provide theoretical proofs and numerical evidence confirming the anticipated $\mathcal{O}\left((\log N)^c\right)$ complexity of the proposed solver.
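For orientation, the classical building block referenced here is the FFT-accelerated fixed-point ("basic") scheme with a homogeneous reference material. The sketch below is not the paper's quantum algorithm but a minimal classical analogue for a toy 1D periodic conduction RVE; the microstructure, phase contrast, and reference conductivity are illustrative assumptions.

```python
import jax.numpy as jnp

# Classical FFT-based fixed-point ("basic") scheme for a toy 1D periodic conduction RVE:
# iterate e <- E - Gamma0 * ((k - k0) e) with a homogeneous reference conductivity k0.
N = 256
x = jnp.arange(N) / N
k = jnp.where(x < 0.5, 1.0, 10.0)            # two-phase microstructure, contrast 10 (assumed)
k0 = 0.5 * (k.min() + k.max())               # homogeneous reference material
E = 1.0                                      # prescribed mean gradient

e = jnp.full(N, E)                           # initial guess: uniform field
for _ in range(200):
    tau_hat = jnp.fft.fft((k - k0) * e)      # polarization field in Fourier space
    e_hat = -tau_hat / k0                    # apply the Green operator (1/k0 in 1D conduction)
    e_hat = e_hat.at[0].set(N * E)           # enforce the prescribed mean on the zero frequency
    e = jnp.real(jnp.fft.ifft(e_hat))

k_eff = jnp.mean(k * e) / E                  # homogenised conductivity
print(k_eff, 1.0 / jnp.mean(1.0 / k))        # should agree with the 1D harmonic-mean result
```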
We employ techniques from group theory to show that, in many cases, counting problems on graphs are almost as hard to solve on a small number of instances as they are on all instances. Specifically, we show the following results. 1. Goldreich (2020) asks if, for every constant $\delta < 1/2$, there is an $\tilde{O}\left(n^2\right)$-time randomized reduction from computing the number of $k$-cliques modulo $2$ with a success probability of greater than $2/3$ to computing the number of $k$-cliques modulo $2$ with an error probability of at most $\delta$. In this work, we show that, for almost all choices of the $\delta \cdot 2^{\binom{n}{2}}$ corrupt answers of the average-case solver, there is an $\tilde{O}\left(n^2\right)$-time reduction that tolerates an error probability of $\delta$ in the average-case solver for any constant $\delta < 1/2$. By "almost all", we mean that if we choose a subset $S \subset \{0,1\}^{\binom{n}{2}}$ with $|S| = \delta \cdot 2^{\binom{n}{2}}$ uniformly at random, then with probability $1-2^{-\Omega\left(n^2\right)}$ we can use an average-case solver that is corrupt on $S$ to obtain a probabilistic worst-case algorithm. 2. Inspired by the work of Goldreich and Rothblum (FOCS 2018) on weighted versions of graph counting problems, we prove that if the randomized Exponential Time Hypothesis (RETH) is true, then for a prime $p = \Theta\left(2^n\right)$, the problem of counting the number of unique Hamiltonian cycles modulo $p$ on $n$-vertex directed multigraphs and the problem of counting the number of unique half-cliques modulo $p$ on $n$-vertex undirected multigraphs both require exponential time to compute correctly on even a $1/2^{n/\log n}$-fraction of instances. Meanwhile, simply printing $0$ on all inputs is correct on at least an $\Omega\left(1/2^n\right)$-fraction of instances.
This work addresses the fundamental linear inverse problem in compressive sensing (CS) by introducing a new type of regularizing generative prior. Our proposed method utilizes ideas from classical dictionary-based CS and, in particular, sparse Bayesian learning (SBL) to integrate a strong regularization towards sparse solutions. At the same time, by leveraging the notion of conditional Gaussianity, it also incorporates the adaptability to training data that generative models offer. However, unlike most state-of-the-art generative models, it is able to learn from a few compressed and noisy data samples and requires no optimization algorithm for solving the inverse problem. Additionally, similar to Dirichlet prior networks, our model parameterizes a conjugate prior, enabling its application to uncertainty quantification. We support our approach theoretically through the concept of variational inference and validate it empirically using different types of compressible signals.
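To make the SBL and conditional-Gaussianity ingredients concrete, the snippet below sketches the classical SBL machinery such priors build on: for a Gaussian likelihood and a zero-mean Gaussian prior with per-coefficient variances $\gamma$, the posterior over the sparse signal is Gaussian in closed form, and $\gamma$ can be refined by EM. The measurement setup and all numbers are toy assumptions, not the paper's model.

```python
import jax, jax.numpy as jnp

jax.config.update("jax_enable_x64", True)      # double precision for the matrix inversions below

# Toy compressed-sensing instance (sizes, sparsity, and noise level are illustrative assumptions)
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
n, m, k_sparse, sigma = 100, 40, 5, 0.01
A = jax.random.normal(key_a, (m, n)) / jnp.sqrt(m)          # measurement matrix
s_true = jnp.zeros(n).at[:k_sparse].set(1.0)                # sparse ground-truth signal
y = A @ s_true + sigma * jax.random.normal(key_b, (m,))     # compressed, noisy observations

# Classical SBL: the posterior p(s | y, gamma) is conditionally Gaussian, and the
# per-coefficient prior variances gamma are refined by EM; small gammas prune coefficients.
gamma = jnp.ones(n)
for _ in range(50):
    Sigma = jnp.linalg.inv(A.T @ A / sigma**2 + jnp.diag(1.0 / gamma))   # posterior covariance
    mu = Sigma @ A.T @ y / sigma**2                                      # posterior mean
    gamma = jnp.maximum(mu**2 + jnp.diag(Sigma), 1e-10)                  # EM update with a floor

print(jnp.round(mu, 2)[:10])   # the posterior mean should concentrate on the true support
```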
Program synthesis methods aim to automatically generate programs, restricted to a given language, that explain a specification of input-output pairs. While purely symbolic approaches suffer from a combinatorial search space, recent methods leverage neural networks to learn distributions over program structures, narrowing this search space significantly and enabling more efficient search. However, for challenging problems, it remains difficult to train models to perform program synthesis in one shot, making test-time search essential. Most neural methods lack structured search mechanisms during inference, relying instead on stochastic sampling or gradient updates, which can be inefficient. In this work, we propose the Latent Program Network (LPN), a general algorithm for program induction that learns a distribution over latent programs in a continuous space, enabling efficient search and test-time adaptation. We explore how to train these networks to optimize for test-time computation and demonstrate the use of gradient-based search both during training and at test time. We evaluate LPN on ARC-AGI, a program synthesis benchmark that measures performance by generalizing programs to new inputs rather than merely explaining the underlying specification. We show that LPN can generalize beyond its training distribution and adapt to unseen tasks by utilizing test-time computation, outperforming algorithms without test-time adaptation mechanisms.
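The core mechanism, searching a continuous latent program space with gradients at test time, can be sketched generically: refine a latent vector so that a decoder conditioned on it reproduces the specification, then apply the decoder with the refined latent to new inputs. The decoder below is a random stand-in (in LPN it would be trained jointly with an encoder), and the toy task and dimensions are assumptions made for illustration.

```python
import jax, jax.numpy as jnp

# Schematic test-time latent search. The decoder weights are random stand-ins for a trained
# model; only the search mechanics over the continuous latent program z are illustrated.
def decoder(params, z, x):
    W1, b1, W2, b2 = params
    h = jnp.tanh(W1 @ jnp.concatenate([z, x]) + b1)
    return W2 @ h + b2                                   # predicted output for input x under "program" z

def spec_loss(z, params, xs, ys):
    preds = jax.vmap(lambda x: decoder(params, z, x))(xs)
    return jnp.mean((preds - ys) ** 2)                   # how well latent z explains the I/O spec

d_z, d_x, d_h = 8, 4, 32
ks = jax.random.split(jax.random.PRNGKey(0), 4)
params = (0.3 * jax.random.normal(ks[0], (d_h, d_z + d_x)), jnp.zeros(d_h),
          0.3 * jax.random.normal(ks[1], (d_x, d_h)), jnp.zeros(d_x))
xs = jax.random.normal(ks[2], (5, d_x))                  # specification inputs
ys = xs[:, ::-1]                                         # toy task: reverse each input vector

z = jnp.zeros(d_z)                                       # start from a prior / encoder guess
grad_fn = jax.jit(jax.grad(spec_loss))
for _ in range(200):                                     # test-time adaptation in latent space
    z = z - 0.1 * grad_fn(z, params, xs, ys)

x_new = jax.random.normal(ks[3], (d_x,))
print(decoder(params, z, x_new))                         # apply the adapted latent program to a new input
```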
We consider the problem of learning the dynamics in the topology of time-evolving point clouds, the prevalent spatiotemporal model for systems exhibiting collective behavior, such as swarms of insects and birds or particles in physics. In such systems, patterns emerge from (local) interactions among self-propelled entities. While several well-understood governing equations for motion and interaction exist, they are notoriously difficult to fit to data, as most prior work requires knowledge of individual motion trajectories, a requirement that becomes increasingly difficult to satisfy as the number of entities grows. To avoid this requirement, we investigate collective behavior from a $\textit{topological perspective}$, but instead of summarizing entire observation sequences (as done previously), we propose learning a latent dynamical model from topological features $\textit{per time point}$. The latter is then used to formulate a downstream regression task to predict the parametrization of some a priori specified governing equation. We implement this idea based on a latent ODE learned from vectorized (static) persistence diagrams and show that a combination of recent stability results for persistent homology justifies this modeling choice. Various (ablation) experiments not only demonstrate the relevance of each model component but also provide compelling empirical evidence that our proposed model - $\textit{Neural Persistence Dynamics}$ - substantially outperforms the state-of-the-art across a diverse set of parameter regression tasks.
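The per-time-point vectorization of persistence diagrams can be illustrated with a simplified persistence-image-style transform: each (birth, death) pair contributes a persistence-weighted Gaussian bump on a fixed grid, yielding one fixed-length vector per time point that a latent ODE can then model. The diagrams below are made up for illustration; in practice they would be computed from the point cloud with a persistent-homology library, and the paper's exact vectorization may differ.

```python
import jax.numpy as jnp

def persistence_image(diagram, grid=16, sigma=0.05):
    """Vectorize a persistence diagram (rows of (birth, death) in [0, 1]) into a fixed-size
    grid of persistence-weighted Gaussian bumps over (birth, persistence) coordinates."""
    birth, death = diagram[:, 0], diagram[:, 1]
    pers = death - birth                                  # persistence = lifetime of a feature
    B, P = jnp.meshgrid(jnp.linspace(0, 1, grid), jnp.linspace(0, 1, grid), indexing="ij")
    bumps = pers[:, None, None] * jnp.exp(
        -((B - birth[:, None, None]) ** 2 + (P - pers[:, None, None]) ** 2) / (2 * sigma**2))
    return bumps.sum(axis=0).ravel()                      # one fixed-length vector per time point

# Hypothetical diagrams for two consecutive time points of an evolving point cloud.
d_t0 = jnp.array([[0.05, 0.40], [0.10, 0.15]])
d_t1 = jnp.array([[0.05, 0.55], [0.12, 0.20]])
v_t0, v_t1 = persistence_image(d_t0), persistence_image(d_t1)
print(v_t0.shape)    # (256,); the sequence of such vectors is what the latent ODE models
```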
We characterize the power of constant-depth Boolean circuits in generating uniform symmetric distributions. Let $f\colon\{0,1\}^m\to\{0,1\}^n$ be a Boolean function where each output bit of $f$ depends only on $O(1)$ input bits. Assume the output distribution of $f$ on uniform input bits is close to a uniform distribution $D$ with a symmetric support. We show that $D$ is essentially one of the following six possibilities: (1) point distribution on $0^n$, (2) point distribution on $1^n$, (3) uniform over $\{0^n,1^n\}$, (4) uniform over strings with even Hamming weights, (5) uniform over strings with odd Hamming weights, and (6) uniform over all strings. This confirms a conjecture of Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023).
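One of the six cases can be realized exactly by a local map, which may help make the statement concrete: with each output bit depending on only two input bits, $y_i = x_i \oplus x_{(i+1) \bmod n}$ generates the uniform distribution over even-Hamming-weight strings, i.e. case (4). The brute-force check below is an illustration constructed for this list, not an example taken from the paper.

```python
import itertools, collections

# A 2-local (NC^0) map whose output distribution on uniform inputs is exactly uniform over
# even-Hamming-weight strings: y_i = x_i XOR x_{(i+1) mod n}.
n = 4
counts = collections.Counter()
for x in itertools.product((0, 1), repeat=n):
    y = tuple(x[i] ^ x[(i + 1) % n] for i in range(n))
    counts[y] += 1

print(sorted(counts.items()))                      # each even-weight string appears exactly twice
print(all(sum(y) % 2 == 0 for y in counts), len(counts) == 2 ** (n - 1))
```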
Partial differential equations have a wide range of applications in modeling multiple physical, biological, or social phenomena. Therefore, we need to approximate the solutions of these equations in computationally feasible terms. Nowadays, the most popular numerical methods for solving partial differential equations in engineering include the finite difference and finite element methods. An alternative numerical method that has recently gained popularity for numerically solving partial differential equations is based on artificial neural networks. Artificial neural networks, or neural networks for short, are mathematical structures with universal approximation properties. In addition, thanks to the extraordinary computational development of the last decade, neural networks have become accessible and powerful numerical methods for engineers and researchers. For example, imaging and language processing are present-day applications of neural networks whose performance was inconceivable only a few years ago. This dissertation contributes to the numerical solution of partial differential equations using neural networks with the following twofold objective: to investigate the behavior of neural networks as approximators of solutions of partial differential equations, and to propose neural-network-based methods for frameworks that are difficult to address with traditional numerical methods. As novel neural-network-based proposals, we first present a method for solving parametric problems that is inspired by the finite element method with mesh refinement. Secondly, we propose a general residual minimization scheme based on a generalized version of the Ritz method. Finally, we develop a memory-based strategy to overcome a common numerical-integration limitation that arises when using neural networks to solve partial differential equations.
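As a generic point of reference for the residual-minimization idea, the sketch below trains a small network by minimizing the strong-form PDE residual at collocation points for a 1D Poisson problem with a manufactured solution, using automatic differentiation for the derivatives. It is only a baseline illustration under assumed toy settings; the dissertation's generalized Ritz scheme, mesh-refinement-inspired method, and memory-based integration strategy are not reproduced here.

```python
import jax, jax.numpy as jnp

# Collocation-style residual minimization for -u''(x) = f(x) on (0, 1), u(0) = u(1) = 0,
# with the manufactured solution u(x) = sin(pi x). Toy baseline only.
f = lambda x: jnp.pi**2 * jnp.sin(jnp.pi * x)

def init(key, widths=(1, 16, 16, 1)):
    keys = jax.random.split(key, len(widths) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(n), jnp.zeros(m))
            for k, n, m in zip(keys, widths[:-1], widths[1:])]

def u(params, x):
    h = jnp.reshape(x, (1,))
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return x * (1.0 - x) * (W @ h + b)[0]        # the x(1-x) factor enforces the boundary conditions

def residual_loss(params, xs):
    u_xx = jax.vmap(jax.grad(jax.grad(u, argnums=1), argnums=1), in_axes=(None, 0))(params, xs)
    return jnp.mean((-u_xx - f(xs)) ** 2)        # strong-form residual at the collocation points

xs = jnp.linspace(0.01, 0.99, 64)                # collocation points
params = init(jax.random.PRNGKey(0))
loss_and_grad = jax.jit(jax.value_and_grad(residual_loss))
for _ in range(5000):                            # plain gradient descent, kept simple on purpose
    loss, g = loss_and_grad(params, xs)
    params = jax.tree_util.tree_map(lambda p, dp: p - 1e-3 * dp, params, g)

print(float(loss), float(jnp.abs(u(params, 0.5) - 1.0)))   # residual loss and midpoint error
```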
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
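As a minimal illustration of the neural ODE viewpoint and of backpropagation through a differential equation solver, the sketch below integrates a small learned vector field with fixed-step RK4 and differentiates a terminal-state loss straight through the solver (discretise-then-optimise). The architecture, step count, and target are toy assumptions; the adjoint and reversible-solver alternatives covered in the thesis are not shown.

```python
import jax, jax.numpy as jnp

# Minimal neural ODE: dy/dt = f_theta(y), integrated with fixed-step RK4 and differentiated
# straight through the solver ("discretise-then-optimise" backpropagation).
def f(theta, y):
    W1, W2 = theta
    return W2 @ jnp.tanh(W1 @ y)                 # a tiny neural vector field

def odeint_rk4(theta, y0, t1, steps=50):
    h = t1 / steps
    def step(y, _):
        k1 = f(theta, y)
        k2 = f(theta, y + 0.5 * h * k1)
        k3 = f(theta, y + 0.5 * h * k2)
        k4 = f(theta, y + h * k3)
        return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4), None
    yT, _ = jax.lax.scan(step, y0, None, length=steps)
    return yT

ka, kb = jax.random.split(jax.random.PRNGKey(0))
theta = (0.1 * jax.random.normal(ka, (16, 2)), 0.1 * jax.random.normal(kb, (2, 16)))
y0, y_target = jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0])

loss = lambda th: jnp.sum((odeint_rk4(th, y0, 1.0) - y_target) ** 2)
grads = jax.grad(loss)(theta)                    # gradients of a terminal-state loss w.r.t. theta
print(float(loss(theta)), jax.tree_util.tree_map(jnp.shape, grads))
```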
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance while at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
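The bottleneck mechanism can be sketched schematically: within a fusion layer, tokens of each modality attend only to their own modality plus a handful of shared bottleneck latents, and the bottleneck tokens are updated per modality and then averaged. The single-head attention below omits projections, multi-head structure, layer norm, and MLPs, and the token counts are illustrative assumptions rather than the paper's configuration.

```python
import jax, jax.numpy as jnp

def attend(q, kv):
    """Single-head scaled dot-product attention (projections, heads, norms, MLPs omitted)."""
    scores = q @ kv.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ kv

def bottleneck_fusion_layer(audio, video, bottleneck):
    """Each modality attends only to its own tokens plus the shared bottleneck tokens,
    so all cross-modal information must pass through the bottleneck."""
    a_ctx = jnp.concatenate([audio, bottleneck])
    v_ctx = jnp.concatenate([video, bottleneck])
    audio_new = attend(audio, a_ctx)
    video_new = attend(video, v_ctx)
    # Bottleneck tokens are updated within each modality and then averaged across modalities.
    bottleneck_new = 0.5 * (attend(bottleneck, a_ctx) + attend(bottleneck, v_ctx))
    return audio_new, video_new, bottleneck_new

key_a, key_v, key_b = jax.random.split(jax.random.PRNGKey(0), 3)
audio = jax.random.normal(key_a, (196, 64))       # audio-spectrogram tokens (toy sizes)
video = jax.random.normal(key_v, (784, 64))       # video patch tokens
bottleneck = jax.random.normal(key_b, (4, 64))    # a small number of fusion bottleneck latents
a, v, b = bottleneck_fusion_layer(audio, video, bottleneck)
print(a.shape, v.shape, b.shape)
```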
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
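The relational core described here, iterated self-attention over entity vectors feeding a model-free policy, can be sketched as follows; the entity extraction from a feature map, the number of reasoning rounds, the pooling, and the linear policy head are simplifying assumptions, not the exact architecture.

```python
import jax, jax.numpy as jnp

def relational_block(entities):
    """One round of self-attention over entity vectors: every entity attends to every
    other entity, producing relation-aware embeddings (projections omitted)."""
    scores = entities @ entities.T / jnp.sqrt(entities.shape[-1])
    return entities + jax.nn.softmax(scores, axis=-1) @ entities   # residual update

def policy_logits(feature_map, W_pi, n_rounds=2):
    # Treat each spatial cell of a convolutional feature map as an "entity" in the scene.
    entities = feature_map.reshape(-1, feature_map.shape[-1])
    for _ in range(n_rounds):                  # iterate relational reasoning
        entities = relational_block(entities)
    pooled = entities.max(axis=0)              # aggregate over entities
    return W_pi @ pooled                       # logits for a model-free policy

key = jax.random.PRNGKey(0)
feature_map = jax.random.normal(key, (12, 12, 64))                 # toy grid of scene features
W_pi = 0.1 * jax.random.normal(jax.random.fold_in(key, 1), (4, 64))
print(policy_logits(feature_map, W_pi))                            # four toy action logits
```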
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
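The lattice input itself is straightforward to illustrate: every contiguous character span of the sentence that matches a lexicon entry is added as a word path alongside the character sequence. The snippet below uses the classic 南京市长江大桥 example with a small toy lexicon; the real system uses a large pretrained lexicon and feeds these spans into gated LSTM cells, which are not shown.

```python
# Building the word lattice that a lattice LSTM consumes: every contiguous character span
# that matches a lexicon entry becomes an extra path in the lattice.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}   # toy lexicon
sentence = "南京市长江大桥"

lattice_words = [(i, j, sentence[i:j])
                 for i in range(len(sentence))
                 for j in range(i + 2, len(sentence) + 1)
                 if sentence[i:j] in lexicon]
for start, end, word in lattice_words:
    print(start, end, word)   # e.g. both (0, 2, 南京) and (0, 3, 南京市) enter the lattice
```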