We provide a perfect sampling algorithm for the hard-sphere model on subsets of $\mathbb{R}^d$ with expected running time linear in the volume, under the assumption of strong spatial mixing. A large number of perfect and approximate sampling algorithms have been devised for the hard-sphere model; our perfect sampling algorithm is efficient for a range of parameters for which only efficient approximate samplers were previously known, and it is faster than these known approximate approaches. Our methods also extend to the more general setting of Gibbs point processes interacting via finite-range, repulsive potentials.
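For concreteness, the following is a minimal baseline sketch, not the linear-time algorithm above: naive rejection sampling from the hard-sphere model on a box at fugacity `lam` with sphere radius `r`. Accepted samples are exact, but the acceptance probability decays exponentially in the volume; all names and parameter values are illustrative assumptions.

```python
# Baseline only (not the paper's algorithm): propose a Poisson point process on
# [0, L]^d and accept the configuration only if all pairwise distances are >= 2r.
import numpy as np

def sample_hard_spheres(lam, r, L, d, rng, max_tries=100_000):
    for _ in range(max_tries):
        n = rng.poisson(lam * L**d)          # Poisson number of proposed centres
        pts = rng.uniform(0.0, L, size=(n, d))
        if n < 2:
            return pts
        diff = pts[:, None, :] - pts[None, :, :]
        dists = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(dists, np.inf)
        if dists.min() >= 2 * r:             # no two spheres overlap
            return pts
    raise RuntimeError("rejection sampler did not accept; parameters too dense")

rng = np.random.default_rng(0)
config = sample_hard_spheres(lam=0.5, r=0.1, L=2.0, d=2, rng=rng)
print(len(config), "non-overlapping centres")
```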
Spatial processes observed in various fields, such as climate and environmental science, often occur on a large scale and exhibit spatial nonstationarity. Fitting a Gaussian process with a nonstationary Mat\'ern covariance is challenging. Previous studies in the literature have tackled this challenge by employing spatial partitioning techniques to estimate the spatially varying parameters of the covariance function. The selection of partitions is an important consideration, but it is often subjective and lacks a data-driven approach. To address this issue, we utilize the power of Convolutional Neural Networks (ConvNets) to derive subregions from nonstationary data, together with a selection mechanism that identifies subregions behaving similarly to stationary fields. To distinguish between stationary and nonstationary random fields, we train the ConvNet on a variety of simulated datasets generated from Gaussian processes with Mat\'ern covariance models under a wide range of parameter settings, ensuring adequate representation of both stationary and nonstationary spatial data. We assess the performance of the proposed method on large-scale synthetic and real datasets. The results show improved accuracy in parameter estimation when relying on ConvNet-based partitions compared with traditional user-defined approaches.
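As an illustration of the kind of training data and classifier involved, the following toy sketch (made under our own assumptions, not the paper's pipeline) simulates small Gaussian random fields with an exponential (Mat\'ern $\nu = 1/2$) covariance, where "nonstationary" fields blend two range parameters across the domain, and defines a small ConvNet of the kind that could be trained for the stationary-vs-nonstationary decision (training loop omitted). Grid size, parameter values, and the architecture are all assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def exp_cov(coords, rho, sigma2=1.0):
    # Exponential (Matern nu = 1/2) covariance matrix for the given coordinates.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def simulate_field(n=24, rho1=0.1, rho2=None, rng=None):
    """One n x n field; if rho2 is given, blend two stationary fields ('nonstationary')."""
    xy = np.stack(np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n)), -1).reshape(-1, 2)
    L1 = np.linalg.cholesky(exp_cov(xy, rho1) + 1e-8 * np.eye(n * n))
    z = (L1 @ rng.standard_normal(n * n)).reshape(n, n)
    if rho2 is not None:                          # range parameter varies across the domain
        L2 = np.linalg.cholesky(exp_cov(xy, rho2) + 1e-8 * np.eye(n * n))
        z2 = (L2 @ rng.standard_normal(n * n)).reshape(n, n)
        w = np.linspace(0, 1, n)[None, :]         # smooth spatial mixing weight
        z = np.sqrt(1 - w) * z + np.sqrt(w) * z2
    return z

class StationarityNet(nn.Module):
    """Small ConvNet producing a logit for 'this field looks stationary'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, 1))
    def forward(self, x):
        return self.net(x)

rng = np.random.default_rng(0)
fields = [simulate_field(rho1=0.2, rng=rng), simulate_field(rho1=0.05, rho2=0.4, rng=rng)]
batch = torch.tensor(np.stack(fields), dtype=torch.float32).unsqueeze(1)   # (2, 1, 24, 24)
print(StationarityNet()(batch).shape)             # torch.Size([2, 1])
```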
Asymptotically minimax procedures for function estimation often rely on somewhat arbitrary and restrictive assumptions such as isotropy or spatial homogeneity. This work enhances the theoretical understanding of Bayesian additive regression trees under substantially relaxed smoothness assumptions. We provide a comprehensive study of the asymptotic optimality and posterior contraction of Bayesian forests when the regression function has anisotropic smoothness that possibly varies over the function domain; the regression function may also be discontinuous. We introduce a new class of sparse {\em piecewise heterogeneous anisotropic} H\"{o}lder functions and derive the minimax lower bound for their estimation in high-dimensional scenarios under the $L_2$-loss. We then find that Bayesian tree priors, coupled with a Dirichlet subset selection prior for sparse estimation in high-dimensional scenarios, adapt to unknown heterogeneous smoothness, discontinuity, and sparsity. These results show that Bayesian forests are uniquely suited to more general estimation problems that would render other default machine learning tools, such as Gaussian processes, suboptimal. Our numerical study shows that Bayesian forests often outperform competitors such as random forests and deep neural networks, which are believed to work well for discontinuous or complicated smooth functions. Beyond nonparametric regression, we also examine the posterior contraction of Bayesian forests for density estimation and binary classification using the techniques developed in this study.
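For orientation, the classical non-sparse anisotropic benchmark (stated here for smoothness exponents $\alpha_j \in (0,1]$; the sparse, piecewise heterogeneous class above refines both the class and the rate) is
\[
|f(x)-f(y)| \;\le\; \Lambda \sum_{j=1}^{d} |x_j - y_j|^{\alpha_j},
\qquad
\inf_{\hat f}\ \sup_{f} \mathbb{E}_f\,\|\hat f - f\|_2^2 \;\asymp\; n^{-2\bar\alpha/(2\bar\alpha+1)},
\qquad
\frac{1}{\bar\alpha} \;=\; \sum_{j=1}^{d} \frac{1}{\alpha_j},
\]
which recovers the familiar isotropic rate $n^{-2\alpha/(2\alpha+d)}$ when $\alpha_1 = \dots = \alpha_d = \alpha$.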
Compatible finite element discretisations for the atmospheric equations of motion have recently attracted considerable interest. Semi-implicit timestepping methods require the repeated solution of a large saddle-point system of linear equations. Preconditioning this system is challenging since the velocity mass matrix is non-diagonal, leading to a dense Schur complement. Hybridisable discretisations overcome this issue: weakly enforcing continuity of the velocity field with Lagrange multipliers leads to a sparse system of equations, which has a similar structure to the pressure Schur complement in traditional approaches. We describe how the hybridised sparse system can be preconditioned with a non-nested two-level preconditioner. To solve the coarse system, we use the multigrid pressure solver that is employed in the approximate Schur complement method previously proposed by some of the authors. Our approach significantly reduces the number of solver iterations. The method shows excellent performance and scales to large numbers of cores in LFRic, the Met Office's next-generation climate and weather prediction model.
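The sketch below shows the generic shape of such a two-level preconditioner (one fine-level smoothing sweep plus a coarse-grid correction) wrapped for a Krylov solver. The fine matrix, restriction operator, and direct coarse solve are placeholders standing in for the hybridised trace system and the pressure multigrid; this is not the LFRic implementation.

```python
# Generic two-level preconditioner sketch: Jacobi smoothing on the fine system
# plus a coarse correction R^T A_c^{-1} R, wrapped as a LinearOperator for GMRES.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def two_level_preconditioner(A, R):
    Dinv = 1.0 / A.diagonal()                    # Jacobi smoother on the fine level
    A_c = (R @ A @ R.T).tocsc()                  # Galerkin coarse operator
    coarse_solve = spla.factorized(A_c)          # stand-in for the multigrid V-cycle

    def apply(r):
        z = Dinv * r                             # fine-level smoothing
        z += R.T @ coarse_solve(R @ (r - A @ z)) # coarse correction of the residual
        return z

    n = A.shape[0]
    return spla.LinearOperator((n, n), matvec=apply)

# Toy example: 1D Laplacian as the "fine" system, piecewise-constant restriction.
n = 256
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
R = sp.kron(sp.eye(n // 4), np.full((1, 4), 0.25)).tocsr()   # 4-to-1 aggregation
b = np.ones(n)
x, info = spla.gmres(A, b, M=two_level_preconditioner(A, R))
print(info, np.linalg.norm(A @ x - b))
```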
We consider the Bidder Selection Problem (BSP) in position auctions, motivated by practical concerns of online advertising platforms. In this problem, the platform sells ad slots via an auction to a large pool of $n$ potential buyers with independent values drawn from known prior distributions. Due to communication and computation restrictions, the seller can invite only $k<n$ advertisers to the auction, and she wishes to select the set of invited bidders so as to maximize either the social welfare or her revenue. We study BSP in a classic multi-winner model of position auctions for the welfare and revenue objectives, using the optimal format for the selected set of bidders (respectively, the VCG mechanism or Myerson's auction). We propose a novel Poisson-Chernoff relaxation of the problem that immediately implies that (1) BSP is polynomial-time solvable up to a vanishingly small error as the problem size $k$ grows; and (2) combining our relaxation with a trivial brute-force algorithm yields a PTAS for position auctions; this algorithm is in fact an efficient PTAS (EPTAS) under the mild assumption $k\ge\log n$, with a much better running time than previous PTASes for the single-item auction. Our approach yields simple and practically relevant algorithms, unlike all previous complex PTASes, whose running times depend at least doubly exponentially on $\varepsilon$. In fact, our algorithms are even faster than popular algorithms such as greedy for submodular maximization. Furthermore, we conduct extensive numerical experiments that demonstrate the high efficiency and practical applicability of our solution. Our experiments corroborate the experimental findings of [Mehta, Nadav, Psomas, Rubinstein 2020] that many simple heuristics perform surprisingly well, which underscores the importance of using small $\varepsilon$ for the BSP and the practical irrelevance of all previous PTAS approaches.
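For reference, the simple greedy heuristic mentioned above can be sketched as follows (this is the baseline, not the Poisson-Chernoff relaxation): the expected VCG welfare of an invited set is estimated by Monte Carlo as the weighted sum of the top order statistics of the invited bidders' values, and bidders are added greedily by marginal gain. Slot weights, value distributions, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 5
weights = np.array([1.0, 0.6, 0.3])            # assumed slot click-through weights
means = rng.uniform(0.5, 2.0, size=n)          # bidder i's value ~ Exponential(mean_i)

def expected_welfare(S, n_samples=2000):
    """Monte-Carlo estimate of E[sum_j w_j * (j-th highest value among S)]."""
    if not S:
        return 0.0
    vals = rng.exponential(means[list(S)], size=(n_samples, len(S)))
    top = -np.sort(-vals, axis=1)[:, :len(weights)]     # top slots' values per sample
    w = weights[: top.shape[1]]
    return float((top * w).sum(axis=1).mean())

selected = set()
for _ in range(k):                              # greedy by estimated marginal welfare gain
    gains = {i: expected_welfare(selected | {i}) for i in range(n) if i not in selected}
    selected.add(max(gains, key=gains.get))
print(sorted(selected), round(expected_welfare(selected), 3))
```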
The separate tasks of denoising, conditional expectation estimation, and manifold learning can often be posed in the common setting of finding the conditional expectation arising from a product of two random variables. This paper focuses on this more general problem and describes an operator-theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that are stable to numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and its successful application to several real-world problems is also demonstrated.
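A generic sketch of this kind of estimator (not the paper's exact operator construction) replaces the kernel integral operator by its empirical Gram matrix and solves a Tikhonov-regularised linear system in the RKHS; the kernel, bandwidth, and regularisation below are assumptions.

```python
# Estimate f(x) = E[Y | X = x] by solving (K + n*lam*I) c = y with a Gaussian
# kernel, then evaluating f(x) = sum_i c_i k(x, x_i).
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-2, 2, size=(n, 1))
Y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(n)   # true E[Y|X=x] = sin(2x)

lam, h = 1e-3, 0.3
K = gaussian_kernel(X, X, h)
c = np.linalg.solve(K + n * lam * np.eye(n), Y)          # Tikhonov-regularised solve

x_test = np.linspace(-2, 2, 5)[:, None]
f_hat = gaussian_kernel(x_test, X, h) @ c
print(np.round(f_hat, 2), np.round(np.sin(2 * x_test[:, 0]), 2))
```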
Learning interpretable representations of neural dynamics at a population level is a crucial first step to understanding how observed neural activity relates to perception and behavior. Models of neural dynamics often focus either on low-dimensional projections of neural activity or on learning dynamical systems that explicitly relate to the neural state over time. We discuss how these two approaches are interrelated by considering dynamical systems as representative of flows on a low-dimensional manifold. Building on this concept, we propose a new decomposed dynamical system model that represents complex non-stationary and nonlinear dynamics of time series data as a sparse combination of simpler, more interpretable components. Our model is trained through a dictionary learning procedure, where we leverage recent results in tracking sparse vectors over time. For a given number of parameters, the decomposed nature of the dynamics is more expressive than previous switched approaches and enables modeling of overlapping and non-stationary dynamics. In both continuous-time and discrete-time instructional examples, focusing on intuitive low-dimensional non-stationary linear and nonlinear systems, we demonstrate that our model can accurately approximate the original system, learn efficient representations, and capture smooth transitions between dynamical modes. Furthermore, we highlight our model's ability to efficiently capture and demix population dynamics generated from multiple independent subnetworks, a task that is computationally impractical for switched models. Finally, we apply our model to neural "full brain" recordings of C. elegans, illustrating a diversity of dynamics that is obscured when the activity is classified into discrete states.
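A toy version of the decomposition (not the paper's dictionary-learning algorithm) is sketched below: with a fixed dictionary of linear dynamics operators $\{F_k\}$, the one-step differences of a trajectory are expressed as a sparse combination $\sum_k c_k F_k x_t$ by solving a Lasso problem. The dictionary, trajectory, and penalty are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

dt = 0.05
F = [np.array([[0.0, 1.0], [-1.0, 0.0]]),      # rotation
     np.array([[-1.0, 0.0], [0.0, -1.0]]),     # decay
     np.array([[0.0, 0.0], [1.0, 0.0]])]       # shear (unused by the true system)

# Simulate a trajectory whose true dynamics are 1.0*rotation + 0.3*decay.
T = 400
x = np.zeros((T + 1, 2))
x[0] = [1.0, 0.0]
for t in range(T):
    x[t + 1] = x[t] + dt * (1.0 * F[0] @ x[t] + 0.3 * F[1] @ x[t])

# Regression targets: one-step differences; features: each F_k applied to x_t.
dX = (x[1:] - x[:-1]) / dt                                   # (T, 2)
Phi = np.stack([x[:-1] @ Fk.T for Fk in F], axis=-1)         # (T, 2, K)
A = Phi.reshape(-1, len(F))                                  # stack both coordinates
b = dX.reshape(-1)

coef = Lasso(alpha=1e-3, fit_intercept=False).fit(A, b).coef_
print(np.round(coef, 3))   # expected to be close to [1.0, 0.3, 0.0]
```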
The entropy production rate is a central quantity in non-equilibrium statistical physics, scoring how far a stochastic process is from being time-reversible. In this paper, we compute the entropy production of diffusion processes at non-equilibrium steady-state under the condition that the time-reversal of the diffusion remains a diffusion. We start by characterising the entropy production of both discrete- and continuous-time Markov processes. We investigate the time-reversal of time-homogeneous stationary diffusions and recall the most general known conditions under which the time-reversal of a diffusion remains a diffusion; these conditions cover hypoelliptic and degenerate diffusions as well as locally Lipschitz vector fields. We decompose the drift into its time-reversible and irreversible parts, or equivalently, the generator into symmetric and antisymmetric operators. We show the equivalence of this decomposition with a decomposition of the backward Kolmogorov equation considered in hypocoercivity theory and with a decomposition of the Fokker-Planck equation in GENERIC form. The main result shows that when the time-irreversible part of the drift lies in the range of the volatility matrix (almost everywhere), the forward and time-reversed path space measures of the process are mutually equivalent, and it evaluates the entropy production. When this condition does not hold, the measures are mutually singular and the entropy production is infinite. We verify these results using exact numerical simulations of linear diffusions. We illustrate the discrepancy between the entropy production of non-linear diffusions and that of their numerical simulations in several examples, and illustrate how the entropy production can be used for accurate numerical simulation. Finally, we discuss the relationship between time-irreversibility and sampling efficiency, and how the definition of entropy production can be modified to score how far a process is from being generalised reversible.
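For a linear (Ornstein-Uhlenbeck) diffusion the drift decomposition, and a commonly used expression for the entropy production rate, can be written in closed form and checked numerically. The sketch below is an illustration consistent with, but not copied from, the general result above, and it assumes an invertible diffusion matrix.

```python
# For dX = B X dt + sigma dW with D = sigma sigma^T / 2 and stationary covariance
# S solving B S + S B^T + 2 D = 0, the reversible drift is D grad(log p)(x) = -D S^{-1} x
# and the irreversible drift is b_irr(x) = (B + D S^{-1}) x. A commonly used expression
# for the steady-state entropy production rate (assuming D invertible) is
#   e_p = E[ b_irr(X)^T D^{-1} b_irr(X) ] = tr(A^T D^{-1} A S),  A = B + D S^{-1}.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def entropy_production_rate(B, sigma):
    D = sigma @ sigma.T / 2.0
    S = solve_continuous_lyapunov(B, -sigma @ sigma.T)   # B S + S B^T = -sigma sigma^T
    A = B + D @ np.linalg.inv(S)                         # irreversible drift matrix
    return np.trace(A.T @ np.linalg.inv(D) @ A @ S)

sigma = np.eye(2)
B_rev = np.array([[-1.0, 0.5], [0.5, -2.0]])             # symmetric drift: reversible
B_irr = np.array([[-1.0, 1.0], [-1.0, -1.0]])            # adds a rotational component
print(entropy_production_rate(B_rev, sigma))             # ~ 0
print(entropy_production_rate(B_irr, sigma))             # > 0 (equals 2 here)
```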
We consider Shor's quantum factoring algorithm in the setting of noisy quantum gates. Under a generic model of random noise for (controlled) rotation gates, we prove that when the noise exceeds a level that is vanishingly small in $n$ -- the number of bits of the integer to be factored -- the algorithm does not factor integers of the form $pq$ with $p$ and $q$ drawn from a well-defined set of primes of positive density. We further prove that, with probability $1 - o(1)$ over random prime pairs $(p,q)$, Shor's factoring algorithm does not factor numbers of the form $pq$ in the presence of the same level of random noise.
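As a loose, purely classical illustration of how sensitive the algorithm's output is to perturbations (this is not the gate-level noise model analysed above), recall that the post-processing step recovers the period $r$ from a measurement $y \approx k 2^m / r$ by continued fractions, and fails once $y$ is shifted. All numbers below are illustrative.

```python
from fractions import Fraction

m, N = 11, 21            # m counting bits; integer to "factor" (illustrative only)
r = 6                    # order of a = 2 modulo 21, i.e. 2^6 = 64 = 1 (mod 21)
k = 5                    # multiple observed by phase estimation, gcd(k, r) = 1

def recovered_period(y, m, N):
    """Continued-fraction recovery of r from a measurement y of k*2^m/r."""
    return Fraction(y, 2 ** m).limit_denominator(N).denominator

y_ideal = round(k * 2 ** m / r)
print(recovered_period(y_ideal, m, N))          # 6  (correct period)
print(recovered_period(y_ideal + 37, m, N))     # wrong once y is shifted by "noise"
```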
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case, and many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
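A minimal neural ODE sketch (a toy illustration, not code from the thesis) makes the "discretisation" point concrete: an MLP defines the vector field and an explicit Euler loop integrates it, so each Euler step is exactly a residual block. Dedicated libraries (e.g. torchdiffeq, Diffrax) provide proper adaptive solvers; the hand-rolled loop here just keeps the example self-contained.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    def __init__(self, dim=2, width=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, dim))
    def forward(self, z):
        return self.net(z)

def odeint_euler(f, z0, t0=0.0, t1=1.0, steps=20):
    """Explicit Euler integration of dz/dt = f(z); differentiable end-to-end."""
    h = (t1 - t0) / steps
    z = z0
    for _ in range(steps):
        z = z + h * f(z)          # one step == one residual block
    return z

f = VectorField()
z0 = torch.randn(8, 2)            # a batch of initial conditions
z1 = odeint_euler(f, z0)
loss = (z1 ** 2).mean()
loss.backward()                   # gradients flow through the whole solve
print(z1.shape, f.net[0].weight.grad.shape)
```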
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
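For background (the self-correcting update itself is not reproduced here; see the paper), the two baseline estimators contrasted above are the tabular Q-learning and Double Q-learning updates, sketched below with assumed step size and discount factor.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Single estimator: max over the same table that is being updated
    # (the source of the maximization / overestimation bias).
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def double_q_learning_update(QA, QB, s, a, r, s_next, rng, alpha=0.1, gamma=0.99):
    # Double estimator: one table selects the action, the other evaluates it,
    # which removes the overestimation at the cost of a downward bias.
    if rng.random() < 0.5:
        target = r + gamma * QB[s_next, QA[s_next].argmax()]
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        target = r + gamma * QA[s_next, QB[s_next].argmax()]
        QB[s, a] += alpha * (target - QB[s, a])

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
QA, QB = np.zeros((n_states, n_actions)), np.zeros((n_states, n_actions))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
double_q_learning_update(QA, QB, s=0, a=1, r=1.0, s_next=2, rng=rng)
print(Q[0], QA[0], QB[0])
```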