Equilibrium properties in statistical physics are obtained by computing averages with respect to Boltzmann-Gibbs measures, sampled in practice using ergodic dynamics such as the Langevin dynamics. Some quantities, however, cannot be computed by simply sampling the Boltzmann-Gibbs measure, in particular transport coefficients, which relate the current of some physical quantity of interest to the forcing needed to induce it. For instance, a temperature difference induces an energy current, the proportionality factor between these two quantities being the thermal conductivity. From an abstract point of view, transport coefficients can also be considered as some form of sensitivity analysis with respect to a forcing added to the baseline dynamics. There are various numerical techniques to estimate transport coefficients, which all suffer from large errors, in particular large statistical errors. This contribution reviews the most popular methods, namely the Green-Kubo approach, where the transport coefficient is expressed as some time-integrated correlation function, and the approach based on long-time averages of the stochastic dynamics perturbed by an external driving (so-called nonequilibrium molecular dynamics). In each case, the various sources of errors are made precise, in particular the bias related to the time discretization of the underlying continuous dynamics, and the variance of the associated Monte Carlo estimators. Some recent alternative techniques to estimate transport coefficients are also discussed.
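As a rough illustration of the Green-Kubo approach mentioned above, the following sketch estimates a diffusion coefficient by integrating a velocity autocorrelation function in time. The model is an assumption chosen for simplicity, not taken from the review: a one-dimensional Ornstein-Uhlenbeck velocity process with friction `gamma` and inverse temperature `beta`, for which the exact value is 1/(beta*gamma). The function name `green_kubo_ou` and all parameters are hypothetical. The velocity is propagated with the exact OU transition, so the remaining errors are the truncation of the time integral, the quadrature step, and the statistical error.

```python
import numpy as np

def green_kubo_ou(gamma=1.0, beta=1.0, dt=0.05, n_steps=400_000, t_max=5.0, seed=0):
    """Green-Kubo estimate of D = int_0^inf E[v(0) v(t)] dt for an OU velocity process."""
    rng = np.random.default_rng(seed)
    a = np.exp(-gamma * dt)  # exact one-step decay factor of the OU process
    noise = rng.standard_normal(n_steps) * np.sqrt((1.0 - a**2) / beta)
    v = np.empty(n_steps)
    v[0] = rng.standard_normal() / np.sqrt(beta)  # start at stationarity
    for n in range(1, n_steps):
        v[n] = a * v[n - 1] + noise[n]
    # Empirical autocorrelation along a single long trajectory.
    n_lags = int(round(t_max / dt))
    corr = np.array([np.mean(v[: n_steps - k] * v[k:]) for k in range(n_lags + 1)])
    # Trapezoidal rule on [0, t_max]; the tail beyond t_max is discarded.
    return dt * (corr.sum() - 0.5 * (corr[0] + corr[-1]))

if __name__ == "__main__":
    print("Green-Kubo estimate:", green_kubo_ou(), "exact value: 1.0")
```

Increasing `t_max` reduces the truncation bias but increases the variance of the estimate, which is one way to see why the statistical error tends to dominate in practice.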
We adopt an information-theoretic framework to analyze the generalization behavior of the class of iterative, noisy learning algorithms. This class is particularly suitable for study under information-theoretic metrics as the algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). Herein, we use the maximal leakage (equivalently, the Sibson mutual information of order infinity) metric, as it is simple to analyze, and it implies bounds on both the probability of a large generalization error and its expected value. We show that, if the update function (e.g., gradient) is bounded in $L_2$-norm, then adding isotropic Gaussian noise leads to optimal generalization bounds: indeed, the input and output of the learning algorithm in this case are asymptotically statistically independent. Furthermore, we demonstrate how the assumptions on the update function affect the optimal (in the sense of minimizing the induced maximal leakage) choice of the noise. Finally, we compute explicit tight upper bounds on the induced maximal leakage for several scenarios of interest.
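To make the class of iterative, noisy algorithms concrete, here is a minimal sketch of one such scheme: each step applies an update whose $L_2$-norm is bounded (here enforced by clipping) and then adds isotropic Gaussian noise. This is only an illustration under assumptions; the names `clip_l2` and `noisy_iterative_learner`, the clipping rule, and the constant noise scale `sigma` are hypothetical and are not the construction analyzed in the abstract.

```python
import numpy as np

def clip_l2(g, bound):
    """Rescale g so that its L2 norm is at most `bound`."""
    norm = np.linalg.norm(g)
    return g if norm <= bound else g * (bound / norm)

def noisy_iterative_learner(grad_fn, w0, data, n_iters, step, bound, sigma, seed=0):
    """Iterate w <- w - step * clip(grad) + sigma * N(0, I) on randomly drawn samples."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(n_iters):
        x = data[rng.integers(len(data))]      # draw one training sample
        g = clip_l2(grad_fn(w, x), bound)      # bounded update function
        w = w - step * g + sigma * rng.standard_normal(w.shape)  # isotropic noise
    return w

if __name__ == "__main__":
    # Toy problem (assumption): estimate a mean with the loss 0.5 * (w - x)^2.
    samples = np.random.default_rng(1).normal(2.0, 1.0, size=(200, 1))
    w_hat = noisy_iterative_learner(lambda w, x: w - x, [0.0], samples,
                                    n_iters=2000, step=0.05, bound=5.0, sigma=0.01)
    print("noisy estimate of the mean:", w_hat)
```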
The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as a discounting parameter. When the discounting parameter is modelled as random, the normalized power prior is recommended. In this work, we prove that the marginal posterior for the discounting parameter for generalized linear models converges to a point mass at zero if there is any discrepancy between the historical and current data, and that it does not converge to a point mass at one when they are fully compatible. In addition, we explore the construction of optimal priors for the discounting parameter in a normalized power prior. In particular, we are interested in achieving the dual objectives of encouraging borrowing when the historical and current data are compatible and limiting borrowing when they are in conflict. We propose intuitive procedures for eliciting the shape parameters of a beta prior for the discounting parameter based on two minimization criteria, the Kullback-Leibler divergence and the mean squared error. Based on the proposed criteria, the optimal priors derived are often quite different from commonly used priors such as the uniform prior.
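The behaviour of the discounting parameter can be explored numerically in a conjugate toy case. The sketch below is an assumption-laden illustration, not the generalized linear model setting of the abstract: binomial data, a uniform initial prior on the success probability, and a Beta(`prior_a`, `prior_b`) prior on the discounting parameter evaluated on a grid. Under these assumptions the normalized power prior is available in closed form, and the marginal posterior of the discounting parameter is proportional to its prior times a ratio of beta functions. The function names are hypothetical.

```python
import math
import numpy as np

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def discount_posterior(y0, n0, y, n, prior_a=1.0, prior_b=1.0, grid_size=201):
    """Grid approximation of the marginal posterior of the discounting parameter a0.

    Historical data: y0 successes in n0 trials. Current data: y successes in n trials.
    Returns the grid of a0 values and the posterior probabilities on that grid.
    """
    a0 = np.linspace(0.0, 1.0, grid_size + 2)[1:-1]  # open interval (0, 1)
    log_w = np.empty_like(a0)
    for i, a in enumerate(a0):
        s0, f0 = a * y0, a * (n0 - y0)
        log_norm = log_beta(s0 + 1.0, f0 + 1.0)                # normalizing constant c(a0)
        log_marg = log_beta(s0 + y + 1.0, f0 + (n - y) + 1.0)  # theta integrated out
        log_prior = (prior_a - 1.0) * math.log(a) + (prior_b - 1.0) * math.log(1.0 - a)
        log_w[i] = log_marg - log_norm + log_prior
    w = np.exp(log_w - log_w.max())
    return a0, w / w.sum()

if __name__ == "__main__":
    for label, (y0, y) in {"compatible": (50, 50), "conflicting": (20, 80)}.items():
        grid, post = discount_posterior(y0, 100, y, 100)
        print(label, "posterior mean of a0:", float(np.sum(grid * post)))
```

With a uniform prior on the discounting parameter, the posterior mean stays well below one even for compatible data, while conflicting data push the mass toward zero, which is the qualitative asymmetry the abstract describes.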
This article presents a general approximation-theoretic framework to analyze measure-transport algorithms for sampling and characterizing probability measures. Sampling is a task that frequently arises in data science and uncertainty quantification. We provide error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps, as well as on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the normed distance between two maps to the divergence between the pushforward measures they define. We further present a series of applications where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback-Leibler divergence. Specialized rates for approximations of the popular triangular Knothe-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
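A simple case in which a triangular transport map is known exactly may help fix ideas. For a standard Gaussian reference and a centered Gaussian target with covariance `cov`, the lower-triangular Knothe-Rosenblatt map is linear and given by the Cholesky factor of `cov`. The sketch below, with the hypothetical helper `gaussian_kr_map`, pushes reference samples through this map and compares the empirical covariance with the target. It illustrates sampling by transport only and does not reproduce the approximation schemes or error estimates of the article.

```python
import numpy as np

def gaussian_kr_map(cov):
    """Return the Cholesky factor L and the triangular map z -> L z for N(0, cov)."""
    L = np.linalg.cholesky(cov)  # lower triangular with positive diagonal

    def transport(z):
        # z has shape (n_samples, d); component k of the output depends only on z[:, :k+1].
        return z @ L.T

    return L, transport

if __name__ == "__main__":
    target_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
    L, transport = gaussian_kr_map(target_cov)
    z = np.random.default_rng(0).standard_normal((50_000, 2))  # reference samples
    x = transport(z)                                           # pushforward samples
    print("empirical covariance of pushforward samples:")
    print(np.cov(x, rowvar=False))
```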
Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns. The actions are interpreted as different configurations or policies. Holding the same action fixed, the system asymptotically converges to a unique equilibrium, as a function of this action. The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn. The decision-maker wants to maximize its accumulated reward over the $T$ turns. Learning which equilibria are better results in higher rewards, but waiting for the system to converge to equilibrium costs valuable time. Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem. We present a novel algorithm, termed Upper Equilibrium Concentration Bound (UECB), that switches actions quickly when it is not worth waiting for the equilibrium to be reached. This is enabled by employing convergence bounds to determine how far the system is from equilibrium. We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem, where $\tau_c$ is the worst-case approximate convergence time to equilibrium. We then show that both epidemic control and game control are special cases of equilibrium bandits, where $\tau_c\log \tau_c$ typically dominates the regret. Finally, we test UECB numerically for both of these applications.
This article surveys research on the application of compatible finite element methods to large scale atmosphere and ocean simulation. Compatible finite element methods extend Arakawa's C-grid finite difference scheme to the finite element world. They are constructed from a discrete de Rham complex, which is a sequence of finite element spaces which are linked by the operators of differential calculus. The use of discrete de Rham complexes to solve partial differential equations is well established, but in this article we focus on the specifics of dynamical cores for simulating weather, oceans and climate. The most important consequence of the discrete de Rham complex is the Hodge-Helmholtz decomposition, which has been used to exclude the possibility of several types of spurious oscillations from linear equations of geophysical flow. This means that compatible finite element spaces provide a useful framework for building dynamical cores. In this article we introduce the main concepts of compatible finite element spaces, and discuss their wave propagation properties. We survey some methods for discretising the transport terms that arise in dynamical core equation systems, and provide some example discretisations, briefly discussing their iterative solution. Then we focus on the recent use of compatible finite element spaces in designing structure preserving methods, surveying variational discretisations, Poisson bracket discretisations, and consistent vorticity transport.
We introduce the Weak-form Estimation of Nonlinear Dynamics (WENDy) method for estimating model parameters for nonlinear systems of ODEs. The core mathematical idea involves an efficient conversion of the strong form representation of a model to its weak form, and then solving a regression problem to perform parameter inference. The core statistical idea rests on the Errors-In-Variables framework, which necessitates the use of the iteratively reweighted least squares algorithm. Further improvements are obtained by using orthonormal test functions, created from a set of $C^{\infty}$ bump functions of varying support sizes. We demonstrate that WENDy is a highly robust and efficient method for parameter inference in differential equations. Without relying on any numerical differential equation solvers, WENDy computes accurate estimates and is robust to large (biologically relevant) levels of measurement noise. For low dimensional systems with modest amounts of data, WENDy is competitive with conventional forward solver-based nonlinear least squares methods in terms of speed and accuracy. For both higher dimensional systems and stiff systems, WENDy is typically both faster (often by orders of magnitude) and more accurate than forward solver-based approaches. We illustrate the method and its performance in some common population and neuroscience models, including logistic growth, Lotka-Volterra, FitzHugh-Nagumo, Hindmarsh-Rose, and a Protein Transduction Benchmark model. Software and code for reproducing the examples are available at (//github.com/MathBioCU/WENDy).
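The conversion from strong to weak form described above can be sketched on logistic growth, $u' = r u - (r/K) u^2$, which is linear in the parameters $(r, -r/K)$. Multiplying by a compactly supported test function and integrating by parts moves the derivative onto the test function, so no derivative of the data and no ODE solver is needed. The code below is a simplified illustration and not the WENDy implementation: it uses plain (unorthonormalized) $C^{\infty}$ bump functions of a single support size and ordinary least squares in place of the iteratively reweighted scheme. All function names and default values are hypothetical.

```python
import numpy as np

def bump(t, center, radius):
    """C-infinity bump supported on (center - radius, center + radius), and its derivative."""
    s = (t - center) / radius
    phi = np.zeros_like(t)
    dphi = np.zeros_like(t)
    inside = np.abs(s) < 1.0
    q = 1.0 - s[inside] ** 2
    phi[inside] = np.exp(-1.0 / q)
    dphi[inside] = phi[inside] * (-2.0 * s[inside] / q**2) / radius
    return phi, dphi

def weak_form_logistic_fit(t, u, radius=1.5, n_test=15):
    """Estimate (r, K) from samples u(t) on a uniform grid using the weak form."""
    dt = t[1] - t[0]
    centers = np.linspace(t[0] + radius, t[-1] - radius, n_test)
    G = np.empty((n_test, 2))
    b = np.empty(n_test)
    for k, c in enumerate(centers):
        phi, dphi = bump(t, c, radius)
        G[k, 0] = dt * np.sum(phi * u)      # integral of phi * u
        G[k, 1] = dt * np.sum(phi * u**2)   # integral of phi * u^2
        b[k] = -dt * np.sum(dphi * u)       # integration by parts: -integral of phi' * u
    w, *_ = np.linalg.lstsq(G, b, rcond=None)  # ordinary least squares
    r_hat = w[0]
    K_hat = -w[0] / w[1]
    return r_hat, K_hat

if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 201)
    r, K, u0 = 1.0, 2.0, 0.1
    u = K / (1.0 + (K - u0) / u0 * np.exp(-r * t))  # exact logistic solution
    noisy = u + 0.02 * np.random.default_rng(0).standard_normal(t.shape)
    print("noiseless fit:", weak_form_logistic_fit(t, u))
    print("noisy fit:    ", weak_form_logistic_fit(t, noisy))
```

Because the noisy data enter both the regression matrix and the right-hand side, ordinary least squares is biased under noise, which is the motivation for the Errors-In-Variables treatment named in the abstract.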
In this work, we extend the data-driven It\^{o} stochastic differential equation (SDE) framework for the pathwise assessment of short-term forecast errors to account for the time-dependent upper bound that naturally constrains the observable historical data and forecast. We propose a new nonlinear and time-inhomogeneous SDE model with a Jacobi-type diffusion term for the phenomenon of interest, simultaneously driven by the forecast and the constraining upper bound. We rigorously demonstrate the existence and uniqueness of a strong solution to the SDE model by imposing a condition for the time-varying mean-reversion parameter appearing in the drift term. The normalized forecast function is thresholded to keep such mean-reversion parameters bounded. The SDE model parameter calibration also covers the thresholding parameter of the normalized forecast by applying a novel iterative two-stage optimization procedure to user-selected approximations of the likelihood function. Another novel contribution is estimating the transition density of the forecast error process, not known analytically in a closed form, through a tailored kernel smoothing technique with the control variate method. We fit the model to the 2019 photovoltaic (PV) solar power daily production and forecast data in Uruguay, computing the daily maximum solar PV production estimation. Two statistical versions of the constrained SDE model are fit, with the beta and truncated normal distributions as proxies for the transition density. Empirical results include simulations of the normalized solar PV power production and pathwise confidence bands generated through an indirect inference method. An objective comparison of optimal parametric points associated with the two selected statistical approximations is provided by applying the innovative kernel density estimation technique of the transition function of the forecast error process.
We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\epsilon,\delta)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2\epsilon^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $O(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given black-box access to a subroutine satisfying a certain $\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(\alpha+\frac{1}{\sqrt{n}})$. We show that this $\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\Delta$-stable algorithm has empirical gap $\Omega\big(\frac{1}{\Delta n}\big)$, and that this bound is tight. This result also holds, more specifically, for empirical risk minimization problems and may be of independent interest.
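For readers unfamiliar with the stochastic gradient descent ascent algorithm named above, the following sketch runs it on a toy convex-concave objective, $f(x, y) = \tfrac{1}{2}\|x\|^2 + x^\top y - \tfrac{1}{2}\|y\|^2$, whose saddle point is the origin. The objective, the function name `sgda`, and the additive Gaussian perturbation (a stand-in for stochastic gradient noise, not a privacy mechanism) are assumptions made for illustration. The sketch carries no differential privacy guarantee and is not the recursive regularization method of the abstract.

```python
import numpy as np

def sgda(grad_x, grad_y, x0, y0, n_iters, step, noise_std=0.0, seed=0):
    """Simultaneous stochastic gradient descent (in x) ascent (in y); returns averaged iterates."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    x_avg = np.zeros_like(x)
    y_avg = np.zeros_like(y)
    for t in range(1, n_iters + 1):
        # Noisy gradients evaluated at the current pair (x, y).
        gx = grad_x(x, y) + noise_std * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise_std * rng.standard_normal(y.shape)
        x, y = x - step * gx, y + step * gy  # descent in x, ascent in y
        x_avg += (x - x_avg) / t             # running averages of the iterates
        y_avg += (y - y_avg) / t
    return x_avg, y_avg

if __name__ == "__main__":
    grad_x = lambda x, y: x + y   # partial derivative of f in x
    grad_y = lambda x, y: x - y   # partial derivative of f in y
    x_bar, y_bar = sgda(grad_x, grad_y, [1.0], [-1.0], n_iters=5000, step=0.05, noise_std=0.1)
    print("averaged iterates:", x_bar, y_bar)
```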
Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist but their impact on the convergence of the learning dynamics is not understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from a higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings not only allow us to understand the potential and limitations of strategies involving the per-class gradients, but also the reason for the effectiveness of previously used solutions for class imbalance such as oversampling.
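The idea of renormalizing the contribution of each per-class gradient can be sketched as follows. Per-sample gradients are summed within each class, each class sum is rescaled to unit norm, and the rescaled vectors are added, so that a majority class cannot dominate the update direction through its size alone. The exact normalization used in the paper is not specified in the abstract; the unit-norm rule, the function name, and the `eps` safeguard are assumptions.

```python
import numpy as np

def per_class_normalized_gradient(per_sample_grads, labels, eps=1e-12):
    """Sum of class-wise gradients, each rescaled to (approximately) unit L2 norm."""
    per_sample_grads = np.asarray(per_sample_grads, dtype=float)
    labels = np.asarray(labels)
    total = np.zeros(per_sample_grads.shape[1])
    for c in np.unique(labels):
        g_c = per_sample_grads[labels == c].sum(axis=0)  # gradient contribution of class c
        total += g_c / (np.linalg.norm(g_c) + eps)       # remove the class-size scale
    return total

if __name__ == "__main__":
    # Toy imbalance (assumption): 90 majority samples pull along e1, 10 minority samples along e2.
    grads = np.vstack([np.tile([1.0, 0.0], (90, 1)), np.tile([0.0, 1.0], (10, 1))])
    labels = np.array([0] * 90 + [1] * 10)
    print("plain summed gradient:        ", grads.sum(axis=0))
    print("per-class normalized gradient:", per_class_normalized_gradient(grads, labels))
```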
Systems consisting of spheres rolling on elastic membranes have been used as educational tools to introduce a core conceptual idea of General Relativity (GR): how curvature guides the movement of matter. However, previous studies have revealed that such schemes cannot accurately represent relativistic dynamics in the laboratory. Dissipative forces cause the initially GR-like dynamics to be transient and consequently restrict experimental study to only the beginnings of trajectories; dominance of Earth's gravity forbids the difference between spatial and temporal spacetime curvatures. Here, by developing a mapping for the dynamics of a wheeled vehicle on a spandex membrane, we demonstrate that an active object that can prescribe its speed can not only obtain steady-state orbits, but also use additional parameters such as speed to tune the orbits towards relativistic dynamics. Our mapping demonstrates how activity mixes space and time in a metric, and shows how active particles do not necessarily follow geodesics in real space but instead follow geodesics in a fiducial spacetime. The mapping further reveals how parameters such as the membrane elasticity and instantaneous speed allow programming a desired spacetime such as the Schwarzschild metric near a non-rotating black hole. Our mapping and framework point the way toward creating a robophysical analog gravity system in the laboratory at low cost and provide insights into active matter in deformable environments and robot exploration in complex landscapes.