The entropy production rate is a central quantity in non-equilibrium statistical physics, scoring how far a stochastic process is from being time-reversible. In this paper, we compute the entropy production of diffusion processes at non-equilibrium steady-state under the condition that the time-reversal of the diffusion remains a diffusion. We start by characterising the entropy production of both discrete- and continuous-time Markov processes. We investigate the time-reversal of time-homogeneous stationary diffusions and recall the most general known conditions under which the time-reversed process is again a diffusion; this setting includes hypoelliptic and degenerate diffusions, and locally Lipschitz vector fields. We decompose the drift into its time-reversible and irreversible parts, or equivalently, the generator into symmetric and antisymmetric operators. We show the equivalence of this decomposition with a decomposition of the backward Kolmogorov equation considered in hypocoercivity theory, and with a decomposition of the Fokker-Planck equation in GENERIC form. The main result shows that when the time-irreversible part of the drift lies in the range of the volatility matrix (almost everywhere), the forward and time-reversed path space measures of the process are mutually equivalent, and it evaluates the entropy production explicitly. When this condition fails, the measures are mutually singular and the entropy production is infinite. We verify these results using exact numerical simulations of linear diffusions. We illustrate the discrepancy between the entropy production of non-linear diffusions and that of their numerical simulations in several examples, and show how the entropy production can be used to guide accurate numerical simulation. Finally, we discuss the relationship between time-irreversibility and sampling efficiency, and how the definition of entropy production can be modified to score how far a process is from being generalised reversible.
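For orientation, the drift decomposition and the main formula can be sketched as follows (notational conventions are ours; signs and normalisations vary across references). For a stationary diffusion $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ with stationary density $\rho$ and diffusion matrix $D := \sigma\sigma^\top$, decompose
\[
b_{\mathrm{rev}} := \frac{1}{2\rho}\,\nabla\cdot(D\rho), \qquad b_{\mathrm{irr}} := b - b_{\mathrm{rev}},
\]
and, whenever $b_{\mathrm{irr}}(x) \in \operatorname{Range}(D(x))$ for almost every $x$, the entropy production rate reads
\[
e_p = \int b_{\mathrm{irr}}^\top D^{+}\, b_{\mathrm{irr}}\;\rho\,\mathrm{d}x,
\]
with $D^{+}$ the Moore-Penrose pseudo-inverse; otherwise $e_p = +\infty$.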
In the literature on imprecise probability, little attention is paid to the fact that imprecise probabilities are precise on some events. We call the collection of these events the system of precision. We show that, under mild assumptions, the system of precision of a lower and upper probability forms a so-called (pre-)Dynkin-system. Interestingly, there are several settings, ranging from machine learning on partial data, through frequentist probability theory, to quantum probability theory and decision making under uncertainty, in which probabilities are a priori only required to be precise on a specific underlying set system. At the core of all of these settings lies the observation that precise beliefs, probabilities or frequencies on two events do not necessarily imply this precision to hold for the intersection of those events. Here, too, (pre-)Dynkin-systems have been adopted as systems of precision. We show that, under extendability conditions, those pre-Dynkin-systems equipped with probabilities can be embedded into algebras of sets. Surprisingly, the extendability conditions elaborated in a strand of work in quantum physics are equivalent to coherence in the sense of Walley (1991, Statistical reasoning with imprecise probabilities, p. 84). Thus, the literature on probabilities on pre-Dynkin-systems is linked to the literature on imprecise probability. Finally, we spell out a lattice duality which rigorously relates the system of precision to credal sets of probabilities. In particular, we provide a hitherto undescribed, parametrized family of coherent imprecise probabilities.
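A minimal illustrative example (ours, not from the paper): on $\Omega = \{1,2,3,4\}$, the set system
\[
\mathcal{D} = \bigl\{\emptyset,\ \{1,2\},\ \{3,4\},\ \{1,3\},\ \{2,4\},\ \Omega\bigr\}
\]
is closed under complements and disjoint unions, hence a (pre-)Dynkin-system, but it is not an algebra: $\{1,2\} \cap \{1,3\} = \{1\} \notin \mathcal{D}$. A probability may thus be precise on every event in $\mathcal{D}$ while remaining imprecise on $\{1\}$.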
Detecting the dimensionality of graphs is a central topic in machine learning. While the problem has been tackled empirically as well as theoretically, existing methods have several drawbacks. On the one hand, empirical tools are computationally heavy and lack theoretical foundation. On the other hand, theoretical approaches do not apply to graphs with heterogeneous degree distributions, which is often the case for complex real-world networks. To address these drawbacks, we consider geometric inhomogeneous random graphs (GIRGs) as a random graph model, which captures a variety of properties observed in practice, including a heterogeneous degree distribution and a non-vanishing clustering coefficient, i.e. the probability that two random neighbours of a vertex are adjacent. In GIRGs, $n$ vertices are distributed on a $d$-dimensional torus and weights are assigned to the vertices according to a power-law distribution. Two vertices are then connected with a probability that depends on their distance and their weights. Our first result shows that the clustering coefficient of GIRGs scales inverse-exponentially with the number of dimensions, when the latter is at most logarithmic in $n$. This gives a first theoretical explanation for the low dimensionality of real-world networks observed by Almagro et al. [Nature '22]. Our second result is a linear-time algorithm for determining the dimensionality of a given GIRG. We prove that our algorithm returns the correct number of dimensions with high probability when the input is a GIRG. As a result, our algorithm bridges the gap between theory and practice, as it not only comes with a rigorous proof of correctness but also yields results comparable to those of prior empirical approaches, as indicated by our experiments on real-world instances.
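To make the model concrete, here is a naive $O(n^2)$ sampler for one common GIRG parametrisation (a sketch with hypothetical constants `alpha`, `tau`, `lam`; the algorithm in the paper runs in linear time and is more involved):

```python
import numpy as np

def sample_girg(n, d, alpha=1.5, tau=2.5, lam=1.0, rng=None):
    """Naive O(n^2) GIRG sampler for one common parametrisation."""
    rng = rng or np.random.default_rng()
    pos = rng.random((n, d))                            # uniform on the d-dim torus
    w = (1.0 - rng.random(n)) ** (-1.0 / (tau - 1.0))   # power-law weights, exponent tau
    W = w.sum()
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            diff = np.abs(pos[u] - pos[v])
            dist = max(np.max(np.minimum(diff, 1.0 - diff)), 1e-12)  # torus L_inf distance
            p = min(1.0, lam * (w[u] * w[v] / W) ** alpha / dist ** (alpha * d))
            if rng.random() < p:
                edges.append((u, v))
    return pos, w, edges
```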
For multivariate time series driven by underlying states, hidden Markov models (HMMs) constitute a powerful framework which can be flexibly tailored to the situation at hand. However, in practice it can be challenging to choose an adequate emission distribution for multivariate observation vectors. For example, the marginal data distribution may not immediately reveal the within-state distributional form, and the different data streams may operate on different supports, rendering the common approach of using a multivariate normal distribution inadequate. Here we explore nonparametric estimation of the emission distributions within a multivariate HMM based on tensor-product B-splines. In two simulation studies, we show the feasibility of our modelling approach and demonstrate potential pitfalls of inappropriate choices of parametric distributions. To illustrate the practical applicability, we present a case study in which we use an HMM to model the bivariate time series comprising the lengths and angles of goalkeeper passes during UEFA EURO 2020, investigating the effect of match dynamics on the teams' tactics.
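As a sketch of the building block involved (our illustration; the paper's estimation machinery, e.g. coefficient normalisation and penalised likelihood within the HMM, is omitted), a bivariate tensor-product B-spline density can be assembled as:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions on a fixed knot sequence at x."""
    n_basis = len(knots) - degree - 1
    B = np.column_stack([
        BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)(x)
        for i in range(n_basis)
    ])
    return np.nan_to_num(B)  # basis elements are zero outside their support

def emission_density(y1, y2, coef, knots1, knots2):
    """Tensor-product density f(y1, y2) = sum_kl coef[k, l] B_k(y1) B_l(y2);
    coef is assumed non-negative and normalised so that f integrates to one."""
    B1 = bspline_basis(np.atleast_1d(y1), knots1)
    B2 = bspline_basis(np.atleast_1d(y2), knots2)
    return np.einsum('nk,kl,nl->n', B1, coef, B2)
```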
Spatial process models popular in geostatistics often represent the observed data as the sum of a smooth underlying process and white noise. The variation in the white noise is attributed to measurement error, or micro-scale variability, and is called the "nugget". We formally establish results on the identifiability and consistency of the nugget in spatial models based upon Gaussian processes, within the framework of in-fill asymptotics, i.e., the sample size increases within a bounded sampling domain. Our work extends results in fixed-domain asymptotics for spatial models without the nugget. More specifically, we establish the identifiability of parameters in the Mat\'ern covariance function and the consistency of their maximum likelihood estimators in the presence of discontinuities due to the nugget. We also present simulation studies to demonstrate the role of the identifiable quantities in spatial interpolation.
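In standard notation (ours; parametrisation conventions for the Mat\'ern family vary), the model in question is
\[
Y(s_i) = X(s_i) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \tau^2),
\]
where $\tau^2$ is the nugget and $X$ is a mean-zero Gaussian process with Mat\'ern covariance
\[
C(h) = \sigma^2\,\frac{2^{1-\nu}}{\Gamma(\nu)}\,(\phi\|h\|)^{\nu} K_{\nu}(\phi\|h\|),
\]
with $K_\nu$ the modified Bessel function of the second kind, $\sigma^2$ the partial sill, $\phi$ the inverse range and $\nu$ the smoothness.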
We study the problem of controlling a partially observed Markov decision process (POMDP) to either aid or hinder the estimation of its state trajectory. We encode the estimation objectives via the smoother entropy, which is the conditional entropy of the state trajectory given measurements and controls. Consideration of the smoother entropy contrasts with previous approaches that instead resort to marginal (or instantaneous) state entropies due to tractability concerns. By establishing novel expressions for the smoother entropy in terms of the POMDP belief state, we show that both the problems of minimising and maximising the smoother entropy in POMDPs can surprisingly be reformulated as belief-state Markov decision processes with concave cost and value functions. The significance of these reformulations is that they render the smoother entropy a tractable optimisation objective, with structural properties amenable to the use of standard POMDP solution techniques for both active estimation and obfuscation. Simulations illustrate that optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches.
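Concretely, writing $X_{0:T}$ for the state trajectory, $Y_{0:T}$ for the measurements and $U_{0:T-1}$ for the controls (index conventions ours), the smoother entropy is the conditional entropy
\[
H(X_{0:T} \mid Y_{0:T}, U_{0:T-1}) = -\,\mathbb{E}\bigl[\log p(X_{0:T} \mid Y_{0:T}, U_{0:T-1})\bigr],
\]
in contrast to sums of marginal state entropies such as $\sum_t H(X_t \mid Y_{0:t}, U_{0:t-1})$ used in earlier approaches.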
This paper develops and analyzes an accelerated proximal descent method for finding stationary points of nonconvex composite optimization problems. The objective function is of the form $f+h$ where $h$ is a proper closed convex function, $f$ is a differentiable function on the domain of $h$, and $\nabla f$ is Lipschitz continuous on the domain of $h$. The main advantage of this method is that it is "parameter-free" in the sense that it does not require knowledge of the Lipschitz constant of $\nabla f$ or of any global topological properties of $f$. It is shown that the proposed method can obtain an $\varepsilon$-approximate stationary point with iteration complexity bounds that are optimal, up to logarithmic factors in $\varepsilon$, in both the convex and nonconvex settings. Some discussion is also given about how the proposed method can be leveraged in other existing optimization frameworks, such as min-max smoothing and penalty frameworks for constrained programming, to create more specialized parameter-free methods. Finally, numerical experiments are presented to support the practical viability of the method.
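A generic (non-accelerated) sketch of the parameter-free idea, assuming user-supplied `f`, `grad_f` and `prox_h` (this is our illustration of backtracking on a local Lipschitz estimate, not the paper's accelerated method):

```python
import numpy as np

def prox_grad_backtracking(f, grad_f, prox_h, x0, L0=1.0, eta=2.0,
                           tol=1e-8, max_iter=1000):
    """Proximal gradient with backtracking on a local Lipschitz estimate L,
    so no global Lipschitz constant of grad f is required."""
    x, L = np.asarray(x0, dtype=float), L0
    for _ in range(max_iter):
        g = grad_f(x)
        while True:
            y = prox_h(x - g / L, 1.0 / L)
            d = y - x
            # Accept once the quadratic model upper-bounds f at the trial point.
            if f(y) <= f(x) + g @ d + 0.5 * L * (d @ d) + 1e-12:
                break
            L *= eta                      # model too optimistic: increase L
        if np.linalg.norm(d) * L <= tol:  # approximate stationarity measure
            return y
        x, L = y, max(L / eta, 1e-12)     # let L shrink again (adaptive)
    return x

# Usage on a small lasso-type problem: f(x) = 0.5*||Ax - b||^2, h = 0.1*||x||_1.
A = np.array([[1.0, 2.0], [3.0, 4.0]]); b = np.array([1.0, -1.0])
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
prox_h = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - 0.1 * t, 0.0)
x_star = prox_grad_backtracking(f, grad_f, prox_h, np.zeros(2))
```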
We revisit the following problem: given a set of indices $S = \{1, \dots, n\}$ and weights $w_1, \dots, w_n \in \mathbb{R}_{> 0}$, provide samples from $S$ with distribution $p(i) = w_i / W$ where $W = \sum_j w_j$ gives the proper normalization. In the static setting, there is a simple data structure due to Walker, called the Alias Table, that allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed, or weights may change over time; here, existing solutions restrict the permissible weights, require rebuilding of the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Construction of the data structure for an arbitrary distribution takes time $O(n)$, sampling takes expected time $O(1)$, and updates of size $\Delta = O(W / n)$ can be processed in time $O(1)$. To evaluate the efficiency of the data structure we conduct an experimental study. The results suggest that the dynamic sampling performance is comparable to the static Alias Table with only a minor slowdown.
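For reference, a compact version of the static Alias Table construction (Vose's variant of Walker's method; this is the textbook baseline, not the paper's dynamic structure):

```python
import random

def build_alias_table(weights):
    """Walker/Vose alias table: O(n) construction, O(1) sampling."""
    n, total = len(weights), sum(weights)
    prob = [w * n / total for w in weights]   # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # s keeps mass prob[s]; the rest maps to l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftovers equal 1 up to rounding
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias):
    """Draw one index with probability proportional to the original weights."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```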
We develop a general theoretical and algorithmic framework for sparse approximation and structured prediction in $\mathcal{P}_2(\Omega)$ with Wasserstein barycenters. The barycenters are sparse in the sense that they are computed from an available dictionary of measures, but the approximations involve only a reduced number of atoms. We show that the best reconstruction from the class of sparse barycenters is characterized by a notion of best $n$-term barycenter which we introduce, and which can be understood as a natural extension of the classical concept of best $n$-term approximation in Banach spaces. We show that the best $n$-term barycenter is the minimizer of a highly non-convex, bi-level optimization problem, and we develop algorithmic strategies for its practical numerical computation. We next leverage this approximation tool to build interpolation strategies with reduced computational cost, which can be used for structured prediction and for metamodelling of parametrized families of measures. We illustrate the potential of the method through the specific problem of Model Order Reduction (MOR) of parametrized PDEs. Since our approach is sparse, adaptive, and preserves mass by construction, it has the potential to overcome known bottlenecks of classical linear methods in hyperbolic conservation laws transporting discontinuities. It also paves the way towards MOR for measure-valued PDE problems such as gradient flows.
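One natural formalisation consistent with the abstract (notation ours): given a dictionary $\{\mu_1,\dots,\mu_K\} \subset \mathcal{P}_2(\Omega)$ and a target $\nu$, the best $n$-term barycenter solves
\[
\inf_{\substack{S \subset \{1,\dots,K\},\ |S| \le n}}\ \inf_{\lambda \in \Delta_{|S|}}\ W_2^2\Bigl(\operatorname{Bar}\bigl((\lambda_i,\mu_i)_{i\in S}\bigr),\ \nu\Bigr),
\]
where $\operatorname{Bar}$ denotes the Wasserstein barycenter and $\Delta_{|S|}$ the probability simplex; the outer combinatorial minimisation over supports $S$ is what makes the problem non-convex and bi-level.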
This paper proposes a flexible framework for inferring large-scale time-varying and time-lagged correlation networks from multivariate or high-dimensional non-stationary time series with piecewise smooth trends. Built on a novel and unified multiple-testing procedure for time-lagged cross-correlation functions with a fixed or diverging number of lags, our method can accurately disclose flexible time-varying network structures associated with complex functional structures at all time points. We broaden the applicability of our method to settings with structural breaks by developing difference-based nonparametric estimators of cross-correlations, achieve accurate family-wise error control via a bootstrap-assisted procedure adaptive to the complex temporal dynamics, and enhance the probability of recovering the time-varying network structures using a new uniform variance reduction technique. We prove the asymptotic validity of the proposed method and demonstrate its effectiveness in finite samples through simulation studies and empirical applications.
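In schematic notation (ours), the objects being tested are the time-varying lagged cross-correlations
\[
\rho_{jk}(t,\ell) = \operatorname{Corr}\bigl(e_{j,\lfloor nt\rfloor},\, e_{k,\lfloor nt\rfloor+\ell}\bigr), \qquad t\in(0,1),\ \ell = 0,1,\dots,L_n,
\]
computed on the detrended errors $e$; at each time point the network places an edge between series $j$ and $k$ whenever some hypothesis $H_0\colon \rho_{jk}(t,\ell)=0$ is rejected under family-wise error control.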
We develop new tools in the theory of nonlinear random matrices and apply them to study the performance of the Sum of Squares (SoS) hierarchy on average-case problems. The SoS hierarchy is a powerful optimization technique that has achieved tremendous success for various problems in combinatorial optimization, robust statistics and machine learning. It is a family of convex relaxations that allows one to smoothly trade off running time for approximation guarantees. In recent works, it has been shown to be extremely useful for recovering structure in high-dimensional noisy data. It also remains our best approach towards refuting the notorious Unique Games Conjecture. In this work, we analyze the performance of the SoS hierarchy on fundamental problems stemming from statistics, theoretical computer science and statistical physics. In particular, we show subexponential-time SoS lower bounds for the Sherrington-Kirkpatrick Hamiltonian, Planted Slightly Denser Subgraph, Tensor Principal Components Analysis and Sparse Principal Components Analysis. These SoS lower bounds involve analyzing large random matrices, wherein lie our main contributions. These results offer strong evidence for, and insight into, the low-degree likelihood ratio hypothesis, an important conjecture that predicts the power of bounded-time algorithms for hypothesis testing. We also develop general-purpose tools for analyzing the behavior of random matrices which are functions of independent random variables. Towards this, we build on and generalize the matrix variant of the Efron-Stein inequalities. In particular, our general theorem on matrix concentration recovers various results that have appeared in the literature. We expect these random matrix theory ideas to have other significant applications.
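For orientation (our paraphrase, not the paper's statement): the scalar Efron-Stein inequality bounds the variance of $Z = f(X_1,\dots,X_n)$ for independent inputs by
\[
\operatorname{Var}(Z) \le \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\bigl[(Z - Z^{(i)})^2\bigr],
\]
where $Z^{(i)}$ is recomputed after independently resampling the $i$-th coordinate. The matrix variants replace the square by a matrix square and the scalar order by the semidefinite order, and the concentration theorem developed here generalises these to control the spectra of nonlinear random matrices of the kind arising in SoS lower bounds.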