Parametricity is a property of the syntax of type theory implying e.g. that there is only one function having the type of the polymorphic identity function. Parametricity is usually proven externally, and does not hold internally. Internalising it is difficult because once there is a term witnessing parametricity, it also has to be parametric itself and this results in the appearance of higher dimensional cubes. In previous theories with internal parametricity, either an explicit syntax for higher cubes is present or the theory is extended with a new sort for the interval. In this paper we present a type theory with internal parametricity which is a simple extension of Martin-L\"of type thoery: there are a few new type formers, term formers and equations. Geometry is not explicit in this syntax, but emergent: the new operations and equations only refer to objects up to dimension 3. We show that this theory is modelled by presheaves over the BCH cube category. Fibrancy conditions are not needed because we use span-based rather than relational parametricity. We define a gluing model for this theory implying that external parametricity and canonicity hold. The theory can be seen as a special case of a new kind of modal type theory, and it is the simplest setting in which the computational properties of higher observational type theory can be demonstrated.
This paper gives a self-contained introduction to the Hilbert projective metric $\mathcal{H}$ and its fundamental properties, with a particular focus on the space of probability measures. We start by defining the Hilbert pseudo-metric on convex cones, focusing mainly on dual formulations of $\mathcal{H}$ . We show that linear operators on convex cones contract in the distance given by the hyperbolic tangent of $\mathcal{H}$, which in particular implies Birkhoff's classical contraction result for $\mathcal{H}$. Turning to spaces of probability measures, where $\mathcal{H}$ is a metric, we analyse the dual formulation of $\mathcal{H}$ in the general setting, and explore the geometry of the probability simplex under $\mathcal{H}$ in the special case of discrete probability measures. Throughout, we compare $\mathcal{H}$ with other distances between probability measures. In particular, we show how convergence in $\mathcal{H}$ implies convergence in total variation, $p$-Wasserstein distance, and any $f$-divergence. Furthermore, we derive a novel sharp bound for the total variation between two probability measures in terms of their Hilbert distance.
Stochastic optimization methods have been hugely successful in making large-scale optimization problems feasible when computing the full gradient is computationally prohibitive. Using the theory of modified equations for numerical integrators, we propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow. Analyzing a modified stochastic differential equation can reveal qualitative insights about the associated optimization method. Here, we study mean-square stability of the modified equation in the case of stochastic coordinate descent.
This work considers Bayesian experimental design for the inverse boundary value problem of linear elasticity in a two-dimensional setting. The aim is to optimize the positions of compactly supported pressure activations on the boundary of the examined body in order to maximize the value of the resulting boundary deformations as data for the inverse problem of reconstructing the Lam\'e parameters inside the object. We resort to a linearized measurement model and adopt the framework of Bayesian experimental design, under the assumption that the prior and measurement noise distributions are mutually independent Gaussians. This enables the use of the standard Bayesian A-optimality criterion for deducing optimal positions for the pressure activations. The (second) derivatives of the boundary measurements with respect to the Lam\'e parameters and the positions of the boundary pressure activations are deduced to allow minimizing the corresponding objective function, i.e., the trace of the covariance matrix of the posterior distribution, by a gradient-based optimization algorithm. Two-dimensional numerical experiments are performed to demonstrate the functionality of our approach.
Many partial differential equations in mathematical physics describe the evolution of a time-dependent vector field. Examples arise in compressible fluid dynamics, shape analysis, optimal transport and shallow water equations. The flow of such a vector field generates a diffeomorphism, which can be viewed as the Lagrangian variable corresponding to the Eulerian vector field. From both computational and theoretical perspectives, it is natural to seek finite-dimensional analogs of vector fields and diffeomorphisms, constructed in such a way that the underlying geometric and algebraic properties persist (in particular, the induced Lie--Poisson structure). Here, we develop such a geometric discretization of the group of diffeomorphisms on a two-dimensional K\"ahler manifold, with special emphasis on the sphere. Our approach builds on quantization theory, combined with complexification of Zeitlin's model for incompressible two-dimensional hydrodynamics. Thus, we extend Zeitlin's approach from the incompressible to the compressible case. We provide a numerical example and discuss potential applications of the new, geometric discretization.
We show that for log-concave real random variables with fixed variance the Shannon differential entropy is minimized for an exponential random variable. We apply this result to derive upper bounds on capacities of additive noise channels with log-concave noise. We also improve constants in the reverse entropy power inequalities for log-concave random variables.
Recently, a stability theory has been developed to study the linear stability of modified Patankar--Runge--Kutta (MPRK) schemes. This stability theory provides sufficient conditions for a fixed point of an MPRK scheme to be stable as well as for the convergence of an MPRK scheme towards the steady state of the corresponding initial value problem, whereas the main assumption is that the initial value is sufficiently close to the steady state. Initially, numerical experiments in several publications indicated that these linear stability properties are not only local, but even global, as is the case for general linear methods. Recently, however, it was discovered that the linear stability of the MPDeC(8) scheme is indeed only local in nature. Our conjecture is that this is a result of negative Runge--Kutta (RK) parameters of MPDeC(8) and that linear stability is indeed global, if the RK parameters are nonnegative. To support this conjecture, we examine the family of MPRK22($\alpha$) methods with negative RK parameters and show that even among these methods there are methods for which the stability properties are only local. However, this local linear stability is not observed for MPRK22($\alpha$) schemes with nonnegative Runge-Kutta parameters.
Applying parallel-in-time algorithms to multiscale Hamiltonian systems to obtain stable long time simulations is very challenging. In this paper, we present novel data-driven methods aimed at improving the standard parareal algorithm developed by Lion, Maday, and Turinici in 2001, for multiscale Hamiltonian systems. The first method involves constructing a correction operator to improve a given inaccurate coarse solver through solving a Procrustes problem using data collected online along parareal trajectories. The second method involves constructing an efficient, high-fidelity solver by a neural network trained with offline generated data. For the second method, we address the issues of effective data generation and proper loss function design based on the Hamiltonian function. We show proof-of-concept by applying the proposed methods to a Fermi-Pasta-Ulum (FPU) problem. The numerical results demonstrate that the Procrustes parareal method is able to produce solutions that are more stable in energy compared to the standard parareal. The neural network solver can achieve comparable or better runtime performance compared to numerical solvers of similar accuracy. When combined with the standard parareal algorithm, the improved neural network solutions are slightly more stable in energy than the improved numerical coarse solutions.
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest based on a novel definition of pseudo-risk inspired by the double robustness property. Intuitively, the proposed criterion selects a pair of learners with the smallest pseudo-risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion which states that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. Finally, we apply the approach to model selection of a semiparametric estimator of average treatment effect given an ensemble of candidate machine learners to account for confounding in an observational study which we illustrate in simulations and a data application.
We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on the Euclidean space. To propose an accelerated gradient descent algorithm for such problems, we consider gradient flow of transport maps that give push-forward measures of an initial measure. Then we propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm do not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we can show that the accelerated algorithm converges faster than a normal gradient descent algorithm. Numerical experiments support this theoretical result.
Hawkes processes are often applied to model dependence and interaction phenomena in multivariate event data sets, such as neuronal spike trains, social interactions, and financial transactions. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is generally a computationally expensive task, all the more with Bayesian estimation methods. In particular, for generalised nonlinear Hawkes processes, Monte-Carlo Markov Chain methods applied to compute the doubly intractable posterior distribution are not scalable to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we first unify existing variational Bayes approaches under a general nonparametric inference framework, and analyse the asymptotic properties of these methods under easily verifiable conditions on the prior, the variational class, and the nonlinear model. Secondly, we propose a novel sparsity-inducing procedure, and derive an adaptive mean-field variational algorithm for the popular sigmoid Hawkes processes. Our algorithm is parallelisable and therefore computationally efficient in high-dimensional setting. Through an extensive set of numerical simulations, we also demonstrate that our procedure is able to adapt to the dimensionality of the parameter of the Hawkes process, and is partially robust to some type of model mis-specification.