Hamiltonian systems are differential equations which describe systems in classical mechanics, plasma physics, and sampling problems. They exhibit many structural properties, such as a lack of attractors and the presence of conservation laws. To predict Hamiltonian dynamics based on discrete trajectory observations, incorporating prior knowledge about the Hamiltonian structure greatly improves predictions. This is typically done by learning the system's Hamiltonian and then integrating the Hamiltonian vector field with a symplectic integrator. For this, however, Hamiltonian data needs to be approximated from the trajectory observations. Moreover, the numerical integrator introduces an additional discretisation error. In this paper, we show that an inverse modified Hamiltonian structure adapted to the geometric integrator can be learned directly from observations. A separate approximation step for the Hamiltonian data is avoided. The inverse modified data compensates for the discretisation error, so that the error introduced by the integrator is eliminated. The technique is developed for Gaussian processes.
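As a schematic illustration of the inverse modified idea described above (notation ours): rather than learning the Hamiltonian $H$ whose exact flow is then only approximated by the integrator, one learns an inverse modified Hamiltonian $H_h$ whose numerical time-$h$ map coincides with the exact flow of $H$,
\[
\Phi^{[H_h]}_h \;=\; \exp\!\bigl(h\,X_H\bigr), \qquad H_h \;=\; H + h\,H_1 + h^2 H_2 + \cdots,
\]
so that trajectory observations of the true system can be fitted directly, and the integrator's discretisation error is absorbed into the correction terms $H_i$.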
We propose nonparametric estimators for the second-order central moments of spherical random fields within a functional data context. We consider a measurement framework where each field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. The latter favours smooth covariance/autocovariance functions, where the smoothness is specified by means of suitable Sobolev-like pseudo-differential operators. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. We moreover validate and demonstrate the practical feasibility of our estimation procedure in a simulation setting, assuming a fixed number of samples per field. Our numerical estimation procedure leverages the sparsity and second-order Kronecker structure of our setup to reduce the computational and memory requirements by approximately three orders of magnitude compared to what a naive implementation would require.
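The exact objective is not reproduced in the abstract; schematically, penalised estimators of this kind minimise a criterion of the form (notation ours, with $Y_{ij}$ the $j$-th noisy sample of field $i$ at direction $x_{ij}$, $\mathcal{P}$ a Sobolev-like pseudo-differential operator, and $\lambda>0$ a regularisation parameter)
\[
\hat C \;\in\; \arg\min_{C}\;\frac{1}{N}\sum_{i=1}^{N}\sum_{j\neq k}\bigl(Y_{ij}Y_{ik} - C(x_{ij},x_{ik})\bigr)^2 \;+\; \lambda\,\bigl\|\mathcal{P}\,C\bigr\|^2,
\]
with a representer theorem reducing the infinite-dimensional problem to a finite linear system involving the reproducing kernel evaluated at the observed directions.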
The principle of least action is one of the most fundamental physical principles. It says that among all possible motions connecting two points in a phase space, the system will exhibit those motions which extremise an action functional. Many qualitative features of dynamical systems, such as the presence of conservation laws and energy balance equations, are related to the existence of an action functional. Incorporating variational structure into learning algorithms for dynamical systems is, therefore, crucial in order to make sure that the learned model shares important features with the exact physical system. In this paper we show how to incorporate variational principles into trajectory predictions of learned dynamical systems. The novelty of this work is that (1) our technique relies only on discrete position data of observed trajectories. Velocities or conjugate momenta do {\em not} need to be observed or approximated and {\em no} prior knowledge about the form of the variational principle is assumed. Instead, they are recovered using backward error analysis. (2) Moreover, our technique compensates for discretisation errors when trajectories are computed from the learned system. This is important when moderate to large step-sizes are used and high accuracy is required. For this, we introduce and rigorously analyse the concept of inverse modified Lagrangians by developing an inverse version of variational backward error analysis. (3) Finally, we introduce a method to perform system identification from position observations only, based on variational backward error analysis.
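For reference, the standard form of the principle: with Lagrangian $L$, the action of a curve $q$ connecting fixed endpoints is
\[
S[q] \;=\; \int_{t_0}^{t_1} L\bigl(q(t),\dot q(t)\bigr)\,dt, \qquad \delta S = 0 \;\Longleftrightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0,
\]
so learning a Lagrangian (here, an inverse modified one) amounts to learning a functional whose stationary curves reproduce the observed trajectories.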
We establish a new perturbation theory for orthogonal polynomials using a Riemann-Hilbert approach and consider applications in numerical linear algebra and random matrix theory. We show that the orthogonal polynomials with respect to two measures can be effectively compared using the difference of their Stieltjes transforms on a suitably chosen contour. Moreover, when two measures are close and satisfy some regularity conditions, we use the theta functions of a hyperelliptic Riemann surface to derive explicit and accurate expansion formulae for the perturbed orthogonal polynomials. The leading error terms can be fully characterized by the difference of the Stieltjes transforms on the contour. The results are applied to analyze several numerical algorithms from linear algebra, including the Lanczos tridiagonalization procedure, the Cholesky factorization and the conjugate gradient algorithm (CGA). As a case study, we investigate these algorithms applied to a general spiked sample covariance matrix model by considering the eigenvector empirical spectral distribution and its limit, allowing for precise estimates of the algorithms' behaviour as the number of iterations diverges. For this concrete random matrix model, beyond the first-order expansion, we derive a mesoscopic central limit theorem for the associated orthogonal polynomials and other quantities relevant to numerical algorithms.
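For concreteness, the Stieltjes transform of a measure $\mu$ referred to above is (in one common convention)
\[
S_\mu(z) \;=\; \int_{\mathbb{R}} \frac{d\mu(x)}{x - z}, \qquad z \in \mathbb{C}\setminus\operatorname{supp}(\mu),
\]
and the perturbation results bound the effect on the orthogonal polynomials in terms of $S_{\mu_1}-S_{\mu_2}$ evaluated on the chosen contour.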
When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that fully automated computable learning algorithms, with or without access to pseudo-random generators, in particular those of a statistical nature used in current approaches to machine learning (including deep learning), can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that, for every finite learning algorithm, there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets are as likely to deceive learning algorithms into a "simplicity bubble" as any other particular dataset. These deceiving datasets guarantee that any prediction will diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution. We discuss the framework and empirical conditions for circumventing this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, or motivated by, the intrinsic power of algorithmic information theory and computability theory.
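The algorithmic probability invoked in the statement is the standard Solomonoff-Levin quantity: for a prefix-free universal Turing machine $U$,
\[
\mathbf{m}(x) \;=\; \sum_{p\,:\,U(p)=x} 2^{-|p|},
\]
so that simple (low Kolmogorov complexity) strings receive exponentially more weight than complex ones, which is the simplicity bias discussed above.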
Adapting a definition given by Bjerkevik and Lesnick for multiparameter persistence modules, we introduce an $\ell^p$-type extension of the interleaving distance on merge trees. We show that our distance is a metric, and that it upper-bounds the $p$-Wasserstein distance between the associated barcodes. For each $p\in[1,\infty]$, we prove that this distance is stable with respect to cellular sublevel filtrations and that it is the universal (i.e., largest) distance satisfying this stability property. In the $p=\infty$ case, this gives a novel proof of universality for the interleaving distance on merge trees.
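For orientation, one common convention for the $p$-Wasserstein distance between barcodes (viewed as multisets of points $(b,d)$ in the plane), which our distance upper-bounds, is
\[
W_p(B,B') \;=\; \inf_{\gamma}\Bigl(\sum_{x} \|x-\gamma(x)\|_\infty^{\,p}\Bigr)^{1/p},
\]
where the infimum runs over matchings $\gamma$ that are allowed to send points to the diagonal $\{b=d\}$, and the sum runs over all points of $B$ and $B'$ together with their images under $\gamma$.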
Deep learning is usually described as an experiment-driven field under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their associated dynamical systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of these dynamical systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasing concerns about ethics and security and their relationships with generalizability.
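As an example of item (2), a commonly studied continuous-time model treats SGD with learning rate $\eta$ and gradient-noise covariance $\Sigma(\theta)$ as the diffusion
\[
d\theta_t \;=\; -\nabla L(\theta_t)\,dt \;+\; \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t,
\]
whose stationary behaviour and escape dynamics are then analysed to explain optimisation and generalisation properties.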
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are accounted for correctly.
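The paper's estimators are not specified in the abstract; as a generic illustration of why a naive contrast is biased under confounding, here is a minimal sketch comparing a naive difference in mean repayment with a standard inverse-propensity-weighted (IPW) estimate on synthetic data (all variable names and the data-generating process are hypothetical, not taken from the paper).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
risk = rng.normal(size=n)                           # confounder: borrower risk score
approve = rng.binomial(1, 1 / (1 + np.exp(risk)))   # lender approves low-risk borrowers more often
repay = 1.0 - 0.5 * risk + 0.2 * approve + rng.normal(scale=0.1, size=n)  # true effect of approval: +0.2

# Naive estimator: ignores that approval and repayment share the confounder `risk`.
naive = repay[approve == 1].mean() - repay[approve == 0].mean()

# IPW estimator: reweight outcomes by the estimated propensity of the observed decision.
ps = LogisticRegression().fit(risk.reshape(-1, 1), approve).predict_proba(risk.reshape(-1, 1))[:, 1]
ipw = np.mean(approve * repay / ps) - np.mean((1 - approve) * repay / (1 - ps))

print(f"true effect 0.20 | naive {naive:+.3f} | IPW {ipw:+.3f}")

On this toy data the naive contrast overstates the effect of approval because approved borrowers are systematically lower-risk, while the confounding-adjusted estimate recovers roughly the true +0.2.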
Importance sampling is one of the most widely used variance reduction strategies in Monte Carlo rendering. In this paper, we propose a novel importance sampling technique that uses a neural network to learn how to sample from a desired density represented by a set of samples. Our approach considers an existing Monte Carlo rendering algorithm as a black box. During a scene-dependent training phase, we learn to generate samples with a desired density in the primary sample space of the rendering algorithm using maximum likelihood estimation. We leverage a recent neural network architecture that was designed to represent real-valued non-volume preserving ('Real NVP') transformations in high dimensional spaces. We use Real NVP to non-linearly warp primary sample space and obtain desired densities. In addition, Real NVP efficiently computes the determinant of the Jacobian of the warp, which is required to implement the change of integration variables implied by the warp. A main advantage of our approach is that it is agnostic of underlying light transport effects, and can be combined with many existing rendering techniques by treating them as a black box. We show that our approach leads to effective variance reduction in several practical scenarios.
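As a minimal sketch of the key building block (not the full method): a single Real NVP affine coupling layer warps half of the primary-sample-space coordinates conditioned on the other half, and its log-Jacobian-determinant is simply the sum of the predicted log-scales, which is what makes the change of variables cheap. The tiny "networks" below are fixed random affine maps standing in for trained MLPs; all names are ours.

import numpy as np

rng = np.random.default_rng(0)
D = 4                      # dimension of primary sample space (illustrative)
d = D // 2                 # split point for the coupling layer

# Stand-ins for trained scale s(.) and translation t(.) networks.
Ws, bs = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
Wt, bt = 0.1 * rng.standard_normal((d, d)), np.zeros(d)

def s(x1): return np.tanh(x1 @ Ws.T + bs)   # log-scale
def t(x1): return x1 @ Wt.T + bt            # translation

def coupling_forward(x):
    """One affine coupling layer: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = x[:d], x[d:]
    y2 = x2 * np.exp(s(x1)) + t(x1)
    log_det_jac = np.sum(s(x1))             # triangular Jacobian -> cheap determinant
    return np.concatenate([x1, y2]), log_det_jac

def coupling_inverse(y):
    """Exact inverse, needed to evaluate the density of a given sample."""
    y1, y2 = y[:d], y[d:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2]), -np.sum(s(y1))

u = rng.uniform(size=D)                     # point in primary sample space
y, ldj = coupling_forward(u)
u_back, _ = coupling_inverse(y)
assert np.allclose(u, u_back)

Stacking several such layers (with alternating splits) and training them by maximum likelihood on the sample set yields the nonlinear warp of primary sample space described above.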
We propose an algorithm for real-time 6DOF pose tracking of rigid 3D objects using a monocular RGB camera. The key idea is to derive a region-based cost function using temporally consistent local color histograms. While such region-based cost functions are commonly optimized using first-order gradient descent techniques, we systematically derive a Gauss-Newton optimization scheme which gives rise to drastically faster convergence and highly accurate and robust tracking performance. We furthermore propose a novel complex dataset dedicated to the task of monocular object pose tracking and make it publicly available to the community. To our knowledge, it is the first to address the common and important scenario in which both the camera and the objects are moving simultaneously in cluttered scenes. In numerous experiments, including on our own proposed dataset, we demonstrate that the proposed Gauss-Newton approach outperforms existing approaches, in particular in the presence of cluttered backgrounds, heterogeneous objects and partial occlusions.
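For reference, with $r(\xi)$ a residual vector collecting the terms of a (suitably re-expressed) least-squares cost and $J = \partial r / \partial \xi$ its Jacobian with respect to the pose parameters, the generic Gauss-Newton step is
\[
\Delta\xi \;=\; -\bigl(J^\top J\bigr)^{-1} J^\top r,
\]
applied iteratively, with the pose typically updated on $SE(3)$ via the exponential map.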
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
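For reference, the Maximum Mean Discrepancy between distributions $P$ and $Q$ with kernel $k$, to which the dual formulation is compared, is
\[
\mathrm{MMD}^2(P,Q) \;=\; \mathbb{E}_{x,x'\sim P}\bigl[k(x,x')\bigr] \;-\; 2\,\mathbb{E}_{x\sim P,\,y\sim Q}\bigl[k(x,y)\bigr] \;+\; \mathbb{E}_{y,y'\sim Q}\bigl[k(y,y')\bigr],
\]
estimated in practice from finite samples of the two distributions being aligned.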