In this paper, we propose a weak approximation of the reflection coupling (RC) for stochastic differential equations (SDEs), and prove that it converges weakly to the desired coupling. In contrast to the RC, the proposed approximate reflection coupling (ARC) need not take the hitting time of the processes to the diagonal set into consideration and can be defined as the solution of some SDEs on the whole time interval. Therefore, ARC can be applied effectively to SDEs with different drift terms. As an application of ARC, an evaluation of the effectiveness of stochastic gradient descent in a non-convex setting is also described. For the sample size $n$, the step size $\eta$, and the batch size $B$, we derive evaluations that are uniform in time, with orders $n^{-1}$, $\eta^{1/2}$, and $\sqrt{(n - B) / (B (n - 1))}$, respectively.
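To get a feel for the batch-size order, the short sketch below evaluates it numerically. It reads the expression as $\sqrt{(n-B)/(B(n-1))}$, where the grouping of the denominator is an assumption, and the function name `batch_size_order` is hypothetical. Under that reading the quantity equals 1 for $B = 1$ and vanishes for the full batch $B = n$.

```python
import math


def batch_size_order(n: int, B: int) -> float:
    """Return sqrt((n - B) / (B * (n - 1))) for sample size n and batch size B.

    The grouping of the denominator as B * (n - 1) is an assumption about
    how the order in the abstract is meant to be read.
    """
    if n < 2 or not 1 <= B <= n:
        raise ValueError("need n >= 2 and 1 <= B <= n")
    return math.sqrt((n - B) / (B * (n - 1)))


if __name__ == "__main__":
    n = 1000
    for B in (1, 10, 100, 1000):
        print(B, batch_size_order(n, B))
```

The value decreases as $B$ grows, which matches the intuition that larger batches carry less sampling noise.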
In this work we are interested in general linear inverse problems where the corresponding forward problem is solved iteratively using fixed point methods. In that case one-shot methods, which iterate at the same time on the forward problem solution and on the inverse problem unknown, can be applied. We analyze two variants of the so-called multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence, by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical experiments are provided to illustrate the convergence of these methods in comparison with the usual and shifted gradient descent methods. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood, and a Bayesian approach based on Markov-chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters from independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid better understanding of the sources of heterogeneity from biological data.
A time-varying zero-inflated serially dependent Poisson process is proposed. The model assumes that the intensity of the Poisson process evolves according to a generalized autoregressive conditional heteroscedastic (GARCH) formulation. The proposed model is a generalization of the zero-inflated Poisson Integer GARCH model proposed by Fukang Zhu in 2012, which in turn is a generalization of the Integer GARCH (INGARCH) model introduced by Ferland, Latour, and Oraichi in 2006. The proposed model builds on previous work by allowing the zero-inflation parameter to vary over time, governed by a deterministic function or by an exogenous variable. Both the Expectation Maximization (EM) and the Maximum Likelihood Estimation (MLE) approaches are presented as possible estimation methods. A simulation study shows that both parameter estimation methods provide good estimates. Applications to two real-life data sets show that the proposed INGARCH model provides a better fit than the traditional zero-inflated INGARCH model in the cases considered.
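A minimal simulation sketch may help to fix ideas. It assumes a first-order recursion $\lambda_t = \alpha_0 + \alpha_1 X_{t-1} + \beta_1 \lambda_{t-1}$ for the intensity and a sinusoidal, logistic-transformed zero-inflation probability $\omega_t$. The model order, the form of $\omega_t$, the parameter values, the start-up values, and the function name `simulate_tv_zip_ingarch` are all illustrative assumptions, not the specification used in the paper.

```python
import numpy as np


def simulate_tv_zip_ingarch(omega, alpha0=1.0, alpha1=0.3, beta1=0.4, seed=0):
    """Simulate counts from a zero-inflated Poisson INGARCH(1,1)-type model.

    omega[t] is the probability of a structural zero at time t.
    Otherwise X_t ~ Poisson(lam_t), with
    lam_t = alpha0 + alpha1 * X_{t-1} + beta1 * lam_{t-1}.
    """
    rng = np.random.default_rng(seed)
    omega = np.asarray(omega, dtype=float)
    n = omega.size
    x = np.zeros(n, dtype=np.int64)
    lam = np.zeros(n)
    x_prev, lam_prev = 0, alpha0  # arbitrary start-up values (assumption)
    for t in range(n):
        lam[t] = alpha0 + alpha1 * x_prev + beta1 * lam_prev
        # With probability omega[t] the count is a structural zero;
        # otherwise it is drawn from Poisson(lam[t]).
        if rng.random() >= omega[t]:
            x[t] = rng.poisson(lam[t])
        x_prev, lam_prev = x[t], lam[t]
    return x, lam


if __name__ == "__main__":
    t = np.arange(240)
    # Deterministic seasonal zero-inflation (illustrative choice only).
    omega = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * np.sin(2.0 * np.pi * t / 12.0))))
    x, lam = simulate_tv_zip_ingarch(omega)
    print("share of zeros:", np.mean(x == 0))
```

Replacing the sinusoid by a transformed exogenous series would give the covariate-driven variant mentioned in the abstract.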
Mark-point dependence plays a critical role in research problems that can be fitted into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators ignoring the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function and establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic based on local linear estimators for mark-point independence, which is shown to be asymptotically normal with the parametric $\sqrt{n}$ convergence rate. Model diagnostic tools are developed for key model assumptions and a robust functional permutation test is proposed for a more general class of mark-point processes. The effectiveness of the proposed methods is demonstrated using extensive simulations and applications to two real data examples.
Importance sampling (IS) is valuable in reducing the variance of Monte Carlo sampling in many areas, including finance, rare event simulation, and Bayesian inference. It is natural to combine quasi-Monte Carlo (QMC) methods with IS to achieve a faster rate of convergence. However, a naive replacement of Monte Carlo with QMC may not work well. This paper investigates the convergence rates of randomized QMC-based IS for estimating integrals with respect to a Gaussian measure, in which the IS measure is a Gaussian or $t$ distribution. We prove that if the target function satisfies the so-called boundary growth condition and the covariance matrix of the IS density has eigenvalues no smaller than 1, then randomized QMC with the Gaussian proposal has a root mean squared error of $O(N^{-1+\epsilon})$ for arbitrarily small $\epsilon>0$. Similar results with a $t$ distribution as the proposal are also established. These sufficient conditions help to assess the effectiveness of IS in QMC. For some particular applications, we find that the Laplace IS, a very general approach that approximates the target function by a quadratic Taylor approximation around its mode, yields a covariance matrix with eigenvalues smaller than 1, making the resulting integrand less favorable for QMC. From this point of view, when using Gaussian distributions as the IS proposal, a change of measure via Laplace IS may transform a favorable integrand into an unfavorable one for QMC, although the variance of Monte Carlo sampling is reduced. We also give some examples to verify our propositions and warn against naive replacement of MC with QMC under IS proposals. Numerical results suggest that using Laplace IS with $t$ distributions is more robust than using it with Gaussian distributions.
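The mechanics of combining IS with a randomized QMC point set can be illustrated with a one-dimensional toy sketch. It estimates $E[f(X)]$ for $X \sim N(0,1)$ using a Gaussian proposal $N(0, s^2)$ with $s \ge 1$, so the proposal variance is no smaller than 1, and a randomly shifted equispaced grid as the point set. The test function $f(x) = e^x$, the value $s = 1.5$, the choice of point set, and the function name `rqmc_is_estimate` are illustrative assumptions; the paper's setting is multivariate and this sketch does not reproduce its experiments.

```python
import math
from statistics import NormalDist

import numpy as np


def rqmc_is_estimate(f, s, n_points, seed=0):
    """Estimate E[f(X)], X ~ N(0, 1), with proposal N(0, s^2).

    Uses a randomly shifted equispaced grid on (0, 1), which is a simple
    one-dimensional randomized QMC point set.
    """
    rng = np.random.default_rng(seed)
    u = (np.arange(n_points) + rng.random()) / n_points
    u = np.clip(u, 1e-12, 1.0 - 1e-12)  # keep the inverse CDF finite
    std_normal = NormalDist()
    # Map the points to the proposal N(0, s^2) through the inverse CDF.
    z = s * np.array([std_normal.inv_cdf(float(ui)) for ui in u])
    # Importance weight: N(0, 1) density divided by N(0, s^2) density.
    w = s * np.exp(-0.5 * z**2 * (1.0 - 1.0 / s**2))
    return float(np.mean(f(z) * w))


if __name__ == "__main__":
    estimate = rqmc_is_estimate(np.exp, s=1.5, n_points=4096)
    print(estimate, "vs exact", math.exp(0.5))
```

With $s > 1$ the weighted integrand in this toy case is bounded on the unit interval, which is the kind of behaviour the eigenvalue condition in the abstract is meant to secure.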
The present paper addresses the convergence of a first-order-in-time incremental projection scheme for the time-dependent incompressible Navier-Stokes equations to a weak solution, without any assumption on the existence or regularity of the exact solution. We prove the convergence of the approximate solutions obtained by the semi-discrete scheme and by a fully discrete scheme using a staggered finite volume scheme on nonuniform rectangular meshes. First, a priori estimates on the approximate solutions yield their existence. Compactness arguments, relying on these estimates together with some estimates on the translates of the discrete time derivatives, are then developed to obtain convergence (up to the extraction of a subsequence) when the time step tends to zero in the semi-discrete scheme and when the space and time steps tend to zero in the fully discrete scheme. The approximate solutions are thus shown to converge to a limit function, which is then shown to be a weak solution to the continuous problem by passing to the limit in these schemes.
In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods, which rely on either a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. To remove this requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations, or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with a small batch size (e.g., 256) can achieve performance similar to that of SimCLR with a large batch size (e.g., 8192) on the self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
In this work, we consider the task of improving the accuracy of dynamic models for model predictive control (MPC) in an online setting. Even though prediction models can be learned and applied to model-based controllers, these models are often learned offline. In this offline setting, training data is first collected and a prediction model is learned through an elaborate training procedure. After the model is trained to a desired accuracy, it is then deployed in a model predictive controller. However, since the model is learned offline, it does not adapt to disturbances or model errors observed during deployment. To improve the adaptiveness of the model and the controller, we propose an online dynamics learning framework that continually improves the accuracy of the dynamic model during deployment. We adopt knowledge-based neural ordinary differential equations (KNODE) as the dynamic models, and use techniques inspired by transfer learning to continually improve the model accuracy. We demonstrate the efficacy of our framework with a quadrotor robot, and verify the framework in both simulations and physical experiments. Results show that the proposed approach is able to account for disturbances that are possibly time-varying, while maintaining good trajectory tracking performance.
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
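The directional convergence can be checked on a tiny example. The sketch below runs plain gradient descent on the logistic loss for three separable points, written as $z_i = y_i x_i$, and compares the direction of the iterate with the hard-margin solution. The dataset, step size, iteration count, and function names are illustrative assumptions. For this dataset the hard-margin solution $\hat{w} = (0.6, 0.2)$ can be verified by hand from the two active constraints; the toy case is only meant to show the direction, not to reproduce the convergence rate discussed in the abstract.

```python
import numpy as np


def gd_logistic(Z, step=0.1, n_steps=20000):
    """Gradient descent on sum_i log(1 + exp(-w . z_i)), started at w = 0.

    Each row of Z is z_i = y_i * x_i for a homogeneous linear predictor.
    """
    w = np.zeros(Z.shape[1])
    for _ in range(n_steps):
        margins = Z @ w
        # d/dw log(1 + exp(-m_i)) = -z_i / (1 + exp(m_i))
        grad = -(Z / (1.0 + np.exp(margins))[:, None]).sum(axis=0)
        w -= step * grad
    return w


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy separable data (assumption): the first two rows are the support vectors.
Z = np.array([[1.0, 2.0], [2.0, -1.0], [3.0, 3.0]])
# Hard-margin solution of min ||w|| subject to Z @ w >= 1 for this data.
W_SVM = np.array([0.6, 0.2])

if __name__ == "__main__":
    w = gd_logistic(Z)
    print("norm of w:", np.linalg.norm(w))
    print("cosine with max-margin direction:", cosine(w, W_SVM))
```

The norm of the iterate keeps growing while its direction settles, which is the behaviour the abstract describes for separable data.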
As opaque predictive models increasingly impact many areas of modern life, interest in quantifying the importance of a given input variable for making a specific prediction has grown. Recently, there has been a proliferation of model-agnostic methods to measure variable importance (VI) that analyze the difference in predictive power between a full model trained on all variables and a reduced model that excludes the variable(s) of interest. A bottleneck common to these methods is the estimation of the reduced model for each variable (or subset of variables), which is an expensive process that often does not come with theoretical guarantees. In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. We replace the need for fully retraining a wide neural network by a linearization initialized at the full model parameters. By adding a ridge-like penalty to make the problem convex, we prove that when the ridge penalty parameter is sufficiently large, our method estimates the variable importance measure with an error rate of $O(\frac{1}{\sqrt{n}})$ where $n$ is the number of training samples. We also show that our estimator is asymptotically normal, enabling us to provide confidence bounds for the VI estimates. We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.