We derive high-dimensional scaling limits and fluctuations for the online least-squares Stochastic Gradient Descent (SGD) algorithm by taking the properties of the data-generating model explicitly into consideration. Our approach treats the SGD iterates as an interacting particle system, where the expected interaction is characterized by the covariance structure of the input. Assuming smoothness conditions on moments of order up to eight, and without explicitly assuming Gaussianity, we establish the high-dimensional scaling limits and fluctuations in the form of infinite-dimensional Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). Our results reveal a precise three-step phase transition of the iterates: they go from ballistic, to diffusive, and finally to purely random behavior as the noise variance increases from low, to moderate, to very high. In the low-noise setting, we further characterize the precise fluctuations of the (scaled) iterates as infinite-dimensional SDEs. We also show the existence and uniqueness of solutions to the derived limiting ODEs and SDEs. Our results have several applications, including the characterization of the limiting mean-square estimation or prediction errors and their fluctuations, which can be obtained by analytically or numerically solving the limiting equations.
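As a minimal illustration of the algorithm being analysed, the sketch below runs online least-squares SGD on streaming Gaussian data. The dimension, step size, and noise level are illustrative choices, not values from the paper; in the low-noise regime the iterate drifts ballistically toward the truth before fluctuating around it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                       # dimension (illustrative)
w_star = rng.normal(size=d)  # ground-truth parameter
sigma = 0.1                  # low noise level
eta = 0.05                   # constant step size (stable: eta < 2/(d+2) for Gaussian inputs)

w = np.zeros(d)              # SGD iterate
for t in range(20000):
    x = rng.normal(size=d)                  # fresh streaming input
    y = x @ w_star + sigma * rng.normal()   # noisy response
    # one online least-squares SGD step on the loss (x.w - y)^2 / 2
    w -= eta * (x @ w - y) * x

# the iterate approaches w_star up to noise-driven fluctuations
print(np.linalg.norm(w - w_star))
```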
A Gaussian process (GP)-based methodology is proposed to emulate complex dynamical computer models (or simulators). The method relies on emulating the short-time numerical flow map of the system, where the flow map is a function that returns the solution of a dynamical system at a certain time point, given initial conditions. To predict the model output time series, a single realisation is drawn from the emulated flow map (i.e., from its posterior distribution) and used to iterate forward in time from the initial condition. Repeating this procedure with multiple such draws creates a distribution over the time series, whose mean and variance serve as the model output prediction and the associated uncertainty, respectively. However, since there is no known method to draw an exact sample from the GP posterior analytically, we approximate the kernel with random Fourier features and generate approximate sample paths. The proposed method is applied to emulate several dynamic nonlinear simulators, including the well-known Lorenz and van der Pol models. The results suggest that our approach has high predictive performance and that the associated uncertainty can capture the dynamics of the system accurately. Additionally, our approach has potential for ``embarrassingly'' parallel implementations, where the iterative predictions performed by each realisation can be carried out on a single computing node.
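The pipeline above can be sketched as follows. The one-dimensional linear system, the RBF kernel, and all hyperparameters are illustrative assumptions, and Bayesian linear regression in random Fourier feature space stands in for the full GP posterior sampling scheme: one draw of the feature weights gives one approximate sample path of the flow map, which is then iterated from the initial condition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1D system: flow map of x' = -x over a step dt (exact map: x -> x * exp(-dt))
dt = 0.1
flow = lambda x: x * np.exp(-dt)

# Training data for the short-time flow map
X = rng.uniform(-2, 2, size=40)
Y = flow(X)

# Random Fourier features approximating an RBF kernel with lengthscale ell
D, ell, noise = 200, 1.0, 1e-4
W = rng.normal(scale=1.0 / ell, size=D)
b = rng.uniform(0, 2 * np.pi, size=D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(np.outer(x, W) + b)

# Bayesian linear regression in feature space: Gaussian posterior over weights
Phi = phi(X)
A = Phi.T @ Phi / noise + np.eye(D)
mean = np.linalg.solve(A, Phi.T @ Y / noise)
cov = np.linalg.inv(A)

# One draw from the weight posterior = one approximate GP sample path
w_sample = rng.multivariate_normal(mean, cov)
emulated_flow = lambda x: (phi(np.atleast_1d(x)) @ w_sample)[0]

# Iterate the sampled flow map from an initial condition to predict a trajectory
x, traj = 1.5, []
for _ in range(50):
    x = emulated_flow(x)
    traj.append(x)

print(traj[-1], 1.5 * np.exp(-dt * 50))  # emulated vs exact endpoint
```

Repeating the draw-and-iterate step with many weight samples (in parallel, one draw per node) yields the predictive distribution over trajectories.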
We examine the behaviour of the Laplace and saddlepoint approximations in the high-dimensional setting, where the dimension of the model is allowed to increase with the number of observations. Approximations to the joint density, the marginal posterior density and the conditional density are considered. Our results show that, under mild assumptions on the model, the error of the joint density approximation is $O(p^4/n)$ if $p = o(n^{1/4})$ for both the Laplace and saddlepoint approximations, and $O(p^3/n)$ if $p = o(n^{1/3})$ under additional assumptions on the second derivative of the log-likelihood. Stronger results are obtained for the approximation to the marginal posterior density.
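A one-dimensional sketch of the Laplace approximation (fixed dimension, so only the $n$-asymptotics of the error are visible): the Beta-function integral below has a closed form, which makes the approximation error directly computable and shows the expected $O(1/n)$ decay.

```python
from math import lgamma, log, pi

# Exact log of the integral  ∫_0^1 θ^k (1-θ)^(n-k) dθ  =  B(k+1, n-k+1)
def log_exact(n, k):
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

# Laplace approximation: second-order expansion of the log-integrand
# l(θ) = k log θ + (n-k) log(1-θ) around its mode θ̂ = k/n
def log_laplace(n, k):
    theta = k / n
    logl = k * log(theta) + (n - k) * log(1 - theta)   # value at the mode
    hess = k / theta**2 + (n - k) / (1 - theta)**2     # -l''(θ̂) > 0
    return logl + 0.5 * log(2 * pi / hess)

for n in [10, 100, 1000]:
    k = n // 2
    print(n, log_exact(n, k) - log_laplace(n, k))  # error shrinks like O(1/n)
```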
Time series often reflect variation associated with other related variables. Controlling for the effect of these variables is useful when modeling or analyzing the time series. We introduce a novel approach to normalize time series data conditional on a set of covariates. We do this by modeling the conditional mean and the conditional variance of the time series with generalized additive models using a set of covariates. The conditional mean and variance are then used to normalize the time series. We illustrate the use of conditionally normalized series using two applications involving river network data. First, we show how these normalized time series can be used to impute missing values in the data. Second, we show how the normalized series can be used to estimate the conditional autocorrelation function and conditional cross-correlation functions via additive models. Finally, we use the conditional cross-correlations to estimate the time it takes water to flow between two locations in a river network.
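The normalization step can be sketched as follows. Polynomial least-squares smoothers stand in for the generalized additive models, and the synthetic covariate-dependent series is an illustrative assumption: fit the conditional mean, fit the conditional scale from the residuals, then standardize.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic series whose mean and variance both depend on a covariate z
n = 2000
z = rng.uniform(0, 1, size=n)
mu = 2.0 + np.sin(2 * np.pi * z)   # conditional mean
sd = 0.5 + z                       # conditional standard deviation
y = mu + sd * rng.normal(size=n)

# Stand-in for a GAM smoother: polynomial least squares in the covariate
def smooth_fit(x, target, deg=5):
    return np.polyval(np.polyfit(x, target, deg), x)

mu_hat = smooth_fit(z, y)          # conditional mean model
resid = y - mu_hat
# Conditional scale model: E|resid| = sd * sqrt(2/pi) for Gaussian residuals
sd_hat = np.sqrt(np.pi / 2) * smooth_fit(z, np.abs(resid))

# Conditionally normalized series: approximately mean 0, variance 1 given z
y_norm = resid / sd_hat
print(y_norm.mean(), y_norm.std())
```

The normalized series `y_norm` can then be analysed (autocorrelations, imputation) free of the covariate-driven mean and variance structure.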
Normalizing flows (NFs) use a continuous generator to map a simple latent (e.g., Gaussian) distribution towards an empirical target distribution associated with a training data set. Once trained by minimizing a variational objective, the learnt map provides an approximate generative model of the target distribution. Since standard NFs implement differentiable maps, they may suffer from pathological behaviors when targeting complex distributions. For instance, such problems may appear for distributions on multi-component topologies, or those characterized by multiple modes with high-probability regions separated by very unlikely areas. A typical symptom is the explosion of the Jacobian norm of the transformation in very-low-probability areas. This paper proposes to overcome this issue with a new Markov chain Monte Carlo algorithm that samples from the target distribution in the latent domain before transporting the samples back to the target domain. The approach relies on a Metropolis-adjusted Langevin algorithm (MALA) whose dynamics explicitly exploits the Jacobian of the transformation. Contrary to alternative approaches, the proposed strategy preserves the tractability of the likelihood and does not require any specific training. Notably, it can be straightforwardly used with any pre-trained NF network, regardless of the architecture. Experiments conducted on synthetic and high-dimensional real data sets illustrate the efficiency of the method.
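The latent-space sampling idea can be sketched in one dimension. A fixed smooth diffeomorphism stands in for a pretrained flow, the bimodal target and all tuning constants are illustrative assumptions, and gradients are taken numerically for simplicity; the key ingredient is that the latent target includes the log-Jacobian of the transformation, and accepted latent samples are transported back through the map.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a pretrained normalizing flow: a smooth 1D diffeomorphism z -> x
T = lambda z: z**3 + z
dT = lambda z: 3 * z**2 + 1                 # Jacobian of the transformation

# Bimodal (unnormalised) target density in the data domain
def log_p_x(x):
    return np.logaddexp(-0.5 * (x - 1.5) ** 2, -0.5 * (x + 1.5) ** 2)

# Pullback of the target through T, including the log-Jacobian term
def log_pi(z):
    return log_p_x(T(z)) + np.log(dT(z))

def grad_log_pi(z, h=1e-5):                 # numerical gradient for simplicity
    return (log_pi(z + h) - log_pi(z - h)) / (2 * h)

step = 0.02

def log_q(a, b):                            # MALA proposal log-density q(a | b)
    return -((a - b - step * grad_log_pi(b)) ** 2) / (4 * step)

# Metropolis-adjusted Langevin in the latent domain, then transport back via T
z, samples = 0.0, []
for _ in range(50000):
    prop = z + step * grad_log_pi(z) + np.sqrt(2 * step) * rng.normal()
    log_alpha = log_pi(prop) - log_pi(z) + log_q(z, prop) - log_q(prop, z)
    if np.log(rng.uniform()) < log_alpha:
        z = prop
    samples.append(T(z))

samples = np.array(samples)
print((samples > 0).mean())                 # fraction of mass in the right mode
```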
Variance reduction techniques are of crucial importance for the efficiency of Monte Carlo simulations in finance applications. We propose the use of neural SDEs, with control variates parameterized by neural networks, in order to learn approximately optimal control variates and hence reduce variance as trajectories of the SDEs are being simulated. We consider SDEs driven by Brownian motion and, more generally, by L\'{e}vy processes including those with infinite activity. For the latter case, we prove optimality conditions for the variance reduction. Several numerical examples from option pricing are presented.
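To illustrate the control variate principle in its simplest form, the sketch below prices a European call under Black-Scholes dynamics with a hand-crafted control variate (the discounted terminal stock price, whose mean is known by the martingale property) rather than a learned neural one; all market parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo pricing of a European call under geometric Brownian motion
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 100000

Z = rng.normal(size=n)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
payoff = np.exp(-r * T) * np.maximum(ST - K, 0.0)

# Control variate: discounted S_T has known mean S0 (martingale property)
control = np.exp(-r * T) * ST
beta = np.cov(payoff, control)[0, 1] / np.var(control)  # optimal coefficient
cv_payoff = payoff - beta * (control - S0)

print("plain MC:     ", payoff.mean(), payoff.std() / np.sqrt(n))
print("with control: ", cv_payoff.mean(), cv_payoff.std() / np.sqrt(n))
```

The neural SDE approach replaces the hand-crafted `control` with a network-parameterized process learned while the trajectories are simulated, but the variance-reduction mechanics are the same.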
The proliferation of news on social media platforms has led to concerns about the impact of biased and unreliable information on public discourse. This study examines differences in interaction patterns between public and private sharing of news articles on Facebook, focusing on articles with varying bias and reliability, as well as the depth of interactions. To analyze these patterns, we employed two complementary data collection methods using the CrowdTangle browser extension. We collected interaction data across all Facebook posts (private + public) referencing a manually labeled collection of over 30K news articles, as well as interaction data on public posts published in the forums tracked by CrowdTangle. Our empirical findings, backed by rigorous statistical analysis, reveal significant differences in interaction patterns between public and private sharing across different classes of news in terms of bias and reliability, highlighting the role of user preferences and privacy settings in shaping the spread of news articles. Notably, we find that irrespective of news class, users tend to engage more deeply in private discussions compared to public ones. Additionally, Facebook users engage more deeply with content from the Right-biased class, and exhibit higher deep-interaction ratios with content from the Most-unreliable class. This study is the first to directly compare the dynamics of public and private sharing of news articles on Facebook, specifically examining the interactions and depth of engagement with articles of varying bias and reliability. By providing new insights and shedding light on these aspects, our findings have significant implications for understanding the influence of social media on shaping public discourse.
Federated learning (FL) enables participating parties to collaboratively build a global model with boosted utility without disclosing private data information. Appropriate protection mechanisms have to be adopted to fulfill the requirements of preserving \textit{privacy} and maintaining high model \textit{utility}. The nature of the widely adopted protection mechanisms, including the \textit{Randomization Mechanism} and the \textit{Compression Mechanism}, is to protect privacy by distorting model parameters. We measure utility via the gap between the original and the distorted model parameters. We want to identify under what general conditions privacy-preserving federated learning can achieve near-optimal utility via data generation and parameter distortion. To provide an avenue for achieving near-optimal utility, we present an upper bound for the utility loss, expressed in terms of two main quantities: variance reduction and model parameter discrepancy. Our analysis inspires the design of appropriate protection parameters for the protection mechanisms so as to achieve near-optimal utility while meeting the privacy requirements. The main techniques for the protection mechanisms, parameter distortion and data generation, are generic and can be applied extensively. Furthermore, we provide an upper bound for the trade-off between privacy and utility, which, together with the lower bound illustrated in NFL, forms the conditions for achieving an optimal trade-off.
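A toy sketch of the utility measure under a randomization mechanism: the parameter vector, noise scales, and gap metric below are illustrative assumptions, showing how stronger distortion (better protection) widens the gap between original and distorted parameters.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "global model": a parameter vector shared during federated training
theta = rng.normal(size=100)

# Randomization mechanism: protect the parameters by adding Gaussian noise
def randomize(theta, noise_scale):
    return theta + rng.normal(scale=noise_scale, size=theta.shape)

# Utility loss measured via the gap between original and distorted parameters
gaps = []
for noise_scale in [0.01, 0.1, 1.0]:
    distorted = randomize(theta, noise_scale)
    gaps.append(np.linalg.norm(distorted - theta) / np.sqrt(theta.size))
    print(noise_scale, gaps[-1])  # stronger protection -> larger parameter gap
```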
A nonlinear sea-ice problem is considered in a least-squares finite element setting. The corresponding variational formulation, approximating simultaneously the stress tensor and the velocity, is analysed. In particular, the least-squares functional is shown to be coercive and continuous in an appropriate solution space, which proves the well-posedness of the problem. As the method does not require a compatibility condition between the finite element spaces, the formulation allows the use of piecewise polynomial spaces of the same approximation order for both the stress and the velocity approximations. A Newton-type iterative method is used to linearize the problem, and numerical tests are provided to illustrate the theory.
We study the 2D Navier-Stokes equations with transport noise subject to periodic boundary conditions. Our main result is an error estimate for the time discretisation, showing a convergence rate of order (up to) 1/2. It holds with respect to mean-square convergence, whereas previously such a rate for the stochastic Navier-Stokes equations was only known with respect to convergence in probability. Our result is based on uniform-in-probability estimates for the continuous as well as the time-discrete solution, exploiting the particular structure of the noise.
Generalized approximate message passing (GAMP) is a computationally efficient algorithm for estimating an unknown signal $w_0\in\mathbb{R}^N$ from a random linear measurement $y= Xw_0 + \epsilon\in\mathbb{R}^M$, where $X\in\mathbb{R}^{M\times N}$ is a known measurement matrix and $\epsilon$ is the noise vector. The salient feature of GAMP is that it can provide an unbiased estimator $\hat{r}^{\rm G}\sim\mathcal{N}(w_0, \hat{s}^2I_N)$, which can be used for various hypothesis-testing methods. In this study, we consider the bootstrap average of an unbiased estimator of GAMP for the elastic net. By numerically analyzing the state evolution of \emph{approximate message passing with resampling}, which has been proposed for computing bootstrap statistics of the elastic net estimator, we investigate when bootstrap averaging reduces the variance of the unbiased estimator, as well as the effect of optimizing the size of each bootstrap sample and the regularization hyperparameter of the elastic net, in the asymptotic setting $M, N\to\infty, M/N\to\alpha\in(0,\infty)$. The results indicate that bootstrap averaging effectively reduces the variance of the unbiased estimator when the actual data generation process is inconsistent with the sparsity assumption of the regularization and the sample size is small. Furthermore, we find that when $w_0$ is less sparse and the data size is small, the system undergoes a phase transition. The phase transition indicates the existence of a region where the ensemble average of unbiased estimators of GAMP for the elastic net norm minimization problem yields the unbiased estimator with the minimum variance.
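The bootstrap-averaging procedure itself can be sketched as follows. Ridge regression in closed form stands in for the elastic net, and the dense signal, small sample size, and regularization strength are illustrative assumptions; whether the averaging actually reduces variance depends on the regime, as discussed above.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy linear model with a dense (non-sparse) signal and small sample size
N, M = 50, 30
w0 = rng.normal(size=N)
X = rng.normal(size=(M, N)) / np.sqrt(N)
y = X @ w0 + 0.5 * rng.normal(size=M)

# Stand-in for the elastic net: ridge regression in closed form
def fit(X, y, lam=0.1):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Bootstrap averaging: refit on resampled data and average the estimates
B = 200
boot = np.zeros(N)
for _ in range(B):
    idx = rng.integers(0, M, size=M)   # bootstrap resample of the rows
    boot += fit(X[idx], y[idx])
boot /= B

print(np.linalg.norm(fit(X, y) - w0), np.linalg.norm(boot - w0))
```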