This paper concerns the numerical solution of the two-dimensional time-dependent partial integro-differential equation (PIDE) that holds for the values of European-style options under the two-asset Kou jump-diffusion model. A main feature of this equation is the presence of a nonlocal double integral term. For its numerical evaluation, we extend a highly efficient algorithm derived by Toivanen (2008) for the one-dimensional Kou integral. The resulting algorithm for the two-dimensional Kou integral has optimal computational cost: the number of basic arithmetic operations is directly proportional to the number of spatial grid points in the semidiscretization. For the effective discretization in time, we study seven contemporary operator splitting schemes of the implicit-explicit (IMEX) and the alternating direction implicit (ADI) kind. All of these schemes allow for a convenient, explicit treatment of the integral term. Through extensive numerical experiments for put-on-the-average option values, we investigate the stability and convergence behaviour, as well as the mutual performance, of the seven operator splitting schemes. Moreover, the Greeks Delta and Gamma are considered.
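To make the extension concrete, the one-dimensional building block of Toivanen-type evaluation is a linear-cost recursion. Below is a minimal Python sketch of the one-sided recursion, assuming a uniform grid, linear interpolation of the option values between grid points, and negligible values beyond the last grid point; all names and parameters are illustrative, not the paper's.

```python
import numpy as np

def kou_integral_one_sided(v, h, eta):
    """Evaluate J_i = int_0^inf eta*exp(-eta*y) * v(x_i + y) dy for all
    grid points i in O(n) operations via a backward recursion, assuming
    a uniform grid of spacing h, piecewise-linear v, and v ~ 0 beyond
    the last grid point.  The two-dimensional Kou integral combines
    such one-sided recursions along each spatial direction."""
    n = len(v)
    J = np.zeros(n)
    e = np.exp(-eta * h)
    # Exact integral of eta*exp(-eta*y)*(a + b*y/h) over one cell [0, h]:
    c0 = 1.0 - e                       # weight of a = v[i]
    c1 = (1.0 - e) / (eta * h) - e     # weight of b = v[i+1] - v[i]
    for i in range(n - 2, -1, -1):
        local = v[i] * c0 + (v[i + 1] - v[i]) * c1
        J[i] = local + e * J[i + 1]    # shift the tail by one cell
    return J
```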
In this paper we study the estimation of Generalized Linear Models (GLMs) in the setting where the agents (individuals) are strategic or self-interested and are concerned about the privacy of the data they report. Compared with the classical setting, here we aim to design mechanisms that can both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while keeping the output close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed, possessing only finite fourth moments. First, motivated by the stationarity condition of the maximizer of the likelihood function, we derive a novel private and closed-form estimator. Based on this estimator, we propose a mechanism which, via an appropriate design of the computation and payment scheme for several canonical models such as linear regression, logistic regression and Poisson regression, has the following properties: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output achieves an error of $o(1)$ with respect to the underlying parameter; (4) the mechanism is individually rational for a $(1-o(1))$-fraction of agents; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. In the second part, we consider the linear regression model in a more general setting where both covariates and responses are heavy-tailed and only have finite fourth moments. Using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme with properties similar to those in the sub-Gaussian case.
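For the heavy-tailed part, an $\ell_4$-norm shrinkage step can be sketched as follows; this illustrates one common form of such an operator, and the threshold tau and its tuning are assumptions rather than the paper's choices.

```python
import numpy as np

def l4_shrink(x, tau):
    """Cap the l4 norm of a sample vector x at tau: heavy-tailed
    outliers are scaled down while typical samples pass through
    unchanged.  The shrunken samples can then enter the (privatized)
    moment estimates feeding a closed-form estimator."""
    norm4 = np.sum(np.abs(x) ** 4) ** 0.25
    return x if norm4 <= tau else x * (tau / norm4)
```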
In domains where sample sizes are limited, efficient learning algorithms are critical. Learning using privileged information (LuPI) offers increased sample efficiency by allowing prediction models access to types of information at training time that are unavailable when the models are used. In recent work, it was shown that for prediction in linear-Gaussian dynamical systems, a LuPI learner with access to intermediate time series data is never worse, and often better, in expectation than any unbiased classical learner. We provide new insights into this analysis and generalize it to nonlinear prediction tasks in latent dynamical systems, extending theoretical guarantees to the case where the map connecting latent variables and observations is known up to a linear transform. In addition, we propose algorithms based on random features and representation learning for the case when this map is unknown. A suite of empirical results confirms the theoretical findings and shows the potential of using privileged time-series information in nonlinear prediction.
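As an illustration of the random-features route when the latent-to-observation map is unknown, a standard random Fourier feature map makes the prediction problem linear again, so the linear-Gaussian analysis can be reused; the feature dimension and bandwidth below are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, D=200, bandwidth=1.0):
    """Map observations X (n x d) to D random Fourier features that
    approximate an RBF kernel; downstream prediction is then linear
    in the features."""
    n, d = X.shape
    W = rng.normal(scale=1.0 / bandwidth, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```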
We present a Newton-Krylov solver for a viscous-plastic sea-ice model. This constitutive relation is commonly used in climate models to describe the material properties of sea ice. Due to the strong nonlinearity introduced by the material law in the momentum equation, the development of fast, robust and scalable solvers is still a substantial challenge. In this paper, we propose a novel primal-dual Newton linearization for the implicitly-in-time discretized momentum equation. Compared to existing methods, it converges faster and more robustly with respect to mesh refinement, and thus enables numerically converged sea-ice simulations at high resolutions. Combined with an algebraic multigrid-preconditioned Krylov method for the linearized systems, which contain strongly varying coefficients, the resulting solver scales well and can be used in parallel. We present experiments for two challenging test problems and study solver performance for problems with up to 8.4 million spatial unknowns.
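Schematically, a Jacobian-free Newton-Krylov iteration of this kind looks as follows; the finite-difference Jacobian action, tolerances, and the unpreconditioned GMRES call are illustrative simplifications (the paper's solver uses a primal-dual Newton linearization and algebraic multigrid preconditioning).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def newton_krylov(residual, u0, tol=1e-8, max_newton=50, eps=1e-7):
    """Solve F(u) = 0: each Newton step solves J(u) du = -F(u) with
    GMRES, approximating Jacobian-vector products by finite
    differences of the residual."""
    u = u0.copy()
    for _ in range(max_newton):
        F = residual(u)
        if np.linalg.norm(F) < tol:
            break
        jv = lambda v: (residual(u + eps * v) - F) / eps
        J = LinearOperator((u.size, u.size), matvec=jv, dtype=F.dtype)
        du, info = gmres(J, -F)
        u = u + du
    return u
```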
We consider the stochastic gradient descent (SGD) algorithm driven by a general stochastic sequence, including i.i.d. noise and random walks on an arbitrary graph, among others, and analyze it in an asymptotic sense. Specifically, we employ the notion of `efficiency ordering', a well-studied tool for comparing the performance of Markov Chain Monte Carlo (MCMC) samplers, for SGD algorithms in the form of Loewner ordering of the covariance matrices associated with the scaled iterate errors in the long term. Using this ordering, we show that input sequences that are more efficient for MCMC sampling also lead to smaller covariance of the errors for SGD algorithms in the limit. This also implies that an arbitrarily weighted MSE of SGD iterates in the limit becomes smaller when driven by more efficient chains. Our finding is of particular interest in applications such as decentralized optimization and swarm learning, where SGD is implemented in a random-walk fashion on the underlying communication graph for reasons of cost and/or data privacy. We demonstrate how certain non-Markovian processes, for which typical mixing-time-based non-asymptotic bounds are intractable, can outperform their Markovian counterparts in the sense of efficiency ordering for SGD. We show the utility of our method by applying it to gradient descent with shuffling and mini-batch gradient descent, reaffirming key results from the existing literature under a unified framework. Empirically, we also observe efficiency ordering for variants of SGD such as accelerated SGD and Adam, opening up the possibility of extending our notion of efficiency ordering to a broader family of stochastic optimization algorithms.
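Concretely, the ordering statement reduces to a positive-semidefiniteness check on limiting covariance matrices; a toy sketch, where the matrices stand in for the scaled-error covariances induced by two input sequences.

```python
import numpy as np

def loewner_leq(A, B, tol=1e-10):
    """True if A <= B in the Loewner order, i.e. B - A is positive
    semidefinite; then w' A w <= w' B w for every weight vector w,
    so any weighted MSE is smaller under A."""
    return bool(np.all(np.linalg.eigvalsh(B - A) >= -tol))
```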
This paper presents a statistical model for stationary ergodic point processes, estimated from a single realization observed in a square window. With existing approaches in stochastic geometry, it is very difficult to model processes with complex geometries formed by a large number of particles. Inspired by recent works on gradient descent algorithms for sampling maximum-entropy models, we describe a model that allows for fast sampling of new configurations reproducing the statistics of the given observation. Starting from an initial random configuration, its particles are moved according to the gradient of an energy, in order to match a set of prescribed moments (functionals). Our moments are defined via a phase harmonic operator on the wavelet transform of point patterns. They allow one to capture multi-scale interactions between the particles, while controlling explicitly the number of moments by the scales of the structures to model. We present numerical experiments on point processes with various geometric structures, and assess the quality of the model by spectral and topological data analysis.
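The sampling procedure amounts to gradient descent on an energy penalizing the moment mismatch. A schematic sketch follows, where `moments` stands for the wavelet phase harmonic statistics (assumed here to be implemented with jax so it is differentiable), and the step size and iteration count are illustrative.

```python
import jax
import jax.numpy as jnp

def sample_by_descent(x0, moments, target, step=0.1, n_iters=500):
    """Move particle positions x (an n-by-2 array) by gradient descent
    on E(x) = ||moments(x) - target||^2, so that the synthesized
    configuration reproduces the prescribed statistics."""
    energy = lambda x: jnp.sum((moments(x) - target) ** 2)
    grad_energy = jax.grad(energy)
    x = x0
    for _ in range(n_iters):
        x = x - step * grad_energy(x)
    return x
```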
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
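The simplest instance of such an invariance is a linear model in which only the sum of two weights enters the likelihood; the toy check below (purely illustrative) exhibits the resulting flat direction, which a factorized Gaussian posterior cannot capture exactly.

```python
import numpy as np

def log_lik(w1, w2, x=2.0, y=1.0, sigma=0.5):
    """Gaussian log-likelihood of y with mean x*(w1 + w2): the
    translation (w1, w2) -> (w1 + t, w2 - t) leaves it unchanged,
    creating a continuous ridge in the posterior."""
    mu = x * (w1 + w2)
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# The likelihood is constant along the invariant direction:
assert np.isclose(log_lik(0.3, 0.1), log_lik(0.3 + 5.0, 0.1 - 5.0))
```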
Parallel-in-time methods for partial differential equations (PDEs) have been the subject of intense development over recent decades, particularly for diffusion-dominated problems. It has been widely reported in the literature, however, that many of these methods perform quite poorly for advection-dominated problems. Here we analyze the particular iterative parallel-in-time algorithm of multigrid reduction-in-time (MGRIT) for discretizations of constant-wave-speed linear advection problems. We focus on common method-of-lines discretizations that employ upwind finite differences in space and Runge-Kutta methods in time. Using a convergence framework we developed in previous work, we prove for a subclass of these discretizations that, if using the standard approach of rediscretizing the fine-grid problem on the coarse grid, robust MGRIT convergence with respect to CFL number and coarsening factor is not possible. This poor convergence and non-robustness is caused, at least in part, by an inadequate coarse-grid correction for smooth Fourier modes known as characteristic components. We propose an alternative coarse-grid operator that provides a better correction of these modes. This operator is related to previous work and uses a semi-Lagrangian discretization combined with an implicitly treated truncation error correction. Theory and numerical experiments show that the coarse-grid operator yields fast MGRIT convergence for many of the method-of-lines discretizations considered, including both implicit and explicit discretizations of high order.
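For reference, the simplest member of the fine-grid discretization class studied here is first-order upwinding in space with forward Euler (the one-stage Runge-Kutta method) in time; the periodic boundary treatment and parameters below are illustrative.

```python
import numpy as np

def advect_upwind_step(u, a, dx, dt):
    """One explicit step for u_t + a u_x = 0 with first-order upwind
    differencing (a > 0) on a periodic grid; stability requires a
    CFL number a*dt/dx <= 1."""
    return u - (a * dt / dx) * (u - np.roll(u, 1))
```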
We investigate the computational properties of basic mathematical notions pertaining to $\mathbb{R}\rightarrow \mathbb{R}$-functions and subsets of $\mathbb{R}$, like finiteness, countability, (absolute) continuity, bounded variation, suprema, and regularity. We work in higher-order computability theory based on Kleene's S1-S9 schemes. We show that the aforementioned properties give rise to two huge and robust classes of computationally equivalent operations, the latter based on well-known theorems from the mainstream mathematics literature. As part of this endeavour, we develop an equivalent $\lambda$-calculus formulation of S1-S9 that accommodates partial objects. We show that the latter are essential to our enterprise via the study of countably based and partial functionals of type $3$.
We address here the discretization of the momentum convection operator for fluid flow simulations on 2D triangular and quadrangular meshes and 3D polyhedral meshes containing hexahedra, tetrahedra, prisms and pyramids. The finite volume scheme that we use for the full Euler equations is based on a staggered discretization: the density unknowns are associated with a primal mesh, whereas the velocity unknowns are associated with a "fictive" dual mesh. Accordingly, the convection operator of the mass balance equation is derived on the primal mesh, while the convection operator of the momentum balance equation is discretized on the dual mesh. To avoid any hazardous interpolation of the unknowns on a possibly ill-defined dual mesh, the mass fluxes of the momentum convection operator are computed from the mass fluxes of the mass balance equation, so as to ensure the stability of the resulting operator. A coherent reconstruction of these dual fluxes is possible based only on the type of polygonal or polyhedral cell considered, and not on each individual cell. Moreover, we show that this process still yields a consistent convection operator in the Lax-Wendroff sense; that is, if a sequence of piecewise constant functions is supposed to converge to a given limit, then the weak form of the corresponding discrete convection operator converges to the weak form of the continuous operator applied to this limit. The derived discrete convection operator applies to both constant- and variable-density flows and may thus be implemented in a scheme for incompressible or compressible flows. Numerical tests are performed for the Euler equations on several types of meshes, including hybrid meshes, and show the excellent performance of the method.
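Schematically, the reconstruction assembles each dual flux as a fixed linear combination of the cell's primal fluxes, with coefficients depending only on the cell type; the sketch below is purely illustrative, and the coefficient table is hypothetical rather than the paper's actual construction.

```python
import numpy as np

# Hypothetical coefficient table: for each cell type, every dual flux
# is a fixed linear combination of that cell's primal face fluxes; the
# combination depends on the geometric type only, never on the
# individual cell.
DUAL_FLUX_COEFFS = {
    "quadrangle": np.array([[0.5, 0.5, 0.0, 0.0],
                            [0.0, 0.5, 0.5, 0.0],
                            [0.0, 0.0, 0.5, 0.5],
                            [0.5, 0.0, 0.0, 0.5]]),
}

def dual_fluxes(cell_type, primal_fluxes):
    """Reconstruct the dual-mesh mass fluxes of one cell from its
    primal mass fluxes, using only the cell's geometric type."""
    return DUAL_FLUX_COEFFS[cell_type] @ np.asarray(primal_fluxes)
```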
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
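As a concrete instance of the simple iterative methods discussed, consider gradient descent on a factored least-squares objective for matrix sensing, preceded by a spectral initialization in the spirit of the two-stage recipe; the step size and iteration count below are illustrative.

```python
import numpy as np

def matrix_sensing_gd(A_ops, y, r, step=0.01, n_iters=500):
    """Recover a rank-r PSD matrix M ~ X X^T from measurements
    y_i = <A_i, M> by gradient descent on
    f(X) = sum_i (<A_i, X X^T> - y_i)^2, initialized from the top-r
    eigenpairs of the surrogate (1/m) * sum_i y_i A_i."""
    n = A_ops[0].shape[0]
    S = sum(yi * Ai for Ai, yi in zip(A_ops, y)) / len(y)
    vals, vecs = np.linalg.eigh((S + S.T) / 2.0)
    X = vecs[:, -r:] * np.sqrt(np.abs(vals[-r:]))    # spectral init
    for _ in range(n_iters):
        G = np.zeros((n, n))
        for Ai, yi in zip(A_ops, y):
            G += (np.sum(Ai * (X @ X.T)) - yi) * (Ai + Ai.T)
        X -= step * (G @ X)                          # step absorbs factor 2
    return X
```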