We obtained convergence rates of the collocation approximation by deep ReLU neural networks of the solution $u$ to elliptic PDEs with lognormal inputs, parametrized by $\boldsymbol{y}$ from the non-compact set $\mathbb{R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2(\mathbb{R}^\infty, V, \gamma)$, where $\gamma$ is the infinite tensor product standard Gaussian probability measure on $\mathbb{R}^\infty$ and $V$ is the energy space. Under a certain assumption on $\ell_q$-summability for the lognormal inputs $(0<q<2)$, we proved that given arbitrary number $\delta >0$ small enough, for every integer $n > 1$, one can construct a compactly supported deep ReLU neural network $\boldsymbol{\phi}_n:= \big(\phi_j\big)_{j=1}^m$ of size at most $n$ on $\mathbb{R}^m$ with $m =\mathcal{O}(n^{1 - \delta})$, and a sequence of points $\big(\boldsymbol{y}j\big)_{j=1}^m \subset \mathbb{R}^m$ (which are independent of $u$) so that the collocation approximation of $u$ by $\Phi_n u:= \sum_{j=1}^m u\big(\boldsymbol{y}^j\big) \Phi_j,$ which is based on the $m$ solvers $\Big( u\big(\boldsymbol{y}^j\big)\Big)_{j=1}^m$ and the deep ReLU network $\boldsymbol{\phi}_n$, gives the twofold error bounds: $\|u- \Phi_n u \|_{L_2(\mathbb{R}^\infty V, \gamma)} = \mathcal{O}\left(m^{- (1/q - 1/2)}\right) =\mathcal{O}\left(n^{- (1-\delta)(1/q - 1/2)}\right),$ where $\Phi_j$ are the extensions of $\phi_j$ to the whole $\mathbb{R}^\infty$. We also obtained similar results for the case when the lognormal inputs are parametrized on $\mathbb{R}^M$ with very large dimension $M$, and the approximation error is measured in the $\sqrt{g_M}$-weighted uniform norm of the Bochner space $L_\infty^{\sqrt{g}}(\mathbb{R}^M, V)$, where $g_M$ is the density function of the standard Gaussian probability measure on $\mathbb{R}^M$.
We investigate the problem of approximating the matrix function $f(A)$ by $r(A)$, with $f$ a Markov function, $r$ a rational interpolant of $f$, and $A$ a symmetric Toeplitz matrix. In a first step, we obtain a new upper bound for the relative interpolation error $1-r/f$ on the spectral interval of $A$. By minimizing this upper bound over all interpolation points, we obtain a new, simple and sharp a priori bound for the relative interpolation error. We then consider three different approaches of representing and computing the rational interpolant $r$. Theoretical and numerical evidence is given that any of these methods for a scalar argument allows to achieve high precision, even in the presence of finite precision arithmetic. We finally investigate the problem of efficiently evaluating $r(A)$, where it turns out that the relative error for a matrix argument is only small if we use a partial fraction decomposition for $r$ following Antoulas and Mayo. An important role is played by a new stopping criterion which ensures to automatically find the degree of $r$ leading to a small error, even in presence of finite precision arithmetic.
Given an $n$-point metric space $(\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i\in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor $2$-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a $6$-approximation that is guaranteed to satisfy the fairness constraints up to a factor $1-\epsilon$ for any constant $\epsilon$. We also present a linear time algorithm returning an $m+1$ approximation with exact fairness. The best previous result was a $3m-1$ approximation. 2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant $\epsilon>0$, we present a $1+\epsilon$ approximation algorithm that runs in $O(nk) + 2^{O(k)}$ time where $k=k_1+\ldots+k_m$. We can improve the running time to $O(nk)+ poly(k)$ at the expense of only picking $(1-\epsilon) k_i$ points from category $i\in [m]$. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.
Parametrized max-affine (PMA) and parametrized log-sum-exp (PLSE) networks are proposed for general decision-making problems. The proposed approximators generalize existing convex approximators, namely, max-affine (MA) and log-sum-exp (LSE) networks, by considering function arguments of condition and decision variables and replacing the network parameters of MA and LSE networks with continuous functions with respect to the condition variable. The universal approximation theorem of PMA and PLSE is proven, which implies that PMA and PLSE are shape-preserving universal approximators for parametrized convex continuous functions. Practical guidelines for incorporating deep neural networks within PMA and PLSE networks are provided. A numerical simulation is performed to demonstrate the performance of the proposed approximators. The simulation results support that PLSE outperforms other existing approximators in terms of minimizer and optimal value errors with scalable and efficient computation for high-dimensional cases.
We develop several deep learning algorithms for approximating families of parametric PDE solutions. The proposed algorithms approximate solutions together with their gradients, which in the context of mathematical finance means that the derivative prices and hedging strategies are computed simulatenously. Having approximated the gradient of the solution one can combine it with a Monte-Carlo simulation to remove the bias in the deep network approximation of the PDE solution (derivative price). This is achieved by leveraging the Martingale Representation Theorem and combining the Monte Carlo simulation with the neural network. The resulting algorithm is robust with respect to quality of the neural network approximation and consequently can be used as a black-box in case only limited a priori information about the underlying problem is available. We believe this is important as neural network based algorithms often require fair amount of tuning to produce satisfactory results. The methods are empirically shown to work for high-dimensional problems (e.g. 100 dimensions). We provide diagnostics that shed light on appropriate network architectures.
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.
Stochastic PDE eigenvalue problems are useful models for quantifying the uncertainty in several applications from the physical sciences and engineering, e.g., structural vibration analysis, the criticality of a nuclear reactor or photonic crystal structures. In this paper we present a simple multilevel quasi-Monte Carlo (MLQMC) method for approximating the expectation of the minimal eigenvalue of an elliptic eigenvalue problem with coefficients that are given as a series expansion of countably-many stochastic parameters. The MLQMC algorithm is based on a hierarchy of discretisations of the spatial domain and truncations of the dimension of the stochastic parameter domain. To approximate the expectations, randomly shifted lattice rules are employed. This paper is primarily dedicated to giving a rigorous analysis of the error of this algorithm. A key step in the error analysis requires bounds on the mixed derivatives of the eigenfunction with respect to both the stochastic and spatial variables simultaneously. Under stronger smoothness assumptions on the parametric dependence, our analysis also extends to multilevel higher-order quasi-Monte Carlo rules. An accompanying paper [Gilbert and Scheichl, 2022], focusses on practical extensions of the MLQMC algorithm to improve efficiency, and presents numerical results.
We study nonparametric Bayesian models for reversible multi-dimensional diffusions with periodic drift. For continuous observation paths, reversibility is exploited to prove a general posterior contraction rate theorem for the drift gradient vector field under approximation-theoretic conditions on the induced prior for the invariant measure. The general theorem is applied to Gaussian priors and $p$-exponential priors, which are shown to converge to the truth at the minimax optimal rate over Sobolev smoothness classes in any dimension
Based on the Bezout approach we propose a simple algorithm to determine the {\tt gcd} of two polynomials which doesn't need division, like the Euclidean algorithm, or determinant calculations, like the Sylvester matrix algorithm. The algorithm needs only $n$ steps for polynomials of degree $n$. Formal manipulations give the discriminant or the resultant for any degree without needing division nor determinant calculation.
In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.
In this paper, we propose a nonlinear distance metric learning scheme based on the fusion of component linear metrics. Instead of merging displacements at each data point, our model calculates the velocities induced by the component transformations, via a geodesic interpolation on a Lie transfor- mation group. Such velocities are later summed up to produce a global transformation that is guaranteed to be diffeomorphic. Consequently, pair-wise distances computed this way conform to a smooth and spatially varying metric, which can greatly benefit k-NN classification. Experiments on synthetic and real datasets demonstrate the effectiveness of our model.