In this work we undertake a thorough study of the non-asymptotic properties of vanilla generative adversarial networks (GANs). We derive theoretical guarantees for density estimation with GANs under a proper choice of the deep neural network classes representing the generators and discriminators. In particular, we prove that the resulting estimate converges to the true density $\mathsf{p}^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log{n}/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $\mathsf{p}^*$. To the best of our knowledge, this is the first result in the literature on density estimation with vanilla GANs achieving JS convergence rates faster than $n^{-1/2}$ in the regime $\beta > d/2$. Moreover, we show that the obtained rate is minimax optimal (up to logarithmic factors) for the considered class of densities.
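Schematically, with constants and the high-probability qualifier suppressed (the notation $\hat{\mathsf{p}}_n$ for the GAN-based estimate from $n$ samples is illustrative), the guarantee and its optimality read:

```latex
\[
  \mathrm{JS}\bigl(\hat{\mathsf{p}}_n, \mathsf{p}^*\bigr)
  \;\lesssim\; \left(\frac{\log n}{n}\right)^{\frac{2\beta}{2\beta+d}},
\]
% which matches, up to logarithmic factors, the minimax lower bound
% over the considered class of beta-smooth densities on a d-dimensional domain.
```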
Recent quasi-optimal error estimates for the finite element approximation of total-variation regularized minimization problems using the Crouzeix--Raviart finite element require the existence of a Lipschitz continuous dual solution, which is not guaranteed in general. We give analytic proofs showing that Lipschitz continuity of a dual solution is, in general, not necessary. Using the Lipschitz truncation technique, we additionally derive error estimates that depend directly on the Sobolev regularity of a given dual solution.
In this work, we determine the full expression for the global truncation error of hyperbolic partial differential equations (PDEs). In particular, we use theoretical analysis and symbolic algebra to find exact expressions for the coefficients of the generic global truncation error. Our analysis is valid for any hyperbolic PDE, whether linear or non-linear, discretized in space with finite difference, finite volume, or finite element methods, and advanced in time with a predictor-corrector, multistep, or deferred correction method within the Method of Lines framework. Furthermore, we discuss the practical implications of this analysis. If we employ a stable numerical scheme and the orders of accuracy of the global solution error and the global truncation error agree, we make the following asymptotic observations: (a) the order of convergence at a constant ratio of $\Delta t$ to $\Delta x$ is governed by the minimum of the orders of the spatial and temporal discretizations, and (b) convergence cannot even be guaranteed under refinement in space alone or in time alone. An implication of (a) is that it is impractical to invest in a time-stepping method of order higher than that of the spatial discretization. Strengthening (b), we demonstrate that under certain circumstances the error can even increase monotonically under purely spatial or purely temporal refinement, and we explain why this phenomenon occurs. To verify our theoretical findings, we conduct convergence studies of linear and non-linear advection equations using finite difference and finite volume spatial discretizations, and predictor-corrector and multistep time-stepping methods. Finally, we study the effect of slope limiters and monotonicity-preserving strategies on the order of accuracy.
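As a concrete illustration of observation (a), the following minimal sketch (an assumed setup, not the paper's code) runs a convergence study for the linear advection equation with first-order upwind in space and a second-order two-stage Runge-Kutta method in time, refining at a constant $\Delta t/\Delta x$; the observed order approaches $\min(1, 2) = 1$:

```python
import numpy as np

def solve(nx, a=1.0, cfl=0.5, t_final=0.5):
    """Advect u_t + a u_x = 0 on a periodic grid; return the max-norm error."""
    dx = 1.0 / nx
    dt = cfl * dx / a                            # constant dt/dx ratio
    x = np.arange(nx) * dx
    u = np.sin(2 * np.pi * x)                    # smooth periodic initial data
    def rhs(v):                                  # first-order upwind, a > 0
        return -a * (v - np.roll(v, 1)) / dx
    nsteps = int(round(t_final / dt))
    for _ in range(nsteps):
        k1 = rhs(u)
        k2 = rhs(u + dt * k1)
        u = u + 0.5 * dt * (k1 + k2)             # Heun's method (2nd-order RK)
    exact = np.sin(2 * np.pi * (x - a * nsteps * dt))
    return np.max(np.abs(u - exact))

errors = [solve(nx) for nx in (50, 100, 200, 400)]
print([np.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])])
# observed orders approach min(1, 2) = 1, as observation (a) predicts
```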
The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
We consider a moving boundary problem with a kinetic condition that describes the diffusion of a solvent into rubber, and study semi-discrete finite element approximations of the corresponding weak solutions. We report on both a priori and a posteriori error estimates for the mass concentration of the diffusant and for the a priori unknown position of the moving boundary. Our working techniques include integral and energy-based estimates for a nonlinear parabolic problem posed in a transformed fixed domain, combined with a suitable use of the interpolation-trace inequality to handle the interface terms. Numerical illustrations of our FEM approximations lie within the experimental range and show good agreement with our theoretical investigation. This work is a preliminary step toward extending the current moving boundary model to account explicitly for the mechanics of hyperelastic rods, so as to capture the directional swelling of the underlying elastomer.
Computing linear minimum mean square error (LMMSE) filters is often ill conditioned, suggesting that unconstrained minimization of the mean square error is an inadequate principle for filter design. To address this, we first develop a unifying framework for studying constrained LMMSE estimation problems. Using this framework, we expose an important structural property of all constrained LMMSE filters: they all involve an inherent preconditioning step, and are therefore parameterized solely by their preconditioners. Moreover, each filter is invariant to invertible linear transformations of its preconditioner. We then clarify that merely constraining the rank of the filter, which leads to the well-known low-rank Wiener filter, does not suitably address the problem of ill conditioning. Instead, we use a constraint that explicitly requires solutions to be well conditioned in a certain specific sense. We introduce two well-conditioned estimators and evaluate their mean-square-error performance. We show that these two estimators converge to the standard LMMSE filter as their truncated-power ratio converges to zero, but more slowly than the low-rank Wiener filter in terms of the scaling law; this exposes the price of being well conditioned. We also present quantitative results with historical VIX data to illustrate the performance of our two well-conditioned estimators.
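To fix ideas, the following sketch implements the well-known low-rank Wiener filter mentioned above (the two well-conditioned estimators are constructed differently and are not detailed here); the toy second-order statistics are assumptions chosen to make $R_{yy}$ ill conditioned:

```python
import numpy as np

def lmmse(R_xy, R_yy):
    """Unconstrained LMMSE filter W = R_xy R_yy^{-1}; may be ill conditioned."""
    return R_xy @ np.linalg.inv(R_yy)

def low_rank_wiener(R_xy, R_yy, r):
    """Invert R_yy only on its r dominant eigendirections."""
    w, V = np.linalg.eigh(R_yy)                  # eigenvalues in ascending order
    Vr, wr = V[:, -r:], w[-r:]                   # keep the r largest
    return R_xy @ Vr @ np.diag(1.0 / wr) @ Vr.T

# toy statistics: a low-rank signal observed in very weak noise, so that
# R_yy is severely ill conditioned and plain inversion amplifies noise
rng = np.random.default_rng(0)
n, r_sig = 20, 5
H = rng.standard_normal((n, r_sig))
R_xx = H @ H.T                                   # rank-deficient signal covariance
R_yy = R_xx + 1e-9 * np.eye(n)                   # y = x + weak noise
R_xy = R_xx
print(np.linalg.cond(R_yy))                      # ~1e10: inversion is fragile
W_full = lmmse(R_xy, R_yy)                       # blows up noise directions
W_lr = low_rank_wiener(R_xy, R_yy, r=r_sig)      # well behaved on signal subspace
```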
The aim of this thesis is to develop a theoretical framework to study parameter estimation of quantum channels. We study the task of estimating unknown parameters encoded in a channel in the sequential setting. A sequential strategy is the most general way to use a channel multiple times. Our goal is to establish lower bounds (called Cramér-Rao bounds) on the estimation error. The bounds we develop are universally applicable; i.e., they apply to all permissible quantum dynamics. We consider the use of catalysts to enhance the power of a channel estimation strategy; this is termed amortization. The power of a channel for parameter estimation is determined by its Fisher information. Thus, we study how much a catalyst quantum state can enhance the Fisher information of a channel by defining the amortized Fisher information. We establish our bounds by proving that, for certain Fisher information quantities, catalyst states do not improve the performance of a sequential estimation protocol compared to a parallel one; the technical term for this is an amortization collapse. We use this to establish bounds when estimating a single parameter or multiple parameters simultaneously. Our bounds apply universally, and we also cast them as optimization problems. For the single-parameter case, we establish bounds for general quantum channels using both the symmetric logarithmic derivative (SLD) Fisher information and the right logarithmic derivative (RLD) Fisher information. The task of estimating multiple parameters simultaneously is more involved than the single-parameter case, because the Cramér-Rao bounds take the form of matrix inequalities. We establish a scalar Cramér-Rao bound for multiparameter channel estimation using the RLD Fisher information. For both single-parameter and multiparameter estimation, we provide a no-go condition for so-called Heisenberg scaling using our RLD-based bound.
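For orientation, and in schematic form only (precise regularity conditions and the thesis's exact definitions are deferred to the main text), the single-parameter Cramér-Rao bound after $n$ channel uses and one standard way to define the amortized Fisher information of a channel family $\{\mathcal{N}_\theta\}$ via catalyst states read:

```latex
\[
  \mathrm{Var}(\hat{\theta}) \;\ge\; \frac{1}{n\, I_F(\theta)},
  \qquad
  I_F^{\mathcal{A}}\bigl(\theta; \{\mathcal{N}_\theta\}\bigr)
  \;=\; \sup_{\rho_\theta}
  \Bigl[ I_F\bigl(\theta; \{\mathcal{N}_\theta(\rho_\theta)\}\bigr)
         - I_F\bigl(\theta; \{\rho_\theta\}\bigr) \Bigr],
\]
% where I_F is a Fisher information quantity (e.g., SLD or RLD) and the
% supremum ranges over catalyst input states rho_theta.
```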
Although Deep Neural Networks (DNNs) have shown incredible performance in perception and control tasks, several trustworthiness issues remain open. One of the most debated topics is the existence of adversarial perturbations, which has opened an interesting research line on provable techniques capable of quantifying the robustness of a given input. In this regard, the Euclidean distance of the input from the classification boundary is a well-established robustness measure, as it equals the minimal adversarial perturbation. Unfortunately, computing this distance is highly complex due to the non-convex nature of DNNs. Although several methods have been proposed to address this issue, to the best of our knowledge, no provable results have been presented that estimate and bound the resulting error. This paper addresses the issue by proposing two lightweight strategies to find the minimal adversarial perturbation. Unlike the state of the art, the proposed approach allows formulating an error estimation theory relating the approximate distance to the theoretical one. Finally, a substantial set of experiments is reported to evaluate the performance of the algorithms and support the theoretical findings. The obtained results show that the proposed strategies approximate the theoretical distance for samples close to the classification boundary, leading to provable robustness guarantees against any adversarial attack.
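To fix the notion being approximated, the following generic sketch (a plain bisection baseline, not one of the two strategies proposed here) upper-bounds the distance from an input to the classification boundary along a direction known to flip the prediction:

```python
import numpy as np

def boundary_distance(predict, x, d, hi=1.0, tol=1e-6):
    """predict: callable returning a class label; d: unit direction such that
    predict(x + hi*d) != predict(x). Returns an upper bound on the minimal
    adversarial perturbation norm (the crossing point along d)."""
    label = predict(x)
    assert predict(x + hi * d) != label, "direction must cross the boundary"
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * d) == label:
            lo = mid                       # still on the original side
        else:
            hi = mid                       # crossed the boundary
    return hi                              # >= true minimal distance

# toy usage with a linear "network": class = sign(w . x)
w = np.array([1.0, -2.0])
predict = lambda z: int(np.dot(w, z) > 0)
x = np.array([1.0, 0.2])                   # predicted class 1
d = -w / np.linalg.norm(w)                 # steepest direction to the boundary
print(boundary_distance(predict, x, d))    # ~ |w.x| / ||w|| = 0.6 / sqrt(5)
```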
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling any random projections and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotic guarantees for our approach and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets and illustrate the proposed approximation on a generative modeling problem.
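To convey the idea in its crudest form (the deterministic estimator developed here is more refined), the following sketch treats each one-dimensional projection as Gaussian, replaces the projected variance $\theta^\top \Sigma \theta$ by its sphere average $\operatorname{tr}(\Sigma)/d$, and compares the result with the usual Monte Carlo estimate of the squared SW distance:

```python
import numpy as np

def sw2_monte_carlo(X, Y, n_proj=500, rng=np.random.default_rng(0)):
    """Standard Monte Carlo estimate of the squared Sliced-Wasserstein distance."""
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)           # uniform direction on the sphere
        # 1-D squared W2 between empirical laws: match sorted projections
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return total / n_proj

def sw2_gaussian_approx(X, Y):
    """Deterministic moment-matching approximation: projections ~ Gaussian,
    projected variances replaced by their sphere averages tr(Sigma)/d."""
    d = X.shape[1]
    dmu = X.mean(0) - Y.mean(0)
    sx = np.sqrt(np.trace(np.cov(X.T)))
    sy = np.sqrt(np.trace(np.cov(Y.T)))
    return (dmu @ dmu + (sx - sy) ** 2) / d

rng = np.random.default_rng(1)
d = 200
X = rng.standard_normal((2000, d)) + 0.3         # shifted isotropic cloud
Y = 2.0 * rng.standard_normal((2000, d))         # scaled isotropic cloud
print(sw2_monte_carlo(X, Y), sw2_gaussian_approx(X, Y))  # close in high dimension
```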
In this paper we analyze the Schwarz alternating method for unconstrained elliptic optimal control problems. We first discuss the convergence properties of the method in the continuous case and then apply the arguments to the finite difference discretization. In both cases, we prove that the Schwarz alternating method for the optimal control problem is convergent whenever its counterpart for an elliptic equation is convergent. Moreover, the maximum-norm convergence rate of the method for the elliptic equation also provides an upper bound, uniform with respect to the regularization parameter $\alpha$, on the convergence rate of the method for the optimal control problem, measured in the maximum norm of suitable error merit functions in the continuous case and of error vectors in the discrete case. Our numerical results corroborate the theoretical results and show that the method converges faster as $\alpha$ decreases to zero; we also offer some explanation of this phenomenon.
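For reference, the elliptic counterpart invoked above can be stated in its simplest one-dimensional form; the following generic textbook sketch (not the discretization analyzed here) applies the two-subdomain alternating method to $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet data and an overlap region:

```python
import numpy as np

def fd_solve(f_vals, left, right, h):
    """Second-order FD solve of -u'' = f on an interior grid with Dirichlet
    boundary values `left` and `right`."""
    m = len(f_vals)
    A = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    rhs = f_vals.copy()
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    return np.linalg.solve(A, rhs)

n = 100                                   # interior points on (0, 1)
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)          # exact solution: sin(pi x)
i_a, i_b = 40, 60                         # overlap: grid indices 40..59
u = np.zeros(n)
for _ in range(30):                       # alternating subdomain solves
    u[:i_b] = fd_solve(f[:i_b], 0.0, u[i_b], h)       # domain 1: (0, x_b)
    u[i_a:] = fd_solve(f[i_a:], u[i_a - 1], 0.0, h)   # domain 2: (x_a, 1)
print(np.max(np.abs(u - np.sin(np.pi * x))))  # error at the FD discretization level
```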
Heatmap-based methods dominate the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based method. From the perspective of MLE, adopting a particular regression loss amounts to making a particular assumption about the output density function; a density function closer to the true distribution yields better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution, instead of the unreferenced underlying distribution, to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient, and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover, for the first time, our regression method outperforms heatmap-based methods, especially on multi-person pose estimation. Our code is available at //github.com/Jeff-sjtu/res-loglikelihood-regression
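The loss-density correspondence alluded to above has a textbook form: minimizing the $\ell_2$ or $\ell_1$ loss is maximum likelihood estimation under a Gaussian or Laplace output density, respectively, whereas RLE learns the density itself with a flow:

```latex
% minimizing squared error <=> assuming a Gaussian output density
\[
  -\log \mathcal{N}(y \mid \mu, \sigma^2)
  \;=\; \frac{(y - \mu)^2}{2\sigma^2} + \log \sigma + \tfrac{1}{2}\log(2\pi),
\]
% minimizing absolute error <=> assuming a Laplace output density
\[
  -\log \mathrm{Laplace}(y \mid \mu, b)
  \;=\; \frac{|y - \mu|}{b} + \log(2b).
\]
```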