This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. A main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data distribution emerges as the long-time limit. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, the simulation of the ODE is shown equivalent to the training of generative adversarial networks (GANs). This equivalence provides a new "cooperative" view of GANs and, more importantly, sheds new light on the divergence of GANs. In particular, it reveals that the GAN algorithm implicitly minimizes the mean squared error (MSE) between two sets of samples, and this MSE fitting alone can cause GANs to diverge. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, by the Crandall-Liggett theorem for differential equations in Banach spaces. Based on this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, using Trevisan's superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tableau setting, one can not enumerate all the states and then iteratively update the policies for each state. This prevents the application of many well-studied RL methods especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish linear convergence rate to global optimality or sublinear convergence to stationarity for these methods applied to solve different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces. To the best of our knowledge, the development of these algorithmic frameworks as well as their convergence analysis appear to be new in the literature.
Particle Marginal Metropolis-Hastings (PMMH) is a general approach to Bayesian inference when the likelihood is intractable, but can be estimated unbiasedly. Our article develops an efficient PMMH method that scales up better to higher dimensional state vectors than previous approaches. The improvement is achieved by the following innovations. First, the trimmed mean of the unbiased likelihood estimates of the multiple particle filters is used. Second, a novel block version of PMMH that works with multiple particle filters is proposed. Third, the article develops an efficient auxiliary disturbance particle filter, which is necessary when the bootstrap disturbance filter is inefficient, but the state transition density cannot be expressed in closed form. Fourth, a novel sorting algorithm, which is as effective as previous approaches but significantly faster than them, is developed to preserve the correlation between the logs of the likelihood estimates at the current and proposed parameter values. The performance of the sampler is investigated empirically by applying it to non-linear Dynamic Stochastic General Equilibrium models with relatively high state dimensions and with intractable state transition densities and to multivariate stochastic volatility in the mean models. Although our focus is on applying the method to state space models, the approach will be useful in a wide range of applications such as large panel data models and stochastic differential equation models with mixed effects.
In this paper, we study a sampling and transmission scheduling problem for multi-source remote estimation, where a scheduler determines when to take samples from multiple continuous-time Gauss-Markov processes and send the samples over multiple channels to remote estimators. The sample transmission times are i.i.d. across samples and channels. The objective of the scheduler is to minimize the weighted sum of the time-average expected estimation errors of these Gauss-Markov sources. This problem is a continuous-time Restless Multi-arm Bandit (RMAB) problem with a continuous state space. We prove that the arms are indexable and derive an exact expression of the Whittle index. To the extent of our knowledge, this is the first Whittle index policy for multi-source signal-aware remote estimation. This result has two degenerated cases of interest: (i) In the single-source case, the Whittle index policy reproduces earlier threshold-based sampling policies for the remote estimation of Wiener and Ornstein-Uhlenbeck processes. When the instantaneous estimation error of the Gauss-Markov process crosses the optimal threshold, the Whittle index is precisely equal to 0. In addition, a new optimal sampling policy for the remote estimation of the unstable Ornstein-Uhlenbeck process is obtained. (ii) In the signal-agnostic case, we find an exact expression of the Whittle index for Age of Information (AoI)-based remote estimation, which complements earlier results by allowing for random transmission times. Our numerical results show that the proposed policy performs better than the signal-agnostic AoI-based Whittle index policy and the Maximum-Age-First, Zero-Wait (MAF-ZW) policy. The performance gain of the proposed policy is high when some of the Gauss-Markov processes are highly unstable or when the sample transmission times follow a heavy-tail distribution.
In this paper, we study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources subject to a single-letter average distortion constraint and a perception constraint that belongs to the family of f-divergences. For that, we leverage the fact that RDPF, assuming mild regularity conditions on the perception constraint, forms a convex programming problem. We first develop parametric characterizations of the optimal solution and utilize them in an alternating minimization approach for which we prove convergence guarantees. The resulting structure of the iterations of the alternating minimization approach renders the implementation of a generalized Blahut-Arimoto (BA) type of algorithm infeasible. To overcome this difficulty, we propose a relaxed formulation of the structure of the iterations in the alternating minimization approach, which allows for the implementation of an approximate iterative scheme. This approximation is shown, via the derivation of necessary and sufficient conditions, to guarantee convergence to a globally optimal solution. We also provide sufficient conditions on the distortion and the perception constraints which guarantee that our algorithm converges exponentially fast. We corroborate our theoretical results with numerical simulations, and we draw connections with existing results.
Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise-ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.
Profile likelihoods are rarely used in geostatistical models due to the computational burden imposed by repeated decompositions of large variance matrices. Accounting for uncertainty in covariance parameters can be highly consequential in geostatistical models as some covariance parameters are poorly identified, the problem is severe enough that the differentiability parameter of the Matern correlation function is typically treated as fixed. The problem is compounded with anisotropic spatial models as there are two additional parameters to consider. In this paper, we make the following contributions: 1, A methodology is created for profile likelihoods for Gaussian spatial models with Mat\'ern family of correlation functions, including anisotropic models. This methodology adopts a novel reparametrization for generation of representative points, and uses GPUs for parallel profile likelihoods computation in software implementation. 2, We show the profile likelihood of the Mat\'ern shape parameter is often quite flat but still identifiable, it can usually rule out very small values. 3, Simulation studies and applications on real data examples show that profile-based confidence intervals of covariance parameters and regression parameters have superior coverage to the traditional standard Wald type confidence intervals.
Let $\Omega \subset \mathbb{R}^n$ be a convex polytope ($n \leq 3$). The Ritz projection is the best approximation, in the $W^{1,2}_0$-norm, to a given function in a finite element space. When such finite element spaces are constructed on the basis of quasiuniform triangulations, we show a pointwise estimate on the Ritz projection. Namely, that the gradient at any point in $\Omega$ is controlled by the Hardy--Littlewood maximal function of the gradient of the original function at the same point. From this estimate, the stability of the Ritz projection on a wide range of spaces that are of interest in the analysis of PDEs immediately follows. Among those are weighted spaces, Orlicz spaces and Lorentz spaces.
In Bayesian inference, the approximation of integrals of the form $\psi = \mathbb{E}_{F}{l(X)} = \int_{\chi} l(\mathbf{x}) d F(\mathbf{x})$ is a fundamental challenge. Such integrals are crucial for evidence estimation, which is important for various purposes, including model selection and numerical analysis. The existing strategies for evidence estimation are classified into four categories: deterministic approximation, density estimation, importance sampling, and vertical representation (Llorente et al., 2020). In this paper, we show that the Riemann sum estimator due to Yakowitz (1978) can be used in the context of nested sampling (Skilling, 2006) to achieve a $O(n^{-4})$ rate of convergence, faster than the usual Ergodic Central Limit Theorem. We provide a brief overview of the literature on the Riemann sum estimators and the nested sampling algorithm and its connections to vertical likelihood Monte Carlo. We provide theoretical and numerical arguments to show how merging these two ideas may result in improved and more robust estimators for evidence estimation, especially in higher dimensional spaces. We also briefly discuss the idea of simulating the Lorenz curve that avoids the problem of intractable $\Lambda$ functions, essential for the vertical representation and nested sampling.
Generative Adversarial Nets (GAN) have received considerable attention since the 2014 groundbreaking work by Goodfellow et al. Such attention has led to an explosion in new ideas, techniques and applications of GANs. To better understand GANs we need to understand the mathematical foundation behind them. This paper attempts to provide an overview of GANs from a mathematical point of view. Many students in mathematics may find the papers on GANs more difficulty to fully understand because most of them are written from computer science and engineer point of view. The aim of this paper is to give more mathematically oriented students an introduction to GANs in a language that is more familiar to them.
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations at the high-level semantic space to enhance the long term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.