Generative Adversarial Networks (GANs) have achieved great success in data generation. However, their statistical properties are not fully understood. In this paper, we consider the statistical behavior of the general $f$-divergence formulation of GANs, which includes the Kullback--Leibler divergence closely related to the maximum likelihood principle. We show that, for correctly specified parametric generative models, all $f$-divergence GANs with the same discriminator classes are asymptotically equivalent under suitable regularity conditions. Moreover, with an appropriately chosen local discriminator, they become asymptotically equivalent to the maximum likelihood estimate. For misspecified generative models, GANs with different $f$-divergences converge to different estimators and thus cannot be directly compared. However, we show that for some commonly used $f$-divergences the original $f$-GAN is not optimal: a smaller asymptotic variance can be achieved when the discriminator training in the original $f$-GAN formulation is replaced by logistic regression. We refer to the resulting estimation method as Adversarial Gradient Estimation (AGE). Empirical studies support the theory and demonstrate the advantage of AGE over the original $f$-GANs under model misspecification.
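For context, a standard way to write the variational $f$-GAN objective (the usual Fenchel-conjugate formulation; the notation here is generic rather than the paper's):
\[
\min_{\theta}\ \max_{T \in \mathcal{T}}\ \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q_{\theta}}\big[f^{*}\!\big(T(x)\big)\big],
\]
where $P$ is the data distribution, $Q_{\theta}$ the parametric generative model, $\mathcal{T}$ the discriminator class, and $f^{*}$ the convex conjugate of $f$; taking $f(t) = t\log t$ yields the Kullback--Leibler case connected to maximum likelihood.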
Conformal inference is a powerful tool for quantifying the uncertainty around predictions made by black-box models (e.g., neural networks, random forests). Formally, this methodology guarantees that if the training and test data are exchangeable (e.g., i.i.d.), then we can construct a prediction set $C$ for the target $Y$ such that $P(Y \in C) \geq 1-\alpha$ for any target level $\alpha$. In this article, we extend this methodology to an online prediction setting where the distribution generating the data is allowed to vary over time. To account for the non-exchangeability, we develop a protective layer that lies on top of conformal inference and gradually re-calibrates its predictions to adapt to observed changes in the environment. Our methods are highly flexible: they can be combined with any predictive algorithm that produces estimates of the target or its conditional distribution, and they require no assumptions on the size or type of the distribution shift. We test our techniques on two real-world datasets, predicting stock market volatility and COVID-19 case counts, and find that they are robust and adaptive to real-world distribution shifts.
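A minimal sketch of one way such online re-calibration can work: a simple quantile-tracking update applied to synthetic residuals (illustrative only; the paper's actual procedures may differ).

```python
import numpy as np

def online_conformal(residuals, alpha=0.1, gamma=0.005):
    """Online re-calibration sketch: track an effective miscoverage level
    alpha_t and widen/narrow the prediction intervals based on observed
    coverage errors."""
    alpha_t = alpha
    widths, errs = [], []
    for t in range(1, len(residuals)):
        # Conformal-style quantile of past absolute residuals at level 1 - alpha_t
        q = np.quantile(np.abs(residuals[:t]), min(max(1.0 - alpha_t, 0.0), 1.0))
        widths.append(q)
        err = float(np.abs(residuals[t]) > q)   # 1 if the new point is not covered
        errs.append(err)
        # Gradient-style update: demand more coverage after misses, relax after hits
        alpha_t = alpha_t + gamma * (alpha - err)
    return np.array(widths), np.array(errs)

# Toy usage: residuals from a process whose scale drifts over time
rng = np.random.default_rng(0)
res = rng.normal(scale=np.linspace(1.0, 3.0, 500))
widths, errs = online_conformal(res, alpha=0.1, gamma=0.01)
print("empirical miscoverage:", errs.mean())
```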
In some inferential statistical methods, such as tests and confidence intervals, it is important to describe the stochastic behavior of statistical functionals beyond their large-sample properties. We study such behavior in terms of the usual stochastic order. For this purpose, we introduce a generalized family of stochastic orders, referred to as transform orders, and show that it provides a flexible framework for deriving stochastic monotonicity results. Because our general definition recovers several well-known ordering relations as particular cases, the method applies readily to different families of functionals. These include some prominent inequality measures, such as the generalized entropy, the Gini index, and its generalizations. We also illustrate the applicability of our approach by determining the least favorable distribution, and the behavior of some bootstrap statistics, in some goodness-of-fit testing procedures.
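For reference, the usual stochastic order mentioned above is the standard relation (this is textbook background, not the paper's transform-order definition):
\[
X \le_{\mathrm{st}} Y \iff P(X > t) \le P(Y > t) \quad \text{for all } t \in \mathbb{R},
\]
equivalently $F_X(t) \ge F_Y(t)$ for all $t$.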
The objective of this paper is to investigate a new numerical method for approximating the self-diffusion matrix of a tagged particle process defined on a grid. While standard numerical methods rely on long-time averages of empirical means of deviations of some stochastic processes, and are thus subject to statistical noise, we propose here a tensor method that computes an approximation of the solution of a high-dimensional quadratic optimization problem, from which a numerical approximation of the self-diffusion matrix is obtained. The tensor method relies on an iterative scheme that builds low-rank approximations of the quantity of interest, together with a carefully tuned variance reduction method to evaluate the various terms arising in the functional to be minimized. In particular, we observe numerically that this approach is much less subject to statistical noise than classical approaches.
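A minimal sketch of the noisy baseline described above, i.e. a long-time-average (mean-squared-displacement) estimate of a scalar diffusion coefficient, shown here for a toy free random walk (the tagged-particle setting and the tensor method itself are not reproduced):

```python
import numpy as np

def msd_diffusion_estimate(n_traj=5000, n_steps=1000, seed=0):
    """Standard long-time-average estimator of a (scalar) diffusion
    coefficient for a toy 1D random walk: D ~ E[(X_T - X_0)^2] / (2 T).
    This Monte Carlo baseline is subject to statistical noise."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=(n_traj, n_steps))
    displacement = steps.sum(axis=1)                 # X_T - X_0 for each trajectory
    return (displacement.astype(float) ** 2).mean() / (2 * n_steps)

print(msd_diffusion_estimate())   # should be close to 1/2 for the free walk
```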
We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic, the iterative trimmed maximum likelihood estimator, which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves near-minimax-optimal risk on a wide range of generalized linear models, including Gaussian, Poisson, and binomial regression. Finally, we extend the estimator to the more challenging setting of joint label and covariate corruptions and demonstrate its robustness and optimality in that setting as well.
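A minimal sketch of the iterative trimmed estimator in the Gaussian-regression special case, where the MLE is ordinary least squares (illustrative; the paper treats general GLMs and also covariate corruptions):

```python
import numpy as np

def iterative_trimmed_ls(X, y, keep_frac=0.9, n_iter=20):
    """Iterative trimmed MLE sketch for Gaussian regression: alternately fit
    least squares on the retained points and re-select the points with the
    smallest per-point losses."""
    n = len(y)
    keep = np.arange(n)                          # start from all points
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        losses = (y - X @ beta) ** 2             # per-point negative log-likelihood (up to constants)
        keep = np.argsort(losses)[: int(keep_frac * n)]
    return beta

# Toy usage with a few corrupted labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=500)
y[:25] += 20.0                                   # adversarial label corruptions
print(iterative_trimmed_ls(X, y, keep_frac=0.9))
```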
In randomized experiments and observational studies, weighting methods are often used to generalize and transport treatment effect estimates to a target population. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. However, these estimated multiplicative weights may not produce adequate covariate balance and can be highly variable, resulting in biased and unstable estimators, especially when there is limited covariate overlap across populations or treatment groups. To address these limitations, we propose a general weighting approach that weights each treatment group towards the target population in a single step. We present a framework and provide a justification for this one-step approach in terms of generic probability distributions. We show a formal connection between our method and inverse probability and inverse odds weighting. By construction, the proposed approach balances covariates and produces stable estimators. We show that our estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We demonstrate the performance of this approach using a simulation study and a randomized case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California.
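For context, one common multiplicative construction of the traditional weights described above (a standard form for transporting the group with treatment $A=a$ in the study sample, $S=1$, to the target population, $S=0$; details vary with the estimand):
\[
w(X) \;=\; \frac{1}{P(A = a \mid X, S = 1)} \;\times\; \frac{P(S = 0 \mid X)}{P(S = 1 \mid X)},
\]
i.e., an inverse probability of treatment factor multiplied by an inverse odds of selection factor, each estimated separately. The one-step approach described above instead solves directly for weights that balance covariates toward the target population.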
We present a novel sequential Monte Carlo approach to online smoothing of additive functionals in a very general class of path-space models. Hitherto, the solutions proposed in the literature suffer either from long-term numerical instability due to particle-path degeneracy or, when degeneracy is remedied by particle approximation of the so-called backward kernel, from high computational demands. To optimally balance computational speed against numerical stability, we propose to furnish a (fast) naive particle smoother, which recursively propagates a sample of particles and associated smoothing statistics, with an adaptive backward-sampling-based updating rule that keeps the number of (costly) backward samples to a minimum. This yields a new, function-specific additive smoothing algorithm, AdaSmooth, which is computationally fast, numerically stable, and easy to implement. We provide rigorous theoretical results guaranteeing its consistency, asymptotic normality, and long-term stability, as well as numerical results demonstrating empirically the clear superiority of AdaSmooth over existing algorithms.
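A minimal sketch of the naive ingredient described above: a bootstrap particle filter that carries the additive smoothing statistic along surviving particle ancestries, shown for a toy linear-Gaussian model (the adaptive backward-sampling layer of AdaSmooth is not reproduced here):

```python
import numpy as np

def naive_additive_smoother(y, n_part=500, phi=0.9, sx=1.0, sy=1.0, seed=0):
    """Naive (fast but degeneracy-prone) smoother for the additive functional
    sum_t x_t in the toy model x_t = phi*x_{t-1} + sx*eps_t, y_t = x_t + sy*eta_t."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=sx / np.sqrt(1.0 - phi**2), size=n_part)  # stationary init
    stat = np.zeros(n_part)                            # running additive statistic per particle
    for yt in y:
        x = phi * x + sx * rng.normal(size=n_part)     # propagate (bootstrap proposal)
        stat = stat + x                                # update the additive functional
        logw = -0.5 * ((yt - x) / sy) ** 2             # observation log-weights
        w = np.exp(logw - logw.max()); w /= w.sum()
        est = np.sum(w * stat)                         # current smoothed estimate
        idx = rng.choice(n_part, size=n_part, p=w)     # multinomial resampling
        x, stat = x[idx], stat[idx]                    # statistics inherit ancestries
    return est

# Toy data and usage
rng = np.random.default_rng(1)
state, obs = 0.0, []
for _ in range(200):
    state = 0.9 * state + rng.normal()
    obs.append(state + rng.normal())
print(naive_additive_smoother(np.array(obs)))
```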
Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks (GANs). Yet, GAN models that implicitly characterize the image distribution can suffer from limited sample fidelity. Here, we propose a novel method based on adversarial diffusion modeling, SynDiff, for improved performance in medical image translation. To capture a direct correlate of the image distribution, SynDiff leverages a conditional diffusion process that progressively maps noise and source images onto the target image. For fast and accurate image sampling during inference, large diffusion steps are taken with adversarial projections in the reverse diffusion direction. To enable training on unpaired datasets, a cycle-consistent architecture is devised with coupled diffusive and non-diffusive modules that bilaterally translate between two modalities. Extensive assessments are reported on the utility of SynDiff against competing GAN and diffusion models in multi-contrast MRI and MRI-CT translation. Our demonstrations indicate that SynDiff offers quantitatively and qualitatively superior performance against competing baselines.
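As background (generic conditional-diffusion notation, not necessarily the paper's exact formulation), the forward process gradually noises the target image $x_0$,
\[
q(x_t \mid x_0) \;=\; \mathcal{N}\!\big(\sqrt{\bar{\alpha}_t}\,x_0,\ (1 - \bar{\alpha}_t)\,I\big),
\]
while generation runs reverse transitions $p_{\theta}(x_{t-k} \mid x_t, y)$ conditioned on the source-modality image $y$; per the abstract, these reverse steps use a large step size $k$ and are taken with adversarial projections.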
Sequential Monte Carlo squared (SMC$^2$) methods can be used for parameter inference in intractable-likelihood state-space models. These methods replace the likelihood with an unbiased particle filter estimator, similarly to particle Markov chain Monte Carlo (MCMC). As with particle MCMC, the efficiency of SMC$^2$ depends greatly on the variance of the likelihood estimator, and therefore on the number of state particles used within the particle filter. We introduce novel methods to adaptively select the number of state particles within SMC$^2$, using the expected squared jumping distance to trigger the adaptation and modifying the exchange importance sampling method of \citet{Chopin2012a} to replace the current set of state particles with the new set. The resulting algorithm is fully automatic and can significantly improve on current methods. Code for our methods is available at https://github.com/imkebotha/adaptive-exact-approximate-smc.
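A minimal sketch of an ESJD-based trigger of the kind described above (a hypothetical threshold rule; the paper's actual adaptation and exchange steps are more involved):

```python
import numpy as np

def esjd(theta_before, theta_after, weights):
    """Weighted expected squared jumping distance of an MCMC rejuvenation
    move applied to the parameter particles (rows of the theta arrays)."""
    jumps = np.sum((theta_after - theta_before) ** 2, axis=1)
    return np.sum(weights * jumps)

def adapt_num_state_particles(theta_before, theta_after, weights,
                              n_x, threshold=0.1, factor=2):
    """If the parameter particles barely move (small ESJD), the likelihood
    estimator is likely too noisy: increase the number of state particles
    used in the inner particle filter."""
    if esjd(theta_before, theta_after, weights) < threshold:
        return factor * n_x
    return n_x
```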
Today, an increasing number of Adaptive Deep Neural Networks (AdNNs) are being used on resource-constrained embedded devices. We observe that, similar to traditional software, redundant computation exists in AdNNs, resulting in considerable performance degradation. This degradation depends on the input, and we refer to its triggers as input-dependent performance bottlenecks (IDPBs). To ensure that an AdNN satisfies the performance requirements of resource-constrained applications, it is essential to conduct performance testing to detect IDPBs. Existing neural network testing methods are primarily concerned with correctness testing and do not address performance testing. To fill this gap, we propose DeepPerform, a scalable approach for generating test samples that detect IDPBs in AdNNs. We first show how the problem of generating performance test samples can be formulated as an optimization problem. We then show how DeepPerform handles this optimization problem efficiently by learning and estimating the distribution of AdNNs' computational consumption. We evaluate DeepPerform on three widely used datasets and five popular AdNN models. The results show that DeepPerform generates test samples that cause more severe performance degradation (increasing FLOPs by up to 552\%), while being substantially more efficient than baseline methods at generating test inputs (runtime overhead of only 6--10 milliseconds).
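A minimal sketch of the underlying idea: search for a small input perturbation that maximizes a differentiable surrogate of computational cost, shown for a toy early-exit network (the ToyEarlyExitNet and its cost surrogate are hypothetical; DeepPerform itself learns the consumption distribution rather than attacking a hand-written surrogate):

```python
import torch

class ToyEarlyExitNet(torch.nn.Module):
    """Toy adaptive network: each block has a gate, and expected cost grows
    with the probability that the input 'continues' past each exit."""
    def __init__(self, dim=32, n_blocks=4):
        super().__init__()
        self.blocks = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(n_blocks))
        self.gates = torch.nn.ModuleList(torch.nn.Linear(dim, 1) for _ in range(n_blocks))

    def expected_cost(self, x):
        cost = 0.0
        p_continue = torch.ones(x.shape[0], device=x.device)
        for blk, gate in zip(self.blocks, self.gates):
            x = torch.relu(blk(x))
            cost = cost + p_continue.mean()                  # pay for this block if still running
            p_continue = p_continue * torch.sigmoid(gate(x)).squeeze(-1)
        return cost

def performance_test_input(model, x, eps=0.1, steps=50, lr=0.01):
    """Find a bounded perturbation that maximizes the cost surrogate,
    i.e. a candidate input-dependent performance bottleneck."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model.expected_cost(x + delta)               # ascend on expected cost
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                          # keep the perturbation small
    return (x + delta).detach()

x = torch.randn(8, 32)
net = ToyEarlyExitNet()
print(net.expected_cost(x).item(), net.expected_cost(performance_test_input(net, x)).item())
```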
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that this error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and the proposed estimators in terms of their power for estimating the causal quantities. The comparison is carried out across a wide range of models, including linear regression, tree-based, and neural network-based models, under simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are accounted for correctly.
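A minimal simulation illustrating why ignoring confounding inflates a naive estimate and how simple covariate adjustment recovers the causal effect (a generic illustration only, not the paper's proposed estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Confounder: creditworthiness affects both the lending decision and repayment.
u = rng.normal(size=n)
treat = (u + rng.normal(size=n) > 0).astype(float)     # lender's credit decision
repay = 1.0 * treat + 2.0 * u + rng.normal(size=n)     # true causal effect of the decision = 1.0

# Naive contrast overlooks the confounder and is badly biased.
naive = repay[treat == 1].mean() - repay[treat == 0].mean()

# Simple covariate adjustment: regress repayment on treatment and the confounder.
X = np.column_stack([np.ones(n), treat, u])
beta = np.linalg.lstsq(X, repay, rcond=None)[0]

print(f"naive difference in means: {naive:.2f}  (biased upward by confounding)")
print(f"adjusted estimate:         {beta[1]:.2f}  (close to the true effect 1.0)")
```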