The idea of approximating the Shapley value of an n-person game by Monte Carlo simulation was first suggested by Mann and Shapley (1960) and they also introduced four different heuristical methods to reduce the estimation error. Since 1960, several statistical methods have been developed to reduce the standard deviation of the estimate. In this paper, we develop an algorithm that uses a pair of negatively correlated samples to reduce the variance of the estimate. Although the observations generated are not independent, the sample is ergodic (obeys the strong law of large numbers), hence the name "ergodic sampling". Unlike Shapley and Mann, we do not use heuristics, the algorithm uses a small sample to learn the best ergodic transformation for a given game. We illustrate the algorithm on eight games with different characteristics to test the performance and understand how the proposed algorithm works. The experiments show that this method has at least as low variance as an independent sample, and in five test games, it significantly improves the quality of the estimation, up to 75 percent.
We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited for capturing complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Our rotationally invariant AMP has complexity of the same order as the existing AMP derived under the restrictive assumption of a Gaussian design; our algorithm also recovers this existing AMP as a special case. Numerical results showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with a much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.
Quantitative Ultrasound (QUS) provides important information about the tissue properties. QUS parametric image can be formed by dividing the envelope data into small overlapping patches and computing different speckle statistics such as parameters of the Nakagami and Homodyned K-distributions (HK-distribution). The calculated QUS parametric images can be erroneous since only a few independent samples are available inside the patches. Another challenge is that the envelope samples inside the patch are assumed to come from the same distribution, an assumption that is often violated given that the tissue is usually not homogenous. In this paper, we propose a method based on Convolutional Neural Networks (CNN) to estimate QUS parametric images without patching. We construct a large dataset sampled from the HK-distribution, having regions with random shapes and QUS parameter values. We then use a well-known network to estimate QUS parameters in a multi-task learning fashion. Our results confirm that the proposed method is able to reduce errors and improve border definition in QUS parametric images.
Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and employs differing base distributions. Under our framework, the density ratio can be viewed as the unnormalized density of an implicit semantic distribution. Further, we propose to directly estimate the density ratio of a data sample through class ratio estimation. We report competitive results on OOD image problems in comparison with recent work that alternatively requires training of deep generative models for the task. Our approach enables a simple and yet effective path towards solving the OOD detection problem.
In high-dimensional prediction settings, it remains challenging to reliably estimate the test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves by fitting a smooth monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages compared to commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview assists in assessing the potential benefit of adding training samples and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates in estimating the performance at the total sample size rather than a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
We develop an \textit{a posteriori} error analysis for a numerical estimate of the time at which a functional of the solution to a partial differential equation (PDE) first achieves a threshold value on a given time interval. This quantity of interest (QoI) differs from classical QoIs which are modeled as bounded linear (or nonlinear) functionals {of the solution}. Taylor's theorem and an adjoint-based \textit{a posteriori} analysis is used to derive computable and accurate error estimates in the case of semi-linear parabolic and hyperbolic PDEs. The accuracy of the error estimates is demonstrated through numerical solutions of the one-dimensional heat equation and linearized shallow water equations (SWE), representing parabolic and hyperbolic cases, respectively.
We prove concentration inequalities and associated PAC bounds for continuous- and discrete-time additive functionals for possibly unbounded functions of multivariate, nonreversible diffusion processes. Our analysis relies on an approach via the Poisson equation allowing us to consider a very broad class of subexponentially ergodic processes. These results add to existing concentration inequalities for additive functionals of diffusion processes which have so far been only available for either bounded functions or for unbounded functions of processes from a significantly smaller class. We demonstrate the power of these exponential inequalities by two examples of very different areas. Considering a possibly high-dimensional parametric nonlinear drift model under sparsity constraints, we apply the continuous-time concentration results to validate the restricted eigenvalue condition for Lasso estimation, which is fundamental for the derivation of oracle inequalities. The results for discrete additive functionals are used to investigate the unadjusted Langevin MCMC algorithm for sampling of moderately heavy-tailed densities $\pi$. In particular, we provide PAC bounds for the sample Monte Carlo estimator of integrals $\pi(f)$ for polynomially growing functions $f$ that quantify sufficient sample and step sizes for approximation within a prescribed margin with high probability.
Based on the manifold hypothesis, real-world data often lie on a low-dimensional manifold, while normalizing flows as a likelihood-based generative model are incapable of finding this manifold due to their structural constraints. So, one interesting question arises: $\textit{"Can we find sub-manifold(s) of data in normalizing flows and estimate the density of the data on the sub-manifold(s)?"}$. In this paper, we introduce two approaches, namely per-pixel penalized log-likelihood and hierarchical training, to answer the mentioned question. We propose a single-step method for joint manifold learning and density estimation by disentangling the transformed space obtained by normalizing flows to manifold and off-manifold parts. This is done by a per-pixel penalized likelihood function for learning a sub-manifold of the data. Normalizing flows assume the transformed data is Gaussianizationed, but this imposed assumption is not necessarily true, especially in high dimensions. To tackle this problem, a hierarchical training approach is employed to improve the density estimation on the sub-manifold. The results validate the superiority of the proposed methods in simultaneous manifold learning and density estimation using normalizing flows in terms of generated image quality and likelihood.
The crude Monte Carlo approximates the integral $$S(f)=\int_a^b f(x)\,\mathrm dx$$ with expected error (deviation) $\sigma(f)N^{-1/2},$ where $\sigma(f)^2$ is the variance of $f$ and $N$ is the number of random samples. If $f\in C^r$ then special variance reduction techniques can lower this error to the level $N^{-(r+1/2)}.$ In this paper, we consider methods of the form $$\overline M_{N,r}(f)=S(L_{m,r}f)+M_n(f-L_{m,r}f),$$ where $L_{m,r}$ is the piecewise polynomial interpolation of $f$ of degree $r-1$ using a partition of the interval $[a,b]$ into $m$ subintervals, $M_n$ is a Monte Carlo approximation using $n$ samples of $f,$ and $N$ is the total number of function evaluations used. We derive asymptotic error formulas for the methods $\overline M_{N,r}$ that use nonadaptive as well as adaptive partitions. Although the convergence rate $N^{-(r+1/2)}$ cannot be beaten, the asymptotic constants make a huge difference. For example, for $\int_0^1(x+d)^{-1}\mathrm dx$ and $r=4$ the best adaptive methods overcome the nonadaptive ones roughly $10^{12}$ times if $d=10^{-4},$ and $10^{29}$ times if $d=10^{-8}.$ In addition, the proposed adaptive methods are easily implementable and can be well used for automatic integration. We believe that the obtained results can be generalized to multivariate integration.
We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. While there is abundant literature on this classical problem under the item-level privacy setup where each user contributes only one data point, little has been known for the user-level counterpart. We consider the heterogeneous scenario where both the quantity and distribution of data can be different for each user. We propose an algorithm based on a clipping strategy that almost achieves a two-approximation with respect to the best clipping threshold in hindsight. This result holds without any distribution assumptions on the data. We also prove that the clipping bias can be significantly reduced when the counts are from non-i.i.d. Poisson distributions and show empirically that our debiasing method provides improvements even without such constraints. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.