The Fisher information matrix is a quantity of fundamental importance for information geometry and asymptotic statistics. In practice, it is widely used to quickly estimate the expected information available in a data set and guide experimental design choices. In many modern applications, it is intractable to compute the Fisher information analytically, and Monte Carlo methods are used instead. The standard Monte Carlo method produces estimates of the Fisher information that can be biased when the Monte Carlo noise is non-negligible. Most problematic is noise in the derivatives, as this leads to an overestimation of the available constraining power, given by the inverse Fisher information. In this work we find another simple estimate that is oppositely biased and produces an underestimate of the constraining power. This estimator can either be used to give approximate bounds on the parameter constraints or be combined with the standard estimator to give improved, approximately unbiased estimates. Both the alternative and the combined estimators are asymptotically unbiased, so they can also be used as a convergence check of the standard approach. We discuss potential limitations of these estimators and provide methods to assess their reliability. These methods accelerate the convergence of Fisher forecasts, as unbiased estimates can be achieved with fewer Monte Carlo samples, and so can be used to reduce the simulated data set size by several orders of magnitude.
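To make the standard estimator referred to above concrete, here is a minimal sketch of a Monte Carlo Fisher forecast for Gaussian-distributed summary statistics with a parameter-independent covariance; the array shapes, finite-difference steps, and the omission of any covariance-debiasing correction are illustrative assumptions, and the oppositely biased alternative estimator is not reproduced here.

```python
import numpy as np

def standard_fisher_estimate(sims_fid, sims_plus, sims_minus, dtheta):
    """Standard Monte Carlo Fisher estimate for Gaussian-distributed summaries
    with parameter-independent covariance:  F = dmu C^{-1} dmu^T.

    sims_fid   : (n_fid, d) summaries simulated at the fiducial parameters
    sims_plus  : (n_p, n_d, d) summaries at theta_i + dtheta_i, per parameter i
    sims_minus : (n_p, n_d, d) summaries at theta_i - dtheta_i
    dtheta     : (n_p,) finite-difference step per parameter
    """
    C = np.cov(sims_fid, rowvar=False)       # noisy Monte Carlo covariance estimate
    Cinv = np.linalg.inv(C)                  # no debiasing correction applied here
    # central-difference Monte Carlo derivatives of the mean summary
    dmu = (sims_plus.mean(axis=1) - sims_minus.mean(axis=1)) / (2 * dtheta[:, None])
    return dmu @ Cinv @ dmu.T                # (n_p, n_p) Fisher matrix estimate
```

Because the Monte Carlo noise in `dmu` enters this quadratic form with a positive sign, the estimator tends to be biased high, consistent with the overestimation of constraining power described above.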
We propose a general purpose confidence interval procedure (CIP) for statistical functionals constructed using data from a stationary time series. The procedures we propose are based on derived distribution-free analogues of the $\chi^2$ and Student's $t$ random variables for the statistical functional context, and hence apply in a wide variety of settings including quantile estimation, gradient estimation, M-estimation, CVAR-estimation, and arrival process rate estimation, apart from more traditional statistical settings. Like the method of subsampling, we use overlapping batches of time series data to estimate the underlying variance parameter; unlike subsampling and the bootstrap, however, we assume that the implied point estimator of the statistical functional obeys a central limit theorem (CLT) to help identify the weak asymptotics (called OB-x limits, x=I,II,III) of batched Studentized statistics. The OB-x limits, certain functionals of the Wiener process parameterized by the size of the batches and the extent of their overlap, form the essential machinery for characterizing dependence, and consequently the correctness of the proposed CIPs. The message from extensive numerical experimentation is that in settings where a functional CLT on the point estimator is in effect, using \emph{large overlapping batches} alongside OB-x critical values yields confidence intervals that are often of significantly higher quality than those obtained from more generic methods like subsampling or the bootstrap. We illustrate using examples from CVaR estimation, ARMA parameter estimation, and NHPP rate estimation; R and MATLAB code for OB-x critical values is available at~\texttt{web.ics.purdue.edu/~pasupath/}.
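As a point of reference for the batching construction, the following is a minimal sketch of an overlapping-batch-means confidence interval for the sample mean of a stationary series; the statistical-functional generality and the OB-x critical values are not reproduced, and a Student-t quantile with a classical degrees-of-freedom rule stands in as a rough placeholder.

```python
import numpy as np
from scipy import stats

def obm_ci(x, batch_size, alpha=0.05):
    """Confidence interval for the mean of a stationary series using
    overlapping batch means (OBM). The critical value below is a rough
    Student-t placeholder for the OB-x critical values discussed above.
    """
    x = np.asarray(x, dtype=float)
    n, b = len(x), batch_size
    xbar = x.mean()
    # means of all n - b + 1 overlapping batches
    csum = np.concatenate(([0.0], np.cumsum(x)))
    batch_means = (csum[b:] - csum[:-b]) / b
    # OBM estimate of the variance parameter sigma^2, with Var(xbar) ~ sigma^2 / n
    var_param = n * b / ((n - b + 1) * (n - b)) * np.sum((batch_means - xbar) ** 2)
    half = stats.t.ppf(1 - alpha / 2, df=max(int(1.5 * n / b), 1)) * np.sqrt(var_param / n)
    return xbar - half, xbar + half
```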
We define an optimal preconditioning for the Langevin diffusion by analytically optimizing the expected squared jumped distance. The optimal preconditioning turns out to be the inverse Fisher information matrix, where the Fisher information is the covariance matrix obtained by averaging the outer product of log-target gradients under the target. We apply this result to the Metropolis adjusted Langevin algorithm (MALA) and derive a computationally efficient adaptive MCMC scheme that learns the preconditioning from the history of gradients produced as the algorithm runs. We show in several experiments that the proposed algorithm is very robust in high dimensions and significantly outperforms other methods, including a closely related adaptive MALA scheme that learns the preconditioning with standard adaptive MCMC, as well as the position-dependent Riemannian manifold MALA sampler.
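For illustration, a minimal sketch of a single preconditioned MALA step with a fixed preconditioner $A$; in the adaptive scheme described above, $A$ would instead be learned on the fly as the inverse of the running average of gradient outer products (the empirical Fisher). The function names and interfaces are hypothetical.

```python
import numpy as np

def precond_mala_step(x, log_p, grad_log_p, A, step):
    """One Metropolis adjusted Langevin step with a fixed preconditioner A.
    log_p and grad_log_p evaluate the log target and its gradient."""
    L = np.linalg.cholesky(A)                      # A = L L^T for sampling N(0, A)
    mean_fwd = x + 0.5 * step * A @ grad_log_p(x)
    prop = mean_fwd + np.sqrt(step) * L @ np.random.randn(len(x))
    mean_bwd = prop + 0.5 * step * A @ grad_log_p(prop)

    def log_q(y, mean):  # proposal density N(mean, step * A), up to a constant
        d = y - mean
        return -0.5 / step * d @ np.linalg.solve(A, d)

    log_alpha = (log_p(prop) + log_q(x, mean_bwd)) - (log_p(x) + log_q(prop, mean_fwd))
    return prop if np.log(np.random.rand()) < log_alpha else x
```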
In this work, we propose a numerical method to compute the Wasserstein Hamiltonian flow (WHF), which is a Hamiltonian system on the probability density manifold. Many well-known PDE systems can be reformulated as WHFs. We use a parameterized function as the push-forward map to characterize the solution of the WHF, and convert the PDE into a finite-dimensional ODE system, which is a Hamiltonian system in the phase space of the parameter manifold. We establish error analysis results for the continuous time approximation scheme in the Wasserstein metric. For the numerical implementation, we use neural networks as push-forward maps. We apply an effective symplectic scheme to solve the derived Hamiltonian ODE system so that the method preserves some important quantities such as the total energy. The computation is carried out by a fully deterministic symplectic integrator, without any neural network training. Thus, our method does not involve direct optimization over network parameters and hence avoids the error introduced by stochastic gradient descent (SGD) methods, which is usually hard to quantify and measure. The proposed algorithm is a sampling-based approach that scales well to higher dimensional problems. In addition, the method also provides an alternative connection between the Lagrangian and Eulerian perspectives of the original WHF through the parameterized ODE dynamics.
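As an illustration of the deterministic, training-free integration step, here is a minimal sketch of a Störmer-Verlet (leapfrog) integrator for a Hamiltonian ODE in the network-parameter phase space; the separable form $H(\theta,p)=\tfrac12 p^\top M^{-1}p+V(\theta)$ is an assumption made for this sketch rather than the general formulation above, and the function names are hypothetical.

```python
import numpy as np

def leapfrog(theta, p, grad_V, mass_inv, dt, n_steps):
    """Stoermer-Verlet (leapfrog) integration of the separable Hamiltonian
    H(theta, p) = 0.5 * p^T M^{-1} p + V(theta) in parameter phase space.
    The scheme is symplectic and approximately conserves the total energy;
    no stochastic optimization of the parameters is involved."""
    theta, p = np.array(theta, float), np.array(p, float)
    for _ in range(n_steps):
        p -= 0.5 * dt * grad_V(theta)          # half kick
        theta += dt * mass_inv @ p             # drift
        p -= 0.5 * dt * grad_V(theta)          # half kick
    return theta, p
```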
Addressing the interpretability problem of non-negative matrix factorization (NMF) on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.
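To sketch the relaxation idea, the following is a hypothetical proximal-gradient loop on a continuous factorization with factors constrained to $[0,1]$; the box projection used as the prox is only a stand-in for the elastic-binary regularizer's proximal operator, which is not reproduced here.

```python
import numpy as np

def relaxed_bmf(X, rank, step=1e-3, iters=500, seed=0):
    """Continuous relaxation of Boolean matrix factorization solved by
    proximal (here: projected) gradient descent on ||X - U V||_F^2 with
    U, V kept in [0, 1]. Replacing `prox` with the proximal operator of an
    elastic-binary regularizer would additionally push entries toward {0, 1}."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank))
    V = rng.random((rank, X.shape[1]))
    prox = lambda A: np.clip(A, 0.0, 1.0)        # projection onto the box [0, 1]
    for _ in range(iters):
        R = U @ V - X                            # residual of the relaxed objective
        U_new = prox(U - step * R @ V.T)         # gradient step in U, then prox
        V = prox(V - step * U.T @ R)             # gradient step in V, then prox
        U = U_new
    return (U > 0.5).astype(int), (V > 0.5).astype(int)   # round back to Boolean
```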
In the metric distortion problem there is a set of candidates and a set of voters, all residing in the same metric space. The objective is to choose a candidate with minimum social cost, defined as the total distance of the chosen candidate from all voters. The challenge is that the algorithm receives only ordinal input from each voter, in the form of a ranked list of candidates in non-decreasing order of their distances from her, whereas the objective function is cardinal. The distortion of an algorithm is its worst-case approximation factor with respect to the optimal social cost. A series of papers culminated in a 3-distortion algorithm, which is tight with respect to all deterministic algorithms. Aiming to overcome the limitations of worst-case analysis, we revisit the metric distortion problem through the learning-augmented framework, where the algorithm is provided with some prediction regarding the optimal candidate. The quality of this prediction is unknown, and the goal is to evaluate the performance of the algorithm under an accurate prediction (known as consistency), while simultaneously providing worst-case guarantees even for arbitrarily inaccurate predictions (known as robustness). For our main result, we characterize the robustness-consistency Pareto frontier for the metric distortion problem. We first identify an inevitable trade-off between robustness and consistency. We then devise a family of learning-augmented algorithms that achieves any desired robustness-consistency pair on this Pareto frontier. Furthermore, we provide a more refined analysis of the distortion bounds as a function of the prediction error (with consistency and robustness being two extremes). Finally, we also prove distortion bounds that integrate the notion of $\alpha$-decisiveness, which quantifies the extent to which a voter prefers her favorite candidate relative to the rest.
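For concreteness, the objects in the problem statement (social cost, ordinal rankings, distortion) can be written down directly; the sketch below assumes a precomputed voter-candidate distance matrix, which an actual mechanism never observes, and the function names are hypothetical.

```python
import numpy as np

def social_cost(dist, c):
    """Total distance of candidate c from all voters; dist[v, c] holds the
    voter-candidate distances (cardinal information hidden from the algorithm)."""
    return dist[:, c].sum()

def rankings(dist):
    """Ordinal input: each voter's candidates sorted by non-decreasing distance."""
    return np.argsort(dist, axis=1)

def distortion(dist, chosen):
    """Ratio of the chosen candidate's social cost to the optimal social cost."""
    costs = dist.sum(axis=0)
    return costs[chosen] / costs.min()
```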
In uncertainty quantification, variance-based global sensitivity analysis quantitatively determines the effect of each input random variable on the output by partitioning the total output variance into contributions from each input. However, computing conditional expectations can be prohibitively costly when working with expensive-to-evaluate models. Surrogate models can accelerate this, yet their accuracy depends on the quality and quantity of training data, which are expensive to generate (experimentally or computationally) for complex engineering systems. Thus, methods that work with limited data are desirable. We propose a diffeomorphic modulation under observable response preserving homotopy (D-MORPH) regression to train a polynomial dimensional decomposition surrogate of the output that requires minimal training data. The new method first computes a sparse Lasso solution and uses it to define the cost function. A subsequent D-MORPH regression minimizes the difference between the D-MORPH and Lasso solutions. The resulting D-MORPH surrogate is more robust to input variations and more accurate with limited training data. We illustrate the accuracy and computational efficiency of the new surrogate for global sensitivity analysis using mathematical functions and an expensive-to-simulate model of char combustion. The new method is highly efficient, requiring only 15% of the training data compared to conventional regression.
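Once a cheap surrogate is available, the variance-based indices themselves can be estimated by standard Monte Carlo; below is a minimal pick-freeze (Saltelli-type) sketch of first-order Sobol' indices for any plug-in surrogate. The `surrogate` and `sampler` interfaces are hypothetical, and the D-MORPH/PDD construction itself is not reproduced.

```python
import numpy as np

def first_order_sobol(surrogate, sampler, n=10000):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices
    S_i = Var(E[Y | X_i]) / Var(Y). `surrogate` maps an (n, d) input array to
    a length-n output array; `sampler(n)` draws n independent input vectors
    from the input distribution. Any cheap surrogate can be plugged in."""
    A, B = sampler(n), sampler(n)
    yA, yB = surrogate(A), surrogate(B)
    var_y = np.var(np.concatenate([yA, yB]))
    S = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                                  # freeze all inputs but X_i
        S[i] = np.mean(yB * (surrogate(ABi) - yA)) / var_y   # Saltelli estimator
    return S
```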
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such a data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing a debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that, in the context of multi-armed bandits, it retains the non-asymptotic performance of the least squares estimator while attaining the asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.
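To make the setting concrete, here is a minimal simulation sketch of adaptive data collection with an $\epsilon$-greedy two-armed bandit, returning the per-arm OLS estimates whose non-normal limiting behavior motivates the debiased estimator; the debiased estimator itself is not reproduced, and all parameter values are illustrative.

```python
import numpy as np

def simulate_bandit_ols(mu=(0.0, 0.1), eps=0.1, n=2000, seed=0):
    """Collect rewards with an epsilon-greedy two-armed bandit and return the
    per-arm sample means, i.e. the OLS estimates in the corresponding adaptive
    linear model. With adaptively chosen arms these estimates need not be
    asymptotically normal, which is the issue described above."""
    rng = np.random.default_rng(seed)
    counts, sums = np.zeros(2), np.zeros(2)
    for _ in range(n):
        if counts.min() == 0 or rng.random() < eps:
            a = int(rng.integers(2))                  # explore / initialize each arm
        else:
            a = int(np.argmax(sums / counts))         # exploit current arm means
        counts[a] += 1
        sums[a] += mu[a] + rng.standard_normal()      # Gaussian reward noise
    return sums / counts, counts
```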
High-resolution wide-angle fisheye images are becoming increasingly important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to the projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.
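The key structural property exploited here is that, in HEALPix NESTED ordering, all child pixels of a coarser-resolution pixel occupy a contiguous block of the one-dimensional array, so patching reduces to a reshape. A minimal sketch using the `healpy` package (the function name and interface are hypothetical):

```python
import healpy as hp

def healpix_patches(sphere_map, patch_nside):
    """Split a HEALPix map given in NESTED ordering into patches: each patch
    is the block of child pixels sharing one ancestor pixel at resolution
    `patch_nside`. In nested ordering these children are contiguous, so
    patching is a reshape of the 1-d array. Assumes `patch_nside` divides the
    map's nside (both powers of two, as required by HEALPix)."""
    npix = sphere_map.shape[-1]
    nside = hp.npix2nside(npix)
    patch_size = (nside // patch_nside) ** 2        # children per ancestor pixel
    n_patches = hp.nside2npix(patch_nside)          # number of ancestor pixels
    return sphere_map.reshape(*sphere_map.shape[:-1], n_patches, patch_size)
```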
Variance reduction is a crucial idea in Monte Carlo simulation, and the stochastic Lanczos quadrature method is dedicated to approximating the trace of a matrix function. Inspired by their advantages, we combine these two techniques to approximate the log-determinant of large-scale symmetric positive definite matrices. Key questions for such a method are how to construct or choose an appropriate projection subspace and how to derive guaranteed theoretical analysis. This paper applies probabilistic approaches, including the projection-cost-preserving sketch and matrix concentration inequalities, to construct a suboptimal subspace. Furthermore, we provide some insights on choosing the design parameters of the underlying algorithm by deriving the corresponding approximation error and probabilistic error estimates. Numerical experiments demonstrate our method's effectiveness and illustrate the quality of the derived error bounds.
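For reference, a minimal sketch of the plain stochastic Lanczos quadrature estimator of $\log\det(A)=\operatorname{tr}(\log A)$ for a symmetric positive definite $A$; the variance-reduced variant built on a projection subspace is not reproduced, and the probe count, Lanczos depth, and omission of reorthogonalization are illustrative simplifications.

```python
import numpy as np

def slq_logdet(A, num_probes=30, lanczos_steps=40, seed=0):
    """Plain stochastic Lanczos quadrature estimate of log det(A) = tr(log A)
    for a symmetric positive definite matrix A, using Rademacher probes and
    Gauss quadrature from the Lanczos tridiagonal matrix."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe vector
        q, q_prev, beta = v / np.sqrt(n), np.zeros(n), 0.0
        alphas, betas = [], []
        for _ in range(lanczos_steps):               # basic Lanczos, no reorthogonalization
            w = A @ q - beta * q_prev
            alpha = q @ w
            w -= alpha * q
            beta = np.linalg.norm(w)
            alphas.append(alpha)
            betas.append(beta)
            if beta < 1e-12:
                break
            q_prev, q = q, w / beta
        T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
        theta, U = np.linalg.eigh(T)                 # quadrature nodes and weights
        total += n * np.sum(U[0, :] ** 2 * np.log(theta))
    return total / num_probes
```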
We introduce a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher order influence functions that extend ordinary linear influence functions of the parameter of interest, and represent higher derivatives of this parameter. For parameters for which the representation cannot be perfect, the method leads to a bias-variance trade-off and results in estimators that converge at a slower than $\sqrt n$-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at $\sqrt n$-rate, but we also consider efficient $\sqrt n$-estimation using novel nonlinear estimators. The general approach is applied in detail to the example of estimating a mean response when the response is not always observed.
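For the mean-response example mentioned above, the familiar first-order (linear) influence function, with observation indicator $A$, propensity $\pi(X)=P(A=1\mid X)$, and outcome regression $b(X)=E[Y\mid X,A=1]$, is
\[
  \mathrm{IF}_1(A,X,Y) \;=\; \frac{A}{\pi(X)}\bigl(Y-b(X)\bigr)+b(X)-\psi ,
\]
where $\psi=E[Y]$ is the parameter of interest; the higher order influence functions of the method add $U$-statistic correction terms of degree two and higher to the corresponding one-step estimator, reducing the bias incurred by estimating $\pi$ and $b$. Their exact kernels are not reproduced here.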