We study the large-sample properties of sparse M-estimators in the presence of pseudo-observations. Our framework covers a broad class of semi-parametric copula models, for which the marginal distributions are unknown and replaced by their empirical counterparts. It is well known that this modification significantly alters the limiting laws compared with usual M-estimation. We establish the consistency and asymptotic normality of our sparse penalized M-estimator and prove the asymptotic oracle property with pseudo-observations, including the case of a diverging number of parameters. Our framework accommodates copula-based loss functions that are potentially unbounded. Additionally, we establish the weak limit of multivariate rank statistics in arbitrary dimension and the weak convergence of empirical copula processes indexed by maps. We apply our inference method to Canonical Maximum Likelihood losses with Gaussian copulas, mixtures of copulas, and conditional copulas. The theoretical results are illustrated by two numerical experiments.
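To make the pseudo-observation step concrete, the following is a minimal sketch (plain NumPy/SciPy, illustrative function names, one-parameter bivariate Gaussian copula with an $\ell_1$ penalty) of a penalized Canonical Maximum Likelihood fit where the unknown margins are replaced by rescaled empirical ranks. It illustrates the general setting only, not the paper's estimator or its asymptotic analysis.

```python
import numpy as np
from scipy.stats import norm, rankdata
from scipy.optimize import minimize_scalar

def pseudo_observations(x):
    """Rescaled empirical ranks U_{ij} = R_{ij} / (n + 1), column-wise."""
    n = x.shape[0]
    return np.apply_along_axis(rankdata, 0, x) / (n + 1)

def penalized_cml_loss(rho, u, lam=0.05):
    """l1-penalized negative bivariate Gaussian-copula log-likelihood."""
    z = norm.ppf(u)                      # normal scores of the pseudo-observations
    z1, z2 = z[:, 0], z[:, 1]
    ll = (-0.5 * np.log(1 - rho**2)
          - (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (2 * (1 - rho**2)))
    return -ll.mean() + lam * abs(rho)   # sparsity penalty on the copula parameter

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)
u = pseudo_observations(x)
fit = minimize_scalar(penalized_cml_loss, args=(u,), bounds=(-0.99, 0.99),
                      method="bounded")
print("penalized CML estimate of rho:", fit.x)
```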
Studentisation of rank-based linear estimators is generally considered unnecessary, owing to the domain restriction on $S_{n}$, which exhibits constant variance. This assertion, however, is inconsistent with general analytic practice. We introduce a general unbiased, minimum-variance estimator on the Beta-Binomially distributed Kemeny Hilbert space, which allows permutation ties to exist and to be uniquely measured. Since individual permutation samples now exhibit unique random variance, a sample-dependent variance estimator must be introduced into the linear model. We derive and prove the Slutsky conditions that enable $t_{\nu}$-distributed Wald test statistics to be constructed while stably exhibiting Gauss-Markov properties in finite samples. Simulations demonstrate convergent decisions for the two orthonormal Slutsky-corrected Wald test statistics, verifying the projective geometric duality that holds on the affine-linear Kemeny metric.
An inner-product Hilbert space formulation is defined over the domain of all permutations with ties on the extended real line. We show that this construction resolves the common first- and second-order biases found in the pervasive Kendall and Spearman non-parametric correlation estimators, yielding unbiased minimum-variance (Gauss-Markov) estimators. We conclude by showing, in finite samples, that a strictly sub-Gaussian probability distribution is to be preferred for the Kemeny $\tau_{\kappa}$ and $\rho_{\kappa}$ estimators, allowing the construction of expected Wald test statistics that are analytically consistent with the Gauss-Markov properties in finite samples.
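For intuition, one common way to realise a correlation of this type (a minimal sketch under our reading of the construction, not the authors' exact $\tau_{\kappa}$ estimator) is to represent each sample as a skew-symmetric pairwise sign matrix, with ties mapped to exact zeros, and take a normalized inner product of the two matrices:

```python
import numpy as np

def sign_matrix(x):
    """Skew-symmetric pairwise sign matrix; ties produce exact zeros."""
    x = np.asarray(x, dtype=float)
    return np.sign(x[:, None] - x[None, :])

def kemeny_tau(x, y):
    """Normalized inner product of sign matrices (a tau-type correlation).

    Ties in either vector contribute zero to the inner product, so the
    statistic remains well defined on permutations with ties.
    """
    a, b = sign_matrix(x), sign_matrix(y)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom

x = [1, 2, 2, 3, 5]          # tied observations are allowed
y = [2, 1, 2, 4, 4]
print(kemeny_tau(x, y))
```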
Density power divergence (DPD) [Basu et al. (1998), Biometrika], which is designed to estimate the underlying distribution of the observations robustly against outliers, involves an integral of a power of the parametric density model to be estimated. While the integral can be obtained in explicit form for some specific densities (such as the normal and exponential densities), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities for over a quarter of a century since DPD was proposed. This study proposes a simple stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach can also be applied to the minimization of another density power-based divergence, the $\gamma$-divergence, with the aid of unnormalized models.
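The key identity is that the intractable term satisfies $\int f_\theta^{1+\alpha}\,dz = \mathbb{E}_{Z\sim f_\theta}[f_\theta(Z)^\alpha]$, so it admits an unbiased Monte Carlo estimate from model samples. The sketch below (PyTorch, a Gaussian model with reparameterized samples, illustrative hyperparameters) minimizes the DPD objective this way; it is a schematic instance of the idea, not the paper's algorithm or tuning.

```python
import torch

torch.manual_seed(0)
alpha = 0.5

# Data: standard normal contaminated with 5% gross outliers near 10.
x = torch.cat([torch.randn(950), torch.randn(50) * 0.1 + 10.0])

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(2000):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    # Monte Carlo estimate of the integral term,
    #   int f^(1+alpha) dz = E_{Z ~ f_theta}[ f_theta(Z)^alpha ],
    # via reparameterized samples so gradients flow through theta.
    z = dist.rsample((512,))
    integral = dist.log_prob(z).mul(alpha).exp().mean()
    data_term = dist.log_prob(x).mul(alpha).exp().mean()
    loss = integral - (1.0 + 1.0 / alpha) * data_term   # empirical DPD objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())  # close to (0, 1) despite the outliers
```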
This paper is devoted to the statistical and numerical properties of the geometric median and its applications to robust mean estimation via the median-of-means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in $\mathbb{R}^d$, together with examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $d$; and (b) exponential deviation inequalities for the distance between the sample and population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median-of-means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a "local quadratic growth" condition that allows one to translate suboptimality bounds for the objective function into corresponding bounds for the numerical approximation of the median itself, and we propose a simple stopping rule, applicable to any optimization method, that yields explicit error guarantees. We conclude with numerical experiments, including an application to the estimation of mean log-returns for S&P 500 data.
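On the numerical side, a minimal sketch of one standard solver for the geometric median objective is Weiszfeld's iteration; the tolerance-based stopping rule below is purely illustrative, not the suboptimality-based rule developed in the paper.

```python
import numpy as np

def geometric_median(x, tol=1e-8, max_iter=1000):
    """Weiszfeld iteration for argmin_m sum_i ||x_i - m||_2."""
    m = x.mean(axis=0)                       # start from the sample mean
    for _ in range(max_iter):
        d = np.linalg.norm(x - m, axis=1)
        d = np.maximum(d, 1e-12)             # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:  # simple illustrative stopping rule
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=(500, 10))   # heavy-tailed sample
print(geometric_median(x))
```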
We adopt an information-theoretic framework to analyze the generalization behavior of the class of iterative, noisy learning algorithms. This class is particularly suitable for study under information-theoretic metrics, as the algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). Herein, we use the maximal leakage metric (equivalently, the Sibson mutual information of order infinity), as it is simple to analyze and implies bounds both on the probability of a large generalization error and on its expected value. We show that, if the update function (e.g., the gradient) is bounded in $L_2$-norm and the additive noise is isotropic Gaussian, then one can obtain an upper bound on the maximal leakage in semi-closed form. Furthermore, we demonstrate how the assumptions on the update function affect the optimal (in the sense of minimizing the induced maximal leakage) choice of the noise. Finally, we compute explicit tight upper bounds on the induced maximal leakage for other scenarios of interest.
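To make the assumed setting concrete, here is a minimal sketch of the kind of noisy iterative algorithm analyzed: the update is clipped to enforce the $L_2$-norm bound, and isotropic Gaussian noise is added at each step, as in SGLD (the loss, step size, and noise scale are illustrative choices, not quantities fixed by our bounds).

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_grad(g, bound=1.0):
    """Project the update onto the L2 ball of radius `bound`."""
    norm = np.linalg.norm(g)
    return g if norm <= bound else g * (bound / norm)

# Toy least-squares loss; any per-minibatch gradient would do.
A = rng.normal(size=(200, 5))
y = A @ np.ones(5) + 0.1 * rng.normal(size=200)

w = np.zeros(5)
eta, sigma = 0.01, 0.1                     # step size and noise scale
for t in range(1000):
    i = rng.integers(0, 200, size=32)      # random minibatch
    g = A[i].T @ (A[i] @ w - y[i]) / 32
    # SGLD-style update: bounded gradient step plus isotropic Gaussian noise
    w = w - eta * clipped_grad(g) + sigma * np.sqrt(eta) * rng.normal(size=5)

print(w)
```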
Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high-dimensional ambient space; however, the number of degrees of freedom intrinsic to the data is usually far smaller than the number of ambient dimensions. Detecting a latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observations and sampling, so distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with quadratic regularisation to construct a sparse and adaptive affinity matrix, which can be interpreted as a generalisation of bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise, and illustrate these results in simulations. We identify a highly efficient computational scheme for computing this optimal transport on discrete data and demonstrate that it outperforms competing methods in a set of examples.
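For reference, the bistochastic kernel normalisation that the proposed affinity generalises can be sketched in a few lines: symmetric Sinkhorn-Knopp scaling of a Gaussian kernel to a doubly stochastic matrix (the bandwidth, damping scheme, and data below are illustrative; this is the baseline, not our quadratically regularised transport).

```python
import numpy as np

def bistochastic_kernel(x, eps=0.5, n_iter=500):
    """Symmetric Sinkhorn scaling: find u > 0 such that diag(u) K diag(u)
    is doubly stochastic, i.e. u_i * (K u)_i = 1 for all i."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                      # Gaussian kernel
    u = np.ones(len(x))
    for _ in range(n_iter):
        u = np.sqrt(u / (K @ u))               # damped fixed-point update
    return u[:, None] * K * u[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
W = bistochastic_kernel(x)
print(W.sum(axis=1)[:5])                       # rows sum to ~1
```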
Moderate calibration, the expected event probability among observations with predicted probability $\pi$ being equal to $\pi$, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of clinical prediction models are mostly based on smoothing or grouping the data. As well, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently-developed, and propose novel, methods for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge which enables joint inference on mean and moderate calibration, leading to a unified 'bridge' test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study we consider a prediction model for short-term mortality after a heart attack. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters. We suggest graphical presentation of the partial sum curves and reporting the strength of evidence indicated by the proposed methods when examining model calibration.
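A minimal sketch of the partial-sum construction follows (illustrative code, not the exact studentisation or joint test developed here): sort observations by predicted probability, form the standardized cumulative sum of prediction errors on the variance-based time scale, subtract the endpoint to obtain a bridge, and compare its supremum to the Brownian-bridge (Kolmogorov) limit.

```python
import numpy as np
from scipy.stats import kstwobign

def bridge_calibration_test(y, pi):
    """Sup-norm test of the standardized partial-sum process of prediction
    errors against the Brownian-bridge limit (illustrative sketch)."""
    order = np.argsort(pi)
    y, pi = y[order], pi[order]
    e = y - pi                                 # prediction errors, sorted by pi
    v = pi * (1 - pi)
    t = np.cumsum(v) / v.sum()                 # variance-based time scale
    s = np.cumsum(e) / np.sqrt(v.sum())        # standardized partial sums
    bridge = s - t * s[-1]                     # subtract the endpoint
    stat = np.abs(bridge).max()
    return stat, kstwobign.sf(stat)            # p-value from the sup|bridge| law

rng = np.random.default_rng(0)
pi = rng.uniform(0.05, 0.95, size=2000)
y = (rng.uniform(size=2000) < pi).astype(float)   # calibrated by construction
print(bridge_calibration_test(y, pi))
```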
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (a.k.a. the precision matrix), assuming it is sparse (i.e., has few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, whereas most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP), which can be difficult to solve at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real and synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$, corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.
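To fix ideas, an $\ell_0$-penalized pseudolikelihood-type loss can be written down directly. The sketch below is schematic (a node-wise Gaussian conditional-likelihood form with an $\ell_0$ penalty on the off-diagonal support; it is not the paper's exact objective, and certainly not the BnB solver):

```python
import numpy as np

def l0_pseudolikelihood(theta, X, lam):
    """Schematic l0-penalized pseudolikelihood loss for a symmetric
    precision-matrix candidate `theta` with positive diagonal
    (illustrative form, not the paper's exact objective)."""
    n, p = X.shape
    loss = 0.0
    for j in range(p):
        # Gaussian conditional:
        #   X_j | X_{-j} ~ N(-sum_{k!=j}(theta_jk/theta_jj) X_k, 1/theta_jj)
        r = X @ theta[:, j] / theta[j, j]      # X_j minus its conditional mean
        loss += -n * np.log(theta[j, j]) + theta[j, j] * (r ** 2).sum()
    nnz = np.count_nonzero(theta[np.triu_indices(p, k=1)])
    return loss + lam * nnz                    # l0 penalty on off-diagonal support

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
print(l0_pseudolikelihood(np.eye(5), X, lam=2.0))
```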
For fixed $T$ and $k \geq 2$, a $k$-dimensional vector stochastic differential equation $dX_t=\mu(X_t, \theta)\,dt+\nu(X_t)\,dW_t$ is studied over a time interval $[0,T]$. The vector of drift parameters $\theta$ is unknown, and the dependence on $\theta$ is in general nonlinear. We prove that the difference between the approximate maximum likelihood estimator $\overline{\theta}_n\equiv \overline{\theta}_{n,T}$ of the drift parameter, obtained from discrete observations $(X_{i\Delta_n},\, 0 \leq i \leq n)$, and the maximum likelihood estimator $\hat{\theta}\equiv \hat{\theta}_T$, obtained from continuous observations $(X_t,\, 0\leq t\leq T)$, converges stably in law, as $\Delta_n=T/n$ tends to zero, to a mixed normal random vector with a covariance matrix that depends on $\hat{\theta}$ and on the path $(X_t,\, 0 \leq t\leq T)$. The uniform ellipticity of the diffusion matrix $S(x)=\nu(x)\nu(x)^T$ emerges as the main assumption on the diffusion coefficient.
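Concretely, the approximate log-likelihood in question has the discretized Girsanov form $\ell_n(\theta)=\sum_i \mu(X_{t_i},\theta)^T S(X_{t_i})^{-1}(X_{t_{i+1}}-X_{t_i}) - \frac{\Delta_n}{2}\sum_i \mu(X_{t_i},\theta)^T S(X_{t_i})^{-1}\mu(X_{t_i},\theta)$. The sketch below (scalar case, an illustrative linear drift family and constant diffusion, not tied to the paper's examples) maximizes it over $\theta$ from simulated discrete observations.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def mu(x, theta):       # illustrative drift family
    return -theta * x

nu = 1.0                # constant diffusion coefficient, so S(x) = nu^2
T, n = 10.0, 10000
dt = T / n

# Euler simulation of dX_t = mu(X_t, theta0) dt + nu dW_t
theta0, X = 1.5, np.empty(n + 1)
X[0] = 1.0
for i in range(n):
    X[i + 1] = X[i] + mu(X[i], theta0) * dt + nu * np.sqrt(dt) * rng.normal()

def neg_loglik(theta):
    """Negative approximate Girsanov log-likelihood from discrete data:
    -[ sum_i mu_i dX_i / nu^2 - (dt/2) sum_i mu_i^2 / nu^2 ]."""
    m = mu(X[:-1], theta)
    dX = np.diff(X)
    return -(m * dX / nu**2 - 0.5 * dt * m**2 / nu**2).sum()

fit = minimize_scalar(neg_loglik, bounds=(0.1, 5.0), method="bounded")
print("approximate MLE of theta:", fit.x)    # close to theta0 = 1.5
```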
We present a method for sampling-based model predictive control that uses a generic physics simulator as the dynamical model. In particular, we propose a Model Predictive Path Integral (MPPI) controller that uses the GPU-parallelizable IsaacGym simulator to compute the forward dynamics of a problem. By doing so, we eliminate the need for manual encoding of robot dynamics and interactions among objects, and we can effortlessly solve complex navigation and contact-rich tasks. Since no explicit dynamic modeling is required, the method is easily extendable to different objects and robots. We demonstrate the effectiveness of this method in several simulated and real-world settings, including mobile navigation with collision avoidance, non-prehensile manipulation, and whole-body control for high-dimensional configuration spaces. This method is a powerful and accessible tool for solving a large variety of contact-rich motion planning tasks.
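The MPPI update itself is compact. The sketch below shows its sample-weight-average structure with a toy one-dimensional point-mass rollout standing in for the IsaacGym forward dynamics (the dynamics, cost, and hyperparameters are illustrative placeholders, not our controller's configuration).

```python
import numpy as np

rng = np.random.default_rng(0)
H, K, lam, sigma = 20, 256, 1.0, 0.5      # horizon, samples, temperature, noise

def rollout_cost(x0, U):
    """Stand-in for the simulator: 1-D point mass x' = x + u*dt, with a cost
    that drives the state to the origin. IsaacGym rollouts replace this."""
    x, cost = x0, 0.0
    for u in U:
        x = x + 0.1 * u
        cost += x**2 + 0.01 * u**2
    return cost

x0, U = 5.0, np.zeros(H)                  # current state, nominal control plan
for step in range(50):
    eps = sigma * rng.normal(size=(K, H))             # sampled perturbations
    costs = np.array([rollout_cost(x0, U + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)          # path-integral weights
    w /= w.sum()
    U = U + w @ eps                                   # weighted control update
    x0 = x0 + 0.1 * U[0]                              # apply first control (receding horizon)
    U = np.roll(U, -1); U[-1] = 0.0                   # shift the nominal plan
print("final state:", x0)
```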