In this short note, we prove an asymptotic expansion for the ratio of the Dirichlet density to the multivariate normal density with the same mean and covariance matrix. The expansion is then used to derive an upper bound on the total variation distance between the corresponding probability measures and to rederive the asymptotic variance of the Dirichlet kernel estimators introduced by Aitchison & Lauder (1985) and studied theoretically in Ouimet (2020). Another potential application, related to the asymptotic equivalence between the Gaussian variance regression problem and the Gaussian white noise problem, is briefly mentioned but left open for future research.
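For context, here are the standard formulas that the comparison relies on (textbook facts recalled for the reader, not results of the note): the Dirichlet density with parameter $\boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_{d+1})$ on the $d$-dimensional simplex is
\[
f_{\boldsymbol{\alpha}}(x)=\frac{\Gamma(\alpha_0)}{\prod_{i=1}^{d+1}\Gamma(\alpha_i)}\Big(1-\sum_{i=1}^{d}x_i\Big)^{\alpha_{d+1}-1}\prod_{i=1}^{d}x_i^{\alpha_i-1},\qquad \alpha_0=\sum_{i=1}^{d+1}\alpha_i,
\]
and its mean and covariance, which define the matching normal density, are
\[
\mathbb{E}[X_i]=\frac{\alpha_i}{\alpha_0},\qquad \mathrm{Cov}(X_i,X_j)=\frac{\alpha_i(\delta_{ij}\alpha_0-\alpha_j)}{\alpha_0^2(\alpha_0+1)}.
\]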
In geosciences, the use of classical Euclidean methods is unsuitable for treating and analyzing some types of data, as these data may not belong to a vector space. This is the case for correlation matrices, which belong to a subfamily of symmetric positive definite matrices, which in turn form a cone-shaped Riemannian manifold. We propose two novel applications that account for the non-linear behavior usually present in multivariate geological data by exploiting the manifold structure of correlation matrices. First, we employ an extension of the linear model of coregionalization (LMC) that relaxes the linear mixture, usually assumed fixed over the domain, and makes it locally varying according to the local strength of the dependency between the coregionalized variables. Once this relaxation of the LMC is adopted, the main challenge is to interpolate the known correlation matrices across the domain in a reliable and coherent fashion. The present work adopts a non-Euclidean framework to achieve this objective, locally averaging and interpolating the correlations between the variables while retaining the intrinsic geometry of correlation matrices. A second application deals with the clustering of multivariate data.
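To make the idea of averaging correlation matrices while respecting their geometry concrete, the following is a minimal sketch of a weighted log-Euclidean (Fréchet-style) mean of symmetric positive definite matrices; the matrices, weights, and the choice of the log-Euclidean metric are illustrative assumptions, not necessarily the construction adopted in the paper.

import numpy as np

def spd_log(C):
    # Matrix logarithm of a symmetric positive definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(mats, weights):
    # Weighted log-Euclidean average: exp of the weighted average of matrix logs.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return spd_exp(sum(w * spd_log(C) for w, C in zip(weights, mats)))

# Hypothetical example: interpolate between two local 2x2 correlation matrices
# with weights reflecting, say, distance to the target location.
C1 = np.array([[1.0, 0.8], [0.8, 1.0]])
C2 = np.array([[1.0, 0.1], [0.1, 1.0]])
C_avg = log_euclidean_mean([C1, C2], weights=[0.7, 0.3])

Note that the log-Euclidean mean of correlation matrices is guaranteed to be symmetric positive definite but not to have a unit diagonal, so in practice a rescaling back to a correlation matrix may be needed.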
New upper bounds are developed for the $L_2$ distance between $\xi/\text{Var}[\xi]^{1/2}$ and linear and quadratic functions of $z\sim N(0,I_n)$ for random variables of the form $\xi=z^\top f(z) - \text{div} f(z)$. The linear approximation yields a central limit theorem when the squared norm of $f(z)$ dominates the squared Frobenius norm of $\nabla f(z)$ in expectation. Applications of this normal approximation are given for the asymptotic normality of de-biased estimators in linear regression with correlated design and convex penalty in the regime $p/n \le \gamma$ for constant $\gamma\in(0,\infty)$. For the estimation of linear functions $\langle a_0,\beta\rangle$ of the unknown coefficient vector $\beta$, this analysis leads to asymptotic normality of the de-biased estimate for most normalized directions $a_0$, where ``most'' is quantified in a precise sense. This asymptotic normality holds for any convex penalty if $\gamma<1$ and for any strongly convex penalty if $\gamma\ge 1$. In particular, the penalty need not be separable or permutation invariant. By allowing arbitrary regularizers, the results vastly broaden the scope of applicability of de-biasing methodologies for obtaining confidence intervals in high dimensions. In the absence of strong convexity for $p>n$, asymptotic normality of the de-biased estimate is obtained for the Lasso and the group Lasso under additional conditions. For general convex penalties, our analysis also provides prediction and estimation error bounds of independent interest.
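The centering built into $\xi$ is Stein's identity: for sufficiently regular $f:\mathbb{R}^n\to\mathbb{R}^n$ and $z\sim N(0,I_n)$ (a standard fact recalled here, not one of the new bounds),
\[
\mathbb{E}\big[z^\top f(z)\big]=\mathbb{E}\big[\mathrm{div}\, f(z)\big]=\sum_{i=1}^{n}\mathbb{E}\big[\partial_i f_i(z)\big],
\]
so that $\mathbb{E}[\xi]=0$ and the bounds quantify how close $\xi/\text{Var}[\xi]^{1/2}$ is to a standard normal.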
We study the statistical estimation problem of orthogonal group synchronization and rotation group synchronization. The model is $Y_{ij} = Z_i^* Z_j^{*T} + \sigma W_{ij}\in\mathbb{R}^{d\times d}$, where $W_{ij}$ is a Gaussian random matrix and $Z_i^*$ is either an orthogonal matrix or a rotation matrix, and each $Y_{ij}$ is observed independently with probability $p$. We analyze an iterative polar decomposition algorithm for the estimation of $Z^*$ and show that it achieves an error of $(1+o(1))\frac{\sigma^2 d(d-1)}{2np}$ when initialized by spectral methods. A matching minimax lower bound is further established, which implies the optimality of the proposed algorithm, as it achieves the exact minimax risk.
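The polar-decomposition step in such iterative procedures amounts to a Procrustes projection onto the orthogonal group; the following is a minimal sketch of one node update (hypothetical variable names, with the spectral initialization and the handling of unobserved pairs omitted), not the paper's exact iteration.

import numpy as np

def polar_project(M):
    # Nearest orthogonal matrix to M in Frobenius norm, via the SVD (polar factor).
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def update_node(i, Y, Z, neighbors):
    # Since Y[i][j] is approximately Z_i Z_j^T, each Y[i][j] @ Z[j] is a noisy copy of Z_i;
    # average these copies and project back onto the orthogonal group.
    M = sum(Y[i][j] @ Z[j] for j in neighbors[i])
    return polar_project(M)

For rotation (special orthogonal) synchronization, the projection would additionally flip the sign of the last singular vector whenever $\det(UV^\top)$ is negative.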
When causal quantities cannot be point identified, researchers often pursue partial identification to quantify the range of possible values. However, the peculiarities of applied research conditions can make this analytically intractable. We present a general and automated approach to causal inference in discrete settings. We show that causal questions with discrete data reduce to polynomial programming problems, and we present an algorithm to automatically bound causal effects using efficient dual relaxation and spatial branch-and-bound techniques. The user declares an estimand, states assumptions, and provides data (however incomplete or mismeasured). The algorithm then searches over admissible data-generating processes and outputs the most precise possible range consistent with available information -- i.e., sharp bounds -- including a point-identified solution if one exists. Because this search can be computationally intensive, our procedure reports and continually refines non-sharp ranges that are guaranteed to contain the truth at all times, even when the algorithm is not run to completion. Moreover, it offers an additional guarantee, which we refer to as $\epsilon$-sharpness, characterizing the worst-case looseness of the incomplete bounds. Analytically validated simulations show that the algorithm accommodates classic obstacles, including confounding, selection, measurement error, noncompliance, and nonresponse.
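As a toy illustration of partial identification, much simpler than the polynomial programming machinery above: with a binary treatment and outcome and unrestricted confounding, the no-assumption (Manski-style) bounds on the average treatment effect follow directly from the observed distribution, because each unobserved potential outcome can only range over $[0,1]$.

def manski_ate_bounds(p_y1_t1, p_y1_t0, p_t1):
    # No-assumption bounds on ATE = E[Y(1)] - E[Y(0)] for binary Y and T, where
    # p_y1_t1 = P(Y=1 | T=1), p_y1_t0 = P(Y=1 | T=0), p_t1 = P(T=1).
    p_t0 = 1.0 - p_t1
    ey1_lo = p_y1_t1 * p_t1          # worst case: Y(1) = 0 for all untreated units
    ey1_hi = p_y1_t1 * p_t1 + p_t0   # best case:  Y(1) = 1 for all untreated units
    ey0_lo = p_y1_t0 * p_t0
    ey0_hi = p_y1_t0 * p_t0 + p_t1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

# Hypothetical observed data: P(Y=1|T=1)=0.7, P(Y=1|T=0)=0.4, P(T=1)=0.5.
print(manski_ate_bounds(0.7, 0.4, 0.5))   # (-0.35, 0.65); the width is always 1

The general algorithm described above automates this kind of reasoning for arbitrary discrete estimands, assumptions, and data structures.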
We obtain explicit $p$-Wasserstein distance error bounds between the distribution of the multi-parameter MLE and the multivariate normal distribution. Our general bounds, given for possibly high-dimensional, independent and identically distributed random vectors, are of the optimal $\mathcal{O}(n^{-1/2})$ order. Explicit numerical constants are given when $p\in(1,2]$, and in the case $p>2$ the bounds are explicit up to a constant factor that depends only on $p$. We apply our general bounds to derive Wasserstein distance error bounds for the multivariate normal approximation of the MLE in several settings: single-parameter exponential families, the normal distribution under canonical parametrisation, and the multivariate normal distribution under non-canonical parametrisation. In addition, we provide upper bounds with respect to the bounded Wasserstein distance when the MLE is implicitly defined.
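For completeness, the metric appearing in these bounds is the standard one: for probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$ with finite $p$-th moments,
\[
\mathcal{W}_p(\mu,\nu)=\Big(\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathbb{R}^d\times\mathbb{R}^d}\|x-y\|^p\,\mathrm{d}\pi(x,y)\Big)^{1/p},
\]
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$; the bounds control this distance between the law of the suitably standardized MLE and the limiting multivariate normal law.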
This paper establishes the (nearly) optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
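A concrete, standard (Yarotsky-style) example of the kind of building block underlying such results, not the paper's own construction: the hat function $g(x)=2\,\mathrm{ReLU}(x)-4\,\mathrm{ReLU}(x-\tfrac12)+2\,\mathrm{ReLU}(x-1)$, composed with itself $m$ times, gives a depth-$\mathcal{O}(m)$ ReLU approximation of $x^2$ on $[0,1]$ with error at most $4^{-(m+1)}$, so accuracy improves exponentially with depth.

import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def hat(t):
    # Exact ReLU representation of the hat function g on [0, 1].
    return 2 * relu(t) - 4 * relu(t - 0.5) + 2 * relu(t - 1.0)

def relu_square(x, m):
    # Depth-m approximation of x**2 on [0, 1]: x - sum_k g^{(k)}(x) / 4^k.
    approx, tooth = np.array(x, dtype=float), np.array(x, dtype=float)
    for k in range(1, m + 1):
        tooth = hat(tooth)                 # g composed with itself k times
        approx = approx - tooth / 4**k
    return approx

x = np.linspace(0.0, 1.0, 10001)
for m in (2, 4, 6):
    err = np.max(np.abs(relu_square(x, m) - x**2))
    print(m, err, 4.0 ** -(m + 1))         # observed error sits below the 4^-(m+1) bound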
We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $\kappa$, the algorithm is shown to produce, with a complexity of $\mathcal{O}\big(\kappa^{5/4} d^{1/4}\epsilon^{-1/2}\big)$, samples from a distribution that, in Wasserstein distance, is at most $\epsilon>0$ away from the target distribution.
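For orientation, here is a minimal sketch of the simplest integrator in this family, the Euler--Maruyama discretization of the overdamped Langevin dynamics (the unadjusted Langevin algorithm), applied to a hypothetical strongly log-concave Gaussian target; the one-gradient-per-step splitting scheme for the underdamped dynamics analysed above is more involved and is not reproduced here.

import numpy as np

def ula_sample(grad_log_pi, x0, step, n_steps, rng):
    # Unadjusted Langevin algorithm: x_{k+1} = x_k + h * grad log pi(x_k) + sqrt(2h) * xi_k.
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Hypothetical target: N(0, Sigma) with Sigma = diag(1, 4), so grad log pi(x) = -Sigma^{-1} x.
Sigma_inv = np.diag([1.0, 0.25])
grad_log_pi = lambda x: -Sigma_inv @ x
rng = np.random.default_rng(0)
samples = np.array([ula_sample(grad_log_pi, [0.0, 0.0], step=0.05, n_steps=500, rng=rng)
                    for _ in range(1000)])
print(np.cov(samples.T))   # roughly diag(1, 4), up to discretization bias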
Smoothing splines have been used pervasively in nonparametric regression. However, the computational burden of smoothing splines is significant when the sample size $n$ is large. When the number of predictors $d\geq2$, the computational cost for smoothing splines is of the order $O(n^3)$ using the standard approach. Many methods, called basis selection methods, have been developed to approximate smoothing spline estimators by using $q$ basis functions instead of $n$, resulting in a computational cost of the order $O(nq^2)$. Despite their algorithmic benefits, most basis selection methods require the assumption that the sample is uniformly distributed on a hyper-cube, and their performance may deteriorate when this assumption is not met. To overcome this obstacle, we develop an efficient algorithm that is adaptive to the unknown probability density function of the predictors. Theoretically, we show that the proposed estimator has the same convergence rate as the full-basis estimator when $q$ is roughly of the order $O[n^{2d/\{(pr+1)(d+2)\}}]$, where $p\in[1,2]$ and $r\approx 4$ are constants that depend on the type of spline. Numerical studies on various synthetic datasets demonstrate the superior performance of the proposed estimator in comparison with mainstream competitors.
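One heavily simplified way to picture what "adaptive to the design density" can mean (an illustrative sketch only, not the algorithm developed above): place the $q$ basis-function centres by subsampling the observed predictors instead of spreading them uniformly over the hyper-cube, so that regions where the design density is high receive more basis functions.

import numpy as np

def adaptive_knots(X, q, seed=0):
    # Density-adaptive choice: q knots subsampled from the observed predictors X (n x d),
    # in contrast with q knots drawn uniformly on [0, 1]^d.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=q, replace=False)
    return X[idx]

# Hypothetical non-uniform design: predictors concentrated near one corner of [0, 1]^2.
rng = np.random.default_rng(1)
X = rng.beta(a=0.5, b=3.0, size=(2000, 2))
knots = adaptive_knots(X, q=50)   # knots follow the empirical design density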
Heatmap-based methods dominate the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based method. From the perspective of MLE, adopting different regression losses amounts to making different assumptions about the output density function. A density function closer to the true distribution leads to better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover, for the first time, our regression method is superior to the heatmap-based methods, especially on multi-person pose estimation. Our code is available at //github.com/Jeff-sjtu/res-loglikelihood-regression
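The statement that adopting different regression losses amounts to assuming different output densities can be made concrete with a small sketch (illustration only, not the RLE flow model itself): up to additive constants, the $\ell_2$ loss is the negative log-likelihood of a Gaussian output density and the $\ell_1$ loss is that of a Laplace density.

import numpy as np

def gaussian_nll(y, mu, sigma=1.0):
    # -log N(y | mu, sigma^2): quadratic in the residual, i.e. an L2-type loss.
    return 0.5 * ((y - mu) / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))

def laplace_nll(y, mu, b=1.0):
    # -log Laplace(y | mu, b): absolute residual, i.e. an L1-type loss.
    return np.abs(y - mu) / b + np.log(2 * b)

# RLE, by contrast, does not fix the density family in advance: it learns the
# output density (via a normalizing flow) and trains the regressor by MLE under it.
y, mu = 1.3, 0.8
print(gaussian_nll(y, mu), laplace_nll(y, mu))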
Implicit probabilistic models are models defined naturally in terms of a sampling procedure; they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.