Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth permits to define quantiles and ranks for multivariate data and use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed \textit{ad-hoc} in the literature since the seminal contribution of \cite{Tukey75}, not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the \textit{integrated rank-weighted} statistical depth (IRW depth in abbreviated form) originally introduced in \cite{IRW}, modified in order to satisfy the property of \textit{affine-invariance}, fulfilling thus all the four key axioms listed in the nomenclature elaborated by \cite{ZuoS00a}. The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d=2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular, discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant-time (independent of the size of the lattice $n$).
This study concerns probability distribution estimation of sample maximum. The traditional approach is the parametric fitting to the limiting distribution - the generalized extreme value distribution; however, the model in finite cases is misspecified to a certain extent. We propose a plug-in type of the kernel distribution estimator which does not need model specification. It is proved that both asymptotic convergence rates depend on the tail index and the second order parameter. As the tail gets light, the degree of misspecification of the parametric fitting becomes large, that means the convergence rate becomes slow. In the Weibull cases, which can be seen as the limit of tail-lightness, only the nonparametric distribution estimator keeps its consistency. Finally, we report results of numerical experiments and two real case studies.
Let $Q_{n}^{r}$ be the graph with vertex set $\{-1,1\}^{n}$ in which two vertices are joined if their Hamming distance is at most $r$. The edge-isoperimetric problem for $Q_{n}^{r}$ is that: For every $(n,r,M)$ such that $1\le r\le n$ and $1\le M\le2^{n}$, determine the minimum edge-boundary size of a subset of vertices of $Q_{n}^{r}$ with a given size $M$. In this paper, we apply two different approaches to prove bounds for this problem. The first approach is a linear programming approach and the second is a probabilistic approach. Our bound derived by the first approach generalizes the tight bound for $M=2^{n-1}$ derived by Kahn, Kalai, and Linial in 1989. Moreover, our bound is also tight for $M=2^{n-2}$ and $r\le\frac{n}{2}-1$. Our bounds derived by the second approach are expressed in terms of the \emph{noise stability}, and they are shown to be asymptotically tight as $n\to\infty$ when $r=2\lfloor\frac{\beta n}{2}\rfloor+1$ and $M=\lfloor\alpha2^{n}\rfloor$ for fixed $\alpha,\beta\in(0,1)$, and is tight up to a factor $2$ when $r=2\lfloor\frac{\beta n}{2}\rfloor$ and $M=\lfloor\alpha2^{n}\rfloor$. In fact, the edge-isoperimetric problem is equivalent to a ball-noise stability problem which is a variant of the traditional (i.i.d.-) noise stability problem. Our results can be interpreted as bounds for the ball-noise stability problem.
We study continuity of the roots of nonmonic polynomials as a function of their coefficients using only the most elementary results from an introductory course in real analysis and the theory of single variable polynomials. Our approach gives both qualitative and quantitative results in the case that the degree of the unperturbed polynomial can change under a perturbation of its coefficients, a case that naturally occurs, for instance, in stability theory of polynomials, singular perturbation theory, or in the perturbation theory for generalized eigenvalue problems. An application of our results in multivariate stability theory is provided which is important in, for example, the study of hyperbolic polynomials or realizability and synthesis problems in passive electrical network theory, and will be of general interest to mathematicians as well as physicists and engineers.
In this paper we prove upper and lower bounds on the minimal spherical dispersion. In particular, we see that the inverse $N(\varepsilon,d)$ of the minimal spherical dispersion is, for fixed $\varepsilon>0$, up to logarithmic terms linear in the dimension $d$. We also derive upper and lower bounds on the expected dispersion for points chosen independently and uniformly at random from the Euclidean unit sphere.
The mutual information (MI) of Gaussian multi-input multi-output (MIMO) channels has been evaluated by utilizing random matrix theory (RMT) and shown to asymptotically follow Gaussian distribution, where the ergodic mutual information (EMI) converges to a deterministic quantity. However, with non-Gaussian channels, there is a bias between the EMI and its deterministic equivalent (DE), whose evaluation is not available in the literature. This bias of the EMI is related to the bias for the trace of the resolvent in large RMT. In this paper, we first derive the bias for the trace of the resolvent, which is further extended to compute the bias for the linear spectral statistics (LSS). Then, we apply the above results on non-Gaussian MIMO channels to determine the bias for the EMI. It is also proved that the bias for the EMI is $-0.5$ times of that for the variance of the MI. Finally, the derived bias is utilized to modify the central limit theory (CLT) and calculate the outage probability. Numerical results show that the modified CLT significantly outperforms previous methods in approximating the distribution of the MI and improves the accuracy for the outage probability evaluation.
Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is stylized and disregards computational efficiency. The extent to which it represents gradient descent is an open question in the theory of deep learning. The current paper studies this question. Viewing gradient descent as an approximate numerical solution to the initial value problem of gradient flow, we find that the degree of approximation depends on the curvature around the gradient flow trajectory. We then show that over deep neural networks with homogeneous activations, gradient flow trajectories enjoy favorable curvature, suggesting they are well approximated by gradient descent. This finding allows us to translate an analysis of gradient flow over deep linear neural networks into a guarantee that gradient descent efficiently converges to global minimum almost surely under random initialization. Experiments suggest that over simple deep neural networks, gradient descent with conventional step size is indeed close to gradient flow. We hypothesize that the theory of gradient flows will unravel mysteries behind deep learning.
Mediation analysis is a strategy for understanding the mechanisms by which treatments or interventions affect later outcomes. Mediation analysis is frequently applied in randomized trial settings, but typically assumes: a) that randomized assignment is the exposure of interest as opposed to actual take-up of the intervention, and b) no unobserved confounding of the mediator-outcome relationship. In contrast to the rich literature on instrumental variable (IV) methods to estimate a total effect of a non-randomized exposure, there has been almost no research into using IV as an identification strategy in the presence of both exposure-outcome and mediator-outcome unobserved confounding. In response, we define and identify novel estimands -- complier interventional direct and indirect effects (i.e., IV mediational effects) in two scenarios: 1) with a single IV for the exposure, and 2) with two IVs, one for the exposure and another for the mediator, that may be related. We propose nonparametric, robust, efficient estimators, and apply them to a housing voucher experiment.
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.
This paper considers the integrated problem of quay crane assignment, quay crane scheduling, yard location assignment, and vehicle dispatching operations at a container terminal. The main objective is to minimize vessel turnover times and maximize the terminal throughput, which are key economic drivers in terminal operations. Due to their computational complexities, these problems are not optimized jointly in existing work. This paper revisits this limitation and proposes Mixed Integer Programming (MIP) and Constraint Programming (CP) models for the integrated problem, under some realistic assumptions. Experimental results show that the MIP formulation can only solve small instances, while the CP model finds optimal solutions in reasonable times for realistic instances derived from actual container terminal operations.