Linear discriminant analysis is a typical method used in the case of large dimension and small samples. There are various types of linear discriminant analysis methods, which are based on the estimation of the covariance matrix and mean vectors. Although there are many methods for estimating the inverse of the covariance matrix and the mean vectors, we consider shrinkage methods based on a nonparametric approach. For the precision matrix, methods based on either the sparsity structure or data splitting are considered. For the mean vectors, the nonparametric empirical Bayes (NPEB) estimator and the nonparametric maximum likelihood estimation (NPMLE) method are adopted, which are also called f-modeling and g-modeling, respectively. We analyze the performance of linear discriminant rules based on combined estimation strategies for the covariance matrix and mean vectors. In particular, we present a theoretical result on the performance of the NPEB method and compare it with the results of other methods from previous studies. We provide simulation studies for various structures of covariance matrices and mean vectors to evaluate the methods considered in this paper. In addition, real data examples such as gene expression and EEG data are presented.
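For orientation, f-modeling estimators are commonly built on Tweedie's formula. The display below is a sketch under an assumed normal-means setup, $z_i \mid \mu_i \sim N(\mu_i, \sigma^2)$ with marginal density $f$ of the $z_i$; the symbols $z_i$, $\sigma^2$ and $f$ are illustrative and not taken from the abstract, and the paper's exact estimator may differ.

$$
\mathbb{E}[\mu_i \mid z_i] \;=\; z_i + \sigma^2 \, \frac{f'(z_i)}{f(z_i)} .
$$

Under this reading, f-modeling plugs a nonparametric estimate of $f$ and $f'$ into the right-hand side, whereas g-modeling first estimates the prior distribution $g$ of the $\mu_i$ and then computes the posterior mean from it.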
We consider least squares estimators of the finite-dimensional regression parameter $\alpha$ in the single index regression model $Y=\psi(\alpha^T X)+\epsilon$, where $X$ is a $d$-dimensional random vector, $\mathbb{E}(Y|X)=\psi(\alpha^T X)$, and $\psi$ is monotone. It has been suggested to estimate $\alpha$ by a profile least squares estimator, minimizing $\sum_{i=1}^n(Y_i-\psi(\alpha^T X_i))^2$ over monotone $\psi$ and $\alpha$ on the boundary $S_{d-1}$ of the unit ball. Although this suggestion has been around for a long time, it is still unknown whether the estimate is $\sqrt{n}$-convergent. We show that a profile least squares estimator, using the same pointwise least squares estimator for fixed $\alpha$ but a different global sum of squares, is $\sqrt{n}$-convergent and asymptotically normal. The difference between the corresponding loss functions is studied, and a comparison with other methods is given.
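The classical profile criterion described above can be sketched as follows. This is a minimal illustration, not the modified estimator the abstract proposes: for fixed $\alpha$ the monotone $\psi$ is fitted by isotonic least squares (pool-adjacent-violators), and $\alpha$ is then chosen by a grid search over the unit circle. The restriction to $d=2$, the grid search, the noise-free toy data, and all function names are assumptions made for the example.

```python
import math
import random


def pava(y):
    """Least squares nondecreasing fit to y (pool-adjacent-violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the block means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def profile_sse(theta, X, Y):
    """Sum of squares after profiling out the monotone link for alpha(theta)."""
    a0, a1 = math.cos(theta), math.sin(theta)
    order = sorted(range(len(Y)), key=lambda i: a0 * X[i][0] + a1 * X[i][1])
    y_sorted = [Y[i] for i in order]
    return sum((y - m) ** 2 for y, m in zip(y_sorted, pava(y_sorted)))


def fit_alpha(X, Y, grid=360):
    """Grid search over the unit circle (d = 2 only; illustrative)."""
    thetas = [2 * math.pi * k / grid for k in range(grid)]
    best = min(thetas, key=lambda t: profile_sse(t, X, Y))
    return math.cos(best), math.sin(best)


# Toy data (assumed): psi(u) = u^3, alpha_0 at 60 degrees, no noise.
rng = random.Random(0)
theta0 = 2 * math.pi * 60 / 360
alpha0 = (math.cos(theta0), math.sin(theta0))
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
Y = []
for x in X:
    u = alpha0[0] * x[0] + alpha0[1] * x[1]
    Y.append(u * u * u)
alpha_hat = fit_alpha(X, Y)
```

Because the profiled criterion is piecewise constant in $\alpha$ (it only depends on the ordering of the projections $\alpha^T X_i$), a derivative-free search such as the grid above is a natural choice for a sketch.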
Learning decompositions of expensive-to-evaluate black-box functions promises to scale Bayesian optimisation (BO) to high-dimensional problems. However, the success of these techniques depends on finding proper decompositions that accurately represent the black-box. While previous works learn those decompositions based on data, we investigate data-independent decomposition sampling rules in this paper. We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition. Those results motivate the development of the random decomposition upper-confidence bound algorithm (RDUCB) that is straightforward to implement - (almost) plug-and-play - and, surprisingly, yields significant empirical gains compared to the previous state-of-the-art on a comprehensive set of benchmarks. We also confirm the plug-and-play nature of our modelling component by integrating our method with HEBO, showing improved practical gains in the highest dimensional tasks from Bayesmark.
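To make the idea of a data-independent, tree-based decomposition concrete, here is a minimal sketch. It assumes the decomposition is a tree over the input variables whose edges index pairwise additive components; the particular sampler (random attachment to an earlier variable), the helper names, and the toy component function are illustrative assumptions and not necessarily what RDUCB uses.

```python
import random


def sample_random_tree(d, rng):
    """Sample a tree on variables 0..d-1 as a list of d-1 edges.

    Variables are shuffled and each one is attached to a uniformly chosen
    earlier variable. This is one simple sampler, not necessarily the one
    used in the paper.
    """
    order = list(range(d))
    rng.shuffle(order)
    edges = []
    for pos in range(1, d):
        parent = order[rng.randrange(pos)]
        edges.append((parent, order[pos]))
    return edges


def additive_value(x, edges, component):
    """Evaluate a surrogate of the form sum over edges (i, j) of f_ij(x_i, x_j)."""
    return sum(component(x[i], x[j]) for i, j in edges)


def is_tree(d, edges):
    """Check that the edges form a connected acyclic graph on d nodes."""
    parent = list(range(d))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # adding this edge would close a cycle
        parent[ri] = rj
    return len(edges) == d - 1


rng = random.Random(0)
edges = sample_random_tree(6, rng)
value = additive_value([0.1 * k for k in range(6)], edges, lambda a, b: a * b)
```

Since the sampler never looks at observed function values, it cannot be misled by decompositions that only hold locally, which is the failure mode of data-driven learners noted above.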
Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. There is a lack of empirical data about the relative performance of prevailing statistical models when outcomes are zero-inflated, particularly compared with recently developed approaches. Methods: The current simulation study examined five commonly used analytical approaches for count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the effects on the overall mean while adjusting for zero-inflation. Extensive simulations were conducted to evaluate their statistical power and Type I error rate across various data conditions. Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in more false positive results than expected. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, the negative binomial model, and the ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory. When an effect existed on only one of the zero (vs. non-zero) part and the count part, the ZIP model had the highest statistical power. Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. The MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
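The distinction between the zero part, the count part, and the overall mean can be illustrated with a small simulation. The sketch below assumes the usual ZIP mixture (a structural zero with probability $\pi$, otherwise a Poisson($\lambda$) count), for which the overall mean is $(1-\pi)\lambda$; the parameter values, sample size, and function names are made up for illustration and are not the study's simulation design.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_zip(pi, lam, rng):
    """Structural zero with probability pi, otherwise a Poisson(lam) count."""
    return 0 if rng.random() < pi else sample_poisson(lam, rng)


def zip_overall_mean(pi, lam):
    """Marginal mean of a ZIP outcome: (1 - pi) * lam."""
    return (1.0 - pi) * lam


def zip_zero_probability(pi, lam):
    """P(Y = 0) = pi + (1 - pi) * exp(-lam)."""
    return pi + (1.0 - pi) * math.exp(-lam)


# Illustrative parameter values (assumed, not from the study).
rng = random.Random(1)
pi, lam, n = 0.4, 3.0, 20000
draws = [sample_zip(pi, lam, rng) for _ in range(n)]
sample_mean = sum(draws) / n
sample_zero_share = sum(1 for y in draws if y == 0) / n
```

An intervention can change $\pi$, $\lambda$, or both; a ZIP model reports these two effects separately, while a model for the overall mean summarizes their combined effect on $(1-\pi)\lambda$.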
When an exposure of interest is confounded by unmeasured factors, an instrumental variable (IV) can be used to identify and estimate certain causal contrasts. Identification of the marginal average treatment effect (ATE) from IVs typically relies on strong untestable structural assumptions. When one is unwilling to assert such structural assumptions, IVs can nonetheless be used to construct bounds on the ATE. Famously, Balke and Pearl (1997) employed linear programming techniques to prove tight bounds on the ATE for a binary outcome, in a randomized trial with noncompliance and no covariate information. We demonstrate how these bounds remain useful in observational settings with baseline confounders of the IV, as well as randomized trials with measured baseline covariates. The resulting lower and upper bounds on the ATE are non-smooth functionals, and thus standard nonparametric efficiency theory is not immediately applicable. To remedy this, we propose (1) estimators of smooth approximations of these bounds, and (2) under a novel margin condition, influence function-based estimators of the ATE bounds that can attain parametric convergence rates when the nuisance functions are modeled flexibly. We propose extensions to continuous outcomes, and finally, illustrate the proposed estimators in a randomized experiment studying the effects of influenza vaccination encouragement on flu-related hospital visits.
The multivariate coefficient of variation (MCV) is an attractive and easy-to-interpret effect size for the dispersion in multivariate data. Recently, the first inference methods for the MCV were proposed by Ditzhaus and Smaga (2022) for general factorial designs covering k-sample settings but also complex higher-way layouts. However, two questions are still pending: (1) The theory on inference methods for the MCV is primarily derived for one special MCV variant, while there are several reasonable proposals. (2) When rejecting a global null hypothesis in factorial designs, a more in-depth analysis is typically of high interest to find the specific contrasts of the MCV leading to the aforementioned rejection. In this paper, we tackle both by, first, extending the aforementioned nonparametric permutation procedure to the other MCV variants and, second, proposing a max-type test for post hoc analysis. To improve the small sample performance of the latter, we suggest a novel studentized bootstrap strategy and prove its asymptotic validity. The actual performance of all proposed tests and post hoc procedures is compared in an extensive simulation study and illustrated by a real data analysis.
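As an illustration of how MCV variants differ, the sketch below computes two proposals from the literature that need neither a matrix inverse nor a determinant: the variant usually attributed to Van Valen, $\sqrt{\operatorname{tr}\Sigma / \mu^T\mu}$, and the one usually attributed to Albert and Zhang, $\sqrt{\mu^T\Sigma\mu / (\mu^T\mu)^2}$. Which variants the paper covers is not stated in the abstract, so this choice, the plug-in sample estimates, and the function names are assumptions.

```python
import math


def mean_vector(data):
    """Componentwise sample mean of a list of equal-length rows."""
    n, p = len(data), len(data[0])
    return [sum(row[j] for row in data) / n for j in range(p)]


def covariance_matrix(data):
    """Unbiased sample covariance (divisor n - 1)."""
    n, p = len(data), len(data[0])
    mu = mean_vector(data)
    return [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in data) / (n - 1)
             for k in range(p)] for j in range(p)]


def mcv_van_valen(data):
    """Plug-in estimate of sqrt(trace(Sigma) / mu'mu)."""
    mu, sigma = mean_vector(data), covariance_matrix(data)
    trace = sum(sigma[j][j] for j in range(len(mu)))
    return math.sqrt(trace / sum(m * m for m in mu))


def mcv_albert_zhang(data):
    """Plug-in estimate of sqrt(mu' Sigma mu / (mu'mu)^2)."""
    mu, sigma = mean_vector(data), covariance_matrix(data)
    p = len(mu)
    quad = sum(mu[j] * sigma[j][k] * mu[k] for j in range(p) for k in range(p))
    return math.sqrt(quad / sum(m * m for m in mu) ** 2)


# Four bivariate observations with mean (2, 3) and covariance (4/3) * I.
data = [(1.0, 2.0), (3.0, 2.0), (1.0, 4.0), (3.0, 4.0)]
vv = mcv_van_valen(data)
az = mcv_albert_zhang(data)
```

For this toy sample the two variants already disagree (they differ by a factor of $\sqrt{2}$ here because the covariance is spherical in two dimensions), which is why inference derived for one variant does not automatically carry over to the others.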
We consider the problem of estimating the optimal transport map between two probability distributions, $P$ and $Q$ in $\mathbb R^d$, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution $Q$ is a discrete measure supported on a finite number of points in $\mathbb R^d$. We study a computationally efficient estimator initially proposed by Pooladian and Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate $n^{-1/2}$, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments for other settings, not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems.
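A minimal sketch of an entropic map estimate in the semi-discrete setting follows. It assumes the squared-distance cost $\tfrac12\|x-y\|^2$, computes the entropic dual potential on the atoms of $Q$ with log-domain Sinkhorn iterations, and evaluates the map as the resulting softmax-weighted average of the atoms. The regularization level, iteration count, toy data, and function names are illustrative assumptions, not the settings analysed in the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sq_cost(x, y):
    """Cost c(x, y) = 0.5 * ||x - y||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_target_potential(xs, ys, weights, eps, iters=200):
    """Log-domain Sinkhorn; returns the dual potential g on the atoms ys."""
    n = len(xs)
    log_a = math.log(1.0 / n)  # uniform weights on the source samples
    log_b = [math.log(w) for w in weights]
    cost = [[sq_cost(x, y) for y in ys] for x in xs]
    g = [0.0] * len(ys)
    for _ in range(iters):
        f = [-eps * logsumexp([log_b[j] + (g[j] - cost[i][j]) / eps
                               for j in range(len(ys))]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps
                               for i in range(n)]) for j in range(len(ys))]
    return g


def entropic_map(x, ys, weights, g, eps):
    """Barycentric projection: softmax-weighted average of the atoms."""
    logits = [math.log(w) + (g[j] - sq_cost(x, ys[j])) / eps
              for j, w in enumerate(weights)]
    m = max(logits)
    probs = [math.exp(v - m) for v in logits]
    total = sum(probs)
    return [sum(p * y[k] for p, y in zip(probs, ys)) / total
            for k in range(len(ys[0]))]


# Toy problem (assumed): source points on a symmetric grid in [-1, 1],
# target measure with two equally weighted atoms at -1 and +1.
xs = [[-1.0 + 0.1 * i] for i in range(21)]
ys = [[-1.0], [1.0]]
weights = [0.5, 0.5]
eps = 0.05
g = sinkhorn_target_potential(xs, ys, weights, eps)
right = entropic_map([0.5], ys, weights, g, eps)
left = entropic_map([-0.5], ys, weights, g, eps)
```

In this toy problem the true transport map jumps from $-1$ to $+1$ at the origin; the entropic estimate is a smooth surrogate that sharpens towards this discontinuous map as `eps` decreases.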
The geometric optimisation of crystal structures is a procedure widely used in Chemistry that changes the geometrical placement of the particles inside a structure. It is called structural relaxation and constitutes a local minimization problem with a non-convex objective function whose domain complexity increases along with the number of particles involved. In this work we study the performance of the two most popular first order optimisation methods, Gradient Descent and Conjugate Gradient, in structural relaxation. The respective pseudocodes can be found in Section 6. Although these methods are frequently employed, they have rarely been studied in this context from an algorithmic point of view. In order to accurately define the problem, we provide a thorough derivation of all necessary formulae related to the crystal structure energy function and the function's differentiation. We run each algorithm in combination with a constant step size, which provides a benchmark for the methods' analysis and direct comparison. We also design dynamic step size rules and study how these improve the two algorithms' performance. Our results show that there is a trade-off between the convergence rate and the chance that an experiment succeeds; hence we construct a function that assigns a utility to each method based on our respective preference. The function is built according to a recently introduced model of preference indication concerning algorithms with deadlines and their run time. Finally, building on all our insights from the experimental results, we provide algorithmic recipes that best correspond to each of the presented preferences and select one recipe as the optimal for equally weighted preferences.
We propose sparse principal loading analysis, a new concept that reduces the dimensionality of cross-sectional data and identifies the underlying covariance structure. Sparse principal loading analysis selects a subset of existing variables for dimensionality reduction, while variables that have a small distorting effect on the covariance matrix are discarded. We therefore show how to detect these variables and provide methods to assess the magnitude of their distortion. Sparse principal loading analysis is twofold and can also identify the underlying block diagonal covariance structure using sparse loadings. This is a new approach in this context, and we provide the criterion required to evaluate whether the block structure found fits the sample. The method uses sparse loadings rather than eigenvectors to decompose the covariance matrix, which can result in a large loss of information if the chosen loadings are too sparse. However, we show that this is no concern in our new concept because sparseness is controlled by the aforementioned evaluation criterion. Further, we show the advantages of sparse principal loading analysis in the context of both variable selection and covariance structure detection, and illustrate the performance of the method with simulations and on real datasets. Supplementary material for this article is available online.
We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models.
Hyperdimensional computing (HDC) uses binary vectors of high dimensions to perform classification. Due to its simplicity and massive parallelism, HDC can be highly energy-efficient and well-suited for resource-constrained platforms. However, in trading off orthogonality with efficiency, hypervectors may use tens of thousands of dimensions. In this paper, we will examine the necessity for such high dimensions. In particular, we give a detailed theoretical analysis of the relationship among dimensions of hypervectors, accuracy, and orthogonality. The main conclusion of this study is that a much lower dimension, typically less than 100, can also achieve similar or even higher detecting accuracy compared with other state-of-the-art HDC models. Based on this insight, we propose a suite of novel techniques to build HDC models that use binary hypervectors of dimensions that are orders of magnitude smaller than those found in the state-of-the-art HDC models, yet yield equivalent or even improved accuracy and efficiency. For image classification, we achieved an HDC accuracy of 96.88\% with a dimension of only 32 on the MNIST dataset. We further explore our methods on more complex datasets like CIFAR-10 and show the limits of HDC computing.
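The basic HDC operations on binary hypervectors can be sketched as follows. This is a generic illustration using XOR binding, majority-vote bundling, and nearest-prototype classification by Hamming distance; it is not the low-dimensional technique proposed in the paper. The dimension of 32 echoes the figure quoted above, while the noise rate, the two synthetic classes, and all function names are assumptions.

```python
import random


def random_hypervector(dim, rng):
    """Draw a binary hypervector with independent fair bits."""
    return [rng.randint(0, 1) for _ in range(dim)]


def bind(a, b):
    """Bind two binary hypervectors with elementwise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundle by bitwise majority vote (ties resolve to 0)."""
    n = len(vectors)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*vectors)]


def hamming(a, b):
    """Number of positions in which two hypervectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Hamming distance."""
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))


def flip_bits(vector, rate, rng):
    """Flip each bit independently with the given probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in vector]


# Two synthetic classes (assumed): noisy copies of a random pattern and of
# its complement, bundled into one prototype per class.
rng = random.Random(0)
dim = 32
centers = {"a": random_hypervector(dim, rng)}
centers["b"] = [1 - bit for bit in centers["a"]]  # far from "a" by construction
prototypes = {label: bundle([flip_bits(c, 0.1, rng) for _ in range(25)])
              for label, c in centers.items()}
query = flip_bits(centers["a"], 0.1, rng)
label = classify(query, prototypes)
```

With well-separated prototypes such as these, even 32 dimensions leave a wide Hamming margin; the difficulty on real data is that class prototypes are far less orthogonal, which is the trade-off between dimension, orthogonality, and accuracy that the analysis addresses.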