Diffusion-weighted magnetic resonance imaging (D-MRI) is an in-vivo and non-invasive imaging technology to probe anatomical architectures of biological samples. The anatomy of white matter fiber tracts in the brain can be revealed to help understanding of the connectivity patterns among different brain regions. In this paper, we propose a novel Nearest-neighbor Adaptive Regression Model (NARM) for adaptive estimation of the fiber orientation distribution (FOD) function based on D-MRI data, where spatial homogeneity is used to improve FOD estimation by incorporating neighborhood information. Specifically, we formulate the FOD estimation problem as a weighted linear regression problem, where the weights are chosen to account for spatial proximity and potential heterogeneity due to different fiber configurations. The weights are adaptively updated and a stopping rule based on nearest neighbor distance is designed to prevent over-smoothing. NARM is further extended to accommodate D-MRI data with multiple bvalues. Comprehensive simulation results demonstrate that NARM leads to satisfactory FOD reconstructions and performs better than voxel-wise estimation as well as competing smoothing methods. By applying NARM to real 3T D-MRI datasets, we demonstrate the effectiveness of NARM in recovering more realistic crossing fiber patterns and producing more coherent fiber tracking results, establishing the practical value of NARM for analyzing D-MRI data and providing reliable information on brain structural connectivity. * Jilei Yang and Seungyong Hwang are co-first authors
We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[-1,1]$ that cannot be approximated to accuracy $\epsilon$ in Wasserstein-1 distance even if we know \emph{all} of their moments to multiplicative accuracy $(1\pm2^{-\Omega(1/\epsilon)})$; this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein-1 distance is a well-studied algorithmic problem, and a recent result of Cohen-Steiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph. As a strengthening of our main result, we show that improving the dependence on $1/\epsilon$ in this result would require a new algorithmic approach. Specifically, no algorithm can compute an $\epsilon$-accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of $2^{\Omega(1/\epsilon)}$ random walks of length $2^{\Omega(1/\epsilon)}$ started at random nodes.
The negative multinomial distribution appears in many areas of applications such as polarimetric image processing and the analysis of longitudinal count data. In previous studies, Mosimann (1963) derived general formulas for the falling factorial moments of the negative multinomial distribution, while Withers & Nadarajah (2014) obtained expressions for the cumulants. Despite the availability of the moment generating function, no comprehensive formulas for the moments have been calculated thus far. This paper addresses this gap by presenting general formulas for both central and non-central moments of the negative multinomial distribution. These formulas are expressed in terms of binomial coefficients and Stirling numbers of the second kind. Utilizing these formulas, we provide explicit expressions for all central moments up to the 4th order and all non-central moments up to the 8th order.
Consistent weighted least square estimators are proposed for a wide class of nonparametric regression models with random regression function, where this real-valued random function of $k$ arguments is assumed to be continuous with probability 1. We obtain explicit upper bounds for the rate of uniform convergence in probability of the new estimators to the unobservable random regression function for both fixed or random designs. In contrast to the predecessors' results, the bounds for the convergence are insensitive to the correlation structure of the $k$-variate design points. As an application, we study the problem of estimating the mean and covariance functions of random fields with additive noise under dense data conditions. The theoretical results of the study are illustrated by simulation examples which show that the new estimators are more accurate in some cases than the Nadaraya--Watson ones. An example of processing real data on earthquakes in Japan in 2012--2021 is included.
Cross-validation is a widely used technique for evaluating the performance of prediction models. It helps avoid the optimism bias in error estimates, which can be significant for models built using complex statistical learning algorithms. However, since the cross-validation estimate is a random value dependent on observed data, it is essential to accurately quantify the uncertainty associated with the estimate. This is especially important when comparing the performance of two models using cross-validation, as one must determine whether differences in error estimates are a result of chance fluctuations. Although various methods have been developed for making inferences on cross-validation estimates, they often have many limitations, such as stringent model assumptions This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate and produces valid confidence intervals for a population parameter measuring average model performance. Our method overcomes the computational challenge inherent in bootstrapping the cross-validation estimate by estimating the variance component within a random effects model. It is just as flexible as the cross-validation procedure itself. To showcase the effectiveness of our approach, we employ comprehensive simulations and real data analysis across three diverse applications.
We investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared error loss. While similar approaches have been explored before, our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm that is flexible enough to allow for structural zeros via entry weights. Moreover, by adapting results from GLM theory, we provide support for these decompositions by (i) showing strong consistency under the GLM setup, (ii) checking the adequacy of a chosen exponential family via a generalized Hosmer-Lemeshow test, and (iii) determining the rank of the decomposition via a maximum eigenvalue gap method. To further support our findings, we conduct simulation studies to assess robustness to decomposition assumptions and extensive case studies using benchmark datasets from image face recognition, natural language processing, network analysis, and biomedical studies. Our theoretical and empirical results indicate that the proposed decomposition is more flexible, general, and robust, and can thus provide improved performance when compared to similar methods. To facilitate applications, an R package with efficient model fitting and family and rank determination is also provided.
The univariate generalized extreme value (GEV) distribution is the most commonly used tool for analyzing the properties of rare events. The ever greater utilization of Bayesian methods for extreme value analysis warrants detailed theoretical investigation, which has thus far been underdeveloped. Even the most basic asymptotic results are difficult to obtain because the GEV fails to satisfy standard regularity conditions. Here, we prove that the posterior distribution of the GEV parameter vector, given $n$ independent and identically distributed samples, converges in distribution to a trivariate normal distribution. The proof necessitates analyzing integrals of the GEV likelihood function over the entire parameter space, which requires considerable care because the support of the GEV density depends on the parameters in complicated ways.
We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems for which the observational model is based on partial differential equations and, consequently, is computationally expensive to evaluate. A-optimality is a widely used and easy-to-interpret criterion for the Bayesian design of experiments. The criterion seeks the optimal experiment design by minimizing the expected conditional variance, also known as the expected posterior variance. This work presents a novel likelihood-free method for seeking the A-optimal design of experiments without sampling or integrating the Bayesian posterior distribution. In our approach, the expected conditional variance is obtained via the variance of the conditional expectation using the law of total variance, while we take advantage of the orthogonal projection property to approximate the conditional expectation. Through an asymptotic error estimation, we show that the intractability of the posterior does not affect the performance of our approach. We use an artificial neural network (ANN) to approximate the nonlinear conditional expectation to implement our method. For dealing with continuous experimental design parameters, we integrate the training process of the ANN into minimizing the expected conditional variance. Specifically, we propose a non-local approximation of the conditional expectation and apply transfer learning to reduce the number of evaluations of the observation model. Through numerical experiments, we demonstrate that our method significantly reduces the number of observational model evaluations compared with common importance sampling-based approaches. This reduction is crucial, considering the computationally expensive nature of these models.
Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.
We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle the varying visual representations. In this way, we provide high-quality task-related content for facilitating recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. Thus, we can employ a Gaussian distribution to model them effectively and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it relatively improves the average result by 9.1% compared to human-crafted prompts.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.