Gaussian elimination with partial pivoting (GEPP) remains the most common method to solve dense linear systems. Each GEPP step applies a row transposition, when needed, so that the leading pivot entry is maximal in magnitude in the leading column of the remaining untriangularized subsystem. We use theoretical and numerical approaches to study how often this pivot movement is needed. We provide full distributional descriptions of the number of pivot movements needed by GEPP for particular Haar-distributed random ensembles, and we compare these models to other common transformations from randomized numerical linear algebra. Additionally, we introduce new random ensembles with fixed pivot movement counts and fixed sparsity, $\alpha$. Experiments estimating the empirical spectral density (ESD) of these random ensembles lead to a new conjecture on a universality class of random matrices with fixed sparsity whose scaled ESD converges to a measure on the complex unit disk that depends on $\alpha$ and interpolates between the uniform measure on the unit disk and the Dirac measure at the origin.
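To make the pivot-movement count concrete, here is a minimal sketch (not from the paper) that runs GEPP on a copy of a matrix and tallies the steps at which a row transposition is actually performed; the use of NumPy and the standard Gaussian test matrices in the usage lines are illustrative assumptions, not the paper's Haar ensembles.

```python
import numpy as np

def gepp_pivot_movements(A):
    """Run Gaussian elimination with partial pivoting on a copy of A
    and count the steps where a row transposition (pivot movement) is needed."""
    U = np.array(A, dtype=float, copy=True)
    n = U.shape[0]
    movements = 0
    for k in range(n - 1):
        # index of the entry of maximal magnitude in the leading column
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], :] = U[[p, k], :]   # row transposition
            movements += 1
        # eliminate below the pivot (multipliers stored in the strictly lower part)
        U[k+1:, k] /= U[k, k]
        U[k+1:, k+1:] -= np.outer(U[k+1:, k], U[k, k+1:])
    return movements

# illustrative empirical distribution of pivot-movement counts
rng = np.random.default_rng(0)
counts = [gepp_pivot_movements(rng.standard_normal((50, 50))) for _ in range(200)]
```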
Efficient and accurate estimation of multivariate empirical probability distributions is fundamental to the calculation of information-theoretic measures such as mutual information and transfer entropy. Common techniques include variations on histogram estimation, which, whilst computationally efficient, are often unable to precisely capture the probability density of samples with high correlation, kurtosis or fine substructure, especially when sample sizes are small. Adaptive partitions, which adjust heuristically to the sample, can reduce the bias imparted by the geometry of the histogram itself, but these have commonly focused on the location, scale and granularity of the partition, the effects of which are limited for highly correlated distributions. In this paper, I reformulate the differential entropy estimator for the special case of an equiprobable histogram, using a k-d tree to partition the sample space into bins of equal probability mass. By doing so, I expose an implicit rotational orientation parameter, which is conjectured to be suboptimally specified in the typical marginal alignment. I propose that the optimal orientation minimises the variance of the bin volumes, and demonstrate that improved entropy estimates can be obtained by rotationally aligning the partition to the sample distribution accordingly. Such optimal partitions are observed to be more accurate than existing techniques in estimating entropies of correlated bivariate Gaussian distributions with known theoretical values, across varying sample sizes (99% CI).
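As a hedged illustration of the equiprobable k-d tree reformulation, the sketch below performs only the partition-and-estimate step; the rotational-alignment search proposed in the paper is omitted, and the use of the sample bounding box as the support is an assumption of this sketch.

```python
import numpy as np

def equiprobable_kd_entropy(X, min_pts=32):
    """Differential entropy estimate from an equiprobable k-d tree histogram:
    recursive median splits along alternating axes give bins of (roughly)
    equal probability mass; the estimate is sum_b p_b * log(V_b / p_b)."""
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    terms = []

    def split(pts, lo, hi, axis):
        n = len(pts)
        if n == 0:
            return
        if n <= min_pts:
            vol = np.prod(hi - lo)
            p = n / N                      # ~ 1/m for an equiprobable partition
            terms.append(p * np.log(vol / p))
            return
        med = np.median(pts[:, axis])
        hi_l, lo_r = hi.copy(), lo.copy()
        hi_l[axis], lo_r[axis] = med, med
        split(pts[pts[:, axis] <= med], lo, hi_l, (axis + 1) % d)
        split(pts[pts[:, axis] > med], lo_r, hi, (axis + 1) % d)

    # sample bounding box used as the support estimate (an assumption)
    split(X, X.min(axis=0), X.max(axis=0), axis=0)
    return float(sum(terms))
```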
We present a simple method to approximate Rao's distance between multivariate normal distributions, based on discretizing curves joining normal distributions and approximating Rao's distance between successive nearby normal distributions on the curves by the square root of Jeffreys divergence, the symmetrized Kullback-Leibler divergence. We consider experimentally the linear interpolation curves in the ordinary, natural and expectation parameterizations of the normal distributions, and compare these curves with a curve derived from Calvo and Oller's isometric embedding of the Fisher-Rao $d$-variate normal manifold into the cone of $(d+1)\times (d+1)$ symmetric positive-definite matrices [Journal of Multivariate Analysis 35.2 (1990): 223-242]. We report on our experiments and assess the quality of our approximation technique by comparing the numerical approximations with both lower and upper bounds. Finally, we present several information-geometric properties of Calvo and Oller's isometric embedding.
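A minimal sketch of the described approximation along the linear interpolation curve in the ordinary parameterization $(\mu,\Sigma)$: the closed-form Jeffreys divergence between normals is standard, while the discretization size T is an illustrative assumption.

```python
import numpy as np

def jeffreys(mu0, S0, mu1, S1):
    """Jeffreys divergence (symmetrized KL) between two multivariate normals."""
    d = len(mu0)
    S0i, S1i = np.linalg.inv(S0), np.linalg.inv(S1)
    dm = mu1 - mu0
    return 0.5 * (np.trace(S1i @ S0) + np.trace(S0i @ S1) - 2 * d
                  + dm @ (S0i + S1i) @ dm)

def rao_distance_ordinary(mu0, S0, mu1, S1, T=1000):
    """Approximate Rao's distance by discretizing the linear interpolation
    curve in the ordinary parameterization into T segments and summing
    sqrt(Jeffreys divergence) over successive nearby normals."""
    ts = np.linspace(0.0, 1.0, T + 1)
    total = 0.0
    for a, b in zip(ts[:-1], ts[1:]):
        ma, Sa = (1 - a) * mu0 + a * mu1, (1 - a) * S0 + a * S1
        mb, Sb = (1 - b) * mu0 + b * mu1, (1 - b) * S0 + b * S1
        total += np.sqrt(jeffreys(ma, Sa, mb, Sb))
    return total
```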
We review Quasi Maximum Likelihood estimation of factor models for high-dimensional panels of time series. We consider two cases: (1) estimation when no dynamic model for the factors is specified (Bai and Li, 2016); (2) estimation based on the Kalman smoother and the Expectation Maximization algorithm, which allows the factor dynamics to be modeled explicitly (Doz et al., 2012). Our interest is in approximate factor models, i.e., when we allow the idiosyncratic components to be mildly cross-sectionally, as well as serially, correlated. Although such a setting apparently makes estimation harder, we show that, in fact, factor models do not suffer from the curse of dimensionality but instead enjoy a blessing of dimensionality property. In particular, we show that if the cross-sectional dimension of the data, $N$, grows to infinity, then: (i) identification of the model is still possible; (ii) the mis-specification error due to the use of an exact factor model log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to infinity, we can consistently estimate all parameters of the model and make inference. The same is true for estimation of the latent factors, which can be carried out by weighted least-squares, linear projection, or Kalman filtering/smoothing. We also compare the approaches presented with Principal Component Analysis and the classical, fixed-$N$, exact Maximum Likelihood approach. We conclude with a discussion of the efficiency of the considered estimators.
This paper develops fast and efficient algorithms for computing the Tucker decomposition with a given multilinear rank. By combining random projection with the power scheme, we propose two efficient randomized versions of the truncated higher-order singular value decomposition (T-HOSVD) and the sequentially truncated HOSVD (ST-HOSVD), which are two common algorithms for approximating the Tucker decomposition. To further reduce their complexity, fast variants are designed by combining these two randomized algorithms with approximate matrix multiplication. Theoretical results are derived from bounds on the singular values of standard Gaussian matrices and from existing results on approximate matrix multiplication. Finally, the efficiency of these algorithms is illustrated on test tensors from synthetic and real datasets.
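A minimal sketch of one ingredient, a randomized ST-HOSVD that combines a Gaussian random projection with the power scheme; the oversampling p, the number of power iterations q, and the helper names are illustrative assumptions, and the approximate-matrix-multiplication acceleration is omitted.

```python
import numpy as np

def randomized_range(A, r, p=5, q=1, rng=None):
    """Randomized range finder with oversampling p and q power iterations."""
    rng = np.random.default_rng() if rng is None else rng
    Y = A @ rng.standard_normal((A.shape[1], r + p))
    for _ in range(q):                      # power scheme for slowly decaying spectra
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    return Q

def unfold(T, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def randomized_st_hosvd(T, ranks, rng=None):
    """Sketch of a randomized ST-HOSVD: truncate one mode at a time,
    shrinking the core before processing the next mode."""
    G = np.asarray(T, dtype=float)
    factors = []
    for mode, r in enumerate(ranks):
        Q = randomized_range(unfold(G, mode), r, rng=rng)
        U, _, _ = np.linalg.svd(Q.T @ unfold(G, mode), full_matrices=False)
        Un = Q @ U[:, :r]                   # mode-n factor with exactly r columns
        factors.append(Un)
        # contract the current mode: G <- G x_mode Un^T
        G = np.moveaxis(np.tensordot(Un.T, G, axes=(1, mode)), 0, mode)
    return G, factors                       # core tensor and factor matrices
```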
We present an extension of the linear sampling method for solving the sound-soft inverse acoustic scattering problem with randomly distributed point sources. The theoretical justification of our sampling method is based on the Helmholtz--Kirchhoff identity, the cross-correlation between measurements, and the volume and imaginary near-field operators, which we introduce and analyze. Implementations in MATLAB using boundary elements, the SVD, Tikhonov regularization, and Morozov's discrepancy principle are also discussed. We demonstrate the robustness and accuracy of our algorithms with several numerical experiments in two dimensions.
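The paper's implementation is in MATLAB with boundary elements; as a hedged illustration of just the regularization ingredient, the following sketch computes a Tikhonov-regularized solution via the SVD with the parameter chosen by Morozov's discrepancy principle (the bracketing interval passed to the root finder is an assumption).

```python
import numpy as np
from scipy.optimize import brentq

def tikhonov_morozov(A, b, delta):
    """Tikhonov-regularized solution of A x = b via the SVD, with the
    regularization parameter alpha chosen by Morozov's discrepancy
    principle: ||A x_alpha - b|| = delta (the noise level)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out_of_range = np.linalg.norm(b - U @ beta)  # component of b outside range(A)

    def discrepancy(alpha):
        res = (alpha / (s**2 + alpha)) * beta    # filtered residual coefficients
        return np.sqrt(np.sum(res**2) + out_of_range**2) - delta

    alpha = brentq(discrepancy, 1e-14, 1e14)     # assumes the bracket contains the root
    x = Vt.T @ ((s / (s**2 + alpha)) * beta)
    return x, alpha
```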
In this article, we propose a two-sample test for functional observations modeled as elements of a separable Hilbert space. We present a general recipe for constructing a measure of dissimilarity between the distributions of two Hilbertian random variables and study the theoretical properties of one such measure which is constructed using Maximum Mean Discrepancy (MMD) on random linear projections of the distributions and aggregating them. We propose a data-driven estimate of this measure and use it as the test statistic. Large-sample distributions of this statistic are derived under both the null and the alternative hypotheses. This test statistic involves a kernel function and the associated bandwidth. We prove that the resulting test has large-sample consistency for any data-driven choice of bandwidth that converges in probability to a positive number. Since the theoretical quantiles of the limiting null distribution are intractable, in practice, the test is calibrated using the permutation method. We also derive the limiting distribution of the permuted test statistic and the asymptotic power of the permutation test under local contiguous alternatives. This shows that the permutation test is consistent and statistically efficient in the Pitman sense. Extensive simulation studies are carried out and a real data set is analyzed to compare the performance of our proposed test with some state-of-the-art methods.
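A hedged sketch of the flavor of such a test (Gaussian kernel, MMD averaged over random projections, permutation calibration); the number of projections, the bandwidth, and the aggregation by averaging are illustrative assumptions, and the functional observations are represented here by rows of basis coefficients.

```python
import numpy as np

def mmd2(x, y, h):
    """Biased squared MMD between 1-d samples x, y with a Gaussian kernel of bandwidth h."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * h**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def projected_mmd(X, Y, dirs, h):
    """Aggregate (average) the squared MMD over random projection directions."""
    return np.mean([mmd2(X @ u, Y @ u, h) for u in dirs])

def permutation_test(X, Y, n_proj=50, h=1.0, n_perm=500, seed=0):
    """Two-sample test: projected-MMD statistic calibrated by permuting the pooled sample."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # directions held fixed across permutations
    obs = projected_mmd(X, Y, dirs, h)
    Z, n = np.vstack([X, Y]), len(X)
    perm = []
    for _ in range(n_perm):
        Zp = rng.permutation(Z)
        perm.append(projected_mmd(Zp[:n], Zp[n:], dirs, h))
    pval = (1 + np.sum(np.array(perm) >= obs)) / (n_perm + 1)
    return obs, pval
```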
We consider the truncated multivariate normal distributions for which every component is one-sided truncated. We show that this family of distributions is an exponential family. We identify $\mathcal{D}$, the corresponding natural parameter space, and deduce that the family of distributions is not regular. We prove that the gradient of the cumulant-generating function of the family of distributions remains bounded near certain boundary points in $\mathcal{D}$, and therefore the family also is not steep. We also consider maximum likelihood estimation for $\boldsymbol{\mu}$, the location vector parameter, and $\boldsymbol{\Sigma}$, the positive definite (symmetric) matrix dispersion parameter, of a truncated non-singular multivariate normal distribution. We prove that each solution to the score equations for $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ satisfies the method-of-moments equations, and we obtain a necessary condition for the existence of solutions to the score equations.
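As a hedged illustration of the exponential-family structure (with the componentwise truncation points $\boldsymbol{\tau}$ held fixed, which is an assumption of this sketch rather than a statement of the paper's exact parameterization), the truncated density can be rewritten with natural parameter $\boldsymbol{\theta}$ and sufficient statistics $(\mathbf{x},\mathbf{x}\mathbf{x}^{\top})$:
\[
f(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma}) \propto \exp\!\Big(\big\langle \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu},\,\mathbf{x}\big\rangle + \big\langle -\tfrac{1}{2}\boldsymbol{\Sigma}^{-1},\,\mathbf{x}\mathbf{x}^{\top}\big\rangle\Big)\,\mathbf{1}\{\mathbf{x}\ge\boldsymbol{\tau}\},
\qquad
\boldsymbol{\theta}=\big(\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu},\,-\tfrac{1}{2}\boldsymbol{\Sigma}^{-1}\big),
\]
so the cumulant-generating function is the logarithm of the normalizing integral over $\{\mathbf{x}\ge\boldsymbol{\tau}\}$, and its gradient behavior near the boundary of the natural parameter space $\mathcal{D}$ is what governs regularity and steepness.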
Physics-based covariance models provide a systematic way to construct covariance models that are consistent with the underlying physical laws in Gaussian process analysis. The unknown parameters in the covariance models can be estimated using maximum likelihood estimation, but direct construction of the covariance matrix and classical strategies of computing with it require $n$ physical model runs, $n^2$ storage complexity, and $n^3$ computational complexity. To address such challenges, we propose to approximate the discretized covariance function using hierarchical matrices. By utilizing randomized range sketching for the individual off-diagonal blocks, the construction of the hierarchical covariance approximation requires $O(\log{n})$ physical model applications, and the maximum likelihood computations require $O(n\log^2{n})$ effort per iteration. We propose a new approach to compute exactly the trace of products of hierarchical matrices, which makes the expected Fisher information matrix computable in $O(n\log^2{n})$ as well. The construction is entirely matrix-free, and the derivatives of the covariance matrix can be approximated in the same hierarchical structure by differentiating the whole process. Numerical results are provided to demonstrate the effectiveness, accuracy, and efficiency of the proposed method for parameter estimation and uncertainty quantification.
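A hedged sketch of randomized range sketching for a single off-diagonal block using only matrix-vector products with the full covariance operator (each product standing in for a physical model application); the per-block probing shown here ignores the hierarchical sharing of random probes needed for the stated $O(\log n)$ application count, and all names are illustrative.

```python
import numpy as np

def sketch_offdiag_block(apply_cov, rows, cols, n_total, rank, p=8, rng=None):
    """Low-rank factorization C[rows, cols] ~ Q @ B using only matrix-vector
    products with the full (symmetric) covariance operator apply_cov."""
    rng = np.random.default_rng() if rng is None else rng
    # first pass: probe the block from the right with Gaussian vectors on `cols`
    Omega = np.zeros((n_total, rank + p))
    Omega[cols, :] = rng.standard_normal((len(cols), rank + p))
    Y = np.column_stack([apply_cov(Omega[:, j])[rows] for j in range(rank + p)])
    Q, _ = np.linalg.qr(Y)                       # orthonormal range basis of the block
    # second pass: B = Q^T C[rows, cols], obtained via products with C restricted to `rows`
    k = Q.shape[1]
    Psi = np.zeros((n_total, k))
    Psi[rows, :] = Q
    B = np.column_stack([apply_cov(Psi[:, j])[cols] for j in range(k)]).T
    return Q, B
```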
The Weibull distribution, with shape parameter $k>0$ and scale parameter $\lambda>0$, is one of the most popular parametric distributions in survival analysis with complete or censored data. Although inference of the parameters of the Weibull distribution is commonly done through maximum likelihood, it is well established that the maximum likelihood estimate of the shape parameter is inadequate due to the associated large bias when the sample size is small or the proportion of censored data is large. This manuscript demonstrates how the Bayesian information-theoretic minimum message length principle, coupled with a suitable choice of weakly informative prior distributions, can be used to infer Weibull distribution parameters given complete data or data with type I censoring. Empirical experiments show that the proposed minimum message length estimate of the shape parameter is superior to the maximum likelihood estimate and appears superior to other recently proposed modified maximum likelihood estimates in terms of Kullback-Leibler risk. Lastly, we derive an extension of the proposed method to data with type II censoring.
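For reference, a minimal sketch of the maximum likelihood baseline against which the message length estimate is compared: the right-censored Weibull log-likelihood maximized numerically. The optimizer and starting values are illustrative assumptions, and this is not the paper's MML estimator.

```python
import numpy as np
from scipy.optimize import minimize

def weibull_censored_loglik(params, t, delta):
    """Log-likelihood of Weibull(k, lam) under right censoring:
    delta[i] = 1 if t[i] is an observed failure, 0 if censored at t[i]."""
    k, lam = params
    if k <= 0 or lam <= 0:
        return -np.inf
    z = t / lam
    return np.sum(delta * (np.log(k) - np.log(lam) + (k - 1) * np.log(z)) - z**k)

def weibull_mle(t, delta):
    """Numerical MLE of (shape k, scale lam); known to be biased for small samples."""
    neg = lambda p: -weibull_censored_loglik(p, t, delta)
    res = minimize(neg, x0=[1.0, float(np.mean(t))], method="Nelder-Mead")
    return res.x
```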
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
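A minimal sketch of turning the effective number of samples $(1-\beta^{n})/(1-\beta)$ into per-class loss weights; the normalization so that the weights sum to the number of classes is an assumption of this sketch.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to the inverse effective number of samples
    E_n = (1 - beta**n) / (1 - beta), normalized to sum to the number of classes."""
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(samples_per_class) / weights.sum()

# e.g., long-tailed class counts: rare classes receive larger weights in the loss
w = class_balanced_weights(np.array([5000, 500, 50, 5]), beta=0.999)
```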