Sufficient conditions are provided under which the log-likelihood ratio test statistic fails to have a limiting chi-squared distribution under the null hypothesis when testing between one and two components under a general two-component mixture model, but rather tends to infinity in probability. These conditions are verified when the component densities describe continuous-time, discrete-statespace Markov chains and the results are illustrated via a parametric bootstrap simulation on an analysis of the migrations over time of a set of corporate bonds ratings. The precise limiting distribution is derived in a simple case with two states, one of which is absorbing which leads to a right-censored exponential scale mixture model. In that case, when centred by a function growing logarithmically in the sample size, the statistic has a limiting distribution of Gumbel extreme-value type rather than chi-squared.
This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycenteric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm -- determined by the smoothness of the underlying measures and their dimensionality -- thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.
One of the main features of interest in analysing the light curves of stars is the underlying periodic behaviour. The corresponding observations are a complex type of time series with unequally spaced time points and are sometimes accompanied by varying measures of accuracy. The main tools for analysing these type of data rely on the periodogram-like functions, constructed with a desired feature so that the peaks indicate the presence of a potential period. In this paper, we explore a particular periodogram for the irregularly observed time series data, similar to Thieler et. al. (2013). We identify the potential periods at the appropriate peaks and more importantly with a quantifiable uncertainty. Our approach is shown to easily generalise to non-parametric methods including a weighted Gaussian process regression periodogram. We also extend this approach to correlated background noise. The proposed method for period detection relies on a test based on quadratic forms with normally distributed components. We implement the saddlepoint approximation, as a faster and more accurate alternative to the simulation-based methods that are currently used. The power analysis of the testing methodology is reported together with applications using light curves from the Hunting Outbursting Young Stars citizen science project.
This paper presents DRE-CUSUM, an unsupervised density-ratio estimation (DRE) based approach to determine statistical changes in time-series data when no knowledge of the pre-and post-change distributions are available. The core idea behind the proposed approach is to split the time-series at an arbitrary point and estimate the ratio of densities of distribution (using a parametric model such as a neural network) before and after the split point. The DRE-CUSUM change detection statistic is then derived from the cumulative sum (CUSUM) of the logarithm of the estimated density ratio. We present a theoretical justification as well as accuracy guarantees which show that the proposed statistic can reliably detect statistical changes, irrespective of the split point. While there have been prior works on using density ratio based methods for change detection, to the best of our knowledge, this is the first unsupervised change detection approach with a theoretical justification and accuracy guarantees. The simplicity of the proposed framework makes it readily applicable in various practical settings (including high-dimensional time-series data); we also discuss generalizations for online change detection. We experimentally show the superiority of DRE-CUSUM using both synthetic and real-world datasets over existing state-of-the-art unsupervised algorithms (such as Bayesian online change detection, its variants as well as several other heuristic methods).
We introduce a novel, probabilistic binary latent variable model to detect noisy or approximate repeats of patterns in sparse binary data. The model is based on the "Noisy-OR model" (Heckerman, 1990), used previously for disease and topic modelling. The model's capability is demonstrated by extracting structure in recordings from retinal neurons, but it can be widely applied to discover and model latent structure in noisy binary data. In the context of spiking neural data, the task is to "explain" spikes of individual neurons in terms of groups of neurons, "Cell Assemblies" (CAs), that often fire together, due to mutual interactions or other causes. The model infers sparse activity in a set of binary latent variables, each describing the activity of a cell assembly. When the latent variable of a cell assembly is active, it reduces the probabilities of neurons belonging to this assembly to be inactive. The conditional probability kernels of the latent components are learned from the data in an expectation maximization scheme, involving inference of latent states and parameter adjustments to the model. We thoroughly validate the model on synthesized spike trains constructed to statistically resemble recorded retinal responses to white noise stimulus and natural movie stimulus in data. We also apply our model to spiking responses recorded in retinal ganglion cells (RGCs) during stimulation with a movie and discuss the found structure.
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular $\mathsf{f}$-divergences -- Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above with appropriate NN growth-rate are minimax rate-optimal, achieving the parametric convergence rate.
Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art object (SOTA) detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy against the common approach of using a general threshold for all classes. We provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets in //github.com/cancam/LRP. Our source code can easily be adapted to other datasets as well.
We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.