We introduce Gaussian orthogonal latent factor processes for modeling and predicting large correlated data. To address the computational challenge, we first decompose the likelihood function of the Gaussian random field with a multi-dimensional input domain into a product of densities of the orthogonal components with lower-dimensional inputs. The continuous-time Kalman filter is implemented to compute the likelihood function efficiently without making approximations. We also show that the factor processes are independent in the posterior, a consequence of the prior independence of the factor processes and the orthogonality of the factor loading matrix. For studies with large sample sizes, we propose a flexible way to model the mean, and we derive the marginal posterior distribution to resolve identifiability issues when sampling these parameters. Both simulated and real data applications confirm the outstanding performance of this method.
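As a toy illustration of the filtering idea, the likelihood of a Gaussian process with a one-dimensional exponential (Ornstein-Uhlenbeck) kernel can be computed in O(n) via its Markov state-space representation instead of an O(n^3) Cholesky factorization. A minimal numpy sketch under simplifying assumptions (noise-free observations, zero mean; the function name and signature are illustrative, not the paper's implementation):

```python
import numpy as np

def ou_loglik(t, y, sigma2, ell):
    """O(n) log-likelihood of a zero-mean GP with exponential kernel
    sigma2 * exp(-|t - t'| / ell), observed without noise, using the
    Markov property: y_k | y_{k-1} ~ N(phi * y_{k-1}, sigma2 * (1 - phi^2))
    with phi = exp(-dt / ell). A sketch of the filtering idea only."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    # initial state: y_0 ~ N(0, sigma2)
    ll = -0.5 * (np.log(2 * np.pi * sigma2) + y[0] ** 2 / sigma2)
    phi = np.exp(-np.diff(t) / ell)        # one-step transition coefficients
    v = sigma2 * (1 - phi ** 2)            # one-step conditional variances
    r = y[1:] - phi * y[:-1]               # one-step innovations
    ll += -0.5 * np.sum(np.log(2 * np.pi * v) + r ** 2 / v)
    return ll
```

The result agrees with the direct multivariate normal log-density built from the full kernel matrix, which is the point of the state-space route: the same exact likelihood at linear rather than cubic cost.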
Crimes emerge out of complex interactions of human behaviors and situations, and linkages between crime incidents are highly complex. Detecting crime linkage from a set of incidents is a challenging task, since only limited information is available, including text descriptions, incident times, and locations, and in practice there are very few labels. We propose a new statistical modeling framework for {\it spatio-temporal-textual} data and demonstrate its usage on crime linkage detection. We capture linkages of crime incidents via multivariate marked spatio-temporal Hawkes processes and treat embedding vectors of the free text as {\it marks} of the incidents, inspired by the notion of {\it modus operandi} (M.O.) in crime analysis. Numerical results using real data demonstrate the good performance of our method and reveal interesting patterns in the crime data: the joint modeling of space, time, and text information enhances crime linkage detection compared with the state-of-the-art, and the spatial dependence learned from data can be useful for police operations.
Conditional Neural Processes (CNPs; Garnelo et al., 2018) are an attractive family of meta-learning models which produce well-calibrated predictions, enable fast inference at test time, and are trainable via a simple maximum-likelihood procedure. A limitation of CNPs is their inability to model dependencies in the outputs. This significantly hurts predictive performance and renders it impossible to draw coherent function samples, which limits the applicability of CNPs in downstream applications and decision making. Neural Processes (NPs; Garnelo et al., 2018) attempt to alleviate this issue by using latent variables, relying on these to model output dependencies, but this introduces difficulties stemming from approximate inference. One recent alternative (Bruinsma et al., 2021), which we refer to as the FullConvGNP, models dependencies in the predictions while still being trainable via exact maximum likelihood. Unfortunately, the FullConvGNP relies on expensive 2D convolutions, which limit its applicability to only one-dimensional data. In this work, we present an alternative way to model output dependencies which also lends itself to maximum-likelihood training but, unlike the FullConvGNP, can be scaled to two- and three-dimensional data. The proposed models exhibit good performance in synthetic experiments.
Inferring the input parameters of simulators from observations is a crucial challenge with applications from epidemiology to molecular dynamics. Here we present a simple approach for the regime of sparse data and approximately correct models, a common setting when using an existing model to infer latent variables from observed data. This approach is based on the principle of maximum entropy (MaxEnt) and provably makes the smallest change in the latent joint distribution to fit new data. The method requires no likelihood or model derivatives, and its fit is insensitive to prior strength, removing the need to balance observed data fit with prior belief. It requires the ansatz that data are fit in expectation, which is true in some settings and may be reasonable in all settings with few data points. The method is based on sample reweighting, so its asymptotic run time is independent of the dimension of the prior distribution. We demonstrate this MaxEnt approach and compare with other likelihood-free inference methods across three systems: a point particle moving in a gravitational field, a compartmental model of epidemic spread, and finally a molecular dynamics simulation of a protein.
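The sample-reweighting idea can be sketched in a few lines: draw samples from the prior (the simulator), then reweight them with Gibbs-form weights w_i proportional to exp(lambda * g(x_i)) so that the weighted expectation of an observable g matches the observed value. A minimal one-dimensional sketch, assuming a standard normal prior and a grid search over the multiplier (the function and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def maxent_reweight(samples, g, target, lams=np.linspace(-3, 3, 2001)):
    """Reweight prior samples so that the weighted mean of g matches
    `target`, using MaxEnt weights w_i ~ exp(lam * g(x_i)).
    The scalar multiplier `lam` is found by grid search here purely
    for illustration; any 1-D root finder would do."""
    gx = g(np.asarray(samples))
    best = None
    for lam in lams:
        w = np.exp(lam * (gx - gx.max()))   # shift exponent for stability
        w /= w.sum()
        err = abs(w @ gx - target)
        if best is None or err < best[0]:
            best = (err, lam, w)
    return best[1], best[2]

rng = np.random.default_rng(0)
prior_samples = rng.normal(0.0, 1.0, size=5000)   # draws from the prior
lam, w = maxent_reweight(prior_samples, lambda x: x, target=0.5)
print(w @ prior_samples)   # weighted mean matches the observation, ~0.5
```

Note that no likelihood evaluations or model derivatives appear anywhere: only prior samples and the observable, which is the property the abstract highlights.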
Replication studies are increasingly conducted, but there is no established statistical criterion for replication success. We propose a novel approach combining reverse-Bayes analysis with Bayesian hypothesis testing: a sceptical prior is determined for the effect size such that the original finding is no longer convincing in terms of a Bayes factor. This prior is then contrasted to an advocacy prior (the reference posterior of the effect size based on the original study), and replication success is declared if the replication data favour the advocacy prior over the sceptical prior at a higher level than the original data favoured the sceptical prior over the null hypothesis. The sceptical Bayes factor is the highest level at which replication success can be declared. A comparison to existing methods reveals that the sceptical Bayes factor combines several notions of replicability: it ensures that both studies show sufficient evidence against the null and penalises incompatibility of their effect estimates. An analysis of asymptotic properties and error rates, together with case studies from the Social Sciences Replication Project, shows the advantages of the method for the assessment of replicability.
We study the distribution of the {\it matrix product} $G_1 G_2 \cdots G_r$ of $r$ independent Gaussian matrices of various sizes, where $G_i$ is $d_{i-1} \times d_i$, and we denote $p = d_0$, $q = d_r$, and require $d_1 = d_{r-1}$. Here the entries in each $G_i$ are standard normal random variables with mean $0$ and variance $1$. Such products arise in the study of wireless communication, dynamical systems, and quantum transport, among other places. We show that, provided each $d_i$, $i = 1, \ldots, r-1$, satisfies $d_i \geq C p \cdot q$, where $C \geq C_0$ for a constant $C_0 > 0$ depending on $r$, the matrix product $G_1 G_2 \cdots G_r$ has total variation distance at most $\delta$ to a $p \times q$ matrix $G$ of i.i.d.\ standard normal random variables with mean $0$ and variance $\prod_{i=1}^{r-1} d_i$. Here $\delta \rightarrow 0$ as $C \rightarrow \infty$. Moreover, we show a converse for constant $r$: if $d_i < C' \max\{p,q\}^{1/2}\min\{p,q\}^{3/2}$ for some $i$, then this total variation distance is at least $\delta'$, for an absolute constant $\delta' > 0$ depending on $C'$ and $r$. This converse is best possible when $p=\Theta(q)$.
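The phenomenon is easy to see empirically in the simplest case $r = 2$: each entry of $G_1 G_2$ is a sum of $d_1$ products of independent standard normals, so after rescaling by $\sqrt{d_1}$ it should look approximately standard normal when the inner dimension $d_1$ is large. A small simulation sketch (an illustration of the statement, not the proof; all sizes are arbitrary choices):

```python
import numpy as np

# For r = 2, rescaled entries of G1 @ G2 should be close to N(0, 1)
# when the inner dimension d1 is large relative to p * q.
rng = np.random.default_rng(1)
p, q, d1, trials = 2, 2, 400, 3000
entries = np.empty(trials)
for t in range(trials):
    G1 = rng.normal(size=(p, d1))
    G2 = rng.normal(size=(d1, q))
    entries[t] = (G1 @ G2)[0, 0] / np.sqrt(d1)   # rescale to unit variance
print(entries.mean(), entries.var())   # close to 0 and 1
```

Matching the first two moments is of course far weaker than the total variation bound in the abstract, which controls the joint distribution of all $p \times q$ entries simultaneously.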
Non-Euclidean data that are indexed with a scalar predictor such as time are increasingly encountered in data applications, while statistical methodology and theory for such random objects are not yet well developed. To address the need for new methodology in this area, we develop a total variation regularization technique for nonparametric Fr\'echet regression, which refers to a regression setting where a response residing in a metric space is paired with a scalar predictor and the target is a conditional Fr\'echet mean. Specifically, we seek to approximate an unknown metric-space valued function by an estimator that minimizes the Fr\'echet version of least squares while at the same time having small total variation, appropriately defined for metric-space valued objects. We show that the resulting estimator is representable by a piecewise-constant function and establish the minimax convergence rate of the proposed estimator for metric data objects that reside in Hadamard spaces. We illustrate the numerical performance of the proposed method for both simulated and real data, including metric spaces of symmetric positive-definite matrices with the affine-invariant distance, of probability distributions on the real line with the Wasserstein distance, and of phylogenetic trees with the Billera--Holmes--Vogtmann metric.
It is shown that some theoretically identifiable parameters cannot be identified from data, meaning that no consistent estimator of them can exist. An important example is a constant correlation between Gaussian observations (in the presence of such correlation, not even the mean can be identified from data). Identifiability and three versions of distinguishability from data are defined. Two different constant correlations between Gaussian observations cannot even be distinguished from data. A further example is given by the cluster membership parameters in $k$-means clustering. Several existing results in the literature are connected to the new framework.
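The Gaussian example admits a quick numerical illustration. Writing the equicorrelated model as $X_i = \mu + \sqrt{\rho}\,Z + \sqrt{1-\rho}\,\varepsilon_i$ with a single shared $Z$, the sample mean converges to $\mu + \sqrt{\rho}\,Z$ rather than $\mu$, so its variance does not shrink with the sample size. A small sketch (the decomposition is a standard construction of a constant-correlation Gaussian vector; sizes are arbitrary):

```python
import numpy as np

# With constant correlation rho, the sample mean of n observations is
# mu + sqrt(rho) * Z + sqrt(1 - rho) * eps_bar, where Z is a single
# shared standard normal. Its variance is rho + (1 - rho) / n, which
# stays near rho no matter how large n is: mu cannot be consistently
# estimated.
rng = np.random.default_rng(2)
mu, rho, reps, n = 0.0, 0.5, 4000, 10_000
Z = rng.normal(size=reps)                      # shared component, one per replicate
eps_bar = rng.normal(size=reps) / np.sqrt(n)   # mean of n idiosyncratic terms
xbar = mu + np.sqrt(rho) * Z + np.sqrt(1 - rho) * eps_bar
print(xbar.var())   # stays near rho = 0.5, does not vanish as n grows
```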
In this paper, we present a maximum likelihood method for estimating the parameters of a univariate Hawkes process with self-excitation or inhibition. Our work generalizes techniques and results that were restricted to the self-exciting scenario. The proposed estimator is implemented for the classical exponential kernel, and we show that, in the inhibition context, our procedure provides more accurate estimates than current alternative approaches.
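The key difficulty with inhibition is that a negative kernel can drive the linear intensity below zero, so the intensity must be clipped at zero and the compensator integral no longer has the usual closed form. A minimal sketch of such a log-likelihood for the exponential kernel, with the compensator approximated numerically (an illustration of the modeling issue, not the paper's exact estimator; names and the quadrature choice are ours):

```python
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta, grid=5000):
    """Log-likelihood of a univariate Hawkes process on [0, T] with
    exponential kernel alpha * exp(-beta * t). The intensity is clipped
    at zero, lambda+(t) = max(mu + sum_i alpha * exp(-beta*(t - t_i)), 0),
    so the same expression covers self-excitation (alpha > 0) and
    inhibition (alpha < 0). The compensator is integrated on a grid."""
    times = np.asarray(times, dtype=float)
    # sum of log-intensities at the event times
    loglam = 0.0
    for k, t in enumerate(times):
        lam = mu + alpha * np.exp(-beta * (t - times[:k])).sum()
        loglam += np.log(max(lam, 1e-12))
    # compensator: trapezoidal integral of the clipped intensity
    ts = np.linspace(0.0, T, grid)
    lam_grid = np.array(
        [mu + alpha * np.exp(-beta * (s - times[times < s])).sum() for s in ts])
    lam_plus = np.maximum(lam_grid, 0.0)
    comp = np.sum(0.5 * (lam_plus[1:] + lam_plus[:-1]) * np.diff(ts))
    return loglam - comp
```

With alpha = 0 this reduces exactly to the homogeneous Poisson log-likelihood n*log(mu) - mu*T, a convenient sanity check; maximizing it over (mu, alpha, beta) with any generic optimizer yields a (sketch of a) maximum likelihood estimator valid for either sign of alpha.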
We consider multivariate centered Gaussian models for the random variable $Z=(Z_1,\ldots, Z_p)$, invariant under the action of a subgroup of the group of permutations on $\{1,\ldots, p\}$. Using the representation theory of the symmetric group over the reals, we derive the distribution of the maximum likelihood estimate of the covariance parameter $\Sigma$ and also the analytic expression of the normalizing constant of the Diaconis-Ylvisaker conjugate prior for the precision parameter $K=\Sigma^{-1}$. We can thus perform Bayesian model selection in the class of complete Gaussian models invariant under the action of a subgroup of the symmetric group, which we could also call complete RCOP models. We illustrate our results with a toy example of dimension $4$ and several examples of selection within cyclic groups, including a high-dimensional example with $p=100$.
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.