Temporal data, obtained in the setting where it is only possible to observe one time point per trajectory, is widely used in different research fields, yet remains insufficiently addressed from the statistical point of view. Such data often contain observations of a large number of entities, in which case it is of interest to identify a small number of representative behavior types. In this paper, we propose a new method performing clustering simultaneously with alignment of temporal objects inferred from these data, providing insight into the relationships between the entities. A series of simulations confirm the ability of the proposed approach to leverage multiple properties of the complex data we target such as accessible uncertainties, correlations and a small number of time points. We illustrate it on real data encoding cellular response to a radiation treatment with high energy, supported with the results of an enrichment analysis.
We introduce a new type of influence function, the asymptotic expected sensitivity function, which is often equivalent to but mathematically more tractable than the traditional one based on the Gateaux derivative. To illustrate, we study the robustness of some important rank correlations, including Spearman's and Kendall's correlations, and the recently developed Chatterjee's correlation.
In the conventional change detection (CD) pipeline, two manually registered and labeled remote sensing datasets serve as the input of the model for training and prediction. However, in realistic scenarios, data from different periods or sensors could fail to be aligned as a result of various coordinate systems. Geometric distortion caused by coordinate shifting remains a thorny issue for CD algorithms. In this paper, we propose a reusable self-supervised framework for bitemporal geometric distortion in CD tasks. The whole framework is composed of Pretext Representation Pre-training, Bitemporal Image Alignment, and Down-stream Decoder Fine-Tuning. With only single-stage pre-training, the key components of the framework can be reused for assistance in the bitemporal image alignment, while simultaneously enhancing the performance of the CD decoder. Experimental results in 2 large-scale realistic scenarios demonstrate that our proposed method can alleviate the bitemporal geometric distortion in CD tasks.
The aim of change-point detection is to discover the changes in behavior that lie behind time sequence data. In this article, we study the case where the data comes from an inhomogeneous Poisson process or a marked Poisson process. We present a methodology for detecting multiple offline change-points based on a minimum contrast estimator. In particular, we explain how to handle the continuous nature of the process with the available discrete observations. In addition, we select the appropriate number of regimes via a cross-validation procedure which is really handy here due to the nature of the Poisson process. Through experiments on simulated and real data sets, we demonstrate the interest of the proposed method. The proposed method has been implemented in the R package \texttt{CptPointProcess} R.
Defining a successful notion of a multivariate quantile has been an open problem for more than half a century, motivating a plethora of possible solutions. Of these, the approach of [8] and [25] leading to M-quantiles, is very appealing for its mathematical elegance combining elements of convex analysis and probability theory. The key idea is the description of a convex function (the K-function) whose gradient (the K-transform) is in one-to-one correspondence between all of R^d and the unit ball in R^d. By analogy with the d=1 case where the K-transform is a cumulative distribution function-like object (an M-distribution), the fact that its inverse is guaranteed to exist lends itself naturally to providing the basis for the definition of a quantile function for all d>=1. Over the past twenty years the resulting M-quantiles have seen applications in a variety of fields, primarily for the purpose of detecting outliers in multidimensional spaces. In this article we prove that for odd d>=3, it is not the gradient but a poly-Laplacian of the K-function that is (almost everywhere) proportional to the density function. For d even one cannot establish a differential equation connecting the K-function with the density. These results show that usage of the K-transform for outlier detection in higher odd-dimensions is in principle flawed, as the K-transform does not originate from inversion of a true M-distribution. We demonstrate these conclusions in two dimensions through examples from non-standard asymmetric distributions. Our examples illustrate a feature of the K-transform whereby regions in the domain with higher density map to larger volumes in the co-domain, thereby producing a magnification effect that moves inliers closer to the boundary of the co-domain than outliers. This feature obviously disrupts any outlier detection mechanism that relies on the inverse K-transform.
We consider a logic with truth values in the unit interval and which uses aggregation functions instead of quantifiers, and we describe a general approach to asymptotic elimination of aggregation functions and, indirectly, of asymptotic elimination of Mostowski style generalized quantifiers, since such can be expressed by using aggregation functions. The notion of ``local continuity'' of an aggregation function, which we make precise in two (related) ways, plays a central role in this approach.
A general theory of efficient estimation for ergodic diffusion processes sampled at high frequency with an infinite time horizon is presented. High frequency sampling is common in many applications, with finance as a prominent example. The theory is formulated in term of approximate martingale estimating functions and covers a large class of estimators including most of the previously proposed estimators for diffusion processes. Easily checked conditions ensuring that an estimating function is an approximate martingale are derived, and general conditions ensuring consistency and asymptotic normality of estimators are given. Most importantly, simple conditions are given that ensure rate optimality and efficiency. Rate optimal estimators of parameters in the diffusion coefficient converge faster than estimators of drift coefficient parameters because they take advantage of the information in the quadratic variation. The conditions facilitate the choice among the multitude of estimators that have been proposed for diffusion models. Optimal martingale estimating functions in the sense of Godambe and Heyde and their high frequency approximations are, under weak conditions, shown to satisfy the conditions for rate optimality and efficiency. This provides a natural feasible method of constructing explicit rate optimal and efficient estimating functions by solving a linear equation.
Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today's most effective pruning method on ImageNet clusters data samples into separate concepts according to their embedding and prunes away the most prototypical samples. We scale this approach to LAION and improve it by noting that the pruning rate should be concept-specific and adapted to the complexity of the concept. Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training. By filtering from the LAION dataset, we find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs. More specifically, we are able to outperform the LAION-trained OpenCLIP-ViT-B32 model on ImageNet zero-shot accuracy by 1.1p.p. while only using 27.7% of the data and training compute. Despite a strong reduction in training cost, we also see improvements on ImageNet dist. shifts, retrieval tasks and VTAB. On the DataComp Medium benchmark, we achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
We show how to learn discrete field theories from observational data of fields on a space-time lattice. For this, we train a neural network model of a discrete Lagrangian density such that the discrete Euler--Lagrange equations are consistent with the given training data. We, thus, obtain a structure-preserving machine learning architecture. Lagrangian densities are not uniquely defined by the solutions of a field theory. We introduce a technique to derive regularisers for the training process which optimise numerical regularity of the discrete field theory. Minimisation of the regularisers guarantees that close to the training data the discrete field theory behaves robust and efficient when used in numerical simulations. Further, we show how to identify structurally simple solutions of the underlying continuous field theory such as travelling waves. This is possible even when travelling waves are not present in the training data. This is compared to data-driven model order reduction based approaches, which struggle to identify suitable latent spaces containing structurally simple solutions when these are not present in the training data. Ideas are demonstrated on examples based on the wave equation and the Schr\"odinger equation.
We propose a MANOVA test for semicontinuous data that is applicable also when the dimensionality exceeds the sample size. The test statistic is obtained as a likelihood ratio, where numerator and denominator are computed at the maxima of penalized likelihood functions under each hypothesis. Closed form solutions for the regularized estimators allow us to avoid computational overheads. We derive the null distribution using a permutation scheme. The power and level of the resulting test are evaluated in a simulation study. We illustrate the new methodology with two original data analyses, one regarding microRNA expression in human blastocyst cultures, and another regarding alien plant species invasion in the island of Socotra (Yemen).
One tuple of probability vectors is more informative than another tuple when there exists a single stochastic matrix transforming the probability vectors of the first tuple into the probability vectors of the other. This is called matrix majorization. Solving an open problem raised by Mu et al, we show that if certain monotones - namely multivariate extensions of R\'{e}nyi divergences - are strictly ordered between the two tuples, then for sufficiently large $n$, there exists a stochastic matrix taking the $n$-fold Kronecker power of each input distribution to the $n$-fold Kronecker power of the corresponding output distribution. The same conditions, with non-strict ordering for the monotones, are also necessary for such matrix majorization in large samples. Our result also gives conditions for the existence of a sequence of statistical maps that asymptotically (with vanishing error) convert a single copy of each input distribution to the corresponding output distribution with the help of a catalyst that is returned unchanged. Allowing for transformation with arbitrarily small error, we find conditions that are both necessary and sufficient for such catalytic matrix majorization. We derive our results by building on a general algebraic theory of preordered semirings recently developed by one of the authors. This also allows us to recover various existing results on majorization in large samples and in the catalytic regime as well as relative majorization in a unified manner.