Clustering has received much attention in statistics and machine learning with the aim of developing statistical models and autonomous algorithms that are capable of acquiring information from raw data in order to perform exploratory analysis. Several techniques have been developed to cluster sampled univariate vectors by considering only the average value over the whole period; as such, they have not been able to fully explore the underlying distribution or other features of the data, especially in the presence of structured time series. We propose a model-based clustering technique based on quantile regression that permits us to cluster bivariate time series at different quantile levels. We model the within-cluster density using the asymmetric Laplace distribution, allowing us to take into account asymmetry in the distribution of the data. We evaluate the performance of the proposed technique through a simulation study. The method is then applied to cluster time series observed from Glob-colour satellite data related to trophic status indices, with the aim of evaluating their temporal dynamics in order to identify areas that are homogeneous in terms of trophic status in the Gulf of Gabes.
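For reference, a standard parameterization of the asymmetric Laplace density underlying quantile regression at level $\tau$ (a common form, not necessarily the exact parameterization used by the authors) is
\[
f(y \mid \mu, \sigma, \tau) = \frac{\tau(1-\tau)}{\sigma} \exp\left\{ -\rho_\tau\!\left( \frac{y-\mu}{\sigma} \right) \right\}, \qquad \rho_\tau(u) = u\big(\tau - \mathbb{1}\{u < 0\}\big),
\]
so maximizing this likelihood in $\mu$ is equivalent to minimizing the check (pinball) loss at level $\tau$, which is what makes it a natural within-cluster density for quantile-based clustering.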
Recently, high-dimensional vector autoregressive (VAR) models have attracted a lot of interest, due to novel applications in the health, engineering and social sciences. The presence of temporal dependence poses additional challenges to the theory of penalized estimation techniques widely used in the analysis of their iid counterparts. However, recent work (e.g., [Basu and Michailidis, 2015, Kock and Callot, 2015]) has established optimal consistency of $\ell_1$-LASSO regularized estimates applied to models involving high-dimensional stable, Gaussian processes. The only price paid for temporal dependence is an extra multiplicative factor that equals 1 for independent and identically distributed (iid) data. Further, [Wong et al., 2020] extended these results to heavy-tailed VARs that exhibit "$\beta$-mixing" dependence, but the rates are sub-optimal, while the extra factor is intractable. This paper improves these results in two important directions: (i) We establish optimal consistency rates and corresponding finite sample bounds for the underlying model parameters that match those for iid data, modulo a price for temporal dependence that is easy to interpret and equals 1 for iid data. (ii) We incorporate more general penalties in estimation (which, unlike the $\ell_1$ norm, are not decomposable) to induce general sparsity patterns. The key technical tool employed is a novel, easy-to-use concentration bound for heavy-tailed linear processes that does not rely on "mixing" notions and gives tighter bounds.
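As a minimal illustration of the kind of estimator this theory covers (not the authors' own procedure), an $\ell_1$-regularized VAR(1) can be fit equation by equation with an off-the-shelf lasso; the use of scikit-learn and all names here are my assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_var1(X, lam=0.1):
    """Equation-by-equation l1-regularized VAR(1): X[t] ~ A @ X[t-1].

    X: (T, p) array of a centered multivariate time series.
    Returns the (p, p) estimated transition matrix A_hat.
    """
    Y, Z = X[1:], X[:-1]            # responses and lagged predictors
    p = X.shape[1]
    A_hat = np.zeros((p, p))
    for j in range(p):              # one lasso regression per coordinate
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(Z, Y[:, j])
        A_hat[j] = model.coef_
    return A_hat

# toy usage on simulated stable VAR(1) data with a sparse (diagonal) truth
rng = np.random.default_rng(0)
p, T = 10, 500
A = np.diag(np.full(p, 0.5))
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.normal(size=p)
print(np.round(fit_sparse_var1(X, lam=0.05), 2))
```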
In data science, vector autoregression (VAR) models are popular for modeling multivariate time series in the environmental sciences and other applications. However, these models are computationally complex, with the number of parameters scaling quadratically with the number of time series. In this work, we propose a so-called neighborhood vector autoregression (NVAR) model to efficiently analyze large-dimensional multivariate time series. We assume that the time series have underlying neighborhood relationships among them, e.g., spatial or network, based on the inherent setting of the problem. When this neighborhood information is available or can be summarized using a distance matrix, we demonstrate that our proposed NVAR method provides a computationally efficient and theoretically sound estimation of the model parameters. The performance of the proposed method is compared with other existing approaches in both simulation studies and a real application to a stream nitrogen study.
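A rough sketch of the neighborhood idea (my own simplification, not the paper's estimator): regress each series only on the lagged values of series within a distance threshold, so the parameter count scales with neighborhood size rather than quadratically:

```python
import numpy as np

def fit_neighborhood_var1(X, D, radius):
    """Sketch of a neighborhood-restricted VAR(1).

    X: (T, p) time series; D: (p, p) distance matrix; radius: cutoff
    defining each series' neighborhood. Coefficients outside the
    neighborhood are fixed to zero, shrinking the parameter count.
    """
    T, p = X.shape
    Y, Z = X[1:], X[:-1]
    A_hat = np.zeros((p, p))
    for j in range(p):
        nbrs = np.where(D[j] <= radius)[0]   # includes j itself (D[j, j] = 0)
        coef, *_ = np.linalg.lstsq(Z[:, nbrs], Y[:, j], rcond=None)
        A_hat[j, nbrs] = coef
    return A_hat
```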
Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical methods (linear regression and time series models based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.
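As one concrete example of a consistent scoring function of the kind this review covers, the pinball (quantile) loss is consistent for the $\tau$-quantile functional; a minimal implementation:

```python
import numpy as np

def pinball_loss(y_true, q_pred, tau):
    """Pinball loss, consistent for the tau-quantile functional."""
    diff = np.asarray(y_true) - np.asarray(q_pred)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# the empirical 0.9-quantile scores better than a miscalibrated constant
y = np.random.default_rng(1).normal(size=1000)
print(pinball_loss(y, np.quantile(y, 0.9), tau=0.9))  # near-optimal
print(pinball_loss(y, 0.0, tau=0.9))                  # worse
```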
Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
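A minimal sketch of the weighted-quantile idea (a simplification under my own assumptions, in a split-conformal setting with weights that decay for older calibration points):

```python
import numpy as np

def weighted_conformal_interval(preds_cal, y_cal, pred_new, alpha=0.1, decay=0.99):
    """Split-conformal interval with time-decaying weights (sketch).

    preds_cal, y_cal: predictions and outcomes on calibration data,
    ordered oldest to newest; pred_new: point prediction for the next step.
    """
    scores = np.abs(y_cal - preds_cal)           # nonconformity scores
    n = len(scores)
    w = decay ** np.arange(n - 1, -1, -1)        # newer points weigh more
    w = np.append(w, 1.0)                        # weight on the test point
    w /= w.sum()
    order = np.argsort(scores)
    cum = np.cumsum(w[:-1][order])               # weighted CDF of scores
    idx = np.searchsorted(cum, 1 - alpha)
    q = scores[order][min(idx, n - 1)]           # weighted (1-alpha)-quantile,
                                                 # capped at the largest score for simplicity
    return pred_new - q, pred_new + q
```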
As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model which, with only a little sacrifice of generality, inherits the essential temporal patterns of the VARMA model but avoids all of the above drawbacks. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved, while no loss of temporal information is incurred by the imposed sparsity. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding nonasymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.
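The generic proximal-gradient machinery behind such $\ell_1$-regularized estimators (not the paper's block coordinate descent algorithm) reduces to alternating a gradient step with elementwise soft thresholding; a sketch:

```python
import numpy as np

def soft_threshold(B, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(B) * np.maximum(np.abs(B) - t, 0.0)

def ista_least_squares(Y, Z, lam, n_iter=500):
    """ISTA for min_B ||Y - Z B||_F^2 / (2 n) + lam * ||B||_1 (sketch)."""
    n = Z.shape[0]
    step = 1.0 / (np.linalg.norm(Z, 2) ** 2 / n)   # 1 / Lipschitz constant
    B = np.zeros((Z.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = Z.T @ (Z @ B - Y) / n               # gradient of the smooth part
        B = soft_threshold(B - step * grad, step * lam)
    return B
```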
A group of cooperative aerial robots can be deployed to efficiently patrol a terrain, in which each robot flies around an assigned area and shares information with its neighbors periodically in order to protect or supervise it. To ensure robustness, previous works on these synchronized systems propose sending a robot to the neighboring area when it detects a failure. In order to deal with unpredictability and to improve on the efficiency of the deterministic patrolling scheme, this paper proposes random strategies to cover the areas distributed among the agents. First, a theoretical study of the stochastic process is addressed for two metrics: the \emph{idle time}, the expected time between two consecutive observations of any point of the terrain, and the \emph{isolation time}, the expected time that a robot is without communication with any other robot. After that, the random strategies are experimentally compared with the deterministic strategy using a further metric: the \emph{broadcast time}, the expected time elapsed from the moment a robot emits a message until it is received by all the other robots of the team. The experiments show that the theoretical results are in good agreement with the simulations and that the random strategies outperform the deterministic protocol proposed in the literature.
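A toy Monte Carlo sketch of the idle-time metric (my own construction, not the paper's model): a robot that visits the cells of its area uniformly at random observes any fixed cell with geometric waiting times, so the expected idle time equals the number of cells:

```python
import numpy as np

def mean_idle_time(n_cells=20, n_steps=200_000, seed=0):
    """Estimate the expected idle time of cell 0 when a robot visits
    cells of its area uniformly at random (toy model: one cell per step)."""
    rng = np.random.default_rng(seed)
    visits = np.flatnonzero(rng.integers(n_cells, size=n_steps) == 0)
    return np.diff(visits).mean()   # gaps between consecutive observations

print(mean_idle_time())  # close to n_cells = 20 (geometric waiting time)
```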
Transformers have achieved superior performance in many tasks in natural language processing and computer vision, which has also sparked great interest in the time series community. Among the multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as their limitations through a new taxonomy that summarizes existing time series Transformers from two perspectives. From the perspective of network modifications, we summarize the adaptations of time series Transformers at the module level and the architecture level. From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform on time series. Finally, we discuss and suggest future directions to provide useful research guidance. To the best of our knowledge, this paper is the first work to comprehensively and systematically summarize the recent advances of Transformers for modeling time series data. We hope this survey will ignite further research interest in time series Transformers.
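The long-range modeling capacity referenced above comes from self-attention, whose core scaled dot-product operation (the standard formulation, not specific to any surveyed variant) is
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{QK^\top}{\sqrt{d_k}} \right) V,
\]
which connects every time step to every other in a single layer, in contrast to the local receptive fields of recurrent and convolutional models.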
Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields, including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another, but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation, which means they cannot be applied directly to multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on three of four benchmark datasets and achieves on-par performance with other approaches on two traffic datasets that provide extra structural information.
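One way such a uni-directed graph learning module can be realized (a sketch following an anti-symmetric embedding construction; the dimensions and names are my assumptions, not necessarily the paper's exact layer):

```python
import numpy as np

def learn_unidirected_adjacency(E1, E2, alpha=3.0, k=4):
    """Sketch of a uni-directed graph learning step from two learnable
    node-embedding matrices E1, E2 of shape (num_nodes, dim).

    The anti-symmetric difference makes A[i, j] and A[j, i] mutually
    exclusive; top-k sparsification keeps only the strongest neighbors.
    """
    M1, M2 = np.tanh(alpha * E1), np.tanh(alpha * E2)
    A = np.maximum(np.tanh(alpha * (M1 @ M2.T - M2 @ M1.T)), 0.0)  # ReLU
    mask = np.zeros_like(A)
    for i in range(A.shape[0]):                  # keep k largest per row
        mask[i, np.argsort(A[i])[-k:]] = 1.0
    return A * mask
```

In a full model the embeddings E1 and E2 would be trained end-to-end with the temporal modules, so the learned adjacency adapts to the forecasting objective.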
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address this, many efforts have been made to train complex models with small data in an unsupervised or semi-supervised fashion. In this paper, we review the recent progress on these two major categories of methods. A wide spectrum of small data models will be categorized within a big picture, where we show how they interplay with each other to motivate the exploration of new ideas. We review the criteria for learning transformation-equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on unsupervised and semi-supervised methods, we also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exhaustive encyclopedia that includes all related works. Instead, we aim to explore the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.
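The transformation-equivariance criterion mentioned above can be stated compactly: a representation $f$ is equivariant to a transformation group if, for every transformation $t$,
\[
f(t(x)) = \rho(t)\, f(x),
\]
where $\rho(t)$ acts predictably on the feature space; invariance is the special case $\rho(t) = \mathrm{id}$.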
There is large and growing recent interest in generative adversarial networks (GANs), which offer powerful features for generative modeling, density estimation, and energy function learning. GANs are difficult to train and evaluate, but are capable of creating amazingly realistic, though synthetic, image data. Ideas stemming from GANs, such as adversarial losses, are creating research opportunities for other challenges such as domain adaptation. In this paper, we look at the field of GANs with an emphasis on these areas of emerging research. To provide background on adversarial techniques, we survey the field of GANs, looking at the original formulation, training variants, evaluation methods, and extensions. Then we survey recent work on transfer learning, focusing on comparing different adversarial domain adaptation methods. Finally, we look forward to identify open research directions for GANs and domain adaptation, including some promising applications such as sensor-based human behavior modeling.
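For reference, the original GAN formulation mentioned above is the two-player minimax objective
\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],
\]
where the discriminator $D$ and generator $G$ are trained adversarially; the adversarial losses reused in domain adaptation are variants of this objective.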