This article introduces the class of periodic trawl processes, which are continuous-time, infinitely divisible, stationary stochastic processes that allow for periodicity and for flexible forms of serial correlation, including both short- and long-memory settings. We derive some of the key probabilistic properties of periodic trawl processes and present relevant examples. Moreover, we show how such processes can be simulated and establish the asymptotic theory for their sample mean and sample autocovariances. As a consequence, we prove the asymptotic normality of a (generalised) method-of-moments estimator for the model parameters. We illustrate the new model and estimation methodology in an application to electricity prices.
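For orientation, the class builds on the standard (non-periodic) trawl construction, recalled here schematically; conventions vary across the literature, and the paper's periodic specification adds further structure. A trawl process evaluates a homogeneous L\'evy basis $L$ on translates of a fixed trawl set $A \subseteq (-\infty,0]\times\mathbb{R}$,
\[
X_t = L(A_t), \qquad A_t = A + (t,0),
\]
so that stationarity follows from the translation invariance of the trawl sets, while the serial correlation is determined by their overlap,
\[
\mathrm{Cov}(X_t, X_{t+h}) = \mathrm{Var}(L')\,\mathrm{Leb}\bigl(A \cap A_h\bigr), \qquad h \ge 0,
\]
where $L'$ denotes the L\'evy seed. Trawl sets with slowly decaying overlap yield long memory, exponentially decaying overlap yields short memory, and the periodic extension introduces an additional periodic component into this construction.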
The problem of recovering high-order partial derivatives of bivariate functions with finite smoothness is studied. Based on the truncation method, a numerical differentiation algorithm is constructed that is order-optimal both in the sense of accuracy and in the sense of the amount of Galerkin information involved. Numerical experiments illustrate that the proposed method can be implemented successfully.
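As a hedged illustration of the truncation idea only (not the authors' algorithm or its error analysis): for a periodic bivariate function one can differentiate a truncated Fourier-Galerkin expansion term by term, with the truncation level acting as the regularisation parameter. A minimal numpy sketch, where the truncation level N is a hypothetical tuning parameter:

import numpy as np

def truncated_partial_derivative(f_vals, r1, r2, N):
    """Approximate d^(r1+r2) f / dx^r1 dy^r2 for a periodic bivariate function
    sampled on a uniform grid over [0, 2*pi)^2, by truncating its Fourier
    (Galerkin) expansion at |k| <= N and differentiating the retained terms."""
    n1, n2 = f_vals.shape
    coeffs = np.fft.fft2(f_vals) / (n1 * n2)        # Fourier-Galerkin coefficients
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)             # integer wavenumbers
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)
    K1, K2 = np.meshgrid(k1, k2, indexing="ij")
    keep = (np.abs(K1) <= N) & (np.abs(K2) <= N)    # truncation acts as regularisation
    symbol = (1j * K1) ** r1 * (1j * K2) ** r2      # differentiation in Fourier space
    return np.real(np.fft.ifft2(coeffs * symbol * keep) * (n1 * n2))

# usage: for f(x, y) = sin(x) * cos(2 y), the mixed derivative f_xy is -2 cos(x) sin(2 y)
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
approx = truncated_partial_derivative(np.sin(X) * np.cos(2 * Y), r1=1, r2=1, N=8)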
A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data have become more accessible, the question of whether data sets from different sources should be integrated has become more relevant. The current literature treats this as a question with only two answers: integrate or don't. Here we take a different approach, motivated by information-sharing principles from the shrinkage estimation literature. In particular, we deviate from the do/don't perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation compared to other, binary data integration schemes.
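To illustrate the "dial" idea in its simplest form (an illustrative precision-weighted combination under a common-parameter assumption, not necessarily the estimator proposed in the paper; all names below are hypothetical):

def dialed_estimate(theta_primary, info_primary, theta_external, info_external, alpha):
    """Combine two estimates of a common scalar parameter.
    alpha = 0 ignores the external source, alpha = 1 pools it at full weight;
    intermediate dial settings partially borrow strength, with weights
    proportional to (scaled) Fisher information."""
    w_primary = info_primary
    w_external = alpha * info_external            # the dial scales the external precision
    return (w_primary * theta_primary + w_external * theta_external) / (w_primary + w_external)

# usage: a weakly informative primary source borrows noticeably from a precise
# external source even at a moderate dial setting
est = dialed_estimate(theta_primary=1.2, info_primary=10.0,
                      theta_external=0.9, info_external=40.0, alpha=0.5)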
With the increasing availability of large-scale datasets, computational power, and tools such as automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture that leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that make it possible to train such a model for long-term continuous reconstruction, even in difficult contexts where the data come as irregularly sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to, e.g., time series interpolation and forecasting.
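The following is a minimal PyTorch sketch of the general idea of a Koopman-style autoencoder with linear latent dynamics; the layer sizes, the training loss, and the treatment of irregular sampling are illustrative placeholders rather than the architecture of the paper:

import torch
import torch.nn as nn

class KoopmanAutoencoder(nn.Module):
    """Encode states into a latent space, advance them with a single learned
    linear operator K (playing the role of a finite-dimensional Koopman
    matrix), and decode the predictions back to the observation space."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, state_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # linear latent dynamics

    def forward(self, x0, steps):
        z = self.encoder(x0)
        preds = []
        for _ in range(steps):
            z = self.K(z)                      # advance linearly in the latent space
            preds.append(self.decoder(z))
        return torch.stack(preds, dim=1)       # (batch, steps, state_dim)

def training_loss(model, x0, trajectory):
    """Reconstruction of the initial state plus multi-step prediction error."""
    recon = model.decoder(model.encoder(x0))
    preds = model(x0, steps=trajectory.shape[1])
    return nn.functional.mse_loss(recon, x0) + nn.functional.mse_loss(preds, trajectory)

A continuous-time variant would replace the fixed matrix K by the exponential of a learned generator evaluated at the (possibly irregular) observation-time increments.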
Cram\'er's moderate deviations give a quantitative estimate for the relative error of the normal approximation and provide theoretical justification for many estimators used in statistics. In this paper, we establish self-normalized Cram\'{e}r type moderate deviations for martingales under some mild conditions. The result extends an earlier work of Fan, Grama, Liu and Shao [Bernoulli, 2019]. Moreover, applications of our result to Student's statistic, stationary martingale difference sequences and branching processes in a random environment are also discussed. In particular, we establish Cram\'{e}r type moderate deviations for Student's $t$-statistic for branching processes in a random environment.
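For context, a Cram\'{e}r type moderate deviation result for a self-normalized statistic $T_n$ is typically stated in the following schematic form (the precise range, rate and error terms obtained in the paper depend on the martingale conditions):
\[
\frac{\mathbb{P}(T_n \ge x)}{1-\Phi(x)} = 1 + o(1)
\]
uniformly in $0 \le x = o(a_n)$ for a suitable sequence $a_n \to \infty$, where $\Phi$ is the standard normal distribution function. Such statements control the relative, rather than absolute, error of the normal approximation in the tails.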
Besov priors are nonparametric priors that can model spatially inhomogeneous functions. They are routinely used in inverse problems and imaging, where they exhibit attractive sparsity-promoting and edge-preserving features. A recent line of work has initiated the study of their asymptotic frequentist convergence properties. In the present paper, we consider the theoretical recovery performance of the posterior distributions associated with Besov-Laplace priors in the density estimation model, under the assumption that the observations are generated by a possibly spatially inhomogeneous true density belonging to a Besov space. We improve on existing results and show that carefully tuned Besov-Laplace priors attain optimal posterior contraction rates. Furthermore, we show that hierarchical procedures involving a hyper-prior on the regularity parameter lead to adaptation to any smoothness level.
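For reference, one common construction of a Besov-Laplace prior, recalled here for orientation (normalisations and scalings vary across the literature), expands the unknown function in a wavelet basis $\{\psi_{jk}\}$ of $[0,1]^d$ with independent Laplace-distributed coefficients:
\[
f = \sum_{j\ge 0}\sum_{k} 2^{-j\left(s + \frac{d}{2} - d\right)}\,\xi_{jk}\,\psi_{jk}, \qquad \xi_{jk} \overset{\mathrm{iid}}{\sim} \mathrm{Laplace}(1),
\]
where $s>0$ is the regularity parameter. The heavier-than-Gaussian tails of the Laplace coefficients are what produce the sparsity-promoting and edge-preserving behaviour, and hierarchical versions place a hyper-prior on $s$.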
We introduce a general IFS Bayesian method for obtaining posterior probabilities from prior probabilities, as well as a generalized Bayes' rule, which covers both a dynamical and a non-dynamical setting. Given a loss function $l$, we detail the prior and posterior elements, derive their consequences, and exhibit several examples. Taking $\Theta$ as the set of parameters and $Y$ as the set of data (which usually provides random samples), a general IFS is a measurable map $\tau:\Theta\times Y \to Y$, which can be interpreted as a family of maps $\tau_\theta:Y\to Y,\,\theta\in\Theta$. The main inspiration for the results obtained here comes from a paper by Zellner (with no dynamics), where Bayes' rule is related to a principle of minimization of information. We show that our IFS Bayesian method, which produces posterior probabilities (associated with holonomic probabilities), is related to the optimal solution of a variational principle, corresponding in a certain sense to the pressure in Thermodynamic Formalism, and also to the principle of minimization of information in Information Theory. Among other results, we present the prior dynamical elements and derive the corresponding posterior elements via the Ruelle operator of Thermodynamic Formalism, obtaining in this way a form of dynamical Bayes' rule.
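The information-minimization principle referred to above can be recalled schematically as follows (a standard formulation of Zellner's non-dynamical result, with notation chosen here for illustration): given a prior $\pi$ on $\Theta$ and a likelihood $f(y\mid\theta)$, the posterior is the probability $q$ minimizing the functional
\[
q \;\longmapsto\; \int_\Theta \log\frac{dq}{d\pi}(\theta)\, q(d\theta) \;-\; \int_\Theta \log f(y\mid\theta)\, q(d\theta),
\]
whose unique minimizer satisfies $q(d\theta) \propto f(y\mid\theta)\,\pi(d\theta)$, i.e. Bayes' rule. The dynamical counterpart developed in the paper replaces this functional by a pressure-type variational problem associated with the Ruelle operator.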
Over the past decades, cognitive neuroscientists and behavioral economists have recognized the value of describing the process of decision making in detail and modeling the emergence of decisions over time. For example, the time it takes to decide can reveal more about an agent's true hidden preferences than the decision alone. Similarly, data that track the ongoing decision process, such as eye movements or neural recordings, contain critical information that can be exploited, even if no decision is made. Here, we argue that artificial intelligence (AI) research would benefit from a stronger focus on insights about how decisions emerge over time and from incorporating related process data, in order to improve AI predictions in general and human-AI interactions in particular. First, we introduce a well-established computational framework that assumes decisions emerge from the noisy accumulation of evidence, and we present related empirical work in psychology, neuroscience, and economics. Next, we discuss to what extent current approaches in multi-agent AI do or do not incorporate process data and models of decision making. Finally, we outline how a more principled inclusion of the evidence-accumulation framework into the training and use of AI can help to improve human-AI interactions in the future.
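As a concrete and deliberately simplified illustration of the evidence-accumulation framework, the sketch below simulates a basic two-alternative drift-diffusion process, in which noisy evidence accumulates until it hits one of two bounds, yielding both a choice and a response time; all parameter names and values are illustrative:

import numpy as np

def simulate_ddm(drift, noise=1.0, bound=1.0, dt=0.001, max_time=5.0, rng=None):
    """Simulate one trial of a two-alternative drift-diffusion model.
    Evidence starts at 0 and accumulates with constant drift plus Gaussian
    noise until it crosses +bound (choice 1) or -bound (choice 0).
    Returns (choice, response_time); choice is None if no bound is reached."""
    rng = rng or np.random.default_rng()
    evidence, t = 0.0, 0.0
    while t < max_time:
        evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if evidence >= bound:
            return 1, t
        if evidence <= -bound:
            return 0, t
    return None, t   # no decision before the deadline; the process data remain informative

# usage: a stronger underlying preference (larger drift) produces faster and
# more consistent choices, which is why response times carry extra information
trials = [simulate_ddm(drift=0.8) for _ in range(1000)]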
This paper introduces a prognostic method called FLASH that addresses the problem of jointly modelling longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are either of the shared random effect or the joint latent class type. Combining ideas from both worlds and using appropriate regularisation techniques, we define a new model with the ability to automatically identify significant prognostic longitudinal features in a high-dimensional context, which is of increasing importance in many areas such as personalised medicine or churn prediction. We develop an estimation methodology based on the EM algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available real-world datasets. Our method significantly outperforms state-of-the-art joint models in predicting the latent class membership probability, as measured by the C-index in a so-called ``real-time'' prediction setting, with a computational speed that is orders of magnitude faster than that of competing methods. In addition, our model automatically identifies significant features that are relevant from a practical perspective, making it interpretable.
In this paper we establish limit theorems for power variations of stochastic processes controlled by fractional Brownian motions with Hurst parameter $H\leq 1/2$. We show that the power variations of such processes can be decomposed into a mix of several weighted random sums plus remainder terms, and that the convergence of the power variations is governed by different combinations of those weighted sums, depending on whether $H<1/4$, $H=1/4$, or $H>1/4$. We show that when $H\geq 1/4$ the centered power variation converges stably at the rate $n^{-1/2}$, and when $H<1/4$ it converges in probability at the rate $n^{-2H}$. We determine the limit of the mixed weighted sum using a rough path approach developed in \cite{LT20}.
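For completeness, the (uncentred) $p$-th power variation over $[0,1]$ underlying these results is, in its standard form,
\[
V_n^p(X) \;=\; \sum_{i=1}^{n} \bigl| X_{i/n} - X_{(i-1)/n} \bigr|^{p},
\]
and the limit theorems above describe the fluctuations of its suitably centered version, which converge stably at rate $n^{-1/2}$ when $H\geq 1/4$ and in probability at rate $n^{-2H}$ when $H<1/4$. Roughly speaking, the weighted and mixed sums arise from expanding the increments of the controlled process along the increments of the driving fractional Brownian motion.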
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and we qualitatively compare methods in terms of required memory, computation time, and storage.