Identifying the instances of jumps in a discretely sampled time series of a jump-diffusion model is a challenging task. We develop a novel statistical technique for jump detection and volatility estimation in return time series data using a threshold method. Since we derive the threshold and the volatility estimator simultaneously by solving an implicit equation, we obtain unprecedented accuracy across a wide range of parameter values. Using this method, we remove the increments attributed to jumps from a large collection of historical data on Indian sectoral indices. Subsequently, we test for the presence of regime-switching dynamics in the volatility coefficient using a new discriminating statistic, which is shown to be sensitive to the transition kernel of the regime-switching model. We perform the test using a bootstrap method and find a clear indication of the presence of multiple volatility regimes in the data.
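A generic fixed-point thresholding scheme in this spirit can be sketched as follows; this is a minimal illustration, not the authors' exact implicit equation (whose form the abstract does not give), and the multiplier `k` and the simulated jump-diffusion parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy jump-diffusion return series; sigma and the jump distribution are
# illustrative assumptions, not parameters from the paper.
n, dt, sigma = 5000, 1 / 252, 0.2
jumps = (rng.random(n) < 0.01) * rng.normal(0.0, 0.05, n)
returns = sigma * np.sqrt(dt) * rng.normal(size=n) + jumps

def threshold_volatility(r, dt, k=3.0, tol=1e-10, max_iter=100):
    """Jointly determine a jump threshold and a volatility estimate by
    fixed-point iteration: the threshold is k standard deviations of the
    diffusion part, and the volatility is re-estimated from sub-threshold
    increments until the two are mutually consistent."""
    sig = np.std(r) / np.sqrt(dt)          # crude initial guess (jump-inflated)
    for _ in range(max_iter):
        thr = k * sig * np.sqrt(dt)
        kept = r[np.abs(r) < thr]          # increments attributed to diffusion
        new_sig = np.sqrt(np.mean(kept ** 2) / dt)
        if abs(new_sig - sig) < tol:
            break
        sig = new_sig
    return sig, thr

sig_hat, thr = threshold_volatility(returns, dt)
print(f"estimated volatility: {sig_hat:.4f}, threshold: {thr:.5f}")
print(f"flagged jumps: {(np.abs(returns) >= thr).sum()}")
```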
Switching dynamical systems provide a powerful, interpretable modeling framework for inference in time-series data in, e.g., the natural sciences or engineering applications. Since many areas, such as biology or discrete-event systems, are naturally described in continuous time, we present a model based on a Markov jump process modulating a subordinated diffusion process. We provide the exact evolution equations for the prior and posterior marginal densities, whose direct solution is, however, computationally intractable. We therefore develop a new continuous-time variational inference algorithm, combining a Gaussian process approximation on the diffusion level with posterior inference for Markov jump processes. By minimizing the path-wise Kullback-Leibler divergence, we obtain (i) Bayesian latent state estimates for arbitrary points on the real axis and (ii) point estimates of unknown system parameters, utilizing variational expectation maximization. We extensively evaluate our algorithm under the model assumption and on real-world examples.
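A minimal sketch of the generative model follows: a hypothetical two-state Markov jump process modulates the drift and volatility of a diffusion simulated by Euler-Maruyama. The rate matrix and the state-dependent coefficients are made-up examples; the paper's variational inference machinery is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state Markov jump process: Q[i, j] is the switching
# rate from state i to state j; diagonal entries are negative exit rates.
Q = np.array([[-0.5, 0.5],
              [ 1.0, -1.0]])
drift = [lambda x: -x, lambda x: 2.0 - x]   # state-dependent drift
vol = [0.3, 1.0]                            # state-dependent volatility

T, dt = 20.0, 1e-3
n = int(T / dt)
x, z = 0.0, 0
path, states = np.empty(n), np.empty(n, dtype=int)

for i in range(n):
    # Markov jump process: switch with probability approximately rate * dt.
    if rng.random() < -Q[z, z] * dt:
        z = 1 - z
    # Euler-Maruyama step for the modulated diffusion.
    x += drift[z](x) * dt + vol[z] * np.sqrt(dt) * rng.normal()
    path[i], states[i] = x, z
```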
In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial nonstationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models, each estimated from observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for nonstationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This aids interpretation, as the few stationary sub-processes identified capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure is designed to test the significance of this segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method compared with other robust finite mixture regression, spatial regression, and spatial segmentation methods.
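The non-spatial, non-robust core of such a model is a finite mixture of Gaussian regressions fitted by EM; a minimal sketch on toy two-regime data is below. The spatial constraints and robust outlier handling described in the abstract are deliberately omitted, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data from two regression regimes (illustrative, not a spatial dataset).
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_z = rng.random(n) < 0.5
beta_true = np.where(true_z[:, None], [1.0, 2.0], [-1.0, -2.0])
y = (X * beta_true).sum(axis=1) + 0.3 * rng.normal(size=n)

K, p = 2, X.shape[1]
beta = rng.normal(size=(K, p))
sigma2 = np.ones(K)
pi = np.full(K, 1.0 / K)

for _ in range(100):
    # E-step: responsibilities under each regression component.
    resid = y[:, None] - X @ beta.T                       # shape (n, K)
    logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
            - 0.5 * resid ** 2 / sigma2)
    logp -= logp.max(axis=1, keepdims=True)               # numerical stability
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per component.
    for k in range(K):
        w = r[:, k]
        Xw = X * w[:, None]
        beta[k] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        sigma2[k] = (w * (y - X @ beta[k]) ** 2).sum() / w.sum()
    pi = r.mean(axis=0)
```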
A new mixture vector autoregressive model based on Gaussian and Student's $t$ distributions is introduced. The G-StMVAR model incorporates conditionally homoskedastic linear Gaussian vector autoregressions and conditionally heteroskedastic linear Student's $t$ vector autoregressions as its mixture components, with mixing weights that, for a $p$th-order model, depend on the full distribution of the preceding $p$ observations. A structural version of the model with a time-varying B-matrix and statistically identified shocks is also proposed. We derive the stationary distribution of $p+1$ consecutive observations and show that the process is ergodic. We also show that the maximum likelihood estimator is strongly consistent and hence has the conventional limiting distribution under standard high-level conditions.
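To illustrate the mixing mechanism, here is a sketch of a bivariate $p=1$ example where the mixing weights are proportional to the component stationary densities of the previous observation. All parameter values are made up, and the Student's $t$ component is simplified to i.i.d. multivariate $t$ innovations rather than the model's exact conditional $t$ distribution.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov
from scipy.stats import multivariate_normal, multivariate_t

rng = np.random.default_rng(3)

# Hypothetical bivariate VAR(1) components (phi0: intercept, A: AR matrix).
phi0 = [np.array([0.5, 0.0]), np.array([-0.5, 0.5])]
A = [0.5 * np.eye(2), 0.3 * np.eye(2)]
Omega = [np.eye(2), 2.0 * np.eye(2)]
alpha, nu = np.array([0.6, 0.4]), 5.0   # mixing parameters, t degrees of freedom

# Stationary mean and covariance of each VAR(1) component:
# mu = (I - A)^{-1} phi0, and Sigma solves Sigma = A Sigma A' + Omega.
mu = [np.linalg.solve(np.eye(2) - A[m], phi0[m]) for m in range(2)]
Sig = [solve_discrete_lyapunov(A[m], Omega[m]) for m in range(2)]

def mixing_weights(y_prev):
    # Weights proportional to alpha_m times the stationary density of the
    # previous observation under component m (Gaussian, then Student's t;
    # the t shape matrix is scaled so its covariance equals Sig).
    d = np.array([
        alpha[0] * multivariate_normal(mu[0], Sig[0]).pdf(y_prev),
        alpha[1] * multivariate_t(mu[1], Sig[1] * (nu - 2) / nu, df=nu).pdf(y_prev),
    ])
    return d / d.sum()

y, path = np.zeros(2), []
for _ in range(500):
    m = rng.choice(2, p=mixing_weights(y))
    eps = (rng.multivariate_normal(np.zeros(2), Omega[0]) if m == 0
           else multivariate_t(np.zeros(2), Omega[1] * (nu - 2) / nu,
                               df=nu).rvs(random_state=rng))
    y = phi0[m] + A[m] @ y + eps
    path.append(y)
```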
This article describes an R package, bqror, that estimates Bayesian quantile regression for the ordinal models introduced in \citet{Rahman-2016}. The paper classifies ordinal models into two types and offers two computationally efficient, yet simple, MCMC algorithms for estimating ordinal quantile regression. The generic ordinal model with more than 3 outcomes (labeled the $OR_{I}$ model) is estimated by a combination of Gibbs sampling and the Metropolis-Hastings algorithm, whereas an ordinal model with exactly 3 outcomes (labeled the $OR_{II}$ model) is estimated using Gibbs sampling only. In line with the Bayesian literature, we suggest using the marginal likelihood for comparing alternative quantile regression models and explain how to compute it. The models and their estimation procedures are illustrated via multiple simulation studies and implemented in the two applications presented in \citet{Rahman-2016}. The article also describes several other functions contained within the bqror package, which are necessary for estimation, inference, and assessing model fit.
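The computational device underlying such samplers is the normal-exponential mixture representation of the asymmetric Laplace likelihood. The following Python sketch applies it to the simpler continuous-response quantile regression (it is not the bqror API, and the ordinal cut-point and latent-variable steps are omitted); the data and prior are illustrative assumptions.

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(4)

# Toy data (illustrative); estimate the q-th conditional quantile.
n, q = 300, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=4, size=n)

theta = (1 - 2 * q) / (q * (1 - q))   # asymmetric Laplace mixture constants
tau2 = 2 / (q * (1 - q))
p = X.shape[1]
beta, v = np.zeros(p), np.ones(n)
B0inv = np.eye(p) / 100.0             # diffuse prior precision
draws = []

for it in range(2000):
    # beta | v: weighted Bayesian linear regression on y - theta * v.
    w = 1.0 / (tau2 * v)
    Binv = B0inv + (X * w[:, None]).T @ X
    bmean = np.linalg.solve(Binv, (X * w[:, None]).T @ (y - theta * v))
    beta = rng.multivariate_normal(bmean, np.linalg.inv(Binv))
    # v_i | beta: generalized inverse Gaussian GIG(1/2, chi_i, psi),
    # sampled via scipy's geninvgauss after matching parameterizations.
    chi = np.maximum((y - X @ beta) ** 2 / tau2, 1e-12)
    psi = theta ** 2 / tau2 + 2.0
    v = geninvgauss.rvs(0.5, np.sqrt(chi * psi),
                        scale=np.sqrt(chi / psi), random_state=rng)
    if it >= 500:                      # discard burn-in
        draws.append(beta)

print("posterior mean:", np.mean(draws, axis=0))
```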
Function-on-function linear regression is important for understanding the relationship between a response and a predictor that are both functions. In this article, we propose a reproducing kernel Hilbert space approach to function-on-function linear regression via penalised least squares, regularized by the thin-plate spline smoothness penalty. We study the minimax optimal convergence rate of our estimator of the coefficient function. We derive the Bahadur representation, which allows us to propose statistical inference methods using the bootstrap and the convergence of Banach-valued random variables in the sup-norm. We illustrate our method and verify our theoretical results via simulated data experiments and a real data example.
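A minimal discretized sketch of penalised least squares for this problem follows, with a plain ridge penalty standing in for the thin-plate spline penalty and simulated curves as placeholders for real functional data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Grids for the predictor argument s and the response argument t.
ns, nt, n = 50, 40, 200
s = np.linspace(0, 1, ns); ds = s[1] - s[0]
t = np.linspace(0, 1, nt)

# Simulated functional data: Y_i(t) = int X_i(s) beta(s, t) ds + noise,
# with smooth predictor curves drawn from a Gaussian process.
beta_true = np.outer(np.sin(np.pi * s), np.cos(np.pi * t))
Ks = np.exp(-30 * (s[:, None] - s[None, :]) ** 2) + 1e-8 * np.eye(ns)
X = rng.normal(size=(n, ns)) @ np.linalg.cholesky(Ks).T
Y = X @ beta_true * ds + 0.05 * rng.normal(size=(n, nt))

# Penalised least squares, one ridge problem per response grid point:
# beta_hat(., t_j) = argmin ||Y_j - ds * X b||^2 + lam * ||b||^2.
lam = 1e-3
G = ds ** 2 * X.T @ X + lam * np.eye(ns)
beta_hat = np.linalg.solve(G, ds * X.T @ Y)   # (ns, nt) coefficient surface

print("max abs error:", np.abs(beta_hat - beta_true).max())
```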
When causal quantities cannot be point identified, researchers often pursue partial identification to quantify the range of possible values. However, the peculiarities of applied research conditions can make this analytically intractable. We present a general and automated approach to causal inference in discrete settings. We show that causal questions with discrete data reduce to polynomial programming problems, and we present an algorithm to automatically bound causal effects using efficient dual relaxation and spatial branch-and-bound techniques. The user declares an estimand, states assumptions, and provides data (however incomplete or mismeasured). The algorithm then searches over admissible data-generating processes and outputs the most precise possible range consistent with available information -- i.e., sharp bounds -- including a point-identified solution if one exists. Because this search can be computationally intensive, our procedure reports and continually refines non-sharp ranges that are guaranteed to contain the truth at all times, even when the algorithm is not run to completion. Moreover, it offers an additional guarantee we refer to as $\epsilon$-sharpness, characterizing the worst-case looseness of the incomplete bounds. Analytically validated simulations show the algorithm accommodates classic obstacles, including confounding, selection, measurement error, noncompliance, and nonresponse.
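In the special case of instrumental variables with noncompliance, the program is linear and the idea can be shown in a few lines: the classic Balke-Pearl bounds optimize the ATE over response-type distributions consistent with the observed data. The observed probabilities below are generated from a known model, so the true ATE must fall inside the computed bounds.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)

# Response types t = (d0, d1, y0, y1): treatment taken under each
# instrument value, and outcome under each treatment value.
types = list(itertools.product([0, 1], repeat=4))
q_true = rng.dirichlet(np.ones(16))            # hypothetical true distribution

# Observed distribution P(D=d, Y=y | Z=z) implied by q_true.
def observed(q):
    P = {}
    for z in (0, 1):
        for d in (0, 1):
            for y in (0, 1):
                P[z, d, y] = sum(qk for qk, (d0, d1, y0, y1) in zip(q, types)
                                 if (d0, d1)[z] == d and (y0, y1)[d] == y)
    return P

P = observed(q_true)

# Linear program over q: match the observed margins, then bound the ATE.
A_eq = [[1.0 if (t[0], t[1])[z] == d and (t[2], t[3])[d] == y else 0.0
         for t in types] for (z, d, y) in P]
b_eq = list(P.values())
ate = [t[3] - t[2] for t in types]             # y1 - y0 for each type

lo = linprog(ate, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16)
hi = linprog([-c for c in ate], A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16)
true_ate = sum(c * qk for c, qk in zip(ate, q_true))
print(f"bounds: [{lo.fun:.3f}, {-hi.fun:.3f}], truth: {true_ate:.3f}")
```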
We consider the problem of estimating volatility based on high-frequency data when the observed price process is a continuous It\^o semimartingale contaminated by microstructure noise. Assuming that the noise process is compatible across different sampling frequencies, we argue that it typically has a similar local behavior to fractional Brownian motion. For the resulting class of processes, which we call mixed semimartingales, we derive consistent estimators and asymptotic confidence intervals for the roughness parameter of the noise and the integrated price and noise volatilities, in all cases where these quantities are identifiable. Our model can explain key features of recent stock price data, most notably divergence rates in volatility signature plots that vary considerably over time and between assets.
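A volatility signature plot itself is straightforward to compute: realized variance of the noisy price at a range of subsampling frequencies. The sketch below uses Brownian efficient prices plus i.i.d. Gaussian noise as a simplified stand-in for the mixed-semimartingale model with fBm-like noise.

```python
import numpy as np

rng = np.random.default_rng(7)

# One trading day of prices at 1-second resolution: efficient price is a
# Brownian semimartingale; the microstructure noise is i.i.d. Gaussian
# here (a simplification of the rough, fBm-like noise in the paper).
n, sigma, noise_sd = 23400, 0.02, 1e-4
efficient = np.cumsum(sigma / np.sqrt(n) * rng.normal(size=n + 1))
price = efficient + noise_sd * rng.normal(size=n + 1)

def realized_variance(p, step):
    """Sum of squared price increments at the given subsampling step."""
    r = np.diff(p[::step])
    return np.sum(r ** 2)

# Signature plot: RV as a function of sampling interval. Noise inflates
# RV at the highest frequencies; RV approaches the integrated variance
# (here sigma^2 = 4e-4) as the interval grows.
for step in (1, 5, 30, 60, 300):
    print(f"interval {step:4d}s  RV = {realized_variance(price, step):.6f}")
```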
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated to understanding how these methods behave in high-dimensional settings, where $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d_n/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference -- developing methods whose validity does not depend on any assumption on $d_n$. We introduce a new, generic approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonals. We exemplify our technique for a handful of classical problems, including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and, without explicitly targeting the high-dimensional setting, their power is optimal up to a $\sqrt{2}$ factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.
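A sketch of the sample-splitting/self-normalization recipe for the one-sample mean problem (testing $H_0: \mathbb{E}[X] = 0$) follows; the data and effect size are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)

def split_mean_test(X):
    """Dimension-agnostic one-sample mean test: project the first half of
    the sample onto the mean direction estimated from the second half,
    then self-normalize. The statistic is asymptotically N(0, 1) under
    H0: E[X] = 0, regardless of how d scales with n."""
    n = X.shape[0]
    X1, X2 = X[: n // 2], X[n // 2:]
    direction = X2.mean(axis=0)          # estimated from the second half
    Y = X1 @ direction                   # i.i.d. scalars, mean 0 under H0
    T = np.sqrt(len(Y)) * Y.mean() / Y.std(ddof=1)
    return T, norm.sf(T)                 # statistic and one-sided p-value

# n = 100 samples in d = 20 dimensions, as in the abstract's example.
X_null = rng.normal(size=(100, 20))
X_alt = rng.normal(size=(100, 20)) + 0.5
print("under H0:", split_mean_test(X_null))
print("under H1:", split_mean_test(X_alt))
```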
Physically-inspired latent force models offer an interpretable alternative to purely data-driven tools for inference in dynamical systems. They carry the structure of differential equations and the flexibility of Gaussian processes, yielding interpretable parameters and dynamics-imposed latent functions. However, the existing inference techniques associated with these models rely on the exact computation of posterior kernel terms, which are seldom available in analytical form. Most applications relevant to practitioners, such as Hill equations or diffusion equations, are hence intractable. In this paper, we overcome these computational problems by proposing a variational solution to a general class of non-linear and parabolic partial differential equation latent force models. Further, we show that a neural operator approach can scale our model to thousands of instances, enabling fast, distributed computation. We demonstrate the efficacy and flexibility of our framework by achieving competitive performance on several tasks where the kernels are of varying degrees of tractability.
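For orientation, here is a sketch of the simplest (first-order ODE) latent force model: the latent force is a draw from a Gaussian process and the state follows by Euler integration. The decay rate, sensitivity, and kernel lengthscale are made-up values, and the paper's variational and neural-operator machinery is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(9)

# Time grid and a latent force f ~ GP(0, RBF kernel with lengthscale 0.25).
n, dt = 500, 0.01
t = np.arange(n) * dt
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.25 ** 2)
f = rng.multivariate_normal(np.zeros(n), K + 1e-8 * np.eye(n))

# First-order latent force model: dx/dt = -gamma * x + S * f(t),
# integrated with a forward Euler scheme.
gamma, S = 2.0, 1.0
x = np.zeros(n)
for i in range(1, n):
    x[i] = x[i - 1] + dt * (-gamma * x[i - 1] + S * f[i - 1])
```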
Topic models have been widely explored as probabilistic generative models of documents. Traditional inference methods have sought closed-form derivations for updating the models; however, as the expressiveness of these models grows, so does the difficulty of performing fast and accurate inference over their parameters. This paper presents alternative neural approaches to topic modelling by providing parameterisable distributions over topics which permit training by backpropagation in the framework of neural variational inference. In addition, with the help of a stick-breaking construction, we propose a recurrent network that is able to discover a notionally unbounded number of topics, analogous to Bayesian non-parametric topic models. Experimental results on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the effectiveness and efficiency of these neural topic models.
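A minimal Gaussian-latent neural topic model in the neural variational inference framework can be sketched in PyTorch as below; the dimensions and the random bag-of-words batch are made up, and the stick-breaking recurrent variant is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, H, K = 2000, 256, 50   # vocabulary size, hidden units, topics (made up)

class NeuralTopicModel(nn.Module):
    """Gaussian-latent neural topic model trained by neural variational
    inference: an MLP encoder maps a bag-of-words vector to a distribution
    over a latent topic vector; a linear decoder reconstructs the words."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(V, H), nn.ReLU())
        self.mu = nn.Linear(H, K)
        self.logvar = nn.Linear(H, K)
        self.dec = nn.Linear(K, V)          # topic-to-word logits

    def forward(self, bow):
        h = self.enc(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        log_probs = F.log_softmax(self.dec(z), dim=-1)
        recon = -(bow * log_probs).sum(-1)  # multinomial reconstruction loss
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean()          # negative ELBO

model = NeuralTopicModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bow = torch.poisson(torch.rand(32, V))      # fake bag-of-words batch
loss = model(bow)
loss.backward()
opt.step()
```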