The estimation of absorption time distributions of Markov jump processes is an important task in various branches of statistics and applied probability. While the time-homogeneous case is classic, the time-inhomogeneous case has recently received increased attention due to its added flexibility and advances in computational power. However, commuting sub-intensity matrices are assumed, which in various cases limits the parsimonious properties of the resulting representation. This paper develops the theory required to solve the general case through maximum likelihood estimation, and in particular, using the expectation-maximization algorithm. A reduction to a piecewise constant intensity matrix function is proposed in order to provide succinct representations, where a parametric linear model binds the intensities together. Practical aspects are discussed and illustrated through the estimation of notoriously demanding theoretical distributions and real data, from the perspective of matrix analytic methods.
We study the problem of unbiased estimation of expectations with respect to (w.r.t.) $\pi$ a given, general probability measure on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ that is absolutely continuous with respect to a standard Gaussian measure. We focus on simulation associated to a particular class of diffusion processes, sometimes termed the Schr\"odinger-F\"ollmer Sampler, which is a simulation technique that approximates the law of a particular diffusion bridge process $\{X_t\}_{t\in [0,1]}$ on $\mathbb{R}^d$, $d\in \mathbb{N}_0$. This latter process is constructed such that, starting at $X_0=0$, one has $X_1\sim \pi$. Typically, the drift of the diffusion is intractable and, even if it were not, exact sampling of the associated diffusion is not possible. As a result, \cite{sf_orig,jiao} consider a stochastic Euler-Maruyama scheme that allows the development of biased estimators for expectations w.r.t.~$\pi$. We show that for this methodology to achieve a mean square error of $\mathcal{O}(\epsilon^2)$, for arbitrary $\epsilon>0$, the associated cost is $\mathcal{O}(\epsilon^{-5})$. We then introduce an alternative approach that provides unbiased estimates of expectations w.r.t.~$\pi$, that is, it does not suffer from the time discretization bias or the bias related with the approximation of the drift function. We prove that to achieve a mean square error of $\mathcal{O}(\epsilon^2)$, the associated cost is, with high probability, $\mathcal{O}(\epsilon^{-2}|\log(\epsilon)|^{2+\delta})$, for any $\delta>0$. We implement our method on several examples including Bayesian inverse problems.
Recently, high dimensional vector auto-regressive models (VAR), have attracted a lot of interest, due to novel applications in the health, engineering and social sciences. The presence of temporal dependence poses additional challenges to the theory of penalized estimation techniques widely used in the analysis of their iid counterparts. However, recent work (e.g., [Basu and Michailidis, 2015, Kock and Callot, 2015]) has established optimal consistency of $\ell_1$-LASSO regularized estimates applied to models involving high dimensional stable, Gaussian processes. The only price paid for temporal dependence is an extra multiplicative factor that equals 1 for independent and identically distributed (iid) data. Further, [Wong et al., 2020] extended these results to heavy tailed VARs that exhibit "$\beta$-mixing" dependence, but the rates rates are sub-optimal, while the extra factor is intractable. This paper improves these results in two important directions: (i) We establish optimal consistency rates and corresponding finite sample bounds for the underlying model parameters that match those for iid data, modulo a price for temporal dependence, that is easy to interpret and equals 1 for iid data. (ii) We incorporate more general penalties in estimation (which are not decomposable unlike the $\ell_1$ norm) to induce general sparsity patterns. The key technical tool employed is a novel, easy-to-use concentration bound for heavy tailed linear processes, that do not rely on "mixing" notions and give tighter bounds.
In data science, vector autoregression (VAR) models are popular in modeling multivariate time series in the environmental sciences and other applications. However, these models are computationally complex with the number of parameters scaling quadratically with the number of time series. In this work, we propose a so-called neighborhood vector autoregression (NVAR) model to efficiently analyze large-dimensional multivariate time series. We assume that the time series have underlying neighborhood relationships, e.g., spatial or network, among them based on the inherent setting of the problem. When this neighborhood information is available or can be summarized using a distance matrix, we demonstrate that our proposed NVAR method provides a computationally efficient and theoretically sound estimation of model parameters. The performance of the proposed method is compared with other existing approaches in both simulation studies and a real application of stream nitrogen study.
Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations are computationally expensive requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimension grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine tuning, a simulation set of only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.
This work considers Gaussian process interpolation with a periodized version of the Mat{\'e}rn covariance function (Stein, 1999, Section 6.7) with Fourier coefficients $\phi$($\alpha$^2 + j^2)^(--$\nu$--1/2). Convergence rates are studied for the joint maximum likelihood estimation of $\nu$ and $\phi$ when the data is sampled according to the model. The mean integrated squared error is also analyzed with fixed and estimated parameters, showing that maximum likelihood estimation yields asymptotically the same error as if the ground truth was known. Finally, the case where the observed function is a ''deterministic'' element of a continuous Sobolev space is also considered, suggesting that bounding assumptions on some parameters can lead to different estimates.
Interval-censored multi-state data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We formulate the effects of potentially time-dependent covariates on multi-state processes through semiparametric proportional intensity models with random effects. We adopt nonparametric maximum likelihood estimation (NPMLE) under general interval censoring and develop a stable expectation-maximization (EM) algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.
Traditional nonparametric estimation methods often lead to a slow convergence rate in large dimensions and require unrealistically enormous sizes of datasets for reliable conclusions. We develop an approach based on mixed gradients, either observed or estimated, to effectively estimate the function at near-parametric convergence rates. The novel approach and computational algorithm could lead to methods useful to practitioners in many areas of science and engineering. Our theoretical results reveal a behavior universal to this class of nonparametric estimation problems. We explore a general setting involving tensor product spaces and build upon the smoothing spline analysis of variance (SS-ANOVA) framework. For $d$-dimensional models under full interaction, the optimal rates with gradient information on $p$ covariates are identical to those for the $(d-p)$-interaction models without gradients and, therefore, the models are immune to the "curse of interaction". For additive models, the optimal rates using gradient information are root-$n$, thus achieving the "parametric rate". We demonstrate aspects of the theoretical results through synthetic and real data applications.
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. We show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm in an experimental evaluation on several benchmarks.
To estimate causal effects, analysts performing observational studies in health settings utilize several strategies to mitigate bias due to confounding by indication. There are two broad classes of approaches for these purposes: use of confounders and instrumental variables (IVs). Because such approaches are largely characterized by untestable assumptions, analysts must operate under an indefinite paradigm that these methods will work imperfectly. In this tutorial, we formalize a set of general principles and heuristics for estimating causal effects in the two approaches when the assumptions are potentially violated. This crucially requires reframing the process of observational studies as hypothesizing potential scenarios where the estimates from one approach are less inconsistent than the other. While most of our discussion of methodology centers around the linear setting, we touch upon complexities in non-linear settings and flexible procedures such as target minimum loss-based estimation (TMLE) and double machine learning (DML). To demonstrate the application of our principles, we investigate the use of donepezil off-label for mild cognitive impairment (MCI). We compare and contrast results from confounder and IV methods, traditional and flexible, within our analysis and to a similar observational study and clinical trial.