A popular way to estimate the parameters of a hidden Markov model (HMM) is direct numerical maximization (DNM) of the (log-)likelihood function. The advantages of employing the TMB (Kristensen et al., 2016) framework in R for this purpose were recently illustrated by Bacri et al. (2022). In this paper, we present extensions of these results in two directions. First, we present a practical way to obtain uncertainty estimates in the form of confidence intervals (CIs) for the so-called smoothing probabilities at moderate computational and programming effort via TMB. Our approach thus avoids computationally intensive bootstrap methods. By means of several examples, we illustrate patterns present in the derived CIs. Second, we investigate the performance of popular optimizers available in R when estimating HMMs via DNM, focusing on the potential benefits of employing TMB. The criteria investigated, via a number of simulation studies, are convergence speed, accuracy, and the impact of (poor) initial values. Our findings suggest that all optimizers considered benefit in terms of speed from using the gradient supplied by TMB. When both the gradient and the Hessian from TMB are supplied, the number of iterations decreases, suggesting a more efficient convergence to the maximum of the log-likelihood. Lastly, we briefly point out potential advantages of a hybrid approach.
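A minimal sketch of DNM for an HMM, in Python rather than R/TMB (the 2-state Gaussian model, sample size, and starting values below are our illustrative choices): the log-likelihood is evaluated with the scaled forward recursion and maximized with BFGS. Here the optimizer must fall back on finite-difference gradients, which is precisely the overhead that TMB's automatic-differentiation gradient removes.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate a 2-state Gaussian HMM (unit variances, state means -1 and 2).
n_obs = 500
Gamma_true = np.array([[0.95, 0.05], [0.10, 0.90]])
mu_true = np.array([-1.0, 2.0])
states = np.zeros(n_obs, dtype=int)
for t in range(1, n_obs):
    states[t] = rng.choice(2, p=Gamma_true[states[t - 1]])
y = rng.normal(mu_true[states], 1.0)

def nll(theta):
    """Negative log-likelihood via the scaled forward recursion."""
    g12 = 1.0 / (1.0 + np.exp(-theta[0]))   # logit-parameterized
    g21 = 1.0 / (1.0 + np.exp(-theta[1]))   # off-diagonal transition probs
    Gamma = np.array([[1 - g12, g12], [g21, 1 - g21]])
    mu = theta[2:4]
    p = norm.pdf(y[:, None], mu, 1.0)       # state-wise observation densities
    alpha = np.array([0.5, 0.5]) * p[0]     # fixed uniform initial distribution
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, n_obs):
        alpha = (alpha @ Gamma) * p[t]
        c = alpha.sum()
        ll += np.log(c)
        alpha /= c
    return -ll

# Without a supplied gradient, BFGS resorts to finite differences.
res = minimize(nll, x0=np.array([0.0, 0.0, 0.0, 1.0]), method="BFGS")
print(np.sort(res.x[2:4]))   # estimated state means, close to [-1, 2]
```

The logit transform keeps the transition probabilities in (0, 1) during unconstrained optimization, mirroring the reparameterizations typically used with TMB.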
We address the computational efficiency of solving A-optimal Bayesian experimental design problems in which the observational model is governed by partial differential equations and is, consequently, computationally expensive to evaluate. A-optimality is a widely used and easy-to-interpret criterion for the Bayesian design of experiments: it seeks the optimal design by minimizing the expected conditional variance, also known as the expected posterior variance. This work presents a novel likelihood-free method for finding the A-optimal design of experiments without sampling or integrating the Bayesian posterior distribution. In our approach, the expected conditional variance is obtained from the variance of the conditional expectation via the law of total variance, while the orthogonal projection property is exploited to approximate the conditional expectation. Through an asymptotic error estimate, we show that the intractability of the posterior does not affect the performance of our approach. To implement the method, we approximate the nonlinear conditional expectation with an artificial neural network (ANN). To deal with continuous experimental design parameters, we integrate the training of the ANN into the minimization of the expected conditional variance. Specifically, we propose a non-local approximation of the conditional expectation and apply transfer learning to reduce the number of evaluations of the observation model. Through numerical experiments, we demonstrate that our method significantly reduces the number of observational model evaluations compared with common importance-sampling-based approaches. This reduction is crucial given the computationally expensive nature of these models.
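The identity underlying the approach can be checked on a toy linear-Gaussian model where the expected posterior variance has a closed form; here ordinary least squares stands in for the paper's ANN approximation of the conditional expectation (the model, noise level, and sample size are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
sigma = 0.5                              # observation noise std (illustrative)

theta = rng.normal(0.0, 1.0, N)          # draws from the N(0, 1) prior
y = theta + rng.normal(0.0, sigma, N)    # simulated observations

# Approximate E[theta | y] by least-squares regression of theta on y:
# the orthogonal projection onto span{1, y} (ANN regression in the paper).
A = np.column_stack([np.ones(N), y])
coef, *_ = np.linalg.lstsq(A, theta, rcond=None)
cond_mean = A @ coef

# Law of total variance: E[Var(theta|y)] = Var(theta) - Var(E[theta|y]),
# so the expected posterior variance needs no posterior sampling at all.
expected_post_var = theta.var() - cond_mean.var()
analytic = sigma**2 / (1 + sigma**2)     # closed form for this Gaussian model
print(expected_post_var, analytic)       # both approximately 0.2
```

The point of the sketch is that only prior samples and forward-model evaluations enter the computation; the posterior itself is never sampled or integrated.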
In this paper, practically computable low-order approximations of potentially high-dimensional differential equations driven by geometric rough paths are proposed and investigated. In particular, equations are studied that cover the linear setting, but a certain type of dissipative nonlinearity in the drift is allowed as well. In a first step, a linear subspace is found that contains the solution space of the underlying rough differential equation (RDE). This subspace is associated with covariances of linear Itô stochastic differential equations, which is shown by exploiting a Gronwall lemma for matrix differential equations. Orthogonal projection onto the identified subspace leads to a first exact reduced-order system. Secondly, a linear map of the RDE solution (the quantity of interest) is analyzed in terms of redundant information, meaning that state variables are identified that do not contribute to the quantity of interest. Once more, a link to Itô stochastic differential equations is used. Removing such unnecessary information from the RDE provides a further dimension reduction without causing an error. Finally, we discretize a linear parabolic rough partial differential equation in space. The resulting large-order RDE is subsequently tackled with the exact reduction techniques studied in this paper. We illustrate the enormous complexity reduction potential in the corresponding numerical experiments.
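The principle of an exact reduction by projecting onto a subspace that contains the solution can be sketched for a plain linear ODE (the dimensions and the construction of the operator below are our illustrative choices, not the rough-path setting of the paper): if the state stays in the range of an orthonormal basis V, the Galerkin-projected system reproduces the full solution to machine precision.

```python
import numpy as np
from scipy.linalg import expm, qr

rng = np.random.default_rng(2)
n, r = 50, 3   # full and reduced dimensions (illustrative)

# Orthonormal basis V of a subspace assumed to contain the solution.
V, _ = qr(rng.standard_normal((n, r)), mode="economic")
B = rng.standard_normal((r, r)) - 3.0 * np.eye(r)   # stable reduced dynamics
A = V @ B @ V.T                     # full operator leaving range(V) invariant
x0 = V @ rng.standard_normal(r)     # initial state inside the subspace

t = 0.7
x_full = expm(t * A) @ x0                       # full n-dimensional solve
z = expm(t * (V.T @ A @ V)) @ (V.T @ x0)        # r-dimensional solve
x_reduced = V @ z                               # lift back to full space

print(np.linalg.norm(x_full - x_reduced))       # ~ machine precision
```

Since A maps range(V) into itself and x0 lies in range(V), the projection discards no information, which is the sense in which the reductions in the paper are "exact".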
In this paper, we evaluate the challenges and best practices associated with the Markov bases approach to sampling from conditional distributions. We provide insights and clarifications 25 years after the publication of the fundamental theorem for Markov bases by Diaconis and Sturmfels. In addition to a literature review, we prove three new results on the complexity of Markov bases in hierarchical models, on relaxations of the fibers in log-linear models, and on the limitations of partial sets of moves in providing an irreducible Markov chain.
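A sketch of the Diaconis–Sturmfels idea in its simplest instance (the table entries below are illustrative): for 2×2 contingency tables with fixed row and column sums, the Markov basis consists of the single ±swap move, and applying it as a random walk explores the fiber of all tables sharing the observed margins. A Metropolis correction would be needed to target the hypergeometric distribution; the plain walk here only demonstrates irreducibility on the fiber.

```python
import numpy as np

rng = np.random.default_rng(3)

# The single Markov-basis move for 2x2 tables with fixed margins.
move = np.array([[1, -1], [-1, 1]])

table = np.array([[3, 2], [1, 4]])   # observed table (illustrative)
rows, cols = table.sum(axis=1), table.sum(axis=0)

visited = set()
for _ in range(2000):
    eps = rng.choice([-1, 1])
    proposal = table + eps * move
    if (proposal >= 0).all():        # reject moves leaving the fiber
        table = proposal
    visited.add(tuple(table.ravel()))
    # Every visited table keeps the observed margins.
    assert (table.sum(axis=1) == rows).all()
    assert (table.sum(axis=0) == cols).all()

print(len(visited))   # distinct tables in the fiber reached by the walk
```

For these margins the fiber is parameterized by the top-left entry, so the walk reduces to a simple random walk on {0, 1, 2, 3, 4}; the "limitations of partial sets of moves" result concerns precisely the settings where such a small set of moves fails to connect the fiber.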
The convexification numerical method, with its rigorously established global convergence property, is constructed for a problem for the second-order Mean Field Games system. This is the problem of the retrospective analysis of a game of infinitely many rational players. In addition to the traditional initial and terminal conditions, one extra terminal condition is assumed to be known. Carleman estimates and a Carleman weight function play the key role. Numerical experiments demonstrate good performance for complicated functions. Various versions of the convexification method have been actively used by this research team for a number of years to numerically solve coefficient inverse problems.
This paper considers a joint survival and mixed-effects model to explain the survival time from longitudinal data and high-dimensional covariates. The longitudinal data are modeled using a nonlinear mixed-effects model, whose regression function serves as a link function and is incorporated into a Cox model as a covariate. In this way, the longitudinal data are related to the survival time at any given time point. Additionally, the Cox model accommodates the inclusion of high-dimensional covariates. The main objectives of this research are two-fold: first, to identify the relevant covariates that contribute to explaining the survival time, and second, to estimate all unknown parameters of the joint model. For that purpose, we maximize a Lasso-penalized likelihood. To tackle the optimization problem, we implement a preconditioned stochastic gradient algorithm to handle the latent variables of the nonlinear mixed-effects model, combined with a proximal operator to manage the non-differentiability of the penalty. We provide relevant simulations that showcase the performance of the proposed variable selection and parameter estimation method in the joint modeling of a Cox and logistic model.
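The role of the proximal operator can be illustrated with plain ISTA on a toy Lasso regression problem (this stands in for the paper's preconditioned stochastic scheme; the data dimensions, sparsity pattern, and penalty strength are our illustrative choices): the smooth least-squares part is handled by a gradient step, and soft-thresholding, the proximal map of the l1 penalty, handles its non-differentiability.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 0.1                                  # penalty strength (illustrative)
L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
step = 1.0 / L

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

beta = np.zeros(p)
for _ in range(500):
    grad = X.T @ (X @ beta - y) / n        # gradient of the smooth part
    beta = soft_threshold(beta - step * grad, step * lam)

print(np.flatnonzero(beta))                # indices of the recovered support
```

The gradient step never sees the penalty and the proximal step never sees the data, which is exactly the splitting that lets the paper combine a stochastic gradient on the latent-variable likelihood with a proximal operator on the Lasso term.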
Individualized treatment rules, cornerstones of precision medicine, inform patient treatment decisions with the goal of optimizing patient outcomes. These rules are generally unknown functions of patients' pre-treatment covariates, meaning they must be estimated from clinical or observational study data. Myriad methods have been developed to learn these rules, and these procedures are demonstrably successful in traditional asymptotic settings with a moderate number of covariates. However, the finite-sample performance of these methods in high-dimensional covariate settings, which are increasingly the norm in modern clinical trials, has not been well characterized. We perform a comprehensive comparison of state-of-the-art individualized treatment rule estimators, assessing performance on the basis of the estimators' accuracy, interpretability, and computational efficiency. Sixteen data-generating processes with continuous outcomes and binary treatment assignments are considered, reflecting a diversity of randomized and observational studies. We summarize our findings and provide succinct advice to practitioners needing to estimate individualized treatment rules in high dimensions. All code is made publicly available, facilitating modifications and extensions to our simulation study. A novel pre-treatment covariate filtering procedure is also proposed and is shown to improve estimators' accuracy and interpretability.
Hidden Markov models (HMMs) are flexible tools for clustering dependent data coming from unknown populations, allowing nonparametric modelling of the population densities. Identifiability fails when the data is in fact independent, and we study the frontier between learnable and unlearnable two-state nonparametric HMMs. Interesting new phenomena emerge when the cluster distributions are modelled via density functions (the 'emission' densities) belonging to standard smoothness classes, in contrast to the multinomial setting previously considered. Notably, the identification of a direction separating the two emission densities becomes a critical, and challenging, issue. Surprisingly, it is possible to "borrow strength" from estimators of the smoother density to improve estimation of the other. We conduct a precise analysis of minimax rates, showing a transition depending on the relative smoothness of the emission densities.
We present a novel numerical method for solving the anisotropic diffusion equation in toroidally confined magnetic fields which is efficient, accurate and provably stable. The continuous problem is written in terms of a derivative operator for the perpendicular transport and a linear operator, obtained through field line tracing, for the parallel transport. We derive energy estimates for the solution of the continuous initial boundary value problem. A discrete formulation is presented using operator splitting in time with the summation-by-parts finite difference approximation of spatial derivatives for the perpendicular diffusion operator. Weak penalty procedures are derived for implementing both the boundary conditions and the parallel diffusion operator obtained by field line tracing. We prove that the fully discrete approximation is unconditionally stable and asymptotic preserving. Discrete energy estimates are shown to match the continuous energy estimate given the correct choice of penalty parameters. Convergence tests are shown for the perpendicular operator by itself, and the "NIMROD benchmark" problem is used as a manufactured solution to show that the full scheme converges even in the case where the perpendicular diffusion is zero. Finally, we present a magnetic field with chaotic regions and islands and show that the contours of the anisotropic diffusion equation reproduce key features in the field.
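The summation-by-parts (SBP) property that underlies such energy estimates can be verified directly for the standard second-order SBP first-derivative operator D = H^{-1}Q (the grid size and spacing below are illustrative): Q + Q^T reduces to boundary terms only, so the discrete energy rate mimics continuous integration by parts.

```python
import numpy as np

n, h = 11, 0.1   # number of grid points and spacing (illustrative)

# Standard second-order SBP first-derivative operator: D = H^{-1} Q.
H = h * np.eye(n)
H[0, 0] = H[-1, -1] = h / 2.0            # diagonal norm (quadrature) matrix
Q = 0.5 * (np.eye(n, k=1) - np.eye(n, k=-1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5
D = np.linalg.solve(H, Q)

# SBP property: Q + Q^T = diag(-1, 0, ..., 0, 1) -- boundary terms only.
B = Q + Q.T
print(np.allclose(B, np.diag([-1.0] + [0.0] * (n - 2) + [1.0])))   # True

# Discrete integration by parts: u^T H (D u) + (D u)^T H u = u_N^2 - u_0^2.
u = np.sin(np.linspace(0.0, 1.0, n))
lhs = u @ H @ (D @ u) + (D @ u) @ H @ u
print(np.isclose(lhs, u[-1] ** 2 - u[0] ** 2))                     # True
```

Because all interior contributions cancel exactly, stability then hinges only on how the boundary terms are treated, which is what the weak penalty (SAT) procedures in the paper control.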
The analysis of multiple time-to-event outcomes in a randomised controlled clinical trial can be accomplished with existing methods. However, depending on the characteristics of the disease under investigation and the circumstances in which the study is planned, it may be of interest to conduct interim analyses and adapt the study design if necessary. Due to the expected dependence between the endpoints, the full information available on the involved endpoints may not be usable for this purpose. We suggest a solution to this problem by embedding the endpoints in a multi-state model. If this model is Markovian, it is possible to take the disease history of the patients into account and allow for data-dependent design adaptations. To this end, we introduce a flexible test procedure for a variety of applications, but are particularly concerned with the simultaneous consideration of progression-free survival (PFS) and overall survival (OS). This setting is of key interest in oncological trials. We conduct simulation studies to determine the properties for small sample sizes and demonstrate an application based on data from the NB2004-HR study.
We prove the first polynomial separation between randomized and deterministic time-space tradeoffs of multi-output functions. In particular, we present a total function that, on an input of $n$ elements in $[n]$, outputs $O(n)$ elements, such that: (1) There exists a randomized oblivious algorithm with space $O(\log n)$, time $O(n\log n)$ and one-way access to randomness, that computes the function with probability $1-O(1/n)$; (2) Any deterministic oblivious branching program with space $S$ and time $T$ that computes the function must satisfy $T^2S\geq\Omega(n^{2.5}/\log n)$. This implies that logspace randomized algorithms for multi-output functions cannot be black-box derandomized without an $\widetilde{\Omega}(n^{1/4})$ overhead in time. Since previously all polynomial time-space tradeoffs for multi-output functions were proved via the Borodin-Cook method, which is a probabilistic method that inherently gives the same lower bound for randomized and deterministic branching programs, our lower bound proof is intrinsically different from previous works. We also examine other natural candidates for proving such separations, and show that any polynomial separation for these problems would resolve the long-standing open problem of proving an $n^{1+\Omega(1)}$ time lower bound for decision problems with $\mathrm{polylog}(n)$ space.