There is growing interest in platform trials, which provide the flexibility to incorporate new treatment arms during the trial and the ability to halt treatments early for lack of benefit or for observed superiority. In such trials it can be important to ensure that error rates are controlled. This paper introduces a multi-stage design that enables new treatment arms to be added, at any point, in a pre-planned manner within a platform trial while maintaining control of the family-wise error rate. The focus is on finding the sample size required to achieve a desired level of statistical power when treatments continue to be tested even after a superior treatment has already been found. This may be of interest if other sponsors' treatments are also superior to the current control, or if multiple doses are being tested. The calculations needed to determine the expected sample size are given. A motivating trial is presented, for which the sample size of different configurations is studied. Additionally, the approach is compared to running multiple separate trials, and it is shown that in many scenarios where family-wise error rate control is required, a platform trial may offer little benefit in terms of sample size.
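As a toy illustration of why family-wise error rate (FWER) control drives the sample size comparison, the following Monte Carlo sketch estimates the FWER of a trial with several arms sharing one control, with and without a Bonferroni adjustment. It is not the multi-stage design of the paper, and all numerical values (number of arms, per-arm sample size, one-sided alpha) are illustrative assumptions.

    # Minimal Monte Carlo sketch (not the paper's multi-stage design): family-wise
    # error rate of a K-arm trial with a shared control, with and without a
    # Bonferroni adjustment. Parameter values are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    K, n, alpha, sims = 3, 100, 0.025, 20_000   # arms, patients per arm, one-sided alpha
    rejections_unadj, rejections_bonf = 0, 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)
        pvals = []
        for _k in range(K):                      # all arms ineffective (global null)
            arm = rng.normal(0.0, 1.0, n)
            z = (arm.mean() - control.mean()) / np.sqrt(2.0 / n)
            pvals.append(1.0 - stats.norm.cdf(z))
        pvals = np.array(pvals)
        rejections_unadj += (pvals < alpha).any()
        rejections_bonf += (pvals < alpha / K).any()

    print("FWER without adjustment:", rejections_unadj / sims)
    print("FWER with Bonferroni   :", rejections_bonf / sims)

Holding the per-comparison alpha at the adjusted level is what inflates the required per-arm sample size relative to running unadjusted separate trials.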
Signal detection is one of the main challenges of data science. As often happens in data analysis, the signal in the data may be corrupted by noise. A wide range of techniques aims at extracting the relevant degrees of freedom from data, yet some problems remain difficult. This is notably the case for signal detection in almost continuous spectra when the signal-to-noise ratio is sufficiently small. This paper follows a recent line of work which tackles this issue with field-theoretical methods. Previous analyses focused on equilibrium Boltzmann distributions for an effective field representing the degrees of freedom of the data, and established a relation between signal detection and $\mathbb{Z}_2$-symmetry breaking. In this paper, we consider a stochastic field framework inspired by the so-called "Model A", and show that whether or not an equilibrium state can be reached is correlated with the shape of the dataset. In particular, by studying the renormalization group flow of the model, we show that the weak ergodicity prescription is always broken for sufficiently small signals when the data distribution is close to the Marchenko-Pastur (MP) law. This, in particular, enables the definition of a detection threshold in the small signal-to-noise regime.
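The spectral picture underlying this detection problem can be illustrated with a short simulation: the largest eigenvalue of a sample covariance matrix is compared with the upper edge of the Marchenko-Pastur bulk, with and without a planted rank-one signal. This is only the standard random-matrix view of the threshold, not the field-theoretic analysis of the paper, and the chosen signal strength beta is an illustrative assumption.

    # Empirical eigenvalues vs. the Marchenko-Pastur bulk edge, with and without
    # a planted rank-one spike; beta (signal strength) is an illustrative value.
    import numpy as np

    rng = np.random.default_rng(0)
    N, P, beta = 500, 1000, 2.0            # dimension, samples, signal strength
    q = N / P
    u = rng.normal(size=N); u /= np.linalg.norm(u)

    X = rng.normal(size=(P, N))            # pure-noise data
    X_sig = X + np.sqrt(beta) * rng.normal(size=(P, 1)) * u   # rank-one spike added
    for label, data in [("noise only", X), ("noise + spike", X_sig)]:
        eigvals = np.linalg.eigvalsh(data.T @ data / P)
        mp_edge = (1 + np.sqrt(q)) ** 2    # upper edge of the Marchenko-Pastur bulk
        print(label, "largest eigenvalue:", round(eigvals[-1], 3),
              "MP bulk edge:", round(mp_edge, 3))

For a signal strength above the classical spectral threshold the largest eigenvalue detaches from the MP bulk, while weaker signals remain hidden inside it; this is the regime the field-theoretic approach aims to probe.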
A data-driven modeling approach is presented to quantify the influence of morphology on effective properties in nanostructured sodium vanadium phosphate $\mathrm{Na}_3\mathrm{V}_2(\mathrm{PO}_4)_3$/carbon composites (NVP/C), which are used as cathode material in sodium-ion batteries. The approach is based on a combination of advanced imaging techniques, experimental nanostructure characterization and stochastic modeling of the 3D nanostructure consisting of NVP, carbon and pores. By 3D imaging and subsequent post-processing involving image segmentation, the spatial distribution of NVP is resolved in 3D, and the spatial distribution of carbon and pores is resolved in 2D. Based on this information, a parametric stochastic model, specifically a pluri-Gaussian model, is calibrated to the 3D morphology of the nanostructured NVP/C particles. Model validation is performed by comparing the nanostructure of simulated NVP/C composites with image data in terms of morphological descriptors that have not been used for model calibration. Finally, the stochastic model is used for predictive simulation to quantify the effect of varying the amount of carbon while keeping the amount of NVP constant. The presented methodology opens new possibilities for a resource-efficient optimization of the morphology of NVP/C particles by modeling and simulation.
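A minimal two-dimensional sketch of the pluri-Gaussian construction is given below: two smoothed Gaussian random fields are thresholded jointly to produce a three-phase (pore/NVP/carbon) morphology. The thresholds and correlation lengths are illustrative assumptions, not the calibrated model parameters of the paper.

    # 2D toy version of a pluri-Gaussian model: threshold two Gaussian random
    # fields to obtain three phases; parameters are illustrative assumptions.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(42)
    size, len1, len2 = 256, 8.0, 4.0                 # grid size and correlation lengths
    Z1 = gaussian_filter(rng.normal(size=(size, size)), len1)
    Z2 = gaussian_filter(rng.normal(size=(size, size)), len2)
    Z1 = (Z1 - Z1.mean()) / Z1.std()                 # normalize to unit variance
    Z2 = (Z2 - Z2.mean()) / Z2.std()

    phase = np.zeros((size, size), dtype=int)        # 0 = pore
    phase[Z1 > 0.2] = 1                              # 1 = NVP
    phase[(Z1 <= 0.2) & (Z2 > -0.3)] = 2             # 2 = carbon
    for k, name in enumerate(["pore", "NVP", "carbon"]):
        print(name, "volume fraction:", round((phase == k).mean(), 3))

Shifting the threshold on the second field changes the carbon fraction while leaving the NVP phase untouched, which is the kind of predictive variation exploited in the paper.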
Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches intended to deal with serial dependence, such as leave-future-out (LFO), h-block, and hv-block CV. Existing large-sample results show that both specialized and generic methods are applicable to models of serially dependent data. However, large-sample consistency results overlook the impact of sampling variability on accuracy in finite samples, and the accuracy of a CV scheme depends on many aspects of the procedure; we show that poor design choices can lead to elevated rates of adverse model selection. In this paper, we consider the problem of identifying the regression component of an important class of models for serially dependent data, autoregressions of order p with q exogenous regressors (ARX(p,q)), under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and of the model selection statistic. We conclude with recommendations for practitioners.
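The following sketch conveys one simplified reading of the joint-versus-pointwise distinction for a held-out block of an AR(1) process with known parameters (illustrative values): the joint score uses the multivariate density of the block, while the pointwise score sums the marginal densities and therefore ignores the serial dependence; the two coincide only when the autoregressive coefficient is zero.

    # Joint vs. pointwise log score of a held-out AR(1) block with known
    # parameters; phi, sigma and the block length h are illustrative values.
    import numpy as np
    from scipy import stats

    phi, sigma, h, sims = 0.8, 1.0, 10, 5000
    rng = np.random.default_rng(0)
    var_marg = sigma**2 / (1 - phi**2)                      # stationary variance
    cov = var_marg * phi ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))

    joint_scores, pointwise_scores = [], []
    for _ in range(sims):
        y = rng.multivariate_normal(np.zeros(h), cov)       # held-out block
        joint_scores.append(stats.multivariate_normal.logpdf(y, mean=np.zeros(h), cov=cov))
        pointwise_scores.append(stats.norm.logpdf(y, scale=np.sqrt(var_marg)).sum())

    print("joint score    : mean", round(np.mean(joint_scores), 2),
          "sd", round(np.std(joint_scores), 2))
    print("pointwise score: mean", round(np.mean(pointwise_scores), 2),
          "sd", round(np.std(pointwise_scores), 2))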
Friction-induced vibration (FIV) is very common in engineering. Analysing the dynamic behaviour of systems containing a frictional interface with multiple contact points is an important topic, yet accurately simulating the nonsmooth/discontinuous dynamic behaviour due to friction is challenging. This paper presents a new physics-informed neural network (PINN) approach for solving nonsmooth friction-induced vibration and friction-involved vibration problems. Compared with conventional time-stepping schemes, in this new computational framework the theoretical formulations of nonsmooth multibody dynamics are transformed and embedded in the training process of the neural network. The major findings are that the new framework not only performs accurate simulation of nonsmooth dynamic behaviour, but also eliminates the need for the extremely small time steps typically associated with conventional time-stepping methods for multibody systems, thus saving substantial computational effort while maintaining high accuracy. Specifically, four kinds of high-accuracy PINN-based methods are proposed: (1) single PINN; (2) dual PINN; (3) advanced single PINN; (4) advanced dual PINN. Two typical dynamics problems with nonsmooth contact are tested: a 1-dimensional contact problem with stick-slip, and a 2-dimensional contact problem involving separation-reattachment and stick-slip oscillation. Both the single and dual PINN methods show their advantages for the 1-dimensional stick-slip problem, outperforming conventional methods for friction models that are difficult to handle with conventional time-stepping. For the 2-dimensional problem, the advanced single and advanced dual PINN methods are shown to improve accuracy, and they provide good results even in cases where conventional methods fail.
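A conceptual sketch in the spirit of a single-network PINN, assuming PyTorch and a one-degree-of-freedom oscillator with a tanh-regularized Coulomb friction law (both assumptions, not the paper's formulation), is shown below: the network x(t) is trained so that the equation-of-motion residual and the initial conditions are satisfied at collocation points.

    # Conceptual PINN sketch: x(t) approximated by a small network, trained on the
    # residual of m*x'' + c*x' + k*x + mu_N*sign(x') = 0 with a smooth surrogate
    # for sign(.); all parameter values are illustrative assumptions.
    import torch

    m, c, k, mu_N, eps = 1.0, 0.1, 4.0, 0.5, 1e-2      # mass, damping, stiffness, friction
    net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    t = torch.linspace(0.0, 5.0, 200).reshape(-1, 1).requires_grad_(True)

    for step in range(3000):
        x = net(t)
        v = torch.autograd.grad(x, t, torch.ones_like(x), create_graph=True)[0]
        a = torch.autograd.grad(v, t, torch.ones_like(v), create_graph=True)[0]
        friction = mu_N * torch.tanh(v / eps)           # smooth surrogate of sign(v)
        residual = m * a + c * v + k * x + friction     # equation-of-motion residual
        t0 = torch.zeros(1, 1, requires_grad=True)
        x0 = net(t0)
        v0 = torch.autograd.grad(x0, t0, torch.ones_like(x0), create_graph=True)[0]
        loss = (residual**2).mean() + ((x0 - 1.0)**2).sum() + (v0**2).sum()
        opt.zero_grad(); loss.backward(); opt.step()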
This work is concerned with the uniform accuracy of implicit-explicit backward differentiation formulas (IMEX-BDF) for general linear hyperbolic relaxation systems satisfying the structural stability condition proposed previously by the third author. We prove the uniform stability and accuracy of a class of IMEX-BDF schemes discretized spatially by a Fourier spectral method. The result reveals that the accuracy of the fully discretized schemes is independent of the relaxation time in all regimes. This is verified by numerical experiments on several applications in traffic flow, rarefied gas dynamics and kinetic theory.
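For concreteness, the sketch below applies an IMEX-BDF2 step (convection explicit, stiff relaxation implicit) to a single Fourier mode of the linear 2x2 relaxation model u_t + v_x = 0, v_t + a u_x = -(v - b u)/eps with b^2 <= a; this model problem and all numerical values are illustrative assumptions, not the general class of systems analysed in the paper.

    # IMEX-BDF2 for one Fourier mode xi of a linear 2x2 relaxation system:
    # convection treated explicitly, relaxation implicitly; first step by
    # implicit-explicit Euler. Parameters are illustrative assumptions.
    import numpy as np

    a, b, eps, xi, dt, nsteps = 1.0, 0.5, 1e-6, 2.0, 0.05, 200
    A = np.array([[0.0, 1.0], [a, 0.0]])          # convection matrix
    S = np.array([[0.0, 0.0], [b, -1.0]])         # relaxation matrix
    I = np.eye(2)
    U_prev = np.array([1.0 + 0j, 0.0 + 0j])       # initial Fourier coefficients

    # first step with IMEX-BDF1 (implicit-explicit Euler)
    U_curr = np.linalg.solve(I - dt / eps * S, U_prev - dt * 1j * xi * A @ U_prev)

    for _ in range(nsteps - 1):                   # IMEX-BDF2 steps
        rhs = (4 * U_curr - U_prev) / (2 * dt) - 1j * xi * A @ (2 * U_curr - U_prev)
        U_next = np.linalg.solve(3 / (2 * dt) * I - S / eps, rhs)
        U_prev, U_curr = U_curr, U_next

    print("final mode amplitude:", np.abs(U_curr))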
In the analysis of cluster-randomized trials, mixed-model analysis of covariance (ANCOVA) is a standard approach for covariate adjustment and handling within-cluster correlations. However, when the normality, linearity, or random-intercept assumption is violated, the validity and efficiency of the mixed-model ANCOVA estimators for estimating the average treatment effect remain unclear. Under the potential outcomes framework, we prove that the mixed-model ANCOVA estimators of the average treatment effect are consistent and asymptotically normal under arbitrary misspecification of the working model. If the probability of receiving treatment is 0.5 for each cluster, we further show that the model-based variance estimator under mixed-model ANCOVA1 (ANCOVA without treatment-covariate interactions) remains consistent, clarifying that the confidence interval given by standard software is asymptotically valid even under model misspecification. Beyond robustness, we offer several insights into the relative precision of classical methods for analyzing cluster-randomized trials, including the mixed-model ANCOVA, individual-level ANCOVA, and cluster-level ANCOVA estimators. These insights may inform the choice of method in practice. Our analytical results and insights are illustrated via simulation studies and analyses of three cluster-randomized trials.
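As a small illustration of the estimator under discussion (not the theoretical analysis itself), the sketch below simulates a cluster-randomized trial and fits a mixed-model ANCOVA with a random cluster intercept and a baseline covariate using statsmodels; the data-generating values are assumptions.

    # Simulated cluster-randomized trial analysed with mixed-model ANCOVA
    # (random cluster intercept, baseline covariate, no treatment-covariate
    # interaction); data-generating values are illustrative assumptions.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n_clusters, m = 30, 20                               # clusters, individuals per cluster
    cluster = np.repeat(np.arange(n_clusters), m)
    treat = np.repeat(rng.binomial(1, 0.5, n_clusters), m)   # cluster-level randomization
    x = rng.normal(size=n_clusters * m)                  # baseline covariate
    b = np.repeat(rng.normal(0, 0.5, n_clusters), m)     # random cluster effects
    y = 1.0 + 0.8 * treat + 0.5 * x + b + rng.normal(size=n_clusters * m)

    df = pd.DataFrame({"y": y, "treat": treat, "x": x, "cluster": cluster})
    fit = smf.mixedlm("y ~ treat + x", df, groups=df["cluster"]).fit()
    print(fit.summary())                                 # treatment effect and its SE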
In clinical trials with longitudinal continuous outcomes, reference-based imputation (RBI) is commonly applied to handle missing outcome data when the estimand incorporates the effects of intercurrent events, e.g., treatment discontinuation. RBI was originally developed in the multiple imputation framework; more recently, conditional mean imputation (CMI) combined with the jackknife estimator of the standard error was proposed as a way to obtain deterministic treatment effect estimates with correct frequentist inference. For both multiple imputation and CMI, a mixed model for repeated measures (MMRM) is often used as the imputation model, but this can be computationally intensive to fit to multiple data sets (e.g., the jackknife samples) and can lead to convergence issues for complex MMRM models with many parameters. A step-wise approach based on sequential linear regression (SLR) of the outcomes at each visit has therefore been developed for the imputation model in the multiple imputation framework, but similar developments in the CMI framework are lacking. In this article, we fill this gap in the literature by proposing an SLR approach to implement RBI in the CMI framework, and we justify its validity using theoretical results and simulations. We also illustrate our proposal in a real data application.
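The sequential-regression structure can be sketched as follows for monotone missingness: each visit is imputed by its conditional mean given baseline and earlier visits, using a linear model fitted here, purely for illustration, on a fully observed reference arm. This toy version conveys only the SLR idea, not the reference-based imputation rules or the jackknife inference of the paper.

    # Sequential linear regression, conditional-mean style, for monotone missing
    # longitudinal outcomes; data and the use of the reference arm as the fitting
    # set are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n, visits = 200, 4
    Y = rng.multivariate_normal(np.zeros(visits),
                                0.6 + 0.4 * np.eye(visits), size=n)   # correlated visits
    reference = Y[:100]                       # fully observed reference arm
    active = Y[100:].copy()
    dropout = rng.integers(1, visits + 1, size=100)   # first missing visit per subject
    for i, d in enumerate(dropout):
        active[i, d:] = np.nan

    for j in range(1, visits):                # impute visit j from visits 0..j-1
        model = LinearRegression().fit(reference[:, :j], reference[:, j])
        miss = np.isnan(active[:, j])
        if miss.any():
            active[miss, j] = model.predict(active[miss, :j])   # conditional mean

Because each visit is modelled only on the preceding ones, the sequence of small regressions replaces one large joint MMRM fit, which is the computational advantage highlighted in the abstract.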
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for the test error based on CV estimates may have coverage below nominal levels. This phenomenon occurs because each sample is used in both the training and testing stages of CV, so the CV estimates of the errors are correlated; without accounting for this correlation, the variance estimate is smaller than it should be. One way to mitigate this issue is to estimate the mean squared error of the prediction-error estimate using nested CV. This approach has been shown to achieve coverage superior to intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
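A schematic of the nested loop structure (a regression toy example, not the Cox-model extension of this work, nor the exact variance estimator) is given below: an inner CV is run within each outer training set, so that the variability of the error estimate itself can be examined rather than relying on the naive fold-to-fold standard error.

    # Schematic nested cross-validation loop on a linear-regression toy problem;
    # all data-generating choices are illustrative assumptions.
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=200)

    outer = KFold(n_splits=5, shuffle=True, random_state=0)
    outer_errors, inner_estimates = [], []
    for train_idx, test_idx in outer.split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        outer_errors.append(np.mean((y[test_idx] - model.predict(X[test_idx])) ** 2))
        inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
        inner = []
        for tr, te in inner_cv.split(train_idx):
            fit = LinearRegression().fit(X[train_idx[tr]], y[train_idx[tr]])
            inner.append(np.mean((y[train_idx[te]] - fit.predict(X[train_idx[te]])) ** 2))
        inner_estimates.append(np.mean(inner))

    print("outer CV estimate of test error:", round(np.mean(outer_errors), 3))
    print("spread of inner CV estimates   :", round(np.std(inner_estimates), 3))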
We present a nonlinear interpolation technique for parametric fields that exploits optimal transportation of coherent structures of the solution to achieve accurate performance. The approach generalizes the nonlinear interpolation procedure introduced in [Iollo, Taddei, J. Comput. Phys., 2022] to multi-dimensional parameter domains and to datasets of several snapshots. Given a library of high-fidelity simulations, we rely on a scalar testing function and on a point set registration method to identify coherent structures of the solution field in the form of sorted point clouds. Given a new parameter value, we exploit a regression method to predict the new point cloud; then, we resort to a boundary-aware registration technique to define bijective mappings that deform the new point cloud into the point clouds of the neighboring elements of the dataset, while preserving the boundary of the domain; finally, we define the estimate as a weighted combination of modes obtained by composing the neighboring snapshots with the previously-built mappings. We present several numerical examples for compressible and incompressible, viscous and inviscid flows to demonstrate the accuracy of the method. Furthermore, we employ the nonlinear interpolation procedure to augment the dataset of simulations for linear-subspace projection-based model reduction: our data augmentation procedure is designed to reduce offline costs -- which are dominated by snapshot generation -- of model reduction techniques for nonlinear advection-dominated problems.
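A one-dimensional cartoon of transport-based interpolation (not the registration-based procedure of the paper) is given below: an intermediate profile between two shifted Gaussian snapshots is obtained by interpolating inverse CDFs, so the coherent structure is translated rather than smeared, as it would be by linear blending. All profiles and parameter values are illustrative assumptions.

    # 1D displacement interpolation between two "snapshots" via inverse CDFs;
    # the snapshots and the interpolation parameter s are illustrative.
    import numpy as np

    x = np.linspace(0.0, 10.0, 1000)
    f0 = np.exp(-(x - 3.0) ** 2 / 0.5)            # snapshot at parameter value 0
    f1 = np.exp(-(x - 7.0) ** 2 / 0.5)            # snapshot at parameter value 1
    s = 0.5                                       # new parameter value

    def inv_cdf(f, q):
        cdf = np.cumsum(f); cdf /= cdf[-1]
        return np.interp(q, cdf, x)

    q = np.linspace(1e-3, 1 - 1e-3, 1000)
    xs = (1 - s) * inv_cdf(f0, q) + s * inv_cdf(f1, q)   # interpolated quantile map
    print("median of the transported profile near x =", round(xs[len(q) // 2], 2))

The interpolated structure sits near x = 5, midway between the two snapshots, whereas a linear average of f0 and f1 would instead produce two half-height bumps.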
The emergence of complex structures in systems governed by a simple set of rules is among the most fascinating aspects of Nature. A particularly powerful and versatile model for investigating this phenomenon is provided by cellular automata, with the Game of Life being one of the most prominent examples. However, this simplified model can be too limiting as a tool for modelling real systems. To address this, we introduce and study an extended version of the Game of Life in which a dynamical process governs the rule selection at each step. We show that this modification significantly alters the behaviour of the game. We also demonstrate that the choice of the synchronization policy can be used to control the trade-off between stability and growth in the system.
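A toy version of such a game can be sketched in a few lines of NumPy: here the rule applied at each step switches between Conway's B3/S23 and HighLife's B36/S23 depending on the current live-cell density; the particular rule pair and switching process are illustrative assumptions, not necessarily those studied in the paper.

    # Game of Life with a dynamically selected rule (density-driven switch
    # between B3/S23 and B36/S23); rules and switch criterion are illustrative.
    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(11)
    grid = (rng.random((100, 100)) < 0.3).astype(int)
    kernel = np.ones((3, 3), dtype=int); kernel[1, 1] = 0

    for step in range(200):
        n = convolve2d(grid, kernel, mode="same", boundary="wrap")
        if grid.mean() < 0.1:                       # dynamical rule selection
            born = (n == 3) | (n == 6)              # B36/S23 (HighLife)
        else:
            born = (n == 3)                         # B3/S23 (Conway)
        survive = (n == 2) | (n == 3)
        grid = np.where(grid == 1, survive, born).astype(int)

    print("final live-cell density:", round(grid.mean(), 3))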