This paper is concerned with sample size determination methodology for prediction models. We propose combining the individual calculations at different sample sizes via a learning-type curve. We suggest two distinct ways of doing so: a deterministic skeleton of a learning curve, and a Gaussian process centred upon its deterministic counterpart. We employ several learning algorithms for modelling the primary endpoint and distinct measures of trial efficacy. We find that performance may vary with the sample size, but that borrowing information across sample sizes universally improves the performance of such calculations. The Gaussian process-based learning curve appears more robust and statistically efficient, while its computational efficiency is comparable. We suggest that, when such data are available, anchoring against historical evidence should be adopted when extrapolating sample sizes. The methods are illustrated on binary and survival endpoints.
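To make the idea concrete, here is a minimal sketch of extrapolating prediction performance with a parametric learning curve. The inverse power-law parametrisation, the pilot sample sizes and the AUC values are illustrative assumptions of ours, not the paper's actual specification (which additionally considers a Gaussian process centred on such a deterministic skeleton).

```python
# Minimal sketch (not the authors' exact method): fit a deterministic
# inverse power-law learning curve to pilot performance estimates and
# extrapolate to a larger candidate sample size.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, c):
    # a: asymptotic performance; b, c: govern how fast performance improves with n
    return a - b * n ** (-c)

n_pilot = np.array([100.0, 200.0, 400.0, 800.0])   # pilot sample sizes (assumed)
auc_hat = np.array([0.68, 0.72, 0.75, 0.77])       # e.g. cross-validated AUC estimates (assumed)

params, _ = curve_fit(learning_curve, n_pilot, auc_hat, p0=[0.80, 1.0, 0.5])
target_n = 3000
print(f"extrapolated performance at n={target_n}: {learning_curve(target_n, *params):.3f}")
```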
A finite element method is introduced to track interface evolution governed by the level set equation. The method solves for the level set indicator function in a narrow band around the interface. An extension procedure, essential for any narrow band level set method, is introduced; it is based on a finite element $L^2$- or $H^1$-projection combined with the ghost-penalty method. This procedure is formulated as a linear variational problem in a narrow band around the surface, making it computationally efficient and amenable to rigorous error analysis. The extension method is combined with a discontinuous Galerkin space discretization and a BDF time-stepping scheme. The paper analyzes the stability and accuracy of the extension procedure and evaluates, through numerical experiments, the performance of the resulting narrow band finite element method for the level set equation.
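For orientation, and in our own notation rather than necessarily the authors' exact formulation, the transported quantity satisfies the level set equation on a narrow band $\mathcal{O}_\delta(t)$ around the interface, and a ghost-penalty-stabilized $L^2$-type extension can be sketched as a linear variational problem:
\[
\partial_t \phi + \mathbf{v}\cdot\nabla \phi = 0 \quad \text{in } \mathcal{O}_\delta(t),
\]
\[
\text{find } \phi_h^{e}\in V_h(\mathcal{O}_\delta):\quad
(\phi_h^{e}, w_h)_{L^2(\mathcal{O}_{\delta_0})} + s_h(\phi_h^{e}, w_h)
= (\phi_h, w_h)_{L^2(\mathcal{O}_{\delta_0})}
\quad \forall\, w_h \in V_h(\mathcal{O}_\delta),
\]
where $\mathcal{O}_{\delta_0}\subset\mathcal{O}_\delta$ is the smaller band on which $\phi_h$ is already known and $s_h$ is a ghost-penalty bilinear form acting on element faces of the wider band; the penalty term is what renders the problem well posed on all of $\mathcal{O}_\delta$ and propagates the data into the new elements. An $H^1$-projection variant augments the inner products with gradient terms.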
We propose a new full discretization of Biot's equations in poroelasticity. The construction is driven by the inf-sup theory we recently developed. It builds upon the four-field formulation of the equations obtained by introducing the total pressure and the total fluid content as additional unknowns. We discretize in space with Lagrange finite elements and in time with the backward Euler scheme. We establish inf-sup stability and quasi-optimality of the proposed discretization, with constants that are robust with respect to all material parameters. We further construct an interpolant showing how the error decays for smooth solutions.
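For context (sign conventions vary in the literature and may differ from the paper's), the quasi-static Biot system and the two auxiliary variables behind a four-field formulation can be written as
\[
-\operatorname{div}\bigl(2\mu\,\varepsilon(u) + \lambda(\operatorname{div} u)\,I - \alpha p\,I\bigr) = f,
\qquad
\partial_t\bigl(\sigma_0\, p + \alpha \operatorname{div} u\bigr) - \operatorname{div}(\kappa \nabla p) = g,
\]
\[
p_{\mathrm{tot}} := \alpha p - \lambda \operatorname{div} u \quad\text{(total pressure)},
\qquad
m := \sigma_0\, p + \alpha \operatorname{div} u \quad\text{(total fluid content)},
\]
where $\sigma_0$ is the storage coefficient and $\kappa$ the hydraulic conductivity, the idea being that these auxiliary unknowns absorb the parameter dependence that degenerate limits such as $\lambda \to \infty$ or $\sigma_0 \to 0$ would otherwise introduce.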
Threshold selection is a fundamental problem in any threshold-based extreme value analysis. While models are asymptotically motivated, selecting an appropriate threshold for finite samples is difficult and, with standard methods, highly subjective. Inference for high quantiles can also be highly sensitive to the choice of threshold. Too low a threshold leads to bias in the fit of the extreme value model, while too high a threshold leads to unnecessary additional uncertainty in the estimation of the model parameters. We develop a novel methodology for automated threshold selection that directly tackles this bias-variance trade-off. We also develop a method to account for the uncertainty in the threshold estimation and to propagate this uncertainty through to high quantile inference. Through a simulation study, we demonstrate the effectiveness of our method for threshold selection and subsequent extreme quantile estimation, relative to the leading existing methods, and show that the method's effectiveness is not sensitive to its tuning parameters. We apply our method to the well-known, troublesome example of the River Nidd dataset.
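As a purely illustrative sketch of the bias-variance tension the method targets (the simulated data, candidate thresholds and quantile of interest below are our assumptions, and the selection criterion is not the one proposed in the paper):

```python
# Illustrative only: fit a generalised Pareto (GPD) tail above several
# candidate thresholds and compare the resulting high-quantile estimates.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=5000)                   # heavy-tailed sample (assumed)

for u in np.quantile(x, [0.80, 0.90, 0.95, 0.99]):    # candidate thresholds
    exc = x[x > u] - u                                # threshold excesses
    xi, _, sigma = genpareto.fit(exc, floc=0)         # GPD fit to excesses
    p_u = np.mean(x > u)                              # exceedance probability
    q999 = u + (sigma / xi) * ((p_u / 0.001) ** xi - 1)   # 99.9% quantile estimate
    print(f"u={u:.2f}  n_exc={exc.size}  xi={xi:.2f}  q999={q999:.2f}")
```

Low thresholds use many excesses (small variance, possible bias); high thresholds reduce bias but inflate the variability of $\xi$, $\sigma$ and the quantile estimate.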
Generalized linear models (GLMs) arguably represent the standard approach for statistical regression beyond the Gaussian likelihood scenario. When Bayesian formulations are employed, the general absence of a tractable posterior distribution has motivated the development of deterministic approximations, which are generally more scalable than sampling techniques. Among them, expectation propagation (EP) has shown remarkable accuracy, usually higher than that of many variational Bayes solutions. However, the higher computational cost of EP has raised concerns about its practical feasibility, especially in high-dimensional settings. We address these concerns by deriving a novel efficient formulation of EP for GLMs whose cost scales linearly in the number of covariates $p$. This reduces the state-of-the-art $O(p^2 n)$ per-iteration computational cost of the EP routine for GLMs to $O(pn \min\{p,n\})$, with $n$ being the sample size. We also show that, for binary models and log-linear GLMs, approximate predictive means can be obtained at no additional cost. To preserve efficient moment matching for count data, we propose employing a combination of log-normal Laplace transform approximations, avoiding numerical integration. These novel results open the possibility of employing EP in settings that were believed to be practically impossible. Improvements over state-of-the-art approaches are illustrated both for simulated and real data. The efficient EP implementation is available at https://github.com/niccoloanceschi/EPglm.
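Two standard identities convey where savings of this kind typically come from (shown here for orientation; the paper's exact derivations may differ). The Woodbury identity lets $p \times p$ solves be traded for $n \times n$ ones whenever $n < p$, and for probit-type binary GLMs a Gaussian approximate posterior $\mathcal{N}(\mu, \Sigma)$ yields predictive means in closed form:
\[
\bigl(Q + X^\top D X\bigr)^{-1}
= Q^{-1} - Q^{-1} X^\top \bigl(D^{-1} + X Q^{-1} X^\top\bigr)^{-1} X Q^{-1},
\]
\[
\Pr(y_{\mathrm{new}} = 1 \mid x_{\mathrm{new}})
\;\approx\;
\Phi\!\left(\frac{x_{\mathrm{new}}^\top \mu}{\sqrt{1 + x_{\mathrm{new}}^\top \Sigma\, x_{\mathrm{new}}}}\right),
\]
where $Q$ is a (diagonal) prior precision, $X$ the $n \times p$ design matrix, $D$ a diagonal matrix of site precisions and $\Phi$ the standard normal c.d.f.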
This paper develops and discusses a residual-based a posteriori error estimate and a space--time adaptive algorithm for solving parabolic surface partial differential equations on closed stationary surfaces. The full discretization uses the surface finite element method in space and the backward Euler method in time. The proposed error indicator bounds the error quantities globally in space from above and below, and globally in time from above and locally from below. A space--time adaptive algorithm is proposed using the derived error indicator. Numerical experiments illustrate and complement the theory.
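The model problem targeted by such a scheme (stated in our notation) is the heat equation on a closed stationary surface $\Gamma$, with the fully discrete method combining surface finite elements and backward Euler:
\[
\partial_t u - \Delta_\Gamma u = f \quad \text{on } \Gamma \times (0,T], \qquad u(\cdot,0) = u_0,
\]
\[
\Bigl(\frac{u_h^n - u_h^{n-1}}{\tau_n}, v_h\Bigr)_{\Gamma_h}
+ \bigl(\nabla_{\Gamma_h} u_h^n, \nabla_{\Gamma_h} v_h\bigr)_{\Gamma_h}
= (f^n, v_h)_{\Gamma_h} \qquad \forall\, v_h \in S_h,
\]
where $\Gamma_h$ is a discrete surface, $S_h$ the surface finite element space and $\tau_n$ the (adaptively chosen) time step; a residual-based indicator for such a scheme typically splits into spatial and temporal contributions, which drive mesh refinement/coarsening and time step control respectively.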
This paper introduces a new numerical scheme for a system that includes evolution equations describing a perfect plasticity model with a time-dependent yield surface. We demonstrate that the solution to the proposed scheme is stable in suitable norms. Moreover, this stability leads to the existence of an exact solution, and we also prove that the solution to the proposed scheme converges strongly to the exact solution in suitable norms.
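For orientation only (classical Prandtl-Reuss perfect plasticity with a von Mises-type yield condition; the paper's precise system may differ), a time-dependent yield surface can be encoded through a time-dependent constraint set for the stress:
\[
K(t) = \bigl\{\tau \in \mathbb{R}^{d\times d}_{\mathrm{sym}} : |\tau^{D}| \le \sigma_Y(t)\bigr\},
\qquad
-\operatorname{div}\sigma = f, \quad
\varepsilon(\dot u) = \mathbb{C}^{-1}\dot\sigma + \dot p_{\mathrm{pl}}, \quad
\sigma(t) \in K(t),
\]
\[
\dot p_{\mathrm{pl}} : (\tau - \sigma) \le 0 \quad \forall\, \tau \in K(t),
\]
where $\tau^{D}$ denotes the deviatoric part, $\sigma_Y(t)$ the time-dependent yield stress, and the last line is the normality flow rule written as a variational inequality.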
The paper considers standard iterative methods for solving the generalized Stokes problem arising from the time and space approximation of the time-dependent incompressible Navier-Stokes equations. Various preconditioning techniques are considered (Cahouet-Chabard and augmented Lagrangian), and we investigate whether these methods can compete with traditional pressure-correction and velocity-correction methods in terms of CPU time per degree of freedom and per time step. Numerical tests on fine unstructured meshes (68 million degrees of freedom) demonstrate convergence rates that are independent of the mesh size and improve with the Reynolds number. Three conclusions are drawn from the paper: (1) Although very good parallel scalability is observed for the augmented Lagrangian method, thorough tests on large problems reveal that the overall CPU time per degree of freedom and per time step is best for the standard Cahouet-Chabard preconditioner. (2) Solving the pressure Schur complement problem or solving the full coupled system at once makes no significant difference in terms of CPU time per degree of freedom and per time step. (3) All the methods tested in the paper, whether matrix-free or not, are on average 30 times slower than traditional pressure-correction and velocity-correction methods. Hence, although all these methods are very efficient for solving steady-state problems, they are not yet competitive for solving time-dependent problems.
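In standard notation (not necessarily the paper's), the generalized Stokes problem produced by a time step of size $\Delta t$ and the Cahouet-Chabard preconditioner for the pressure Schur complement $S$ read
\[
\frac{1}{\Delta t}\,\mathbf{u} - \nu \Delta \mathbf{u} + \nabla p = \mathbf{f},
\qquad \nabla\cdot\mathbf{u} = 0,
\]
\[
\widehat S^{-1} = \nu\, M_p^{-1} + \frac{1}{\Delta t}\, L_p^{-1},
\]
where $M_p$ is the pressure mass matrix and $L_p$ a discrete pressure Laplacian with appropriate boundary conditions; the augmented Lagrangian alternative instead penalizes the divergence in the velocity block and uses a scaled pressure mass matrix as the Schur complement preconditioner.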
In shape-constrained nonparametric inference, it is often necessary to perform preliminary tests to verify whether a probability mass function (p.m.f.) satisfies qualitative constraints such as monotonicity, convexity or, more generally, $k$-monotonicity. In this paper, we are interested in testing the $k$-monotonicity of a compactly supported p.m.f., with the main focus on monotonicity and convexity, i.e., $k \in \{1,2\}$. We consider new testing procedures that are directly derived from the definition of $k$-monotonicity and rely exclusively on the empirical measure, as well as tests that are based on the projection of the empirical measure onto the class of $k$-monotone p.m.f.s. The asymptotic behaviour of the introduced test statistics is derived, and a simulation study is performed to assess the finite sample performance of all the proposed tests. Applications to real datasets are presented to illustrate the theory.
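Under one common convention (the paper's exact definition may be stated differently), a p.m.f. $p$ supported on $\{0,\dots,m\}$ is $k$-monotone when
\[
(-1)^{j}\,\Delta^{j} p(x) \ge 0
\qquad \text{for all } x \text{ and } j = 1,\dots,k,
\qquad \Delta p(x) := p(x+1) - p(x),
\]
so that $k=1$ corresponds to a non-increasing p.m.f. and $k=2$ to a non-increasing convex one; roughly speaking, the empirical-measure tests plug the empirical frequencies into these finite-difference constraints, while the projection-based tests measure the distance from the empirical measure to the class on which they all hold.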
The paper analyzes how enlarging the sample affects the mitigation of collinearity, concluding that a larger sample may mitigate the consequences of collinearity for statistical analysis but not necessarily the numerical instability. The problem addressed is of importance in the teaching of the social sciences, since increasing the sample size is one of the solutions almost unanimously proposed to solve the problem of multicollinearity. To better illustrate the contribution of this paper, two empirical examples are presented and highly technical developments are avoided.
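A minimal simulation sketch of the point being made (the data-generating process and sample sizes are assumptions of ours, not the paper's empirical examples): as $n$ grows the standard errors shrink, yet the condition number of $X^\top X$, which drives the numerical instability, remains essentially unchanged.

```python
# Enlarging the sample reduces sampling variance but does not cure
# the ill-conditioning caused by correlated regressors.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.99                                              # strong collinearity (assumed)

for n in (50, 500, 5000):
    z = rng.standard_normal((n, 2))
    x1 = z[:, 0]
    x2 = rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]  # regressor correlated with x1
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1 + 2 * x1 + 3 * x2 + rng.standard_normal(n)
    XtX = X.T @ X
    se = np.sqrt(np.diag(np.linalg.inv(XtX)))           # proportional to the OLS standard errors
    print(n, round(np.linalg.cond(XtX), 1), se.round(4))
```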
The use of variable grid BDF methods for parabolic equations leads to structures known as variable (coefficient) Toeplitz matrices. Here, we consider a more general class of matrix-sequences and prove that they belong to the maximal $*$-algebra of generalized locally Toeplitz (GLT) matrix-sequences. We then identify the associated GLT symbols, both in the general setting and in the specific case, providing in both cases a spectral and singular value analysis. More specifically, we use GLT tools to study the asymptotic behaviour of the eigenvalues and singular values of the considered BDF matrix-sequences, in connection with the given non-uniform grids. Numerical examples, visualizations, and open problems conclude the work.
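In standard GLT notation, and under suitable regularity assumptions on the diagonals (the matrices studied in the paper are of this flavour but arise specifically from variable-step BDF formulas), a prototypical variable (coefficient) Toeplitz sequence and its symbol are
\[
[A_n]_{i,j} = \hat a_{\,i-j}\!\Bigl(\tfrac{i}{n}\Bigr),
\qquad
\kappa(x,\theta) = \sum_{k} \hat a_k(x)\, e^{\mathrm{i}k\theta},
\qquad \{A_n\}_n \sim_{\mathrm{GLT}} \kappa,
\]
so that the singular values of $A_n$ are asymptotically distributed as $|\kappa|$ over $[0,1]\times[-\pi,\pi]$ and, for Hermitian (or quasi-Hermitian) sequences, the eigenvalues are distributed as $\kappa$ itself.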