We introduce a new method for analyzing the cumulative sum (CUSUM) procedure in sequential change-point detection. When the observations are phase-type distributed and the post-change distribution is given by exponential tilting of the pre-change distribution, the first-passage analysis of the CUSUM statistic reduces to that of a certain Markov additive process. Using and further developing the theory of the so-called scale matrix, we derive exact expressions for the average run length, the average detection delay, and the false alarm probability under the CUSUM procedure. The proposed method is robust and applicable in a general setting with non-i.i.d. observations. Numerical results are also given.
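For intuition, the recursion underlying the CUSUM statistic can be written in a few lines. The sketch below is a minimal illustration with i.i.d. exponential observations and an exponentially tilted (rate-changed) alternative; the rates `lam0`, `lam1` and the threshold `b` are hypothetical choices, and the paper's analysis covers the far more general phase-type / Markov additive setting.

```python
# Minimal CUSUM sketch for i.i.d. exponential observations (illustrative only;
# lam0, lam1 and b are assumed values, not taken from the paper).
import numpy as np

def cusum_run_length(obs, lam0, lam1, b):
    """Return the first time the CUSUM statistic exceeds threshold b."""
    w = 0.0
    for t, x in enumerate(obs, start=1):
        # Log-likelihood ratio of the post-change vs. pre-change exponential laws.
        llr = np.log(lam1 / lam0) - (lam1 - lam0) * x
        w = max(0.0, w + llr)          # CUSUM recursion
        if w >= b:
            return t                   # alarm time (run length)
    return None                        # no alarm within the sample

rng = np.random.default_rng(0)
pre_change = rng.exponential(1.0, size=1000)   # rate lam0 = 1 throughout
print(cusum_run_length(pre_change, lam0=1.0, lam1=2.0, b=5.0))
```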
With the rapid development of new anti-cancer agents which are cytostatic, new endpoints are needed to better measure treatment efficacy in phase II trials. For this purpose, Von Hoff (1998) proposed the growth modulation index (GMI), i.e., the ratio of the times to progression or the progression-free survival times in two successive treatment lines. An essential task in studies using GMI as an endpoint is to estimate its distribution. Traditional methods for survival data have been used to estimate the GMI distribution because censoring is common for GMI data. However, we point out that the independent censoring assumption required by traditional survival methods is always violated for GMI, which may lead to severely biased results. In this paper, we construct nonparametric estimators for the distribution of GMI that account for the dependent censoring of GMI. We prove that the proposed estimators are consistent and, upon proper normalization, converge weakly to zero-mean Gaussian processes. Extensive simulation studies show that our estimators perform well in practical situations and outperform traditional methods. A phase II clinical trial using GMI as the primary endpoint is provided for illustration.
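To make the censoring issue concrete, the sketch below implements the "traditional" approach that the paper cautions against: forming censored GMI observations and applying a standard Kaplan-Meier estimator as if the censoring were independent. The data-generating choices are purely illustrative, and the estimator shown is the naive one, not the proposed estimator that adjusts for dependent censoring.

```python
# Naive Kaplan-Meier estimate of P(GMI > g): illustrative only. Censoring of
# GMI = T2/T1 is induced by censoring of T2 and therefore depends on T1 (and
# on GMI itself), which is exactly why this naive approach can be biased.
import numpy as np

def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate S(t) on a grid; events=1 means observed."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    n, s, surv, j = len(times), 1.0, [], 0
    for t in grid:
        while j < n and times[j] <= t:
            at_risk = n - j
            if events[j] == 1:
                s *= 1.0 - 1.0 / at_risk
            j += 1
        surv.append(s)
    return np.array(surv)

rng = np.random.default_rng(1)
t1 = rng.exponential(6.0, 500)             # time to progression, line 1 (months)
t2 = 1.3 * t1 * rng.exponential(1.0, 500)  # line 2, correlated with line 1
c = rng.uniform(0.0, 24.0, 500)            # administrative censoring of line 2
y = np.minimum(t2, c) / t1                 # censored GMI observation
delta = (t2 <= c).astype(int)              # 1 if GMI observed, 0 if censored
print(kaplan_meier(y, delta, grid=np.array([1.0, 1.33, 2.0])))
```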
We consider flux-corrected finite element discretizations of 3D convection-dominated transport problems and assess the computational efficiency of algorithms based on such approximations. The methods under investigation include flux-corrected transport schemes and monolithic limiters. We discretize in space using a continuous Galerkin method with $\mathbb{P}_1$ or $\mathbb{Q}_1$ finite elements. Time integration is performed using the Crank-Nicolson method or an explicit strong stability preserving Runge-Kutta method. Nonlinear systems are solved using a fixed-point iteration method, which requires the solution of large linear systems at each iteration or time step. The great variety of options in the choice of discretization methods and solver components calls for a dedicated comparative study of existing approaches. To perform such a study, we define new 3D test problems for time-dependent and stationary convection-diffusion-reaction equations. The results of our numerical experiments illustrate how the limiting technique, the time discretization, and the solver affect the overall performance.
In numerical simulations of complex flows with discontinuities, nonlinear schemes must be used. The spectral properties of the chosen scheme have a significant impact on the resolution and stability of the computation. Based on the approximate dispersion relation method, we combine the corresponding spectral property with the dispersion relation preservation proposed by De and Eswaran (J. Comput. Phys. 218 (2006) 398-416) and propose a quasi-linear dispersion relation preservation (QL-DRP) analysis method, through which the group velocity of a nonlinear scheme can be determined. In particular, we derive the group velocity property when a high-order Runge-Kutta scheme is used and compare the performance of different time schemes with QL-DRP. The soundness of the QL-DRP method is verified by a numerical simulation and the discrete Fourier transform method. To further evaluate how well a nonlinear scheme captures the group velocity, new hyperbolic equations are designed. The validity of QL-DRP and the group velocity preservation of several schemes are investigated using two examples: the one-dimensional wave propagation equation and the new hyperbolic equations. The results show that the QL-DRP method, integrated with high-order time schemes, can determine the group velocity of nonlinear schemes and evaluate their performance reasonably and efficiently.
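As background for what a dispersion-relation analysis computes, the sketch below evaluates the modified wavenumber of the classical first-order upwind scheme for $u_t + c u_x = 0$ and the group velocity implied by its real part. This is the standard linear analysis that ADR/DRP-type methods build on; it is an illustrative stand-in, not the QL-DRP procedure itself, which targets nonlinear schemes coupled with Runge-Kutta time integration.

```python
# Modified-wavenumber analysis of the first-order upwind scheme for
# u_t + c u_x = 0 (semi-discrete). Illustrative background only.
import numpy as np

c, h = 1.0, 1.0                        # wave speed and grid spacing (assumed)
theta = np.linspace(0.01, np.pi, 400)  # reduced wavenumber k*h

# The upwind stencil (u_j - u_{j-1})/h acting on exp(i k x_j) yields the
# modified wavenumber k_mod = -i * (1 - exp(-i*theta)) / h.
k_mod = -1j * (1.0 - np.exp(-1j * theta)) / h

dispersion = k_mod.real                # exact scheme would give theta / h
dissipation = k_mod.imag               # negative values indicate damping

# Group velocity of the semi-discrete scheme: c * d(Re k_mod)/dk.
group_velocity = c * np.gradient(dispersion, theta / h)

print(group_velocity[:5])              # ~c at long waves, degrades near pi
```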
We develop methods for forming prediction sets in an online setting where the data generating distribution is allowed to vary over time in an unknown fashion. Our framework builds on ideas from conformal inference to provide a general wrapper that can be combined with any black-box method that produces point predictions of the unseen label or estimated quantiles of its distribution. While previous conformal inference methods rely on the assumption that the data points are exchangeable, our adaptive approach provably achieves the desired coverage frequency over long time intervals irrespective of the true data generating process. We accomplish this by modelling the distribution shift as a learning problem in a single parameter whose optimal value varies over time and must be continuously re-estimated. We test our method, adaptive conformal inference, on two real-world datasets and find that its predictions are robust to visible and significant distribution shifts.
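The core of the method is a one-parameter online update of the miscoverage level. A minimal sketch, assuming an absolute-residual conformity score and a fixed step size `gamma` (both are illustrative choices rather than prescriptions from the paper), is given below.

```python
# Sketch of an adaptive conformal update: the target miscoverage is alpha, and
# the working level alpha_t is adjusted online from realized coverage errors.
import numpy as np

def adaptive_intervals(preds, ys, alpha=0.1, gamma=0.01, warmup=50):
    """Yield (lower, upper) intervals; adapt alpha_t after each label arrives."""
    scores, alpha_t = [], alpha
    for t, (mu, y) in enumerate(zip(preds, ys)):
        if t >= warmup:
            # Quantile of past scores at the current, adapted level.
            q = np.quantile(scores, min(max(1.0 - alpha_t, 0.0), 1.0))
            lo, hi = mu - q, mu + q
            err = 0.0 if lo <= y <= hi else 1.0   # 1 if the label is missed
            alpha_t += gamma * (alpha - err)      # adaptive update of the level
            yield lo, hi
        scores.append(abs(y - mu))                # conformity score

rng = np.random.default_rng(2)
y_true = np.cumsum(rng.normal(size=500))          # drifting series
y_pred = np.concatenate([[0.0], y_true[:-1]])     # naive one-step prediction
intervals = adaptive_intervals(y_pred, y_true)
cov = [lo <= y <= hi for (lo, hi), y in zip(intervals, y_true[50:])]
print(np.mean(cov))                               # empirical coverage frequency
```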
In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional mean $\mathbb{E}[Y|X]$) is more challenging. In the setting where the features $X$ are continuously distributed, recent work has established that any confidence interval for $\mathbb{E}[Y|X]$ must have non-vanishing width, even as sample size tends to infinity. At the other extreme, if $X$ takes only a small number of possible values, then inference on $\mathbb{E}[Y|X]$ is trivial to achieve. In this work, we study the problem in settings in between these two extremes. We find that there are several distinct regimes in between the finite setting and the continuous setting, where vanishing-width confidence intervals are achievable if and only if the effective support size of the distribution of $X$ is smaller than the square of the sample size.
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in machine learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. Its good behavior w.r.t. major transformation groups and its ability to factor out translations are demonstrated. Robustness, an appealing feature of this pseudo-metric, is studied through the finite-sample breakdown point. Moreover, we propose an efficient approximation method with linear time complexity w.r.t. the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in numerical experiments.
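A rough sketch of the construction, estimated from samples, is shown below. Mahalanobis depth is used as a simple stand-in for a general data depth, and the level grid and the Hausdorff computation over finite point sets are illustrative simplifications of the proposed approach and its linear-time approximation.

```python
# Sketch of a depth-based pseudo-metric between two samples: average Hausdorff
# distance between depth-trimmed regions over a grid of depth levels.
# Mahalanobis depth is used purely for simplicity; the framework allows
# general data depths and a faster approximation.
import numpy as np
from scipy.spatial.distance import cdist

def mahalanobis_depth(points, sample):
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', points - mu, cov_inv, points - mu)
    return 1.0 / (1.0 + d2)

def hausdorff(a, b):
    d = cdist(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def depth_pseudo_metric(x, y, levels=np.linspace(0.1, 0.7, 7)):
    dist = 0.0
    for alpha in levels:
        rx = x[mahalanobis_depth(x, x) >= alpha]   # depth-trimmed region of x
        ry = y[mahalanobis_depth(y, y) >= alpha]   # depth-trimmed region of y
        dist += hausdorff(rx, ry)
    return dist / len(levels)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=(400, 2))
y = rng.normal(0.5, 1.0, size=(400, 2))
print(depth_pseudo_metric(x, y))
```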
Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by applying it to two rather distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.
We extend the theory of distance (Brownian) covariance from Euclidean spaces, where it was introduced by Sz\'{e}kely, Rizzo and Bakirov, to general metric spaces. We show that for testing independence, it is necessary and sufficient that the metric space be of strong negative type. In particular, we show that this holds for separable Hilbert spaces, which answers a question of Kosorok. Instead of the manipulations of Fourier transforms used in the original work, we use elementary inequalities for metric spaces and embeddings in Hilbert spaces.
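For reference, the sample statistic can be computed directly from pairwise distances; the sketch below forms the V-statistic from doubly centered distance matrices. Any metric can be plugged in for the pairwise distances; Euclidean distances are used here only as a concrete example.

```python
# Sample (squared) distance covariance from doubly centered distance matrices.
import numpy as np
from scipy.spatial.distance import cdist

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic) between paired samples."""
    a = cdist(x, x)                     # pairwise distances within x
    b = cdist(y, y)                     # pairwise distances within y
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(4)
x = rng.normal(size=(300, 2))
y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(300, 1))   # dependent on x
print(dcov2(x, y), dcov2(x, rng.normal(size=(300, 1))))
```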
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether replacing the inner maximization with its dual, thereby turning the adversarial min-max problem into a single minimization, improves the quality of the resulting alignment, and we explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
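As a point of comparison, the MMD objective referenced above can be estimated directly from samples; a short sketch with a Gaussian kernel follows, where the bandwidth is an ad hoc choice rather than a value from the paper.

```python
# Biased (V-statistic) sample estimate of squared MMD with a Gaussian kernel,
# the discrepancy used as a baseline alignment objective.
import numpy as np
from scipy.spatial.distance import cdist

def mmd2(x, y, bandwidth=1.0):
    k = lambda a, b: np.exp(-cdist(a, b, 'sqeuclidean') / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(5)
source = rng.normal(0.0, 1.0, size=(500, 2))
target = rng.normal(1.0, 1.0, size=(500, 2))   # shifted point cloud
print(mmd2(source, target), mmd2(source, rng.normal(0.0, 1.0, size=(500, 2))))
```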
In this paper, we study the optimal convergence rate for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) simply convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
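To illustrate the dual viewpoint, the sketch below runs Nesterov-accelerated gradient ascent on the dual of a toy consensus problem with quadratic local objectives, where communication is encoded by the graph Laplacian. The problem instance, step size, and the use of the Laplacian as the constraint matrix are illustrative simplifications, not the paper's construction.

```python
# Toy dual approach: accelerated gradient ascent on the dual of
#   min_x  sum_i 0.5*(x_i - a_i)^2   subject to  L x = 0,
# where L is the Laplacian of the communication graph, so the constraint
# enforces consensus and each dual gradient L @ x needs only neighbor exchange.
import numpy as np

m = 5                                           # agents on a path graph
L = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
L[0, 0] = L[-1, -1] = 1.0                       # path-graph Laplacian

a = np.array([1.0, 3.0, -2.0, 5.0, 0.0])        # local data; optimum is mean(a)
eta = 1.0 / np.linalg.eigvalsh(L @ L).max()     # 1 / smoothness of the dual

lam, lam_prev = np.zeros(m), np.zeros(m)
for k in range(1, 200):
    y = lam + (k - 1) / (k + 2) * (lam - lam_prev)   # Nesterov extrapolation
    x = a - L @ y                                     # primal minimizer given y
    lam_prev, lam = lam, y + eta * (L @ x)            # dual gradient ascent step

print(a - L @ lam, a.mean())                          # x_i -> mean of a
```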