Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution, whose estimation itself requires inverse-weighting by the failure distribution. But neither is known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games to train both failure and censoring models with respect to criteria such as BS or BLL. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the re-weighting model is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case in which this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles to real-world cancer and critically-ill patient data.
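As a concrete illustration of the inverse-weighting the abstract refers to, here is a minimal sketch of the IPCW (inverse-probability-of-censoring-weighted) Brier score at a single horizon. This is a standard Graf-style estimator, not code from the paper; all argument names are placeholders, and the censoring-survival estimates G must be supplied by the caller.

```python
import numpy as np

def ipcw_brier_score(t_obs, event, s_hat, g_hat, g_at_horizon, horizon):
    """IPCW Brier score at a fixed time horizon.

    t_obs:        observed times min(T, C) per subject
    event:        1 if failure observed, 0 if censored
    s_hat:        predicted survival probability S(horizon | x_i)
    g_hat:        censoring-survival estimate G(t_obs_i) per subject
    g_at_horizon: censoring-survival estimate G(horizon)
    """
    died_before = (t_obs <= horizon) & (event == 1)  # known failures by horizon
    alive_after = t_obs > horizon                    # known survivors at horizon
    # Observed failures are weighted by 1/G(T_i); survivors by 1/G(horizon);
    # subjects censored before the horizon receive zero weight.
    loss = (np.where(died_before, (0.0 - s_hat) ** 2 / g_hat, 0.0)
            + np.where(alive_after, (1.0 - s_hat) ** 2 / g_at_horizon, 0.0))
    return loss.mean()
```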
Value-based reinforcement-learning algorithms have shown strong performance in games, robotics, and other real-world applications. The most popular sample-based method is $Q$-Learning. A $Q$-value is the expected return for a state-action pair when following a particular policy, and the algorithm performs updates by adjusting the current $Q$-value towards the observed reward and the maximum of the $Q$-values of the next state. This procedure introduces maximization bias, and solutions like Double $Q$-Learning have been considered. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the $T$-Estimator (TE), based on two-sample testing for the mean. The TE flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. A generalization termed the $K$-Estimator (KE) obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function. Using the TE and the KE, we introduce modifications of $Q$-Learning and its neural network analog, the Deep $Q$-Network. The proposed estimators and algorithms are thoroughly tested and validated on a diverse set of tasks and environments, illustrating the performance potential of the TE and KE.
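To make the maximization bias concrete, the sketch below (illustrative only, not the paper's TE/KE estimators) shows numerically that the maximum of sample means overestimates the maximum expected value, which is exactly the quantity hiding inside the $Q$-Learning target $r + \gamma \max_{a'} Q(s', a')$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Maximization bias in miniature: K actions whose true expected values
# are all 0, so the true MEV is 0. The max of the per-action sample means
# is systematically positive.
K, n, trials = 10, 20, 10_000
overest = np.mean([rng.normal(0.0, 1.0, size=(K, n)).mean(axis=1).max()
                   for _ in range(trials)])
print(f"E[max of sample means] ~= {overest:.3f}  (true MEV is 0)")

# The same bias enters the tabular Q-Learning update:
#   Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```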
We propose an empirical likelihood ratio test for nonparametric model selection, where the competing models may be nested, nonnested, overlapping, misspecified, or correctly specified. It compares the squared prediction errors of models based on cross-validation and allows for heteroscedasticity of the model errors. We develop its asymptotic distributions for comparing additive models and varying-coefficient models and extend it to test the significance of variables in additive models with massive data. The method is applicable to model selection among supervised learning models. To facilitate implementation of the test, we provide a fast calculation procedure. Simulations show that the proposed tests work well and have favorable finite-sample performance compared with some existing approaches. The methodology is illustrated with an empirical application.
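A minimal sketch of the raw ingredient the test is built on: cross-validated squared prediction errors for each competing model. The empirical-likelihood machinery itself is omitted, and the helper below is hypothetical, assuming scikit-learn-style estimators and NumPy arrays.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_sq_errors(model, X, y, n_splits=5, seed=0):
    """Per-observation squared prediction errors from K-fold cross-validation."""
    err = np.empty_like(y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        fit = model.fit(X[train], y[train])
        err[test] = (y[test] - fit.predict(X[test])) ** 2
    return err

# The test compares two models through the per-observation differences
# d_i = cv_sq_errors(model_a, X, y)[i] - cv_sq_errors(model_b, X, y)[i];
# only this raw ingredient is formed here.
```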
The recently proposed statistical finite element (statFEM) approach synthesises measurement data with finite element models and allows for making predictions about the true system response. We provide a probabilistic error analysis for a prototypical statFEM setup based on a Gaussian process prior under the assumption that the noisy measurement data are generated by a deterministic true system response function that satisfies a second-order elliptic partial differential equation for an unknown true source term. In certain cases, properties such as the smoothness of the source term may be misspecified by the Gaussian process model. The error estimates we derive are for the expectation with respect to the measurement noise of the $L^2$-norm of the difference between the true system response and the mean of the statFEM posterior. The estimates imply polynomial rates of convergence in the numbers of measurement points and finite element basis functions and depend on the Sobolev smoothness of the true source term and the Gaussian process model. A numerical example for Poisson's equation is used to illustrate these theoretical results.
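The error quantity described in prose can be written schematically as follows. This is a hedged rendering with placeholder symbols: $u$ for the true system response, $\bar{u}_{n,N}$ for the statFEM posterior mean computed from $n$ measurement points and $N$ finite element basis functions, and unspecified exponents $\alpha, \beta > 0$ standing in for the rates, which depend on the Sobolev smoothness of the true source term and the Gaussian process model.

```latex
% Expected L^2 error of the statFEM posterior mean (schematic form):
\[
  \mathbb{E}_{\varepsilon}\!\left[ \bigl\| u - \bar{u}_{n,N} \bigr\|_{L^2(D)} \right]
  \;=\; \mathcal{O}\!\bigl( n^{-\alpha} + N^{-\beta} \bigr),
\]
% where the expectation is over the measurement noise and the exponents
% alpha, beta are placeholders for the smoothness-dependent rates.
```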
Unlike most types of software systems, video games have largely been tested manually by human testers. This is mostly due to the continuous and intelligent user interaction video games require. Recently, reinforcement learning (RL) has been exploited to partially automate functional testing. RL enables training smart agents that can even achieve super-human performance in playing games, making them suitable for exploring games in search of bugs. We investigate the possibility of using RL for load testing video games. Indeed, the goal of game testing is not only to identify functional bugs, but also to examine the game's performance, such as its ability to avoid lags and keep a minimum number of frames per second (FPS) when demanding 3D scenes are shown on screen. We define a methodology that employs RL to train an agent able to play the game like a human while also trying to identify areas of the game that result in a drop of FPS. We demonstrate the feasibility of our approach on three games. Two of them are used as proof of concept, by injecting artificial performance bugs. The third one is an open-source 3D game that we load test using the trained agent, showing its potential to identify areas of the game resulting in lower FPS.
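One plausible way to realize the dual objective of playing competently while hunting FPS drops is a shaped reward like the sketch below. This is an assumption for illustration, not the paper's exact formulation; the function name, threshold, and bonus value are hypothetical.

```python
def load_test_reward(game_reward, fps, fps_target=30.0, bonus=1.0):
    """Shaped reward for an RL load-testing agent (illustrative sketch):
    keep the usual gameplay reward so the agent still plays like a human,
    and add a bonus whenever the rendered frame rate drops below target,
    steering exploration toward performance-heavy areas of the game."""
    return game_reward + (bonus if fps < fps_target else 0.0)
```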
Federated learning involves training statistical models over remote devices such as mobile phones while keeping data localized. Training in heterogeneous and potentially massive networks introduces opportunities for privacy-preserving data analysis and for diversifying these models so that they become more inclusive of the population. Federated learning can be viewed as a unique opportunity to bring fairness and parity to many existing models by enabling model training to happen on a diverse set of participants and on data that is generated regularly and dynamically. In this paper, we discuss the current metrics and approaches that are available to measure and evaluate fairness in the context of spatial-temporal models. We propose how these metrics and approaches can be redefined to address the challenges faced in the federated learning setting.
In the presence of heterogeneity between the randomized controlled trial (RCT) participants and the target population, evaluating the treatment effect solely based on the RCT often leads to biased quantification of the real-world treatment effect. To address the lack of generalizability of the treatment effect estimated from the RCT sample, we leverage observational studies with large samples that are representative of the target population. This paper concerns evaluating treatment effects on survival outcomes for a target population and considers a broad class of estimands that are functionals of treatment-specific survival functions, including differences in survival probability and restricted mean survival times. Motivated by two intuitive but distinct approaches, i.e., imputation based on survival outcome regression and weighting based on inverse probability of sampling, censoring, and treatment assignment, we propose a semiparametric estimator through the guidance of the efficient influence function. The proposed estimator is doubly robust in the sense that it is consistent for the target population estimands if either the survival model or the weighting model is correctly specified, and is locally efficient when both are correct. Simulation studies confirm the theoretical properties of the proposed estimator and show that it outperforms competitors. We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer.
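For reference, two members of the estimand class mentioned above, written out for treatment-specific survival functions $S_1$ and $S_0$ of the target population. These are the standard definitions; $\tau$ is a user-chosen restriction time.

```latex
% Difference in survival probability at time t:
\[
  \Delta(t) \;=\; S_1(t) - S_0(t),
\]
% Difference in restricted mean survival time up to tau:
\[
  \Delta_{\mathrm{RMST}}(\tau) \;=\; \int_0^{\tau} \bigl\{ S_1(t) - S_0(t) \bigr\}\, dt .
\]
```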
Iterative random sketching (IRS) offers a computationally expedient approach to solving linear systems. However, IRS' benefits can only be realized if the procedure can be appropriately tracked and stopped -- otherwise, the algorithm may stop before the desired accuracy is achieved, or it may run longer than necessary. Unfortunately, IRS solvers cannot access the residual norm without undermining their computational efficiency. While they do have access to noisy estimates of the residual, these estimates turn out to be insufficient for producing accurate values and confidence bounds for the true residual. Thus, in this work, we propose a moving average estimator for the system's residual and rigorously develop practical uncertainty sets for it. We then demonstrate the accuracy of our methods on a number of linear systems problems.
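A minimal sketch of the moving-average idea, assuming the IRS solver already emits a noisy per-iteration estimate of the squared residual norm. The class and helper names are hypothetical, and the paper's uncertainty sets are not reproduced here.

```python
import collections
import numpy as np

class MovingAverageResidual:
    """Window-averaged estimate of the (squared) residual norm, built from
    the noisy per-iteration estimates an IRS solver already produces."""

    def __init__(self, window=10):
        self.buf = collections.deque(maxlen=window)

    def update(self, noisy_residual_sq):
        self.buf.append(noisy_residual_sq)
        return np.mean(self.buf)  # smoothed estimate used by the stopping rule

# Schematic use inside an IRS loop (irs_step and sketched_residual_sq
# are placeholders for the solver's own routines):
#   est = MovingAverageResidual(window=20)
#   while est.update(sketched_residual_sq(x_k)) > tol ** 2:
#       x_k = irs_step(x_k)
```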
Predictions of hydrological models should be probabilistic in nature. Our aim is to introduce a method that directly estimates the uncertainty of hydrological simulations using expectiles, thus complementing previous quantile-based direct approaches. Expectiles are risk measures that are new to hydrology. They are least squares analogues of quantiles and can characterize a probability distribution in much the same way as quantiles do. To this end, we propose calibrating hydrological models using the expectile loss function, which is consistent for expectiles. We apply our method to 511 basins in the contiguous US and deliver predictive expectiles of hydrological simulations with the GR4J, GR5J and GR6J hydrological models at expectile levels 0.500, 0.900, 0.950 and 0.975. An honest assessment shows empirically that the GR6J model outperforms the other two models at all expectile levels. Simply adjusting the objective function thus offers great opportunities for moving beyond the mean in hydrological modelling.
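For reference, the expectile (asymmetric squared) loss at level $\tau$ can be sketched as below; minimizing it is consistent for the $\tau$-expectile. The function name and its application to streamflow series are illustrative assumptions, not the paper's code.

```python
import numpy as np

def expectile_loss(obs, sim, level):
    """Asymmetric squared loss at the given level in (0, 1): residuals
    u = obs - sim are weighted by `level` when u >= 0 (under-prediction)
    and by 1 - level when u < 0 (over-prediction)."""
    u = obs - sim
    w = np.where(u >= 0, level, 1.0 - level)
    return np.mean(w * u ** 2)

# E.g., calibrating at level 0.9 means minimizing
# expectile_loss(observed_flow, simulated_flow, 0.9) over model parameters.
```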
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that, for each of these three task types, construct a Markov reward function allowing an agent to optimize the task, and that correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
We propose an Active Learning approach to image segmentation that exploits geometric priors to streamline the annotation process. We demonstrate this for both background-foreground and multi-class segmentation tasks in 2D images and 3D image volumes. Our approach combines geometric smoothness priors in the image space with more traditional uncertainty measures to estimate which pixels or voxels are most in need of annotation. For multi-class settings, we additionally introduce two novel uncertainty criteria. In the 3D case, we use the resulting uncertainty measure to show the annotator voxels that lie on the same planar patch, which makes batch annotation much easier than if they were randomly distributed throughout the volume. The planar patch is found using a branch-and-bound algorithm that selects the patch containing the most informative instances. We evaluate our approach on Electron Microscopy and Magnetic Resonance image volumes, as well as on regular images of horses and faces, and demonstrate a substantial performance increase over state-of-the-art approaches.
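As one example of the "traditional uncertainty measures" the approach combines with geometric priors, here is a per-pixel entropy sketch. This is an illustrative stand-in, not necessarily the paper's exact measure, and the function name is hypothetical.

```python
import numpy as np

def pixelwise_entropy(probs, eps=1e-12):
    """Entropy of predicted class probabilities per pixel/voxel, with
    `probs` of shape (..., n_classes) summing to 1 along the last axis.
    Higher entropy flags the locations most in need of annotation."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)
```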