Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution, whose estimation itself requires inverse-weighting by the failure distribution. But neither is known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games to train both failure and censoring models with respect to criteria such as BS or BLL. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the re-weighting model is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case in which this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles to real-world cancer and critically-ill patient data.
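As a concrete illustration of the inverse-weighting the abstract refers to, here is a minimal sketch of the IPCW (inverse-probability-of-censoring-weighted) Brier score at a single horizon. This is a standard Graf-style estimator, not code from the paper; all argument names are placeholders, and the censoring-survival estimates G must be supplied by the caller.

```python
import numpy as np

def ipcw_brier_score(t_obs, event, s_hat, g_hat, g_at_horizon, horizon):
    """IPCW Brier score at a fixed time horizon.

    t_obs:        observed times min(T, C) per subject
    event:        1 if failure observed, 0 if censored
    s_hat:        predicted survival probability S(horizon | x_i)
    g_hat:        censoring-survival estimate G(t_obs_i) per subject
    g_at_horizon: censoring-survival estimate G(horizon)
    """
    died_before = (t_obs <= horizon) & (event == 1)  # known failures by horizon
    alive_after = t_obs > horizon                    # known survivors at horizon
    # Observed failures are weighted by 1/G(T_i); survivors by 1/G(horizon);
    # subjects censored before the horizon receive zero weight.
    loss = (np.where(died_before, (0.0 - s_hat) ** 2 / g_hat, 0.0)
            + np.where(alive_after, (1.0 - s_hat) ** 2 / g_at_horizon, 0.0))
    return loss.mean()
```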
Value-based reinforcement-learning algorithms have shown strong performance in games, robotics, and other real-world applications. The most popular sample-based method is $Q$-Learning. A $Q$-value is the expected return for a state-action pair when following a particular policy, and the algorithm performs updates by adjusting the current $Q$-value towards the observed reward and the maximum of the $Q$-values of the next state. This procedure introduces maximization bias, and solutions like Double $Q$-Learning have been considered. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the $T$-Estimator (TE), based on two-sample testing for the mean. The TE flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. A generalization termed the $K$-Estimator (KE) obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function. Using the TE and the KE, we introduce modifications of $Q$-Learning and its neural network analog, the Deep $Q$-Network. The proposed estimators and algorithms are thoroughly tested and validated on a diverse set of tasks and environments, illustrating the performance potential of the TE and KE.
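To make the maximization bias concrete, the sketch below (illustrative only, not the paper's TE/KE estimators) shows numerically that the maximum of sample means overestimates the maximum expected value, which is exactly the quantity hiding inside the $Q$-Learning target $r + \gamma \max_{a'} Q(s', a')$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Maximization bias in miniature: K actions whose true expected values
# are all 0, so the true MEV is 0. The max of the per-action sample means
# is systematically positive.
K, n, trials = 10, 20, 10_000
overest = np.mean([rng.normal(0.0, 1.0, size=(K, n)).mean(axis=1).max()
                   for _ in range(trials)])
print(f"E[max of sample means] ~= {overest:.3f}  (true MEV is 0)")

# The same bias enters the tabular Q-Learning update:
#   Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```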
We propose an empirical likelihood ratio test for nonparametric model selection, where the competing models may be nested, nonnested, overlapping, misspecified, or correctly specified. It compares the squared prediction errors of models based on cross-validation and allows for heteroscedasticity of the model errors. We develop its asymptotic distributions for comparing additive models and varying-coefficient models and extend it to test the significance of variables in additive models with massive data. The method is applicable to model selection among supervised learning models. To facilitate implementation of the test, we provide a fast calculation procedure. Simulations show that the proposed tests work well and have favorable finite-sample performance compared with some existing approaches. The methodology is illustrated with an empirical application.
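A minimal sketch of the raw ingredient the test is built on: cross-validated squared prediction errors for each competing model. The empirical-likelihood machinery itself is omitted, and the helper below is hypothetical, assuming scikit-learn-style estimators and NumPy arrays.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_sq_errors(model, X, y, n_splits=5, seed=0):
    """Per-observation squared prediction errors from K-fold cross-validation."""
    err = np.empty_like(y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        fit = model.fit(X[train], y[train])
        err[test] = (y[test] - fit.predict(X[test])) ** 2
    return err

# The test compares two models through the per-observation differences
# d_i = cv_sq_errors(model_a, X, y)[i] - cv_sq_errors(model_b, X, y)[i];
# only this raw ingredient is formed here.
```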
The recently proposed statistical finite element (statFEM) approach synthesises measurement data with finite element models and allows for making predictions about the true system response. We provide a probabilistic error analysis for a prototypical statFEM setup based on a Gaussian process prior under the assumption that the noisy measurement data are generated by a deterministic true system response function that satisfies a second-order elliptic partial differential equation for an unknown true source term. In certain cases, properties such as the smoothness of the source term may be misspecified by the Gaussian process model. The error estimates we derive are for the expectation with respect to the measurement noise of the $L^2$-norm of the difference between the true system response and the mean of the statFEM posterior. The estimates imply polynomial rates of convergence in the numbers of measurement points and finite element basis functions and depend on the Sobolev smoothness of the true source term and the Gaussian process model. A numerical example for Poisson's equation is used to illustrate these theoretical results.
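The error quantity described in prose can be written schematically as follows. This is a hedged rendering with placeholder symbols: $u$ for the true system response, $\bar{u}_{n,N}$ for the statFEM posterior mean computed from $n$ measurement points and $N$ finite element basis functions, and unspecified exponents $\alpha, \beta > 0$ standing in for the rates, which depend on the Sobolev smoothness of the true source term and the Gaussian process model.

```latex
% Expected L^2 error of the statFEM posterior mean (schematic form):
\[
  \mathbb{E}_{\varepsilon}\!\left[ \bigl\| u - \bar{u}_{n,N} \bigr\|_{L^2(D)} \right]
  \;=\; \mathcal{O}\!\bigl( n^{-\alpha} + N^{-\beta} \bigr),
\]
% where the expectation is over the measurement noise and the exponents
% alpha, beta are placeholders for the smoothness-dependent rates.
```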
Unlike most types of software systems, video games have largely been tested manually by human testers. This is mostly due to the continuous and intelligent user interaction video games require. Recently, reinforcement learning (RL) has been exploited to partially automate functional testing. RL enables training smart agents that can even achieve super-human performance in playing games, making them suitable for exploring games in search of bugs. We investigate the possibility of using RL for load testing video games. Indeed, the goal of game testing is not only to identify functional bugs, but also to examine the game's performance, such as its ability to avoid lags and keep a minimum number of frames per second (FPS) when demanding 3D scenes are shown on screen. We define a methodology that employs RL to train an agent able to play the game like a human while also trying to identify areas of the game that result in a drop of FPS. We demonstrate the feasibility of our approach on three games. Two of them are used as proof of concept, by injecting artificial performance bugs. The third one is an open-source 3D game that we load test using the trained agent, showing its potential to identify areas of the game resulting in lower FPS.
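One plausible way to realize the dual objective of playing competently while hunting FPS drops is a shaped reward like the sketch below. This is an assumption for illustration, not the paper's exact formulation; the function name, threshold, and bonus value are hypothetical.

```python
def load_test_reward(game_reward, fps, fps_target=30.0, bonus=1.0):
    """Shaped reward for an RL load-testing agent (illustrative sketch):
    keep the usual gameplay reward so the agent still plays like a human,
    and add a bonus whenever the rendered frame rate drops below target,
    steering exploration toward performance-heavy areas of the game."""
    return game_reward + (bonus if fps < fps_target else 0.0)
```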
Federated learning involves training statistical models over remote devices such as mobile phones while keeping data localized. Training in heterogeneous and potentially massive networks introduces opportunities for privacy-preserving data analysis and for diversifying these models so that they become more inclusive of the population. Federated learning can be viewed as a unique opportunity to bring fairness and parity to many existing models by enabling model training to happen on a diverse set of participants and on data that is generated regularly and dynamically. In this paper, we discuss the current metrics and approaches that are available to measure and evaluate fairness in the context of spatial-temporal models. We propose how these metrics and approaches can be redefined to address the challenges faced in the federated learning setting.
In the presence of heterogeneity between the randomized controlled trial (RCT) participants and the target population, evaluating the treatment effect solely based on the RCT often leads to biased quantification of the real-world treatment effect. To address the lack of generalizability of the treatment effect estimated from the RCT sample, we leverage observational studies with large samples that are representative of the target population. This paper concerns evaluating treatment effects on survival outcomes for a target population and considers a broad class of estimands that are functionals of treatment-specific survival functions, including differences in survival probability and restricted mean survival times. Motivated by two intuitive but distinct approaches, i.e., imputation based on survival outcome regression and weighting based on inverse probability of sampling, censoring, and treatment assignment, we propose a semiparametric estimator through the guidance of the efficient influence function. The proposed estimator is doubly robust in the sense that it is consistent for the target population estimands if either the survival model or the weighting model is correctly specified, and is locally efficient when both are correct. Simulation studies confirm the theoretical properties of the proposed estimator and show that it outperforms competitors. We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer.
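For reference, two members of the estimand class mentioned above, written out for treatment-specific survival functions $S_1$ and $S_0$ of the target population. These are the standard definitions; $\tau$ is a user-chosen restriction time.

```latex
% Difference in survival probability at time t:
\[
  \Delta(t) \;=\; S_1(t) - S_0(t),
\]
% Difference in restricted mean survival time up to tau:
\[
  \Delta_{\mathrm{RMST}}(\tau) \;=\; \int_0^{\tau} \bigl\{ S_1(t) - S_0(t) \bigr\}\, dt .
\]
```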
Iterative random sketching (IRS) offers a computationally expedient approach to solving linear systems. However, IRS' benefits can only be realized if the procedure can be appropriately tracked and stopped -- otherwise, the algorithm may stop before the desired accuracy is achieved, or it may run longer than necessary. Unfortunately, IRS solvers cannot access the residual norm without undermining their computational efficiency. While they do have access to noisy estimates of the residual, these estimates turn out to be insufficient for producing accurate values and confidence bounds for the true residual. Thus, in this work, we propose a moving average estimator for the system's residual and rigorously develop practical uncertainty sets for it. We then demonstrate the accuracy of our methods on a number of linear systems problems.
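A minimal sketch of the moving-average idea, assuming the IRS solver already emits a noisy per-iteration estimate of the squared residual norm. The class and helper names are hypothetical, and the paper's uncertainty sets are not reproduced here.

```python
import collections
import numpy as np

class MovingAverageResidual:
    """Window-averaged estimate of the (squared) residual norm, built from
    the noisy per-iteration estimates an IRS solver already produces."""

    def __init__(self, window=10):
        self.buf = collections.deque(maxlen=window)

    def update(self, noisy_residual_sq):
        self.buf.append(noisy_residual_sq)
        return np.mean(self.buf)  # smoothed estimate used by the stopping rule

# Schematic use inside an IRS loop (irs_step and sketched_residual_sq
# are placeholders for the solver's own routines):
#   est = MovingAverageResidual(window=20)
#   while est.update(sketched_residual_sq(x_k)) > tol ** 2:
#       x_k = irs_step(x_k)
```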
Predictions of hydrological models should be probabilistic in nature. Our aim is to introduce a method that directly estimates the uncertainty of hydrological simulations using expectiles, thus complementing previous quantile-based direct approaches. Expectiles are risk measures that are new to hydrology. They are least squares analogues of quantiles and can characterize a probability distribution in much the same way as quantiles do. To this end, we propose calibrating hydrological models using the expectile loss function, which is consistent for expectiles. We apply our method to 511 basins in the contiguous US and deliver predictive expectiles of hydrological simulations with the GR4J, GR5J and GR6J hydrological models at expectile levels 0.500, 0.900, 0.950 and 0.975. An honest assessment shows empirically that the GR6J model outperforms the other two models at all expectile levels. Simply adjusting the objective function thus offers great opportunities for moving beyond the mean in hydrological modelling.
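For reference, the expectile (asymmetric squared) loss at level $\tau$ can be sketched as below; minimizing it is consistent for the $\tau$-expectile. The function name and its application to streamflow series are illustrative assumptions, not the paper's code.

```python
import numpy as np

def expectile_loss(obs, sim, level):
    """Asymmetric squared loss at the given level in (0, 1): residuals
    u = obs - sim are weighted by `level` when u >= 0 (under-prediction)
    and by 1 - level when u < 0 (over-prediction)."""
    u = obs - sim
    w = np.where(u >= 0, level, 1.0 - level)
    return np.mean(w * u ** 2)

# E.g., calibrating at level 0.9 means minimizing
# expectile_loss(observed_flow, simulated_flow, 0.9) over model parameters.
```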
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that, for each of these three task types, construct a Markov reward function allowing an agent to optimize the task, and that correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
We propose an Active Learning approach to image segmentation that exploits geometric priors to streamline the annotation process. We demonstrate this for both background-foreground and multi-class segmentation tasks in 2D images and 3D image volumes. Our approach combines geometric smoothness priors in the image space with more traditional uncertainty measures to estimate which pixels or voxels are most in need of annotation. For multi-class settings, we additionally introduce two novel uncertainty criteria. In the 3D case, we use the resulting uncertainty measure to show the annotator voxels that lie on the same planar patch, which makes batch annotation much easier than if they were randomly distributed throughout the volume. The planar patch is found using a branch-and-bound algorithm that selects the patch containing the most informative instances. We evaluate our approach on Electron Microscopy and Magnetic Resonance image volumes, as well as on regular images of horses and faces, and demonstrate a substantial performance increase over state-of-the-art approaches.
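As one example of the "traditional uncertainty measures" the approach combines with geometric priors, here is a per-pixel entropy sketch. This is an illustrative stand-in, not necessarily the paper's exact measure, and the function name is hypothetical.

```python
import numpy as np

def pixelwise_entropy(probs, eps=1e-12):
    """Entropy of predicted class probabilities per pixel/voxel, with
    `probs` of shape (..., n_classes) summing to 1 along the last axis.
    Higher entropy flags the locations most in need of annotation."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)
```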