
In clinical prediction modeling, model updating refers to the practice of modifying a prediction model before it is used in a new setting. In the context of logistic regression for a binary outcome, one of the simplest updating methods is a fixed odds-ratio transformation of predicted risks to improve calibration-in-the-large. Previous authors have proposed equations for calculating this odds ratio based on the discrepancy between the prevalence in the original and the new population, or between the average of predicted and observed risks. We show that this method fails to account for the non-collapsibility of the odds ratio. Consequently, it under-corrects predicted risks, especially when predicted risks are more dispersed (i.e., for models with good discrimination). We suggest an approximate equation for recovering the conditional odds ratio from the mean and variance of predicted risks. Brief simulations and a case study show that this approach reduces under-correction, sometimes substantially. R code for implementation is provided.
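
The under-correction described above can be illustrated with a minimal Python sketch. It is not the authors' R code and does not use their approximate equation. It compares the naive odds ratio, computed from the mean predicted risk and the target prevalence, with an odds ratio found by numerical bisection so that the mean of the shifted risks matches the target. The risk values, the target prevalence, and all function names are hypothetical and chosen only for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def shift_risks(risks, odds_ratio):
    """Multiply the odds of every predicted risk by one fixed odds ratio.

    Assumes every risk lies strictly between 0 and 1.
    """
    shifted = []
    for p in risks:
        odds = odds_ratio * p / (1.0 - p)
        shifted.append(odds / (1.0 + odds))
    return shifted

def marginal_odds_ratio(risks, target_prevalence):
    """Naive odds ratio: odds of the target prevalence over odds of the mean risk."""
    m = mean(risks)
    return (target_prevalence / (1.0 - target_prevalence)) / (m / (1.0 - m))

def conditional_odds_ratio(risks, target_prevalence, tol=1e-10):
    """Odds ratio whose shifted risks average to the target.

    Found by bisection on the log odds ratio; the mean shifted risk
    increases monotonically with the odds ratio, so bisection applies.
    """
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean(shift_risks(risks, math.exp(mid))) < target_prevalence:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2.0)

# Hypothetical, widely dispersed predicted risks with mean 0.2,
# and a hypothetical prevalence of 0.4 in the new setting.
risks = [0.01, 0.02, 0.05, 0.10, 0.30, 0.72]
target = 0.4

naive = marginal_odds_ratio(risks, target)
exact = conditional_odds_ratio(risks, target)

print("naive odds ratio:", naive)
print("mean risk after naive shift:", mean(shift_risks(risks, naive)))
print("solved odds ratio:", exact)
print("mean risk after solved shift:", mean(shift_risks(risks, exact)))
```

For these illustrative values, the naive shift leaves the mean updated risk below the 0.4 target, and the solved odds ratio is larger than the naive one. Bisection stands in here for the closed-form approximation mentioned in the abstract, whose exact form is not given in this text.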

Related content

MoDELS


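The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds and include researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum in which participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website link:

Learning · Robustness · Common cause · Sample

January 14, 2022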

SPLDExtraTrees: Robust machine learning approach for predicting kinase inhibitor resistance

Ziyi Yang,Zhaofeng Ye,Yijia Xiao,Changyu Hsieh,Shengyu Zhang

from arxiv, 14 pages, 5 figures

Drug resistance is a major threat to global health and a significant concern throughout the clinical treatment of diseases and drug development. Mutations in proteins that are related to drug binding are a common cause of adaptive drug resistance. Therefore, quantitative estimates of how mutations affect the interaction between a drug and its target protein are of vital significance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations, Rosetta protocols, and machine learning have proven capable of predicting ligand affinity changes upon protein mutation. However, severely limited sample sizes and heavy noise induce overfitting and generalization issues, which have impeded wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. Specifically, the proposed method ranks training data following a specific scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into the training, and then iterates between sample weight recalculations and model updates. In addition, we calculate additional physics-based structural features to provide the machine learning model with valuable domain knowledge on proteins for this data-limited predictive task. The experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios, and show that it achieves predictive accuracy comparable to that of molecular dynamics and Rosetta methods at a much lower computational cost.

Estimation/Estimator · Origin · Model evaluation · CASES · Extensibility

January 14, 2022

Deep Learning for Agile Effort Estimation: Have We Solved the Problem Yet?

Vali Tawosi,Rebecca Moussa,Federica Sarro

from arxiv, has been submitted for possible publication

In the last decade, several studies have proposed the use of automated techniques to estimate the effort of agile software development. In this paper we perform a close replication and extension of a seminal work proposing the use of Deep Learning for agile effort estimation (namely Deep-SE), which has set the state of the art since. Specifically, we replicate three of the original research questions aiming at investigating the effectiveness of Deep-SE for both within-project and cross-project effort estimation. We benchmark Deep-SE against three baseline techniques (i.e., Random, Mean and Median effort prediction) and a previously proposed method to estimate agile software project development effort (dubbed TF/IDF-SE), as done in the original study. To this end, we use both the data from the original study and a new larger dataset of 31,960 issues, which we mined from 29 open-source projects. Using more data allows us to strengthen our confidence in the results and further mitigate the threat to the external validity of the study. We also extend the original study by investigating two additional research questions. One evaluates the accuracy of Deep-SE when the training set is augmented with issues from all other projects available in the repository at the time of estimation, and the other examines whether an expensive pre-training step used by the original Deep-SE has any beneficial effect on its accuracy and convergence speed. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SE in only very few cases with statistical significance (8/42 and 9/32 cases, respectively), thus confounding previous findings on the efficacy of Deep-SE. The two additional RQs revealed that neither augmenting the training set nor pre-training Deep-SE plays a role in improving its accuracy and convergence speed. ...

Decomposed · Online · MoDELS · Performer · DATE

January 13, 2022

Cardinality Constrained Scheduling in Online Models

Leah Epstein,Alexandra Lassota,Asaf Levin,Marten Maack,Lars Rohwedder

from arxiv, An extended abstract will appear in the proceedings of STACS'22

Makespan minimization on parallel identical machines is a classical and intensively studied problem in scheduling, and a classic example for online algorithm analysis with Graham's famous list scheduling algorithm dating back to the 1960s. In this problem, jobs arrive over a list and upon an arrival, the algorithm needs to assign the job to a machine. The goal is to minimize the makespan, that is, the maximum machine load. In this paper, we consider the variant with an additional cardinality constraint: The algorithm may assign at most $k$ jobs to each machine where $k$ is part of the input. While the offline (strongly NP-hard) variant of cardinality constrained scheduling is well understood and an EPTAS exists here, no non-trivial results are known for the online variant. We fill this gap by making a comprehensive study of various different online models. First, we show that there is a constant competitive algorithm for the problem and further, present a lower bound of $2$ on the competitive ratio of any online algorithm. Motivated by the lower bound, we consider a semi-online variant where upon arrival of a job of size $p$, we are allowed to migrate jobs of total size at most a constant times $p$. This constant is called the migration factor of the algorithm. Algorithms with small migration factors are a common approach to bridge the performance of online algorithms and offline algorithms. One can obtain algorithms with a constant migration factor by rounding the size of each incoming job and then applying an ordinal algorithm to the resulting rounded instance. With this in mind, we also consider the framework of ordinal algorithms and characterize the competitive ratio that can be achieved using the aforementioned approaches.
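
As a point of reference for the problem statement above, the following is a minimal Python sketch of Graham-style greedy list scheduling with the cardinality cap added in the most direct way: each arriving job goes to the least-loaded machine that still holds fewer than $k$ jobs. This is only an illustration of the setting. It is not the constant-competitive algorithm from the paper, and no claim is made about its competitive ratio. The function name and the example instance are hypothetical.

```python
def list_schedule(jobs, num_machines, k):
    """Greedy online assignment under a per-machine cap of k jobs.

    Jobs are processed in arrival order. Each job is placed on the
    least-loaded machine that currently holds fewer than k jobs.
    Returns the list of machine indices and the resulting makespan.
    """
    loads = [0] * num_machines   # total size assigned to each machine
    counts = [0] * num_machines  # number of jobs on each machine
    assignment = []
    for size in jobs:
        open_machines = [i for i in range(num_machines) if counts[i] < k]
        if not open_machines:
            raise ValueError("no machine can accept another job")
        target = min(open_machines, key=lambda i: loads[i])
        loads[target] += size
        counts[target] += 1
        assignment.append(target)
    return assignment, max(loads)

# Hypothetical instance: two machines, at most two jobs per machine.
assignment, makespan = list_schedule([4, 1, 1, 3], num_machines=2, k=2)
print(assignment, makespan)
```

In this instance the second machine fills up after the two unit-size jobs, so the final job of size 3 is forced onto the machine that already holds the job of size 4. Pairing the jobs as {4, 1} and {1, 3} would give a smaller maximum load, which shows how the cap changes the behavior of the plain greedy rule.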

Facebook AI Research · Outlier · Approximation · Constraint · Scenario

January 13, 2022

Approximate the individually fair k-center with outliers

Lu Han,Dachuan Xu,Yicheng Xu,Ping Yang

In this paper, we propose and investigate the individually fair $k$-center with outliers (IF$k$CO). In the IF$k$CO, we are given an $n$-sized vertex set in a metric space, as well as integers $k$ and $q$. At most $k$ vertices can be selected as centers and at most $q$ vertices can be selected as outliers. The centers are selected to serve all the non-outlier (i.e., served) vertices. The so-called individual fairness constraint requires that every served vertex have a selected center not too far away. More precisely, every served vertex is supposed to have at least one center among its $\lceil (n-q) / k \rceil$ closest neighbors, because every center serves $(n-q) / k$ vertices on average. The objective is to select centers and outliers and to assign every served vertex to some center so as to minimize the maximum fairness ratio over all served vertices, where the fairness ratio of a vertex is defined as the ratio between its distance to the assigned center and its distance to a $\lceil (n-q)/k \rceil$-th closest neighbor. As our main contribution, a 4-approximation algorithm is presented, based on which we develop an improved algorithm from a practical perspective.

Likelihood · Estimation/Estimator · MoDELS · Marginalization · Point estimation

January 12, 2022

Geometric Conditions for the Discrepant Posterior Phenomenon and Connections to Simpson's Paradox

Yang Chen,Ruobin Gong,Min-ge Xie

The discrepant posterior phenomenon (DPP) is a counter-intuitive phenomenon that can frequently occur in a Bayesian analysis of multivariate parameters. It refers to the phenomenon that a parameter estimate based on a posterior is more extreme than both of those inferred based on either the prior or the likelihood alone. Inferential claims that exhibit DPP defy the common intuition that the posterior is a prior-data compromise, and the phenomenon can be surprisingly ubiquitous in well-behaved Bayesian models. In this paper we revisit this phenomenon and, using point estimation as an example, derive conditions under which the DPP occurs in Bayesian models with exponential quadratic likelihoods and conjugate multivariate Gaussian priors. The family of exponential quadratic likelihood models includes Gaussian models and models with the local asymptotic normality property. We provide an intuitive geometric interpretation of the phenomenon and show that there exists a nontrivial space of marginal directions such that the DPP occurs. We further relate the phenomenon to Simpson's paradox and discover their deep-rooted connection, which is associated with marginalization. We also draw connections with Bayesian computational algorithms when difficult geometry exists. Our discovery demonstrates that DPP is more prevalent than previously understood and anticipated. Theoretical results are complemented by numerical illustrations. Scenarios covered in this study have implications for parameterization, sensitivity analysis, and prior choice for Bayesian modeling.

MoDELS · COVID-19 · SimPLe · Interpretability · Learning

January 7, 2022

Unifying Epidemic Models with Mixtures

Arnab Sarker,Ali Jadbabaie,Devavrat Shah

The COVID-19 pandemic has emphasized the need for a robust understanding of epidemic models. Current models of epidemics are classified as either mechanistic or non-mechanistic: mechanistic models make explicit assumptions on the dynamics of disease, whereas non-mechanistic models make assumptions on the form of observed time series. Here, we introduce a simple mixture-based model which bridges the two approaches while retaining benefits of both. The model represents time series of cases and fatalities as a mixture of Gaussian curves, providing a flexible function class to learn from data compared to traditional mechanistic models. Although the model is non-mechanistic, we show that it arises as the natural outcome of a stochastic process based on a networked SIR framework. This allows learned parameters to take on a more meaningful interpretation compared to similar non-mechanistic models, and we validate the interpretations using auxiliary mobility data collected during the COVID-19 pandemic. We provide a simple learning algorithm to identify model parameters and establish theoretical results which show the model can be efficiently learned from data. Empirically, we find the model to have low prediction error. The model is available live at covidpredictions.mit.edu. Ultimately, this allows us to systematically understand the impacts of interventions on COVID-19, which is critical in developing data-driven solutions to controlling epidemics.

Diversity · Optimizer · MoDELS · Latent · Regularization term

February 28, 2019

Jointly Optimizing Diversity and Relevance in Neural Response Generation

Xiang Gao,Sungjin Lee,Yizhe Zhang,Chris Brockett,Michel Galley,Jianfeng Gao,Bill Dolan

from arxiv, Long paper accepted at NAACL 2019

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a method to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

Image segmentation · Performer · Performance · Pattern recognition · Computer vision

July 30, 2018

A Restricted-Domain Dual Formulation for Two-Phase Image Segmentation

Jack Spencer

In two-phase image segmentation, convex relaxation has allowed global minimisers to be computed for a variety of data fitting terms. Many efficient approaches exist to compute a solution quickly. However, we consider whether the nature of the data fitting in this formulation allows for reasonable assumptions to be made about the solution that can improve the computational performance further. In particular, we employ a well-known dual formulation of this problem and solve the corresponding equations in a restricted domain. We present experimental results that explore the dependence of the solution on this restriction and quantify improvements in the computational performance. This approach can be extended to analogous methods simply and could provide an efficient alternative for problems of this type.

Random forest · Estimation/Estimator · Rank · TEAM · MoDELS

June 13, 2018

Prediction of the FIFA World Cup 2018 - A random forest approach with an emphasis on estimated team ability parameters

Andreas Groll,Christophe Ley,Gunther Schauberger,Hans Van Eetvelde

from arxiv, First revised version, corrected typo in introduction when referring to the winning probabilities derived by Zeileis, Leitner, and Hornik (2018), which are for Germany 15.8% instead of 12.8%. Second revised version, slight changes in notation in Section 3.3

In this work, we compare three different modeling approaches for the scores of soccer matches with regard to their predictive performances based on all matches from the four previous FIFA World Cups 2002 - 2014: Poisson regression models, random forests and ranking methods. While the former two are based on the teams' covariate information, the latter method estimates adequate ability parameters that reflect the current strength of the teams best. Within this comparison the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate we can improve the predictive power substantially. Finally, this combination of methods is chosen as the final model and based on its estimates, the FIFA World Cup 2018 is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain before the defending champion Germany. Additionally, we provide survival probabilities for all teams and at all tournament stages as well as the most probable tournament outcome.

Markov random field · Conjugate gradient · Random field · Conjugate · Image segmentation

March 13, 2018

Combination of Hidden Markov Random Field and Conjugate Gradient for Brain Image Segmentation

EL-Hachemi Guerrout,Samy Ait-Aoudia,Dominique Michelucci,Ramdane Mahiou

Image segmentation is the process of partitioning an image into significant regions that are easier to analyze. Nowadays, segmentation has become a necessity in many practical medical imaging tasks, such as locating tumors and diseases. The Hidden Markov Random Field model is one of several techniques used in image segmentation. It provides an elegant way to model the segmentation process. This modeling leads to the minimization of an objective function. The Conjugate Gradient algorithm (CG) is one of the best-known optimization techniques. This paper proposes the use of the CG algorithm for image segmentation, based on the Hidden Markov Random Field. Since derivatives are not available for this expression, finite differences are used in the CG algorithm to approximate the first derivative. The approach is evaluated using a number of publicly available images for which the ground truth is known. The Dice Coefficient is used as an objective criterion to measure the quality of segmentation. The results show that the proposed CG approach compares favorably with other variants of Hidden Markov Random Field segmentation algorithms.
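
The Dice Coefficient used as the evaluation criterion above can be sketched in a few lines of Python. The sketch uses the standard definition $2|A \cap B| / (|A| + |B|)$ for one segmented region $A$ and its ground-truth region $B$, each represented as a set of pixel indices. The function name, the empty-region convention, and the example regions are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(predicted, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of pixel indices."""
    a, b = set(predicted), set(truth)
    if not a and not b:
        # Convention assumed here: two empty regions agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical regions given as flat pixel indices.
predicted_region = {1, 2, 3, 4}
true_region = {3, 4, 5, 6}
score = dice_coefficient(predicted_region, true_region)
print(score)  # 0.5: two shared pixels out of 4 + 4
```

For a segmentation with several tissue classes, one score per class label can be computed in this way and then reported separately or averaged.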