
Background/aims: While randomized controlled trials are the gold standard for measuring causal effects, robust conclusions about causal relationships can be obtained using data from observational studies if proper statistical techniques are used to account for the imbalance of pretreatment confounders across groups. Propensity score (PS) and balancing weights are useful techniques that aim to reduce the observed imbalances between treatment groups by weighting the groups to be as similar as possible with respect to observed confounders. Methods: We have created CoBWeb, a free and easy-to-use web application for the estimation of causal treatment effects from observational data, using PS and balancing weights to control for confounding bias. CoBWeb uses multiple algorithms to estimate the PS and balancing weights, allowing for more flexible relations between the treatment indicator and the observed confounders, because different algorithms make different (or no) assumptions about the structural relationship between them. The optimal algorithm can then be chosen as the one that achieves the best trade-off between balance and effective sample size. Results: CoBWeb follows all the key steps required for robust estimation of the causal treatment effect from observational study data and includes sensitivity analysis of the potential impact of unobserved confounders. We illustrate the practical use of the app using a dataset derived from a study of an intervention for adolescents with substance use disorder, which is available for users within the app environment. Conclusion: CoBWeb is intended to enable non-specialists to understand and apply all the key steps required to perform robust estimation of causal treatment effects using observational data.
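
The workflow described in this abstract (estimate the PS, form weights, check balance, inspect the effective sample size, estimate the weighted effect) can be sketched in a few lines. The sketch below is not CoBWeb itself; the data file, column names, and the single logistic-regression PS model are illustrative assumptions.

```python
# Minimal sketch of the PS-weighting workflow described above (not CoBWeb).
# The data file and confounder/treatment/outcome column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("observational_study.csv")           # hypothetical dataset
confounders = ["age", "baseline_severity", "sex"]      # hypothetical confounders
X, t, y = df[confounders].values, df["treated"].values, df["outcome"].values

# Step 1: estimate the propensity score with one candidate algorithm
# (CoBWeb compares several, since each makes different assumptions).
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

# Step 2: inverse-probability-of-treatment weights for the average effect.
w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Step 3: covariate balance via standardized mean differences in the weighted sample.
def smd(x, t, w):
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd

balance = {c: smd(df[c].values, t, w) for c in confounders}

# Step 4: effective sample size, the other half of the balance/ESS trade-off.
ess = w.sum() ** 2 / (w ** 2).sum()

# Step 5: weighted difference in means as the treatment-effect estimate.
ate = np.average(y[t == 1], weights=w[t == 1]) - np.average(y[t == 0], weights=w[t == 0])
print(balance, ess, ate)
```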

Related content

Longitudinal modified treatment policies (LMTP) have recently been developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies, as they allow the non-parametric definition and estimation of the joint effect of multiple categorical, numerical, or continuous exposures measured at several time points. We extend the LMTP methodology to problems in which the outcome is a time-to-event variable subject to right-censoring and competing risks. We present identification results and non-parametric, locally efficient estimators that use flexible data-adaptive regression techniques to alleviate model misspecification bias while retaining important asymptotic properties such as $\sqrt{n}$-consistency. We present an application to estimating the effect of time to intubation on acute kidney injury among hospitalized COVID-19 patients, where death from other causes is taken to be the competing event.
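
To make the notion of a modified treatment policy concrete, the sketch below defines a shift of the natural exposure value and evaluates it with a plug-in (g-computation) estimate at a single time point. It deliberately omits the longitudinal structure, censoring, and competing risks handled by the estimators above; the variable names and shift amount are illustrative assumptions.

```python
# Single-time-point sketch of a modified treatment policy and a plug-in
# estimate of its effect; the full LMTP estimator is substantially more involved.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=(n, 2))                      # baseline covariates
a = rng.gamma(2.0, 2.0, size=n) + w[:, 0]        # continuous "natural" exposure
y = 1.0 + 0.5 * a + w @ np.array([0.3, -0.2]) + rng.normal(size=n)

def policy(a, delta=1.0, lower=0.0):
    """Modified treatment policy: reduce exposure by delta where feasible."""
    shifted = a - delta
    return np.where(shifted >= lower, shifted, a)

# Outcome regression Q(a, w) with a flexible learner, then plug in the
# post-intervention exposure values.
X = np.column_stack([a, w])
q = GradientBoostingRegressor().fit(X, y)
X_shift = np.column_stack([policy(a), w])
print("estimated mean outcome under the policy:", q.predict(X_shift).mean())
```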

Treatment effects vary across patients, and estimating this variability is important for clinical decisions. The aim is to develop a model that estimates the benefit of alternative treatment options for individual patients. Hence, we developed a two-stage prediction model for heterogeneous treatment effects by combining prognosis research and network meta-analysis methods when individual patient data are available. In the first stage, we develop a prognostic model and predict the baseline risk of the outcome. In the second stage, we use this baseline risk score from the first stage as a single prognostic factor and effect modifier in a network meta-regression model. We apply the approach to a network meta-analysis of three randomized clinical trials comparing the relapse rate under Natalizumab, Glatiramer Acetate, and Dimethyl Fumarate, including 3590 patients diagnosed with relapsing-remitting multiple sclerosis. We find that the baseline risk score modifies the relative and absolute treatment effects. Several patient characteristics, such as age and disability status, affect the baseline risk of relapse, and this in turn moderates the benefit that may be expected from each of the treatments. For high-risk patients, the treatment that minimizes the risk of relapse within two years is Natalizumab, whereas for low-risk patients Dimethyl Fumarate might be a better option. Our approach can be easily extended to all outcomes of interest and has the potential to inform a personalised treatment approach.
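
A greatly simplified version of the two-stage idea can be written down on pooled individual patient data: fit a prognostic model on the reference arms, then use its predicted baseline risk as the single effect modifier in a treatment model. A real network meta-regression would additionally model study-specific effects and the full treatment network; all column names below are hypothetical.

```python
# Highly simplified two-stage sketch (not the paper's full NMA model).
import pandas as pd
import statsmodels.formula.api as smf

ipd = pd.read_csv("rrms_ipd.csv")    # hypothetical IPD from the three trials

# Stage 1: prognostic model for the baseline risk of relapse, fitted on the
# reference arms only.
stage1 = smf.logit("relapse_2y ~ age + edss + prior_relapses",
                   data=ipd[ipd.treatment == "reference"]).fit()
ipd["baseline_risk"] = stage1.predict(ipd)

# Stage 2: treatment model with baseline risk as the single prognostic factor
# and effect modifier (treatment x risk interaction), adjusting for study.
stage2 = smf.logit("relapse_2y ~ C(treatment) * baseline_risk + C(study)",
                   data=ipd).fit()
print(stage2.summary())
```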

Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems, such as columnstore compression and data profiling. In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples. Such efficient estimation is critical for tasks where it is prohibitive to scan the data even once. Existing sample-based estimators typically rely on heuristics or assumptions and do not have robust performance across different datasets as the assumptions on data can easily break. On the other hand, deriving an estimator from a principled formulation such as maximum likelihood estimation is very challenging due to the complex structure of the formulation. We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator. To this end, we need to answer several questions: i) how to make the learned model workload agnostic; ii) how to obtain training data; iii) how to perform model training. We derive conditions of the learning framework under which the learned model is workload agnostic, in the sense that the model/estimator can be trained with synthetically generated training data, and then deployed into any data warehouse simply as, e.g., user-defined functions (UDFs), to offer efficient (within microseconds on CPU) and accurate NDV estimations for unseen tables and workloads. We compare the learned estimator with the state-of-the-art sample-based estimators on nine real-world datasets to demonstrate its superior estimation accuracy. We publish our code for training data generation, model training, and the learned estimator online for reproducibility.
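
The supervised-learning formulation can be illustrated with a small end-to-end sketch: generate synthetic columns, extract simple features from random samples (sample fraction, sample NDV, singleton and doubleton counts), and train a regressor that predicts the true NDV. The feature set, data generator, and model below are simple illustrative assumptions, not the paper's design.

```python
# Sketch of a learned, sample-based NDV estimator trained on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def features(sample, population_size):
    vals, counts = np.unique(sample, return_counts=True)
    f1 = np.sum(counts == 1)                 # singletons in the sample
    f2 = np.sum(counts == 2)                 # doubletons
    return [len(sample) / population_size, len(vals), f1, f2]

X_train, y_train = [], []
for _ in range(500):                         # synthetic training columns
    N = rng.integers(5_000, 50_000)
    ndv = rng.integers(10, N)
    skew = rng.uniform(0.0, 2.0)
    p = np.arange(1, ndv + 1) ** -skew
    p /= p.sum()                             # Zipf-like value distribution
    column = rng.choice(ndv, size=N, p=p)
    sample = rng.choice(column, size=max(100, N // 100), replace=False)
    X_train.append(features(sample, N))
    y_train.append(np.log(ndv))

model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

# At estimation time: features of a new sample -> predicted NDV.
def estimate_ndv(sample, population_size):
    return float(np.exp(model.predict([features(sample, population_size)])[0]))
```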

Consider a set of multivariate distributions, $F_1,\dots,F_M$, aiming to explain the same phenomenon. For instance, each $F_m$ may correspond to a different candidate background model for calibration data, or to one of many possible signal models we aim to validate on experimental data. In this article, we show that tests for a wide class of apparently different models $F_m$ can be mapped into a single test for a reference distribution $Q$. As a result, valid inference for each $F_m$ can be obtained by simulating only the distribution of the test statistic under $Q$. Furthermore, $Q$ can be chosen to be conveniently simple, substantially reducing the computational time.
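
A univariate special case conveys the idea: testing data against any hypothesized continuous $F_m$ can be mapped, via the probability integral transform, to a single test of uniformity, so the null distribution of the statistic only needs to be simulated once under $Q = \mathrm{Uniform}(0,1)$. The sketch below is this simplification, not the multivariate construction of the article.

```python
# Calibrate one null distribution under Q, then test any candidate F_m by
# transforming the data with its CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ks_uniform(u):
    """Kolmogorov-Smirnov statistic against Uniform(0,1)."""
    u = np.sort(u)
    n = len(u)
    return np.max(np.maximum(np.arange(1, n + 1) / n - u, u - np.arange(n) / n))

n = 200
# Null distribution of the statistic, simulated once under Q.
null = np.array([ks_uniform(rng.uniform(size=n)) for _ in range(5000)])

data = rng.normal(loc=0.2, size=n)
for name, cdf in {"F_1: N(0,1)": stats.norm(0, 1).cdf,
                  "F_2: N(0.2,1)": stats.norm(0.2, 1).cdf}.items():
    stat = ks_uniform(cdf(data))
    pval = np.mean(null >= stat)
    print(name, "p-value:", pval)
```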

Uplift modeling is crucial in various applications ranging from marketing and policy-making to personalized recommendations. The main objective is to learn optimal treatment allocations for a heterogeneous population. A primary line of existing work modifies the loss function of the decision tree algorithm to identify cohorts with heterogeneous treatment effects. Another line of work estimates the individual treatment effects separately for the treatment group and the control group using off-the-shelf supervised learning algorithms. The former approach, which directly models the heterogeneous treatment effect, is known to outperform the latter in practice. However, the existing tree-based methods are mostly limited to a single treatment and a single control use case, except for a handful of extensions to multiple discrete treatments. In this paper, we fill this gap in the literature by proposing a generalization of tree-based approaches that handles multiple discrete and continuous-valued treatments. We focus on a generalization of the well-known causal tree algorithm due to its desirable statistical properties, but our generalization technique can be applied to other tree-based approaches as well. We perform extensive experiments to showcase the efficacy of our method when compared to other methods.
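
For readers unfamiliar with the tree-based line of work, the sketch below scores one candidate split in the single binary-treatment causal-tree setting by how much it increases the squared estimated treatment effect in the children; the paper's contribution is generalizing this kind of criterion to multiple discrete and continuous treatments. The toy data and threshold grid are illustrative assumptions.

```python
# Toy causal-tree split criterion for a single binary treatment.
import numpy as np

def effect(y, t):
    """Difference-in-means treatment effect estimate within a node."""
    return y[t == 1].mean() - y[t == 0].mean()

def split_score(x, y, t, threshold):
    left, right = x <= threshold, x > threshold
    # Require both treatment arms on both sides of the split.
    for mask in (left, right):
        if t[mask].min() == t[mask].max():
            return -np.inf
    n = len(y)
    return (left.sum() / n) * effect(y[left], t[left]) ** 2 + \
           (right.sum() / n) * effect(y[right], t[right]) ** 2

# Usage: pick the threshold with the highest score for a candidate feature.
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
t = rng.integers(0, 2, size=500)
y = x * t + rng.normal(scale=0.1, size=500)        # effect grows with x
best = max(np.quantile(x, np.linspace(0.1, 0.9, 9)),
           key=lambda c: split_score(x, y, t, c))
print("best split threshold:", best)
```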

Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment or policy making. However, research on causal discovery and inference has evolved separately, and the combination of the two domains is not trivial. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based method that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground truth causal graph under mild assumptions. In addition, our method can handle heterogeneous, real-world, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Moreover, the design principle of our method can generalize beyond DECI, providing a general End-to-end Causal Inference (ECI) recipe, which enables different ECI frameworks to be built using existing methods. Our results show the superior performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and other causal machine learning benchmark datasets.
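
The generic "end-to-end causal inference" recipe mentioned above can be pictured as: run a discovery step to obtain a graph, read off an adjustment set for the treatment, then estimate the CATE. The sketch below is not DECI's flow-based model; the discovered graph is simply assumed, and a plain T-learner stands in for the inference step.

```python
# Minimal sketch of an ECI-style pipeline with the discovery output taken as given.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 3000
z = rng.normal(size=n)                        # confounder (parent of T and Y)
t = (z + rng.normal(size=n) > 0).astype(int)  # binary treatment
y = z + t * (1.0 + 0.5 * z) + rng.normal(scale=0.5, size=n)

# Suppose the discovery stage returned Z -> T, Z -> Y, T -> Y, so the
# adjustment set for the effect of T on Y is {Z}.
adjustment = z.reshape(-1, 1)

mu1 = GradientBoostingRegressor().fit(adjustment[t == 1], y[t == 1])
mu0 = GradientBoostingRegressor().fit(adjustment[t == 0], y[t == 0])
cate = mu1.predict(adjustment) - mu0.predict(adjustment)
print("average CATE estimate:", cate.mean())   # true ATE is 1.0 in this simulation
```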

Decision curve analysis can be used to determine whether a personalized model for treatment benefit would lead to better clinical decisions. Decision curve analysis methods have been described to estimate treatment benefit using data from a single RCT. Our main objective is to extend the decision curve analysis methodology to the scenario where several treatment options exist and evidence about their effects comes from a set of trials synthesized using network meta-analysis (NMA). We describe the steps needed to estimate the net benefit of a prediction model using evidence from studies synthesized in an NMA. We show how to compare personalized versus one-size-fits-all treatment decision-making strategies, such as "treat none" or "treat all patients with a specific treatment". The net benefit per strategy can then be plotted for a plausible range of threshold probabilities to reveal the most clinically useful strategy. We applied our methodology to an NMA prediction model for relapsing-remitting multiple sclerosis, which can be used to choose between Natalizumab, Dimethyl Fumarate, Glatiramer Acetate, and placebo. We illustrated the extended decision curve analysis methodology using several combinations of threshold values for the available treatments. For the examined threshold values, the "treat patients according to the prediction model" strategy performs either better than or close to the one-size-fits-all treatment strategies. However, even small differences may be important in clinical decision-making. As the advantage of the personalized model was not consistent across all thresholds, an improved model may be needed before advocating its applicability for decision-making. This novel extension of decision curve analysis can be applied to NMA-based prediction models to evaluate their use in aiding treatment decision-making.
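
The core quantity behind any decision curve is the net benefit, $\mathrm{NB}(p_t) = \mathrm{TP}/n - (\mathrm{FP}/n)\,p_t/(1-p_t)$, compared across strategies over a range of threshold probabilities. The sketch below computes it for a model-based rule, "treat all", and "treat none" in the single-treatment case; the NMA-based multi-treatment extension described above adds a threshold per treatment. Predictions and outcomes here are simulated placeholders.

```python
# Net benefit of competing treatment strategies at several thresholds.
import numpy as np

def net_benefit(y, treat, threshold):
    """Net benefit of a 'treat if ...' rule at a given threshold probability."""
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * threshold / (1.0 - threshold)

rng = np.random.default_rng(0)
risk = rng.uniform(size=1000)                 # model-predicted risk
y = rng.binomial(1, risk)                     # observed outcomes

for threshold in [0.1, 0.2, 0.3]:
    nb_model = net_benefit(y, risk >= threshold, threshold)   # treat per model
    nb_all = net_benefit(y, np.ones(1000, bool), threshold)   # treat all
    nb_none = 0.0                                             # treat none
    print(threshold, round(nb_model, 3), round(nb_all, 3), nb_none)
```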

Missing data is a systemic problem in practical scenarios that causes noise and bias when estimating treatment effects. This makes treatment effect estimation from data with missingness a particularly tricky endeavour. A key reason for this is that standard assumptions on missingness are rendered insufficient by the presence of an additional variable, the treatment, besides the individual and the outcome. The treatment variable introduces additional complexity with respect to why some variables are missing, which previous work has not fully explored. In our work we identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection. Given MCM, we show that naively imputing all data leads to poorly performing treatment effect models, as the act of imputation effectively removes information necessary to provide unbiased estimates. However, no imputation at all also leads to biased estimates, as missingness determined by treatment divides the population into distinct subpopulations, where estimates across these populations will be biased. Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not. We empirically demonstrate how various learners benefit from selective imputation compared to other solutions for missing data.
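
Mechanically, selective imputation amounts to imputing one designated subset of covariates while leaving the other subset un-imputed, with explicit missingness indicators, before fitting the treatment-effect learner. Which variables belong in which subset is exactly what the MCM analysis above determines; the split, file name, and column names below are arbitrary placeholders.

```python
# Mechanics-only sketch of selective imputation (the variable split is a placeholder).
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("study_with_missingness.csv")        # hypothetical data
impute_cols = ["lab_a", "lab_b"]                       # placeholder: to impute
keep_missing_cols = ["symptom_score"]                  # placeholder: do not impute

imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df[impute_cols]),
    columns=impute_cols, index=df.index)

# For the non-imputed block, constant-fill the values but add missingness
# indicators so downstream effect learners can use the missingness pattern itself.
kept = df[keep_missing_cols].fillna(0.0)
indicators = df[keep_missing_cols].isna().astype(int).add_suffix("_missing")

X = pd.concat([imputed, kept, indicators], axis=1)
```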

In this paper, we develop a general framework based on the Transformer architecture to address a variety of challenging treatment effect estimation (TEE) problems. Our methods are applicable both when covariates are tabular and when they consist of sequences (e.g., in text), and can handle discrete, continuous, structured, or dosage-associated treatments. While Transformers have already emerged as dominant methods for diverse domains, including natural language and computer vision, our experiments with Transformers as Treatment Effect Estimators (TransTEE) demonstrate that these inductive biases are also effective on the sorts of estimation problems and datasets that arise in research aimed at estimating causal effects. Moreover, we propose a propensity score network that is trained with TransTEE in an adversarial manner to promote independence between covariates and treatments to further address selection bias. Through extensive experiments, we show that TransTEE significantly outperforms competitive baselines with greater parameter efficiency over a wide range of benchmarks and settings.
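
The adversarial idea can be conveyed with a toy stand-in: an outcome model and a propensity head share a covariate representation, and the representation is pushed to be uninformative about treatment. The sketch below is not the TransTEE architecture (no Transformer, binary treatment only, and a simple label-flipping adversarial term rather than the paper's training scheme).

```python
# Toy adversarial representation learning for treatment effect estimation.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2000, 5
x = torch.randn(n, d)
t = (x[:, 0] + 0.5 * torch.randn(n) > 0).float()
y = x[:, 1] + t * (1.0 + x[:, 2]) + 0.3 * torch.randn(n)

encoder = nn.Sequential(nn.Linear(d, 32), nn.ReLU())
outcome = nn.Sequential(nn.Linear(32 + 1, 32), nn.ReLU(), nn.Linear(32, 1))
propensity = nn.Linear(32, 1)

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(outcome.parameters()), lr=1e-3)
opt_prop = torch.optim.Adam(propensity.parameters(), lr=1e-3)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

for step in range(500):
    h = encoder(x)
    # Propensity head learns to predict treatment from the representation.
    opt_prop.zero_grad()
    bce(propensity(h.detach()).squeeze(), t).backward()
    opt_prop.step()
    # Encoder + outcome head: fit the outcome while fooling the propensity head.
    opt_main.zero_grad()
    y_hat = outcome(torch.cat([h, t.unsqueeze(1)], dim=1)).squeeze()
    adv = bce(propensity(h).squeeze(), 1 - t)      # push predictions away from t
    (mse(y_hat, y) + 0.1 * adv).backward()
    opt_main.step()

# CATE estimate: difference of predicted outcomes under t = 1 and t = 0.
with torch.no_grad():
    h = encoder(x)
    tau = (outcome(torch.cat([h, torch.ones(n, 1)], 1)) -
           outcome(torch.cat([h, torch.zeros(n, 1)], 1))).squeeze()
print("estimated ATE:", tau.mean().item())
```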

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
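
The $l_1/l_2$ (group-lasso) penalty that ties the tasks together can be illustrated for a single node whose causal order is assumed known: the coefficient of each candidate parent is grouped across the $K$ tasks and shrunk jointly, encouraging a shared sparse union of supports. The proximal-gradient sketch below is a toy illustration under these assumptions, not the paper's full DAG estimator; the regularization strength and step size are tuned only for this simulated example.

```python
# Joint group-penalized regression for one node across K tasks.
import numpy as np

rng = np.random.default_rng(0)
K, n, p = 3, 200, 10                      # tasks, samples per task, candidate parents
true_support = [0, 3]
Xs, ys = [], []
for k in range(K):
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[true_support] = rng.uniform(0.5, 1.5, size=len(true_support))
    Xs.append(X)
    ys.append(X @ beta + rng.normal(scale=0.5, size=n))

lam, lr = 20.0, 1e-3
B = np.zeros((K, p))                      # row k: coefficients for task k
for _ in range(2000):
    # Gradient step on the summed least-squares losses.
    for k in range(K):
        B[k] -= lr * (Xs[k].T @ (Xs[k] @ B[k] - ys[k]))
    # Proximal step: group soft-thresholding of each candidate parent across tasks.
    norms = np.linalg.norm(B, axis=0)
    shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    B *= shrink

print("estimated union of supports:",
      np.nonzero(np.linalg.norm(B, axis=0) > 1e-3)[0])
```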
