
Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible with the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the population of interest to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied where the outcome regression is a generalized linear model or a Cox model. In addition, we introduce a novel general-purpose method based on multiple imputation, which we term multiple imputation marginalization (MIM) and which is applicable to a wide range of models. Both methods can be implemented within a Bayesian statistical framework, which naturally embeds the analysis in a probabilistic setting. A simulation study provides proof of principle for the methods and benchmarks their performance against MAIC and the conventional outcome regression. The marginalized outcome regression approaches achieve more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yield unbiased marginal treatment effect estimates when no assumptions are violated. Furthermore, the marginalized covariate-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.
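As a rough illustration of the parametric G-computation step, the sketch below fits a logistic outcome regression on simulated patient-level data from one trial, simulates covariates for the comparator population from (hypothetical) published summary statistics, averages the model's predictions under each treatment over that population, and reports the resulting marginal log-odds ratio. All data-generating values are made up for illustration.

```python
# Minimal sketch of parametric G-computation for a binary outcome.
# Assumptions (not from the paper): IPD from the AB trial with outcome y,
# treatment t (1 = B, 0 = A) and one effect modifier x; only the mean/SD
# of x is published for the AC trial population.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# --- toy individual patient data from the AB trial ---
n = 500
x = rng.normal(0.0, 1.0, n)
t = rng.integers(0, 2, n)
logit_p = -0.5 + 1.0 * t + 0.4 * x - 0.3 * t * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

# Outcome regression (conditional model) fitted on the AB trial IPD.
X = np.column_stack([np.ones(n), t, x, t * x])
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# --- marginalization over the AC population ---
# Simulate covariates from the published AC summary (hypothetical values).
m = 100_000
x_ac = rng.normal(0.6, 0.8, m)

def mean_outcome(treat):
    """Average predicted outcome probability in the AC population under one treatment."""
    X_new = np.column_stack([np.ones(m), np.full(m, treat), x_ac, treat * x_ac])
    return fit.predict(X_new).mean()

p1, p0 = mean_outcome(1), mean_outcome(0)
marginal_log_or = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
print("marginal log-odds ratio in the AC population:", marginal_log_or)
```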

Related content

The R package BayesPPD (Bayesian Power Prior Design) supports Bayesian power and type I error calculation and model fitting after incorporating historical data with the power prior and the normalized power prior for generalized linear models (GLMs). The package accommodates summary-level data or subject-level data with covariate information. It supports the use of multiple historical datasets as well as design without historical data. Supported distributions for responses include normal, binary (Bernoulli/binomial), Poisson and exponential. The power parameter $a_0$ can be fixed or modeled as random using a normalized power prior for each of these distributions. In addition, the package supports the use of arbitrary sampling priors for computing Bayesian power and type I error rates, and has specific features for GLMs that semi-automatically generate sampling priors from historical data. Since sample size determination (SSD) for GLMs is computationally intensive, an approximation method based on asymptotic theory has been implemented to support applications using the power prior. In addition to describing the statistical methodology and functions implemented in the package to enable SSD, we also demonstrate the use of BayesPPD in two comprehensive case studies.
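To make the power prior and the simulation-based power calculation concrete, here is a generic Python sketch for the simplest case of a normal mean with known variance and a fixed $a_0$. This is not the BayesPPD interface (BayesPPD is an R package); the prior settings, historical data, and decision threshold are all hypothetical.

```python
# Illustrative sketch of a fixed-a0 power prior for a normal mean with known
# variance, plus a Monte Carlo estimate of Bayesian power and type I error.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma = 1.0                          # known sampling standard deviation
a0 = 0.5                             # fixed power parameter (discounting)
hist = rng.normal(0.3, sigma, 80)    # historical data (hypothetical)

def posterior(current, historical, a0, sigma, m0=0.0, s0=10.0):
    """Conjugate normal update with the historical likelihood raised to a0."""
    prec = 1 / s0**2 + a0 * len(historical) / sigma**2 + len(current) / sigma**2
    mean = (m0 / s0**2 + a0 * historical.sum() / sigma**2
            + current.sum() / sigma**2) / prec
    return mean, np.sqrt(1 / prec)

def rejection_rate(n_new, true_mean, n_sims=2000, threshold=0.95):
    """Bayesian power (or type I error) for the rule Pr(mu > 0 | data) > threshold."""
    hits = 0
    for _ in range(n_sims):
        current = rng.normal(true_mean, sigma, n_new)
        mu, sd = posterior(current, hist, a0, sigma)
        hits += (1 - norm.cdf(0.0, loc=mu, scale=sd)) > threshold
    return hits / n_sims

print("Bayesian power:", rejection_rate(100, true_mean=0.3))
print("type I error  :", rejection_rate(100, true_mean=0.0))
```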

The cost of both generalized least squares (GLS) and Gibbs sampling in a crossed random effects model can easily grow faster than $N^{3/2}$ for $N$ observations. Ghosh et al. (2020) develop a backfitting algorithm that reduces the cost to $O(N)$. Here we extend that method to a generalized linear mixed model for logistic regression. We use backfitting within an iteratively reweighted penalized least squares algorithm. The specific approach is a version of penalized quasi-likelihood due to Schall (1991). A straightforward version of Schall's algorithm would also cost more than $N^{3/2}$ because it requires the trace of the inverse of a large matrix. We approximate that quantity at cost $O(N)$ and prove that this substitution makes an asymptotically negligible difference. Our backfitting algorithm also collapses the fixed effect with one random effect at a time in a way that is analogous to the collapsed Gibbs sampler of Papaspiliopoulos et al. (2020). We use a symmetric operator that facilitates efficient covariance computation. We illustrate our method on a real dataset from Stitch Fix. By properly accounting for crossed random effects we show that a naive logistic regression could underestimate sampling variances by several hundredfold.
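The sketch below illustrates the core backfitting idea in the simplest Gaussian setting: alternate closed-form shrinkage updates for the two crossed random-effect vectors, with each sweep costing $O(N)$. It is not the paper's penalized quasi-likelihood algorithm for logistic regression, and the variance-component ratios are assumed known.

```python
# Simplified backfitting sketch for a Gaussian crossed random effects model
# y = mu + a[row] + b[col] + noise, alternating O(N) updates of the two factors.
import numpy as np

rng = np.random.default_rng(2)
n_rows, n_cols, N = 200, 150, 5000
row = rng.integers(0, n_rows, N)
col = rng.integers(0, n_cols, N)
y = (1.0 + rng.normal(0, 0.5, n_rows)[row] + rng.normal(0, 0.7, n_cols)[col]
     + rng.normal(0, 1.0, N))

lam_a, lam_b = 4.0, 2.0          # assumed shrinkage ratios sigma_e^2 / sigma_a^2, etc.
a = np.zeros(n_rows)
b = np.zeros(n_cols)
mu = y.mean()

for _ in range(50):
    # Update row effects given column effects (BLUP-style shrunken means).
    resid = y - mu - b[col]
    a = np.bincount(row, resid, n_rows) / (np.bincount(row, minlength=n_rows) + lam_a)
    # Update column effects given row effects.
    resid = y - mu - a[row]
    b = np.bincount(col, resid, n_cols) / (np.bincount(col, minlength=n_cols) + lam_b)
    mu = (y - a[row] - b[col]).mean()

print("fitted grand mean:", mu)
```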

Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g., continuous and binary) and impact the response variable in nonlinear and/or non-additive ways. In this paper, we review existing variable selection approaches for the Bayesian additive regression trees (BART) model, a nonparametric regression model that is flexible enough to capture interactions between predictors and nonlinear relationships with the response. The review emphasizes each method's ability to identify relevant predictors. We also propose two variable importance measures that can be used in a permutation-based variable selection approach, and a backward variable selection procedure for BART. We present simulations demonstrating that our approaches exhibit improved performance in terms of the ability to recover all the relevant predictors in a variety of data settings, compared to existing BART-based variable selection methods.
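For intuition, the following sketch implements a generic permutation-based selection rule: compute variable importances on the observed data, build a null distribution of importances by refitting on permuted responses, and keep predictors whose importance exceeds the per-variable null quantile. A random forest stands in for BART here purely for convenience; the thresholding logic is the same in spirit, not the paper's exact procedure.

```python
# Permutation-based variable selection sketch (random forest as a BART stand-in).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n, p = 400, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + np.sin(np.pi * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(0, 0.5, n)

def importances(X, y):
    """Fit the model and return its per-variable importance scores."""
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    return model.feature_importances_

obs = importances(X, y)

# Null distribution: importances when the response is permuted (no signal).
null = np.array([importances(X, rng.permutation(y)) for _ in range(20)])
threshold = np.quantile(null, 0.95, axis=0)   # per-variable 95% null quantile

selected = np.where(obs > threshold)[0]
print("selected predictors:", selected)       # ideally {0, 1, 2, 3}
```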

An N-of-1 trial is a multi-period crossover trial performed in a single individual, with the primary goal of estimating the treatment effect for that individual rather than population-level mean responses. As in a conventional crossover trial, it is critical to understand carryover effects of the treatment in an N-of-1 trial, especially when no washout periods between treatment periods are instituted to reduce trial duration. To deal with this issue in situations where a high volume of measurements is collected during the study, we introduce a novel Bayesian distributed lag model that facilitates the estimation of carryover effects while accounting for temporal correlations using an autoregressive model. Specifically, we propose a prior variance-covariance structure on the lag coefficients to address the collinearity caused by the fact that treatment exposures are typically identical on successive days. A connection between the proposed Bayesian model and penalized regression is noted. Simulation results demonstrate that the proposed model substantially reduces the root mean squared error in the estimation of carryover effects and immediate effects when compared to other existing methods, while being comparable in the estimation of the total effects. We also apply the proposed method to assess the extent of carryover effects of light therapies in relieving depressive symptoms in cancer survivors.
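The connection to penalized regression can be illustrated with a small frequentist analogue: build a design matrix of current and lagged treatment indicators and fit a ridge-penalized regression, where the ridge penalty plays the role of a Gaussian prior on the lag coefficients. The block design, lag length, and true effects below are invented for illustration and omit the autoregressive error structure.

```python
# Sketch of a distributed lag fit for an N-of-1 trial via ridge regression.
import numpy as np

rng = np.random.default_rng(4)
T, L = 120, 7                                    # days of follow-up, max carryover lag
treat = np.repeat([1, 0, 1, 0, 1, 0], T // 6)    # alternating treatment blocks

# True effects: immediate effect plus geometrically decaying carryover.
true_lags = 1.5 * 0.5 ** np.arange(L + 1)
y = np.convolve(treat, true_lags)[:T] + rng.normal(0, 1.0, T)

# Design matrix: column j holds the treatment indicator from j days ago.
Z = np.column_stack([np.r_[np.zeros(j), treat[: T - j]] for j in range(L + 1)])
X = np.column_stack([np.ones(T), Z])

lam = 1.0                                # ridge penalty ~ prior precision on lag coefficients
P = np.diag([0.0] + [lam] * (L + 1))     # do not penalize the intercept
beta = np.linalg.solve(X.T @ X + P, X.T @ y)

print("estimated immediate effect:", beta[1])
print("estimated carryover effects (lags 1..L):", beta[2:])
print("estimated total effect:", beta[1:].sum())
```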

Modern-day problems in statistics often face the challenge of exploring and analyzing complex non-Euclidean object data that do not conform to vector space structures or operations. Examples of such data objects include covariance matrices, graph Laplacians of networks, and univariate probability distribution functions. In the current contribution a new concurrent regression model is proposed to characterize the time-varying relation between an object in a general metric space (as a response) and a vector in $\mathbb{R}^p$ (as a predictor), where concepts from Fr\'echet regression are employed. Concurrent regression has been a well-developed area of research for Euclidean predictors and responses, with many important applications for longitudinal studies and functional data. However, no such model has been available so far for general object data as responses. We develop generalized versions of both global least squares regression and locally weighted least squares smoothing in the context of concurrent regression for responses that are situated in general metric spaces and propose estimators that can accommodate sparse and/or irregular designs. Consistency results are demonstrated for sample estimates of appropriate population targets along with the corresponding rates of convergence. The proposed models are illustrated with human mortality data and resting-state functional magnetic resonance imaging (fMRI) data as responses.
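As a simplified illustration of the Fréchet regression machinery (without the concurrent, time-varying component), the sketch below performs global Fréchet regression when the responses are one-dimensional distributions represented by their quantile functions, so that the weighted Fréchet mean in the 2-Wasserstein geometry is a weighted average of quantile functions. The data-generating mechanism is a toy construction, and the final sort is only a crude projection to keep the fitted quantile function monotone.

```python
# Sketch of global Frechet regression with distributional responses
# represented by quantile functions (2-Wasserstein geometry).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, p, n_grid = 200, 2, 101
q_grid = np.linspace(0.01, 0.99, n_grid)

X = rng.normal(size=(n, p))
# Toy responses: normal distributions whose mean/SD depend on the predictors.
mu_i = 0.5 * X[:, 0]
sd_i = np.exp(0.3 * X[:, 1])
Q = mu_i[:, None] + sd_i[:, None] * norm.ppf(q_grid)[None, :]    # quantile functions

def frechet_predict(x_new):
    """Weighted Frechet mean in Wasserstein space = weighted quantile average."""
    xbar = X.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
    w = 1.0 + (X - xbar) @ Sigma_inv @ (x_new - xbar)            # global regression weights
    q_hat = (w[:, None] * Q).sum(axis=0) / w.sum()
    return np.sort(q_hat)      # crude projection to a monotone quantile function

print(frechet_predict(np.array([1.0, 0.0]))[:5])
```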

Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data include individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data. Our theoretical results imply that many of these algorithms cannot guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we use our framework to derive the minimal distributions required to design a fair algorithm. Our proposed algorithm decentralizes the decision-making process and still achieves performance similar to the optimal algorithm that requires centralization and non-recoverable distributions.
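A tiny simulation helps illustrate why ignoring this kind of missingness matters. In the loan example, outcomes are observed only for previously approved applicants; if one group was historically approved more leniently, group-wise statistics computed from the labelled data alone are distorted even when the groups are identical in the full population. All distributions below are hypothetical.

```python
# Toy "selective labels" simulation: repayment is observed only for approved applicants.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
group = rng.integers(0, 2, n)                    # protected attribute
score = rng.normal(0.0, 1.0, n)                  # creditworthiness, identical across groups
repay = rng.binomial(1, 1 / (1 + np.exp(-score)))

# Historical lender approved group 1 more leniently -> selective labelling.
approved = rng.binomial(1, 1 / (1 + np.exp(-(score + group))))
labelled = approved == 1

def repayment_rate(g, mask):
    """Mean repayment for group g within the given subset of applicants."""
    sel = mask & (group == g)
    return repay[sel].mean()

everyone = np.ones(n, dtype=bool)
print("full population   :", repayment_rate(0, everyone), repayment_rate(1, everyone))
print("labelled data only:", repayment_rate(0, labelled), repayment_rate(1, labelled))
```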

This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators that greatly reduces this error. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and proposed estimators in terms of their ability to estimate the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are accounted for correctly.
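The abstract does not spell out the estimators, so the sketch below only illustrates the underlying point on simulated data: a naive difference in mean repayment between credit decisions is badly biased when a confounder (creditworthiness) drives both the decision and repayment, whereas a standard covariate-adjusted estimator (inverse-propensity weighting, used here purely as a stand-in) recovers the true effect.

```python
# Naive vs. confounding-adjusted (IPW) estimates of a credit-decision effect.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 50_000
credit = rng.normal(size=n)                                      # confounder: creditworthiness
limit_up = rng.binomial(1, 1 / (1 + np.exp(-2 * credit)))        # better credit -> higher limit
repay = 50 + 5 * limit_up + 20 * credit + rng.normal(0, 5, n)    # true effect = 5

naive = repay[limit_up == 1].mean() - repay[limit_up == 0].mean()

# Propensity scores and Hajek-style inverse-propensity-weighted contrast.
ps = LogisticRegression().fit(credit[:, None], limit_up).predict_proba(credit[:, None])[:, 1]
w = limit_up / ps + (1 - limit_up) / (1 - ps)
ipw = (np.sum(w * limit_up * repay) / np.sum(w * limit_up)
       - np.sum(w * (1 - limit_up) * repay) / np.sum(w * (1 - limit_up)))

print("true effect: 5.0 | naive:", round(naive, 2), "| IPW:", round(ipw, 2))
```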

Graph structured data are abundant in the real world. Among different graph types, directed acyclic graphs (DAGs) are of particular interest to machine learning researchers, as many machine learning models are realized as computations on DAGs, including neural networks and Bayesian networks. In this paper, we study deep generative models for DAGs, and propose a novel DAG variational autoencoder (D-VAE). To encode DAGs into the latent space, we leverage graph neural networks. We propose an asynchronous message passing scheme that allows encoding the computations on DAGs, rather than using existing simultaneous message passing schemes to encode local graph structures. We demonstrate the effectiveness of our proposed D-VAE through two tasks: neural architecture search and Bayesian network structure learning. Experiments show that our model not only generates novel and valid DAGs, but also produces a smooth latent space that facilitates searching for DAGs with better performance through Bayesian optimization.
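The essence of the asynchronous scheme is that a DAG is encoded by visiting nodes in topological order, so every node's state is computed from the already-finished states of its predecessors rather than from simultaneous neighbor updates. The minimal sketch below uses random, untrained weights and a plain linear-plus-tanh update in place of the GRU-based aggregator used by D-VAE; the toy graph and dimensions are arbitrary.

```python
# Minimal sketch of asynchronous message passing on a DAG.
import numpy as np

rng = np.random.default_rng(8)
d = 16                                   # hidden state dimension
n_types = 4                              # number of node/operation types

# Toy DAG: predecessor lists, keys already in topological order.
preds = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}
node_type = [0, 1, 2, 1, 3]

W_in = rng.normal(0, 0.1, (n_types, d))  # node-type embedding (untrained)
W_msg = rng.normal(0, 0.1, (d, d))       # message/update weights (untrained)

h = {}
for v in preds:                          # topological order => asynchronous updates
    agg = sum((h[u] for u in preds[v]), np.zeros(d))   # aggregate predecessor states
    h[v] = np.tanh(W_in[node_type[v]] + agg @ W_msg)

graph_embedding = h[max(preds)]          # state of the sink node summarizes the DAG
print(graph_embedding[:5])
```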

Semantic segmentation is one of the basic topics in computer vision; it aims to assign a semantic label to every pixel of an image. An unbalanced semantic label distribution can negatively affect segmentation accuracy. In this paper, we investigate a data augmentation approach to balance the semantic label distribution in order to improve segmentation performance. We propose using generative adversarial networks (GANs) to generate realistic images for improving the performance of semantic segmentation networks. Experimental results show that the proposed method not only improves segmentation performance on classes with low accuracy, but also yields a 1.3% to 2.1% increase in average segmentation accuracy. This indicates that the augmentation method can boost accuracy and can be easily applied to other segmentation models.
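As a small, purely illustrative companion to this idea, the sketch below measures the pixel-level class distribution of a labelled dataset and derives inverse-frequency oversampling weights, which is the point at which GAN-generated images for under-represented classes would be injected into training. The label maps and class probabilities are made up, and no GAN is implemented here.

```python
# Measure class imbalance in segmentation labels and plan oversampling.
import numpy as np

def class_frequencies(label_maps, n_classes):
    """Fraction of pixels belonging to each semantic class."""
    counts = np.zeros(n_classes)
    for lab in label_maps:
        counts += np.bincount(lab.ravel(), minlength=n_classes)
    return counts / counts.sum()

def augmentation_plan(freqs, target=None):
    """Relative number of extra (synthetic) samples per class via inverse-frequency weighting."""
    target = target if target is not None else np.full_like(freqs, 1.0 / len(freqs))
    return np.maximum(target / np.maximum(freqs, 1e-8) - 1.0, 0.0)

# Toy usage with random label maps for a 5-class problem.
rng = np.random.default_rng(9)
labels = [rng.choice(5, size=(64, 64), p=[0.6, 0.2, 0.1, 0.07, 0.03]) for _ in range(20)]
freqs = class_frequencies(labels, 5)
print("class frequencies:", freqs.round(3))
print("relative oversampling per class:", augmentation_plan(freqs).round(2))
```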

In this work, we compare three different modeling approaches for the scores of soccer matches with regard to their predictive performance, based on all matches from the four previous FIFA World Cups 2002-2014: Poisson regression models, random forests and ranking methods. While the former two are based on the teams' covariate information, the latter estimates ability parameters that best reflect each team's current strength. Within this comparison, the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate, we can improve the predictive power substantially. Finally, this combination of methods is chosen as the final model, and based on its estimates the FIFA World Cup 2018 is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain over the defending champion Germany. Additionally, we provide survival probabilities for all teams at all tournament stages as well as the most probable tournament outcome.
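The simulation step can be sketched as follows: given each pairing's expected goals (which in the paper come from the random forest that includes the ranking-based ability parameters as a covariate), repeatedly draw Poisson scores for a bracket and tally how often each team wins. The eight-team knockout bracket, ability values, and goal model below are toy placeholders for the fitted model.

```python
# Toy tournament simulation with Poisson score draws.
import numpy as np

rng = np.random.default_rng(10)
teams = ["ESP", "GER", "BRA", "FRA", "ARG", "BEL", "POR", "ENG"]
ability = np.array([0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00])   # hypothetical strengths

def expected_goals(i, j):
    # Placeholder for the fitted model's predicted goals of team i against team j.
    return np.exp(0.1 + ability[i] - ability[j])

def play(i, j):
    """Simulate one match and return the index of the winner."""
    gi = rng.poisson(expected_goals(i, j))
    gj = rng.poisson(expected_goals(j, i))
    if gi != gj:
        return i if gi > gj else j
    return i if rng.random() < 0.5 else j            # coin-flip shootout on a draw

def simulate_knockout(order):
    while len(order) > 1:
        order = [play(order[k], order[k + 1]) for k in range(0, len(order), 2)]
    return order[0]

wins = np.zeros(len(teams))
for _ in range(10_000):
    wins[simulate_knockout(list(range(len(teams))))] += 1

for t, w in sorted(zip(teams, wins / 10_000), key=lambda z: -z[1]):
    print(f"{t}: {w:.3f}")
```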
