
Return-to-baseline is an important method to impute missing values or unobserved potential outcomes when certain hypothetical strategies are used to handle intercurrent events in clinical trials. Current return-to-baseline approaches seen in the literature and in practice inflate the variability of the "complete" dataset after imputation and lead to biased mean estimators when the probability of missingness depends on the observed baseline and/or post-baseline intermediate outcomes. In this article, we first provide a set of criteria a return-to-baseline imputation method should satisfy. Under this framework, we propose a novel return-to-baseline imputation method. Simulations show that the completed data after the new imputation approach have the proper distribution, and that the estimators based on the new imputation method outperform the traditional method in terms of both bias and variance when missingness depends on the observed values. The new method can be implemented easily with the existing multiple imputation procedures in commonly used statistical packages.
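As a rough illustration of the basic idea being discussed (not the improved method proposed in the article), a minimal return-to-baseline multiple imputation might draw each missing post-baseline outcome from a distribution centred at the subject's own baseline. The residual-scale choice in this sketch is exactly the kind of ingredient the article scrutinizes; variable names and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def return_to_baseline_impute(baseline, outcome, missing, n_imputations=20, sigma=None):
    """Draw multiply imputed datasets, centring each missing outcome at baseline."""
    if sigma is None:
        # crude residual scale estimated from the observed changes from baseline
        sigma = np.std(outcome[~missing] - baseline[~missing])
    completed = []
    for _ in range(n_imputations):
        y = outcome.copy()
        y[missing] = baseline[missing] + rng.normal(0.0, sigma, missing.sum())
        completed.append(y)
    return completed

# toy data: 30% of post-baseline outcomes missing due to an intercurrent event
baseline = rng.normal(10.0, 2.0, 200)
full = baseline + rng.normal(1.0, 1.0, 200)          # values that would have been observed
missing = rng.random(200) < 0.3
outcome = np.where(missing, np.nan, full)
imputed_sets = return_to_baseline_impute(baseline, outcome, missing)
print(np.mean([y[missing].mean() for y in imputed_sets]))
```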

Related content

The U.S. Bureau of Labor Statistics allows public access to much of the data acquired through its Occupational Requirements Survey (ORS). This data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains a multitude of missing observations and estimates, which somewhat limits its utility. Here, we propose a method to impute these missing values that leverages many of the features inherent in the survey data, such as known population limits and correlations between occupations and tasks. We fit an iterative regression model, implemented with a recent version of XGBoost, across a set of simulated values drawn from the distribution described by the known values and the standard deviations reported in the survey, which yields a distribution of predicted values for each missing estimate. This allows us to calculate a mean prediction and to bound it with a 95% confidence interval. We discuss the use of our method and how the resulting imputations can be utilized to inform and pursue future areas of study stemming from the data collected in the ORS. Finally, we conclude with an outline of WIGEM, a generalized version of our weighted, iterative imputation algorithm that could be applied to other contexts.
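A hedged sketch of this simulation-plus-iterative-regression style of imputation, using the xgboost Python package, is shown below. The function name, initial fill, and hyperparameters are illustrative assumptions rather than the pipeline actually used for the ORS data.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)

def impute_ors_style(X, sd, n_draws=50, n_rounds=5):
    """X: 2-D array with np.nan for missing estimates; sd: reported standard
    deviations of the observed entries. Returns per-cell mean predictions and a
    95% interval across the simulated draws (meaningful for the missing cells)."""
    missing = np.isnan(X)
    draws = []
    for _ in range(n_draws):
        # simulate the observed estimates from their reported uncertainty
        Xd = X.copy()
        Xd[~missing] = rng.normal(X[~missing], sd[~missing])
        filled = np.where(missing, np.nanmean(Xd, axis=0), Xd)   # initial column-mean fill
        for _ in range(n_rounds):
            for j in range(X.shape[1]):
                rows = missing[:, j]
                if not rows.any():
                    continue
                others = np.delete(filled, j, axis=1)
                model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
                model.fit(others[~rows], filled[~rows, j])       # regress column j on the rest
                filled[rows, j] = model.predict(others[rows])
        draws.append(filled)
    stack = np.array(draws)
    lo, hi = np.percentile(stack, [2.5, 97.5], axis=0)
    return stack.mean(axis=0), lo, hi
```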

The paper provides a novel framework to study the accuracy and stability of numerical integration schemes when employed for the time domain simulation of power systems. A matrix pencil-based approach is adopted to evaluate the error between the dynamic modes of the power system and the modes of the approximated discrete-time system arising from the application of the numerical method. The proposed approach can provide meaningful insights on how different methods compare to each other when applied to a power system, while being general enough to be systematically utilized for, in principle, any numerical method. The framework is illustrated for a handful of well-known explicit and implicit methods, while simulation results are presented based on the WSCC 9-bus system, as well as on a 1,479-bus dynamic model of the All-Island Irish Transmission System.
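To make the underlying idea concrete, the minimal sketch below compares the continuous-time modes of a small assumed state matrix with the discrete modes implied by a few classic integration schemes over one step. The matrix pencil machinery and power-system models of the paper are far more general than this toy comparison; the matrix and step size here are arbitrary.

```python
import numpy as np

# Toy linearized state matrix (assumed for illustration only), lightly damped oscillation.
A = np.array([[ 0.0,  1.0],
              [-4.0, -0.4]])
h = 0.01                                 # integration step (s)

lam = np.linalg.eigvals(A)               # continuous-time dynamic modes
exact = np.exp(lam * h)                  # exact discrete modes over one step

schemes = {
    "forward Euler":  1 + h * lam,
    "backward Euler": 1 / (1 - h * lam),
    "trapezoidal":    (1 + h * lam / 2) / (1 - h * lam / 2),
}

for name, z in schemes.items():
    print(f"{name:15s} mode error {np.abs(z - exact)}")
```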

A number of approaches have dealt with statistical assessment of self-similarity, and many of those are based on multiscale concepts. Most rely on certain distributional assumptions which are usually violated by real data traces, often characterized by large temporal or spatial mean level shifts, missing values or extreme observations. A novel, robust approach based on Theil-type weighted regression is proposed for estimating self-similarity in two-dimensional data (images). The method is compared to two traditional estimation techniques that use wavelet decompositions: ordinary least squares (OLS) and the Abry-Veitch bias-correcting estimator (AV). As an application, the suitability of the self-similarity estimate resulting from the robust approach is illustrated as a predictive feature in the classification of digitized mammogram images as cancerous or non-cancerous. The diagnostic employed here is based on the properties of image backgrounds, which is typically an unused modality in breast cancer screening. Classification results show nearly 68% accuracy, varying slightly with the choice of wavelet basis and the range of multiresolution levels used.
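The sketch below shows the general wavelet-spectrum-slope recipe behind such estimators, using plain Theil-Sen from SciPy as a stand-in for the paper's weighted Theil-type estimator. The slope-to-H conversion assumes an fBm-like scaling and may differ by convention; everything here is an illustration, not the proposed method.

```python
import numpy as np
import pywt
from scipy.stats import theilslopes

def hurst_from_wavelets(image, wavelet="db4", levels=5):
    """Estimate a self-similarity index from the 2-D wavelet spectrum slope."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    log_energy, scale = [], []
    # coeffs[0] is the approximation; details run from coarsest to finest
    for j, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        d = np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()])
        log_energy.append(np.log2(np.mean(d ** 2)))
        scale.append(levels - j + 1)        # larger index = coarser scale
    slope, *_ = theilslopes(log_energy, scale)   # robust (median-of-pairwise) slope
    return (slope - 2) / 2                  # assumed E|d_j|^2 ~ 2^{j(2H+2)} convention

# toy usage on a random field (not a mammogram background)
img = np.random.default_rng(0).normal(size=(256, 256))
print(hurst_from_wavelets(img))
```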

Recent research in feature learning has been extended to sequence data, where each instance consists of a sequence of heterogeneous items with a variable length. However, in many real-world applications, the data exists in the form of attributed sequences, each composed of a set of fixed-size attributes and a variable-length sequence, with dependencies between the two. In the attributed sequence context, feature learning remains challenging due to the dependencies between sequences and their associated attributes. In this dissertation, we focus on analyzing and building deep learning models for four new problems on attributed sequences. Our extensive experiments on real-world datasets demonstrate that the proposed solutions significantly improve the performance of each task over the state-of-the-art methods on attributed sequences.
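A generic attributed-sequence encoder can be sketched as below, fusing a feed-forward encoding of the fixed-size attributes with a recurrent encoding of the variable-length sequence. The architecture, dimensions, and fusion choice are assumptions for illustration, not the dissertation's actual models.

```python
import torch
import torch.nn as nn

class AttributedSequenceEncoder(nn.Module):
    """Minimal sketch of a joint encoder for attributed sequences."""
    def __init__(self, attr_dim, vocab_size, embed_dim=32, hidden=64):
        super().__init__()
        self.attr_net = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, hidden)    # fused feature vector

    def forward(self, attrs, seqs):
        a = self.attr_net(attrs)                     # (batch, hidden) attribute encoding
        _, (h, _) = self.lstm(self.embed(seqs))      # h: (1, batch, hidden) sequence encoding
        return self.head(torch.cat([a, h[-1]], dim=1))

enc = AttributedSequenceEncoder(attr_dim=10, vocab_size=100)
feats = enc(torch.randn(4, 10), torch.randint(1, 100, (4, 20)))
print(feats.shape)   # torch.Size([4, 64])
```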

Statistical methods are developed for analysis of clinical and virus genetics data from phase 3 randomized, placebo-controlled trials of vaccines against novel coronavirus COVID-19. Vaccine efficacy (VE) of a vaccine to prevent COVID-19 caused by one of finitely many genetic strains of SARS-CoV-2 may vary by strain. The problem of assessing differential VE by viral genetics can be formulated under a competing risks model where the endpoint is virologically confirmed COVID-19 and the cause-of-failure is the infecting SARS-CoV-2 genotype. Strain-specific VE is defined as one minus the cause-specific hazard ratio (vaccine/placebo). For the COVID-19 VE trials, the time to COVID-19 is right-censored, and a substantial percentage of failure cases are missing the infecting virus genotype. We develop estimation and hypothesis testing procedures for strain-specific VE when the failure time is subject to right censoring and the cause-of-failure is subject to missingness, focusing on $J \ge 2$ discrete categorical unordered or ordered virus genotypes. The stratified Cox proportional hazards model is used to relate the cause-specific outcomes to explanatory variables. The inverse probability weighted complete-case (IPW) estimator and the augmented inverse probability weighted complete-case (AIPW) estimator are investigated. Hypothesis tests are developed to assess whether the vaccine provides at least a specified level of efficacy against some viral genotypes and whether VE varies across genotypes, adjusting for covariates. The finite-sample properties of the proposed tests are studied through simulations and are shown to have good performance. In preparation for the real data analyses, the developed methods are applied to a pseudo dataset mimicking the Moderna COVE trial.
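A hedged sketch of the IPW complete-case idea for a single genotype, using the lifelines and scikit-learn packages, might look like the following. The column names, the missingness model, and the covariates are assumptions; the paper's estimators (and the augmented AIPW version) involve more structure than this illustration.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression

# df is assumed to contain, per participant: time, covid (confirmed COVID-19 event),
# genotype (NaN if unmeasured), vaccine (1/0), site (stratification factor), x1, x2.
def ipw_strain_specific_ve(df, strain, covs=("x1", "x2")):
    """IPW complete-case sketch for one viral genotype under a stratified Cox model."""
    cases = df["covid"] == 1
    obs = cases & df["genotype"].notna()
    # model P(genotype observed | case, arm, covariates)
    miss_model = LogisticRegression().fit(
        df.loc[cases, ["vaccine", *covs]], obs[cases].astype(int))
    w = np.ones(len(df))
    p = miss_model.predict_proba(df.loc[obs, ["vaccine", *covs]])[:, 1]
    w[obs.values] = 1.0 / p                          # upweight genotype-complete cases
    fit_df = df.assign(
        event=(obs & (df["genotype"] == strain)).astype(int), w=w
    )[["time", "event", "vaccine", *covs, "site", "w"]]
    cph = CoxPHFitter().fit(fit_df, duration_col="time", event_col="event",
                            weights_col="w", strata=["site"], robust=True)
    return 1.0 - np.exp(cph.params_["vaccine"])      # strain-specific VE
```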

The rapid development of high-throughput technologies has enabled the generation of data from biological or disease processes that span multiple layers, like genomic, proteomic or metabolomic data, and further pertain to multiple sources, like disease subtypes or experimental conditions. In this work, we propose a general statistical framework based on Gaussian graphical models for horizontal (i.e. across conditions or subtypes) and vertical (i.e. across different layers containing data on molecular compartments) integration of information in such datasets. We start by decomposing the multi-layer problem into a series of two-layer problems. For each two-layer problem, we model the outcomes at a node in the lower layer as dependent on those of other nodes in that layer, as well as all nodes in the upper layer. We use a combination of neighborhood selection and group-penalized regression to obtain sparse estimates of all model parameters. Following this, we develop a debiasing technique and asymptotic distributions of inter-layer directed edge weights that utilize already computed neighborhood selection coefficients for nodes in the upper layer. Subsequently, we establish global and simultaneous testing procedures for these edge weights. Performance of the proposed methodology is evaluated on synthetic and real data.
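The node-wise regression step for a two-layer problem can be sketched as follows, with a plain Lasso standing in for the group-penalized criterion of the paper and the debiasing/inference steps omitted. Dimensions, penalty level, and the toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_layer_neighborhood_selection(X_upper, X_lower, alpha=0.05):
    """Regress each lower-layer node on the remaining lower-layer nodes
    (within-layer edges) and on all upper-layer nodes (inter-layer edges)."""
    n, q = X_lower.shape
    p = X_upper.shape[1]
    within = np.zeros((q, q))
    between = np.zeros((p, q))
    for j in range(q):
        others = np.delete(np.arange(q), j)
        design = np.hstack([X_lower[:, others], X_upper])
        fit = Lasso(alpha=alpha).fit(design, X_lower[:, j])
        within[others, j] = fit.coef_[: q - 1]
        between[:, j] = fit.coef_[q - 1:]
    return within, between

rng = np.random.default_rng(0)
upper = rng.normal(size=(200, 5))                    # upper-layer data (e.g. genomic)
lower = upper @ rng.normal(size=(5, 8)) * 0.3 + rng.normal(size=(200, 8))
W_within, W_between = two_layer_neighborhood_selection(upper, lower)
```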

An increasing number of large-scale multi-modal research initiatives has been conducted in the typically developing population, as well as in psychiatric cohorts. Missing data is a common problem in such datasets due to the difficulty of assessing multiple measures on a large number of participants. The consequences of missing data accumulate when researchers aim to explore relationships between multiple measures. Here we aim to evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) and deeply characterised (i.e. range of clinical and cognitive instruments administered) sample of N=453 autistic individuals and N=311 control individuals recruited as part of the EU-AIMS Longitudinal European Autism Project (LEAP) consortium. In particular we consider a total of 160 clinical measures divided into 15 overlapping subsets of participants. We use two simple but common univariate strategies, mean and median imputation, as well as a Round Robin regression approach involving four independent multivariate regression models: a linear model (Bayesian Ridge regression) and three non-linear models (Decision Trees, Extra Trees and K-Neighbours regression). We evaluate the models using the traditional mean squared error on artificially removed available data, and in addition consider the KL divergence between the observed and the imputed distributions. We show that all of the multivariate approaches tested provide a substantial improvement compared to typical univariate approaches. Further, our analyses reveal that across all 15 data subsets tested, an Extra Trees regression approach provided the best global results. This allows the selection of a single model to impute missing data for the LEAP project and deliver a fixed set of imputed clinical data to be used by researchers working with the LEAP dataset in the future.
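This round-robin style of multivariate imputation and the two evaluation criteria can be sketched with scikit-learn's IterativeImputer, as below. The toy data, mask fraction, and hyperparameters are assumptions, and IterativeImputer is only a stand-in for the specific pipeline evaluated on the LEAP data.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# toy stand-in for one clinical data subset (not LEAP data)
X_true = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8))
mask = rng.random(X_true.shape) < 0.2                # artificially remove 20% of entries
X_miss = np.where(mask, np.nan, X_true)

imputer = IterativeImputer(estimator=ExtraTreesRegressor(n_estimators=50),
                           max_iter=10, random_state=0)
X_imp = imputer.fit_transform(X_miss)

# mean squared error on the removed entries
mse = np.mean((X_imp[mask] - X_true[mask]) ** 2)

# KL divergence between histograms of observed and imputed values (first column)
bins = np.histogram_bin_edges(X_true[:, 0], bins=20)
p, _ = np.histogram(X_true[~mask[:, 0], 0], bins=bins, density=True)
q, _ = np.histogram(X_imp[mask[:, 0], 0], bins=bins, density=True)
kl = entropy(p + 1e-9, q + 1e-9)
print(f"MSE on removed entries: {mse:.3f}, KL(observed || imputed): {kl:.3f}")
```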

Deep learning models have shown great potential for image-based diagnosis assisting clinical decision making. At the same time, an increasing number of reports raise concerns about the potential risk that machine learning could amplify existing health disparities due to human biases that are embedded in the training data. It is of great importance to carefully investigate the extent to which biases may be reproduced or even amplified if we wish to build fair artificial intelligence systems. Seyyed-Kalantari et al. advance this conversation by analysing the performance of a disease classifier across population subgroups. They raise performance disparities related to underdiagnosis as a point of concern; we identify areas from this analysis which we believe deserve additional attention. Specifically, we wish to highlight some theoretical and practical difficulties associated with assessing model fairness through testing on data drawn from the same biased distribution as the training data, especially when the sources and amounts of bias are unknown.
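For readers unfamiliar with the metric at issue, an underdiagnosis-style disparity is typically quantified as the false-negative rate among truly positive cases, broken out by subgroup, as in the small sketch below; the data and groups are purely illustrative, and the caveats above apply precisely to how such numbers are measured.

```python
import numpy as np
import pandas as pd

def underdiagnosis_rate_by_group(y_true, y_pred, group):
    """False-negative rate among truly positive cases, per subgroup."""
    df = pd.DataFrame({"y": y_true, "yhat": y_pred, "g": group})
    positives = df[df["y"] == 1]
    return positives.groupby("g")["yhat"].agg(lambda s: np.mean(s == 0))

# toy data: a noisy classifier evaluated on two hypothetical subgroups
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
yhat = np.where(rng.random(1000) < 0.85, y, 1 - y)
grp = rng.choice(["A", "B"], 1000)
print(underdiagnosis_rate_by_group(y, yhat, grp))
```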

Model complexity is a fundamental problem in deep learning. In this paper, we provide a systematic overview of the latest studies on model complexity in deep learning. Model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on those two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model complexity, including understanding model generalization capability, model optimization, and model selection and design. We conclude by proposing several interesting future directions.

Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns for data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches to differentially private ADMM exhibit low utility under a high privacy guarantee and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under a high end-to-end differential privacy guarantee.
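The general mechanism of adding per-iteration Gaussian noise to the local primal updates of consensus ADMM can be sketched as follows. The noise schedule, the ridge sub-problem, and the absence of any calibration to a formal (eps, delta) budget are all simplifying assumptions for illustration; they do not reproduce DP-ADMM's approximate augmented Lagrangian or its moments-accountant analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_consensus_admm_ridge(parts, dim, rho=1.0, lam=0.1, T=50, sigma0=0.5):
    """Consensus ADMM for distributed ridge regression with Gaussian noise
    injected into each worker's primal update at every iteration."""
    K = len(parts)
    x = [np.zeros(dim) for _ in range(K)]
    u = [np.zeros(dim) for _ in range(K)]
    z = np.zeros(dim)
    for t in range(1, T + 1):
        sigma_t = sigma0 / np.sqrt(t)                # assumed time-varying noise schedule
        for k, (A, b) in enumerate(parts):
            # exact local ridge sub-problem, then perturb with privacy noise
            H = A.T @ A + rho * np.eye(dim)
            x[k] = np.linalg.solve(H, A.T @ b + rho * (z - u[k]))
            x[k] += rng.normal(0.0, sigma_t, dim)
        # server-side consensus update for the global variable
        z = (np.mean(x, axis=0) + np.mean(u, axis=0)) * rho * K / (rho * K + lam)
        for k in range(K):
            u[k] += x[k] - z                         # dual (scaled) update
    return z

# toy data split across 4 workers
w_true = rng.normal(size=5)
parts = []
for _ in range(4):
    A = rng.normal(size=(60, 5))
    parts.append((A, A @ w_true + 0.1 * rng.normal(size=60)))
print(noisy_consensus_admm_ridge(parts, dim=5))
```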
