
Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of randomization largely justifies the statistical assumptions made. However, RCT sample sizes are often small, leading to low precision; in many cases RCT estimates may be too imprecise to guide policy or inform science. Observational studies, by contrast, have strengths and weaknesses complementary to those of RCTs. Observational studies typically offer much larger sample sizes, but may suffer from confounding. In many contexts, experimental and observational data exist side by side, allowing the possibility of integrating "big observational data" with "small but high-quality experimental data" to get the best of both. Such approaches hold particular promise in the field of education, where RCT sample sizes are often small due to cost constraints, but automatic collection of observational data, such as in computerized educational technology applications, or in state longitudinal data systems (SLDS) with administrative data on hundreds of thousands of students, has made rich, high-dimensional observational data widely available. We outline an approach that allows one to employ machine learning algorithms to learn from the observational data, and use the resulting models to improve precision in randomized experiments. Importantly, there is no requirement that the machine learning models be "correct" in any sense, and the final experimental results are guaranteed to be exactly unbiased. Thus, there is no danger of confounding biases in the observational data leaking into the experiment.
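The abstract does not spell out the estimator, so the following is only a minimal sketch of one simple way the idea can work: subtract predictions from a model fit purely on outside (observational) data, then take the ordinary difference in means of the residuals. The function names, the toy potential outcomes, and the deliberately poor predictions are all assumptions made for illustration. Because the predictions do not depend on who was randomized to treatment, averaging the estimate over every possible assignment recovers the true average effect even when the model is bad.

```python
from itertools import combinations

import numpy as np


def difference_in_means(y, z):
    """Mean outcome of treated units minus mean outcome of control units."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=bool)
    return y[z].mean() - y[~z].mean()


def residualized_estimate(y, z, predictions):
    """Difference in means applied to y - predictions.

    `predictions` must come from a model fit only on data outside the
    experiment, so they cannot depend on the treatment assignment z.
    """
    residuals = np.asarray(y, dtype=float) - np.asarray(predictions, dtype=float)
    return difference_in_means(residuals, z)


def expectation_over_assignments(y0, y1, predictions, n_treated):
    """Average the residualized estimate over every complete randomization."""
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    n = len(y0)
    estimates = []
    for treated in combinations(range(n), n_treated):
        z = np.zeros(n, dtype=bool)
        z[list(treated)] = True
        y_observed = np.where(z, y1, y0)  # each unit reveals one potential outcome
        estimates.append(residualized_estimate(y_observed, z, predictions))
    return float(np.mean(estimates))


# Toy potential outcomes (made up for illustration only).
y0 = [1.0, 2.0, 4.0, 7.0]
y1 = [2.0, 5.0, 4.0, 9.0]
true_ate = float(np.mean(np.array(y1) - np.array(y0)))

# Deliberately poor "model" output: unbiasedness does not rely on accuracy.
bad_predictions = [10.0, -3.0, 0.0, 8.0]

print(true_ate, expectation_over_assignments(y0, y1, bad_predictions, n_treated=2))
```

Accurate predictions would shrink the spread of the estimate across assignments, which is where the precision gain comes from; inaccurate ones cost precision but not unbiasedness.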

Related content

Precision / Accuracy

Machine learning · Measurement techniques · Papers · Potential · Digital twins

March 29, 2023

The transformative potential of machine learning for experiments in fluid mechanics

Ricardo Vinuesa, Steven L. Brunton, Beverley J. McKeon

The field of machine learning has rapidly advanced the state of the art in many fields of science and engineering, including experimental fluid dynamics, which is one of the original big-data disciplines. This perspective will highlight several aspects of experimental fluid mechanics that stand to benefit from advances in machine learning, including: 1) augmenting the fidelity and quality of measurement techniques, 2) improving experimental design and surrogate digital-twin models, and 3) enabling real-time estimation and control. In each case, we discuss recent success stories and ongoing challenges, along with caveats and limitations, and outline the potential for new avenues of ML-augmented and ML-enabled experimental fluid mechanics.

Cyber resilience · Cyber attacks · Attacks · Systems · Quantitative

March 28, 2023

Quantitative Measurement of Cyber Resilience: Modeling and Experimentation

Michael J. Weisman, Alexander Kott, Jason E. Ellis, Brian J. Murphy, Travis W. Parker, Sidney Smith, Joachim Vandekerckhove

from arxiv, arXiv admin note: text overlap with arXiv:2302.04413, arXiv:2302.07941

Cyber resilience is the ability of a system to resist and recover from a cyber attack, thereby restoring the system's functionality. Effective design and development of a cyber resilient system requires experimental methods and tools for quantitative measurement of cyber resilience. This paper describes an experimental method and test bed for obtaining resilience-relevant data as a system (in our case, a truck) traverses its route, in repeatable, systematic experiments. We model a truck that is equipped with an autonomous cyber-defense system and that also includes inherent physical resilience features. When attacked by malware, this ensemble of cyber-physical features (i.e., "bonware") strives to resist and recover from the performance degradation caused by the malware's attack. We propose parsimonious mathematical models to aid in quantifying systems' resilience to cyber attacks. Using the models, we identify quantitative characteristics obtainable from experimental data, and show that these characteristics can serve as useful quantitative measures of cyber resilience.

Heterogeneity · Heterogeneous · Trials · Nonparametric · Open domain

March 28, 2023

Methods for Integrating Trials and Non-Experimental Data to Examine Treatment Effect Heterogeneity

Carly Lupton Brantner, Ting-Hsuan Chang, Trang Quynh Nguyen, Hwanhee Hong, Leon Di Stefano, Elizabeth A. Stuart

Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods broken down by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those that are required within a single study and those that are necessary for data combination. After describing existing approaches, we compare and contrast them and reveal open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that there is substantial work to be done to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.

Parameter inference · Parameterization · Algorithms · Neurons · Approximation

March 28, 2023

Simulation-based Inference for Model Parameterization on Analog Neuromorphic Hardware

Jakob Kaiser, Raphael Stock, Eric Müller, Johannes Schemmel, Sebastian Schmitt

The BrainScaleS-2 (BSS-2) system implements physical models of neurons as well as synapses and aims for an energy-efficient and fast emulation of biological neurons. When replicating neuroscientific experiment results, a major challenge is finding suitable model parameters. This study investigates the suitability of the sequential neural posterior estimation (SNPE) algorithm for parameterizing a multi-compartmental neuron model emulated on the BSS-2 analog neuromorphic hardware system. In contrast to other optimization methods such as genetic algorithms or stochastic searches, the SNPE algorithm belongs to the class of approximate Bayesian computation (ABC) methods and estimates the posterior distribution of the model parameters; access to the posterior allows classifying the confidence in parameter estimations and unveiling correlations between model parameters. In previous applications, the SNPE algorithm showed a higher computational efficiency than traditional ABC methods. For our multi-compartmental model, we show that the approximated posterior is in agreement with experimental observations and that the identified correlation between parameters is in agreement with theoretical expectations. Furthermore, we show that the algorithm can deal with high-dimensional observations and parameter spaces. These results suggest that the SNPE algorithm is a promising approach for automating the parameterization of complex models, especially when dealing with characteristic properties of analog neuromorphic substrates, such as trial-to-trial variations or limited parameter ranges.

Wearable sensors · Monte Carlo methods · Monte Carlo · Sensing · Sensors

March 27, 2023

Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors

Eric J. Daza, Logan Schneider

from arxiv, 27 pages, 2 figures, 5 tables; appendix included

Temporally dense single-person "small data" have become widely available thanks to mobile apps and wearable sensors. Many caregivers and self-trackers want to use these data to help a specific person change their behavior to achieve desired health outcomes. Ideally, this involves discerning possible causes from correlations using that person's own observational time series data. In this paper, we estimate within-individual average treatment effects of physical activity on sleep duration, and vice versa. We introduce the model twin randomization (MoTR; "motor") method for analyzing an individual's intensive longitudinal data. Formally, MoTR is an application of the g-formula (i.e., standardization, back-door adjustment) under serial interference. It estimates stable recurring effects, as is done in n-of-1 trials and single case experimental designs. We compare our approach to standard methods (with possible confounding) to show how to use causal inference to make better personalized recommendations for health behavior change, and analyze 222 days of Fitbit sleep and steps data for one of the authors.

Contamination · Consistency · Knowledge integration · Expert knowledge · Integration

March 27, 2023

Expert Kaplan–Meier estimation

Martin Bladt, Christian Furrer

The setting of a right-censored random sample subject to contamination is considered. In various fields, expert information is often available and used to overcome the contamination. This paper integrates expert knowledge into the product-limit estimator in two different ways with distinct interpretations. Strong uniform consistency is proved for both cases under certain assumptions on the kind of contamination and the quality of expert information, which sheds light on the techniques and decisions that practitioners may take. The nuances of the techniques are discussed, also with a view towards semi-parametric estimation, and they are illustrated using simulated and real-world insurance data.
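For readers unfamiliar with the product-limit estimator this abstract builds on, here is a minimal sketch of the plain (non-expert) Kaplan–Meier estimator for right-censored data. It does not include the paper's expert-knowledge adjustments, which the abstract does not specify; the function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return (event_time, survival_probability) pairs for the product-limit estimator.

    times  : observed time for each subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was right-censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):        # distinct observed event times, ascending
        at_risk = np.sum(times >= t)          # subjects still under observation at t
        died = np.sum((times == t) & events)  # events occurring exactly at t
        survival *= 1.0 - died / at_risk      # multiply in the conditional survival factor
        curve.append((float(t), float(survival)))
    return curve


# Toy data: five subjects, two of them censored (at times 2 and 4).
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"S({t:g}) = {s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they do count in the risk set up to their censoring time, which is what distinguishes this from a naive empirical survival function.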

Wasserstein distance · Monte Carlo · Bayesian · Bayesian methods · Bayesian inference

March 27, 2023

Parameter estimation for many-particle models from aggregate observations: A Wasserstein distance based sequential Monte Carlo sampler

Chen Cheng, Linjie Wen, Jinglai Li

In this work we study systems consisting of a group of moving particles. In such systems, often some important parameters are unknown and have to be estimated from observed data. Such parameter estimation problems can often be solved via a Bayesian inference framework. However, in many practical problems, only data at the aggregate level is available and as a result the likelihood function is not available, which poses a challenge for Bayesian methods. In particular, we consider the situation where the distributions of the particles are observed. We propose a Wasserstein distance based sequential Monte Carlo sampler to solve the problem: the Wasserstein distance is used to measure the similarity between the observed and the simulated particle distributions, and the sequential Monte Carlo sampler is used to deal with the sequentially available observations. Two real-world examples are provided to demonstrate the performance of the proposed method.
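To make the role of the Wasserstein distance concrete, here is a minimal sketch for one-dimensional particle positions with equal sample sizes, where the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples. The sequential Monte Carlo sampler itself is not reproduced; `rank_parameters` and the shift-only toy simulator are hypothetical helpers that merely show how such a distance can score candidate parameters when no likelihood is available.

```python
import numpy as np


def wasserstein_1d(observed, simulated):
    """1-Wasserstein distance between two equal-size 1D empirical samples."""
    observed = np.sort(np.asarray(observed, dtype=float))
    simulated = np.sort(np.asarray(simulated, dtype=float))
    if observed.shape != simulated.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples, optimal transport pairs the sorted points.
    return float(np.mean(np.abs(observed - simulated)))


def rank_parameters(observed, simulate, candidates):
    """Sort candidate parameters by distance between observed and simulated particles."""
    scored = [(wasserstein_1d(observed, simulate(theta)), theta) for theta in candidates]
    return sorted(scored, key=lambda pair: pair[0])


# Toy simulator (an assumption): particles start at 0, 1, 2 and drift by theta.
def simulate(theta):
    return np.array([0.0, 1.0, 2.0]) + theta


observed = np.array([5.0, 3.0, 4.0])  # unordered snapshot; true drift is 3
print(rank_parameters(observed, simulate, candidates=[0.0, 3.0, 5.0]))
```

Only the distribution of positions matters here, not which particle is which, which matches the aggregate-observation setting the abstract describes.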

Feature selection · High-dimensional · Adaptive · Sparse · High selection

March 26, 2023

FAStEN: an efficient adaptive method for feature selection and estimation in high-dimensional functional regressions

Tobia Boschi, Lorenzo Testa, Francesca Chiaromonte, Matthew Reimherr

Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to perform feature selection in a sparse high-dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate a massive gain in terms of CPU time and selection performance without sacrificing the quality of the coefficients' estimation. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study.

Multivariable · Unbiased · Magnetorheological materials · Bias · Unbiased estimation

March 23, 2023

Unbiased estimation and asymptotically valid inference in multivariable Mendelian randomization with many weak instrumental variables

Yihe Yang, Noah Lorincz-Comi, Xiaofeng Zhu

from arxiv, 25 pages, 3 figures

Mendelian randomization (MR) is an instrumental variable (IV) approach to infer causal relationships between exposures and outcomes with genome-wide association studies (GWAS) summary data. However, the multivariable inverse-variance weighting (IVW) approach, which serves as the foundation for most MR approaches, cannot yield unbiased causal effect estimates in the presence of many weak IVs. In this paper, we prove that the bias of the multivariable IVW estimate is a product of weak instrument and estimation error biases, where the latter is linearly composed of measurement error and confounder biases with a trade-off due to sample overlap among multiple GWAS cohorts. To address this problem, we propose a novel multivariable MR approach, MR using Bias-corrected Estimating Equation (MRBEE), which can infer unbiased causal relationships with many weak IVs. Asymptotic behaviors of multivariable IVW and MRBEE are investigated under moderate conditions, showing that MRBEE outperforms multivariable IVW in terms of unbiasedness and asymptotic validity. We apply MRBEE to examine myopia and confirm that schooling and driving time are causal factors for myopia. A novel locus of myopia is identified in the subsequent whole-genome pleiotropy test.
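As background for the baseline this abstract criticizes, the standard multivariable IVW estimate can be sketched as a weighted least-squares regression (without intercept) of variant-outcome effects on variant-exposure effects, with inverse-variance weights. This is the uncorrected estimator, not MRBEE, whose bias correction the abstract does not detail; the function name and the noise-free toy summary statistics are assumptions made for illustration.

```python
import numpy as np


def multivariable_ivw(beta_exposure, beta_outcome, se_outcome):
    """Weighted least squares of outcome effects on exposure effects (no intercept).

    beta_exposure : (n_variants, n_exposures) variant-exposure effect estimates
    beta_outcome  : (n_variants,) variant-outcome effect estimates
    se_outcome    : (n_variants,) standard errors of beta_outcome
    """
    B = np.asarray(beta_exposure, dtype=float)
    b = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2  # inverse-variance weights
    # Solve (B' W B) theta = B' W b without forming the diagonal matrix W.
    return np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * b))


# Noise-free toy summary statistics (made up): four variants, two exposures.
B = np.array([[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [0.3, -0.1]])
theta_true = np.array([0.5, -1.0])
b = B @ theta_true
print(multivariable_ivw(B, b, se_outcome=[0.01, 0.02, 0.01, 0.03]))
```

With exact inputs the regression recovers the causal effects; the abstract's point is that estimation error in the exposure effects, combined with many weak instruments, biases this estimate in real GWAS data.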

Estimation/Estimator · Graphs · Graphics processing units · Nodes · Neural Networks

May 21, 2019

Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, Christos Faloutsos

from arxiv, KDD 2019 Research Track

How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack the flexibility needed to model complex relationships between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with distinctive challenges involved with predicting node importance in KGs. Our method performs an aggregation of importance scores instead of aggregating node embeddings via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
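The NDCG@100 figure quoted above is a ranking metric; the sketch below shows one common way to compute NDCG@k for predicted node importance. It assumes linear gain (the true importance itself) and a log2(rank + 1) discount, which may differ from the exact variant used in the paper; the function names and toy scores are illustrative.

```python
import numpy as np


def dcg_at_k(gains_in_ranked_order, k):
    """Discounted cumulative gain over the top-k entries of a ranked gain list."""
    gains = np.asarray(gains_in_ranked_order, dtype=float)[:k]
    discounts = np.log2(np.arange(2, gains.size + 2))  # log2(rank + 1), rank = 1, 2, ...
    return float(np.sum(gains / discounts))


def ndcg_at_k(true_importance, predicted_scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    true_importance = np.asarray(true_importance, dtype=float)
    predicted_scores = np.asarray(predicted_scores, dtype=float)
    order = np.argsort(-predicted_scores, kind="stable")  # highest predicted score first
    ideal = np.sort(true_importance)[::-1]                # best achievable ordering
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0
    return dcg_at_k(true_importance[order], k) / ideal_dcg


# Toy example: six nodes with known importance and a model's predicted scores.
true_importance = [3, 2, 3, 0, 1, 2]
predicted_scores = [6, 5, 4, 3, 2, 1]
print(round(ndcg_at_k(true_importance, predicted_scores, k=6), 4))
```

A value of 1.0 means the predicted ordering of the top-k nodes matches the ideal ordering by true importance; lower values penalize important nodes that are ranked too low.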