For randomized trials that use text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by trained human raters. This process, the current standard, is both time-consuming and limiting: even the largest human coding efforts are typically constrained to measure only a small set of dimensions across a subsample of available texts. In this work, we present an inferential framework that can be used to increase the power of an impact assessment, given a fixed human-coding budget, by taking advantage of any "untapped" observations (those documents not manually scored due to time or resource constraints) as a supplementary resource. Our approach, a methodological combination of causal inference, survey sampling methods, and machine learning, has four steps: (1) select and code a sample of documents; (2) build a machine learning model to predict the human-coded outcomes from a set of automatically extracted text features; (3) generate machine-predicted scores for all documents and use these scores to estimate treatment impacts; and (4) adjust the final impact estimates using the residual differences between human-coded and machine-predicted outcomes. As an extension to this approach, we also develop a strategy for identifying an optimal subset of documents to code in Step 1 in order to further enhance precision. Through an extensive simulation study based on data from a recent field trial in education, we show that our proposed approach can be used to reduce the scope of a human-coding effort while maintaining nominal power to detect a significant treatment impact.
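
Steps 3 and 4 above can be illustrated with a minimal sketch. It assumes the coded documents are an equal-probability random sample within each treatment arm and that the impact estimate is a simple difference in means; the function name `adjusted_impact` and its arguments are hypothetical, and the estimator in the paper itself may differ in its details.

```python
from statistics import mean


def adjusted_impact(treated, predicted, human_coded):
    """Model-assisted difference in means (illustrative sketch only).

    treated:     list of 0/1 treatment indicators, one per document
    predicted:   machine-predicted score for every document
    human_coded: dict mapping document index -> human score (coded subsample)
    """
    def arm_mean(arm):
        idx = [i for i, z in enumerate(treated) if z == arm]
        # Step 3: mean machine prediction over all documents in this arm.
        pred_mean = mean(predicted[i] for i in idx)
        # Step 4: mean residual (human minus machine) over the coded
        # documents in this arm, used to correct the prediction mean.
        coded = [i for i in idx if i in human_coded]
        resid_mean = mean(human_coded[i] - predicted[i] for i in coded)
        return pred_mean + resid_mean

    return arm_mean(1) - arm_mean(0)


# Toy data: six documents, four of them human-coded.
treated = [1, 1, 1, 0, 0, 0]
predicted = [3.0, 4.0, 5.0, 2.0, 3.0, 4.0]
human_coded = {0: 3.5, 1: 4.5, 3: 2.0, 4: 3.0}
print(adjusted_impact(treated, predicted, human_coded))  # 1.5
```

In this toy example the machine predictions alone give a difference of 1.0; the residual adjustment adds 0.5 because the model under-predicts the coded treated documents.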

Related content

Extensibility

Features introduced in iOS 8 for interaction between apps and between apps and the system.

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Optimizer · MoDELS · Operation · Identifiable · Design

November 8, 2023

Optimization approaches for the design and operation of open-loop shallow geothermal systems

S. Halilovic, F. Böttcher, K. Zosseder, T. Hamacher

from arxiv, 16 pages, 3 figures; submitted to Advances in Geosciences

The optimization of open-loop shallow geothermal systems, which includes both design and operational aspects, is an important research area aimed at improving their efficiency and sustainability and the effective management of groundwater as a shallow geothermal resource. This paper investigates various approaches to address optimization problems arising from these research and implementation questions about groundwater heat pump (GWHP) systems. The identified optimization approaches are thoroughly analyzed based on criteria such as computational cost and applicability. Moreover, a novel classification scheme is introduced that categorizes the approaches according to the type of groundwater simulation model and the optimization algorithm used. Simulation models are divided into two types, numerical and simplified (analytical or data-driven) models, while optimization algorithms are divided into gradient-based and derivative-free algorithms. Finally, a comprehensive review of existing approaches in the literature is provided, highlighting their strengths and limitations and offering recommendations for both the use of existing approaches and the development of new, improved ones in this field.

Functional · Performer · Weight · Statistical methods

November 8, 2023

Testing semiparametric model-equivalence hypotheses based on the characteristic function

Feifei Chen, Simos G. Meintanis, Lixing Zhu

We propose three test criteria, each of which is appropriate for testing, respectively, the equivalence hypotheses of symmetry, of homogeneity, and of independence, with multivariate data. All quantities have the common feature of involving weighted-type distances between characteristic functions and are convenient from the computational point of view if the weight function is properly chosen. The asymptotic behavior of the tests under the null hypothesis is investigated, and numerical studies are conducted in order to examine the performance of the criteria in finite samples.

GPT-3.5 · Performer · Language modeling · GPT-4 · Confidence

November 7, 2023

Evaluating multiple large language models in pediatric ophthalmology

Jason Holmes, Rui Peng, Yiwei Li, Jinyu Hu, Zhengliang Liu, Zihao Wu, Huan Zhao, Xi Jiang, Wei Liu, Hong Wei, Jie Zou, Tianming Liu, Yi Shao

from arxiv, 6 figures, 1 table

IMPORTANCE The response effectiveness of different large language models (LLMs) and various individuals, including medical students, graduate students, and practicing physicians, in pediatric ophthalmology consultations has not yet been clearly established. OBJECTIVE Design a 100-question exam based on pediatric ophthalmology to evaluate the performance of LLMs in highly specialized scenarios and compare them with the performance of medical students and physicians at different levels. DESIGN, SETTING, AND PARTICIPANTS This survey study assessed three LLMs, namely ChatGPT (GPT-3.5), GPT-4, and PaLM2, alongside three human cohorts (medical students, postgraduate students, and attending physicians) in their ability to answer questions related to pediatric ophthalmology. It was conducted by administering questionnaires in the form of test papers through the LLM network interface, with the valuable participation of volunteers. MAIN OUTCOMES AND MEASURES Mean scores of LLMs and humans on 100 multiple-choice questions, as well as the answer stability, correlation, and response confidence of each LLM. RESULTS GPT-4 performed comparably to attending physicians, while ChatGPT (GPT-3.5) and PaLM2 outperformed medical students but slightly trailed behind postgraduate students. Furthermore, GPT-4 exhibited greater stability and confidence when responding to inquiries compared to ChatGPT (GPT-3.5) and PaLM2. CONCLUSIONS AND RELEVANCE Our results underscore the potential for LLMs to provide medical assistance in pediatric ophthalmology and suggest significant capacity to guide the education of medical students.

Analysis · INFORMS · Kronecker product · Tractable · Statistic

November 7, 2023

A novel analysis of utility in privacy pipelines, using Kronecker products and quantitative information flow

Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes

We combine Kronecker products and quantitative information flow to give a novel formal analysis for the fine-grained verification of utility in complex privacy pipelines. The combination explains a surprising anomaly in the behaviour of utility of privacy-preserving pipelines: sometimes a reduction in privacy also results in a decrease in utility. We use the standard measure of utility for Bayesian analysis, introduced by Ghosh et al., to produce tractable and rigorous proofs of the fine-grained statistical behaviour leading to the anomaly. More generally, we offer the prospect of formal-analysis tools for utility that complement extant formal analyses of privacy. We demonstrate our results on a number of common privacy-preserving designs.
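
As a small illustration of the kind of object involved, the sketch below forms the Kronecker product of two row-stochastic matrices. Reading such matrices as privacy mechanisms (rows indexed by inputs, columns by outputs) and their Kronecker product as two mechanisms applied independently is an assumption made here for illustration, not something the abstract states; the helper `kron` and the example matrix are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    # Row (i, k) and column (j, l) of the result hold a[i][j] * b[k][l].
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


# A binary randomized-response style mechanism: report the true bit
# with probability 0.75 and the flipped bit with probability 0.25.
flip = [[0.75, 0.25],
        [0.25, 0.75]]

joint = kron(flip, flip)
for row in joint:
    print(row, sum(row))  # each row is again a probability distribution
```

The product of two 2x2 matrices is 4x4, and each of its rows still sums to 1, so the result can again be read as a mechanism over pairs of inputs and pairs of outputs.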

MoDELS · Comprehensibility · Learning · Transformer model · Transformation

November 7, 2023

LISBET: a self-supervised Transformer model for the automatic segmentation of social behavior motifs

Giuseppe Chindemi, Benoit Girard, Camilla Bellone

Social behavior, defined as the process by which individuals act and react in response to others, is crucial for the function of societies and holds profound implications for mental health. To fully grasp the intricacies of social behavior and identify potential therapeutic targets for addressing social deficits, it is essential to understand its core principles. Although machine learning algorithms have made it easier to study specific aspects of complex behavior, current methodologies tend to focus primarily on single-animal behavior. In this study, we introduce LISBET (seLf-supervIsed Social BEhavioral Transformer), a model designed to detect and segment social interactions. Our model eliminates the need for feature selection and extensive human annotation by using self-supervised learning to detect and quantify social behaviors from dynamic body parts tracking data. LISBET can be used in hypothesis-driven mode to automate behavior classification using supervised finetuning, and in discovery-driven mode to segment social behavior motifs using unsupervised learning. We found that motifs recognized using the discovery-driven approach not only closely match the human annotations but also correlate with the electrophysiological activity of dopaminergic neurons in the Ventral Tegmental Area (VTA). We hope LISBET will help the community improve our understanding of social behaviors and their neural underpinnings.

Statistic · Functional · Vectorization · Sample · Monte Carlo

November 7, 2023

Multivariate quantile-based permutation tests with application to functional data

Zdeněk Hlávka, Daniel Hlubinka, Šárka Hudecová

Permutation tests enable testing statistical hypotheses in situations when the distribution of the test statistic is complicated or not available. In some situations, the test statistic under investigation is multivariate, with the multiple testing problem being an important example. The corresponding multivariate permutation tests are then typically based on a suitable one-dimensional transformation of the vector of partial permutation p-values via so-called combining functions. This paper proposes a new approach that utilizes the optimal measure transportation concept. The final single p-value is computed from the empirical center-outward distribution function of the permuted multivariate test statistics. This method avoids computation of the partial p-values and is easy to implement. In addition, it allows one to compute and interpret the contributions of the components of the multivariate test statistic to the non-conformity score and to the rejection of the null hypothesis. Apart from this method, measure transportation is also applied to the vector of partial p-values as an alternative to the classical combining functions. Both techniques are compared with the standard approaches using various practical examples in a Monte Carlo study. An application to a functional data set is provided as well.
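
For orientation, the classical baseline mentioned above (partial permutation p-values merged by a combining function) can be sketched as follows. This is not the measure-transportation method proposed in the paper. The sketch assumes a two-sample problem, absolute differences in component means as partial statistics, and Fisher's combining function; the function name `npc_permutation_test` and its parameters are hypothetical.

```python
import math
import random


def npc_permutation_test(x, y, n_perm=999, seed=0):
    """Two-sample multivariate permutation test, Fisher combining function.

    x, y: lists of equal-length tuples (one tuple per observation).
    Returns a single combined permutation p-value.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x, dim = len(x), len(pooled[0])

    def partial_stats(sample):
        # Absolute difference in component means between the two groups.
        a, b = sample[:n_x], sample[n_x:]
        return [abs(sum(r[j] for r in a) / len(a)
                    - sum(r[j] for r in b) / len(b))
                for j in range(dim)]

    # Row 0 holds the observed statistics; the rest come from permutations.
    stats = [partial_stats(pooled)]
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)
        stats.append(partial_stats(perm))
    total = len(stats)

    def partial_p(row, j):
        # Share of rows whose j-th statistic is at least as large; the row
        # counts itself, so the value is never zero.
        return sum(s[j] >= row[j] for s in stats) / total

    # Fisher's combining function applied to every row's partial p-values.
    combined = [-2.0 * sum(math.log(partial_p(row, j)) for j in range(dim))
                for row in stats]
    # Final p-value: how extreme the observed combined value is.
    return sum(c >= combined[0] for c in combined) / total


x = [(0.1 * i, 0.1 * i) for i in range(10)]
y = [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)]
print(npc_permutation_test(x, y, n_perm=199))
```

The same permutations are reused for every component and for the combined statistic, which is what preserves the dependence between components. The quadratic cost of computing all partial p-values is one of the overheads that the approach in the abstract avoids.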

Conformer · Analysis · Coverage · Less · Data points

November 6, 2023

Conformalized survival analysis with adaptive cutoffs

Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber

from arxiv, Accepted by Biometrika; 22 pages

This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and give more accurate information. We show that in the Type I right-censoring setting, if either the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. Finally, our method is applied to a real dataset to generate LPBs for users' active times on a mobile app.

Networking · Commentator · Learning · Computational cost · GNN

November 5, 2023

A graph-based probabilistic geometric deep learning framework with online enforcement of physical constraints to predict the criticality of defects in porous materials

Vasilis Krokos, Stéphane P. A. Bordas, Pierre Kerfriden

from arxiv, 68 pages; 52 figures

Stress prediction in porous materials and structures is challenging due to the high computational cost associated with direct numerical simulations. Convolutional Neural Network (CNN) based architectures have recently been proposed as surrogates to approximate and extrapolate the solution of such multiscale simulations. These methodologies are usually limited to 2D problems due to the high computational cost of 3D voxel-based CNNs. We propose a novel geometric learning approach based on a Graph Neural Network (GNN) that efficiently deals with three-dimensional problems by performing convolutions over 2D surfaces only. Following our previous developments using pixel-based CNNs, we train the GNN to automatically add local fine-scale stress corrections to an inexpensively computed coarse stress prediction in the porous structure of interest. Our method is Bayesian and generates densities of stress fields, from which credible intervals may be extracted. As a second scientific contribution, we propose to improve the extrapolation ability of our network by deploying a strategy of online physics-based corrections. Specifically, we condition the posterior predictions of our probabilistic model to satisfy partial equilibrium at the microscale, at the inference stage. This is done using an Ensemble Kalman algorithm, to ensure tractability of the Bayesian conditioning operation. We show that this innovative methodology allows us to alleviate the effect of undesirable biases observed in the outputs of the uncorrected GNN, and improves the accuracy of the predictions in general.

binary · Sparse · UniFormer · Controller · Frequentist

November 3, 2023

Empirical Bayes large-scale multiple testing for high-dimensional sparse binary sequences

Bo Y. -C. Ning

from arxiv, 87 pages, 7 figures

This paper investigates the multiple testing problem for high-dimensional sparse binary sequences, motivated by the crowdsourcing problem in machine learning. We study the empirical Bayes approach for multiple testing on the high-dimensional Bernoulli model with a conjugate spike and uniform slab prior. We first show that the hard thresholding rule deduced from the posterior distribution is suboptimal. Consequently, the $\ell$-value procedure constructed using this posterior tends to be overly conservative in estimating the false discovery rate (FDR). We then propose two new procedures based on $\mathrm{adj}\ell$-values and $q$-values to correct this issue. Sharp frequentist theoretical results are obtained, demonstrating that both procedures can effectively control the FDR under sparsity. Numerical experiments are conducted to validate our theory in finite samples. To the best of our knowledge, this work provides the first uniform FDR control result in multiple testing for high-dimensional sparse binary data.

Performer · MoDELS · Linear · Extensibility · INFORMS

November 2, 2023

A reluctant additive model framework for interpretable nonlinear individualized treatment rules

Jacob M. Maronge, Jared D. Huling, Guanhua Chen

Individualized treatment rules (ITRs) for treatment recommendation are an important topic for precision medicine, as not all beneficial treatments work well for all individuals. Interpretability is a desirable property of ITRs, as it helps practitioners make sense of treatment decisions, yet there is a need for ITRs to be flexible to effectively model complex biomedical data for treatment decision making. Many ITR approaches either focus on linear ITRs, which may perform poorly when true optimal ITRs are nonlinear, or black-box nonlinear ITRs, which may be hard to interpret and can be overly complex. This dilemma indicates a tension between interpretability and accuracy of treatment decisions. Here we propose an additive model-based nonlinear ITR learning method that balances interpretability and flexibility of the ITR. Our approach aims to strike this balance by allowing both linear and nonlinear terms of the covariates in the final ITR. Our approach is parsimonious in that the nonlinear term is included in the final ITR only when it substantially improves the ITR performance. To prevent overfitting, we combine cross-fitting and a specialized information criterion for model selection. Through extensive simulations, we show that our methods are data-adaptive to the degree of nonlinearity and can favorably balance ITR interpretability and flexibility. We further demonstrate the robust performance of our methods with an application to a cancer drug sensitivity study.