As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely cited barrier to the adoption of LLMs is their sensitivity to prompt wording, but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We therefore argue that if LLMs are going to be used to approximate human opinions, it is necessary to investigate the extent to which LLMs also reflect human response biases, if at all. In this work, we use survey design as a case study, where human response biases caused by changes in the wording of "prompts" have been extensively studied. Drawing from prior work in social psychology, we design a dataset and propose a framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, and these inconsistencies tend to be more prominent in models that have been instruction fine-tuned. Furthermore, even when a model shifts significantly in the same direction as humans, we find that perturbations not meant to elicit significant changes in humans can produce a similar shift. These results highlight the potential pitfalls of using LLMs to substitute for humans in parts of the annotation pipeline, and further underscore the importance of finer-grained characterizations of model behavior. Our code, dataset, and collected samples are available at https://github.com/lindiatjuatja/BiasMonkey.
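A minimal sketch of the kind of comparison such an evaluation framework can perform (the responses, option labels, and assumed human bias direction below are entirely hypothetical, not the paper's protocol):

    # Hypothetical sketch: compare how often a "target" option is chosen under an
    # original survey question versus a perturbed wording, and check whether the
    # shift matches the direction documented for humans. All counts are made up.
    from collections import Counter

    def shift_in_target_rate(orig_responses, perturbed_responses, target_option):
        """Change in the fraction of responses selecting target_option."""
        p_orig = Counter(orig_responses)[target_option] / len(orig_responses)
        p_pert = Counter(perturbed_responses)[target_option] / len(perturbed_responses)
        return p_pert - p_orig

    # Toy LLM responses to the same question asked in two wordings.
    orig = ["agree"] * 62 + ["disagree"] * 38
    pert = ["agree"] * 48 + ["disagree"] * 52

    delta = shift_in_target_rate(orig, pert, "agree")
    human_bias_direction = -1  # assume humans agree less under this perturbation
    matches_human = (delta * human_bias_direction) > 0
    print(f"shift = {delta:+.2f}, matches human direction: {matches_human}")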
The aim of this work is to develop and apply a methodological approach for predicting gas traps from 3D seismic data and gas well testing. The paper formalizes the construction of a training dataset by selecting volumes of the seismic wavefield with established gas saturation and filtration (flow) properties. This training dataset is then used in a processing stack that sequentially applies data processing methods and ensemble machine learning algorithms, yielding a cube of calibrated probabilities that each point of the study volume belongs to a gas reservoir. The effectiveness of the approach is demonstrated on a delayed test sample of three blind wells, where the final F1-score for gas reservoir prediction was 0.893846.
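A minimal sketch of the calibrated-ensemble step on synthetic stand-ins for seismic attributes (the feature construction, the choice of gradient boosting, and the isotonic calibration below are illustrative assumptions, not the paper's exact processing stack):

    # Illustrative only: gradient-boosted ensemble with probability calibration,
    # evaluated with F1 on a held-out "blind well" split. Data are synthetic.
    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 6))            # stand-in seismic attributes per voxel
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0.8).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = CalibratedClassifierCV(GradientBoostingClassifier(), method="isotonic", cv=5)
    clf.fit(X_tr, y_tr)

    proba = clf.predict_proba(X_te)[:, 1]     # calibrated gas-reservoir probabilities
    print("F1 on held-out split:", round(f1_score(y_te, proba > 0.5), 3))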
Network meta-analysis combines aggregate data (AgD) from multiple randomised controlled trials, assuming that any effect modifiers are balanced across populations. Individual patient data (IPD) meta-regression is the "gold standard" method to relax this assumption; however, IPD are frequently available in only a subset of studies. Multilevel network meta-regression (ML-NMR) extends IPD meta-regression to incorporate AgD studies whilst avoiding aggregation bias, but currently requires the aggregate-level likelihood to have a known closed form; notably, this prevents application to time-to-event outcomes. We extend ML-NMR to individual-level likelihoods of any form by integrating the individual-level likelihood function over the AgD covariate distributions to obtain the respective marginal likelihood contributions. We illustrate with two examples of time-to-event outcomes: a simulated comparison showing that ML-NMR performs well with little loss of precision relative to a full IPD analysis, and an analysis of synthetic data on newly diagnosed multiple myeloma demonstrating flexible modelling of baseline hazards using cubic M-splines. ML-NMR is a general method for synthesising individual and aggregate level data in networks of all sizes, and the extension to general likelihoods, including for survival outcomes, greatly increases the applicability of the method. R and Stan code is provided, and the methods are implemented in the multinma R package.
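Schematically, the core construction can be sketched as follows (the notation is chosen here purely for illustration; see the paper and the multinma documentation for the exact formulation). For an AgD study j with covariate distribution f_j(x), the marginal likelihood contribution of an aggregate outcome y_j is obtained by integrating the individual-level likelihood over the covariates, typically approximated numerically over integration points drawn from f_j:

    \[
      \tilde{L}_j(\theta)
      \;=\; \int L\bigl(y_j \mid x, \theta\bigr)\, f_j(x)\, \mathrm{d}x
      \;\approx\; \frac{1}{N} \sum_{n=1}^{N} L\bigl(y_j \mid \tilde{x}_{j,n}, \theta\bigr),
      \qquad \tilde{x}_{j,n} \sim f_j(x).
    \]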
A crucial challenge in conflict research is leveraging the semi-supervised nature of the data that arise. Observed response data, such as counts of battle deaths over time, indicate latent processes of interest such as the intensity and duration of conflicts, but defining and labeling instances of these unobserved processes is a nuanced and imprecise task. The availability of such labels, however, would make it possible to study the effect of intervention-related predictors, such as ceasefires, directly on conflict dynamics (e.g., latent intensity) rather than through an intermediate proxy like observed counts of battle deaths. Motivated by this problem and the new availability of the ETH-PRIO Civil Conflict Ceasefires data set, we propose a Bayesian autoregressive (AR) hidden Markov model (HMM) framework as a sufficiently flexible machine learning approach for semi-supervised regime labeling with uncertainty quantification. We motivate our approach by illustrating how it can be used to study the role that ceasefires play in shaping conflict dynamics. The ceasefires data set is the first systematic and globally comprehensive data set on ceasefires, and our work is the first to analyze these new data and to explore the effect of ceasefires on conflict dynamics in a comprehensive, cross-country manner.
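A minimal generative sketch of the kind of model involved, with two latent conflict regimes and Poisson counts whose log-mean carries an autoregressive term (the parameterisation below is an illustrative assumption, not the paper's specification):

    # Illustrative 2-state AR(1) hidden Markov model for conflict-intensity counts.
    # Latent regimes follow a Markov chain; observed battle-death counts are Poisson
    # with a log-mean that depends on the regime and the previous observation.
    import numpy as np

    rng = np.random.default_rng(42)
    T = 200
    trans = np.array([[0.95, 0.05],      # regime 0: "low intensity"
                      [0.10, 0.90]])     # regime 1: "high intensity"
    intercept = np.array([0.5, 3.0])     # regime-specific baseline log-intensity
    ar_coef = np.array([0.3, 0.2])       # regime-specific AR(1) weight

    z = np.zeros(T, dtype=int)
    y = np.zeros(T)
    for t in range(1, T):
        z[t] = rng.choice(2, p=trans[z[t - 1]])
        log_mu = intercept[z[t]] + ar_coef[z[t]] * np.log1p(y[t - 1])
        y[t] = rng.poisson(np.exp(log_mu))

    print("fraction of time in high-intensity regime:", (z == 1).mean())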
Insurers increasingly use AI. We distinguish two situations in which insurers use AI: (i) data-intensive underwriting and (ii) behaviour-based insurance. First, insurers can use AI for data analysis to assess risks: data-intensive underwriting. Underwriting is, in short, calculating risks and adjusting the insurance premium accordingly. Second, insurers can use AI to monitor the behaviour of consumers in real time: behaviour-based insurance. For example, some car insurers give a discount if a consumer agrees to be tracked by the insurer and drives safely. While the two trends bring many advantages, they may also have discriminatory effects. This paper focuses on the following question: which discrimination-related effects may occur if insurers use data-intensive underwriting and behaviour-based insurance? We focus on two types of discrimination-related effects: discrimination and other unfair differentiation. (i) Discrimination harms certain groups who are protected by non-discrimination law, for instance people with certain ethnicities. (ii) Unfair differentiation does not harm groups that are protected by non-discrimination law, but it still seems unfair. We introduce four factors to consider when assessing the fairness of insurance practices. The paper builds on literature from various disciplines, including law, philosophy, and computer science.
Methods for estimating heterogeneous treatment effects (HTE) from observational data have largely focused on continuous or binary outcomes, with less attention paid to survival outcomes and almost none to settings with competing risks. In this work, we develop censoring unbiased transformations (CUTs) for survival outcomes both with and without competing risks. After converting time-to-event outcomes using these CUTs, direct application of HTE learners for continuous outcomes yields consistent estimates of heterogeneous cumulative incidence effects, total effects, and separable direct effects. Our CUTs enable application of a much larger set of state-of-the-art HTE learners to censored outcomes than had previously been available, especially in competing risks settings. We provide generic, model-free, learner-specific oracle inequalities bounding the finite-sample excess risk; the oracle efficiency results depend on the oracle selector and the estimated nuisance functions from all steps involved in the transformation. We demonstrate the empirical performance of the proposed methods in simulation studies.
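For intuition, one classical example of a censoring unbiased transformation (an inverse-probability-of-censoring-weighted construction in the spirit of Koul, Susarla, and Van Ryzin; not necessarily the transformations developed here) replaces a possibly censored outcome with

    \[
      Y^{*} \;=\; \frac{\Delta\, m(\widetilde{T})}{G(\widetilde{T} \mid X)},
      \qquad\text{so that}\qquad
      \mathbb{E}\bigl[\,Y^{*} \mid X\,\bigr] \;=\; \mathbb{E}\bigl[\,m(T) \mid X\,\bigr],
    \]

where \widetilde{T} = \min(T, C) is the observed follow-up time, \Delta = \mathbb{1}\{T \le C\} the event indicator, G(t \mid X) = P(C \ge t \mid X) the conditional censoring survival function, and m a fixed transformation of the event time; the identity holds when T and C are conditionally independent given X. Regressing Y^{*} on covariates then targets the same conditional mean as if uncensored outcomes had been observed.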
Named Entity Recognition (NER) seeks to extract substrings within a text that name real-world objects and to determine their type (for example, whether they refer to persons or organizations). In this survey, we first present an overview of recent popular approaches, including graph- and transformer-based methods and Large Language Models (LLMs) that have not had much coverage in other surveys. Second, we focus on methods designed for datasets with scarce annotations. Third, we evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics (domain, size, and number of classes), providing a deep comparison of algorithms that have not previously been considered together. Our experiments shed some light on how the characteristics of datasets affect the behavior of the methods that we compare.
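As a point of reference for the transformer-based family discussed above, a minimal example using the Hugging Face transformers pipeline API (the checkpoint named below is an illustrative public model, not necessarily one of the implementations benchmarked in this survey):

    # Minimal transformer-based NER example; the model checkpoint is illustrative.
    from transformers import pipeline

    ner = pipeline("token-classification",
                   model="dslim/bert-base-NER",
                   aggregation_strategy="simple")

    for ent in ner("Ada Lovelace worked with Charles Babbage in London."):
        print(ent["entity_group"], ent["word"], round(ent["score"], 3))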
Dynamical systems across the sciences, from electrical circuits to ecological networks, undergo qualitative and often catastrophic changes in behavior, called bifurcations, when their underlying parameters cross a threshold. Existing methods predict oncoming catastrophes in individual systems but are primarily time-series-based and struggle both to categorize qualitative dynamical regimes across diverse systems and to generalize to real data. To address this challenge, we propose a data-driven, physically informed deep-learning framework for classifying dynamical regimes and characterizing bifurcation boundaries based on the extraction of topologically invariant features. We focus on the paradigmatic case of the supercritical Hopf bifurcation, which is used to model periodic dynamics across a wide range of applications. Our convolutional attention method is trained with data augmentations that encourage the learning of topological invariants, which can be used to detect bifurcation boundaries in unseen systems and to design models of biological systems such as oscillatory gene regulatory networks. We further demonstrate our method's use in analyzing real data by recovering distinct proliferation and differentiation dynamics along the pancreatic endocrinogenesis trajectory in gene expression space based on single-cell data. Our method provides valuable insights into the qualitative, long-term behavior of a wide range of dynamical systems, and can detect bifurcations or catastrophic transitions in large-scale physical and biological systems.
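For reference, the supercritical Hopf normal form in polar coordinates is dr/dt = mu*r - r^3, with a stable fixed point at the origin for mu < 0 and a stable limit cycle of radius sqrt(mu) for mu > 0. The toy sweep below (a sketch in normal-form coordinates, not the paper's convolutional attention classifier) illustrates the regime change at the bifurcation boundary mu = 0:

    # Supercritical Hopf normal form: dr/dt = mu*r - r**3 (radial part only).
    # A crude Euler integration shows the qualitative change as mu crosses zero.
    import numpy as np

    def asymptotic_amplitude(mu, r0=0.1, dt=0.01, steps=20000):
        r = r0
        for _ in range(steps):
            r += dt * (mu * r - r**3)
        return r

    for mu in [-0.5, -0.1, 0.1, 0.5]:
        amp = asymptotic_amplitude(mu)
        regime = "limit cycle" if amp > 1e-3 else "stable fixed point"
        print(f"mu = {mu:+.1f}: amplitude ~ {amp:.3f} ({regime})")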
Power posteriors "robustify" standard Bayesian inference by raising the likelihood to a constant fractional power, effectively downweighting its influence in the calculation of the posterior. Power posteriors have been shown to be more robust to model misspecification than standard posteriors in many settings. Previous work has shown that power posteriors derived from low-dimensional, parametric locally asymptotically normal models are asymptotically normal (Bernstein-von Mises) even under model misspecification. We extend these results to show that the power posterior moments converge to those of the limiting normal distribution suggested by the Bernstein-von Mises theorem. We then use this result to show that the mean of the power posterior, a point estimator, is asymptotically equivalent to the maximum likelihood estimator.
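For reference, with independent observations x_1, ..., x_n, likelihood p(x_i | theta), prior pi(theta), and a fixed power eta in (0, 1], the power posterior referred to above is

    \[
      \pi_{\eta}(\theta \mid x_{1:n}) \;\propto\; \pi(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)^{\eta},
    \]

which recovers the standard posterior at eta = 1 and downweights the likelihood's influence for smaller eta.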
One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster the utterance consistency of chat systems. We first develop a dataset for studying inconsistencies, in which inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances are authored by annotators; this covers the life span of inconsistencies, namely their introduction, understanding, and resolution. Building on this, we introduce a set of tasks centered on dialogue consistency, specifically focused on its detection and resolution. Our experimental findings indicate that our dataset significantly helps progress on identifying and resolving conversational inconsistencies, and that current popular large language models like ChatGPT, while good at resolving inconsistencies, still struggle with detecting them.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently also entered the domain of agriculture. In this paper, we survey 40 research efforts that employ deep learning techniques to address various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature, and pre-processing of the data used, and the overall performance achieved according to the metrics used in each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques with respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.