We introduce a novel writing method called Probing Chain of Thought (ProCoT), which prevents students from cheating using a Large Language Model (LLM), such as ChatGPT, while enhancing their active learning through such models. LLMs have disrupted education and many other fields. Fearing that students may cheat, many educators have resorted to banning their use, as LLM outputs can be human-like and hard to detect in some cases. These LLMs are also known for hallucinations (i.e., fabricated facts). We conduct studies with ProCoT in two different courses with a combined total of about 66 students. The students in each course were asked to prompt an LLM of their choice with one question from a set of four and were required to affirm or refute statements in the LLM output using peer-reviewed references. The results show two things: (1) ProCoT stimulates students' creative and critical thinking and writing through engagement with LLMs, when we compare the LLM-only output with the ProCoT output, and (2) ProCoT can prevent cheating because of clear limitations in existing LLMs, when we compare students' ProCoT output with LLM-generated ProCoT output. We also find that most students prefer to give answers in fewer words than LLMs, which are typically verbose. The average word counts for students, ChatGPT (v3.5), and Phind (v8) are 208, 391, and 383, respectively.
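As a rough illustration of the workflow just described, the sketch below shows how a ProCoT-style task could be framed programmatically. The paper describes a manual classroom workflow, so the function name and prompt wording here are hypothetical assumptions, not the authors' materials.

```python
# Hypothetical sketch of framing a ProCoT-style task; the study itself used a
# manual classroom workflow, so all names and prompt wording here are assumptions.

def build_procot_task(question: str, llm_output: str) -> str:
    """Ask the student to probe the LLM output claim by claim."""
    return (
        f"Question given to the LLM:\n{question}\n\n"
        f"LLM output:\n{llm_output}\n\n"
        "For each statement in the LLM output, state whether you AFFIRM or REFUTE it, "
        "and support your position with peer-reviewed references."
    )
```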
Explainable AI (XAI) aids in deciphering 'black-box' models. While several methods have been proposed and evaluated primarily in the image domain, explainability in the text domain remains a growing research area. In this paper, we examine the applicability of XAI methods to the text domain. In this context, the 'Similarity Difference and Uniqueness' (SIDU) XAI method, recognized for its superior capability to localize entire salient regions in image-based classification, is extended to textual data. The extended method, SIDU-TXT, utilizes feature activation maps from 'black-box' models to generate heatmaps at a granular, word-based level, thereby providing explanations that highlight the contextually significant textual elements crucial for model predictions. Given the absence of a unified standard for assessing XAI methods, this study applies a holistic, three-tiered evaluation framework: Functionally-Grounded, Human-Grounded, and Application-Grounded, to assess the effectiveness of the proposed SIDU-TXT across various experiments. We find that, in the sentiment analysis task on a movie review dataset, SIDU-TXT excels in both functionally and human-grounded evaluations, demonstrating superior performance through quantitative and qualitative analyses compared to benchmarks such as Grad-CAM and LIME. In the application-grounded evaluation within the sensitive and complex legal domain of asylum decision-making, SIDU-TXT and Grad-CAM demonstrate comparable performance, each with its own strengths and weaknesses. However, both methods fall short of fully meeting the sophisticated criteria of expert expectations, highlighting the need for further research on XAI methods suitable for such domains.
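The sketch below gives a simplified view of a SIDU-style word-level explanation, following the published image formulation (similarity-difference and uniqueness scores over masked predictions) and adapting it naively to word masks. The `predict_fn` black box, the '[MASK]' replacement, and the parameter values are assumptions, not the authors' SIDU-TXT implementation.

```python
# Simplified sketch of a SIDU-style word-level explanation (not the authors'
# implementation): feature activation maps are thresholded into word masks, and
# each mask is scored by the similarity difference and uniqueness of the masked
# prediction relative to the original prediction.
import numpy as np

def sidu_txt_heatmap(predict_fn, words, activation_maps, tau=0.5, sigma=0.25):
    """predict_fn: maps a list of words to a class-probability vector (black box).
    activation_maps: (n_maps, n_words) feature activations aligned to words."""
    p_orig = predict_fn(words)
    # Binary word masks from each activation map.
    masks = activation_maps > tau * activation_maps.max(axis=1, keepdims=True)
    preds = np.stack([
        predict_fn([w if keep else "[MASK]" for w, keep in zip(words, m)])
        for m in masks
    ])
    # Similarity difference: how well each masked input preserves the original prediction.
    sd = np.exp(-np.linalg.norm(preds - p_orig, axis=1) / (2 * sigma ** 2))
    # Uniqueness: how distinct each masked prediction is from all the others.
    u = np.array([np.linalg.norm(preds - preds[i], axis=1).sum() for i in range(len(preds))])
    weights = sd * u
    # Word-level heatmap: weighted combination of the word masks.
    return (weights[:, None] * masks).sum(axis=0)
```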
This paper presents the development process of a Vietnamese spoken-language corpus for machine reading comprehension (MRC) and provides insights into the challenges and opportunities of using real-world data for MRC tasks. Existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks. In contrast, VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube -- an extensive source of user-uploaded content covering the topics of food and travel. By capturing the spoken language of native Vietnamese speakers in natural settings, an area largely overlooked in Vietnamese NLP research, the corpus provides a valuable resource for future research on reading comprehension for the Vietnamese language. Regarding performance evaluation, our deep-learning models achieved the highest F1 score of 75.34% on the test set, indicating significant progress in machine reading comprehension for Vietnamese spoken-language data. The highest exact match (EM) score we achieved is 53.97%, which reflects the challenge of processing spoken content and highlights the need for further improvement.
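For reference, the sketch below shows the standard token-level F1 and exact match (EM) computations typically used for extractive MRC. It is a simplified illustration (omitting, for example, punctuation and article normalization) rather than the exact VlogQA evaluation script.

```python
# Minimal sketch of SQuAD-style EM and token-level F1 metrics for extractive MRC
# (illustrative; not necessarily the exact evaluation script used for VlogQA).
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)   # shared tokens with multiplicity
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are then the averages of these per-question values over the test set (taking the maximum over gold answers when several are available).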
Large Language Models (LLMs) frequently struggle with knowledge-intensive questions and are often inconsistent, providing different outputs despite being given the same input. Response quality worsens when the user expresses a firm opposing stance, which causes the LLM to adjust its response even when the initial one was correct. These behaviors decrease the reliability and validity of the responses provided by these models. In this paper, we attempt to 1) raise awareness of the inherent risks of over-relying on AI agents like ChatGPT by showing how Chain-of-Feedback (CoF) triggers LLMs to deviate further from the correct answer, and 2) suggest a novel prompting method, Recursive Chain of Feedback (R-CoF), which we are studying further. The CoF setup takes an open-ended multi-step question and then repeatedly provides meaningless feedback requesting another attempt. Our preliminary experiments show that such feedback only decreases the quality of the response. To mitigate the effects of the aforementioned inconsistencies, we present a method that recursively revises the initial incorrect reasoning provided by the LLM by repeatedly breaking down each incorrect step into smaller individual problems.
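The sketch below illustrates the CoF setup described above. The `query_llm` callable is a hypothetical stand-in for any chat-style API, and the feedback wording is an illustrative assumption.

```python
# Sketch of the Chain-of-Feedback (CoF) setup: an open-ended multi-step question
# is answered, then meaningless feedback ("try again") is issued repeatedly.
# `query_llm` is a hypothetical stand-in for any chat-completion API.

def chain_of_feedback(query_llm, question: str, rounds: int = 3):
    """Collect responses as the model is pushed to retry without substantive feedback."""
    history = [{"role": "user", "content": question}]
    responses = []
    for _ in range(rounds):
        answer = query_llm(history)                      # black-box LLM call
        responses.append(answer)
        history.append({"role": "assistant", "content": answer})
        # Meaningless feedback: no hint about what (if anything) is wrong.
        history.append({"role": "user", "content": "That doesn't look right. Try again."})
    return responses
```

R-CoF then applies the opposite strategy: each incorrect step is isolated and re-queried as a smaller sub-problem, recursing until the reasoning can be repaired.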
Reinforcement Learning with Human Feedback (RLHF) has received significant attention for performing tasks without costly manual reward design by aligning with human preferences. It is crucial to consider diverse human feedback types and various learning methods in different environments. However, quantifying progress in RLHF with diverse feedback is challenging due to the lack of standardized annotation platforms and widely used unified benchmarks. To bridge this gap, we introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF. It aims to provide a complete workflow starting from real human feedback, fostering progress on practical problems. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations. Uni-RLHF provides a user-friendly annotation interface tailored to various feedback types, compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline of crowdsourced annotation, resulting in large-scale annotated datasets comprising more than 15 million steps across 30+ popular tasks. Through extensive experiments, the results on the collected datasets demonstrate competitive performance compared to that obtained from well-designed manual rewards. We evaluate various design choices and offer insights into their strengths and potential areas for improvement. We aim to build valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions based on realistic human feedback. The website is available at //uni-rlhf.github.io/.
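The Uni-RLHF codebase is not reproduced here, but the generic sketch below shows one common ingredient of offline RLHF pipelines like those it benchmarks: fitting a reward model to pairwise (comparative) feedback with a Bradley-Terry-style loss. All module names, shapes, and hyperparameters are assumptions.

```python
# Generic sketch of learning a reward model from pairwise human preferences over
# trajectory segments, a standard ingredient of offline RLHF pipelines.
# This is NOT the Uni-RLHF implementation; names and shapes are assumed.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, obs, act):
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim)
        r = self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # per-step rewards (batch, T)
        return r.sum(dim=1)                                      # predicted segment return

def preference_loss(model, seg_a, seg_b, pref):
    """Bradley-Terry loss: pref = 1 if annotators preferred segment A, else 0."""
    ra = model.segment_return(*seg_a)
    rb = model.segment_return(*seg_b)
    logits = ra - rb
    return nn.functional.binary_cross_entropy_with_logits(logits, pref.float())
```

The learned reward can then label the offline dataset, after which any standard offline RL algorithm is trained on the relabeled transitions.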
Federated Learning (FL), while a breakthrough in decentralized machine learning, contends with significant challenges such as limited data availability and the variability of computational resources, which can stifle the performance and scalability of the models. The integration of Foundation Models (FMs) into FL presents a compelling solution to these issues, with the potential to enhance data richness and reduce computational demands through pre-training and data augmentation. However, this incorporation introduces novel issues in terms of robustness, privacy, and fairness, which have not been sufficiently addressed in the existing research. We make a preliminary investigation into this field by systematically evaluating the implications of FM-FL integration across these dimensions. We analyze the trade-offs involved, uncover the threats and issues introduced by this integration, and propose a set of criteria and strategies for navigating these challenges. Furthermore, we identify potential research directions for advancing this field, laying a foundation for future development in creating reliable, secure, and equitable FL systems.
Distractors are important in learning evaluation. This paper surveys distractor generation tasks using English multiple-choice question datasets in textual and multimodal contexts. In particular, it presents a thorough literature review of recent studies on distractor generation, discusses multiple-choice components and their characteristics, analyzes the related datasets, and summarizes the evaluation metrics for distractor generation. Our investigation reveals that more than half of the datasets are human-generated from educational sources in specific domains such as Science and English, that they are largely text-based, and that open-domain and multimodal datasets are lacking.
Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and for languages beyond English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian. We show that text ambiguity and artificial guideline changes are dominant factors behind diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.
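As a small illustration of the distributional view mentioned above (not the authors' analysis code), one can aggregate the labels several annotators assign to a mention and quantify disagreement with the entropy of the resulting label distribution.

```python
# Illustrative sketch: treat each entity mention's annotations as a label
# distribution and measure disagreement via its entropy (in bits).
import math
from collections import Counter

def label_distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {lab: c / total for lab, c in counts.items()}

def disagreement_entropy(labels):
    dist = label_distribution(labels)
    return -sum(p * math.log2(p) for p in dist.values())

# Example: a mention tagged ORG by two annotators and LOC by one.
print(label_distribution(["ORG", "ORG", "LOC"]))    # {'ORG': 0.67, 'LOC': 0.33} (approx.)
print(disagreement_entropy(["ORG", "ORG", "LOC"]))  # ~0.92 bits
```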
Compared to traditional Artificial Neural Networks (ANNs), Spiking Neural Networks (SNNs) have garnered widespread academic interest for their intrinsic ability to transmit information in a more biologically inspired and energy-efficient manner. However, despite previous efforts to optimize the learning gradients and model structures of SNNs through various methods, SNNs still lag behind ANNs in performance to some extent. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this paper, we rigorously analyze the relationship among the multi-threshold model, the vanilla spiking model, and quantized ANNs from a mathematical perspective, and then propose a novel LM-HT model, an equidistant multi-hierarchical model that can dynamically regulate the global input current and membrane potential leakage along the time dimension. In addition, we note that direct training based on the LM-HT model can be seamlessly integrated with the traditional ANN-SNN conversion framework. This hybrid learning framework can effectively improve the relatively poor performance of converted SNNs under low time latency. Extensive experimental results demonstrate that our LM-HT model significantly outperforms previous state-of-the-art work on various types of datasets, promoting SNNs to a new level of performance comparable to quantized ANNs.
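The LM-HT formulation itself is not reproduced here; the sketch below only illustrates the general idea of an equidistant multi-threshold spiking neuron that emits graded outputs, with the paper's learnable regulation of input current and leakage reduced to fixed scalars. All names and default values are assumptions.

```python
# Illustrative sketch of an equidistant multi-threshold spiking neuron in the
# spirit of a multi-hierarchical model (simplified; the learnable time-wise
# regulation of input current and leakage is reduced to fixed scalars here).
import torch

def multi_threshold_lif(inputs, theta: float = 1.0, levels: int = 4, leak: float = 1.0):
    """inputs: (T, batch, features) input currents over T time steps.
    At each step the neuron emits a graded spike in {0, 1, ..., levels},
    depending on how many equidistant thresholds the membrane potential crosses."""
    v = torch.zeros_like(inputs[0])
    spikes = []
    for t in range(inputs.shape[0]):
        v = leak * v + inputs[t]                                  # leaky integration
        level = torch.clamp(torch.floor(v / theta), 0, levels)    # crossed thresholds
        spikes.append(level)
        v = v - level * theta                                     # soft reset by emitted charge
    return torch.stack(spikes)
```

Over T time steps such a neuron can represent T x levels discrete activation values, which is the intuition behind relating multi-threshold spiking models to quantized ANNs.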
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.
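As a concrete illustration of "pre-trained representations plus one additional output layer", the sketch below uses the present-day HuggingFace transformers API rather than the original TensorFlow release; it is illustrative, not the authors' code.

```python
# Fine-tuning a pre-trained BERT encoder with a single classification head,
# shown here via the HuggingFace transformers library (illustrative sketch).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a delightfully watchable film", "a tedious mess"],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)   # classification head on the [CLS] representation
outputs.loss.backward()                   # fine-tune the whole network end to end
```

The same pattern (swap in a task-specific head, fine-tune all parameters) covers sentence-pair tasks such as language inference and span-prediction tasks such as question answering.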
State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split at a specific layer (e.g., the last convolutional layer) to minimize the different task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or more tasks. In this paper, we propose a novel CNN structure for MTL that enables automatic feature fusion at every layer. Specifically, we first concatenate features from different tasks along their channel dimension and then formulate the feature fusion problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be realized with 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform detailed ablation analyses for different configurations of network training. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
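The minimal sketch below illustrates the NDDR idea described above in PyTorch: channel-wise concatenation of two task streams followed by a per-task 1x1 convolution with batch normalization, with weight decay applied through the optimizer. Layer names and hyperparameters are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of an NDDR-style fusion layer for two task streams: concatenate
# along channels, then reduce back to each task's width with 1x1 conv + batch norm.
import torch
import torch.nn as nn

class NDDRLayer(nn.Module):
    def __init__(self, channels_a: int, channels_b: int):
        super().__init__()
        total = channels_a + channels_b
        self.reduce_a = nn.Sequential(nn.Conv2d(total, channels_a, kernel_size=1),
                                      nn.BatchNorm2d(channels_a))
        self.reduce_b = nn.Sequential(nn.Conv2d(total, channels_b, kernel_size=1),
                                      nn.BatchNorm2d(channels_b))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        fused = torch.cat([feat_a, feat_b], dim=1)   # concatenate along the channel dimension
        return self.reduce_a(fused), self.reduce_b(fused)

# Weight decay on the 1x1 convolutions is applied through the optimizer, e.g.:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=5e-4)
```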