
This study evaluates the performance of large language models, specifically GPT-3.5 and Bard (powered by the Gemini Pro model), on undergraduate admissions exams administered by the National Polytechnic Institute in Mexico. The exams cover Engineering/Mathematical and Physical Sciences, Biological and Medical Sciences, and Social and Administrative Sciences. Both models demonstrated proficiency, exceeding the minimum acceptance scores for the respective academic programs and scoring up to 75% on some of them. GPT-3.5 outperformed Bard in Mathematics and Physics, while Bard performed better in History and on questions involving factual information. Overall, GPT-3.5 marginally surpassed Bard, with scores of 60.94% and 60.42%, respectively.

Related Content

With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raise significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework utilizes the emergent planning capabilities of LLMs, employing a divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection by incorporating the recently proposed citation recall and precision metrics to assess the quality of thoughts, linking an answer's credibility intrinsically to the quality of its thought. This methodology introduces a weighted system for majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages that considers factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments reveal that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read, on different datasets by as much as 7%, demonstrating its efficacy in enhancing the factuality of LLMs.
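As a concrete illustration of the weighted voting idea, the sketch below scores each sampled answer by an F1-style combination of the citation recall and precision of the thought that produced it, then picks the answer with the highest total weight. The names and the exact weighting formula are illustrative assumptions, not HGOT's actual implementation.

```python
from collections import defaultdict

def weighted_majority_vote(samples):
    """Pick an answer by citation-quality-weighted self-consistency.

    `samples` is a list of (answer, citation_recall, citation_precision)
    triples, one per sampled thought. Weighting votes by citation quality
    is the idea from the abstract; the F1-style combination is an
    illustrative choice.
    """
    scores = defaultdict(float)
    for answer, recall, precision in samples:
        quality = 2 * recall * precision / (recall + precision + 1e-9)
        scores[answer] += quality
    return max(scores, key=scores.get)

samples = [
    ("Paris", 0.9, 0.8),  # well-cited thought
    ("Paris", 0.7, 0.9),
    ("Lyon", 0.4, 0.5),   # poorly cited thought gets a small vote
]
print(weighted_majority_vote(samples))  # -> "Paris"
```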

We present CEMA: Causal Explanations in Multi-Agent systems, a framework for generating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems, with the goal of building more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds to identify the salient causes behind the agent's decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when a large number of other agents are present, and we show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as highly as high-quality baseline explanations elicited from other participants. We release the collected explanations with annotations as the HEADD dataset.
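To make the counterfactual-simulation idea concrete, here is a minimal sketch: each candidate cause is tested by intervening on the state, forward-simulating the system many times, and measuring how often the agent's decision changes. The `forward_sim` and `interventions` names are hypothetical placeholders, not CEMA's actual interface.

```python
def salient_causes(state, decision, forward_sim, interventions, n_samples=100):
    """Rank candidate causes of `decision` by counterfactual effect.

    `forward_sim(state)` stochastically rolls the multi-agent system
    forward and returns the agent's decision; `interventions` maps a
    candidate-cause name to a function that edits the state (e.g.,
    removing another vehicle from the scene).
    """
    effects = {}
    for name, intervene in interventions.items():
        flipped = sum(
            forward_sim(intervene(state)) != decision
            for _ in range(n_samples)
        )
        # Fraction of counterfactual worlds in which the decision changes.
        effects[name] = flipped / n_samples
    return sorted(effects.items(), key=lambda kv: -kv[1])
```

Candidate causes whose intervention most often flips the decision would be reported as the most salient.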

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB successfully accelerates LLM inference without compromising the linguistic capabilities of these models, making it a promising technique for optimizing the efficiency of LLMs. The code is available at: //github.com/leapingjagg-dev/SLEB
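A minimal sketch of the block-level redundancy signal described above: compare each transformer block's output to its input and flag blocks that change the hidden states very little. The cosine criterion and the threshold are illustrative assumptions; SLEB's actual selection metric may differ.

```python
import torch

@torch.no_grad()
def redundant_blocks(hidden_states, keep_threshold=0.98):
    """Identify transformer blocks whose output barely differs from
    their input, i.e., block-level redundancy.

    `hidden_states` is a list [h_0, h_1, ..., h_L] where h_i is the
    (batch, seq, dim) activation after block i (h_0 is the embedding
    output), collected from a calibration forward pass.
    """
    redundant = []
    for i in range(1, len(hidden_states)):
        prev, curr = hidden_states[i - 1], hidden_states[i]
        sim = torch.nn.functional.cosine_similarity(
            prev.flatten(1), curr.flatten(1), dim=-1
        ).mean()
        if sim > keep_threshold:
            redundant.append(i)  # block i's output is close to its input
    return redundant
```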

Large language models (LLMs) have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, it has been debated whether they are actually performing humanlike abstract reasoning or instead employing less general processes that rely on similarity to what has been seen in their training data. Here we investigate the generality of analogy-making abilities previously claimed for LLMs (Webb, Holyoak, & Lu, 2023). We take one set of analogy problems used to evaluate LLMs and create a set of "counterfactual" variants: versions that test the same abstract reasoning abilities but that are likely dissimilar from any pre-training data. We test humans and three GPT models on both the original and counterfactual problems, and show that, while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set. This work provides evidence that, despite previously reported successes of LLMs on analogical reasoning, these models lack the robustness and generality of human analogy-making.
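One family of counterfactual variants in this line of work poses letter-string analogies over a permuted alphabet, so the abstract rule stays the same while the surface form is unlikely to match pre-training data. The generator below is a hedged illustration of that idea, not the paper's exact construction.

```python
import random
import string

def counterfactual_letter_analogy(seed=0):
    """Build a letter-string analogy ("if abc -> abd, what does ijk
    become?") over a randomly permuted alphabet."""
    rng = random.Random(seed)
    alphabet = list(string.ascii_lowercase)
    rng.shuffle(alphabet)
    # Successor relation in the permuted alphabet.
    succ = {alphabet[i]: alphabet[i + 1] for i in range(25)}
    a, b, c = alphabet[0:3]
    x, y, z = alphabet[8:11]
    prompt = (
        f"In this alphabet: {' '.join(alphabet)}\n"
        f"If {a}{b}{c} changes to {a}{b}{succ[c]}, "
        f"what does {x}{y}{z} change to?"
    )
    answer = f"{x}{y}{succ[z]}"  # same "increment the last letter" rule
    return prompt, answer

p, ans = counterfactual_letter_analogy()
print(p)
print("expected:", ans)
```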

This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including using the random forest algorithm to impute missing data and handling outliers and invalid entries, thereby making full use of these limited data resources. Through Spearman correlation coefficient analysis, we identify features strongly correlated with AD diagnosis. Using these features, we build and test three machine learning models: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model achieves the best diagnostic performance, with an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance.
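The sketch below strings the described steps together: random-forest-based imputation, Spearman-correlation feature screening, and an XGBoost classifier. The label encoding, the correlation cutoff, and the hyperparameters are illustrative assumptions, not the study's actual settings.

```python
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from xgboost import XGBClassifier

def adni_style_pipeline(df: pd.DataFrame, label_col: str = "DX"):
    """Impute with a random-forest-driven imputer, keep features with
    strong Spearman correlation to the (numerically encoded) diagnosis
    label, and fit an XGBoost classifier."""
    y = df[label_col].to_numpy()
    X = df.drop(columns=[label_col])

    # Random-forest-based iterative imputation of missing values.
    imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50))
    X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

    # Keep features with |Spearman rho| above an illustrative cutoff.
    keep = []
    for col in X_imp.columns:
        rho, _ = spearmanr(X_imp[col], y)
        if abs(rho) > 0.3:
            keep.append(col)

    model = XGBClassifier(n_estimators=200, max_depth=4)
    model.fit(X_imp[keep], y)
    return model, keep
```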

To evaluate code large language models (LLMs), research has relied on a few small, manually curated benchmarks, such as HumanEval and MBPP, which represent only a narrow slice of real-world software domains. In this work, we introduce round-trip correctness (RTC) as an alternative evaluation method. RTC allows Code LLM evaluation on a broader spectrum of real-world software domains without the need for costly human curation. RTC rests on the idea that we can ask a model to make a prediction (e.g., describe some code using natural language), feed that prediction back (e.g., synthesize code from the predicted description), and check whether this round-trip leads to code that is semantically equivalent to the original input. We show how to employ RTC to evaluate code synthesis and editing. We find that RTC strongly correlates with model performance on existing narrow-domain code synthesis benchmarks while allowing us to expand to a much broader set of domains and tasks, which was not previously possible without costly human annotations.
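A minimal sketch of the round trip, assuming `describe` and `synthesize` wrap two LLM calls and semantic equivalence is approximated by a test suite (one plausible check; the paper's exact equivalence measure may differ):

```python
def round_trip_correct(code, tests, describe, synthesize):
    """Round-trip check: describe the code in natural language, then
    regenerate code from that description, and accept if the result
    still passes the same tests.

    `describe` and `synthesize` stand in for two LLM calls; `tests` is
    a list of callables that probe the executed namespace.
    """
    description = describe(code)           # forward pass: code -> NL
    regenerated = synthesize(description)  # backward pass: NL -> code
    namespace = {}
    try:
        exec(regenerated, namespace)       # run in a sandbox in practice
        return all(test(namespace) for test in tests)
    except Exception:
        return False
```

In practice the regenerated code should run in a sandbox, and the tests can be the same suite used to validate the original snippet.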

In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copyright issues, and the sheer effort required to move vast datasets. In this paper, we explore how federated learning enabled by NVIDIA FLARE can address these challenges with easy and scalable integration capabilities, enabling parameter-efficient and full supervised fine-tuning of LLMs for natural language processing and biopharmaceutical applications to enhance their accuracy and robustness.
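For intuition, the sketch below shows the generic federated-averaging step that underlies such federated fine-tuning: each site trains locally, and only model updates are aggregated centrally, weighted by local dataset size. This is a plain illustration of the aggregation idea, not NVIDIA FLARE's actual API.

```python
import torch

def fedavg(client_states, client_sizes):
    """Federated averaging of model state_dicts, weighted by each
    client's local dataset size, so raw data never leaves the site.

    `client_states` is a list of state_dicts with identical keys;
    `client_sizes` gives each client's number of training examples.
    """
    total = sum(client_sizes)
    averaged = {}
    for key in client_states[0]:
        averaged[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return averaged
```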

With the release of OpenAI's ChatGPT, the field of large language models (LLMs) saw an increase in academic interest in GPT-based chat assistants. In the following months, multiple accessible large language models were released, including Meta's Llama models and Mistral AI's Mistral and Mixtral MoE models. These models are openly available for a wide array of purposes under a wide spectrum of licenses. They have found use in a number of different fields, such as code development and SQL generation. In this work, we propose our plan to explore the applicability of large language models in the domain of network security. We plan to create Sentinel, an LLM that analyses network packet contents and passes judgment on their threat level. This work is a preliminary report that lays out our plan for future endeavors.
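As a hedged illustration of the planned setup, the sketch below formats extracted packet fields into a prompt asking an LLM for a threat judgment. The field names, prompt wording, and threat categories are hypothetical; Sentinel's actual design is left to future work in the abstract.

```python
def packet_threat_prompt(pkt):
    """Format a captured packet's fields into a prompt asking an LLM
    to judge its threat level. `pkt` is a dict of pre-extracted fields;
    the categories and wording are illustrative placeholders.
    """
    return (
        "You are a network security analyst. Rate the threat level "
        "(benign / suspicious / malicious) of this packet and explain why.\n"
        f"src={pkt['src']} dst={pkt['dst']} proto={pkt['proto']} "
        f"dport={pkt['dport']} payload_snippet={pkt['payload'][:64]!r}"
    )

example = {"src": "10.0.0.5", "dst": "10.0.0.9", "proto": "TCP",
           "dport": 4444, "payload": b"\x90\x90\x90..."}
print(packet_threat_prompt(example))
```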

This paper introduces a new structural causal model tailored to representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven correct; an extension based on agent interventions is proposed to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or to real IT monitoring data.
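Under the stated assumption that root causes are not causally related, a simplified reading of the detection step is: among the anomalous nodes of the discovered causal graph, report those with no anomalous causal parent. The sketch below illustrates that reading; it is not the paper's exact algorithm.

```python
import networkx as nx

def root_causes(causal_graph: nx.DiGraph, anomalous: set):
    """Return anomalous nodes none of whose causal parents are also
    anomalous; under the 'causally unrelated root causes' assumption,
    these are the roots of the observed anomalies."""
    return {
        node for node in anomalous
        if not any(parent in anomalous
                   for parent in causal_graph.predecessors(node))
    }

# Toy causal graph: database latency drives API latency, which drives timeouts.
g = nx.DiGraph([("db_latency", "api_latency"), ("api_latency", "timeouts")])
print(root_causes(g, {"db_latency", "api_latency", "timeouts"}))
# -> {'db_latency'}
```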

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs most and where it is needed most. This analysis further motivates us to take a different approach from most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks beyond the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT consistently improves semantic textual matching performance over the original BERT model, and that the performance benefit is most salient when training data is scarce.
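One illustrative way to realize direct knowledge injection into attention is to add a prior-knowledge bias to the attention logits before the softmax, e.g., boosting token pairs known to be related across the two sentences. The additive form below is an assumption for illustration, not necessarily the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def knowledge_biased_attention(q, k, v, prior):
    """Scaled dot-product attention with an additive bias encoding
    prior token-pair knowledge (e.g., known synonym alignments).

    q, k, v: (..., seq, dim) tensors; prior: (seq_q, seq_k) bias where
    larger values mark token pairs believed to be related.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    scores = scores + prior  # inject prior knowledge into the logits
    return F.softmax(scores, dim=-1) @ v
```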
