Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL process, which hinders the assessment of LLMs' cognitive capabilities and the optimization of LLM-based solutions. To address these issues, we first construct a new dataset designed to mitigate the risk of overfitting in LLMs. We then formulate five evaluation tasks to comprehensively assess the performance of diverse methods across various LLMs throughout the Text-to-SQL process. Our study highlights the performance disparities among LLMs and proposes optimal in-context learning solutions tailored to each task. These findings offer valuable insights for enhancing the development of LLM-based Text-to-SQL systems.
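For orientation, the sketch below shows what a few-shot in-context-learning prompt for Text-to-SQL typically looks like; the schema, worked example, and wording are illustrative placeholders, not the optimal templates studied in the paper.

```python
# Illustrative few-shot Text-to-SQL prompt builder. The schema and the worked
# example are hypothetical; a paper's recommended templates may differ.
PROMPT_TEMPLATE = """### SQLite schema
CREATE TABLE employees(id INTEGER, name TEXT, salary REAL, dept_id INTEGER);
CREATE TABLE departments(id INTEGER, name TEXT);

### Example
Question: How many employees are there?
SQL: SELECT COUNT(*) FROM employees;

### Task
Question: {question}
SQL:"""


def build_prompt(question: str) -> str:
    """Fill the template with the user's natural-language question."""
    return PROMPT_TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(build_prompt("What is the average salary per department?"))
```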
Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT-4, have shown strong abilities in multilingual translation without being explicitly trained on parallel corpora. It is intriguing how LLMs obtain the ability to carry out translation instructions for different languages. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation following given instructions. Firstly, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a given language, performance depends on its similarity to English and on the amount of data used in the pretraining phase. Secondly, we find that LLMs' ability to carry out translation instructions relies on understanding those instructions and on the alignment among different languages. With multilingual finetuning, LLMs can learn to perform the translation task well even for language pairs unseen during the instruction tuning phase.
The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, the choice of the potential function is critical for this technique to be effective. Additionally, RL techniques are usually constrained to a finite horizon due to computational limitations. This introduces a bias when using PBRS, thus adding an additional layer of complexity. In this paper, we leverage abstractions to automatically produce a "good" potential function. We analyse the bias induced by finite horizons in the context of PBRS, producing novel insights. Finally, to assess sample efficiency and performance impact, we evaluate our approach on four environments, including a goal-oriented navigation task and three Arcade Learning Environment (ALE) games, demonstrating that we can reach the same level of performance as CNN-based solutions with a simple fully-connected network.
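As background (the standard PBRS formulation of Ng et al., with a generic potential, not the abstraction-derived potential this paper proposes), the shaping term and the finite-horizon bias alluded to above can be written as follows.

```latex
% Shaped reward with a generic potential \Phi (standard PBRS):
\[
  R'(s, a, s') = R(s, a, s') + F(s, s'), \qquad
  F(s, s') = \gamma\,\Phi(s') - \Phi(s).
\]
% Over an episode of length H the shaping terms telescope,
\[
  \sum_{t=0}^{H-1} \gamma^{t} F(s_t, s_{t+1})
  = \gamma^{H}\,\Phi(s_H) - \Phi(s_0),
\]
% so with an infinite horizon only the constant -\Phi(s_0) survives and optimal
% policies are preserved, whereas with a finite horizon the \gamma^{H}\Phi(s_H)
% term does not vanish; this is the bias discussed above.
```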
ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT `knows', we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 24 language families spoken in five continents. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource. We then study ChatGPT's (both GPT-3.5 and GPT-4) ability to (i) identify language names and language codes, (ii) under zero- and few-shot conditions, and (iii) with and without provision of a label set. When compared to smaller finetuned LID tools, we find that ChatGPT lags behind. For example, it has poor performance on African languages. We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.
Unmanned Aerial Vehicles (UAVs) are integral to various sectors such as agriculture, surveillance, and logistics, driven by advancements in 5G. However, existing research lacks a comprehensive approach addressing both data freshness and security concerns. In this paper, we address the intricate challenges of data freshness and security, especially in the context of eavesdropping and jamming in modern UAV networks. Our framework incorporates exponential AoI metrics and emphasizes the secrecy rate to tackle eavesdropping and jamming threats. We introduce a transformer-enhanced Deep Reinforcement Learning (DRL) approach to optimize task offloading processes. Comparative analysis with existing algorithms showcases the superiority of our scheme, indicating its promising advancements in UAV network management.
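For context, the secrecy-rate notion invoked here is usually the standard Gaussian-wiretap form written below; the symbols \(\gamma_B\) and \(\gamma_E\) are generic SNRs at the legitimate receiver and the eavesdropper rather than the paper's notation, and the exponential age penalty shown is only one common choice, since the abstract does not spell out the exact AoI form used.

```latex
% Standard secrecy rate (Gaussian wiretap channel), generic notation:
\[
  R_s = \Bigl[\log_2\!\bigl(1+\gamma_B\bigr) - \log_2\!\bigl(1+\gamma_E\bigr)\Bigr]^{+}
\]
% One common exponential age-of-information penalty (illustrative only):
\[
  g\bigl(\Delta(t)\bigr) = e^{\rho\,\Delta(t)} - 1, \qquad \rho > 0.
\]
```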
Among the common applications of plenoptic cameras are depth reconstruction and post-shot refocusing. These require a calibration relating the camera-side light field to that of the scene. Numerous methods with this goal have been developed based on thin lens models for the plenoptic camera's main lens and microlenses. Our work addresses the often-overlooked role of the main lens exit pupil in these models and specifically in the decoding process of standard plenoptic camera (SPC) images. We formally deduce the connection between the refocusing distance and the resampling parameter for the decoded light field and provide an analysis of the errors that arise when the exit pupil is not considered. In addition, previous work is revisited with respect to the exit pupil's role, and all theoretical results are validated through a ray-tracing-based simulation. With the public release of the evaluated SPC designs alongside our simulation and experimental data, we aim to contribute to a more accurate and nuanced understanding of plenoptic camera optics.
In NLP, zero-shot classification (ZSC) is the task of assigning labels to textual data without any labeled examples for the target classes. A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels. However, this approach faces certain challenges, particularly for languages with limited resources. In this paper, we propose an alternative solution that leverages dictionaries as a source of data for ZSC. We focus on Luxembourgish, a low-resource language spoken in Luxembourg, and construct two new topic relevance classification datasets based on a dictionary that provides various synonyms, word translations and example sentences. We evaluate the usability of our dataset and compare it with the NLI-based approach on two topic classification tasks in a zero-shot manner. Our results show that by using the dictionary-based dataset, the trained models outperform the ones following the NLI-based approach for ZSC. While we focus on a single low-resource language in this study, we believe that the efficacy of our approach can also transfer to other languages where such a dictionary is available.
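For reference, here is a minimal sketch of the NLI-based zero-shot baseline described above, using the Hugging Face transformers pipeline; the model checkpoint, the Luxembourgish example sentence, and the candidate labels are illustrative assumptions, not the paper's exact setup.

```python
# Minimal NLI-based zero-shot classification sketch with Hugging Face
# `transformers`. Model name, input sentence, and labels are illustrative.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",  # an assumed multilingual NLI checkpoint
)

# Hypothetical Luxembourgish input and candidate topics.
document = "D'Wieder zu Lëtzebuerg ass haut reenereg a kal."
candidate_labels = ["weather", "politics", "sports"]

result = classifier(document, candidate_labels=candidate_labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```

The dictionary-based alternative proposed in the abstract instead derives labeled training data from the dictionary's synonyms, translations, and example sentences, and trains topic relevance classifiers on that data rather than relying on an NLI checkpoint.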
High-speed interconnects, such as NVLink, are integral to modern multi-GPU systems, acting as a vital link between CPUs and GPUs. This study highlights the vulnerability of multi-GPU systems to covert and side channel attacks due to congestion on interconnects. An adversary can infer private information about a victim's activities by monitoring NVLink congestion without needing special permissions. Leveraging this insight, we develop a covert channel attack across two GPUs with a bandwidth of 45.5 kbps and a low error rate, and introduce a side channel attack enabling attackers to fingerprint applications through the shared NVLink interconnect.
We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural language as input, namely contextual bandits, where the context information consists of text. As a representative of the approaches with no uncertainty estimation, we consider an LLM bandit with a greedy policy, which picks the action corresponding to the largest predicted reward. We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty into a Thompson Sampling policy. We employ different techniques for uncertainty estimation, such as Laplace Approximation, Dropout, and Epinets. We empirically show on real-world data that the greedy policy performs worse than the Thompson Sampling policies. These findings suggest that, while overlooked in the LLM literature, uncertainty plays a fundamental role in bandit tasks with LLMs.
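As a rough illustration of the Thompson Sampling versus greedy distinction above, the sketch below uses Monte Carlo dropout over a small reward head on top of pre-computed text embeddings; the encoder, dimensions, and architecture are assumptions, not the paper's implementation (which also covers Laplace Approximation and Epinets).

```python
# Hedged sketch: Thompson Sampling via MC dropout vs. a greedy policy for a
# text-based contextual bandit. `context_emb` is assumed to come from an LLM
# encoder; the reward head and sizes below are illustrative.
import torch
import torch.nn as nn


class RewardHead(nn.Module):
    def __init__(self, dim: int, n_actions: int, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def thompson_action(head: RewardHead, context_emb: torch.Tensor) -> int:
    # Keep dropout active at inference: a single stochastic forward pass acts
    # as an approximate posterior sample over predicted per-action rewards.
    head.train()
    with torch.no_grad():
        return int(head(context_emb).argmax().item())


def greedy_action(head: RewardHead, context_emb: torch.Tensor) -> int:
    # The no-uncertainty baseline: dropout off, pick the largest predicted reward.
    head.eval()
    with torch.no_grad():
        return int(head(context_emb).argmax().item())


if __name__ == "__main__":
    head = RewardHead(dim=768, n_actions=4)
    ctx = torch.randn(768)  # stand-in for an LLM embedding of the context text
    print(thompson_action(head, ctx), greedy_action(head, ctx))
```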
The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark and the code for scoring have been open-sourced.
Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to address this gap by providing a comprehensive review of studies that go beyond task accuracy, offering deeper insights into the models' reasoning processes. Furthermore, we survey prevalent methodologies to evaluate the reasoning behavior of LLMs, emphasizing current trends and efforts towards more nuanced reasoning analyses. Our review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data, rather than on genuine reasoning abilities. Additionally, we identify the need for further research that delineates the key differences between human and LLM-based reasoning. Through this survey, we aim to shed light on the complex reasoning processes within LLMs.