In recent years, an increasing number of Human-Robot Interaction (HRI) approaches have been implemented and evaluated in Virtual Reality (VR), as it speeds up design iterations and makes it safer for end users to evaluate and master the HRI primitives. However, identifying the most suitable VR experience is not straightforward. In this work, we evaluate how immersive and non-immersive VR are perceived by users with respect to a speech act understanding task in a smart agriculture scenario. In particular, we collect opinions and suggestions from the 81 participants involved in both experiments to highlight the strengths and weaknesses of these different experiences.
A pivotal advancement in the progress of large language models (LLMs) is the emergence of Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer activated parameters, but they remain hard to deploy due to their immense total parameter sizes. Unlike previous weight pruning methods that rely on specifically designed hardware, this paper aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques. Specifically, we propose, to the best of our knowledge for the first time, post-training approaches for task-agnostic and task-specific expert pruning and skipping of MoE LLMs, tailored to improve deployment efficiency while maintaining model performance across a wide range of tasks. Extensive experiments show that our proposed methods can simultaneously reduce model sizes and increase inference speed while maintaining satisfactory performance. Data and code will be available at //github.com/Lucky-Lance/Expert_Sparsity.
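The abstract does not spell out its pruning criterion, so purely as an illustration of expert-level sparsification, the sketch below keeps the experts a layer's router selects most often on calibration data; the function name and the frequency criterion are assumptions, not necessarily the paper's method.

```python
import torch

def select_experts_to_keep(gate_logits: torch.Tensor, num_keep: int) -> torch.Tensor:
    """Rank the experts of one MoE layer by how often the router picks
    them on calibration data, and return the indices of the top `num_keep`.

    gate_logits: (num_tokens, num_experts) router outputs collected on
    a small calibration set.
    """
    top1 = gate_logits.argmax(dim=-1)                        # routed expert per token
    counts = torch.bincount(top1, minlength=gate_logits.shape[-1])
    keep = counts.topk(num_keep).indices.sort().values       # most frequently used experts
    return keep                                              # experts to retain; others are pruned

# Toy usage: a layer with 8 experts, pruned down to 4.
calib_logits = torch.randn(1000, 8)
print(select_experts_to_keep(calib_logits, num_keep=4))
```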
AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high-performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that most of the motivational principles that led to the development of PPO are less of a practical concern in RLHF and advocate for a less computationally expensive method that preserves and even increases performance. We revisit the formulation of alignment from human preferences in the context of RL. Keeping simplicity as a guiding principle, we show that many components of PPO are unnecessary in an RLHF context and that far simpler REINFORCE-style optimization variants outperform both PPO and newly proposed "RL-free" methods such as DPO and RAFT. Our work suggests that careful adaptation to the characteristics of LLM alignment enables benefiting from online RL optimization at low cost.
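For readers who want the contrast with PPO made concrete, below is a minimal sketch of the vanilla REINFORCE estimator on sequence-level rewards, using the batch mean reward as a simple baseline; the exact variants and baselines the paper evaluates may differ.

```python
import torch

def reinforce_loss(seq_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Vanilla REINFORCE written as a loss to minimize:
    -E[(R - b) * log pi(y|x)], with b the batch mean reward.

    seq_logprobs: (batch,) summed token log-probs of sampled completions
    rewards:      (batch,) scalar reward-model scores
    """
    advantage = rewards - rewards.mean()                 # variance-reducing baseline
    return -(advantage.detach() * seq_logprobs).mean()

# Toy usage with fake log-probs and rewards.
logprobs = torch.randn(4, requires_grad=True)
rewards = torch.tensor([0.2, 1.0, -0.3, 0.5])
reinforce_loss(logprobs, rewards).backward()
```

Note how little machinery is involved relative to PPO: no clipped ratio, no value network, no multiple optimization epochs per batch.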
The evolution of AI is set to profoundly reshape the future. The European Union, recognizing this impending prominence, has enacted the AI Act, regulating market access for AI-based systems. A salient feature of the Act is its aim to guard democratic and humanistic values by focusing regulation on transparency, explainability, and the human ability to understand and control AI systems. The EU AI Act thus does not merely specify technological requirements for AI systems. The EU issues a democratic call for human-centered AI systems and, in turn, an interdisciplinary research agenda for human-centered innovation in AI development. Without robust methods to assess AI systems and their effects on individuals and society, the EU AI Act may repeat the mistakes of the EU's General Data Protection Regulation, leading to rushed, chaotic, ad-hoc, and ambiguous implementation that causes more confusion than it lends guidance. Moreover, determined research activities in Human-AI interaction will be pivotal for both regulatory compliance and the advancement of AI in a manner that is both ethical and effective. Such an approach will ensure that AI development aligns with human values and needs, fostering a technology landscape that is innovative, responsible, and an integral part of our society.
We present the Benchmark of Information Retrieval (IR) tasks with Complex Objectives (BIRCO). BIRCO evaluates the ability of IR systems to retrieve documents given multi-faceted user objectives. The benchmark's complexity and compact size make it suitable for evaluating large language model (LLM)-based information retrieval systems. We present a modular framework for investigating factors that may influence LLM performance on retrieval tasks, and identify a simple baseline model that matches or outperforms existing approaches and more complex alternatives. No approach achieves satisfactory performance on all benchmark tasks, suggesting that stronger models and new retrieval protocols are necessary to address complex user needs.
Large Language Models (LLMs) such as GPT and Llama have demonstrated significant achievements in summarization tasks but struggle with factual inaccuracies, a critical issue in clinical NLP applications where errors could lead to serious consequences. To counter the high costs and limited availability of expert-annotated data for factual alignment, this study introduces an innovative pipeline that utilizes GPT-3.5 and GPT-4 to generate high-quality feedback aimed at enhancing factual consistency in clinical note summarization. Our research primarily focuses on edit feedback, mirroring the practical scenario in which medical professionals refine AI system outputs without the need for additional annotations. Despite GPT's proven expertise in various clinical NLP tasks, such as the Medical Licensing Examination, there is scant research on its capacity to deliver expert-level edit feedback for improving the generation quality of weaker LMs or LLMs. This work leverages GPT's advanced capabilities in clinical NLP to offer expert-level edit feedback. Through the use of two distinct alignment algorithms (DPO and SALT) based on GPT edit feedback, our goal is to reduce hallucinations and align closely with medical facts, endeavoring to narrow the divide between AI-generated content and factual accuracy. This highlights the substantial potential of GPT edits in enhancing the alignment of clinical factuality.
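Of the two alignment algorithms mentioned, DPO has a standard published closed form, sketched below; SALT is omitted, and treating the GPT-edited summary as the preferred response is our framing for illustration, not a detail stated in the abstract.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta: float = 0.1):
    """Standard DPO objective on (chosen, rejected) pairs; here the
    "chosen" response would be the edited, more factual summary.

    All inputs are summed sequence log-probs under the trainable policy
    (pi_*) and the frozen reference model (ref_*).
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Toy usage with fake log-probs for a batch of two pairs.
lp = lambda *v: torch.tensor(v)
print(dpo_loss(lp(-12.0, -9.5), lp(-14.0, -11.0), lp(-13.0, -10.0), lp(-13.5, -10.5)))
```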
Neural Information Retrieval (NIR) has significantly improved upon heuristic-based IR systems. Yet, failures remain frequent, with the models often unable to retrieve documents relevant to the user's query. We address this challenge by proposing a lightweight abstention mechanism tailored for real-world constraints, with particular emphasis placed on the reranking phase. We introduce a protocol for evaluating abstention strategies in a black-box scenario, demonstrating their efficacy, and propose a simple yet effective data-driven mechanism. We provide open-source code for experiment replication and abstention implementation, fostering wider adoption and application in diverse contexts.
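To make the idea of abstention at reranking time concrete, here is a toy threshold-based sketch; the thresholding rule and its calibration are assumptions for illustration, not necessarily the mechanism the paper proposes.

```python
import numpy as np

def rerank_with_abstention(scores: np.ndarray, threshold: float):
    """If even the best reranker score falls below a threshold calibrated
    on held-out queries, abstain rather than return a likely-irrelevant
    ranking; otherwise return document indices, best first."""
    if scores.max() < threshold:
        return None                          # abstain on this query
    return np.argsort(scores)[::-1]          # ranked document indices

# Toy usage: abstain on uniformly weak scores, rank otherwise.
print(rerank_with_abstention(np.array([0.12, 0.08, 0.10]), threshold=0.5))  # None
print(rerank_with_abstention(np.array([0.91, 0.40, 0.75]), threshold=0.5))  # [0 2 1]
```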
To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel. Unlike autoregressive decoding, Speculative Decoding facilitates the simultaneous decoding of multiple tokens per step, thereby accelerating inference. This paper presents a comprehensive overview and analysis of this promising decoding paradigm. We begin by providing a formal definition and formulation of Speculative Decoding. Then, we organize in-depth discussions on its key facets, such as drafter selection and verification strategies. Furthermore, we present a comparative analysis of leading methods under third-party testing environments. We aim for this work to serve as a catalyst for further research on Speculative Decoding, ultimately contributing to more efficient LLM inference.
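A minimal sketch of one draft-then-verify step follows, in a greedy variant kept simple for illustration: the general method verifies full token distributions via rejection sampling and scores all drafted positions in a single batched forward pass of the target model. The two callables are hypothetical interfaces, not a real library API.

```python
def speculative_step(draft_next, target_next, prefix, k):
    """One greedy draft-then-verify step of Speculative Decoding.

    draft_next / target_next: callables mapping a token prefix to the
    next token under the small draft model and the large target model.
    """
    # 1) Draft k future tokens cheaply with the small model.
    ctx, drafted = list(prefix), []
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)
    # 2) Verify: accept the longest prefix of the draft that the target
    #    model would have generated itself.
    accepted, ctx = [], list(prefix)
    for t in drafted:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # 3) The target model always contributes at least one token, so each
    #    step emits between 1 and k+1 tokens.
    accepted.append(target_next(list(prefix) + accepted))
    return accepted

# Toy usage: the draft model agrees with the target only on short contexts,
# so two drafted tokens are accepted plus one target token: [1, 2, 0].
target = lambda ctx: len(ctx) % 3
draft = lambda ctx: len(ctx) % 3 if len(ctx) < 3 else 99
print(speculative_step(draft, target, prefix=[0], k=4))
```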
Bias benchmarks are a popular method for studying the negative impacts of bias in LLMs, yet there has been little empirical investigation of whether these benchmarks are actually indicative of how harm may manifest in the real world. In this work, we study the correspondence between such decontextualized "trick tests" and evaluations that are more grounded in Realistic Use and Tangible Effects (i.e., RUTEd evaluations). We explore this correlation in the context of gender-occupation bias, a popular genre of bias evaluation. We compare three decontextualized evaluations adapted from the current literature to three analogous RUTEd evaluations applied to long-form content generation. We conduct each evaluation for seven instruction-tuned LLMs. For the RUTEd evaluations, we conduct repeated trials of three text generation tasks: children's bedtime stories, user personas, and English language learning exercises. We found no correspondence between trick tests and RUTEd evaluations. Specifically, selecting the least biased model based on the decontextualized results coincides with selecting the model with the best performance on RUTEd evaluations only as often as random chance. We conclude that evaluations not grounded in realistic use are likely insufficient to assess and mitigate bias and real-world harms.
We generalize the technique behind Farhi et al.'s 0.6924-approximation result for the Max-Cut Quantum Approximate Optimization Algorithm (QAOA) on 3-regular graphs to obtain provable lower bounds on the approximation ratio of warm-started QAOA. Given an initialization angle $\theta$, we consider warm-starts in which the initial state is a product state where each qubit is positioned at angle $\theta$ away from either the north or the south pole of the Bloch sphere; which of the two positions each qubit takes is decided by some classically obtained cut encoded as a bitstring $b$. We illustrate through plots how the properties of $b$ and the initialization angle $\theta$ influence the bound on the approximation ratio of warm-started QAOA. We consider various classical algorithms and the cuts they produce, which we use to generate the warm-starts. Our results strongly suggest that no choice of initialization angle yields a (worst-case) approximation ratio that simultaneously beats both standard QAOA and the classical algorithm used to create the warm-start. Additionally, we show that at $\theta=60^\circ$, warm-started QAOA is able to (effectively) recover the cut used to generate the warm-start, suggesting that in practice this value could be a promising starting angle for exploring alternate solutions in a heuristic fashion. Finally, for any combinatorial optimization problem with integer-valued objective values, we provide bounds on the circuit depth required for warm-started QAOA to achieve a given change in approximation ratio; more specifically, we show that for small $\theta$, the bound is roughly proportional to $1/\theta$.
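Concretely, the warm-start described above corresponds to the product state below, reconstructed directly from the abstract's description; any relative phases the paper may additionally fix are omitted here.

```latex
% Warm-start for a classically obtained cut b in {0,1}^n and angle theta:
% qubit i is rotated angle theta away from the north pole (b_i = 0) or
% the south pole (b_i = 1) of the Bloch sphere.
\[
  \lvert \psi_{b,\theta} \rangle = \bigotimes_{i=1}^{n} \lvert \phi_i \rangle,
  \qquad
  \lvert \phi_i \rangle =
  \begin{cases}
    \cos\tfrac{\theta}{2}\,\lvert 0 \rangle + \sin\tfrac{\theta}{2}\,\lvert 1 \rangle, & b_i = 0,\\[2pt]
    \sin\tfrac{\theta}{2}\,\lvert 0 \rangle + \cos\tfrac{\theta}{2}\,\lvert 1 \rangle, & b_i = 1.
  \end{cases}
\]
```

At $\theta=0^\circ$ this reduces to the computational basis state encoding the cut $b$, while at $\theta=90^\circ$ every qubit is $\lvert + \rangle$ and the standard QAOA initial state is recovered, which is why $\theta$ interpolates between trusting the classical cut and the unbiased start.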
With the spark ignited by ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for the longer-context prompts commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We first delineate and analyze the problems of handling long-context input and output with current Transformer-based models. We then offer a holistic taxonomy to navigate the landscape of architectural upgrades to the Transformer that address these problems. Afterward, we investigate widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, systems, and compilers that augment LLMs' efficiency and efficacy across different stages. Finally, we discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at //github.com/Strivin0311/long-llms-learning.