Providing natural language explanations for recommendations is particularly useful from the perspective of a non-expert user. Although several methods for providing such explanations have recently been proposed, we argue that an important aspect of explanation quality has been overlooked in their experimental evaluation. Specifically, the coherence between generated text and predicted rating, which is a necessary condition for an explanation to be useful, is not properly captured by currently used evaluation measures. In this paper, we highlight the issue of explanation and prediction coherence by (1) presenting results from a manual verification of explanations generated by one of the state-of-the-art approaches, (2) proposing a method of automatic coherence evaluation, (3) introducing a new transformer-based method that aims to produce more coherent explanations than the state-of-the-art approaches, and (4) performing an experimental evaluation which demonstrates that this method significantly improves explanation coherence without affecting the other aspects of recommendation performance.
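
To make the notion of coherence between generated text and predicted rating concrete, here is a minimal sketch. It is not the evaluation method proposed in the paper, whose details are not given in this abstract. It assumes a 1-5 rating scale with midpoint 3 and uses a tiny keyword lexicon as a stand-in for a real sentiment classifier; the names `POSITIVE`, `NEGATIVE`, `text_polarity`, `rating_polarity`, and `coherence_rate` are all hypothetical.

```python
# Hypothetical stand-in lexicon; a real evaluation would use a trained classifier.
POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"bad", "poor", "terrible", "disappointing"}


def text_polarity(text):
    """Return +1, 0, or -1 from a simple count of lexicon words (assumption)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def rating_polarity(rating, midpoint=3.0):
    """Return +1, 0, or -1 depending on which side of the midpoint the rating is."""
    return (rating > midpoint) - (rating < midpoint)


def coherence_rate(pairs):
    """Fraction of (predicted_rating, explanation) pairs whose polarities agree."""
    agree = sum(text_polarity(text) == rating_polarity(rating) for rating, text in pairs)
    return agree / len(pairs)


# Illustrative, made-up pairs: the second explanation contradicts its low rating.
pairs = [
    (4.6, "great food and excellent service"),
    (1.8, "the room was great"),
    (2.0, "poor battery life"),
]
print(coherence_rate(pairs))
```

A text-overlap metric could score the second explanation highly even though it contradicts the predicted rating, which is the kind of gap the abstract describes.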

Related content

state-of-the-art

Language modeling · MoDELS · Performer · Large language models · Processing (programming language)

February 7, 2024

The Landscape and Challenges of HPC Research and LLMs

Le Chen,Nesreen K. Ahmed,Akash Dutta,Arijit Bhattacharjee,Sixing Yu,Quazi Ishtiaque Mahmud,Waqwoya Abebe,Hung Phan,Aishwarya Sarkar,Branden Butler,Niranjan Hasabnis,Gal Oren,Vy A. Vo,Juan Pablo Munoz,Theodore L. Willke,Tim Mattson,Ali Jannesari

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. In this paper, we posit that adapting and utilizing such language model-based techniques for tasks in high-performance computing (HPC) would be very beneficial. This study presents our reasoning behind the aforementioned position and highlights how existing ideas can be improved and adapted for HPC tasks.

Large language models · Biased · Agent · MoDELS · Statistic

February 6, 2024

Systematic Biases in LLM Simulations of Debates

Amir Taubenfeld,Yaniv Dover,Roi Reichart,Ariel Goldstein

Recent advancements in natural language processing, especially the emergence of Large Language Models (LLMs), have opened exciting possibilities for constructing computational simulations designed to replicate human behavior accurately. However, LLMs are complex statistical learners without straightforward deductive rules, making them prone to unexpected behaviors. In this study, we highlight the limitations of LLMs in simulating human interactions, particularly focusing on LLMs' ability to simulate political debates. Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases despite being directed to debate from certain political perspectives. This tendency results in behavioral patterns that seem to deviate from well-established social dynamics among humans. We reinforce these observations using an automatic self-fine-tuning method, which enables us to manipulate the biases within the LLM and demonstrate that agents subsequently align with the altered biases. These results underscore the need for further research to develop methods that help agents overcome these biases, a critical step toward creating more realistic simulations.

Language modeling · Large language models · Identifiable · MoDELS · Extensibility

February 6, 2024

On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models

Thilini Wijesiriwardene,Ruwan Wickramarachchi,Aishwarya Naresh Reganti,Vinija Jain,Aman Chadha,Amit Sheth,Amitava Das

from arxiv, To appear in Findings of EACL 2024

The ability of Large Language Models (LLMs) to encode syntactic and semantic structures of language is well examined in NLP. Additionally, analogy identification in the form of word analogies has been extensively studied in the last decade of language modeling literature. In this work, we specifically look at how LLMs' abilities to capture sentence analogies (sentences that convey analogous meaning to each other) vary with LLMs' abilities to encode syntactic and semantic structures of sentences. Through our analysis, we find that LLMs' ability to identify sentence analogies is positively correlated with their ability to encode syntactic and semantic structures of sentences. Specifically, we find that the LLMs that capture syntactic structures better also have higher abilities in identifying sentence analogies.

Large language models · MoDELS · Language modeling · Prompt · Processing (programming language)

February 5, 2024

Fundamental Limitations of Alignment in Large Language Models

Yotam Wolf,Noam Wies,Oshri Avnery,Yoav Levine,Amnon Shashua

An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. Importantly, we prove that within the limits of this framework, for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with a probability that increases with the length of the prompt. This implies that any alignment process that attenuates an undesired behavior but does not remove it altogether is not safe against adversarial prompting attacks. Furthermore, our framework hints at the mechanism by which leading alignment approaches such as reinforcement learning from human feedback make the LLM prone to being prompted into the undesired behaviors. This theoretical result is being experimentally demonstrated at large scale by the contemporary so-called "ChatGPT jailbreaks", where adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona. Our results expose fundamental limitations in alignment of LLMs and bring to the forefront the need to devise reliable mechanisms for ensuring AI safety.

Convolution · Performance · Lecture notes · Code · Paper

February 5, 2024

Algorithms for Computing the Free Distance of Convolutional Codes

Zita Abreu,Joachim Rosenthal,Michael Schaller

The free distance of a convolutional code is a reliable indicator of its performance. However, its computation is not an easy task. In this paper, we present some algorithms to compute the free distance with good efficiency that work for convolutional codes of all rates and over any field. Furthermore, we discuss why an algorithm which is claimed to be very efficient is incorrect.
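
For readers unfamiliar with the quantity, here is a minimal sketch of what the free distance is. It is a textbook shortest-path search over the encoder state diagram, not one of the algorithms from the paper, and it only covers the simplest case: binary, rate 1/n, feedforward encoders. The function names and the bitmask representation of the generators are assumptions made for this illustration.

```python
import heapq


def output_weight(register, generators):
    """Hamming weight of the n output bits for the given shift-register content."""
    return sum(bin(register & g).count("1") % 2 for g in generators)


def free_distance(generators, memory):
    """Minimum weight of a path that leaves the all-zero state and returns to it.

    generators: bitmasks over memory + 1 register bits (binary, rate 1/n).
    memory: number of delay elements in the encoder.
    """
    mask = (1 << memory) - 1
    # Force the first input bit to 1 so the path actually leaves the zero state.
    first_register = 1
    start_state = first_register & mask
    start_weight = output_weight(first_register, generators)
    best = {start_state: start_weight}
    heap = [(start_weight, start_state)]
    # Dijkstra's algorithm: branch weights are non-negative output weights.
    while heap:
        weight, state = heapq.heappop(heap)
        if state == 0:
            return weight  # first return to the zero state is the cheapest one
        if weight > best.get(state, float("inf")):
            continue  # stale heap entry
        for bit in (0, 1):
            register = (state << 1) | bit
            next_state = register & mask
            candidate = weight + output_weight(register, generators)
            if candidate < best.get(next_state, float("inf")):
                best[next_state] = candidate
                heapq.heappush(heap, (candidate, next_state))
    return None


# The classic (7, 5) octal code with memory 2 has free distance 5.
print(free_distance([0b111, 0b101], memory=2))  # → 5
```

The search visits up to 2^memory states, so it is only practical for small encoders; handling arbitrary rates and fields efficiently is the problem the paper addresses.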

Performer · Similarity · Machine Translation · MoDELS · Reviewer

February 4, 2024

Predicting Machine Translation Performance on Low-Resource Languages: The Role of Domain Similarity

Eric Khiu,Hasti Toossi,David Anugraha,Jinyu Liu,Jiaxu Li,Juan Armando Parra Flores,Leandro Acros Roman,A. Seza Doğruöz,En-Shiun Annie Lee

from arxiv, 13 pages, 5 figures, accepted to EACL 2024, findings

Fine-tuning and testing a multilingual large language model is expensive and challenging for low-resource languages (LRLs). While previous studies have predicted the performance of natural language processing (NLP) tasks using machine learning methods, they primarily focus on high-resource languages, overlooking LRLs and shifts across domains. Focusing on LRLs, we investigate three factors: the size of the fine-tuning corpus, the domain similarity between fine-tuning and testing corpora, and the language similarity between source and target languages. We employ classical regression models to assess how these factors impact the model's performance. Our results indicate that domain similarity has the most critical impact on predicting the performance of Machine Translation models.

MoDELS · Output · Language modeling · Large language models · Few-shot learning

February 2, 2024

Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

Angelica Chen,Jason Phang,Alicia Parrish,Vishakh Padmakumar,Chen Zhao,Samuel R. Bowman,Kyunghyun Cho

from arxiv, Accepted to TMLR: //openreview.net/forum?id=5nBqY1y96B

Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criterion for valid multi-step reasoning in tasks where the solution is composed of the answers to multiple sub-steps. We propose two types of self-consistency that are particularly important for multi-step reasoning -- hypothetical consistency (a model's ability to predict what its output would be in a hypothetical other context) and compositional consistency (consistency of a model's final outputs when intermediate sub-steps are replaced with the model's outputs for those steps). We demonstrate that multiple variants of the GPT-3/-4 models exhibit poor consistency rates across both types of consistency on a variety of tasks.
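
The definition of compositional consistency can be illustrated with a small sketch. This is not the paper's evaluation protocol: the "model" is a hypothetical lookup table of canned answers (`CANNED`, `toy_model`), and the case format of full prompt, sub-step prompt, and a template with a `{sub}` placeholder is an assumption made for this example.

```python
def compositional_consistency(model, cases):
    """Fraction of cases where the final answer is unchanged after a sub-step
    is replaced by the model's own answer for that sub-step."""
    consistent = 0
    for full_prompt, sub_prompt, template in cases:
        direct = model(full_prompt)          # answer to the original problem
        sub_answer = model(sub_prompt)       # model's answer to the sub-step
        composed = model(template.format(sub=sub_answer))
        consistent += direct == composed
    return consistent / len(cases)


# Hypothetical canned answers standing in for a real LLM call.
CANNED = {
    "(2+3)*4": "20", "2+3": "5", "5*4": "20",
    "(1+1)*3": "6", "1+1": "2", "2*3": "7",  # last entry is deliberately inconsistent
}


def toy_model(prompt):
    return CANNED[prompt]


cases = [
    ("(2+3)*4", "2+3", "{sub}*4"),
    ("(1+1)*3", "1+1", "{sub}*3"),
]
print(compositional_consistency(toy_model, cases))  # → 0.5
```

In the second case the toy model answers the full problem correctly, so a correctness-only evaluation would miss that its answer changes once the sub-step is substituted.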

Large language models · Language modeling · Homogeneous · Less · MoDELS

February 2, 2024

Homogenization Effects of Large Language Models on Human Creative Ideation

Barrett R. Anderson,Jash Hemant Shah,Max Kreminski

from arxiv, 20 pages, 7 figures

Large language models (LLMs) are now being used in a wide variety of contexts, including as creativity support tools (CSTs) intended to help their users come up with new ideas. But do LLMs actually support user creativity? We hypothesized that the use of an LLM as a CST might make the LLM's users feel more creative, and even broaden the range of ideas suggested by each individual user, but also homogenize the ideas suggested by different users. We conducted a 36-participant comparative user study and found, in accordance with the homogenization hypothesis, that different users tended to produce less semantically distinct ideas with ChatGPT than with an alternative CST. Additionally, ChatGPT users generated a greater number of more detailed ideas, but felt less responsible for the ideas they generated. We discuss potential implications of these findings for users, designers, and developers of LLM-based CSTs.

Language modeling · Knowledge · MoDELS · HTTPS · Directed

October 11, 2023

How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances

Zihan Zhang,Meng Fang,Ling Chen,Mohammad-Reza Namazi-Rad,Jun Wang

from arxiv, EMNLP 2023 main conference, paper link at //github.com/hyintell/awesome-refreshing-llms

Although large language models (LLMs) are impressive in solving various tasks, they can quickly become outdated after deployment. Maintaining their up-to-date status is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with the ever-changing world knowledge without re-training from scratch. We categorize research works systematically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at //github.com/hyintell/awesome-refreshing-llms

Performer · Machine Learning · Model performance · MoDELS · Processing (programming language)

August 2, 2021

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu,Luwei Xiao,Yixuan Sun,Junhang Zhang,Tianlong Ma,Liang He

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system-independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field along with their technical strengths and weaknesses, and we provide a simple classification and discussion of work in natural language processing, computer vision, and other areas. In addition, we outline some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.