Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models on generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, wherein we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5% of the ChatGPT querying cost associated with the latter. We release the generated dataset and the prompts used to facilitate future research. The data and code will be available on //github.com/yueyu1030/AttrPrompt.
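The contrast between a class-conditional prompt and an attributed prompt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the prompt wording, and the attribute values are hypothetical and are not taken from the AttrPrompt code release.

```python
import itertools
import random


def class_conditional_prompt(label: str) -> str:
    """Baseline: the prompt depends only on the class label."""
    return f"Write a news article about {label}."


def attributed_prompts(label: str, attributes: dict, k: int, seed: int = 0) -> list:
    """Sample k distinct prompts, each fixing one value per attribute.

    `attributes` maps an attribute name (e.g. "length") to its candidate
    values. Both the names and the values are illustrative assumptions.
    """
    rng = random.Random(seed)
    names = sorted(attributes)
    # Enumerate every combination of attribute values, then sample without replacement.
    combos = list(itertools.product(*(attributes[n] for n in names)))
    chosen = rng.sample(combos, min(k, len(combos)))
    prompts = []
    for combo in chosen:
        spec = "; ".join(f"{n}: {v}" for n, v in zip(names, combo))
        prompts.append(
            f"Write a news article about {label}. Follow these attributes: {spec}."
        )
    return prompts


if __name__ == "__main__":
    attrs = {"length": ["short", "long"], "style": ["formal", "casual"]}
    print(class_conditional_prompt("sports"))
    for p in attributed_prompts("sports", attrs, k=3):
        print(p)
```

Sampling attribute combinations rather than repeating one fixed prompt is what spreads the generated examples across lengths and styles. How the attribute values are chosen in the paper is not described in the abstract, so it is left out here.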

Related content

Diversity

Language modeling · INFORMS · Graph · MoDELS · Reducible

August 21, 2023

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler

We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
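The idea of thoughts as vertices and dependencies as edges can be illustrated with a small sorting example. This is a minimal sketch, not the authors' framework: the `Thought` class and the `split`, `refine`, and `aggregate` helpers are hypothetical names, and the LLM calls are replaced by plain Python functions so that the example runs on its own.

```python
from dataclasses import dataclass, field
from heapq import merge


@dataclass
class Thought:
    """A vertex in the graph; `parents` holds the edges to thoughts it depends on."""
    content: list
    parents: list = field(default_factory=list)


def split(root: Thought, parts: int) -> list:
    """Generate several child thoughts, each holding one chunk of the input."""
    size = max(1, -(-len(root.content) // parts))  # ceiling division, at least 1
    return [
        Thought(root.content[i:i + size], [root])
        for i in range(0, len(root.content), size)
    ]


def refine(thought: Thought) -> Thought:
    """Stand-in for an LLM call that sorts one chunk."""
    return Thought(sorted(thought.content), [thought])


def aggregate(thoughts: list) -> Thought:
    """Combine several thoughts into one; this vertex has multiple parents,
    which a chain or a tree of thoughts cannot express."""
    merged = list(merge(*(t.content for t in thoughts)))
    return Thought(merged, list(thoughts))


if __name__ == "__main__":
    root = Thought([5, 2, 9, 1, 7, 3])
    chunks = [refine(c) for c in split(root, parts=3)]
    final = aggregate(chunks)
    print(final.content, "parents:", len(final.parents))
```

The aggregation step is what turns the structure into a general graph: the final thought depends on three parents at once.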

Performer · AI · INFORMS · Comprehensibility · Analysis

August 21, 2023

Exploring AI Tool's Versatile Responses: An In-depth Analysis Across Different Industries and Its Performance Evaluation

Hitesh Mohapatra, Soumya Ranjan Mishra

AI Tool is a large language model (LLM) designed to generate human-like responses in natural language conversations. It is trained on a massive corpus of text from the internet, which allows it to leverage a broad understanding of language, general knowledge, and various domains. AI Tool can provide information, engage in conversations, assist with tasks, and even offer creative suggestions. The underlying technology behind AI Tool is a transformer neural network. Transformers excel at capturing long-range dependencies in text, making them well-suited for language-related tasks. AI Tool has 175 billion parameters, making it one of the largest and most powerful LLMs to date. This work presents an overview of AI Tool's responses on various sectors of industry. Further, the responses of AI Tool have been cross-verified with human experts in the corresponding fields. To validate the performance of AI Tool, a few explicit parameters have been considered and the evaluation has been done. This study will help the research community and other users to understand the uses of AI Tool and its interaction pattern. The results of this study show that AI Tool is able to generate human-like responses that are both informative and engaging. However, it is important to note that AI Tool can occasionally produce incorrect or nonsensical answers. It is therefore important to critically evaluate the information that AI Tool provides and to verify it from reliable sources when necessary. Overall, this study suggests that AI Tool is a promising new tool for natural language processing, and that it has the potential to be used in a wide variety of applications.

Code · Robustness · Language modeling · MoDELS · Comprehensibility

August 20, 2023

A Study on Robustness and Reliability of Large Language Model Code Generation

Li Zhong, Zilong Wang

Recently, large language models (LLMs) have shown extraordinary ability in understanding natural language and generating programming code. It has become common practice for software engineers to consult LLMs when they encounter coding questions. Although efforts have been made to avoid syntax errors and align the code with the intended semantics, the reliability and robustness of code generation from LLMs have not yet been thoroughly studied. Executable code is not equivalent to reliable and robust code, especially in the context of real-world software development. The misuse of APIs in the generated code could lead to severe problems, such as resource leaks and program crashes. To make things worse, the users of LLM code generation services are the developers most vulnerable to code that merely seems right: they are always novice developers who are not familiar with the APIs that LLMs generate code for. Therefore, they can hardly spot the misuse in the code generated by LLMs, which further facilitates the application of incorrect code in real-world software. Existing code evaluation benchmarks and datasets focus on crafting small tasks such as programming questions in coding interviews, which deviates from the problems that developers would ask LLMs about for real-world coding help. To fill the missing piece, in this work we propose RobustAPI, a dataset for evaluating the reliability and robustness of code generated by LLMs. We collect 1208 coding questions from StackOverflow on 24 representative Java APIs. We summarize the common misuse patterns of these APIs and evaluate them on current popular LLMs. The evaluation results show that even for GPT-4, 62% of the generated code contains API misuses, which would cause unexpected consequences if the code were introduced into real-world software.

Knowledge · Language modeling · Performer · MoDELS · Range

August 19, 2023

FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models

Liwen Zhang, Weige Cai, Zhaowei Liu, Zhi Yang, Wei Dai, Yujie Liao, Qianru Qin, Yifei Li, Xingyu Liu, Zhiqiang Liu, Zhoufan Zhu, Anbo Wu, Xin Guo, Yun Chen

Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, yet their efficacy in more challenging and domain-specific tasks remains largely unexplored. This paper presents FinEval, a benchmark specifically designed to evaluate financial domain knowledge in LLMs. FinEval is a collection of high-quality multiple-choice questions covering Finance, Economy, Accounting, and Certificate. It includes 4,661 questions spanning 34 different academic subjects. To ensure a comprehensive evaluation of model performance, FinEval employs a range of prompt types, including zero-shot and few-shot prompts, as well as answer-only and chain-of-thought prompts. We evaluate state-of-the-art Chinese and English LLMs on FinEval, and the results show that only GPT-4 achieves an accuracy close to 70% across different prompt settings, indicating significant room for growth for LLMs in financial domain knowledge. Our work offers a more comprehensive financial knowledge evaluation benchmark, utilizing data from mock exams and covering a wide range of evaluated LLMs.
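The four prompt settings named in the abstract (zero-shot or few-shot, combined with answer-only or chain-of-thought) can be sketched as one prompt builder. The wording of the prompts, the function names, and the sample question below are all assumptions made for illustration; FinEval's actual templates are not given in the abstract. As a simplification, the few-shot examples here carry only an answer, not a worked rationale.

```python
def format_question(question: str, choices: list) -> str:
    """Render a multiple-choice question with lettered options A-D."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip("ABCD", choices)]
    return "\n".join(lines)


def build_prompt(question: str, choices: list, shots=(), chain_of_thought: bool = False) -> str:
    """Build one of four prompt types.

    shots            -- iterable of (question, choices, answer) examples;
                        empty means zero-shot, non-empty means few-shot.
    chain_of_thought -- if True, ask for step-by-step reasoning;
                        otherwise ask for the answer only.
    """
    suffix = "Let's think step by step." if chain_of_thought else "Answer:"
    blocks = [
        f"{format_question(q, c)}\nAnswer: {a}"
        for q, c, a in shots
    ]
    blocks.append(f"{format_question(question, choices)}\n{suffix}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical example item, not drawn from the benchmark.
    shot = ("Which statement reports revenue?",
            ["Balance sheet", "Income statement", "Cash flow", "Notes"], "B")
    q = "Which ratio measures short-term liquidity?"
    opts = ["Current ratio", "P/E ratio", "Debt-to-equity", "ROE"]
    print(build_prompt(q, opts, shots=[shot], chain_of_thought=False))
```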

Knowledge · Performer · AIM · Language modeling · Dataset

August 19, 2023

Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs

Suhang Wu, Minlong Peng, Yue Chen, Jinsong Su, Mingming Sun

Large language models (LLMs) possess a wealth of knowledge encoded in their parameters. However, this knowledge may become outdated or unsuitable over time. As a result, there has been a growing interest in knowledge editing for LLMs and evaluating its effectiveness. Existing studies primarily focus on knowledge editing using factual triplets, which not only incur high costs for collection but also struggle to express complex facts. Furthermore, these studies are often limited in their evaluation perspectives. In this paper, we propose Eva-KELLM, a new benchmark for evaluating knowledge editing of LLMs. This benchmark includes an evaluation framework and a corresponding dataset. Under our framework, we first ask the LLM to perform knowledge editing using raw documents, which provides a more convenient and universal approach compared to using factual triplets. We then evaluate the updated LLM from multiple perspectives. In addition to assessing the effectiveness of knowledge editing and the retention of unrelated knowledge from conventional studies, we further test the LLM's ability in two aspects: 1) Reasoning with the altered knowledge, aiming for the LLM to genuinely learn the altered knowledge instead of simply memorizing it. 2) Cross-lingual knowledge transfer, where the LLM updated with raw documents in one language should be capable of handling queries from another language. To facilitate further research, we construct and release the corresponding dataset. Using this benchmark, we investigate the effectiveness of several commonly-used knowledge editing methods. Experimental results indicate that the current methods for knowledge editing using raw documents are not effective in yielding satisfactory results, particularly when it comes to reasoning with altered knowledge and cross-lingual knowledge transfer.

Language modeling · MoDELS · ChatGPT · Log-likelihood · Dataset

August 18, 2023

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

Rishabh Bhardwaj, Soujanya Poria

Large language models (LLMs) have taken the world by storm with their massive multi-tasking capabilities, obtained simply by optimizing a next-word prediction objective. With the emergence of their properties and encoded knowledge, the risk of LLMs producing harmful outputs increases, making them unfit for scalable deployment to the public. In this work, we propose a new safety evaluation benchmark, RED-EVAL, that carries out red-teaming. We show that even widely deployed models are susceptible to Chain of Utterances-based (CoU) prompting, which jailbreaks closed-source LLM-based systems such as GPT-4 and ChatGPT into unethically responding to more than 65% and 73% of harmful queries, respectively. We also demonstrate the consistency of RED-EVAL across 8 open-source LLMs, which generate harmful responses in more than 86% of the red-teaming attempts. Next, we propose RED-INSTRUCT, an approach for the safety alignment of LLMs. It consists of two phases: 1) HARMFULQA data collection: leveraging CoU prompting, we collect a dataset that consists of 1.9K harmful questions covering a wide range of topics, along with 9.5K safe and 7.3K harmful conversations from ChatGPT; 2) SAFE-ALIGN: we demonstrate how the conversational dataset can be used for the safety alignment of LLMs by minimizing the negative log-likelihood over helpful responses and penalizing harmful responses by gradient ascent over the sample loss. Our model STARLING, a fine-tuned Vicuna-7B, is observed to be more safely aligned when evaluated on the RED-EVAL and HHH benchmarks while preserving the utility of the baseline models (TruthfulQA, MMLU, and BBH).
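The SAFE-ALIGN objective described above, descending on the loss of helpful responses while ascending on the loss of harmful ones, can be written as a single scalar to minimize. The sketch below is an assumption about the general shape only: the function names, the `harm_weight` parameter, and the simple subtraction are hypothetical, and the paper's exact formulation may differ.

```python
def nll(token_logprobs: list) -> float:
    """Mean negative log-likelihood of one response, given per-token log-probabilities."""
    return -sum(token_logprobs) / len(token_logprobs)


def combined_loss(helpful_logprobs: list, harmful_logprobs: list, harm_weight: float = 0.1) -> float:
    """Scalar to minimize.

    Minimizing the first term makes helpful responses more likely.
    The second term enters with a negative sign, so minimizing the total
    performs gradient ascent on the harmful-response loss, which makes
    harmful responses less likely. `harm_weight` is an assumed knob.
    """
    return nll(helpful_logprobs) - harm_weight * nll(harmful_logprobs)


if __name__ == "__main__":
    helpful = [-0.1, -0.2]   # log-probs the model assigns to a helpful response
    harmful = [-2.0, -3.0]   # log-probs the model assigns to a harmful response
    print(combined_loss(helpful, harmful, harm_weight=0.5))
```

Because the harmful term is unbounded below, a small weight or some form of clipping is a common precaution when an ascent term like this is used; whether and how the paper bounds it is not stated in the abstract.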

MoDELS · Language modeling · HTTPS · Mathematics · Extensibility

August 18, 2023

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

from arxiv, LLM, Mathematical Reasoning

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at //github.com/nlpxucan/WizardLM and //huggingface.co/WizardLM.

Robustness · Normalized · Language modeling · MoDELS · Latent

August 18, 2023

Latent Jailbreak: A Test Suite for Evaluating Both Text Safety and Output Robustness of Large Language Models

Huachuan Qiu, Shuai Zhang, Anqi Li, Hongliang He, Zhenzhong Lan

from arxiv, Code and data are available at //github.com/qiuhuachuan/latent-jailbreak

Considerable research efforts have been devoted to ensuring that large language models (LLMs) align with human values and generate safe text. However, an excessive focus on sensitivity to certain topics can compromise the model's robustness in following instructions, thereby impacting its overall performance in completing tasks. Previous benchmarks for jailbreaking LLMs have primarily focused on evaluating the safety of the models without considering their robustness. In this paper, we propose a benchmark that assesses both the safety and robustness of LLMs, emphasizing the need for a balanced approach. To comprehensively study text safety and output robustness, we introduce a latent jailbreak prompt dataset in which each prompt embeds a malicious instruction. Specifically, we instruct the model to complete a regular task, such as translation, with the text to be translated containing malicious instructions. To further analyze safety and robustness, we design a hierarchical annotation framework. We present a systematic analysis of the safety and robustness of LLMs regarding the position of explicit normal instructions, word replacements (verbs in explicit normal instructions, target groups in malicious instructions, cue words for explicit normal instructions), and instruction replacements (different explicit normal instructions). Our results demonstrate that current LLMs not only prioritize certain instruction verbs but also exhibit varying jailbreak rates for different instruction verbs in explicit normal instructions. Code and data are available at //github.com/qiuhuachuan/latent-jailbreak.

Language modeling · Taxonomy · MoDELS · Motivation · Reviewer

May 31, 2023

Beyond One-Model-Fits-All: A Survey of Domain Specialization for Large Language Models

Chen Ling, Xujiang Zhao, Jiaying Lu, Chengyuan Deng, Can Zheng, Junxiang Wang, Tanmoy Chowdhury, Yun Li, Hejie Cui, Xuchao Zhang, Tianjiao Zhao, Amit Panalkar, Wei Cheng, Haoyu Wang, Yanchi Liu, Zhengzhang Chen, Haifeng Chen, Chris White, Quanquan Gu, Carl Yang, Liang Zhao

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers has motivated people to extend their functionality well beyond that of a "chatbot" and to use them as assistants to, or even replacements for, domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill this gap, a rapidly growing body of research and practice on the domain specialization of LLMs has emerged in recent years, which calls for a comprehensive and systematic review to better summarize and guide this promising area. In this survey paper, we first propose a systematic taxonomy that categorizes LLM domain-specialization techniques based on the accessibility of LLMs and summarizes the framework for all the subcategories as well as their relations to and differences from each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.

Processing (programming language) · MoDELS · NLP · Taxonomy · Language representation

March 18, 2020

Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang

from arxiv, Invited Review of Science China Technological Sciences

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.