丰满人妻被公侵犯高清版_99国产精品久久久久99打野战_美女一级在线观看网站国产_亚洲欧美日韩精品一区二区在线_国产欧美日韩另类色视频_日本一本二本三区中文字幕_无码精品人妻AV一区二区三区

The open-source publishing of large language models (LLMs) has created many possibilities for how anyone who understands language and has access to a computer can interact with significant tools of artificial intelligence, particularly in the context of learning and knowledge dissemination. However, the utility of these models in specialized fields like Classics is still largely unexplored. This project is an attempt to merge the knowledge of Classics with the capabilities of artificial intelligence by finetuning an LLM to cater to the specific needs of learners and professionals. The goal of this project is to develop an LLM that not only reproduces contextual knowledge accurately but also exhibits a consistent "personality" - and, indeed, has consistent propriety - to appeal to a diverse audience who possess differing levels of knowledge. A significant portion of this project was dedicated to refining the dataset, following the principle of "garbage in, garbage out," to ensure the model generates relevant, useful, and creative responses when given a prompt (a statement, question, or single word). After training and evaluation, my model's ability to handle a vast array of different types of inputs and prompting exceeded expectations for a 355M parameter model, though its occasional hallucinations (especially when set with a high temperature), particularly in its assertions about historical events or its own identity, make it seem somewhat capricious and more work in the form of continuous finetuning will be undertaken.

相關內容

大語言模型

關注 56

大語言模型是基于海量文本數據訓練的深度學習模型。它不僅能夠生成自然語言文本，還能夠深入理解文本含義，處理各種自然語言任務，如文本摘要、問答、翻譯等。2023年，大語言模型及其在人工智能領域的應用已成為全球科技研究的熱點，其在規模上的增長尤為引人注目，參數量已從最初的十幾億躍升到如今的一萬億。參數量的提升使得模型能夠更加精細地捕捉人類語言微妙之處，更加深入地理解人類語言的復雜性。在過去的一年里，大語言模型在吸納新知識、分解復雜任務以及圖文對齊等多方面都有顯著提升。隨著技術的不斷成熟，它將不斷拓展其應用范圍，為人類提供更加智能化和個性化的服務，進一步改善人們的生活和生產方式。

MoDELS · 輸出 · 語言模型化 · 大語言模型 · 小樣本學習 ·

2024 年 2 月 2 日

Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

Angelica Chen,Jason Phang,Alicia Parrish,Vishakh Padmakumar,Chen Zhao,Samuel R. Bowman,Kyunghyun Cho

from arxiv, Accepted to TMLR: //openreview.net/forum?id=5nBqY1y96B

Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criteria for valid multi-step reasoning in tasks where the solution is composed of the answers to multiple sub-steps. We propose two types of self-consistency that are particularly important for multi-step reasoning -- hypothetical consistency (a model's ability to predict what its output would be in a hypothetical other context) and compositional consistency (consistency of a model's final outputs when intermediate sub-steps are replaced with the model's outputs for those steps). We demonstrate that multiple variants of the GPT-3/-4 models exhibit poor consistency rates across both types of consistency on a variety of tasks.

大語言模型 · 語言模型化 · 同質 · Less · MoDELS ·

2024 年 2 月 2 日

Homogenization Effects of Large Language Models on Human Creative Ideation

Barrett R. Anderson,Jash Hemant Shah,Max Kreminski

from arxiv, 20 pages, 7 figures

Large language models (LLMs) are now being used in a wide variety of contexts, including as creativity support tools (CSTs) intended to help their users come up with new ideas. But do LLMs actually support user creativity? We hypothesized that the use of an LLM as a CST might make the LLM's users feel more creative, and even broaden the range of ideas suggested by each individual user, but also homogenize the ideas suggested by different users. We conducted a 36-participant comparative user study and found, in accordance with the homogenization hypothesis, that different users tended to produce less semantically distinct ideas with ChatGPT than with an alternative CST. Additionally, ChatGPT users generated a greater number of more detailed ideas, but felt less responsible for the ideas they generated. We discuss potential implications of these findings for users, designers, and developers of LLM-based CSTs.

多重集 · 變換 · CASE · 話題 · 講稿 ·

2024 年 2 月 1 日

Multiset Bisimulations as a Common Framework for Ordinary and Probabilistic Bisimulations

David de Frutos-Escrig,Miguel Palomino,Ignacio Fábregas

Our concrete objective is to present both ordinary bisimulations and probabilistic bisimulations in a common coalgebraic framework based on multiset bisimulations. For that we show how to relate the underlying powerset and probabilistic distributions functors with the multiset functor by means of adequate natural transformations. This leads us to the general topic that we investigate in the paper: a natural transformation from a functor F to another G transforms F-bisimulations into G-bisimulations but, in general, it is not possible to express G-bisimulations in terms of F-bisimulations. However, they can be characterized by considering Hughes and Jacobs' notion of simulation, taking as the order on the functor F the equivalence induced by the epi-mono decomposition of the natural transformation relating F and G. We also consider the case of alternating probabilistic systems where non-deterministic and probabilistic choices are mixed, although only in a partial way, and extend all these results to categorical simulations.

INFORMS · 可行 · 可辨認的 · 預測器/決策函數 · Principle ·

2024 年 2 月 1 日

Distinguishing the Indistinguishable: Human Expertise in Algorithmic Prediction

Rohan Alur,Manish Raghavan,Devavrat Shah

We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which `look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.

INTERACT · 正則的 · 散度 · 全 · 近似 ·

2024 年 2 月 1 日

On the Semantic Expressiveness of Iso- and Equi-Recursive Types

Dominique Devriese,Eric Mark Martin,Marco Patrignani

Recursive types extend the simply-typed lambda calculus (STLC) with the additional expressive power to enable diverging computation and to encode recursive data-types (e.g., lists). Two formulations of recursive types exist: iso-recursive and equi-recursive. The relative advantages of iso- and equi-recursion are well-studied when it comes to their impact on type-inference. However, the relative semantic expressiveness of the two formulations remains unclear so far. This paper studies the semantic expressiveness of STLC with iso- and equi-recursive types, proving that these formulations are equally expressive. In fact, we prove that they are both as expressive as STLC with only term-level recursion. We phrase these equi-expressiveness results in terms of full abstraction of three canonical compilers between these three languages (STLC with iso-, with equi-recursive types and with term-level recursion). Our choice of languages allows us to study expressiveness when interacting over both a simply-typed and a recursively-typed interface. The three proofs all rely on a typed version of a proof technique called approximate backtranslation. Together, our results show that there is no difference in semantic expressiveness between STLCs with iso- and equi-recursive types. In this paper, we focus on a simply-typed setting but we believe our results scale to more powerful type systems like System F.

語言模型化 · 大語言模型 · MoDELS · Packing · 損失 ·

2024 年 1 月 31 日

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Yushi Bai,Xin Lv,Jiajie Zhang,Yuze He,Ji Qi,Lei Hou,Jie Tang,Yuxiao Dong,Juanzi Li

Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign -- a recipe of the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure the data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt the packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs in long context tasks by up to 30\%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at //github.com/THUDM/LongAlign.

知識 (knowledge) · 語言模型化 · 大語言模型 · MoDELS · Notability ·

2024 年 1 月 31 日

Neighboring Perturbations of Knowledge Editing on Large Language Models

Jun-Yu Ma,Jia-Chen Gu,Ningyu Zhang,Zhen-Hua Ling

Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper studies whether updating new knowledge to LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we seek to figure out whether appending a new answer into an answer list to a factual question leads to catastrophic forgetting of original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced and a benchmark dubbed as Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. Besides, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP coupling with four editing methods on three LLMs.

TOOLS · 知識 (knowledge) · 大語言模型 · Agent · 自動問答 ·

2024 年 1 月 30 日

Efficient Tool Use with Chain-of-Abstraction Reasoning

Silin Gao,Jane Dwivedi-Yu,Ping Yu,Xiaoqing Ellen Tan,Ramakanth Pasunuru,Olga Golovneva,Koustuv Sinha,Asli Celikyilmaz,Antoine Bosselut,Tianlu Wang

To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remains challenges for fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where inter-connected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1.4x faster than baseline tool-augmented LLMs.

語言模型化 · MoDELS · Taxonomy · AIM · 散度 ·

2023 年 9 月 3 日

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Yue Zhang,Yafu Li,Leyang Cui,Deng Cai,Lemao Liu,Tingchen Fu,Xinting Huang,Enbo Zhao,Yu Zhang,Yulong Chen,Longyue Wang,Anh Tuan Luu,Wei Bi,Freda Shi,Shuming Shi

from arxiv, work in progress; 32 pages

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.

語言模型化 · Taxonomy · MoDELS · motivation · 評論員 ·

2023 年 5 月 31 日

Beyond One-Model-Fits-All: A Survey of Domain Specialization for Large Language Models

Chen Ling,Xujiang Zhao,Jiaying Lu,Chengyuan Deng,Can Zheng,Junxiang Wang,Tanmoy Chowdhury,Yun Li,Hejie Cui,Xuchao Zhang,Tianjiao Zhao,Amit Panalkar,Wei Cheng,Haoyu Wang,Yanchi Liu,Zhengzhang Chen,Haifeng Chen,Chris White,Quanquan Gu,Carl Yang,Liang Zhao

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers motivated people to extend their functionality largely beyond just a ``chatbot'', and use it as an assistant or even replacement for domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill such a gap, explosively-increase research, and practices have been conducted in very recent years on the domain specialization of LLMs, which, however, calls for a comprehensive and systematic review to better summarizes and guide this promising domain. In this survey paper, first, we propose a systematic taxonomy that categorizes the LLM domain-specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.