Transformer language models (LMs) are fundamental to NLP research methodologies and applications across many languages. However, developing such models specifically for the Russian language has received little attention. This paper introduces a collection of 13 Russian Transformer LMs spanning encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) architectures. We report on the architecture design and pretraining of these models and evaluate their generalization abilities on Russian language understanding and generation datasets and benchmarks. By pretraining and releasing these specialized Transformer LMs, we aim to broaden the scope of NLP research directions and enable the development of industrial solutions for the Russian language.
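For context, released checkpoints of this kind are typically consumed through the Hugging Face transformers API, as in the minimal sketch below; the hub identifier is an assumption and may not match the official release names.

```python
# Minimal usage sketch, assuming the released models are hosted on the
# Hugging Face hub. The model identifier below is illustrative only.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ai-forever/rugpt3large_based_on_gpt2"  # assumed hub ID; check the official release

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Трансформеры для русского языка"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```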
Robust access to trustworthy information is a critical need for society, with implications for knowledge production, public health education, and the promotion of an informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve the effectiveness of existing information retrieval systems, but we are only starting to understand and grapple with their long-term social implications. In this chapter, we present an overview of some of the systemic consequences and risks of employing generative AI in the context of information access. We also provide recommendations for evaluation and mitigation, and discuss challenges for future research.
Large language models (LLMs) have shown impressive results on a wide range of tasks. However, they have limitations, such as hallucinating facts and struggling with arithmetic. Recent work has addressed these issues with sophisticated decoding techniques. However, performant decoding, particularly for sophisticated techniques, relies crucially on parallelization and batching, which are difficult for developers. We make two observations: 1) existing approaches are high-level domain-specific languages for gluing expensive black-box calls, but are not general or compositional; 2) LLM programs are essentially pure (all effects commute). Guided by these observations, we develop a novel, general-purpose lambda calculus for automatically parallelizing a wide range of LLM interactions, without user intervention. The key difference versus standard lambda calculus is a novel "opportunistic" evaluation strategy, which steps independent parts of a program in parallel, dispatching black-box external calls as eagerly as possible, even while data-independent parts of the program are waiting for their own external calls to return. To maintain the simplicity of the language and to ensure uniformity of opportunistic evaluation, control-flow and looping constructs are implemented in-language, via Church encodings. We implement this approach in a framework called EPIC, embedded in--and interoperating closely with--Python. We demonstrate its versatility and performance with three case studies drawn from the machine learning literature: Tree-of-Thoughts (LLMs embedded in classic search procedures), nested tool use, and constrained decoding. Our experiments show that opportunistic evaluation offers a $1.5\times$ to $4.8\times$ speedup over sequential evaluation, while still allowing practitioners to write straightforward and composable programs, without any manual parallelism or batching.
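To make the execution pattern concrete, the sketch below hand-codes with Python's asyncio the behaviour that opportunistic evaluation is meant to recover automatically; it does not use EPIC's actual API, and `fake_llm` / `program` are invented stand-ins for black-box model calls and an LLM program.

```python
# Illustrative sketch only (not EPIC's API): data-independent LLM calls are
# dispatched as soon as their inputs are ready, instead of sequentially.
import asyncio

async def fake_llm(prompt: str) -> str:
    # Placeholder for an expensive external call (e.g., an API request).
    await asyncio.sleep(1.0)
    return f"answer({prompt})"

async def program(question: str) -> str:
    # These two sub-calls do not depend on each other, so they are
    # dispatched eagerly and run in parallel ...
    facts_task = asyncio.create_task(fake_llm(f"list facts about: {question}"))
    plan_task = asyncio.create_task(fake_llm(f"outline a plan for: {question}"))
    facts, plan = await asyncio.gather(facts_task, plan_task)
    # ... while this final call depends on both results and therefore waits.
    return await fake_llm(f"answer using {facts} and {plan}")

print(asyncio.run(program("why is the sky blue?")))
```

The point of the calculus is that this parallel structure is discovered automatically from data dependence, rather than written by hand as above.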
Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET as a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it surfaces toxicity that might remain hidden under normal prompts, revealing subtler issues in model behavior.
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks. However, their deployment presents significant challenges due to their substantial memory and storage requirements. To address this challenge, weight-only quantization has emerged as a promising solution. Previous research has indicated that fine-tuning through up and down rounding can enhance performance. In this study, we introduce SignRound, a method that utilizes signed gradient descent (SignSGD) to optimize rounding values and weight clipping within just 200 steps, combining the strengths of both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ). SignRound achieves outstanding results compared to recent methods across 2 to 4 bits, while maintaining low tuning costs and without introducing any additional inference overhead. For instance, SignRound led to absolute average accuracy improvements ranging from 6.91\% to 33.22\% at 2 bits. Furthermore, it demonstrates robust generalization to various recent models and achieves near-lossless quantization in most scenarios at 4 bits. The source code is publicly available at \url{https://github.com/intel/auto-round}.
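The following toy PyTorch sketch illustrates the flavour of signed-gradient rounding: a learnable per-weight rounding offset is tuned with the sign of the gradient of a reconstruction loss on a small calibration batch. It is not the released implementation (which, among other things, also tunes weight clipping), and all tensor names are illustrative.

```python
# Conceptual sketch of SignSGD-style tuning of rounding offsets for
# weight-only quantization; not the authors' implementation.
import torch

def fake_quant(w, v, bits=4):
    # Symmetric per-tensor fake quantization with a learnable rounding offset v.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    t = w / scale + v
    # Straight-through estimator: round in the forward pass, identity gradient backward.
    q = t + (torch.round(t) - t).detach()
    return torch.clamp(q, -qmax - 1, qmax) * scale

torch.manual_seed(0)
w = torch.randn(256, 256)                    # frozen pretrained weights (toy)
x = torch.randn(64, 256)                     # small calibration batch (toy)
v = torch.zeros_like(w, requires_grad=True)  # learnable rounding offsets
lr = 2.5e-3

for _ in range(200):                         # the method tunes for ~200 steps
    loss = ((x @ fake_quant(w, v).T - x @ w.T) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()              # SignSGD-style update
        v.clamp_(-0.5, 0.5)                  # stay within half a quantization step
        v.grad.zero_()
```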
Recent efforts have evaluated large language models (LLMs) in areas such as commonsense reasoning, mathematical reasoning, and code generation. However, to the best of our knowledge, no work has specifically investigated the performance of LLMs on natural language generation (NLG) tasks, a pivotal criterion for determining model excellence. Thus, this paper conducts a comprehensive evaluation of well-known and high-performing LLMs, namely ChatGPT, ChatGLM, T5-based models, LLaMA-based models, and Pythia-based models, in the context of NLG tasks. We select English and Chinese datasets covering dialogue generation and text summarization. Moreover, we propose a common evaluation setting that incorporates input templates and post-processing strategies. Our study reports the automatic evaluation results, accompanied by a detailed analysis.
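As a purely hypothetical illustration of such a setting, the snippet below pairs a dialogue-generation input template with a uniform post-processing step applied to every model's raw output; the template wording and helper names are not taken from the paper.

```python
# Hypothetical example of a shared evaluation setting: one input template
# plus one post-processing function applied uniformly across models.
def build_prompt(dialogue_history: list[str]) -> str:
    turns = "\n".join(f"Speaker: {u}" for u in dialogue_history)
    return f"{turns}\nRespond with the next utterance only:"

def postprocess(raw_output: str) -> str:
    # Keep only the first line and strip boilerplate the model may prepend.
    text = raw_output.strip().split("\n")[0]
    return text.removeprefix("Response:").strip()

prompt = build_prompt(["Hi, how are you?", "Great, thanks! Any plans today?"])
print(prompt)
print(postprocess("Response: I'm going hiking this afternoon.\nHope that helps!"))
```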
Large language models (LLMs) often exhibit undesirable behaviours, such as generating untruthful or biased content. Editing their internal representations has been shown to be effective in mitigating such behaviours on top of existing alignment methods. We propose a novel inference-time editing method, spectral editing of activations (SEA), which projects the input representations into directions with maximal covariance with the positive demonstrations (e.g., truthful) while minimising covariance with the negative demonstrations (e.g., hallucinated). We also extend our method to non-linear editing using feature functions. We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, and inference and data efficiency. We also show that SEA editing has only a limited negative impact on other model capabilities.
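As a rough illustration of the linear variant, the sketch below builds a projector from the eigendecomposition of an activation covariance matrix and applies it to hidden states at inference time. It covers only the negative-removal half of the idea and is not the authors' exact procedure; all tensor and function names are hypothetical.

```python
# Simplified sketch: derive an inference-time linear edit from the spectrum
# of the covariance of activations on negative (e.g., hallucinated) demos.
import torch

def covariance(h: torch.Tensor) -> torch.Tensor:
    # h: (num_examples, hidden_dim) activations collected at some layer.
    h = h - h.mean(dim=0, keepdim=True)
    return h.T @ h / (h.shape[0] - 1)

def negative_removal_projector(h_neg: torch.Tensor, k: int) -> torch.Tensor:
    # Top-k eigendirections of the negative-demonstration covariance are
    # treated as undesirable and removed with P = I - U U^T.
    _, evecs = torch.linalg.eigh(covariance(h_neg))
    u = evecs[:, -k:]                        # eigenvectors of the k largest eigenvalues
    return torch.eye(h_neg.shape[1]) - u @ u.T

# Toy usage: edit a batch of hidden states during generation.
torch.manual_seed(0)
h_negative = torch.randn(128, 64)            # activations on negative demos
projector = negative_removal_projector(h_negative, k=4)
hidden = torch.randn(8, 64)                  # hidden states at inference time
edited = hidden @ projector                  # projector is symmetric
```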
Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend, and a significant challenge, of deploying them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of multiple decoder layers, general methods often employ common estimation approaches for pruning, which leads to a decline in accuracy on specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure and adaptively fuses coarse-grained and fine-grained estimations based on the results from the complex, multi-layer structure. All aspects of our design integrate seamlessly into the end-to-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMA-7B, Vicuna-7B, Baichuan-7B, and Bloom-7B1, respectively.
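For orientation, the snippet below sketches what structured importance estimation for one feed-forward substructure can look like in PyTorch; it is a generic first-order scoring scheme, not the paper's adaptive coarse-/fine-grained fusion, and the function and variable names are invented for illustration.

```python
# Generic structured-pruning sketch: score the intermediate channels of a
# feed-forward block with |weight * grad| and drop the lowest-scoring ones.
import torch
import torch.nn as nn

def prune_ffn_channels(up: nn.Linear, down: nn.Linear, x: torch.Tensor, ratio: float):
    # Proxy loss on a small calibration batch to obtain gradients.
    y = down(torch.relu(up(x)))
    y.pow(2).mean().backward()
    # First-order importance per intermediate channel.
    score = ((up.weight * up.weight.grad).abs().sum(dim=1)
             + (down.weight * down.weight.grad).abs().sum(dim=0))
    keep = score.argsort(descending=True)[: int(score.numel() * (1 - ratio))]
    # Rebuild the block with only the kept channels.
    new_up = nn.Linear(up.in_features, keep.numel())
    new_down = nn.Linear(keep.numel(), down.out_features)
    new_up.weight.data, new_up.bias.data = up.weight[keep].detach(), up.bias[keep].detach()
    new_down.weight.data, new_down.bias.data = down.weight[:, keep].detach(), down.bias.detach()
    return new_up, new_down

up, down = nn.Linear(64, 256), nn.Linear(256, 64)
new_up, new_down = prune_ffn_channels(up, down, torch.randn(16, 64), ratio=0.5)
```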
Serverless computing, commonly offered as Function-as-a-Service, was initially designed for small, lean applications. However, there has been an increasing desire to run larger, more complex applications (what we call bulky applications) in a serverless manner. Existing strategies for enabling such applications either increase function sizes or rewrite applications as DAGs of functions. These approaches cause significant resource waste, manual effort, and/or performance overhead. We argue that the root cause of these issues is today's function-centric serverless model, in which the function is the unit of resource allocation and scaling. We propose a new, resource-centric serverless-computing model for executing bulky applications in a resource- and performance-efficient way, and we build the Zenix serverless platform following this model. Our results show that Zenix reduces resource consumption by up to 90% compared to today's function-centric serverless systems, while improving performance by up to 64%.
Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results and reveal the necessity of learning sparse structures for each document.
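A highly reduced sketch of the general recipe, not the paper's architecture: candidate inter-sentence edges receive trainable scores, a gate sparsifies them, and a toy GNN layer plus mean readout consumes the learned graph. All module and variable names are hypothetical.

```python
# Minimal conceptual sketch of sparse structure learning over a document graph.
import torch
import torch.nn as nn

class SparseEdgeLearner(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Bilinear(dim, dim, 1)   # scores a candidate word pair
        self.gnn = nn.Linear(dim, dim)           # one toy message-passing layer

    def forward(self, nodes, intra_adj, candidates):
        # nodes: (n, dim); intra_adj: (n, n) sentence-level co-occurrence edges;
        # candidates: (m, 2) indices of cross-sentence word pairs.
        src, dst = nodes[candidates[:, 0]], nodes[candidates[:, 1]]
        gate = torch.sigmoid(self.scorer(src, dst)).squeeze(-1)   # edge confidence
        sparse_gate = gate * (gate > 0.5).float()                 # keep only confident edges
        edge = torch.zeros_like(intra_adj).index_put(
            (candidates[:, 0], candidates[:, 1]), sparse_gate)
        adj = intra_adj + edge + edge.T                           # symmetric learned graph
        h = torch.relu(self.gnn(adj @ nodes))                     # propagate over the graph
        return h.mean(dim=0)                                      # readout for graph-level classification

model = SparseEdgeLearner(dim=32)
doc_repr = model(torch.randn(10, 32), torch.eye(10), torch.tensor([[0, 7], [2, 9]]))
```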
Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On CoNLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also outperforms the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.
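The syntactically informed attention head can be pictured as an auxiliary supervised objective on one head's attention distribution. The sketch below shows such a loss in isolation; it is only a schematic fragment, not the full LISA model, and the tensor names and toy parse are hypothetical.

```python
# Schematic auxiliary loss: supervise one attention head so that each token
# attends to its syntactic parent.
import torch
import torch.nn.functional as F

def syntax_head_loss(attn_scores: torch.Tensor, head_indices: torch.Tensor) -> torch.Tensor:
    # attn_scores: (seq_len, seq_len) unnormalised scores from the designated
    #   attention head (row i = token i attending over all tokens).
    # head_indices: (seq_len,) index of each token's syntactic parent
    #   (the root can point to itself).
    return F.cross_entropy(attn_scores, head_indices)

# Toy example: 5 tokens with a hypothetical dependency parse.
scores = torch.randn(5, 5, requires_grad=True)
parents = torch.tensor([1, 1, 1, 4, 1])   # token 1 is the root and attends to itself
loss = syntax_head_loss(scores, parents)
loss.backward()   # in training this would be combined with the parsing, POS, and SRL losses
```

At test time, injecting an external parse amounts to replacing that head's attention distribution with one-hot vectors over the given parents, which is consistent with the abstract's note that a high-quality parse can be supplied without re-training.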