
Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99% or higher). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging -- lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration into detector robustness.

Related content

Language modeling · MoDELS · Performance · Learning · Large language models

June 21, 2024

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Andong Chen,Lianzhang Lou,Kehai Chen,Xuefeng Bai,Yang Xiang,Muyun Yang,Tiejun Zhao,Min Zhang

from arxiv, Accepted to ACL 2024 main conference

Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. The application of this method across various translation tasks has proven its effectiveness in improving translation accuracy and eliminating ambiguities, especially in translation tasks with low-resource language pairs.

Analysis · entity · Identical · Identifiable · Estimation/estimator

June 20, 2024

Analysis of Linked Files: A Missing Data Perspective

Gauri Kamat,Roee Gutman

from arxiv, Accepted manuscript, to be published in Statistical Science

In many applications, researchers seek to identify overlapping entities across multiple data files. Record linkage algorithms facilitate this task, in the absence of unique identifiers. As these algorithms rely on semi-identifying information, they may miss records that represent the same entity, or incorrectly link records that do not represent the same entity. Analysis of linked files commonly ignores such linkage errors, resulting in biased, or overly precise estimates of the associations of interest. We view record linkage as a missing data problem, and delineate the linkage mechanisms that underpin analysis methods with linked files. Following the missing data literature, we group these methods under three categories: likelihood and Bayesian methods, imputation methods, and weighting methods. We summarize the assumptions and limitations of the methods, and evaluate their performance in a wide range of simulation scenarios.

MoDELS · Language modeling · Networking · Large language models · Performer

June 20, 2024

MemDPT: Differential Privacy for Memory Efficient Language Models

Yanming Liu,Xinyue Peng,Jiannan Cao,Yuwei Zhang,Chen Ma,Songhang Deng,Mengchen Fu,Xuhong Zhang,Sheng Cheng,Xun Wang,Jianwei Yin,Tianyu Du

from arxiv, 12 pages first version

Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on memory resources, which is a matter of significant concern in practice. In this paper, we present an innovative training framework MemDPT that not only reduces the memory cost of large language models but also places a strong emphasis on safeguarding user data privacy. MemDPT provides edge network and reverse network designs to accommodate various differential privacy memory-efficient fine-tuning schemes. Our approach not only achieves $2 \sim 3 \times$ memory optimization but also provides robust privacy protection, ensuring that user data remains secure and confidential. Extensive experiments have demonstrated that MemDPT can effectively provide differential privacy efficient fine-tuning across various task scenarios.

Task-oriented dialogue systems · Language modeling · MoDELS · Reducible · Multimodal

June 18, 2024

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

Kentaro Mitsui,Koh Mitsuda,Toshiaki Wakatsuki,Yukiya Hono,Kei Sawada

from arxiv, 8 pages, 4 figures, 4 tables, demo samples: https://rinnakk.github.io/research/publications/PSLM

Multimodal language models that process both text and speech have a potential for applications in spoken dialogue systems. However, current models face two major challenges in response generation latency: (1) generating a spoken response requires the prior generation of a written response, and (2) speech sequences are significantly longer than text sequences. This study addresses these issues by extending the input and output sequences of the language model to support the parallel generation of text and speech. Our experiments on spoken question answering tasks demonstrate that our approach improves latency while maintaining the quality of response content. Additionally, we show that latency can be further reduced by generating speech in multiple sequences. Demo samples are available at https://rinnakk.github.io/research/publications/PSLM.

Multimodal · Optimizer · MoDELS · Language modeling · Large language models

June 17, 2024

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Fei Wang,Wenxuan Zhou,James Y. Huang,Nan Xu,Sheng Zhang,Hoifung Poon,Muhao Chen

Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. Moreover, we introduce a reward anchor that forces the reward to be positive for chosen responses, thereby avoiding the decrease in their likelihood -- an intrinsic problem of relative preference optimization. Experiments on two multimodal LLMs of different sizes and three widely used benchmarks demonstrate that mDPO effectively addresses the unconditional preference problem in multimodal preference optimization and significantly improves model performance, particularly in reducing hallucination.

Learning · Object detection · Deep learning · Identifiable · Reducible

June 17, 2024

YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection

Tamara R. Lenhard,Andreas Weinmann,Stefan Jäger,Tobias Koch

from arxiv, 7 pages, 4 figures, 6 tables, to be published in the conference proceedings of the 2024 IEEE International Conference on Image Processing (ICIP)

Predominant methods for image-based drone detection frequently rely on employing generic object detection algorithms like YOLOv5. While proficient in identifying drones against homogeneous backgrounds, these algorithms often struggle in complex, highly textured environments. In such scenarios, drones seamlessly integrate into the background, creating camouflage effects that adversely affect the detection quality. To address this issue, we introduce a novel deep learning architecture called YOLO-FEDER FusionNet. Unlike conventional approaches, YOLO-FEDER FusionNet combines generic object detection methods with the specialized strength of camouflage object detection techniques to enhance drone detection capabilities. Comprehensive evaluations of YOLO-FEDER FusionNet show the efficiency of the proposed model and demonstrate substantial improvements in both reducing missed detections and false alarms.

Processing (programming language) · Code · HuggingFace · MoDELS · Scenario

June 17, 2024

Long Code Arena: a Set of Benchmarks for Long-Context Code Models

Egor Bogomolov,Aleksandra Eliseeva,Timur Galimzyanov,Evgeniy Glukhov,Anton Shapkin,Maria Tigina,Yaroslav Golubev,Alexander Kovrigin,Arie van Deursen,Maliheh Izadi,Timofey Bryksin

from arxiv, 54 pages, 4 figures, 22 tables

Nowadays, the fields of code and natural language processing are evolving rapidly. In particular, models become better at processing long context windows - supported context sizes have increased by orders of magnitude over the last few years. However, there is a shortage of benchmarks for code processing that go beyond a single file of context, while the most popular ones are limited to a single method. With this work, we aim to close this gap by introducing Long Code Arena, a suite of six benchmarks for code processing tasks that require project-wide context. These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization. For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions based on popular LLMs to showcase the usage of the dataset and to simplify adoption by other researchers. We publish the benchmark page on HuggingFace Spaces with the leaderboard, links to HuggingFace Hub for all the datasets, and link to the GitHub repository with baselines: https://huggingface.co/spaces/JetBrains-Research/long-code-arena.

MoDELS · Networking · INFORMS · Reducible · Code

June 17, 2024

Scalable Image Coding for Humans and Machines Using Feature Fusion Network

Takahiro Shindo,Taiju Watanabe,Yui Tatsumi,Hiroshi Watanabe

As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.

Language modeling · MoDELS · HTTPS · Robustness · Large language models

June 16, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao,Edoardo Debenedetti,Alexander Robey,Maksym Andriushchenko,Francesco Croce,Vikash Sehwag,Edgar Dobriban,Nicolas Flammarion,George J. Pappas,Florian Tramer,Hamed Hassani,Eric Wong

from arxiv, JailbreakBench v1.0: more attack artifacts, more test-time defenses, a more accurate jailbreak judge (Llama-3-70B with a custom prompt), a larger dataset of human preferences for selecting a jailbreak judge (300 examples), an over-refusal evaluation dataset (100 benign/borderline behaviors), a semantic refusal judge based on Llama-3-8B

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community.

Example · Transformation · Automator · Identifiable · Code

June 15, 2024

Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example

Malinda Dilhara,Abhiram Bellur,Timofey Bryksin,Danny Dig

from arxiv, This paper is accepted to Proceedings of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE - 2024), This is an author copy

Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the input examples' quality and quantity, missing variations with different syntax or flow yet semantically similar. Large Language Models (LLMs), trained on vast code datasets, can overcome these limitations by generating semantically equivalent, unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability. Implementing these in PyCraft, combining static and dynamic analysis with LLMs, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects like microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.