
This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box LLMs. First, drawing on the autoregressive nature of LLMs, we reveal that they tend to understand text from left to right and struggle to comprehend it when noise is added to the left side. Motivated by these insights, we propose to disguise the harmful prompt by constructing left-side noise based solely on the prompt itself, and generalize this idea into 4 flipping modes. Second, we verify the strong ability of LLMs to perform the text-flipping task, and then develop 4 variants to guide LLMs to denoise, understand, and execute harmful behaviors accurately. These designs keep FlipAttack universal, stealthy, and simple, allowing it to jailbreak black-box LLMs within only 1 query. Experiments on 8 LLMs demonstrate the superiority of FlipAttack. Remarkably, it achieves a $\sim$98\% attack success rate on GPT-4o and an average $\sim$98\% bypass rate against 5 guardrail models. The code is available at GitHub\footnote{//github.com/yueliu1999/FlipAttack}.
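The left-side-noise construction can be made concrete with a few string-level flips. The transformations below are illustrative examples in the spirit of the described flipping modes, not necessarily the paper's exact four; see the repository above for the actual implementation.

```python
# Illustrative prompt-flipping transformations; the mode names and exact
# transformations are assumptions for illustration, not FlipAttack's code.

def flip_word_order(prompt: str) -> str:
    """Reverse the order of the words while keeping each word intact."""
    return " ".join(reversed(prompt.split()))

def flip_chars_in_words(prompt: str) -> str:
    """Reverse the characters inside each word, keeping word order."""
    return " ".join(word[::-1] for word in prompt.split())

def flip_chars_in_sentence(prompt: str) -> str:
    """Reverse the entire character sequence of the prompt."""
    return prompt[::-1]

if __name__ == "__main__":
    prompt = "example request to be disguised"
    for flip in (flip_word_order, flip_chars_in_words, flip_chars_in_sentence):
        print(f"{flip.__name__}: {flip(prompt)}")
```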

Related Content

This paper presents the Low-cost Marine Autonomous Robotic Vehicle Explorer (Lo-MARVE), a novel autonomous underwater vehicle (AUV) designed to provide a low-cost solution for underwater exploration and environmental monitoring in shallow-water environments. Lo-MARVE offers a cost-effective alternative to existing AUVs, featuring a modular design, low-cost sensors, and wireless communication capabilities, with a total cost of approximately EUR 500. Lo-MARVE is built around a Raspberry Pi 4B microprocessor, with control software written in Python. The proposed AUV was validated through field testing in the freshwater environment of the River Corrib in Galway, Ireland, demonstrating its ability to navigate autonomously, collect data, and communicate effectively outside of a controlled laboratory setting. This successful real-world deployment validates Lo-MARVE as a proof of concept.
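Since the abstract notes that the control software runs in Python on a Raspberry Pi 4B, a hypothetical sense-decide-act loop is sketched below; the sensor and thruster interfaces are placeholders rather than Lo-MARVE's actual drivers.

```python
# Hypothetical sense-decide-act loop for a low-cost AUV; the DepthSensor and
# Thruster classes are stand-ins, not Lo-MARVE's actual hardware drivers.
import time

class DepthSensor:
    def read_m(self) -> float:
        return 1.2  # stubbed depth reading in metres

class Thruster:
    def set_throttle(self, value: float) -> None:
        print(f"throttle set to {value:+.2f}")

def control_loop(target_depth_m: float = 1.0, hz: float = 5.0) -> None:
    sensor, thruster = DepthSensor(), Thruster()
    gain = 0.5  # simple proportional gain on depth error
    for _ in range(10):  # bounded loop for the sketch
        error = target_depth_m - sensor.read_m()
        thruster.set_throttle(max(-1.0, min(1.0, gain * error)))
        time.sleep(1.0 / hz)

if __name__ == "__main__":
    control_loop()
```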

This paper presents SimBase, a simple yet effective baseline for temporal video grounding. While recent advances in temporal grounding have led to impressive performance, they have also driven network architectures toward greater complexity, with a range of methods to (1) capture temporal relationships and (2) achieve effective multimodal fusion. In contrast, this paper explores the question: How effective can a simplified approach be? To investigate, we design SimBase, a network that leverages lightweight, one-dimensional temporal convolutional layers instead of complex temporal structures. For cross-modal interaction, SimBase employs only an element-wise product instead of intricate multimodal fusion. Remarkably, SimBase achieves state-of-the-art results on two large-scale datasets. As a simple yet powerful baseline, we hope SimBase will spark new ideas and streamline future evaluations in temporal video grounding.
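The two design choices named above, lightweight 1D temporal convolutions and element-wise product fusion, can be sketched in a few lines of PyTorch. The dimensions and layer counts below are illustrative, not SimBase's configuration.

```python
# Minimal sketch of 1D temporal convolutions over video features with
# element-wise product fusion of a text feature; shapes are illustrative only.
import torch
import torch.nn as nn

class TinyTemporalGrounder(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.temporal = nn.Sequential(  # lightweight 1D temporal convolutions
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv1d(dim, 2, kernel_size=1)  # start/end scores per clip

    def forward(self, video_feats, text_feat):
        # video_feats: (B, T, D), text_feat: (B, D)
        fused = video_feats * text_feat.unsqueeze(1)  # element-wise product fusion
        x = self.temporal(fused.transpose(1, 2))      # (B, D, T)
        return self.head(x).transpose(1, 2)           # (B, T, 2) boundary scores

feats = torch.randn(2, 64, 256)   # batch of 64 clip features
query = torch.randn(2, 256)       # sentence embedding
print(TinyTemporalGrounder()(feats, query).shape)  # torch.Size([2, 64, 2])
```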

With the proliferation of Large Language Models (LLMs), the concept of World Models (WMs) has recently attracted a great deal of attention in the AI research community, especially in the context of AI agents, and is arguably evolving into an essential foundation for building AI agent systems. A WM is intended to help an agent predict the future evolution of environmental states, or to fill in missing information, so that the agent can plan its actions and behave safely. The safety properties of WMs therefore play a key role in their effective use in critical applications. In this work, we review and analyze the current state of the art in WM technology from the perspective of trustworthiness and safety, based on a comprehensive survey and the envisaged fields of application. We provide an in-depth analysis of state-of-the-art WMs, derive technical research challenges and their impact, and call on the research community to collaborate on improving the safety and trustworthiness of WMs.

Visual Question Answering (VQA) research seeks to create AI systems that answer natural language questions about images, yet VQA methods often yield overly simplistic and short answers. This paper aims to advance the field by introducing Visual Question Explanation (VQE), which enhances the ability of VQA models to provide detailed explanations rather than brief responses, addressing the need for more complex interaction with visual content. We first created the MLVQE dataset from a 14-week streamed video machine learning course, comprising 885 slide images, 110,407 words of transcripts, and 9,416 designed question-answer (QA) pairs. Next, we proposed SparrowVQE, a small multimodal model with 3 billion parameters. We trained the model with a three-stage mechanism consisting of multimodal pre-training (aligning slide-image and transcript features), instruction tuning (tuning the pre-trained model with transcripts and QA pairs), and domain fine-tuning (fine-tuning on slide-image and QA pairs). The resulting SparrowVQE connects visual information encoded by the SigLIP model with transcripts processed by the Phi-2 language model through an MLP adapter. Experimental results demonstrate that SparrowVQE achieves better performance on our MLVQE dataset and outperforms state-of-the-art methods on five other benchmark VQA datasets. The source code is available at \url{//github.com/YoushanZhang/SparrowVQE}.
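The idea of connecting SigLIP visual features to Phi-2 through an MLP adapter can be illustrated with a generic two-layer projector. The feature dimensions below are assumptions for illustration, not necessarily SparrowVQE's.

```python
# Hedged sketch of an MLP adapter that projects vision-encoder features into
# a language model's embedding space; dimensions are assumed, not SparrowVQE's.
import torch
import torch.nn as nn

class MLPAdapter(nn.Module):
    def __init__(self, vision_dim: int = 1152, lm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, num_patches, vision_dim) from the vision encoder
        return self.proj(patch_feats)  # (B, num_patches, lm_dim) visual tokens

tokens = MLPAdapter()(torch.randn(1, 196, 1152))
print(tokens.shape)  # torch.Size([1, 196, 2560])
```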

The advent of AI-Generated Content (AIGC) has spurred research into automated video generation to streamline conventional processes. However, automating storytelling video production, particularly for customized narratives, remains challenging due to the complexity of maintaining subject consistency across shots. While existing approaches like Mora and AesopAgent integrate multiple agents for Story-to-Video (S2V) generation, they fall short in preserving protagonist consistency and supporting Customized Storytelling Video Generation (CSVG). To address these limitations, we propose StoryAgent, a multi-agent framework designed for CSVG. StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process. Notably, our framework includes agents for story design, storyboard generation, video creation, agent coordination, and result evaluation. Leveraging the strengths of different models, StoryAgent enhances control over the generation process, significantly improving character consistency. Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency, while a novel storyboard generation pipeline is proposed to maintain subject consistency across shots. Extensive experiments demonstrate the effectiveness of our approach in synthesizing highly consistent storytelling videos, outperforming state-of-the-art methods. Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
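The subtask decomposition described above can be pictured as a sequential pipeline of specialized agents. The sketch below is purely illustrative; the agent interfaces and coordination logic are assumptions, not StoryAgent's implementation.

```python
# Illustrative decomposition of a story-to-video pipeline into specialized
# agents; these interfaces are hypothetical, not StoryAgent's code.
from dataclasses import dataclass, field

@dataclass
class StoryState:
    premise: str
    script: str = ""
    storyboard: list = field(default_factory=list)
    shots: list = field(default_factory=list)

def story_designer(state: StoryState) -> StoryState:
    state.script = f"Three-act script for: {state.premise}"
    return state

def storyboard_generator(state: StoryState) -> StoryState:
    state.storyboard = [f"shot {i}: keyframe of the protagonist" for i in range(3)]
    return state

def video_creator(state: StoryState) -> StoryState:
    state.shots = [f"clip rendered from {frame}" for frame in state.storyboard]
    return state

def evaluator(state: StoryState) -> StoryState:
    assert len(state.shots) == len(state.storyboard), "missing shots"
    return state

def coordinator(premise: str) -> StoryState:
    state = StoryState(premise)
    for agent in (story_designer, storyboard_generator, video_creator, evaluator):
        state = agent(state)  # each specialized agent handles one subtask
    return state

print(coordinator("a robot learning to paint").shots)
```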

We introduce SeedEdit, a diffusion model that is able to revise a given image with any text prompt. In our view, the key to such a task is to strike an optimal balance between maintaining the original image, i.e., image reconstruction, and generating a new image, i.e., image re-generation. To this end, we start from a weak generator (a text-to-image model) that creates diverse pairs spanning these two directions and gradually align it into a strong image editor that balances the two tasks well. SeedEdit achieves more diverse and stable editing than prior image editing methods, enabling sequential revision of images generated by diffusion models.

This paper presents a Virtual Reality (VR) art therapy application, "Break Times", which aims to enhance students' mental well-being and foster creative expression. The proposed application mimics art therapy sessions in a VR environment. A pilot user acceptance test with 10 participants showed a notable reduction in stress levels, with 50% reporting normal stress levels post-intervention compared to 20% pre-intervention. Participants praised the therapy's functionality and engagement features and suggested improvements such as saving creations, incorporating 3D painting, and expanding the variety of artmaking scenes. The study highlights the potential of VR art therapy as an effective tool for stress management, emphasizing the need for continued refinement to maximize its therapeutic benefits.

Large Language Models (LLMs) are becoming increasingly capable and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there is an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional, evolving benchmark that currently consists of 8 distinct environments for assessing LLM-as-Agent reasoning and decision-making abilities in a multi-turn, open-ended generation setting. Our extensive tests over 25 LLMs (including APIs and open-source models) show that, while top commercial LLMs present a strong ability to act as agents in complex environments, there is a significant performance gap between them and their open-source competitors. AgentBench also serves as a component of an ongoing project with broader coverage and deeper consideration of systematic LLM evaluation. Datasets, environments, and an integrated evaluation package for AgentBench are released at //github.com/THUDM/AgentBench
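The multi-turn, open-ended setting that such agent benchmarks evaluate can be illustrated with a generic agent-environment loop. The sketch below is not AgentBench's actual API, only a toy illustration of the interaction pattern being measured.

```python
# Generic multi-turn agent-environment evaluation loop; this is NOT
# AgentBench's interface, just an illustration of the setting it evaluates.
from typing import Callable

def run_episode(agent: Callable[[str], str], env_step: Callable[[str], tuple],
                first_observation: str, max_turns: int = 10) -> float:
    observation, reward, done = first_observation, 0.0, False
    for _ in range(max_turns):
        action = agent(observation)              # LLM produces the next action
        observation, reward, done = env_step(action)
        if done:
            break
    return reward

# Toy environment: the episode succeeds once the agent says "submit".
def toy_env(action: str):
    done = action.strip().lower() == "submit"
    return ("task solved" if done else "keep going"), float(done), done

echo_agent = lambda obs: "submit" if "keep going" in obs else "explore"
print(run_episode(echo_agent, toy_env, "start"))  # 1.0
```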

We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.
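To make the two evaluation tasks concrete, the sketch below scores triples with a standard translational (TransE-style) model, one example of the kind of embedding model such benchmarks evaluate. It is not CoDEx's own evaluation code, and the dimensions and threshold are arbitrary.

```python
# TransE-style scoring as an illustration of link prediction (rank true tails)
# and triple classification (threshold scores); not CoDEx's evaluation code.
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 100, 10, 32
E = rng.normal(size=(num_entities, dim))   # entity embeddings
R = rng.normal(size=(num_relations, dim))  # relation embeddings

def score(h: int, r: int, t: int) -> float:
    # TransE: plausible triples have h + r close to t (higher score = more plausible)
    return -float(np.linalg.norm(E[h] + R[r] - E[t]))

def rank_tail(h: int, r: int, true_t: int) -> int:
    scores = np.array([score(h, r, t) for t in range(num_entities)])
    return int((scores > scores[true_t]).sum()) + 1  # 1 = best possible rank

def classify(h: int, r: int, t: int, threshold: float = -5.0) -> bool:
    return score(h, r, t) >= threshold  # decide against hard negative triples

print(rank_tail(3, 1, 7), classify(3, 1, 7))
```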

We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games, showing 1.9x and 1.6x performance gains at the 100K environment step and 100K interaction step benchmarks, respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample efficiency and performance of methods that use state-based features.
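The contrastive objective can be illustrated with a minimal InfoNCE-style loss over two augmented views of the same observations; the encoders and similarity function below are simplified stand-ins rather than CURL's exact architecture.

```python
# Minimal InfoNCE-style contrastive loss over two augmented views, in the
# spirit of CURL; a simplified sketch, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def contrastive_loss(z_query: torch.Tensor, z_key: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    # z_query, z_key: (B, D) features from two augmentations of the same frames;
    # the matching index on the diagonal is the positive, all others are negatives.
    z_query = F.normalize(z_query, dim=1)
    z_key = F.normalize(z_key, dim=1)
    logits = z_query @ z_key.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(z_query.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, labels)

q = torch.randn(8, 50)             # query-encoder features
k = q + 0.05 * torch.randn(8, 50)  # key-encoder features (e.g. a momentum encoder)
print(contrastive_loss(q, k).item())
```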
