亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

<li id='iyxor'></li>

_{^{<dd id='iyxor'><tbody id='iyxor'><td id='iyxor'><optgroup id='iyxor'><strong id='iyxor'></strong></optgroup><address id='iyxor'><ul id='iyxor'></ul></address><big id='iyxor'></big></td><table id='iyxor'></table></tbody><pre id='iyxor'></pre></dd><span id='iyxor'><b id='iyxor'></b></span>}}


<dfn id='iyxor'><optgroup id='iyxor'></optgroup></dfn><tfoot id='iyxor'><bdo id='iyxor'><div id='iyxor'></div><i id='iyxor'><dt id='iyxor'></dt></i></bdo></tfoot>

_{<fieldset id='iyxor'></fieldset>}

·

Prompt · 大語言模型 · MoDELS · 分離的 · Performer ·

2024 年 2 月 9 日

StruQ: Defending Against Prompt Injection with Structured Queries

Sizhe Chen,Julien Piet,Chawin Sitawarin,David Wagner

from arxiv, prompt injections, LLM security

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model to deviate from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate the prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at //github.com/Sizhe-Chen/PromptInjectionDefense.

相關內容

Prompt

ASSETS · 3D · 多樣性 · MoDELS · Extensibility ·

2024 年 3 月 22 日

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

Zhenwei Wang,Tengfei Wang,Gerhard Hancke,Ziwei Liu,Rynson W. H. Lau

from arxiv, Project page: //3dthemestation.github.io/

Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on given few exemplars with two goals: 1) unity for generating 3D assets that thematically align with the given exemplars and 2) diversity for generating 3D assets with a high degree of variations. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.

Agent · Notability · INFORMS · 回合 · Processing（編程語言） ·

2024 年 3 月 22 日

TriHelper: Zero-Shot Object Navigation with Dynamic Assistance

Lingfeng Zhang,Qiang Zhang,Hao Wang,Erjia Xiao,Zixuan Jiang,Honglei Chen,Renjing Xu

from arxiv, 8 pages, 5 figures

Navigating toward specific objects in unknown environments without additional training, known as Zero-Shot object navigation, poses a significant challenge in the field of robotics, which demands high levels of auxiliary information and strategic planning. Traditional works have focused on holistic solutions, overlooking the specific challenges agents encounter during navigation such as collision, low exploration efficiency, and misidentification of targets. To address these challenges, our work proposes TriHelper, a novel framework designed to assist agents dynamically through three primary navigation challenges: collision, exploration, and detection. Specifically, our framework consists of three innovative components: (i) Collision Helper, (ii) Exploration Helper, and (iii) Detection Helper. These components work collaboratively to solve these challenges throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets demonstrate that TriHelper significantly outperforms all existing baseline methods in Zero-Shot object navigation, showcasing superior success rates and exploration efficiency. Our ablation studies further underscore the effectiveness of each helper in addressing their respective challenges, notably enhancing the agent's navigation capabilities. By proposing TriHelper, we offer a fresh perspective on advancing the object navigation task, paving the way for future research in the domain of Embodied AI and visual-based navigation.

mPLUG-Owl · 大語言模型 · MoDELS · 知識 (knowledge) · 多峰值 ·

2024 年 3 月 22 日

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Qinghao Ye,Haiyang Xu,Guohai Xu,Jiabo Ye,Ming Yan,Yiyang Zhou,Junyang Wang,Anwen Hu,Pengcheng Shi,Yaya Shi,Chenliang Li,Yuanhong Xu,Hehong Chen,Junfeng Tian,Qi Qian,Ji Zhang,Fei Huang,Jingren Zhou

from arxiv, Working in Process

Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of foundation LLM, a visual knowledge module, and a visual abstractor module. This approach can support multiple modalities and facilitate diverse unimodal and multimodal abilities through modality collaboration. The training paradigm of mPLUG-Owl involves a two-stage method for aligning image and text, which learns visual knowledge with the assistance of LLM while maintaining and even improving the generation abilities of LLM. In the first stage, the visual knowledge module and abstractor module are trained with a frozen LLM module to align the image and text. In the second stage, language-only and multi-modal supervised datasets are used to jointly fine-tune a low-rank adaption (LoRA) module on LLM and the abstractor module by freezing the visual knowledge module. We carefully build a visually-related instruction evaluation set OwlEval. Experimental results show that our model outperforms existing multi-modal models, demonstrating mPLUG-Owl's impressive instruction and visual understanding ability, multi-turn conversation ability, and knowledge reasoning ability. Besides, we observe some unexpected and exciting abilities such as multi-image correlation and scene text understanding, which makes it possible to leverage it for harder real scenarios, such as vision-only document comprehension. Our code, pre-trained model, instruction-tuned models, and evaluation set are available at //github.com/X-PLUG/mPLUG-Owl. The online demo is available at //www.modelscope.cn/studios/damo/mPLUG-Owl.

求逆 · 圖像修復 · Adobe Photoshop · 控制器 · 基準 ·

2024 年 3 月 21 日

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Xiang Fan,Anand Bhattad,Ranjay Krishna

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits. Videoshop allows users to use any editing software, including Photoshop and generative inpainting, to modify the first frame; it automatically propagates those changes, with semantic, spatial, and temporally consistent motion, to the remaining frames. Unlike existing methods that enable edits only through imprecise textual instructions, Videoshop allows users to add or remove objects, semantically change objects, insert stock photos into videos, etc. with fine-grained control over locations and appearance. We achieve this through image-based video editing by inverting latents with noise extrapolation, from which we generate videos conditioned on the edited image. Videoshop produces higher quality edits against 6 baselines on 2 editing benchmarks using 10 evaluation metrics.

MoDELS · 模型評估 · 樣本 · Performer · Processing（編程語言） ·

2024 年 3 月 21 日

ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi,Or Patashnik,Andrey Voynov,Hadar Averbuch-Elor,Daniel Cohen-Or

from arxiv, project page at: //garibida.github.io/ReNoise-Inversion/

Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities. However, applying these methods to real images necessitates the inversion of the images into the domain of the pretrained diffusion model. Achieving faithful inversion remains a challenge, particularly for more recent models trained to generate images with a small number of denoising steps. In this work, we introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations. Building on reversing the diffusion sampling process, our method employs an iterative renoising mechanism at each inversion sampling step. This mechanism refines the approximation of a predicted point along the forward diffusion trajectory, by iteratively applying the pretrained diffusion model, and averaging these predictions. We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models. Through comprehensive evaluations and comparisons, we show its effectiveness in terms of both accuracy and speed. Furthermore, we confirm that our method preserves editability by demonstrating text-driven image editing on real images.

TSE · MoDELS · INFORMS · 分離的 · 全 ·

2024 年 3 月 21 日

CATSE: A Context-Aware Framework for Causal Target Sound Extraction

Shrishail Baligar,Mikolaj Kegler,Bryce Irvin,Marko Stamenovic,Shawn Newsam

from arxiv, Submitted to EUSIPCO 2024

Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE models suitable for real-time processing. First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user. Since the practical applications of oracle models are limited due to their assumptions, we introduce a composite multi-task training objective involving separation and classification losses. Our evaluation involving single- and multi-source extraction shows the benefit of using context information in the model either by means of providing full context or via the proposed multi-task training loss without the need for full context information. Specifically, we show that our proposed model outperforms size- and latency-matched Waveformer, a state-of-the-art model for real-time TSE.

估計/估計量 · state-of-the-art · FAST · MoDELS · 判別方法 ·

2024 年 3 月 20 日

DepthFM: Fast Monocular Depth Estimation with Flow Matching

Ming Gui,Johannes S. Fischer,Ulrich Prestel,Pingchuan Ma,Dmytro Kotovenko,Olga Grebenkova,Stefan Andreas Baumann,Vincent Tao Hu,Bj?rn Ommer

Monocular depth estimation is crucial for numerous downstream vision tasks and applications. Current discriminative approaches to this problem are limited due to blurry artifacts, while state-of-the-art generative methods suffer from slow sampling due to their SDE nature. Rather than starting from noise, we seek a direct mapping from input image to depth map. We observe that this can be effectively framed using flow matching, since its straight trajectories through solution space offer efficiency and high quality. Our study demonstrates that a pre-trained image diffusion model can serve as an adequate prior for a flow matching depth model, allowing efficient training on only synthetic data to generalize to real images. We find that an auxiliary surface normals loss further improves the depth estimates. Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates. On standard benchmarks of complex natural scenes, our lightweight approach exhibits state-of-the-art performance at favorable low computational cost despite only being trained on little synthetic data.

MoDELS · 狀態序列 · 周期的 · 相同 · Learning ·

2024 年 3 月 20 日

USE: Dynamic User Modeling with Stateful Sequence Models

Zhihan Zhou,Qixiang Fang,Leonardo Neves,Francesco Barbieri,Yozen Liu,Han Liu,Maarten W. Bos,Ron Dotsch

User embeddings play a crucial role in user engagement forecasting and personalized services. Recent advances in sequence modeling have sparked interest in learning user embeddings from behavioral data. Yet behavior-based user embedding learning faces the unique challenge of dynamic user modeling. As users continuously interact with the apps, user embeddings should be periodically updated to account for users' recent and long-term behavior patterns. Existing methods highly rely on stateless sequence models that lack memory of historical behavior. They have to either discard historical data and use only the most recent data or reprocess the old and new data jointly. Both cases incur substantial computational overhead. To address this limitation, we introduce User Stateful Embedding (USE). USE generates user embeddings and reflects users' evolving behaviors without the need for exhaustive reprocessing by storing previous model states and revisiting them in the future. Furthermore, we introduce a novel training objective named future W-behavior prediction to transcend the limitations of next-token prediction by forecasting a broader horizon of upcoming user behaviors. By combining it with the Same User Prediction, a contrastive learning-based objective that predicts whether different segments of behavior sequences belong to the same user, we further improve the embeddings' distinctiveness and representativeness. We conducted experiments on 8 downstream tasks using Snapchat users' behavioral logs in both static (i.e., fixed user behavior sequences) and dynamic (i.e., periodically updated user behavior sequences) settings. We demonstrate USE's superior performance over established baselines. The results underscore USE's effectiveness and efficiency in integrating historical and recent user behavior sequences into user embeddings in dynamic user modeling.

Pix2Seq · 語言模型化 · 目標檢測 · MoDELS · 離散化 ·

2021 年 9 月 22 日

Pix2seq: A Language Modeling Framework for Object Detection

Ting Chen,Saurabh Saxena,Lala Li,David J. Fleet,Geoffrey Hinton

This paper presents Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.

可理解性 · GAN · GANs · Better · 生成式對抗網絡 ·

2018 年 12 月 8 日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

David Bau,Jun-Yan Zhu,Hendrik Strobelt,Bolei Zhou,Joshua B. Tenenbaum,William T. Freeman,Antonio Torralba

from arxiv, 18 pages, 19 figures

Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.

閱讀: 0 點贊: 0

小貼士

登錄享

相關主題

大語言模型

北京阿比特科技有限公司

注冊地址：北京市海淀區羊坊店路18號2幢3層301-191

<tfoot id='iyxor'></tfoot>

<legend id='iyxor'><style id='iyxor'><dir id='iyxor'><q id='iyxor'></q></dir></style></legend>

<i id='iyxor'><tr id='iyxor'><dt id='iyxor'><q id='iyxor'><span id='iyxor'><b id='iyxor'><form id='iyxor'><ins id='iyxor'></ins><ul id='iyxor'></ul><sub id='iyxor'></sub></form><legend id='iyxor'></legend><bdo id='iyxor'><pre id='iyxor'><center id='iyxor'></center></pre></bdo></b><th id='iyxor'></th></span></q></dt></tr></i><div id='iyxor'><tfoot id='iyxor'></tfoot><dl id='iyxor'><fieldset id='iyxor'></fieldset></dl></div>