
Origin · MoDELS · Guidance · Optimizer · Examples

October 30, 2023

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

Bochuan Cao,Changjiang Li,Ting Wang,Jinyuan Jia,Bo Li,Jinghui Chen

from arxiv, 21 pages, 11 figures, 9 tables. Accepted by NeurIPS 2023

Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image. This inconsistency can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.

Related content

Knowledge · Mixture of experts · Language modeling · MoDELS · Performer

December 18, 2023

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Shihan Dou,Enyu Zhou,Yan Liu,Songyang Gao,Jun Zhao,Wei Shen,Yuhao Zhou,Zhiheng Xi,Xiao Wang,Xiaoran Fan,Shiliang Pu,Jiang Zhu,Rui Zheng,Tao Gui,Qi Zhang,Xuanjing Huang

from arxiv, 17 pages, 7 figures

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. When the models are required to align with a broader range of downstream tasks, or there is a desire to notably improve the performance on a specific task, a substantial increase in fine-tuning data often emerges as the solution. However, we find that large-scale increases in instruction data can disrupt the world knowledge previously stored in the LLMs, i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to address the above challenge. The LoRAMoE is a plugin version of Mixture of Experts (MoE). The plugin form ensures the integrity of world knowledge by freezing the backbone model during the training phase. We then propose the use of localized balancing constraints to coordinate parts of experts for task utilization, meanwhile enabling other experts to fully leverage the world knowledge stored in the models. Experimental results demonstrate that LoRAMoE can reasonably coordinate experts based on data type during inference, and even dramatically increasing instruction data does not result in knowledge forgetting. Moreover, LoRAMoE provides additional benefits for the performance of downstream tasks, indicating the potential of our approach for multi-task learning.
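The plugin idea in the abstract above (a frozen backbone with trainable experts attached) can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the class name `LoRAExpertLayer`, the softmax router, the low-rank `B @ A` form of each expert, and the zero initialization of `B` are not taken from the abstract, and the localized balancing constraint is not implemented.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpertLayer:
    """Hypothetical layer: frozen dense weights plus gated low-rank experts."""

    def __init__(self, frozen_w, n_experts, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(frozen_w), len(frozen_w[0])
        # Stands in for the frozen backbone: this matrix is never updated.
        self.frozen_w = frozen_w
        # Trainable parts (assumed): a linear router and one (A, B) pair per expert.
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_in)]
                       for _ in range(n_experts)]
        self.a = [[[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
                  for _ in range(n_experts)]
        # B starts at zero, so the experts contribute nothing before training.
        self.b = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]

    def gate(self, x):
        """Mixing weights over experts for input x (they sum to 1)."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        """Frozen output plus the gate-weighted sum of expert updates."""
        out = matvec(self.frozen_w, x)
        for g, a, b in zip(self.gate(x), self.a, self.b):
            delta = matvec(b, matvec(a, x))
            out = [o + g * d for o, d in zip(out, delta)]
        return out
```

With `B` at zero the layer reproduces the frozen layer's output exactly, which is one way to see why freezing the backbone leaves its stored knowledge untouched at the start of fine-tuning.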

Knowledge · Mixture of experts · Language modeling · MoDELS · Performer

December 15, 2023

The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Shihan Dou,Enyu Zhou,Yan Liu,Songyang Gao,Jun Zhao,Wei Shen,Yuhao Zhou,Zhiheng Xi,Xiao Wang,Xiaoran Fan,Shiliang Pu,Jiang Zhu,Rui Zheng,Tao Gui,Qi Zhang,Xuanjing Huang

from arxiv, 17 pages, 7 figures

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. When the models are required to align with a broader range of downstream tasks, or there is a desire to notably improve the performance on a specific task, a substantial increase in fine-tuning data often emerges as the solution. However, we find that large-scale increases in instruction data can disrupt the world knowledge previously stored in the LLMs, i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to address the above challenge. The LoRAMoE is a plugin version of Mixture of Experts (MoE). The plugin form ensures the integrity of world knowledge by freezing the backbone model during the training phase. We also propose the use of localized balancing constraints to coordinate parts of experts for task utilization, while enabling other experts to fully leverage the world knowledge stored in the models. Experimental results demonstrate that LoRAMoE can reasonably coordinate experts based on data type during inference, and even dramatically increasing instruction data does not result in knowledge forgetting. Moreover, LoRAMoE provides additional benefits for the performance of downstream tasks, indicating the potential of our approach for multi-task learning.

Robustness · Ray · MoDELS · Samples · Scenario

December 15, 2023

RANRAC: Robust Neural Scene Representations via Random Ray Consensus

Benno Buschmann,Andreea Dogaru,Elmar Eisemann,Michael Weinmann,Bernhard Egger

We introduce RANRAC, a robust reconstruction algorithm for 3D objects handling occluded and distracted images, which is a particularly challenging scenario that prior robust reconstruction methods cannot deal with. Our solution supports single-shot reconstruction by involving light-field networks, and is also applicable to photo-realistic, robust, multi-view reconstruction from real-world images based on neural radiance fields. While the algorithm imposes certain limitations on the scene representation and, thereby, the supported scene types, it reliably detects and excludes inconsistent perspectives, resulting in clean images without floating artifacts. Our solution is based on a fuzzy adaption of the random sample consensus paradigm, enabling its application to large scale models. We interpret the minimal number of samples to determine the model parameters as a tunable hyperparameter. This is applicable, as a cleaner set of samples improves reconstruction quality. Further, this procedure also handles outliers. Especially for conditioned models, it can result in the same local minimum in the latent space as would be obtained with a completely clean set. We report significant improvements for novel-view synthesis in occluded scenarios, of up to 8dB PSNR compared to the baseline.
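For readers unfamiliar with the random sample consensus paradigm the abstract builds on, here is a minimal sketch of classic RANSAC on a toy problem (fitting a 2D line). It is not the RANRAC algorithm: the function name `ransac_line`, the two-point minimal sample, the threshold, and the data are all assumptions chosen for illustration.

```python
import random


def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers.

    Repeatedly fits a line to a random minimal sample (two points) and keeps
    the candidate that agrees with the most points.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = m*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Consensus set: points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus three gross outliers (illustrative data).
points = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -7), (8, 40)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

The sample size is fixed at the minimum of two points here; the abstract describes treating that number as a tunable hyperparameter, which this sketch does not attempt.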

Examples · Extensibility · Loss function (machine learning) · Performer · Reviewer

December 15, 2023

SlowTrack: Increasing the Latency of Camera-based Perception in Autonomous Driving Using Adversarial Examples

Chen Ma,Ningfei Wang,Qi Alfred Chen,Chao Shen

from arxiv, Accepted by AAAI 2024

In Autonomous Driving (AD), real-time perception is a critical component responsible for detecting surrounding objects to ensure safe driving. While researchers have extensively explored the integrity of AD perception due to its safety and security implications, the aspect of availability (real-time performance) or latency has received limited attention. Existing works on latency-based attacks have focused mainly on object detection, i.e., a component in camera-based AD perception, overlooking the entire camera-based AD perception, which hinders them from achieving effective system-level effects, such as vehicle crashes. In this paper, we propose SlowTrack, a novel framework for generating adversarial attacks to increase the execution time of camera-based AD perception. We propose a novel two-stage attack strategy along with three new loss function designs. Our evaluation is conducted on four popular camera-based AD perception pipelines, and the results demonstrate that SlowTrack significantly outperforms existing latency-based attacks while maintaining comparable imperceptibility levels. Furthermore, we perform the evaluation on Baidu Apollo, an industry-grade full-stack AD system, and LGSVL, a production-grade AD simulator, with two scenarios to compare the system-level effects of SlowTrack and existing attacks. Our evaluation results show that the system-level effects can be significantly improved, i.e., the vehicle crash rate of SlowTrack is around 95% on average while existing works only have around 30%.

Directed · Graph · Functional · Scenario · Edge

December 14, 2023

On the Complexity of Simultaneous Geometric Embedding for Edge-Disjoint Graphs

Benedikt Künzel,Jonathan Rollin

from arxiv, 13 pages, 7 figures

Simultaneous Geometric Embedding (SGE) asks whether, for a given collection of graphs on the same vertex set V, there is an embedding of V in the plane that admits a crossing-free drawing with straight-line edges for each of the given graphs. It is known that SGE is $\exists\mathbb{R}$-complete, that is, the problem is polynomially equivalent to deciding whether a system of polynomial equations and inequalities with integer coefficients has a real solution. We prove that SGE remains $\exists\mathbb{R}$-complete for edge-disjoint input graphs, that is, for collections of graphs without so-called public edges. As an intermediate result, we prove that it is $\exists\mathbb{R}$-complete to decide whether a directional walk without repeating edges is realizable. Here, a directional walk consists of a sequence of not-necessarily distinct vertices (a walk) and a function prescribing for each inner position whether the walk shall turn left or shall turn right. A directional walk is realizable if there is an embedding of its vertices in the plane such that the embedded walk turns according to the given directions. Previously it was known that realization is $\exists\mathbb{R}$-complete to decide for directional walks repeating each edge at most 336 times. This answers two questions posed by Schaefer ["On the Complexity of Some Geometric Problems With Fixed Parameters", JGAA 2021].
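The definition of a directional walk above can be made concrete with a small sketch that checks whether one given embedding realizes a walk. The names `turn_direction` and `realizes`, the `"L"`/`"R"` encoding, and the convention that the y-axis points up are assumptions for illustration. Note that this only verifies a candidate embedding; deciding whether any realizing embedding exists is the hard problem the abstract refers to.

```python
def turn_direction(p, q, r):
    """Return "L" or "R" for the turn made at q when walking p -> q -> r.

    Uses the sign of the 2D cross product of (q - p) and (r - q), with the
    y-axis pointing up. Returns None when the three points are collinear.
    """
    cross = (q[0] - p[0]) * (r[1] - q[1]) - (q[1] - p[1]) * (r[0] - q[0])
    if cross > 0:
        return "L"
    if cross < 0:
        return "R"
    return None


def realizes(embedding, walk, turns):
    """Check that the embedded walk turns as prescribed at every inner position.

    embedding maps each vertex to an (x, y) point, walk is a sequence of
    vertices (repeats allowed), and turns has one entry per inner position.
    """
    if len(turns) != len(walk) - 2:
        raise ValueError("need exactly one turn per inner position of the walk")
    for i, wanted in enumerate(turns, start=1):
        p = embedding[walk[i - 1]]
        q = embedding[walk[i]]
        r = embedding[walk[i + 1]]
        if turn_direction(p, q, r) != wanted:
            return False
    return True


# A walk around a triangle, revisiting its first vertex (illustrative data).
embedding = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}
print(realizes(embedding, ["a", "b", "c", "a"], ["L", "L"]))
```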

Language modeling · MoDELS · Processing (programming language) · Training data · Performer

December 14, 2023

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You,Zheyuan Li,Jinjin Gu,Zhenfei Yin,Tianfan Xue,Chao Dong

We introduce a Depicted image Quality Assessment method (DepictQA), overcoming the constraints of traditional score-based approaches. DepictQA leverages Multi-modal Large Language Models (MLLMs), allowing for detailed, language-based, human-like evaluation of image quality. Unlike conventional Image Quality Assessment (IQA) methods relying on scores, DepictQA interprets image content and distortions descriptively and comparatively, aligning closely with humans' reasoning process. To build the DepictQA model, we establish a hierarchical task framework, and collect a multi-modal IQA training dataset, named M-BAPPS. To address the challenges of limited training data and of processing multiple images, we propose to use multi-source training data and specialized image tags. Our DepictQA demonstrates better performance than score-based methods on the BAPPS benchmark. Moreover, compared with general MLLMs, our DepictQA can generate more accurate descriptive reasoning. Our research indicates that language-based IQA methods have the potential to be customized for individual preferences. Datasets and code will be released publicly.

Image retrieval · Identifiable · Semantic gap · Learning · Model evaluation

December 13, 2023

Advancements in Content-Based Image Retrieval: A Comprehensive Survey of Relevance Feedback Techniques

Hamed Qazanfari,Mohammad M. AlyanNezhadi,Zohreh Nozari Khoshdaregi

from arxiv, 7 pages, 2 figures

Content-based image retrieval (CBIR) systems have emerged as crucial tools in the field of computer vision, allowing for image search based on visual content rather than relying solely on metadata. This survey paper presents a comprehensive overview of CBIR, emphasizing its role in object detection and its potential to identify and retrieve visually similar images based on content features. Challenges faced by CBIR systems, including the semantic gap and scalability, are discussed, along with potential solutions. It elaborates on the semantic gap, which arises from the disparity between low-level features and high-level semantic concepts, and explores approaches to bridge this gap. One notable solution is the integration of relevance feedback (RF), empowering users to provide feedback on retrieved images and refine search results iteratively. The survey encompasses long-term and short-term learning approaches that leverage RF for enhanced CBIR accuracy and relevance. These methods focus on weight optimization and the utilization of active learning algorithms to select samples for training classifiers. Furthermore, the paper investigates machine learning techniques and the utilization of deep learning and convolutional neural networks to enhance CBIR performance. This survey paper plays a significant role in advancing the understanding of CBIR and RF techniques. It guides researchers and practitioners in comprehending existing methodologies, challenges, and potential solutions while fostering knowledge dissemination and identifying research gaps. By addressing future research directions, it sets the stage for advancements in CBIR that will enhance retrieval accuracy, usability, and effectiveness in various application domains.
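The iterative refinement that relevance feedback performs can be sketched with a Rocchio-style query update, a classic technique swapped in here for illustration; the abstract does not say which methods the survey covers in detail. The function names, the weights `alpha`, `beta`, `gamma`, and the toy feature vectors are all assumptions.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward images marked relevant and away from the rest.

    query, and every entry of relevant / non_relevant, is a feature vector of
    the same length. The weights are illustrative defaults, not tuned values.
    """
    dim = len(query)
    rel_mean = mean_vector(relevant, dim)
    non_mean = mean_vector(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_mean, non_mean)]


# One feedback round on 2-D toy features (illustrative data).
new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[1.0, 1.0], [1.0, 3.0]],
    non_relevant=[[0.0, 2.0]],
)
print(new_query)
```

Running the search again with the updated query, then collecting fresh feedback, gives the iterative loop the abstract describes.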

Language modeling · MoDELS · Taxonomy · AIM · Divergence

September 3, 2023

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Yue Zhang,Yafu Li,Leyang Cui,Deng Cai,Lemao Liu,Tingchen Fu,Xinting Huang,Enbo Zhao,Yu Zhang,Yulong Chen,Longyue Wang,Anh Tuan Luu,Wei Bi,Freda Shi,Shuming Shi

from arxiv, work in progress; 32 pages

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.

MoDELS · Guidance · Seven · Continuity · Performer

August 10, 2023

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Yang Liu,Yuanshun Yao,Jean-Francois Ton,Xiaoying Zhang,Ruocheng Guo,Hao Cheng,Yegor Klochkov,Muhammad Faaiz Taufiq,Hang Li

Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.

Task-oriented dialogue system · INFORMS · Graph · Networking · entity

August 11, 2020

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Xiaoze Jiang,Siyi Du,Zengchang Qin,Yajing Sun,Jing Yu

from arxiv, Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)

Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge and text knowledge, ignoring the heterogeneous semantic gaps between the cross-modal information. In the meantime, the concatenation operation has become the de facto standard for cross-modal information fusion, which has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge in fine granularity, and retrieves required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results.


