
The Segment Anything Model (SAM), as the name suggests, is claimed to be capable of cutting out any object and demonstrates impressive zero-shot transfer performance with the guidance of a prompt. However, there is currently a lack of comprehensive evaluation of its robustness under various corruptions. Understanding SAM's robustness across different corruption scenarios is crucial for its real-world deployment. Prior works show that SAM is biased towards texture (style) rather than shape; motivated by this, we start by investigating SAM's robustness against style transfer, a form of synthetic corruption. Interpreting the effect of corruption as a style change, we then conduct a comprehensive evaluation of SAM's robustness against 15 types of common corruption. These corruptions mainly fall into the categories of digital, noise, weather, and blur. Within each corruption category, we explore 5 severity levels to simulate real-world corruption scenarios. Beyond these corruptions, we further assess SAM's robustness to local occlusion and local adversarial patch attacks in images. To the best of our knowledge, our work is the first to evaluate the robustness of SAM under style change, local occlusion, and local adversarial patch attacks. Considering that patch attacks visible to human eyes are easily detectable, we also assess SAM's robustness against adversarial perturbations that are imperceptible to human eyes. Overall, this work provides a comprehensive empirical study of SAM's robustness under various corruptions, extended to critical aspects such as local occlusion, local patch attacks, and imperceptible adversarial perturbations, yielding valuable insights into SAM's practical applicability and effectiveness in addressing real-world challenges.
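To make the severity-ladder protocol concrete, here is a minimal, hypothetical sketch of how one corruption type (Gaussian noise) could be applied at 5 severity levels and scored. The names `segment` and `miou` are placeholders for a SAM inference call and a mask-quality metric, and the noise scales are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int) -> np.ndarray:
    """Add Gaussian noise; sigma grows with severity 1..5 (assumed scales)."""
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1] * 255.0
    noisy = image.astype(np.float64) + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def evaluate_robustness(images, gt_masks, segment, miou):
    """Mean mask quality per severity level for one corruption type.

    `segment` (image -> predicted mask) and `miou` (pred, gt -> float)
    are hypothetical stand-ins for the model and the metric.
    """
    scores = {}
    for severity in range(1, 6):
        per_image = [miou(segment(gaussian_noise(img, severity)), gt)
                     for img, gt in zip(images, gt_masks)]
        scores[severity] = float(np.mean(per_image))
    return scores
```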

Related Content

Fine-tuning diffusion models on personalized datasets is a well-established method for improving generation quality on downstream tasks. However, it often inadvertently causes the model to learn unintended concepts such as watermarks and QR codes, owing to limitations in the image sources and collection methods of specific downstream tasks. Existing solutions struggle to eliminate these unintentionally learned implicit concepts, primarily because they depend on the model's ability to recognize concepts that it cannot in fact discern. In this work, we introduce \methodname, a novel approach that successfully removes implicit concepts by using an additional, readily accessible classifier or detector model to encode the geometric information of these concepts into the text domain. Moreover, we propose \textit{Implicit Concept}, a novel image-text dataset imbued with three implicit concepts (\ie, watermarks, QR codes, and text) for training and evaluation. Experimental results demonstrate that \methodname not only identifies but also proficiently eradicates implicit concepts, marking a significant improvement over existing methods. The integration of geometric information constitutes a substantial step toward the precise removal of implicit concepts in diffusion models.
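As a purely illustrative sketch of the core idea, geometry produced by an external detector could be serialized into a textual tag that conditions the model. The detector interface and the coordinate-token format below are assumptions for exposition, not the paper's actual design.

```python
def geometry_to_text(image, detector) -> str:
    """Encode a detected concept's bounding box as text.

    `detector` is a hypothetical accessible model returning
    (x0, y0, x1, y1) pixel coordinates, or None if nothing is found.
    """
    box = detector(image)
    if box is None:
        return ""
    x0, y0, x1, y1 = box
    # Serialize the geometry into a tag that can be attached to the
    # training caption (e.g., to localize the concept for removal).
    return f"<watermark at ({x0},{y0})-({x1},{y1})>"
```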

This paper rigorously shows how over-parameterization changes the convergence behavior of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting, where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega (1/T^2)$ lower bound for randomly initialized GD in the over-parameterized case ($k > r$), where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k=r$), where the convergence rate is $\exp (-\Omega (T))$. Next, we study the asymmetric setting, where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1,n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$, where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case ($k=r$) with an $\exp (-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k>r$) with an $\exp(-\Omega(\alpha^2 T))$ rate, where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence. On the other hand, we propose a novel method that modifies only one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate of the exact-parameterization case.
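The symmetric setting is easy to simulate numerically. Below is a small sketch (not the paper's code) of randomly initialized GD on $f(X) = \frac{1}{2}\|\mathcal{A}(XX^\top) - y\|^2$, contrasting exact and over-parameterized factorizations; all problem sizes and step sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400                          # dimension, true rank, measurements

U = rng.normal(size=(n, r))
M_star = U @ U.T / n                          # PSD ground truth of rank r
A = rng.normal(size=(m, n, n)) / np.sqrt(m)   # near-isotropic sensing matrices
A = (A + A.transpose(0, 2, 1)) / 2            # symmetrize each A_i
y = np.einsum("mij,ij->m", A, M_star)

def run_gd(k, steps=3000, lr=0.1, init_scale=1e-3):
    """GD on f(X) = 0.5 * ||A(X X^T) - y||^2 from small random init."""
    X = init_scale * rng.normal(size=(n, k))
    for _ in range(steps):
        residual = np.einsum("mij,ij->m", A, X @ X.T) - y
        grad = 2 * np.einsum("m,mij->ij", residual, A) @ X
        X = X - lr * grad
    return np.linalg.norm(X @ X.T - M_star, "fro")

print("exact k=r   :", run_gd(k=r))       # expected: fast (linear) convergence
print("over  k=r+3 :", run_gd(k=r + 3))   # expected: markedly slower decay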

Charts are used to communicate data visually, but designing an effective chart that a broad audience can understand is challenging. Usually, we do not know whether a chart's intended message aligns with the message readers actually perceive. In this mixed-methods study, we investigate how data journalists encode data and how a broad audience engages with, experiences, and understands these data visualizations. We conducted a series of workshops and interviews with school students, university students, job seekers, designers, and senior citizens to collect perceived messages and subjective feedback on a sample of eight real-world charts. We analyzed these messages and compared them to the intended message of each chart's producer. Four of the messages collected from consumers were then provided to data journalists (including those who created the original charts) as a starting point for redesigning the charts accordingly. The results of our work underline the difficulty of complex chart types such as stacked bar charts and Sankey diagrams. Consumers are often overwhelmed by the amount of data provided and are easily confused by unfamiliar terminology in chart text. Chart producers tend to be faithful to the data but are willing to abstract further when asked to convey particular messages visually. There are strong conventions for visually encoding particular information that may not be to the benefit of many consumers.

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly lengthy, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction-tuning-based method for distribution alignment between language models. We conduct experiments and analysis over four datasets from different scenarios, i.e., GSM8K, BBH, ShareGPT, and Arxiv-March23, showing that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss. Our code is available at //aka.ms/LLMLingua.
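To illustrate the intuition behind perplexity-driven compression (keep surprising, information-dense tokens; drop tokens a small language model predicts easily), here is a minimal sketch using GPT-2 as the scoring model. This mirrors the general idea only; it is not LLMLingua's budget controller or iterative algorithm, and the keep ratio is an assumed knob.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the highest-surprisal tokens under a fixed token budget."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Negative log-likelihood of token i+1 given the prefix up to i.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="none")
    k = max(1, int(keep_ratio * nll.numel()))
    keep = set(torch.topk(nll, k).indices.tolist())
    # Always keep the first token; preserve the original token order.
    kept = [ids[0, 0].item()] + [ids[0, i + 1].item()
                                 for i in range(nll.numel()) if i in keep]
    return tokenizer.decode(kept)
```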

We present a deformable generator model that disentangles appearance and geometric information for both image and video data in a purely unsupervised manner. The appearance generator network models information related to appearance, including color, illumination, identity, and category, while the geometric generator performs geometric warping, such as rotation and stretching, by generating a deformation field that warps the generated appearance to produce the final image or video sequence. The two generators take independent latent vectors as input to disentangle the appearance and geometric information of image or video sequences. For video data, a nonlinear transition model is introduced into both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that appearance and geometric information can be well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
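The central warping step can be sketched with a dense deformation field resampling the appearance output, as below in PyTorch. Shapes, the offset scale, and the bilinear-resampling choice are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def warp(appearance: torch.Tensor, deformation: torch.Tensor) -> torch.Tensor:
    """appearance: (B, C, H, W); deformation: (B, H, W, 2) offsets in [-1, 1]."""
    B, _, H, W = appearance.shape
    # Identity sampling grid in normalized (x, y) coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    # Displace the grid by the predicted deformation and resample.
    return F.grid_sample(appearance, grid + deformation, align_corners=True)

appearance = torch.randn(2, 3, 64, 64)          # appearance generator output
deformation = 0.05 * torch.randn(2, 64, 64, 2)  # geometric generator output
image = warp(appearance, deformation)           # final warped image
```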

Large Language Models (LLMs) can solve a variety of tasks, such as text summarization and mathematical questions, out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs to new -- but often individual -- downstream tasks. Thus, how to expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions remains a wide-open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, combined with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collections of prompts) based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as to the instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications of model approximations. As a highlight, MoPs decrease final perplexity from $\sim20\%$ up to $\sim70\%$, compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim30\%$ in the centralized scenario.
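A minimal, hypothetical sketch of gating over a pool of soft prompts is shown below; the pooling choice, dimensions, and top-k routing are assumptions for illustration, not the paper's exact gating design.

```python
import torch
import torch.nn as nn

class MixtureOfPrompts(nn.Module):
    def __init__(self, n_prompts=8, prompt_len=16, d_model=768, top_k=2):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, prompt_len, d_model))
        self.gate = nn.Linear(d_model, n_prompts)  # scores each prompt expert
        self.top_k = top_k

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        """token_embeds: (B, T, d_model) -> mixed prompt (B, prompt_len, d_model)."""
        summary = token_embeds.mean(dim=1)             # crude task summary
        top = self.gate(summary).topk(self.top_k, dim=-1)
        weights = torch.softmax(top.values, dim=-1)    # (B, top_k)
        chosen = self.prompts[top.indices]             # (B, top_k, L, d)
        # Weighted combination of the selected prompt experts;
        # the result is prepended to the input of the frozen LLM.
        return (weights[..., None, None] * chosen).sum(dim=1)
```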

The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With the unprecedented scale of training data and model parameters, the capabilities of large language models have improved dramatically, leading to human-like performance in understanding, language synthesis, common-sense reasoning, and more. Such a major leap forward in general AI capacity will change how personalization is conducted. For one thing, it will reform the way humans interact with personalization systems. Instead of being a passive medium for information filtering, large language models provide a foundation for active user engagement. On top of this new foundation, user requests can be proactively explored, and the information users need can be delivered in a natural and explainable way. For another, it will considerably expand the scope of personalization, growing it from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as a general-purpose interface, personalization systems can compile user requests into plans, call the functions of external tools to execute those plans, and integrate the tools' outputs to complete end-to-end personalization tasks. Today, large language models are still developing rapidly, whereas their application to personalization remains largely unexplored. Therefore, we consider it the right time to review the challenges in personalization and the opportunities to address them with LLMs. In particular, we dedicate this perspective paper to discussing the following aspects: the development of and challenges for existing personalization systems, the newly emerged capabilities of large language models, and potential ways of leveraging large language models for personalization.
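The plan-execute-integrate pattern described above can be sketched as a short skeleton; `llm`, the one-call-per-line plan format, and the tool registry are purely hypothetical stand-ins, not a proposed system.

```python
def personalize(request: str, llm, tools: dict) -> str:
    """llm: prompt -> text; tools: name -> callable taking a string argument."""
    plan = llm(f"Decompose this user request into tool calls, "
               f"one 'name: argument' per line: {request}")
    observations = []
    for step in plan.splitlines():              # assumed plan format
        name, _, arg = step.partition(":")
        if name.strip() in tools:
            observations.append(tools[name.strip()](arg.strip()))
    # Integrate tool outputs into the final personalized response.
    return llm(f"Answer the request '{request}' using: {observations}")
```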

As artificial intelligence (AI) models continue to scale up, they are becoming more capable and are being integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents to aid their safe deployment in real-world settings.

Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Although diffusion models have achieved higher quality and diversity of sample synthesis than other state-of-the-art models, they still suffer from a costly sampling procedure and sub-optimal likelihood estimation. Recent studies have shown great enthusiasm for improving the performance of diffusion models. In this article, we present the first comprehensive review of existing variants of diffusion models. Specifically, we provide the first taxonomy of diffusion models, categorizing the variants into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce in detail five other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models) and clarify the connections between diffusion models and these models. Then we make a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of this class of generative models.
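For reference, all surveyed variants build on the standard DDPM forward (noising) process, which admits the closed form $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$. A short sketch with a common linear variance schedule (default values assumed):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear variance schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    eps = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps
```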

Recently, Mutual Information (MI) has attracted attention for bounding the generalization error of Deep Neural Networks (DNNs). However, accurately estimating the MI in DNNs is intractable, so most previous works have had to relax the MI bound, which in turn weakens the information-theoretic explanation for generalization. To address this limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information-theoretic explanation for generalization and derive a tighter generalization bound than the state-of-the-art relaxations.
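For context, the classical MI-based bound of this kind (due to Xu and Raginsky) states that, for a $\sigma$-sub-Gaussian loss,
\[
\bigl|\mathbb{E}[\mathrm{gen}(S, W)]\bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)},
\]
where $S$ is the training set of $n$ samples, $W$ denotes the learned weights, and $I(S;W)$ is the mutual information this paper proposes to estimate directly rather than relax. This display is shown as a reference point, not the paper's tighter bound.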
