五月丁香四月婷婷激情综合,少妇高潮惨叫正在播放对白

With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as $\textit{priming attacks}$, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to $3.3\times$ compared to baselines. Source code and data are available at //github.com/uiuc-focal-lab/llm-priming-attacks.

相關內容

surge

關注 0

Surge 是 iOS 與 macOS 平臺上的 Web 開發人員工具與代理實用程序。

損失 · Performer · 大語言模型 · 語音識別 · MoDELS ·

2024 年 6 月 20 日

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

Murali Karthick Baskar,Andrew Rosenberg,Bhuvana Ramabhadran,Neeraj Gaur,Zhong Meng

In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning. This is a simple approach and does not increase the model complexity or alter the inference pipeline. We also propose language-based soft prompting to further improve with frozen LLMs. Empirical analysis on realtime testset from 10 Indic languages demonstrate that our proposed speech prefix-tuning yields improvements with both frozen and fine-tuned LLMs. Our recognition results on an average of 10 Indics show that the proposed prefix-tuning with RNNT loss results in a 12\% relative improvement in WER over the baseline with a fine-tuned LLM. Our proposed approches with the frozen LLM leads to a 31\% relative improvement over basic soft-prompting prefixLM.

簇 · 可辨認的 · motivation · 分解的 · CRAFT ·

2024 年 6 月 20 日

Evaluating Impact of User-Cluster Targeted Attacks in Matrix Factorisation Recommenders

Sulthana Shams,Douglas Leith

In practice, users of a Recommender System (RS) fall into a few clusters based on their preferences. In this work, we conduct a systematic study on user-cluster targeted data poisoning attacks on Matrix Factorisation (MF) based RS, where an adversary injects fake users with falsely crafted user-item feedback to promote an item to a specific user cluster. We analyse how user and item feature matrices change after data poisoning attacks and identify the factors that influence the effectiveness of the attack on these feature matrices. We demonstrate that the adversary can easily target specific user clusters with minimal effort and that some items are more susceptible to attacks than others. Our theoretical analysis has been validated by the experimental results obtained from two real-world datasets. Our observations from the study could serve as a motivating point to design a more robust RS.

推斷 · 模糊邏輯 · Integration · 規范化的 · binary ·

2024 年 6 月 19 日

Integrating Fuzzy Logic with Causal Inference: Enhancing the Pearl and Neyman-Rubin Methodologies

Amir Saki,Usef Faghihi

from arxiv, 25 pages, 6 figures, 4 tables

In this paper, we generalize the Pearl and Neyman-Rubin methodologies in causal inference by introducing a generalized approach that incorporates fuzzy logic. Indeed, we introduce a fuzzy causal inference approach that consider both the vagueness and imprecision inherent in data, as well as the subjective human perspective characterized by fuzzy terms such as 'high', 'medium', and 'low'. To do so, we introduce two fuzzy causal effect formulas: the Fuzzy Average Treatment Effect (FATE) and the Generalized Fuzzy Average Treatment Effect (GFATE), together with their normalized versions: NFATE and NGFATE. When dealing with a binary treatment variable, our fuzzy causal effect formulas coincide with classical Average Treatment Effect (ATE) formula, that is a well-established and popular metric in causal inference. In FATE, all values of the treatment variable are considered equally important. In contrast, GFATE takes into account the rarity and frequency of these values. We show that for linear Structural Equation Models (SEMs), the normalized versions of our formulas, NFATE and NGFATE, are equivalent to ATE. Further, we provide identifiability criteria for these formulas and show their stability with respect to minor variations in the fuzzy subsets and the probability distributions involved. This ensures the robustness of our approach in handling small perturbations in the data. Finally, we provide several experimental examples to empirically validate and demonstrate the practical application of our proposed fuzzy causal inference methods.

TOOLS · AI · 有偏 · 設計 · INFORMS ·

2024 年 6 月 18 日

Reclaiming Power over AI: Equipping Queer Teens as AI Designers for HIV Prevention

William Liem,Andrew Berry,Kathryn Macapagal

from arxiv, In CHI 2024: Designing (with) AI for Wellbeing

In this position paper, we explore the potential of generative AI (GenAI) tools in supporting HIV prevention initiatives among LGBTQ+ adolescents. GenAI offers opportunities to bridge information gaps and enhance healthcare access, yet it also risks exacerbating existing inequities through biased AI outputs reflecting heteronormative and cisnormative values. We advocate for the importance of queer adolescent-centered interventions, contend with the promise of GenAI tools while addressing concerns of bias, and position participatory frameworks for empowering queer youth in the design and development of AI tools. Viewing LGBTQ+ adolescents as designers, we propose a community-engaged approach to enable a group of queer teens with sexual health education expertise to design their own GenAI health tools. Through this collaborative effort, we put forward participatory ways to develop processes minimizing the potential iatrogenic harms of biased AI models, while harnessing AI benefits for LGBTQ+ teens. In this workshop, we offer specialized community-engaged knowledge in designing equitable AI tools to improve LGBTQ+ well-being.

MoDELS · Principle · Nuance · AIM · Performer ·

2024 年 6 月 18 日

Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images

Shivank Garg,Manyana Tiwari

In this paper, we extend the study of concept ablation within pre-trained models as introduced in 'Ablating Concepts in Text-to-Image Diffusion Models' by (Kumari et al.,2022). Our work focuses on reproducing the results achieved by the different variants of concept ablation proposed and validated through predefined metrics. We also introduce a novel variant of concept ablation, namely 'trademark ablation'. This variant combines the principles of memorization and instance ablation to tackle the nuanced influence of proprietary or branded elements in model outputs. Further, our research contributions include an observational analysis of the model's limitations. Moreover, we investigate the model's behavior in response to ablation leakage-inducing prompts, which aim to indirectly ablate concepts, revealing insights into the model's resilience and adaptability. We also observe the model's performance degradation on images generated by concepts far from its target ablation concept, documented in the appendix.

流形 · 向量化 · 平滑 · Projection · 可辨認的 ·

2024 年 6 月 18 日

Generalized Moving Least-Squares for Solving Vector-valued PDEs on Unknown Manifolds

Rongji Li,Qile Yan,Shixiao W. Jiang

In this paper, we extend the Generalized Moving Least-Squares (GMLS) method in two different ways to solve the vector-valued PDEs on unknown smooth 2D manifolds without boundaries embedded in $\mathbb{R}^{3}$, identified with randomly sampled point cloud data. The two approaches are referred to as the intrinsic method and the extrinsic method. For the intrinsic method which relies on local approximations of metric tensors, we simplify the formula of Laplacians and covariant derivatives acting on vector fields at the base point by calculating them in a local Monge coordinate system. On the other hand, the extrinsic method formulates tangential derivatives on a submanifold as the projection of the directional derivative in the ambient Euclidean space onto the tangent space of the submanifold. One challenge of this method is that the discretization of vector Laplacians yields a matrix whose size relies on the ambient dimension. To overcome this issue, we reduce the dimension of vector Laplacian matrices by employing an appropriate projection. The complexity of both methods scales well with the dimension of manifolds rather than the ambient dimension. We also present supporting numerical examples, including eigenvalue problems, linear Poisson equations, and nonlinear Burgers' equations, to examine the numerical accuracy of proposed methods on various smooth manifolds.

MoDELS · 大語言模型 · GPT-4 · Performer · AIM ·

2024 年 6 月 17 日

On the Limitations of Fine-tuned Judge Models for LLM Evaluation

Hui Huang,Yingqi Qu,Hongli Zhou,Jing Liu,Muyun Yang,Bing Xu,Tiejun Zhao

Recently, there has been a growing trend of utilizing Large Language Model (LLM) to evaluate the quality of other LLMs. Many studies have employed proprietary close-source models, especially GPT-4, as the evaluator. Alternatively, other works have fine-tuned judge models based on open-source LLMs as the evaluator. While the fine-tuned judge models are claimed to achieve comparable evaluation capability with GPT-4, in this study, we conduct an empirical study of judge models. Our findings indicate that although the fine-tuned judge models achieve high performance on in-domain test sets, even surpassing GPT-4, they underperform GPT-4 across several dimensions, including generalizability, fairness, aspect-specific evaluation, and scalability. We also reveal that the fine-tuned judge model inherently operates as a task-specific classifier, consequently imposing the limitations. Finally, we propose an effective indicator to measure the reliability of fine-tuned judges, with the aim of maximizing their utility in LLM evaluation.

機器人 · Performer · 穩健性 · Prompt · MoDELS ·

2024 年 6 月 16 日

Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

Xiyang Wu,Souradip Chakraborty,Ruiqi Xian,Jing Liang,Tianrui Guan,Fuxiao Liu,Brian M. Sadler,Dinesh Manocha,Amrit Singh Bedi

In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplored yet extremely critical. LLMs and VLMs are highly susceptible to adversarial inputs, prompting a significant inquiry into the safety of robotic systems. This concern is important because robotics operate in the physical world where erroneous actions can result in severe consequences. This paper explores this issue thoroughly, presenting a mathematical formulation of potential attacks on LLM/VLM-based robotic systems and offering experimental evidence of the safety challenges. Our empirical findings highlight a significant vulnerability: simple modifications to the input can drastically reduce system effectiveness. Specifically, our results demonstrate an average performance deterioration of 19.4% under minor input prompt modifications and a more alarming 29.1% under slight perceptual changes. These findings underscore the urgent need for robust countermeasures to ensure the safe and reliable deployment of advanced LLM/VLM-based robotic systems.

Analysis · 任務對話系統 · 相似度 · 數據集 · 論文 ·

2024 年 6 月 16 日

Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice

Sunny Rai,Khushang Jilesh Zaveri,Shreya Havaldar,Soumna Nema,Lyle Ungar,Sharath Chandra Guntuku

Social emotions such as shame and pride reflect social sanctions or approvals in society. In this paper, we examine how expressions of shame and pride vary across cultures and harness them to extract unspoken normative expectations across cultures. We introduce the first cross-cultural shame/pride emotions movie dialogue dataset, obtained from ~5.4K Bollywood and Hollywood movies, along with over 10K implicit social norms. Our study reveals variations in expressions of social emotions and social norms that align with known cultural tendencies observed in the United States and India -- e.g., Hollywood movies express shame predominantly toward self whereas Bollywood movies express shame predominantly toward others. Similarly, Bollywood shames non-conformity in gender roles, and takes pride in collective identity, while Hollywood shames lack of accountability, and takes pride in ethical behavior. More importantly, women face more prejudice across cultures and are sanctioned for similar social norms.

motivation · HER · MoDELS · 優化器 · binary ·

2024 年 6 月 15 日

Reward Schemes and Committee Sizes in Proof of Stake Governance

Georgios Birmpas,Philip Lazos,Evangelos Markakis,Paolo Penna

In this paper, we investigate the impact of reward schemes and committee sizes motivated by governance systems over blockchain communities. We introduce a model for elections with a binary outcome space where there is a ground truth (i.e., a "correct" outcome), and where stakeholders can only choose to delegate their voting power to a set of delegation representatives (DReps). Moreover, the effort (cost) invested by each DRep positively influences both (i) her ability to vote correctly and (ii) the total delegation that she attracts, thereby increasing her voting power. This model constitutes the natural counterpart of delegated proof-of-stake (PoS) protocols, where delegated stakes are used to elect the block builders. As a way to motivate the representatives to exert effort, a reward scheme can be used based on the delegation attracted by each DRep. We analyze both the game-theoretic aspects and the optimization counterpart of this model. Our primary focus is on selecting a committee that maximizes the probability of reaching the correct outcome, given a fixed monetary budget allocated for rewarding the delegates. Our findings provide insights into the design of effective reward mechanisms and optimal committee structures (i.e., how many DReps are enough) in these PoS-like governance systems.