
In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work focuses on a specific question: to what extent can LLMs be led astray by asking them to generate instruction-centric responses, such as pseudocode, a program, or a software snippet, rather than vanilla text? To investigate this question, we introduce TechHazardQA, a dataset of complex queries that should be answered in both text and instruction-centric formats (e.g., pseudocode), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. For evaluation, we report the harmfulness score metric as well as judgements from GPT-4 and humans. Overall, we observe that asking LLMs to produce instruction-centric responses increases unethical response generation by ~2-38% across the models. As an additional objective, we investigate the impact of model editing using the ROME technique, which further increases the propensity for generating undesirable content. In particular, asking edited LLMs to generate instruction-centric responses further increases unethical response generation by ~3-16% across the different models.

Related content

MoDELS


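The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability.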

April 29, 2024

Randomization-based confidence intervals for the local average treatment effect

P. M. Aronow,Haoge Chang,Patrick Lopatto

from arxiv, 40 pages

We consider the problem of generating confidence intervals in randomized experiments with noncompliance. We show that a refinement of a randomization-based procedure proposed by Imbens and Rosenbaum (2005) has desirable properties. Namely, we show that using a studentized Anderson-Rubin-type statistic as a test statistic yields confidence intervals that are finite-sample exact under treatment effect homogeneity, and remain asymptotically valid for the Local Average Treatment Effect when the treatment effect is heterogeneous. We provide a uniform analysis of this procedure.


April 29, 2024

Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss under the hierarchical max-pooling model

Michael Kohler,Sophie Langer

from arxiv, arXiv admin note: text overlap with arXiv:2003.01526

Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to also improve the theoretical understanding of neural networks. Nevertheless, it seems limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analyzing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with the practical observations about CNNs.


April 26, 2024

An analysis of the suitability of OpenAlex for bibliometric analyses

Juan Pablo Alperin,Jason Portenoy,Kyle Demes,Vincent Larivière,Stefanie Haustein

Scopus and the Web of Science have been the foundation for research in the science of science even though these traditional databases systematically underrepresent certain disciplines and world regions. In response, new inclusive databases, notably OpenAlex, have emerged. While many studies have begun using OpenAlex as a data source, few critically assess its limitations. This study, conducted in collaboration with the OpenAlex team, addresses this gap by comparing OpenAlex to Scopus across a number of dimensions. The analysis concludes that OpenAlex is a superset of Scopus and can be a reliable alternative for some analyses, particularly at the country level. Despite this, issues of metadata accuracy and completeness show that additional research is needed to fully comprehend and address OpenAlex's limitations. Doing so will be necessary to confidently use OpenAlex across a wider set of analyses, including those that are not at all possible with more constrained databases.


April 26, 2024

Q-Learning to navigate turbulence without a map

Marco Rando,Martin James,Alessandro Verri,Lorenzo Rosasco,Agnese Seminara

from arxiv, 18 pages, 8 figures

We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor location. We ask whether navigation strategies to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent plumes. An optimal memory exists which ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly crosswind casting, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting minor parameter tuning may be sufficient to adapt to different environments.
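To make the learning setup above more concrete, here is a minimal sketch of a generic tabular Q-learning update over a small set of discrete states. This is the textbook update rule, not the authors' algorithm: the state names (`odor_high`, `odor_low`), the action set, the rewards, and the values of `alpha` and `gamma` are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set for an agent moving relative to the wind.
ACTIONS = ("upwind", "crosswind_left", "crosswind_right", "downwind")


def make_q_table():
    """Return a Q-table that lazily initialises every state to all-zero action values."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning step and return the updated Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Pick the action with the highest current value estimate in this state."""
    return max(ACTIONS, key=lambda a: q[state][a])


if __name__ == "__main__":
    q = make_q_table()
    # Hypothetical transition: strong odor, move upwind, receive reward 1.0.
    q_update(q, "odor_high", "upwind", 1.0, "odor_low")
    print(greedy_action(q, "odor_high"))
```

In a sketch like this, the memory described in the abstract would enter through how the state is computed from the recent odor history, which is left out here.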


April 26, 2024

A comparison of the discrimination performance of lasso and maximum likelihood estimation in logistic regression model

Gilberto P. Alcantara Junior,Gustavo H. A. Pereira

Logistic regression is widely used in many areas of knowledge. Several works compare the performance of lasso and maximum likelihood estimation in logistic regression. However, some of these works do not perform simulation studies, and the remaining ones do not consider scenarios in which the ratio of the number of covariates to sample size is high. In this work, we compare the discrimination performance of lasso and maximum likelihood estimation in logistic regression using simulation studies and applications. Variable selection is done both by lasso and by stepwise selection when maximum likelihood estimation is used. We consider a wide range of values for the ratio of the number of covariates to sample size. The main conclusion of the work is that lasso has better discrimination performance than maximum likelihood estimation when the ratio of the number of covariates to sample size is high.


April 25, 2024

Improving device-independent weak coin flipping protocols

Atul Singh Arora,Jamie Sikora,Thomas Van Himbeeck

from arxiv, 25 pages, 7 figures

Weak coin flipping is the cryptographic task where Alice and Bob remotely flip a coin but want opposite outcomes. This work studies this task in the device-independent regime where Alice and Bob neither trust each other, nor their quantum devices. The best protocol was devised over a decade ago by Silman, Chailloux, Aharon, Kerenidis, Pironio, and Massar with bias $\varepsilon \approx 0.33664$, where the bias is a commonly adopted security measure for coin flipping protocols. This work presents two techniques to lower the bias of such protocols, namely self-testing and abort-phobic compositions. We apply these techniques to the SCAKPM '11 protocol above and, assuming a continuity conjecture, lower the bias to $\varepsilon \approx 0.29104$. We believe that these techniques could be useful in the design of device-independent protocols for a variety of other tasks. Independently of weak coin flipping, en route to our results, we show how one can test $n-1$ out of $n$ devices, and estimate the performance of the remaining device, for later use in the protocol. The proof uses linear programming and, due to its generality, may find applications elsewhere.
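For readers unfamiliar with the security measure, the usual definition of bias can be written out as below. The symbols $P_A^*$ and $P_B^*$ are notation assumed here and do not appear in the abstract: they denote the maximum probability with which a cheating Alice (respectively Bob) can force her (his) preferred outcome against an honest counterpart.

```latex
\varepsilon \;=\; \max\{P_A^{*},\, P_B^{*}\} \;-\; \tfrac{1}{2}
```

Under this definition, the two biases quoted above, $\varepsilon \approx 0.33664$ and $\varepsilon \approx 0.29104$, correspond to maximum cheating probabilities of about $0.83664$ and $0.79104$ respectively.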

April 25, 2024

Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks

Melissa Ailem,Katerina Marazopoulou,Charlotte Siska,James Bono

Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from a real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.
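The dependence of rankings on the assumed prompt distribution can be illustrated with a toy calculation. The sketch below is not the authors' method: it simply compares a uniform average with a reweighted average, and the per-prompt scores and the weights are invented for illustration.

```python
def mean_score(scores, weights=None):
    """Weighted average of per-prompt scores; uniform weights if none are given."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical per-prompt correctness (1 = correct, 0 = wrong) on four prompts.
model_a = [1, 1, 1, 0]
model_b = [0, 0, 1, 1]

# Hypothetical use case in which the fourth kind of prompt dominates.
use_case_weights = [1, 1, 1, 9]

if __name__ == "__main__":
    print("uniform :", mean_score(model_a), mean_score(model_b))
    print("weighted:", mean_score(model_a, use_case_weights),
          mean_score(model_b, use_case_weights))
```

With uniform weights the first model ranks higher, while under the hypothetical use-case weights the order reverses, which is the kind of sensitivity the abstract describes.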


April 24, 2024

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

Divyansh Agarwal,Alexander R. Fabbri,Philippe Laban,Shafiq Joty,Caiming Xiong,Chien-Sheng Wu

Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.


April 24, 2024

Tests for partial correlation between repeatedly observed nonstationary nonlinear timeseries

Kenneth D. Harris,Alex E. Yuan

We describe two families of statistical tests to detect partial correlation in vectorial timeseries. The tests measure whether an observed timeseries Y can be predicted from a second series X, even after accounting for a third series Z which may correlate with X. They do not make any assumptions on the nature of these timeseries, such as stationarity or linearity, but they do require that multiple statistically independent recordings of the 3 series are available. Intuitively, the tests work by asking if the series Y recorded on one experiment can be better predicted from X recorded on the same experiment than on a different experiment, after accounting for the prediction from Z recorded on both experiments.


April 24, 2024

Adapted Lie splitting method for convection-diffusion problems with singular convective term

Thi Tam Dang,Trung Hau Hoang,Giandomenico Orlandi

from arxiv, 14 pages, 6 figures

Splitting methods are widely used numerical schemes for solving convection-diffusion problems. However, they may lose stability in some situations, particularly when applied to convection-diffusion problems in the presence of an unbounded convective term. In this paper, we propose a new splitting method, called the "Adapted Lie splitting method", which successfully overcomes the observed instability in certain cases. Assuming that the unbounded coefficient belongs to a suitable Lorentz space, we show that the adapted Lie splitting is first-order convergent under the analytic semigroup framework. Furthermore, we provide numerical experiments to illustrate our newly proposed splitting approach.