斗破苍穹第四季25集免费观看_99国产精品久久久久99_日本高清中文字幕二区三区不卡_久久久久一区二区_日本免费一区二区三区在线视频_久久精品人妻无码一区二区三66_精品不卡一区二区中文字幕

This paper introduces SGCode, a flexible prompt-optimizing system to generate secure code with large language models (LLMs). SGCode integrates recent prompt-optimization approaches with LLMs in a unified system accessible through front-end and back-end APIs, enabling users to 1) generate secure code, which is free of vulnerabilities, 2) review and share security analysis, and 3) easily switch from one prompt optimization approach to another, while providing insights on model and system performance. We populated SGCode on an AWS server with PromSec, an approach that optimizes prompts by combining an LLM and security tools with a lightweight generative adversarial graph neural network to detect and fix security vulnerabilities in the generated code. Extensive experiments show that SGCode is practical as a public tool to gain insights into the trade-offs between model utility, secure code generation, and system cost. SGCode has only a marginal cost compared with prompting LLMs. SGCode is available at: //sgcode.codes/.

相關內容

代碼

關注 10

代(dai)(dai)碼（Code）是專知(zhi)網的一(yi)個重要知(zhi)識資料文(wen)檔板塊，旨在整理收錄論文(wen)源代(dai)(dai)碼、復(fu)現(xian)代(dai)(dai)碼，經典工程代(dai)(dai)碼等，便于用(yong)(yong)戶查閱(yue)下(xia)載使用(yong)(yong)。

MoDELS · 語言模型化 · 穩健性 · HTTPS · 大語言模型 ·

2024 年 10 月 31 日

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao,Edoardo Debenedetti,Alexander Robey,Maksym Andriushchenko,Francesco Croce,Vikash Sehwag,Edgar Dobriban,Nicolas Flammarion,George J. Pappas,Florian Tramer,Hamed Hassani,Eric Wong

from arxiv, The camera-ready version of JailbreakBench v1.0 (accepted at NeurIPS 2024 Datasets and Benchmarks Track): more attack artifacts, more test-time defenses, a more accurate jailbreak judge (Llama-3-70B with a custom prompt), a larger dataset of human preferences for selecting a jailbreak judge (300 examples), an over-refusal evaluation dataset, a semantic refusal judge based on Llama-3-8B

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at //github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at //jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community.

詞元分析器 · MoDELS · 語言模型化 · Processing（編程語言） · 可理解性 ·

2024 年 10 月 31 日

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

Heng-Jui Chang,Hongyu Gong,Changhan Wang,James Glass,Yu-An Chung

from arxiv, Preprint

Spoken language models (SLMs) have gained increasing attention with advancements in text-based, decoder-only language models. SLMs process text and speech, enabling simultaneous speech understanding and generation. This paper presents Double-Codebook Speaker-invariant Clustering (DC-Spin), which aims to improve speech tokenization by bridging audio signals and SLM tokens. DC-Spin extracts speaker-invariant tokens rich in phonetic information and resilient to input variations, enhancing zero-shot SLM tasks and speech resynthesis. We propose a chunk-wise approach to enable streamable DC-Spin without retraining and degradation. Comparisons of tokenization methods (self-supervised and neural audio codecs), model scalability, and downstream task proxies show that tokens easily modeled by an n-gram LM or aligned with phonemes offer strong performance, providing insights for designing speech tokenizers for SLMs.

MoDELS · Performer · 模型評估 · 基準 · 有向 ·

2024 年 10 月 31 日

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Andrew M. Bean,Simi Hellsten,Harry Mayne,Jabez Magomere,Ethan A. Chi,Ryan Chi,Scott A. Hale,Hannah Rose Kirk

from arxiv, Oral presentation at NeurIPS 2024 Datasets and Benchmarks Track. 10 pages, 5 figures, 22 pages supplemental materials

In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark covers more than 90 mostly low-resource languages, minimising issues of data contamination, and contains 1,133 problems across 6 formats and 5 levels of human difficulty. We assess performance with both direct accuracy and comparison to a no-context baseline to penalise memorisation. Scores from 11 state-of-the-art LLMs demonstrate the benchmark to be challenging, and models perform poorly on the higher difficulty problems. On harder problems, even the top model only achieved 38.7% accuracy, a 24.7% improvement over the no-context baseline. Large closed models typically outperform open models, and in general, the higher resource the language, the better the scores. These results indicate, in absence of memorisation, true multi-step out-of-domain reasoning remains a challenge for current language models.

Learning · 機器人 · 可理解性 · 多樣性 · 語言模型化 ·

2024 年 10 月 30 日

EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning

Peide Huang,Yuhan Hu,Nataliya Nechyporenko,Daehwa Kim,Walter Talbott,Jian Zhang

This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite the advancements in robotic behaviors, existing methods often fall short in mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those by human operators. The results demonstrate that our approach either matches or surpasses human performance in generating understandable and natural robot motions under certain scenarios. We also provide design implications for future research to consider a set of variables when generating expressive robotic gestures.

大語言模型 · MoDELS · 語言模型化 · contrastive · INFORMS ·

2024 年 10 月 30 日

ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning

Millennium Bismay,Xiangjue Dong,James Caverlee

from arxiv, Large Language Model, Recommendation, Human-Interpretable Reasoning, Personalization, Submitted for NAACL 2025

This paper presents ReasoningRec, a reasoning-based recommendation framework that leverages Large Language Models (LLMs) to bridge the gap between recommendations and human-interpretable explanations. In contrast to conventional recommendation systems that rely on implicit user-item interactions, ReasoningRec employs LLMs to model users and items, focusing on preferences, aversions, and explanatory reasoning. The framework utilizes a larger LLM to generate synthetic explanations for user preferences, subsequently used to fine-tune a smaller LLM for enhanced recommendation accuracy and human-interpretable explanation. Our experimental study investigates the impact of reasoning and contextual information on personalized recommendations, revealing that the quality of contextual and personalized data significantly influences the LLM's capacity to generate plausible explanations. Empirical evaluations demonstrate that ReasoningRec surpasses state-of-the-art methods by up to 12.5\% in recommendation prediction while concurrently providing human-intelligible explanations. The code is available here: //github.com/millenniumbismay/reasoningrec.

Swift · 優化器 · 查準率/準確率 · Integration · AIM ·

2024 年 10 月 30 日

PDSR: Efficient UAV Deployment for Swift and Accurate Post-Disaster Search and Rescue

Alaa Awad Abdellatif,Ali Elmancy,Amr Mohamed,Ahmed Massoud,Wadha Lebda,Khalid K. Naji

from arxiv, This paper is currently under review at IEEE IoT Magazine

This paper introduces a comprehensive framework for Post-Disaster Search and Rescue (PDSR), aiming to optimize search and rescue operations leveraging Unmanned Aerial Vehicles (UAVs). The primary goal is to improve the precision and availability of sensing capabilities, particularly in various catastrophic scenarios. Central to this concept is the rapid deployment of UAV swarms equipped with diverse sensing, communication, and intelligence capabilities, functioning as an integrated system that incorporates multiple technologies and approaches for efficient detection of individuals buried beneath rubble or debris following a disaster. Within this framework, we propose architectural solution and address associated challenges to ensure optimal performance in real-world disaster scenarios. The proposed framework aims to achieve complete coverage of damaged areas significantly faster than traditional methods using a multi-tier swarm architecture. Furthermore, integrating multi-modal sensing data with machine learning for data fusion could enhance detection accuracy, ensuring precise identification of survivors.

MoDELS · 穩健性 · 語言模型化 · 數據集 · 秩 ·

2024 年 10 月 30 日

CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models

Son The Nguyen,Niranjan Uma Naresh,Theja Tulabandhula

This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), focusing on incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs' resilience against the issues. In particular, we devise a guaranteed polynomial time ranking algorithm that robustifies several existing models, such as the classic Bradley-Terry-Luce (BTL) (Bradley and Terry, 1952) model and certain generalizations of it. To the best of our knowledge, our present work is the first to propose an algorithm that provably recovers an $\epsilon$-optimal ranking with high probability while allowing as large as $O(n)$ perturbed pairwise comparison results per model response. Furthermore, we show robust recovery results in the partially observed setting. Our experiments confirm that our algorithms handle adversarial noise and unobserved comparisons well in both general and LLM preference dataset settings. This work contributes to the development and scaling of more reliable and ethically aligned AI models by equipping the dataset curation pipeline with the ability to handle missing and maliciously manipulated inputs.

去噪 · DNN · 穩健性 · 估計/估計量 · Weight ·

2024 年 10 月 30 日

Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising

Yoto Fujita,Aditya Arie Nugraha,Diego Di Carlo,Yoshiaki Bando,Mathieu Fontaine,Kazuyoshi Yoshii

from arxiv, Accepted to APSIPA2024

This paper describes speech enhancement for realtime automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes a enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used for generating pseudo groundtruth data from a mixture. Based on this idea, a prior work proposed a dual-process system based on a cascade of WPE and minimum variance distortionless response (MVDR) beamforming asynchronously fine-tuned by block-online FastMNMF. To integrate the dereverberation capability into neural beamforming and make it fine-tunable at run time, we propose to use weighted power minimization distortionless response (WPD) beamforming, a unified version of WPE and minimum power distortionless response (MPDR), whose joint dereverberation and denoising filter is estimated using a DNN. We evaluated the impact of run-time adaptation under various conditions with different numbers of speakers, reverberation times, and signal-to-noise ratios (SNRs).

樣本 · 去噪 · 蒙特卡羅 · 采樣法 · 估計/估計量 ·

2024 年 10 月 30 日

Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion

Ye He,Kevin Rojas,Molei Tao

from arxiv, updated related works and experiments

This paper considers the problem of sampling from non-logconcave distribution, based on queries of its unnormalized density. It first describes a framework, Denoising Diffusion Monte Carlo (DDMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DDMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DDMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DDMC, without assuming the target distribution to be log-concave or satisfying any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding latest samplers, including also-denoising-diffusion-based RDMC and RSDMC. Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuity in non-convex potential.

MoDELS · 可理解性 · 語言模型化 · Automator · 代碼 ·

2024 年 10 月 15 日

FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware

Minwoo Kang,Mingjie Liu,Ghaith Bany Hamad,Syed Suhaib,Haoxing Ren

The remarkable reasoning and code generation capabilities of large language models (LLMs) have spurred significant interest in applying LLMs to enable task automation in digital chip design. In particular, recent work has investigated early ideas of applying these models to formal verification (FV), an approach to verifying hardware implementations that can provide strong guarantees of confidence but demands significant amounts of human effort. While the value of LLM-driven automation is evident, our understanding of model performance, however, has been hindered by the lack of holistic evaluation. In response, we present FVEval, the first comprehensive benchmark and evaluation framework for characterizing LLM performance in tasks pertaining to FV. The benchmark consists of three sub-tasks that measure LLM capabilities at different levels: from the generation of SystemVerilog assertions (SVAs) given natural language descriptions to reasoning about the design RTL and suggesting assertions directly without additional human input. As test instances, we present both collections of expert-written verification collateral and methodologies to scalably generate synthetic examples aligned with industrial FV workflows. A wide range of existing LLMs, both proprietary and open-source, are evaluated against FVEval, based on which we investigate where today's LLMs stand and how we might further enable their application toward improving productivity in digital FV. Our benchmark and evaluation code is available at \url{//github.com/NVlabs/FVEval}.