
This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Requiring only synthetic data, dSC is exercised in experiments on safety, sentiment, and privacy control, showing it can be a viable and cheap alternative for aligning LLMs. Code is released at https://github.com/vicgalle/distilled-self-critique.

Related content

Round · Coverage · Partition · Graph · Weight ·

January 24, 2024

Equitable Persistent Coverage of Non-Convex Environments with Graph-Based Planning

José Manuel Palacios-Gasós, Danilo Tardioli, Eduardo Montijano, Carlos Sagüés

from arxiv, This is the accepted version of an already published manuscript. See journal reference for details

In this paper we tackle the problem of persistently covering a complex non-convex environment with a team of robots. We consider scenarios where the coverage quality of the environment deteriorates with time, so every point must be revisited constantly. As a first step, our solution finds a partition of the environment where the amount of work for each robot, weighted by the importance of each point, is equal. This is achieved using a power diagram and finding an equitable partition through a provably correct distributed control law on the power weights. Compared to other existing partitioning methods, our solution considers a continuous environment formulation with non-convex obstacles. In the second step, each robot computes a graph that gathers sweep-like paths and covers its entire partition. At each planning time, the coverage error at the graph vertices is assigned as weights of the corresponding edges. Then, our solution is capable of efficiently finding the optimal open coverage path through the graph with respect to the coverage error per distance traversed. Simulation and experimental results are presented to support our proposal.
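The role of the power weights in the partitioning step can be illustrated with a minimal sketch. It assumes a sampled, obstacle-free unit square, so it does not reproduce the paper's continuous non-convex formulation or its distributed control law. The names `power_cell_labels` and `cell_workloads` are hypothetical helpers introduced only for this illustration. Each sample point is assigned to the robot whose power distance, squared Euclidean distance minus the robot's weight, is smallest.

```python
import numpy as np

def power_cell_labels(points, generators, weights):
    """Assign each point to the generator with the smallest power distance.

    Power distance of point x to generator p_i with weight w_i:
        ||x - p_i||^2 - w_i
    """
    points = np.asarray(points, dtype=float)
    generators = np.asarray(generators, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise squared distances, shape (num_points, num_generators).
    diff = points[:, None, :] - generators[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.argmin(sq_dist - weights[None, :], axis=1)

def cell_workloads(labels, importance, num_cells):
    """Sum the importance of the sample points that fall in each cell."""
    return np.bincount(labels, weights=importance, minlength=num_cells)

# Assumption: a regular grid over the unit square with uniform importance.
xs = np.linspace(0.0, 1.0, 21)
grid = np.array([(x, y) for x in xs for y in xs])
importance = np.ones(len(grid))
generators = np.array([[0.25, 0.5], [0.75, 0.5]])

# Equal weights give an ordinary Voronoi split between the two robots.
equal = cell_workloads(
    power_cell_labels(grid, generators, [0.0, 0.0]), importance, 2
)
# Raising the first robot's weight enlarges its cell and its workload.
skewed = cell_workloads(
    power_cell_labels(grid, generators, [0.2, 0.0]), importance, 2
)
```

Comparing `equal` and `skewed` shows how changing a single weight moves work from one cell to the other. An equitable partition would adjust the weights until the workloads match, which is the part the authors handle with their control law.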

binary · Dataset · MoDELS · INFORMS · Critic ·

January 23, 2024

From Numbers to Words: Multi-Modal Bankruptcy Prediction Using the ECL Dataset

Henri Arno, Klaas Mulier, Joke Baeck, Thomas Demeester

from arxiv, Presented at the 6th Workshop on Financial Technology and Natural Language Processing (FinNLP) @ IJCNLP-AACL 2023 in Bali, Indonesia

In this paper, we present ECL, a novel multi-modal dataset containing the textual and numerical data from corporate 10K filings and associated binary bankruptcy labels. Furthermore, we develop and critically evaluate several classical and neural bankruptcy prediction models using this dataset. Our findings suggest that the information contained in each data modality is complementary for bankruptcy prediction. We also see that the binary bankruptcy prediction target does not enable our models to distinguish next year bankruptcy from an unhealthy financial situation resulting in bankruptcy in later years. Finally, we explore the use of LLMs in the context of our task. We show how GPT-based models can be used to extract meaningful summaries from the textual data but zero-shot bankruptcy prediction results are poor. All resources required to access and update the dataset or replicate our experiments are available on github.com/henriarnoUG/ECL.

Monte Carlo · Better · Analysis · Scenario · Same ·

January 22, 2024

Multilevel Metamodels: A Novel Approach to Enhance Efficiency and Generalizability in Monte Carlo Simulation Studies

Joshua Gilbert

Metamodels, or the regression analysis of Monte Carlo simulation (MCS) results, provide a powerful tool to summarize MCS findings. However, an as of yet unexplored approach is the use of multilevel metamodels (MLMM) that better account for the dependent data structure of MCS results that arises from fitting multiple models to the same simulated data set. In this study, we articulate the theoretical rationale for the MLMM and illustrate how it can dramatically improve efficiency over the traditional regression approach, better account for complex MCS designs, and provide new insights into the generalizability of MCS findings.

Large language model · Identifiable · Moralization · INFORMS · AIM ·

January 22, 2024

The Ethics of Interaction: Mitigating Security Threats in LLMs

Ashutosh Kumar, Sagarika Singh, Shiv Vignesh Murty, Swathy Ragupathy

This paper comprehensively explores the ethical challenges arising from security threats to Large Language Models (LLMs). These intricate digital repositories are increasingly integrated into our daily lives, making them prime targets for attacks that can compromise their training data and the confidentiality of their data sources. The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy. We scrutinize five major threats: prompt injection, jailbreaking, Personally Identifiable Information (PII) exposure, sexually explicit content, and hate-based content, going beyond mere identification to assess their critical ethical consequences and the urgency they create for robust defensive strategies. The escalating reliance on LLMs underscores the crucial need for ensuring these systems operate within the bounds of ethical norms, particularly as their misuse can lead to significant societal and individual harm. We propose conceptualizing and developing an evaluative tool tailored for LLMs, which would serve a dual purpose: guiding developers and designers in preemptive fortification of backend systems and scrutinizing the ethical dimensions of LLM chatbot responses during the testing phase. By comparing LLM responses with those expected from humans in a moral context, we aim to discern the degree to which AI behaviors align with the ethical values held by a broader society. Ultimately, this paper not only underscores the ethical troubles presented by LLMs but also highlights a path toward cultivating trust in these systems.

Optimizer · Nonlinear programming · Performer · Class · Controller ·

January 20, 2024

Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming for Policy Optimization in Mixed Discrete-Continuous MDPs

Michael Gimelfarb, Ayal Taitler, Scott Sanner

We propose Constraint-Generation Policy Optimization (CGPO) for optimizing policy parameters within compact and interpretable policy classes for mixed discrete-continuous Markov Decision Processes (DC-MDPs). CGPO is not only able to provide bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics, but it can also provably derive optimal policies in cases where it terminates with zero error. Furthermore, CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions. To achieve such results, CGPO proposes a bi-level mixed-integer nonlinear optimization framework for optimizing policies within defined expressivity classes (i.e. piecewise (non)-linear) and reduces it to an optimal constraint generation methodology that adversarially generates worst-case state trajectories. Furthermore, leveraging modern nonlinear optimizers, CGPO can obtain solutions with bounded optimality gap guarantees. We handle stochastic transitions through explicit marginalization (where applicable) or chance-constraints, providing high-probability policy performance guarantees. We also present a road-map for understanding the computational complexities associated with different expressivity classes of policy, reward, and transition dynamics. We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control, management of a system of water reservoirs, and physics control. In summary, we provide a solution for deriving structured, compact, and explainable policies with bounded performance guarantees, enabling worst-case scenario generation and counterfactual policy diagnostics.

Critic · Approximation · Model evaluation · Periodic · Networking ·

January 19, 2024

Accuracy Analysis of Physics-Informed Neural Networks for Approximating the Critical SQG Equation

Elie Abdo, Ruimeng Hu, Quyuan Lin

from arxiv, 21 pages

We systematically analyze the accuracy of Physics-Informed Neural Networks (PINNs) in approximating solutions to the critical Surface Quasi-Geostrophic (SQG) equation on two-dimensional periodic boxes. The critical SQG equation involves advection and diffusion described by nonlocal periodic operators, posing challenges for neural network-based methods that do not commonly exhibit periodic boundary conditions. In this paper, we present a novel approximation of these operators using their nonperiodic analogs based on singular integral representation formulas and use it to perform error estimates. This idea can be generalized to a larger class of nonlocal partial differential equations whose solutions satisfy prescribed boundary conditions, thereby initiating a new PINNs theory for equations with nonlocalities.

Speech enhancement · Learning · Similarity · Unsupervised · Rank ·

January 19, 2024

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Alessandro Ragano, Jan Skoglund, Andrew Hines

from arxiv, Accepted for ICASSP 2024

This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed as the Euclidean distance between their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks, both for audio analysis, e.g., quality assessment, and for generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated on three tasks: ranking degradation intensity, predicting speech quality, and serving as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference approaches in both ranking degradation intensity and quality assessment, exhibiting competitive performance with full-reference audio metrics. NOMAD demonstrates a promising technique that mimics human capabilities in assessing audio quality with non-matching references to learn perceptual embeddings without the need for human-generated labels.
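The two ingredients named in the abstract, a triplet loss during training and a Euclidean distance between embeddings at inference, can be sketched generically as follows. This is the standard margin-based triplet loss, not the authors' implementation: the embedding network is omitted, the vectors are made-up two-dimensional stand-ins, and the way NSIM selects the positive and negative samples is not specified in the source, so it is not modeled here. The names `euclidean_distance` and `triplet_loss` are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than
    the negative by at least `margin`.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Assumption: toy embeddings standing in for the output of a trained model.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]   # treated as similar in degradation to the anchor
negative = [3.0, 4.0]   # treated as much more dissimilar

loss_ordered = triplet_loss(anchor, positive, negative)
loss_swapped = triplet_loss(anchor, negative, positive)
```

With the embeddings ordered as intended, the loss vanishes. Swapping the positive and negative roles produces a large penalty, which is the signal that would push a model to rearrange its embedding space.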

Biased · Sample · Continuity · MoDELS · Extensibility ·

January 19, 2024

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

Zhaobo Qi, Yibo Yuan, Xiaowen Ruan, Shuhui Wang, Weigang Zhang, Qingming Huang

from arxiv, accepted by AAAI 2024

Temporal Sentence Grounding in Video (TSGV) is troubled by the dataset bias issue, which is caused by the uneven temporal distribution of the target moments for samples with similar semantic components in input videos or query texts. Existing methods resort to utilizing prior knowledge about bias to artificially break this uneven distribution, which only removes a limited amount of significant language biases. In this work, we propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD), which dynamically generates bias-conflict samples by explicitly leveraging potentially spurious correlations between single-modality features and the temporal position of the target moments. Through adversarial training, its bias generators continuously introduce biases and generate bias-conflict samples to deceive its grounding model. Meanwhile, the grounding model continuously eliminates the introduced biases, which requires it to model multi-modality alignment information. BSSARD will cover most kinds of coupling relationships and disrupt language and visual biases simultaneously. Extensive experiments on Charades-CD and ActivityNet-CD demonstrate the promising debiasing capability of BSSARD. Source code is available at https://github.com/qzhb/BSSARD.

INTERACT · Code · Large language model · Language modeling · Learner ·

January 19, 2024

INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing

Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

from arxiv, 26 pages, 19 figures, 8 tables

This paper proposes INTERactiVE chaiN Of Repairing (INTERVENOR), which mimics human code repairing behavior (iteratively judging, rethinking, and repairing) and prompts the coding ability of Large Language Models (LLMs). Specifically, INTERVENOR employs two LLM-based agents, Code Learner and Code Teacher, to play different roles in code repairing and work interactively to repair the generated code. The Code Learner is asked to generate and repair code according to the instructions from the Code Teacher. The Code Teacher rethinks the code errors according to the corresponding feedback from compilers and iteratively generates the chain-of-repairing (CoR) to guide the code repairing process for the Code Learner. Our experiments show that INTERVENOR outperforms the state-of-the-art methods and achieves about 13% and 4.5% improvements over the GPT-3.5 model in code generation and code translation tasks, respectively. Our further analyses show that CoR can illuminate the bug reasons and solution plans via natural language. With the feedback of code compilers, INTERVENOR can accurately identify the syntax errors and assertion errors in the code and provide precise instructions to repair it. All data and code are available at https://github.com/NEUIR/INTERVENOR

SR · tuning · MoDELS · INFORMS · Performer ·

January 18, 2024

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Xin Yuan, Jinoo Baek, Keyang Xu, Omer Tov, Hongliang Fei

from arxiv, WACV'24 workshop

We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of pixel-level image diffusion models to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weights of the text-to-image SR model into our video generation framework. Additionally, we incorporate a temporal adapter to ensure temporal coherence across video frames. We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational costs and super-resolution quality. Empirical evaluation, both quantitative and qualitative, on the Shutterstock video dataset demonstrates that our approach is able to perform text-to-video SR generation with good visual quality and temporal consistency. To evaluate temporal coherence, we also present visualizations in video format at https://drive.google.com/drive/folders/1YVc-KMSJqOrEUdQWVaI-Yfu8Vsfu_1aO?usp=sharing