97SE亚洲国产综合在线,中文字幕无码乱人伦漫画,成人片免费无码视频,日韩精品一区二区不卡中文字幕,国内视频精品极品在线播放

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity. In this work, we consider the RLHF setting where the feedback is given in the format of preferences over pairs of trajectories. In the linear MDP model, using randomization in algorithm design, we present an algorithm that is sample efficient (i.e., has near-optimal worst-case regret bounds) and has polynomial running time (i.e., computational complexity is polynomial with respect to relevant parameters). Our algorithm further minimizes the query complexity through a novel randomized active learning procedure. In particular, our algorithm demonstrates a near-optimal tradeoff between the regret bound and the query complexity. To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling. Our algorithm minimizes Bayesian regret bound and query complexity, again achieving a near-optimal tradeoff between these two quantities. Computation-wise, similar to the prior Thompson sampling algorithms under the regular RL setting, the main computation primitives of our algorithm are Bayesian supervised learning oracles which have been heavily investigated on the empirical side when applying Thompson sampling algorithms to RL benchmark problems.

相關內容

Learning

關注 12

分解的 · 潛在 · 估計/估計量 · 統計量 · Facebook AI Research ·

2024 年 4 月 25 日

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Jing Ouyang,Chengyu Cui,Kean Ming Tan,Gongjun Xu

In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).

多樣性 · 損失 · 語言模型化 · 代價 · 詞元分析器 ·

2024 年 4 月 25 日

Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples

Kuofeng Gao,Jindong Gu,Yang Bai,Shu-Tao Xia,Philip Torr,Wei Liu,Zhifeng Li

from arxiv, arXiv admin note: substantial text overlap with arXiv:2401.11170

Despite the exceptional performance of multi-modal large language models (MLLMs), their deployment requires substantial computational resources. Once malicious users induce high energy consumption and latency time (energy-latency cost), it will exhaust computational resources and harm availability of service. In this paper, we investigate this vulnerability for MLLMs, particularly image-based and video-based ones, and aim to induce high energy-latency cost during inference by crafting an imperceptible perturbation. We find that high energy-latency cost can be manipulated by maximizing the length of generated sequences, which motivates us to propose verbose samples, including verbose images and videos. Concretely, two modality non-specific losses are proposed, including a loss to delay end-of-sequence (EOS) token and an uncertainty loss to increase the uncertainty over each generated token. In addition, improving diversity is important to encourage longer responses by increasing the complexity, which inspires the following modality specific loss. For verbose images, a token diversity loss is proposed to promote diverse hidden states. For verbose videos, a frame feature diversity loss is proposed to increase the feature diversity among frames. To balance these losses, we propose a temporal weight adjustment algorithm. Experiments demonstrate that our verbose samples can largely extend the length of generated sequences.

Performer · Networking · 預測器/決策函數 · 圖 · 估計/估計量 ·

2024 年 4 月 25 日

Surprisingly Strong Performance Prediction with Neural Graph Features

Gabriela Kadlecová,Jovita Lukasik,Martin Pilát,Petra Vidnerová,Mahmoud Safari,Roman Neruda,Frank Hutter

from arxiv, 45 pages, 30 figures

Performance prediction has been a key part of the neural architecture search (NAS) process, allowing to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimate network performance without any training. However, they are still poorly understood, exhibit biases with network properties, and their performance is limited. Inspired by the drawbacks of zero-cost proxies, we propose neural graph features (GRAF), simple to compute properties of architectural graphs. GRAF offers fast and interpretable performance prediction while outperforming zero-cost proxies and other common encodings. In combination with other zero-cost proxies, GRAF outperforms most existing performance predictors at a fraction of the cost.

ENJOY · CASE · 知識 (knowledge) · 人工智能 · 數據庫 ·

2024 年 4 月 24 日

Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents

Tim S. Lyon,Jonas Karge

from arxiv, Accepted to IJCAI 2024

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.

樣例 · CASE · 變換 · 約束 · 相同 ·

2024 年 4 月 21 日

Splitting Techniques for DAEs with port-Hamiltonian Applications

Andreas Bartel,Malak Diab,Andreas Frommer,Michael Günther,Nicole Marheineke

from arxiv, 31 pages, 22 figures

In the simulation of differential-algebraic equations (DAEs), it is essential to employ numerical schemes that take into account the inherent structure and maintain explicit or hidden algebraic constraints without altering them. This paper focuses on operator-splitting techniques for coupled systems and aims at preserving the structure in the port-Hamiltonian framework. The study explores two decomposition strategies: one considering the underlying coupled subsystem structure and the other addressing energy-associated properties such as conservation and dissipation. We show that for coupled index-$1$ DAEs with and without private index-2 variables, the splitting schemes on top of a dimension-reducing decomposition achieve the same convergence rate as in the case of ordinary differential equations. Additionally, we discuss an energy-associated decomposition for index-1 pH-DAEs and introduce generalized Cayley transforms to uphold energy conservation. The effectiveness of both strategies is evaluated using port-Hamiltonian benchmark examples from electric circuits.

Analysis · 泛函 · 漢明距離 · 可約的 · 查準率/準確率 ·

2024 年 4 月 20 日

Runtime Analysis for Permutation-based Evolutionary Algorithms

Benjamin Doerr,Yassine Ghannane,Marouane Ibn Brahim

from arxiv, Journal version of our paper at GECCO 2022, appeared in Algorithmica. 52 pages. arXiv admin note: substantial text overlap with arXiv:2204.07637

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.

Continuity · MoDELS · HTTPS · Learning · 前向 ·

2024 年 4 月 18 日

Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

Jinmei Liu,Wenbin Li,Xiangyu Yue,Shilin Zhang,Chunlin Chen,Zhi Wang

We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting to tackle sequential offline tasks. We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data. First, we decouple the continual learning policy into a diffusion-based generative behavior model and a multi-head action evaluation model, allowing the policy to inherit distributional expressivity for encompassing a progressive range of diverse behaviors. Second, we train a task-conditioned diffusion model to mimic state distributions of past tasks. Generated states are paired with corresponding responses from the behavior generator to represent old tasks with high-fidelity replayed samples. Finally, by interleaving pseudo samples with real ones of the new task, we continually update the state and behavior generators to model progressively diverse behaviors, and regularize the multi-head critic via behavior cloning to mitigate forgetting. Experiments demonstrate that our method achieves better forward transfer with less forgetting, and closely approximates the results of using previous ground-truth data due to its high-fidelity replay of the sample space. Our code is available at \href{//github.com/NJU-RL/CuGRO}{//github.com/NJU-RL/CuGRO}.

估計/估計量 · MoDELS · 經驗分布 · Performer · 樣例 ·

2024 年 4 月 18 日

A Distributions-based Approach for Data-Consistent Inversion

Kirana Bergstrom,Troy Butler,Tim Wildey

from arxiv, 26 pages, 19 figures

We formulate a novel approach to solve a class of stochastic problems, referred to as data-consistent inverse (DCI) problems, which involve the characterization of a probability measure on the parameters of a computational model whose subsequent push-forward matches an observed probability measure on specified quantities of interest (QoI) typically associated with the outputs from the computational model. Whereas prior DCI solution methodologies focused on either constructing non-parametric estimates of the densities or the probabilities of events associated with the pre-image of the QoI map, we develop and analyze a constrained quadratic optimization approach based on estimating push-forward measures using weighted empirical distribution functions. The method proposed here is more suitable for low-data regimes or high-dimensional problems than the density-based method, as well as for problems where the probability measure does not admit a density. Numerical examples are included to demonstrate the performance of the method and to compare with the density-based approach where applicable.

圖卷積神經網絡/圖卷積網絡 · 情感分類 · 圖卷積 · INFORMS · 卷積 ·

2019 年 9 月 8 日

Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks

Chen Zhang,Qiuchi Li,Dawei Song

from arxiv, 11 pages, 4 figures, accepted to EMNLP 2019

Due to their inherent capability in semantic alignment of aspects and their context words, attention mechanism and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. Based on it, a novel aspect-specific sentiment classification framework is raised. Experiments on three benchmarking collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.

entity · Performer · 圖 · 知識圖譜 · MoDELS ·

2019 年 6 月 4 日

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Deepak Nathani,Jatin Chauhan,Charu Sharma,Manohar Kaul

from arxiv, accepted as long paper in ACL 2019

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.