韩国成年性午夜免费视频,国产视频999免费在线观看,国产99R福利小视频精品

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

相關內容

縮放

關注 0

知識 (knowledge) · 穩健性 · MoDELS · 語言模型化 · 大語言模型 ·

2024 年 3 月 26 日

Robust and Scalable Model Editing for Large Language Models

Yingfa Chen,Zhengyan Zhang,Xu Han,Chaojun Xiao,Zhiyuan Liu,Chen Chen,Kuai Li,Tao Yang,Maosong Sun

from arxiv, LREC-COLING 2024 paper, 16 pages, 4 figures

Large language models (LLMs) can make predictions using parametric knowledge--knowledge encoded in the model weights--or contextual knowledge--knowledge presented in the context. In many scenarios, a desirable behavior is that LLMs give precedence to contextual knowledge when it conflicts with the parametric knowledge, and fall back to using their parametric knowledge when the context is irrelevant. This enables updating and correcting the model's knowledge by in-context editing instead of retraining. Previous works have shown that LLMs are inclined to ignore contextual knowledge and fail to reliably fall back to parametric knowledge when presented with irrelevant context. In this work, we discover that, with proper prompting methods, instruction-finetuned LLMs can be highly controllable by contextual knowledge and robust to irrelevant context. Utilizing this feature, we propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. To better evaluate the robustness of model editors, we collect a new dataset, that contains irrelevant questions that are more challenging than the ones in existing datasets. Empirical results show that our method outperforms current state-of-the-art methods by a large margin. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs (and vice versa). The source code can be found at //github.com/thunlp/EREN.

變換 · 可約的 · Performer · Learning · MoDELS ·

2024 年 3 月 26 日

Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer

Youn-Yeol Yu,Jeongwhan Choi,Woojin Cho,Kookjin Lee,Nayong Kim,Kiseok Chang,Chang-Seung Woo,Ilho Kim,Seok-Woo Lee,Joon-Young Yang,Sooyoung Yoon,Noseong Park

from arxiv, Accepted at ICLR 2024

Recently, many mesh-based graph neural network (GNN) models have been proposed for modeling complex high-dimensional physical systems. Remarkable achievements have been made in significantly reducing the solving time compared to traditional numerical solvers. These methods are typically designed to i) reduce the computational cost in solving physical dynamics and/or ii) propose techniques to enhance the solution accuracy in fluid and rigid body dynamics. However, it remains under-explored whether they are effective in addressing the challenges of flexible body dynamics, where instantaneous collisions occur within a very short timeframe. In this paper, we present Hierarchical Contact Mesh Transformer (HCMT), which uses hierarchical mesh structures and can learn long-range dependencies (occurred by collisions) among spatially distant positions of a body -- two close positions in a higher-level mesh correspond to two distant positions in a lower-level mesh. HCMT enables long-range interactions, and the hierarchical mesh structure quickly propagates collision effects to faraway positions. To this end, it consists of a contact mesh Transformer and a hierarchical mesh Transformer (CMT and HMT, respectively). Lastly, we propose a flexible body dynamics dataset, consisting of trajectories that reflect experimental settings frequently used in the display industry for product designs. We also compare the performance of several baselines using well-known benchmark datasets. Our results show that HCMT provides significant performance improvements over existing methods. Our code is available at //github.com/yuyudeep/hcmt.

約束 · Learning · MoDELS · 機器人 · 模糊邏輯 ·

2024 年 3 月 25 日

Learning Symbolic and Subsymbolic Temporal Task Constraints from Bimanual Human Demonstrations

Christian Dreher,Tamim Asfour

from arxiv, 8 pages, submitted to IROS 2024

Learning task models of bimanual manipulation from human demonstration and their execution on a robot should take temporal constraints between actions into account. This includes constraints on (i) the symbolic level such as precedence relations or temporal overlap in the execution, and (ii) the subsymbolic level such as the duration of different actions, or their starting and end points in time. Such temporal constraints are crucial for temporal planning, reasoning, and the exact timing for the execution of bimanual actions on a bimanual robot. In our previous work, we addressed the learning of temporal task constraints on the symbolic level and demonstrated how a robot can leverage this knowledge to respond to failures during execution. In this work, we propose a novel model-driven approach for the combined learning of symbolic and subsymbolic temporal task constraints from multiple bimanual human demonstrations. Our main contributions are a subsymbolic foundation of a temporal task model that describes temporal nexuses of actions in the task based on distributions of temporal differences between semantic action keypoints, as well as a method based on fuzzy logic to derive symbolic temporal task constraints from this representation. This complements our previous work on learning comprehensive temporal task models by integrating symbolic and subsymbolic information based on a subsymbolic foundation, while still maintaining the symbolic expressiveness of our previous approach. We compare our proposed approach with our previous pure-symbolic approach and show that we can reproduce and even outperform it. Additionally, we show how the subsymbolic temporal task constraints can synchronize otherwise unimanual movement primitives for bimanual behavior on a humanoid robot.

Networking · 策略搜索 · 樣本 · 穩健性 · MoDELS ·

2024 年 3 月 23 日

Deep Gaussian Covariance Network with Trajectory Sampling for Data-Efficient Policy Search

Can Bogoclu,Robert Vosshall,Kevin Cremanns,Dirk Roos

Probabilistic world models increase data efficiency of model-based reinforcement learning (MBRL) by guiding the policy with their epistemic uncertainty to improve exploration and acquire new samples. Moreover, the uncertainty-aware learning procedures in probabilistic approaches lead to robust policies that are less sensitive to noisy observations compared to uncertainty unaware solutions. We propose to combine trajectory sampling and deep Gaussian covariance network (DGCN) for a data-efficient solution to MBRL problems in an optimal control setting. We compare trajectory sampling with density-based approximation for uncertainty propagation using three different probabilistic world models; Gaussian processes, Bayesian neural networks, and DGCNs. We provide empirical evidence using four different well-known test environments, that our method improves the sample-efficiency over other combinations of uncertainty propagation methods and probabilistic models. During our tests, we place particular emphasis on the robustness of the learned policies with respect to noisy initial states.

Learning · 深度強化學習 · 強化學習 · Integration · SimPLe ·

2024 年 3 月 23 日

Utilizing Motion Matching with Deep Reinforcement Learning for Target Location Tasks

Jeongmin Lee,Taesoo Kwon,Hyunju Shin,Yoonsang Lee

from arxiv, Eurographics 2024 Short Papers

We present an approach using deep reinforcement learning (DRL) to directly generate motion matching queries for long-term tasks, particularly targeting the reaching of specific locations. By integrating motion matching and DRL, our method demonstrates the rapid learning of policies for target location tasks within minutes on a standard desktop, employing a simple reward design. Additionally, we propose a unique hit reward and obstacle curriculum scheme to enhance policy learning in environments with moving obstacles.

MoDELS · 控制器 · Performer · 端到端 · Learning ·

2024 年 3 月 22 日

End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control

Daniel Mayfrank,Alexander Mitsos,Manuel Dahmen

from arxiv, manuscript (18 pages, 7 figures, 5 tables), supplementary materials (3 pages, 2 tables)

(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic models that are sufficiently accurate and computationally tractable. Data-driven surrogate models for mechanistic models can reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum prediction accuracy on simulation samples and perform suboptimally in (e)NMPC. We present a method for end-to-end reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC. We apply our method to two applications derived from an established nonlinear continuous stirred-tank reactor model. The controller performance is compared to that of (e)NMPCs utilizing models trained using system identification, and model-free neural network controllers trained using reinforcement learning. We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC, and that, in contrast to the neural network controllers, the (e)NMPC controllers can react to changes in the control setting without retraining.

相互獨立的 · Learning · 掩碼 · SimPLe · MoDELS ·

2024 年 3 月 22 日

Learning to Embed Time Series Patches Independently

Seunghan Lee,Taeyoung Park,Kibok Lee

from arxiv, ICLR 2024

Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series. Inspired by masked image modeling in computer vision, recent works first patchify and partially mask out time series, and then train Transformers to capture the dependencies between patches by predicting masked patches from unmasked patches. However, we argue that capturing such patch dependencies might not be an optimal strategy for time series representation learning; rather, learning to embed patches independently results in better time series representations. Specifically, we propose to use 1) the simple patch reconstruction task, which autoencode each patch without looking at other patches, and 2) the simple patch-wise MLP that embeds each patch independently. In addition, we introduce complementary contrastive learning to hierarchically capture adjacent time series information efficiently. Our proposed method improves time series forecasting and classification performance compared to state-of-the-art Transformer-based models, while it is more efficient in terms of the number of parameters and training/inference time. Code is available at this repository: //github.com/seunghan96/pits.

道德化 · 極小點 · Agent · Continuity · MoDELS ·

2023 年 7 月 2 日

Minimum Levels of Interpretability for Artificial Moral Agents

Avish Vijayaraghavan,Cosmin Badea

As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.

數據增強 · Taxonomy · 文本分類 · Machine Learning · 訓練數據 ·

2021 年 7 月 7 日

A Survey on Data Augmentation for Text Classification

Markus Bayer,Marc-André Kaufhold,Christian Reuter

from arxiv, 35 pages, 6 figures, 8 tables

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).

Machine Translation · NMT · Performer · state-of-the-art · 學成 ·

2018 年 6 月 1 日

A Survey of Domain Adaptation for Neural Machine Translation

Chenhui Chu,Rui Wang

from arxiv, COLING 2018, 16 pages, 9 figures

Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.