In this work, we compare emergent communication (EC) built upon multi-agent deep reinforcement learning (MADRL) and language-oriented semantic communication (LSC) empowered by a pre-trained large language model (LLM) using human language. In a multi-agent remote navigation task with multimodal input data comprising location and channel maps, we show that EC incurs high training cost and struggles with multimodal data, whereas LSC incurs high inference computing cost due to the LLM's large size. To address their respective bottlenecks, we propose a novel framework of language-guided EC (LEC), which guides EC training with LSC via knowledge distillation (KD). Simulations corroborate that LEC achieves faster travel times while avoiding areas with poor channel conditions, and speeds up MADRL training convergence by up to 61.8% compared to EC.
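To make the KD-based guidance concrete, the following is a minimal sketch assuming a PyTorch-style setup in which the EC (student) agent's message logits are pulled toward the LSC (teacher) distribution while the usual RL loss is retained; the function name and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def lec_loss(student_logits, teacher_logits, rl_loss, alpha=0.5, tau=2.0):
    """Blend the MADRL objective with a knowledge-distillation term.
    alpha balances the two terms and tau is a softmax temperature;
    both are illustrative values, not taken from the paper."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    return alpha * kd + (1.0 - alpha) * rl_loss
```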
In vehicular edge computing (VEC), asynchronous federated learning (AFL) is used, where the edge receives a local model and updates the global model, effectively reducing the global aggregation latency. Because vehicles differ in the amount of local data, computing capability, and location, updating the global model with the same weight for every vehicle is inappropriate. These factors affect the local computation time and the upload time of the local model, and a vehicle may also suffer Byzantine attacks that corrupt its data. Based on deep reinforcement learning (DRL), we can take these factors into account to exclude poorly performing vehicles, as well as vehicles that have suffered Byzantine attacks, before AFL. At the same time, during AFL aggregation, we can place greater weight on the better-performing vehicles to improve the accuracy and security of the system. In this paper, we propose a DRL-based vehicle selection scheme for VEC. The scheme takes into account vehicle mobility, time-varying channel conditions, time-varying computational resources, differing data amounts, the transmission channel status of the vehicles, and Byzantine attacks. Simulation results show that the proposed scheme effectively improves the security and accuracy of the global model.
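As one way to picture how selection and weighting could interact with asynchronous aggregation, the sketch below uses an assumed staleness- and trust-weighted mixing rule; it is illustrative only, not the update rule proposed in the paper.

```python
import numpy as np

def async_update(global_w, local_w, staleness, data_size, trust_score, base_lr=0.5):
    """Illustrative asynchronous aggregation step (not the paper's exact rule):
    the mixing weight shrinks with staleness and grows with the vehicle's data
    size and a DRL-derived trust score that screens out suspected Byzantine
    updates (trust_score close to 0 effectively excludes the vehicle)."""
    weight = base_lr * trust_score * data_size / (1.0 + staleness)
    weight = min(weight, 1.0)
    return (1.0 - weight) * np.asarray(global_w) + weight * np.asarray(local_w)
```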
Multi-task reinforcement learning (MTRL) demonstrates potential for enhancing a robot's generalization, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still suffer from conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed for simultaneous learning of multiple tasks. TSAC decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). To alleviate conflicts resulting from excessive focus on specific tasks' details in the SP, the ACP incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective and achieve generalization across tasks. These additional rewards turn the original problem into a multi-objective MTRL problem. To convert the multi-objective MTRL problem into a single-objective formulation, TSAC assigns a virtual expected budget to the sparse rewards and employs the Lagrangian method to transform the resulting constrained single-objective optimization into an unconstrained one. Experimental evaluations conducted on Meta-World's MT10 and MT50 benchmarks demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in both sample efficiency and effective action execution.
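In generic constrained-RL notation (illustrative; the paper's exact objective and symbols may differ), the budgeted problem and its Lagrangian relaxation read

$$\max_{\pi}\; J_{\text{task}}(\pi) \quad \text{s.t.} \quad J_{\text{sparse}}(\pi) \ge b \;\;\Longrightarrow\;\; \max_{\pi}\,\min_{\lambda \ge 0}\; J_{\text{task}}(\pi) + \lambda\bigl(J_{\text{sparse}}(\pi) - b\bigr),$$

where $b$ is the virtual expected budget assigned to the sparse rewards and $\lambda$ is a Lagrange multiplier updated alongside the policy.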
The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains such as autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, real-world multi-agent systems usually contain agents with different functions and strategies, while existing scalable MARL methods offer only limited heterogeneity. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), which integrates heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer for decision-making, whose parameters are generated from the learned latent variables. Our approach is scalable, as all parameters are shared except those of the heterogeneous layer, and it attains both inter-individual and temporal heterogeneity. We implement our approach on top of the state-of-the-art PPO-based backbone as SHPPO, but the approach is agnostic to the backbone and can be seamlessly plugged into any parameter-shared MARL method. SHPPO exhibits superior performance over baselines such as MAPPO and HAPPO in classic MARL environments like the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights, through visualization, into the learned latent representation's impact on team performance.
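One way such a latent-conditioned decision layer could be realized is a shared hypernetwork that maps each agent's latent variable to the weights of its own decision layer; the PyTorch-style sketch below, with assumed module names and dimensions, is an illustration of this idea rather than SHPPO's exact architecture.

```python
import torch
import torch.nn as nn

class HeterogeneousLayer(nn.Module):
    """Illustrative sketch: a shared generator maps each agent's latent strategy
    variable to per-agent weights and biases, so the trainable parameters stay
    shared while the resulting decisions differ across agents."""

    def __init__(self, latent_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(latent_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(latent_dim, out_dim)

    def forward(self, h, z):
        # h: (n_agents, in_dim) shared features, z: (n_agents, latent_dim) latents
        W = self.weight_gen(z).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(z)
        return torch.bmm(W, h.unsqueeze(-1)).squeeze(-1) + b
```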
In vanilla federated learning (FL) such as FedAvg, the parameter server (PS) and multiple distributed clients form a typical buyer's market, where the number of PSs/buyers of FL services is far smaller than the number of clients/sellers. To improve the performance of FL and reduce the cost of motivating clients to participate, this paper proposes to differentiate the pricing of services provided by different clients rather than offering the same price to all of them. The price is differentiated based on the performance improvement each client brings to FL and on the clients' heterogeneity in computing and communication capabilities. To this end, a price-discrimination game (PDG) is formulated to comprehensively address the distributed resource management problems in FL, including the multi-objective trade-off, client selection, and incentive mechanism. As the PDG is a mixed-integer nonlinear programming (MINLP) problem, a distributed semi-heuristic algorithm with low computational complexity and low communication overhead is designed to solve it. Simulation results verify the effectiveness of the proposed approach.
Reinforcement learning (RL) is a flexible and efficient method for programming micro-robots in complex environments. Here we investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis; specifically, whether we can learn how intelligent agents process the information available to them in order to swim towards a target. We run simulations covering a range of agent shapes, sizes, and swim speeds to determine whether the physical constraints on biological swimmers, namely Brownian motion, lead to regimes in which reinforcement learning fails. We find that the RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before active swimming overpowers the stochastic environment. We study the efficiency of the emergent policy and identify convergence with respect to agent size and swim speed. Finally, we study the strategy adopted by the reinforcement learning algorithm to explain how the agents perform their tasks. To this end, we identify three dominant emergent strategies and several rarer approaches. These strategies, whilst producing almost identical trajectories in simulation, are distinct and give insight into the possible mechanisms by which biological agents explore their environment and respond to changing conditions.
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the \underline{P}roxy variable \underline{P}essimistic \underline{P}olicy \underline{O}ptimization (\texttt{P3O}) algorithm, which addresses the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of \texttt{P3O} is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that \texttt{P3O} achieves an $n^{-1/2}$-suboptimality, where $n$ is the number of trajectories in the dataset. To the best of our knowledge, \texttt{P3O} is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
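For context, the $n^{-1/2}$ rate can be read in the usual suboptimality convention (the notation below is illustrative and not taken from the paper):

$$\mathrm{SubOpt}(\widehat{\pi}) \;=\; V(\pi^{*}) - V(\widehat{\pi}) \;\le\; \frac{c}{\sqrt{n}},$$

where $\pi^{*}$ is the optimal policy, $\widehat{\pi}$ is the policy returned by \texttt{P3O}, and $c$ hides problem-dependent factors.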
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent the full history with a low-dimensional embedding, which assembles the per-step feature. We integrate (i) and (ii) in a unified framework that allows a variety of estimators (including maximum likelihood estimators and generative adversarial networks). For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank). Here $\epsilon$ is the optimality gap. To the best of our knowledge, ETC is the first sample-efficient algorithm that bridges representation learning and policy optimization in POMDPs with infinite observation and state spaces.
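The low-rank structure referenced above is commonly written as a bilinear factorization of the transition kernel; as an illustrative sketch (the exact form assumed by ETC may differ in detail),

$$\mathbb{P}(s' \mid s, a) \;=\; \psi(s')^{\top} \phi(s, a), \qquad \phi(s, a),\, \psi(s') \in \mathbb{R}^{r},$$

so the per-step feature $\phi$ has dimension equal to the rank $r$, which plays the role of the intrinsic dimension in the sample complexity bound.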
In this work, we study integrated sensing and communication (ISAC) networks with the goal of effectively balancing sensing and communication (S&C) performance at the network level. Through the simultaneous use of coordinated multi-point (CoMP) joint transmission and distributed multiple-input multiple-output (MIMO) radar techniques, we propose a cooperative networked ISAC scheme to enhance both S&C services. The tool of stochastic geometry is then exploited to capture the S&C performance, which allows us to illuminate key cooperative dependencies in the ISAC network. Remarkably, the derived expression of the Cramér-Rao lower bound (CRLB) of the localization accuracy unveils a significant finding: deploying $N$ ISAC transceivers yields enhanced sensing performance across the entire network, in accordance with the $\ln^2 N$ scaling law. Simulation results demonstrate that, compared to the time-sharing scheme, the proposed cooperative ISAC scheme can effectively improve the average data rate and reduce the CRLB.
In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000).
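As a concrete, simplified illustration of one-pass split finding in small space, the sketch below bucketizes the feature into a fixed set of candidate thresholds and keeps running sums per bucket; this bucketized approach is an assumption made for exposition and is not the algorithm of the paper.

```python
def best_regression_split(stream, num_buckets=64, lo=0.0, hi=1.0):
    """Single pass over (x_i, y_i): keep count, sum, and sum of squares of y per
    bucket of x, then a prefix scan finds the threshold minimizing the total
    sum of squared errors of the two sides. Space is O(num_buckets)."""
    cnt = [0.0] * num_buckets
    s = [0.0] * num_buckets
    sq = [0.0] * num_buckets
    width = (hi - lo) / num_buckets
    for x, y in stream:                      # single pass over the stream
        b = min(int((x - lo) / width), num_buckets - 1)
        cnt[b] += 1; s[b] += y; sq[b] += y * y

    def sse(c, su, sqs):                     # SSE = sum y^2 - (sum y)^2 / n
        return sqs - su * su / c if c > 0 else 0.0

    best_err, best_split = float("inf"), None
    lc = ls = lq = 0.0
    tot_c, tot_s, tot_q = sum(cnt), sum(s), sum(sq)
    for j in range(num_buckets - 1):         # candidate split after bucket j
        lc += cnt[j]; ls += s[j]; lq += sq[j]
        err = sse(lc, ls, lq) + sse(tot_c - lc, tot_s - ls, tot_q - lq)
        if err < best_err:
            best_err, best_split = err, lo + (j + 1) * width
    return best_split, best_err
```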
Machine learning techniques have become deeply rooted in our everyday lives. However, since achieving good learning performance is knowledge- and labor-intensive, human experts are heavily involved in every aspect of machine learning. To make machine learning techniques easier to apply and to reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic of both industrial and academic interest. In this paper, we provide an up-to-date survey of AutoML. First, we introduce and define the AutoML problem, drawing inspiration from both automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but can also guide the design of new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons behind their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.