
Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that improves safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g. automatic emergency braking (AEB). The goal of the RL is to learn an effective corrective control action that is used in a focused search for collision-free trajectories, and to reduce the frequency of triggering automatic emergency braking. This novel setup enables the RL policy to learn safely and directly on mobile robots in a real-world indoor environment, minimizing actual crashes even during training. Our real-world experiments show that, when compared with several baselines, our approach achieves a higher average speed, a lower crash rate, fewer emergency interventions, lower computational overhead, and smoother overall control.
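The interplay of the three components can be pictured with a minimal sketch. This is not the SAFER implementation: it assumes a scalar command, a fixed set of search offsets, and a caller-supplied collision check, and the names `choose_command`, `correction`, and `search_offsets` are hypothetical.

```python
from typing import Callable, Optional, Sequence


def choose_command(operator_cmd: float,
                   correction: float,
                   is_collision_free: Callable[[float], bool],
                   search_offsets: Sequence[float] = (0.0, -0.25, 0.25),
                   ) -> Optional[float]:
    """Return a safe command near operator_cmd + correction, or None to signal emergency braking."""
    center = operator_cmd + correction      # learned corrective action (assumed additive)
    for offset in search_offsets:           # focused search around the corrected command
        candidate = center + offset
        if is_collision_free(candidate):
            return candidate
    return None                             # no safe candidate found: fall back to AEB


# Toy 1-D check (hypothetical): any speed above 0.5 would hit an obstacle.
def safe(speed: float) -> bool:
    return speed <= 0.5


print(choose_command(1.0, -0.5, safe))   # corrected command is already safe
print(choose_command(1.0, -0.25, safe))  # search finds a nearby safe command
print(choose_command(1.0, 0.5, safe))    # nothing safe nearby, so None (brake)
```

The better the learned correction, the earlier the search succeeds and the less often the `None` branch (emergency braking) is reached.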

Related content


Learning · Continuity · Guidance · State space · Integration

August 22, 2023

LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation

Shuo Cheng, Danfei Xu

from arXiv, Accepted to RA-L 2023

To assist with everyday human activities, robots must solve complex long-horizon tasks and generalize to new settings. Recent deep reinforcement learning (RL) methods show promise in fully autonomous learning, but they struggle to reach long-term goals in large environments. On the other hand, Task and Motion Planning (TAMP) approaches excel at solving and generalizing across long-horizon tasks, thanks to their powerful state and action abstractions. But they assume predefined skill sets, which limits their real-world applications. In this work, we combine the benefits of these two paradigms and propose an integrated task planning and skill learning framework named LEAGUE (Learning and Abstraction with Guidance). LEAGUE leverages the symbolic interface of a task planner to guide RL-based skill learning and creates an abstract state space to enable skill reuse. More importantly, LEAGUE learns manipulation skills in situ within the task planning system, continuously growing its capability and the set of tasks that it can solve. We evaluate LEAGUE on four challenging simulated task domains and show that LEAGUE outperforms baselines by large margins. We also show that the learned skills can be reused to accelerate learning in new task domains and transfer to a physical robot platform.

Mutually independent · Agent · Learning · Reinforcement learning · Episode

August 21, 2023

CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision Making

Giovanni Minelli, Mirco Musolesi

Robust coordination skills enable agents to operate cohesively in shared environments, together towards a common goal and, ideally, individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while still allowing independent decision-making at the individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations, balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental policy approach as an effective technique for improving coordination in multi-agent systems.

Performer · Marginalization · Boosting (a method for accelerating model training) · Discriminator · Backbone

August 21, 2023

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy

from arXiv, ICCV-2023, Project page: //github.com/lxtGH/Tube-Link (fix typos and errors, update the results)

Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks. To enhance the modeling of cross-tube relationships, we propose an effective way to perform tube-level linking via attention along the queries. In addition, we introduce temporal contrastive learning to learn instance-wise discriminative features for tube-level association. Our approach offers flexibility and efficiency for both short and long video inputs, as the length of each subclip can be varied according to the needs of datasets or scenarios. Tube-Link outperforms existing specialized architectures by a significant margin on five video segmentation datasets. Specifically, it achieves almost 13% relative improvement on VIPSeg and 4% improvement on KITTI-STEP over the strong baseline Video K-Net. When using a ResNet50 backbone on YouTube-VIS-2019 and 2021, Tube-Link boosts IDOL by 3% and 4%, respectively.
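The subclip-based, near-online processing described above can be illustrated with a minimal sketch that splits a video's frame indices into short subclips of configurable length. The function name `split_into_subclips` and the non-overlapping split are assumptions for illustration; the abstract does not state how subclips are formed.

```python
from typing import List


def split_into_subclips(num_frames: int, clip_len: int) -> List[List[int]]:
    """Split frame indices 0..num_frames-1 into consecutive subclips of at most clip_len frames."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(range(start, min(start + clip_len, num_frames)))
            for start in range(0, num_frames, clip_len)]


print(split_into_subclips(7, 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Changing `clip_len` is the knob the abstract refers to: a longer subclip gives each tube more temporal context, a shorter one reduces latency and memory per step.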

Agent · Mutually independent · Learning · Deep reinforcement learning · University

August 21, 2023

FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks

Buse G. A. Tekgul, N. Asokan

from arXiv, Will appear in the proceedings of ACSAC 2023; 13 pages, 5 figures, 7 tables

We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, to generate adversarial examples that can successfully transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed via such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot be easily evaded by more informed adversaries without negatively impacting agent performance. We also show that not all universal adversarial masks are suitable candidates for fingerprints due to the inherent characteristics of DRL policies. The spatio-temporal dynamics of DRL problems and the sequential decision-making process make it more difficult to characterize the decision boundary of DRL policies and to search for universal masks that capture its geometry.
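The action agreement measurement can be illustrated with a minimal sketch. It assumes deterministic policies exposed as plain Python callables that map a state to a discrete action, and an additive mask; the names `action_agreement`, `victim`, `copy_of_victim`, and `independent`, as well as the toy states and mask values, are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence

State = Sequence[float]
Policy = Callable[[State], int]


def action_agreement(victim: Policy, suspect: Policy,
                     states: Sequence[State], mask: State) -> float:
    """Fraction of mask-perturbed states on which both policies pick the same action."""
    if not states:
        raise ValueError("need at least one state")
    matches = 0
    for state in states:
        # Additive perturbation of every state with the same mask (assumption).
        perturbed = [s + m for s, m in zip(state, mask)]
        if victim(perturbed) == suspect(perturbed):
            matches += 1
    return matches / len(states)


# Toy policies (hypothetical): each acts on the sign of one feature.
def victim(s: State) -> int:
    return int(s[0] > 0)


def copy_of_victim(s: State) -> int:
    return int(s[0] > 0)


def independent(s: State) -> int:
    return int(s[1] > 0)


states = [[-0.2, 0.3], [-0.4, -0.1], [0.1, -0.6]]
mask = [0.5, 0.0]
print(action_agreement(victim, copy_of_victim, states, mask))  # copy agrees on every state
print(action_agreement(victim, independent, states, mask))     # independent policy agrees less often
```

A verifier would compare the resulting value against a threshold; how that threshold is chosen is not specified in the abstract.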

Learning · DNN · Identifiable · Class · MoDELS

August 21, 2023

Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries

Haekyu Park, Seongmin Lee, Benjamin Hoover, Austin P. Wright, Omar Shaikh, Rahul Duggal, Nilaksh Das, Judy Hoffman, Duen Horng Chau

from arXiv, Accepted at CIKM'23

We present ConceptEvo, a unified interpretation framework for deep neural networks (DNNs) that reveals the inception and evolution of learned concepts during training. Our work addresses a critical gap in DNN interpretation research, as existing methods primarily focus on post-training interpretation. ConceptEvo introduces two novel technical contributions: (1) an algorithm that generates a unified semantic space, enabling side-by-side comparison of different models during training, and (2) an algorithm that discovers and quantifies important concept evolutions for class predictions. Through a large-scale human evaluation and quantitative experiments, we demonstrate that ConceptEvo successfully identifies concept evolutions across different models, which are not only comprehensible to humans but also crucial for class predictions. ConceptEvo is applicable to both modern DNN architectures, such as ConvNeXt, and classic DNNs, such as VGGs and InceptionV3.

Critic · Medium · Performer · Learning · Deep reinforcement learning

August 18, 2023

DoCRL: Double Critic Deep Reinforcement Learning for Mapless Navigation of a Hybrid Aerial Underwater Vehicle with Medium Transition

Ricardo B. Grando, Junior C. de Jesus, Victor A. Kich, Alisson H. Kolling, Rodrigo S. Guerra, Paulo L. J. Drews-Jr

from arXiv, Accepted to the Latin American Symposium 2023. arXiv admin note: substantial text overlap with arXiv:2209.06332

Deep Reinforcement Learning (Deep-RL) techniques for motion control have been continuously used to deal with decision-making problems for a wide variety of robots. Previous works showed that Deep-RL can be applied to perform mapless navigation, including the medium transition of Hybrid Unmanned Aerial Underwater Vehicles (HUAUVs). These are robots that can operate in both air and water media, with future potential for rescue tasks in robotics. This paper presents new approaches based on the state-of-the-art Double Critic Actor-Critic algorithms to address the navigation and medium transition problems for a HUAUV. We show that double-critic Deep-RL with Recurrent Neural Networks, using solely range data and relative localization, improves the navigation performance of HUAUVs. Our DoCRL approaches achieved better navigation and transitioning capability, outperforming previous approaches.

Optimizer · Microsoft Surface · Channel · Performer · Learning

August 18, 2023

RISnet: A Scalable Approach for Reconfigurable Intelligent Surface Optimization with Partial CSI

Bile Peng, Karl-Ludwig Besser, Ramprasad Raghunath, Vahid Jamali, Eduard A. Jorswieck

The reconfigurable intelligent surface (RIS) is a promising technology that enables wireless communication systems to achieve improved performance by intelligently manipulating wireless channels. In this paper, we consider the sum-rate maximization problem in a downlink multi-user multi-input-single-output (MISO) channel via space-division multiple access (SDMA). Two major challenges of this problem are the high dimensionality due to the large number of RIS elements and the difficulty of obtaining the full channel state information (CSI), which is assumed known in many algorithms proposed in the literature. Instead, we propose a hybrid machine learning approach using the weighted minimum mean squared error (WMMSE) precoder at the base station (BS) and a dedicated neural network (NN) architecture, RISnet, for RIS configuration. The RISnet scales well enough to optimize 1296 RIS elements and requires partial CSI of only 16 RIS elements as input. We show that it achieves high performance with a low channel-estimation requirement for geometric channel models obtained with ray-tracing simulation. Unsupervised learning lets the RISnet find an optimized RIS configuration by itself. Numerical results show that a trained model configures the RIS with low computational effort, considerably outperforms the baselines, and can work with discrete phase shifts.
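For reference, the sum-rate objective mentioned above is commonly written as follows for a K-user MISO downlink. The notation is an assumption and is not taken from the paper: $\mathbf{h}_k$ denotes the effective channel of user $k$ (including the RIS-reflected path), $\mathbf{v}_k$ its precoding vector, and $\sigma^2$ the noise power.

```latex
R = \sum_{k=1}^{K} \log_2\!\left(1 + \frac{\left|\mathbf{h}_k^{\mathsf{H}} \mathbf{v}_k\right|^2}{\sum_{j \neq k} \left|\mathbf{h}_k^{\mathsf{H}} \mathbf{v}_j\right|^2 + \sigma^2}\right)
```

In this form the RIS configuration enters only through the effective channels, while the precoder determines the vectors $\mathbf{v}_k$.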

Web3 · Use Case · Projection · CASE · Next

August 18, 2023

Towards Web3 Applications: Easing the Access and Transition

Guangsheng Yu, Xu Wang, Qin Wang, Tingting Bi, Yifei Dong, Ren Ping Liu, Nektarios Georgalas, Andrew Reeves

from arXiv, 8 pages, 3 figures, code snippets, interviews

Web3 is leading a wave of the next generation of web services that even many Web2 applications are keen to ride. However, Web2 developers' lack of Web3 background hinders easy and effective access and transition. On the other hand, Web3 applications want encouragement and advertisement from conventional Web2 companies and projects because of their low market shares. In this paper, we propose a seamless transition framework from Web2 to Web3, named WebttCom, after exploring the connotation of Web3 and the key differences between Web2 and Web3 applications. We also provide a full-stack implementation as a use case to support the proposed framework, followed by interviews with five participants that show four positive and one natural response. We confirm that the proposed framework WebttCom addresses the defined research question, and that the implementation satisfies the framework WebttCom well in terms of strong necessity, usability, and completeness based on the interview results.

Processing (programming language) · Sparse · 3D · Convolution · Reducible

August 18, 2023

SpOctA: A 3D Sparse Convolution Accelerator with Octree-Encoding-Based Map Search and Inherent Sparsity-Aware Processing

Dongxu Lyu, Zhenyu Li, Yuzhou Chen, Jinming Zhang, Ningyi Xu, Guanghui He

from arXiv, Accepted to ICCAD 2023

Point-cloud-based 3D perception has attracted great attention in various applications including robotics, autonomous driving and AR/VR. In particular, the 3D sparse convolution (SpConv) network has emerged as one of the most popular backbones due to its excellent performance. However, it poses severe challenges to real-time perception on general-purpose platforms, such as lengthy map search latency, high computation cost, and enormous memory footprint. In this paper, we propose SpOctA, a SpConv accelerator that enables high-speed and energy-efficient point cloud processing. SpOctA parallelizes the map search by utilizing algorithm-architecture co-optimization based on octree encoding, thereby achieving 8.8-21.2x search speedup. It also reduces the heavy computational workload by exploiting the inherent sparsity of each voxel, which eliminates computation redundancy and saves 44.4-79.1% processing latency. To optimize on-chip memory management, a SpConv-oriented non-uniform caching strategy is introduced to reduce external memory access energy by 57.6% on average. Implemented in a 40 nm technology and extensively evaluated on representative benchmarks, SpOctA outperforms state-of-the-art SpConv accelerators with a 1.1-6.9x speedup and a 1.5-3.1x energy efficiency improvement.
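To make the idea of an octree-based key for map search concrete, here is a minimal sketch using Morton (Z-order) codes, one common way to derive an octree key from voxel coordinates. This is an illustrative assumption: the abstract does not specify SpOctA's exact encoding, and `morton_encode`, `neighbor_lookup`, and the toy voxel set are hypothetical.

```python
def morton_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key.

    Each group of three key bits selects one of eight child octants at one octree level.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


# Sparse voxel set stored as {key: voxel index}. Finding the input voxel for a
# kernel offset then becomes a key lookup rather than a scan over all voxels.
voxels = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (5, 2, 7)]
table = {morton_encode(*v): idx for idx, v in enumerate(voxels)}


def neighbor_lookup(voxel, offset):
    """Return the index of voxel + offset if that voxel is occupied, else None."""
    x, y, z = (c + d for c, d in zip(voxel, offset))
    if min(x, y, z) < 0:
        return None
    return table.get(morton_encode(x, y, z))


print(neighbor_lookup((0, 0, 0), (1, 0, 0)))  # 1
print(neighbor_lookup((0, 0, 0), (0, 1, 0)))  # None
```

Because the lookups for different kernel offsets are independent of one another, they can in principle be issued in parallel, which is the property a hardware map-search unit would exploit.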

Learning · Imperfect information · Agent · Reinforcement learning · Self-Play

June 30, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
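The size comparisons quoted above can be made explicit with a short calculation that uses only the figures given in the abstract. Dividing the Stratego game-tree size by the stated factor gives the implied size for Go, and dividing by the poker figure gives the gap to Texas hold'em:

```latex
\frac{10^{535}}{10^{175}} = 10^{535-175} = 10^{360}
\qquad\text{and}\qquad
\frac{10^{535}}{10^{164}} = 10^{535-164} = 10^{371}
```

So the abstract's figures imply a Go game tree on the order of $10^{360}$ nodes and a Stratego game tree roughly $10^{371}$ times larger than that of Texas hold'em poker.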