
Deep Reinforcement Learning (DRL) has emerged as a promising approach to enhancing motion control and decision-making across a wide range of robotic applications. While prior research has demonstrated the efficacy of DRL algorithms in enabling autonomous mapless navigation for aerial and terrestrial mobile robots, these methods often generalize poorly to unknown tasks and environments. This paper explores the impact of the Delayed Policy Updates (DPU) technique on generalization to new situations and on the overall performance of agents. Our analysis of DPU in aerial and terrestrial mobile robots reveals that this technique significantly mitigates the lack of generalization and accelerates the learning process, improving efficiency across diverse tasks and unknown scenarios.
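
To make the mechanism concrete, here is a minimal, self-contained sketch of Delayed Policy Updates in the TD3 style: the critic is refreshed on every transition, while the policy (and its slow-moving target copy) is refreshed only every few steps. The quadratic bandit task, linear models, and hyperparameters below are illustrative placeholders, not the paper's navigation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)          # critic: Q(s, a) = w @ [s*s, s*a, a*a]
g = 0.0                  # actor:  a = g * s   (optimal gain is 0.5)
g_target = 0.0           # slow-moving target copy of the actor

policy_delay, tau, lr_q, lr_pi = 2, 0.05, 0.1, 0.05

for step in range(1, 2001):
    s = rng.uniform(-1.0, 1.0)
    a = g * s + rng.normal(0.0, 0.3)       # exploratory action
    r = -(a - 0.5 * s) ** 2                # reward peaks at a = 0.5 * s

    phi = np.array([s * s, s * a, a * a])
    w += lr_q * (r - w @ phi) * phi        # critic update: every step

    if step % policy_delay == 0:           # delayed policy update
        dq_da = w[1] * s + 2.0 * w[2] * (g * s)
        g += lr_pi * dq_da * s             # deterministic policy gradient
        g_target += tau * (g - g_target)   # Polyak-averaged target

print(f"learned actor gain: {g:.3f} (optimal is 0.5)")
```

Letting the value estimate settle between policy changes is the usual motivation for DPU: the actor is only moved once the critic has had several steps to catch up.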

Related content

We present the Code Documentation and Analysis Tool (CoDAT). CoDAT is designed to maintain consistency between the various levels of code documentation, e.g., if a line in a code sketch is changed, the comment that documents the corresponding code is changed with it. That is, comments are linked and updated so as to remain internally consistent and consistent with the code. By flagging "out of date" comments, CoDAT alerts the developer to maintain up-to-date documentation. We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it, so we flag semantic inconsistency as well as out-of-date comments. This helps programmers write code that correctly implements a code sketch, providing machine support for a stepwise refinement approach that starts with a code sketch and proceeds to code through one or more refinement iterations. CoDAT is implemented in the IntelliJ IDEA IDE, where we use the Code Insight daemon package alongside a custom regular expression algorithm to mark tagged comments whose corresponding code blocks have changed. CoDAT's backend is structurally decentralized, allowing a distributed ledger framework for code consistency and architectural compilation tracking.
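
As a rough illustration of the kind of tagged-comment check the abstract describes, the sketch below hashes the code block that follows a tagged comment and flags the comment as stale when the hash no longer matches. The tag syntax and hashing scheme are invented for illustration; the real tool runs inside IntelliJ IDEA's Code Insight daemon with its own conventions.

```python
import hashlib
import re

# A comment like "# @doc:3f2a1b9c summary" records a short hash of the code
# block it documents; if the block's current hash differs, flag the comment.
TAG = re.compile(r"#\s*@doc:(?P<digest>[0-9a-f]{8})\s+(?P<text>.*)")

def block_digest(lines):
    """Hash a code block (here: the contiguous non-blank lines that follow)."""
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()[:8]

def find_stale_comments(source: str):
    lines = source.splitlines()
    stale = []
    for i, line in enumerate(lines):
        m = TAG.search(line)
        if not m:
            continue
        block = []
        for nxt in lines[i + 1:]:      # collect the documented block
            if not nxt.strip():
                break
            block.append(nxt)
        if block_digest(block) != m["digest"]:
            stale.append((i + 1, m["text"]))
    return stale

example = """\
# @doc:deadbeef adds two numbers
def add(a, b):
    return a + b + 1   # code drifted; comment hash no longer matches
"""
print(find_stale_comments(example))   # -> [(1, 'adds two numbers')]
```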

Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior work fixes the skill library and steers the LLM with carefully hand-crafted prompt engineering, limiting the agent to a stationary range of addressable tasks. In this work, we introduce LRLL, an LLM-based lifelong learning agent that continuously grows the robot skill library to tackle manipulation tasks of ever-growing complexity. LRLL achieves this with four novel contributions: 1) a soft memory module that allows dynamic storage and retrieval of past experiences to serve as context, 2) a self-guided exploration policy that proposes new tasks in simulation, 3) a skill abstractor that distills recent experiences into new library skills, and 4) a lifelong learning algorithm that enables human users to bootstrap new skills with minimal online interaction. LRLL continuously transfers knowledge from the memory to the library, building composable, general, and interpretable policies while bypassing gradient-based optimization, thus relieving the learner from catastrophic forgetting. Empirical evaluation in a simulated tabletop environment shows that LRLL outperforms end-to-end and vanilla LLM approaches in the lifelong setup while learning skills that are transferable to the real world. Project material will become available at the webpage //gtziafas.github.io/LRLL_project.
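
A structural sketch of the memory-to-library loop described above may help: experiences accumulate in a soft memory, and recurring ones are distilled into named library skills that later episodes can reuse. The LLM call is stubbed out, and the class layout and distillation rule are assumptions for illustration, not the paper's actual components.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Agent:
    memory: list = field(default_factory=list)     # recent (task, code) pairs
    library: dict = field(default_factory=dict)    # skill name -> policy code

    def llm_write_policy(self, task: str) -> str:
        # Placeholder for an LLM that writes policy code from the task,
        # the retrieved memory, and the current skill library.
        return f"pick('{task.split()[-1]}')"

    def solve(self, task: str) -> str:
        code = self.library.get(task) or self.llm_write_policy(task)
        self.memory.append((task, code))
        return code

    def abstract_skills(self, min_uses: int = 2) -> None:
        """Distill experiences seen at least `min_uses` times into skills."""
        for (task, code), n in Counter(self.memory).items():
            if n >= min_uses and task not in self.library:
                self.library[task] = code
        self.memory.clear()                        # transfer memory -> library

agent = Agent()
for task in ["pick the mug", "pick the mug", "stack the blocks"]:
    agent.solve(task)
agent.abstract_skills()
print(agent.library)   # {'pick the mug': "pick('mug')"}
```

Because knowledge lives in the library rather than in model weights, nothing is overwritten when new skills arrive, which is the sense in which the approach sidesteps catastrophic forgetting.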

Explainable Artificial Intelligence (XAI) aims to improve the transparency of autonomous decision-making through explanations. Recent literature has emphasised users' need for holistic "multi-shot" explanations and the ability to personalise their engagement with XAI systems. We refer to this user-centred interaction as an XAI Experience. Despite advances in creating XAI experiences, evaluating them in a user-centred manner has remained challenging. To address this, we introduce the XAI Experience Quality (XEQ) Scale (pronounced "Seek" Scale) for evaluating the user-centred quality of XAI experiences. XEQ quantifies the quality of experiences across four evaluation dimensions: learning, utility, fulfilment and engagement. These contributions extend the state-of-the-art of XAI evaluation, moving beyond the one-dimensional metrics frequently developed to assess single-shot explanations. In this paper, we present the XEQ Scale development and validation process, including content validation with XAI experts as well as discriminant and construct validation through a large-scale pilot study. Our pilot study results offer strong evidence establishing the XEQ Scale as a comprehensive framework for evaluating user-centred XAI experiences.
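
For a sense of how a multi-dimension scale score could be aggregated in practice, the sketch below averages Likert-style item responses within each of the four named dimensions. The example items and the 1-5 response range are assumptions made for illustration; the actual XEQ items and scoring procedure are defined in the paper.

```python
from statistics import mean

# Hypothetical item-to-dimension mapping (placeholder wording, one item each).
DIMENSIONS = {
    "learning":   ["I understood why the system behaved as it did."],
    "utility":    ["The explanations helped me complete my task."],
    "fulfilment": ["The explanations met my goals for using the system."],
    "engagement": ["I wanted to keep interacting with the system."],
}

def score(responses: dict) -> dict:
    """Average 1-5 item responses within each evaluation dimension."""
    return {
        dim: mean(responses[item] for item in items)
        for dim, items in DIMENSIONS.items()
    }

answers = {item: 4 for items in DIMENSIONS.values() for item in items}
print(score(answers))   # per-dimension means, e.g. {'learning': 4, ...}
```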

Reinforcement Learning from Human Feedback (RLHF) has achieved impressive empirical successes while relying on a small amount of human feedback. However, there is limited theoretical justification for this phenomenon. Additionally, most recent studies focus on value-based algorithms despite the recent empirical successes of policy-based algorithms. In this work, we consider an RLHF algorithm based on policy optimization (PO-RLHF). The algorithm builds on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function. In PO-RLHF, knowledge of the reward function is not assumed, and the algorithm uses trajectory-based comparison feedback to infer the reward function. We provide performance bounds for PO-RLHF with low query complexity, which offers insight into why a small amount of human feedback may be sufficient to achieve good performance with RLHF. A key novelty is a trajectory-level elliptical potential analysis, which bounds the reward estimation error when comparison feedback (rather than numerical reward observation) is given. We provide and analyze the algorithms PG-RLHF and NN-PG-RLHF for two settings: linear and neural function approximation, respectively.
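
The comparison-feedback step can be made concrete with a small sketch: assuming a linear reward r(tau) = theta . phi(tau) and Bradley-Terry preferences, theta is recovered by logistic regression on feature differences of compared trajectory pairs. This illustrates only the feedback model, not the paper's PC-PG-based policy optimization or its elliptical potential analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_pairs = 4, 2000
theta_true = rng.normal(size=d)            # unknown linear reward parameter

phi_a = rng.normal(size=(n_pairs, d))      # features of trajectory A
phi_b = rng.normal(size=(n_pairs, d))      # features of trajectory B
diff = phi_a - phi_b
# Human prefers A with probability sigmoid(r(A) - r(B))  (Bradley-Terry).
prefers_a = rng.random(n_pairs) < 1 / (1 + np.exp(-diff @ theta_true))

theta = np.zeros(d)
for _ in range(500):                       # gradient ascent on log-likelihood
    p = 1 / (1 + np.exp(-diff @ theta))
    theta += 0.1 * diff.T @ (prefers_a - p) / n_pairs

cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
print(f"cosine(theta_hat, theta_true) = {cos:.3f}")   # close to 1
```

The query-complexity question in the paper is essentially how few such pairs suffice for the estimated reward to support good policy optimization.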

Satellite Communications (SatCom) are a backbone of worldwide development. In contrast with the past, when GEO satellites were the only means of such connectivity, multi-orbital connectivity is now emerging, especially through satellite constellations. At the same time, SatCom has enabled so-called In-Flight Connectivity, and with the advent of 5G-NTN the development of this market is accelerating. However, several gaps remain before such technology becomes mainstream, especially in the case of Rotary Wing Aircraft (RWA): their particular characteristics, such as low-altitude flights and blade interference, still pose open challenges. In this work, an End-to-End (E2E) analysis of SatCom performance under 5G-NTN for manned and unmanned RWA is performed. Various scenarios are examined, and the related requirements are presented. The effects of the blades and other RWA characteristics are established, and simulations for these cases are developed. Results are presented along with related discussion, and future directions for development are suggested. This work is part of the ESA ACROSS-AIR project.
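
To give a feel for the blade-interference effect, the toy sketch below models the rotor blades periodically occluding the antenna's line of sight, producing short deterministic outages at the blade-passage frequency. All numbers (blade count, RPM, occlusion width) are illustrative assumptions, not values from the paper's simulations.

```python
import numpy as np

n_blades, rpm = 4, 300
blade_pass_hz = n_blades * rpm / 60          # blade passages per second
occlusion_s = 0.002                          # blocked interval per passage

t = np.arange(0.0, 1.0, 1e-4)                # 1 s at 0.1 ms resolution
phase = (t * blade_pass_hz) % 1.0            # position within passage cycle
blocked = phase < occlusion_s * blade_pass_hz

print(f"blade-passage rate: {blade_pass_hz:.0f} /s")
print(f"link availability:  {1 - blocked.mean():.1%}")
```

Even a few percent of periodic blockage matters for 5G-NTN waveforms, since the outages are correlated in time rather than random, which is part of what makes the RWA case harder than fixed-wing In-Flight Connectivity.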

Offline Reinforcement Learning (ORL) is a promising approach to reduce the high sample complexity of traditional Reinforcement Learning (RL) by eliminating the need for continuous environmental interactions. ORL exploits a dataset of pre-collected transitions and thus expands the range of applications of RL to tasks in which excessive environment queries would increase training time and decrease efficiency, such as in modern AAA games. This paper introduces OfflineMania, a novel environment for ORL research. It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine. The environment simulates a single-agent racing game in which the objective is to complete the track through optimal navigation. We provide a variety of datasets to assess ORL performance. These datasets, created from policies of varying ability and offered in different sizes, aim to provide a challenging testbed for algorithm development and evaluation. We further establish a set of baselines for a range of Online RL, ORL, and hybrid Offline-to-Online RL approaches using our environment.
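
The data pipeline such an environment supports looks roughly like the sketch below: roll a behavior policy of limited skill to record transitions, then train purely from the logged dataset. The tiny 1-D "track" stand-in and the Gymnasium-style interface are assumptions for illustration; OfflineMania's actual observation/action spaces and dataset format differ.

```python
import numpy as np

class ToyTrack:
    """Gymnasium-style stub: drive position x toward the finish line at 1.0."""
    def reset(self):
        self.x = 0.0
        return np.array([self.x]), {}
    def step(self, action):
        self.x += 0.05 * float(np.clip(action, -1, 1))
        done = self.x >= 1.0
        return np.array([self.x]), (1.0 if done else -0.01), done, False, {}

def collect(env, policy, n_steps):
    """Log (obs, action, reward, next_obs, done) tuples for offline training."""
    data, (obs, _) = [], env.reset()
    for _ in range(n_steps):
        act = policy(obs)
        nxt, rew, term, trunc, _ = env.step(act)
        data.append((obs.copy(), act, rew, nxt.copy(), term))
        obs = nxt if not (term or trunc) else env.reset()[0]
    return data

# A mediocre behavior policy (noisy throttle) yields a "medium" dataset.
rng = np.random.default_rng(0)
dataset = collect(ToyTrack(), lambda obs: 0.6 + rng.normal(0, 0.3), 500)
print(len(dataset), "transitions;", sum(d[4] for d in dataset), "finishes")
```

Varying the behavior policy's skill and the number of logged steps is exactly how dataset suites of different quality and size are typically produced.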

Indoor localization plays a vital role in the era of the IoT and robotics, with WiFi technology being a prominent choice due to its ubiquity. We present a method for creating WiFi fingerprinting datasets to enhance indoor localization systems and to address the gap in WiFi fingerprinting dataset creation. We used the Simultaneous Localization And Mapping (SLAM) algorithm on a robotic platform to construct precise maps and localize robots in indoor environments. We developed software applications to facilitate data acquisition, fingerprinting dataset collection, and accurate ground truth map building. Subsequently, we aligned the spatial information generated via SLAM with the WiFi scans to create a comprehensive WiFi fingerprinting dataset. The created dataset was used to train a deep neural network (DNN) for indoor localization and to demonstrate the usefulness of grid density. We conducted experimental validation within our office environment to demonstrate the proposed method's effectiveness, including a heatmap from the dataset showcasing the spatial distribution of WiFi signal strengths for the testing access points placed within the environment. Notably, our method offers distinct advantages over existing approaches: it eliminates the need for a predefined map of the environment, requires no preparatory steps, lessens human intervention, creates a denser fingerprinting dataset, and reduces the dataset creation time. Our method achieves 26% more accurate localization than the compared methods and creates a six times denser fingerprinting dataset in one-third of the time required by the traditional method. In summary, using WiFi RSSI fingerprinting data surveyed by the SLAM-enabled robotic platform, we can adapt our trained DNN model to indoor localization in any dynamic environment, enhancing its scalability and applicability in real-world scenarios.
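
The regression the trained DNN performs maps an RSSI vector (one value per access point) to 2-D coordinates. Below is a minimal sketch of that idea: synthetic log-distance fingerprints and a tiny one-hidden-layer network stand in for the SLAM-surveyed dataset and the paper's actual architecture, both of which are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)
aps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # AP x,y

def rssi(pos):
    d = np.linalg.norm(pos - aps, axis=-1)          # distance to each AP
    return -40.0 - 20.0 * np.log10(d + 0.1)         # log-distance path loss

train_xy = rng.uniform(0, 10, size=(500, 2))        # surveyed grid points
X = np.array([rssi(p) for p in train_xy]) + rng.normal(0, 1, (500, 4))
X = (X - X.mean(0)) / X.std(0)                      # normalize RSSI features

W1 = rng.normal(0, 0.5, (4, 32)); b1 = np.zeros(32) # hidden layer
W2 = rng.normal(0, 0.5, (32, 2)); b2 = np.zeros(2)  # output layer
for _ in range(4000):                               # full-batch gradient descent
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    g = (pred - train_xy) / len(X)                  # dMSE/dpred
    gh = (g @ W2.T) * (1 - h ** 2)                  # backprop through tanh
    W2 -= 0.1 * h.T @ g;  b2 -= 0.1 * g.sum(0)
    W1 -= 0.1 * X.T @ gh; b1 -= 0.1 * gh.sum(0)

err = np.linalg.norm(pred - train_xy, axis=1).mean()
print(f"mean training localization error: {err:.2f} m")
```

A denser survey grid gives the network more coverage of the RSSI-to-position mapping, which is the grid-density effect the dataset is designed to expose.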

When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality. However, even when the accuracy of the baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion. We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task. Thus, we argue that compression techniques should also be evaluated using distance metrics. We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
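
Both proposed metrics are simple to state, as in the sketch below: flips counts answers whose correctness differs in either direction between the two models, and KL-Divergence is averaged over per-example output distributions (e.g., next-token probabilities). The toy arrays stand in for real model outputs; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def flip_rate(baseline_correct, compressed_correct):
    """Fraction of examples whose correctness differs between the models."""
    b = np.asarray(baseline_correct, dtype=bool)
    c = np.asarray(compressed_correct, dtype=bool)
    return np.mean(b != c)

def mean_kl(p_baseline, q_compressed, eps=1e-12):
    """Mean KL(P_baseline || Q_compressed) over per-example distributions."""
    p = np.asarray(p_baseline, dtype=float) + eps
    q = np.asarray(q_compressed, dtype=float) + eps
    p /= p.sum(axis=-1, keepdims=True)
    q /= q.sum(axis=-1, keepdims=True)
    return np.mean(np.sum(p * np.log(p / q), axis=-1))

# Same accuracy (3/5 correct each) yet 40% of the answers flipped:
base = [1, 1, 1, 0, 0]
comp = [1, 0, 1, 0, 1]
print(flip_rate(base, comp))                   # 0.4
print(mean_kl([[0.7, 0.3]], [[0.6, 0.4]]))     # small but nonzero
```

The example makes the core argument tangible: aggregate accuracy is identical, yet the compressed model answers two of five questions differently, which accuracy alone cannot reveal.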

Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion that learns adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references capture their fine-grained semantic meanings and thus render more expressive representations, which are more predictive for knowledge acquisition in the few-shot scenario. Evaluation on link prediction over two public datasets shows that our approach achieves new state-of-the-art results for different few-shot sizes.
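
The query-aware aggregation idea can be illustrated in a few lines: each reference entity pair contributes to the relation representation with a weight that depends on the current query, instead of a static average. The random embeddings and plain dot-product scoring below are simplifying assumptions, not the paper's model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_references(refs, query):
    """Attention-weighted relation vector; weights depend on the query."""
    scores = refs @ query              # query-aware relevance per reference
    weights = softmax(scores)
    return weights @ refs, weights

rng = np.random.default_rng(0)
refs = rng.normal(size=(3, 8))         # 3 few-shot reference embeddings
q1, q2 = rng.normal(size=8), rng.normal(size=8)

# Different queries induce different reference weights (dynamic, not static):
for q in (q1, q2):
    _, w = aggregate_references(refs, q)
    print(np.round(w, 2))
```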

Graph Neural Networks (GNNs) have demonstrated superior performance in many challenging applications, including few-shot learning tasks. Despite their powerful capacity to learn and generalize from few samples, GNNs usually suffer from severe over-fitting and over-smoothing as the model becomes deep, which limits scalability. In this work, we propose a novel Attentive GNN to tackle these challenges, by incorporating a triple-attention mechanism, i.e., node self-attention, neighborhood attention, and layer memory attention. We explain why the proposed attentive modules can improve GNN for few-shot learning with theoretical analysis and illustrations. Extensive experiments show that the proposed Attentive GNN outperforms state-of-the-art GNN-based methods for few-shot learning on the mini-ImageNet and Tiered-ImageNet datasets, in both inductive and transductive settings.
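
As a compact sketch of one of the three attention types named above, the snippet below implements neighborhood attention: each node aggregates its neighbors with data-dependent weights instead of uniform averaging. The dot-product scoring and the toy graph are illustrative assumptions; the self-attention and layer-memory modules would follow the same masking-and-softmax pattern.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def neighborhood_attention(H, adj):
    """One propagation step: attention-weighted neighbor aggregation."""
    scores = H @ H.T                          # pairwise node affinities
    scores = np.where(adj > 0, scores, -1e9)  # restrict to graph edges
    return softmax(scores) @ H                # weighted neighbor mixing

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 6))                   # 4 node embeddings
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])                # chain graph with self-loops
print(neighborhood_attention(H, adj).shape)   # (4, 6)
```

Because the mixing weights are learned per edge rather than fixed, repeated propagation is less prone to averaging all node features toward the same vector, which is one intuition for why attention helps with over-smoothing.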
