亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (requires multi-iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed exhibits a significant improvement in policy imitation, it introduces drawbacks to transferable reward recovery. We prove that the SAC algorithm itself is not feasible to disentangle the reward function comprehensively during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.

相關內容

SAC:Selected Areas in Cryptography。 Explanation:密碼術的選擇區。 Publisher:Springer。 SIT:

As a crucial step toward real-world learning scenarios with changing environments, dataset shift theory and invariant representation learning algorithm have been extensively studied to relax the identical distribution assumption in classical learning setting. Among the different assumptions on the essential of shifting distributions, generalized label shift (GLS) is the latest developed one which shows great potential to deal with the complex factors within the shift. In this paper, we aim to explore the limitations of current dataset shift theory and algorithm, and further provide new insights by presenting a comprehensive understanding of GLS. From theoretical aspect, two informative generalization bounds are derived, and the GLS learner is proved to be sufficiently close to optimal target model from the Bayesian perspective. The main results show the insufficiency of invariant representation learning, and prove the sufficiency and necessity of GLS correction for generalization, which provide theoretical supports and innovations for exploring generalizable model under dataset shift. From methodological aspect, we provide a unified view of existing shift correction frameworks, and propose a kernel embedding-based correction algorithm (KECA) to minimize the generalization error and achieve successful knowledge transfer. Both theoretical results and extensive experiment evaluations demonstrate the sufficiency and necessity of GLS correction for addressing dataset shift and the superiority of proposed algorithm.

Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at //sites.google.com/view/fightladder/home.

The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed system to efficiently generate and process a massive amount of data. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL, which allows efficient and massively parallelized training and easy development of customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries, reaching at most 21x higher training throughput in a distributed setting. On learning performance, beyond performing and scaling well on common RL benchmarks with different RL algorithms, SRL can reproduce the same solution in the challenging hide-and-seek environment as reported by OpenAI with up to 5x speedup in wall-clock time. Notably, SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores. SRL source code is available at: //github.com/openpsi-project/srl .

Reinforcement learning (RL) has emerged as a promising paradigm in complex and continuous robotic tasks, however, safe exploration has been one of the main challenges, especially in contact-rich manipulation tasks in unstructured environments. Focusing on this issue, we propose SRL-VIC: a model-free safe RL framework combined with a variable impedance controller (VIC). Specifically, safety critic and recovery policy networks are pre-trained where safety critic evaluates the safety of the next action using a risk value before it is executed and the recovery policy suggests a corrective action if the risk value is high. Furthermore, the policies are updated online where the task policy not only achieves the task but also modulates the stiffness parameters to keep a safe and compliant profile. A set of experiments in contact-rich maze tasks demonstrate that our framework outperforms the baselines (without the recovery mechanism and without the VIC), yielding a good trade-off between efficient task accomplishment and safety guarantee. We show our policy trained on simulation can be deployed on a physical robot without fine-tuning, achieving successful task completion with robustness and generalization. The video is available at //youtu.be/ksWXR3vByoQ.

Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies since many GNN trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we comprehensively review recent research efforts on Causality-Inspired GNNs (CIGNNs). Specifically, we first employ causal tools to analyze the primary trustworthiness risks of existing GNNs, underscoring the necessity for GNNs to comprehend the causal mechanisms within graph data. Moreover, we introduce a taxonomy of CIGNNs based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. Besides, we systematically introduce typical methods within each category and discuss how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and codes, are available in //github.com/usail-hkust/Causality-Inspired-GNNs.

eXplainable artificial intelligence (XAI) methods have emerged to convert the black box of machine learning (ML) models into a more digestible form. These methods help to communicate how the model works with the aim of making ML models more transparent and increasing the trust of end-users into their output. SHapley Additive exPlanations (SHAP) and Local Interpretable Model Agnostic Explanation (LIME) are two widely used XAI methods, particularly with tabular data. In this perspective piece, we discuss the way the explainability metrics of these two methods are generated and propose a framework for interpretation of their outputs, highlighting their weaknesses and strengths. Specifically, we discuss their outcomes in terms of model-dependency and in the presence of collinearity among the features, relying on a case study from the biomedical domain (classification of individuals with or without myocardial infarction). The results indicate that SHAP and LIME are highly affected by the adopted ML model and feature collinearity, raising a note of caution on their usage and interpretation.

We study the problem of feature selection in general machine learning (ML) context, which is one of the most critical subjects in the field. Although, there exist many feature selection methods, however, these methods face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce the "Adaptive Feature Selection with Binary Masking" (AFS-BM) which remedies these problems. AFS-BM achieves this by joint optimization for simultaneous feature selection and model training. In particular, we do the joint optimization and binary masking to continuously adapt the set of features and model parameters during the training process. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments where we compare AFS-BM with the established feature selection methods using well-known datasets from real-life competitions. Our results show that AFS-BM makes significant improvement in terms of accuracy and requires significantly less computational complexity. This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during the training process, which an important contribution to the field. We openly share our code for the replicability of our results and to facilitate further research.

Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning and have thus become one of the most important workloads in today's computing landscape. However, deploying LLM inference poses challenges due to the high compute and memory requirements stemming from the enormous model size and the difficulty of running it in the integer pipelines. In this paper, we present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision. Based on our analysis of outlier values in LLMs, we propose a decomposed quantization technique in which the scale factors of decomposed matrices are powers of two apart. The proposed scheme allows us to avoid explicit requantization (i.e., dequantization/quantization) when accumulating the partial sums from the decomposed matrices, with a minimal extension to the commodity tensor compute hardware. Our evaluation shows that Tender achieves higher accuracy and inference performance compared to the state-of-the-art methods while also being significantly less intrusive to the existing accelerators.

The promise and proliferation of large-scale dynamic federated learning gives rise to a prominent open question - is it prudent to share data or model across nodes, if efficiency of transmission and fast knowledge transfer are the prime objectives. This work investigates exactly that. Specifically, we study the choices of exchanging raw data, synthetic data, or (partial) model updates among devices. The implications of these strategies in the context of foundational models are also examined in detail. Accordingly, we obtain key insights about optimal data and model exchange mechanisms considering various environments with different data distributions and dynamic device and network connections. Across various scenarios that we considered, time-limited knowledge transfer efficiency can differ by up to 9.08\%, thus highlighting the importance of this work.

Federated learning (FL) has been developed as a promising framework to leverage the resources of edge devices, enhance customers' privacy, comply with regulations, and reduce development costs. Although many methods and applications have been developed for FL, several critical challenges for practical FL systems remain unaddressed. This paper provides an outlook on FL development, categorized into five emerging directions of FL, namely algorithm foundation, personalization, hardware and security constraints, lifelong learning, and nonstandard data. Our unique perspectives are backed by practical observations from large-scale federated systems for edge devices.

北京阿比特科技有限公司