
From arXiv, 8 pages, 10 figures. This work has been accepted for presentation at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).

This study presents a whole-body model predictive control (MPC) scheme for robotic systems with rigid contacts, under a given contact sequence, using online switching time optimization (STO). We treat the robot dynamics with rigid contacts as a switched system and formulate an optimal control problem for switched systems to implement the MPC. We use an efficient solution algorithm for the MPC problem that optimizes the switching times and the trajectory simultaneously. Existing methods are too inefficient for online use. This algorithm, by contrast, allows the switching times, as well as the trajectory, to be optimized online. The proposed MPC with online STO is compared with conventional MPC with fixed switching times through numerical simulations of dynamic jumping motions of a quadruped robot. In these simulations, the proposed MPC successfully controls the jumping motions in twice as many cases as the conventional MPC, which indicates that the proposed method extends the capability of whole-body MPC. We further conduct hardware experiments on the quadrupedal robot Unitree A1 and demonstrate that the proposed method achieves dynamic motions on the real robot.
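
The idea of treating a switching time as a decision variable can be shown on a toy problem. What follows is a minimal sketch on a scalar switched system; it is not the paper's whole-body MPC. It assumes two linear modes: dx/dt = a1*x before the switch time tau and dx/dt = a2*x after it. The terminal state therefore has the closed form x(T) = x0*exp(a1*tau + a2*(T - tau)). The function names are hypothetical. Golden-section search stands in for the paper's solver, and it assumes the cost is unimodal in tau.

```python
import math


def terminal_state(tau, x0, a1, a2, horizon):
    """State at t = horizon when mode 1 (rate a1) runs on [0, tau) and mode 2 (rate a2) on [tau, horizon]."""
    return x0 * math.exp(a1 * tau + a2 * (horizon - tau))


def switching_cost(tau, x0, a1, a2, horizon, x_ref):
    """Squared terminal tracking error as a function of the switching time."""
    return (terminal_state(tau, x0, a1, a2, horizon) - x_ref) ** 2


def optimize_switching_time(x0, a1, a2, horizon, x_ref, tol=1e-8):
    """Golden-section search for tau in [0, horizon]; assumes the cost is unimodal in tau."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, horizon
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if switching_cost(c, x0, a1, a2, horizon, x_ref) < switching_cost(d, x0, a1, a2, horizon, x_ref):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return 0.5 * (lo + hi)


# Grow for tau seconds (a1 = 1), then decay (a2 = -1), ending back at x_ref = 1 at T = 2.
tau_star = optimize_switching_time(x0=1.0, a1=1.0, a2=-1.0, horizon=2.0, x_ref=1.0)
print(round(tau_star, 4))  # 1.0
```

Only tau is free in this toy, and the trajectory follows from it in closed form. Any fixed switching time other than 1.0 leaves a terminal error that nothing else in the problem can correct. That is the intuition behind optimizing switching times online instead of fixing them.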

Related content

MIMO · MoDELS · Performer · SimPLe · Benchmark

December 9, 2022

MIMO Is All You Need : A Strong Multi-In-Multi-Out Baseline for Video Prediction

Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

Most existing approaches to video prediction build their models on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input and predicts the next frame recursively. This often leads to severe performance degradation when extrapolating further into the future, which limits the practical use of the prediction model. Alternatively, a Multi-In-Multi-Out (MIMO) architecture that outputs all future frames in one shot naturally breaks the recursion and therefore prevents error accumulation. However, only a few MIMO models for video prediction have been proposed, and to date they achieve only inferior performance. The real strength of MIMO models in this area has gone largely unnoticed and is under-explored. Motivated by this, we conduct a comprehensive investigation to explore how far a simple MIMO architecture can go. Surprisingly, our empirical studies reveal that a simple MIMO model can outperform state-of-the-art work by a much larger margin than expected, especially in dealing with long-term error accumulation. After exploring a number of designs, we propose MIMO-VP, a new MIMO architecture that extends the pure Transformer with local spatio-temporal blocks and a new multi-output decoder. We intend it to establish a new standard in video prediction. We evaluate our model on four highly competitive benchmarks (Moving MNIST, Human3.6M, Weather, KITTI). Extensive experiments show that our model wins first place on all the benchmarks with remarkable performance gains. It also surpasses the best SISO model in all aspects, including efficiency, quantitative metrics, and qualitative results. We believe our model can serve as a new baseline to facilitate future research on video prediction. The code will be released.
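
The structural difference between SISO and MIMO prediction can be sketched on a toy problem. This is not MIMO-VP. It assumes scalar "frames" whose true dynamics are x[t+1] = x[t] + 1, and two hypothetical models that make the same small per-output error. The only difference is that the SISO rollout feeds each prediction back as the next input.

```python
BIAS = 0.05  # hypothetical per-output error shared by both toy models


def biased_step_model(x):
    """Toy SISO model: the true step is +1; the model adds a small bias."""
    return x + 1.0 + BIAS


def biased_multi_model(context, horizon):
    """Toy MIMO model: predicts every offset k directly from the last observed frame."""
    return [context[-1] + k + BIAS for k in range(1, horizon + 1)]


def siso_rollout(step_model, last_frame, horizon):
    """Recursive SISO prediction: each predicted frame is fed back as the next input."""
    preds, frame = [], last_frame
    for _ in range(horizon):
        frame = step_model(frame)
        preds.append(frame)
    return preds


def mimo_predict(multi_model, context, horizon):
    """MIMO prediction: a single call emits all future frames at once."""
    return multi_model(context, horizon)


horizon, last = 10, 0.0
truth = [last + k for k in range(1, horizon + 1)]
siso_err = [abs(p - t) for p, t in zip(siso_rollout(biased_step_model, last, horizon), truth)]
mimo_err = [abs(p - t) for p, t in zip(mimo_predict(biased_multi_model, [last], horizon), truth)]
print(round(siso_err[-1], 2), round(mimo_err[-1], 2))  # 0.5 0.05
```

In this toy the SISO error grows linearly with the horizon, while the MIMO error stays at the per-output bias. This is the error-accumulation effect the abstract describes, under these simplified assumptions.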

Learning · Monte Carlo estimation · Global optimization · Monte Carlo · Estimation/Estimator

December 8, 2022

Generalizing LTL Instructions via Future Dependent Options

Duo Xu, Faramarz Fekri

From arXiv, arXiv admin note: text overlap with arXiv:2206.05096, arXiv:2102.06858 by other authors

Linear temporal logic (LTL) is a widely used task specification language with a compositional grammar that naturally induces temporally extended behaviours across tasks, including conditionals and alternative realizations. An important problem in RL with LTL tasks is to learn task-conditioned policies that can generalize zero-shot to new LTL instructions not observed during training. However, symbolic observations are often lossy and LTL tasks can have long time horizons. As a result, previous works can suffer from sample inefficiency during training and can return infeasible or sub-optimal solutions. To tackle these issues, this paper proposes a novel multi-task RL algorithm with improved learning efficiency and optimality. To achieve global optimality of task completion, we propose to learn options that depend on future subgoals via a novel off-policy approach. To propagate the rewards of satisfying future subgoals back more efficiently, we propose to train a multi-step value function conditioned on the subgoal sequence. This value function is updated with Monte Carlo estimates of multi-step discounted returns. In experiments on three different domains, we evaluate the LTL generalization capability of agents trained by the proposed method and show its advantage over previous representative methods.
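
The update rule in the last step can be sketched in tabular form: a value conditioned on the remaining subgoal sequence, moved toward Monte Carlo multi-step discounted returns. This sketch assumes a dictionary keyed by (state, remaining subgoals) and a fixed learning rate. Both choices, and all names, are hypothetical; the paper's function approximator and option learning are not reproduced.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo multi-step discounted returns G_t = sum_k gamma^k * r_{t+k}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_update(values, states, subgoal_seqs, rewards, gamma=0.9, lr=0.5):
    """Move V(state, remaining subgoal sequence) toward the Monte Carlo return of one episode."""
    for state, goals, g in zip(states, subgoal_seqs, discounted_returns(rewards, gamma)):
        key = (state, tuple(goals))  # the value is conditioned on the future subgoals
        v = values.get(key, 0.0)
        values[key] = v + lr * (g - v)
    return values


# One toy episode: subgoal "a" is achieved after step 1, so only ("b",) remains at s2,
# and the reward arrives when "b" is satisfied at the last step.
V = mc_update({}, ["s0", "s1", "s2"], [("a", "b"), ("a", "b"), ("b",)], [0.0, 0.0, 1.0])
print(V[("s0", ("a", "b"))])
```

Because the return is computed over the whole remaining episode, the reward for the final subgoal reaches s0 in a single update. A one-step bootstrapped update would need several passes to propagate it that far.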

Continuity · Learning · Robot · Automator · Path

December 8, 2022

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration

Xingyu Liu, Deepak Pathak, Kris M. Kitani

From arXiv, CoRL 2022

The ability to learn from human demonstration enables robots to automate various tasks. However, learning directly from human demonstration is challenging because the structure of the human hand can be very different from that of the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through micro-evolutionary reinforcement learning. A five-finger dexterous robot hand modeled on the human hand gradually evolves into a commercial robot. Meanwhile, it interacts repeatedly in a physics simulator to continuously update a policy that is first learned from human demonstration. To deal with the high dimensionality of the robot parameters, we propose an algorithm for multi-dimensional evolution path search that jointly optimizes the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human policy, trained from human demonstrations in diverse modalities, to target commercial robots.
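
A multi-dimensional evolution path can be pictured as a progress vector alpha in [0, 1]^d. It interpolates each robot parameter independently from its human-hand value (alpha_i = 0) to its commercial-robot value (alpha_i = 1). The sketch below is a greedy path search, not the paper's joint path-and-policy optimization. It advances one dimension per step, choosing whichever move a scoring function rates best. `evaluate` is a hypothetical stand-in for measuring the current policy's success in simulation, and the parameter values are made up.

```python
def interpolate(human, robot, alpha):
    """Per-dimension interpolation: alpha[i] = 0 keeps the human value, 1 reaches the robot value."""
    return [h + a * (r - h) for h, r, a in zip(human, robot, alpha)]


def greedy_evolution_path(human, robot, evaluate, step=0.5):
    """Advance one parameter dimension per step, choosing the move the (hypothetical) evaluator scores best."""
    alpha = [0.0] * len(human)
    path = [list(alpha)]
    while any(a < 1.0 for a in alpha):
        best, best_score = None, float("-inf")
        for i, a in enumerate(alpha):
            if a >= 1.0:
                continue  # this dimension has already reached the robot value
            candidate = list(alpha)
            candidate[i] = min(1.0, a + step)
            score = evaluate(interpolate(human, robot, candidate))
            if score > best_score:
                best, best_score = candidate, score
        alpha = best
        path.append(list(alpha))
    return path


human_params = [5.0, 1.0]  # hypothetical morphology parameters of the human-like hand
robot_params = [2.0, 1.5]  # hypothetical parameters of the target gripper


def toy_score(params):
    """Pretend the policy is most sensitive to the first parameter changing."""
    return -abs(params[0] - 5.0)


path = greedy_evolution_path(human_params, robot_params, toy_score)
print(path)  # [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
```

In the full method, the policy would be fine-tuned at every step along the path before the next move is scored. This sketch omits that step.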

Controller · Networking · Predictor/Decision function · Reducible · Principle

December 8, 2022

Optimization-Based Predictive Congestion Control for the Tor Network: Opportunities and Challenges

Christoph Döpmann, Felix Fiedler, Sergio Lucia, Florian Tschorsch

Based on the principle of onion routing, the Tor network achieves anonymity for its users by relaying user data over a series of intermediate relays. This approach makes congestion control in the network challenging, and today it results in higher latencies due to considerable backlog, as well as unfair data rate allocation. In this paper, we present a concept study of PredicTor, a novel approach to congestion control that tackles clogged overlay networks. Unlike traditional approaches, it is built on distributed model predictive control, a recent advance in control theory. PredicTor is designed to minimize latency in the network and to achieve max-min fairness. We contribute a thorough evaluation of its behavior: toy scenarios assess the optimizer, and complex networks assess its potential. To this end, we conduct large-scale simulation studies and compare PredicTor to existing congestion control mechanisms in Tor. We show that PredicTor is highly effective in reducing latency and realizing fair rate allocations. In addition, we strive to bring ideas from modern control theory to the networking community, enabling the development of improved congestion control in the future. We therefore present both the benefits and the issues of this novel research direction.
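
The max-min fairness target is made concrete below by computing the allocation with the classic progressive-filling algorithm. All flows grow at the same rate until some link saturates. The flows on that link are then frozen, and the rest keep growing. This is a centralized sketch of the fairness criterion only, not PredicTor's distributed MPC. The link names, capacities, and routes are hypothetical.

```python
def max_min_fair(capacities, routes, eps=1e-12):
    """Progressive filling: raise all unfrozen flows equally until a link saturates, then freeze its flows."""
    rate = {f: 0.0 for f in routes}
    remaining = dict(capacities)
    active = {f for f in routes if routes[f]}  # flows with an empty route are left at 0 in this sketch
    while active:
        # Number of still-growing flows on each link.
        counts = {link: sum(1 for f in active if link in routes[f]) for link in remaining}
        # Equal increment each link could still grant to each of its active flows.
        shares = {link: remaining[link] / n for link, n in counts.items() if n > 0}
        inc = min(shares.values())
        for f in active:
            rate[f] += inc
        for link, n in counts.items():
            remaining[link] -= inc * n
        saturated = {link for link, s in shares.items() if s - inc <= eps}
        active = {f for f in active if not any(link in saturated for link in routes[f])}
    return rate


caps = {"A": 10.0, "B": 4.0}
routes = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
print(max_min_fair(caps, routes))  # {'f1': 8.0, 'f2': 2.0, 'f3': 2.0}
```

Link B is the bottleneck for f2 and f3, so each gets half of it. f1 then takes everything left on link A, and no flow can gain without reducing a flow that already has less.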

MoDELS · Learning · Extensibility · Script · INFORMS

December 8, 2022

Learning to Dub Movies via Hierarchical Prosody Models

Gaoxiang Cong, Liang Li, Yuankai Qi, Zhengjun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip, and a reference audio clip, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference. V2C is more challenging than conventional text-to-speech because it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture that tackles these problems via hierarchical prosody modelling. It bridges visual information to the corresponding speech prosody from three aspects: lip, face, and scene. Specifically, we align lip movement with speech duration. We also convey facial expression to speech energy and pitch via an attention mechanism based on valence and arousal representations, inspired by recent findings in psychology. Moreover, we design an emotion booster to capture the atmosphere of global video scenes. All these embeddings are used together to generate a mel-spectrogram, which is then converted to a speech waveform by an existing vocoder. Extensive experimental results on the Chem and V2C benchmark datasets demonstrate the favorable performance of the proposed method. The source code and trained models will be released to the public.
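
Attention is the mechanism that lets text-side positions pull information from video frames. The sketch below is generic scaled dot-product cross-attention in plain Python, where each query (e.g. a phoneme) forms a softmax-weighted average over frame features. It is an assumption-laden illustration, not the paper's valence/arousal-based module. The feature values are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over the keys and averages the matching values."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical 2-D features: two phoneme queries, three video-frame keys, one value channel (e.g. energy).
phonemes = [[4.0, 0.0], [0.0, 4.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
energy = [[0.2], [0.9], [0.5]]
out, attn = cross_attention(phonemes, frames, energy)
```

Each phoneme ends up dominated by the frame whose features it matches. Its output energy therefore tracks that frame's value, which is the kind of visual-to-prosody transfer the abstract describes.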

Learning · Performer · Controller · Processing (programming language) · Online

December 7, 2022

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Chao Li

Deep reinforcement learning (DRL) provides a new way to generate robot control policies. However, training a control policy requires lengthy exploration, which results in low sample efficiency of reinforcement learning (RL) in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) improve the training process by using expert demonstrations, but imperfect expert demonstrations can mislead policy improvement. Offline-to-online reinforcement learning requires a large amount of offline data to initialize the policy, and distribution shift can easily lead to performance degradation during online fine-tuning. To solve these problems, we propose a learning-from-demonstrations method named A-SILfD. It treats expert demonstrations as the agent's successful experiences and uses these experiences to constrain policy improvement. Furthermore, we use an ensemble of Q-functions to prevent the performance degradation caused by large estimation errors in the Q-function. Our experiments show that A-SILfD can significantly improve sample efficiency using a small number of expert demonstrations of varying quality. In four MuJoCo continuous control tasks, A-SILfD significantly outperforms baseline methods after 150,000 steps of online training and is not misled by imperfect expert demonstrations during training.
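
Two ingredients mentioned here can be sketched in a few lines. The first is a TD target built from an ensemble of Q estimates. The abstract does not say how the ensemble is aggregated, so taking the minimum (a common pessimistic choice) is an assumption. The second is the clipped weight (R - V)+ from the original self-imitation learning formulation, which only imitates experiences that beat the current value estimate. Whether A-SILfD uses exactly these forms is not confirmed by the source.

```python
def ensemble_td_target(reward, next_q_values, gamma=0.99, done=False):
    """TD target using the minimum over an ensemble of next-state Q estimates (a pessimistic aggregate)."""
    if not next_q_values:
        raise ValueError("need at least one Q estimate")
    bootstrap = 0.0 if done else gamma * min(next_q_values)
    return reward + bootstrap


def self_imitation_weight(episode_return, value_estimate):
    """Self-imitation weight (R - V)+: only experiences that beat the value estimate are imitated."""
    return max(episode_return - value_estimate, 0.0)


print(ensemble_td_target(1.0, [2.0, 3.0, 2.5], gamma=0.9))
print(self_imitation_weight(5.0, 3.0), self_imitation_weight(1.0, 3.0))  # 2.0 0.0
```

Taking the minimum damps the overestimation that a single noisy Q-function tends to produce. The clipped weight means an imperfect demonstration stops being imitated once the agent's own value estimate exceeds that demonstration's return.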

INFORMS · Controller · Learning · Networking · Neural Networks

December 7, 2022

Active Classification of Moving Targets with Learned Control Policies

Álvaro Serra-Gómez, Eduardo Montijano, Wendelin Böhmer, Javier Alonso-Mora

From arXiv, 8 pages, 6 figures, submitted to IEEE RA-L

In this paper, we consider the problem of a drone that has to collect semantic information to classify multiple moving targets. In particular, we address the challenge of computing control inputs that move the drone to informative viewpoints (position and orientation). The information is extracted using a "black-box" classifier, e.g., a deep neural network. Such classifiers typically lack an analytical relationship between the viewpoints and their associated outputs, which prevents their use in information-gathering schemes. To fill this gap, we propose a novel attention-based architecture, trained via reinforcement learning (RL), that outputs the next viewpoint for the drone. The chosen viewpoint favors acquiring evidence from as many unclassified targets as possible while reasoning about their movement, orientation, and occlusions. We then use a low-level MPC controller to move the drone to the desired viewpoint, taking its actual dynamics into account. We show that our approach not only outperforms a variety of baselines but also generalizes to scenarios unseen during training. Additionally, we show that the network scales to large numbers of targets and generalizes well to different movement dynamics of the targets.
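
For contrast with the learned policy, here is a deliberately simple, hypothetical viewpoint selector. It picks the candidate position that sees the most still-unclassified targets within a sensing radius. It ignores orientation, occlusion, target motion, and drone dynamics, which are exactly the factors the paper's attention-based policy is designed to reason about. It is not the paper's method, and the paper does not say it is one of the evaluated baselines.

```python
import math


def best_viewpoint(candidates, targets, classified, sensing_radius):
    """Pick the candidate viewpoint that sees the most still-unclassified targets (ties: first candidate)."""

    def coverage(viewpoint):
        return sum(
            1
            for target_id, position in targets.items()
            if target_id not in classified and math.dist(viewpoint, position) <= sensing_radius
        )

    return max(candidates, key=coverage)


targets = {1: (1.0, 0.0), 2: (9.0, 0.0), 3: (11.0, 0.0)}
viewpoints = [(0.0, 0.0), (10.0, 0.0)]
print(best_viewpoint(viewpoints, targets, classified=set(), sensing_radius=2.0))     # (10.0, 0.0)
print(best_viewpoint(viewpoints, targets, classified={2, 3}, sensing_radius=2.0))  # (0.0, 0.0)
```

Once targets 2 and 3 are classified, the best viewpoint switches to the one covering target 1. A myopic rule like this cannot anticipate where moving targets will be, which motivates a learned policy.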

FAST · Inverse reinforcement learning · Learning · Performance · Robot

December 6, 2022

Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are capable of neither fast adaptation to heterogeneous human demonstrations nor large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks, with an average 57% improvement in policy returns and an average of 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find that users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
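
Representing a new demonstration as a policy mixture amounts to fitting mixture weights over fixed prototype policies. The sketch below uses expectation-maximization to maximize the demonstration's log-likelihood. It assumes state-independent discrete action distributions for brevity, and the prototype names and data are hypothetical. FLAIR's own optimization of the mixture may differ.

```python
def fit_mixture_weights(prototypes, demo_actions, iters=200):
    """EM for the weights of a mixture of fixed prototype policies, maximizing demo log-likelihood."""
    k = len(prototypes)
    w = [1.0 / k] * k
    for _ in range(iters):
        resp_sums = [0.0] * k
        for a in demo_actions:
            joint = [w[i] * prototypes[i].get(a, 0.0) for i in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # action impossible under every prototype; skip it
            for i in range(k):
                resp_sums[i] += joint[i] / z  # E-step: responsibility of prototype i
        total = sum(resp_sums)
        if total == 0.0:
            break
        w = [r / total for r in resp_sums]  # M-step: average responsibility
    return w


cautious = {"slow": 0.9, "fast": 0.1}
aggressive = {"slow": 0.1, "fast": 0.9}
demo = ["slow"] * 8 + ["fast"] * 2
weights = fit_mixture_weights([cautious, aggressive], demo)
print([round(x, 3) for x in weights])  # [0.875, 0.125]
```

The fitted mixture reproduces the demonstration's 80% "slow" rate (0.875 * 0.9 + 0.125 * 0.1 = 0.8) without training a new policy. This is the kind of fast personalization the abstract attributes to policy mixtures.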

Inner product · FAST · Scaling · Estimation/Estimator · Learning

December 6, 2022

Fast Offline Policy Optimization for Large Scale Recommendation

Otmane Sakhi, David Rohde, Alexandre Gilotte

From arXiv, Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

Personalised interactive systems such as recommender systems require selecting relevant items depending on context. Production systems need to identify these items rapidly from very large catalogues, which can be done efficiently using maximum inner product search technology. Offline optimisation of maximum inner product search can be achieved by relaxing the discrete problem, which results in policy learning or REINFORCE-style learning algorithms. Unfortunately, this relaxation step requires computing a sum over the entire catalogue. This makes the cost of evaluating the gradient, and hence of each stochastic gradient descent iteration, linear in the catalogue size. This calculation is untenable in many real-world settings, such as large-catalogue recommender systems, which severely limits the usefulness of this method in practice. In this paper, we derive an accurate approximation of these policy learning algorithms that scales logarithmically with the catalogue size. Our contribution combines three novel ideas: a new Monte Carlo estimate of the gradient of a policy, the self-normalised importance sampling estimator, and the use of fast maximum inner product search at training time. Extensive experiments show that our algorithm is an order of magnitude faster than naive approaches yet produces equally good policies.
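
The self-normalised importance sampling idea avoids the catalogue-wide sum. For a softmax policy, grad log pi(a) = grad s_a - E_pi[grad s], so the costly part is an expectation under the softmax. The sketch below estimates such an expectation from a handful of sampled items, using only unnormalised weights exp(s_a) / q(a). The softmax normaliser cancels in the ratio, so it is never computed. A uniform proposal q is an assumption standing in for the paper's MIPS-based sampling, and the scores are hypothetical.

```python
import math
import random


def snis_expectation(scores, f, num_samples, rng):
    """Self-normalised importance sampling estimate of E_{a ~ softmax(scores)}[f(a)].

    Items are drawn from a uniform proposal q(a) = 1/N; the softmax normaliser
    (a sum over the whole catalogue) cancels in num / den and is never needed.
    """
    n = len(scores)
    num = den = 0.0
    for _ in range(num_samples):
        a = rng.randrange(n)
        w = math.exp(scores[a]) * n  # unnormalised target exp(s_a) divided by q(a) = 1/n
        num += w * f(a)
        den += w
    return num / den


scores = [i / 10 for i in range(50)]  # hypothetical item scores
f = float  # quantity to average: here simply the item index
# Exact value computed with the full normaliser, for comparison only.
z = sum(math.exp(s) for s in scores)
exact = sum(math.exp(s) / z * f(i) for i, s in enumerate(scores))
estimate = snis_expectation(scores, f, 50_000, random.Random(0))
print(round(exact, 2), round(estimate, 2))
```

The estimator is slightly biased but consistent. Its variance depends on how well the proposal covers high-scoring items, which is why a smarter proposal than uniform matters at catalogue scale.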

Agent · Learning · Automator · INTERACT · Controller

December 6, 2022

Reinforcement Learning for UAV control with Policy and Reward Shaping

Cristian Millán-Arias, Ruben Contreras, Francisco Cruz, Bruno Fernandes

From arXiv, 9 pages, 9 figures, Preprint accepted at the 41st International Conference of the Chilean Computer Science Society, SCCC 2022, Santiago, Chile, 2022

In recent years, research on unmanned aerial vehicle (UAV) technology has expanded knowledge in the area, bringing to light new problems and challenges that require solutions. Furthermore, because the technology allows processes usually carried out by people to be automated, it is in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature using different machine learning strategies. Reinforcement learning (RL) is an automation framework frequently used to train autonomous agents. RL is a machine learning paradigm in which an agent interacts with an environment to solve a given task. However, learning autonomously can be time-consuming and computationally expensive, and may not be practical in highly complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for training: one without obstacles and one with obstacles. We also studied the influence of each technique. The results show that an agent trained simultaneously with both techniques obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent achieves lower execution times and less dispersion during training.
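
The abstract does not give the shaping formulas, so the following is only a sketch of two common forms, used as illustrative assumptions. The first is potential-based reward shaping, r + gamma * Phi(s') - Phi(s), which adds a bonus for moving to higher-potential states. The second is policy shaping by advice, where the agent follows a trainer's suggested action with some probability. The potential values, action names, and advice probability are hypothetical.

```python
import random


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Potential-based reward shaping: r + gamma * Phi(s') - Phi(s), with Phi(s') taken as 0 at episode end."""
    next_term = 0.0 if done else gamma * phi_next
    return reward + next_term - phi_s


def shaped_action(agent_action, advice, advice_prob, rng):
    """Policy shaping via advice: follow the trainer's suggested action with probability advice_prob."""
    if advice is not None and rng.random() < advice_prob:
        return advice
    return agent_action


# Hypothetical potential: negative distance to the goal, so moving 1 m closer earns a bonus of 1.
print(shaped_reward(0.0, phi_s=-5.0, phi_next=-4.0, gamma=1.0))  # 1.0
rng = random.Random(0)
print(shaped_action("hover", advice="ascend", advice_prob=1.0, rng=rng))  # ascend
```

Combining both mechanisms in one training loop means the agent is nudged through its rewards and through its actions at the same time. This is one plausible reading of the simultaneous use the study evaluates.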