CS391R: Robot Learning
Robots and autonomous systems play an important role in the modern economy. Purpose-built robots have greatly improved productivity, operational safety, and product quality. However, these robots are typically programmed for specific tasks in well-controlled environments and cannot perform a variety of tasks in the real world. How can we bring robots out of constrained environments and into our daily lives, as companions and assistants that help us with all kinds of real-life tasks? This calls for a new generation of general-purpose autonomous robots that understand the world through perception and make informed decisions accordingly. This course studies modern machine learning and AI algorithms for autonomous robots as intelligent agents. It covers advanced topics around the following principles and techniques: 1) how robots perceive unstructured environments from raw sensory data, 2) how robots make decisions based on their perception, and 3) how robots actively and continually learn and adapt in the physical world.
https://www.cs.utexas.edu/~yukez/cs391r_fall2020/index.html
Course outline:
Deep Reinforcement Learning via Policy Optimization
Robots have many applications in society. For example, during this year's Double Eleven (Singles' Day shopping festival), deliveries noticeably got faster, thanks in part to sorting robots! Beyond that, robots are used in search and rescue, space exploration, surgery, and many other areas. For robots to serve people better, it is essential that machines can imitate human behavior and become more intelligent.
Chelsea Finn, of Stanford and Google Research, introduces machine learning for robots.
Contents of the "Machine Learning for Robots" tutorial:
Basics and imitation learning: Object classification is a supervised learning task over i.i.d. data, so it can achieve good results. Object manipulation, by contrast, is a sequential decision-making task in which each decision affects the next state. Imitation learning, also called "behavior cloning," is simple, but it requires human supervision and is capped by human performance. Moreover, its errors compound: a small error at each step can add up to a large deviation in the final outcome.
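To make the compounding-error point concrete, here is a minimal behavior-cloning sketch in PyTorch; the dimensions and the random "expert demonstrations" are placeholders, not anything from the tutorial. The policy is trained by plain supervised regression on expert state-action pairs, which is exactly why its mistakes accumulate once the deployed policy drifts into states the expert never visited.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; a real robot would use images or proprioceptive states.
STATE_DIM, ACTION_DIM = 16, 4

# Behavior cloning = supervised regression from expert states to expert actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Placeholder "expert demonstrations": random tensors standing in for logged data.
expert_states = torch.randn(1024, STATE_DIM)
expert_actions = torch.randn(1024, ACTION_DIM)

for epoch in range(50):
    pred_actions = policy(expert_states)
    # Mean-squared error against the expert's actions: the i.i.d. assumption
    # holds during training, but not once the policy's own mistakes take it
    # to states the expert never visited (compounding error).
    loss = nn.functional.mse_loss(pred_actions, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```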
Model-based reinforcement learning: There are several ways to implement model-based RL. 1. Random sampling plus dynamics training: simple, but it suffers from the distribution-mismatch problem. 2. Iterative sampling, gradually refining the model: this resolves the distribution mismatch. 3. Iterative sampling with MPC (model predictive control): robust to small model errors, but computationally expensive.
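As a sketch of the third variant (iterative sampling with MPC), the snippet below runs random-shooting MPC over an assumed, hand-written "learned" dynamics model; the dynamics, cost function, horizon, and dimensions are illustrative stand-ins rather than the tutorial's actual setup.

```python
import numpy as np

STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 10, 256

def learned_dynamics(state, action):
    # Stand-in for a neural dynamics model f(s, a) -> s'; here a fixed linear map.
    A = np.eye(STATE_DIM)
    B = np.ones((STATE_DIM, ACTION_DIM)) * 0.1
    return state @ A.T + action @ B.T

def cost(state, action):
    # Illustrative quadratic cost: drive the state to the origin with small actions.
    return np.sum(state**2, axis=-1) + 0.01 * np.sum(action**2, axis=-1)

def mpc_action(state):
    """Random-shooting MPC: sample action sequences, roll them out through the
    learned model, and execute only the first action of the best sequence.
    Replanning at every step is what makes MPC robust to small model errors."""
    candidates = np.random.uniform(-1, 1, size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    states = np.repeat(state[None, :], N_CANDIDATES, axis=0)
    total_cost = np.zeros(N_CANDIDATES)
    for t in range(HORIZON):
        total_cost += cost(states, candidates[:, t])
        states = learned_dynamics(states, candidates[:, t])
    return candidates[np.argmin(total_cost), 0]

state = np.random.randn(STATE_DIM)
for step in range(20):
    action = mpc_action(state)                # plan over the learned model
    state = learned_dynamics(state, action)   # in reality: step the real robot
```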
Model-based RL with image inputs: When the feedback signal is high-dimensional (e.g. images), there are two main approaches. (1) Learn in a latent space: learn a representation of the observations and run model-based RL in that latent space (probabilistic methods, or structured methods such as spatial or object-centric representations). (2) Learn directly in observation space: use deep learning to predict the future observations (video) and minimize the difference from the actual observations.
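Below is a minimal sketch of the first option, learning a latent dynamics model: an assumed convolutional encoder maps images to a latent state, and a transition network is trained to be consistent in that latent space. The architecture, dimensions, and random data are placeholders, not a specific method from the tutorial.

```python
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 32, 4

# Encoder: map a 64x64 RGB observation to a compact latent state z.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 14 * 14, LATENT_DIM),   # 64x64 input -> 14x14 feature map
)

# Latent transition model: predict the next latent state from (z, action).
transition = nn.Sequential(
    nn.Linear(LATENT_DIM + ACTION_DIM, 128), nn.ReLU(),
    nn.Linear(128, LATENT_DIM),
)

params = list(encoder.parameters()) + list(transition.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# Placeholder batch of (o_t, a_t, o_{t+1}) transitions; real data comes from the robot.
obs = torch.randn(8, 3, 64, 64)
actions = torch.randn(8, ACTION_DIM)
next_obs = torch.randn(8, 3, 64, 64)

z = encoder(obs)
z_next_pred = transition(torch.cat([z, actions], dim=-1))
with torch.no_grad():
    z_next_target = encoder(next_obs)

# Train the model to be consistent in latent space; planning (e.g. MPC) then
# happens entirely on z, without decoding back to pixels.
loss = nn.functional.mse_loss(z_next_pred, z_next_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```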
Some challenges in robot learning: understanding and translating complex commands; adapting to new environments; collecting large amounts of data and learning from it.
Slides download link: https://pan.baidu.com/s/1-zqrWBUrXCVMrj0d3EPTkQ (extraction code: 4jta)
Machine learning interpretability: Interpretability and Explainability in Machine Learning
[Beijing University of Posts and Telecommunications] A review of machine learning applications in materials science: Machine learning in materials science. https://onlinelibrary.wiley.com/doi/pdf/10.1002/inf2.12028
There is a recent large and growing interest in generative adversarial networks (GANs), which offer powerful features for generative modeling, density estimation, and energy function learning. GANs are difficult to train and evaluate but are capable of creating amazingly realistic, though synthetic, image data. Ideas stemming from GANs such as adversarial losses are creating research opportunities for other challenges such as domain adaptation. In this paper, we look at the field of GANs with emphasis on these areas of emerging research. To provide background for adversarial techniques, we survey the field of GANs, looking at the original formulation, training variants, evaluation methods, and extensions. Then we survey recent work on transfer learning, focusing on comparing different adversarial domain adaptation methods. Finally, we take a look forward to identify open research directions for GANs and domain adaptation, including some promising applications such as sensor-based human behavior modeling.
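For background on the "original formulation" and adversarial losses the abstract refers to, here is a minimal GAN training sketch on toy 2-D data, using the standard discriminator loss and the non-saturating generator loss; the tiny networks, data, and hyperparameters are illustrative placeholders rather than anything from the survey.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator over 2-D data; architectures are placeholders.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 2) + 3.0   # stand-in for samples from the real data
    z = torch.randn(64, 8)            # latent noise fed to the generator
    fake = G(z)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step (non-saturating variant): push D(fake) toward 1.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```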
Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.
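Since the abstract builds on model-agnostic meta-learning (MAML), the sketch below shows the basic MAML inner/outer loop on a toy regression problem; it does not include the paper's unsupervised task-acquisition step, and the task distribution, inner learning rate, and dimensions are assumptions for illustration only.

```python
import torch

# Toy MAML sketch: one parameter vector, regression tasks y = w_task^T x.
theta = torch.zeros(5, requires_grad=True)   # meta-parameters
meta_opt = torch.optim.SGD([theta], lr=1e-2)
inner_lr = 0.1

def sample_task():
    # Placeholder task distribution; the paper's point is that such tasks can be
    # acquired without manual design (via unsupervised exploration).
    w = torch.randn(5)
    x = torch.randn(20, 5)
    return x, x @ w

for meta_step in range(100):
    meta_loss = 0.0
    for _ in range(4):                       # batch of tasks
        x, y = sample_task()
        x_support, y_support = x[:10], y[:10]
        x_query, y_query = x[10:], y[10:]

        # Inner loop: one gradient step on the support set, keeping the graph
        # so the outer update can differentiate through the adaptation.
        support_loss = ((x_support @ theta - y_support) ** 2).mean()
        grad, = torch.autograd.grad(support_loss, theta, create_graph=True)
        theta_adapted = theta - inner_lr * grad

        # Outer loop: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + ((x_query @ theta_adapted - y_query) ** 2).mean()

    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```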