Reinforcement learning (RL) theory, with an emphasis on sample complexity analysis.
This course introduces the fundamental concepts and core principles that make up modern computer operating systems. Its goal is to explain concepts and principles that are likely to remain relevant for many years to come, and it serves as a starting point for the study of operating systems and distributed systems. Specifically, the course covers processes, concurrency, synchronization, scheduling, multiprogramming, memory management, and file systems.
This is an introductory course on machine learning. Machine learning is a set of techniques that allow machines to learn from data and experience, rather than requiring humans to specify the desired behavior by hand. Over the past 20 years, machine learning techniques have become increasingly important both in academic artificial intelligence research and in the technology industry. This course provides a broad introduction to some of the most commonly used ML algorithms.
The first half of the course focuses on supervised learning. We begin with nearest neighbors, decision trees, and ensembles. We then introduce parametric models, including linear regression, logistic regression, softmax regression, and neural networks. We next turn to unsupervised learning, with particular attention to probabilistic models, as well as principal component analysis and k-means. Finally, we cover the basics of reinforcement learning.
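To make one of the listed methods concrete, here is a minimal NumPy sketch of k-means clustering on synthetic data; the cluster count, iteration budget, and toy data are illustrative choices, not values taken from the course.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate between assigning points and updating centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```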
Course content:
//www.cs.toronto.edu/~huang/courses/csc2515_2020f/
Recommended readings:
Hastie, Tibshirani, and Friedman, “The Elements of Statistical Learning”.
Christopher Bishop, “Pattern Recognition and Machine Learning”, 2006.
Kevin Murphy, “Machine Learning: A Probabilistic Perspective”, 2012.
David MacKay, “Information Theory, Inference, and Learning Algorithms”, 2003.
Shai Shalev-Shwartz & Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, 2014.
Learning roadmap:
[Introduction] Pieter Abbeel is a professor at UC Berkeley and the director of the Berkeley Robot Learning Lab. His new course, CS294 Deep Unsupervised Learning, covers two areas: generative models and self-supervised learning. The 15-week course comes with videos, slides, and other resources that help build an understanding of unsupervised deep learning. The most recent installment covers Generative Adversarial Networks, with 257 pages of slides spanning GAN, DC GAN, ImprovedGAN, WGAN, WGAN-GP, Progr.GAN, SN-GAN, SAGAN, BigGAN(-Deep), StyleGAN-v1,2, VIB-GAN, and GANs as Energy Models. Well worth a look!
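For readers who want a code-level anchor for the generative-model portion, below is a minimal, generic GAN training step in PyTorch (the standard non-saturating objective on toy 2-D data); it is a sketch for orientation only and is not taken from the course slides.

```python
import torch
import torch.nn as nn

# Tiny generator and discriminator for toy 2-D data; all sizes are arbitrary.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    batch = real.size(0)
    z = torch.randn(batch, 8)
    fake = G(z)

    # Discriminator step: push real samples toward label 1, fakes toward 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step (non-saturating loss): make D label fakes as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage: "real" data drawn from a shifted Gaussian.
for _ in range(5):
    real = torch.randn(64, 2) + 3.0
    train_step(real)
```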
Contents:
[Introduction] In the Spring 2020 semester, Carnegie Mellon University (CMU) once again offered the classic course Probabilistic Graphical Models, taught by Professor Eric P. Xing. First offered in 2005, the course has been running for well over a decade; it has influenced generation after generation of computer scientists and trained a great many machine learning researchers. Probabilistic graphical models remain a very active direction in machine learning today, so interested readers should not miss it.
Course description:
In artificial intelligence, statistics, computer systems, computer vision, natural language processing, computational biology, and many other fields, a large number of problems can be viewed as deriving consistent global conclusions from local information. The probabilistic graphical model framework provides a unified view of this broad class of problems and supports efficient inference, decision making, and learning for problems with very large numbers of attributes and huge datasets. Whether you intend to apply graphical models to complex problems or to study graphical models as a core research topic, this course will give you a solid foundation.
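As a toy illustration of turning local information into a consistent global conclusion (not part of the course materials), the following sketch computes a posterior in a hypothetical three-variable Bayesian network by brute-force enumeration; all probabilities are invented.

```python
import itertools

# Hypothetical network: Rain -> Sprinkler, and (Sprinkler, Rain) -> GrassWet.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R=True)
               False: {True: 0.4, False: 0.6}}    # P(S | R=False)
P_wet = {  # P(W=True | S, R)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(r, s, w):
    """Product of local conditional probabilities = global joint probability."""
    pw = P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * (pw if w else 1 - pw)

# Posterior P(Rain=True | GrassWet=True), summing out the hidden variable S.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in itertools.product((True, False), repeat=2))
print(num / den)
```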
Professor Eric P. Xing (邢波)
Eric P. Xing is a professor of computer science at Carnegie Mellon University and the founder, CEO, and chief scientist of Petuum Inc., a 2018 World Economic Forum Technology Pioneer company that builds standardized AI development platforms and operating systems for broad, general-purpose industrial AI applications. He holds a Ph.D. in molecular biology and biochemistry from Rutgers, the State University of New Jersey, and a Ph.D. in computer science from the University of California, Berkeley (UC Berkeley). His research interests center on the methodology and theory of machine learning and statistical learning, and on the development of large-scale computing systems and architectures, for automated learning, inference, and decision making in the high-dimensional, multimodal, and dynamic possible worlds found in complex systems. He serves or has served as an associate editor of the Journal of the American Statistical Association (JASA), the Annals of Applied Statistics (AOAS), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and PLoS Computational Biology, and as an action editor of the Machine Learning Journal (MLJ) and the Journal of Machine Learning Research (JMLR). He is a member of the DARPA Information Science and Technology (ISAT) advisory group, and his honors include the NSF CAREER Award, an Alfred P. Sloan Fellowship, the United States Air Force Young Investigator Award, the IBM Open Collaborative Research Faculty Award, and multiple paper awards. He served as chair of the International Conference on Machine Learning (ICML) in 2014.
Course information:
The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by “complex” and “large-scale” MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
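Since MDPs and dynamic programming are the foundation here, a minimal value iteration sketch may help fix ideas; the two-state MDP below, with its transition probabilities and rewards, is invented purely for illustration.

```python
import numpy as np

# Hypothetical MDP with 2 states and 2 actions.
# P[a][s][s'] = transition probability, R[a][s] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.3, 0.7], [0.6, 0.4]]])  # action 1
R = np.array([[5.0, -1.0],                # action 0
              [10.0, 2.0]])               # action 1
gamma = 0.95

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s').
    Q = R + gamma * (P @ V)      # shape (actions, states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)        # greedy action per state
print(V, policy)
```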
The tutorial is meant to serve as an introduction to these topics and is based mostly on the book “Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement Learning” [4], which discusses the topic in greater detail in the context of simulators. There are at least two other textbooks I would recommend: (i) Neuro-Dynamic Programming [2] (lots of detail on convergence analysis) and (ii) Reinforcement Learning: An Introduction [11] (lots of detail on the underlying AI concepts). A more recent tutorial on the topic is [8]. This tutorial has two sections: Section 2 discusses MDPs and SMDPs, and Section 3 discusses RL. By the end of this tutorial, you should be able to identify problem structures that can be set up as MDPs/SMDPs and use some RL algorithms.
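As one concrete instance of “using an RL algorithm”, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain environment; the environment, hyperparameters, and reward structure are invented for illustration and do not come from the tutorial.

```python
import random

# Hypothetical 5-state chain: action 1 moves right, action 0 moves left;
# reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(2000):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state.
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Greedy policy learned for each state (should always move right here).
print([max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES)])
```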
Interpretability and Explainability in Machine Learning
While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
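A rough PyTorch sketch of an InfoNCE-style contrastive loss in the spirit described above follows; it is a simplified, generic variant (cosine similarities with in-batch negatives), not the authors' implementation, and all tensor shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(pred, future, temperature=0.1):
    """Contrastive loss: each predicted latent should score its own future
    encoding higher than the other samples in the batch (the negatives)."""
    pred = F.normalize(pred, dim=-1)       # (batch, dim) predictions from an autoregressive model
    future = F.normalize(future, dim=-1)   # (batch, dim) encodings of the true future observations
    logits = pred @ future.t() / temperature   # (batch, batch) similarity scores
    targets = torch.arange(pred.size(0))       # positive pair lies on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random vectors standing in for encoder / autoregressive outputs.
pred = torch.randn(32, 128)
future = torch.randn(32, 128)
print(info_nce_loss(pred, future).item())
```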
Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
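One simple form of a more careful evaluation protocol, sketched below under an assumed agent/environment interface (act, reset, step), is to compare average returns on training seeds against returns on held-out seeds never used during training; this is an illustrative sketch, not the paper's exact setup.

```python
import numpy as np

train_seeds = list(range(10))        # seeds the agent was trained on (assumed)
test_seeds = list(range(100, 110))   # held-out seeds, never seen during training

def evaluate(agent, make_env, seeds, episodes_per_seed=5):
    """Average return over environments built from the given seeds.

    `agent` and `make_env` are hypothetical: `make_env(seed)` returns an
    environment with `reset() -> obs` and `step(action) -> (obs, reward, done)`,
    and `agent.act(obs)` returns an action.
    """
    returns = []
    for seed in seeds:
        env = make_env(seed)
        for _ in range(episodes_per_seed):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done = env.step(agent.act(obs))
                total += reward
            returns.append(total)
    return float(np.mean(returns))

# A large gap between these two numbers is evidence of overfitting:
# train_return = evaluate(agent, make_env, train_seeds)
# test_return  = evaluate(agent, make_env, test_seeds)
```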