Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby streamline both the standards design process and the ease of extending the standard to support new modulation schemes applicable to new higher-level applications and services. Our scheme involves machine learning, whereby a constructor entity submits proposals to an evaluator entity, which returns a score for the proposal. The constructor employs reinforcement learning to iterate on its submitted proposals until a score is achieved that was previously agreed upon by both constructor and evaluator to be indicative of satisfying the required design criteria (including performance metrics for transmissions over the interface).
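The constructor/evaluator loop described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the candidate scheme table, the toy scoring rule in `evaluator_score`, and the choice of a per-scheme Bernoulli policy trained with REINFORCE in `run_constructor`.

```python
import math
import random

# Hypothetical candidates: (name, spectral efficiency in bit/symbol, required SNR in dB).
# The numbers are illustrative assumptions, not values from any standard.
CANDIDATES = [
    ("BPSK-1/2", 0.5, 1.0),
    ("QPSK-1/2", 1.0, 4.0),
    ("QPSK-3/4", 1.5, 7.0),
    ("16QAM-1/2", 2.0, 10.0),
    ("16QAM-3/4", 3.0, 14.0),
    ("64QAM-3/4", 4.5, 20.0),
]
SNR_GRID_DB = list(range(0, 25, 2))


def evaluator_score(proposal):
    """Toy evaluator (assumed): mean best usable efficiency over the SNR grid,
    minus a complexity penalty of 0.1 per supported scheme."""
    total = 0.0
    for snr in SNR_GRID_DB:
        usable = [eff for i, (_, eff, req) in enumerate(CANDIDATES)
                  if i in proposal and req <= snr]
        total += max(usable, default=0.0)
    return total / len(SNR_GRID_DB) - 0.1 * len(proposal)


def run_constructor(target, max_iters=500, lr=0.5, seed=0):
    """Constructor (assumed design): one Bernoulli inclusion policy per scheme,
    updated with REINFORCE until the agreed target score is reached."""
    rng = random.Random(seed)
    theta = [0.0] * len(CANDIDATES)  # inclusion logits
    baseline = 0.0                   # running mean of observed scores
    best, best_score = frozenset(), evaluator_score(frozenset())
    for it in range(1, max_iters + 1):
        probs = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        picks = [1 if rng.random() < p else 0 for p in probs]
        proposal = frozenset(i for i, x in enumerate(picks) if x)
        score = evaluator_score(proposal)  # evaluator returns a score
        if score > best_score:
            best, best_score = proposal, score
        if score >= target:                # agreed stopping score reached
            return proposal, score, it
        advantage = score - baseline
        baseline += (score - baseline) / it
        for i, (x, p) in enumerate(zip(picks, probs)):
            # d/d(theta) of log Bernoulli(x; sigmoid(theta)) equals (x - p);
            # logits are clamped to keep the sigmoid numerically safe.
            theta[i] = max(-10.0, min(10.0, theta[i] + lr * advantage * (x - p)))
    return best, best_score, max_iters
```

Calling `run_constructor(target=1.7)` returns the proposal (a set of indices into `CANDIDATES`), its score, and the number of iterations used. If the target is never reached, the best proposal seen within `max_iters` is returned instead.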

Related content

Recommender systems · Learning · Reinforcement learning · Policy search · INTERACT

September 22, 2021

A Survey on Reinforcement Learning for Recommender Systems

Yuanguo Lin,Yong Liu,Fan Lin,Pengcheng Wu,Wenhua Zeng,Chunyan Miao

from arxiv, 25 pages, 4 figures

Recommender systems have been widely applied in different real-life scenarios to help us find useful information. Recently, Reinforcement Learning (RL) based recommender systems have become an emerging research topic. They often surpass traditional recommendation models, and even most deep learning-based methods, owing to their interactive nature and autonomous learning ability. Nevertheless, applying RL to recommender systems raises various challenges. To this end, we first provide a thorough overview, comparison, and summary of RL approaches for five typical recommendation scenarios, following three main categories of RL: value-function, policy search, and Actor-Critic. Then, we systematically analyze the challenges and relevant solutions on the basis of the existing literature. Finally, in discussing open issues of RL and its limitations in recommendation, we highlight some potential research directions in this field.

Functional · Constraints · Reinforcement learning · Q-function · Learning

June 24, 2021

Density Constrained Reinforcement Learning

Zengyi Qin,Yuxiao Chen,Chuchu Fan

from arxiv, Accepted by ICML, 2021

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered by previous works. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning cost functions required by value function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density constrained RL problem optimally while guaranteeing that the constraints are satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, with a wide range of density constrained tasks as well as standard CRL benchmarks such as Safety-Gym.

Model complexity · MoDELS · Deep learning · Generalization theory · Model selection

March 8, 2021

Model Complexity of Deep Learning: A Survey

Xia Hu,Lingyang Chu,Jian Pei,Weiqing Liu,Jiang Bian

Model complexity is a fundamental problem in deep learning. In this paper we conduct a systematic overview of the latest studies on model complexity in deep learning. Model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on those two categories along four important factors, including model framework, model size, optimization process and data complexity. We also discuss the applications of deep learning model complexity including understanding model generalization capability, model optimization, and model selection and design. We conclude by proposing several interesting future directions.

Estimation/Estimator · Extensibility · Learning · Shaping · state-of-the-art

March 23, 2020

ASLFeat: Learning Local Features of Accurate Shape and Localization

Zixin Luo,Lei Zhou,Xuyang Bai,Hongkai Chen,Jiahui Zhang,Yao Yao,Shiwei Li,Tian Fang,Long Quan

from arxiv, Accepted to CVPR 2020, supplementary materials included, code available

This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, even though shape-awareness is crucial for acquiring stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformations. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.

Scoring function · Graph · Learning · Functional · Score

March 19, 2020

Causal Discovery with Reinforcement Learning

Shengyu Zhu,Ignavier Ng,Zhitang Chen

from arxiv, Camera-ready version for ICLR 2020 (oral). Codes, datasets, and training logs have been made available at //github.com/huawei-noah/trustworthyAI/tree/master/Causal_Structure_Learning/Causal_Discovery_RL

Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best score. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy, and our final output is the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.
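A reward that combines a score with two acyclicity penalties can be sketched as follows. The abstract does not spell out the penalty terms, so this sketch assumes one indicator penalty (the graph is not a DAG) and one smooth penalty built from a truncated trace-of-matrix-exponential series. It also assumes a lower-is-better score (as with BIC) supplied by the caller; the weights `lam1` and `lam2` are hypothetical.

```python
from math import factorial


def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(adj):
    """h(A) = sum_{k=1..d} trace(A^k) / k! for a binary d x d adjacency matrix.

    trace(A^k) counts closed walks of length k, and any cycle in a graph
    with d nodes has length at most d, so h(A) == 0 exactly when A is a DAG.
    """
    d = len(adj)
    power = [row[:] for row in adj]  # holds A^k, starting at k = 1
    h = 0.0
    for k in range(1, d + 1):
        h += sum(power[i][i] for i in range(d)) / factorial(k)
        power = matmul(power, adj)
    return h


def reward(score, adj, lam1=5.0, lam2=2.0):
    """Negative of (score + indicator penalty + smooth penalty); higher is better."""
    h = acyclicity_penalty(adj)
    indicator = 1.0 if h > 0 else 0.0
    return -(score + lam1 * indicator + lam2 * h)
```

For the two-node cycle `[[0, 1], [1, 0]]` the penalty is `trace(A^2) / 2! = 1.0`, so a graph with score 3.0 receives reward -10.0 under the default weights, while a DAG with the same score receives -3.0.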

Heteroscedasticity · Upper confidence bound · Deep Q-learning · Reinforcement learning · Learning

December 18, 2018

Information-Directed Exploration for Deep Reinforcement Learning

Nikolay Nikolov,Johannes Kirschner,Felix Berkenkamp,Andreas Krause

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.

Reward function · Linear · Reinforcement learning · Learning · Value iteration

December 6, 2018

Logically-Constrained Reinforcement Learning

Mohammadhosein Hasanbeig,Alessandro Abate,Daniel Kroening

This paper proposes a model-free Reinforcement Learning (RL) algorithm to synthesise policies for an unknown Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), then construct a synchronized MDP between the automaton and the original MDP. According to the resulting LDBA, a reward function is then defined over the state-action pairs of the product MDP. With this reward function, our algorithm synthesises a policy whose traces satisfy the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.

Deep reinforcement learning · Learning · Reinforcement learning · Generalization theory · BASIC

December 3, 2018

An Introduction to Deep Reinforcement Learning

Vincent Francois-Lavet,Peter Henderson,Riashat Islam,Marc G. Bellemare,Joelle Pineau

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.

Estimation/Estimator · Learning · Confusion matrix · Unbiased · Reinforcement learning

October 5, 2018

Reinforcement Learning with Perturbed Rewards

Jingkang Wang,Yang Liu,Bo Li

Recent studies have shown the vulnerability of reinforcement learning (RL) models in noisy settings. The sources of noise differ across scenarios. For instance, in practice, the observed reward channel is often subject to noise (e.g., when observed rewards are collected through sensors), and thus observed rewards may not be credible. Also, in applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors. In this paper, we consider noisy RL problems where the rewards observed by RL agents are generated with a reward confusion matrix. We call such observed rewards perturbed rewards. We develop a robust RL framework, aided by an unbiased reward estimator, that enables RL agents to learn in noisy environments while observing only perturbed rewards. Our framework draws upon approaches for supervised learning with noisy data. The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 67.5% and 46.7% improvements on average on five Atari games, when the error rates are 10% and 30% respectively.
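The idea of unbiased surrogate rewards can be illustrated for the simplest case of two reward levels. This is a minimal sketch under stated assumptions, not the paper's implementation: the flip rates are taken as known rather than estimated, and the function and parameter names are hypothetical. Here `e_minus` is the probability that a true `r_minus` is observed as `r_plus`, and `e_plus` is the probability that a true `r_plus` is observed as `r_minus`.

```python
def surrogate_rewards(r_minus, r_plus, e_minus, e_plus):
    """Return (hat_minus, hat_plus), the surrogate values to use when the
    observed reward is r_minus or r_plus respectively.

    They solve the 2 x 2 linear system that makes the expected surrogate,
    taken over the noisy observation, equal to the true reward.
    """
    denom = 1.0 - e_minus - e_plus
    if abs(denom) < 1e-12:
        raise ValueError("e_minus + e_plus must differ from 1")
    hat_plus = ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    hat_minus = ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom
    return hat_minus, hat_plus


def expected_surrogate(true_is_plus, hat_minus, hat_plus, e_minus, e_plus):
    """Expected surrogate reward given the true reward level."""
    if true_is_plus:
        return (1.0 - e_plus) * hat_plus + e_plus * hat_minus
    return (1.0 - e_minus) * hat_minus + e_minus * hat_plus
```

With rewards of -1 and +1 and flip rates `e_minus = 0.1`, `e_plus = 0.3`, the surrogates are -4/3 and 2. Their expectations under the noise recover -1 and +1, which is the sense in which the surrogate is unbiased.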

MoDELS · Learning · ROUGE · Reinforcement learning · Deep reinforcement learning

April 19, 2018

Learning to Extract Coherent Summary via Deep Reinforcement Learning

Yuxiang Wu,Baotian Hu

from arxiv, 8 pages, 1 figure, presented at AAAI-2018

Coherence plays a critical role in producing a high-quality summary from a document. In recent years, neural extractive summarization has become increasingly attractive. However, most such methods ignore the coherence of summaries when extracting sentences. As an effort towards extracting coherent summaries, we propose a neural coherence model to capture the cross-sentence semantic and syntactic coherence patterns. The proposed neural coherence model obviates the need for feature engineering and can be trained in an end-to-end fashion using unlabeled data. Empirical results show that the proposed neural coherence model can efficiently capture the cross-sentence coherence patterns. Using the combined output of the neural coherence model and the ROUGE package as the reward, we design a reinforcement learning method to train a neural extractive summarizer, which we name the Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize the coherence and informative importance of the summary simultaneously. Experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. The qualitative evaluation indicates that summaries produced by RNES are more coherent and readable.