青柠在线观看免费高清1_99视频在线播放喷射_欧美成人亚洲国产中文精品_精品九九久久精品电影_超碰人人摸人人摸_国产AV导航大全精品_日本视频高清免费观看一区

Efficient methods to evaluate new algorithms are critical for improving interactive bandit and reinforcement learning systems such as recommendation systems. A/B tests are reliable, but are time- and money-consuming, and entail a risk of failure. In this paper, we develop an alternative method, which predicts the performance of algorithms given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of a counterfactual algorithm at a rate of $\sqrt{N}$, as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, thus allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are actually important. We validate our method by a simulation experiment about reinforcement learning. We finally apply it to improve advertisement design by a major advertisement company. We find that our method produces smaller mean squared errors than state-of-the-art methods.

相關內容

Performer

關注 10

秩 · 剪枝 · 可辨認的 · 原點 · Performer ·

2021 年 10 月 3 日

Ranking Policy Decisions

Hadrien Pouget,Hana Chockler,Youcheng Sun,Daniel Kroening

Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them difficult to analyse and interpret. In a run with $n$ time steps, a policy will make $n$ decisions on actions to take; we conjecture that only a small subset of these decisions delivers value over selecting a simple default action. Given a trained policy, we propose a novel black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states. We argue that among other things, the ranked list of states can help explain and understand the policy. As the ranking method is statistical, a direct evaluation of its quality is hard. As a proxy for quality, we use the ranking to create new, simpler policies from the original ones by pruning decisions identified as unimportant (that is, replacing them by default actions) and measuring the impact on performance. Our experiments on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies. Conversely, we show that naive approaches for ranking policy decisions, e.g., ranking based on the frequency of visiting a state, do not result in high-performing pruned policies.

秩 · 推斷 · Extensibility · 成對型 · MoDELS ·

2021 年 10 月 1 日

Lagrangian Inference for Ranking Problems

Yue Liu,Ethan X. Fang,Junwei Lu

We propose a novel combinatorial inference framework to conduct general uncertainty quantification in ranking problems. We consider the widely adopted Bradley-Terry-Luce (BTL) model, where each item is assigned a positive preference score that determines the Bernoulli distributions of pairwise comparisons' outcomes. Our proposed method aims to infer general ranking properties of the BTL model. The general ranking properties include the "local" properties such as if an item is preferred over another and the "global" properties such as if an item is among the top $K$-ranked items. We further generalize our inferential framework to multiple testing problems where we control the false discovery rate (FDR), and apply the method to infer the top-$K$ ranked items. We also derive the information-theoretic lower bound to justify the minimax optimality of the proposed method. We conduct extensive numerical studies using both synthetic and real datasets to back up our theory.

流 · Extensibility · INTERACT · 簇 · Neural Networks ·

2021 年 6 月 7 日

A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps

Léa Briand,Guillaume Salha-Galvan,Walid Bendada,Mathieu Morlon,Viet-Anh Tran

from arxiv, 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)

Music streaming services heavily rely on recommender systems to improve their users' experience, by helping them navigate through a large musical catalog and discover new songs, albums or artists. However, recommending relevant and personalized content to new users, with few to no interactions with the catalog, is challenging. This is commonly referred to as the user cold start problem. In this applied paper, we present the system recently deployed on the music streaming service Deezer to address this problem. The solution leverages a semi-personalized recommendation strategy, based on a deep neural network architecture and on a clustering of users from heterogeneous sources of information. We extensively show the practical impact of this system and its effectiveness at predicting the future musical preferences of cold start users on Deezer, through both offline and online large-scale experiments. Besides, we publicly release our code as well as anonymized usage data from our experiments. We hope that this release of industrial resources will benefit future research on user cold start recommendation.

INFORMS · 推薦系統 · 圖 · 知識圖譜 · 特化 ·

2020 年 2 月 28 日

A Survey on Knowledge Graph-Based Recommender Systems

Qingyu Guo,Fuzhen Zhuang,Chuan Qin,Hengshu Zhu,Xing Xie,Hui Xiong,Qing He

from arxiv, 17 pages, 1 figure

To solve the information explosion problem and enhance user experience in various online applications, recommender systems have been developed to model users preferences. Although numerous efforts have been made toward more personalized recommendations, recommender systems still suffer from several challenges, such as data sparsity and cold start. In recent years, generating recommendations with the knowledge graph as side information has attracted considerable interest. Such an approach can not only alleviate the abovementioned issues for a more accurate recommendation, but also provide explanations for recommended items. In this paper, we conduct a systematical survey of knowledge graph-based recommender systems. We collect recently published papers in this field and summarize them from two perspectives. On the one hand, we investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for accurate and explainable recommendation. On the other hand, we introduce datasets used in these works. Finally, we propose several potential research directions in this field.

Facebook AI Research · Extensibility · 張成子空間 · 邊緣化 · 設計 ·

2020 年 2 月 25 日

FairRec: Two-Sided Fairness for Personalized Recommendations in Two-Sided Platforms

Gourab K. Patro,Arpita Biswas,Niloy Ganguly,Krishna P. Gummadi,Abhijnan Chakraborty

We investigate the problem of fair recommendation in the context of two-sided online platforms, comprising customers on one side and producers on the other. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reveals that such customer-centric design may lead to unfair distribution of exposure among the producers, which may adversely impact their well-being. On the other hand, a producer-centric design might become unfair to the customers. Thus, we consider fairness issues that span both customers and producers. Our approach involves a novel mapping of the fair recommendation problem to a constrained version of the problem of fairly allocating indivisible goods. Our proposed FairRec algorithm guarantees at least Maximin Share (MMS) of exposure for most of the producers and Envy-Free up to One item (EF1) fairness for every customer. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in the overall recommendation quality.

學成 · 泛化理論 · AIM · state-of-the-art · 強化學習 ·

2019 年 10 月 24 日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Tianhe Yu,Deirdre Quillen,Zhanpeng He,Ryan Julian,Karol Hausman,Chelsea Finn,Sergey Levine

from arxiv, CoRL 2019. Videos are here: meta-world.github.io and open-sourced codes are available at: //github.com/rlworkgroup/metaworld

Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.

state-of-the-art · 學成 · Next · 分解的 · 優化器 ·

2018 年 8 月 3 日

Causal Embeddings for Recommendation

Stephen Bonner,Flavian Vasile

from arxiv, Accepted as a long paper at the Twelfth ACM Conference on Recommender Systems (RecSys '18), October 2--7, 2018, Vancouver, BC, Canada

Many current applications use recommendations in order to modify the natural user behavior, such as to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup where recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix, or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome versus the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

推薦系統 · 簇 · 可約的 · 賭博機/老虎機 · Performer ·

2018 年 7 月 31 日

Graph-Based Recommendation System

Kaige Yang,Laura Toni

In this work, we study recommendation systems modelled as contextual multi-armed bandit (MAB) problems. We propose a graph-based recommendation system that learns and exploits the geometry of the user space to create meaningful clusters in the user domain. This reduces the dimensionality of the recommendation problem while preserving the accuracy of MAB. We then study the effect of graph sparsity and clusters size on the MAB performance and provide exhaustive simulation results both in synthetic and in real-case datasets. Simulation results show improvements with respect to state-of-the-art MAB algorithms.

MoDELS · INTERACT · 推薦系統 · 深度模型 · 內積 ·

2018 年 5 月 21 日

Learning Tree-based Deep Model for Recommender Systems

Han Zhu,Xiang Li,Pengye Zhang,Guozheng Li,Jie He,Han Li,Kun Gai

Model-based methods for recommender systems have been studied extensively in recent years. In systems with large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult. To overcome the calculation barriers, models such as matrix factorization resort to inner product form (i.e., model user-item preference as the inner product of user, item latent factors) and indexes to facilitate efficient approximate k-nearest neighbor searches. However, it still remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the calculation cost. In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with large corpus. We propose a novel tree-based method which can provide logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution and hence facilitate both training and prediction. Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results in Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.

INTERACT · Extensibility · 推薦系統 · 估計/估計量 · MoDELS ·

2018 年 3 月 28 日

Human Interaction with Recommendation Systems

Sven Schmit,Carlos Riquelme

from arxiv, Accepted to AISTATS 2018

Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using extensive simulations.