
User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, {\em predictive engagement}, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements. This suggests that predictive engagement can be used as real-time feedback for training better dialogue models.
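The aggregation step described above can be sketched as follows. This is a minimal illustration, assuming a simple mean (or max) aggregator over learned per-utterance scores; the paper's actual aggregation and score predictor are learned from data.

```python
# Sketch: combining utterance-level engagement scores (each in [0, 1])
# into a conversation-level score. The mean/max aggregators here are
# illustrative assumptions, not the paper's exact method.

def conversation_engagement(utterance_scores, aggregator="mean"):
    """Aggregate per-utterance engagement into one conversation score."""
    if not utterance_scores:
        raise ValueError("need at least one utterance score")
    if aggregator == "mean":
        return sum(utterance_scores) / len(utterance_scores)
    if aggregator == "max":
        return max(utterance_scores)
    raise ValueError(f"unknown aggregator: {aggregator}")

print(conversation_engagement([0.2, 0.8, 0.5]))  # → 0.5
```

A learned aggregator could replace the fixed rule, but even this simple form shows how utterance-level annotation yields a conversation-level signal.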

Related content

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is trained to minimize perplexity, an automatic metric that we compare against human judgement of multi-turn conversation quality. To capture this judgement, we propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of good conversation. Interestingly, our experiments show strong correlation between perplexity and SSA. The fact that the end-to-end-trained Meena with the best perplexity scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher than the next highest scoring chatbot that we evaluated.
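The perplexity metric that Meena is trained to minimize is the exponential of the mean negative log-likelihood of the target tokens. A minimal sketch, with made-up token probabilities for illustration:

```python
import math

# Sketch: sequence perplexity from a model's per-token probabilities.
# Lower perplexity means the model assigns the observed tokens higher
# probability; the values below are illustrative, not from Meena.

def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25]))  # → 4.0
```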

Paper title: Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Paper authors: Sarik Ghazarian, Ralph Weischedel, Aram Galstyan, Nanyun Peng



The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
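The control flow described above can be sketched in a few lines. This is a hypothetical simplification: the `satisfaction_fn` stands in for the paper's learned satisfaction estimator, and the threshold is an assumed value.

```python
# Sketch of the self-feeding loop: high estimated satisfaction turns the
# user's reply into a new imitation example; low satisfaction triggers a
# feedback request. Estimator and threshold are illustrative stand-ins.

def harvest(turns, satisfaction_fn, threshold=0.5):
    """Split (context, user_reply) turns into new training examples
    and contexts where the agent should ask for feedback."""
    new_examples, feedback_requests = [], []
    for context, user_reply in turns:
        if satisfaction_fn(user_reply) >= threshold:
            new_examples.append((context, user_reply))   # imitate this
        else:
            feedback_requests.append(context)            # ask for feedback
    return new_examples, feedback_requests

# Toy satisfaction estimator: longer replies signal engagement.
sat = lambda reply: min(len(reply.split()) / 5, 1.0)
ex, fb = harvest([("hi", "great, tell me more about that!"), ("hi", "ok")], sat)
```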

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
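The trajectory-preference side of this setup is commonly modeled with a Bradley-Terry/logistic form: the probability that a human prefers segment A over segment B grows with the difference of their predicted returns, and the reward model is fit by maximizing the likelihood of recorded preferences. A minimal sketch of that objective (gradient updates omitted):

```python
import math

# Sketch of preference-based reward modeling. r_a and r_b are the reward
# model's predicted returns for two trajectory segments; the loss below
# is the negative log-likelihood of the human's stated preferences.

def preference_prob(r_a, r_b):
    """P(human prefers A over B) under a Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))

def preference_loss(pairs):
    """NLL over pairs of (return_of_preferred, return_of_other)."""
    return -sum(math.log(preference_prob(rp, ro)) for rp, ro in pairs)
```

Training the reward network to decrease this loss pushes preferred segments toward higher predicted return, which then serves as the reward signal for the RL agent.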

Consistency is a long-standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.
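One way an NLI model can improve consistency is by filtering candidate responses that contradict the agent's persona. The sketch below assumes a hypothetical `nli` callable returning "entailment", "neutral", or "contradiction"; the toy rule stands in for a model trained on Dialogue NLI.

```python
# Sketch: keep only candidate responses that no persona sentence
# contradicts, as judged by an NLI model (here a toy heuristic).

def consistent_candidates(persona, candidates, nli):
    """Filter candidates flagged as contradicting any persona sentence."""
    return [
        c for c in candidates
        if all(nli(p, c) != "contradiction" for p in persona)
    ]

# Toy NLI stand-in: flags a contradiction when exactly one sentence is
# negated and the two sentences share several words.
def toy_nli(premise, hypothesis):
    negated = ("not " in premise) != ("not " in hypothesis)
    shared = set(premise.split()) & set(hypothesis.split())
    return "contradiction" if negated and len(shared) > 2 else "neutral"

persona = ["i do not like dogs"]
kept = consistent_candidates(persona, ["i like dogs very much", "i love cats"], toy_nli)
```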

We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.
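The two-stage structure can be sketched as a simple pipeline. Both stages here are hypothetical stand-ins: capitalized-token candidates replace the neural key-phrase extractor, and a template replaces the seq2seq generator with copy mechanism.

```python
# Sketch of the two-stage pipeline: (1) extract and rank candidate
# answer phrases, (2) generate one question per selected phrase.

def extract_key_phrases(document, scorer, k=2):
    """Rank candidate phrases (here: capitalized tokens) by `scorer`."""
    candidates = {w.strip(".,") for w in document.split() if w[0].isupper()}
    return sorted(candidates, key=scorer, reverse=True)[:k]

def generate_questions(document, scorer):
    """Condition a (toy, template-based) generator on each key phrase."""
    return [f"What does the text say about {p}?"
            for p in extract_key_phrases(document, scorer)]

doc = "Marie Curie discovered polonium in Paris."
qs = generate_questions(doc, scorer=len)
```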

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., a Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn only one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metrics progressively and nonlinearly, as in deep learning, using only existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys several desirable properties: it can indeed learn a metric progressively, and it performs strongly on several datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
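The layer-stacking idea can be sketched directly. Each "metric layer" below is a fixed linear feature transform (standing in for a learned Mahalanobis-style metric), followed by ReLU; the weights are toy values, whereas ODML learns each layer online.

```python
# Sketch of stacked metric layers: transform both points through
# (linear map -> ReLU) repeatedly, then measure Euclidean distance in
# the final feature space.

def affine(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def deep_metric(layers, a, b):
    """Distance between a and b after passing through the layer stack."""
    for W in layers:
        a, b = relu(affine(W, a)), relu(affine(W, b))
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

# Two toy 2x2 metric layers: identity, then axis-wise rescaling.
layers = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]]]
d = deep_metric(layers, [1.0, 2.0], [3.0, 2.0])
```

The point of the stacking is that later layers refine the feature space produced by earlier ones, giving a progressively learned, nonlinear metric.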

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such learning methods may suffer from the mismatch of dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interactions with users through human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability in successfully completing a task.
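The two-phase schedule can be sketched abstractly. The update functions below are hypothetical placeholders for gradient steps on the dialogue policy; the dictionary keys are illustrative, not the paper's data format.

```python
# Sketch of the hybrid schedule: first imitation updates on turns where
# the user supplied a correction, then reinforcement updates on turns
# carrying scalar user feedback.

def hybrid_training(dialogues, imitation_step, rl_step):
    """Phase 1: imitate user corrections; phase 2: RL from feedback."""
    log = []
    for d in dialogues:                       # imitation phase
        if d.get("correction"):
            imitation_step(d["state"], d["correction"])
            log.append(("imitate", d["state"]))
    for d in dialogues:                       # RL phase
        if "reward" in d:
            rl_step(d["state"], d["action"], d["reward"])
            log.append(("rl", d["state"]))
    return log

updates = hybrid_training(
    [{"state": "s1", "correction": "better reply"},
     {"state": "s2", "action": "a", "reward": 1.0}],
    imitation_step=lambda s, y: None,   # placeholder gradient step
    rl_step=lambda s, a, r: None,       # placeholder policy update
)
```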

Interaction and collaboration between humans and intelligent machines have become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or improve the performance of a learned visual processing algorithm. In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network's activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations. We evaluate the method on two datasets, showing that guiding a pre-trained network can improve performance, and provide extensive insights into the interaction between the guide and the CNN.
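The core mechanism, a layer that rescales intermediate activations with per-position interaction weights, can be sketched on a toy 2D feature map. In the paper the weights come from energy minimization or from a language-to-weights model; here they are fixed toy values.

```python
# Sketch of a spatio-semantic guide layer: element-wise rescaling of a
# 2D activation map by guide weights, so user input can suppress or
# amplify spatial regions of a pre-trained CNN's features.

def guide_layer(feature_map, weights):
    """Multiply each activation by its per-position guide weight."""
    return [[a * w for a, w in zip(frow, wrow)]
            for frow, wrow in zip(feature_map, weights)]

fm = [[1.0, 2.0], [3.0, 4.0]]
# Weight 0 suppresses a region, 1 passes it through, 0.5 attenuates it.
guided = guide_layer(fm, [[0.0, 1.0], [1.0, 0.5]])
```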

Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using extensive simulations.
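The inconsistency of naive estimators under this feedback loop can be seen in a tiny deterministic simulation. Everything here is an illustrative assumption, not the paper's model: a greedy recommender shows the item with the highest empirical click rate, and a toy user clicks iff the item's true appeal exceeds 0.5.

```python
# Sketch of the recommendation feedback loop: the naive estimator only
# sees data for items the policy chooses to show, so it can freeze at a
# wrong value for under-shown items and never converge to true appeal.

def simulate(true_appeal, rounds):
    """Greedy recommender with a deterministic toy user model."""
    clicks = [0] * len(true_appeal)
    shows = [0] * len(true_appeal)
    for _ in range(rounds):
        # naive policy: recommend the item with the best empirical mean
        means = [c / s if s else 1.0 for c, s in zip(clicks, shows)]
        i = means.index(max(means))
        shows[i] += 1
        clicks[i] += true_appeal[i] > 0.5  # toy deterministic user
    estimates = [c / s if s else None for c, s in zip(clicks, shows)]
    return shows, estimates

shows, est = simulate([0.3, 0.6], rounds=100)
```

Item 0 is shown once, fails, and is never shown again, so its naive estimate is stuck at 0.0 even though its true appeal is 0.3: the feedback loop, not the data, determines what the estimator can learn.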
