
User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, {\em predictive engagement}, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements. This suggests that predictive engagement can be used as real-time feedback for training better dialogue models.
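The aggregation step described above can be sketched as follows. This is a minimal illustration, assuming a simple mean (or max) aggregator over learned per-utterance scores; the paper's actual aggregation and score predictor are learned from data.

```python
# Sketch: combining utterance-level engagement scores (each in [0, 1])
# into a conversation-level score. The mean/max aggregators here are
# illustrative assumptions, not the paper's exact method.

def conversation_engagement(utterance_scores, aggregator="mean"):
    """Aggregate per-utterance engagement into one conversation score."""
    if not utterance_scores:
        raise ValueError("need at least one utterance score")
    if aggregator == "mean":
        return sum(utterance_scores) / len(utterance_scores)
    if aggregator == "max":
        return max(utterance_scores)
    raise ValueError(f"unknown aggregator: {aggregator}")

print(conversation_engagement([0.2, 0.8, 0.5]))  # → 0.5
```

A learned aggregator could replace the fixed rule, but even this simple form shows how utterance-level annotation yields a conversation-level signal.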

Related content

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is trained to minimize perplexity, an automatic metric that we compare against human judgement of multi-turn conversation quality. To capture this judgement, we propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of good conversation. Interestingly, our experiments show strong correlation between perplexity and SSA. The fact that the end-to-end-trained Meena with the best perplexity scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher than the next highest scoring chatbot that we evaluated.
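The perplexity metric that Meena is trained to minimize is the exponential of the mean negative log-likelihood of the target tokens. A minimal sketch, with made-up token probabilities for illustration:

```python
import math

# Sketch: sequence perplexity from a model's per-token probabilities.
# Lower perplexity means the model assigns the observed tokens higher
# probability; the values below are illustrative, not from Meena.

def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25]))  # → 4.0
```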

Paper title: Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Paper authors: Sarik Ghazarian, Ralph Weischedel, Aram Galstyan, Nanyun Peng



The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
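The control flow described above can be sketched in a few lines. This is a hypothetical simplification: the `satisfaction_fn` stands in for the paper's learned satisfaction estimator, and the threshold is an assumed value.

```python
# Sketch of the self-feeding loop: high estimated satisfaction turns the
# user's reply into a new imitation example; low satisfaction triggers a
# feedback request. Estimator and threshold are illustrative stand-ins.

def harvest(turns, satisfaction_fn, threshold=0.5):
    """Split (context, user_reply) turns into new training examples
    and contexts where the agent should ask for feedback."""
    new_examples, feedback_requests = [], []
    for context, user_reply in turns:
        if satisfaction_fn(user_reply) >= threshold:
            new_examples.append((context, user_reply))   # imitate this
        else:
            feedback_requests.append(context)            # ask for feedback
    return new_examples, feedback_requests

# Toy satisfaction estimator: longer replies signal engagement.
sat = lambda reply: min(len(reply.split()) / 5, 1.0)
ex, fb = harvest([("hi", "great, tell me more about that!"), ("hi", "ok")], sat)
```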

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
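The trajectory-preference side of this setup is commonly modeled with a Bradley-Terry/logistic form: the probability that a human prefers segment A over segment B grows with the difference of their predicted returns, and the reward model is fit by maximizing the likelihood of recorded preferences. A minimal sketch of that objective (gradient updates omitted):

```python
import math

# Sketch of preference-based reward modeling. r_a and r_b are the reward
# model's predicted returns for two trajectory segments; the loss below
# is the negative log-likelihood of the human's stated preferences.

def preference_prob(r_a, r_b):
    """P(human prefers A over B) under a Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))

def preference_loss(pairs):
    """NLL over pairs of (return_of_preferred, return_of_other)."""
    return -sum(math.log(preference_prob(rp, ro)) for rp, ro in pairs)
```

Training the reward network to decrease this loss pushes preferred segments toward higher predicted return, which then serves as the reward signal for the RL agent.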

Consistency is a long-standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.
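One way an NLI model can improve consistency is by filtering candidate responses that contradict the agent's persona. The sketch below assumes a hypothetical `nli` callable returning "entailment", "neutral", or "contradiction"; the toy rule stands in for a model trained on Dialogue NLI.

```python
# Sketch: keep only candidate responses that no persona sentence
# contradicts, as judged by an NLI model (here a toy heuristic).

def consistent_candidates(persona, candidates, nli):
    """Filter candidates flagged as contradicting any persona sentence."""
    return [
        c for c in candidates
        if all(nli(p, c) != "contradiction" for p in persona)
    ]

# Toy NLI stand-in: flags a contradiction when exactly one sentence is
# negated and the two sentences share several words.
def toy_nli(premise, hypothesis):
    negated = ("not " in premise) != ("not " in hypothesis)
    shared = set(premise.split()) & set(hypothesis.split())
    return "contradiction" if negated and len(shared) > 2 else "neutral"

persona = ["i do not like dogs"]
kept = consistent_candidates(persona, ["i like dogs very much", "i love cats"], toy_nli)
```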

We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.
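The two-stage structure can be sketched as a simple pipeline. Both stages here are hypothetical stand-ins: capitalized-token candidates replace the neural key-phrase extractor, and a template replaces the seq2seq generator with copy mechanism.

```python
# Sketch of the two-stage pipeline: (1) extract and rank candidate
# answer phrases, (2) generate one question per selected phrase.

def extract_key_phrases(document, scorer, k=2):
    """Rank candidate phrases (here: capitalized tokens) by `scorer`."""
    candidates = {w.strip(".,") for w in document.split() if w[0].isupper()}
    return sorted(candidates, key=scorer, reverse=True)[:k]

def generate_questions(document, scorer):
    """Condition a (toy, template-based) generator on each key phrase."""
    return [f"What does the text say about {p}?"
            for p in extract_key_phrases(document, scorer)]

doc = "Marie Curie discovered polonium in Paris."
qs = generate_questions(doc, scorer=len)
```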

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., a Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn only one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metrics progressively and nonlinearly, as in deep learning, using only existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys several desirable properties: it can indeed learn a metric progressively, and it performs strongly on several datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
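The layer-stacking idea can be sketched directly. Each "metric layer" below is a fixed linear feature transform (standing in for a learned Mahalanobis-style metric), followed by ReLU; the weights are toy values, whereas ODML learns each layer online.

```python
# Sketch of stacked metric layers: transform both points through
# (linear map -> ReLU) repeatedly, then measure Euclidean distance in
# the final feature space.

def affine(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def deep_metric(layers, a, b):
    """Distance between a and b after passing through the layer stack."""
    for W in layers:
        a, b = relu(affine(W, a)), relu(affine(W, b))
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

# Two toy 2x2 metric layers: identity, then axis-wise rescaling.
layers = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]]]
d = deep_metric(layers, [1.0, 2.0], [3.0, 2.0])
```

The point of the stacking is that later layers refine the feature space produced by earlier ones, giving a progressively learned, nonlinear metric.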

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such learning methods may suffer from the mismatch of dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interactions with users through human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability in successfully completing a task.
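The two-phase schedule can be sketched abstractly. The update functions below are hypothetical placeholders for gradient steps on the dialogue policy; the dictionary keys are illustrative, not the paper's data format.

```python
# Sketch of the hybrid schedule: first imitation updates on turns where
# the user supplied a correction, then reinforcement updates on turns
# carrying scalar user feedback.

def hybrid_training(dialogues, imitation_step, rl_step):
    """Phase 1: imitate user corrections; phase 2: RL from feedback."""
    log = []
    for d in dialogues:                       # imitation phase
        if d.get("correction"):
            imitation_step(d["state"], d["correction"])
            log.append(("imitate", d["state"]))
    for d in dialogues:                       # RL phase
        if "reward" in d:
            rl_step(d["state"], d["action"], d["reward"])
            log.append(("rl", d["state"]))
    return log

updates = hybrid_training(
    [{"state": "s1", "correction": "better reply"},
     {"state": "s2", "action": "a", "reward": 1.0}],
    imitation_step=lambda s, y: None,   # placeholder gradient step
    rl_step=lambda s, a, r: None,       # placeholder policy update
)
```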

Interaction and collaboration between humans and intelligent machines have become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or improve the performance of a learned visual processing algorithm. In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network's activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations. We evaluate the method on two datasets, showing that guiding a pre-trained network can improve performance, and provide extensive insights into the interaction between the guide and the CNN.
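The core mechanism, a layer that rescales intermediate activations with per-position interaction weights, can be sketched on a toy 2D feature map. In the paper the weights come from energy minimization or from a language-to-weights model; here they are fixed toy values.

```python
# Sketch of a spatio-semantic guide layer: element-wise rescaling of a
# 2D activation map by guide weights, so user input can suppress or
# amplify spatial regions of a pre-trained CNN's features.

def guide_layer(feature_map, weights):
    """Multiply each activation by its per-position guide weight."""
    return [[a * w for a, w in zip(frow, wrow)]
            for frow, wrow in zip(feature_map, weights)]

fm = [[1.0, 2.0], [3.0, 4.0]]
# Weight 0 suppresses a region, 1 passes it through, 0.5 attenuates it.
guided = guide_layer(fm, [[0.0, 1.0], [1.0, 0.5]])
```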

Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using extensive simulations.
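The inconsistency of naive estimators under this feedback loop can be seen in a tiny deterministic simulation. Everything here is an illustrative assumption, not the paper's model: a greedy recommender shows the item with the highest empirical click rate, and a toy user clicks iff the item's true appeal exceeds 0.5.

```python
# Sketch of the recommendation feedback loop: the naive estimator only
# sees data for items the policy chooses to show, so it can freeze at a
# wrong value for under-shown items and never converge to true appeal.

def simulate(true_appeal, rounds):
    """Greedy recommender with a deterministic toy user model."""
    clicks = [0] * len(true_appeal)
    shows = [0] * len(true_appeal)
    for _ in range(rounds):
        # naive policy: recommend the item with the best empirical mean
        means = [c / s if s else 1.0 for c, s in zip(clicks, shows)]
        i = means.index(max(means))
        shows[i] += 1
        clicks[i] += true_appeal[i] > 0.5  # toy deterministic user
    estimates = [c / s if s else None for c, s in zip(clicks, shows)]
    return shows, estimates

shows, est = simulate([0.3, 0.6], rounds=100)
```

Item 0 is shown once, fails, and is never shown again, so its naive estimate is stuck at 0.0 even though its true appeal is 0.3: the feedback loop, not the data, determines what the estimator can learn.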
