Neural network models usually suffer from the challenge of incorporating commonsense knowledge into the open-domain dialogue systems. In this paper, we propose a novel knowledge-aware dialogue generation model (called TransDG), which transfers question representation and knowledge matching abilities from knowledge base question answering (KBQA) task to facilitate the utterance understanding and factual knowledge selection for dialogue generation. In addition, we propose a response guiding attention and a multi-step decoding strategy to steer our model to focus on relevant features for response generation. Experiments on two benchmark datasets demonstrate that our model has robust superiority over compared methods in generating informative and fluent dialogues. Our code is available at //github.com/siat-nlp/TransDG.
This paper focuses on how to take advantage of external relational knowledge to improve machine reading comprehension (MRC) with multi-task learning. Most of the traditional methods in MRC assume that the knowledge used to get the correct answer generally exists in the given documents. However, in real-world task, part of knowledge may not be mentioned and machines should be equipped with the ability to leverage external knowledge. In this paper, we integrate relational knowledge into MRC model for commonsense reasoning. Specifically, based on a pre-trained language model (LM). We design two auxiliary relation-aware tasks to predict if there exists any commonsense relation and what is the relation type between two words, in order to better model the interactions between document and candidate answer option. We conduct experiments on two multi-choice benchmark datasets: the SemEval-2018 Task 11 and the Cloze Story Test. The experimental results demonstrate the effectiveness of the proposed method, which achieves superior performance compared with the comparable baselines on both datasets.
Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at //github.com/facebookresearch/LAMA.
Two types of knowledge, triples from knowledge graphs and texts from documents, have been studied for knowledge aware open-domain conversation generation, in which graph paths can narrow down vertex candidates for knowledge selection decision, and texts can provide rich information for response generation. Fusion of a knowledge graph and texts might yield mutually reinforcing advantages, but there is less study on that. To address this challenge, we propose a knowledge aware chatting machine with three components, an augmented knowledge graph with both triples and texts, knowledge selector, and knowledge aware response generator. For knowledge selection on the graph, we formulate it as a problem of multi-hop graph reasoning to effectively capture conversation flow, which is more explainable and flexible in comparison with previous work. To fully leverage long text information that differentiates our graph from others, we improve a state of the art reasoning algorithm with machine reading comprehension technology. We demonstrate the effectiveness of our system on two datasets in comparison with state-of-the-art models.
Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial information of the image for tasks such as visual question answering. Moreover, this textual information is complementary with visual information present in the image because it can discuss both more abstract concepts and more explicit, intermediate symbolic information about objects, events, and scenes that can directly be matched with the textual question and copied into the textual answer (i.e., via easier modality match). Hence, we propose a combined Visual and Textual Question Answering (VTQA) model which takes as input a paragraph caption as well as the corresponding image, and answers the given question based on both inputs. In our model, the inputs are fused to extract related information by cross-attention (early fusion), then fused again in the form of consensus (late fusion), and finally expected answers are given an extra score to enhance the chance of selection (later fusion). Empirical results show that paragraph captions, even when automatically generated (via an RL-based encoder-decoder model), help correctly answer more visual questions. Overall, our joint model, when trained on the Visual Genome dataset, significantly improves the VQA performance over a strong baseline model.
Two types of knowledge, factoid knowledge from graphs and non-factoid knowledge from unstructured documents, have been studied for knowledge aware open-domain conversation generation, in which edge information in graphs can help generalization of knowledge selectors, and text sentences of non-factoid knowledge can provide rich information for response generation. Fusion of knowledge triples and sentences might yield mutually reinforcing advantages for conversation generation, but there is less study on that. To address this challenge, we propose a knowledge aware chatting machine with three components, augmented knowledge graph containing both factoid and non-factoid knowledge, knowledge selector, and response generator. For knowledge selection on the graph, we formulate it as a problem of multi-hop graph reasoning that is more flexible in comparison with previous one-hop knowledge selection models. To fully leverage long text information that differentiates our graph from others, we improve a state of the art reasoning algorithm with machine reading comprehension technology. We demonstrate that supported by such unified knowledge and knowledge selection method, our system can generate more appropriate and informative responses than baselines.
Although neural network approaches achieve remarkable success on a variety of NLP tasks, many of them struggle to answer questions that require commonsense knowledge. We believe the main reason is the lack of commonsense connections between concepts. To remedy this, we provide a simple and effective method that leverages external commonsense knowledge base such as ConceptNet. We pre-train direct and indirect relational functions between concepts, and show that these pre-trained functions could be easily added to existing neural network models. Results show that incorporating commonsense-based function improves the state-of-the-art on two question answering tasks that require commonsense reasoning. Further analysis shows that our system discovers and leverages useful evidences from an external commonsense knowledge base, which is missing in existing neural network models and help derive the correct answer.
Machine reading comprehension (MRC) requires reasoning about both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be well solved by context matching, which fail to test this capability. To encourage the progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper, and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering the questions requires understanding and the knowledge involved in a document. We implement a framework consisting of both a question answering model and a question generation model, both of which take the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase/ProBase/Reverb/NELL. Results show that incorporating side information from external KB improves the accuracy of the baseline question answer system. We compare it with a standard MRC model BiDAF, and also provide the difficulty of the dataset and lay out remaining challenges.
Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanism. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes might be unrelated to the VQA task and have limited semantic capacities. To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build up a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset via a semantic similarity module, in which each data consists of an image, a corresponding question, a correct answer and a supporting relation fact. A well-defined relation detector is then adopted to predict visual question-related relation facts. We further propose a multi-step attention model composed of visual attention and semantic attention sequentially to extract related visual knowledge and semantic knowledge. We conduct comprehensive experiments on the two benchmark datasets, demonstrating that our model achieves state-of-the-art performance and verifying the benefit of considering visual relation facts.
Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent progress in visual question answering, image captioning, and visual question generation shows that dialog systems may be realizable in the not too distant future. To this end, a novel dataset was introduced recently and encouraging results were demonstrated, particularly for question answering. In this paper, we demonstrate a simple symmetric discriminative baseline, that can be applied to both predicting an answer as well as predicting a question. We show that this method performs on par with the state of the art, even memory net based methods. In addition, for the first time on the visual dialog dataset, we assess the performance of a system asking questions, and demonstrate how visual dialog can be generated from discriminative question generation and question answering.
Answering complex questions is a time-consuming activity for humans that requires reasoning and integration of information. Recent work on reading comprehension made headway in answering simple questions, but tackling complex questions is still an ongoing research challenge. Conversely, semantic parsers have been successful at handling compositionality, but only when the information resides in a target knowledge-base. In this paper, we present a novel framework for answering broad and complex questions, assuming answering simple questions is possible using a search engine and a reading comprehension model. We propose to decompose complex questions into a sequence of simple questions, and compute the final answer from the sequence of answers. To illustrate the viability of our approach, we create a new dataset of complex questions, ComplexWebQuestions, and present a model that decomposes questions and interacts with the web to compute an answer. We empirically demonstrate that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.