Disfluency detection is a critical task in real-time dialogue systems. Despite its importance, it remains relatively unexplored, mainly because appropriate datasets are lacking. Existing datasets also suffer from various problems, including class imbalance, which can significantly degrade model performance on rare classes, as demonstrated in this paper. To this end, we propose LARD, a method for generating complex and realistic artificial disfluencies with little effort. The proposed method can handle three of the most common types of disfluencies: repetitions, replacements and restarts. In addition, we release a new large-scale dataset with disfluencies that can be used for four different tasks: disfluency detection, classification, extraction and correction. Experimental results on the LARD dataset demonstrate that the data produced by the proposed method can be used effectively for detecting and removing disfluencies, while also addressing limitations of existing datasets.
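
To make the repetition type concrete, here is a minimal sketch of how an artificial repetition could be injected into a fluent sentence and later removed. It is not the LARD algorithm, which the abstract does not specify; the function names `add_repetition` and `remove_disfluency` and the 0/1 labeling scheme are assumptions made for illustration.

```python
def add_repetition(tokens, start, length):
    """Repeat tokens[start:start + length] right before the original span.

    Returns the disfluent token list and a parallel list of labels,
    where 1 marks an inserted (disfluent) token and 0 a fluent one.
    Hypothetical helper, not taken from the LARD paper.
    """
    if length < 1 or start < 0 or start + length > len(tokens):
        raise ValueError("span out of range")
    span = tokens[start:start + length]
    disfluent = tokens[:start] + span + tokens[start:]
    labels = [0] * start + [1] * length + [0] * (len(tokens) - start)
    return disfluent, labels


def remove_disfluency(tokens, labels):
    """Correction step: keep only the tokens labeled as fluent."""
    return [tok for tok, lab in zip(tokens, labels) if lab == 0]


fluent = "i want a flight to boston".split()
disfluent, labels = add_repetition(fluent, start=1, length=2)
print(" ".join(disfluent))
print(labels)
```

The same token-level labels can serve detection (is any label 1?), extraction (which tokens are labeled 1?) and correction (drop them), which is one plausible reading of how a single generated example supports several tasks.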

Related content

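A data set (also written dataset) is a collection of data.
A data set is usually presented in tabular form. Each column represents a particular variable, and each row corresponds to one member of the data set in question. The table lists a value for each variable, such as the height and weight of an object or the value of a random number. Each value is called a datum. Depending on the number of rows, the data set may contain data for one or more members.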

Recent advances in natural language processing (NLP) have led to strong text classification models for many tasks. However, thousands of examples are often still needed to train high-quality models. This makes it challenging to quickly develop and deploy new models for real-world problems and business needs. Few-shot learning and active learning are two lines of research aimed at tackling this problem. In this work, we combine both lines into FASL, a platform that allows text classification models to be trained through a fast, iterative process. We investigate which active learning methods work best in our few-shot setup. Additionally, we develop a model to predict when to stop annotating. This is relevant because, in a few-shot setup, we do not have access to a large validation set.

Rumor detection has become an active research field in recent years. The core challenge is to model the rumor characteristics inherent in rich information, such as propagation patterns in the social network and semantic patterns in post content, and to differentiate rumors from the truth. However, existing work on rumor detection falls short in modeling heterogeneous information, either using a single information source only (e.g. the social network or the post content) or ignoring the relations among multiple sources (e.g. fusing social and content features via simple concatenation). As a result, it may fall short in understanding rumors comprehensively and detecting them accurately. In this work, we explore contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. Technically, we supplement the main supervised task of detection with an auxiliary self-supervised task, which enriches post representations via post self-discrimination. Specifically, given two heterogeneous views of a post (i.e. representations encoding social patterns and semantic patterns), the discrimination is done by maximizing the mutual information between different views of the same post compared to that of other posts. We devise cluster-wise and instance-wise approaches to generate the views and conduct the discrimination, considering different relations of information sources. We term this framework Self-supervised Rumor Detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
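
The instance-wise discrimination between two views of the same post can be illustrated with an InfoNCE-style loss, a commonly used objective whose minimization maximizes a lower bound on mutual information. The abstract does not state which estimator SRD uses, so this is a sketch under that assumption; the function name `info_nce`, the dot-product similarity and the `temperature` parameter are illustrative choices.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over a batch of posts.

    view_a[i] and view_b[i] are two representations of post i
    (for example a social view and a semantic view). For each post,
    the other view of the same post is the positive and the views
    of all other posts in the batch are negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [dot(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


social = [[1.0, 0.0], [0.0, 1.0]]
semantic_aligned = [[1.0, 0.0], [0.0, 1.0]]
semantic_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(social, semantic_aligned) < info_nce(social, semantic_swapped))
```

The loss is low when each post's two views agree with each other more than with the views of other posts, which is the behavior the self-discrimination task rewards.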

Pretrained language models can be effectively stimulated by textual prompts or demonstrations, especially in low-data scenarios. Recent works have focused on automatically searching for discrete or continuous prompts or optimized verbalizers, yet studies of demonstrations are still limited. Concretely, the demonstration examples are crucial to the final performance of prompt-tuning. In this paper, we propose a novel pluggable, extensible, and efficient approach named contrastive demonstration tuning, which is free of demonstration sampling. Furthermore, the proposed approach can be: (i) plugged into any previous prompt-tuning approach; (ii) extended to widespread classification tasks with a large number of categories. Experimental results on 16 datasets illustrate that our method, integrated with the previous approaches LM-BFF and P-tuning, can yield better performance. Code is available at //github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.

Large numbers of false rumors that emerge along with breaking news or trending topics severely obscure the truth. Existing rumor detection approaches achieve promising performance on yesterday's news, since enough data from the same domain has been collected for model training. However, they are poor at detecting rumors about unforeseen events, especially those propagated in different languages, due to the lack of training data and prior knowledge (i.e., low-resource regimes). In this paper, we propose an adversarial contrastive learning framework that detects rumors by adapting the features learned from well-resourced rumor data to low-resource data. Our model explicitly overcomes the restriction of domain and/or language usage via language alignment and a novel supervised contrastive training paradigm. Moreover, we develop an adversarial augmentation mechanism to further enhance the robustness of low-resource rumor representations. Extensive experiments conducted on two low-resource datasets collected from real-world microblog platforms demonstrate that our framework achieves much better performance than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.

Pre-trained models are now widely used in natural language processing tasks. However, in the specific field of text simplification, research on improving pre-trained models is still lacking. In this work, we propose a continued pre-training method for text simplification. Specifically, we propose a new masked language modeling (MLM) mechanism, which does not mask words at random but masks only simple words. The new mechanism lets the model learn to generate simple words. We use a small-scale simple text dataset for continued pre-training and employ two methods to identify simple words in the texts. We choose BERT, a representative pre-trained model, and continue pre-training it using our proposed method. Finally, we obtain SimpleBERT, which surpasses BERT in both lexical simplification and sentence simplification tasks and achieves state-of-the-art results on multiple datasets. Moreover, SimpleBERT can replace BERT in existing simplification models without modification.
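
The masking rule described above, where only simple words are candidates for masking, can be sketched as follows. This is an illustration only: the abstract does not say how simple words are identified or at what rate they are masked, so the `simple_vocab` set, the `mask_prob` value and the function name `mask_simple_words` are all assumptions.

```python
import random


def mask_simple_words(tokens, simple_vocab, mask_prob=0.15,
                      mask_token="[MASK]", seed=None):
    """Mask only tokens found in simple_vocab, each with probability mask_prob.

    Returns the masked sequence and a parallel list of targets:
    the original token at masked positions, None elsewhere.
    Hypothetical helper, not the SimpleBERT implementation.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if tok.lower() in simple_vocab and rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


sentence = ["The", "cat", "perambulated", "home"]
simple = {"the", "cat", "home"}
masked, targets = mask_simple_words(sentence, simple, mask_prob=1.0)
print(masked)
```

Because complex words such as "perambulated" are never masked, the prediction targets seen during continued pre-training are all simple words, which is the bias the method aims to introduce.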

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various kinds. Recently, there has been increasing interest in the adaptive processing of graphs, which has led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study of reproducibility issues in graph classification. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective on the articulated research field of deep learning for graphs.

Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., the dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework formulated as a novel saddle point (min-max) optimization that aims to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations in the high-level semantic space to enhance long-term dependency in the generation process. Extensive experiments on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has the potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.

This paper addresses the difficulty of jointly forecasting multiple financial time series (TS) using deep neural networks (DNN). We investigate whether DNN-based models could forecast these TS more efficiently by learning their representation directly. To this end, we use the dynamic factor graph (DFG), which we enhance with a novel variable-length attention-based mechanism that makes it memory-augmented. Using this mechanism, we propose an unsupervised DNN architecture for multivariate TS forecasting that learns and takes advantage of the relationships between these TS. We test our model on two datasets covering 19 years of investment fund activity. Our experimental results show that the proposed approach significantly outperforms typical DNN-based and statistical models at forecasting the funds' 21-day price trajectories.

Meta-learning extracts the common knowledge acquired from learning different tasks and uses it for unseen tasks. It demonstrates a clear advantage on tasks that have insufficient training data, e.g., few-shot learning. In most meta-learning methods, tasks are implicitly related via the shared model or optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve the performance of few-shot learning. This type of graph is usually free or cheap to obtain but has rarely been explored in previous works. We study prototype-based few-shot classification, in which a prototype is generated for each class, such that a nearest neighbor search among the prototypes produces an accurate classification. We introduce the "Gated Propagation Network (GPN)", which learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism is used to aggregate messages from neighboring classes, and a gate is deployed to choose between the aggregated messages and the message from the class itself. GPN is trained on a sequence of tasks from many-shot to few-shot generated by subgraph sampling. During training, it is able to reuse and update previously obtained prototypes from memory in a life-long learning cycle. In experiments, we vary the training-test discrepancy and the test task generation settings for thorough evaluation. GPN outperforms recent meta-learning methods on two benchmark datasets in all studied cases.
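
The prototype-based classification that GPN builds on can be sketched in a few lines: each class prototype is the mean of its support examples, and a query is assigned to the nearest prototype. This sketch covers only that baseline step and does not implement GPN's message propagation, attention or gating; the function names and the use of squared Euclidean distance are assumptions for illustration.

```python
def class_prototypes(support):
    """Compute one prototype per class as the mean of its support vectors.

    support maps a class label to a non-empty list of equal-length vectors.
    """
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is nearest to the query vector."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(protos, key=lambda label: sqdist(query, protos[label]))


support = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
protos = class_prototypes(support)
print(classify([1.0, 1.0], protos))
```

With only a few support examples per class, the mean can be a noisy prototype, which is the situation in which borrowing information from related classes on a graph is expected to help.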

Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to reach aerial imagery, not only because of the huge variation in the scale, orientation and shape of object instances on the earth's surface, but also because of the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA represents real Earth Vision applications well and is quite challenging.
