
Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect associations in patient-reported, diabetes-related tweets and provide a tool to better understand opinions, feelings and observations shared within the diabetes online community from a causality perspective. Materials and Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect-tweet dataset was manually labeled and used to train 1) a fine-tuned BERTweet model to detect causal sentences containing a causal association, and 2) a CRF model with BERT-based features to extract possible cause-effect associations. Causes and effects were clustered in a semi-supervised approach and visualised in an interactive cause-effect network. Results: Causal sentences were detected with a recall of 68% in an imbalanced dataset. A CRF model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection, with a macro recall of 68%. This led to 96,676 sentences with cause-effect associations. "Diabetes" was identified as the central cluster, followed by "Death" and "Insulin". Insulin-pricing-related causes were frequently associated with "Death". Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single- and multi-word causes and their corresponding effects as expressed in diabetes-related tweets, leveraging BERT-based architectures and visualised as a cause-effect network. Extracting causal associations from real-life, patient-reported outcomes in social media data provides a useful complementary source of information in diabetes research.
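
The abstract does not state the label scheme used by the CRF. Assuming a common BIO scheme with hypothetical labels `C` (cause) and `E` (effect), the sketch below shows how per-token tag predictions could be grouped into single- and multi-word cause and effect spans. The function name `extract_spans` and the tag names are illustrative, not taken from the paper.

```python
from typing import List, Optional, Tuple


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags such as B-C/I-C (cause) and B-E/I-E (effect) into (label, text) spans.

    The C/E label names are hypothetical; the paper does not state its tag set.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def close_span() -> None:
        # Emit the currently open span, if any.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span begins (an I- tag with a new label is treated leniently as B-).
            close_span()
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)  # continue the current span
        else:
            close_span()  # "O" ends any open span
            label, words = None, []
    close_span()
    return spans
```

For example, `extract_spans(["high", "insulin", "prices", "cause", "rationing"], ["B-C", "I-C", "I-C", "O", "B-E"])` groups the tokens into one multi-word cause and one single-word effect.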

Related content

Identifiable

Federated learning · Learning · Processing (programming language) · MoDELS · Language modeling

July 27, 2021

Federated Learning Meets Natural Language Processing: A Survey

Ming Liu,Stella Ho,Mengqi Wang,Longxiang Gao,Yuan Jin,He Zhang

from arXiv, 19 pages

Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g., mobile phones) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both large deep neural networks and language models are trained on huge amounts of data, which often reside on the server side. Since text data largely originates from end users, in this work we look into recent NLP models and techniques that use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including algorithmic challenges, system challenges and privacy issues. We also provide a critical review of existing Federated NLP evaluation methods and tools. Finally, we highlight current research gaps and future directions.
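
As an illustration of the learning framework the survey covers, here is a minimal sketch of the server-side aggregation step of Federated Averaging (FedAvg), a widely used federated algorithm; it is not taken from the survey itself. Each client trains locally and sends only a parameter vector, never its raw text, and the server averages those vectors weighted by each client's number of local examples. Parameters are plain Python lists for illustration, and `fedavg` is a hypothetical helper name.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its local data size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and exactly one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            # A client holding more examples contributes proportionally more.
            global_weights[i] += (n / total) * w
    return global_weights
```

With two clients holding 1 and 3 examples, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`. In a full round, the server would broadcast this vector back to the clients for the next round of local training.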

Correlation coefficient · Mutually independent · Factors of variation · Representation learning · Learning

July 16, 2021

On Disentangled Representations Learned From Correlated Data

Frederik Träuble,Elliot Creager,Niki Kilbertus,Francesco Locatello,Andrea Dittadi,Anirudh Goyal,Bernhard Schölkopf,Stefan Bauer

from arXiv, published at the 38th International Conference on Machine Learning (ICML 2021)

The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
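
To make "correlated factors of variation" concrete, the sketch below samples two ground-truth factors with a chosen correlation `rho` and measures the Pearson correlation between two sequences. The same measurement could be applied to a pair of latent dimensions across a dataset to check whether an induced correlation was absorbed into the representation. This is an illustrative diagnostic under our own assumptions (Gaussian factors, pairwise Pearson correlation), not the paper's evaluation protocol; `correlated_factors` and `pearson` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def correlated_factors(n: int, rho: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Sample n pairs of standard-normal factors whose correlation is rho."""
    rng = random.Random(seed)
    f1: List[float] = []
    f2: List[float] = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # Mixing a with independent noise gives Corr(a, b) = rho and Var(b) = 1.
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        f1.append(a)
        f2.append(b)
    return f1, f2


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value of `pearson` near zero between two latent dimensions is what a disentangled, independent-factor model would aim for. A value tracking `rho` suggests that the correlation present in the data has leaked into the latents.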

Disentangled · Learning · Identifiable · Generalization error · Performer

June 16, 2021

Learning Causal Semantic Representation for Out-of-Distribution Prediction

Chang Liu,Xinwei Sun,Jindong Wang,Haoyue Tang,Tao Li,Tao Qin,Wei Chen,Tie-Yan Liu

from arXiv, Figures for CSG-ind/DA; model selection highlighted; condition and intuition of identifiability; new version of OOD error bound supporting CSG-ind; improved experiment implementation, with shifted-MNIST and ImageCLEF-DA results updated; MDD and BNM baselines added; results on PACS and VLCS datasets added

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on causal reasoning so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is a common and challenging setting. The methods are based on the causal invariance principle, with a novel design for both efficient learning and easy prediction. Theoretically, we prove that under certain conditions, CSG can identify the semantic factor by fitting training data, and this semantic identification guarantees the boundedness of the OOD generalization error and the success of adaptation. Empirical study shows improved OOD performance over prevailing baselines.

Loss function (machine learning) · Sentiment analysis · Learning · Deep learning · Functional

June 22, 2018

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis

Khuong Vo,Dang Pham,Mao Nguyen,Trung Mai,Tho Quan

from arXiv, accepted to MIWAI 2017

The emerging technique of deep learning has been widely applied in many different areas. However, when adopted in a specific domain, this technique should be combined with domain knowledge to improve efficiency and accuracy. In particular, when analyzing the applications of deep learning in sentiment analysis, we found that current approaches suffer from the following drawbacks: (i) existing works have not paid much attention to the importance of different types of sentiment terms, which is an important concept in this area; and (ii) the loss function currently employed does not adequately reflect the degree of error of a sentiment misclassification. To overcome these problems, we propose to combine domain knowledge with deep learning. Our proposal includes using sentiment scores, learnt by regression, to augment training data, and introducing a penalty matrix to enhance the cross-entropy loss function. In experiments, we achieved a significant improvement in classification results.
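
The abstract does not give the exact form of the penalty-matrix loss. One plausible reading, sketched below purely as an assumption, adds the expected misclassification penalty under the model's predicted distribution to the standard cross-entropy. Probability mass placed on a distant class (for example, positive when the true label is negative) then costs more than mass placed on a neighbouring class. The three-class `PENALTY` matrix, the weight `lam` and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical penalty matrix for (negative, neutral, positive), indexed PENALTY[true][pred].
PENALTY = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_cross_entropy(
    logits: Sequence[float],
    target: int,
    penalty: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> float:
    """Cross-entropy plus lam times the expected penalty of the predicted distribution."""
    probs = softmax(logits)
    ce = -math.log(probs[target])
    expected_penalty = sum(penalty[target][k] * p for k, p in enumerate(probs))
    return ce + lam * expected_penalty
```

With `lam=0.0` this reduces to plain cross-entropy. Because the penalty term is a smooth function of the probabilities, it stays differentiable and could be used as a training loss in the same way.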

Sentiment analysis · Engineering · INFORMS · MoDELS · Classification model

May 22, 2018

Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid Approach

Nora Al-Twairesh,Hend Al-Khalifa,AbdulMalik Alsalman,Yousef Al-Ohali

Sentiment analysis in Arabic is a challenging task due to the rich morphology of the language. Moreover, the task is further complicated when applied to Twitter data, which is known to be highly informal and noisy. In this paper, we develop a hybrid method for sentiment analysis of Arabic tweets in a specific Arabic dialect, the Saudi dialect. Several features were engineered and evaluated using a feature backward selection method. Then a hybrid method that combines a corpus-based and a lexicon-based method was developed for several classification models (two-way, three-way, four-way). The best F1-scores for these models were 69.9, 61.63 and 55.07, respectively.
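
The feature backward selection step can be sketched generically as backward elimination. The procedure starts from all features and repeatedly drops the feature whose removal gives the best score. It stops when every possible removal would lower the score. The `score` callable stands in for whatever evaluation the authors used (for example, a cross-validated F1-score); both it and the function name are assumptions.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination: drop features while the score does not decrease."""
    current = frozenset(features)
    best_score = score(current)
    while len(current) > 1:
        # Score every candidate subset that is missing exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, dropped = max(candidates)
        if cand_score < best_score:
            break  # every removal hurts, so keep the current set
        current = current - {dropped}
        best_score = cand_score
    return current, best_score
```

Ties are resolved in favour of the smaller feature set, which is a design choice; a stricter variant would require a strict improvement before dropping a feature.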

Examples · MINE · Metric learning · Manifold · Unsupervised

March 29, 2018

Mining on Manifolds: Metric Learning without Labels

Ahmet Iscen,Giorgos Tolias,Yannis Avrithis,Ondrej Chum

In this work we present a novel unsupervised framework for hard training example mining. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided, e.g., by a pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par with or outperform prior models that are fully or partially supervised.
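
The abstract does not say how manifold similarity is computed. The sketch below uses a simple stand-in, shortest-path (geodesic) distance on a symmetrized k-nearest-neighbour graph, to show the underlying idea of mining from the disagreement. Points that are among the anchor's nearest neighbours geodesically but not in Euclidean space become hard positives. Points that are near in Euclidean space but not geodesically become hard negatives. Function names and parameters (`k_graph`, `k_eval`) are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[int, List[Tuple[int, float]]]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Graph:
    """Connect each point to its k Euclidean nearest neighbours (edges made symmetric)."""
    graph: Graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i].append((j, d))
            graph[j].append((i, d))
    return graph


def geodesic_from(graph: Graph, src: int) -> Dict[int, float]:
    """Dijkstra shortest-path distances; unreachable nodes are simply absent."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def mine(points: Sequence[Sequence[float]], anchor: int, k_graph: int, k_eval: int) -> Tuple[Set[int], Set[int]]:
    """Return (hard positives, hard negatives) for one anchor from Euclidean/geodesic disagreement."""
    geo = geodesic_from(knn_graph(points, k_graph), anchor)
    others = [i for i in range(len(points)) if i != anchor]
    eucl_nn = set(sorted(others, key=lambda j: math.dist(points[anchor], points[j]))[:k_eval])
    geo_nn = set(sorted(others, key=lambda j: geo.get(j, math.inf))[:k_eval])
    return geo_nn - eucl_nn, eucl_nn - geo_nn
```

On two parallel, densely sampled lines, the anchor's hard negative is a point on the other line that is close in Euclidean distance. Its hard positive is a point further along its own line.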

Inference · Test data · Baseline · Learner · Model evaluation

March 2, 2018

Baselines and test data for cross-lingual inference

Željko Agić,Natalie Schluter

from arXiv, to appear at LREC 2018

Recent years have seen a revival of interest in textual entailment, sparked by (i) the emergence of powerful deep neural network learners for natural language processing and (ii) the timely development of large-scale evaluation datasets such as SNLI. Recast as natural language inference, the problem now amounts to detecting the relation between pairs of statements: they either contradict or entail one another, or they are mutually neutral. Current research in natural language inference is effectively exclusive to English. In this paper, we propose to advance research in SNLI-style natural language inference toward multilingual evaluation. To that end, we provide test data for four major languages: Arabic, French, Spanish, and Russian. We experiment with a set of baselines. Our systems are based on cross-lingual word embeddings and machine translation. While our best system scores an average accuracy of just over 75%, we focus largely on enabling further research in multilingual inference.

Word embedding · Similarity · Performer · Better · PubMed

February 1, 2018

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Yanshan Wang,Sijia Liu,Naveed Afzal,Majid Rastegar-Mojarad,Liwei Wang,Feichen Shen,Hongfang Liu

Neural word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications since they provide vector representations of words that capture the semantic properties of words and the linguistic relationships between words. Many biomedical applications use different textual sources to train word embeddings and apply these word embeddings to downstream biomedical applications. However, there has been little work on comprehensively evaluating the word embeddings trained from these resources. In this study, we provide a comprehensive empirical evaluation of word embeddings trained from four different resources, namely clinical notes, biomedical publications, Wikipedia, and news. We perform the evaluation qualitatively and quantitatively. In the qualitative evaluation, we manually inspect the five most similar medical words to a given set of target medical words, and then analyze word embeddings through visualization. The quantitative evaluation falls into two categories: extrinsic and intrinsic evaluation. Based on the evaluation results, we can draw the following conclusions. First, word embeddings trained on EHR and PubMed capture the semantics of medical terms better than GloVe and Google News embeddings and find more relevant similar medical terms. Second, the medical semantic similarity captured by the word embeddings trained on EHR and PubMed is closer to human experts' judgments than that captured by embeddings trained on GloVe and Google News. Third, there is no consistent global ranking of word embedding quality for downstream biomedical NLP applications. However, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on a similar-domain corpus do not necessarily perform better than other word embeddings on downstream biomedical tasks.
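
The qualitative evaluation retrieves the words most similar to a given target word. The sketch below is a minimal version of that lookup. It assumes cosine similarity as the similarity measure and uses made-up three-dimensional vectors in place of real trained embeddings; the function names and the toy vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(embeddings: Dict[str, Sequence[float]], target: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words closest to target by cosine similarity, excluding target itself."""
    query = embeddings[target]
    scored = [(word, cosine(query, vec)) for word, vec in embeddings.items() if word != target]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings would come from EHR, PubMed, GloVe, etc.
TOY = {
    "diabetes": [1.0, 0.9, 0.0],
    "insulin": [0.9, 1.0, 0.1],
    "glucose": [0.8, 0.8, 0.2],
    "football": [0.0, 0.1, 1.0],
}
```

Calling `most_similar(TOY, "diabetes", k=2)` ranks "insulin" and then "glucose" above "football". Comparing such lists across embeddings trained on different corpora is the kind of manual inspection the abstract describes.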

Word embedding · Unsupervised · Supervised · state-of-the-art · Pair

January 30, 2018

Word Translation Without Parallel Data

Alexis Conneau,Guillaume Lample,Marc'Aurelio Ranzato,Ludovic Denoyer,Hervé Jégou

from arXiv, ICLR 2018

State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works very well also for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
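
Aligning two monolingual embedding spaces is often done with an orthogonal linear map. The sketch below shows only the orthogonal Procrustes step. It assumes that a set of paired source and target vectors is already available, for example from a dictionary induced by an earlier unsupervised alignment step; the adversarial stage is not shown. Given paired rows X and Y, the orthogonal W minimizing ||XW − Y||_F is W = UVᵀ, where UΣVᵀ is the SVD of XᵀY. The function name `procrustes` is hypothetical.

```python
import numpy as np


def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F, where row i of src and tgt is a translation pair."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 4))                    # toy "source language" vectors
    q, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # a random orthogonal rotation
    w = procrustes(x, x @ q)                        # toy "target" = rotated source
    print(np.allclose(w, q))
```

Restricting W to be orthogonal preserves distances and dot products within the source space, so monolingual neighbourhoods survive the mapping.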

Topic model · entity · MoDELS · Topic · Sentiment analysis

January 23, 2018

SentiBubbles: Topic Modeling and Sentiment Visualization of Entity-centric Tweets

João Oliveira,Mike Pinto,Pedro Saleiro,Jorge Teixeira

Social media users tend to mention entities when reacting to news events. The main purpose of this work is to create entity-centric aggregations of tweets on a daily basis. By applying topic modeling and sentiment analysis, we create data visualization insights about current events and people's reactions to those events from an entity-centric perspective.
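
The entity-centric daily aggregation can be sketched as grouping tweets by (day, entity) and averaging a per-tweet sentiment score. This sketch assumes that sentiment scores and entity mentions have already been extracted upstream; the topic-modelling step is not shown, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Dict, Iterable, List, Tuple

# (day, entities mentioned in the tweet, sentiment score in [-1, 1]) -- an assumed layout.
Tweet = Tuple[date, List[str], float]


def daily_entity_sentiment(tweets: Iterable[Tweet]) -> Dict[Tuple[date, str], Tuple[int, float]]:
    """Map each (day, entity) pair to (number of tweets, mean sentiment)."""
    sums: DefaultDict[Tuple[date, str], float] = defaultdict(float)
    counts: DefaultDict[Tuple[date, str], int] = defaultdict(int)
    for day, entities, score in tweets:
        for entity in set(entities):  # count each entity at most once per tweet
            sums[(day, entity)] += score
            counts[(day, entity)] += 1
    return {key: (counts[key], sums[key] / counts[key]) for key in counts}
```

A tweet that mentions several entities contributes to each of their daily aggregates. The tweet count per bucket could drive the size of a visual "bubble" and the mean score its colour.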