
Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale, but labelling training data is expensive, complex and potentially harmful. It is therefore desirable that systems be efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification in order to understand how well classifiers trained on one domain or demographic can transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, which contains 28,000 labelled entries split equally across the four domain-demographic pairs. We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics, but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
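
A minimal sketch of the cross-group transfer grid described above: train on one domain-demographic pair, then evaluate on every other pair. The paper fine-tunes language models on 7,000 labelled tweets per group; the toy data and the TF-IDF + logistic-regression stand-in below are ours, used only to keep the illustration short and runnable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-in data: (texts, abuse labels) per group; the real DODO
# splits hold 7,000 labelled tweets each.
GROUPS = {
    "sport-women":    (["you are vile and useless", "great match today"], [1, 0]),
    "sport-men":      (["pathetic excuse for a keeper", "well played, sir"], [1, 0]),
    "politics-women": (["get out of office, you hag", "strong speech yesterday"], [1, 0]),
    "politics-men":   (["corrupt lying crook", "a sensible policy at last"], [1, 0]),
}

# Train on each source group, test on every target group.
for src, (X_tr, y_tr) in GROUPS.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(X_tr, y_tr)
    for tgt, (X_te, y_te) in GROUPS.items():
        f1 = f1_score(y_te, model.predict(X_te), zero_division=0)
        print(f"{src:>15} -> {tgt:<15} F1={f1:.2f}")
```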

Related content

Catastrophic forgetting remains a critical challenge in continual learning, where neural networks struggle to retain prior knowledge while assimilating new information. Most existing studies emphasize mitigating this issue only when encountering new tasks, overlooking the significance of the pre-task phase. We therefore shift attention to the current task's learning stage and present a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance. Specifically, while learning the current task, our framework adaptively creates a flat region around the minimum in the loss landscape. It then finds each parameter's importance to the current task based on its degree of flatness. When adapting the model to a new task, constraints are applied according to this flatness, and a flat space is simultaneously prepared for the impending task. We theoretically demonstrate the consistency between the created and found flatness. In this manner, our framework not only accommodates ample parameter space for learning new tasks but also preserves the knowledge of earlier tasks. Experimental results exhibit C&F's state-of-the-art performance as a standalone continual learning approach and its efficacy as a framework incorporating other methods. Our work is available at https://github.com/Eric8932/Create-and-Find-Flatness.
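
The "create" step is reminiscent of sharpness-aware minimization; below is a minimal PyTorch sketch of one such flat-minimum update, assuming a SAM-style two-step rule. The radius rho, and using the perturbation magnitude as a crude per-parameter flatness signal for the "find" step, are our assumptions, not the paper's exact recipe.

```python
import torch

def flat_step(model, loss_fn, x, y, opt, rho=0.05):
    """One SAM-style update: climb to the worst case within radius rho,
    then descend using the gradient taken there, so the minimum found
    sits in a flat region of the loss landscape."""
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = [rho * g / (norm + 1e-12) for g in grads]
    with torch.no_grad():                      # perturb toward the worst case
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    opt.zero_grad()
    loss_fn(model(x), y).backward()            # gradient at the perturbed point
    with torch.no_grad():                      # undo the perturbation
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    opt.step()                                 # descend with the "flat" gradient
    opt.zero_grad()
    return [e.abs() for e in eps]              # crude per-parameter flatness signal
```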

Recent years have witnessed the success of question answering (QA), especially its potential to serve as a foundation paradigm for tackling diverse NLP tasks. However, obtaining sufficient data to build an effective and stable QA system remains an open problem. To address it, we introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball) that can iteratively generate large-scale, high-quality QA data from a seed set of supervised examples. Specifically, QASnowball consists of three modules: an answer extractor that extracts core phrases from unlabeled documents as candidate answers, a question generator that generates questions conditioned on documents and candidate answers, and a QA data filter that retains only high-quality QA pairs. Moreover, QASnowball can be self-enhanced by reseeding the seed set to fine-tune itself across iterations, leading to continual improvements in generation quality. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the results show that data generated by QASnowball can facilitate QA models: (1) training models on the generated data achieves results comparable to using supervised data, and (2) pre-training on the generated data and fine-tuning on supervised data achieves better performance. Our code and generated data will be released to advance further work.
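
A minimal sketch of the snowball loop, with trivial stand-ins for the three learned modules; the function names, the quality threshold, and the stub logic are ours for illustration.

```python
def extract_answers(doc):
    return [w for w in doc.split() if w.istitle()]        # stub answer extractor

def generate_question(doc, answer):
    return f"What does the passage say about {answer}?"   # stub question generator

def quality_score(doc, question, answer):
    return 1.0 if answer in doc else 0.0                  # stub QA data filter

def fine_tune_modules(data):
    pass  # stub: in the real pipeline the three modules are re-trained here

def snowball(documents, seed_set, iterations=3, threshold=0.9):
    data = list(seed_set)
    for _ in range(iterations):
        fine_tune_modules(data)                # reseed: modules improve each round
        generated = []
        for doc in documents:
            for answer in extract_answers(doc):
                question = generate_question(doc, answer)
                if quality_score(doc, question, answer) >= threshold:
                    generated.append((doc, question, answer))
        data = list(seed_set) + generated      # always keep the supervised seed
    return data

print(len(snowball(["Marie Curie won two Nobel Prizes."], seed_set=[])))
```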

In this work, the distributional properties of the goodness-of-fit term in likelihood-based information criteria are explored. These properties are then leveraged to construct a novel goodness-of-fit test for normal linear regression models that relies on a non-parametric bootstrap. Several simulation studies are performed to investigate the properties and efficacy of the developed procedure, with these studies demonstrating that the bootstrap test offers distinct advantages as compared to other methods of assessing the goodness-of-fit of a normal linear regression model.
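
Since the paper's exact construction isn't reproduced here, the sketch below shows one plausible reading: take the goodness-of-fit term shared by likelihood-based criteria such as AIC and BIC (minus twice the maximized log-likelihood) and compare its observed value against a reference distribution built by a residual-resampling bootstrap of the fitted model. The statistic, resampling scheme, and p-value rule are our assumptions.

```python
import numpy as np

def neg2_loglik(y, X):
    """-2 x maximized Gaussian log-likelihood of an OLS fit: the
    goodness-of-fit term shared by likelihood-based criteria (AIC, BIC)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                 # ML estimate of the error variance
    return n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def bootstrap_gof_test(y, X, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    centred = (y - fitted) - (y - fitted).mean()
    t_obs = neg2_loglik(y, X)
    t_boot = np.empty(B)
    for b in range(B):                         # refit under resampled residuals
        y_star = fitted + rng.choice(centred, size=n, replace=True)
        t_boot[b] = neg2_loglik(y_star, X)
    # two-sided p-value: how extreme is the observed term among the refits?
    return t_obs, 2 * min((t_boot >= t_obs).mean(), (t_boot <= t_obs).mean())
```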

Knowledge is considered an essential resource for organizations, and for organizations to benefit from the knowledge they possess, it needs to be managed effectively. Although practitioners view knowledge sharing and management as important, organizations fail to benefit from their knowledge, leading to problems in cooperation and the loss of valuable knowledge when employees depart. This study aims to identify the hindering factors that prevent individuals from effectively sharing and managing knowledge and to understand how these factors can be eliminated. Empirical data were collected through semi-structured group interviews with 50 individuals working in a large international IT organization. The study confirms a gap between the perceived importance of knowledge management and how little this importance is reflected in practice. Several hindering factors were identified and grouped into personal social topics, organizational social topics, technical topics, environmental topics, and interrelated social and technical topics. The recommendations presented for mitigating these hindering factors focus on improving employees' practices, such as offering training and guidelines to follow. The findings have implications for organizations in knowledge-intensive fields, which can use them to create knowledge sharing and management strategies that improve overall performance.

User-generated texts available on the web and social platforms are often long and semantically challenging, making them difficult to annotate, and obtaining human annotation becomes increasingly difficult as problem domains become more specialized. For example, many health NLP problems require domain experts in the annotation pipeline. It is therefore crucial to develop low-resource NLP solutions for such limited-data problems. In this study, we employ Abstract Meaning Representation (AMR) graphs to model low-resource health NLP tasks sourced from various online health resources and communities. AMRs are well suited to modeling online health texts, as they can represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships between co-referring tokens. AMRs thus improve the ability of pre-trained language models to reason about high-complexity texts. Our experiments show that we can improve performance on six low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings. Our approach is task-agnostic and easy to merge into any standard text classification pipeline. We experimentally validate that AMRs are useful in the modeling of complex texts by analyzing performance through the lens of two textual complexity measures: the Flesch-Kincaid Reading Level and Syntactic Complexity. Our error analysis shows that AMR-infused language models perform better on complex texts and generally show less predictive variance as complexity changes.
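
A minimal sketch of the fusion step: concatenate a pre-trained encoder's sentence embedding with a graph embedding of the input's AMR parse, then classify the result. The dimensions, the two-layer head, and the encoder choices are our assumptions; the paper's architecture may differ.

```python
import torch
import torch.nn as nn

class TextPlusAMRClassifier(nn.Module):
    def __init__(self, text_dim=768, graph_dim=128, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + graph_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb, amr_emb):
        # text_emb: (batch, text_dim), e.g. a BERT [CLS] vector
        # amr_emb:  (batch, graph_dim), e.g. from a graph encoder over the AMR
        return self.head(torch.cat([text_emb, amr_emb], dim=-1))

logits = TextPlusAMRClassifier()(torch.randn(4, 768), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])
```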

Misinformation is a global concern, and limiting its spread is critical for protecting democracy, public health, and consumers. We propose that consumers' own social media post-histories are an underutilized data source for studying what leads them to share links to fake news. In Study 1, we explore how textual cues extracted from post-histories distinguish fake-news sharers from random social media users and others in the misinformation ecosystem. Among other results, we find across two datasets that fake-news sharers use more words related to anger, religion, and power. In Study 2, we show that adding textual cues from post-histories improves the accuracy of models that predict who is likely to share fake news. In Study 3, we provide a preliminary test of two mitigation strategies deduced from Study 1 (activating religious values and reducing anger) and find that they reduce fake-news sharing and sharing more generally. In Study 4, we combine survey responses with users' verified Twitter post-histories and show that using empowering language in a fact-checking browser extension ad increases download intentions. Our research encourages marketers, misinformation scholars, and practitioners to use post-histories to develop theories and test interventions that reduce the spread of misinformation.
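
A minimal sketch of the kind of textual-cue extraction Study 1 relies on: normalized counts of lexicon hits over a user's post-history. The tiny word lists below are illustrative stand-ins, not the dictionaries used in the paper.

```python
import re

LEXICONS = {  # illustrative stand-ins for LIWC-style categories
    "anger":    {"hate", "furious", "outraged", "angry"},
    "religion": {"god", "pray", "faith", "church"},
    "power":    {"control", "dominate", "force", "authority"},
}

def cue_features(post_history):
    words = [w for post in post_history for w in re.findall(r"[a-z']+", post.lower())]
    total = max(len(words), 1)                 # avoid division by zero
    return {name: sum(w in lexicon for w in words) / total
            for name, lexicon in LEXICONS.items()}

print(cue_features(["I hate this, it makes me furious", "pray for better days"]))
```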

Owing to their intelligent, adaptive operation and their ability to provide maximum comfort to occupants, smart buildings are becoming a pioneering area of research. Since these architectures leverage the Internet of Things (IoT), there is a need to monitor different operations (occupancy, humidity, temperature, CO2, etc.) to provide sustainable comfort to occupants. This paper proposes a novel approach for intelligent building-operations monitoring that combines rule-based complex event processing with query-based approaches to dynamically monitor these operations. Siddhi is a complex event processing engine designed to handle multiple sources of event data in real time, processing them against predefined rules using a decision tree. Since streaming data is dynamic in nature, we convert the IoT data into an RDF dataset to keep track of the different operations. The RDF dataset is ingested into Apache Kafka for streaming; for stored data we use GraphDB, from which information is extracted via SPARQL queries. The proposed approach is evaluated by deploying a large number of events through the Siddhi CEP engine and measuring how efficiently they are processed in terms of time. In addition, a risk-estimation scenario is designed to generate alerts for end users whenever any smart-building operation needs immediate attention. The output is visualized and monitored for the end user through a Tableau dashboard.
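
A small sketch of the monitoring flow, assuming a local Kafka broker and a GraphDB repository. The paper implements its rules in the Siddhi CEP engine, so the Python consumer below only mirrors the shape of one alert rule; the topic name, endpoint, RDF vocabulary, and CO2 threshold are all assumptions.

```python
import json
from kafka import KafkaConsumer                    # pip install kafka-python
from SPARQLWrapper import SPARQLWrapper, JSON      # pip install sparqlwrapper

CO2_ALERT_PPM = 1000                               # assumed risk threshold

def stored_average_co2(room):
    """Query GraphDB over SPARQL for a room's average stored CO2 reading."""
    sparql = SPARQLWrapper("http://localhost:7200/repositories/building")
    sparql.setQuery(f"""
        PREFIX sb: <http://example.org/smart-building#>
        SELECT (AVG(?ppm) AS ?avg) WHERE {{
            sb:{room} sb:hasReading ?r . ?r sb:co2ppm ?ppm .
        }}""")
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

# Stream RDF-derived sensor events and alert when a reading crosses the threshold.
consumer = KafkaConsumer("smart-building-events",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda v: json.loads(v.decode()))
for event in consumer:
    reading = event.value
    if reading.get("metric") == "co2" and reading["ppm"] > CO2_ALERT_PPM:
        print(f"ALERT: high CO2 in {reading['room']}: {reading['ppm']} ppm")
```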

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.

Deep neural models in recent years have been successful in almost every field, including extremely complex problems. However, these models are huge in size, with millions (and even billions) of parameters, and thus demand heavy computation and cannot be deployed on edge devices. Moreover, their performance boost depends heavily on abundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called 'Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper surveys KD and S-T learning, which have been actively studied in recent years. First, we explain what KD is and how/why it works. Then, we provide a comprehensive survey of recent KD methods together with S-T frameworks, focusing on vision tasks. In general, we consider the fundamental questions that have been driving this research area and comprehensively summarize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potential and open challenges of existing methods and prospect the future directions of KD and S-T learning.
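
For readers new to the area, the canonical S-T objective (Hinton et al.'s distillation loss) is a useful anchor; below is a standard PyTorch rendering with the usual temperature T and mixing weight alpha.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic KD objective: match the teacher's temperature-softened
    distribution (soft targets) while also fitting the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                # keep gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```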

Image segmentation remains an open problem, especially when the intensities of the objects of interest overlap because of intensity inhomogeneity (also known as a bias field). To segment images with intensity inhomogeneities, a bias-correction-embedded level set model is proposed in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
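
The modelling assumption can be stated compactly; a plausible LaTeX rendering (our notation) of the multiplicative bias model with the bias expanded in K orthogonal primary functions:

```latex
% Observed image = bias field x true image + noise, with the bias
% expanded in K orthogonal primary functions g_k (weights w_k estimated).
I(\mathbf{x}) = b(\mathbf{x})\, J(\mathbf{x}) + n(\mathbf{x}), \qquad
b(\mathbf{x}) = \sum_{k=1}^{K} w_k\, g_k(\mathbf{x}), \qquad
\int_{\Omega} g_i(\mathbf{x})\, g_j(\mathbf{x})\, d\mathbf{x} = \delta_{ij}.
```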
