The outbreak of the SARS-CoV-2 novel coronavirus (COVID-19) has been accompanied by a large amount of misleading and false information about the virus, especially on social media. During the pandemic social media gained special interest as it went on to become an important medium of communication. This made the information being relayed on these platforms especially critical. In our work, we aim to explore the percentage of fake news being spread on Twitter as well as measure the sentiment of the public at the same time. We further study how the sentiment of fear is present among the public. In addition to that we compare the rate of spread of the virus per day with the rate of spread of fake news on Twitter. Our study is useful in establishing the role of Twitter, and social media, during a crisis, and more specifically during crisis management.
Modern code review is a critical and indispensable practice in a pull-request development paradigm that prevails in Open Source Software (OSS) development. Finding a suitable reviewer in projects with massive participants thus becomes an increasingly challenging task. Many reviewer recommendation approaches (recommenders) have been developed to support this task which apply a similar strategy, i.e. modeling the review history first then followed by predicting/recommending a reviewer based on the model. Apparently, the better the model reflects the reality in review history, the higher recommender's performance we may expect. However, one typical scenario in a pull-request development paradigm, i.e. one Pull-Request (PR) (such as a revision or addition submitted by a contributor) may have multiple reviewers and they may impact each other through publicly posted comments, has not been modeled well in existing recommenders. We adopted the hypergraph technique to model this high-order relationship (i.e. one PR with multiple reviewers herein) and developed a new recommender, namely HGRec, which is evaluated by 12 OSS projects with more than 87K PRs, 680K comments in terms of accuracy and recommendation distribution. The results indicate that HGRec outperforms the state-of-the-art recommenders on recommendation accuracy. Besides, among the top three accurate recommenders, HGRec is more likely to recommend a diversity of reviewers, which can help to relieve the core reviewers' workload congestion issue. Moreover, since HGRec is based on hypergraph, which is a natural and interpretable representation to model review history, it is easy to accommodate more types of entities and realistic relationships in modern code review scenarios. As the first attempt, this study reveals the potentials of hypergraph on advancing the pragmatic solutions for code reviewer recommendation.
The wide dissemination of fake news is increasingly threatening both individuals and society. Fake news detection aims to train a model on the past news and detect fake news of the future. Though great efforts have been made, existing fake news detection methods overlooked the unintended entity bias in the real-world data, which seriously influences models' generalization ability to future data. For example, 97\% of news pieces in 2010-2017 containing the entity `Donald Trump' are real in our data, but the percentage falls down to merely 33\% in 2018. This would lead the model trained on the former set to hardly generalize to the latter, as it tends to predict news pieces about `Donald Trump' as real for lower training loss. In this paper, we propose an entity debiasing framework (\textbf{ENDEF}) which generalizes fake news detection models to the future data by mitigating entity bias from a cause-effect perspective. Based on the causal graph among entities, news contents, and news veracity, we separately model the contribution of each cause (entities and contents) during training. In the inference stage, we remove the direct effect of the entities to mitigate entity bias. Extensive offline experiments on the English and Chinese datasets demonstrate that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice. To the best of our knowledge, this is the first work to explicitly improve the generalization ability of fake news detection models to the future data. The code has been released at //github.com/ICTMCG/ENDEF-SIGIR2022.
As the final stage of the multi-stage recommender system (MRS), reranking directly affects users' experience and satisfaction, thus playing a critical role in MRS. Despite the improvement achieved in the existing work, three issues are yet to be solved. First, users' historical behaviors contain rich preference information, such as users' long and short-term interests, but are not fully exploited in reranking. Previous work typically treats items in history equally important, neglecting the dynamic interaction between the history and candidate items. Second, existing reranking models focus on learning interactions at the item level while ignoring the fine-grained feature-level interactions. Lastly, estimating the reranking score on the ordered initial list before reranking may lead to the early scoring problem, thereby yielding suboptimal reranking performance. To address the above issues, we propose a framework named Multi-level Interaction Reranking (MIR). MIR combines low-level cross-item interaction and high-level set-to-list interaction, where we view the candidate items to be reranked as a set and the users' behavior history in chronological order as a list. We design a novel SLAttention structure for modeling the set-to-list interactions with personalized long-short term interests. Moreover, feature-level interactions are incorporated to capture the fine-grained influence among items. We design MIR in such a way that any permutation of the input items would not change the output ranking, and we theoretically prove it. Extensive experiments on three public and proprietary datasets show that MIR significantly outperforms the state-of-the-art models using various ranking and utility metrics.
Automatic text summarization has experienced substantial progress in recent years. With this progress, the question has arisen whether the types of summaries that are typically generated by automatic summarization models align with users' needs. Ter Hoeve et al (2020) answer this question negatively. Amongst others, they recommend focusing on generating summaries with more graphical elements. This is in line with what we know from the psycholinguistics literature about how humans process text. Motivated from these two angles, we propose a new task: summarization with graphical elements, and we verify that these summaries are helpful for a critical mass of people. We collect a high quality human labeled dataset to support research into the task. We present a number of baseline methods that show that the task is interesting and challenging. Hence, with this work we hope to inspire a new line of research within the automatic summarization community.
Although text style transfer has witnessed rapid development in recent years, there is as yet no established standard for evaluation, which is performed using several automatic metrics, lacking the possibility of always resorting to human judgement. We focus on the task of formality transfer, and on the three aspects that are usually evaluated: style strength, content preservation, and fluency. To cast light on how such aspects are assessed by common and new metrics, we run a human-based evaluation and perform a rich correlation analysis. We are then able to offer some recommendations on the use of such metrics in formality transfer, also with an eye to their generalisability (or not) to related tasks.
Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query. These models are believed to be reasoning in some way; however, it has been shown that these same results, or better, can be achieved without considering the claim at all -- only the evidence. This implies that other signals are contained within the examined evidence, and could be based on manipulable factors such as emotion, sentiment, or part-of-speech (POS) frequencies, which are vulnerable to adversarial inputs. We neutralize some of these signals through multiple forms of both neural and non-neural pre-processing and style transfer, and find that this flattening of extraneous indicators can induce the models to actually require both claims and evidence to perform well. We conclude with the construction of a model using emotion vectors built off a lexicon and passed through an "emotional attention" mechanism to appropriately weight certain emotions. We provide quantifiable results that prove our hypothesis that manipulable features are being used for fact-checking.
Proactive dialogue system is able to lead the conversation to a goal topic and has advantaged potential in bargain, persuasion and negotiation. Current corpus-based learning manner limits its practical application in real-world scenarios. To this end, we contribute to advance the study of the proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to the non-cooperative user behavior -- the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining a high user satisfaction are not always converge, because the topics close to the goal and the topics user preferred may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
Emotion plays an important role in detecting fake news online. When leveraging emotional signals, the existing methods focus on exploiting the emotions of news contents that conveyed by the publishers (i.e., publisher emotion). However, fake news is always fabricated to evoke high-arousal or activating emotions of people to spread like a virus, so the emotions of news comments that aroused by the crowd (i.e., social emotion) can not be ignored. Furthermore, it needs to be explored whether there exists a relationship between publisher emotion and social emotion (i.e., dual emotion), and how the dual emotion appears in fake news. In the paper, we propose Dual Emotion Features to mine dual emotion and the relationship between them for fake news detection. And we design a universal paradigm to plug it into any existing detectors as an enhancement. Experimental results on three real-world datasets indicate the effectiveness of the proposed features.
In recent years, disinformation including fake news, has became a global phenomenon due to its explosive growth, particularly on social media. The wide spread of disinformation and fake news can cause detrimental societal effects. Despite the recent progress in detecting disinformation and fake news, it is still non-trivial due to its complexity, diversity, multi-modality, and costs of fact-checking or annotation. The goal of this chapter is to pave the way for appreciating the challenges and advancements via: (1) introducing the types of information disorder on social media and examine their differences and connections; (2) describing important and emerging tasks to combat disinformation for characterization, detection and attribution; and (3) discussing a weak supervision approach to detect disinformation with limited labeled data. We then provide an overview of the chapters in this book that represent the recent advancements in three related parts: (1) user engagements in the dissemination of information disorder; (2) techniques on detecting and mitigating disinformation; and (3) trending issues such as ethics, blockchain, clickbaits, etc. We hope this book to be a convenient entry point for researchers, practitioners, and students to understand the problems and challenges, learn state-of-the-art solutions for their specific needs, and quickly identify new research problems in their domains.