The rise of social networks as the primary means of communication in almost every country in the world has simultaneously triggered an increase in the amount of fake news circulating online. This became particularly evident during the 2016 U.S. elections and even more so with the advent of the COVID-19 pandemic. Several research studies have shown how the effects of fake news dissemination can be mitigated by promoting greater competence through lifelong learning and discussion communities and, more generally, through rigorous training in the scientific method and broad interdisciplinary education. The current pandemic has highlighted the urgent need for models that can describe the growing infodemic of fake news. The resulting slowdown in vaccination campaigns due to misinformation and, more generally, the inability of individuals to discern the reliability of information is posing enormous risks to the governments of many countries. In this research, using the tools of kinetic theory, we describe the interaction between fake news spreading and the competence of individuals through multi-population models in which fake news spreads analogously to an infectious disease, with different impact depending on the level of competence of individuals. The level of competence, in particular, is subject to evolutionary dynamics driven by both social interactions between agents and external learning dynamics. The results show that the model correctly describes the dynamics of fake news diffusion and the important role of competence in its containment.
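To make the modeling idea concrete, the following minimal Python sketch simulates a two-population, SIR-type system in which the rate of sharing fake news depends on the competence class of the agents, and agents migrate from the low- to the high-competence class through learning. The equations, compartments, and all parameter values are illustrative placeholders, not the kinetic model developed in the paper.

import numpy as np
from scipy.integrate import solve_ivp

beta_low, beta_high = 0.6, 0.15    # fake-news "contact" rates by competence class
gamma = 0.2                        # rate at which sharers stop spreading (debunking)
lam = 0.05                         # learning rate: low -> high competence

def rhs(t, y):
    S_l, I_l, R_l, S_h, I_h, R_h = y
    I = I_l + I_h                                  # all agents currently sharing
    dS_l = -beta_low * S_l * I - lam * S_l
    dI_l = beta_low * S_l * I - gamma * I_l - lam * I_l
    dR_l = gamma * I_l - lam * R_l
    dS_h = -beta_high * S_h * I + lam * S_l
    dI_h = beta_high * S_h * I - gamma * I_h + lam * I_l
    dR_h = gamma * I_h + lam * R_l
    return [dS_l, dI_l, dR_l, dS_h, dI_h, dR_h]

y0 = [0.59, 0.01, 0.0, 0.40, 0.0, 0.0]             # one percent initial sharers
sol = solve_ivp(rhs, (0.0, 100.0), y0, max_step=0.5)
print("peak fraction of agents sharing the fake news:",
      round(float((sol.y[1] + sol.y[4]).max()), 3))

Lowering beta_low toward beta_high (i.e., raising overall competence) in this toy system flattens the sharing peak, which is the qualitative effect the abstract attributes to competence.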
Models are used in both Software Engineering (SE) and Artificial Intelligence (AI). SE models may specify the architecture at different levels of abstraction and address different concerns at various stages of the software development life cycle, from early conceptualization and design to verification, implementation, testing, and evolution. AI models, in contrast, may provide smart capabilities, such as prediction and decision-making support. For instance, in Machine Learning (ML), currently the most popular sub-discipline of AI, mathematical models may learn useful patterns in observed data and become capable of making predictions. The goal of this work is to create synergy by bringing the models of these two communities together and proposing a holistic approach to model-driven software development for intelligent systems that require ML. We illustrate how software models can become capable of creating and dealing with ML models in a seamless manner. The main focus is on the domain of the Internet of Things (IoT), where both ML and model-driven SE play a key role. Given the need to take a Cyber-Physical System-of-Systems perspective on the targeted architecture, an integrated design environment for both the SE and ML sub-systems would best support the optimization and overall efficiency of the resulting system's implementation. In particular, we implement the proposed approach, called ML-Quadrat, based on ThingML, and validate it using a case study from the IoT domain, as well as through an empirical user evaluation. It transpires that the proposed approach is not only feasible but may also yield a performance leap in software development for smart Cyber-Physical Systems (CPS) connected to the IoT, as well as an enhanced user experience for the practitioners who use the proposed modeling solution.
Combination and aggregation techniques can significantly improve forecast accuracy. This also holds for probabilistic forecasting methods, where predictive distributions are combined. There are several time-varying and adaptive weighting schemes, such as Bayesian model averaging (BMA). However, the quality of different forecasts may vary not only over time but also within the distribution. For example, some distributional forecasts may be more accurate in the center of the distribution, while others are better at predicting the tails. Therefore, we introduce a new weighting method that considers differences in performance both over time and within the distribution. We discuss pointwise combination algorithms based on aggregation across quantiles that optimize with respect to the continuous ranked probability score (CRPS). After analyzing the theoretical properties of pointwise CRPS learning, we discuss B- and P-spline-based estimation techniques for batch and online learning, based on quantile regression and prediction with expert advice. We prove that the proposed fully adaptive Bernstein online aggregation (BOA) method for pointwise CRPS online learning has optimal convergence properties. These properties are confirmed in simulations and in a probabilistic forecasting study for European emission allowance (EUA) prices.
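As a rough illustration of pointwise (per-quantile) combination, the following Python sketch reweights two synthetic quantile forecasters using exponentially weighted averaging of the cumulative pinball loss. It is a simplified stand-in for the paper's CRPS-based Bernstein online aggregation with spline smoothing; the experts, the learning rate eta, and all data are invented for demonstration.

import numpy as np

rng = np.random.default_rng(0)
quantiles = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
T, K, eta = 500, 2, 0.5             # time steps, number of experts, learning rate

def pinball(y, f, q):
    # quantile (pinball) loss, the building block of the CRPS
    return np.maximum(q * (y - f), (q - 1.0) * (y - f))

weights = np.full((K, len(quantiles)), 1.0 / K)     # one weight per expert and quantile
cum_loss = np.zeros((K, len(quantiles)))

for t in range(T):
    y = rng.normal()
    # expert 0 is too narrow, expert 1 too wide: their relative accuracy
    # differs across quantiles (toy setup)
    experts = np.stack([
        np.quantile(rng.normal(0.0, 0.8, 200), quantiles),
        np.quantile(rng.normal(0.0, 1.2, 200), quantiles),
    ])
    combined = (weights * experts).sum(axis=0)      # the pointwise combination
    cum_loss += pinball(y, experts, quantiles)
    shifted = cum_loss - cum_loss.min(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(-eta * shifted)
    weights /= weights.sum(axis=0, keepdims=True)

print("final per-quantile weights of expert 0:", weights[0].round(2))

Because the weights are indexed by quantile, the combination can favor one expert in the center of the distribution and another in the tails, which is the kind of flexibility the pointwise approach is designed to exploit.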
As of December 2020, the COVID-19 pandemic has infected over 75 million people, making it one of the deadliest pandemics in modern history. This study develops a novel compartmental epidemiological model specific to the SARS-CoV-2 virus and analyzes the effect of common preventative measures such as testing, quarantine, social distancing, and vaccination. By accounting for the most prevalent interventions enacted to minimize the spread of the virus, the model establishes a foundation for future mathematical modeling of COVID-19 and other modern pandemics. Specifically, the model expands on the classic SIR model and introduces separate compartments for individuals who are in the incubation period, asymptomatic, tested positive, quarantined, vaccinated, or deceased. It also accounts for variable infection, testing, and death rates. I first analyze the outbreak in Santa Clara County, California, and later generalize the findings. The results show that, although all preventative measures reduce the spread of COVID-19, quarantine and social distancing mandates reduce the infection rate and are consequently the most effective policies, followed by vaccine distribution and, finally, public testing. Thus, governments should concentrate resources on enforcing quarantine and social distancing policies. In addition, I find mathematical proof that the relatively high asymptomatic rate and long incubation period are driving factors of COVID-19's rapid spread.
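For intuition about how such compartmental extensions behave, the sketch below integrates a heavily simplified SIR-type system with quarantine, vaccination, and death compartments using forward Euler. The compartment structure and parameter values are placeholders chosen for illustration; they do not reproduce the study's model or its calibration to Santa Clara County.

# simplified S-I-Q-R-V-D system, fractions of the population
beta, gamma = 0.35, 0.1      # transmission and recovery rates
q_rate, v_rate = 0.05, 0.01  # daily quarantine and vaccination rates
mu = 0.002                   # disease-induced death rate

S, I, Q, R, V, D = 0.99, 0.01, 0.0, 0.0, 0.0, 0.0
dt, days = 0.1, 300
for _ in range(int(days / dt)):
    new_inf = beta * S * I                      # only free infectious individuals transmit
    dS = -new_inf - v_rate * S
    dI = new_inf - (gamma + q_rate + mu) * I
    dQ = q_rate * I - (gamma + mu) * Q
    dR = gamma * (I + Q)
    dV = v_rate * S
    dD = mu * (I + Q)
    S, I, Q, R, V, D = (S + dS * dt, I + dI * dt, Q + dQ * dt,
                        R + dR * dt, V + dV * dt, D + dD * dt)

print(f"final susceptible={S:.3f}, recovered={R:.3f}, deceased={D:.4f}")

Increasing q_rate in this toy system removes infectious individuals from circulation directly and therefore suppresses transmission faster than increasing v_rate, mirroring the qualitative ranking of interventions reported in the abstract.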
Community detection, a fundamental task in network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly sophisticated, new approaches have been proposed and developed, particularly those that utilize deep learning to convert networked data into low-dimensional representations. Despite all the recent advances, there is still a lack of insightful understanding of the theoretical and methodological underpinnings of community detection, which will be critically important for the future development of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art of the field. Specifically, we provide a comprehensive review of existing community detection methods and introduce a new taxonomy that divides them into two categories, namely probabilistic graphical models and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote the future development of community detection, we release several benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with a discussion of the challenges of the field and suggestions for possible directions of future research.
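As a concrete point of reference for what community detection produces, the snippet below runs a classical, non-deep baseline (greedy modularity maximization from networkx) on the Zachary karate club graph. It is only meant to illustrate the task of partitioning a network into sub-structures, not the probabilistic graphical model or deep learning methods surveyed here.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                      # small benchmark social network
communities = greedy_modularity_communities(G)  # partition maximizing modularity greedily
for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")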
Recent years have witnessed remarkable progress in computational fake news detection. To mitigate its negative impact, we argue that it is critical to understand which user attributes potentially cause users to share fake news. The key to this causal-inference problem is to identify confounders: variables that cause spurious associations between treatments (e.g., user attributes) and the outcome (e.g., user susceptibility). In fake news dissemination, confounders can be characterized by fake news sharing behavior, which inherently relates to user attributes and online activities. Learning such user behavior is typically subject to selection bias toward users who are susceptible to sharing news on social media. Drawing on causal inference theories, we first propose a principled approach to alleviating selection bias in fake news dissemination. We then treat the learned unbiased fake news sharing behavior as a surrogate confounder that can fully capture the causal links between user attributes and user susceptibility. We theoretically and empirically characterize the effectiveness of the proposed approach and find that it could be useful in protecting society from the perils of fake news.
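The following toy example illustrates the general idea of correcting selection bias with inverse propensity weighting: sharing behavior is only observed for a biased subset of users, and reweighting by the inverse of the estimated observation propensity recovers an approximately unbiased estimate. The data, feature semantics, and estimator are hypothetical and serve only to make the bias-correction step tangible; they are not the paper's method.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
user_attrs = rng.normal(size=(n, 3))                 # hypothetical user attributes
# users with a high first attribute are more likely to be observed at all
observe_prob = 1.0 / (1.0 + np.exp(-user_attrs[:, 0]))
observed = rng.random(n) < observe_prob
# sharing also depends on that attribute, so the observed subset is biased
share_prob = 1.0 / (1.0 + np.exp(-(user_attrs[:, 0] - 1.0)))
shared_fake = (rng.random(n) < share_prob).astype(int)

# 1) estimate the propensity of being observed from user attributes
propensity = LogisticRegression().fit(user_attrs, observed).predict_proba(user_attrs)[:, 1]

# 2) naive vs. inverse-propensity-weighted estimate of the sharing rate
naive = shared_fake[observed].mean()
ipw = np.average(shared_fake[observed], weights=1.0 / propensity[observed])
print(f"naive={naive:.3f}  ipw={ipw:.3f}  population truth={shared_fake.mean():.3f}")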
Emotion plays an important role in detecting fake news online. When leveraging emotional signals, existing methods focus on exploiting the emotions of news content conveyed by the publisher (i.e., publisher emotion). However, fake news is often fabricated to evoke high-arousal, activating emotions so that it spreads like a virus, so the emotions aroused in the crowd and expressed in news comments (i.e., social emotion) cannot be ignored. Furthermore, it remains to be explored whether there exists a relationship between publisher emotion and social emotion (i.e., dual emotion) and how this dual emotion manifests in fake news. In this paper, we propose Dual Emotion Features to mine dual emotion and the relationship between its two components for fake news detection, and we design a universal paradigm that plugs these features into any existing detector as an enhancement. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed features.
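A minimal sketch of what dual-emotion-style features could look like in code: one emotion vector for the news content (publisher emotion), one aggregated over the comments (social emotion), and their gap, concatenated so they can be appended to the input of any detector. The tiny lexicon and emotion categories are hypothetical placeholders, not the emotion resources used in the paper.

import numpy as np

# toy lexicon mapping words to emotion categories: 0 = anger, 1 = fear, 2 = joy
LEXICON = {"angry": 0, "shocking": 0, "fear": 1, "scared": 1, "happy": 2, "great": 2}
N_EMOTIONS = 3

def emotion_vector(text):
    v = np.zeros(N_EMOTIONS)
    for token in text.lower().split():
        if token in LEXICON:
            v[LEXICON[token]] += 1.0
    return v / max(v.sum(), 1.0)                    # normalized emotion distribution

def dual_emotion_features(news_text, comments):
    pub = emotion_vector(news_text)                 # publisher emotion
    soc = np.mean([emotion_vector(c) for c in comments], axis=0)   # social emotion
    gap = pub - soc                                 # relationship between the two
    return np.concatenate([pub, soc, gap])          # append to any detector's input

feats = dual_emotion_features(
    "shocking secret cure revealed",
    ["this is shocking", "so scared right now", "angry at everyone"])
print(feats.round(2))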
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves that generalizes both the discrete Fr\'echet and Dynamic Time Warping distances, which are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time over existing methods. Query time complexity is comparable to, or significantly better than, that of existing methods; our algorithms are especially efficient when the length of the curves is bounded.
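The ANN data structures themselves do not fit in an abstract-sized example, but the two curve distances being generalized are easy to state. The sketch below gives plain dynamic-programming implementations of Dynamic Time Warping and the discrete Fr\'echet distance for polygonal curves in Euclidean space, assuming small inputs for which the quadratic algorithms are adequate.

import numpy as np

def _dist(p, q):
    return np.linalg.norm(p - q)                    # Euclidean distance between vertices

def dtw(P, Q):
    # Dynamic Time Warping: sum of matched vertex distances over the best alignment
    n, m = len(P), len(Q)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = _dist(P[i-1], Q[j-1]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    return D[n, m]

def discrete_frechet(P, Q):
    # discrete Frechet: maximum matched vertex distance over the best alignment
    n, m = len(P), len(Q)
    D = np.full((n, m), np.inf)
    D[0, 0] = _dist(P[0], Q[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(D[i-1, j] if i else np.inf,
                       D[i, j-1] if j else np.inf,
                       D[i-1, j-1] if i and j else np.inf)
            D[i, j] = max(prev, _dist(P[i], Q[j]))
    return D[-1, -1]

P = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
Q = np.array([[0.0, 0.2], [1.0, 0.4], [2.0, 0.1], [3.0, 0.0]])
print("DTW:", dtw(P, Q), "discrete Frechet:", discrete_frechet(P, Q))

The only structural difference is the sum in DTW versus the max in the Fr\'echet recursion, which is what makes a common generalization of the two distances natural.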
In recent years, disinformation, including fake news, has become a global phenomenon due to its explosive growth, particularly on social media. The wide spread of disinformation and fake news can cause detrimental societal effects. Despite recent progress in detecting disinformation and fake news, the task remains non-trivial due to its complexity, diversity, multi-modality, and the costs of fact-checking or annotation. The goal of this chapter is to pave the way for appreciating the challenges and advancements by: (1) introducing the types of information disorder on social media and examining their differences and connections; (2) describing important and emerging tasks for combating disinformation, namely characterization, detection, and attribution; and (3) discussing a weak supervision approach to detecting disinformation with limited labeled data. We then provide an overview of the chapters in this book, which represent recent advancements in three related parts: (1) user engagement in the dissemination of information disorder; (2) techniques for detecting and mitigating disinformation; and (3) trending issues such as ethics, blockchain, and clickbait. We hope this book will serve as a convenient entry point for researchers, practitioners, and students to understand the problems and challenges, learn state-of-the-art solutions for their specific needs, and quickly identify new research problems in their domains.
Fake news can significantly misinform people who often rely on online sources and social media for their information. Current research on fake news detection has mostly focused on analyzing fake news content and how it propagates on a network of users. In this paper, we emphasize the detection of fake news by assessing its credibility. By analyzing public fake news data, we show that information on news sources (and authors) can be a strong indicator of credibility. Our findings suggest that an author's history of association with fake news, and the number of authors of a news article, can play a significant role in detecting fake news. Our approach can help improve traditional fake news detection methods, wherein content features are often used to detect fake news.
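To illustrate the kind of credibility signal described above, the toy example below trains a simple classifier on two hypothetical features per article: the authors' historical association with fake news and the number of authors. The feature values, labels, and model choice are invented for demonstration and are not the paper's data or pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

# each row: (fraction of the authors' past articles labelled fake, number of authors)
X = np.array([[0.80, 1], [0.05, 3], [0.60, 1], [0.02, 4], [0.50, 2], [0.01, 5]])
y = np.array([1, 0, 1, 0, 1, 0])                   # 1 = fake, 0 = real

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.70, 1], [0.03, 4]]))         # credibility-based predictions

In a full system, such source and author features would be concatenated with the content-based features that traditional detectors already use.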
Deep learning has been successfully applied to solve various complex problems, ranging from big data analytics to computer vision and human-level control. Deep learning advances, however, have also been employed to create software that can pose threats to privacy, democracy, and national security. One such deep learning-powered application that has recently emerged is the "deepfake". Deepfake algorithms can create fake images and videos that humans cannot distinguish from authentic ones. Technologies that can automatically detect and assess the integrity of digital visual media are therefore indispensable. This paper presents a survey of algorithms used to create deepfakes and, more importantly, of methods proposed in the literature to date to detect them. We present extensive discussions on the challenges, research trends, and directions related to deepfake technologies. By reviewing the background of deepfakes and state-of-the-art deepfake detection methods, this study provides a comprehensive overview of deepfake techniques and facilitates the development of new and more robust methods to deal with increasingly challenging deepfakes.