The growing prominence of social media in public discourse has led to greater scrutiny of the quality of information spreading online and the role that polarization plays in this process. However, studies of information spread on social media platforms like Twitter have been hampered by the difficulty of collecting data about the social graph, specifically the follow links that shape what users see in their timelines. As a proxy for the follower graph, researchers use retweets to construct the diffusion graph, although it is not clear how these proxies affect studies of online information ecosystems. Using a dataset containing a sample of the Twitter follower graph and the tweets posted by users within it, we reconstruct the retweet graph and quantify its impact on measures of exposure. While we find that echo chambers exist in both networks, they are more pronounced in the retweet neighborhood. We compare the polarization of information users see via their follower and retweet graphs to show that retweeted accounts systematically share more politically extreme content and misinformation. This bias cannot be explained by the activity or polarization within users' own social neighborhoods but by the increased attention they pay to more polarized sources. Our results suggest that studies relying on the follower graph underestimate the polarization of information users pay attention to online.
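The follower-versus-retweet comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the leaning scores in [-1, 1], the use of mean absolute leaning as the polarization measure, and the names `exposure_polarization` and `exposure_gap` are all assumptions made for illustration.

```python
from statistics import mean

def exposure_polarization(user, graph, leaning):
    """Mean absolute leaning of the accounts `user` links to in `graph`.

    `graph` maps a user to the accounts they follow (or retweet);
    `leaning` maps an account to a hypothetical score in [-1, 1].
    Returns None when no linked account has a known score.
    """
    sources = [s for s in graph.get(user, ()) if s in leaning]
    if not sources:
        return None
    return mean(abs(leaning[s]) for s in sources)

def exposure_gap(user, follows, retweets, leaning):
    """Retweet-graph polarization minus follower-graph polarization."""
    followed = exposure_polarization(user, follows, leaning)
    retweeted = exposure_polarization(user, retweets, leaning)
    if followed is None or retweeted is None:
        return None
    return retweeted - followed

# Toy data (invented): user "u" follows three accounts but retweets only one.
follows = {"u": ["a", "b", "c"]}
retweets = {"u": ["c"]}
leaning = {"a": 0.1, "b": -0.2, "c": 0.9}
gap = exposure_gap("u", follows, retweets, leaning)
```

A positive gap means the accounts a user retweets are more extreme, on this assumed measure, than the accounts they merely follow.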
Explainable AI was born as a pathway to allow humans to explore and understand the inner workings of complex systems. However, establishing what constitutes an explanation and objectively evaluating explainability are not trivial tasks. This paper presents a new model-agnostic metric to measure the Degree of Explainability of information in an objective way. We exploit a specific theoretical model from Ordinary Language Philosophy, Achinstein's Theory of Explanations, implemented with an algorithm relying on deep language models for knowledge graph extraction and information retrieval. To understand whether this metric can measure explainability, we devised a few experiments and user studies involving more than 190 participants, evaluating two realistic systems for healthcare and finance that use well-known AI technologies, including Artificial Neural Networks and TreeSHAP. The results we obtained are statistically significant (with P values lower than .01), suggesting that our proposed metric for measuring the Degree of Explainability is robust in several scenarios and aligns with concrete expectations.
Influencer marketing has become a very popular tool to reach customers. Despite the rapid growth in influencer videos, there has been little research on the effectiveness of their constituent elements in explaining video engagement. We study YouTube influencers and analyze their unstructured video data across text, audio and images using a novel "interpretable deep learning" framework that accomplishes both goals of prediction and interpretation. Our prediction-based approach analyzes unstructured data and finds that "what is said" in words (text) is more influential than "how it is said" in imagery (images) followed by acoustics (audio). Our interpretation-based approach is implemented after completion of model prediction by analyzing the same source of unstructured data to measure importance attributed to the video elements. We eliminate several spurious and confounded relationships, and identify a smaller subset of theory-based relationships. We uncover novel findings that establish distinct effects for measures of shallow and deep engagement which are based on the dual-system framework of human thinking. Our approach is validated using simulated data, and we discuss the learnings from our findings for influencers and brands.
Image denoising is a typical ill-posed problem due to complex degradation. Leading methods based on normalizing flows have tried to solve this problem with an invertible transformation instead of a deterministic mapping. However, the implicit bijective mapping is not explored well. Inspired by a latent observation that noise tends to appear in the high-frequency part of the image, we propose a fully invertible denoising method that injects the idea of disentangled learning into a general invertible neural network to split noise from the high-frequency part. More specifically, we decompose the noisy image into clean low-frequency and hybrid high-frequency parts with an invertible transformation and then disentangle case-specific noise and high-frequency components in the latent space. In this way, denoising is made tractable by inversely merging the noiseless low- and high-frequency parts. Furthermore, we construct a flexible hierarchical disentangling framework, which aims to decompose most of the low-frequency image information while disentangling noise from the high-frequency part in a coarse-to-fine manner. Extensive experiments on real image denoising, JPEG compression artifact removal, and medical low-dose CT image restoration have demonstrated that the proposed method achieves competitive performance on both quantitative metrics and visual quality, with significantly less computational cost.
Graph representation learning has become a prominent tool for the characterization and understanding of the structure of networks in general and social networks in particular. Typically, these representation learning approaches embed the networks into a low-dimensional space in which the role of each individual can be characterized in terms of their latent position. A major current concern in social networks is the emergence of polarization and filter bubbles promoting a mindset of "us-versus-them" that may be defined by extreme positions believed to ultimately lead to political violence and the erosion of democracy. Such polarized networks are typically characterized in terms of signed links reflecting likes and dislikes. We propose the latent Signed relational Latent dIstance Model (SLIM) utilizing for the first time the Skellam distribution as a likelihood function for signed networks and extend the modeling to the characterization of distinct extreme positions by constraining the embedding space to polytopes. On four real social signed networks of polarization, we demonstrate that the model extracts low-dimensional characterizations that well predict friendships and animosity while providing interpretable visualizations defined by extreme positions when endowing the model with an embedding space restricted to polytopes.
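To make the likelihood named above concrete: the Skellam distribution describes an integer that is the difference of two independent Poisson counts, so it can assign probability to both positive and negative edge weights. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The way the two rates depend on latent distance, the scalar bias terms, and all function names are hypothetical.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), for an
    integer order >= 0, summed from its power series. Forty terms are
    ample for the small arguments used here."""
    half = x / 2.0
    total = 0.0
    for m in range(terms):
        total += half ** (2 * m + order) / (
            math.factorial(m) * math.factorial(m + order)
        )
    return total

def skellam_log_pmf(k, rate_pos, rate_neg):
    """log P(K = k) where K = N_pos - N_neg with independent
    N_pos ~ Poisson(rate_pos) and N_neg ~ Poisson(rate_neg)."""
    return (
        -(rate_pos + rate_neg)
        + 0.5 * k * math.log(rate_pos / rate_neg)
        + math.log(bessel_i(abs(k), 2.0 * math.sqrt(rate_pos * rate_neg)))
    )

def edge_rates(z_i, z_j, bias_pos=0.0, bias_neg=0.0):
    """ASSUMED link between latent positions and rates: the positive rate
    decays with distance and the negative rate grows with it."""
    d = math.dist(z_i, z_j)
    return math.exp(bias_pos - d), math.exp(bias_neg + d)

def signed_edge_log_likelihood(weight, z_i, z_j):
    """Log-likelihood of one signed integer edge weight under the sketch."""
    rate_pos, rate_neg = edge_rates(z_i, z_j)
    return skellam_log_pmf(weight, rate_pos, rate_neg)

# A positive tie is more plausible between nearby nodes than distant ones.
near = signed_edge_log_likelihood(+1, (0.0, 0.0), (0.1, 0.0))
far = signed_edge_log_likelihood(+1, (0.0, 0.0), (3.0, 0.0))
```

Under this assumed parameterization, embeddings fitted by maximizing the summed log-likelihood would pull friendly pairs together and push hostile pairs apart.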
Explainability has become a central requirement for the development, deployment, and adoption of machine learning (ML) models, yet we still do not understand what explanation methods can and cannot do. Several factors, such as data, model prediction, hyperparameters used in training the model, and random initialization, can all influence downstream explanations. While previous work empirically hinted that explanations (E) may have little relationship with the prediction (Y), conclusive studies quantifying this relationship are lacking. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we measure the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors (hyperparameters and the inputs used to generate saliency-based Es or Ys). We discover that Y's relative direct influence on E follows an odd pattern: the influence is higher in the lowest-performing models than in mid-performing models, and it then decreases in the top-performing models. We believe our work is a promising first step towards providing better guidance for practitioners, who can make more informed decisions in utilizing these explanations by knowing what factors are at play and how they relate to their end task.
How do users and communities respond to news from unreliable sources? How does news from these sources change online conversations? In this work, we examine the role of misinformation in sparking political incivility and toxicity on the social media platform Reddit. Utilizing the Google Jigsaw Perspective API to identify toxicity, hate speech, and other forms of incivility, we find that Reddit comments posted in response to misinformation articles are 71.4% more likely to be toxic than comments responding to authentic news articles. Identifying specific instances of commenters' incivility and utilizing an exponential random graph model, we then show that when reacting to a misinformation story, Reddit users are more likely to be toxic to users of different political beliefs than in other settings. Finally, utilizing a zero-inflated negative binomial regression, we identify that as the toxicity of subreddits increases, users are more likely to comment on misinformation-related Reddit submissions.
Our generation has seen an exponential increase in the adoption of digital tools. One area where digital tools have made rapid inroads is digital marketing, where goods and services are extensively promoted through digital advertisements. Following this growth, many companies have leveraged multiple apps and channels to display their brand identities to a significantly larger user base. This has resulted in products worth billions of dollars being sold online. Emails and push notifications have become critical channels for publishing advertising content and proactively engaging with contacts. Several marketing tools provide a user interface for marketers to design email and push messages for digital marketing campaigns. Marketers are also given a predicted open rate for the entered subject line. To enable marketers to generate targeted subject lines, multiple machine learning techniques have been used in the recent past, in particular deep learning techniques, which have proven effective and efficient. However, these techniques require a sizable amount of labelled training data to produce good results. The creation of such datasets, particularly those with subject lines that have a specific theme, is a challenging and time-consuming task. In this paper, we propose a novel Ngram and LSTM-based modeling approach (NLORPM) to predict open rates of entered subject lines that is easier to implement, has low prediction latency, and performs extremely well for sparse data. To assess the performance of this model, we also devise a new metric called 'Error_accuracy@C', which is simple to grasp and fully comprehensible to marketers.
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.
In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Since the interpretation of deep learning models has attracted growing attention in recent years, it would be beneficial to learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs), whose inner mechanism is still not clearly understood. We find that a Finite State Automaton (FSA), which processes sequential data, has a more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn an FSA from an RNN based on two different clustering methods. We first give a graphical illustration of the FSA for humans to follow, which demonstrates its interpretability. From the FSA's point of view, we then analyze how the performance of RNNs is affected by the number of gates, as well as the semantic meaning behind the transitions of numerical hidden states. Our results suggest that RNNs with a simple gated structure, such as the Minimal Gated Unit (MGU), are more desirable, and that the transitions in the FSA leading to a specific classification result are associated with corresponding words that are understandable by humans.
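The general idea of learning an FSA from an RNN by clustering can be sketched as follows. The abstract does not name its two clustering methods, so plain k-means is used here purely as a stand-in, and the majority-vote rule for transitions, the trace format, and all function names are assumptions. The hidden states are taken to have been recorded beforehand from a trained RNN.

```python
from collections import Counter, defaultdict
import math
import random

def nearest(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def centroid(group):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(group) for xs in zip(*group))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (a stand-in clustering method); returns k centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def extract_fsa(traces, k):
    """Build an FSA from recorded RNN runs.

    `traces` is a list of (symbols, hidden_states) pairs, where
    hidden_states holds len(symbols) + 1 tuples (initial state first).
    Each cluster of hidden states becomes one FSA state, and each
    (state, symbol) pair maps to the most frequently observed next state.
    """
    all_states = [h for _, hidden in traces for h in hidden]
    centers = kmeans(all_states, k)
    votes = defaultdict(Counter)
    for symbols, hidden in traces:
        labels = [nearest(h, centers) for h in hidden]
        for t, symbol in enumerate(symbols):
            votes[(labels[t], symbol)][labels[t + 1]] += 1
    transitions = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return centers, transitions

# Toy traces (invented): a 1-D hidden state that flips on "a", holds on "b".
traces = [
    ("abab", [(0.0,), (1.0,), (1.0,), (0.0,), (0.0,)]),
    ("ba", [(0.0,), (0.0,), (1.0,)]),
]
centers, transitions = extract_fsa(traces, k=2)
```

The resulting `transitions` table can be drawn as a state diagram, which is the kind of graphical illustration the abstract refers to.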
We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.