Twitter is becoming an increasingly important platform for disseminating information during crisis situations, such as the COVID-19 pandemic. Effective crisis communication on Twitter can shape the public perception of the crisis, influence adherence to preventative measures, and thus affect public health. Influential accounts are particularly important as they reach large audiences quickly. This study identifies influential German-language accounts from almost 3 million German tweets collected between January and May 2020 by constructing a retweet network and calculating PageRank centrality values. We capture the volatility of crisis communication by structuring the analysis into seven stages based on key events during the pandemic and profile influential accounts into roles. Our analysis shows that news and journalist accounts were influential throughout all phases, while government accounts were particularly important shortly before and after the lockdown was instantiated. We discuss implications for crisis communication during health crises and for analyzing long-term crisis data.
We quantify the impact of COVID-19 vaccination on psychological well-being using information from a large-scale panel survey representative of the UK population. Exploiting exogenous variation in the timing of vaccinations, we find that vaccination increases psychological well-being (GHQ-12) by 0.12 standard deviation, compensating for around one-half of the overall decrease caused by the pandemic. This effect persists for at least two months, and it is associated with a decrease in the perceived likelihood of contracting COVID-19 and higher engagement in social activities. The improvement is 1.5 times larger for mentally distressed individuals, supporting the prioritization of this group in vaccination roll-outs.
As the COVID-19 ravaging through the globe, accurate forecasts of the disease spread is crucial for situational awareness, resource allocation, and public health decision-making. Alternative to the traditional disease surveillance data collected by the United States (US) Centers for Disease Control and Prevention (CDC), big data from Internet such as online search volumes has been previously shown to contain valuable information for tracking infectious disease dynamics. In this study, we evaluate the feasibility of using Internet search volume of relevant queries to track and predict COVID-19 pandemic. We found strong association between COVID-19 death trend and the search volume of symptom-related queries such as "loss of taste". Then, we further develop an influenza-tracking model to predict future 4-week COVID-19 deaths on the US national level, by combining search volume information with COVID-19 time series information. Encouraged by the 20% error reduction on national level comparing to the baseline time series model, we additionally build state-level COVID-19 deaths models, leveraging the cross-state cross-resolution spatial temporal framework that pools information from search volume and COVID-19 reports across states, regions and the nation. These variants of ARGOX are then aggregated in a winner-takes-all ensemble fashion to produce the final state-level 4-week forecasts. Numerical experiments demonstrate that our method steadily outperforms time series baseline models, and achieves the state-of-the-art performance among the publicly available benchmark models. Overall, we show that disease dynamics and relevant public search behaviors co-evolve during the COVID-19 pandemic, and capturing their dependencies while leveraging historical cases/deaths as well as spatial-temporal cross-region information will enable stable and accurate US national and state-level forecasts.
The Covid-19 pandemic has intersected with the opioid epidemic to create a unique public health crisis, with the health and economic consequences of the virus and associated lockdowns compounding pre-existing social and economic stressors associated with rising opioid and heroin use and abuse. In order to better understand these interlocking crises, we use social media data to extract qualitative and quantitative insights on the experiences of opioid users during the Covid-19 pandemic. In particular, we use an unsupervised learning approach to create a rich geolocated data source for public health surveillance and analysis. To do this we first infer the location of 26,000 Reddit users that participate in opiate-related sub-communities (subreddits) by combining named entity recognition, geocoding, density-based clustering, and heuristic methods. Our strategy achieves 63 percent accuracy at state-level location inference on a manually-annotated reference dataset. We then leverage the geospatial nature of our user cohort to answer policy-relevant questions about the impact of varying state-level policy approaches that balance economic versus health concerns during Covid-19. We find that state government strategies that prioritized economic reopening over curtailing the spread of the virus created a markedly different environment and outcomes for opioid users. Our results demonstrate that geospatial social media data can be used for agile monitoring of complex public health crises.
Dark web marketplaces (DWMs) are online platforms that facilitate illicit trade among millions of users generating billions of dollars in annual revenue. Recently, two interview-based studies have suggested that DWMs may also promote the emergence of direct user-to-user (U2U) trading relationships. Here, we quantify the scale of, and thoroughly investigate, U2U trading around DWMs by analysing 31 million Bitcoin transactions among users of 40 DWMs between June 2011 and Jan 2021. We find that half of the DWM users trade through U2U pairs generating a total trading volume greater than DWMs themselves. We then show that hundreds of thousands of DWM users form stable trading pairs that are persistent over time. Users in stable pairs are typically the ones with the largest trading volume on DWMs. Then, we show that new U2U pairs often form while both users are active on the same DWM, suggesting the marketplace may serve as a catalyst for new direct trading relationships. Finally, we reveal that stable U2U pairs tend to survive DWM closures and that they were not affected by COVID-19, indicating that their trading activity is resilient to external shocks. Our work unveils sophisticated patterns of trade emerging in the dark web and highlights the importance of investigating user behaviour beyond the immediate buyer-seller network on a single marketplace.
Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect associations in patient-reported, diabetes-related tweets and provide a tool to better understand opinion, feelings and observations shared within the diabetes online community from a causality perspective. Materials and Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect-tweet dataset was manually labeled and used to train 1) a fine-tuned Bertweet model to detect causal sentences containing a causal association 2) a CRF model with BERT based features to extract possible cause-effect associations. Causes and effects were clustered in a semi-supervised approach and visualised in an interactive cause-effect-network. Results: Causal sentences were detected with a recall of 68% in an imbalanced dataset. A CRF model with BERT based features outperformed a fine-tuned BERT model for cause-effect detection with a macro recall of 68%. This led to 96,676 sentences with cause-effect associations. "Diabetes" was identified as the central cluster followed by "Death" and "Insulin". Insulin pricing related causes were frequently associated with "Death". Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single and multi-word cause and corresponding effect as expressed in diabetes-related tweets leveraging BERT-based architectures and visualised as cause-effect-network. Extracting causal associations on real-life, patient reported outcomes in social media data provides a useful complementary source of information in diabetes research.
Social media platforms have been exploited to disseminate misinformation in recent years. The widespread online misinformation has been shown to affect users' beliefs and is connected to social impact such as polarization. In this work, we focus on misinformation's impact on specific user behavior and aim to understand whether general Twitter users changed their behavior after being exposed to misinformation. We compare the before and after behavior of exposed users to determine whether the frequency of the tweets they posted, or the sentiment of their tweets underwent any significant change. Our results indicate that users overall exhibited statistically significant changes in behavior across some of these metrics. Through language distance analysis, we show that exposed users were already different from baseline users before the exposure. We also study the characteristics of two specific user groups, multi-exposure and extreme change groups, which were potentially highly impacted. Finally, we study if the changes in the behavior of the users after exposure to misinformation tweets vary based on the number of their followers or the number of followers of the tweet authors, and find that their behavioral changes are all similar.
The past decade has witnessed a rapid increase in technology ownership across rural areas of India, signifying the potential for ICT initiatives to empower rural households. In our work, we focus on the web infrastructure of one such ICT - Digital Green that started in 2008. Following a participatory approach for content production, Digital Green disseminates instructional agricultural videos to smallholder farmers via human mediators to improve the adoption of farming practices. Their web-based data tracker, CoCo, captures data related to these processes, storing the attendance and adoption logs of over 2.3 million farmers across three continents and twelve countries. Using this data, we model the components of the Digital Green ecosystem involving the past attendance-adoption behaviours of farmers, the content of the videos screened to them and their demographic features across five states in India. We use statistical tests to identify different factors which distinguish farmers with higher adoption rates to understand why they adopt more than others. Our research finds that farmers with higher adoption rates adopt videos of shorter duration and belong to smaller villages. The co-attendance and co-adoption networks of farmers indicate that they greatly benefit from past adopters of a video from their village and group when it comes to adopting practices from the same video. Following our analysis, we model the adoption of practices from a video as a prediction problem to identify and assist farmers who might face challenges in adoption in each of the five states. We experiment with different model architectures and achieve macro-f1 scores ranging from 79% to 89% using a Random Forest classifier. Finally, we measure the importance of different features using SHAP values and provide implications for improving the adoption rates of nearly a million farmers across five states in India.
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Background: Social media has the capacity to afford the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work, [Crannell et. al.], we have studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs), because they carry relevant indicators, yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter's public streaming API, we compiled over 5.3 million "breast cancer" related tweets spanning September 2016 until mid December 2017. We combined supervised machine learning methods with natural language processing to sift tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts. We investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment needs. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
This project addresses the problem of sentiment analysis in twitter; that is classifying tweets according to the sentiment expressed in them: positive, negative or neutral. Twitter is an online micro-blogging and social-networking platform which allows users to write short status updates of maximum length 140 characters. It is a rapidly expanding service with over 200 million registered users - out of which 100 million are active users and half of them log on twitter on a daily basis - generating nearly 250 million tweets per day. Due to this large amount of usage we hope to achieve a reflection of public sentiment by analysing the sentiments expressed in the tweets. Analysing the public sentiment is important for many applications such as firms trying to find out the response of their products in the market, predicting political elections and predicting socioeconomic phenomena like stock exchange. The aim of this project is to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.