The present study investigates the role of source characteristics, the quality of evidence, and prior beliefs of the topic in adult readers' credibility evaluations of short health-related social media posts. The researchers designed content for the posts concerning five health topics by manipulating the source characteristics (source's expertise, gender, and ethnicity), the accuracy of the claims, and the quality of evidence (research evidence, testimony, consensus, and personal experience) of the posts. After this, accurate and inaccurate social media posts varying in the other manipulated aspects were programmatically generated. The crowdworkers (N = 844) recruited from two platforms were asked to evaluate the credibility of up to ten social media posts, resulting in 8380 evaluations. Before credibility evaluation, participants' prior beliefs on the topics of the posts were assessed. The results showed that prior belief consistency and the source's expertise affected the perceived credibility of the accurate and inaccurate social media posts the most after controlling for the topic of the post and the crowdworking platform. In contrast, the quality of evidence supporting the health claim mattered relatively little. The source's gender and ethnicity did not have any effect. The results are discussed in terms of first- and second-hand evaluation strategies.
Considerable advancements have been made to tackle the misrepresentation of information derived from reference articles in the domains of fact-checking and faithful summarization. However, an unaddressed aspect remains - the identification of social media posts that manipulate information within associated news articles. This task presents a significant challenge, primarily due to the prevalence of personal opinions in such posts. We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance. Additionally, we have developed a simple yet effective basic model that outperforms LLMs significantly on the ManiTweet dataset. Finally, we have conducted an exploratory analysis of human-written tweets, unveiling intriguing connections between manipulation and the domain and factuality of news articles, as well as revealing that manipulated sentences are more likely to encapsulate the main story or consequences of a news outlet.
Introduction: The COVID-19 pandemic highlighted the importance of making epidemiological data and scientific insights easily accessible and explorable for public health agencies, the general public, and researchers. State-of-the-art approaches for sharing data and insights included regularly updated reports and web dashboards. However, they face a trade-off between the simplicity and flexibility of data exploration. With the capabilities of recent large language models (LLMs) such as GPT-4, this trade-off can be overcome. Results: We developed the chatbot "GenSpectrum Chat" (//cov-spectrum.org/chat) which uses GPT-4 as the underlying large language model (LLM) to explore SARS-CoV-2 genomic sequencing data. Out of 500 inputs from real-world users, the chatbot provided a correct answer for 453 prompts; an incorrect answer for 13 prompts, and no answer although the question was within scope for 34 prompts. We also tested the chatbot with inputs from 10 different languages, and despite being provided solely with English instructions and examples, it successfully processed prompts in all tested languages. Conclusion: LLMs enable new ways of interacting with information systems. In the field of public health, GenSpectrum Chat can facilitate the analysis of real-time pathogen genomic data. With our chatbot supporting interactive exploration in different languages, we envision quick and direct access to the latest evidence for policymakers around the world.
Much attention and concern has been raised recently about bias and the use of machine learning algorithms in healthcare, especially as it relates to perpetuating racial discrimination and health disparities. Following an initial system dynamics workshop at the Data for Black Lives II conference hosted at MIT in January of 2019, a group of conference participants interested in building capabilities to use system dynamics to understand complex societal issues convened monthly to explore issues related to racial bias in AI and implications for health disparities through qualitative and simulation modeling. In this paper we present results and insights from the modeling process and highlight the importance of centering the discussion of data and healthcare on people and their experiences with healthcare and science, and recognizing the societal context where the algorithm is operating. Collective memory of community trauma, through deaths attributed to poor healthcare, and negative experiences with healthcare are endogenous drivers of seeking treatment and experiencing effective care, which impact the availability and quality of data for algorithms. These drivers have drastically disparate initial conditions for different racial groups and point to limited impact of focusing solely on improving diagnostic algorithms for achieving better health outcomes for some groups.
Financial distress of municipalities, although comparable to bankruptcy of private companies, has a far more serious impact on the well-being of communities. For this reason, it is essential to detect deficits as soon as possible. Predicting financial distress in municipalities can be a complex task, as it involves understanding a wide range of factors that can affect a municipality's financial health. In this paper, we evaluate machine learning models to predict financial distress in Italian municipalities. Accounting judiciary experts have specialized knowledge and experience in evaluating the financial performance, and they use a range of indicators to make their assessments. By incorporating these indicators in the feature extraction process, we can ensure that the model is taking into account a wide range of information that is relevant to the financial health of municipalities. The results of this study indicate that using machine learning models in combination with the knowledge of accounting judiciary experts can aid in the early detection of financial distress, leading to better outcomes for the communities.
When human listeners try to guess the spatial position of a speech source, they are influenced by the speaker's production level, regardless of the intensity level reaching their ears. Because the perception of distance is a very difficult task, they rely on their own experience, which tells them that a whispering talker is close to them, and that a shouting talker is far away. This study aims to test if similar results could be obtained for prosodic variations produced by a human speaker in an everyday life environment. It consists in a localization task, during which blindfolded subjects had to estimate the incoming voice direction, speaker orientation and distance of a trained female speaker, who uttered single words, following instructions concerning intensity and social-affect to be performed. This protocol was implemented in two experiments. First, a complex pretext task was used in order to distract the subjects from the strange behavior of the speaker. On the contrary, during the second experiment, the subjects were fully aware of the prosodic variations, which allowed them to adapt their perception. Results show the importance of the pretext task, and suggest that the perception of the speaker's orientation can be influenced by voice intensity.
Social determinants of health (SDOH) -- the conditions in which people live, grow, and age -- play a crucial role in a person's health and well-being. There is a large, compelling body of evidence in population health studies showing that a wide range of SDOH is strongly correlated with health outcomes. Yet, a majority of the risk prediction models based on electronic health records (EHR) do not incorporate a comprehensive set of SDOH features as they are often noisy or simply unavailable. Our work links a publicly available EHR database, MIMIC-IV, to well-documented SDOH features. We investigate the impact of such features on common EHR prediction tasks across different patient populations. We find that community-level SDOH features do not improve model performance for a general patient population, but can improve data-limited model fairness for specific subpopulations. We also demonstrate that SDOH features are vital for conducting thorough audits of algorithmic biases beyond protective attributes. We hope the new integrated EHR-SDOH database will enable studies on the relationship between community health and individual outcomes and provide new benchmarks to study algorithmic biases beyond race, gender, and age.
There has been growing interest in using QUIC as a transport protocol for the Internet of Things (IoT). QUIC provides several key advantages over TCP and TLS. Since IoT greatly differs from traditional networks in terms of architecture and resources, IoT specific parameter tuning has proven to be of significance. While RFC 9006 offers a guideline for tuning TCP within IoT, we have not found an equivalent for QUIC. This paper is the first of our knowledge to contribute empirically based insights towards tuning QUIC for IoT. To achieve this, we improved our pure HTTP/3 publish-subscribe architecture and rigorously benchmarked it against an alternative: MQTT-over-QUIC. To investigate the impact of transport layer parameters, we ran both applications on Raspberry Pi Zero hardware and collected 8 distinct metrics, while emulating different network conditions and message payloads. We enumerate the points we experimentally identified (notably, relating to authentication, MAX STREAM messages, and timers) and elaborate on how they can be tuned to improve resource consumption and performance. We also found that our application was preferable for reliable time-sensitive dissemination of information.
Clinical information systems have become large repositories for semi-structured and partly annotated electronic health record data, which have reached a critical mass that makes them interesting for supervised data-driven neural network approaches. We explored automated coding of 50 character long clinical problem list entries using the International Classification of Diseases (ICD-10) and evaluated three different types of network architectures on the top 100 ICD-10 three-digit codes. A fastText baseline reached a macro-averaged F1-score of 0.83, followed by a character-level LSTM with a macro-averaged F1-score of 0.84. The top performing approach used a downstreamed RoBERTa model with a custom language model, yielding a macro-averaged F1-score of 0.88. A neural network activation analysis together with an investigation of the false positives and false negatives unveiled inconsistent manual coding as a main limiting factor.
Large language Models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs face challenges in maintaining factual consistency during reasoning, exhibiting tendencies to condition overlooking, question misinterpretation, and condition hallucination over given problems. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLMs' generated solutions. To detect factual inconsistency, RCoT first asks LLMs to reconstruct the problem based on generated solutions. Then fine-grained comparisons between the original problem and the reconstructed problem expose the factual inconsistency in the original solutions. To rectify the solution, RCoT formulates detected factual inconsistency into fine-grained feedback to guide LLMs in revising solutions. Experimental results demonstrate consistent improvements of RCoT over standard CoT across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore the fine-grained feedback generation methods.
Past work demonstrated that using neural networks, we can extend unfinished music pieces while maintaining the music style of the musician. With recent advancements in large language models and diffusion models, we are now capable of generating comics with an interesting storyline while maintaining the art style of the artist. In this paper, we used ChatGPT to generate storylines and dialogue and then generated the comic using stable diffusion. We introduced a novel way to evaluate AI-generated stories, and we achieved SOTA performance on character fidelity and art style by fine-tuning stable diffusion using LoRA, ControlNet, etc.