This cross-sectional study characterizes users' attitudes towards the face mask requirements introduced by the Russian government as a response to the COVID-19 pandemic. We study how they relate to other users' characteristics such as age, gender, and political attitudes. Our results indicate that men and elder individuals - demographic groups that are most vulnerable to COVID-19 -- underestimate the benefits of wearing face masks. We also discovered that users in opposition to the Russian government highly approve of this anti-COVID-19 measure -- an oppositionist will approve of the face mask requirements with the probability of 0.95. For those who support the Russian government, the odds of approval are merely 0.45.
The approximate uniform sampling of graph realizations with a given degree sequence is an everyday task in several social science, computer science, engineering etc. projects. One approach is using Markov chains. The best available current result about the well-studied switch Markov chain is that it is rapidly mixing on P-stable degree sequences (see DOI:10.1016/j.ejc.2021.103421). The switch Markov chain does not change any degree sequence. However, there are cases where degree intervals are specified rather than a single degree sequence. (A natural scenario where this problem arises is in hypothesis testing on social networks that are only partially observed.) Rechner, Strowick, and M\"uller-Hannemann introduced in 2018 the notion of degree interval Markov chain which uses three (separately well-studied) local operations (switch, hinge-flip and toggle), and employing on degree sequence realizations where any two sequences under scrutiny have very small coordinate-wise distance. Recently Amanatidis and Kleer published a beautiful paper (arXiv:2110.09068), showing that the degree interval Markov chain is rapidly mixing if the sequences are coming from a system of very thin intervals which are centered not far from a regular degree sequence. In this paper we extend substantially their result, showing that the degree interval Markov chain is rapidly mixing if the intervals are centred at P-stable degree sequences.
The COVID-19 pandemic is accompanied by a massive "infodemic" that makes it hard to identify concise and credible information for COVID-19-related questions, like incubation time, infection rates, or the effectiveness of vaccines. As a novel solution, our paper is concerned with designing a question-answering system based on modern technologies from natural language processing to overcome information overload and misinformation in pandemic situations. To carry out our research, we followed a design science research approach and applied Ingwersen's cognitive model of information retrieval interaction to inform our design process from a socio-technical lens. On this basis, we derived prescriptive design knowledge in terms of design requirements and design principles, which we translated into the construction of a prototypical instantiation. Our implementation is based on the comprehensive CORD-19 dataset, and we demonstrate our artifact's usefulness by evaluating its answer quality based on a sample of COVID-19 questions labeled by biomedical experts.
The Coronavirus disease 2019 (COVID-19) outbreak quickly spread around the world, resulting in over 240 million infections and 4 million deaths by Oct 2021. While the virus is spreading from person to person silently, fear has also been spreading around the globe. The COVID-19 information from the Australian Government is convincing but not timely or detailed, and there is much information on social networks with both facts and rumors. As software engineers, we have spontaneously and rapidly constructed a COVID-19 information dashboard aggregating reliable information semi-automatically checked from different sources for providing one-stop information sharing site about the latest status in Australia. Inspired by the John Hopkins University COVID-19 Map, our dashboard contains the case statistics, case distribution, government policy, latest news, with interactive visualization. In this paper, we present a participant's in-person observations in which the authors acted as founders of //covid-19-au.com/ serving more than 830K users with 14M page views since March 2020. According to our first-hand experience, we summarize 9 lessons for developers, researchers and instructors. These lessons may inspire the development, research and teaching in software engineer aspects for coping with similar public crises in the future.
The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19 related papers in PubMed, has accumulated over 180,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation where an article is assigned with up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid (e.g., accounting for ~18% of total uses) and downstream studies such as network generation. However, it has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LITMC-BERT, a transformer-based multi-label classification method in biomedical literature. It uses a shared transformer backbone for all the labels while also captures label-specific features and the correlations between label pairs. We compare LITMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and only requires ~18% of the inference time than the Binary BERT baseline. The related datasets and models are available via //github.com/ncbi/ml-transformer.
With the advent of open source software, a veritable treasure trove of previously proprietary software development data was made available. This opened the field of empirical software engineering research to anyone in academia. Data that is mined from software projects, however, requires extensive processing and needs to be handled with utmost care to ensure valid conclusions. Since the software development practices and tools have changed over two decades, we aim to understand the state-of-the-art research workflows and to highlight potential challenges. We employ a systematic literature review by sampling over one thousand papers from leading conferences and by analyzing the 286 most relevant papers from the perspective of data workflows, methodologies, reproducibility, and tools. We found that an important part of the research workflow involving dataset selection was particularly problematic, which raises questions about the generality of the results in existing literature. Furthermore, we found a considerable number of papers provide little or no reproducibility instructions -- a substantial deficiency for a data-intensive field. In fact, 33% of papers provide no information on how their data was retrieved. Based on these findings, we propose ways to address these shortcomings via existing tools and also provide recommendations to improve research workflows and the reproducibility of research.
Online review systems are the primary means through which many businesses seek to build the brand and spread their messages. Prior research studying the effects of online reviews has been mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely consider the effects of single numerical causes and ignore different effects of multiple aspects -- e.g., Food, Service -- embedded in the textual reviews; they assume the absence of hidden confounders in observational studies, e.g., consumers' personal preferences; and they overlook the indirect effects of numerical causes that can potentially cancel out the effect of textual reviews on business revenue. We thereby propose an alternative perspective to this single-cause-based effect estimation of online reviews: in the presence of hidden confounders, we consider multi-aspect textual reviews, particularly, their total effects on business revenue and direct effects with the numerical cause -- ratings -- being the mediator. We draw on recent advances in machine learning and causal inference to together estimate the hidden confounders and causal effects. We present empirical evaluations using real-world examples to discuss the importance and implications of differentiating the multi-aspect effects in strategizing business operations.
Automatic text summarization has experienced substantial progress in recent years. With this progress, the question has arisen whether the types of summaries that are typically generated by automatic summarization models align with users' needs. Ter Hoeve et al (2020) answer this question negatively. Amongst others, they recommend focusing on generating summaries with more graphical elements. This is in line with what we know from the psycholinguistics literature about how humans process text. Motivated from these two angles, we propose a new task: summarization with graphical elements, and we verify that these summaries are helpful for a critical mass of people. We collect a high quality human labeled dataset to support research into the task. We present a number of baseline methods that show that the task is interesting and challenging. Hence, with this work we hope to inspire a new line of research within the automatic summarization community.
Along with the massive growth of the Internet from the 1990s until now, various innovative technologies have been created to bring users breathtaking experiences with more virtual interactions in cyberspace. Many virtual environments with thousands of services and applications, from social networks to virtual gaming worlds, have been developed with immersive experience and digital transformation, but most are incoherent instead of being integrated into a platform. In this context, metaverse, a term formed by combining meta and universe, has been introduced as a shared virtual world that is fueled by many emerging technologies, such as fifth-generation networks and beyond, virtual reality, and artificial intelligence (AI). Among such technologies, AI has shown the great importance of processing big data to enhance immersive experience and enable human-like intelligence of virtual agents. In this survey, we make a beneficial effort to explore the role of AI in the foundation and development of the metaverse. We first deliver a preliminary of AI, including machine learning algorithms and deep learning architectures, and its role in the metaverse. We then convey a comprehensive investigation of AI-based methods concerning six technical aspects that have potentials for the metaverse: natural language processing, machine vision, blockchain, networking, digital twin, and neural interface, and being potential for the metaverse. Subsequently, several AI-aided applications, such as healthcare, manufacturing, smart cities, and gaming, are studied to be deployed in the virtual worlds. Finally, we conclude the key contribution of this survey and open some future research directions in AI for the metaverse.
Recently, a considerable literature has grown up around the theme of Graph Convolutional Network (GCN). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focusing on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack flexibility needed to model complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advancement of graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with distinctive challenges involved with predicting node importance in KGs. Our method performs an aggregation of importance scores instead of aggregating node embeddings via predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.