With the emergence of a new pandemic worldwide, novel strategies to approach it have also emerged. Several initiatives under the umbrella of "open science" are contributing to tackling this unprecedented situation. In particular, the "R Language and Environment for Statistical Computing" offers an excellent tool and ecosystem for approaches focusing on open science and reproducible results. Hence it is not surprising that, with the onset of the pandemic, a large number of R packages and resources were made available for researchers working on the pandemic. In this paper, we present an R package that allows users to access and analyze worldwide data from publicly available resources. We introduce the covid19.analytics package, focusing on its capabilities and presenting a case study in which we describe how to deploy the "COVID19.ANALYTICS Dashboard Explorer".
The rapid spread of COVID-19 infections on a global level has highlighted the need for accurate, transparent and timely information regarding collective mobility patterns to inform de-escalation strategies, as well as to provide forecasting capacity for re-escalation policies aimed at addressing further waves of the virus. Such information can be extracted using aggregate anonymised data from innovative sources such as mobile positioning data. This paper presents lessons learnt and results of a unique Business-to-Government (B2G) initiative between several Mobile Network Operators in Europe and the European Commission. Mobile positioning data have supported policy makers and practitioners with evidence and data-driven knowledge to understand and predict the spread of the disease, the effectiveness of the containment measures, and their socio-economic impacts, while feeding scenarios at EU scale in a comparable way across countries. The challenges of this data-sharing initiative are not limited to data quality, harmonisation, and comparability across countries, however important they are. Equally essential aspects that need to be addressed from the onset relate to data privacy, security, fundamental rights and commercial sensitivity.
Unlike most bibliometric studies, which focus on publications, we take Big Data research as a case study and introduce a novel bibliometric approach to unfold the status of a given scientific community from an individual-level perspective. We study the academic age, production, and research focus of the community of authors active in Big Data research. Artificial Intelligence (AI) is selected as a reference area for comparative purposes. Results show that "Big Data" is a growing topic with an expanding community of authors, attracting many new authors every year. Compared to AI, Big Data attracts authors with a longer academic age, who can be regarded as having accumulated some publishing experience before entering the community. Despite the highly skewed distribution of productivity among researchers in both communities, Big Data authors have higher values of both research focus and production than those of AI. Considering the community size, overall academic age, and persistence of publishing on the topic, our results support the idea of Big Data as a research topic that is attractive to researchers. We argue that the community-focused indicators proposed in this study could be generalized to investigate the development and dynamics of other research fields and topics.
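The community-focused indicators can be made concrete in a short sketch. This is a toy illustration, assuming academic age is measured as years since an author's first publication, production as the number of on-topic papers, and research focus as the on-topic share of an author's output; the study's exact operationalizations may differ:

```python
from collections import defaultdict

def community_indicators(records, topic, year=2020):
    """Per-author academic age, production, and research focus.

    records: iterable of (author, publication_year, publication_topic) tuples.
    Academic age = years since the author's first publication (any topic).
    Production   = number of the author's papers on the given topic.
    Focus        = share of the author's papers that are on the topic.
    """
    first_year = {}
    total = defaultdict(int)
    on_topic = defaultdict(int)
    for author, pub_year, pub_topic in records:
        first_year[author] = min(first_year.get(author, pub_year), pub_year)
        total[author] += 1
        if pub_topic == topic:
            on_topic[author] += 1
    return {
        a: {
            "academic_age": year - first_year[a],
            "production": on_topic[a],
            "focus": on_topic[a] / total[a],
        }
        for a in total if on_topic[a] > 0
    }

# Hypothetical author records, purely for illustration.
records = [
    ("alice", 2010, "AI"), ("alice", 2018, "Big Data"), ("alice", 2019, "Big Data"),
    ("bob", 2019, "Big Data"),
]
ind = community_indicators(records, "Big Data")
# alice entered Big Data with prior publishing experience (academic_age 10);
# bob is a newcomer (academic_age 1).
```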
Virtual reality (VR) is an emerging technology that enables new applications but also introduces privacy risks. In this paper, we focus on Oculus VR (OVR), the leading platform in the VR space, and we provide the first comprehensive analysis of personal data exposed by OVR apps and the platform itself, from a combined networking and privacy policy perspective. We experimented with the Quest 2 headset, and we tested the most popular VR apps available on the official Oculus and the SideQuest app stores. We developed OVRseen, a methodology and system for collecting, analyzing, and comparing network traffic and privacy policies on OVR. On the networking side, we captured and decrypted network traffic of VR apps, which was previously not possible on OVR, and we extracted data flows (defined as <app, data type, destination>). We found that the OVR ecosystem (compared to the mobile and other app ecosystems) is more centralized, and driven by tracking and analytics, rather than by third-party advertising. We show that the data types exposed by VR apps include personally identifiable information (PII), device information that can be used for fingerprinting, and VR-specific data types. By comparing the data flows found in the network traffic with statements made in the apps' privacy policies, we discovered that approximately 70% of OVR data flows were not properly disclosed. Furthermore, we provided additional context for these data flows, including the purpose, which we extracted from the privacy policies, and observed that 69% were sent for purposes unrelated to the core functionality of apps.
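The data-flow representation lends itself to a simple set comparison against policy disclosures. The sketch below is a simplified illustration: the app name, data types, and destination domains are hypothetical, and the real system extracts disclosures from policy text with far more nuance than exact tuple matching.

```python
def undisclosed_flows(observed, disclosed):
    """Return observed network data flows not covered by the app's policy.

    Each flow follows the paper's definition: an (app, data_type, destination)
    tuple. Here a flow counts as disclosed if the policy set for that app
    contains the same (data_type, destination) pair.
    """
    return [
        flow for flow in observed
        if (flow[1], flow[2]) not in disclosed.get(flow[0], set())
    ]

# Hypothetical captured traffic and policy statements.
observed = [
    ("GameX", "device_id", "analytics.example.com"),
    ("GameX", "email", "tracker.example.net"),
]
disclosed = {"GameX": {("device_id", "analytics.example.com")}}
leaks = undisclosed_flows(observed, disclosed)
# leaks contains only the email flow, which the policy never mentioned.
```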
Since December 2019, the COVID-19 pandemic has caused people around the world to exercise social distancing, which has led to an abrupt rise in the adoption of remote communications for working, socializing, and learning from home. As remote communications will outlast the pandemic, it is crucial to protect users' security and respect their privacy in this unprecedented setting, and that requires a thorough understanding of their behaviors, attitudes, and concerns toward various aspects of remote communications. To this end, we conducted an online study with 220 Prolific participants worldwide. We found that privacy and security are among the most frequently mentioned factors impacting participants' attitudes and comfort levels with conferencing tools and meeting locations. Open-ended responses revealed that most participants lacked autonomy when choosing conferencing tools or using microphone/webcam in their remote meetings, which in several cases contradicted their personal privacy and security preferences. Based on our findings, we distill several recommendations on how employers, educators, and tool developers can inform and empower users to make privacy-protective decisions when engaging in remote communications.
Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also encountered among some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.
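As a toy illustration of the game-theoretic framing, the sketch below computes the Nash-equilibrium coverage of a classic vaccination game, in which each individual weighs the cost of vaccinating against the expected cost of infection. The linear risk function and all parameter values are illustrative assumptions, not the article's model:

```python
def infection_risk(coverage, r0=2.5):
    """Toy model: probability that an unvaccinated individual is infected,
    decreasing linearly in population vaccination coverage."""
    herd = 1 - 1 / r0  # herd-immunity threshold for basic reproduction number r0
    if coverage >= herd:
        return 0.0
    return 1 - coverage / herd

def equilibrium_coverage(cost_vaccine, cost_infection, r0=2.5, steps=10001):
    """Grid-search the Nash-equilibrium coverage: the point where the expected
    cost of infection equals the cost of vaccinating, so that no individual
    can gain by unilaterally switching strategy."""
    best_p, best_gap = 0.0, float("inf")
    for i in range(steps):
        p = i / (steps - 1)
        gap = abs(cost_infection * infection_risk(p, r0) - cost_vaccine)
        if gap < best_gap:
            best_p, best_gap = p, gap
    return best_p

# With R0 = 2.5 the herd-immunity threshold is 0.6, but self-interested
# uptake settles below it, illustrating the free-rider problem that makes
# vaccination programmes a natural target for game-theoretic analysis.
p_star = equilibrium_coverage(cost_vaccine=0.1, cost_infection=1.0)
```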
Temporal evolution of the coronavirus literature over the last thirty years (N=43,769) is analyzed along with its subdomain of SARS-CoV-2 articles (N=27,460) and the subdomain of reviews and meta-analytic studies (N=1,027). (i) The analyses on the subset of SARS-CoV-2 literature identified studies published prior to 2020 that have now proven highly instrumental in the development of various clusters of publications linked to SARS-CoV-2. In particular, the so-called sleeping beauties of the coronavirus literature with an awakening in 2020 were identified, i.e., previously published studies of this literature that had remained relatively unnoticed for several years but gained sudden traction in 2020 in the wake of the SARS-CoV-2 outbreak. (ii) The subset of 2020 SARS-CoV-2 articles is bibliographically distant from the rest of this literature published prior to 2020. Individual articles of the SARS-CoV-2 segment with a bridging role between the two bodies of articles (i.e., before and after 2020) are identifiable. (iii) Furthermore, the degree of bibliographic coupling within the 2020 SARS-CoV-2 cluster is much poorer compared to the cluster of articles published prior to 2020. This could, in part, be explained by the higher diversity of topics that are studied in relation to SARS-CoV-2 compared to the literature of coronaviruses published prior to the SARS-CoV-2 disease. This work demonstrates how scholarly efforts undertaken in peacetime, prior to a disease outbreak, can suddenly play a critical role in the prevention and mitigation of health disasters caused by new diseases.
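Sleeping beauties are commonly quantified with the beauty coefficient of Ke et al. (PNAS, 2015); whether this study uses exactly that formula is an assumption, but the minimal sketch below shows how the idea turns into a number from a paper's yearly citation counts:

```python
def beauty_coefficient(citations):
    """Beauty coefficient B (Ke et al., 2015): sums, year by year, how far the
    citation curve falls below the straight line drawn from the citations in
    the publication year to the citation peak. Papers that lie dormant and
    then spike ("sleeping beauties") score large B."""
    c0 = citations[0]
    t_peak = max(range(len(citations)), key=lambda t: citations[t])
    c_peak = citations[t_peak]
    if t_peak == 0:
        return 0.0  # the paper peaked immediately: no dormancy to measure
    return sum(
        ((c_peak - c0) / t_peak * t + c0 - citations[t]) / max(1, citations[t])
        for t in range(t_peak + 1)
    )

# A paper cited once a year for four years and then 20 times on awakening
# scores far higher than a steadily rising one (illustrative counts).
dormant_then_spike = beauty_coefficient([1, 1, 1, 1, 20])
steady = beauty_coefficient([5, 6, 7, 8, 9])
```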
The World Wide Web is not only one of the most important platforms of communication and information at present, but also an area of growing interest for scientific research. This motivates a great deal of work and many projects that require large amounts of data. However, no existing dataset integrates both the parameters and the visual appearance of Web pages, because collecting one is a costly task in terms of time and effort. With the support of various computer tools and programming scripts, we have created a large dataset of 49,438 Web pages. It consists of visual, textual and numerical data types, includes all countries worldwide, and considers a broad range of topics such as art, entertainment, economy, business, education, government, news, media, science, and environment, covering different cultural characteristics and varied design preferences. In this paper, we describe the process of collecting, debugging and publishing the final product, which is freely available. To demonstrate the usefulness of our dataset, we present a binary classification model for detecting error Web pages and a multi-class Web subject-based categorization, both built with convolutional neural networks.
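A record in such a dataset combines heterogeneous fields per page. The sketch below shows one possible schema and a reproducible train/test split; the field names and the error-page labelling rule are illustrative assumptions, not the dataset's actual format:

```python
import random
from dataclasses import dataclass

@dataclass
class WebPageRecord:
    """One dataset entry combining visual, textual and numerical fields.
    Field names here are hypothetical, not the published schema."""
    url: str
    country: str
    topic: str
    screenshot_path: str  # the visual component
    html_text: str        # the textual component
    http_status: int      # a numerical parameter

    @property
    def is_error(self) -> bool:
        # Binary label for the error-page detection task (assumed rule).
        return self.http_status >= 400

def train_test_split(records, test_fraction=0.2, seed=0):
    """Deterministic shuffle-and-split for model training and evaluation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

pages = [
    WebPageRecord("https://example.org", "US", "news", "img/1.png", "<html>...", 200),
    WebPageRecord("https://example.org/gone", "DE", "science", "img/2.png", "<html>...", 404),
    WebPageRecord("https://example.net", "JP", "art", "img/3.png", "<html>...", 200),
    WebPageRecord("https://example.net/err", "BR", "economy", "img/4.png", "<html>...", 500),
    WebPageRecord("https://example.com", "FR", "education", "img/5.png", "<html>...", 301),
]
train, test = train_test_split(pages)
```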
We present best practices and tools for professionals who support computational and data-intensive (CDI) research projects. The practices resulted from an initiative that brings together national projects and university teams comprising individuals or groups of such professionals. We focus particularly on practices that differ from those in a general software-engineering context. The paper also describes the initiative, the Xpert Network, where participants exchange successes, challenges, and general information about their activities, leading to increased productivity, efficiency, and coordination in the ever-growing community of scientists who use computational and data-intensive research methods.
The COVID-19 pandemic has challenged scientists and policy-makers internationally to develop novel approaches to public health policy. Furthermore, it has also been observed that the prevalence and spread of COVID-19 vary across different spatial, temporal, and demographic dimensions. Despite ramped-up testing, most parts of the globe are still not testing at the required level. Therefore, we utilize self-reported symptom survey data to understand trends in the spread of COVID-19. The aim of this study is to segment populations that are highly susceptible. In order to understand such populations, we perform exploratory data analysis, outbreak prediction, and time-series forecasting using public health and policy datasets. From our studies, we try to predict the likely percentage of the population that tested positive for COVID-19 based on self-reported symptoms. Our findings reaffirm the predictive value of symptoms such as anosmia and ageusia. We forecast the percentage of the population having COVID-19-like illness (CLI) and the percentage testing positive with absolute errors of 0.15% and 1.14%, respectively. These findings could help speed the development of public health policy, particularly in areas with low levels of testing and a greater reliance on self-reported symptoms. Our analysis sheds light on identifying clinical attributes of interest across different demographics. We also provide insights into the effects of various policy enactments on COVID-19 prevalence.
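How an absolute-error figure for such a forecast is obtained can be sketched with a simple one-step-ahead moving-average baseline over a daily CLI-percentage series. The data and the forecaster are illustrative only; the study's actual models are not specified in this summary:

```python
def moving_average_forecast(series, window=7):
    """One-step-ahead moving-average forecast over a daily percentage series:
    the prediction for day t is the mean of the preceding `window` days."""
    return [
        sum(series[t - window:t]) / window
        for t in range(window, len(series))
    ]

def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and observed values."""
    return sum(abs(f, ) if False else abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)

# Hypothetical daily % of survey respondents reporting COVID-19-like illness.
cli = [0.9, 1.0, 1.1, 1.0, 1.2, 1.1, 1.0, 1.05, 1.1, 1.0]
predictions = moving_average_forecast(cli, window=7)
error = mean_absolute_error(predictions, cli[7:])
# `error` is the absolute error (in percentage points), the same metric in
# which the study reports 0.15% and 1.14%.
```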
The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiological imaging using chest radiography. Motivated by this, a number of artificial intelligence (AI) systems based on deep learning have been proposed, with results shown to be quite promising in terms of accuracy in detecting patients infected with COVID-19 using chest radiography images. However, to the best of the authors' knowledge, these developed AI systems have been closed source and unavailable to the research community for deeper understanding and extension, and unavailable for public access and use. Therefore, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest radiography images that is open source and available to the general public. We also describe the chest radiography dataset leveraged to train COVID-Net, which we refer to as COVIDx and which comprises 5941 posteroanterior chest radiography images across 2839 patient cases from two open access data repositories. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening. While by no means a production-ready solution, we hope that the open-access COVID-Net, along with the description of how the open-source COVIDx dataset was constructed, will be leveraged and built upon by researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and to accelerate treatment of those who need it the most.
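The convolutional primitives that a network like this stacks can be sketched in a few lines. This is not COVID-Net's architecture (which uses a tailored, machine-driven design); it is only a pure-Python illustration of the valid 2D convolution and pooling operations at the heart of any deep convolutional classifier:

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the core operation a CNN layer applies:
    slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + ki][j + kj] * kernel[ki][kj]
                for ki in range(kh)
                for kj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_global_avg(feature_map):
    """ReLU non-linearity followed by global average pooling, collapsing a
    feature map into a single score as a classifier's final layers do."""
    activated = [max(0.0, v) for row in feature_map for v in row]
    return sum(activated) / len(activated)
```

A real network learns many such kernels per layer and stacks dozens of layers; frameworks like TensorFlow (which the COVID-Net release targets) implement the same operations, vectorized and differentiable.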