Mathematical models of infectious diseases exhibit robust dynamics, such as a stable endemic equilibrium, a disease-free equilibrium, or convergence of the solutions to periodic epidemic waves. The present work shows that the accuracy of such dynamics can be significantly improved by incorporating both local and global dynamics of the infection in disease models. To demonstrate the improved accuracy, we extended a standard Susceptible-Infected-Recovered (SIR) model by incorporating the global dynamics of the COVID-19 pandemic. The extended SIR model assumes three possibilities for susceptible individuals traveling outside of their community: they can return to the community without any exposure to the infection, they can be exposed and develop symptoms after returning to the community, or they can test positive during the trip and remain quarantined until fully recovered. To examine the predictive accuracy of the extended SIR model, we studied the prevalence of COVID-19 infection in Kansas City, Missouri, as influenced by the global COVID-19 pandemic. Using a two-step model-fitting algorithm, the extended SIR model was parameterized with Kansas City, Missouri COVID-19 data from March to October 2020. The extended SIR model significantly outperformed the standard SIR model and revealed oscillatory behavior with an increasing trend in the number of infected individuals. In conclusion, the analytic and predictive accuracy of disease models can be significantly improved by incorporating the global dynamics of the infection in the models.
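To make the structure of such an extension concrete, the following is a minimal sketch of an SIR model augmented with travel-related inflows and a quarantine compartment; the specific equations, parameter names, and values are illustrative assumptions, not the calibrated model from the study.

```python
# Hypothetical sketch of an SIR model extended with travel-related inflows:
# susceptibles leave the community and a fraction returns infected (-> I)
# or tests positive while away and is quarantined (-> Q) until recovery.
# The exact equations and all parameter values are illustrative assumptions.
from scipy.integrate import solve_ivp

def extended_sir(t, y, beta, gamma, travel_rate, p_exposed, p_quarantined):
    S, I, Q, R = y
    N = S + I + Q + R
    local_inf = beta * S * I / N                    # local transmission
    travel_exp = travel_rate * p_exposed * S        # return infected after travel
    travel_quar = travel_rate * p_quarantined * S   # test positive while away
    dS = -local_inf - travel_exp - travel_quar
    dI = local_inf + travel_exp - gamma * I
    dQ = travel_quar - gamma * Q
    dR = gamma * (I + Q)
    return [dS, dI, dQ, dR]

y0 = [9990, 10, 0, 0]                               # illustrative initial state
params = (0.25, 0.1, 0.01, 0.02, 0.005)             # beta, gamma, travel, p_exp, p_quar
sol = solve_ivp(extended_sir, (0, 200), y0, args=params)
print(sol.y[1].max())                               # peak number of infected
```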
Administering COVID-19 vaccines at a societal scale has been deemed the most appropriate way to defend against the COVID-19 pandemic. This global vaccination drive has naturally fueled Pro-Vaxxers and Anti-Vaxxers strongly expressing their support for and concerns about the vaccines on social media platforms. Understanding this online discourse is crucial for policy makers. It is likely to affect the success of vaccination drives and may even affect the final outcome of our fight against the pandemic. The goal of this work is to improve this understanding through the lens of Twitter discourse data. We first develop a classifier that categorizes users according to their vaccine-related stance with high precision (97%). Using this method, we detect and investigate specific user groups who posted about vaccines in pre-COVID and COVID times. Specifically, we identify the distinct topics these users talk about, and investigate how vaccine-related discourse has changed between pre-COVID and COVID times. Finally, for the first time, we investigate changes in the vaccine-related stances of Twitter users and shed light on potential reasons for such changes. Our dataset and classifier are available at //github.com/sohampoddar26/covid-vax-stance.
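For readers unfamiliar with stance classification, the following is a generic, minimal sketch of a text-based stance classifier (TF-IDF features with logistic regression); it is not the classifier released by the authors, and the tweets and labels are placeholders.

```python
# Generic illustration of a tweet stance classifier (not the authors' model);
# the texts and stance labels below are placeholders only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["vaccines save lives", "i do not trust this vaccine", "got my shot today"]
labels = ["pro", "anti", "pro"]                      # placeholder stance labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["this vaccine was rushed"]))      # predicted stance for a new tweet
```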
COVID-19 has likely been the most disruptive global event the world has experienced since WWII. Our discipline has never experienced such a phenomenon, in which software engineers were abruptly forced to work from home. Nearly every developer adopted new working habits and organizational routines while trying to stay mentally healthy and productive during the lockdowns. We are now starting to realize that some of these new habits and routines may stay with us in the future. It is therefore important to understand how we have worked from home so far. We investigated whether 15 psychological, social, and situational variables, such as quality of social contacts or loneliness, predict software engineers' well-being and productivity in a four-wave longitudinal study spanning over 14 months. Additionally, we tested whether any of these variables changed over time. We found that developers' well-being and quality of social contacts improved between April 2020 and July 2021, while their emotional loneliness decreased. Other variables, such as productivity and boredom, did not change. We further found that developers' stress measured in May 2020 negatively predicted their well-being 14 months later, even after controlling for many other variables. Finally, comparisons between women and men, as well as between developers residing in the UK and the USA, revealed no statistically significant differences but substantial similarities.
In materials science, models are derived to predict emergent material properties (e.g. elasticity, strength, conductivity) and their relations to processing conditions. A major drawback is the calibration of model parameters that depend on processing conditions. Currently, these parameters must be optimized to fit measured data, since their relations to processing conditions (e.g. deformation temperature, strain rate) are not fully understood. We present a new approach that identifies the functional dependency of calibration parameters on processing conditions using genetic programming. We propose two methods (explicit and implicit) to identify these dependencies and generate short, interpretable expressions. The approach is used to extend a physics-based constitutive model for deformation processes. This constitutive model operates with internal material variables, such as a dislocation density, and contains a number of parameters, among them three calibration parameters. The derived expressions extend the constitutive model and replace the calibration parameters, thereby enabling interpolation between various processing parameters. Our results show that the implicit method is computationally more expensive than the explicit approach but also produces significantly better results.
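As a rough sketch of the explicit idea, one can use symbolic regression to express a previously fitted calibration parameter as a short function of processing conditions; the gplearn library, the synthetic data, and the target function below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative only: learn a short expression mapping processing conditions
# (temperature, strain rate) to a previously fitted calibration parameter.
# gplearn usage, data, and the synthetic target function are assumptions.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
T = rng.uniform(900, 1200, 200)            # deformation temperature [K]
eps_dot = rng.uniform(0.01, 10.0, 200)     # strain rate [1/s]
X = np.column_stack([T, eps_dot])
# Pretend these are calibration-parameter values obtained by per-condition fitting.
y = 1e3 * np.exp(-T / 300) * eps_dot**0.2

sr = SymbolicRegressor(population_size=500, generations=20,
                       function_set=('add', 'sub', 'mul', 'div'), random_state=0)
sr.fit(X, y)
print(sr._program)                         # short, interpretable expression
```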
We aim to assess the impact of a pandemic data point on the calibration of a stochastic multi-population mortality projection model and on its resulting projections of future mortality rates. Throughout the paper we focus on the Li & Lee mortality model, which has become a standard for projecting mortality in Belgium and the Netherlands. We calibrate this mortality model on annual deaths and exposures at the level of individual ages. This type of mortality data is typically collected, produced and reported with a significant delay (for some countries, several years) on a platform such as the Human Mortality Database. To enable a timely evaluation of the impact of a pandemic data point, we have to rely on other data sources (e.g. the Short-Term Mortality Fluctuations data series) that swiftly publish weekly mortality data collected in age buckets. To be compliant with the design and calibration strategy of the Li & Lee model, we have to transform the weekly mortality data collected in age buckets into yearly, age-specific observations. Our paper therefore constructs a protocol to ungroup the deaths and exposures registered in age buckets into individual ages. To evaluate the impact of a pandemic shock, like COVID-19 in the year 2020, we weigh this data point in either the calibration or the projection step. Obviously, the more weight we place on this data point, the more impact we observe on future estimated mortality rates and life expectancies. Our paper quantifies this impact and provides actuaries and actuarial associations with a framework to generate scenarios of future mortality under various assessments of the pandemic data point.
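As an illustration of the ungrouping step (not the paper's actual protocol), deaths registered in age buckets can be spread over individual ages in proportion to a reference single-age profile while preserving the bucket totals:

```python
# Illustrative ungrouping of deaths registered in age buckets to single ages,
# proportional to a reference single-age profile (e.g. pre-pandemic deaths).
# Bucket boundaries, values, and the proportional rule are assumptions.
import numpy as np

buckets = [(0, 14), (15, 64), (65, 84), (85, 99)]        # inclusive age buckets
bucket_deaths = np.array([120.0, 5400.0, 21000.0, 9800.0])

ages = np.arange(0, 100)
reference = np.exp(0.09 * ages)                          # stylized reference age profile

single_age_deaths = np.zeros_like(ages, dtype=float)
for (lo, hi), d in zip(buckets, bucket_deaths):
    mask = (ages >= lo) & (ages <= hi)
    weights = reference[mask] / reference[mask].sum()    # shares within the bucket
    single_age_deaths[mask] = d * weights                # preserves the bucket total

assert np.isclose(single_age_deaths.sum(), bucket_deaths.sum())
```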
Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and investigated the impact of the proportional odds assumption on calibration and overfitting. We studied the multinomial, cumulative, adjacent category, continuation ratio, and stereotype logit/logistic models. To assess calibration, we investigated calibration intercepts and slopes, calibration plots, and the estimated calibration index. Using large-sample simulations, we studied the performance of models for risk estimation under various conditions, assuming that the true model has either a multinomial logistic form or a cumulative logit proportional odds form. Small-sample simulations were used to compare the tendency for overfitting between models. As a case study, we developed models to diagnose the degree of coronary artery disease (five categories) in symptomatic patients. When the true model was multinomial logistic, proportional odds models often yielded poor risk estimates, with calibration slopes deviating considerably from unity even on large model development datasets. The stereotype logistic model improved the calibration slope, but still provided biased risk estimates for individual patients. When the true model had a cumulative logit proportional odds form, multinomial logistic regression provided biased risk estimates, although these biases were modest. Non-proportional odds models require more parameters to be estimated from the data and hence suffered more from overfitting. Despite larger sample size requirements, we generally recommend multinomial logistic regression for risk prediction modeling of discrete ordinal outcomes.
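As a minimal illustration of how calibration intercepts and slopes can be assessed for a multi-category outcome, one can refit the observed outcomes on the model's predicted log-odds (relative to a reference category) with a multinomial logistic recalibration; the simulated data and the use of statsmodels below are assumptions, not the study's exact procedure.

```python
# Illustrative calibration check for a multi-category outcome: regress observed
# outcomes on the model's predicted log-odds (vs. category 0); slopes near 1 on
# each category's own log-odds indicate good calibration. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 2000, 3                                       # n patients, k outcome categories
eta = rng.normal(size=(n, k - 1))                    # predicted log-odds vs. category 0
probs = np.column_stack([np.ones(n), np.exp(eta)])
probs /= probs.sum(axis=1, keepdims=True)
y = np.array([rng.choice(k, p=p) for p in probs])    # outcomes drawn from those risks

X = sm.add_constant(eta)
recal = sm.MNLogit(y, X).fit(disp=False)
print(recal.params)                                  # intercepts ~0, matching slopes ~1
```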
In December 2019, a novel coronavirus emerged, causing the disease COVID-19, which has led to an enormous number of casualties to date. The battle with the novel coronavirus is the most baffling and horrifying since the Spanish Flu of 1918. While front-line doctors and medical researchers have made significant progress in controlling the spread of the highly contagious virus, technology has also proved its significance in the battle. Moreover, Artificial Intelligence has been adopted in many medical applications to diagnose diseases that baffle even experienced doctors. Therefore, this survey paper explores the proposed methodologies that can aid doctors and researchers in early and inexpensive diagnosis of the disease. Most developing countries have difficulties carrying out tests in the conventional manner, but Machine and Deep Learning offer a viable alternative. On the other hand, access to different types of medical images has motivated researchers. As a result, a mammoth number of techniques have been proposed. This paper first details the background of conventional methods in the Artificial Intelligence domain. Following that, we gather the commonly used datasets and their use cases to date. In addition, we also show the percentage of researchers adopting Machine Learning over Deep Learning, providing a thorough analysis of this scenario. Lastly, in the research challenges, we elaborate on the problems faced in COVID-19 research and offer our perspective on addressing them to help build a bright and healthy environment.
This paper presents a model for COVID-19 in Mexico City. The data analyzed cover the period from the appearance of the first case in Mexico until July 2021. In this first approximation, the states considered were Susceptible, Infected, Hospitalized, Intensive Care Unit, Intubated, and Dead. As a consequence of the lack of coronavirus testing, the numbers of infected and dead people are underestimated, although the results obtained give a good approximation of the evolution of the pandemic in Mexico City. The model is based on a discrete-time Markov chain using data provided by the Mexican government; the main objective is to estimate the transition probabilities from one state to another for the Mexico City case.
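A minimal sketch of how the transition probabilities of such a discrete-time Markov chain can be estimated from observed state sequences, by counting transitions and row-normalizing; the state coding and example trajectories below are placeholders, not the Mexico City data.

```python
# Illustrative estimation of a discrete-time Markov chain's transition matrix
# from observed state sequences (count i -> j transitions, then row-normalize).
# States and example trajectories are placeholders, not the Mexico City data.
import numpy as np

states = ["S", "I", "H", "ICU", "Intubated", "D"]
idx = {s: i for i, s in enumerate(states)}

trajectories = [
    ["S", "I", "H", "ICU", "D"],
    ["S", "I", "I", "H", "H"],
    ["S", "S", "I", "I", "I"],
]

counts = np.zeros((len(states), len(states)))
for traj in trajectories:
    for a, b in zip(traj[:-1], traj[1:]):
        counts[idx[a], idx[b]] += 1

row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
print(np.round(P, 2))   # estimated one-step transition probabilities
```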
As of December 2020, the COVID-19 pandemic had infected over 75 million people, making it one of the deadliest pandemics in modern history. This study develops a novel compartmental epidemiological model specific to the SARS-CoV-2 virus and analyzes the effects of common preventative measures such as testing, quarantine, social distancing, and vaccination. By accounting for the most prevalent interventions that have been enacted to minimize the spread of the virus, the model establishes a strong foundation for future mathematical modeling of COVID-19 and other modern pandemics. Specifically, the model expands on the classic SIR model and introduces separate compartments for individuals who are in the incubation period, asymptomatic, tested positive, quarantined, vaccinated, or deceased. It also accounts for variable infection, testing, and death rates. I first analyze the outbreak in Santa Clara County, California, and later generalize the findings. The results show that, although all preventative measures reduce the spread of COVID-19, quarantine and social distancing mandates reduce the infection rate the most and are consequently the most effective policies, followed by vaccine distribution and, finally, public testing. Thus, governments should concentrate resources on enforcing quarantine and social distancing policies. In addition, I find mathematical proof that the relatively high asymptomatic rate and long incubation period are driving factors of COVID-19's rapid spread.
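As a toy illustration of comparing interventions in a compartmental framework (not the paper's full model), a simple discrete-time simulation with a quarantine compartment shows how a higher quarantine rate lowers the infection peak; all compartments and parameters are assumptions.

```python
# Toy discrete-time SIQR-style simulation comparing two quarantine rates;
# compartments, parameter values, and time step are illustrative assumptions.
def peak_infected(beta=0.3, gamma=0.1, q_rate=0.05, days=300, n=1_000_000, i0=100):
    S, I, Q, R = n - i0, i0, 0.0, 0.0
    peak = I
    for _ in range(days):
        new_inf = beta * S * I / n           # new infections per day
        new_quar = q_rate * I                # infected moved into quarantine
        new_rec = gamma * (I + Q)            # recoveries from I and Q
        S -= new_inf
        I += new_inf - new_quar - gamma * I
        Q += new_quar - gamma * Q
        R += new_rec
        peak = max(peak, I)
    return peak

# stricter quarantine -> lower infection peak
print(peak_infected(q_rate=0.02), peak_infected(q_rate=0.20))
```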
We propose a new method of estimation in topic models that is not a variation on existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than current ones, even though it starts with the computational and theoretical disadvantage of not knowing the correct number of topics K, while the competing methods are provided with the correct value in our simulations.
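The paper's estimator of K is not reproduced here; as a much cruder baseline for comparison, a common practice is to scan candidate values of K and score a held-out document-term matrix, e.g. via scikit-learn's LDA perplexity, as in the toy sketch below (data are synthetic counts).

```python
# Generic baseline (not the paper's estimator): choose K by scanning values
# and scoring held-out perplexity with scikit-learn's LDA on toy count data.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 50))            # toy document-term counts (n x p)
train, held_out = X[:150], X[150:]

scores = {}
for K in (2, 5, 10, 20):
    lda = LatentDirichletAllocation(n_components=K, random_state=0).fit(train)
    scores[K] = lda.perplexity(held_out)        # lower is better
print(min(scores, key=scores.get), scores)
```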
We consider the task of learning the parameters of a {\em single} component of a mixture model for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than those of solving the original problem, in which one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared with existing moment-based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real datasets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
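As a toy illustration of what side information about a single component might look like (this is not the paper's matrix-based algorithm), consider a noisy membership indicator for the target component of a Gaussian mixture, used to weight samples when estimating only that component's mean.

```python
# Toy illustration of side information in a Gaussian mixture (not the paper's
# method): a noisy membership indicator for the target component is used as a
# sample weight to estimate that one component's mean without fitting the mixture.
import numpy as np

rng = np.random.default_rng(2)
z = rng.integers(0, 3, size=3000)                        # true labels, 3 components
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = means[z] + rng.normal(size=(3000, 2))

# Side information: indicator of "z == 0" that is correct only 80% of the time.
side = np.where(rng.random(3000) < 0.8, z == 0, rng.random(3000) < 1 / 3).astype(float)

est_mean = (side[:, None] * X).sum(axis=0) / side.sum()  # weighted mean estimate
print(est_mean)                                          # biased toward, but near, means[0]
```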