The Infection Fatality Rate (IFR) of COVID-19 is difficult to estimate because the number of infections is unknown and there is a lag between each infection and the potential subsequent death. We introduce a new approach for estimating the IFR by first estimating the entire sequence of daily infections. Unlike prior approaches, we incorporate existing data on the number of daily COVID-19 tests into our estimation; knowing the test rates helps us estimate the ratio between the number of cases and the number of infections. Also unlike prior approaches, rather than determining a constant lag from studying a group of patients, we treat the lag as a random variable, whose parameters we determine empirically by fitting our infection sequence to the sequence of deaths. Our approach allows us to narrow our estimation to smaller time intervals in order to observe how the IFR changes over time. We analyze a 250-day period starting on March 1, 2020. We estimate that the IFR in the U.S. decreases from a high of $0.68\%$ down to $0.24\%$ over the course of this time period. We also provide IFR and lag estimates for Italy, Denmark, and the Netherlands, all of which also exhibit decreasing IFRs, but to different degrees.
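As a rough illustration of the fitting step described above, a minimal sketch might look as follows. It uses synthetic data, assumes a gamma-distributed infection-to-death lag, and fits the IFR and lag parameters by least squares; the real pipeline (including the test-rate-based infection estimation) is more involved.

```python
# Minimal sketch: model deaths as the infection series convolved with a
# random lag (gamma-distributed here, an assumed choice) and scaled by the
# IFR, then fit the IFR and lag parameters against the observed deaths.
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(0)
days = 250
infections = 1e4 * np.exp(-((np.arange(days) - 60) / 40.0) ** 2)   # toy infection curve
true_pmf = np.diff(gamma.cdf(np.arange(days + 1), a=6.0, scale=3.5))
deaths_obs = rng.poisson(0.005 * np.convolve(infections, true_pmf)[:days])  # toy deaths

def predicted_deaths(params, infections):
    ifr, shape, scale = params
    # discretized lag distribution: P(lag = d) for d = 0, 1, ..., days - 1
    lag_pmf = np.diff(gamma.cdf(np.arange(len(infections) + 1), a=shape, scale=scale))
    return ifr * np.convolve(infections, lag_pmf)[:len(infections)]

def loss(params):
    if np.any(np.asarray(params) <= 0):
        return np.inf
    return np.sum((predicted_deaths(params, infections) - deaths_obs) ** 2)

fit = minimize(loss, x0=[0.01, 5.0, 4.0], method="Nelder-Mead")
ifr_hat, shape_hat, scale_hat = fit.x
print(f"estimated IFR: {ifr_hat:.2%}, mean lag: {shape_hat * scale_hat:.1f} days")
```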
We propose a procedure to compare $K$ copulas simultaneously, with $K \geq 2$. The $K$ observed populations can be paired. The test statistic is based on the differences between orthogonal projection coefficients associated with the copula densities, which we call {\it copula coefficients}. The procedure is data-driven, and the test statistic has an asymptotic chi-square distribution under the null. We illustrate our procedure via numerical studies and through two real datasets. Finally, a clustering algorithm is derived from the $K$-sample test, and its performance is illustrated in a simulation experiment.
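A hedged two-sample ($K = 2$) sketch of the underlying idea follows: estimate the copula coefficients as empirical projections of the copula density onto a tensor Legendre basis evaluated at pseudo-observations, then compare the two samples with a chi-square-type statistic. The basis choice, the standardization (which ignores cross-covariances), and the absence of data-driven coefficient selection are simplifying assumptions of this illustration, not the paper's procedure.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import chi2, rankdata

def pseudo_obs(x):
    # componentwise ranks rescaled to (0, 1)
    return np.column_stack([rankdata(col) / (len(col) + 1) for col in x.T])

def copula_features(u, max_deg=3):
    # theta_{jk} = E[e_j(U) e_k(V)] with e_k(t) = sqrt(2k+1) * Legendre_k(2t - 1)
    e = [np.sqrt(2 * k + 1) * eval_legendre(k, 2 * u - 1) for k in range(1, max_deg + 1)]
    return np.column_stack([e[j][:, 0] * e[k][:, 1]
                            for j in range(max_deg) for k in range(max_deg)])

def two_sample_stat(x1, x2, max_deg=3):
    f1, f2 = copula_features(pseudo_obs(x1), max_deg), copula_features(pseudo_obs(x2), max_deg)
    diff = f1.mean(axis=0) - f2.mean(axis=0)
    var = f1.var(axis=0) / len(x1) + f2.var(axis=0) / len(x2)
    stat = np.sum(diff ** 2 / var)            # ignores cross-covariances (simplification)
    return stat, chi2.sf(stat, df=len(diff))

rng = np.random.default_rng(1)
x1 = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)
x2 = rng.multivariate_normal([0, 0], [[1, 0.0], [0.0, 1]], size=500)
print(two_sample_stat(x1, x2))   # small p-value: different dependence structures
```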
To answer the question of "Does everybody...?" in the context of performance on cognitive tasks, Haaf and Rouder (2017) developed a class of hierarchical Bayesian mixed models with varying levels of constraint on the individual effects. The models are then compared via Bayes factors, telling us which model best predicts the observed data. One common criticism of their method is that the observed data are assumed to be drawn from a normal distribution. However, for most cognitive tasks the primary measure of performance is a response time, whose distribution is well known not to be normal. In this technical note, I investigate the assumption of normality for two datasets in numerical cognition. Specifically, I show that using a shifted lognormal model for the response times does not change the overall pattern of inference. However, because the model-estimated effects are now on a logarithmic scale, interpreting the model becomes more difficult, particularly because the estimated effect is now multiplicative rather than additive. As a result, I argue that even though response times are generally not normally distributed, the simplification afforded by the Haaf and Rouder (2017) approach remains a pragmatic way to model individual differences in cognitive tasks.
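The interpretational point can be made with a minimal, non-hierarchical illustration: fitting a shifted lognormal to response times in two conditions by maximum likelihood shows that an additive effect on the log scale acts multiplicatively on the RT scale. The parameter values and the MLE fit below are assumptions for illustration, not the hierarchical Bayesian model of the note.

```python
import numpy as np
from scipy.stats import lognorm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
shift, mu, sigma, delta = 0.2, -0.8, 0.4, 0.06        # toy "true" values (seconds)
rt_congruent = shift + np.exp(rng.normal(mu, sigma, 400))
rt_incongruent = shift + np.exp(rng.normal(mu + delta, sigma, 400))

def negloglik(params, rt):
    psi, mu, sigma = params
    if sigma <= 0 or psi >= rt.min():                  # shift must lie below all RTs
        return np.inf
    return -np.sum(lognorm.logpdf(rt - psi, s=sigma, scale=np.exp(mu)))

fit_c = minimize(negloglik, [0.1, -1.0, 0.5], args=(rt_congruent,), method="Nelder-Mead")
fit_i = minimize(negloglik, [0.1, -1.0, 0.5], args=(rt_incongruent,), method="Nelder-Mead")
effect_log = fit_i.x[1] - fit_c.x[1]
print(f"effect on log scale: {effect_log:.3f}  "
      f"=> multiplicative factor on shifted RT: {np.exp(effect_log):.3f}")
```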
We consider online resource allocation in a typical non-profit setting, where limited or even scarce resources are administered by a not-for-profit organization such as a government. We focus on internal equity by assuming that arriving requesters are homogeneous in external factors such as demand but heterogeneous in internal attributes such as demographics. Specifically, we associate each arriving requester with one or several groups based on their demographics (e.g., race, gender, and age), and we aim to design an equitable distribution strategy such that every group of requesters receives a fair share of resources proportional to a preset target ratio. We present two LP-based sampling algorithms and investigate them both theoretically (via competitive-ratio analysis) and experimentally on real COVID-19 vaccination data maintained by the Minnesota Department of Health. Both the theoretical and numerical results show that our LP-based sampling strategies can effectively promote equity, especially when the arrival population is disproportionately represented, as observed in the early stage of the COVID-19 vaccine rollout.
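A simplified sketch of the LP-plus-sampling idea follows (it is not the paper's exact algorithms): solve an offline LP for per-group service probabilities that respect the budget and each group's preset target share, then use those probabilities to serve arrivals online. The arrival rates, target ratios, and budget below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])        # assumed group arrival probabilities
alpha = np.array([0.4, 0.35, 0.25])  # preset target share of resources per group
T, B = 10_000, 3_000                 # horizon and resource budget

# variables x_g = P(serve an arriving requester of group g)
c = -(T * p)                                   # maximize expected total served
A_ub = np.vstack([T * p[None, :],              # budget:   sum_g T p_g x_g <= B
                  -np.diag(T * p)])            # fairness: T p_g x_g >= alpha_g B
b_ub = np.concatenate([[B], -alpha * B])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 3)
x = res.x

served, budget = np.zeros(3), B
for _ in range(T):                             # online phase: sample with LP probabilities
    g = rng.choice(3, p=p)
    if budget > 0 and rng.random() < x[g]:
        served[g] += 1
        budget -= 1
print("group shares of served resources:", served / served.sum())
```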
This article considers the extension of two-grid $hp$-version discontinuous Galerkin finite element methods for the numerical approximation of second-order quasilinear elliptic boundary value problems of monotone type to the case when agglomerated polygonal/polyhedral meshes are employed for the coarse mesh approximation. We recall that within the two-grid setting, while it is necessary to solve a nonlinear problem on the coarse approximation space, only a linear problem needs to be solved on the original fine finite element space. In this article, the coarse space will be constructed by agglomerating elements from the original fine mesh. Here, we extend the existing a priori and a posteriori error analysis for the two-grid $hp$-version discontinuous Galerkin finite element method from 10.1007/s10915-012-9644-1 for coarse meshes consisting of standard element shapes to include arbitrarily agglomerated coarse grids. Moreover, we develop an $hp$-adaptive two-grid algorithm to adaptively design the fine and coarse finite element spaces; we stress that this is undertaken in a fully automatic manner, and hence can be viewed as a black-box solver. Numerical experiments are presented for two- and three-dimensional problems to demonstrate the computational performance of the proposed $hp$-adaptive two-grid method.
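The two-grid principle itself (nonlinear solve only on the coarse space, a single linear solve on the fine space with the nonlinearity frozen at the coarse solution) can be illustrated with a toy one-dimensional continuous-FEM example; this is only a sketch of the idea under strong simplifications and is in no way the $hp$-DG method or the agglomeration strategy of the article.

```python
import numpy as np

def mu(s):                         # monotone-type nonlinear coefficient (toy choice)
    return 2.0 + 1.0 / (1.0 + s * s)

def assemble(n, grad_coeff):       # P1 stiffness matrix on a uniform grid of n cells
    h, A = 1.0 / n, np.zeros((n - 1, n - 1))
    for e in range(n):             # element e spans [e*h, (e+1)*h]
        k = mu(grad_coeff[e]) / h
        for a in (e - 1, e):
            for b in (e - 1, e):
                if 0 <= a < n - 1 and 0 <= b < n - 1:
                    A[a, b] += k if a == b else -k
    return A

def gradients(u, n):               # elementwise gradients of the P1 function (zero BCs)
    return np.diff(np.concatenate([[0.0], u, [0.0]])) * n

def solve_nonlinear(n, f, iters=30):
    u = np.zeros(n - 1)
    for _ in range(iters):         # fixed-point iteration: freeze mu at the current iterate
        u = np.linalg.solve(assemble(n, gradients(u, n)), f / n)
    return u

n_H, n_h = 8, 256                  # coarse and fine grids
x_H = np.linspace(0, 1, n_H + 1)[1:-1]
x_h = np.linspace(0, 1, n_h + 1)[1:-1]
rhs = lambda x: 10.0 * np.sin(np.pi * x)

u_H = solve_nonlinear(n_H, rhs(x_H))                         # nonlinear solve, coarse grid only
u_H_on_fine = np.interp(x_h, np.concatenate([[0.0], x_H, [1.0]]),
                        np.concatenate([[0.0], u_H, [0.0]]))  # prolongate coarse solution
A_h = assemble(n_h, gradients(u_H_on_fine, n_h))             # linearize about coarse solution
u_h = np.linalg.solve(A_h, rhs(x_h) / n_h)                   # single linear solve, fine grid
print("two-grid fine solution max:", u_h.max())
```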
Understanding the underlying causes of maternal death across all regions of the world is essential to inform policies and resource allocation to reduce the mortality burden. However, in many countries very little data exist on the causes of maternal death, and the data that do exist do not capture the entire population at risk. In this paper, we present a Bayesian hierarchical multinomial model to estimate maternal cause-of-death distributions globally, regionally, and for all countries worldwide. The framework combines data from various sources to inform estimates, including data from civil registration and vital statistics systems, smaller-scale surveys and studies, and high-quality data from confidential enquiries and surveillance systems. The framework accounts for varying data quality and coverage, and allows for situations where one or more causes of death are missing. We illustrate the results of the model on three case-study countries that have different data availability situations.
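One modeling ingredient can be sketched concretely: a multinomial likelihood on a softmax scale in which, for data sources that do not report every cause, the unreported causes are collapsed into a single residual category. The cause list, the latent effects, and the counts below are illustrative assumptions, not the paper's estimates or its full hierarchical structure.

```python
import numpy as np
from scipy.special import softmax

causes = ["haemorrhage", "hypertensive", "sepsis", "abortion", "indirect", "other"]

def loglik(eta, counts, reported):
    """eta: latent cause scores (e.g. a sum of region and country effects);
    counts: observed counts, one per reported cause plus one collapsed count;
    reported: indices of the causes this data source distinguishes."""
    p = softmax(eta)
    p_reported = p[reported]
    probs = np.concatenate([p_reported, [1.0 - p_reported.sum()]])
    return np.sum(counts * np.log(probs))

eta = np.array([0.6, 0.1, 0.2, -0.4, 0.3, 0.0])      # illustrative latent effects
full_counts = np.array([120, 60, 70, 25, 80, 55])    # source reporting all six causes
partial_counts = np.array([40, 25, 95])              # source reporting only two causes
print(loglik(eta, full_counts, list(range(5))))      # fully reported source
print(loglik(eta, partial_counts, [0, 2]))           # partially reported source
```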
Estimating an individual treatment effect (ITE) is essential to personalized decision making. However, existing methods for estimating the ITE often rely on unconfoundedness, an assumption that is fundamentally untestable with observed data. To address this limitation, this paper proposes a method for sensitivity analysis of the ITE, a way to estimate a range of the ITE under unobserved confounding. The method we develop quantifies unmeasured confounding through a marginal sensitivity model [Ros2002, Tan2006], and then adapts the framework of conformal inference to estimate an ITE interval at a given confounding strength. In particular, we formulate this sensitivity analysis problem as one of conformal inference under distribution shift, and we extend existing methods of covariate-shifted conformal inference to this more general setting. The result is a predictive interval that achieves nominal coverage of the ITE with distribution-free and nonasymptotic guarantees. We evaluate the method on synthetic data and illustrate its application in an observational study.
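A hedged sketch of one key step, a worst-case weighted split-conformal quantile, is given below: when the true inverse-propensity weights are only known to lie in an interval implied by a marginal sensitivity model with parameter $\Gamma$, take the most conservative weighted quantile of the calibration scores. This is a simplified illustration with synthetic inputs, not the paper's full procedure.

```python
import numpy as np

def worst_case_quantile(scores, lo, hi, hi_test, alpha=0.1):
    """Largest weighted (1 - alpha)-quantile over calibration weights w_i in [lo_i, hi_i]."""
    order = np.argsort(scores)
    s, lo, hi = scores[order], lo[order], hi[order]
    for k in range(len(s)):
        # adversarial weights: as small as allowed below the candidate cutoff,
        # as large as allowed above it (and for the test point)
        below = lo[:k + 1].sum()
        above = hi[k + 1:].sum() + hi_test
        if below / (below + above) >= 1 - alpha:
            return s[k]
    return np.inf                                   # interval can be unbounded at this Gamma

rng = np.random.default_rng(4)
n, gamma = 500, 1.5
e_hat = rng.uniform(0.2, 0.8, n)                    # estimated propensities, calibration units
scores = np.abs(rng.normal(0.0, 1.0, n))            # nonconformity scores, e.g. |Y - mu_hat(X)|
# marginal sensitivity model: true inverse-propensity weights lie in [lo, hi]
lo = 1.0 + (1.0 / e_hat - 1.0) / gamma
hi = 1.0 + (1.0 / e_hat - 1.0) * gamma
q = worst_case_quantile(scores, lo, hi, hi_test=hi.max())
print("robust conformal half-width at Gamma = 1.5:", round(q, 3))
```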
We develop novel methods for using persistent homology to infer the homology of an unknown Riemannian manifold $(M, g)$ from a point cloud sampled from an arbitrary smooth probability density function. Standard distance-based filtered complexes, such as the \v{C}ech complex, often have trouble distinguishing noise from features that are simply small. We address this problem by defining a family of "density-scaled filtered complexes" that includes a density-scaled \v{C}ech complex and a density-scaled Vietoris--Rips complex. We show that the density-scaled \v{C}ech complex is homotopy-equivalent to $M$ for filtration values in an interval whose starting point converges to $0$ in probability as the number of points $N \to \infty$ and whose ending point approaches infinity as $N \to \infty$. By contrast, the standard \v{C}ech complex may only be homotopy-equivalent to $M$ for a very small range of filtration values. The density-scaled filtered complexes also have the property that they are invariant under conformal transformations, such as scaling. We implement a filtered complex $\widehat{DVR}$ that approximates the density-scaled Vietoris--Rips complex, and we empirically test the performance of our implementation. As examples, we use $\widehat{DVR}$ to identify clusters that have different densities, and we apply $\widehat{DVR}$ to a time-delay embedding of the Lorenz dynamical system. Our implementation is stable (under conditions that are almost surely satisfied) and designed to handle outliers in the point cloud that do not lie on $M$.
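A hedged approximation of a density-scaled Vietoris--Rips construction follows (the paper's $\widehat{DVR}$ is more careful): estimate the density with a kernel density estimator and conformally rescale pairwise distances so that regions of different density are compared on a common scale, then compute persistence. The exact rescaling rule is an assumption for illustration, and the example assumes the `ripser` package is installed.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.spatial.distance import squareform, pdist
from ripser import ripser

rng = np.random.default_rng(5)
# two circles sampled with very different densities
theta_dense = rng.uniform(0, 2 * np.pi, 400)
theta_sparse = rng.uniform(0, 2 * np.pi, 60)
X = np.vstack([np.c_[np.cos(theta_dense), np.sin(theta_dense)],
               np.c_[3 + np.cos(theta_sparse), np.sin(theta_sparse)]])
X += 0.03 * rng.normal(size=X.shape)

n_dim = 1                                        # intrinsic dimension of the circles
rho = gaussian_kde(X.T)(X.T)                     # KDE density estimate at each point
scale = rho ** (1.0 / n_dim)                     # conformal length factor ~ rho^(1/n)
D = squareform(pdist(X))
D_scaled = D * np.sqrt(np.outer(scale, scale))   # symmetric local rescaling (assumed form)

diagrams = ripser(D_scaled, distance_matrix=True, maxdim=1)["dgms"]
print("most persistent H1 features:",
      np.sort(np.diff(diagrams[1], axis=1).ravel())[-2:])
```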
One of the most pressing problems in modern analysis is the study of the growth rate of the norms of all possible matrix products $\|A_{i_{n}}\cdots A_{i_{0}}\|$ with factors from a set of matrices $\mathscr{A}$. So far, only for a relatively small number of classes of matrix sets $\mathscr{A}$ has it been possible to rigorously describe the sequences of matrices $\{A_{i_{n}}\}$ that guarantee the maximal growth rate of the corresponding norms. Moreover, in almost all theoretically studied cases, the index sequences $\{i_{n}\}$ of the norm-maximizing matrix products turned out to be periodic or so-called Sturmian sequences, which entails a whole set of "good" properties of the sequences $\{A_{i_{n}}\}$, in particular the existence of a limiting frequency of occurrence of each matrix factor $A_{i}\in\mathscr{A}$ in them. This paper identifies a class of sets $\mathscr{A}$ consisting of two $2\times 2$ matrices, each similar to a rotation of the plane, for which the sequence $\{A_{i_{n}}\}$ maximizing the growth rate of the norms $\|A_{i_{n}}\cdots A_{i_{0}}\|$ is not Sturmian. All considerations are based on numerical modeling and, in this respect, cannot be considered mathematically rigorous; rather, they should be interpreted as a set of questions for further comprehensive theoretical analysis.
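A small numerical experiment in this spirit is sketched below: among all index words of a fixed length, find the one maximizing the norm of the matrix product, and check whether it is balanced (a necessary property of Sturmian words). The two conjugated-rotation matrices are illustrative examples of matrices similar to planar rotations, not the specific family identified in the paper.

```python
import numpy as np
from itertools import product

def conj_rotation(theta, S):
    """A matrix similar to a planar rotation: S R(theta) S^{-1}."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return S @ R @ np.linalg.inv(S)

A = [conj_rotation(0.9,  np.array([[1.0, 0.7], [0.0, 1.0]])),
     conj_rotation(-0.4, np.array([[1.0, 0.0], [0.8, 1.0]]))]

def is_balanced(word):
    """Sturmian words are balanced: counts of 1s in equal-length factors differ by <= 1."""
    n = len(word)
    for length in range(1, n):
        ones = [sum(word[i:i + length]) for i in range(n - length + 1)]
        if max(ones) - min(ones) > 1:
            return False
    return True

best_norm, best_word = -np.inf, None
for word in product((0, 1), repeat=14):          # exhaustive search over index words
    P = np.eye(2)
    for i in word:
        P = A[i] @ P
    norm = np.linalg.norm(P, 2)                  # spectral norm of the product
    if norm > best_norm:
        best_norm, best_word = norm, word

print("maximizing word:", best_word)
print("spectral norm:", best_norm, "| balanced (Sturmian-compatible):", is_balanced(best_word))
```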
Facility management, which concerns the administration, operations, and maintenance of buildings, is a sector undergoing significant change as it becomes digitalized and data-driven. In the facility management sector, companies seek to extract value from data about their buildings. As a consequence, craftsmen, such as janitors, are becoming involved in data curation. Data curation refers to activities related to cleaning, assembling, setting up, and stewarding data to make them fit existing templates. Craftsmen in facility management, despite holding a pivotal role in successful data curation in the domain, are understudied and disregarded. To remedy this, our holistic case study investigates how janitors' data curation practices shape the data being produced in three facility management organizations. Our findings illustrate the unfortunate fact that janitors are treated more like sensors than human data curators. This treatment makes them less engaged in data curation, and hence they do not carry out the much-needed correction of essential facility data. We apply the conceptual lens of invisible work - work that blends into the background and is taken for granted - to explain why this happens and how the data come to be. The findings also confirm the usefulness of a previously proposed analytical framework by using it to interpret data curation practices within facility management. The paper contributes to practice by proposing training and education in data curation.
What happens when a machine learning dataset is deprecated for legal, ethical, or technical reasons, but continues to be widely used? In this paper, we examine the public afterlives of several prominent deprecated or redacted datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender, in order to inform a framework for more consistent, ethical, and accountable dataset deprecation. Building on prior research, we find that there is a lack of consistency, transparency, and centralized sourcing of information on the deprecation of datasets, and as such, these datasets and their derivatives continue to be cited in papers and circulate online. These datasets that never die -- which we term "zombie datasets" -- continue to inform the design of production-level systems, causing technical, legal, and ethical challenges; in so doing, they risk perpetuating the harms that prompted their supposed withdrawal, including concerns around bias, discrimination, and privacy. Based on this analysis, we propose a Dataset Deprecation Framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks that can be adapted and implemented by the machine learning community. Drawing on work on datasheets and checklists, we further offer two sample dataset deprecation sheets and propose a centralized repository that tracks which datasets have been deprecated and could be incorporated into the publication protocols of venues like NeurIPS.