Explicit knowledge of total community-level immune seroprevalence is critical to developing policies to mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally-acquired immunity, which varies broadly throughout the country and world. Without broad or random sampling of the population, accurate measurement of persistent immunity after natural infection is generally unavailable. To enable tracking of both naturally-acquired and vaccine-induced immunity, we construct a synthetic random proxy based on routine hospital testing for estimating total Immunoglobulin G (IgG) prevalence in the sampled community. Our approach analyzes viral IgG test results from asymptomatic patients who present for elective procedures within a hospital system. We apply multilevel regression and poststratification to adjust for demographic and geographic discrepancies between the sample and the community population. We then use state-level vaccination data to attribute immune status to natural infection or to vaccination. We validate the model against verified clinical metrics of viral and symptomatic disease incidence, confirming the expected biological correlation of these measures with the timing, rate, and magnitude of seroprevalence. In mid-July 2021, the estimated immunity level was 74%, against an administered vaccination rate of 45%, in the two sampled counties. The metric improves real-time understanding of immunity to COVID-19 as it evolves and supports the coordination of policy responses, offering an inexpensive, easily operated surveillance system that transcends the limits of vaccination datasets alone.
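To make the poststratification step concrete, here is a minimal sketch (with hypothetical strata and illustrative numbers; not the authors' pipeline): cell-level IgG positivity estimated from the hospital sample is reweighted by the community's census cell counts.

```python
import numpy as np

# Minimal poststratification sketch (hypothetical cells and numbers):
# reweight sample-based IgG-positivity estimates for demographic cells
# by the community's census cell counts.
cell_positivity = np.array([0.55, 0.68, 0.81, 0.60, 0.72, 0.85])      # per-cell estimates
census_counts = np.array([12000, 15000, 9000, 13000, 14000, 8000])    # community cell sizes

weights = census_counts / census_counts.sum()
community_seroprevalence = float(weights @ cell_positivity)
print(f"poststratified IgG prevalence: {community_seroprevalence:.3f}")
```

In the full method, the cell-level estimates come from a multilevel (partially pooled) regression rather than raw cell means, which stabilizes estimates for sparsely sampled cells.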
Community detection refers to the problem of clustering the nodes of a network into groups. Existing inferential methods for community structure focus mainly on unweighted (binary) networks. Many real-world networks are nonetheless weighted, and a common practice is to dichotomize a weighted network into an unweighted one, which is known to cause information loss. The literature on hypothesis testing for weighted networks, however, is still missing. In this paper, we study the problem of testing for the existence of community structure in weighted networks. Our contributions are threefold: (a) we use the (possibly infinite-dimensional) exponential family to model the weights and derive the sharp information-theoretic limit for the existence of a consistent test; below this limit any test is inconsistent, and beyond it we propose a useful consistent test; (b) based on these information-theoretic limits, we provide the first formal way to quantify the information loss incurred by dichotomizing weighted graphs into unweighted graphs in the context of hypothesis testing; (c) we propose several new and practically useful test statistics. Simulation studies show that the proposed tests perform well. Finally, we apply the proposed tests to an animal social network.
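As a generic illustration of such a test (a standard spectral statistic, not necessarily one of the statistics proposed in the paper): under a null of no community structure with i.i.d. edge weights, the largest eigenvalue of the standardized weighted adjacency matrix concentrates near $2\sqrt{n}$, so a much larger observed value indicates structure.

```python
import numpy as np

# Sketch of a spectral test for community structure in a weighted graph.
# Under the null (i.i.d. weights), the standardized adjacency behaves like
# a Wigner matrix, whose largest eigenvalue is approximately 2*sqrt(n).
rng = np.random.default_rng(0)
n = 200
W = rng.normal(size=(n, n))
W = np.triu(W, 1)
W = W + W.T                        # symmetric weighted adjacency, zero diagonal

W_std = (W - W.mean()) / W.std()
lam_max = np.linalg.eigvalsh(W_std)[-1]
print(lam_max, 2 * np.sqrt(n))     # reject the null if lam_max greatly exceeds 2*sqrt(n)
```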
This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. Analytic properties of the RGM distance are derived; statistical rates of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also presented.
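As a toy illustration of the transform-sampler principle underlying this construction (emphatically not the RGM sampler itself): fit a map $T$ so that it pushes a reference measure $\mu$ forward to the data distribution $\nu$, then simulate by drawing from $\mu$ and applying $T$. Below, the map is a one-dimensional affine function fitted by moment matching.

```python
import numpy as np

# Generic transform-sampler sketch (NOT the RGM construction): fit a map
# T(z) = a*z + b pushing a standard Gaussian reference measure forward to
# the data distribution, here by closed-form moment matching in 1-D.
rng = np.random.default_rng(4)
data = rng.normal(loc=2.0, scale=0.5, size=5000)   # i.i.d. samples from nu

a, b = data.std(), data.mean()                     # fitted affine map
new_samples = a * rng.normal(size=5000) + b        # simulate from the model
print(new_samples.mean(), new_samples.std())       # ~2.0, ~0.5
```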
Covariance estimation for matrix-valued data has received increasing interest in applications. Unlike previous works that rely heavily on the matrix normal distribution assumption and require a fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over the row and column directions. We formulate a unified framework for estimating bandable covariances and introduce an efficient algorithm based on rank-one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows that our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators with theoretical guarantees for heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications to a gridded temperature-anomaly dataset and S&P 500 stock data.
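The rank-one Kronecker product approximation step can be computed with the classic Van Loan-Pitsianis rearrangement trick; a minimal sketch follows (the paper's banding regularization and robustification are omitted here).

```python
import numpy as np

# Sketch of rank-one Kronecker product approximation: find A (p x p) and
# B (q x q) minimizing ||S - kron(A, B)||_F. Rearranging S turns this into
# a rank-one SVD problem (Van Loan-Pitsianis).
def nearest_kron(S, p, q):
    # Rearrange S (pq x pq): row index (i, j), column index (k, l),
    # so that R = vec(A) vec(B)^T is rank one when S = kron(A, B).
    R = S.reshape(p, q, p, q).transpose(0, 2, 1, 3).reshape(p * p, q * q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(p, p)
    B = np.sqrt(s[0]) * Vt[0].reshape(q, q)
    return A, B

p, q = 3, 4
A0, B0 = np.eye(p) + 0.3, np.eye(q) + 0.1
S = np.kron(A0, B0)
A, B = nearest_kron(S, p, q)
print(np.allclose(np.kron(A, B), S))   # True: exact for a truly separable S
```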
This paper explores Null Island, a fictional place located at 0$^\circ$ latitude and 0$^\circ$ longitude in the WGS84 geographic coordinate system. Null Island is erroneously associated with large amounts of geographic data in a wide variety of location-based services, place databases, social media, and web-based maps. While it was originally considered a joke within the geospatial community, this article demonstrates the implications of its existence, both technological and social, and argues that Null Island is a fundamental issue of geographic information that requires more widespread awareness. The article summarizes the error sources that lead to data being associated with Null Island. We identify four evolutionary phases which help explain how this fictional place evolved and established itself as an entity reaching beyond the geospatial profession, to the point of being discovered by the visual arts and the general population. After providing an accurate account of the data that can be found at (0, 0), we discuss the geospatial, technological, and social implications of Null Island and provide guidelines to avoid misplacing data there. Since data will likely continue to appear at this location, our contribution aims to promote awareness of this error source among both GIScientists and the general population.
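As one concrete guard in this spirit (a sketch with hypothetical field names, not a guideline quoted from the article), data pipelines can flag records whose coordinates are exactly (0, 0) before they are mapped or aggregated.

```python
# Minimal guard against "Null Island" records (hypothetical field names):
# coordinates at exactly (0, 0) usually signal failed geocoding or NULLs
# coerced to zero, not a genuine location in the Gulf of Guinea.
def is_null_island(lat, lon, eps=1e-9):
    return abs(lat) < eps and abs(lon) < eps

records = [
    {"id": 1, "lat": 52.52, "lon": 13.405},
    {"id": 2, "lat": 0.0, "lon": 0.0},     # likely a geocoding failure
]

suspect = [r for r in records if is_null_island(r["lat"], r["lon"])]
print(suspect)   # flag for review instead of silently mapping at (0, 0)
```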
Implicit bias may perpetuate healthcare disparities for marginalized patient populations. Such bias is expressed in communication between patients and their providers. We design an ecosystem, with guidance from providers, to make this bias explicit in patient-provider communication. Our end users are providers seeking to improve their quality of care for patients who are Black, Indigenous, People of Color (BIPOC) and/or Lesbian, Gay, Bisexual, Transgender, and Queer (LGBTQ). We present wireframes that display communication metrics that negatively impact patient-centered care, organized into the following categories: digital nudge, dashboard, and guided reflection. Our wireframes provide quantitative, real-time, and conversational feedback that promotes provider reflection on their interactions with patients. This is the first design iteration toward a tool that raises providers' awareness of their own implicit biases.
In randomized experiments, the treatments actually received by some experimental units may differ from their treatment assignments. This non-compliance issue often arises in clinical trials, social experiments, and many other applications of randomized experiments. Under certain assumptions, the average treatment effect for compliers is identifiable and equals the ratio of the intention-to-treat effect on the outcome to the intention-to-treat effect on the treatment received. To improve estimation efficiency, we propose three model-assisted estimators for the complier average treatment effect in randomized experiments with a binary outcome. We study their asymptotic properties, compare their efficiencies with that of the Wald estimator, and propose Neyman-type conservative variance estimators to facilitate valid inference. Moreover, we extend our methods and theory to the multiplicative complier average treatment effect. Our analysis is randomization-based, allowing the working models to be misspecified. Finally, we conduct simulation studies to illustrate the advantages of the model-assisted methods and apply them to a randomized experiment evaluating the effect of academic services or incentives on academic performance.
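To fix ideas, the baseline Wald estimator referenced above is simply the ratio of the two intention-to-treat effects; a toy simulation follows (illustrative numbers and one-sided noncompliance; the paper's model-assisted estimators add covariate adjustment on top of this).

```python
import numpy as np

# Wald estimator sketch: complier average treatment effect equals the
# ITT effect on the outcome divided by the ITT effect on treatment received.
rng = np.random.default_rng(1)
n = 10_000
Z = rng.integers(0, 2, n)                          # random assignment
complier = rng.random(n) < 0.7                     # 70% compliers (illustrative)
D = np.where(complier, Z, 0)                       # treatment actually received
Y = (rng.random(n) < 0.3 + 0.15 * D).astype(int)   # binary outcome, effect 0.15

itt_y = Y[Z == 1].mean() - Y[Z == 0].mean()
itt_d = D[Z == 1].mean() - D[Z == 0].mean()
print("Wald estimate of complier ATE:", itt_y / itt_d)   # ~0.15
```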
In this paper we study the finite-sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie (2003)'s kappa theorem. Our framework presumes a binary endogenous explanatory variable ("treatment") and a binary instrumental variable, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which we show is weight normalized, is likely to dominate the others in many contexts. A notable exception is in settings with one-sided noncompliance, where certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. We use a simulation study and three empirical applications to illustrate our findings. In applications to causal effects of college education using the college proximity instrument (Card, 1995) and causal effects of childbearing using the sibling sex composition instrument (Angrist and Evans, 1998), the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both. Overall, our results suggest that (i) the relative performance of different kappa weighting estimators varies with features of the data-generating process; and that (ii) the normalized version of Tan (2006)'s estimator may be an attractive alternative in many contexts. Applied researchers with access to a binary instrumental variable should also consider covariate balancing or doubly robust estimators of the LATE.
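The normalized-versus-unnormalized contrast can be illustrated with simple IPW-style LATE estimators (a sketch on simulated data with a known instrument propensity; these are not the Abadie kappa estimators themselves, but they exhibit the same weighting issue).

```python
import numpy as np

# Contrast unnormalized (Horvitz-Thompson) vs. weight-normalized (Hajek)
# IPW-style LATE estimators. Toy data; the instrument propensity pi(X)
# is known here, whereas in practice it must be estimated.
rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=n)
pi = 1 / (1 + np.exp(-X))                # P(Z = 1 | X)
Z = (rng.random(n) < pi).astype(float)
complier = rng.random(n) < 0.6
D = np.where(complier, Z, 0.0)           # one-sided noncompliance
Y = 1.0 * D + X + rng.normal(size=n)     # true LATE = 1

w1, w0 = Z / pi, (1 - Z) / (1 - pi)

# Unnormalized: raw inverse-propensity weighted means.
tau_un = (np.mean(w1 * Y) - np.mean(w0 * Y)) / (np.mean(w1 * D) - np.mean(w0 * D))

# Normalized: weights rescaled to sum to one within each instrument arm.
h1, h0 = w1 / w1.sum(), w0 / w0.sum()
tau_norm = (h1 @ Y - h0 @ Y) / (h1 @ D - h0 @ D)

print(tau_un, tau_norm)   # both near 1; normalization matters more when
                          # pi is estimated or close to 0 or 1
```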
Refractive freeform components are becoming increasingly relevant for generating controlled patterns of light because of their capability to spatially modulate optical signals with high efficiency and low background. However, the use of these devices is still limited by difficulties in manufacturing macroscopic elements with complex, 3-dimensional (3D) surface reliefs. Here, 3D-printed and stretchable magic windows generating light patterns by refraction are introduced. The shape, and consequently the light texture achieved, can be changed through controlled device strain. Cryptographic magic windows are demonstrated through exemplary light patterns, including micro-QR-codes, that are correctly projected and recognized upon strain gating while remaining cryptic in as-produced devices. The light pattern of micro-QR-codes can also be projected by two coupled magic windows, with one of them acting as the decryption key. Such novel freeform elements with 3D shapes and tailored functionalities are relevant for applications in illumination design, smart labels, anti-counterfeiting systems, and cryptographic communication.
The Model Order Reduction (MOR) technique can provide compact numerical models for fast simulation. Unlike intrusive MOR methods, non-intrusive MOR does not require access to the Full Order Models (FOMs), in particular to the system matrices. Since non-intrusive MOR methods rely strongly on snapshots of the FOMs, constructing good snapshot sets becomes crucial. In this work, we propose a new active learning approach with two novelties. First, we use single-time-step snapshots of system states drawn from an estimate of the reduced-state space; these states are selected by a greedy strategy supported by an error estimator based on Gaussian Process Regression (GPR). Second, we introduce a use-case-independent validation strategy based on Probably Approximately Correct (PAC) learning. In this work, we use Artificial Neural Networks (ANNs) to identify the Reduced Order Model (ROM); however, the method could similarly be applied to other ROM identification methods. The performance of the whole workflow is tested on a 2-D thermal conduction model and a 3-D vacuum furnace model. Requiring little user interaction and relying on a training strategy independent of any specific use case, the proposed method offers great potential for industrial use in creating so-called executable Digital Twins (DTs).
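A sketch of the greedy selection step follows (illustrative only: scikit-learn's GPR stands in for the error estimator, and the candidate states and error values are synthetic stand-ins, not outputs of the paper's workflow).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Greedy snapshot selection sketch: fit a GPR surrogate of the ROM error
# over candidate reduced states, then pick the state where the predicted
# error (plus its uncertainty) is largest.
rng = np.random.default_rng(3)
labeled_states = rng.random((10, 2))                        # states with known ROM error
errors = np.sin(labeled_states @ np.array([3.0, 1.0]))**2   # stand-in error values

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gpr.fit(labeled_states, errors)

candidates = rng.random((500, 2))                           # estimated reduced-state space
pred_err, pred_std = gpr.predict(candidates, return_std=True)
next_state = candidates[np.argmax(pred_err + pred_std)]     # greedy pick
print("next snapshot state:", next_state)
```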
This paper surveys recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), which have arguably been the most prominent machine learning models for large-scale data over the past decade. We generalize the formulation of classification margins from classical research to modern DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since different methods take discrepant viewpoints, we categorize them into groups for ease of comparison and discussion. We hope that our discussion and overview inspire new research in the community that aims to improve the performance of DNNs, and we also point to directions where the large margin principle could be verified, providing theoretical evidence for why certain regularizations of DNNs work well in practice. We have kept the paper concise so that the crucial spirit of large margin learning and related methods is clearly conveyed.
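For concreteness, the standard output-space classification margin that recurs throughout this literature is, for a network with logits $f(x) \in \mathbb{R}^K$ and true label $y$,

$$\gamma(x, y) = f_y(x) - \max_{j \neq y} f_j(x),$$

which is positive if and only if $x$ is correctly classified; input-space margins instead measure the distance from $x$ to the decision boundary, which for deep networks can typically only be bounded or approximated.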