Results from randomized controlled trials (RCTs) help determine vaccination strategies and related public health policies. However, defining and identifying estimands that can guide policies in infectious disease settings is difficult, even in an RCT. The effects of vaccination critically depend on characteristics of the population of interest, such as the prevalence of infection, the number of vaccinated individuals, and social behaviors. To mitigate the dependence on such characteristics, estimands (and study designs) that require conditioning or intervening on exposure to the infectious agent have been advocated. But a fundamental problem for both RCTs and observational studies is that exposure status is often unavailable or difficult to measure, which has made it impossible to apply existing methodology to study vaccine effects that account for exposure status. In this work, we present new results on vaccine effects of this type. Under plausible conditions, we show that point identification of certain relative effects is possible even when the exposure status is unknown. Furthermore, we derive sharp bounds on the corresponding absolute effects. We apply these results to estimate the effects of the ChAdOx1 nCoV-19 vaccine on SARS-CoV-2 disease (COVID-19) conditional on post-vaccine exposure to the virus, using data from a large RCT.
We prove a bound of $O( k (n+m)\log^{d-1})$ on the number of incidences between $n$ points and $m$ axis-parallel boxes in $\mathbb{R}^d$, if no $k$ boxes contain $k$ common points. That is, the incidence graph between the points and the boxes does not contain $K_{k,k}$ as a subgraph. This new bound improves over previous work by a factor of $\log^d n$, for $d >2$. We also study other variants of the problem. For halfspaces, using shallow cuttings, we get a near-linear bound in two and three dimensions. Finally, we present a near-linear bound for the case of shapes in the plane with low union complexity (e.g., fat triangles).
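To make the objects in this abstract concrete, the following is a minimal brute-force sketch, not the paper's method, of counting point-box incidences and checking the $K_{k,k}$-free condition. The function names, the representation of a box as a pair of corner tuples, and the choice of closed boxes are assumptions made for illustration only.

```python
from itertools import combinations


def contains(box, point):
    """True if the point lies in the closed axis-parallel box (lo, hi).

    Assumption: a box is a pair of corner tuples (lo, hi) with lo[i] <= hi[i].
    """
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


def count_incidences(points, boxes):
    """Number of pairs (p, b) with point p inside box b (brute force)."""
    return sum(contains(b, p) for p in points for b in boxes)


def has_biclique(points, boxes, k):
    """True if some k boxes contain at least k common points.

    This is the K_{k,k} subgraph condition on the incidence graph,
    checked by exhaustive search over all k-subsets of boxes.
    """
    for chosen in combinations(boxes, k):
        common = sum(all(contains(b, p) for b in chosen) for p in points)
        if common >= k:
            return True
    return False


# Hypothetical toy instance in the plane (d = 2).
points = [(1, 1), (2, 2), (5, 5)]
boxes = [((0, 0), (3, 3)), ((1.5, 1.5), (6, 6))]
print(count_incidences(points, boxes))
print(has_biclique(points, boxes, 2))
```

The exhaustive search is exponential in $k$ and is only meant to spell out the definitions; it says nothing about how the stated bound is proved.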
Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intracellular calcium signals. An ongoing challenge is deconvolving the noisy calcium time series to extract the underlying spike trains. In this manuscript, we propose a nested Bayesian finite mixture specification that allows the estimation of spiking activity and, simultaneously, the reconstruction of the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.
In feature-based dynamic pricing, a seller sets appropriate prices for a sequence of products (described by feature vectors) on the fly by learning from the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$ price, and "Not Sold" otherwise). Existing work assumes either a noiseless linear valuation or a precisely known noise distribution, which limits the applicability of those algorithms in practice, where these assumptions are hard to verify. In this work, we study two more agnostic models: (a) a "linear policy" problem where we aim to compete with the best linear pricing policy while making no assumptions on the data, and (b) a "linear noisy valuation" problem where the random valuation is linear plus an unknown and assumption-free noise. For the former model, we show a $\tilde{\Theta}(d^{\frac13}T^{\frac23})$ minimax regret up to logarithmic factors. For the latter model, we present an algorithm that achieves an $\tilde{O}(T^{\frac34})$ regret, and improve the best-known lower bound from $\Omega(T^{\frac35})$ to $\tilde{\Omega}(T^{\frac23})$. These results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions, but also reveal the disappointing fact that the seemingly richer pricing feedback is not significantly more useful than bandit feedback for regret reduction.
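The feedback model in this abstract can be illustrated with a small sketch. It encodes only the "Sold if valuation $\geq$ price" rule and the revenue gap to one fixed linear pricing policy; it is not either of the paper's algorithms. The function names, the comparator vector `theta`, and the toy data are hypothetical, and regret against the best linear policy would additionally require maximizing over `theta`.

```python
def sale_feedback(valuation, price):
    """Binary outcome seen by the seller: True ("Sold") iff valuation >= price."""
    return valuation >= price


def revenue(valuation, price):
    """The seller earns the posted price on a sale and nothing otherwise."""
    return price if sale_feedback(valuation, price) else 0.0


def regret_vs_linear_policy(features, valuations, posted_prices, theta):
    """Revenue gap to the fixed linear policy price_t = <x_t, theta>.

    Assumption: theta is a given comparator, not necessarily the best one.
    """
    comparator = 0.0
    earned = 0.0
    for x, v, p in zip(features, valuations, posted_prices):
        linear_price = sum(xi * ti for xi, ti in zip(x, theta))
        comparator += revenue(v, linear_price)
        earned += revenue(v, p)
    return comparator - earned


# Hypothetical toy data: three products with two features each.
features = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
valuations = [2.0, 1.0, 2.5]
posted_prices = [1.0, 1.5, 3.0]
theta = (2.0, 1.0)
print(regret_vs_linear_policy(features, valuations, posted_prices, theta))
```

Note that the valuations appear here only to simulate outcomes; a seller in this setting never sees them, only the sold / not-sold bit.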
Subclassification and matching are often used in empirical studies to adjust for observed covariates; however, they are largely restricted to relatively simple study designs with a binary treatment and less developed for designs with a continuous exposure. Matching with exposure doses is particularly useful in instrumental variable designs and in understanding the dose-response relationships. In this article, we propose two criteria for optimal subclassification based on subclass homogeneity in the context of having a continuous exposure dose, and propose an efficient polynomial-time algorithm that is guaranteed to find an optimal subclassification with respect to one criterion and serves as a 2-approximation algorithm for the other criterion. We discuss how to incorporate dose and use appropriate penalties to control the number of subclasses in the design. Via extensive simulations, we systematically compare our proposed design to optimal non-bipartite pair matching, and demonstrate that combining our proposed subclassification scheme with regression adjustment helps reduce model dependence for parametric causal inference with a continuous dose. We apply the new design and associated randomization-based inferential procedure to study the effect of transesophageal echocardiography (TEE) monitoring during coronary artery bypass graft (CABG) surgery on patients' post-surgery clinical outcomes using Medicare and Medicaid claims data, and find evidence that TEE monitoring lowers patients' all-cause $30$-day mortality rate.
This paper considers identification and estimation of the causal effect of the time Z until a subject is treated on a survival outcome T. The treatment is not randomly assigned, T is randomly right-censored by a random variable C, and the time to treatment Z is right-censored by min(T,C). The endogeneity issue is addressed using an instrumental variable that explains Z and is independent of the error term of the model. We study identification in a fully nonparametric framework. We show that our specification generates an integral equation, of which the regression function of interest is a solution. We provide identification conditions that rely on this identification equation. For estimation purposes, we assume that the regression function follows a parametric model. We propose an estimation procedure and give conditions under which the estimator is asymptotically normal. The estimators exhibit good finite sample properties in simulations. Our methodology is applied to find evidence supporting the efficacy of a therapy for burn-out.
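The double censoring described in this abstract can be spelled out as a small sketch of what is observed for one subject. It covers only the observation scheme, not the identification or estimation procedure. The function name, the dictionary keys, and the convention that ties count as observed are assumptions for illustration.

```python
def observe(T, C, Z):
    """Observed data for one subject under the two censoring mechanisms.

    T: survival time, C: censoring time, Z: time until treatment.
    Assumption: ties (T == C, or Z == min(T, C)) are treated as observed.
    """
    follow_up = min(T, C)  # T is right-censored by C
    return {
        "Y": follow_up,
        "event_observed": T <= C,
        # Z is right-censored by min(T, C): treatment is seen only if it
        # starts before follow-up ends.
        "Z_obs": min(Z, follow_up),
        "treatment_observed": Z <= follow_up,
    }


# Hypothetical subject who is censored at time 3 before treatment (time 4)
# or the event (time 5) could be observed.
print(observe(T=5.0, C=3.0, Z=4.0))
```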
In a previous study published in Nature Human Behaviour, Varnum and Grossmann claim that reductions in gender inequality are linked to reductions in pathogen prevalence in the United States between 1951 and 2013. Since the statistical methods used by Varnum and Grossmann are known to induce (seemingly) significant correlations between unrelated time series, so-called spurious or nonsense correlations, we test here whether the statistical association between gender inequality and pathogen prevalence in its current form is also the result of mis-specified models that do not correctly account for the temporal structure of the data. Our analysis clearly suggests that this is the case. We then discuss and apply several standard approaches to modelling time-series processes in the data and show that there is, at least as of now, no support for a statistical association between gender inequality and pathogen prevalence.
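The phenomenon of spurious correlation between unrelated time series can be illustrated with a small simulation, sketched here under stated assumptions. It does not reproduce the analysis in the study: the series are independent Gaussian random walks, the length of 63 simply mirrors the number of years from 1951 to 2013, and the threshold of 0.5 and all function names are arbitrary illustrative choices.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(length, rng):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def first_difference(x):
    """Period-to-period changes, which removes the stochastic trend."""
    return [b - a for a, b in zip(x, x[1:])]


def share_large_correlations(n_pairs=500, length=63, threshold=0.5, seed=0):
    """Share of independent pairs with |r| above the threshold,
    computed on the levels and on the first differences."""
    rng = random.Random(seed)
    in_levels = in_differences = 0
    for _ in range(n_pairs):
        x, y = random_walk(length, rng), random_walk(length, rng)
        in_levels += abs(pearson(x, y)) > threshold
        dx, dy = first_difference(x), first_difference(y)
        in_differences += abs(pearson(dx, dy)) > threshold
    return in_levels / n_pairs, in_differences / n_pairs


print(share_large_correlations())
```

Although every pair is independent by construction, large correlations are common in the levels and rare after differencing, which is why the temporal structure has to be modelled before an association is interpreted.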
Creativity, or the ability to produce new useful ideas, is commonly associated with human beings, but many other examples of this phenomenon can be observed in nature. Inspired by this fact, engineering and, in particular, the computational sciences have developed many different models to tackle a number of problems. Composing music, an art form broadly present throughout human history, is the main topic addressed in this thesis. Taking advantage of the kind of ideas that bring diversity and creativity to nature and computation, we present Melomics: an algorithmic composition method based on evolutionary search. The solutions have a genetic encoding based on formal grammars, and these are interpreted in a complex developmental process followed by a fitness assessment, to produce valid music compositions in standard formats. The system has exhibited high creative power and versatility in producing music of different types, and it has been tested, with the outcome proving on many occasions to be indistinguishable from music made by human composers. The system has also enabled the emergence of a set of completely novel applications: from effective tools that help anyone easily obtain the precise music they need, to radically new uses, such as adaptive music for therapy, exercise, amusement and many others. It seems clear that automated composition is an active research area and that countless new uses will be discovered.
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and the proposed estimators in estimating the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is substantial when the causal effects are accounted for correctly.
Current training objectives of existing person Re-IDentification (ReID) models only ensure that the loss of the model decreases on the selected training batch, with no regard to the performance on samples outside the batch. This inevitably causes the model to over-fit the data in the dominant position (e.g., head data in imbalanced classes, easy samples, or noisy samples). The latest resampling methods address the issue by designing specific criteria to select specific samples that train the model to generalize better on certain types of data (e.g., hard samples, tail data), which is not adaptive to the inconsistent real-world ReID data distributions. Therefore, instead of simply presuming which samples are generalizable, this paper proposes a one-for-more training objective that directly takes the generalization ability of selected samples as a loss function and learns a sampler to automatically select generalizable samples. More importantly, our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework, which can then train ReID models and the sampler simultaneously in an end-to-end fashion. The experimental results show that our method can effectively improve ReID model training and boost the performance of ReID models.
We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of the word-topic matrix A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though we start out with the computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.