When an exposure of interest is confounded by unmeasured factors, an instrumental variable (IV) can be used to identify and estimate certain causal contrasts. Identification of the marginal average treatment effect (ATE) from IVs relies on strong untestable structural assumptions. When one is unwilling to assert such structure, IVs can nonetheless be used to construct bounds on the ATE. Famously, Balke and Pearl (1997) proved tight bounds on the ATE for a binary outcome, in a randomized trial with noncompliance and no covariate information. We demonstrate how these bounds remain useful in observational settings with baseline confounders of the IV, as well as randomized trials with measured baseline covariates. The resulting bounds on the ATE are non-smooth functionals, and thus standard nonparametric efficiency theory is not immediately applicable. To remedy this, we propose (1) under a novel margin condition, influence function-based estimators of the bounds that can attain parametric convergence rates when the nuisance functions are modeled flexibly, and (2) estimators of smooth approximations of these bounds. We propose extensions to continuous outcomes, explore finite sample properties in simulations, and illustrate the proposed estimators in an observational study targeting the effect of higher education on wages.
Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or sounds produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose ZerAuCap, a novel framework for summarising such general audio signals in a text caption without requiring task-specific training. In particular, our framework exploits a pre-trained large language model (LLM) for generating the text which is guided by a pre-trained audio-language model to produce captions that describe the audio content. Additionally, we use audio context keywords that prompt the language model to generate text that is broadly relevant to sounds. Our proposed framework achieves state-of-the-art results in zero-shot audio captioning on the AudioCaps and Clotho datasets. Our code is available at //github.com/ExplainableML/ZerAuCap.
The interpretation of vaccine efficacy estimands is subtle, even in randomized trials designed to quantify immunological effects of vaccination. In this article, we introduce terminology to distinguish between different vaccine efficacy estimands and clarify their interpretations. This allows us to explicitly consider immunological and behavioural effects of vaccination, and establish that policy-relevant estimands can differ substantially from those commonly reported in vaccine trials. We further show that a conventional vaccine trial allows identification and estimation of different vaccine estimands under plausible conditions, if one additional post-treatment variable is measured. Specifically, we utilize a ``belief variable'' that indicates the treatment an individual believed they had received. The belief variable is similar to ``blinding assessment'' variables that are occasionally collected in placebo-controlled trials in other fields. We illustrate the relations between the different estimands, and their practical relevance, in numerical examples based on an influenza vaccine trial.
We give an almost complete characterization of the hardness of $c$-coloring $\chi$-chromatic graphs with distributed algorithms, for a wide range of models of distributed computing. In particular, we show that these problems do not admit any distributed quantum advantage. To do that: 1) We give a new distributed algorithm that finds a $c$-coloring in $\chi$-chromatic graphs in $\tilde{\mathcal{O}}(n^{\frac{1}{\alpha}})$ rounds, with $\alpha = \bigl\lfloor\frac{c-1}{\chi - 1}\bigr\rfloor$. 2) We prove that any distributed algorithm for this problem requires $\Omega(n^{\frac{1}{\alpha}})$ rounds. Our upper bound holds in the classical, deterministic LOCAL model, while the near-matching lower bound holds in the non-signaling model. This model, introduced by Arfaoui and Fraigniaud in 2014, captures all models of distributed graph algorithms that obey physical causality; this includes not only classical deterministic LOCAL and randomized LOCAL but also quantum-LOCAL, even with a pre-shared quantum state. We also show that similar arguments can be used to prove that, e.g., 3-coloring 2-dimensional grids or $c$-coloring trees remain hard problems even for the non-signaling model, and in particular do not admit any quantum advantage. Our lower-bound arguments are purely graph-theoretic at heart; no background on quantum information theory is needed to establish the proofs.
In spatial blind source separation the observed multivariate random fields are assumed to be mixtures of latent spatially dependent random fields. The objective is to recover latent random fields by estimating the unmixing transformation. Currently, the algorithms for spatial blind source separation can only estimate linear unmixing transformations. Nonlinear blind source separation methods for spatial data are scarce. In this paper we extend an identifiable variational autoencoder that can estimate nonlinear unmixing transformations to spatially dependent data and demonstrate its performance for both stationary and nonstationary spatial data using simulations. In addition, we introduce scaled mean absolute Shapley additive explanations for interpreting the latent components through nonlinear mixing transformation. The spatial identifiable variational autoencoder is applied to a geochemical dataset to find the latent random fields, which are then interpreted by using the scaled mean absolute Shapley additive explanations.
We introduce and analyse a family of hash and predicate functions that are more likely to produce collisions for small reducible configurations of vectors. These may offer practical improvements to lattice sieving for short vectors. In particular, in one asymptotic regime the family exhibits significantly different convergent behaviour than existing hash functions and predicates.
We propose a novel and simple spectral method based on the semi-discrete Fourier transforms to discretize the fractional Laplacian $(-\Delta)^\frac{\alpha}{2}$. Numerical analysis and experiments are provided to study its performance. Our method has the same symbol $|\xi|^\alpha$ as the fractional Laplacian $(-\Delta)^\frac{\alpha}{2}$ at the discrete level, and thus it can be viewed as the exact discrete analogue of the fractional Laplacian. This {\it unique feature} distinguishes our method from other existing methods for the fractional Laplacian. Note that our method is different from the Fourier pseudospectral methods in the literature, which are usually limited to periodic boundary conditions (see Remark \ref{remark0}). Numerical analysis shows that our method can achieve a spectral accuracy. The stability and convergence of our method in solving the fractional Poisson equations were analyzed. Our scheme yields a multilevel Toeplitz stiffness matrix, and thus fast algorithms can be developed for efficient matrix-vector products. The computational complexity is ${\mathcal O}(2N\log(2N))$, and the memory storage is ${\mathcal O}(N)$ with $N$ the total number of points. Extensive numerical experiments verify our analytical results and demonstrate the effectiveness of our method in solving various problems.
Mixtures of linear mixed models are widely used for modelling longitudinal data for which observation times differ between subjects. In typical applications, temporal trends are described using a basis expansion, with basis coefficients treated as random effects varying by subject. Additional random effects can describe variation between mixture components, or other known sources of variation in complex experimental designs. A key advantage of these models is that they provide a natural mechanism for clustering, which can be helpful for interpretation in many applications. Current versions of mixtures of linear mixed models are not specifically designed for the case where there are many observations per subject and a complex temporal trend, which requires a large number of basis functions to capture. In this case, the subject-specific basis coefficients are a high-dimensional random effects vector, for which the covariance matrix is hard to specify and estimate, especially if it varies between mixture components. To address this issue, we consider the use of recently-developed deep mixture of factor analyzers models as the prior for the random effects. The resulting deep mixture of linear mixed models is well-suited to high-dimensional settings, and we describe an efficient variational inference approach to posterior computation. The efficacy of the method is demonstrated on both real and simulated data.
Humans solving algorithmic (or) reasoning problems typically exhibit solution times that grow as a function of problem difficulty. Adaptive recurrent neural networks have been shown to exhibit this property for various language-processing tasks. However, little work has been performed to assess whether such adaptive computation can also enable vision models to extrapolate solutions beyond their training distribution's difficulty level, with prior work focusing on very simple tasks. In this study, we investigate a critical functional role of such adaptive processing using recurrent neural networks: to dynamically scale computational resources conditional on input requirements that allow for zero-shot generalization to novel difficulty levels not seen during training using two challenging visual reasoning tasks: PathFinder and Mazes. We combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016). We explore various implementations of such adaptive ConvRNNs (AdRNNs) ranging from tying weights across layers to more sophisticated biologically inspired recurrent networks that possess lateral connections and gating. We show that 1) AdRNNs learn to dynamically halt processing early (or late) to solve easier (or harder) problems, 2) these RNNs zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time. Our study provides modeling evidence supporting the hypothesis that recurrent processing enables the functional advantage of adaptively allocating compute resources conditional on input requirements and hence allowing generalization to harder difficulty levels of a visual reasoning problem without training.
The satisfiability problem is NP-complete but there are subclasses where all the instances are satisfiable. For this, restrictions on the shape of the formula are made. Darman and D\"ocker show that the subclass MONOTONE $3$-SAT-($k$,1) with $k \geq 5$ proves to be NP-complete and pose the open question whether instances of MONOTONE $3$-SAT-(3,1) are satisfiable. This paper shows that all instances of MONOTONE $3$-SAT-(3,1) are satisfiable using the new concept of a color-structures.
We explore the Wilks phenomena in two random graph models: the $\beta$-model and the Bradley-Terry model. For two increasing dimensional null hypotheses, including a specified null $H_0: \beta_i=\beta_i^0$ for $i=1,\ldots, r$ and a homogenous null $H_0: \beta_1=\cdots=\beta_r$, we reveal high dimensional Wilks' phenomena that the normalized log-likelihood ratio statistic, $[2\{\ell(\widehat{\mathbf{\beta}}) - \ell(\widehat{\mathbf{\beta}}^0)\} -r]/(2r)^{1/2}$, converges in distribution to the standard normal distribution as $r$ goes to infinity. Here, $\ell( \mathbf{\beta})$ is the log-likelihood function on the model parameter $\mathbf{\beta}=(\beta_1, \ldots, \beta_n)^\top$, $\widehat{\mathbf{\beta}}$ is its maximum likelihood estimator (MLE) under the full parameter space, and $\widehat{\mathbf{\beta}}^0$ is the restricted MLE under the null parameter space. For the homogenous null with a fixed $r$, we establish Wilks-type theorems that $2\{\ell(\widehat{\mathbf{\beta}}) - \ell(\widehat{\mathbf{\beta}}^0)\}$ converges in distribution to a chi-square distribution with $r-1$ degrees of freedom, as the total number of parameters, $n$, goes to infinity. When testing the fixed dimensional specified null, we find that its asymptotic null distribution is a chi-square distribution in the $\beta$-model. However, unexpectedly, this is not true in the Bradley-Terry model. By developing several novel technical methods for asymptotic expansion, we explore Wilks type results in a principled manner; these principled methods should be applicable to a class of random graph models beyond the $\beta$-model and the Bradley-Terry model. Simulation studies and real network data applications further demonstrate the theoretical results.