When estimating the effect of an action with a randomized or observational study, that study is often not a random sample of the desired target population. Instead, estimates from that study can be transported to the target population. However, transportability methods generally rely on a positivity assumption, such that all relevant covariate patterns in the target population are also observed in the study sample. Strict eligibility criteria, particularly in the context of randomized trials, may lead to violations of this assumption. Two common approaches to address positivity violations are restricting the target population and restricting the relevant covariate set. As neither of these restrictions is ideal, we instead propose a synthesis of statistical and simulation models to address positivity violations. We propose corresponding g-computation and inverse probability weighting estimators. The restriction and synthesis approaches to addressing positivity violations are contrasted using a simulation experiment and an illustrative example in the context of sexually transmitted infection testing uptake. In both cases, the proposed synthesis approach accurately addressed the original research question when paired with a thoughtfully selected simulation model. Neither of the restriction approaches was able to accurately address the motivating question. As public health decisions must often be made with imperfect target population information, model synthesis is a viable approach given a combination of empirical data and external information based on the best available knowledge.
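For concreteness, here is a minimal sketch of a standard inverse-odds-of-sampling weighting estimator for transporting a trial result to a target sample. The data, variable names, and models are illustrative only, and the proposed synthesis of statistical and simulation models (which handles covariate regions where positivity fails) is not shown.

```python
# Illustrative sketch (not the authors' estimator): inverse-odds-of-sampling
# weighting to transport a trial-based effect estimate to a target population.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: S=1 marks trial participants, S=0 marks the target sample.
n_trial, n_target = 1000, 2000
x_trial = rng.normal(0.5, 1.0, n_trial)          # trial over-represents high X
x_target = rng.normal(0.0, 1.0, n_target)
a = rng.integers(0, 2, n_trial)                   # randomized treatment in the trial
y = 1.0 + 0.8 * a + 0.5 * x_trial + rng.normal(0, 1, n_trial)

# Model Pr(S=1 | X) on the stacked trial + target covariates.
x_all = np.concatenate([x_trial, x_target]).reshape(-1, 1)
s_all = np.concatenate([np.ones(n_trial), np.zeros(n_target)])
ps_model = LogisticRegression().fit(x_all, s_all)
p_trial = ps_model.predict_proba(x_trial.reshape(-1, 1))[:, 1]

# Inverse-odds weights for trial participants: Pr(S=0 | X) / Pr(S=1 | X).
w = (1.0 - p_trial) / p_trial

# Weighted difference in arm means estimates the effect in the target population.
mu1 = np.average(y[a == 1], weights=w[a == 1])
mu0 = np.average(y[a == 0], weights=w[a == 0])
print("transported average treatment effect:", mu1 - mu0)
```

Note that the weights blow up precisely where the sampling probability approaches zero, which is the positivity problem the synthesis approach described above is designed to address.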
We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
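As a generic numerical illustration of the generative flow being analyzed, and not of the paper's finite-sample, denoising-autoencoder parametrization, the sketch below integrates the probability-flow ODE whose exact population velocity field transports a standard Gaussian base onto a two-component, unit-variance Gaussian mixture under the linear interpolation $x_t = (1-t)x_0 + t x_1$.

```python
# Generic illustration (not the paper's finite-sample analysis): integrating the
# probability-flow ODE that pushes a standard Gaussian base onto a two-component
# Gaussian mixture, using the exact velocity field of the linear interpolation
# x_t = (1 - t) x_0 + t x_1 with unit-variance components.
import numpy as np

rng = np.random.default_rng(1)
d = 2
mus = np.array([[3.0, 0.0], [-3.0, 0.0]])   # component means
pis = np.array([0.5, 0.5])                  # mixing weights

def velocity(x, t):
    """Exact E[x_1 - x_0 | x_t = x] for base N(0, I) and target mixture of N(mu_k, I)."""
    var_t = (1.0 - t) ** 2 + t ** 2                          # Var(x_t | component k)
    diffs = x[:, None, :] - t * mus[None, :, :]              # (n, K, d)
    logw = np.log(pis)[None, :] - 0.5 * (diffs ** 2).sum(-1) / var_t
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                        # responsibilities (n, K)
    b_k = mus[None, :, :] + ((2.0 * t - 1.0) / var_t) * diffs
    return (w[:, :, None] * b_k).sum(axis=1)

# Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1.
n, steps = 5000, 400
x = rng.standard_normal((n, d))
for i in range(steps):
    x = x + velocity(x, i / steps) / steps

print("mean of samples in the right mode:", x[x[:, 0] > 0].mean(axis=0))
print("mean of samples in the left mode:", x[x[:, 0] < 0].mean(axis=0))
```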
Survival time is the primary endpoint of many randomized controlled trials, and a treatment effect is typically quantified by the hazard ratio under the assumption of proportional hazards. Awareness is increasing that in many settings this assumption is a priori violated, e.g., due to delayed onset of drug effect. In these cases, interpretation of the hazard ratio estimate is ambiguous, and statistical inference for alternative parameters to quantify a treatment effect is warranted. We consider differences or ratios of milestone survival probabilities or quantiles, differences in restricted mean survival times, and an average hazard ratio to be of interest. Typically, more than one such parameter needs to be reported to assess possible treatment benefits, and in confirmatory trials the corresponding inferential procedures need to be adjusted for multiplicity. By using the counting process representation of the mentioned parameters, we show that their estimates are asymptotically multivariate normal, and we propose corresponding parametric multiple testing procedures and simultaneous confidence intervals. The logrank test may also be included in the framework. Finite-sample type I error rate and power are studied by simulation. The methods are illustrated with an example from oncology. A software implementation is provided in the R package nph.
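As a minimal illustration of one of the parameters above, the sketch below computes a restricted mean survival time difference from Kaplan-Meier curves on simulated data; the multivariate normal limit, the multiplicity-adjusted inference, and the R package nph are not reproduced here, and all names and data are illustrative.

```python
# Minimal sketch (not the nph package, which is in R): Kaplan-Meier estimation
# and the difference in restricted mean survival time (RMST) up to a milestone tau.
import numpy as np

def kaplan_meier(time, event):
    """Return event times and the Kaplan-Meier survival estimate at those times."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    uniq = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        s *= 1.0 - d / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def rmst(time, event, tau):
    """Area under the Kaplan-Meier curve from 0 to tau."""
    t, s = kaplan_meier(time, event)
    grid = np.concatenate([[0.0], t[t < tau], [tau]])
    step = np.concatenate([[1.0], s[t < tau]])     # S(t) is a right-continuous step function
    return np.sum(step * np.diff(grid))

rng = np.random.default_rng(2)
n = 300
t_ctrl = rng.exponential(10.0, n)
t_trt = rng.exponential(14.0, n)                   # simple proportional-hazards toy data
cens = rng.exponential(25.0, 2 * n)
obs = np.minimum(np.concatenate([t_ctrl, t_trt]), cens)
evt = (np.concatenate([t_ctrl, t_trt]) <= cens).astype(int)
arm = np.repeat([0, 1], n)

tau = 12.0
diff = rmst(obs[arm == 1], evt[arm == 1], tau) - rmst(obs[arm == 0], evt[arm == 0], tau)
print(f"RMST difference up to tau={tau}: {diff:.3f}")
```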
To reduce the dimensionality of the functional covariate, functional principal component analysis plays a key role; however, there is uncertainty about the number of principal components to retain. Model averaging addresses this uncertainty by taking a weighted average of the predictions obtained from a set of candidate models. In this paper, we develop an optimal model averaging approach that selects the weights by minimizing a $K$-fold cross-validation criterion. We prove the asymptotic optimality of the selected weights in terms of minimizing the excess final prediction error, which greatly improves on the usual asymptotic optimality in terms of minimizing the final prediction error in the literature. When the true regression relationship belongs to the set of candidate models, we establish the consistency of the averaged estimators. Numerical studies indicate that in most cases the proposed method performs better than other model selection and averaging methods, especially for extreme quantiles.
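The following schematic sketch illustrates the generic mechanics of choosing model averaging weights on the simplex by minimizing a $K$-fold cross-validation criterion; it uses a multivariate covariate and squared-error loss as stand-ins for the functional setting and the loss studied in the paper, and all names are illustrative.

```python
# Schematic illustration (not the paper's functional-data method): model averaging
# weights chosen on the simplex by minimizing a K-fold cross-validation criterion.
# Candidate model m uses the first m principal components of the covariate.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p, M, K = 200, 10, 5, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.8, 0.5] + [0.0] * (p - 3))
y = X @ beta + rng.standard_normal(n)

# Principal component scores of the centered covariate.
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
scores = U * s

def ols_predict(Ztr, ytr, Zte):
    coef, *_ = np.linalg.lstsq(Ztr, ytr, rcond=None)
    return Zte @ coef

# Cross-validated predictions: column m uses the first m+1 principal components.
folds = np.arange(n) % K
cv_pred = np.zeros((n, M))
for k in range(K):
    tr, te = folds != k, folds == k
    for m in range(M):
        Z = scores[:, : m + 1]
        cv_pred[te, m] = ols_predict(Z[tr], y[tr], Z[te])

# Minimize the CV criterion ||y - cv_pred @ w||^2 over the weight simplex.
cv_crit = lambda w: np.sum((y - cv_pred @ w) ** 2)
res = minimize(cv_crit, np.full(M, 1.0 / M), method="SLSQP",
               bounds=[(0.0, 1.0)] * M,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print("selected model averaging weights:", np.round(res.x, 3))
```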
The equioscillation condition is extended to multivariate approximation. To this end, it is reformulated as the synchronized oscillations between the error maximizers and the components of a related Haar matrix kernel vector. This new condition gives rise to a multivariate equioscillation theorem where the Haar condition is not assumed and hence the existence and the characterization by equioscillation become independent of uniqueness. This allows the theorem to be applicable to problems with no strong uniqueness or even no uniqueness. A technical additional requirement on the involved Haar matrix and its kernel vector is proved to be sufficient for strong uniqueness. Instances of multivariate problems with strongly unique, unique and nonunique solutions are presented to illustrate the scope of the theorem.
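For reference, the classical univariate equioscillation condition that is being extended can be stated as follows (a standard result, not the multivariate theorem of the paper):

```latex
% Classical (univariate) Chebyshev equioscillation theorem, stated for context;
% the paper extends this condition to the multivariate setting without the Haar
% assumption.
\begin{theorem}[Chebyshev equioscillation]
Let $f \in C[a,b]$ and let $\mathcal{P}_n$ denote the polynomials of degree at
most $n$. Then $p^* \in \mathcal{P}_n$ minimizes $\|f - p\|_\infty$ over
$\mathcal{P}_n$ if and only if there exist $n+2$ points
$a \le x_0 < x_1 < \dots < x_{n+1} \le b$ and a sign $\sigma \in \{-1, +1\}$
such that
\[
  f(x_j) - p^*(x_j) \;=\; \sigma \, (-1)^j \, \|f - p^*\|_\infty,
  \qquad j = 0, 1, \dots, n+1 .
\]
\end{theorem}
```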
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins corresponding to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits an information bottleneck of the kind thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
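As a point of reference for the attractor dynamics mentioned above, here is a minimal classical sketch of a Hopfield-style attractor network with Hebbian weights; it is not the paper's model or training method, and only illustrates how noisy continuous states fall into the basin of a discrete stored pattern.

```python
# Minimal classical illustration (not the paper's model): a Hopfield-style
# attractor network whose dynamics pull a corrupted input into the basin of one
# of a few stored binary patterns, i.e. a discrete "symbol".
import numpy as np

rng = np.random.default_rng(4)
d, n_patterns = 64, 3
patterns = rng.choice([-1, 1], size=(n_patterns, d))

# Hebbian weight matrix with zero diagonal.
W = (patterns.T @ patterns) / d
np.fill_diagonal(W, 0.0)

def settle(state, steps=20):
    """Synchronous sign updates; typically converges to (or near) a stored attractor."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

# Corrupt a stored pattern by flipping 20% of its entries, then let it settle.
noisy = patterns[0].copy()
flip = rng.choice(d, size=d // 5, replace=False)
noisy[flip] *= -1
recovered = settle(noisy.astype(float))
print("overlap with each stored pattern:", (patterns @ recovered) / d)
```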
We show a relation, based on parallel repetition of the Magic Square game, that can be solved, with probability exponentially close to $1$ (worst-case input), by $1D$ (uniform), depth-$2$, geometrically-local, noisy (noise below a threshold), fan-in-$4$ quantum circuits. We show that the same relation can be solved only with exponentially small success probability (averaged over inputs drawn uniformly) by $1D$ (non-uniform), geometrically-local, sub-linear-depth classical circuits consisting of fan-in-$2$ NAND gates. Quantum and classical circuits are allowed to use input-independent (geometrically non-local) resource states, that is, entanglement and randomness, respectively. To the best of our knowledge, the previous best (analogous) depth separation for a task between quantum and classical circuits was constant vs. sub-logarithmic, although for general (geometrically non-local) circuits. Our hardness result for classical circuits is based on a direct product theorem about classical communication protocols from Jain and Kundu [JK22]. As an application, we propose a protocol that can potentially demonstrate verifiable quantum advantage in the NISQ era. We also provide generalizations of our result to higher-dimensional circuits as well as a wider class of Bell games.
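For context on the base game, the brute-force check below verifies the well-known fact that classical deterministic strategies win the single-shot Magic Square (Mermin-Peres) game on at most 8 of the 9 question pairs, while the standard quantum strategy wins all of them; the parallel repetition and the circuit constructions above are not reproduced.

```python
# Brute-force check of the single-shot Magic Square (Mermin-Peres) game, for
# context only: the best classical deterministic strategy wins 8 of the 9
# question pairs (classical value 8/9), whereas the quantum strategy wins all 9.
from itertools import product

# Admissible answers: Alice's row must have even parity, Bob's column odd parity.
even = [t for t in product([0, 1], repeat=3) if sum(t) % 2 == 0]
odd = [t for t in product([0, 1], repeat=3) if sum(t) % 2 == 1]

best = 0
for alice in product(even, repeat=3):        # one even-parity triple per row
    for bob in product(odd, repeat=3):       # one odd-parity triple per column
        # They win on question pair (r, c) iff they agree on the intersection cell.
        wins = sum(alice[r][c] == bob[c][r] for r in range(3) for c in range(3))
        best = max(best, wins)
print("best classical score:", best, "out of 9")   # prints 8
```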
An overview is presented of a general theory of statistical inference that is referred to as the fiducial-Bayes fusion. This theory combines organic fiducial inference and Bayesian inference. The aim is to give the reader a clear summary of the conceptual framework of the fiducial-Bayes fusion as well as pointers to further reading about its more technical aspects. Particular attention is paid to the issue of how much importance should be attached to the role of Bayesian inference within this framework. The appendix contains a substantive example of the application of the theory of the fiducial-Bayes fusion, which supplements various other examples of the application of this theory that are referenced in the paper.
A finite element based computational scheme is developed and employed to assess a duality based variational approach to the solution of the linear heat and transport PDE in one space dimension and time, and the nonlinear system of ODEs of Euler for the rotation of a rigid body about a fixed point. The formulation turns initial-(boundary) value problems into degenerate elliptic boundary value problems in (space)-time domains representing the Euler-Lagrange equations of suitably designed dual functionals in each of the above problems. We demonstrate reasonable success in approximating solutions of this range of parabolic, hyperbolic, and ODE primal problems, which includes energy dissipation as well as conservation, by a unified dual strategy lending itself to a variational formulation. The scheme naturally associates a family of dual solutions to a unique primal solution; such `gauge invariance' is demonstrated in our computed solutions of the heat and transport equations, including the case of a transient dual solution corresponding to a steady primal solution of the heat equation. Primal evolution problems with causality are shown to be correctly approximated by non-causal dual problems.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
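For context, a standard weight-based bound of this type, which the prediction-based bounds described above are designed to improve upon, reads as follows (a well-known result, not the bound proposed in the paper):

```latex
% A standard input-output information bound, stated for context only; the bounds
% described above replace I(W; S) by the information contained in the predictions.
% Assume the loss \ell(w, Z) is \sigma-subgaussian for every w, and let W be the
% output of a (possibly randomized) algorithm trained on S = (Z_1, \dots, Z_n).
\[
  \bigl|\mathbb{E}\bigl[\mathrm{gen}(W, S)\bigr]\bigr|
  \;\le\;
  \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}},
\]
% where gen(W, S) is the gap between population and empirical risk and I(W; S)
% is the mutual information between the learned hypothesis and the training sample.
```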
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
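One ingredient mentioned above, the implicit regularization of gradient methods toward minimum-norm interpolants, can be seen in a toy overparametrized least-squares problem; this is a standard fact about linear models, not the survey's neural-network analysis, and the dimensions below are arbitrary.

```python
# Toy illustration of implicit regularization (a standard fact, not the survey's
# neural-network analysis): for an overparametrized linear least-squares problem,
# gradient descent started at zero converges to the minimum-norm interpolator,
# i.e. the pseudoinverse solution.
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 100                      # more parameters than samples
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Gradient descent on ||X w - y||^2 / 2, starting from w = 0.
w = np.zeros(p)
lr = 1.0 / np.linalg.norm(X, 2) ** 2
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

w_minnorm = np.linalg.pinv(X) @ y   # minimum-norm interpolating solution
print("training residual:", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_minnorm))
```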