Traditional density and quantile estimators are often inconsistent with each other. Their simultaneous usage may lead to inconsistent results. To address this issue, we propose a novel smooth density estimator that is naturally consistent with the Harrell-Davis quantile estimator. We also provide a jittering implementation to support discrete-continuous mixture distributions.
Shannon defined the mutual information between two variables. We illustrate why the true mutual information between a variable and the predictions made by a prediction algorithm is not a suitable measure of prediction quality, but the apparent Shannon mutual information (ASI) is; indeed it is the unique prediction quality measure with either of two very different lists of desirable properties, as previously shown by de Finetti and other authors. However, estimating the uncertainty of the ASI is a difficult problem, because of long and non-symmetric heavy tails to the distribution of the individual values of $j(x,y)=\log\frac{Q_y(x)}{P(x)}$ We propose a Bayesian modelling method for the distribution of $j(x,y)$, from the posterior distribution of which the uncertainty in the ASI can be inferred. This method is based on Dirichlet-based mixtures of skew-Student distributions. We illustrate its use on data from a Bayesian model for prediction of the recurrence time of prostate cancer. We believe that this approach is generally appropriate for most problems, where it is infeasible to derive the explicit distribution of the samples of $j(x,y)$, though the precise modelling parameters may need adjustment to suit particular cases.
We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10--25 effective posterior samples.
Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a lack of valid inference procedures to identify the driving factors of the estimated decision rule, especially when the surrogate loss is non-differentiable. In this work, we propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations for the linear decision rule estimated using a piece-wise linear surrogate loss, which has a discontinuous gradient and non-regular Hessian. Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points and approximate the non-regular Hessian of the surrogate loss. In applications where additional nuisance parameters are involved, we propose a novel cross-fitted version to accommodate flexible nuisance estimates and kernel approximations. We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup. Simulation and real data analysis are conducted to demonstrate the validity and superiority of the proposed method.
Confounder selection may be efficiently conducted using penalized regression methods when causal effects are estimated from observational data with many variables. An outcome-adaptive lasso was proposed to build a model for the propensity score that can be employed in conjunction with other variable selection methods for the outcome model to apply the augmented inverse propensity weighted (AIPW) estimator. However, researchers may not know which method is optimal to use for outcome model when applying the AIPW estimator with the outcome-adaptive lasso. This study provided hints on readily implementable penalized regression methods that should be adopted for the outcome model as a counterpart of the outcome-adaptive lasso. We evaluated the bias and variance of the AIPW estimators using the propensity score (PS) model and an outcome model based on penalized regression methods under various conditions by analyzing a clinical trial example and numerical experiments; the estimates and standard errors of the AIPW estimators were almost identical in an example with over 5000 participants. The AIPW estimators using penalized regression methods with the oracle property performed well in terms of bias and variance in numerical experiments with smaller sample sizes. Meanwhile, the bias of the AIPW estimator using the ordinary lasso for the PS and outcome models was considerably larger.
Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.
Due to the linearity of quantum operations, it is not straightforward to implement nonlinear transformations on a quantum computer, making some practical tasks like a neural network hard to be achieved. In this work, we define a task called nonlinear transformation of complex amplitudes and provide an algorithm to achieve this task. Specifically, we construct a block-encoding of complex amplitudes from a state preparation unitary. This allows us to transform the complex amplitudes by using quantum singular value transformation. We evaluate the required overhead in terms of input dimension and precision, which reveals that the algorithm depends on the roughly square root of input dimension and achieves an exponential speedup on precision compared with previous work. We also discuss its possible applications to quantum machine learning, where complex amplitudes encoding classical or quantum data are processed by the proposed method. This paper provides a promising way to introduce highly complex nonlinearity of the quantum states, which is essentially missing in quantum mechanics.
Parametrized and random unitary (or orthogonal) $n$-qubit circuits play a central role in quantum information. As such, one could naturally assume that circuits implementing symplectic transformation would attract similar attention. However, this is not the case, as $\mathbb{SP}(d/2)$ -- the group of $d\times d$ unitary symplectic matrices -- has thus far been overlooked. In this work, we aim at starting to right this wrong. We begin by presenting a universal set of generators $\mathcal{G}$ for the symplectic algebra $i\mathfrak{sp}(d/2)$, consisting of one- and two-qubit Pauli operators acting on neighboring sites in a one-dimensional lattice. Here, we uncover two critical differences between such set, and equivalent ones for unitary and orthogonal circuits. Namely, we find that the operators in $\mathcal{G}$ cannot generate arbitrary local symplectic unitaries and that they are not translationally invariant. We then review the Schur-Weyl duality between the symplectic group and the Brauer algebra, and use tools from Weingarten calculus to prove that Pauli measurements at the output of Haar random symplectic circuits can converge to Gaussian processes. As a by-product, such analysis provides us with concentration bounds for Pauli measurements in circuits that form $t$-designs over $\mathbb{SP}(d/2)$. To finish, we present tensor-network tools to analyze shallow random symplectic circuits, and we use these to numerically show that computational-basis measurements anti-concentrate at logarithmic depth.
This tutorial discusses methodology for causal inference using longitudinal modified treatment policies. This method facilitates the mathematical formalization, identification, and estimation of many novel parameters, and mathematically generalizes many commonly used parameters, such as the average treatment effect. Longitudinal modified treatment policies apply to a wide variety of exposures, including binary, multivariate, and continuous, and can accommodate time-varying treatments and confounders, competing risks, loss-to-follow-up, as well as survival, binary, or continuous outcomes. Longitudinal modified treatment policies can be seen as an extension of static and dynamic interventions to involve the natural value of treatment, and, like dynamic interventions, can be used to define alternative estimands with a positivity assumption that is more likely to be satisfied than estimands corresponding to static interventions. This tutorial aims to illustrate several practical uses of the longitudinal modified treatment policy methodology, including describing different estimation strategies and their corresponding advantages and disadvantages. We provide numerous examples of types of research questions which can be answered using longitudinal modified treatment policies. We go into more depth with one of these examples--specifically, estimating the effect of delaying intubation on critically ill COVID-19 patients' mortality. We demonstrate the use of the open-source R package lmtp to estimate the effects, and we provide code on //github.com/kathoffman/lmtp-tutorial.
We extend the scope of a recently introduced dependence coefficient between a scalar response $Y$ and a multivariate covariate $X$ to the case where $X$ takes values in a general metric space. Particular attention is paid to the case where $X$ is a curve. While on the population level, this extension is straight forward, the asymptotic behavior of the estimator we consider is delicate. It crucially depends on the nearest neighbor structure of the infinite-dimensional covariate sample, where deterministic bounds on the degrees of the nearest neighbor graphs available in multivariate settings do no longer exist. The main contribution of this paper is to give some insight into this matter and to advise a way how to overcome the problem for our purposes. As an important application of our results, we consider an independence test.
We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).