This paper studies model checking for general parametric regression models with no dimension reduction structures on the high-dimensional vector of predictors. Using existing test as an initial test, this paper combines the sample-splitting technique and conditional studentization approach to construct a COnditionally Studentized Test(COST). Unlike existing tests, whether the initial test is global or local smoothing-based, and whether the dimension of the predictor vector and the number of parameters are fixed, or diverge at a certain rate as the sample size goes to infinity, the proposed test always has a normal weak limit under the null hypothesis. Further, the test can detect the local alternatives distinct from the null hypothesis at the fastest possible rate of convergence in hypothesis testing. We also discuss the optimal sample splitting in power performance. The numerical studies offer information on its merits and limitations in finite sample cases. As a generic methodology, it could be applied to other testing problems.
We study parametric inference on a rich class of hazard regression models in the presence of right-censoring. Previous literature has reported some inferential challenges, such as multimodal or flat likelihood surfaces, in this class of models for some particular data sets. We formalize the study of these inferential problems by linking them to the concepts of near-redundancy and practical non-identifiability of parameters. We show that the maximum likelihood estimators of the parameters in this class of models are consistent and asymptotically normal. Thus, the inferential problems in this class of models are related to the finite-sample scenario, where it is difficult to distinguish between the fitted model and a nested non-identifiable (i.e., parameter-redundant) model. We propose a method for detecting near-redundancy, based on distances between probability distributions. We also employ methods used in other areas for detecting practical non-identifiability and near-redundancy, including the inspection of the profile likelihood function and the Hessian method. For cases where inferential problems are detected, we discuss alternatives such as using model selection tools to identify simpler models that do not exhibit these inferential problems, increasing the sample size, or extending the follow-up time. We illustrate the performance of the proposed methods through a simulation study. Our simulation study reveals a link between the presence of near-redundancy and practical non-identifiability. Two illustrative applications using real data, with and without inferential problems, are presented.
Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained optimization task involving shape optimization of rotor blades in turbo-machinery.
We propose a new Bayesian non-parametric (BNP) method for estimating the causal effects of mediation in the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounders, treatment, and baseline confounders). The proposed BNP model allows more confounder-based clusters than clusters for the outcome and mediator. For identifiability, we use the extended version of the standard sequential ignorability as introduced in \citet{hong2022posttreatment}. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, $i.e.$, the natural direct effects (NDE), and indirect effects (NIE). We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, demonstrating its practical utility in real-world scenarios. \keywords{Causal inference; Enriched Dirichlet process mixture model.}
In this paper, we focus our attention on the high-dimensional double sparse linear regression, that is, a combination of element-wise and group-wise sparsity.To address this problem, we propose an IHT-style (iterative hard thresholding) procedure that dynamically updates the threshold at each step. We establish the matching upper and lower bounds for parameter estimation, showing the optimality of our proposal in the minimax sense. Coupled with a novel sparse group information criterion, we develop a fully adaptive procedure to handle unknown group sparsity and noise levels.We show that our adaptive procedure achieves optimal statistical accuracy with fast convergence. Finally, we demonstrate the superiority of our method by comparing it with several state-of-the-art algorithms on both synthetic and real-world datasets.
Treatment effect estimation under unconfoundedness is a fundamental task in causal inference. In response to the challenge of analyzing high-dimensional datasets collected in substantive fields such as epidemiology, genetics, economics, and social sciences, many methods for treatment effect estimation with high-dimensional nuisance parameters (the outcome regression and the propensity score) have been developed in recent years. However, it is still unclear what is the necessary and sufficient sparsity condition on the nuisance parameters for the treatment effect to be $\sqrt{n}$-estimable. In this paper, we propose a new Double-Calibration strategy that corrects the estimation bias of the nuisance parameter estimates computed by regularized high-dimensional techniques and demonstrate that the corresponding Doubly-Calibrated estimator achieves $1 / \sqrt{n}$-rate as long as one of the nuisance parameters is sparse with sparsity below $\sqrt{n} / \log p$, where $p$ denotes the ambient dimension of the covariates, whereas the other nuisance parameter can be arbitrarily complex and completely misspecified. The Double-Calibration strategy can also be applied to settings other than treatment effect estimation, e.g. regression coefficient estimation in the presence of diverging number of controls in a semiparametric partially linear model.
Most existing studies on linear bandits focus on the one-dimensional characterization of the overall system. While being representative, this formulation may fail to model applications with high-dimensional but favorable structures, such as the low-rank tensor representation for recommender systems. To address this limitation, this work studies a general tensor bandits model, where actions and system parameters are represented by tensors as opposed to vectors, and we particularly focus on the case that the unknown system tensor is low-rank. A novel bandit algorithm, coined TOFU (Tensor Optimism in the Face of Uncertainty), is developed. TOFU first leverages flexible tensor regression techniques to estimate low-dimensional subspaces associated with the system tensor. These estimates are then utilized to convert the original problem to a new one with norm constraints on its system parameters. Lastly, a norm-constrained bandit subroutine is adopted by TOFU, which utilizes these constraints to avoid exploring the entire high-dimensional parameter space. Theoretical analyses show that TOFU improves the best-known regret upper bound by a multiplicative factor that grows exponentially in the system order. A novel performance lower bound is also established, which further corroborates the efficiency of TOFU.
Bayesian variable selection methods are powerful techniques for fitting and inferring on sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. In this paper, we proposed a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates of hyperparameters. Efficient maximum a posteriori (MAP) estimation is completed through a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm. The PX-ECM results in a robust computationally efficient coordinate-wise optimization, which adjusts for the impact of other predictor variables. The completion of the E-step uses an approach motivated by the popular two-groups approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression, which can be completed using one-at-a-time or all-at-once type optimization. We compare the empirical properties of PROBE to comparable approaches with numerous simulation studies and an analysis of cancer cell lines drug response study. The proposed approach is implemented in the R package probe.
Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their ability to capture uncertainty but they make strong assumptions about the observation noise, which might not be warranted in practice. In this work, we propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion which translates to quicker HPO convergence on empirical benchmarks. To apply our method in a multi-fidelity setting, we propose a simple, yet effective, technique that aggregates observed results across different resource levels and outperforms conventional methods across many empirical tasks.
Training a neural network (NN) typically relies on some type of curve-following method, such as gradient descent (GD) (and stochastic gradient descent (SGD)), ADADELTA, ADAM or limited memory algorithms. Convergence for these algorithms usually relies on having access to a large quantity of observations in order to achieve a high level of accuracy and, with certain classes of functions, these algorithms could take multiple epochs of data points to catch on. Herein, a different technique with the potential of achieving dramatically better speeds of convergence, especially for shallow networks, is explored: it does not curve-follow but rather relies on 'decoupling' hidden layers and on updating their weighted connections through bootstrapping, resampling and linear regression. By utilizing resampled observations, the convergence of this process is empirically shown to be remarkably fast and to require a lower amount of data points: in particular, our experiments show that one needs a fraction of the observations that are required with traditional neural network training methods to approximate various classes of functions.
Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.