
We develop a post-selective Bayesian framework to jointly and consistently estimate parameters in group-sparse regression models. When groups are selected through a widely used class of learning algorithms, e.g., the Group LASSO, the overlapping Group LASSO, or the sparse Group LASSO, uncertainty estimates for the selected parameters are unreliable in the absence of adjustments for selection bias. However, the application of state-of-the-art tools to the group-sparse problem is limited to (i) real-valued projections onto very specific selected subspaces, and (ii) selection events admitting representations as linear inequalities in the data variables. Our Bayesian methods address these gaps by deriving an adjustment factor in a tractable analytic form that eliminates the bias incurred by selecting promising groups. At a very nominal price for this adjustment, experiments on simulated data and the Human Connectome Project demonstrate the efficacy of our methods for the joint estimation of group-sparse parameters learned from data.
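
For orientation, a minimal sketch of the selection stage only: the Group LASSO fit by proximal gradient descent with block soft-thresholding. This is a generic baseline in NumPy under a standard least-squares loss, not the paper's post-selective Bayesian procedure; all variable names and the toy data are illustrative.

    import numpy as np

    def group_soft_threshold(v, t):
        """Block soft-thresholding: shrink the whole group toward zero."""
        norm = np.linalg.norm(v)
        return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

    def group_lasso(X, y, groups, lam, n_iter=500):
        """groups: list of index arrays, one per non-overlapping group."""
        n, p = X.shape
        beta = np.zeros(p)
        step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y) / n
            z = beta - step * grad
            for g in groups:
                beta[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
        return beta

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 12))
    groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
    y = X[:, :4] @ np.array([1.5, -2.0, 1.0, 0.5]) + 0.1 * rng.standard_normal(100)
    beta_hat = group_lasso(X, y, groups, lam=0.1)
    selected = [i for i, g in enumerate(groups) if np.linalg.norm(beta_hat[g]) > 0]
    print("selected groups:", selected)   # groups whose coefficients survive shrinkage

Naively reporting intervals for beta_hat restricted to the selected groups is exactly the practice the abstract warns against; the paper's contribution is the analytic adjustment factor applied after this stage.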

Related content

Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that recommend effective treatments for individual patients according to their information history. DTRs can be estimated from models that include interactions between treatment and a small number of covariates, often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant in the treatment rule. A more data-driven approach to selecting these covariates might therefore improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property: an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show that our method has both the double robustness property and the oracle property, and that it compares favorably with other variable selection approaches.
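
A toy illustration of the strong heredity idea, not the authors' penalized dynamic weighted least squares: fit a sparse model over main effects and treatment-by-covariate interactions, then keep an interaction only when its covariate main term was also selected. The data-generating process and tuning value are made up for the example.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    n, p = 200, 5
    X = rng.standard_normal((n, p))
    A = rng.integers(0, 2, n)                     # binary treatment
    # The treatment effect is modified by X_0 only.
    y = 1.0 + X[:, 0] - 0.5 * X[:, 1] + A * (1.0 + 2.0 * X[:, 0]) + rng.standard_normal(n)

    design = np.column_stack([X, A, A[:, None] * X])   # mains, treatment, interactions
    fit = Lasso(alpha=0.05).fit(design, y)
    main_sel = set(np.flatnonzero(np.abs(fit.coef_[:p]) > 1e-8))
    inter = np.flatnonzero(np.abs(fit.coef_[p + 1:]) > 1e-8)
    # Enforce strong heredity post hoc: drop interactions lacking a selected main term.
    inter_sel = {j for j in inter if j in main_sel}
    print("main effects:", sorted(main_sel), "heredity-respecting interactions:", sorted(inter_sel))

In the paper the heredity constraint is built into the penalty itself rather than applied as a post hoc filter as here.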

In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable, algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features and that of all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
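
A hedged sketch of the predictiveness-contrast estimand at a naive level: importance of a feature subset S is the drop in out-of-sample R^2 when S is excluded, with each regression fit by an arbitrary learner. The learner, subset, and data below are illustrative; the paper's efficient estimator adds the debiasing needed for valid confidence intervals.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(2)
    n = 500
    X = rng.standard_normal((n, 4))
    y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.5 * rng.standard_normal(n)

    def cv_r2(X, y):
        """Cross-validated R^2 as a crude predictiveness measure."""
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        pred = cross_val_predict(model, X, y, cv=5)
        return 1 - np.mean((y - pred) ** 2) / np.var(y)

    S = [0]                                       # features whose importance we assess
    full = cv_r2(X, y)
    reduced = cv_r2(np.delete(X, S, axis=1), y)
    print(f"estimated importance of features {S}: {full - reduced:.3f}")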

A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations, but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add additional complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work we demonstrate the use of Bayes factors for molecular model selection, using Monte Carlo sampling techniques to evaluate the evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three levels of nested model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only in some cases. We also explore the effect of the Bayesian prior distribution on the Bayes factors, as well as ways to propose meaningful prior distributions. This Bayesian Markov Chain Monte Carlo (MCMC) process is enabled by the use of analytical surrogate models that accurately approximate the physical properties of interest. This work paves the way for further atomistic model selection work via Bayesian inference and surrogate modeling.
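
A minimal sketch of the underlying Bayes factor computation on a toy problem: the evidence for each model is estimated by naive Monte Carlo, averaging the likelihood over prior draws. This is illustrative only; for models like 2CLJQ the paper evaluates likelihoods through analytical surrogate models and uses MCMC-based machinery. The toy models and priors here are assumptions of the example.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    data = rng.normal(0.7, 1.0, size=50)          # toy "observations"

    def log_evidence(log_lik, prior_draws):
        """log p(data | model) ~= log mean_i p(data | theta_i), theta_i ~ prior."""
        ll = np.array([log_lik(theta) for theta in prior_draws])
        return logsumexp(ll) - np.log(len(ll))

    # M0: mean fixed at 0 (no free parameter), so its evidence is just the likelihood.
    log_ev0 = norm.logpdf(data, loc=0.0, scale=1.0).sum()
    # M1: unknown mean with a N(0, 2^2) prior (one extra level of complexity).
    draws = rng.normal(0.0, 2.0, size=20000)
    log_ev1 = log_evidence(lambda m: norm.logpdf(data, loc=m, scale=1.0).sum(), draws)
    print("log Bayes factor (M1 vs M0):", log_ev1 - log_ev0)

A positive log Bayes factor favors the more complex model; the prior scale chosen for M1 visibly moves this number, which is the prior sensitivity the abstract highlights.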

Wavelet shrinkage estimators are widely applied in several fields of science for denoising data in the wavelet domain by reducing the magnitudes of empirical coefficients. In the nonparametric regression problem, most shrinkage rules are derived from models composed of an unknown function with additive Gaussian noise. Although the Gaussian noise assumption is reasonable in several real data analyses, mainly for large sample sizes, it is not general: data contaminated with positive noise can occur in practice, and nonparametric regression models with positive noise pose challenges from the wavelet shrinkage point of view. This work develops Bayesian shrinkage rules to estimate wavelet coefficients in a nonparametric regression framework with additive and strictly positive noise under exponential and lognormal distributions. Computational aspects are discussed, and simulation studies are performed to analyse the performance of the proposed shrinkage rules and compare them with standard techniques. An application to the Boston Marathon winning times dataset is also provided.
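
For context, the standard Gaussian-noise baseline the paper compares against: universal soft thresholding in the wavelet domain, sketched here with PyWavelets. The wavelet family, decomposition depth, and toy signal are choices of this example, and this is not the paper's Bayesian rule for positive noise.

    import numpy as np
    import pywt

    rng = np.random.default_rng(4)
    t = np.linspace(0, 1, 512)
    signal = np.sin(6 * np.pi * t) * (t < 0.5)
    noisy = signal + 0.2 * rng.standard_normal(t.size)

    coeffs = pywt.wavedec(noisy, 'db4', level=5)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # robust noise scale, finest level
    thresh = sigma * np.sqrt(2 * np.log(noisy.size))      # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]
    denoised = pywt.waverec(coeffs, 'db4')
    print("RMSE:", np.sqrt(np.mean((denoised - signal) ** 2)))

Under strictly positive (e.g., exponential or lognormal) noise, the symmetric shrinkage implicit in this rule is misspecified, which is the gap the proposed Bayesian rules address.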

In this paper, a functional partial quantile regression approach, a quantile regression analog of the functional partial least squares regression, is proposed to estimate the function-on-function linear quantile regression model. A partial quantile covariance function is first used to extract the functional partial quantile regression basis functions. The extracted basis functions are then used to obtain the functional partial quantile regression components and estimate the final model. In our proposal, the functional forms of the discretely observed random variables are first constructed via a finite-dimensional basis function expansion method. The functional partial quantile regression constructed using the functional random variables is approximated via the partial quantile regression constructed using the basis expansion coefficients. The proposed method uses an iterative procedure to extract the partial quantile regression components, and a Bayesian information criterion is used to determine the optimal number of retained components. The proposed functional partial quantile regression model allows for more than one functional predictor. However, the true form of the proposed model is unspecified, as the relevant predictors are unknown in practice. Thus, a forward variable selection procedure is used to determine the significant predictors for the proposed model. Moreover, a case-sampling-based bootstrap procedure is used to construct pointwise prediction intervals for the functional response. The predictive performance of the proposed method is evaluated using several Monte Carlo experiments under different data generation processes and error distributions. Through an empirical data example, air quality data are analyzed to demonstrate the effectiveness of the proposed method.
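 
A heavily simplified sketch of only the basis-expansion step: discretely observed predictor curves are reduced to coefficients in a fixed Fourier basis, and a median regression is fit on those coefficients. This scalar-on-function stand-in omits the partial quantile covariance, the iterative component extraction, and the function-on-function structure; all names and data are illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n, m, K = 150, 100, 5                         # curves, grid points, basis size
    grid = np.linspace(0, 1, m)
    basis = np.column_stack([np.ones(m)] +
                            [np.sin(2 * np.pi * k * grid) for k in range(1, K)])

    scores = rng.standard_normal((n, K))
    curves = scores @ basis.T + 0.1 * rng.standard_normal((n, m))  # observed predictors
    y = curves @ (np.sin(2 * np.pi * grid) / m) + 0.3 * rng.standard_normal(n)

    # Project each curve onto the basis (least squares), then median regression.
    coef = np.linalg.lstsq(basis, curves.T, rcond=None)[0].T       # n x K coefficients
    fit = sm.QuantReg(y, sm.add_constant(coef)).fit(q=0.5)
    print(fit.params)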

We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon: the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.
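
To make the two penalties concrete, here is one plausible schematic of the doubly penalized objective, reconstructed from the abstract rather than taken from the paper: functional response y_i(t), functional covariate x_i(s) with coefficient surface f in the RKHS H_K, and vector covariates z_i with coefficient functions beta_j(t) partitioned into groups g.

    \min_{f \in \mathcal{H}_K,\; \beta} \;
      \frac{1}{n} \sum_{i=1}^{n} \int_{\mathcal{T}}
      \Big( y_i(t) - \int_{\mathcal{S}} x_i(s)\, f(s,t)\, \mathrm{d}s
            - z_i^{\top} \beta(t) \Big)^{2} \mathrm{d}t
      \; + \; \lambda_1 \, J(f)
      \; + \; \lambda_2 \sum_{g=1}^{G} \big\| \beta_g \big\|

Here J(f) stands for the RKHS smoothness penalty and the group norm aggregates each beta_g over t; the paper's exact display may differ. The interplay of lambda_1 and lambda_2 is what produces the smoothness-sparsity phase transition in the optimal excess risk.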

Causal inference using observational text data is becoming increasingly popular in many research areas. This paper presents the Bayesian Topic Regression (BTR) model, which uses both text and numerical information to model an outcome variable. It allows estimation of both discrete and continuous treatment effects, and it permits the inclusion of additional numerical confounding factors alongside the text data. To this end, we combine a supervised Bayesian topic model with a Bayesian regression framework and perform supervised representation learning for the text features jointly with the regression parameter training, respecting the Frisch-Waugh-Lovell theorem. Our paper makes two main contributions. First, we provide a regression framework that allows causal inference in settings where both text and numerical confounders are of relevance. We show with synthetic and semi-synthetic datasets that our joint approach recovers ground truth with lower bias than any benchmark model when text and numerical features are correlated. Second, experiments on two real-world datasets demonstrate that a joint and supervised learning strategy also yields superior prediction results compared to strategies that estimate regression weights for text and non-text features separately, and it is even competitive with more complex deep neural networks.
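
A small numerical check of the Frisch-Waugh-Lovell theorem that the joint training is designed to respect: the coefficient on x from a joint regression of y on (x, W) equals the coefficient from regressing residualized y on residualized x, with both residualized against W. The data below are synthetic; W plays the role of the text-derived features.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 1000
    W = rng.standard_normal((n, 3))                  # stand-in for text-derived confounders
    x = W @ np.array([0.5, -0.2, 0.1]) + rng.standard_normal(n)
    y = 2.0 * x + W @ np.array([1.0, 0.3, -0.5]) + rng.standard_normal(n)

    Z = np.column_stack([x, W])
    joint = np.linalg.lstsq(Z, y, rcond=None)[0][0]  # coefficient on x from the joint fit

    P = W @ np.linalg.solve(W.T @ W, W.T)            # projection onto the columns of W
    x_r, y_r = x - P @ x, y - P @ y                  # partial out W from x and y
    fwl = (x_r @ y_r) / (x_r @ x_r)
    print(joint, fwl)                                # numerically identical

Estimating the text representation separately from the regression breaks this equivalence, which is the bias mechanism the joint BTR approach avoids.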

In this paper, we propose a propensity score adapted variable selection procedure for choosing the covariates to include in propensity score models, in order to eliminate confounding bias and improve statistical efficiency in observational studies. Our variable selection approach is specially designed for causal inference: it only requires the propensity scores to be $\sqrt{n}$-consistently estimated through a parametric model, and it does not require correct specification of the potential outcome models. By using the estimated propensity scores as inverse probability treatment weights when performing an adaptive lasso on the outcome, it successfully excludes instrumental variables and includes confounders and outcome predictors. We show its oracle properties under the "linear association" conditions. We also perform numerical simulations to illustrate the proposed covariate selection procedure and evaluate its performance under model misspecification. Comparisons with other covariate selection methods are made using artificial data as well, through which we find that our procedure is more powerful in excluding instrumental variables and spurious covariates.
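
A hedged sketch of the two-step idea: fit a parametric propensity model, form inverse probability treatment weights, and run a weighted adaptive lasso on the outcome. The penalty weights, tuning value, and data-generating design (one confounder, one instrument, one pure outcome predictor) are choices of this example, not necessarily the paper's exact procedure.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression, Lasso

    rng = np.random.default_rng(7)
    n, p = 500, 6
    X = rng.standard_normal((n, p))
    # X0: confounder, X1: instrument (drives treatment only), X2: pure outcome predictor.
    ps = 1 / (1 + np.exp(-(0.8 * X[:, 0] + 1.2 * X[:, 1])))
    A = rng.binomial(1, ps)
    y = 1.0 * A + 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.standard_normal(n)

    ps_hat = LogisticRegression(C=1e6).fit(X, A).predict_proba(X)[:, 1]
    w = A / ps_hat + (1 - A) / (1 - ps_hat)          # inverse probability treatment weights

    D = np.column_stack([A, X])
    init = LinearRegression().fit(D, y, sample_weight=w).coef_     # pilot estimate
    sel = Lasso(alpha=0.05).fit(D * np.abs(init), y, sample_weight=w).coef_
    print("kept covariates:", np.flatnonzero(np.abs(sel[1:]) > 1e-8))  # ideally drops X1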

Propensity score methods have been shown to be powerful in obtaining efficient estimators of the average treatment effect (ATE) from observational data, especially in the presence of confounding factors. When estimating, deciding which types of covariates to include in the propensity score function is important, since incorporating unnecessary covariates may amplify both the bias and the variance of ATE estimators. In this paper, we show that including additional instrumental variables that satisfy the exclusion restriction for the outcome harms statistical efficiency. We also prove that controlling for covariates that act purely as outcome predictors, i.e., that predict the outcome but are irrelevant to the exposure, can help reduce the asymptotic variance of ATE estimation. We further note that efficiently estimating the ATE by nonparametric or semiparametric methods requires an estimated propensity score function, as described in Hirano et al. (2003)\cite{Hirano2003}. Such estimation procedures usually require many regularity conditions; Rothe (2016)\cite{Rothe2016} also illustrated this point and proposed a known propensity score (KPS) estimator that requires only mild regularity conditions while remaining fully efficient. In addition, we introduce a linearly modified (LM) estimator that is nearly efficient in most general settings and does not require estimation of the propensity score function, making it convenient to calculate. The construction of this estimator borrows ideas from the interaction estimator of Lin (2013)\cite{Lin2013}, in which regression adjustment with interaction terms is applied to data arising from a completely randomized experiment. As its name suggests, the LM estimator can be viewed as a linear modification of the IPW estimator using known propensity scores. We also investigate its statistical properties both analytically and numerically.
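
For reference, a compact sketch of two standard baselines with known propensity scores: the Horvitz-Thompson IPW estimator and a regression-adjusted contrast in the spirit of Lin (2013). The LM estimator itself is the paper's construction and is not reproduced here; the simulated design is illustrative.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 2000
    X = rng.standard_normal((n, 2))
    e = 1 / (1 + np.exp(-X[:, 0]))                   # known propensity score
    A = rng.binomial(1, e)
    y = 2.0 * A + X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

    ipw = np.mean(A * y / e - (1 - A) * y / (1 - e))     # Horvitz-Thompson IPW

    # Lin-style adjustment: within-arm regression on centered covariates,
    # consistent here because the outcome model is truly linear in X.
    Xc = X - X.mean(axis=0)
    def arm_mean(a):
        idx = A == a
        Z = np.column_stack([np.ones(idx.sum()), Xc[idx]])
        b = np.linalg.lstsq(Z, y[idx], rcond=None)[0]
        return b[0]                                   # adjusted arm mean at mean covariates
    print("IPW:", ipw, "regression-adjusted:", arm_mean(1) - arm_mean(0))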

Point processes in time have a wide range of applications that include the claims arrival process in insurance and the analysis of queues in operations research. Due to advances in technology, samples of such point processes are increasingly encountered. A key object of interest is the local intensity function, which has a straightforward interpretation that allows one to understand and explore point process data. We consider functional approaches for point processes, where one has a sample of repeated realizations of the point process. This situation is inherently connected with Cox processes, where the intensity functions of the replications are modeled as random functions. Here we study a situation where one records covariates for each replication of the process, such as the daily temperature for bike rentals. To model point processes as responses with vector covariates as predictors, we propose a novel regression approach for the intensity function that is intrinsically nonparametric. While the intensity function of a point process observed only once on a fixed domain cannot be identified, we show how covariates and repeated observations of the process can be utilized to make consistent estimation possible, and we derive asymptotic rates of convergence without invoking parametric assumptions.
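
A rough sketch of the general strategy on simulated data: smooth each replication's events into a kernel intensity estimate on a grid, then combine replications with Nadaraya-Watson weights in the covariate. This is generic two-stage smoothing under assumed bandwidths, not the paper's estimator or its theory.

    import numpy as np

    rng = np.random.default_rng(9)
    R, T = 200, 1.0                                   # replications, domain [0, T]
    z = rng.uniform(15, 30, R)                        # covariate, e.g. daily temperature
    grid = np.linspace(0, T, 101)

    def simulate(zi):                                 # Poisson process, rate rising in z
        lam = 20 + 2 * zi
        n = rng.poisson(lam * T)
        return np.sort(rng.uniform(0, T, n))

    events = [simulate(zi) for zi in z]

    def kernel_intensity(pts, h=0.05):                # per-replication smoothed intensity
        if pts.size == 0:
            return np.zeros_like(grid)
        d = (grid[:, None] - pts[None, :]) / h
        return np.exp(-0.5 * d ** 2).sum(axis=1) / (h * np.sqrt(2 * np.pi))

    L = np.vstack([kernel_intensity(p) for p in events])   # R x grid intensity estimates

    def intensity_at(z0, b=2.0):                      # Nadaraya-Watson over the covariate
        w = np.exp(-0.5 * ((z - z0) / b) ** 2)
        return (w[:, None] * L).sum(axis=0) / w.sum()

    # Roughly 20 + 2*25 = 70, up to boundary bias at the edges of [0, T].
    print("mean estimated rate at z=25:", intensity_at(25.0).mean())

The pooling across replications is what restores identifiability: no single realization determines its own intensity, but replications sharing similar covariate values jointly pin down the covariate-conditional intensity.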
