
In this paper, we consider tests for ultrahigh-dimensional partially linear regression models. The presence of ultrahigh-dimensional nuisance covariates and an unknown nuisance function makes the inference problem very challenging. We adopt machine learning methods to estimate the unknown nuisance function and introduce quadratic-form test statistics. Interestingly, although the machine learning methods can be very complex, under suitable conditions we establish the asymptotic normality of the proposed test statistics under the null hypothesis and local alternative hypotheses. We further propose a power-enhanced procedure to improve the performance of the test statistics. Two threshold determination methods are provided for the power-enhanced procedure. We show that the power-enhanced procedure is powerful in detecting signals under either sparse or dense alternatives, and that it still controls the type-I error asymptotically under the null hypothesis. Numerical studies are carried out to illustrate the empirical performance of the proposed procedures.

Related content

Reduced-rank regression recognises the possibility of a rank-deficient matrix of coefficients, which is particularly useful when the data are high-dimensional. We propose a novel Bayesian model for estimating the rank of the coefficient matrix, which obviates the need for post-processing steps and allows for uncertainty quantification. Our method employs a mixture prior on the regression coefficient matrix along with a global-local shrinkage prior on its low-rank decomposition. We then rely on the Signal Adaptive Variable Selector to perform sparsification, and define two novel tools, the Posterior Inclusion Probability uncertainty index and the Relevance Index. The validity of the method is assessed in a simulation study, and its advantages and usefulness are then shown in real-data applications on the chemical composition of tobacco and on the photometry of galaxies.

We develop a novel, general and computationally efficient framework, called Divide and Conquer Dynamic Programming (DCDP), for localizing change points in time series data with high-dimensional features. DCDP deploys a class of greedy algorithms that are applicable to a broad variety of high-dimensional statistical models and can enjoy almost linear computational complexity. We investigate the performance of DCDP in three commonly studied change point settings in high dimensions: the mean model, the Gaussian graphical model, and the linear regression model. In all three cases, we derive non-asymptotic bounds for the accuracy of the DCDP change point estimators. We demonstrate that the DCDP procedures consistently estimate the change points with sharp, and in some cases, optimal rates while incurring significantly smaller computational costs than the best available algorithms. Our findings are supported by extensive numerical experiments on both synthetic and real data.

This paper investigates the efficient solution of penalized quadratic regressions in high-dimensional settings. We propose a novel and efficient algorithm for ridge-penalized quadratic regression that leverages the matrix structures of the regression with interactions. Building on this formulation, we develop an alternating direction method of multipliers (ADMM) framework for penalized quadratic regression with general penalties, including both single and hybrid penalty functions. Our approach greatly simplifies the calculations to basic matrix-based operations, making it appealing in terms of both memory storage and computational complexity.

We develop a new permutation test for inference on a subvector of coefficients in linear models. The test is exact when the regressors and the error terms are independent. Then, we show that the test is asymptotically of correct level, consistent and has power against local alternatives when the independence condition is relaxed, under two main conditions. The first is a slight reinforcement of the usual absence of correlation between the regressors and the error term. The second is that the number of strata, defined by values of the regressors not involved in the subvector test, is small compared to the sample size. The latter implies that the vector of nuisance regressors is discrete. Simulations and empirical illustrations suggest that the test has good power in practice if, indeed, the number of strata is small compared to the sample size.

Despite attractive theoretical guarantees and practical successes, the Predictive Interval (PI) given by Conformal Prediction (CP) may not reflect the uncertainty of a given model. This limitation arises because CP methods ensure coverage by applying a constant correction to all test points, disregarding their individual uncertainties. To address this issue, we propose using a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilizing the QRF's weights to assign more importance to samples with residuals similar to the test point. This approach results in PI lengths that are more aligned with the model's uncertainty. In addition, the weights learnt by the QRF provide a partition of the feature space, allowing for more efficient computations and improved adaptiveness of the PI through groupwise conformalization. Our approach enjoys assumption-free finite-sample marginal and training-conditional coverage and, under suitable assumptions, also ensures conditional coverage. Our methods work for any nonconformity score and are available as a Python package. We conduct experiments on simulated and real-world data that demonstrate significant improvements compared to existing methods.
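
For orientation, the "constant correction" mentioned in this abstract can be illustrated with plain split conformal prediction. The sketch below is not the QRF-based method of the paper or its Python package; it is a minimal baseline assuming absolute-residual nonconformity scores, and the names `conformal_quantile`, `split_conformal_interval`, and `predict` are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to reach this coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Build a prediction interval whose half-width is the same for every x_new."""
    # Nonconformity score (assumed here): absolute residual on calibration data.
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    centre = predict(x_new)
    return centre - q, centre + q


if __name__ == "__main__":
    model = lambda x: 2 * x  # hypothetical pre-fitted model
    print(split_conformal_interval(model, [1, 2, 3], [3, 4, 9], 5, alpha=0.5))
```

Because `q` is computed once from the calibration scores, every test point receives an interval of the same width, which is the limitation the abstract sets out to address.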

Sensitivity analysis is an important tool used in many domains of computational science, either to gain insight into a mathematical model and the interaction of its parameters or to study how uncertainty propagates through the input-output interactions. In many applications, the inputs are stochastically dependent, which violates one of the essential assumptions of state-of-the-art sensitivity analysis methods. Consequently, results obtained by ignoring the correlations do not reflect the true contributions of the input parameters. This study proposes an approach that addresses parameter correlations using a polynomial chaos expansion method together with Rosenblatt and Cholesky transformations to reflect the parameter dependencies. Treatment of the correlated variables is discussed in the context of variance-based and derivative-based sensitivity analysis. We demonstrate that the sensitivity of the correlated parameters can not only differ in magnitude, but even the sign of the derivative-based index can be inverted, thus significantly altering the model behavior compared to the prediction of an analysis that disregards the correlations. Numerous experiments are conducted using workflow automation tools within the VECMA toolkit.
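
As a rough illustration of the Cholesky transformation named in this abstract, the sketch below maps independent inputs to correlated ones through the lower-triangular factor of a covariance (or correlation) matrix. It is an assumption-laden toy, not the paper's workflow or the VECMA toolkit API; `cholesky` and `correlate` are hypothetical helper names, and the matrix is assumed symmetric positive definite.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (matrix assumed SPD)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlate(lower, z):
    """Map independent inputs z to correlated inputs x = L z."""
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]


if __name__ == "__main__":
    corr = [[1.0, 0.5], [0.5, 1.0]]  # illustrative correlation matrix
    L = cholesky(corr)
    print(correlate(L, [1.0, 0.0]))
```

If `z` holds independent standard normal draws, `x = L z` has covariance `L L^T`, so a model evaluated on `x` sees the intended dependence between its inputs.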

With the rapid advancements in technology for data collection, the application of the spatial autoregressive (SAR) model has become increasingly prevalent in real-world analysis, particularly when dealing with large datasets. However, the commonly used quasi-maximum likelihood estimation (QMLE) for the SAR model is not computationally scalable to large datasets. In addition, when establishing the asymptotic properties of the parameter estimators of the SAR model, classical spatial econometrics assumes that both the weights matrix and the regressors are nonstochastic, which is perhaps not realistic in real applications. Motivated by the machine learning literature, this paper proposes quasi-score matching estimation for the SAR model. This new estimation approach is still likelihood-based, but significantly reduces the computational complexity of the QMLE. The asymptotic properties of the parameter estimators under a random weights matrix and random regressors are established, which provides a new theoretical framework for the asymptotic inference of SAR-type models. The usefulness of the quasi-score matching estimation and its asymptotic inference is illustrated via extensive simulation studies and a case study of an anti-conflict social network experiment for middle school students.

In applications, it is increasingly common for the observed data to come from one or more random variables taking values in an infinite-dimensional space, e.g., curves. The need for tools adapted to the nature of these data explains the growing interest in the field of functional data analysis. The model we study in this paper assumes a linear dependence between a quantity of interest and several covariates, at least one of which is infinite-dimensional. To select the relevant covariates in this context, we investigate adaptations of the Lasso method. Two estimation methods are defined. The first consists in the minimization of a Group-Lasso criterion on the multivariate functional space H. The second minimizes the same criterion on a finite-dimensional subspace of H whose dimension is chosen by a penalized least squares method. We prove sparsity oracle inequalities in the cases where the design is fixed or random. To compute the solutions of both criteria in practice, we propose a coordinate descent algorithm. A numerical study on simulated and real data illustrates the behavior of the estimators.

We study the large sample properties of sparse M-estimators in the presence of pseudo-observations. Our framework covers a broad class of semi-parametric copula models, for which the marginal distributions are unknown and replaced by their empirical counterparts. It is well known that the latter modification significantly alters the limiting laws compared to usual M-estimation. We establish the consistency and the asymptotic normality of our sparse penalized M-estimator and we prove the asymptotic oracle property with pseudo-observations, possibly in the case when the number of parameters is diverging. Our framework can handle copula-based loss functions that are potentially unbounded. Additionally, we state the weak limit of multivariate rank statistics for an arbitrary dimension and the weak convergence of empirical copula processes indexed by maps. We apply our inference method to Canonical Maximum Likelihood losses with Gaussian copulas, mixtures of copulas or conditional copulas. The theoretical results are illustrated by two numerical experiments.

The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool to handle the problem is Langevin Monte Carlo, which proposes to approximate the posterior distribution with theoretical guarantees. In this thesis, we start with the replica exchange Langevin Monte Carlo (also known as parallel tempering), which proposes appropriate swaps between exploration and exploitation to achieve accelerations. However, the naive extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant accelerations. To alleviate this issue, we first propose a control variates method to reduce the variance of noisy energy estimators and show a potential to accelerate the exponential convergence. We also present the population-chain replica exchange based on non-reversibility and obtain an optimal round-trip rate for deep learning. In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved success; however, their lack of scalability has greatly limited their extension to big data. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, such a result still holds given non-convex energy landscapes.
