
Nonparametric feature selection in high-dimensional data is an important and challenging problem in statistics and machine learning fields. Most of the existing methods for feature selection focus on parametric or additive models which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. In this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel which depends on a set of parameters that determine the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction and kernel parameters at the same time. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove both the oracle selection property and the Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and an application to a microarray study of eye disease in animals.
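As a hedged illustration of the framework (the kernel form, penalty, and update rule below are assumptions for the sketch, not the paper's exact construction), one can take a product Gaussian kernel whose nonnegative weights theta_j encode feature importance, and alternate between kernel ridge regression for fixed theta and a penalized, projected-gradient update of theta:

```python
import numpy as np

def weighted_rbf(X1, X2, theta):
    """Product (ARD-style) Gaussian kernel k(x, x') = exp(-sum_j theta_j (x_j - x'_j)^2).
    theta_j >= 0 plays the role of an importance weight; theta_j = 0 drops feature j."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2      # pairwise squared differences per feature
    return np.exp(-(d2 * theta).sum(axis=-1))

def fit_alternating(X, y, lam=0.1, gamma=0.01, n_outer=20, lr=0.01, eps=1e-6):
    """Alternate between (i) kernel ridge regression for fixed theta (a convex problem)
    and (ii) a projected finite-difference gradient step on theta with an l1 penalty."""
    n, p = X.shape
    theta = np.full(p, 1.0 / p)
    alpha = np.zeros(n)
    for _ in range(n_outer):
        K = weighted_rbf(X, X, theta)
        alpha = np.linalg.solve(K + lam * np.eye(n), y)          # ridge step
        def risk(th):
            return np.mean((y - weighted_rbf(X, X, th) @ alpha) ** 2) + gamma * np.abs(th).sum()
        grad = np.zeros(p)
        for j in range(p):
            th = theta.copy()
            th[j] += eps
            grad[j] = (risk(th) - risk(theta)) / eps
        theta = np.clip(theta - lr * grad, 0.0, None)             # keep weights nonnegative
    return alpha, theta
```

Features whose weights are driven to zero are effectively removed from the kernel, which is the selection mechanism the abstract alludes to.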

Related Content

Feature selection, also called feature subset selection (FSS) or attribute selection, refers to choosing N features out of an existing set of M features so as to optimize a specific criterion of the system. It is the process of selecting the most effective features from the original ones in order to reduce the dimensionality of the data set, an important means of improving the performance of learning algorithms, and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.
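As a minimal, generic example of this definition (choosing N of M features according to a score criterion; the univariate F-test filter used here is just one possible criterion, not tied to any of the papers below):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                           # M = 10 candidate features
y = 2 * X[:, 0] - 3 * X[:, 3] + rng.normal(size=200)     # only features 0 and 3 matter

selector = SelectKBest(score_func=f_regression, k=2)     # keep N = 2 features
X_sel = selector.fit_transform(X, y)
print(selector.get_support(indices=True))                # indices of the selected features
```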

Sparse regression and classification estimators capable of group selection have application to an assortment of statistical problems, from multitask learning to sparse additive modeling to hierarchical selection. This work introduces a class of group-sparse estimators that combine group subset selection with group lasso or ridge shrinkage. We develop an optimization framework for fitting the nonconvex regularization surface and present finite-sample error bounds for estimation of the regression function. Our methods and analyses accommodate the general setting where groups overlap. As an application of group selection, we study sparse semiparametric modeling, a procedure that allows the effect of each predictor to be zero, linear, or nonlinear. For this task, the new estimators improve across several metrics on synthetic data compared to alternatives. Finally, we demonstrate their efficacy in modeling supermarket foot traffic and economic recessions using many predictors. All of our proposals are made available in the scalable implementation grpsel.
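The group-lasso shrinkage component mentioned above can be illustrated by its proximal operator (block soft-thresholding) inside a proximal gradient loop; this is a generic sketch with made-up grouping and tuning parameters, not the grpsel implementation:

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole group toward zero,
    setting it exactly to zero when its norm falls below t."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

def group_lasso(X, y, groups, lam=1.0, step=None, n_iter=500):
    """Proximal gradient descent for 0.5*||y - X beta||^2 + lam * sum_g ||beta_g||_2.
    `groups` is a list of index lists, e.g. [[0, 1, 2], [3, 4], [5, 6]]."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        for g in groups:                            # apply the prox group by group
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta
```

Groups whose coefficients are thresholded to exactly zero are removed wholesale, which is the group-selection effect the abstract combines with subset selection.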

We introduce a location statistic for distributions on non-linear geometric spaces, the diffusion mean, serving both as an extension of and an alternative to the Fr\'echet mean. The diffusion mean arises as the generalization of Gaussian maximum likelihood analysis to non-linear spaces by maximizing the likelihood of a Brownian motion. The diffusion mean depends on a time parameter $t$, which admits the interpretation of the allowed variance of the mean. The diffusion $t$-mean of a distribution $X$ is the most likely origin of a Brownian motion at time $t$, given the end-point distribution $X$. We give a detailed description of the asymptotic behavior of the diffusion estimator and provide sufficient conditions for the diffusion estimator to be strongly consistent. Furthermore, we present a smeary central limit theorem for diffusion means and investigate properties of the diffusion mean for distributions on the sphere $\mathcal{S}^n$. Experimentally, we consider simulated data and data from magnetic pole reversals, all indicating similar or improved convergence rate compared to the Fr\'echet mean. Here, we additionally estimate $t$ and consider its effects on smeariness and uniqueness of the diffusion mean for distributions on the sphere.
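For reference, the Fréchet mean that the diffusion mean extends and competes with can be computed on the unit sphere by a simple fixed-point iteration using the spherical exp/log maps; the code below sketches that baseline only, not the diffusion-mean estimator itself:

```python
import numpy as np

def log_map(p, q):
    """Riemannian log map on the unit sphere: tangent vector at p pointing toward q."""
    cos_t = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta * (q - cos_t * p) / np.sin(theta)

def exp_map(p, v):
    """Riemannian exp map on the unit sphere."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm

def frechet_mean(points, n_iter=100, tol=1e-10):
    """Fixed-point iteration: move the estimate along the mean of the log maps."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        v = np.mean([log_map(mu, q) for q in points], axis=0)
        mu_new = exp_map(mu, v)
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```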

We propose a novel active learning strategy for regression, which is model-agnostic, robust against model mismatch, and interpretable. Assuming that a small number of initial samples are available, we derive the optimal training density that minimizes the generalization error of local polynomial smoothing (LPS) with its kernel bandwidth tuned locally: we adopt the mean integrated squared error (MISE) as a generalization criterion, and use the asymptotic behavior of the MISE as well as the locally optimal bandwidths (LOB) -- the bandwidth function that minimizes the MISE in the asymptotic limit. The asymptotic expression of our objective then reveals the dependence of the MISE on the training density, enabling analytic minimization. As a result, we obtain the optimal training density in closed form. The almost model-free nature of our approach should encode raw properties of the target problem, and thus provide a robust and model-agnostic active learning strategy. Furthermore, the obtained training density factorizes the influence of local function complexity, noise level, and test density in a transparent and interpretable way. We validate our theory in numerical simulations, and show that the proposed active learning method outperforms existing state-of-the-art model-agnostic approaches.
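The base estimator whose generalization error is optimized, local polynomial smoothing, can be sketched as local linear regression with a Gaussian kernel; the single global bandwidth below is an illustrative choice rather than the locally optimal bandwidth discussed in the abstract:

```python
import numpy as np

def local_linear_predict(x0, X, y, h):
    """Local linear (degree-1 local polynomial) smoother at query point x0,
    weighting observations by a Gaussian kernel with bandwidth h."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)          # kernel weights
    D = np.column_stack([np.ones_like(X), X - x0])  # local design matrix
    W = np.diag(w)
    beta = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
    return beta[0]                                   # intercept = fitted value at x0

# toy usage
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=300)
y = np.sin(3 * X) + 0.2 * rng.normal(size=300)
grid = np.linspace(-2, 2, 50)
fit = np.array([local_linear_predict(x0, X, y, h=0.2) for x0 in grid])
```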

Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has received constant attention. However, none of the existing solutions can control the false discovery rate (FDR) under the finite sample setting. The knockoffs framework is a recent proposal that can effectively control the FDR with a finite sample size, but few knockoffs solutions are applicable to nonparametric models. In this article, we propose a novel kernel knockoffs selection procedure for the nonparametric additive model. We integrate three key components: the knockoffs, the subsampling for stability, and the random feature mapping for nonparametric function approximation. We show that the proposed method is guaranteed to control the FDR under any finite sample size, and achieves a power that approaches one as the sample size tends to infinity. We demonstrate the efficacy of our method through intensive numerical analyses and comparisons with the alternative solutions. Our proposal thus makes useful contributions to the methodology of nonparametric variable selection, FDR-based inference, as well as knockoffs.
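The random feature mapping component can be illustrated with random Fourier features for the Gaussian kernel (Rahimi-Recht); the knockoff construction and the subsampling step are not shown, and the lengthscale is an arbitrary illustrative value:

```python
import numpy as np

def random_fourier_features(X, n_features=100, lengthscale=1.0, seed=0):
    """Map X to a random feature space whose inner products approximate
    the Gaussian kernel exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

Each nonparametric additive component can then be fit as a linear model in these features, which is what makes coefficient-based knockoff statistics computable.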

We study the efficiency of nonparametric estimation of diffusions (stochastic differential equations driven by Brownian motion) from long stationary trajectories. First, we introduce estimators based on conditional expectations, motivated by the definitions of the drift and diffusion coefficients. These estimators involve time- and space-discretization parameters for computing expected values from discretely sampled stationary data. Next, we analyze the consistency and mean squared error of these estimators as functions of these computational parameters. We derive relationships between the number of observational points and the time- and space-discretization parameters that achieve the optimal speed of convergence and minimize computational complexity. We illustrate our approach with numerical simulations.
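A minimal version of such conditional-expectation estimators uses simple space binning of the Euler increments of a discretely sampled trajectory; the SDE, discretization, and bin count below are illustrative assumptions:

```python
import numpy as np

def estimate_drift_diffusion(x, dt, n_bins=30):
    """Bin-wise conditional expectations of the increments:
    drift(x)     ~ E[(X_{t+dt} - X_t) / dt    | X_t in bin]
    diffusion(x) ~ E[(X_{t+dt} - X_t)^2 / dt  | X_t in bin]."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x[:-1], edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    drift = np.full(n_bins, np.nan)
    diff2 = np.full(n_bins, np.nan)
    for b in range(n_bins):
        m = idx == b
        if m.any():
            drift[b] = np.mean(dx[m]) / dt
            diff2[b] = np.mean(dx[m] ** 2) / dt
    return centers, drift, diff2

# toy usage: Ornstein-Uhlenbeck trajectory dX = -X dt + dW, simulated by Euler-Maruyama
rng = np.random.default_rng(2)
dt, n = 1e-2, 100_000
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = x[t] - x[t] * dt + np.sqrt(dt) * rng.normal()
centers, drift, diff2 = estimate_drift_diffusion(x, dt)
```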

Spike-and-slab and horseshoe regression are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate if outliers and heteroskedasticity are present in the data, which are common features in many real-world statistics and machine learning applications. In this work, we propose a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. Our proposed model is an instance of a Dirichlet process scale mixture model with the advantage that we can derive the full conditional distributions of all parameters in closed form, hence producing an efficient Gibbs sampler for posterior inference. Moreover, we show how to extend the model to account for heavy-tailed response variables. The performance of the model is tested against competing algorithms on synthetic and real-world datasets.
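A truncated stick-breaking sketch of the Dirichlet process scale mixture idea for regression errors is given below: each observation draws its own error variance from a discrete random distribution, which is what accommodates outliers and heteroskedasticity. The truncation level, the inverse-gamma base measure, and the absence of the Gibbs sampler are all simplifications relative to the paper's model:

```python
import numpy as np

def sample_dp_scale_mixture_errors(n, alpha=1.0, K=50, a0=2.0, b0=1.0, seed=3):
    """Draw n regression errors eps_i ~ N(0, sigma_i^2), where the sigma_i^2 come
    from a truncated stick-breaking (Dirichlet process) prior over variance atoms."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K)                          # stick-breaking proportions
    w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))   # mixture weights
    w /= w.sum()                                              # renormalise after truncation
    atoms = 1.0 / rng.gamma(a0, 1.0 / b0, size=K)             # variance atoms (inverse-gamma)
    z = rng.choice(K, size=n, p=w)                            # cluster assignment per observation
    return rng.normal(0.0, np.sqrt(atoms[z]))
```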

In Quality-Diversity (QD) algorithms, which evolve a behaviourally diverse archive of high-performing solutions, the behaviour space is a difficult design choice that should be tailored to the target application. In QD meta-evolution, one evolves a population of QD algorithms to optimise the behaviour space based on an archive-level objective, the meta-fitness. This paper proposes an improved meta-evolution system such that (i) the database used to rapidly populate new archives is reformulated to prevent loss of quality-diversity; (ii) the linear transformation of base-features is generalised to a feature-map, a function of the base-features parametrised by the meta-genotype; and (iii) the mutation rate of the QD algorithm and the number of generations per meta-generation are controlled dynamically. Experiments on an 8-joint planar robot arm compare feature-maps (linear, non-linear, and feature-selection), parameter control strategies (static, endogenous, reinforcement learning, and annealing), and traditional MAP-Elites variants, for a total of 49 experimental conditions. Results reveal that non-linear and feature-selection feature-maps yield a 15-fold and 3-fold improvement in meta-fitness, respectively, over linear feature-maps. Reinforcement learning ranks among top parameter control methods. Finally, our approach allows the robot arm to recover a reach of over 80% for most damages and at least 60% for severe damages.
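For readers unfamiliar with the underlying QD algorithm, a minimal MAP-Elites loop (not the meta-evolution system itself) is sketched below on a toy genotype; the fitness function, behaviour descriptors, and mutation scheme are illustrative placeholders:

```python
import numpy as np

def map_elites(fitness, behaviour, dim=8, bins=10, n_evals=20_000, seed=4):
    """Minimal MAP-Elites: keep the best solution found in each cell of a
    discretised 2-D behaviour space."""
    rng = np.random.default_rng(seed)
    archive = {}                                        # cell index -> (fitness, genotype)
    for _ in range(n_evals):
        if archive and rng.random() < 0.9:              # mutate a random elite
            parent = list(archive.values())[rng.integers(len(archive))][1]
            x = parent + 0.1 * rng.normal(size=dim)
        else:                                           # or sample a new random genotype
            x = rng.uniform(-1, 1, size=dim)
        b = behaviour(x)                                # 2-D descriptor in [0, 1]^2
        cell = tuple(np.clip((b * bins).astype(int), 0, bins - 1))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive

# toy usage: behaviour = first two coordinates squashed to [0, 1], fitness = -||x||^2
archive = map_elites(fitness=lambda x: -np.sum(x ** 2),
                     behaviour=lambda x: (np.tanh(x[:2]) + 1) / 2)
```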

We consider the problem of estimating a high-dimensional probability distribution from i.i.d. samples using model classes of functions in tree-based tensor formats, a particular case of tensor networks associated with a dimension partition tree. The distribution is assumed to admit a density with respect to a product measure, possibly discrete in order to handle discrete random variables. After discussing the representation of classical model classes in tree-based tensor formats, we present learning algorithms based on empirical risk minimization using an $L^2$ contrast. These algorithms exploit the multilinear parametrization of the formats to recast the nonlinear minimization problem as a sequence of empirical risk minimization problems with linear models. A suitable parametrization of the tensor in tree-based tensor format yields a linear model with orthogonal bases, so that each problem admits an explicit solution and cross-validation risk estimates. These risk estimates enable model selection, for instance when exploiting sparsity in the coefficients of the representation. We also provide a strategy for adapting the tensor format (dimension tree and tree-based ranks), which makes it possible to discover and exploit specific structures of high-dimensional probability distributions such as independence or conditional independence. We illustrate the performance of the proposed algorithms on the approximation of classical probabilistic models (such as Gaussian distributions, graphical models, and Markov chains).
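The key computational idea, recasting a nonlinear problem as a sequence of linear ones thanks to the multilinear parametrization, can be illustrated on a toy rank-1 bivariate model fit by alternating least squares. This is a regression surrogate under a polynomial basis of my choosing, not the paper's L2 density contrast or a full tree-based tensor format:

```python
import numpy as np

def poly_features(t, degree=4):
    """Simple polynomial basis, standing in for the orthogonal bases of the paper."""
    return np.vander(t, degree + 1, increasing=True)

def rank1_als(x, y, z, degree=4, n_sweeps=10):
    """Fit f(x, y) = (phi(x)^T a) * (psi(y)^T b) by alternating least squares:
    with b fixed the model is linear in a, and vice versa."""
    Phi, Psi = poly_features(x, degree), poly_features(y, degree)
    a = np.ones(degree + 1)
    b = np.ones(degree + 1)
    for _ in range(n_sweeps):
        cb = Psi @ b                                  # fixed factor, problem linear in a
        a, *_ = np.linalg.lstsq(Phi * cb[:, None], z, rcond=None)
        ca = Phi @ a                                  # fixed factor, problem linear in b
        b, *_ = np.linalg.lstsq(Psi * ca[:, None], z, rcond=None)
    return a, b

# toy usage: f(x, y) = (1 + x)(1 - y^2) observed with noise
rng = np.random.default_rng(5)
x, y = rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500)
z = (1 + x) * (1 - y ** 2) + 0.05 * rng.normal(size=500)
a, b = rank1_als(x, y, z)
```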

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive with popular regularization methods for selecting features in regression problems. The approach has unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden of tuning the sparsity bound (a parameter that is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity - without resorting to relaxations or heuristics in the optimization, or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and a whitening data pre-processing step.
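A compact way to see the Mixed Integer formulation is best-subset selection with a cardinality bound and big-M constraints. The sketch below expresses it in cvxpy with Boolean indicator variables; the big-M value and bound k are illustrative, a mixed-integer-capable solver is assumed to be installed, and none of the paper's whitening or tuning refinements are shown:

```python
import cvxpy as cp
import numpy as np

def best_subset(X, y, k, big_m=10.0):
    """Best-subset regression as a mixed-integer program:
    minimise ||y - X beta||^2 subject to at most k nonzero coefficients,
    enforced via Boolean indicators z_j and big-M bounds |beta_j| <= M z_j."""
    n, p = X.shape
    beta = cp.Variable(p)
    z = cp.Variable(p, boolean=True)
    constraints = [cp.abs(beta) <= big_m * z, cp.sum(z) <= k]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta)), constraints)
    prob.solve()   # needs a mixed-integer solver (e.g. SCIP or GUROBI)
    return beta.value, z.value
```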

Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
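A finite-dimensional caricature of the latent nested construction is sketched below: group-specific random probability measures are obtained by normalising the sum of a shared and a group-specific gamma random measure, each approximated with finitely many atoms. The atom count, base measure, and gamma approximation are illustrative simplifications rather than the paper's completely random measures:

```python
import numpy as np

def latent_nested_sketch(n_groups=2, n_atoms=200, a_common=1.0, a_group=1.0, seed=6):
    """Approximate group-specific random probabilities p_j by normalising
    nu_0 + nu_j, where nu_0 (shared) and nu_j (group-specific) are finite
    gamma approximations of completely random measures."""
    rng = np.random.default_rng(seed)
    loc0 = rng.normal(size=n_atoms)                            # shared atom locations
    w0 = rng.gamma(a_common / n_atoms, 1.0, size=n_atoms)      # shared atom weights
    measures = []
    for _ in range(n_groups):
        locj = rng.normal(size=n_atoms)                        # group-specific atoms
        wj = rng.gamma(a_group / n_atoms, 1.0, size=n_atoms)
        locs = np.concatenate([loc0, locj])
        w = np.concatenate([w0, wj])
        measures.append((locs, w / w.sum()))                   # normalise to a probability measure
    return measures
```

The shared atoms are what induce dependence across the group-specific measures, while the group-specific atoms keep them from collapsing to full exchangeability.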
