
Kernel-based feature selection is an important tool in nonparametric statistics. Despite many practical applications of kernel-based feature selection, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has only studied the statistical properties of the \emph{global optima}, which is a mismatch, given that the gradient-based algorithms available for nonconvex optimization can only guarantee convergence to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other $\ell_1$ kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (and other $\ell_2$ kernels), do not possess. Based on a sharp characterization of the gradient of the objective function, we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear when using an $\ell_2$ kernel. Armed with this insight, we establish statistical guarantees for $\ell_1$ kernel-based feature selection which do not require reaching the global minimum. In particular, we establish model-selection consistency of $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with $n \sim \log p$ samples.
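To make the $\ell_1$ versus $\ell_2$ contrast concrete, here is a minimal NumPy sketch of a feature-weighted Laplace kernel next to a feature-weighted Gaussian kernel; the kernel-target alignment objective and all variable names are illustrative assumptions for this example, not the objective studied in the abstract.

```python
import numpy as np

def laplace_kernel(X, beta):
    """K_ij = exp(-sum_k beta_k |x_ik - x_jk|): a feature-weighted ell_1 kernel."""
    D = np.abs(X[:, None, :] - X[None, :, :])
    return np.exp(-(D * beta).sum(axis=2))

def gaussian_kernel(X, beta):
    """K_ij = exp(-sum_k beta_k (x_ik - x_jk)^2): a feature-weighted ell_2 kernel."""
    D = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-(D * beta).sum(axis=2))

def alignment(K, y):
    """Kernel-target alignment, used here as a stand-in feature-selection objective in beta."""
    yy = np.outer(y, y)
    return (K * yy).sum() / (np.linalg.norm(K) * np.linalg.norm(yy))

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])      # only the first two features matter
beta = np.full(p, 0.1)                    # feature weights to be optimized by gradient methods
print(alignment(laplace_kernel(X, beta), y), alignment(gaussian_kernel(X, beta), y))
```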

Related content

Feature selection, also called feature subset selection (FSS) or attribute selection, refers to choosing N features from an existing set of M features so as to optimize a specified criterion of the system. It is the process of selecting the most effective features from the original ones in order to reduce the dimensionality of the dataset; it is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For any learning algorithm, good training samples are crucial to building a good model.
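As a small illustration of choosing N of M features against a fixed criterion, a sketch using scikit-learn's univariate SelectKBest; the synthetic dataset, the F-statistic criterion, and N = 5 are arbitrary choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Choose N = 5 of M = 20 features so as to maximize a univariate F-statistic criterion.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print(selector.get_support(indices=True))   # indices of the retained features
```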

Partially linear additive models generalize linear models by describing the relation between a response variable and covariates under the assumption that some covariates have a linear relation with the response while each of the others enters through an unknown univariate smooth function. The harmful effect of outliers, either in the residuals or in the covariates involved in the linear component, has been described for partially linear models, that is, when only one nonparametric component is involved in the model. When dealing with additive components, the problem of providing reliable estimators in the presence of atypical data is of practical importance, motivating the need for robust procedures. Hence, we propose a family of robust estimators for partially linear additive models by combining $B$-splines with robust linear regression estimators. We obtain consistency results, rates of convergence and asymptotic normality for the linear components under mild assumptions. A Monte Carlo study is carried out to compare the performance of the robust proposal with its classical counterpart under different models and contamination schemes. The numerical experiments show the advantage of the proposed methodology for finite samples. We also illustrate the usefulness of the proposed approach on a real data set.
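A minimal sketch of the general idea, combining a B-spline basis for the additive component with a robust linear fit; scikit-learn's SplineTransformer and HuberRegressor are stand-ins assumed here for illustration, not the specific robust estimators proposed in the paper.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(1)
n = 300
x_lin = rng.normal(size=(n, 2))                       # covariates entering linearly
t = rng.uniform(-2.0, 2.0, size=(n, 1))               # covariate entering through a smooth function
y = x_lin @ np.array([1.0, -0.5]) + np.sin(np.pi * t[:, 0]) + rng.normal(scale=0.3, size=n)
y[:10] += 15.0                                        # a few gross outliers in the response

basis = SplineTransformer(n_knots=8, degree=3).fit_transform(t)   # B-spline basis for the additive part
design = np.hstack([x_lin, basis])
fit = HuberRegressor().fit(design, y)                 # robust loss downweights the outliers
print(fit.coef_[:2])                                  # estimates of the linear coefficients (true: 1.0, -0.5)
```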

Many different simulation methods for Stokes flow problems involve a common computationally intensive task -- the summation of a kernel function over $O(N^2)$ pairs of points. One popular technique is the Kernel Independent Fast Multipole Method (KIFMM), which constructs a spatially adaptive octree for all points, places a small number of multipole and local equivalent points around each octree box, and completes the kernel sum with $O(N)$ cost using these equivalent points. Simpler kernels can be used between these equivalent points to improve the efficiency of KIFMM. Here we present further extensions and applications of this idea to enable efficient summations and flexible boundary conditions for various kernels. We call our method the Kernel Aggregated Fast Multipole Method (KAFMM) because it uses different kernel functions at different stages of octree traversal. We have implemented our method as an open-source software library, STKFMM, based on the high-performance library PVFMM, with support for Laplace kernels, the Stokeslet, the regularized Stokeslet, the Rotne-Prager-Yamakawa (RPY) tensor, and the Stokes double-layer and traction operators. Open and periodic boundary conditions are supported for all kernels, and the no-slip wall boundary condition is supported for the Stokeslet and RPY tensor. The package is designed to be ready-to-use as well as readily extensible to additional kernels.
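For reference, the brute-force $O(N^2)$ Stokeslet sum that fast methods such as KIFMM/KAFMM accelerate can be sketched as follows; this is only the direct-summation baseline, not part of the STKFMM library.

```python
import numpy as np

def stokeslet_sum(targets, sources, forces, mu=1.0):
    """Direct O(N^2) Stokeslet summation: u(x) = sum_j G(x - y_j) f_j with
    G(r) = (1 / (8 pi mu)) * (I / |r| + r r^T / |r|^3)."""
    u = np.zeros_like(targets)
    for i, x in enumerate(targets):
        r = x - sources                            # (N, 3) displacements to all sources
        d = np.linalg.norm(r, axis=1)
        keep = d > 1e-12                           # skip (near-)coincident source points
        rk, dk, fk = r[keep], d[keep], forces[keep]
        u[i] = (fk / dk[:, None] + rk * (np.sum(rk * fk, axis=1) / dk**3)[:, None]).sum(axis=0)
    return u / (8.0 * np.pi * mu)

rng = np.random.default_rng(2)
pts = rng.random((200, 3))
f = rng.normal(size=(200, 3))
print(stokeslet_sum(pts, pts, f)[:2])
```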

This paper studies the asymptotic properties of, and improved inference methods for, kernel density estimation (KDE) for dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions. As the existing analytic variance estimator is known to behave unreliably in finite samples, we propose a modified jackknife empirical likelihood procedure for inference. The proposed test statistic is self-normalised and requires no variance estimator. In addition, it is asymptotically pivotal regardless of the presence of dyadic clustering. The results are extended to cover the practically relevant case of incomplete dyadic network data. Simulations show that this jackknife empirical likelihood-based inference procedure delivers precise coverage probabilities even under modest sample sizes and with incomplete dyadic data. Finally, we illustrate the method by studying airport congestion.
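A minimal sketch of the dyadic KDE point estimator, averaging a Gaussian kernel over all dyads; the node-effect data-generating process is a made-up example, and the jackknife empirical likelihood inference step is not shown.

```python
import numpy as np

def dyadic_kde(W, grid, h):
    """Gaussian-kernel density estimate from dyadic outcomes W[i, j], i < j,
    averaging over all n(n-1)/2 dyads."""
    w = W[np.triu_indices_from(W, k=1)]
    z = (grid[:, None] - w[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
n = 80
a = rng.normal(size=n)                            # node effects induce dyadic dependence
W = a[:, None] + a[None, :] + rng.normal(size=(n, n))
print(dyadic_kde(W, grid=np.linspace(-6.0, 6.0, 5), h=0.5))
```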

Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims at replacing a missing value with a user-defined one. However, the imputation itself may not be properly accounted for downstream of the imputation process, as imputed datasets are often treated as if they had always been complete. Hence, the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimation of the parameters' variability thanks to Rubin's rules. The imputation-based estimator of the variance of the peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. This workflow can be used at both the peptide and protein level in quantification datasets. For protein-level results based on peptide-level quantification data, an aggregation step is also included. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR in terms of overall F-score.
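The pooling step of Rubin's rules can be sketched as follows for a single parameter; the moderation of the variance estimator via Bayesian hierarchical models (as in mi4p) is not shown, and the numbers are made up.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Rubin's rules for a single parameter estimated on M imputed datasets:
    pooled estimate, and total variance = within + (1 + 1/M) * between."""
    m = len(estimates)
    qbar = estimates.mean()                  # pooled point estimate
    wbar = variances.mean()                  # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    return qbar, wbar + (1.0 + 1.0 / m) * b

est = np.array([1.02, 0.95, 1.10, 0.98, 1.05])   # estimates from M = 5 imputed datasets
var = np.array([0.04, 0.05, 0.04, 0.05, 0.04])   # their within-imputation variances
print(rubin_pool(est, var))
```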

Random graph models are used to describe the complex structure of real-world networks in diverse fields of knowledge. Studying their behavior and fitting properties is still a critical challenge that, in general, requires model-specific techniques. An important line of research is to develop generic methods able to fit and select the best model among a collection. Approaches based on the spectral density (i.e., the distribution of the eigenvalues of the graph adjacency matrix) are appealing for that purpose: they apply to different random graph models and can benefit from the theoretical background of random matrix theory. This work investigates the convergence properties of model fitting procedures based on the graph spectral density and the corresponding cumulative distribution function. We also review results on the convergence of the spectral density for the most widely used random graph models. Moreover, we explore through simulations the limits of these spectral density convergence results, particularly in the case of the block model, for which only partial results have been established.
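A minimal sketch of the empirical spectral cumulative distribution function one would compare across candidate models, computed here for an Erdős–Rényi graph; the $1/\sqrt{n}$ eigenvalue scaling is an assumption made for the example.

```python
import numpy as np

def empirical_spectral_cdf(A, grid):
    """Empirical CDF of the scaled adjacency eigenvalues of a graph."""
    eig = np.linalg.eigvalsh(A) / np.sqrt(A.shape[0])
    return np.array([(eig <= x).mean() for x in grid])

rng = np.random.default_rng(4)
n, p = 300, 0.05
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A = upper + upper.T                               # Erdos-Renyi G(n, p) adjacency matrix
print(empirical_spectral_cdf(A, np.linspace(-2.0, 2.0, 9)))
```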

The histogram estimator of a discrete probability mass function often exhibits undesirable properties related to zero probability estimation, both within the observed range of counts and outside it, in the tails of the distribution. To circumvent this, we formulate a novel second-order discrete kernel smoother based on the recently developed mean-parametrized Conway--Maxwell--Poisson distribution, which allows for both over- and under-dispersion. Two automated bandwidth selection approaches, one based on a simple minimization of the Kullback--Leibler divergence and another based on a more computationally demanding cross-validation criterion, are introduced. Both methods exhibit excellent small- and large-sample performance. Computational results on simulated datasets from a range of target distributions illustrate the flexibility and accuracy of the proposed method compared to existing smoothed and unsmoothed estimators. The method is applied to the modelling of somite counts in earthworms and of the number of development days of insect pests on the Hura tree.
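A minimal sketch of a discrete associated-kernel smoother; since the mean-parametrized Conway--Maxwell--Poisson kernel used in the paper is not available in SciPy, a Poisson kernel is used as a stand-in, and the renormalization over a finite support is an assumption of this example.

```python
import numpy as np
from scipy.stats import poisson

def discrete_kernel_smoother(counts, support, h):
    """Discrete associated-kernel pmf estimate: f_hat(x) = mean_i K_{x,h}(X_i),
    here with a Poisson kernel of mean x + h, renormalized over the support."""
    f = np.array([poisson.pmf(counts, mu=x + h).mean() for x in support])
    return f / f.sum()

counts = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 6])     # observed counts
print(discrete_kernel_smoother(counts, support=np.arange(11), h=0.3))
```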

We study ergodic properties of some Markov chain models in random environments, when the random Markov kernels that define the dynamics satisfy the usual drift and small-set conditions but with random coefficients. In particular, we adapt a standard coupling scheme used to obtain geometric ergodicity for homogeneous Markov chains to the random-environment case, and we prove the existence of a process of random invariant probability measures for such chains, in the spirit of Kifer's approach for chains satisfying Doeblin-type conditions. We then deduce ergodic properties of such chains when the environment is itself ergodic. Our results complement and sharpen existing ones by providing quite weak and easily checkable assumptions on the random Markov kernels. As a by-product, we obtain a framework for studying some time series models with strictly exogenous covariates. We illustrate our results with autoregressive time series with functional coefficients and some threshold autoregressive processes.
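A toy simulation in the spirit of the illustrating examples: an autoregressive series whose coefficient is driven by an ergodic exogenous covariate. The tanh coefficient function and the AR(1) environment are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
z = np.zeros(T)                                   # exogenous environment (ergodic AR(1))
x = np.zeros(T)                                   # observed series
for t in range(1, T):
    z[t] = 0.8 * z[t - 1] + rng.normal()
    a_t = 0.9 * np.tanh(z[t])                     # environment-driven functional coefficient, |a_t| < 0.9
    x[t] = a_t * x[t - 1] + rng.normal()          # autoregression in a random environment
print(x[-5:])
```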

Distributionally robust optimization (DRO) is a worst-case framework for stochastic optimization under uncertainty that has attracted fast-growing interest in recent years. When the underlying probability distribution is unknown and observed from data, DRO suggests computing the worst-case distribution within a so-called uncertainty set that captures the involved statistical uncertainty. In particular, DRO with an uncertainty set constructed as a statistical-divergence neighborhood ball has been shown to provide a tool for constructing valid confidence intervals for nonparametric functionals, and it bears a duality with the empirical likelihood (EL). In this paper, we show how adjusting the ball size of this type of DRO can reduce higher-order coverage errors, similarly to the Bartlett correction. Our correction, which applies to general von Mises differentiable functionals, is more general than the existing EL literature, which focuses only on smooth function models or $M$-estimation. Moreover, we demonstrate a higher-order "self-normalizing" property of DRO regardless of the choice of divergence. Our approach builds on the development of a higher-order expansion of DRO, which is obtained through an asymptotic analysis of a fixed-point equation arising from the Karush-Kuhn-Tucker conditions.
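As a concrete instance of the divergence-ball construction, the sketch below computes the worst-case mean over a KL neighborhood of the empirical distribution via its standard convex dual; the ball radius is arbitrary, and the higher-order Bartlett-type adjustment of the ball size from the paper is not implemented.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_dro_worst_case_mean(x, rho):
    """Worst-case mean over a KL-divergence ball of radius rho around the empirical
    distribution, via the convex dual: inf_{lam > 0} lam*rho + lam*log(mean_i exp(x_i/lam))."""
    def dual(lam):
        return lam * rho + lam * (logsumexp(x / lam) - np.log(len(x)))
    return minimize_scalar(dual, bounds=(1e-3, 1e2), method="bounded").fun

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, size=500)
print("empirical mean:", x.mean(), "worst-case mean:", kl_dro_worst_case_mean(x, rho=0.05))
```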

We study the problem of training deep neural networks with the Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that, for a broad family of loss functions and with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centered around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on the optimization of deep learning and pave the way to studying the optimization dynamics of training modern deep neural networks.
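A toy sketch of the setting: a wide one-hidden-layer ReLU network with Gaussian initialization, trained by full-batch gradient descent on a binary classification task with the logistic loss, reporting the training loss and how far the weights move from initialization. The width, scalings, and learning rate are assumptions chosen for illustration, not the paper's deep-network setting.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, m = 100, 10, 2000                              # n samples, d inputs, m hidden units (wide)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs
y = np.sign(X[:, 0])                                 # binary labels in {-1, +1}

W0 = rng.normal(size=(m, d)) / np.sqrt(d)            # Gaussian random initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)     # fixed second layer
W, lr = W0.copy(), 1.0
for _ in range(500):
    H = np.maximum(X @ W.T, 0.0)                     # ReLU features, shape (n, m)
    f = H @ a                                        # network outputs
    g = -y / (1.0 + np.exp(y * f))                   # d(logistic loss)/d(output)
    grad_W = ((g[:, None] * (X @ W.T > 0)) * a).T @ X / n
    W -= lr * grad_W                                 # full-batch gradient descent step

f = np.maximum(X @ W.T, 0.0) @ a
print("train loss:", np.mean(np.log1p(np.exp(-y * f))), "||W - W0||_F:", np.linalg.norm(W - W0))
```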

Personalized recommendation systems (RS) are extensively used in many services. Many of these are based on learning algorithms in which the RS uses the recommendation history and the user response to learn an optimal strategy. Furthermore, these algorithms assume that user interests are rigid; specifically, they do not account for the effect of the learning strategy on the evolution of the user interests. In this paper we develop influence models for a learning algorithm that is used to optimally recommend websites to web users. We adapt the model of \cite{Ioannidis10} to include an item-dependent reward to the RS from the suggestions that are accepted by the user. For this we first develop a static optimization scheme when all the parameters are known. Next we develop a stochastic approximation based learning scheme for the RS to learn the optimal strategy when the user profiles are not known. Finally, we describe several user-influence models for the learning algorithm and analyze their effect on the steady-state user interests and on the steady-state optimal strategy, compared to the case when the users are not influenced.
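A minimal sketch of a stochastic-approximation (Robbins-Monro) update in a recommendation-style setting, estimating per-item acceptance probabilities from binary feedback; this is a generic illustration with made-up probabilities, not the learning scheme developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(8)
true_accept = np.array([0.2, 0.5, 0.8])       # unknown per-item acceptance probabilities
est = np.zeros(3)                             # running estimates
n_shown = np.zeros(3)                         # how often each item was recommended
for _ in range(5000):
    i = rng.integers(3)                       # recommend an item (uniformly, for simplicity)
    accepted = rng.random() < true_accept[i]  # binary user feedback
    n_shown[i] += 1
    est[i] += (accepted - est[i]) / n_shown[i]    # Robbins-Monro step with gain 1/n_i
print(est)                                    # approaches true_accept
```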
