The curse of dimensionality is a recognized challenge in nonparametric estimation. This paper develops a new L0-norm regularization approach to convex quantile and expectile regression for subset variable selection. We show how to solve the proposed L0-norm regularization problem in practice using mixed-integer programming and establish a link to the commonly used L1-norm regularization approach. A Monte Carlo study is performed to compare the finite-sample performance of the proposed L0-penalized convex quantile and expectile regression approaches with that of the L1-norm regularization approaches. The proposed approach is further applied to benchmark the sustainable development performance of the OECD countries and to empirically analyze the accuracy of the dimensionality reduction of variables. The results from the simulation and the application illustrate that the proposed L0-norm regularization approach can address the curse of dimensionality more effectively than the L1-norm regularization approach in multidimensional spaces.
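As a rough illustration of how an L0 penalty can be handled with mixed-integer programming, the sketch below fits an L0-constrained linear quantile regression using big-M indicator constraints. It is not the paper's convex (shape-constrained) quantile regression; the names `tau`, `k_max`, and `BIG_M` and the simulated data are assumptions made for this toy example.

```python
# Toy L0-constrained quantile regression via mixed-integer programming.
# NOT the authors' implementation: the paper's method additionally imposes
# convexity (shape) constraints on the regression function.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, tau, k_max, BIG_M = 200, 10, 0.5, 3, 100.0

X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # only three relevant covariates
y = X @ beta_true + rng.standard_normal(n)

beta = cp.Variable(p)
z = cp.Variable(p, boolean=True)          # z_j = 1 if covariate j is selected
r = y - X @ beta
pinball = cp.sum(cp.maximum(tau * r, (tau - 1) * r))   # quantile (pinball) loss

constraints = [cp.abs(beta) <= BIG_M * z, # beta_j forced to 0 unless z_j = 1
               cp.sum(z) <= k_max]        # L0 budget: at most k_max covariates
prob = cp.Problem(cp.Minimize(pinball), constraints)
prob.solve()                              # needs a mixed-integer-capable solver

print("selected covariates:", np.where(z.value > 0.5)[0])
```

The big-M reformulation is what makes the L0 penalty tractable for off-the-shelf MIP solvers; an L1 approach would instead add `lam * cp.norm1(beta)` to the objective and drop the binary variables.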
The case-cohort study design bypasses resource constraints by collecting certain expensive covariates for only a small subset of the full cohort. Weighted Cox regression is the most widely used approach for analysing case-cohort data within the Cox model, but is inefficient. Alternative approaches based on multiple imputation and nonparametric maximum likelihood suffer from incompatibility and computational issues respectively. We introduce a novel Bayesian framework for case-cohort Cox regression that avoids the aforementioned problems. Users can include auxiliary variables to help predict the unmeasured expensive covariates with a prediction model of their choice, while the models for the nuisance parameters are nonparametrically specified and integrated out. Posterior sampling can be carried out using procedures based on the pseudo-marginal MCMC algorithm. The method scales effectively to large, complex datasets, as demonstrated in our application: investigating the associations between saturated fatty acids and type 2 diabetes using the EPIC-Norfolk study. As part of our analysis, we also develop a new approach for handling compositional data in the Cox model, leading to more reliable and interpretable results compared to previous studies. The performance of our method is illustrated with extensive simulations. The code used to produce the results in this paper can be found at https://github.com/andrewyiu/bayes_cc.
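To make the sampling idea concrete, here is a minimal pseudo-marginal Metropolis-Hastings sketch for a toy model in which an expensive covariate is never observed and its likelihood contribution is replaced by an unbiased Monte Carlo estimate averaged over draws from a prediction model. It is not the paper's case-cohort Cox implementation; the toy model, the flat prior, and names such as `M_imputations` are assumptions.

```python
# Pseudo-marginal Metropolis-Hastings with a Monte Carlo likelihood estimate.
import numpy as np

rng = np.random.default_rng(1)
n, M_imputations = 100, 50

# Toy data: y depends on an expensive covariate x that is never observed;
# an auxiliary variable w predicts it through x | w ~ N(w, 1).
w = rng.standard_normal(n)
x_true = w + rng.standard_normal(n)
theta_true = 1.5
y = rng.normal(theta_true * x_true, 1.0)

def loglik_hat(theta):
    # Average the per-subject likelihood over imputations of x (unbiased for the
    # integrated likelihood), then return the sum of logs across subjects.
    x_draws = w[:, None] + rng.standard_normal((n, M_imputations))
    lik = np.exp(-0.5 * (y[:, None] - theta * x_draws) ** 2) / np.sqrt(2 * np.pi)
    return np.log(lik.mean(axis=1)).sum()

theta, ll = 0.0, loglik_hat(0.0)
samples = []
for _ in range(5000):
    prop = theta + 0.2 * rng.standard_normal()
    ll_prop = loglik_hat(prop)                 # fresh estimate for the proposal
    if np.log(rng.uniform()) < ll_prop - ll:   # flat prior, symmetric proposal
        theta, ll = prop, ll_prop              # retain the accepted estimate
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))
```

The key pseudo-marginal ingredient is that the likelihood estimate of the current state is reused until a proposal is accepted, which keeps the extended chain targeting the exact posterior.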
A linearized numerical scheme is proposed to solve nonlinear time-fractional parabolic problems with time delay. The scheme is based on the standard Galerkin finite element method in the spatial direction and on the fractional Crank-Nicolson method combined with extrapolation in the temporal direction. A novel discrete fractional Grönwall inequality is established. Thanks to this inequality, the error estimate for the fully discrete scheme is obtained. Several numerical examples are provided to verify the effectiveness of the fully discrete numerical method.
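For readers unfamiliar with discrete fractional-derivative approximations, the sketch below implements the classical L1 discretisation of a Caputo derivative of order alpha in (0, 1), the kind of temporal building block such schemes rest on; the paper's fractional Crank-Nicolson scheme is a different (higher-order) discretisation, so this is for orientation only.

```python
# L1 discretisation of the Caputo fractional derivative (illustration only).
import numpy as np
from math import gamma

alpha, T, N = 0.6, 1.0, 200
tau = T / N
t = np.linspace(0.0, T, N + 1)
u = t.copy()                                  # test function u(t) = t

# L1 weights: b_j = (j+1)^{1-alpha} - j^{1-alpha}
j = np.arange(N)
b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)

def caputo_L1(u, n):
    """L1 approximation of the Caputo derivative of u at t_n."""
    diffs = u[n:0:-1] - u[n - 1::-1]          # u^{n-j} - u^{n-j-1}, j = 0..n-1
    return tau ** (-alpha) / gamma(2 - alpha) * np.dot(b[:n], diffs)

# The exact Caputo derivative of u(t) = t is t^{1-alpha} / Gamma(2-alpha);
# the L1 formula reproduces it exactly for linear u, so the two numbers agree.
exact = t[N] ** (1 - alpha) / gamma(2 - alpha)
print("L1 approximation:", caputo_L1(u, N), " exact:", exact)
```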
Autoregressive conditional duration (ACD) models are primarily used to deal with data arising from the times between two successive events. These models are usually specified in terms of a time-varying conditional mean or median duration. In this paper, we relax this assumption and consider a conditional quantile approach that facilitates the modeling of different percentiles. The proposed ACD quantile model is based on a skewed version of the Birnbaum-Saunders distribution, which provides a better fit of the tails than the traditional Birnbaum-Saunders distribution and also facilitates the implementation of an expectation conditional maximization (ECM) algorithm. A Monte Carlo simulation study is performed to assess the behavior of the model and the parameter estimation method, and to evaluate a form of residual. A real financial transaction data set is finally analyzed to illustrate the proposed approach.
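For context, the sketch below simulates a basic ACD(1,1) process with unit-mean exponential innovations and a time-varying conditional mean duration. The paper's model instead specifies a conditional quantile based on a skewed Birnbaum-Saunders distribution; the parameter values here are assumptions chosen only to illustrate the duration dynamics.

```python
# Basic ACD(1,1) simulation with a conditional-mean recursion (illustration only).
import numpy as np

rng = np.random.default_rng(2)
omega, alpha, beta, n = 0.1, 0.2, 0.7, 5000

x = np.empty(n)                      # observed durations
psi = np.empty(n)                    # conditional expected durations
psi[0] = omega / (1 - alpha - beta)  # start at the unconditional mean
x[0] = psi[0] * rng.exponential()

for t in range(1, n):
    psi[t] = omega + alpha * x[t - 1] + beta * psi[t - 1]
    x[t] = psi[t] * rng.exponential()    # duration = conditional mean * innovation

print("sample mean duration:", x.mean(),
      " implied unconditional mean:", omega / (1 - alpha - beta))
```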
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience, however. As such, a variety of estimators have been derived to overcome the shortcomings of the sample covariance matrix in these settings. Yet, the question of selecting an optimal estimator from among the plethora available remains largely unaddressed. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. In particular, we propose a general class of loss functions for covariance matrix estimation and establish finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validated estimator selector with respect to these loss functions. We evaluate our proposed approach via a comprehensive set of simulation experiments and demonstrate its practical benefits by application in the exploratory analysis of two single-cell transcriptome sequencing datasets. A free and open-source software implementation of the proposed methodology, the cvCovEst R package, is briefly introduced.
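The sketch below conveys the flavour of cross-validated estimator selection for the covariance matrix: candidate estimators are fit on training folds and scored on validation folds with a Frobenius-type loss against the held-out sample covariance. It is a simplified stand-in for, not a reimplementation of, the cvCovEst procedure; the candidate list and the loss are assumptions made for illustration.

```python
# Cross-validated selection among candidate covariance estimators (sketch).
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n, p = 60, 40                                   # few observations, many variables
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) truth
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

candidates = {"sample": EmpiricalCovariance(),
              "ledoit_wolf": LedoitWolf(),
              "oas": OAS()}

risks = dict.fromkeys(candidates, 0.0)
for train, val in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    S_val = np.cov(X[val], rowvar=False, bias=True)      # held-out sample covariance
    for name, est in candidates.items():
        Sigma_hat = est.fit(X[train]).covariance_
        risks[name] += np.sum((Sigma_hat - S_val) ** 2)   # Frobenius-type loss

print("cross-validated risks:", {k: round(v, 1) for k, v in risks.items()})
print("selected estimator:", min(risks, key=risks.get))
```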
In a growing number of applications, a quantity of interest may depend on several covariates, at least one of them infinite-dimensional (e.g. a curve). To select the relevant covariates in this context, we propose an adaptation of the Lasso method. Two estimation methods are defined. The first minimises a criterion inspired by classical Lasso inference under group sparsity (Yuan and Lin, 2006; Lounici et al., 2011) over the whole multivariate functional space H. The second minimises the same criterion over a finite-dimensional subspace of H whose dimension is chosen by a penalized least-squares method based on the work of Barron et al. (1999). Sparsity-oracle inequalities are proven for both fixed and random designs in our infinite-dimensional context. To compute the solutions of both criteria, we propose a coordinate-wise descent algorithm inspired by the glmnet algorithm (Friedman et al., 2007). A numerical study on simulated and experimental datasets illustrates the behavior of the estimators.
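As a point of reference for the coordinate-wise descent mentioned above, the following sketch runs block coordinate (proximal) descent for an ordinary group Lasso on scalar covariates. It is a generic textbook routine, not the authors' algorithm for functional covariates; the group structure and the penalty level `lam` are assumptions.

```python
# Block coordinate proximal descent for the group Lasso (generic sketch).
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 200, 15, 0.3
groups = [list(range(0, 5)), list(range(5, 10)), list(range(10, 15))]

X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                              # only the first group is active
y = X @ beta_true + rng.standard_normal(n)

def block_soft_threshold(v, t):
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= t else (1 - t / nv) * v

beta = np.zeros(p)
for _ in range(500):                             # coordinate-wise (block) sweeps
    for g in groups:
        Lg = np.linalg.norm(X[:, g], ord=2) ** 2 / n   # block Lipschitz constant
        grad_g = -X[:, g].T @ (y - X @ beta) / n       # gradient of the smooth part
        beta[g] = block_soft_threshold(beta[g] - grad_g / Lg, lam / Lg)

active = [i for i, g in enumerate(groups) if np.linalg.norm(beta[g]) > 1e-8]
print("active groups:", active)                  # the first group should be selected
```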
Functional data analysis (FDA) methods have computational and theoretical appeal for some high-dimensional data, but lack scalability to modern large-sample datasets. To tackle this challenge, we develop randomized algorithms for two important FDA methods: functional principal component analysis (FPCA) and functional linear regression (FLR) with scalar response. The two methods are connected in that both rely on accurate estimation of the functional principal subspace. The proposed algorithms draw subsamples from the large dataset at hand and apply FPCA or FLR to the subsamples to reduce the computational cost. To effectively preserve the subspace information in the subsamples, we propose a functional principal subspace sampling probability, which removes the eigenvalue scale effect inside the functional principal subspace and properly weights the residual. Based on operator perturbation analysis, we show that the proposed probability gives precise control over the first-order error of the subspace projection operator and can be interpreted as an importance sampling scheme for functional subspace estimation. Moreover, concentration bounds for the proposed algorithms are established to reflect the low intrinsic dimension of functional data in an infinite-dimensional space. The effectiveness of the proposed algorithms is demonstrated on synthetic and real datasets.
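The general recipe of subsampling with importance probabilities and reweighting can be sketched as follows for FPCA on densely observed curves. The paper's functional principal subspace sampling probability is more refined than the simple squared-norm probability used here; everything in this example is an assumption made for illustration.

```python
# Subsampled FPCA with importance weights (illustrative recipe only).
import numpy as np

rng = np.random.default_rng(5)
n, m, n_sub = 5000, 100, 300                 # curves, grid points, subsample size
t = np.linspace(0, 1, m)

# Toy functional data: two dominant principal components plus noise
scores = rng.standard_normal((n, 2)) * np.array([3.0, 1.5])
phi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t)])
X = scores @ phi + 0.2 * rng.standard_normal((n, m))

prob = (X ** 2).mean(axis=1)                 # naive norm-based sampling probability
prob /= prob.sum()

idx = rng.choice(n, size=n_sub, replace=True, p=prob)
w = 1.0 / (n * prob[idx])                    # inverse-probability weights
C_sub = (w[:, None] * X[idx]).T @ X[idx] / n_sub   # unbiased covariance estimate

eigvals, eigvecs = np.linalg.eigh(C_sub)
top = eigvecs[:, ::-1][:, :2]                # leading two estimated eigenfunctions

phi_unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
cosines = np.linalg.svd(phi_unit @ top, compute_uv=False)
print("cosines of principal angles with the true subspace:", cosines)
```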
We propose a novel broadcasting idea to model the nonlinearity in tensor regression non-parametrically. Unlike existing non-parametric tensor regression models, the resulting model strikes a good balance between flexibility and interpretability. A penalized estimator and a corresponding algorithm are proposed. Our theoretical investigation, which allows the dimensions of the tensor covariate to diverge, indicates that the proposed estimator enjoys a desirable convergence rate. We also provide a minimax lower bound, which characterizes the optimality of the proposed estimator in a wide range of scenarios. Numerical experiments are conducted to confirm the theoretical findings and show that the proposed model has advantages over existing linear counterparts.
Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve it. Despite this, there remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and that comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results showing that our algorithm can achieve near-perfect fairness with respect to various fairness metrics and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previous methods could not.
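To make the accuracy/fairness trade-off tangible, the sketch below post-processes a base classifier with group-specific thresholds chosen to cap the statistical-parity gap. This is explicitly not the paper's meta-algorithm, which reduces general fairness constraints to constrained classification with provable guarantees; the simulated data and the gap cap of 0.02 are assumptions.

```python
# Group-specific thresholding under a statistical-parity cap (illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 4000
a = rng.integers(0, 2, size=n)                       # sensitive attribute
x = rng.standard_normal((n, 3)) + 0.8 * a[:, None]   # features correlated with a
y = (x @ np.array([1.0, -0.5, 0.8]) + rng.standard_normal(n) > 0.5).astype(int)

clf = LogisticRegression().fit(x, y)
scores = clf.predict_proba(x)[:, 1]

def parity_gap(pred):
    return abs(pred[a == 0].mean() - pred[a == 1].mean())

best = None                                          # (accuracy, threshold0, threshold1)
for t0 in np.linspace(0.05, 0.95, 19):
    for t1 in np.linspace(0.05, 0.95, 19):
        pred = np.where(a == 0, scores > t0, scores > t1).astype(int)
        if parity_gap(pred) <= 0.02:                 # fairness constraint
            acc = (pred == y).mean()
            if best is None or acc > best[0]:
                best = (acc, t0, t1)

print("accuracy: unconstrained %.3f, fair %.3f (thresholds %.2f / %.2f)"
      % ((clf.predict(x) == y).mean(), best[0], best[1], best[2]))
```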
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, where many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
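The discretisation-free property rests on the fact that the CIR transition density is known in closed form as a scaled noncentral chi-squared distribution, so it can be sampled exactly. The sketch below simulates the plain CIR diffusion this way; it is background for the construction rather than the stochastic CIR process itself, and the parameter values are assumptions.

```python
# Exact (discretisation-free) simulation of a CIR diffusion,
# dX_t = a (b - X_t) dt + sigma * sqrt(X_t) dW_t.
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(7)
a, b, sigma, dt, n_steps = 2.0, 0.5, 0.8, 0.1, 10000

df = 4 * a * b / sigma ** 2                    # degrees of freedom
c = 2 * a / (sigma ** 2 * (1 - np.exp(-a * dt)))

x = np.empty(n_steps)
x[0] = b
for t in range(1, n_steps):
    nc = 2 * c * x[t - 1] * np.exp(-a * dt)    # noncentrality parameter
    x[t] = ncx2.rvs(df, nc, random_state=rng) / (2 * c)   # exact transition draw

print("empirical mean:", x.mean(), " stationary mean:", b)
```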
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
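A small numerical illustration of the convex surrogate: for a fixed vector of per-example losses, the inner maximisation over a chi-squared-divergence ball around the empirical distribution is a convex program whose value is approximately the empirical mean plus a variance penalty. The radius `rho` and the toy losses below are assumptions; this is not the authors' full estimator, only the robust objective evaluated once.

```python
# Distributionally robust objective over a chi-squared-divergence ball (sketch).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
n, rho = 200, 2.0
losses = rng.exponential(size=n)               # per-example losses of one model

p = cp.Variable(n, nonneg=True)
constraints = [cp.sum(p) == 1,
               cp.sum_squares(n * p - 1) <= 2 * rho]   # chi-squared ball of radius rho/n
cp.Problem(cp.Maximize(p @ losses), constraints).solve()

robust_value = float(p.value @ losses)
mean_plus_var = losses.mean() + np.sqrt(2 * rho * losses.var() / n)
print("robust objective:", robust_value, " mean + variance penalty:", mean_plus_var)
```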