顾美玲国产一区二区三区,日韩A级毛片免费视频

Logistic regression is a widely used statistical model to describe the relationship between a binary response variable and predictor variables in data sets. It is often used in machine learning to identify important predictor variables. This task, variable selection, typically amounts to fitting a logistic regression model regularized by a convex combination of $\ell_1$ and $\ell_{2}^{2}$ penalties. Since modern big data sets can contain hundreds of thousands to billions of predictor variables, variable selection methods depend on efficient and robust optimization algorithms to perform well. State-of-the-art algorithms for variable selection, however, were not traditionally designed to handle big data sets; they either scale poorly in size or are prone to produce unreliable numerical results. It therefore remains challenging to perform variable selection on big data sets without access to adequate and costly computational resources. In this paper, we propose a nonlinear primal-dual algorithm that addresses these shortcomings. Specifically, we propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty in $O(T(m,n)\log(1/\epsilon))$ operations, where $\epsilon \in (0,1)$ denotes the tolerance and $T(m,n)$ denotes the number of arithmetic operations required to perform matrix-vector multiplication on a data set with $m$ samples each comprising $n$ features. This result improves on the known complexity bound of $O(\min(m^2n,mn^2)\log(1/\epsilon))$ for first-order optimization methods such as the classic primal-dual hybrid gradient or forward-backward splitting methods.

相關內容

對數幾率回歸

關注 0

對數幾率回歸 · 正則化項 · 稀疏 · 機器學習 · 計算學習理論 ·

2022 年 2 月 1 日

Safe Screening for Logistic Regression with $\ell_0$-$\ell_2$ Regularization

Anna Deza,Alper Atamturk

In logistic regression, it is often desirable to utilize regularization to promote sparse solutions, particularly for problems with a large number of features compared to available labels. In this paper, we present screening rules that safely remove features from logistic regression with $\ell_0-\ell_2$ regularization before solving the problem. The proposed safe screening rules are based on lower bounds from the Fenchel dual of strong conic relaxations of the logistic regression problem. Numerical experiments with real and synthetic data suggest that a high percentage of the features can be effectively and safely removed apriori, leading to substantial speed-up in the computations.

模型選擇 · FAST · MoDELS · 馬爾可夫鏈蒙特卡羅 · Lasso回歸 ·

2022 年 2 月 1 日

Bayesian Generalized Additive Model Selection Including a Fast Variational Option

Virginia X. He,Matt P. Wand

We use Bayesian model selection paradigms, such as group least absolute shrinkage and selection operator priors, to facilitate generalized additive model selection. Our approach allows for the effects of continuous predictors to be categorized as either zero, linear or non-linear. Employment of carefully tailored auxiliary variables results in Gibbsian Markov chain Monte Carlo schemes for practical implementation of the approach. In addition, mean field variational algorithms with closed form updates are obtained. Whilst not as accurate, this fast variational option enhances scalability to very large data sets. A package in the R language aids use in practice.

對率損失 · 對數幾率回歸 · 全局極小解 · 優化器 · 損失 ·

2022 年 1 月 31 日

Agnostic Learnability of Halfspaces via Logistic Loss

Ziwei Ji,Kwangjun Ahn,Pranjal Awasthi,Satyen Kale,Stefani Karp

We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where $\textrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behaved-ness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\textrm{OPT})$ misclassification risk. Our techniques also show for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the $\Omega(\sqrt{\textrm{OPT}})$ lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss and attain $\tilde{O}(\textrm{OPT})$ misclassification risk. This two-step convex optimization algorithm is simpler than previous methods obtaining this guarantee, all of which require solving $O(\log(1/\textrm{OPT}))$ minimization problems.

方差減小 · 方差 · 價值函數 · 優化器 · 線性的 ·

2022 年 1 月 31 日

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Mathieu Dagréou,Pierre Ablin,Samuel Vaiter,Thomas Moreau

Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-Lojasciewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.

CLUE · 動力系統 · 確切的 · MoDELS · 線性的 ·

2022 年 1 月 31 日

Exact linear reduction for rational dynamical systems

Antonio Jiménez-Pastor,Joshua Paul Jacob,Gleb Pogudin

from arxiv, 21 pages, 4 tables, 1 figure

Detailed dynamical systems models used in life sciences may include dozens or even hundreds of state variables. Models of large dimension are not only harder from the numerical perspective (e.g., for parameter estimation or simulation), but it is also becoming challenging to derive mechanistic insights from such models. Exact model reduction is a way to address this issue by finding a self-consistent lower-dimensional projection of the corresponding dynamical system. A recent algorithm CLUE allows one to construct an exact linear reduction of the smallest possible dimension such that the fixed variables of interest are preserved. However, CLUE is restricted to systems with polynomial dynamics. Since rational dynamics occurs frequently in the life sciences (e.g., Michaelis-Menten or Hill kinetics), it is desirable to extend CLUE to the models with rational dynamics. In this paper, we present an extension of CLUE to the case of rational dynamics and demonstrate its applicability on examples from literature. Our implementation is available in version 1.5 of CLUE at //github.com/pogudingleb/CLUE.

非凸 · state-of-the-art · 優化器 · FAST · INFORMS ·

2022 年 1 月 31 日

BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression

Haoyu Zhao,Boyue Li,Zhize Li,Peter Richtárik,Yuejie Chi

from arxiv, 25 pages

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (aka bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower rate of convergence $O((G/T)^{2/3})$ compared with their uncompressed counterpart, where $G$ measures the data heterogeneity across different clients, and $T$ is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate, by matching the rate without compression even under arbitrary data heterogeneity. Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data heterogeneous regime.

特征選擇 · 稀疏 · 數據縮減 · 優化器 · MoDELS ·

2022 年 1 月 30 日

Sparse Centroid-Encoder: A Nonlinear Model for Feature Selection

Tomojit Ghosh,Michael Kirby

from arxiv, 14 pages, 6 figures, 3 tables. Used 9 data sets and 6 state-of-the-art models for comparison

We develop a sparse optimization problem for the determination of the total set of features that discriminate two or more classes. This is a sparse implementation of the centroid-encoder for nonlinear data reduction and visualization called Sparse Centroid-Encoder (SCE). We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set. The algorithm is applied to a wide variety of data sets including, single-cell biological data, high dimensional infectious disease data, hyperspectral data, image data, and speech data. We compared our method to various state-of-the-art feature selection techniques, including two neural network-based models (DFS, and LassoNet), Sparse SVM, and Random Forest. We empirically showed that SCE features produced better classification accuracy on the unseen test data, often with fewer features.

對數幾率回歸 · 樣本復雜度 · 估計/估計量 · 秩 · 成比例 ·

2022 年 1 月 29 日

A Minimax Lower Bound for Low-Rank Matrix-Variate Logistic Regression

Batoul Taki,Mohsen Ghassemi,Anand D. Sarwate,Waheed U. Bajwa

from arxiv, 8 pages; published in Proc. 55th Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Oct. 31-Nov. 3, 2021

This paper considers the problem of matrix-variate logistic regression. It derives the fundamental error threshold on estimating low-rank coefficient matrices in the logistic regression problem by obtaining a lower bound on the minimax risk. The bound depends explicitly on the dimension and distribution of the covariates, the rank and energy of the coefficient matrix, and the number of samples. The resulting bound is proportional to the intrinsic degrees of freedom in the problem, which suggests the sample complexity of the low-rank matrix logistic regression problem can be lower than that for vectorized logistic regression. The proof techniques utilized in this work also set the stage for development of minimax lower bounds for tensor-variate logistic regression problems.

估計/估計量 · 對數幾率回歸 · 穩健性 · MoDELS · 統計量 ·

2022 年 1 月 28 日

Asymptotic behaviour of penalized robust estimators in logistic regression when dimension increases

Ana M. Bianco,Graciela Boente,Gonzalo Chebi

Penalized $M-$estimators for logistic regression models have been previously study for fixed dimension in order to obtain sparse statistical models and automatic variable selection. In this paper, we derive asymptotic results for penalized $M-$estimators when the dimension $p$ grows to infinity with the sample size $n$. Specifically, we obtain consistency and rates of convergence results, for some choices of the penalty function. Moreover, we prove that these estimators consistently select variables with probability tending to 1 and derive their asymptotic distribution.

優化器 · Extensibility · 對偶問題 · 平滑 · INTERACT ·

2017 年 12 月 1 日

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedi?

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.