国产一区二区高清无码,欧美日韩在线精品视频一区二区三

We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex nonsmooth function, while the denominator part is a concave or convex function. This problem is difficult to solve since it is nonconvex. By exploiting the structure of the problem, we propose two Coordinate Descent (CD) methods for solving this problem. One is applied to the original fractional function, the other is based on the associated parametric problem. The proposed methods iteratively solve a one-dimensional subproblem \textit{globally}, and they are guaranteed to converge to coordinate-wise stationary points. In the case of a convex denominator, we prove that the proposed CD methods using sequential nonconvex approximation find stronger stationary points than existing methods. Under suitable conditions, CD methods with an appropriate initialization converge linearly to the optimal point (also the coordinate-wise stationary point). In the case of a concave denominator, we show that the resulting problem is quasi-convex, and any critical point is a global minimum. We prove that the algorithms converge to the global optimal solution with a sublinear convergence rate. We demonstrate the applicability of the proposed methods to some machine learning and signal processing models. Our experiments on real-world data have shown that our method significantly and consistently outperforms existing methods in terms of accuracy.

相關內容

坐標下降

關注 0

坐標下降法（coordinate descent）是一種非梯度優化算法。算法在每次迭代中，在當前點處沿一個坐標方向進行一維搜索以求得一個函數的局部極小值。在整個過程中循環使用不同的坐標方向。對于不可拆分的函數而言，算法可能無法在較小的迭代步數中求得最優解。為了加速收斂，可以采用一個適當的坐標系，例如通過主成分分析獲得一個坐標間盡可能不相互關聯的新坐標系.

子空間 · 優化器 · 可約的 · 核化 · Extensibility ·

2022 年 4 月 19 日

CobBO: Coordinate Backoff Bayesian Optimization with Two-Stage Kernels

Jian Tan,Niv Nayman,Mengchang Wang

from arxiv, Jian Tan and Niv Nayman contributed equally. An implementation of CobBO is available at: //github.com/Alibaba-MIIL/CobBO

Bayesian optimization is a popular method for optimizing expensive black-box functions. Yet it oftentimes struggles in high dimensions where the computation could be prohibitively heavy. To alleviate this problem, we introduce Coordinate backoff Bayesian Optimization (CobBO) with two-stage kernels. During each round, the first stage uses a simple coarse kernel that sacrifices the approximation accuracy for computational efficiency. It captures the global landscape by purposely smoothing away local fluctuations. Then, in the second stage of the same round, past observed points in the full space are projected to the selected subspace to form virtual points. These virtual points, along with the means and variances of their unknown function values estimated using the simple kernel of the first stage, are fitted to a more sophisticated kernel model in the second stage. Within the selected low dimensional subspace, the computational cost of conducting Bayesian optimization therein becomes affordable. To further enhance the performance, a sequence of consecutive observations in the same subspace are collected, which can effectively refine the approximation of the function. This refinement lasts until a stopping rule is met determining when to back off from a certain subspace and switch to another. This decoupling significantly reduces the computational burden in high dimensions, which fully leverages the observations in the whole space rather than only relying on observations in each coordinate subspace. Extensive evaluations show that CobBO finds solutions comparable to or better than other state-of-the-art methods for dimensions ranging from tens to hundreds, while reducing both the trial complexity and computational costs.

Extensibility · 在線 · CASES · Less · 優化器 ·

2022 年 4 月 19 日

Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

Weiming Liu,Huacong Jiang,Bin Li,Houqiang Li

Follow-the-Regularized-Lead (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms for Online Convex Optimization (OCO), they are mathematically elegant but less practical in solving Extensive-Form Games (EFGs). Counterfactual Regret Minimization (CFR) is a technique for approximating Nash equilibria in EFGs. CFR and its variants have a fast convergence rate in practice, but their theoretical results are not satisfactory. In recent years, researchers have been trying to link CFRs with OCO algorithms, which may provide new theoretical results and inspire new algorithms. However, existing analysis is restricted to local decision points. In this paper, we show that CFRs with Regret Matching and Regret Matching+ are equivalent to special cases of FTRL and OMD, respectively. According to these equivalences, a new FTRL and a new OMD algorithm, which can be considered as extensions of vanilla CFR and CFR+, are derived. The experimental results show that the two variants converge faster than conventional FTRL and OMD, even faster than vanilla CFR and CFR+ in some EFGs.

坐標下降 · 子空間 · 正交 · 貪心 · 超平面 ·

2022 年 4 月 19 日

Greedy double subspaces coordinate descent method via orthogonalization

Li-Li Jin,Hou-Biao Li

from arxiv, 12 pages,4 tables,8 figures

The coordinate descent method is an effective iterative method for solving large linear least-squares problems. In this paper, for the highly coherent columns case, we construct an effective coordinate descent method which iteratively projects the estimate onto a solution space formed by two greedily selected hyperplanes via Gram-Schmidt orthogonalization. Our methods may be regarded as a simple block version of coordinate descent method which involves two active columns. The convergence analysis of this method is provided and numerical simulations also confirm the effectiveness for matrices with highly coherent columns.

估計/估計量 · Kronecker積 · 協方差矩陣 · Performer · 正則化項 ·

2022 年 4 月 18 日

Covariance Estimation for Matrix-valued Data

Yichi Zhang,Weining Shen,Dehan Kong

Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.

優化器 · Extensibility · 圖 · 結點 · SLAM ·

2022 年 4 月 18 日

Majorization Minimization Methods for Distributed Pose Graph Optimization

Taosha Fan,Todd Murphey

from arxiv, 33 pages

We consider the problem of distributed pose graph optimization (PGO) that has important applications in multi-robot simultaneous localization and mapping (SLAM). We propose the majorization minimization (MM) method for distributed PGO ($\mathsf{MM\!\!-\!\!PGO}$) that applies to a broad class of robust loss kernels. The $\mathsf{MM\!\!-\!\!PGO}$ method is guaranteed to converge to first-order critical points under mild conditions. Furthermore, noting that the $\mathsf{MM\!\!-\!\!PGO}$ method is reminiscent of proximal methods, we leverage Nesterov's method and adopt adaptive restarts to accelerate convergence. The resulting accelerated MM methods for distributed PGO -- both with a master node in the network ($\mathsf{AMM\!\!-\!\!PGO}^*$) and without ($\mathsf{AMM\!\!-\!\!PGO}^{#}$) -- have faster convergence in contrast to the $\mathsf{MM\!\!-\!\!PGO}$ method without sacrificing theoretical guarantees. In particular, the $\mathsf{AMM\!\!-\!\!PGO}^{#}$ method, which needs no master node and is fully decentralized, features a novel adaptive restart scheme and has a rate of convergence comparable to that of the $\mathsf{AMM\!\!-\!\!PGO}^*$ method using a master node to aggregate information from all the other nodes. The efficacy of this work is validated through extensive applications to 2D and 3D SLAM benchmark datasets and comprehensive comparisons against existing state-of-the-art methods, indicating that our MM methods converge faster and result in better solutions to distributed PGO.

方差減小 · 平均梯度 · 估計/估計量 · Batch Size · contrastive ·

2022 年 4 月 17 日

Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization

Gideon Dresdner,Maria-Luiza Vladarean,Gunnar R?tsch,Francesco Locatello,Volkan Cevher,Alp Yurtsever

from arxiv, Artificial Intelligence and Statistics (AISTATS) 2022

We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.

曲率 · FAST · 增廣拉格朗日法 · PARCO · 均值 ·

2022 年 4 月 17 日

Fast Multi-grid Methods for Minimizing Curvature Energy

Zhenwei Zhang,Ke Chen,Yuping Duan

The geometric high-order regularization methods such as mean curvature and Gaussian curvature, have been intensively studied during the last decades due to their abilities in preserving geometric properties including image edges, corners, and image contrast. However, the dilemma between restoration quality and computational efficiency is an essential roadblock for high-order methods. In this paper, we propose fast multi-grid algorithms for minimizing both mean curvature and Gaussian curvature energy functionals without sacrificing the accuracy for efficiency. Unlike the existing approaches based on operator splitting and the Augmented Lagrangian method (ALM), no artificial parameters are introduced in our formulation, which guarantees the robustness of the proposed algorithm. Meanwhile, we adopt the domain decomposition method to promote parallel computing and use the fine-to-coarse structure to accelerate the convergence. Numerical experiments are presented on both image denoising and CT reconstruction problem to demonstrate the ability to recover image texture and the efficiency of the proposed method.

統計量 · 估計/估計量 · 似然 · 參數化模型 · MoDELS ·

2022 年 4 月 15 日

A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation

Braghadeesh Lakshminarayanan,Cristian R. Rojas

from arxiv, 7 pages, 6 figures, 1 table

One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.

殘差網絡 · Networking · 正則化項 · 泛函 · 層 ·

2022 年 4 月 14 日

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Rama Cont,Alain Rossier,RenYuan Xu

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.

馬哈拉諾比斯距離 · 奇異的 · 方陣 · Signal Processing · 優化器 ·

2022 年 4 月 14 日

Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition

Navjot Singh,Edgar Solomonik

CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithm for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to alternating least squares algorithm in the matrix case. However, for tensors with order greater than equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better conditioned decomposition with comparable and sometimes better fitness as compared to the ALS algorithm.