
This paper studies quasi-Newton methods for solving strongly-convex-strongly-concave saddle point problems (SPP). We propose a variant of the general greedy Broyden family update for SPP, which has an explicit local superlinear convergence rate of ${\mathcal O}\big(\big(1-\frac{1}{n\kappa^2}\big)^{k(k-1)/2}\big)$, where $n$ is the dimension of the problem, $\kappa$ is the condition number and $k$ is the number of iterations. The design and analysis of the proposed algorithm are based on estimating the square of the indefinite Hessian matrix, which differs from classical quasi-Newton methods in convex optimization. We also present two specific Broyden family algorithms with BFGS-type and SR1-type updates, which enjoy a faster local convergence rate of $\mathcal O\big(\big(1-\frac{1}{n}\big)^{k(k-1)/2}\big)$.
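Why estimating the square of the Hessian suffices: if $H$ is the symmetric, indefinite Jacobian of the gradient field $g(z) = (\nabla_x f, \nabla_y f)$, then $H^2$ is positive definite and the Newton direction satisfies $H^{-1} g = (H^2)^{-1} H g$. The Python sketch below illustrates only this idea and is not the paper's algorithm. It assumes a quadratic test problem (so $H$ is constant), starts from $G = \|H\|_2^2 I \succeq H^2$, and applies a greedy SR1-type update to $G$ along the coordinate direction $e_i$ that maximizes $e_i^\top (G - H^2) e_i$. The names `greedy_sr1_update` and `greedy_qn_spp` are hypothetical.

```python
import numpy as np

def greedy_sr1_update(G, A, tol=1e-12):
    """One greedy SR1 update of G toward A, assuming G - A is positive semidefinite."""
    R = G - A
    diag = np.diag(R)
    i = int(np.argmax(diag))               # greedy coordinate direction e_i
    if diag[i] <= tol * max(1.0, np.abs(A).max()):
        return G                           # G already equals A up to the tolerance
    r = R[:, i]                            # (G - A) e_i
    return G - np.outer(r, r) / r[i]       # rank-one correction along e_i

def greedy_qn_spp(H, b, z0, iters=20):
    """Quasi-Newton iteration for the quadratic SPP whose optimality condition is H z = b."""
    A = H @ H                              # square of the indefinite Hessian: positive definite
    G = np.linalg.norm(H, 2) ** 2 * np.eye(len(b))   # initial estimate with G >= A
    z = z0.astype(float).copy()
    for _ in range(iters):
        g = H @ z - b                      # gradient field (grad_x f, grad_y f)
        z = z - np.linalg.solve(G, H @ g)  # approximates H^{-1} g = (H^2)^{-1} H g
        G = greedy_sr1_update(G, A)
    return z, G

# Example: f(x, y) = 0.5 x'Px + x'By - 0.5 y'Qy - p'x - q'y with P, Q positive definite
rng = np.random.default_rng(0)
nx, ny = 3, 2
Mx, My = rng.standard_normal((nx, nx)), rng.standard_normal((ny, ny))
P, Q = Mx @ Mx.T + np.eye(nx), My @ My.T + np.eye(ny)
B = rng.standard_normal((nx, ny))
H = np.block([[P, B], [B.T, -Q]])
b = rng.standard_normal(nx + ny)           # stacks p and q
z, G = greedy_qn_spp(H, b, np.zeros(nx + ny))
```

In exact arithmetic each greedy SR1 step lowers the rank of $G - H^2$ by one, so on this quadratic example $G$ equals $H^2$ after at most $n$ updates. From then on, each step is an exact Newton step. Because $G \succeq H^2$ throughout, the earlier steps are contractions as well.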

Related content

Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. At the time, Davidon's algorithm was regarded as one of the most creative inventions in nonlinear optimization. Soon afterwards, R. Fletcher and M. J. D. Powell showed that the new algorithm was far faster and more reliable than other methods, and the field of nonlinear optimization advanced dramatically almost overnight.

This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. In these problems, the objective is the expected value of a stochastic payoff function, and the random variables are drawn from a distribution induced by a distributional map. For general distributional maps, finding saddle points is computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points (saddle points of the stationary stochastic minimax problem that they induce) and provide conditions for their existence and uniqueness. We show that the distance between the two classes of solutions is bounded provided that the payoff is strongly-convex-strongly-concave and the distributional map is Lipschitz continuous. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map, which we call opposing mixture dominance, that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.
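A small sketch of a stochastic primal-dual method with a decision-dependent distribution. The problem instance is entirely hypothetical and is not taken from the paper. It uses the payoff $\phi(x, y; w) = \tfrac12 x^2 - \tfrac12 y^2 + xy + w(x - y)$ and the distributional map $w \sim \mathcal N(\mu_0 + \epsilon_1 x + \epsilon_2 y, \sigma^2)$. An equilibrium point $(\bar x, \bar y)$ is a saddle point of $\mathbb E_{w \sim D(\bar x, \bar y)} \phi(x, y; w)$ with the distribution frozen at $(\bar x, \bar y)$. For this instance, a short calculation gives $\bar x = 0$ and $\bar y = -\mu_0/(1 + \epsilon_2)$. With a constant step size, the iterates hover in a neighborhood of that point, so the sketch averages the tail of the trajectory.

```python
import numpy as np

# Hypothetical problem instance (not from the paper):
#   payoff       phi(x, y; w) = 0.5*x**2 - 0.5*y**2 + x*y + w*(x - y)
#   distribution w ~ D(x, y) = Normal(MU0 + EPS1*x + EPS2*y, SIGMA**2)
MU0, EPS1, EPS2, SIGMA = 1.0, 0.2, 0.2, 0.1

def stochastic_primal_dual(rng, eta=0.05, iters=5000, x=0.0, y=0.0):
    """Stochastic gradient descent in x and ascent in y, sampling w at the current decision."""
    xs, ys = np.empty(iters), np.empty(iters)
    for k in range(iters):
        w = rng.normal(MU0 + EPS1 * x + EPS2 * y, SIGMA)   # data depend on the decision
        gx = x + y + w                     # partial derivative of phi in x
        gy = x - y - w                     # partial derivative of phi in y
        x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
        xs[k], ys[k] = x, y
    return xs, ys

rng = np.random.default_rng(0)
xs, ys = stochastic_primal_dual(rng)
# Equilibrium point of this instance: x = 0, y = -MU0 / (1 + EPS2)
x_avg, y_avg = xs[-2000:].mean(), ys[-2000:].mean()
```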

Finding the global minimum of an optimization problem is of interest in engineering fields, but it is difficult, especially for nonconvex large-scale optimization problems. In this article, we consider a new memetic algorithm for this problem: we use stationary points of the function as the initial seeds of the evolutionary algorithm, instead of the random initial seeds used by known evolutionary algorithms. We compare it with the multi-start method (the built-in subroutine GlobalSearch.m of the MATLAB R2020a environment), the branch-and-bound method (Couenne, a state-of-the-art open-source solver for mixed-integer nonlinear programming problems), and two representative derivative-free algorithms (CMA-ES and MCS). Numerical results show that the proposed method performs well on large-scale global optimization problems, especially on problems that are difficult for the known global optimization methods.
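A minimal sketch of the seeding idea, built on several assumptions of ours. The 2-D Rastrigin function stands in for the test problems. Stationary points are found by plain gradient descent from random starts. The evolutionary stage is a simple elitist mutation-selection loop, not the evolutionary algorithm used in the article. All function names are hypothetical.

```python
import numpy as np

def rastrigin(x):
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def rastrigin_grad(x):
    return 2 * x + 20 * np.pi * np.sin(2 * np.pi * x)

def stationary_seeds(n_seeds, dim, rng, steps=2000, lr=0.0025):
    """Gradient descent from random starts; the end points are (approximate) stationary points."""
    seeds = []
    for _ in range(n_seeds):
        x = rng.uniform(-5.12, 5.12, dim)
        for _ in range(steps):
            x = x - lr * rastrigin_grad(x)   # lr < 2 / (2 + 40*pi^2), the gradient's Lipschitz constant
        seeds.append(x)
    return np.array(seeds)

def evolve(pop, rng, generations=200, sigma=0.3):
    """Elitist (mu + mu) evolution: Gaussian mutation, then keep the best mu individuals."""
    mu = len(pop)
    for _ in range(generations):
        children = pop + sigma * rng.standard_normal(pop.shape)
        union = np.vstack([pop, children])
        fitness = np.array([rastrigin(x) for x in union])
        pop = union[np.argsort(fitness)[:mu]]   # sorted, so pop[0] is the best individual
    return pop[0], rastrigin(pop[0])

rng = np.random.default_rng(0)
seeds = stationary_seeds(10, 2, rng)          # deterministic-style seeds instead of random ones
best_x, best_f = evolve(seeds, rng)
```

Because selection is elitist, the best value returned can never be worse than the best stationary seed.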

The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of an ANN with ReLU activations can be reformulated as a convex program, offering hope for globally optimizing interpretable ANNs. However, naively solving the convex training formulation has exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of multipliers (ADMM). It solves both the exact convex formulation and its approximate counterpart. It achieves linear global convergence, and the first few iterations often yield a solution with high prediction accuracy. When solving the approximate formulation, the per-iteration time complexity is quadratic. The second algorithm, based on the "sampled convex programs" theory, is simpler to implement. It solves unconstrained convex formulations and converges to an approximately globally optimal classifier. The non-convexity of the ANN training landscape is exacerbated when adversarial training is considered. We apply robust convex optimization theory to convex training and develop convex formulations that train ANNs robust to adversarial inputs. Our analysis focuses explicitly on one-hidden-layer fully connected ANNs, but it can be extended to more sophisticated architectures.
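A simplified sketch of the sampled-convex-program idea. Sampling random gate vectors $g_i$ produces activation patterns $D_i = \mathrm{diag}(\mathbb 1[X g_i \ge 0])$. For fixed patterns, the model $\sum_i D_i X u_i$ is linear in the weights $u_i$. This is not either algorithm from the paper, and it rests on two assumptions of ours. First, it drops the cone constraints of the exact convex ReLU formulation, which leaves an unconstrained, gated-ReLU-style problem. Second, it replaces the group-norm penalty with a squared $\ell_2$ penalty so that the fit has a closed form. The function names are hypothetical.

```python
import numpy as np

def sample_patterns(X, n_patterns, rng):
    """Sample ReLU activation patterns 1[X g >= 0] from random Gaussian gates g."""
    gates = rng.standard_normal((X.shape[1], n_patterns))
    return (X @ gates >= 0).astype(float)      # column i is the diagonal of D_i

def fit_sampled_convex(X, y, masks, lam=1e-3):
    """Ridge fit of y ~ sum_i D_i X u_i: convex and unconstrained in (u_1, ..., u_P)."""
    n, d = X.shape
    P = masks.shape[1]
    F = np.hstack([masks[:, [i]] * X for i in range(P)])   # features [D_1 X, ..., D_P X]
    u = np.linalg.solve(F.T @ F + lam * np.eye(P * d), F.T @ y)
    return u.reshape(P, d), F @ u

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = np.maximum(X @ np.array([1.0, -2.0, 0.5]), 0.0)   # targets from a single ReLU unit
masks = sample_patterns(X, 20, rng)
U, pred = fit_sampled_convex(X, y, masks)
```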

Stochastic majorization-minimization (SMM) is an online extension of the classical principle of majorization-minimization, which consists of sampling i.i.d. data points from a fixed data distribution and minimizing a recursively defined majorizing surrogate of an objective function. In this paper, we introduce stochastic block majorization-minimization, where the surrogates need only be block multi-convex and a single block is optimized at a time within a diminishing radius. By relaxing the standard strong convexity requirement on surrogates in SMM, our framework applies more widely, including to online CANDECOMP/PARAFAC (CP) dictionary learning, and is more computationally efficient, especially when the problem dimension is large. We provide an extensive convergence analysis of the proposed algorithm under possibly dependent data streams, relaxing the standard i.i.d. assumption on data samples. We show that the proposed algorithm converges almost surely to the set of stationary points of a nonconvex objective under constraints at a rate $O((\log n)^{1+\varepsilon}/n^{1/2})$ for the empirical loss function and $O((\log n)^{1+\varepsilon}/n^{1/4})$ for the expected loss function, where $n$ denotes the number of data samples processed. Under an additional assumption, the latter rate can be improved to $O((\log n)^{1+\varepsilon}/n^{1/2})$. Our results provide the first convergence rate bounds for various online matrix and tensor decomposition algorithms under a general Markovian data setting.
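To make the block structure concrete, here is a sketch for online matrix dictionary learning with two blocks: the code $h$ for the current sample, and the dictionary $W$. Every concrete choice here is a hypothetical one of ours and not the paper's algorithm. The codes come from ridge least squares. The surrogate weight is $w_t = 1/t$ and the radius is $r_t = c/\sqrt t$. The samples are i.i.d. rather than Markovian. The radius constraint is enforced by shortening the step toward the surrogate minimizer, which is a simplification and not an exact constrained block minimization.

```python
import numpy as np

def recon_error(X, W, lam=1e-3):
    """Relative reconstruction error of the columns of X using ridge codes under W."""
    k = W.shape[1]
    H = np.linalg.solve(W.T @ W + lam * np.eye(k), W.T @ X)
    return np.sum((X - W @ H) ** 2) / np.sum(X ** 2)

def sbmm_dictionary(X, W0, lam=1e-3, c=1.0):
    d, n = X.shape
    k = W0.shape[1]
    W = W0.copy()
    A, B = np.zeros((k, k)), np.zeros((d, k))
    for t in range(1, n + 1):
        x = X[:, t - 1]
        w_t = 1.0 / t                                    # surrogate averaging weight
        # Block 1: code for the current sample (a convex ridge problem in h)
        h = np.linalg.solve(W.T @ W + lam * np.eye(k), W.T @ x)
        # Recursive surrogate statistics
        A = (1 - w_t) * A + w_t * np.outer(h, h)
        B = (1 - w_t) * B + w_t * np.outer(x, h)
        # Block 2: minimizer of tr(W A W') - 2 tr(W' B) + lam*||W||^2, i.e. W (A + lam I) = B
        W_star = np.linalg.solve(A + lam * np.eye(k), B.T).T
        step = W_star - W
        r_t = c / np.sqrt(t)                             # diminishing radius
        norm = np.linalg.norm(step)
        if norm > r_t:
            step *= r_t / norm                           # stay within radius r_t of the old block
        W = W + step
    return W

rng = np.random.default_rng(0)
d, k, n = 5, 2, 2000
X = rng.standard_normal((d, k)) @ rng.standard_normal((k, n))   # rank-k data stream
W0 = rng.standard_normal((d, k))
W = sbmm_dictionary(X, W0)
```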

The goal of this paper is to reduce the total complexity of gradient-based methods for two classes of problems: affine-constrained composite convex optimization and bilinear saddle-point structured non-smooth convex optimization. Our technique is based on a double-loop inexact accelerated proximal gradient (APG) method for minimizing the sum of a non-smooth but proximable convex function and two smooth convex functions with different smoothness constants and computational costs. Compared to the standard APG method, the inexact APG method can reduce the total computational cost if one smooth component has a higher computational cost but a smaller smoothness constant than the other. With this property, the inexact APG method can be applied to approximately solve the subproblems of a proximal augmented Lagrangian method for affine-constrained composite convex optimization, and the smooth approximation of bilinear saddle-point structured non-smooth convex optimization, in both of which the smooth function with the smaller smoothness constant has a significantly higher computational cost. It can therefore reduce the total complexity of finding an approximately optimal/stationary solution. This technique is similar to the gradient sliding technique in the literature. The difference is that our inexact APG method can efficiently stop the inner loop by using a computable condition based on a measure of stationarity violation, whereas the gradient sliding methods need to pre-specify the number of inner-loop iterations. Numerical experiments show that our methods are significantly more efficient than an optimal primal-dual first-order method and the gradient sliding methods.
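A minimal sketch of the double-loop structure, assuming a lasso-type instance of our own choosing: $f_1(x) = \tfrac12\|A_1x - b_1\|^2$ plays the "expensive, small smoothness constant" part, $f_2(x) = \tfrac12\|A_2x - b_2\|^2$ the cheap part, and $h = \lambda\|x\|_1$. The outer loop is FISTA on $f_1$, which takes one $\nabla f_1$ per iteration. The inner loop runs proximal gradient on the subproblem $\langle\nabla f_1(y), x\rangle + \tfrac{L_1}{2}\|x - y\|^2 + f_2(x) + h(x)$. It stops when the norm of the gradient mapping, a computable stationarity measure, falls below a tolerance. The tolerance schedule $\varepsilon_k = \varepsilon_0/(k+1)^2$ is a hypothetical choice, not the paper's.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(x, A1, b1, A2, b2, lam):
    return (0.5 * np.sum((A1 @ x - b1) ** 2) + 0.5 * np.sum((A2 @ x - b2) ** 2)
            + lam * np.abs(x).sum())

def inexact_apg(A1, b1, A2, b2, lam, outer=300, eps0=1.0, max_inner=10_000):
    n = A1.shape[1]
    L1 = np.linalg.norm(A1, 2) ** 2          # smoothness constant of f1
    Ls = L1 + np.linalg.norm(A2, 2) ** 2     # smoothness constant of the inner smooth part
    x = np.zeros(n); y = x.copy(); t = 1.0
    for k in range(outer):
        g1 = A1.T @ (A1 @ y - b1)            # one gradient of the expensive f1 per outer step
        eps_k = eps0 / (k + 1) ** 2          # inner tolerance (hypothetical schedule)
        z = x.copy()                         # warm start the inner loop
        for _ in range(max_inner):
            grad = g1 + L1 * (z - y) + A2.T @ (A2 @ z - b2)
            z_new = soft_threshold(z - grad / Ls, lam / Ls)
            done = Ls * np.linalg.norm(z_new - z) <= eps_k   # gradient-mapping norm
            z = z_new
            if done:
                break
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = z + ((t - 1) / t_new) * (z - x)  # Nesterov extrapolation
        x, t = z, t_new
    return x

rng = np.random.default_rng(0)
A1, b1 = 0.3 * rng.standard_normal((20, 10)), rng.standard_normal(20)
A2, b2 = 3.0 * rng.standard_normal((20, 10)), rng.standard_normal(20)
x = inexact_apg(A1, b1, A2, b2, lam=0.1)
```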

We propose an approach to saddle point optimization that relies only on oracles that solve minimization problems approximately. We analyze its convergence properties on a strongly convex--concave problem and show that it converges linearly to the global min--max saddle point. Based on the convergence analysis, we develop a heuristic approach for adapting the learning rate. An implementation of the developed approach using the (1+1)-CMA-ES as the minimization oracle, namely Adversarial-CMA-ES, is shown to outperform several existing approaches on test problems. Numerical evaluation confirms the tightness of the theoretical convergence rate bound as well as the efficiency of the learning rate adaptation mechanism. As a real-world example, the proposed optimization method is applied to automatic berthing control problems under model uncertainties, showing its usefulness in obtaining solutions that are robust to uncertainty.
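A sketch of saddle point search that uses only approximate minimization oracles, under assumptions of ours. The update rule, in which each player moves a fraction $\eta$ toward an approximate best response, is our illustration and may differ from the paper's scheme. The oracle is a plain (1+1)-ES with the 1/5 success rule, not the (1+1)-CMA-ES. On the bilinear-quadratic test function below, the exact best responses are $\hat x = -By$ and $\hat y = B^\top x$. With exact oracles, the iteration contracts whenever $(1-\eta)^2 + \eta^2 s^2 < 1$ for every singular value $s$ of $B$.

```python
import numpy as np

def one_plus_one_es(f, x0, rng, iters=300, sigma=1.0):
    """(1+1)-ES with the 1/5 success rule, used as an approximate minimization oracle."""
    x, fx = x0.copy(), f(x0)
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal(x.shape)
        fc = f(cand)
        if fc <= fx:
            x, fx = cand, fc
            sigma *= 1.5               # success: enlarge the step size
        else:
            sigma *= 1.5 ** -0.25      # failure: shrink; balances at a 1/5 success rate
    return x

def saddle_with_min_oracle(f, x, y, rng, eta=0.5, outer=50):
    """Move both players a fraction eta toward approximate best responses."""
    for _ in range(outer):
        x_hat = one_plus_one_es(lambda u: f(u, y), x, rng)    # approx. argmin_u f(u, y)
        y_hat = one_plus_one_es(lambda v: -f(x, v), y, rng)   # approx. argmax_v f(x, v)
        x, y = x + eta * (x_hat - x), y + eta * (y_hat - y)
    return x, y

# Example: f(x, y) = 0.5|x|^2 + x'By - 0.5|y|^2 has its saddle point at (0, 0)
B = np.array([[0.5, 0.2], [-0.1, 0.4]])
f = lambda x, y: 0.5 * x @ x + x @ B @ y - 0.5 * y @ y
rng = np.random.default_rng(0)
x, y = saddle_with_min_oracle(f, np.ones(2), -np.ones(2), rng)
```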

We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class, the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$, and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and a scaling of $n^{-1/3}$ as the dimension $n$ tends to infinity, among all members of the class, under mild smoothness assumptions on $g$ and when the target distribution for the algorithm is of product form. In particular, we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, an optimal choice of balancing function under a Gaussian noise distribution, and an optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
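As a concrete member of the class, the Barker proposal uses the balancing function $g(t) = t/(1+t)$. It draws a symmetric noise increment $z_i$ and keeps its sign with probability $g(e^{z_i \partial_i \log\pi(x)})$. The resulting proposal density is $q(x, y) = \prod_i 2\mu(y_i - x_i)/(1 + e^{-(y_i - x_i)\partial_i \log\pi(x)})$, which gives the Metropolis-Hastings correction used below. This is a minimal sketch assuming Gaussian noise and a standard Gaussian target. The step size is an arbitrary illustrative value, not a tuned one.

```python
import numpy as np

def barker_balancing(t):
    """Balancing function of the Barker proposal; satisfies g(t) = t * g(1/t)."""
    return t / (1.0 + t)

def barker_mh(log_pi, grad_log_pi, x0, n_iter, step, rng):
    """Metropolis-Hastings with the Barker proposal and Gaussian noise increments."""
    x = np.asarray(x0, dtype=float).copy()
    lp, gr = log_pi(x), grad_log_pi(x)
    samples, accepted = np.empty((n_iter, x.size)), 0
    for k in range(n_iter):
        z = step * rng.standard_normal(x.size)           # symmetric noise increment
        p_keep = 1.0 / (1.0 + np.exp(-z * gr))           # g(exp(z_i * grad_i)): keep sign of z_i
        sign = np.where(rng.uniform(size=x.size) < p_keep, 1.0, -1.0)
        y = x + sign * z
        lp_y, gr_y = log_pi(y), grad_log_pi(y)
        d = y - x
        # log pi(y) - log pi(x) + log q(y, x) - log q(x, y)
        log_ratio = lp_y - lp + np.sum(np.logaddexp(0.0, -d * gr) - np.logaddexp(0.0, d * gr_y))
        if np.log(rng.uniform()) < log_ratio:
            x, lp, gr = y, lp_y, gr_y
            accepted += 1
        samples[k] = x
    return samples, accepted / n_iter

rng = np.random.default_rng(1)
dim = 5
samples, acc_rate = barker_mh(lambda x: -0.5 * x @ x, lambda x: -x,
                              np.zeros(dim), 20000, 1.0, rng)
```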

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though minimizing non-convex functions is NP-hard in the worst case, optimization quality is often not an issue in practice: optimizers are largely believed to find approximate global minima. Researchers have proposed a unified explanation for this intriguing phenomenon: most of the local minima of the objectives used in practice are approximately global minima. We rigorously formalize this explanation for concrete instances of machine learning problems.

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
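The smoothing step behind randomized smoothing can be sketched on a single node. Let $f_\gamma(x) = \mathbb E[f(x + \gamma Z)]$ with $Z \sim \mathcal N(0, I)$. For Lipschitz $f$, $\nabla f_\gamma(x) = \mathbb E[\partial f(x + \gamma Z)]$, so averaging subgradients at Gaussian-perturbed points estimates the gradient of a smooth surrogate. Gaussian perturbations, the sample counts and the step sizes are assumptions of this sketch, and it omits the distributed and communication parts of DRS entirely. For $f = \|x\|_1$, the closed form $\nabla f_\gamma(x)_i = \mathrm{erf}(x_i/(\gamma\sqrt 2))$ gives a check on the estimator.

```python
import math
import numpy as np

def smoothed_grad(subgrad, x, gamma, n_samples, rng):
    """Monte Carlo estimate of grad f_gamma(x), where f_gamma(x) = E[f(x + gamma*Z)], Z ~ N(0, I)."""
    Z = rng.standard_normal((n_samples, x.size))
    return np.mean([subgrad(x + gamma * z) for z in Z], axis=0)

def l1_smoothed_grad_exact(x, gamma):
    """Closed form for f = ||x||_1: coordinate i equals erf(x_i / (gamma * sqrt(2)))."""
    return np.array([math.erf(v / (gamma * math.sqrt(2.0))) for v in x])

def smoothed_descent(subgrad, x0, rng, gamma=0.05, step=0.05, iters=500, n_samples=50):
    """Gradient descent on the Gaussian-smoothed objective (single node, no communication)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = x - step * smoothed_grad(subgrad, x, gamma, n_samples, rng)
    return x

# Example: non-smooth f(x) = ||x - c||_1 with subgradient sign(x - c)
c = np.array([1.0, -2.0])
rng = np.random.default_rng(0)
x = smoothed_descent(lambda v: np.sign(v - c), np.zeros(2), rng)
```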

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and achieves the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
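A small sketch of accelerated gradient descent on the dual, assuming a hypothetical instance: scalar quadratics $f_i(x) = \tfrac{a_i}{2}(x - c_i)^2$ on a ring of $m$ nodes, with consensus written as $\sqrt{W}\mathbf{x} = 0$ for the graph Laplacian $W$. With $y = \sqrt{W}\lambda$, each node computes $x_i = \arg\min_x f_i(x) + y_i x = c_i - y_i/a_i$ locally. The dual gradient step $y \leftarrow y + W\mathbf{x}/L$ only needs neighbours' values, because $W$ is the Laplacian. The constants $L = \lambda_{\max}(W)/\min_i a_i$ and $\mu = \lambda_2(W)/\max_i a_i$ bound the dual's smoothness and its strong convexity on the range of $W$, and they set the constant momentum.

```python
import numpy as np

def ring_laplacian(m):
    """Laplacian of a ring graph with m >= 3 nodes."""
    W = 2.0 * np.eye(m)
    for i in range(m):
        W[i, (i + 1) % m] = W[i, (i - 1) % m] = -1.0
    return W

def dual_agd(a, c, W, iters=300):
    """Accelerated gradient on the dual of min sum_i 0.5*a_i*(x_i - c_i)^2 s.t. sqrt(W) x = 0."""
    eig = np.linalg.eigvalsh(W)              # ascending; eig[0] = 0 for a connected graph
    L_dual = eig[-1] / a.min()               # smoothness of the dual
    mu_dual = eig[1] / a.max()               # strong convexity on range(W)
    kappa = L_dual / mu_dual
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    y = np.zeros_like(c); y_tilde = y.copy()
    for _ in range(iters):
        x = c - y_tilde / a                  # local step at every node
        y_new = y_tilde + (W @ x) / L_dual   # one round of neighbour communication
        y_tilde = y_new + beta * (y_new - y) # Nesterov momentum
        y = y_new
    return c - y / a

rng = np.random.default_rng(0)
m = 6
a, c = rng.uniform(1.0, 2.0, m), rng.standard_normal(m)
x = dual_agd(a, c, ring_laplacian(m))
x_star = np.sum(a * c) / np.sum(a)           # centralized minimizer of sum_i f_i
```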
