
Nonconvex-nonconcave minimax optimization has been the focus of intense research over the last decade due to its broad applications in machine learning and operations research. Unfortunately, most existing algorithms cannot be guaranteed to converge and can suffer from limit cycles. Their global convergence relies on conditions that are difficult to check, including but not limited to the global Polyak-\L{}ojasiewicz condition, the existence of a solution satisfying the weak Minty variational inequality, and the $\alpha$-interaction dominant condition. In this paper, we develop the first provably convergent algorithm, the doubly smoothed gradient descent ascent method, which gets rid of the limit cycle without requiring any additional conditions. We further show that the algorithm has an iteration complexity of $\mathcal{O}(\epsilon^{-4})$ for finding a game stationary point, which matches the best iteration complexity of single-loop algorithms under nonconvex-concave settings. The algorithm presented here opens up a new path for designing provable algorithms for nonconvex-nonconcave minimax optimization problems.
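The non-convergence of plain gradient descent ascent can be seen on the bilinear toy problem $f(x, y) = xy$. The sketch below is not the paper's algorithm as published. It assumes that "doubly smoothed" means adding a proximal term in both $x$ and $y$ around auxiliary anchors `z` and `v` that follow the iterates by exponential averaging. The test function, the simultaneous update order, and the values of `step`, `r`, and `beta` are all illustrative assumptions.

```python
import math


def plain_gda(x, y, step=0.1, iters=2000):
    """Simultaneous gradient descent ascent on f(x, y) = x * y."""
    for _ in range(iters):
        gx, gy = y, x  # df/dx = y, df/dy = x
        x, y = x - step * gx, y + step * gy
    return x, y


def smoothed_gda(x, y, step=0.1, r=1.0, beta=0.1, iters=2000):
    """GDA on F(x, y, z, v) = x*y + r/2*(x - z)**2 - r/2*(y - v)**2.

    z and v are anchors (an assumption of this sketch) that track
    x and y by exponential averaging with weight beta.
    """
    z, v = x, y
    for _ in range(iters):
        gx = y + r * (x - z)  # dF/dx
        gy = x - r * (y - v)  # dF/dy
        x_new = x - step * gx  # descent in x
        y_new = y + step * gy  # ascent in y
        z_new = z + beta * (x - z)  # anchor follows x
        v_new = v + beta * (y - v)  # anchor follows y
        x, y, z, v = x_new, y_new, z_new, v_new
    return x, y


gda_norm = math.hypot(*plain_gda(1.0, 1.0))
smoothed_norm = math.hypot(*smoothed_gda(1.0, 1.0))
print(gda_norm, smoothed_norm)
```

On this toy problem the plain iterates spiral away from the stationary point $(0, 0)$, while the smoothed iterates contract toward it. This illustrates the mechanism only and says nothing about the general nonconvex-nonconcave guarantee.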

Related content


Approximation · Performer · Extensibility · Mutually independent · Approximation error

February 24, 2023

Randomized low-rank approximation of parameter-dependent matrices

Daniel Kressner,Hei Yin Lam

This work considers the low-rank approximation of a matrix $A(t)$ depending on a parameter $t$ in a compact set $D \subset \mathbb{R}^d$. Application areas that give rise to such problems include computational statistics and dynamical systems. Randomized algorithms are an increasingly popular approach for performing low-rank approximation and they usually proceed by multiplying the matrix with random dimension reduction matrices (DRMs). Applying such algorithms directly to $A(t)$ would involve different, independent DRMs for every $t$, which is not only expensive but also leads to inherently non-smooth approximations. In this work, we propose to use constant DRMs, that is, $A(t)$ is multiplied with the same DRM for every $t$. The resulting parameter-dependent extensions of two popular randomized algorithms, the randomized singular value decomposition and the generalized Nystr\"{o}m method, are computationally attractive, especially when $A(t)$ admits an affine linear decomposition with respect to $t$. We perform a probabilistic analysis for both algorithms, deriving bounds on the expected value as well as failure probabilities for the $L^2$ approximation error when using Gaussian random DRMs. Both the theoretical results and the numerical experiments show that the use of constant DRMs does not impair their effectiveness; our methods reliably return quasi-best low-rank approximations.
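The benefit of a constant DRM for an affine family can be sketched with NumPy. This is a minimal illustration, not the authors' implementation. It uses the randomized range finder (the first stage of a randomized SVD), and the matrices `A0` and `A1`, the sizes, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank, sketch = 50, 2, 8

# Hypothetical affine family A(t) = A0 + t * A1 with low-rank terms.
A0 = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
A1 = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))

# One Gaussian DRM shared by every parameter value.
omega = rng.standard_normal((n, sketch))
# Sketch the affine terms once; the products are reused for all t.
Y0, Y1 = A0 @ omega, A1 @ omega


def approximate(t):
    """Randomized range finder for A(t) built from the shared sketch."""
    A_t = A0 + t * A1
    Q, _ = np.linalg.qr(Y0 + t * Y1)  # orthonormal basis of the sketched range
    return Q @ (Q.T @ A_t), A_t


def relative_error(t):
    approx, exact = approximate(t)
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


errors = [relative_error(t) for t in np.linspace(0.0, 1.0, 5)]
print(max(errors))
```

Because $A(t)\Omega = A_0\Omega + tA_1\Omega$, the two expensive products are formed once and each new $t$ costs only a small QR factorization. Here $A(t)$ has rank at most 4 and the sketch has 8 columns, so the error is at rounding level.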

Continuity · Networking · Neural Networks · Functional · Approximation

February 24, 2023

Neural Network Approximation of Continuous Functions in High Dimensions with Applications to Inverse Problems

Santhosh Karnik,Rongrong Wang,Mark Iwen

from arxiv, 22 pages, 1 figure

The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, we provide a general method for bounding the complexity required for a neural network to approximate a H\"older (or uniformly) continuous function defined on a high-dimensional set with a low-complexity structure. The approach is based on the observation that the existence of a Johnson-Lindenstrauss embedding $A\in\mathbb{R}^{d\times D}$ of a given high-dimensional set $S\subset\mathbb{R}^D$ into a low dimensional cube $[-M,M]^d$ implies that for any H\"older (or uniformly) continuous function $f:S\to\mathbb{R}^p$, there exists a H\"older (or uniformly) continuous function $g:[-M,M]^d\to\mathbb{R}^p$ such that $g(Ax)=f(x)$ for all $x\in S$. Hence, if one has a neural network which approximates $g:[-M,M]^d\to\mathbb{R}^p$, then a layer can be added that implements the JL embedding $A$ to obtain a neural network that approximates $f:S\to\mathbb{R}^p$. By pairing JL embedding results along with results on approximation of H\"older (or uniformly) continuous functions by neural networks, one then obtains results which bound the complexity required for a neural network to approximate H\"older (or uniformly) continuous functions on high dimensional sets. The end result is a general theoretical framework which can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
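The composition argument can be written out in one line. As a sketch, assume $A(S)\subset[-M,M]^d$ and let $\Phi$ (a symbol introduced here, not in the abstract) be a network that approximates $g$ to uniform accuracy $\varepsilon$ on the cube. Then the network $x\mapsto\Phi(Ax)$ satisfies

$$\sup_{x\in S}\|f(x)-\Phi(Ax)\| \;=\; \sup_{x\in S}\|g(Ax)-\Phi(Ax)\| \;\le\; \sup_{z\in[-M,M]^d}\|g(z)-\Phi(z)\| \;\le\; \varepsilon .$$

The accuracy of the low-dimensional network therefore carries over to $S$ unchanged, at the cost of one extra linear layer with a $d\times D$ weight matrix.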

Functional · Smooth · Class · Nonconvex · Unimodal

February 24, 2023

Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

Oliver Hinder,Aaron Sidford,Nimit S. Sohoni

from arxiv, 48 pages. Published as a conference paper at COLT 2020

In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0,1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be "more nonconvex." We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth $\gamma$-quasar-convex function with at most $O(\gamma^{-1} \epsilon^{-1/2} \log(\gamma^{-1} \epsilon^{-1}))$ total function and gradient evaluations. We also derive a lower bound of $\Omega(\gamma^{-1} \epsilon^{-1/2})$ on the worst-case number of gradient evaluations required by any deterministic first-order method, showing that, up to a logarithmic factor, no deterministic first-order method can improve upon ours.
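The role of $\gamma$ can be checked numerically. The sketch below assumes the usual definition of $\gamma$-quasar-convexity with respect to a minimizer $x^*$, namely $f(x^*) \ge f(x) + \frac{1}{\gamma}\nabla f(x)^\top(x^* - x)$, which the abstract does not spell out. The test function $f(x) = x^2/(1+x^2)$ and the interval $[-3, 3]$ are illustrative choices.

```python
def f(x):
    return x * x / (1.0 + x * x)


def grad_f(x):
    return 2.0 * x / (1.0 + x * x) ** 2


def is_quasar_convex(gamma, x_star=0.0, radius=3.0, samples=601):
    """Check f(x*) >= f(x) + (1/gamma) * f'(x) * (x* - x) on a grid."""
    for i in range(samples):
        x = -radius + 2.0 * radius * i / (samples - 1)
        bound = f(x) + grad_f(x) * (x_star - x) / gamma
        if f(x_star) < bound - 1e-12:  # small tolerance for rounding
            return False
    return True


print(is_quasar_convex(0.2), is_quasar_convex(1.0))
```

For this function the inequality reduces to $1 + x^2 \le 2/\gamma$, so on $[-3, 3]$ it holds with $\gamma = 0.2$ but fails with $\gamma = 1$. The function is nonconvex yet unimodal around its minimizer, which is the "more nonconvex" regime of smaller $\gamma$.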

Optimizer · Learning · Bandits · Design · Functional

February 23, 2023

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

Kush Bhatia,Wenshuo Guo,Jacob Steinhardt

from arxiv, Accepted to AISTATS 2023

Specifying reward functions for complex tasks like object manipulation or driving is challenging to do by hand. Reward learning seeks to address this by learning a reward model using human feedback on selected query policies. This shifts the burden of reward specification to the optimal design of the queries. We propose a theoretical framework for studying reward learning and the associated optimal experiment design problem. Our framework models rewards and policies as nonparametric functions belonging to subsets of Reproducing Kernel Hilbert Spaces (RKHSs). The learner receives (noisy) oracle access to a true reward and must output a policy that performs well under the true reward. For this setting, we first derive non-asymptotic excess risk bounds for a simple plug-in estimator based on ridge regression. We then solve the query design problem by optimizing these risk bounds with respect to the choice of query set and obtain a finite sample statistical rate, which depends primarily on the eigenvalue spectrum of a certain linear operator on the RKHSs. Despite the generality of these results, our bounds are stronger than previous bounds developed for more specialized problems. We specifically show that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of our framework, and that our bounds either improve or are competitive with known regret guarantees for the Mat\'ern kernel.

Optimizer · Extensibility · Performer · Regularization term · Linear

February 23, 2023

A subgradient method with constant step-size for $\ell_1$-composite optimization

Alessandro Scagliotti,Piero Colli Franzone

from arxiv, 17 pages, 2 figures

Subgradient methods are the natural extension to the non-smooth case of the classical gradient descent for regular convex optimization problems. However, in general, they are characterized by slow convergence rates, and they require decreasing step-sizes to converge. In this paper we propose a subgradient method with constant step-size for composite convex objectives with $\ell_1$-regularization. If the smooth term is strongly convex, we can establish a linear convergence result for the function values. This fact relies on an accurate choice of the element of the subdifferential used for the update, and on proper actions adopted when non-differentiability regions are crossed. Then, we propose an accelerated version of the algorithm, based on conservative inertial dynamics and on an adaptive restart strategy. Finally, we test the performance of our algorithms on some strongly and non-strongly convex examples.
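The two ingredients named above, choosing a particular subgradient and handling a crossing of the non-differentiable set, can be sketched in one dimension. The choices below are assumptions, not necessarily the authors' rules: the update uses the minimum-norm element of the subdifferential, and a step that would cross the kink at 0 is stopped on it. The objective $\frac12(x-a)^2 + \lambda|x|$ is a hypothetical test case.

```python
def min_norm_subgradient(x, grad, lam):
    """Smallest-magnitude element of grad + lam * d|x| for one coordinate."""
    if x > 0:
        return grad + lam
    if x < 0:
        return grad - lam
    # At the kink the subdifferential is the interval [grad - lam, grad + lam].
    if grad > lam:
        return grad - lam
    if grad < -lam:
        return grad + lam
    return 0.0


def solve(a, lam, x=0.0, step=0.1, iters=500):
    """Minimize 0.5 * (x - a)**2 + lam * |x| with a constant step size."""
    for _ in range(iters):
        d = min_norm_subgradient(x, x - a, lam)
        x_new = x - step * d
        if x * x_new < 0:  # the step crossed the kink at 0: stop on it
            x_new = 0.0
        x = x_new
    return x


print(solve(3.0, 1.0), solve(0.5, 1.0, x=2.0))
```

The exact minimizer is the soft-thresholding of $a$ by $\lambda$: 2 for $a = 3, \lambda = 1$ and 0 for $a = 0.5, \lambda = 1$. In the second case the iterate lands on 0 and stays there, because the minimum-norm subgradient at the kink is zero.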

Normalized · Curse of dimensionality · Model evaluation · Frobenius norm · Performer

February 23, 2023

On the curse of dimensionality for Normalizing Flows

Andrea Coccaro,Marco Letizia,Humberto Reyes-Gonzalez,Riccardo Torre

from arxiv, 26 pages, 6 figures

Normalizing Flows have emerged as a powerful brand of generative models, as they not only allow for efficient sampling of complicated target distributions, but also deliver density estimation by construction. We propose here an in-depth comparison of coupling and autoregressive flows, both of the affine and rational quadratic spline type, considering four different architectures: Real-valued Non-Volume Preserving (RealNVP), Masked Autoregressive Flow (MAF), Coupling Rational Quadratic Spline (C-RQS), and Autoregressive Rational Quadratic Spline (A-RQS). We focus on different target distributions of increasing complexity with dimensionality ranging from 4 to 1000. The performances are discussed in terms of different figures of merit: the one-dimensional Wasserstein distance, the one-dimensional Kolmogorov-Smirnov test, the Frobenius norm of the difference between correlation matrices, and the training time. Our results indicate that the A-RQS algorithm stands out both in terms of accuracy and training speed. Nonetheless, all the algorithms are generally able, without much fine-tuning, to learn complex distributions with limited training data and in a reasonable time, of the order of hours on a Tesla V100 GPU. The only exception is the C-RQS, which takes significantly longer to train, and does not always provide good accuracy. All algorithms have been implemented using TensorFlow2 and TensorFlow Probability and made available on GitHub.
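Why coupling flows give both sampling and density estimation can be seen from a single affine coupling layer of the kind used in RealNVP. This is a two-dimensional sketch in plain Python, not the TensorFlow code referred to above. The functions `scale` and `shift` are hypothetical stand-ins for learned networks.

```python
import math


def scale(x1):
    return math.tanh(x1)  # hypothetical stand-in for a learned network


def shift(x1):
    return 0.5 * x1  # hypothetical stand-in for a learned network


def forward(x1, x2):
    """Affine coupling: x1 passes through, x2 is scaled and shifted."""
    s = scale(x1)
    y2 = x2 * math.exp(s) + shift(x1)
    log_det = s  # log |det Jacobian| of (x1, x2) -> (y1, y2)
    return x1, y2, log_det


def inverse(y1, y2):
    """Exact inverse; it only needs scale and shift evaluated at y1 = x1."""
    x2 = (y2 - shift(y1)) * math.exp(-scale(y1))
    return y1, x2


y1, y2, log_det = forward(0.3, -1.2)
x1, x2 = inverse(y1, y2)
print(x1, x2, log_det)
```

The Jacobian is triangular, so its log-determinant is just the scale output. That gives the density by the change-of-variables formula, and the closed-form inverse gives cheap sampling.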

Optimizer · Directed · Optimization · Linear combination · Lecture notes

February 23, 2023

A gradient descent akin method for constrained optimization: algorithms and applications

Long Chen,Kai-Uwe Bletzinger,Nicolas R. Gauger,Yinyu Ye

We present a first-order method for solving constrained optimization problems. The method is derived from our previous work, a modified search direction method inspired by singular value decomposition. In this work, we simplify its computational framework to a ``gradient descent akin'' method (GDAM), i.e., the search direction is computed using a linear combination of the negative and normalized objective and constraint gradients. We give fundamental theoretical guarantees on the global convergence of the method. This work focuses on the algorithms and applications of GDAM. We present computational algorithms that adapt common strategies for the gradient descent method. We demonstrate the potential of the method using two engineering applications, shape optimization and sensor network localization. When practically implemented, GDAM is robust and very competitive in solving the considered large and challenging optimization problems.
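The search direction described above can be sketched as follows. The abstract does not give the combination weights, so the single weight `zeta` on the constraint gradient and the unit weight on the objective gradient are assumptions. The sketch also assumes both gradients are nonzero.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def search_direction(grad_f, grad_g, zeta):
    """Combine the negative, normalized objective and constraint gradients.

    zeta is a hypothetical weight on the constraint term.
    """
    nf, ng = normalize(grad_f), normalize(grad_g)
    return [-(a + zeta * b) for a, b in zip(nf, ng)]


direction = search_direction([3.0, 4.0], [0.0, 2.0], zeta=0.5)
print(direction)
```

Normalizing both gradients makes the direction independent of how the objective and the constraint are scaled, so the weight alone controls the trade-off between decreasing the objective and moving away from the constraint boundary.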

Random walk · Bootstrap · Analysis · Lecture notes · Functional

February 22, 2023

Fine Grained Analysis of High Dimensional Random Walks

Roy Gotlib,Tali Kaufman

One of the most important properties of high dimensional expanders is that high dimensional random walks converge rapidly. This property has proven to be extremely useful in a variety of fields in the theory of computer science, from agreement testing to sampling, coding theory and more. In this paper we present a state of the art result in a line of works analyzing the convergence of high dimensional random walks~\cite{DBLP:conf/innovations/KaufmanM17,DBLP:conf/focs/DinurK17, DBLP:conf/approx/KaufmanO18,DBLP:journals/corr/abs-2001-02827}, by presenting a \emph{structured} version of the result of~\cite{DBLP:journals/corr/abs-2001-02827}. While previous works examined the expansion from the viewpoint of the worst possible eigenvalue, in this work we relate the expansion of a function to the entire spectrum of the random walk operator using the structure of the function; we call such a theorem a Fine Grained High Order Random Walk Theorem. In sufficiently structured cases the fine grained result that we present here can be much better than the worst case, while in the worst case our result is equivalent to~\cite{DBLP:journals/corr/abs-2001-02827}. In order to prove the Fine Grained High Order Random Walk Theorem we introduce a way to bootstrap the expansion of random walks on the vertices of a complex into a fine grained understanding of higher order random walks, provided that the expansion is good enough. In addition, our \emph{single} bootstrapping theorem can simultaneously yield our Fine Grained High Order Random Walk Theorem as well as the well known Trickling Down Theorem. Prior to this work, High Order Random Walk theorems and the Trickling Down Theorem have been obtained from different proof methods.

Estimation/Estimator · Kernelization · Approximation · Learning · Exchangeable

February 22, 2023

Error Estimation for Random Fourier Features

Junwen Yao,N. Benjamin Erichson,Miles E. Lopes

from arxiv, Accepted to AISTATS 2023

Random Fourier Features (RFF) is among the most popular and broadly applicable approaches for scaling up kernel methods. In essence, RFF allows the user to avoid costly computations on a large kernel matrix via a fast randomized approximation. However, a pervasive difficulty in applying RFF is that the user does not know the actual error of the approximation, or how this error will propagate into downstream learning tasks. Up to now, the RFF literature has primarily dealt with these uncertainties using theoretical error bounds, but from a user's standpoint, such results are typically impractical -- either because they are highly conservative or involve unknown quantities. To tackle these general issues in a data-driven way, this paper develops a bootstrap approach to numerically estimate the errors of RFF approximations. Three key advantages of this approach are: (1) The error estimates are specific to the problem at hand, avoiding the pessimism of worst-case bounds. (2) The approach is flexible with respect to different uses of RFF, and can even estimate errors in downstream learning tasks. (3) The approach enables adaptive computation, so that the user can quickly inspect the error of a rough initial kernel approximation and then predict how much extra work is needed. Lastly, in exchange for all of these benefits, the error estimates can be obtained at a modest computational cost.
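The idea of a data-driven error estimate can be sketched for a single kernel entry. This is a generic bootstrap over the random features, not the authors' procedure. The Gaussian kernel, the inputs `x` and `y`, the feature count `D`, the 200 resamples, and the 90% quantile are all illustrative assumptions.

```python
import math
import random

random.seed(0)
D, sigma = 2000, 1.0
x, y = 0.3, 1.1

# Features for the Gaussian kernel k(x, y) = exp(-(x - y)**2 / (2 * sigma**2)).
w = [random.gauss(0.0, 1.0 / sigma) for _ in range(D)]
b = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
products = [2.0 * math.cos(wi * x + bi) * math.cos(wi * y + bi)
            for wi, bi in zip(w, b)]

exact = math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))
approx = sum(products) / D  # RFF estimate of k(x, y)

# Bootstrap: resample the D features with replacement and record how far
# each resampled estimate moves from the original one.
deviations = []
for _ in range(200):
    resample = random.choices(products, k=D)
    deviations.append(abs(sum(resample) / D - approx))
deviations.sort()
error_estimate = deviations[int(0.9 * len(deviations))]  # about the 90% quantile

print(abs(approx - exact), error_estimate)
```

The estimate uses only the features already drawn, so it needs no knowledge of the exact kernel value. `exact` is computed here purely for comparison.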

Continuity · Discretization · Kernelization · Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute · Detailed balance

February 21, 2023

A convergent finite-volume scheme for nonlocal cross-diffusion systems for multi-species populations

Ansgar Jüngel,Stefan Portisch,Antoine Zurek

An implicit Euler finite-volume scheme for a nonlocal cross-diffusion system on the one-dimensional torus, arising in population dynamics, is proposed and analyzed. The kernels are assumed to be in detailed balance and satisfy a weak cross-diffusion condition. The latter condition allows for negative off-diagonal coefficients and for kernels defined by an indicator function. The scheme preserves the nonnegativity of the densities, conservation of mass, and production of the Boltzmann and Rao entropies. The key idea is to ``translate'' the entropy calculations for the continuous equations to the finite-volume scheme, in particular to design discretizations of the mobilities, which guarantee a discrete chain rule even in the presence of nonlocal terms. Based on this idea, the existence of finite-volume solutions and the convergence of the scheme are proven. As a by-product, we deduce the existence of weak solutions to the continuous cross-diffusion system. Finally, we present some numerical experiments illustrating the behavior of the solutions to the nonlocal and associated local models.