
In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly. Our work is motivated by the challenge of creating approximate gradient coding schemes that simultaneously work well in both the adversarial and stochastic models. To that end, we introduce novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points. We analyze the decoding error in both the random and adversarial straggler settings, when optimal decoding coefficients are used. We show that in the random setting, our schemes achieve an error to the gradient that decays exponentially in the replication factor. In the adversarial setting, the error is nearly a factor of two smaller than that of any existing code with similar performance in the random setting. We show convergence bounds in both the random and adversarial settings for gradient descent under standard assumptions using our codes. In the random setting, our convergence rate improves upon black-box bounds. In the adversarial setting, we show that gradient descent can converge down to a noise floor that scales linearly with the adversarial error to the gradient. We demonstrate empirically that our schemes achieve near-optimal error in the random setting and converge faster than algorithms which do not use the optimal decoding coefficients.
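
To make the scheme concrete, here is a minimal numpy sketch of the two-blocks-per-machine idea with least-squares decoding (one natural reading of "optimal decoding coefficients"). The random edge pairing below is only a crude stand-in for the paper's expander construction, and all sizes and the straggler rate are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 20          # number of data blocks (vertices of the graph)
d = 4           # degree: each block is replicated on d machines
m = k * d // 2  # one machine per edge, each holding exactly two blocks

# Random pairing of block "stubs" -- a stand-in for an expander
# construction (it may create repeated pairs or self-loops).
stubs = np.repeat(np.arange(k), d)
rng.shuffle(stubs)
edges = list(zip(stubs[0::2], stubs[1::2]))

# Assignment matrix B: row j marks the two blocks machine j stores.
B = np.zeros((m, k))
for j, (a, b) in enumerate(edges):
    B[j, a] += 1.0
    B[j, b] += 1.0

# Toy per-block gradients; machine j returns the coded sum (B @ g)[j].
g = rng.normal(size=(k, 5))
full_grad = g.sum(axis=0)

# Random stragglers: only surviving machines report their coded sums.
alive = rng.random(m) > 0.3
B_alive = B[alive]

# Decoding coefficients w: least-squares fit of B_alive^T w ~ 1, so that
# w^T (B_alive g) is as close as possible to the full gradient.
w, *_ = np.linalg.lstsq(B_alive.T, np.ones(k), rcond=None)
approx_grad = w @ (B_alive @ g)

err = np.linalg.norm(approx_grad - full_grad) / np.linalg.norm(full_grad)
print(f"relative decoding error: {err:.3f}")
```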

Related content

Many modern machine learning algorithms, such as generative adversarial networks (GANs) and adversarial training, can be formulated as minimax optimization. Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity. However, GDA can converge to non-optimal minimax points. We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration and solves it using Anderson Mixing to converge to the local minimax. It addresses the diverging issue of simultaneous GDA and accelerates the convergence of alternating GDA. We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions. We also empirically show that GDA-AM solves a variety of minimax problems and improves GAN training on several datasets.
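
As a concrete illustration, the following numpy sketch applies windowed Type-II Anderson mixing to the simultaneous-GDA fixed-point map on a toy bilinear problem $f(x,y)=x^\top A y$, where plain simultaneous GDA is known to spiral away from the solution. The window size, step size, and random problem are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def gda_map(z, A, eta=0.1):
    """One step of simultaneous GDA on f(x, y) = x^T A y."""
    n = A.shape[0]
    x, y = z[:n], z[n:]
    return np.concatenate([x - eta * (A @ y), y + eta * (A.T @ x)])

def gda_am(z0, A, iters=50, mem=5):
    """Type-II Anderson mixing applied to the GDA fixed-point map."""
    zs, gs = [z0], [gda_map(z0, A)]
    for _ in range(iters):
        fs = [g - z for z, g in zip(zs, gs)]      # fixed-point residuals
        if len(fs) >= 2:
            dF = np.stack([fs[i + 1] - fs[i] for i in range(len(fs) - 1)], axis=1)
            dG = np.stack([gs[i + 1] - gs[i] for i in range(len(gs) - 1)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, fs[-1], rcond=None)
            z_new = gs[-1] - dG @ gamma           # mixed update
        else:
            z_new = gs[-1]                        # plain fixed-point step
        zs.append(z_new)
        gs.append(gda_map(z_new, A))
        zs, gs = zs[-(mem + 1):], gs[-(mem + 1):]  # keep a sliding window
    return zs[-1]

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
z0 = rng.normal(size=6)

z = z0.copy()
for _ in range(50):                                # plain simultaneous GDA
    z = gda_map(z, A)
print("plain GDA ||z||:", np.linalg.norm(z))       # grows: divergence
print("GDA-AM    ||z||:", np.linalg.norm(gda_am(z0, A)))  # approaches 0
```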

We present a polynomial-time $\frac{3}{2}$-approximation algorithm for the problem of finding a maximum-cardinality stable matching in a many-to-many matching model with ties and laminar constraints on both sides. We formulate our problem using a bipartite multigraph whose vertices are called workers and firms, and edges are called contracts. Our algorithm is described as the computation of a stable matching in an auxiliary instance, in which each contract is replaced with three of its copies and all agents have strict preferences on the copied contracts. The construction of this auxiliary instance is symmetric for the two sides, which facilitates a simple symmetric analysis. We use the notion of matroid-kernel for computation in the auxiliary instance and exploit the base-orderability of laminar matroids to show the approximation ratio. In a special case in which each worker is assigned at most one contract and each firm has a strict preference, our algorithm defines a $\frac{3}{2}$-approximation mechanism that is strategy-proof for workers.
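
Purely to illustrate the shape of the auxiliary instance, the sketch below replaces each contract with three copies and extends a preference order with ties to a strict order over the copies. The copy ordering used here is a hypothetical tie-breaking rule; the actual 3/2-approximation depends on a specific interleaving of the copies that this generic construction does not reproduce.

```python
def make_auxiliary(pref_with_ties):
    """pref_with_ties: an agent's preference list as tie groups of
    contract ids, best group first.  Returns a strict preference list
    over 3 copies of every contract.  NOTE: the copy ordering here is a
    hypothetical tie-breaking rule, not the paper's interleaving."""
    strict = []
    for tie_group in pref_with_ties:
        for i in range(3):                 # copy 0 of each tied contract,
            for c in tie_group:            # then copy 1, then copy 2
                strict.append((c, i))
    return strict

# A worker indifferent between contracts a and b, preferring both to c:
print(make_auxiliary([["a", "b"], ["c"]]))
```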

The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.
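
For the simplest case, a degree-one spline, the block-sparse structure of the weak Jacobian is easy to exhibit: each input activates exactly one segment, so each Jacobian row has exactly two nonzeros. The plain-numpy sketch below, with toy data, illustrates this structure only; it is not the paper's general piecewise-polynomial layer.

```python
import numpy as np

def linear_spline(x, knots, coefs):
    """Evaluate a piecewise-linear spline and its (weak) Jacobian w.r.t.
    the coefficients.  Each input activates one segment, so the Jacobian
    has two nonzeros per row: a block-sparse structure."""
    idx = np.clip(np.searchsorted(knots, x) - 1, 0, len(knots) - 2)
    t = (x - knots[idx]) / (knots[idx + 1] - knots[idx])
    y = (1 - t) * coefs[idx] + t * coefs[idx + 1]
    # Sparse Jacobian, stored as (row, col, value) triplets.
    rows = np.repeat(np.arange(len(x)), 2)
    cols = np.stack([idx, idx + 1], axis=1).ravel()
    vals = np.stack([1 - t, t], axis=1).ravel()
    return y, (rows, cols, vals)

knots = np.linspace(0.0, 1.0, 6)
coefs = np.sin(2 * np.pi * knots)
x = np.array([0.05, 0.37, 0.92])
y, (r, c, v) = linear_spline(x, knots, coefs)
print(y)
print(list(zip(r, c, v)))   # two nonzeros per input row
```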

In this paper, we propose a class of discrete-time approximation schemes for stochastic optimal control problems under the $G$-expectation framework. The proposed schemes are constructed recursively based on piecewise constant policy. We prove the convergence of the discrete schemes and determine the convergence rates. Several numerical examples are presented to illustrate the effectiveness of the obtained results.
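
The toy sketch below shows one way such a recursion can look when the $G$-expectation is realized as a supremum over an uncertain volatility set: a backward pass over a time grid that maximizes over piecewise-constant controls and over volatilities, with Gauss-Hermite quadrature for the one-step expectation. The drift, payoff, grids, and uncertainty set are all illustrative assumptions, not the paper's scheme.

```python
import numpy as np

T, N = 1.0, 20
dt = T / N
xs = np.linspace(-3, 3, 121)                    # space grid
controls = np.linspace(-1, 1, 5)                # piecewise-constant actions
sigmas = np.array([0.2, 0.6, 1.0])              # volatility uncertainty set
nodes, weights = np.polynomial.hermite_e.hermegauss(7)
weights = weights / weights.sum()               # E[phi(xi)], xi ~ N(0, 1)

V = -np.abs(xs)                                 # terminal payoff g(x) = -|x|
for _ in range(N):                              # backward in time
    V_new = np.full_like(V, -np.inf)
    for a in controls:
        # One-step conditional G-expectation: sup over the volatility set.
        cond = np.full_like(V, -np.inf)
        for s in sigmas:
            x_next = xs[:, None] + a * dt + s * np.sqrt(dt) * nodes[None, :]
            cond = np.maximum(cond, np.interp(x_next, xs, V) @ weights)
        V_new = np.maximum(V_new, cond)         # max over constant controls
    V = V_new

print("value at x = 0:", np.interp(0.0, xs, V))
```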

Block coordinate descent (BCD), also known as nonlinear Gauss-Seidel, is a simple iterative algorithm for nonconvex optimization that sequentially minimizes the objective function in each block coordinate while the other coordinates are held fixed. We propose a version of BCD that, for block multi-convex and smooth objective functions under constraints, is guaranteed to converge to the stationary points with worst-case rate of convergence of $O((\log n)^{2}/n)$ for $n$ iterations, and a bound of $O(\epsilon^{-1}(\log \epsilon^{-1})^{2})$ for the number of iterations to achieve an $\epsilon$-approximate stationary point. Furthermore, we show that these results continue to hold even when the convex sub-problems are inexactly solved if the optimality gaps are uniformly summable against initialization. A key idea is to restrict the parameter search within a diminishing radius to promote stability of iterates. As an application, we provide an alternating least squares algorithm with diminishing radius for nonnegative CP tensor decomposition that converges to the stationary points of the reconstruction error with the same robust worst-case convergence rate and complexity bounds. We also experimentally validate our results with both synthetic and real-world data and demonstrate that using auxiliary search radius restriction can in fact improve the rate of convergence.
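
As an illustration of the diminishing-radius idea, the sketch below runs two-block alternating minimization for nonnegative matrix factorization (a simplified stand-in for the nonnegative CP decomposition above), with each block update pulled back into a ball of shrinking radius around the current iterate. The convex sub-problems are solved by a clipped least-squares surrogate rather than exactly, so this is a sketch of the mechanism, not the analyzed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 20, 4
A = np.abs(rng.normal(size=(m, r))) @ np.abs(rng.normal(size=(r, n)))
X = np.abs(rng.normal(size=(m, r)))
Y = np.abs(rng.normal(size=(r, n)))

def pull_back(old, target, radius):
    """Move from `old` toward `target`, but no farther than `radius`.
    A convex combination, so nonnegativity is preserved."""
    step = target - old
    nrm = np.linalg.norm(step)
    return old + step * min(1.0, radius / (nrm + 1e-12))

for it in range(300):
    radius = 10.0 / np.sqrt(it + 1)             # diminishing search radius
    # Block X: clipped least-squares surrogate for the convex sub-problem.
    X_star = np.clip(A @ Y.T @ np.linalg.pinv(Y @ Y.T), 0.0, None)
    X = pull_back(X, X_star, radius)
    # Block Y: same with the roles swapped.
    Y_star = np.clip(np.linalg.pinv(X.T @ X) @ X.T @ A, 0.0, None)
    Y = pull_back(Y, Y_star, radius)

print("relative error:", np.linalg.norm(A - X @ Y) / np.linalg.norm(A))
```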

Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate three estimators and find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare between different models, and analyze how frequently each estimator leads to drawing incorrect conclusions about which of two models performs best. We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
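
To make the estimators concrete: given $n$ observed validation scores, the expected best score under a budget of $b$ trials can be estimated with or without replacement. The sketch below implements two standard such estimators, the biased plug-in (expected max of $b$ i.i.d. draws from the empirical distribution) and the unbiased subset average; whether these coincide with the paper's exact three estimators is an assumption.

```python
import numpy as np
from math import comb

def evp_with_replacement(scores, b):
    """Biased plug-in estimator: expected max of b i.i.d. draws from the
    empirical distribution of the n observed validation scores."""
    v = np.sort(scores)
    n = len(v)
    i = np.arange(1, n + 1)
    w = (i / n) ** b - ((i - 1) / n) ** b   # P(max lands on v_(i))
    return float(w @ v)

def evp_without_replacement(scores, b):
    """Unbiased estimator: average of max(S) over all size-b subsets S."""
    v = np.sort(scores)
    n = len(v)
    w = np.array([comb(i - 1, b - 1) / comb(n, b) for i in range(1, n + 1)])
    return float(w @ v)

rng = np.random.default_rng(0)
scores = rng.uniform(0.6, 0.9, size=20)     # toy validation accuracies
for b in (1, 5, 10):
    print(b, evp_with_replacement(scores, b), evp_without_replacement(scores, b))
```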

It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexity of these recently proposed methods still degrades exponentially with the dimension. In this paper, thanks to an infinite-dimensional sum-of-squares representation, we derive a statistical estimator of smooth optimal transport which achieves a precision $\varepsilon$ from $\tilde{O}(\varepsilon^{-2})$ independent and identically distributed samples from the distributions, for a computational cost of $\tilde{O}(\varepsilon^{-4})$ when the smoothness increases, hence yielding dimension-free statistical and computational rates, with potentially exponentially dimension-dependent constants.

We propose polynomial-time algorithms to minimise labelled Markov chains whose transition probabilities are not known exactly, have been perturbed, or can only be obtained by sampling. Our algorithms are based on a new notion of an approximate bisimulation quotient, obtained by lumping together states that are exactly bisimilar in a slightly perturbed system. We present experiments that show that our algorithms are able to recover the structure of the bisimulation quotient of the unperturbed system.
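
A rough sketch of the lumping idea: partition refinement in which two states are merged when they carry the same label and their transition mass into every current block agrees up to a tolerance $\varepsilon$. This greedy heuristic is a stand-in to convey the flavor of an approximate quotient, not the paper's algorithm or its notion of approximate bisimulation.

```python
import numpy as np

def approx_bisim_partition(P, labels, eps):
    """Greedy tolerance-based partition refinement on a labelled Markov
    chain with row-stochastic transition matrix P (heuristic sketch)."""
    n = P.shape[0]
    blocks = {}
    part = np.array([blocks.setdefault(l, len(blocks)) for l in labels])
    while True:
        k = part.max() + 1
        M = np.zeros((n, k))                 # mass into each current block
        for j in range(n):
            M[:, part[j]] += P[:, j]
        new_part = np.zeros(n, dtype=int)
        reps = []                            # (old block, mass signature)
        for s in range(n):
            for idx, (b, sig) in enumerate(reps):
                if b == part[s] and np.max(np.abs(sig - M[s])) <= eps:
                    new_part[s] = idx        # close enough: lump together
                    break
            else:
                new_part[s] = len(reps)      # open a new block
                reps.append((part[s], M[s]))
        if np.array_equal(new_part, part):
            return part
        part = new_part

P = np.array([[0.50, 0.50, 0.0],
              [0.49, 0.51, 0.0],
              [0.00, 0.00, 1.0]])
print(approx_bisim_partition(P, ["a", "a", "b"], eps=0.05))  # lumps 0 and 1
```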

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and that explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
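
The stochastic-gradient-variance half of this picture can be illustrated in isolation: for an unbiased importance-sampled gradient estimator, the variance-minimizing sampling probabilities are proportional to per-node gradient norms. The numpy sketch below compares uniform and gradient-norm sampling on toy per-node gradients; the embedding-approximation variance, which requires the GNN forward pass, is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, batch = 1000, 8, 64

# Toy per-node gradients; in a GNN these would come from the backward pass.
G = rng.normal(size=(N, d)) * rng.gamma(2.0, 1.0, size=(N, 1))
full = G.mean(axis=0)

def sampled_grad(p):
    """Unbiased importance-sampled estimate of the mean gradient."""
    idx = rng.choice(N, size=batch, p=p)
    return (G[idx] / (N * p[idx, None])).mean(axis=0)

uniform = np.full(N, 1.0 / N)
# Variance-minimizing probabilities for an unbiased estimator: p_i ~ ||g_i||.
norms = np.linalg.norm(G, axis=1)
adaptive = norms / norms.sum()

for name, p in [("uniform", uniform), ("grad-norm", adaptive)]:
    errs = [np.linalg.norm(sampled_grad(p) - full) for _ in range(500)]
    print(name, np.mean(errs))   # grad-norm sampling has lower error
```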

Implicit probabilistic models are models defined naturally in terms of a sampling procedure; they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but that can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
