
We study variance reduction methods for extragradient (EG) algorithms for a class of variational inequalities satisfying a classical error-bound condition. Previously, linear convergence was only known to hold under strong monotonicity. The error-bound condition is much weaker than strong monotonicity and captures a larger class of problems, including bilinear saddle-point problems such as those arising from two-player zero-sum Nash equilibrium computation. We show that EG algorithms with SVRG-style variance reduction (SVRG-EG) achieve linear convergence under the error-bound condition. In addition, motivated by the empirical success of increasing iterate averaging techniques in solving saddle-point problems, we also establish new convergence results for variance-reduced EG with increasing iterate averaging. Finally, we conduct numerical experiments to demonstrate the advantage of SVRG-EG, with and without increasing iterate averaging, over deterministic EG.
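
For reference, here is a minimal sketch (not the paper's implementation) of the extragradient method with SVRG-style variance reduction, applied to an unconstrained bilinear saddle point min_x max_y x^T A y with A the average of component matrices, so that the VI operator is F(x, y) = (A y, -A^T x). The step size, epoch length, and the bilinear test problem are illustrative choices only.

```python
import numpy as np

def svrg_eg_bilinear(A_batches, z0, eta=0.1, epochs=50, inner_steps=None, rng=None):
    """Minimal SVRG-EG sketch for the bilinear saddle point min_x max_y x^T A y,
    where A = mean(A_batches). The VI operator is F(x, y) = (A y, -A^T x).
    Names and constants are illustrative, not the paper's implementation."""
    rng = rng or np.random.default_rng(0)
    n = len(A_batches)
    inner_steps = inner_steps or n
    A_full = sum(A_batches) / n
    d1, d2 = A_full.shape

    def op(A, z):                      # VI operator for one component matrix
        x, y = z[:d1], z[d1:]
        return np.concatenate([A @ y, -A.T @ x])

    z = z0.copy()
    for _ in range(epochs):
        w = z.copy()                   # snapshot point
        full_op_w = op(A_full, w)      # full operator evaluated at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(n)
            # SVRG-style variance-reduced operator estimate at z
            g = op(A_batches[i], z) - op(A_batches[i], w) + full_op_w
            z_half = z - eta * g       # extrapolation (half) step
            j = rng.integers(n)
            g_half = op(A_batches[j], z_half) - op(A_batches[j], w) + full_op_w
            z = z - eta * g_half       # update (full) step
    return z
```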

Related Content

Eurographics is the only truly professional computer graphics association within Europe. It brings together graphics experts from around the world, and the association supports its members in advancing the state of the art in computer graphics and related fields such as multimedia, scientific visualization, and human-computer interfaces. Through its worldwide membership, EG maintains close ties with developments in the United States, Japan, and other countries/regions, thereby promoting the exchange of scientific and technical information and skills worldwide. Official website:

The Gromov-Hausdorff distance measures the difference in shape between compact metric spaces and poses a notoriously difficult problem in combinatorial optimization. We introduce its quadratic relaxation over a convex polytope whose solutions provably deliver the Gromov-Hausdorff distance. The optimality guarantee is enabled by the fact that the search space of our approach is not constrained to a generalization of bijections, unlike in other relaxations such as the Gromov-Wasserstein distance. We suggest the Frank-Wolfe algorithm with $O(n^3)$-time iterations for solving the relaxation and numerically demonstrate its performance on metric spaces of hundreds of points. In particular, we obtain a new upper bound on the Gromov-Hausdorff distance between the unit circle and the unit hemisphere equipped with the Euclidean metric. Our approach is implemented as the Python package dGH.
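
For illustration, the sketch below runs generic Frank-Wolfe on a quadratic objective over the probability simplex, which admits a closed-form linear minimization oracle. The paper's relaxation uses its own polytope, objective, and oracle, none of which are reproduced here.

```python
import numpy as np

def frank_wolfe_quadratic(Q, c, x0, n_iter=200):
    """Generic Frank-Wolfe sketch for min_x 0.5 x^T Q x + c^T x over the
    probability simplex (a stand-in polytope, not the paper's relaxation).
    Each iteration solves a linear minimization oracle, here a simplex vertex."""
    x = x0.copy()                         # x0 should lie in the simplex, e.g. uniform
    for k in range(n_iter):
        grad = Q @ x + c
        s = np.zeros_like(x)
        s[np.argmin(grad)] = 1.0          # LMO over the simplex: best vertex
        gamma = 2.0 / (k + 2.0)           # standard open-loop step size
        x = (1 - gamma) * x + gamma * s   # move toward the chosen vertex
    return x
```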

Next-generation wireless communication systems impose much stricter requirements for transmission rate, latency, and reliability. The peak data rate of 6G networks should be no less than 1 Tb/s, which is comparable to existing long-haul optical transport networks. It is believed that using long error-correcting codes (ECC) with soft-decision decoding (SDD) is not feasible in this case due to the resulting high power consumption. On the other hand, ECC with hard-decision decoding (HDD) suffers from significant performance degradation. In this paper, we consider a concatenated solution consisting of an outer long HDD code and an inner short SDD code. The latter code is a crucial component of the system and the focus of our research. Due to its short length, the code cannot correct all errors, but it is designed to minimize the number of errors. Such codes are known as error-reducing codes. We investigate the error-reducing properties of superposition codes. Initially, we explore sparse regression codes (SPARCs) with Gaussian signals. This approach outperforms the error-reducing binary LDPC codes optimized by Barakatain et al. (2018), but its high implementation complexity limits its practical applicability. Subsequently, we propose an LDPC-based superposition code scheme with low-complexity soft successive interference cancellation (SIC) decoding. This scheme demonstrates performance comparable to SPARCs while maintaining manageable complexity. Numerical results were obtained for inner codes with an overhead (OH) of 8.24% within a concatenated scheme (15% OH) with an outer hard-decision decoded staircase code (6.25% OH).
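
To make the cancellation idea concrete, here is a toy two-layer BPSK superposition with hard-decision SIC over an AWGN channel. The paper's scheme uses LDPC component codes with soft SIC, so this only illustrates the successive-cancellation principle; amplitudes and noise level are arbitrary.

```python
import numpy as np

def toy_hard_sic(n=10_000, a1=1.0, a2=0.5, sigma=0.3, rng=None):
    """Toy illustration of successive interference cancellation (SIC) for a
    two-layer BPSK superposition y = a1*s1 + a2*s2 + noise. The paper uses
    LDPC-coded layers with soft SIC; this uncoded, hard-decision toy only
    demonstrates the cancellation principle."""
    rng = rng or np.random.default_rng(0)
    s1 = rng.choice([-1.0, 1.0], size=n)
    s2 = rng.choice([-1.0, 1.0], size=n)
    y = a1 * s1 + a2 * s2 + sigma * rng.standard_normal(n)

    s1_hat = np.sign(y)                 # decode the stronger layer first
    y_residual = y - a1 * s1_hat        # cancel its estimated contribution
    s2_hat = np.sign(y_residual)        # then decode the weaker layer

    ber1 = np.mean(s1_hat != s1)        # per-layer bit error rates
    ber2 = np.mean(s2_hat != s2)
    return ber1, ber2
```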

Robust distributed learning with Byzantine failures has attracted extensive research interest in recent years. However, most existing methods suffer from the curse of dimensionality, which becomes increasingly serious with the growing complexity of modern machine learning models. In this paper, we design a new method that is suitable for high-dimensional problems under an arbitrary number of Byzantine attackers. The core of our design is a direct high-dimensional semi-verified mean estimation method. Our idea is to identify a subspace first. The components of the mean value perpendicular to this subspace can be estimated via gradient vectors uploaded from worker machines, while the components within this subspace are estimated using an auxiliary dataset. We then use our new method as the aggregator in distributed learning problems. Our theoretical analysis shows that the new method achieves minimax-optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.
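
A hedged sketch of the high-level recipe might look as follows: pick a low-dimensional subspace (here, for illustration, the top eigenspace of the worker-gradient covariance), take the component of the mean outside that subspace from the worker gradients, and take the component inside it from the small trusted auxiliary dataset. The robust subspace identification and filtering steps of the actual method are not reproduced.

```python
import numpy as np

def semi_verified_mean(worker_grads, aux_grads, k):
    """Illustrative split of R^d into a k-dimensional suspect subspace and its
    complement: the complement component comes from the (possibly corrupted)
    worker mean, the in-subspace component from the trusted auxiliary data.
    The subspace choice below is a simple stand-in, not the paper's procedure."""
    G = np.asarray(worker_grads)              # (m, d) gradients from workers
    A = np.asarray(aux_grads)                 # (m_aux, d) trusted auxiliary gradients

    centered = G - G.mean(axis=0)
    cov = centered.T @ centered / len(G)
    _, eigvecs = np.linalg.eigh(cov)
    V = eigvecs[:, -k:]                       # top-k eigenvectors span the suspect subspace
    P = V @ V.T                               # projector onto that subspace

    mean_workers = G.mean(axis=0)
    mean_aux = A.mean(axis=0)
    # perpendicular part from workers, in-subspace part from auxiliary data
    return (np.eye(G.shape[1]) - P) @ mean_workers + P @ mean_aux
```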

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at //github.com/jerry-chee/QuIP .
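
The incoherence-processing step can be sketched as two-sided multiplication of the weight matrix by random orthogonal matrices, with the matching transform applied to the Hessian proxy so the quadratic objective is preserved. QuIP itself uses structured (Kronecker-factored) orthogonal matrices for efficiency; the dense version below is purely illustrative.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))             # sign fix for a Haar-like distribution

def incoherence_process(W, H, rng=None):
    """Sketch of incoherence pre-processing in the spirit of QuIP: rotate the
    weights and the Hessian proxy by random orthogonal matrices so that no
    single coordinate direction carries outsized mass. Dense rotations are used
    here only for clarity; this is not the paper's efficient construction."""
    rng = rng or np.random.default_rng(0)
    m, n = W.shape
    U = random_orthogonal(m, rng)
    V = random_orthogonal(n, rng)
    W_inc = U @ W @ V.T                         # incoherent weights to be quantized
    H_inc = V @ H @ V.T                         # matching transform of the Hessian proxy
    return W_inc, H_inc, U, V                   # after quantizing W_inc, undo with U.T @ Wq @ V
```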

We study the query complexity of geodesically convex (g-convex) optimization on a manifold. To isolate the effect of that manifold's curvature, we primarily focus on hyperbolic spaces. In a variety of settings (smooth or not; strongly g-convex or not; high- or low-dimensional), known upper bounds worsen with curvature. It is natural to ask whether this is warranted, or an artifact. For many such settings, we propose a first set of lower bounds which indeed confirm that (negative) curvature is detrimental to complexity. To do so, we build on recent lower bounds (Hamilton and Moitra, 2021; Criscitiello and Boumal, 2022) for the particular case of smooth, strongly g-convex optimization. Using a number of techniques, we also secure lower bounds which capture dependence on condition number and optimality gap, which was not previously the case. We suspect these bounds are not optimal. We conjecture optimal ones, and support them with a matching lower bound for a class of algorithms which includes subgradient descent, and a lower bound for a related game. Lastly, to pinpoint the difficulty of proving lower bounds, we study how negative curvature influences (and sometimes obstructs) interpolation with g-convex functions.

A posteriori error estimates based on residuals can be used for reliable error control of numerical methods. Here, we consider them in the context of ordinary differential equations and Runge-Kutta methods. In particular, we take the approach of Dedner & Giesselmann (2016) and investigate it when used to select the time step size. We focus on step size control stability when combined with explicit Runge-Kutta methods and demonstrate that a standard I controller is unstable while more advanced PI and PID controllers can be designed to be stable. We compare the stability properties of residual-based estimators and classical error estimators based on an embedded Runge-Kutta method both analytically and in numerical experiments.
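
A minimal PI step-size controller of the kind discussed above might look like the following sketch. The gains, safety factor, and limiters are generic defaults for illustration, not the tuned values analyzed in the paper.

```python
def pi_step_controller(h, err, err_prev, tol, order, k_i=0.3, k_p=0.4,
                       fac_min=0.2, fac_max=5.0, safety=0.9):
    """Generic PI step-size controller sketch: scale the step by an integral
    term (current error vs. tolerance) times a proportional term (error trend),
    then clip the change. Gains are illustrative defaults."""
    eps = 1e-16
    e_n = max(err, eps) / tol              # normalized current error estimate
    e_prev = max(err_prev, eps) / tol      # normalized previous error estimate
    factor = safety * e_n ** (-k_i / order) * (e_prev / e_n) ** (k_p / order)
    factor = min(fac_max, max(fac_min, factor))
    return h * factor
```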

The field of fine-grained complexity aims at proving conditional lower bounds on the time complexity of computational problems. One of the most popular assumptions, the Strong Exponential Time Hypothesis (SETH), implies that SAT cannot be solved in $2^{(1-\varepsilon)n}$ time. In recent years, it has been proved that known algorithms for many problems are optimal under SETH. Despite the wide applicability of SETH, for many problems there are no known SETH-based lower bounds, so the quest for new reductions continues. Two barriers for proving SETH-based lower bounds are known. Carmosino et al. (ITCS 2016) introduced the Nondeterministic Strong Exponential Time Hypothesis (NSETH), stating that TAUT cannot be solved in time $2^{(1-\varepsilon)n}$ even if one allows nondeterminism. They used this hypothesis to show that some natural fine-grained reductions would be difficult to obtain: proving that, say, 3-SUM requires time $n^{1.5+\varepsilon}$ under SETH breaks NSETH, and this, in turn, implies strong circuit lower bounds. Recently, Belova et al. (SODA 2023) introduced so-called polynomial formulations to show that for many NP-hard problems, proving any explicit exponential lower bound under SETH also implies strong circuit lower bounds. We prove that for a range of problems in P, including $k$-SUM and triangle detection, proving superlinear lower bounds under SETH is challenging as it implies new circuit lower bounds. To this end, we show that these problems can be solved in nearly linear time with oracle calls to evaluating a polynomial of constant degree. Then, we introduce a strengthening of SETH stating that solving SAT in time $2^{(1-\varepsilon)n}$ is difficult even if one has constant-degree polynomial evaluation oracle calls. This hypothesis is stronger and less believable than SETH, but refuting it is still challenging: we show that doing so implies circuit lower bounds.

Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence that arise in practical applications. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve an almost sure ergodic convergence rate of Adam that is arbitrarily close to $o(1/\sqrt{K})$. More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain a non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems.
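
For reference, here is vanilla Adam as it is usually stated, with bias-corrected first- and second-moment estimates. The hyperparameter values are the common defaults, not the conditions derived in the paper, and the gradient oracle here is deterministic for simplicity.

```python
import numpy as np

def adam(grad, x0, steps=1000, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Vanilla Adam with bias correction. `grad` returns the gradient at x;
    constants are the usual defaults, not the paper's hyperparameter conditions."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g            # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)   # parameter update
    return x
```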

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
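
The depth-scaling question can be illustrated with a toy residual recursion x_{k+1} = x_k + delta_L * f(x_k), where the choice of delta_L (for example 1/L versus 1/sqrt(L)) determines which continuum limit, if any, is approached. This toy uses random weights and a fixed activation purely for illustration; the paper studies trained networks.

```python
import numpy as np

def resnet_forward(x, weights, scaling):
    """Toy residual recursion h_{k+1} = h_k + delta_L * tanh(W_k h_k) used to
    probe depth-scaling regimes: delta_L = 1/L mimics a neural-ODE-like limit,
    while 1/sqrt(L) mimics a diffusive scaling. Illustrative only; not the
    paper's trained-network experiments."""
    L = len(weights)
    delta = {"ode": 1.0 / L, "sde": 1.0 / np.sqrt(L)}[scaling]
    h = x.copy()
    for W in weights:
        h = h + delta * np.tanh(W @ h)   # one residual block
    return h
```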

Federated learning is a new distributed machine learning framework, where a set of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability significantly deteriorates the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains a convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg on the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
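
The latest-averaging idea can be sketched as follows: the server stores each client's most recent gradient and updates the model with the average over all clients every round, reusing stale gradients for clients that are currently unavailable. The availability model, the `clients[i].gradient(model)` interface, and all constants below are placeholders, not the paper's exact protocol.

```python
import numpy as np

def fedlaavg_sketch(model, clients, rounds, lr, rng=None):
    """High-level sketch of latest averaging: keep each client's most recent
    gradient and average over ALL clients, available or not. The client
    interface and availability model are hypothetical placeholders."""
    rng = rng or np.random.default_rng(0)
    N = len(clients)
    latest = [np.zeros_like(model) for _ in range(N)]   # latest gradient per client
    for _ in range(rounds):
        available = [i for i in range(N) if rng.random() < 0.5]   # intermittent availability
        for i in available:
            latest[i] = clients[i].gradient(model)                # refresh available clients only
        model = model - lr * np.mean(latest, axis=0)              # average over all stored gradients
    return model
```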
