
Based on an observation that additive Schwarz methods for general convex optimization can be interpreted as gradient methods, we propose an acceleration scheme for additive Schwarz methods. Adopting acceleration techniques developed for gradient methods such as momentum and adaptive restarting, the convergence rate of additive Schwarz methods is greatly improved. The proposed acceleration scheme does not require any a priori information on the levels of smoothness and sharpness of a target energy functional, so that it can be applied to various convex optimization problems. Numerical results for linear elliptic problems, nonlinear elliptic problems, nonsmooth problems, and nonsharp problems are provided to highlight the superiority and the broad applicability of the proposed scheme.
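The two acceleration ingredients named in the abstract, momentum and adaptive restarting, can be sketched on a plain gradient method. The snippet below is only an illustration under stated assumptions: it applies Nesterov-type momentum with a gradient-based restart test to a toy quadratic, not to an additive Schwarz iteration, and the names `accelerated_gradient`, `f`, and `grad_f` are hypothetical. The only problem-dependent input is a step size.

```python
import math

def accelerated_gradient(grad, x0, step, iters, restart=True):
    """Gradient descent with Nesterov-type momentum and an optional restart test."""
    x = list(x0)   # current iterate
    y = list(x0)   # extrapolated point at which the gradient is evaluated
    t = 1.0        # momentum parameter
    for _ in range(iters):
        g = grad(y)
        x_new = [yi - step * gi for yi, gi in zip(y, g)]
        # Restart test: the last move x_new - x has a positive inner product
        # with the gradient at y, i.e. momentum is pushing uphill.
        uphill = sum(gi * (xn - xi) for gi, xn, xi in zip(g, x_new, x)) > 0
        if restart and uphill:
            t = 1.0
            y = list(x_new)  # discard the accumulated momentum
        else:
            t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            beta = (t - 1.0) / t_new
            y = [xn + beta * (xn - xi) for xn, xi in zip(x_new, x)]
            t = t_new
        x = x_new
    return x

# Toy ill-conditioned quadratic (an assumption for illustration), minimized at the origin.
def f(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def grad_f(x):
    return [x[0], 100.0 * x[1]]

x_restart = accelerated_gradient(grad_f, [1.0, 1.0], step=0.01, iters=500)
x_plain = accelerated_gradient(grad_f, [1.0, 1.0], step=0.01, iters=500, restart=False)
print(f(x_restart), f(x_plain))
```

The restart test uses only quantities the iteration already computes, which is why schemes of this kind need no prior knowledge of smoothness or sharpness constants.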

Related content


Extensibility · Smoothness · Model evaluation · Square matrix · Functional

November 16, 2021

The Projection Extension Method: A Spectrally Accurate Technique for Complex Domains

Saad Qadeer,Ehssan Nazockdast,Boyce E. Griffith

from arxiv, 27 pages, approved for release by Pacific Northwest National Laboratory (PNNL-SA-168413)

An essential ingredient of a spectral method is the choice of suitable bases for test and trial spaces. On complex domains, these bases are harder to devise, necessitating the use of domain partitioning techniques such as the spectral element method. In this study, we introduce the Projection Extension (PE) method, an approach that yields spectrally accurate solutions to various problems on complex geometries without requiring domain decomposition. This technique builds on the insights used by extension methodologies such as the immersed boundary smooth extension and Smooth Forcing Extension (SFE) methods, which are designed to improve the order of accuracy of the immersed boundary method. In particular, it couples an accurate extension procedure that functions on arbitrary domains, regardless of connectedness or regularity, with a least squares minimization of the boundary conditions. The resulting procedure is stable under iterative application and straightforward to generalize to higher dimensions. Moreover, it rapidly and robustly yields exponentially convergent solutions to a number of challenging test problems, including elliptic, parabolic, Newtonian fluid flow, and viscoelastic problems.

Generalization theory · Loss function (machine learning) · Input distribution · Noise · Empirical error

November 16, 2021

Generalization Bounds and Algorithms for Learning to Communicate over Additive Noise Channels

Nir Weinberger

An additive noise channel is considered, in which the distribution of the noise is nonparametric and unknown. The problem of learning encoders and decoders based on noise samples is considered. For uncoded communication systems, the problem of choosing a codebook and possibly also a generalized minimal distance decoder (which is parameterized by a covariance matrix) is addressed. High probability generalization bounds for the error probability loss function, as well as for a hinge-type surrogate loss function are provided. A stochastic-gradient based alternating-minimization algorithm for the latter loss function is proposed. In addition, a Gibbs-based algorithm that gradually expurgates an initial codebook from codewords in order to obtain a smaller codebook with improved error probability is proposed, and bounds on its average empirical error and generalization error, as well as a high probability generalization bound, are stated. Various experiments demonstrate the performance of the proposed algorithms. For coded systems, the problem of maximizing the mutual information between the input and the output with respect to the input distribution is addressed, and uniform convergence bounds for two different classes of input distributions are obtained.
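As a rough illustration of a minimum-distance decoder parameterized by a covariance matrix, the sketch below assumes the decoding rule is a Mahalanobis-type distance, which the abstract does not spell out. The function `decode` and the toy codebook are hypothetical, and the decoder is given the inverse covariance (precision) matrix directly to avoid a matrix inversion.

```python
def decode(y, codebook, precision):
    """Return the index of the codeword c minimizing (y - c)^T P (y - c)."""
    def dist(c):
        d = [yi - ci for yi, ci in zip(y, c)]
        n = len(d)
        return sum(d[i] * precision[i][j] * d[j] for i in range(n) for j in range(n))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))

codebook = [(0.0, 0.0), (1.0, 1.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
skewed = [[10.0, 0.0], [0.0, 0.1]]    # assumed: little noise in coordinate 1, much in coordinate 2
y = (0.9, 0.0)

print(decode(y, codebook, identity))  # Euclidean rule picks codeword 0
print(decode(y, codebook, skewed))    # weighted rule picks codeword 1
```

The same received point is decoded differently once the distance is weighted, which is the degree of freedom a learned covariance parameter would adjust.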

Noise · Full · Discretization · AIM · Backward

November 16, 2021

Weak convergence rates for a full implicit scheme of stochastic Cahn-Hilliard equation with additive noise

Meng Cai,Siqing Gan,Yaozhong Hu

from arxiv, 25 pages

This study investigates the weak convergence rate of a temporal and spatial discretization scheme for the stochastic Cahn-Hilliard equation with additive noise, where the spectral Galerkin method is used in space and the backward Euler scheme is used in time. The presence of the unbounded operator in front of the nonlinear term and the lack of the associated Kolmogorov equations make the error analysis much more challenging. To overcome these difficulties, we further exploit a novel approach proposed in [7] and combine it with Malliavin calculus to obtain an improved weak rate of convergence, in comparison with the corresponding strong convergence rates. The techniques used here are quite general and hence have the potential to be applied to other non-Markovian equations. As a byproduct, the rate of the strong error can also be easily obtained.

Regularization term · Oracle · Rank · Loss · Generalized linear model

November 15, 2021

Low-rank matrix recovery with non-quadratic loss: projected gradient method and regularity projection oracle

Lijun Ding,Yuqian Zhang,Yudong Chen

from arxiv, 30 pages and 3 figures

Existing results for low-rank matrix recovery largely focus on quadratic loss, which enjoys favorable properties such as restricted strong convexity/smoothness (RSC/RSM) and good conditioning over all low-rank matrices. However, many interesting problems involve more general, non-quadratic losses, which do not satisfy such properties. For these problems, standard nonconvex approaches such as rank-constrained projected gradient descent (a.k.a. iterative hard thresholding) and Burer-Monteiro factorization could have poor empirical performance, and there is no satisfactory theory guaranteeing global and fast convergence for these algorithms. In this paper, we show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle. This oracle restricts iterates to low-rank matrices within an appropriate bounded set, over which the loss function is well behaved and satisfies a set of approximate RSC/RSM conditions. Accordingly, we analyze an (averaged) projected gradient method equipped with such an oracle, and prove that it converges globally and linearly. Our results apply to a wide range of non-quadratic low-rank estimation problems, including one-bit matrix sensing/completion, individualized rank aggregation, and more broadly generalized linear models with rank constraints.

Linear · Decay · Estimation/estimator · Extensibility · Discretization

November 15, 2021

Convergence Analysis of A Second-order Accurate, Linear Numerical Scheme for The Landau-Lifshitz Equation with Large Damping Parameters

Yongyong Cai,Jingrun Chen,Cheng Wang,Changjian Xie

A second-order accurate, linear numerical method is analyzed for the Landau-Lifshitz equation with large damping parameters. This equation describes the dynamics of magnetization, with a non-convex constraint of unit length on the magnetization. The numerical method is based on the second-order backward differentiation formula in time, combined with an implicit treatment of the linear diffusion term and explicit extrapolation for the nonlinear terms. Afterward, a projection step is applied to normalize the numerical solution at a point-wise level. This scheme has shown clear advantages in practical computations for the physical model with large damping parameters, because only a linear system with constant coefficients (independent of both time and the updated magnetization) needs to be solved at each time step, which greatly improves the numerical efficiency. However, a theoretical analysis of this linear numerical scheme has not been available. In this paper, we provide a rigorous error estimate of the numerical scheme, in the discrete $\ell^{\infty}(0,T; \ell^2) \cap \ell^2(0,T; H_h^1)$ norm, under suitable regularity assumptions and a reasonable ratio between the time step-size and the spatial mesh-size. In particular, the projection operation is nonlinear, and a stability estimate for the projection step turns out to be highly challenging. Such a stability estimate is derived in detail and plays an essential role in the convergence analysis of the numerical scheme when the damping parameter is greater than 3.

TinyML · ML · Model evaluation · SC · Precision/accuracy

November 12, 2021

BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML

Yuhong Song,Edwin Hsing-Mean Sha,Qingfeng Zhuge,Rui Xu,Yongzhuo Zhang,Bingzhe Li,Lei Yang

from arxiv, Accepted by ASP-DAC 2022

Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications, such as smart phones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, such as implantable cardioverter defibrillators (ICDs); this setting is known as TinyML. Unlike ML on the edge, TinyML with a limited energy supply has higher demands on low-power execution. Stochastic computing (SC), which uses bitstreams for data representation, is promising for TinyML since it can perform the fundamental ML operations using simple logic gates instead of complicated binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to low data precision and the inaccuracy of its arithmetic units. Existing works increase the bitstream length to mitigate the precision issue, but this incurs higher latency. In this work, we propose a novel SC architecture, namely Block-based Stochastic Computing (BSC). BSC divides inputs into blocks, such that the latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and an output revision (OUR) scheme are proposed to improve accuracy. On top of this, a global optimization approach is devised to determine the number of blocks, to achieve a better latency-power trade-off. Experimental results show that BSC outperforms existing designs, achieving over 10% higher accuracy on ML tasks and over 6 times power reduction.
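To make the bitstream idea concrete, here is a minimal software sketch of conventional unipolar stochastic computing, in which a value in [0, 1] is encoded as the fraction of ones in a random bitstream and multiplication reduces to a bitwise AND. It does not model the block-based BSC architecture or the OUR scheme; the helper names and the stream lengths are assumptions chosen for illustration.

```python
import random

def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a random bitstream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_bitstream(bits):
    """Decode a bitstream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar bitstreams with one AND gate per bit."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Longer streams give a lower-variance estimate of 0.6 * 0.5, at the cost of more bits to process.
for n in (64, 10000):
    a = to_bitstream(0.6, n, rng)
    b = to_bitstream(0.5, n, rng)
    print(n, from_bitstream(sc_multiply(a, b)))

long_a = to_bitstream(0.6, 10000, rng)
long_b = to_bitstream(0.5, 10000, rng)
product = from_bitstream(sc_multiply(long_a, long_b))
```

The estimate is only as precise as the stream is long, which is the precision-latency tension the abstract describes.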

Neural Networks · Optimizer · Networks · Local minimum · Networking

December 19, 2019

Optimization for deep learning: theory and algorithms

Ruoyu Sun

from arxiv, 38 pages of main body; 5 pages of appendix; 12 pages of references

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

Learned · Deep reinforcement learning · Reinforcement learning · Sample complexity · Atari

January 10, 2019

Accelerated Methods for Deep Reinforcement Learning

Adam Stooke,Pieter Abbeel

from arxiv, v2: -Added game performance statistics summary for algorithm scaling across full Atari game set. -Added full set of learning curves (appendix). -Fixed images to remove phantom borders. -Streamlined some discussion, moved some details to appendix

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire DGX-1 to learn successful strategies in Atari games in mere minutes, using both synchronous and asynchronous algorithms.

Coordinate descent · Optimizer · Performer · Learned · Online

July 16, 2018

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Akshita Bhandari,Chandramani Singh

from arxiv, 20 pages, 4 figures, 2 tables

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also show simulation results to demonstrate the performance of the proposed algorithms.
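For orientation, the baseline that such methods build on can be sketched as follows. This is plain (non-accelerated) randomized coordinate descent on a small strongly convex quadratic, not the accelerated algorithms proposed in the paper; the function name and the test problem are assumptions. Each iteration touches a single coordinate, which is where the low per-iteration cost comes from.

```python
import random

def randomized_coordinate_descent(A, b, iters, seed=0):
    """Minimize 0.5 * x^T A x - b^T x (A symmetric positive definite)
    by exactly minimizing over one randomly chosen coordinate per step."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(n)  # pick a coordinate uniformly at random
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / A[i][i]  # exact minimizer along coordinate i
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = randomized_coordinate_descent(A, b, iters=200)
print(x)  # should approach the solution of A x = b, namely (1/11, 7/11)
```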

Better · Reinforcement learning · Learned · Performer · Optimization

April 24, 2018

Accelerated Reinforcement Learning

K. Lakshmanan

from arxiv, The proof is not complete, as it remains to be shown that the algorithm tracks the ODE

Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems. This has recently been extended to non-convex and stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show convergence using the ODE method. We tested this algorithm on a scheduling problem, in which an incoming job is scheduled into one of four queues based on the queue lengths. The experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm without acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with the actor-critic algorithm.