
A single floating-point number may be inadequate for the line search step of Newton's method. A column vector of the same size as the gradient could instead accelerate each gradient component at a different rate, and a square matrix of the same order as the Hessian could serve to correct the Hessian itself. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and proposed a faster gradient variant called the quadratic gradient. In this paper, we present a new way to construct such a quadratic gradient. The new quadratic gradient does not satisfy the convergence conditions of the fixed-Hessian Newton's method; nevertheless, experimental results show that it sometimes converges faster than the original one. Chiang also speculated that there might be a relation between the Hessian matrix and the learning rate of first-order gradient descent methods. We prove that the scalar $\frac{1}{\epsilon + \max_i \{| \lambda_i | \}}$ is a good learning rate for gradient methods, where $\epsilon$ is a small constant that avoids division by zero and the $\lambda_i$ are the eigenvalues of the Hessian matrix.
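
The closing claim above is easy to illustrate numerically. Below is a minimal sketch (not code from the paper) of plain gradient descent on a made-up convex quadratic, using the learning rate $\frac{1}{\epsilon + \max_i |\lambda_i|}$ computed from the eigenvalues of the Hessian; the Hessian, right-hand side, and iteration count are arbitrary choices for illustration.

```python
# Minimal sketch: gradient descent on a convex quadratic with the
# eigenvalue-based learning rate 1 / (eps + max |lambda_i|).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A.T @ A + np.eye(5)          # symmetric positive definite Hessian (made up)
b = rng.standard_normal(5)

def grad(x):
    return H @ x - b             # gradient of f(x) = 0.5 x^T H x - b^T x

eps = 1e-8
lr = 1.0 / (eps + np.max(np.abs(np.linalg.eigvalsh(H))))  # proposed rate

x = np.zeros(5)
for _ in range(500):
    x = x - lr * grad(x)

print("residual norm:", np.linalg.norm(grad(x)))
```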

Related content

The gradient is, in essence, a vector: at a given point it indicates the direction along which the directional derivative of a function attains its maximum value, i.e., the direction in which the function changes fastest at that point, with the maximum rate of change equal to the magnitude of the gradient.

We propose a two-step Newton's method for refining an approximation of a singular zero whose deflation process terminates after one step, also known as a deflation-one singularity. Given an isolated singular zero of a square analytic system, our algorithm exploits an invertible linear operator obtained by combining the Jacobian and a projection of the Hessian in the direction of the kernel of the Jacobian. We prove the quadratic convergence of the two-step Newton method when it is applied to an approximation of a deflation-one singular zero. The algorithm also requires smaller matrices than existing methods, making it more efficient. We present examples and experiments demonstrating the efficiency of the method.
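
For context, the difficulty that deflation addresses is visible in one line: at a singular zero the Jacobian is not invertible and plain Newton loses its quadratic convergence. The toy example below (not the paper's two-step algorithm) runs Newton's method on $f(x) = x^2$, where the error is merely halved at each step.

```python
# Plain Newton at a singular zero: f(x) = x**2 has f'(0) = 0, so the
# iteration x <- x - f(x)/f'(x) = x/2 converges only linearly.
def newton(f, df, x, iters):
    for _ in range(iters):
        x = x - f(x) / df(x)
        print(f"{x:.3e}")
    return x

newton(lambda x: x**2, lambda x: 2 * x, 1.0, 8)
```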

In \cite[Serra, Vena, Extremal families for the Kruskal-Katona theorem]{sv21}, the authors have shown a characterization of the extremal families for the Kruskal-Katona theorem. We further develop some of the arguments given in \cite{sv21} and give additional properties of these extremal families. The F\"uredi-Griggs/M\"ors theorem from 1986/85 \cite{furgri86,mors85} states that, for some cardinalities, the initial segment of the colexicographical order is the unique extremal family; we extend their result as follows: the number of (non-isomorphic) extremal families strictly grows with the gap between the last two coefficients of the $k$-binomial decomposition. We also show that every family is an induced subfamily of an extremal family and that, somewhat going in the opposite direction, every extremal family is close to being an initial segment of the colex order; namely, if the family is extremal, then after performing $t$ lower shadows, with $t=O(\log(\log n))$, we obtain the initial segment of the colexicographical order. We also give a ``fast'' algorithm to determine whether, for a given $t$ and $m$, there exists an extremal family of size $m$ whose $t$-th lower shadow is not yet the initial segment in the colexicographical order. As a byproduct of these arguments, we give yet another characterization of the families of $k$-sets satisfying equality in the Kruskal--Katona theorem. Such a characterization is, at first glance, less appealing than the one in \cite{sv21}, since the additional information it provides is indirect. However, the arguments used to prove it provide additional insight into the structure of the extremal families themselves.
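
For readers less familiar with the objects above, the short sketch below (with hypothetical helper names) computes the lower shadow of a family of $k$-sets and the initial segment of the colexicographic order; the ground-set size and the family are arbitrary illustrative choices.

```python
# Lower shadow of a family of k-sets, and the colex initial segment
# (colex initial segments minimize the lower shadow by Kruskal-Katona).
from itertools import combinations

def lower_shadow(family):
    """All (k-1)-subsets obtained by deleting one element from some member."""
    return {frozenset(s - {x}) for s in family for x in s}

def colex_initial_segment(m, k, ground=range(20)):
    """First m k-subsets of `ground` in colex order (compare largest elements first)."""
    sets = sorted(combinations(ground, k),
                  key=lambda t: tuple(reversed(sorted(t))))
    return [frozenset(t) for t in sets[:m]]

family = colex_initial_segment(6, 3)   # 6 = C(4,3) + C(2,2) + C(1,1)
print(len(lower_shadow(family)))       # 9 = C(4,2) + C(2,1) + C(1,0)
```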

We introduce a filtering technique for Discontinuous Galerkin approximations of hyperbolic problems. Following an approach already proposed for the Hamilton-Jacobi equations by other authors, we aim at reducing the spurious oscillations that arise in the presence of discontinuities when high order spatial discretizations are employed. This goal is achieved using a filter function that keeps the high order scheme where the solution is regular and switches to a monotone low order approximation where it is not. The method has been implemented in the framework of the $deal.II$ numerical library, whose mesh adaptation capabilities are also used to reduce the region in which the low order approximation is used. A number of numerical experiments demonstrate the potential of the proposed filtering technique.
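
The filtering idea can be sketched generically as a convex blend between the high-order and the monotone low-order values, driven by an oscillation indicator. The snippet below is only a schematic 1D illustration with a crude indicator, not the specific filter function used in the paper.

```python
# Schematic 1D filtering: keep the high-order value where the solution looks
# smooth, fall back to the monotone low-order value near a detected jump.
import numpy as np

def filtered_update(u_high, u_low, indicator, threshold=1.0):
    theta = np.clip(indicator / threshold, 0.0, 1.0)   # 0 = smooth, 1 = oscillatory
    return (1.0 - theta) * u_high + theta * u_low

# toy data: a jump produces a large indicator in the cells next to it
u_high = np.array([1.0, 1.0, 1.2, -0.2, 0.0, 0.0])     # oscillatory near the jump
u_low  = np.array([1.0, 1.0, 1.0,  0.0, 0.0, 0.0])     # monotone fallback
indicator = np.abs(np.diff(u_high, prepend=u_high[0]))  # crude smoothness sensor
print(filtered_update(u_high, u_low, indicator))
```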

Quasi-periodicity refers to a pattern that appears periodic but whose amplitude evolves over time. This is often the case in practical settings such as the modeling of case counts of infectious disease or the carbon dioxide (CO2) concentration over time. In this paper, we introduce a class of Gaussian processes, called seasonal Gaussian processes (sGP), for model-based inference of such quasi-periodic behavior. We illustrate that the exact sGP can be fitted efficiently in $O(n)$ time using its state space representation for equally spaced locations. However, for large datasets with irregular spacing, the exact approach becomes computationally inefficient and unstable. To address this, we develop a continuous finite dimensional approximation for the sGP using the seasonal B-spline (sB-spline) basis constructed by damping B-splines with sinusoidal functions. We prove that the proposed approximation converges in distribution to the true sGP as the number of basis functions increases, and show its superior approximation quality through numerical studies. We also provide a unified and interpretable way to define priors for the sGP, based on the notion of predictive standard deviation (PSD). Finally, we implement the proposed inference method on several real data examples to illustrate its practical usage.
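
The following is a rough sketch of one way to read the sB-spline construction (B-spline basis elements modulated by sinusoids of a chosen period); the knots, period, and degree are arbitrary, and this is not the authors' implementation.

```python
# Sketch of "seasonal" basis functions: an ordinary B-spline basis element
# multiplied by cosine and sine components of an assumed period p.
import numpy as np
from scipy.interpolate import BSpline

p = 1.0                                        # assumed seasonal period
knots = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # 5 knots -> one cubic basis element
bspl = BSpline.basis_element(knots)
x = np.linspace(knots[0], knots[-1], 200)

sb_cos = bspl(x) * np.cos(2 * np.pi * x / p)   # cosine-modulated component
sb_sin = bspl(x) * np.sin(2 * np.pi * x / p)   # sine-modulated component
print(sb_cos[:5], sb_sin[:5])
```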

Transportation of probability measures underlies many core tasks in statistics and machine learning, from variational inference to generative modeling. A typical goal is to represent a target probability measure of interest as the push-forward of a tractable source measure through a learned map. We present a new construction of such a transport map, given the ability to evaluate the score of the target distribution. Specifically, we characterize the map as a zero of an infinite-dimensional score-residual operator and derive a Newton-type method for iteratively constructing such a zero. We prove convergence of these iterations by invoking classical elliptic regularity theory for partial differential equations (PDE) and show that this construction enjoys rapid convergence, under smoothness assumptions on the target score. A key element of our approach is a generalization of the elementary Newton method to infinite-dimensional operators, other forms of which have appeared in nonlinear PDE and in dynamical systems. Our Newton construction, while developed in a functional setting, also suggests new iterative algorithms for approximating transport maps.
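
For reference, the elementary finite-dimensional Newton iteration that the abstract says is generalized to infinite-dimensional (score-residual) operators is $x_{k+1} = x_k - J_F(x_k)^{-1} F(x_k)$; a minimal sketch on a toy two-dimensional system follows (not the paper's construction).

```python
# Elementary Newton iteration for a finite-dimensional system F(x) = 0.
import numpy as np

def newton_system(F, J, x, iters=20, tol=1e-12):
    for _ in range(iters):
        step = np.linalg.solve(J(x), F(x))     # solve J(x) step = F(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# toy system: x0^2 + x1^2 - 1 = 0 and x0 - x1 = 0, root at (1/sqrt(2), 1/sqrt(2))
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(newton_system(F, J, np.array([1.0, 0.5])))
```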

A common pipeline in learning-based control is to iteratively estimate a model of the system dynamics and apply a trajectory optimization algorithm, e.g.~$\mathtt{iLQR}$, to the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains sample complexity polynomial in the relevant problem parameters and, by synthesizing locally stabilizing gains, overcomes exponential dependence on the problem horizon. Experimental results validate the performance of our algorithm and compare it to natural deep-learning baselines.
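
The shape of that pipeline can be conveyed with a deliberately compressed sketch: roll out the current policy, fit a local linear model of the dynamics by least squares, and update the policy with an LQR-style gain computed on the estimated model. The scalar example below uses made-up dynamics and costs and is not the algorithm analyzed in the paper.

```python
# Compressed model-estimation / policy-update loop on a toy scalar system.
import numpy as np

A_true, B_true = 1.2, 0.7            # "unknown" scalar dynamics (made up)
Q, R = 1.0, 0.1                      # quadratic cost weights (made up)
K = 0.0                              # current feedback gain, u = -K x
rng = np.random.default_rng(1)

for it in range(5):
    # 1) collect short rollouts with the current policy plus exploration noise
    xs, us, xn = [], [], []
    for _ in range(40):
        x = rng.standard_normal()
        for _ in range(5):
            u = -K * x + 0.1 * rng.standard_normal()
            x_next = A_true * x + B_true * u + 0.01 * rng.standard_normal()
            xs.append(x); us.append(u); xn.append(x_next)
            x = x_next
    # 2) least-squares estimate of a local linear model  x' ~ A x + B u
    Phi = np.column_stack([xs, us])
    A_hat, B_hat = np.linalg.lstsq(Phi, np.array(xn), rcond=None)[0]
    # 3) LQR-style policy update on the estimated model (Riccati recursion)
    P = Q
    for _ in range(200):
        K = (B_hat * P * A_hat) / (R + B_hat * P * B_hat)
        P = Q + A_hat * P * A_hat - A_hat * P * B_hat * K
    print(f"iter {it}: A_hat={A_hat:.3f}  B_hat={B_hat:.3f}  K={K:.3f}")
```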

It is well known that when the exact line search gradient descent method is applied to a convex quadratic objective, the worst-case rate of convergence (ROC) among all seed vectors deteriorates as the condition number of the Hessian of the objective grows. Owing to an elegant analysis by H. Akaike, it is generally believed -- but not proved -- that in the ill-conditioned regime the ROC for almost all initial vectors, and hence also the average ROC, is close to the worst-case ROC. We complete Akaike's analysis using the theorem of center and stable manifolds. Our analysis also makes apparent the effect of an intermediate eigenvalue in the Hessian by establishing the following somewhat amusing result: in the absence of an intermediate eigenvalue, the average ROC becomes arbitrarily fast -- not slow -- as the Hessian becomes increasingly ill-conditioned. We discuss in passing some contemporary applications of exact line search GD to polynomial optimization problems arising from imaging and data sciences.
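
On a convex quadratic $f(x) = \tfrac{1}{2}x^\top H x - b^\top x$, the exact line search step along $-g$ has the closed form $\alpha = (g^\top g)/(g^\top H g)$, which makes the classical ill-conditioned zig-zag easy to reproduce; the Hessian and seed vector below are illustrative choices.

```python
# Exact line search gradient descent on a convex quadratic:
# the exact step along the negative gradient is alpha = (g.g)/(g.Hg).
import numpy as np

H = np.diag([1.0, 100.0])            # condition number 100 (made up)
b = np.zeros(2)
x = np.array([1.0, 0.01])            # seed vector

for k in range(10):
    g = H @ x - b
    alpha = (g @ g) / (g @ H @ g)    # exact minimizer along -g
    x = x - alpha * g
    print(f"step {k}: f = {0.5 * x @ H @ x - b @ x:.3e}")
```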

We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on the Euclidean space. To propose an accelerated gradient descent algorithm for such problems, we consider the gradient flow of transport maps that give push-forward measures of an initial measure. We then propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm is not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new notions of convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we show that the accelerated algorithm converges faster than the standard gradient descent algorithm. Numerical experiments support this theoretical result.
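
As a point of reference for the extension described above, the standard Euclidean Nesterov update with momentum looks as follows on a toy quadratic; this is the classical scheme being generalized, not the paper's algorithm on probability measures.

```python
# Standard Nesterov accelerated gradient on a toy quadratic f(x) = 0.5 x^T H x.
import numpy as np

H = np.diag([1.0, 50.0])
grad = lambda x: H @ x
L = 50.0                              # smoothness constant (largest eigenvalue)

x = y = np.array([1.0, 1.0])
t = 1.0
for _ in range(50):
    x_new = y - grad(y) / L                           # gradient step at the lookahead point
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_new + ((t - 1) / t_new) * (x_new - x)       # momentum extrapolation
    x, t = x_new, t_new
print("final objective:", 0.5 * x @ H @ x)
```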

We propose a new iteration scheme, the Cauchy-Simplex, to optimize convex problems over the probability simplex $\{w\in\mathbb{R}^n\ |\ \sum_i w_i=1\ \textrm{and}\ w_i\geq0\}$. Other works have taken steps to enforce positivity or unit normalization automatically, but never simultaneously within a unified setting. This paper presents a natural framework for manifestly requiring the probability condition. Specifically, we map the simplex to the positive quadrant of a unit sphere, envisage gradient descent in latent variables, and map the result back in a way that only depends on the simplex variable. Moreover, proving rigorous convergence results in this formulation leads inherently to tools from information theory (e.g. cross entropy and KL divergence). Each iteration of the Cauchy-Simplex consists of simple operations, making it well-suited for high-dimensional problems. We prove that it has a convergence rate of ${O}(1/T)$ for convex functions, and numerical experiments on projection onto convex hulls show faster convergence than similar algorithms. Finally, we apply our algorithm to online learning problems and prove the convergence of the average regret for (1) Prediction with expert advice and (2) Universal Portfolios.
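
The parametrization described above can be illustrated schematically: write $w = u^2$ with $\|u\|_2 = 1$, so that $w$ lies on the simplex, take a gradient step in the latent variable $u$, and map back. The sketch below follows this reading and is not the exact Cauchy-Simplex iteration; the objective and step size are arbitrary.

```python
# Schematic simplex optimization via the sphere parametrization w = u**2, ||u|| = 1.
import numpy as np

def simplex_step(w, grad_w, lr=0.1):
    u = np.sqrt(w)                       # point on the positive quadrant of the unit sphere
    grad_u = 2 * u * grad_w              # chain rule through w = u**2
    u = u - lr * grad_u                  # gradient step in the latent variable
    u = np.abs(u) / np.linalg.norm(u)    # return to the sphere, staying nonnegative
    return u ** 2                        # back to the simplex: entries >= 0, sum to 1

# toy objective f(w) = 0.5 ||w - c||^2 with gradient w - c, minimized at c on the simplex
c = np.array([0.7, 0.2, 0.1])
w = np.ones(3) / 3
for _ in range(200):
    w = simplex_step(w, w - c)
print(w)                                 # approaches c
```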

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
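
As a concrete instance of the operation being surveyed, the snippet below performs symmetric uniform quantization of a small float array to 4-bit signed integer codes and dequantizes them back; the array and bit width are illustrative.

```python
# Symmetric uniform quantization of float values to 4-bit signed integer codes.
import numpy as np

def quantize_uniform(x, num_bits=4):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1   # e.g. -8 .. 7
    scale = np.max(np.abs(x)) / qmax                               # symmetric scaling factor
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)
    return q, scale

x = np.array([-1.3, -0.2, 0.05, 0.4, 0.9], dtype=np.float32)
q, scale = quantize_uniform(x)
print("int4 codes:   ", q)
print("dequantized:  ", q * scale)
print("max abs error:", np.max(np.abs(x - q * scale)))
```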
