
This paper deals with a general class of algorithms for the solution of fixed-point problems that we refer to as \emph{Anderson--Pulay acceleration}. This family includes the DIIS technique and its variant sometimes called commutator-DIIS, both introduced by Pulay in the 1980s to accelerate the convergence of self-consistent field procedures in quantum chemistry, as well as the related Anderson acceleration, which dates back to the 1960s, and the wealth of techniques they have inspired. Such methods aim at accelerating the convergence of any fixed-point iteration method by combining several iterates to generate the next one at each step. This extrapolation process is characterised by its \emph{depth}, i.e. the number of previous iterates stored, which is a crucial parameter for the efficiency of the method and is generally fixed to an empirically chosen value. In the present work, we consider two parameter-driven mechanisms that let the depth vary along the iterations. In the first one, the depth grows until a certain nondegeneracy condition is no longer satisfied; then the stored iterates (save for the last one) are discarded and the method "restarts". In the second one, we adapt the depth continuously by eliminating at each step some of the oldest, least relevant iterates. In an abstract and general setting, we prove under natural assumptions the local convergence and acceleration of these two adaptive Anderson--Pulay methods, and we show that one can theoretically achieve a superlinear convergence rate with each of them. We then investigate their behaviour in quantum chemistry calculations. These numerical experiments show that both adaptive variants exhibit faster convergence than a standard fixed-depth scheme and require, on average, less computational effort per iteration. This study is complemented by a review of known facts on the DIIS, in particular its link with the Anderson acceleration and some multisecant-type quasi-Newton methods.
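As an illustration of the first mechanism, here is a minimal numpy sketch of Anderson-type acceleration in which the stored history grows until a degeneracy test fails, at which point the method restarts from the last iterate. The condition-number threshold cond_max is an illustrative stand-in for the paper's precise nondegeneracy condition, not the paper's actual criterion.

```python
import numpy as np

def anderson_restart(g, x0, max_iter=100, tol=1e-10, cond_max=1e8):
    """Anderson acceleration with the adaptive-depth 'restart' mechanism:
    the stored history grows until the extrapolation problem degenerates
    (here: the residual-difference matrix becomes ill-conditioned), at
    which point all iterates except the last are discarded."""
    x = np.asarray(x0, dtype=float)
    G, F = [], []                     # stored g(x_k) and residuals f_k = g(x_k) - x_k
    for k in range(max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return x, k
        G.append(gx); F.append(f)
        if len(F) >= 2:
            dF = np.column_stack([F[i + 1] - F[i] for i in range(len(F) - 1)])
            dG = np.column_stack([G[i + 1] - G[i] for i in range(len(G) - 1)])
            if np.linalg.cond(dF) > cond_max:   # nondegeneracy test failed:
                G, F = [gx], [f]                # restart, keeping only the last iterate
                x = gx
                continue
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - dG @ gamma                 # extrapolated next iterate
        else:
            x = gx                              # plain fixed-point step to start
    return x, max_iter

# usage: accelerate the componentwise contraction g(x) = cos(x)
x_star, iters = anderson_restart(np.cos, np.array([1.0, 0.5]), tol=1e-12)
```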

Related content


Understanding the convergence properties of learning dynamics in repeated auctions is a timely and important question in the area of learning in auctions, with numerous applications in, e.g., online advertising markets. This work focuses on repeated first price auctions where bidders with fixed values for the item learn to bid using mean-based algorithms -- a large class of online learning algorithms that include popular no-regret algorithms such as Multiplicative Weights Update and Follow the Perturbed Leader. We completely characterize the learning dynamics of mean-based algorithms, in terms of convergence to a Nash equilibrium of the auction, in two senses: (1) time-average: the fraction of rounds where bidders play a Nash equilibrium approaches 1 in the limit; (2) last-iterate: the mixed strategy profile of bidders approaches a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value:

- If the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate.
- If the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in time-average but not necessarily in last-iterate.
- If the number is one, the bidding dynamics may fail to converge to a Nash equilibrium in either time-average or last-iterate.

Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms.
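For concreteness, the following Python sketch simulates the dynamics studied here: every bidder runs Multiplicative Weights Update (one of the mean-based algorithms covered by the result) over a discrete bid grid with full-information counterfactual utilities. The grid, learning rate eta, and random tie-breaking rule are illustrative assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mwu_first_price(values, bid_grid, rounds=5000, eta=0.05):
    """Repeated first-price auction in which every bidder runs Multiplicative
    Weights Update (a mean-based algorithm) over a discrete bid grid, with
    full-information counterfactual utilities against the realized bids."""
    n, m = len(values), len(bid_grid)
    weights = np.ones((n, m))
    for t in range(rounds):
        probs = weights / weights.sum(axis=1, keepdims=True)
        bids = np.array([rng.choice(bid_grid, p=probs[i]) for i in range(n)])
        for i in range(n):
            others = np.delete(bids, i).max()
            # utility of every grid bid had bidder i played it this round
            win = (bid_grid > others) | ((bid_grid == others) & (rng.random() < 0.5))
            weights[i] *= np.exp(eta * (values[i] - bid_grid) * win)
        weights /= weights.sum(axis=1, keepdims=True)   # renormalize for stability
    return weights

# two bidders share the highest value 1.0: time-average convergence is predicted
grid = np.linspace(0.0, 1.0, 21)
final = mwu_first_price(values=[1.0, 1.0, 0.5], bid_grid=grid)
print(np.round(final, 2))
```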

For a map (function) $F:\mathbb{F}_2^n\rightarrow\mathbb{F}_2^n$ and a given $y$ in the image of $F$, the problem of \emph{local inversion} of $F$ is to find all inverse images $x\in\mathbb{F}_2^n$ such that $y=F(x)$. In cryptology, this problem arises in the cryptanalysis of one-way functions (OWFs). The well-known TMTO attack in cryptanalysis is a probabilistic algorithm that computes one solution of local inversion using $O(\sqrt N)$ computation both offline and online, where $N=2^n$. This paper proposes a complete algorithm for the local inversion problem whose cost is governed by the linear complexity of an associated sequence when the solution lies in a periodic orbit. The algorithm requires an offline computation that solves a hard problem (possibly requiring exponential time) and an online computation, dependent on $y$, consisting of repeated forward evaluations of $F$ at points of $\mathbb{F}_2^n$, each evaluation taking polynomial time. The forward evaluation is repeated at most as many times as the linear complexity of the sequence $\{y,F(y),\ldots\}$ to obtain one possible solution when this sequence is periodic. All other solutions are obtained along chains $\{e,F(e),\ldots\}$ for points $e$ in the Garden of Eden (GOE) of the map $F$. Hence a solution $x$ exists iff either the former sequence is periodic or a solution occurs in a chain starting from a point in the GOE. The online computation is then polynomial, $O(L^k)$ in the linear complexity $L$ of the sequence, to compute one possible solution in a periodic orbit, or $O(l)$ in the chain length $l$ for fixed $n$. Hence this is a complete algorithm for finding all rational solutions $x$ of the equation $F(x)=y$ for a given $y$ and a map $F$ over $\mathbb{F}_2^n$.
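The periodic-orbit case admits a very short illustration: if $y$ lies on a periodic orbit of $F$, iterating $F$ forward from $y$ must eventually return to $y$, and the point visited just before the return is an inverse image. The sketch below replaces the paper's linear-complexity bound on the number of forward evaluations with an explicit step cap, and the 3-bit rotation map is a hypothetical example.

```python
def local_invert_periodic(F, y, max_steps):
    """If y lies on a periodic orbit of F, repeatedly applying F to y must
    eventually return to y; the point visited just before that return is an
    inverse image x with F(x) = y. Returns None if no period is detected
    within max_steps (y may instead be reachable only from a Garden-of-Eden
    chain, which the full algorithm handles separately)."""
    prev, cur = y, F(y)
    for _ in range(max_steps):
        if cur == y:          # orbit closed: prev is a preimage of y
            return prev
        prev, cur = cur, F(cur)
    return None

# toy map on F_2^3 represented as integers 0..7 (hypothetical example):
# a left bit-rotation, which is a permutation of F_2^3
F = lambda x: ((x << 1) | (x >> 2)) & 0b111
print(local_invert_periodic(F, 5, max_steps=8))   # prints 6, and indeed F(6) = 5
```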

We present a fundamentally new regularization method for the solution of the Fredholm integral equation of the first kind, in which we incorporate solutions corresponding to a range of Tikhonov regularizers into the end result. This method identifies solutions within a much larger function space, spanned by this set of regularized solutions, than is available to conventional regularization methods. Each of these solutions is regularized to a different extent. In effect, we combine the stability of solutions with greater degrees of regularization with the resolution of those that are less regularized. In contrast, current methods involve selection of a single, or in some cases several, regularization parameters that define an optimal degree of regularization. Because the identified solution is within the span of a set of differently-regularized solutions, we call this method \textit{span of regularizations}, or SpanReg. We demonstrate the performance of SpanReg through a non-negative least squares analysis employing a Gaussian basis, showing improved recovery of bimodal Gaussian distribution functions as compared to conventional methods. We also demonstrate that this method exhibits decreased dependence of the end result on the optimality of regularization parameter selection. We further illustrate the method with an application to myelin water fraction mapping in the human brain from experimental magnetic resonance imaging relaxometry data. We expect SpanReg to be widely applicable as an effective new method for the regularization of inverse problems.
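A minimal numpy/scipy sketch of the span-of-regularizations idea, assuming a simple Tikhonov family and nonnegative combination weights found by NNLS; the paper's full method (Gaussian basis, myelin-water application) is richer than this toy version.

```python
import numpy as np
from scipy.optimize import nnls

def tikhonov(A, b, lam):
    """Classical Tikhonov-regularized solution argmin ||Ax-b||^2 + lam^2 ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

def spanreg(A, b, lams):
    """Sketch of the span-of-regularizations idea: compute solutions over a
    range of Tikhonov parameters, then search for the best data-consistent
    combination within their span (here with nonnegative weights via NNLS)."""
    X = np.column_stack([tikhonov(A, b, lam) for lam in lams])  # basis of solutions
    c, _ = nnls(A @ X, b)      # weights over the differently-regularized solutions
    return X @ c

# ill-posed toy problem: smooth kernel, bimodal ground truth, noisy data
t = np.linspace(0, 1, 80)
A = np.exp(-30.0 * (t[:, None] - t[None, :])**2)         # Fredholm-type kernel
x_true = np.exp(-((t - 0.3) / 0.05)**2) + np.exp(-((t - 0.7) / 0.05)**2)
b = A @ x_true + 1e-3 * np.random.default_rng(1).standard_normal(t.size)
x_hat = spanreg(A, b, lams=np.logspace(-4, 0, 12))
```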

For the approximation and simulation of twofold iterated stochastic integrals and the corresponding L\'{e}vy areas w.r.t. a multi-dimensional Wiener process, we review four algorithms based on a Fourier series approach. In particular, we consider the very efficient algorithm due to Wiktorsson and a newly proposed algorithm due to Mrongowius and R\"ossler. To put recent advances into context, we analyse the four Fourier-based algorithms in a unified framework to highlight differences and similarities in their derivation. A comparison of theoretical properties is complemented by a numerical simulation that reveals the order of convergence for each algorithm. Further, we give concrete instructions for choosing the optimal algorithm and parameters for the simulation of solutions of stochastic (partial) differential equations. Additionally, we provide advice for an efficient implementation of the considered algorithms and have incorporated these insights into an open-source toolbox that is freely available for both the Julia and MATLAB programming languages. The performance of this toolbox is analysed by comparing it to some existing implementations, where we observe a significant speed-up.
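The sketch below is not one of the four Fourier-based algorithms reviewed here; it is a brute-force reference that approximates the twofold iterated Ito integrals by a fine left-point Riemann-Stieltjes sum, which can be useful for validating an implementation of the faster schemes.

```python
import numpy as np

def iterated_integrals_reference(m, T=1.0, steps=200000, seed=0):
    """Brute-force reference for the twofold iterated Ito integrals
    I_ij = int_0^T W_i(s) dW_j(s) of an m-dimensional Wiener process,
    approximated by a fine left-point Riemann-Stieltjes sum. The Fourier-based
    algorithms reviewed in the paper achieve far better cost/accuracy
    trade-offs; this is only a baseline for checking such implementations."""
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((steps, m)) * np.sqrt(T / steps)   # Wiener increments
    W = np.vstack([np.zeros(m), np.cumsum(dW, axis=0)])[:-1]    # left endpoints
    I = W.T @ dW                                 # I[i, j] ~ int_0^T W_i dW_j
    levy_area = 0.5 * (I - I.T)                  # A_ij = (I_ij - I_ji) / 2
    return I, levy_area

I, A = iterated_integrals_reference(m=3)
```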

We consider infinite-horizon discounted Markov decision problems with finite state and action spaces. We show that with direct parametrization in the policy space, the weighted value function, although non-convex in general, is both quasi-convex and quasi-concave. While quasi-convexity helps explain the convergence of policy gradient methods to global optima, quasi-concavity hints at their convergence guarantees using arbitrarily large step sizes that are not dictated by the Lipschitz constant characterizing the smoothness of the value function. In particular, we show that when using geometrically increasing step sizes, a general class of policy mirror descent methods, including the natural policy gradient method and a projected Q-descent method, all enjoy a linear rate of convergence without relying on entropy or other strongly convex regularization. In addition, we develop a theory of weak gradient-mapping dominance and use it to prove a sharper sublinear convergence rate of the projected policy gradient method. Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.
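To make the geometric step-size schedule concrete, here is a small numpy sketch of the natural policy gradient method on a random finite MDP, using the closed-form softmax NPG update and exact policy evaluation; the MDP, step-size constants, and iteration count are illustrative assumptions.

```python
import numpy as np

def policy_eval(P, r, pi, gamma):
    """Exact Q^pi for a finite MDP: P has shape (S, A, S), r has shape (S, A)."""
    S, A = r.shape
    P_pi = np.einsum('sa,sat->st', pi, P)        # state transitions under pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                     # Q(s, a)

def npg_geometric(P, r, gamma, iters=30, eta0=1.0, growth=1.5):
    """Natural policy gradient with geometrically increasing step sizes.
    For softmax policies the NPG update has the closed form
    pi'(a|s) proportional to pi(a|s) * exp(eta_k * Q^pi(s, a))."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)
    eta = eta0
    for _ in range(iters):
        Q = policy_eval(P, r, pi, gamma)
        pi = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))  # stabilized
        pi /= pi.sum(axis=1, keepdims=True)
        eta *= growth                            # geometric step-size schedule
    return pi

# random small MDP; the iterates move quickly toward a deterministic policy
rng = np.random.default_rng(0)
P = rng.random((4, 3, 4)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((4, 3))
print(np.round(npg_geometric(P, r, gamma=0.9), 3))
```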

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers have explained this by investigating the implicit regularization effect of optimization algorithms. A remarkable advance is the work of Lyu & Li (2019), which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Beyond GD, adaptive algorithms such as AdaGrad, RMSProp and Adam are popular owing to their rapid training process. However, theoretical guarantees for the generalization of adaptive optimization algorithms are still lacking. In this paper, we study the implicit regularization of adaptive optimization algorithms when they are optimizing the logistic loss on homogeneous deep neural networks. We prove that adaptive algorithms that adopt an exponential moving average strategy in the conditioner (such as Adam and RMSProp) can maximize the margin of the neural network, while AdaGrad, which directly sums historical squared gradients in the conditioner, cannot. This indicates the superiority, in terms of generalization, of the exponential moving average strategy in the design of the conditioner. Technically, we provide a unified framework to analyze the convergent direction of adaptive optimization algorithms by constructing a novel adaptive gradient flow and surrogate margin. Our experiments support the theoretical findings on the convergent direction of adaptive optimization algorithms.
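A toy experiment in the spirit of this setting: train a 2-homogeneous ReLU network f(x) = v.relu(Wx) on the logistic loss with an EMA conditioner (RMSProp-style, no momentum) versus AdaGrad's summed conditioner, and track the normalized margin min_i y_i f(x_i) / ||theta||^2. The architecture, data, and hyperparameters are assumptions of this sketch, not the paper's experimental setup.

```python
import numpy as np

def normalized_margin(W, v, X, y):
    """min_i y_i f(x_i) / ||theta||^2 for the 2-homogeneous net f(x) = v.relu(Wx)."""
    f = np.maximum(W @ X.T, 0.0).T @ v
    return np.min(y * f) / (np.sum(W**2) + np.sum(v**2))

def train(X, y, conditioner='ema', steps=3000, lr=0.01, beta2=0.999, eps=1e-8):
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, X.shape[1])) * 0.5
    v = rng.standard_normal(8) * 0.5
    sW, sv = np.zeros_like(W), np.zeros_like(v)     # squared-gradient accumulators
    for _ in range(steps):
        H = np.maximum(W @ X.T, 0.0)                # hidden activations, (8, n)
        f = H.T @ v
        g = -y / (1.0 + np.exp(np.clip(y * f, -50, 50)))   # dloss/df, logistic loss
        gv = H @ g
        gW = ((v[:, None] * (H > 0)) * g) @ X       # backprop through the relu
        if conditioner == 'ema':                    # RMSProp-style EMA conditioner
            sW = beta2 * sW + (1 - beta2) * gW**2
            sv = beta2 * sv + (1 - beta2) * gv**2
        else:                                       # AdaGrad: plain running sum
            sW += gW**2; sv += gv**2
        W -= lr * gW / (np.sqrt(sW) + eps)
        v -= lr * gv / (np.sqrt(sv) + eps)
    return normalized_margin(W, v, X, y)

# linearly separable toy data; compare the margins the two conditioners reach
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2)); y = np.sign(X[:, 0] + 0.2)
print('ema     margin:', train(X, y, 'ema'))
print('adagrad margin:', train(X, y, 'adagrad'))
```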

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in approximate dynamic programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters compared with existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
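For readers unfamiliar with DDP, the sketch below runs its backward pass (time-wise feedback gains) and forward rollout on a toy linear-quadratic problem; the paper's contribution is to apply this same structure layer-wise to feedforward-network training, which this minimal example does not attempt.

```python
import numpy as np

def ddp_lqr(A, B, Qc, Rc, x0, T=30, iters=5):
    """Differential Dynamic Programming on a toy linear-quadratic problem:
    dynamics x_{t+1} = A x_t + B u_t, cost sum_t (x'Qx + u'Ru). The backward
    pass produces time-wise feedback gains (k_t, K_t); the forward pass rolls
    out the improved policy u_t = u_bar_t + k_t + K_t (x_t - x_bar_t)."""
    _, m = B.shape
    us = [np.zeros(m) for _ in range(T)]
    for _ in range(iters):
        xs = [x0]
        for u in us:                               # nominal rollout
            xs.append(A @ xs[-1] + B @ u)
        Vx, Vxx = 2 * Qc @ xs[-1], 2 * Qc          # terminal value expansion
        ks, Ks = [], []
        for t in reversed(range(T)):               # backward pass
            Qx  = 2 * Qc @ xs[t] + A.T @ Vx
            Qu  = 2 * Rc @ us[t] + B.T @ Vx
            Qxx = 2 * Qc + A.T @ Vxx @ A
            Quu = 2 * Rc + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, Qu)          # feedforward correction
            K = -np.linalg.solve(Quu, Qux)         # feedback gain
            Vx  = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            ks.append(k); Ks.append(K)
        ks.reverse(); Ks.reverse()
        x, new_us = x0, []                         # forward pass with feedback
        for t in range(T):
            u = us[t] + ks[t] + Ks[t] @ (x - xs[t])
            new_us.append(u)
            x = A @ x + B @ u
        us = new_us
    return us

A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
us = ddp_lqr(A, B, Qc=np.eye(2), Rc=0.1 * np.eye(1), x0=np.array([1.0, 0.0]))
```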

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis and infinite-width analysis.
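The initialization point can be made concrete in a few lines of numpy: propagating a signal through a deep ReLU stack with He initialization preserves the activation scale, while an arbitrary small scale makes it collapse. The depth, width, and the naive scale 0.05 are illustrative choices for this sketch.

```python
import numpy as np

def forward_stats(depth=50, width=256, scale='he', seed=0):
    """Propagate a random input through a deep ReLU stack and report the
    std of the final activations: He initialization (std sqrt(2/fan_in))
    keeps the scale stable, while a naive fixed scale makes it vanish."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)
    std = np.sqrt(2.0 / width) if scale == 'he' else 0.05
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * std
        h = np.maximum(W @ h, 0.0)                 # relu layer
    return h.std()

print('he   :', forward_stats(scale='he'))         # O(1): signal preserved
print('naive:', forward_stats(scale='naive'))      # collapses toward zero
```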

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded-norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst-case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
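The sketch below uses interval bound propagation, a much simpler (and generally looser) bounding scheme than the Lagrangian relaxation described here, to show what a sound output bound for a ReLU network under an l-infinity perturbation looks like; the two-layer network is hypothetical.

```python
import numpy as np

def ibp_bounds(layers, x, eps):
    """Interval bound propagation through an affine/ReLU network: a simple,
    always-valid (if loose) way to bound outputs over the input box
    ||perturbation||_inf <= eps. Each layer is a (W, b) pair."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        Wp, Wm = np.maximum(W, 0), np.minimum(W, 0)    # split weights by sign
        lo, hi = Wp @ lo + Wm @ hi + b, Wp @ hi + Wm @ lo + b
        if i < len(layers) - 1:                        # relu on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# hypothetical 2-layer network; certify a range for the scalar output
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((16, 4)), np.zeros(16)),
          (rng.standard_normal((1, 16)), np.zeros(1))]
lo, hi = ibp_bounds(layers, x=np.ones(4), eps=0.1)
print('certified output range:', lo, hi)
```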

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
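As a minimal illustration of the distributed setting, here is decentralized gradient descent with a doubly stochastic mixing matrix on a four-agent ring; this baseline is not the optimal accelerated dual method studied here, and the quadratic objectives and step size are assumptions of the sketch.

```python
import numpy as np

def dgd(W_mix, grads, x0, alpha=0.05, iters=500):
    """Decentralized gradient descent: each agent averages with its neighbors
    through the doubly stochastic matrix W_mix, then takes a local gradient
    step. With a constant step size the agents agree only up to an O(alpha)
    bias; accelerated dual methods achieve the optimal rates."""
    X = np.tile(x0, (W_mix.shape[0], 1))            # row i = agent i's iterate
    for _ in range(iters):
        G = np.vstack([g(X[i]) for i, g in enumerate(grads)])
        X = W_mix @ X - alpha * G                   # consensus mixing + local step
    return X

# four agents on a ring with local quadratics f_i(x) = 0.5 ||x - b_i||^2,
# so the minimizer of F = sum_i f_i is the average of the b_i
b = np.random.default_rng(0).standard_normal((4, 3))
grads = [lambda x, bi=bi: x - bi for bi in b]
W_mix = np.array([[.50, .25, .00, .25],
                  [.25, .50, .25, .00],
                  [.00, .25, .50, .25],
                  [.25, .00, .25, .50]])
X = dgd(W_mix, grads, x0=np.zeros(3))
print(np.abs(X - b.mean(axis=0)).max())             # small O(alpha) consensus residual
```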
