In this paper we provide a rigorous convergence analysis for the renowned particle swarm optimization method by using tools from stochastic calculus and the analysis of partial differential equations. Based on a time-continuous formulation of the particle dynamics as a system of stochastic differential equations, we establish convergence to a global minimizer of a possibly nonconvex and nonsmooth objective function in two steps. First, we prove consensus formation of an associated mean-field dynamics by analyzing the time-evolution of the variance of the particle distribution. We then show that this consensus is close to a global minimizer by employing the asymptotic Laplace principle and a tractability condition on the energy landscape of the objective function. These results allow for the usage of memory mechanisms, and hold for a rich class of objectives provided certain conditions of well-preparation of the hyperparameters and the initial datum. In a second step, at least for the case without memory effects, we provide a quantitative result about the mean-field approximation of particle swarm optimization, which specifies the convergence of the interacting particle system to the associated mean-field limit. Combining these two results allows for global convergence guarantees of the numerical particle swarm optimization method with provable polynomial complexity. To demonstrate the applicability of the method we propose an efficient and parallelizable implementation, which is tested in particular on a competitive and well-understood high-dimensional benchmark problem in machine learning.
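
As a rough illustration of the kind of particle dynamics discussed above, the following sketch implements a textbook discrete-time particle swarm optimizer with a memory mechanism (personal best positions). It is not the time-continuous SDE formulation or the implementation analyzed in the paper. The function name `pso`, the parameter names, and the default hyperparameter values are assumptions chosen for this example.

```python
import random


def pso(objective, dim, n_particles=30, iters=200, inertia=0.7,
        c_personal=1.5, c_global=1.5, bound=5.0, seed=0):
    """Textbook PSO sketch (hypothetical helper, not the paper's method)."""
    rng = random.Random(seed)
    # Random initial positions in [-bound, bound]^dim, zero initial velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Memory: each particle remembers the best position it has visited.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia, plus random pulls toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    sphere = lambda p: sum(t * t for t in p)
    print(pso(sphere, dim=2))
```

The inner loop over particles is independent apart from the shared global best, which is what makes implementations of this kind easy to parallelize.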

Related content

The Stochastic Primal-Dual Hybrid Gradient or SPDHG is an algorithm proposed by Chambolle et al. to efficiently solve a wide class of nonsmooth large-scale optimization problems. In this paper we contribute to its theoretical foundations and prove its almost sure convergence for convex but neither necessarily strongly convex nor smooth functionals, defined on Hilbert spaces of arbitrary dimension. We also prove its convergence for any arbitrary sampling, and for some specific samplings we propose theoretically optimal step size parameters which yield faster convergence. In addition, we propose using SPDHG for parallel Magnetic Resonance Imaging reconstruction, where data from different coils are randomly selected at each iteration. We apply SPDHG using a wide range of random sampling methods. We compare its performance across a range of settings, including mini-batch size, step size parameters, and both convex and strongly convex objective functionals. We show that the sampling can significantly affect the convergence speed of SPDHG. We conclude that for many cases an optimal sampling method can be identified.

We propose a unified framework for time-varying convex optimization based on the prediction-correction paradigm, both in the primal and dual spaces. In this framework, a continuously varying optimization problem is sampled at fixed intervals, and each problem is approximately solved with a primal or dual correction step. The solution method is warm-started with the output of a prediction step, which solves an approximation of a future problem using past information. Prediction approaches are studied and compared under different sets of assumptions. Examples of algorithms covered by this framework are time-varying versions of the gradient method, splitting methods, and the celebrated alternating direction method of multipliers (ADMM).
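
The prediction-correction idea can be sketched on a toy problem. The example below assumes a scalar cost f(x; t) = 0.5 (x - sin t)^2 sampled every `h` time units. The prediction step runs a few gradient steps on an approximate future cost whose target is linearly extrapolated from the two most recent samples, and the correction step runs one gradient step on the newly sampled cost. The cost, the extrapolation rule, and all names and default values are assumptions made for illustration; they are not taken from the paper.

```python
import math


def track(n_steps=200, h=0.1, alpha=0.5, n_pred=3, use_prediction=True):
    """Mean tracking error over the second half of the run (toy sketch)."""
    target = math.sin          # time-varying minimizer of the assumed cost
    x = 0.0
    prev_r = cur_r = target(0.0)
    errors = []
    for k in range(1, n_steps + 1):
        if use_prediction:
            # Prediction: approximate the future problem from past samples
            # only, then warm-start by partially solving it.
            r_pred = 2.0 * cur_r - prev_r
            for _ in range(n_pred):
                x -= alpha * (x - r_pred)
        # Correction: one gradient step on the problem sampled at time k*h.
        new_r = target(k * h)
        x -= alpha * (x - new_r)
        errors.append(abs(x - new_r))
        prev_r, cur_r = cur_r, new_r
    tail = errors[n_steps // 2:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print("correction only      :", track(use_prediction=False))
    print("prediction-correction:", track(use_prediction=True))
```

In this toy setting the prediction uses only information available before the new sample arrives, so it can be computed between sampling times.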

The random batch method provides an efficient algorithm for computing statistical properties of a canonical ensemble of interacting particles. In this work, we study the error estimates of the fully discrete random batch method, especially in terms of approximating the invariant distribution. Using a triangle inequality framework, we show that the long-time error of the method is $O(\sqrt{\tau} + e^{-\gamma t})$, where $\tau$ is the time step and $\gamma$ is the convergence rate which does not depend on the time step $\tau$ or the number of particles $N$. Our results also apply to the McKean-Vlasov process, which is the mean-field limit of the interacting particle system as the number of particles $N\rightarrow\infty$.
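
A minimal sketch of one random batch time step is given below, assuming a one-dimensional system with a linear pairwise attraction and additive noise of strength `sigma`. At each step the particles are shuffled into small batches and each particle interacts only with the members of its own batch, which reduces the per-step cost from O(N^2) to O(N). The interaction kernel, the function name `random_batch_step`, and the parameters are hypothetical choices for illustration, not the setting analyzed in the paper.

```python
import math
import random


def random_batch_step(x, dt, batch_size, sigma, rng):
    """One Euler-Maruyama step with interactions restricted to random batches."""
    n = len(x)
    idx = list(range(n))
    rng.shuffle(idx)                      # fresh random partition each step
    new_x = x[:]
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        for i in batch:
            # Assumed kernel: linear attraction toward batch members only.
            force = sum(x[j] - x[i] for j in batch if j != i)
            if len(batch) > 1:
                force /= (len(batch) - 1)
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_x[i] = x[i] + dt * force + noise
    return new_x


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    for _ in range(200):
        x = random_batch_step(x, dt=0.01, batch_size=2, sigma=0.1, rng=rng)
    print(sum(x) / len(x))
```

With `sigma=0` the forces inside each batch cancel in pairs, so the sample mean is conserved while the spread of the particles shrinks.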

In this work, we prove the convergence of residual distribution schemes to dissipative weak solutions of the Euler equations. We need to guarantee that the residual distribution schemes fulfill the underlying structure-preserving properties, such as positivity of density and internal energy. Consequently, the residual distribution schemes lead to a consistent and stable approximation of the Euler equations. Our result can be seen as a generalization to nonlinear problems of the Lax-Richtmyer equivalence theorem, which states that consistency plus stability is equivalent to convergence.

We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on $L_2$-regularized logistic regression.

SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\frac{\ln T}{\sqrt{T}})$ after $T$ iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has the optimal convergence rate $O(\frac{1}{\sqrt{T}})$ for unconstrained convex stochastic optimization problems, without requiring projections onto bounded domains or knowledge of $T$. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are shown as well.
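
For reference, the baseline algorithm under discussion, SGDM with a constant momentum factor, can be sketched as follows. This is the plain heavy-ball style update that returns the last iterate, not the FTRL-based variant with increasing momentum studied in the paper. The function name `sgdm`, the gradient-oracle interface, and the default step size and momentum values are assumptions for this example.

```python
import random


def sgdm(grad, x0, lr=0.1, momentum=0.9, steps=500, noise=0.0, seed=0):
    """Return the last iterate of SGD with constant momentum (1-D sketch)."""
    rng = random.Random(seed)
    x, m = x0, 0.0
    for _ in range(steps):
        # Stochastic gradient: exact gradient plus optional Gaussian noise.
        g = grad(x) + noise * rng.gauss(0.0, 1.0)
        m = momentum * m + g      # momentum buffer accumulates past gradients
        x = x - lr * m            # no averaging and no projection
    return x


if __name__ == "__main__":
    # Assumed test problem: f(x) = 0.5 * x**2, whose gradient is x.
    print(sgdm(lambda x: x, x0=5.0))
    print(sgdm(lambda x: x, x0=5.0, noise=1.0))
```

Setting `noise=0.0` recovers deterministic heavy-ball descent, which is convenient for checking the update rule before adding stochasticity.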

The Heuristic Rating Estimation Method enables decision-makers to decide based on existing ranking data and expert comparisons. In this approach, the ranking values of selected alternatives are known in advance, while these values have to be calculated for the remaining ones. Their calculation can be performed using either an additive or a multiplicative method. Both methods assumed that the pairwise comparison sets involved in the computation were complete. In this paper, we show how these algorithms can be extended so that the experts do not need to compare all alternatives pairwise. Thanks to the shortening of the work of experts, the presented, improved methods will reduce the costs of the decision-making procedure and facilitate and shorten the stage of collecting decision-making data.

In this work we are interested in general linear inverse problems where the corresponding forward problem is solved iteratively using fixed point methods. Then one-shot methods, which iterate at the same time on the forward problem solution and on the inverse problem unknown, can be applied. We analyze two variants of the so-called multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence, by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical experiments are provided to illustrate the convergence of these methods in comparison with the classical usual and shifted gradient descent. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.

With the aid of hardware and software developments, there has been a surge of interest in solving partial differential equations by deep learning techniques, and the integration with domain decomposition strategies has recently attracted considerable attention because it enhances the representation and parallelization capacity of the network solution. While there are already several works that substitute the numerical solver of overlapping Schwarz methods with the deep learning approach, the non-overlapping counterpart has not been thoroughly studied yet because of the inevitable interface overfitting problem, which propagates errors to neighbouring subdomains and eventually hampers the convergence of the outer iteration. In this work, a novel learning approach, the compensated deep Ritz method, is proposed to enable flux transmission across subregion interfaces with guaranteed accuracy, thereby allowing us to construct effective learning algorithms for realizing the more general non-overlapping domain decomposition methods in the presence of overfitted interface conditions. Numerical experiments on a series of elliptic boundary value problems, including regular and irregular interfaces, low and high dimensions, and smooth and high-contrast coefficients on multidomains, are carried out to validate the effectiveness of our proposed domain decomposition learning algorithms.

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
