
We study the asymptotic behavior of second-order algorithms that mix Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite their Newtonian behavior, these methods almost always escape strict saddle points. We also highlight the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
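
For intuition, here is a small numerical sketch of one possible way to blend a regularized Newton direction with an inertial (heavy-ball) gradient step on a toy two-dimensional function with a strict saddle at the origin; the update rule, test function, and all parameter values are illustrative assumptions, not the algorithms analyzed in the paper.

import numpy as np

# Toy objective with a strict saddle at (0, 0) and minima at (0, +1) and (0, -1).
def f(p):
    x, y = p
    return 0.5 * x**2 - 0.5 * y**2 + 0.25 * y**4

def grad(p):
    x, y = p
    return np.array([x, -y + y**3])

def hess(p):
    x, y = p
    return np.array([[1.0, 0.0], [0.0, -1.0 + 3.0 * y**2]])

def mixed_step(p, v, alpha=0.5, beta=0.8, step=0.1, delta=1.0):
    # Hypothetical blend: convex combination of the gradient and a
    # regularized Newton direction, fed into a heavy-ball update.
    g, H = grad(p), hess(p)
    shift = max(0.0, -np.linalg.eigvalsh(H).min()) + delta   # make H + shift*I positive definite
    d_newton = np.linalg.solve(H + shift * np.eye(2), g)
    v = beta * v - step * ((1.0 - alpha) * g + alpha * d_newton)
    return p + v, v

p, v = np.array([1e-3, 1e-3]), np.zeros(2)   # start right next to the strict saddle
for _ in range(500):
    p, v = mixed_step(p, v)
print(p, f(p))   # typically escapes the saddle and settles near the minimum (0, 1)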

Related content

In mathematics, a saddle point or minimax point is a point on the surface of the graph of a function at which the slopes (derivatives) in all orthogonal directions are zero, but which is not a local extremum of the function. A saddle point is a critical point that is a relative minimum along one axial direction (between peaks) and a relative maximum along the crossing axis.
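
For instance, $f(x,y) = x^2 - y^2$ satisfies $\nabla f(0,0) = (0,0)$ with Hessian $\operatorname{diag}(2,-2)$ at the origin: the origin is a minimum along the $x$-axis and a maximum along the $y$-axis, hence a (strict) saddle point rather than a local extremum.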

A common choice in Coastal Ocean Modeling is between the Finite Volume Method and the Finite Element Method. The Finite Volume Method guarantees local and global mass conservation, a property not satisfied by the Finite Element Method. On the down side, the Finite Volume Method requires non-trivial modifications to attain high-order approximations, unlike the Finite Element Method. It has been contended that the Discontinuous Galerkin Method, which is locally conservative and high order, is a natural progression for Coastal Ocean Modeling. Consequently, as a primer we consider the vertical ocean-slice model with the inclusion of density effects. To solve these non-steady Partial Differential Equations, we develop a pressure projection method. We propose a Hybridized Discontinuous Galerkin solution for the Poisson problem required in each time step. The purpose is to reduce the computational cost of classical applications of the Discontinuous Galerkin method. The Hybridized Discontinuous Galerkin method is first presented as a general elliptic problem solver. It is shown that a high-order implementation yields fast and accurate approximations on coarse meshes.
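
As a minimal illustration of the conservation property invoked above, the following sketch advances a 1-D periodic advection equation with a first-order upwind Finite Volume scheme and checks that the total mass is preserved to round-off; it is a generic toy, unrelated to the ocean-slice model or the HDG solver of the paper.

import numpy as np

# 1-D linear advection u_t + a u_x = 0 on a periodic domain,
# first-order upwind Finite Volume update (a > 0).
n, a = 200, 1.0
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = 1.0 / n
dt = 0.5 * dx / a                      # CFL number 0.5
u = np.exp(-200.0 * (x - 0.3) ** 2)    # initial cell averages

mass0 = u.sum() * dx
for _ in range(400):
    flux = a * np.roll(u, 1)           # upwind interface flux F_{i-1/2} = a * u_{i-1}
    u = u - (dt / dx) * (np.roll(flux, -1) - flux)
print(abs(u.sum() * dx - mass0))       # total mass conserved up to round-off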

In this paper, we propose a semigroup method for solving high-dimensional elliptic partial differential equations (PDEs) and the associated eigenvalue problems based on neural networks. For the PDE problems, we reformulate the original equations as variational problems with the help of semigroup operators and then solve the variational problems with neural network (NN) parameterization. The main advantages are that no mixed second-order derivative computation is needed during the stochastic gradient descent training and that the boundary conditions are taken into account automatically by the semigroup operator. Unlike popular methods like PINN \cite{raissi2019physics} and Deep Ritz \cite{weinan2018deep} where the Dirichlet boundary condition is enforced solely through penalty functions and thus changes the true solution, the proposed method is able to address the boundary conditions without penalty functions and it gives the correct true solution even when penalty functions are added, thanks to the semigroup operator. For eigenvalue problems, a primal-dual method is proposed, efficiently resolving the constraint with a simple scalar dual variable and resulting in a faster algorithm compared with the BSDE solver \cite{han2020solving} in certain problems such as the eigenvalue problem associated with the linear Schr\"odinger operator. Numerical results are provided to demonstrate the performance of the proposed methods.
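
To illustrate the constraint-handling idea behind the primal-dual eigenvalue solver (a single scalar dual variable enforcing the normalization constraint), here is a deliberately small finite-dimensional analogue: gradient descent-ascent on an augmented Lagrangian for the smallest eigenvalue of a symmetric matrix. The test matrix, step sizes, and augmentation are illustrative assumptions and have nothing to do with the neural-network parameterization used in the paper.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2.0                     # small symmetric test matrix

# min_x x^T A x  s.t. ||x||^2 = 1, handled via a scalar multiplier lam:
#   L(x, lam) = x^T A x + lam * (||x||^2 - 1) + 0.5 * c * (||x||^2 - 1)^2
x = rng.standard_normal(6)
x /= np.linalg.norm(x)
lam, c, eta, rho = 0.0, 1.0, 0.05, 0.5

for _ in range(300):
    for _ in range(5):                  # a few primal descent steps on x
        r = x @ x - 1.0
        grad_x = 2.0 * (A @ x) + 2.0 * lam * x + 2.0 * c * r * x
        x = x - eta * grad_x
    lam = lam + rho * (x @ x - 1.0)     # scalar dual ascent step

print(x @ A @ x / (x @ x), np.linalg.eigvalsh(A)[0])  # should roughly agree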

Two harmonic extraction based Jacobi--Davidson (JD) type algorithms are proposed to compute a partial generalized singular value decomposition (GSVD) of a large regular matrix pair. They are called cross product-free (CPF) and inverse-free (IF) harmonic JDGSVD algorithms, abbreviated as CPF-HJDGSVD and IF-HJDGSVD, respectively. Compared with the standard extraction based JDGSVD algorithm, the harmonic extraction based algorithms converge more regularly and are better suited to computing GSVD components corresponding to interior generalized singular values. Thick-restart CPF-HJDGSVD and IF-HJDGSVD algorithms with some deflation and purgation techniques are developed to compute more than one GSVD component. Numerical experiments confirm the superiority of CPF-HJDGSVD and IF-HJDGSVD over the standard extraction based JDGSVD algorithm.
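
For reference, one common way of defining the objects involved: for a regular pair $\{A,B\}$ with $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{p\times n}$, a GSVD component is a quintuple $(\alpha,\beta,u,v,x)$ with $\alpha^2+\beta^2=1$, $\|u\|=\|v\|=1$, $Ax=\alpha u$ and $Bx=\beta v$; the associated generalized singular value is $\sigma=\alpha/\beta$, and when $\beta\neq 0$ it satisfies $A^{\mathsf T}A\,x=\sigma^{2}B^{\mathsf T}B\,x$. Harmonic extraction is the usual device for values lying in the interior of the spectrum, which motivates the variants above.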

This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two classes of solutions is bounded provided that the objective has a strongly-convex-strongly-concave payoff and Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.
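
As a minimal, distribution-free illustration of the deterministic primal-dual iteration (simultaneous gradient descent on the minimizing variable and ascent on the maximizing one), consider a fixed strongly-convex-strongly-concave quadratic payoff; the payoff and step size below are illustrative assumptions and do not model the decision-dependent distributions treated in the paper.

import numpy as np

# Strongly-convex-strongly-concave payoff f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# with its unique saddle point at (0, 0).
def grads(x, y):
    return x + y, x - y              # (df/dx, df/dy)

x, y, eta = 3.0, -2.0, 0.1
for _ in range(200):
    gx, gy = grads(x, y)
    x, y = x - eta * gx, y + eta * gy   # descent in x, ascent in y
print(x, y)                              # converges to the saddle point (0, 0)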

Spectral clustering has been one of the most widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is not necessarily of full rank. To do this, we first use recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public.
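
A compact sketch of the random-projection idea, using the standard randomized range-finder to approximate the leading eigenspace of the adjacency matrix of a two-block stochastic block model before running k-means; the block sizes, probabilities, and sketch dimension are illustrative choices, and the paper's algorithms and analysis differ in detail.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Two-block stochastic block model adjacency matrix.
n, k = 400, 2
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.10, 0.02)
A = rng.binomial(1, P)
A = np.triu(A, 1); A = A + A.T            # symmetric, zero diagonal

# Random-projection sketch of the leading k-dimensional eigenspace.
Omega = rng.standard_normal((n, k + 5))   # small oversampling
Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sketched range
S = Q.T @ A @ Q                           # small projected matrix
w, V = np.linalg.eigh(S)
U = Q @ V[:, np.argsort(-np.abs(w))[:k]]  # approximate leading eigenvectors of A

pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
acc = max(np.mean(pred == labels), np.mean(pred != labels))  # accuracy up to label swap
print(acc)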

Advanced finite-element discretizations and preconditioners for models of poroelasticity have attracted significant attention in recent years. The equations of poroelasticity pose significant challenges in both areas, due to the potentially strong coupling between unknowns in the system, the saddle-point structure, and the need to account for wide ranges of parameter values, including limiting behavior such as incompressible elasticity. This paper was motivated by an attempt to develop monolithic multigrid preconditioners for the discretization developed in [48]; we show here why this is a difficult task and, as a result, we modify the discretization in [48] through the use of a reduced quadrature approximation, yielding a more "solver-friendly" discretization. Local Fourier analysis (LFA) is used to optimize parameters in the resulting monolithic multigrid method, allowing a fair comparison between the performance and costs of methods based on Vanka and Braess-Sarazin relaxation. Numerical results are presented to validate the LFA predictions and demonstrate the efficiency of the algorithms. Finally, a comparison to existing block-factorization preconditioners is also given.

We outline the general framework of machine learning (ML) methods for multi-scale dynamical modeling of condensed matter systems, and in particular of strongly correlated electron models. Complex spatio-temporal behaviors in these systems often arise from the interplay between quasi-particles and the emergent dynamical classical degrees of freedom, such as local lattice distortions, spins, and order-parameters. Central to the proposed framework is the ML energy model that, by successfully emulating the time-consuming electronic structure calculation, can accurately predict a local energy based on the classical field in the intermediate neighborhood. In order to properly include the symmetry of the electron Hamiltonian, a crucial component of the ML energy model is the descriptor that transforms the neighborhood configuration into invariant feature variables, which are input to the learning model. A general theory of the descriptor for the classical fields is formulated, and two types of models are distinguished depending on the presence or absence of an internal symmetry for the classical field. Several specific approaches to the descriptor of the classical fields are presented. Our focus is on the group-theoretical method that offers a systematic and rigorous approach to compute invariants based on the bispectrum coefficients. We propose an efficient implementation of the bispectrum method based on the concept of reference irreducible representations. Finally, the implementations of the various descriptors are demonstrated on well-known electronic lattice models.

We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$, and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of $n^{-1/3}$ as the dimension $n$ tends to infinity among all members of the class under mild smoothness assumptions on $g$ and when the target distribution for the algorithm is of the product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, optimal choice of balancing function under a Gaussian noise distribution, and optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
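
For concreteness, here is a minimal one-dimensional sketch of the Barker proposal within this class, with Gaussian (rather than the bimodal) proposal noise and a standard normal target; the target, noise scale, and chain length are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)

log_pi = lambda x: -0.5 * x**2            # standard normal target (unnormalized)
grad_log_pi = lambda x: -x
log_sigmoid = lambda t: -np.logaddexp(0.0, -t)

def barker_step(x, scale=1.5):
    z = scale * rng.standard_normal()
    # Keep or flip the increment's sign with the locally-balanced (Barker) probability.
    b = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-z * grad_log_pi(x))) else -1.0
    y = x + b * z
    w = y - x
    # Metropolis-Hastings correction accounting for the skewed proposal density.
    log_alpha = (log_pi(y) - log_pi(x)
                 + log_sigmoid(-w * grad_log_pi(y))
                 - log_sigmoid(w * grad_log_pi(x)))
    return y if np.log(rng.random()) < log_alpha else x

x, samples = 0.0, []
for _ in range(20000):
    x = barker_step(x)
    samples.append(x)
print(np.mean(samples), np.var(samples))   # roughly 0 and 1 for the standard normal target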

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
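
To make the initialization point concrete, the following toy forward pass through a deep stack of linear layers compares a naive small-standard-deviation initialization with fan-in (Xavier/He-style) scaling; the depth, width, and the two standard deviations are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(3)
depth, width = 30, 256
x0 = rng.standard_normal(width)

def forward_norms(std_fn):
    h, norms = x0.copy(), []
    for _ in range(depth):
        W = std_fn(width) * rng.standard_normal((width, width))
        h = W @ h                          # linear layers only, to isolate the scaling effect
        norms.append(np.linalg.norm(h) / np.sqrt(width))
    return norms

naive = forward_norms(lambda fan_in: 0.01)                    # activations collapse toward 0
scaled = forward_norms(lambda fan_in: np.sqrt(1.0 / fan_in))  # roughly magnitude-preserving
print(naive[-1], scaled[-1])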

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m}f_i(x)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improved dependence on the condition numbers.
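
The following toy sketch applies Nesterov's accelerated gradient ascent to the dual of a scalar consensus problem over a path graph, where the affine constraint encodes agreement along edges; evaluating the dual gradient only requires each node to combine its own data with its neighbours' current values. The network, local objectives, and step-size choice are illustrative assumptions, not the paper's general setting.

import numpy as np

rng = np.random.default_rng(4)
m = 10                                    # number of nodes on a path graph
b = rng.standard_normal(m)                # local data; f_i(x) = 0.5*(x - b_i)^2

# Incidence matrix of the path: one affine constraint x_i - x_{i+1} = 0 per edge.
E = np.zeros((m - 1, m))
for e in range(m - 1):
    E[e, e], E[e, e + 1] = 1.0, -1.0

L = np.linalg.eigvalsh(E @ E.T).max()     # Lipschitz constant of the dual gradient

def primal(lam):                          # node-local minimizers x_i(lam), closed form here
    return b - E.T @ lam

lam = lam_prev = np.zeros(m - 1)
for k in range(1, 2000):
    y = lam + (k - 1) / (k + 2) * (lam - lam_prev)   # Nesterov extrapolation
    lam_prev = lam
    lam = y + (1.0 / L) * (E @ primal(y))            # dual ascent; E @ x uses only neighbours
print(primal(lam))                        # all entries approach the network average of b
print(b.mean())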
