亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Stochastic versions of proximal methods have gained much attention in statistics and machine learning. These algorithms tend to admit simple, scalable forms, and enjoy numerical stability via implicit updates. In this work, we propose and analyze a stochastic version of the recently proposed proximal distance algorithm, a class of iterative optimization methods that recover a desired constrained estimation problem as a penalty parameter $\rho \rightarrow \infty$. By uncovering connections to related stochastic proximal methods and interpreting the penalty parameter as the learning rate, we justify heuristics used in practical manifestations of the proximal distance method, establishing their convergence guarantees for the first time. Moreover, we extend recent theoretical devices to establish finite error bounds and a complete characterization of convergence rates regimes. We validate our analysis via a thorough empirical study, also showing that unsurprisingly, the proposed method outpaces batch versions on popular learning tasks.

相關內容

The present study is an extension of the work done in [16] and [10], where a two-level Parareal method with averaging was examined. The method proposed in this paper is a multi-level Parareal method with arbitrarily many levels, which is not restricted to the two-level case. We give an asymptotic error estimate which reduces to the two-level estimate for the case when only two levels are considered. Introducing more than two levels has important consequences for the averaging procedure, as we choose separate averaging windows for each of the different levels, which is an additional new feature of the present study. The different averaging windows make the proposed method especially appropriate for multi-scale problems, because we can introduce a level for each intrinsic scale of the problem and adapt the averaging procedure such that we reproduce the behavior of the model on the particular scale resolved by the level.

Nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose a primal dual alternating proximal gradient (PDAPG) algorithm and a primal dual proximal gradient (PDPG-L) algorithm for solving nonsmooth nonconvex-strongly concave and nonconvex-linear minimax problems with coupled linear constraints, respectively. The corresponding iteration complexity of the two algorithms are proved to be $\mathcal{O}\left( \varepsilon ^{-2} \right)$ and $\mathcal{O}\left( \varepsilon ^{-3} \right)$ to reach an $\varepsilon$-stationary point, respectively. To our knowledge, they are the first two algorithms with iteration complexity guarantee for solving the two classes of minimax problems.

A recently developed measure-theoretic framework solves a stochastic inverse problem (SIP) for models where uncertainties in model output data are predominantly due to aleatoric (i.e., irreducible) uncertainties in model inputs (i.e., parameters). The subsequent inferential target is a distribution on parameters. Another type of inverse problem is to quantify uncertainties in estimates of "true" parameter values under the assumption that such uncertainties should be reduced as more data are incorporated into the problem, i.e., the uncertainty is considered epistemic. A major contribution of this work is the formulation and solution of such a parameter identification problem (PIP) within the measure-theoretic framework developed for the SIP. The approach is novel in that it utilizes a solution to a stochastic forward problem (SFP) to update an initial density only in the parameter directions informed by the model output data. In other words, this method performs "selective regularization" only in the parameter directions not informed by data. The solution is defined by a maximal updated density (MUD) point where the updated density defines the measure-theoretic solution to the PIP. Another significant contribution of this work is the full theory of existence and uniqueness of MUD points for linear maps with Gaussian distributions. Data-constructed Quantity of Interest (QoI) maps are also presented and analyzed for solving the PIP within this measure-theoretic framework as a means of reducing uncertainties in the MUD estimate. We conclude with a demonstration of the general applicability of the method on two problems involving either spatial or temporal data for estimating uncertain model parameters.

Expander graphs play a central role in graph theory and algorithms. With a number of powerful algorithmic tools developed around them, such as the Cut-Matching game, expander pruning, expander decomposition, and algorithms for decremental All-Pairs Shortest Paths (APSP) in expanders, to name just a few, the use of expanders in the design of graph algorithms has become ubiquitous. Specific applications of interest to us are fast deterministic algorithms for cut problems in static graphs, and algorithms for dynamic distance-based graph problems, such as APSP. Unfortunately, the use of expanders in these settings incurs a number of drawbacks. For example, the best currently known algorithm for decremental APSP in constant-degree expanders can only achieve a $(\log n)^{O(1/\epsilon^2)}$-approximation with $n^{1+O(\epsilon)}$ total update time for any $\epsilon$. All currently known algorithms for the Cut Player in the Cut-Matching game are either randomized, or provide rather weak guarantees. This, in turn, leads to somewhat weak algorithmic guarantees for several central cut problems: for example, the best current almost linear time deterministic algorithm for Sparsest Cut can only achieve approximation factor $(\log n)^{\omega(1)}$. Lastly, when relying on expanders in distance-based problems, such as dynamic APSP, via current methods, it seems inevitable that one has to settle for approximation factors that are at least $\Omega(\log n)$. In this paper we propose the use of well-connected graphs, and introduce a new algorithmic toolkit for such graphs that, in a sense, mirrors the above mentioned algorithmic tools for expanders. One of these new tools is the Distanced Matching game, an analogue of the Cut-Matching game for well-connected graphs. We demonstrate the power of these new tools by obtaining better results for several of the problems mentioned above.

Recent mean field interpretations of learning dynamics in over-parameterized neural networks offer theoretical insights on the empirical success of first order optimization algorithms in finding global minima of the nonconvex risk landscape. In this paper, we explore applying mean field learning dynamics as a computational algorithm, rather than as an analytical tool. Specifically, we design a Sinkhorn regularized proximal algorithm to approximate the distributional flow from the learning dynamics in the mean field regime over weighted point clouds. In this setting, a contractive fixed point recursion computes the time-varying weights, numerically realizing the interacting Wasserstein gradient flow of the parameter distribution supported over the neuronal ensemble. An appealing aspect of the proposed algorithm is that the measure-valued recursions allow meshless computation. We demonstrate the proposed computational framework of interacting weighted particle evolution on binary and multi-class classification. Our algorithm performs gradient descent of the free energy associated with the risk functional.

Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule \citep{oja1982simplified}), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks \citep{baldi1989neural}. In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset \citep{lecun2010mnist} and the reinforcement learning domain PuddleWorld \citep{sutton1995generalization} demonstrating the usefulness of our approach.

The A* algorithm is commonly used to solve NP-hard combinatorial optimization problems. When provided with a completely informed heuristic function, A* solves many NP-hard minimum-cost path problems in time polynomial in the branching factor and the number of edges in a minimum-cost path. Thus, approximating their completely informed heuristic functions with high precision is NP-hard. We therefore examine recent publications that propose the use of neural networks for this purpose. We support our claim that these approaches do not scale to large instance sizes both theoretically and experimentally. Our first experimental results for three representative NP-hard minimum-cost path problems suggest that using neural networks to approximate completely informed heuristic functions with high precision might result in network sizes that scale exponentially in the instance sizes. The research community might thus benefit from investigating other ways of integrating heuristic search with machine learning.

The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the $\ell_0$-`norm' of an \emph{outlier indicator vector}, under a second moment constraint on the datapoints. The $\ell_0$-`norm' is then relaxed to the $\ell_p$-norm ($0<p\leq 1$) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative $\ell_p$-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point $\approx 0.3$) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for $p=1$ it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.

In this paper, we propose a new approach for the time-discretization of the incompressible stochastic Stokes equations with multiplicative noise. Our new strategy is based on the classical Milstein method from stochastic differential equations. We use the energy method for its error analysis and show a strong convergence order of at most $1$ for both velocity and pressure approximations. The proof is based on a new H\"older continuity estimate of the velocity solution. While the errors of the velocity approximation are estimated in the standard $L^2$- and $H^1$-norms, the pressure errors are carefully analyzed in a special norm because of the low regularity of the pressure solution. In addition, a new interpretation of the pressure solution, which is very useful in computation, is also introduced. Numerical experiments are also provided to validate the error estimates and their sharpness.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

北京阿比特科技有限公司