
Stochastic Approximation (SA) is a popular approach for solving fixed-point equations where the information is corrupted by noise. In this paper, we consider an SA algorithm involving a contraction mapping with respect to an arbitrary norm, and establish finite-sample error bounds under different stepsize choices. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and to show that the iterates of SA have negative drift with respect to that Lyapunov function. Our result is applicable in Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning. Moreover, we use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state space.
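To make the setting concrete, here is a minimal sketch of the SA update for a fixed-point equation $x = F(x)$ with a sup-norm contraction $F$ (a toy Bellman-style operator). The specific operator, stepsize schedule, and noise model below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.9                                  # contraction factor
A = rng.random((5, 5))
P = A / A.sum(axis=1, keepdims=True)         # row-stochastic matrix
r = rng.random(5)

def F(x):
    # F is a gamma-contraction in the sup-norm:
    # ||F(x) - F(y)||_inf <= gamma * ||x - y||_inf
    return r + gamma * P @ x

x = np.zeros(5)
for k in range(1, 20001):
    alpha = 1.0 / (k + 100)                  # one common diminishing stepsize
    noise = 0.1 * rng.standard_normal(5)     # additive zero-mean noise
    x = x + alpha * (F(x) + noise - x)       # SA update toward the fixed point

x_star = np.linalg.solve(np.eye(5) - gamma * P, r)  # exact fixed point
print(np.max(np.abs(x - x_star)))            # sup-norm error of the iterate
```

With a diminishing stepsize, the iterate drifts toward the unique fixed point despite the noise; the paper's bounds quantify this drift in finite samples.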

Related Content

The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to ordinary least squares for solving linear regressions. It is defined as the minimizer of the $\ell_{\infty}$ objective function \begin{align*} \hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}. \end{align*} The asymptotic distribution of the Chebyshev estimator with a fixed number of covariates was recently studied (Knight, 2020), yet finite-sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ for a Chebyshev estimator $\hat{\boldsymbol{\beta}}$, in a regression setting with uniformly distributed noise $\varepsilon_i\sim U([-a,a])$ where $a$ is either known or unknown. With relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we can bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. In addition, we show that "Chebyshev's LASSO" has advantages over the regular LASSO in high-dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
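The Chebyshev estimator itself is easy to compute, since the $\ell_{\infty}$ objective is equivalent to a linear program. A minimal sketch using SciPy's LP solver, with an assumed design and uniform noise level:

```python
import numpy as np
from scipy.optimize import linprog

# l_inf regression as an LP:  min_{beta, t} t  s.t.  -t <= y_i - x_i^T beta <= t.
rng = np.random.default_rng(0)
n, p, a = 200, 5, 0.5
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p)
y = X @ beta_star + rng.uniform(-a, a, size=n)   # uniform noise U([-a, a])

# Decision variables z = (beta_1, ..., beta_p, t); minimize t.
c = np.r_[np.zeros(p), 1.0]
A_ub = np.block([[X, -np.ones((n, 1))],          # x_i^T beta - y_i <= t
                 [-X, -np.ones((n, 1))]])        # y_i - x_i^T beta <= t
b_ub = np.r_[y, -y]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)])

beta_hat = res.x[:p]
print(np.linalg.norm(beta_hat - beta_star))      # l2 estimation error
```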

In magnetic confinement fusion devices, the equilibrium configuration of a plasma is determined by the balance between the hydrostatic pressure in the fluid and the magnetic forces generated by an array of external coils and the plasma itself. The location of the plasma is not known a priori and must be obtained as the solution to a free boundary problem. The partial differential equation that determines the behavior of the combined magnetic field depends on a set of physical parameters (location of the coils, intensity of the electric currents going through them, magnetic permeability, etc.) that are subject to uncertainty and variability. The confinement region is in turn a function of these stochastic parameters as well. In this work, we consider variations on the current intensities running through the external coils as the dominant source of uncertainty. This leads to a parameter space of dimension equal to the number of coils in the reactor. With the aid of a surrogate function built on a sparse grid in parameter space, a Monte Carlo strategy is used to explore the effect that stochasticity in the parameters has on important features of the plasma boundary such as the location of the x-point, the strike points, and shaping attributes such as triangularity and elongation. The use of the surrogate function reduces the time required for the Monte Carlo simulations by factors ranging from 7 to over 30.
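Schematically, the surrogate-based strategy looks as follows. The toy model below stands in for the expensive free-boundary solve, and a regular grid replaces the sparse grid for brevity; all names and constants are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(0)

def expensive_model(I1, I2):
    # Stand-in for a quantity of interest (e.g. an x-point coordinate)
    # as a smooth function of two coil currents.
    return np.sin(I1) * np.cos(I2) + 0.1 * I1 * I2

# Build the surrogate once, on a coarse grid of coil currents.
g = np.linspace(-1.0, 1.0, 9)
G1, G2 = np.meshgrid(g, g, indexing="ij")
surrogate = RegularGridInterpolator((g, g), expensive_model(G1, G2))

# Monte Carlo over uncertain currents now costs only surrogate evaluations.
samples = rng.uniform(-1.0, 1.0, size=(100_000, 2))
qoi = surrogate(samples)
print(qoi.mean(), qoi.std())   # statistics of the quantity of interest
```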

We consider the problem of minimizing a convex function that is evolving in time according to unknown and possibly stochastic dynamics. Such problems abound in the machine learning and signal processing literature, under the names of concept drift and stochastic tracking. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. Notably, we show that the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the initialization quality, when equipped with a step-decay schedule. The results moreover naturally extend to settings where the dynamics depend jointly on time and on the decision variable itself, as in the performative prediction framework.
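A minimal sketch of the scheme analyzed here: (proximal) stochastic gradient steps with a step-decay schedule and running iterate averaging on a drifting quadratic. The drift model and all constants are illustrative assumptions; the proximal map is the identity in this unconstrained toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
target = np.zeros(d)            # minimizer of f_t, drifting over time
x = 5.0 * np.ones(d)            # deliberately poor initialization
x_avg, alpha = np.zeros(d), 0.5

for t in range(1, 5001):
    target += 0.001 * rng.standard_normal(d)             # unknown drift
    grad = (x - target) + 0.1 * rng.standard_normal(d)   # stochastic gradient
    if t % 500 == 0:
        alpha *= 0.5                                     # step-decay schedule
    x = x - alpha * grad                                 # prox step (identity here)
    x_avg += (x - x_avg) / t                             # running iterate average

print(np.linalg.norm(x - target))   # tracking error of the last iterate
```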

In this paper we consider the linear regression model $Y = SX + \varepsilon$ with functional regressors and responses. We develop new inference tools to quantify deviations of the true slope $S$ from a hypothesized operator $S_0$ with respect to the Hilbert--Schmidt norm $\| S- S_0\|^2$, as well as the prediction error $\mathbb{E} \| S X - S_0 X \|^2$. Our analysis is applicable to functional time series and based on asymptotically pivotal statistics. This makes it particularly user-friendly, because it avoids the choice of tuning parameters inherent in long-run variance estimation or in the bootstrap for dependent data. We also discuss two-sample problems as well as change point detection. Finite-sample properties are investigated by means of a simulation study. Mathematically, our approach is based on a sequential version of the popular spectral cut-off estimator $\hat S_N$ for $S$. It is well known that the $L^2$-minimax rates in the functional regression model, both in estimation and prediction, are substantially slower than $1/\sqrt{N}$ (where $N$ denotes the sample size), and that standard estimators for $S$ do not converge weakly to non-degenerate limits. However, we demonstrate that simple plug-in estimators, such as $\| \hat S_N - S_0 \|^2$ for $\| S - S_0 \|^2$, are $\sqrt{N}$-consistent, and that their sequential versions satisfy weak invariance principles. These results rest on the smoothing effect of $L^2$-norms and are established by a new proof technique, the {\it smoothness shift}, which has potential applications in other statistical inverse problems.
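For intuition, here is a discretized sketch of the spectral cut-off estimator and the plug-in Hilbert--Schmidt statistic; the grid size, sample size, cut-off level, and toy data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 50, 500, 5                  # grid points, sample size, cut-off level
S_true = rng.standard_normal((m, m)) / m

# Smooth random functional regressors on the grid (toy Brownian-like paths).
X = np.cumsum(rng.standard_normal((N, m)), axis=1) / np.sqrt(m)
Y = X @ S_true.T + 0.1 * rng.standard_normal((N, m))   # y_i = S x_i + eps_i

C_xx = X.T @ X / N                    # empirical covariance of X
C_yx = Y.T @ X / N                    # empirical cross-covariance
w, V = np.linalg.eigh(C_xx)
idx = np.argsort(w)[::-1][:k]         # keep the top-k eigenpairs only
C_xx_pinv = V[:, idx] @ np.diag(1.0 / w[idx]) @ V[:, idx].T

S_hat = C_yx @ C_xx_pinv              # spectral cut-off estimator
print(np.linalg.norm(S_hat - S_true)) # plug-in Hilbert--Schmidt distance
```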

We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.
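For reference, Optimistic Hedge differs from standard Hedge only in that the most recent loss vector is counted twice when forming the weights, an optimistic prediction that it will repeat. A one-player sketch, with an assumed stepsize and toy loss stream:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eta = 4, 1000, 0.1
cum = np.zeros(n)              # cumulative loss per action
prev = np.zeros(n)             # last observed loss vector

for t in range(T):
    # Optimistic weights: cumulative losses plus a repeat of the last loss.
    w = np.exp(-eta * (cum + prev))
    p = w / w.sum()            # mixed strategy played at round t
    loss = rng.uniform(0, 1, size=n)   # loss vector from the environment
    cum += loss
    prev = loss

print(p)                       # final mixed strategy
```

In the game setting, every player runs this same update against the losses induced by the others' strategies; the recency bias is what enables the ${\rm poly}(\log T)$ regret.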

We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output. Our key technical tool combines the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.
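The quantities such bounds depend on are directly measurable along the optimization path. A toy sketch that records the empirical gradient variance while running SGD; the non-convex loss and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(256)

def sample_grad(w, batch):
    # d/dw of the per-example loss (w - x)^2 + 0.1*sin(5w)  (non-convex)
    return 2 * (w - batch) + 0.5 * np.cos(5 * w)

w, variances = 2.0, []
for t in range(200):
    g_all = sample_grad(w, data)               # per-example gradients at w_t
    variances.append(g_all.var())              # local gradient variance at w_t
    batch = rng.choice(data, size=8)
    w -= 0.05 * sample_grad(w, batch).mean()   # SGD step on a mini-batch

print(np.mean(variances))   # average gradient variance along the SGD path
```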

In this paper, we study the weight spectrum of linear codes with \emph{super-linear} field size and use the probabilistic method to show that for nearly all such codes, the corresponding weight spectrum is very close to that of a maximum distance separable (MDS) code.

Distributed learning often suffers from Byzantine failures, and a number of works have studied distributed stochastic optimization under such failures, where only a portion of the workers in a distributed learning system, rather than all of them, compute stochastic gradients at each iteration. These methods, although workable under Byzantine failures, suffer from either a sub-optimal convergence rate or a high computation cost. To address this, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) which is provably robust against Byzantine failures. BrSGD achieves optimal statistical performance and efficient computation simultaneously. In particular, BrSGD attains an order-optimal statistical error rate for strongly convex loss functions. The computation complexity of BrSGD is $O(md)$, where $d$ is the model dimension and $m$ is the number of machines. Experimental results show that BrSGD obtains results competitive with those of non-Byzantine machines in terms of effectiveness and convergence.
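The paper's specific BrSGD aggregation rule is not reproduced here; as a stand-in, the sketch below uses the classical coordinate-wise median, a common robust aggregator, to illustrate the general shape of Byzantine-resilient gradient aggregation. All constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 8                                     # machines, model dimension
good = rng.standard_normal((m - 3, d)) + 1.0     # honest gradients, mean ~1
bad = 100.0 * np.ones((3, d))                    # 3 Byzantine machines
grads = np.vstack([good, bad])

agg = np.median(grads, axis=0)   # coordinate-wise median: robust to a minority
theta = np.zeros(d)
theta -= 0.1 * agg               # one robust SGD step
print(agg)                       # close to the honest mean despite outliers
```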

This paper establishes the optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
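The flavor of such constructions can be seen in the classical Yarotsky-style building block: $x^2$ on $[0,1]$ is approximated with error $2^{-2L-2}$ by composing a "hat" function (itself three ReLUs) $L$ times. The depth $L$ below is an illustrative choice.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Piecewise-linear hat: h(x) = 2x on [0, 1/2], 2 - 2x on [1/2, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, L=6):
    # f_L(x) = x - sum_{s=1}^{L} h^{(s)}(x) / 4^s  approximates x^2.
    out, hs = x.copy(), x.copy()
    for s in range(1, L + 1):
        hs = hat(hs)                  # s-fold composition of the hat
        out -= hs / 4.0 ** s
    return out

x = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(relu_square(x) - x ** 2)))   # about 2**(-14)
```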

Interpretation of Deep Neural Network (DNN) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that accepts batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters relative to existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
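For orientation, here is a scalar sketch of the DDP backward pass that produces the time-wise (layer-wise) feedback policies $u_t = k_t + K_t x_t$. Linear dynamics and quadratic costs are assumed for brevity, so the recursion reduces to an LQR-style one; this is not the paper's algorithm.

```python
import numpy as np

# Dynamics x_{t+1} = a*x + b*u; stage cost 0.5*(q*x^2 + r*u^2);
# terminal cost 0.5*q*x^2. Expansion is around the zero nominal trajectory.
T, a, b, q, r = 20, 1.0, 0.5, 1.0, 0.1
Vx, Vxx = 0.0, q                      # value-function expansion at the final step
k, K = np.zeros(T), np.zeros(T)

for t in reversed(range(T)):
    Qx  = a * Vx                      # l_x = 0 at the expansion point x = 0
    Qu  = b * Vx                      # l_u = 0 at u = 0
    Qxx = q + a * Vxx * a
    Quu = r + b * Vxx * b
    Qux = b * Vxx * a
    k[t] = -Qu / Quu                  # feedforward term
    K[t] = -Qux / Quu                 # feedback gain for this step (layer)
    Vx  = Qx - Qux * Qu / Quu         # propagate the value expansion backward
    Vxx = Qxx - Qux * Qux / Quu

print(K)                              # gains converge to the stationary value
```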
