The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise: $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, $$ in which each $\theta_n\in\Re^d$, $\{ \Phi_n \}$ is a Markov chain on a general state space $\text{X}$ with stationary distribution $\pi$, and $f:\Re^d\times \text{X} \to\Re^d$. In addition to standard Lipschitz bounds on $f$, and conditions on the vanishing step-size sequence $\{\alpha_n\}$, it is assumed that the associated ODE, $\frac{d}{dt}\vartheta_t = \bar f(\vartheta_t)$, is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar f(\theta)=E[f(\theta,\Phi)]$ with $\Phi\sim\pi$. Moreover, the ODE@$\infty$ defined with respect to the vector field, $$ \bar f_\infty(\theta):= \lim_{r\to\infty} r^{-1} \bar f(r\theta) \,,\qquad \theta\in\Re^d, $$ is asymptotically stable. The main contributions are summarized as follows:

(i) The sequence $\{\theta_n\}$ is convergent if $\Phi$ is geometrically ergodic, subject to compatible bounds on $f$.

The remaining results are established under a stronger assumption on the Markov chain: a slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3).

(ii) A Lyapunov function is constructed for the joint process $\{\theta_n,\Phi_n\}$ that implies convergence of $\{ \theta_n\}$ in $L_4$.

(iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error $z_n:= (\theta_n-\theta^*)/\sqrt{\alpha_n}$. Moment bounds combined with the CLT imply convergence of the normalized covariance, $$ \lim_{n \to \infty} E [ z_n z_n^T ] = \Sigma_\theta, $$ where $\Sigma_\theta$ is the asymptotic covariance appearing in the CLT.

(iv) An example is provided in which the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded.
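To make the recursion concrete, here is a minimal Python sketch of stochastic approximation driven by a two-state Markov chain. It is not taken from the paper. The chain, the table `g`, the choice $f(\theta,\phi)=g(\phi)-\theta$, and the step size $\alpha_n=1/n$ are all illustrative assumptions. Under them $\bar f(\theta)=\theta^*-\theta$ with $\theta^*=E_\pi[g(\Phi)]=0.8$, and $\bar f_\infty(\theta)=-\theta$, so both ODEs are stable.

```python
import random

# Hypothetical two-state chain on {0, 1}: P(0->1) = 0.3, P(1->0) = 0.2,
# so the stationary distribution is pi = (0.4, 0.6).
P_LEAVE = {0: 0.3, 1: 0.2}
# Hypothetical observation function g; f(theta, phi) = g(phi) - theta.
G = {0: -1.0, 1: 2.0}
# Root of the mean vector field: theta* = E_pi[g(Phi)].
THETA_STAR = 0.4 * G[0] + 0.6 * G[1]


def run_sa(n_steps, theta0=0.0, seed=0):
    """Iterate theta_{n+1} = theta_n + alpha_{n+1} f(theta_n, Phi_{n+1})."""
    rng = random.Random(seed)
    phi = 0
    theta = theta0
    for n in range(n_steps):
        # Advance the Markov chain to Phi_{n+1}.
        if rng.random() < P_LEAVE[phi]:
            phi = 1 - phi
        alpha = 1.0 / (n + 1)  # vanishing step size alpha_{n+1}
        theta += alpha * (G[phi] - theta)
    return theta


if __name__ == "__main__":
    print("theta* =", THETA_STAR)
    print("theta_n after 200000 steps =", run_sa(200_000))
```

With this step size the iterate is simply the running average of $g(\Phi_k)$, so convergence here follows from the ergodic theorem; the paper's setting is far more general.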

Related content

Markov chain

A Markov chain, named after Andrey Markov (A. A. Markov, 1856-1922), is a discrete-event stochastic process with the Markov property. In such a process, given the current knowledge or information, the past (the history of states before the present) is irrelevant for predicting the future (the states after the present). At each step of a Markov chain, the system may move from one state to another, or remain in its current state, according to a probability distribution. A change of state is called a transition, and the probabilities associated with the different state changes are called transition probabilities. A random walk is an example of a Markov chain: the state at each step is a vertex of a graph, and each step may move to any adjacent vertex, with every adjacent vertex equally likely (regardless of the path taken so far).
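The random walk described above can be sketched in a few lines of Python. The graph `GRAPH` and the helper `random_walk` are hypothetical names chosen for illustration; the only property used is that the next vertex is drawn uniformly from the neighbours of the current one, independently of the earlier path.

```python
import random

# Hypothetical undirected graph given as an adjacency list.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def random_walk(graph, start, n_steps, seed=0):
    """Return the visited vertices of a simple random walk on `graph`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Markov property: the next state depends only on `current`.
        path.append(rng.choice(graph[current]))
    return path


if __name__ == "__main__":
    print(random_walk(GRAPH, "a", 10))
```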

Estimation/estimator · Maximum · Functional · AIM · Normalized

December 27, 2021

Non-parametric estimator of a multivariate madogram for missing-data and extreme value framework

Alexis Boulin, Elena Di Bernardino, Thomas Laloë, Gwladys Toulemonde

from arxiv, 29 pages, 4 figures

The modeling of dependence between maxima is an important subject in several applications in risk analysis. To this aim, the extreme value copula function, characterised via the madogram, can be used as a margin-free description of the dependence structure. From a practical point of view, the family of extreme value distributions is very rich and arises naturally as the limiting distribution of properly normalised component-wise maxima. In this paper, we investigate the nonparametric estimation of the madogram when data are missing completely at random. We provide the functional central limit theorem for the considered multivariate madogram, correctly normalized, towards a tight Gaussian process whose covariance function depends on the probabilities of missingness. An explicit formula for the asymptotic variance is also given. Our results are illustrated in a finite sample setting with a simulation study.

Stochastic gradient descent · SGD · Noise · Mini-batch · Optimizer

December 26, 2021

Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent

Riddhiman Bhattacharya

The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g. escaping low potential points and regularization). Past research has indicated that the covariance of the SGD error under minibatching plays a critical role in determining its regularization and its escape from low potential points. However, how strongly the distribution of the error influences the behavior of the algorithm has received little attention. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure as SGD via minibatching have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced by Wu et al., which has a much more general noise class than the SGD algorithm done via minibatching. We establish nonasymptotic bounds for the M-SGD algorithm, mainly with respect to the Stochastic Differential Equation corresponding to SGD via minibatching. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm. We also establish bounds for the convergence of the M-SGD algorithm in the strongly convex regime.

Policy evaluation · Linear · Approximation · Functional · Mixing time

December 24, 2021

Accelerated and instance-optimal policy evaluation with linear function approximation

Tianjiao Li, Guanghui Lan, Ashwin Pananjady

We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results that match those in the i.i.d. setting up to a multiplicative factor that is proportional to the mixing time of the chain. Our theoretical guarantees of optimality are corroborated by numerical experiments.

Layer · SLAM · Functional · Comprehensibility · Inner product

December 24, 2021

The layer complexity of Arthur-Merlin-like communication

D. Gavinsky

In communication complexity the Arthur-Merlin (AM) model is the most natural one that allows both randomness and non-determinism. Presently we do not have any super-logarithmic lower bound for the AM-complexity of an explicit function. Obtaining such a bound is a fundamental challenge to our understanding of communication phenomena. In this article we explore the gap between the known techniques and the complexity class AM. In the first part we define a new natural class, Small-advantage Layered Arthur-Merlin (SLAM), that has the following properties:

- SLAM is (strictly) included in AM and includes all previously known subclasses of AM with non-trivial lower bounds.
- SLAM is qualitatively stronger than the union of those classes.
- SLAM is subject to the discrepancy bound: in particular, the inner product function does not have an efficient SLAM-protocol.

Structurally this can be summarised as SBP $\cup$ UAM $\subset$ SLAM $\subseteq$ AM $\cap$ PP. In the second part we ask why proving a lower bound of $\omega(\sqrt n)$ on the MA-complexity of an explicit function seems to be difficult. Both of these results are related to the notion of layer complexity, which is, informally, the number of "layers of non-determinism" used by a protocol.

Statistic · Divergence · Approximation · Sample · Likelihood

December 24, 2021

Bounds for the chi-square approximation of the power divergence family of statistics

Robert E. Gaunt

from arxiv, 26 pages

It is well-known that each statistic in the family of power divergence statistics, across $n$ trials and $r$ classifications with index parameter $\lambda\in\mathbb{R}$ (the Pearson, likelihood ratio and Freeman-Tukey statistics correspond to $\lambda=1,0,-1/2$, respectively) is asymptotically chi-square distributed as the sample size tends to infinity. In this paper, we obtain explicit bounds on this distributional approximation, measured using smooth test functions, that hold for a given finite sample $n$, and all index parameters ($\lambda>-1$) for which such finite sample bounds are meaningful. We obtain bounds that are of the optimal order $n^{-1}$. The dependence of our bounds on the index parameter $\lambda$ and the cell classification probabilities is also optimal, and the dependence on the number of cells is also respectable. Our bounds generalise, complement and improve on recent results from the literature.
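For readers unfamiliar with the family, the sketch below computes the power divergence statistic in its standard Cressie-Read form, $T_\lambda = \frac{2}{\lambda(\lambda+1)}\sum_{j=1}^{r} O_j\big[(O_j/E_j)^\lambda - 1\big]$, with observed counts $O_j$ and expected counts $E_j = n p_j$. This form is assumed here since the abstract does not display it, and the helper name `power_divergence_stat` and the sample counts are hypothetical. The case $\lambda=0$ is handled through its limit $2\sum_j O_j\log(O_j/E_j)$.

```python
import math


def power_divergence_stat(observed, probs, lam):
    """Cressie-Read power divergence statistic for index lam > -1."""
    if lam <= -1:
        raise ValueError("this sketch only covers lam > -1")
    n = sum(observed)
    expected = [n * p for p in probs]
    if lam == 0:
        # Limit lam -> 0: the likelihood ratio statistic.
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    # O * ((O/E)**lam - 1) written as O**(1+lam) * E**(-lam) - O,
    # which stays finite when an observed count is zero.
    total = sum(
        o ** (1 + lam) * e ** (-lam) - o for o, e in zip(observed, expected)
    )
    return 2.0 * total / (lam * (lam + 1))


if __name__ == "__main__":
    counts = [18, 22, 30, 30]  # illustrative data, n = 100
    cell_probs = [0.25] * 4
    for lam in (1.0, 0.0, -0.5):
        print(lam, power_divergence_stat(counts, cell_probs, lam))
```

At $\lambda=1$ the value coincides with Pearson's $\sum_j (O_j-E_j)^2/E_j$, and at $\lambda=-1/2$ with the Freeman-Tukey form $4\sum_j(\sqrt{O_j}-\sqrt{E_j})^2$.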

Learning · Lipschitz · Noise · UniFormer · Global optimization

December 24, 2021

Stochastic Learning Equation using Monotone Increasing Resolution of Quantization

Jinwuk Seok, Jeong-Si Kim

from arxiv, 11 pages, 2 figures, NIPS 2021 workshop OTP 2021

In this paper, we propose a quantized learning equation with a monotonically increasing resolution of quantization, together with a stochastic analysis of the proposed algorithm. According to the white noise hypothesis for a quantization error with dense and uniform distribution, we can regard the quantization error as i.i.d. white noise. Based on this, we show that the learning equation with monotonically increasing quantization resolution converges weakly, that is, in distribution. The analysis in this paper shows that global optimization is possible for a domain that satisfies the Lipschitz condition, without relying on local convergence properties such as a Hessian constraint on the objective function.

Approximation · Equality constraint · Simplex · Functional · Constraint

December 23, 2021

Bounds-constrained polynomial approximation using the Bernstein basis

Larry Allen, Robert C. Kirby

from arxiv, 20 pages, 3 figures

A fundamental problem in numerical analysis and approximation theory is approximating smooth functions by polynomials. A much harder version under recent consideration is to enforce bounds constraints on the approximating polynomial. In this paper, we consider the problem of approximating functions by polynomials whose Bernstein coefficients with respect to a given degree satisfy such bounds, which implies such bounds on the approximant. We frame the problem as an inequality-constrained optimization problem and give an algorithm for finding the Bernstein coefficients of the exact solution. Additionally, our method can be modified slightly to include equality constraints such as mass preservation. It also extends naturally to multivariate polynomials over a simplex.
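The claim that bounds on the Bernstein coefficients imply bounds on the approximant follows because the Bernstein basis functions $B_{k,n}(x)=\binom{n}{k}x^k(1-x)^{n-k}$ are nonnegative on $[0,1]$ and sum to one, so the polynomial is a convex combination of its coefficients. The short Python sketch below checks this numerically for one hypothetical coefficient vector; it illustrates the property only and is not the authors' optimization algorithm.

```python
from math import comb


def bernstein_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * B_{k,n}(x) on [0, 1], n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * x ** k * (1 - x) ** (n - k)
        for k, c in enumerate(coeffs)
    )


if __name__ == "__main__":
    coeffs = [0.0, 1.0, 0.2, 0.9]  # illustrative coefficients in [0, 1]
    values = [bernstein_eval(coeffs, i / 100) for i in range(101)]
    # Sampled values stay between the smallest and largest coefficient.
    print(min(coeffs), "<=", min(values), "and", max(values), "<=", max(coeffs))
```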

Functional · Constraint · Reinforcement learning · Q-function · Learning

June 24, 2021

Density Constrained Reinforcement Learning

Zengyi Qin, Yuxiao Chen, Chuchu Fan

from arxiv, Accepted by ICML, 2021

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous works. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning cost functions required by value function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density constrained RL problem optimally, with the constraints guaranteed to be satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, with a wide range of density constrained tasks as well as standard CRL benchmarks such as Safety-Gym.

Stochastic gradient descent · ReLU · Optimizer · Networking · Rectified linear unit

November 21, 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

from arxiv, 47 pages

We study the problem of training deep neural networks with the Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centered around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on the optimization of deep learning, and pave the way to studying the optimization dynamics of training modern deep neural networks.

Optimizer · Reinforcement learning · Learning · state-of-the-art · SimPLe

July 25, 2018

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.