国产日黄色大片一区二区_WWW91自拍偷拍熟女_国产高清二本道在线播放_国产欧美日韩一区二区久久_亚洲欧美一区二区三区不卡_久久五月婷婷一本综合中文字幕_精品视频一区二区三区免费看

This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{\mathrm{O}}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds.

相關內容

值迭代(dai)

關注 0

估計/估計量 · 統計量 · 優化器 · 正交 · 向量化 ·

2022 年 1 月 13 日

Statistically Optimal First Order Algorithms: A Proof via Orthogonalization

Andrea Montanari,Yuchen Wu

from arxiv, 35 pages; 27 pdf figures

We consider a class of statistical estimation problems in which we are given a random data matrix ${\boldsymbol X}\in {\mathbb R}^{n\times d}$ (and possibly some labels ${\boldsymbol y}\in{\mathbb R}^n$) and would like to estimate a coefficient vector ${\boldsymbol \theta}\in{\mathbb R}^d$ (or possibly a constant number of such vectors). Special cases include low-rank matrix estimation and regularized estimation in generalized linear models (e.g., sparse regression). First order methods proceed by iteratively multiplying current estimates by ${\boldsymbol X}$ or its transpose. Examples include gradient descent or its accelerated variants. Celentano, Montanari, Wu proved that for any constant number of iterations (matrix vector multiplications), the optimal first order algorithm is a specific approximate message passing algorithm (known as `Bayes AMP'). The error of this estimator can be characterized in the high-dimensional asymptotics $n,d\to\infty$, $n/d\to\delta$, and provides a lower bound to the estimation error of any first order algorithm. Here we present a simpler proof of the same result, and generalize it to broader classes of data distributions and of first order algorithms, including algorithms with non-separable nonlinearities. Most importantly, the new proof technique does not require to construct an equivalent tree-structured estimation problem, and is therefore susceptible of a broader range of applications.

Color · 圖 · 情景 · 離散數學 ·

2022 年 1 月 12 日

Bounds on expected propagation time of probabilistic zero forcing

Shyam Narayanan,Alec Sun

from arxiv, 16 pages

Probabilistic zero forcing is a coloring game played on a graph where the goal is to color every vertex blue starting with an initial blue vertex set. As long as the graph is connected, if at least one vertex is blue then eventually all of the vertices will be colored blue. The most studied parameter in probabilistic zero forcing is the expected propagation time starting from a given vertex of $G.$ In this paper we improve on upper bounds for the expected propagation time by Geneson and Hogben and Chan et al. in terms of a graph's order and radius. In particular, for a connected graph $G$ of order $n$ and radius $r,$ we prove the bound $\text{ept}(G) = O(r\log(n/r)).$ We also show using Doob's Optional Stopping Theorem and a combinatorial object known as a cornerstone that $\text{ept}(G) \le n/2 + O(\log n).$ Finally, we derive an explicit lower bound $\text{ept}(G)\ge \log_2 \log_2 n.$

馬爾可夫鏈蒙特卡羅 · 馬爾可夫鏈 · 蒙特卡羅 · 相互獨立的 · Neural Networks ·

2022 年 1 月 11 日

Analysis of autocorrelation times in Neural Markov Chain Monte Carlo simulations

Piotr Bia?as,Piotr Korcyl,Tomasz Stebel

from arxiv, 23 pages, 10 figures

We provide a deepened study of autocorrelations in Neural Markov Chain Monte Carlo simulations, a version of the traditional Metropolis algorithm which employs neural networks to provide independent proposals. We illustrate our ideas using the two-dimensional Ising model. We propose several estimates of autocorrelation times, some inspired by analytical results derived for the Metropolized Independent Sampler, which we compare and study as a function of inverse temperature $\beta$. Based on that we propose an alternative loss function and study its impact on the autocorelation times. Furthermore, we investigate the impact of imposing system symmetries ($Z_2$ and/or translational) in the neural network training process on the autocorrelation times. Eventually, we propose a scheme which incorporates partial heat-bath updates. The impact of the above enhancements is discussed for a $16 \times 16$ spin system. The summary of our findings may serve as a guide to the implementation of Neural Markov Chain Monte Carlo simulations of more complicated models.

近似誤差 · 線性的 · 閉式 · 近似 · 分解的 ·

2022 年 1 月 11 日

A Closed-Form Bound on the Asymptotic Linear Convergence of Iterative Methods via Fixed Point Analysis

Trung Vu,Raviv Raich

In many iterative optimization methods, fixed-point theory enables the analysis of the convergence rate via the contraction factor associated with the linear approximation of the fixed-point operator. While this factor characterizes the asymptotic linear rate of convergence, it does not explain the non-linear behavior of these algorithms in the non-asymptotic regime. In this letter, we take into account the effect of the first-order approximation error and present a closed-form bound on the convergence in terms of the number of iterations required for the distance between the iterate and the limit point to reach an arbitrarily small fraction of the initial distance. Our bound includes two terms: one corresponds to the number of iterations required for the linearized version of the fixed-point operator and the other corresponds to the overhead associated with the approximation error. With a focus on the convergence in the scalar case, the tightness of the proposed bound is proven for positively quadratic first-order difference equations.

估計/估計量 · 極大似然 · 最大似然估計 · 似然 · MoDELS ·

2022 年 1 月 11 日

Maximum likelihood estimation in the additive hazards model

Chengyuan Lu,Jelle Goeman,Hein Putter

The additive hazards model specifies the effect of covariates on the hazard in an additive way, in contrast to the popular Cox model, in which it is multiplicative. As non-parametric model, it offers a very flexible way of modeling time-varying covariate effects. It is most commonly estimated by ordinary least squares. In this paper we consider the case where covariates are bounded, and derive the maximum likelihood estimator under the constraint that the hazard is non-negative for all covariate values in their domain. We describe an efficient algorithm to find the maximum likelihood estimator. The method is contrasted with the ordinary least squares approach in a simulation study, and the method is illustrated on a realistic data set.

SimPLe · Storage · 近似 · 近鄰 · 相似度 ·

2022 年 1 月 11 日

Approximate Nearest Neighbor for Curves: Simple, Efficient, and Deterministic

Arnold Filtser,Omrit Filtser,Matthew J. Katz

In the $(1+\varepsilon,r)$-approximate near-neighbor problem for curves (ANNC) under some distance measure $\delta$, the goal is to construct a data structure for a given set $\mathcal{C}$ of curves that supports approximate near-neighbor queries: Given a query curve $Q$, if there exists a curve $C\in\mathcal{C}$ such that $\delta(Q,C)\le r$, then return a curve $C'\in\mathcal{C}$ with $\delta(Q,C')\le(1+\varepsilon)r$. There exists an efficient reduction from the $(1+\varepsilon)$-approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve $C\in\mathcal{C}$ with $\delta(Q,C)\le(1+\varepsilon)\cdot\delta(Q,C^*)$, where $C^*$ is the curve of $\mathcal{C}$ closest to $Q$. Given a set $\mathcal{C}$ of $n$ curves, each consisting of $m$ points in $d$ dimensions, we construct a data structure for ANNC that uses $n\cdot O(\frac{1}{\varepsilon})^{md}$ storage space and has $O(md)$ query time (for a query curve of length $m$), where the similarity between two curves is their discrete Fr\'echet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is $k \ll m$, and obtain essentially the same storage and query bounds as above, except that $m$ is replaced by $k$. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds.

秩 · MoDELS · 優化器 · 奇異值分解 · 列 ·

2018 年 10 月 18 日

Testing Matrix Rank, Optimally

Maria-Florina Balcan,Yi Li,David P. Woodruff,Hongyang Zhang

from arxiv, 51 pages. To appear in SODA 2019

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.

優化器 · 強化學習 · 學成 · state-of-the-art · SimPLe ·

2018 年 7 月 25 日

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

方差減小 · 估計/估計量 · 置信度 · 優化器 · 學成 ·

2018 年 4 月 25 日

Variance Reduction Methods for Sublinear Reinforcement Learning

Sham Kakade,Mengdi Wang,Lin F. Yang

from arxiv, Fixed a bug of a previous version

This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where the $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.

MoDELS · SimPLe · CC · 模型評估 · 高斯混合（模型） ·

2018 年 2 月 24 日

The Search Problem in Mixture Models

Avik Ray,Joe Neeman,Sujay Sanghavi,Sanjay Shakkottai

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.