
We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for policy evaluation in general state spaces. It alternates between fitting the Bellman residual using non-parametric regression (as in boosting), and estimating the value function via the least-squares temporal difference (LSTD) procedure applied with a feature set that grows adaptively over time. By exploiting the connection to Krylov methods, we equip this method with two attractive guarantees. First, we provide a general convergence bound that allows for separate estimation errors in residual fitting and LSTD computation. Consistent with our numerical experiments, this bound shows that convergence rates depend on the restricted spectral structure, and are typically super-linear. Second, by combining this meta-result with sample-size dependent guarantees for residual fitting and LSTD computation, we obtain concrete statistical guarantees that depend on the sample size along with the complexity of the function class used to fit the residuals. We illustrate the behavior of the KBB algorithm for various types of policy evaluation problems, and typically find large reductions in sample complexity relative to the standard approach of fitted value iteration.
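The alternation described in the abstract can be illustrated with a small population-level sketch. This is not the authors' implementation: it assumes a finite state space, uses the exact Bellman residual in place of a non-parametric regression fit, and assumes a symmetric transition matrix so that the uniform distribution is stationary and the LSTD step reduces to an unweighted Galerkin solve. The names `kbb_population` and `cycle_walk` are hypothetical.

```python
import numpy as np


def kbb_population(P, r, gamma, num_iters, rel_tol=1e-10):
    """Population-level KBB-style iteration (illustrative sketch only).

    P: (n, n) transition matrix, ASSUMED symmetric so that the uniform
       distribution is stationary and unweighted projections suffice.
    r: (n,) reward vector.
    gamma: discount factor in [0, 1).
    """
    n = len(r)
    A = np.eye(n) - gamma * P            # the value function solves A V = r
    V = np.zeros(n)
    features = []                        # feature set that grows each round
    for _ in range(num_iters):
        residual = r - A @ V             # Bellman residual r + gamma*P V - V
        norm = np.linalg.norm(residual)
        if norm <= rel_tol * np.linalg.norm(r):
            break                        # converged: no new direction to add
        features.append(residual / norm)  # exact residual stands in for a fit
        Phi = np.column_stack(features)
        # LSTD (Galerkin) step on the current features:
        #   Phi^T (I - gamma P) Phi w = Phi^T r
        w = np.linalg.solve(Phi.T @ A @ Phi, Phi.T @ r)
        V = Phi @ w
    return V


def cycle_walk(n):
    """Symmetric random walk on a cycle with n states (toy example)."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i + 1) % n] = 0.5
        P[i, (i - 1) % n] = 0.5
    return P
```

Calling `kbb_population(cycle_walk(8), np.arange(8.0), 0.9, 8)` and comparing against a direct solve of `(I - gamma P) V = r` gives a quick check. Because the features span a Krylov subspace, the number of rounds needed in this toy setting is bounded by the number of distinct eigenvalues of `P` that the reward excites, which is one way to see the dependence on spectral structure mentioned in the abstract.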

Related content

Policy evaluation


Analysis · Discretization · Estimation/Estimator · Approximation · Regularization term

December 2, 2022

Convergence Rate Analysis of Galerkin Approximation of Inverse Potential Problem

Bangti Jin, Xiliang Lu, Qimeng Quan, Zhi Zhou

from arxiv, 23 pages, 4 figures

In this work we analyze the inverse problem of recovering the space-dependent potential coefficient in an elliptic / parabolic problem from distributed observation. We establish novel (weighted) conditional stability estimates under very mild conditions on the problem data. Then we provide an error analysis of a standard reconstruction scheme based on the standard output least-squares formulation with Tikhonov regularization (by an $H^1$-seminorm penalty), which is then discretized by the Galerkin finite element method with continuous piecewise linear finite elements in space (and also backward Euler method in time for parabolic problems). We present a detailed analysis of the discrete scheme, and provide convergence rates in a weighted $L^2(\Omega)$ for discrete approximations with respect to the exact potential. The error bounds are explicitly dependent on the noise level, regularization parameter and discretization parameter(s). Under suitable conditions, we also derive error estimates in the standard $L^2(\Omega)$ and interior $L^2$ norms. The analysis employs sharp a priori error estimates and nonstandard test functions. Several numerical experiments are given to complement the theoretical analysis.

Linear · Statistic · Linear regression · MoDELS · Functional

December 1, 2022

Testing linearity in semi-functional partially linear regression models

Yongzhen Feng, Jie Li, Xiaojun Song

This paper proposes a Kolmogorov-Smirnov type statistic and a Cramér-von Mises type statistic to test linearity in semi-functional partially linear regression models. Our test statistics are based on a residual marked empirical process indexed by a randomly projected functional covariate, which is able to circumvent the "curse of dimensionality" brought by the functional covariate. The asymptotic properties of the proposed test statistics under the null, the fixed alternative, and a sequence of local alternatives converging to the null at the $n^{1/2}$ rate are established. A straightforward wild bootstrap procedure is suggested to estimate the critical values that are required to carry out the tests in practical applications. Results from an extensive simulation study show that our tests perform reasonably well in finite samples. Finally, we apply our tests to the Tecator and AEMET datasets to check whether the assumption of linearity is supported by these datasets.

Policy evaluation · Optimizer · Scenario · Performer · Biased

December 1, 2022

Offline Policy Evaluation and Optimization under Confounding

Kevin Tan, Yangyi Lu, Chinmaya Kausik, Yixin Wang, Ambuj Tewari

With a few exceptions, work in offline reinforcement learning (RL) has so far assumed that there is no confounding. In a classical regression setting, confounders introduce omitted variable bias and inhibit the identification of causal effects. In offline RL, they prevent the identification of a policy's value, and therefore make it impossible to perform policy improvement. Using conventional methods in offline RL in the presence of confounding can therefore not only lead to poor decisions and poor policies, but can also have disastrous effects in applications such as healthcare and education. We provide approaches for both off-policy evaluation (OPE) and local policy optimization in the settings of i.i.d. and global confounders. Theoretical and empirical results confirm the validity and viability of these methods.

State space · Approximation · MoDELS · Estimation/Estimator · Exact

November 30, 2022

Efficient variational approximations for state space models

Rubén Loaiza-Maya, Didier Nibbering

Variational Bayes methods are a scalable estimation approach for many complex state space models. However, existing methods exhibit a trade-off between accurate estimation and computational efficiency. This paper proposes a variational approximation that mitigates this trade-off. This approximation is based on importance densities that have been proposed in the context of efficient importance sampling. By directly conditioning on the observed data, the proposed method produces an accurate approximation to the exact posterior distribution. Because the steps required for its calibration are computationally efficient, the approach is faster than existing variational Bayes methods. The proposed method can be applied to any state space model that has a closed-form measurement density function and a state transition distribution that belongs to the exponential family of distributions. We illustrate the method in numerical experiments with stochastic volatility models and a macroeconomic empirical application using a high-dimensional state space model.

Approximate Bayesian computation · Statistic · Approximation · Estimation/Estimator · Kernelization

November 30, 2022

Approximate Bayesian Computation via Classification

Yuexi Wang, Tetsuya Kaji, Veronika Ročková

Approximate Bayesian Computation (ABC) enables statistical inference in simulator-based models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions with a Kullback-Leibler (KL) divergence estimator obtained via contrastive learning. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation.
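For orientation, the traditional accept/reject mechanism that the abstract takes as its starting point can be sketched as follows. This shows only the classical summary-statistic baseline, not the classifier-based KL estimator the paper proposes. The Gaussian location model, the uniform prior on [-5, 5], the sample mean as summary statistic, and the name `abc_rejection` are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_draws, epsilon, seed=0):
    """Classical rejection ABC for a toy Gaussian location model.

    Assumptions (illustrative only): data ~ N(theta, 1), a uniform prior
    on [-5, 5], and the sample mean as the summary statistic.
    Returns the list of accepted parameter draws.
    """
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)       # summary of the real data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)       # draw a candidate from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in observed]
        s_sim = statistics.fmean(simulated)  # summary of the simulated data
        if abs(s_sim - s_obs) <= epsilon:    # accept/reject kernel
            accepted.append(theta)
    return accepted
```

The accepted draws approximate the posterior; a smaller `epsilon` tightens the approximation but lowers the acceptance rate, which is the threshold trade-off that the exponential weighting scheme mentioned in the abstract is meant to avoid.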

Mean · Curvature · Estimation/Estimator · Vectorization · Scenario

November 30, 2022

Exponential Concentration for Geometric-Median-of-Means in Non-Positive Curvature Spaces

Ho Yun, Byeong U. Park

In Euclidean spaces, the empirical mean vector as an estimator of the population mean is known to have polynomial concentration unless a strong tail assumption is imposed on the underlying probability measure. The idea of median-of-means tournament has been considered as a way of overcoming the sub-optimality of the empirical mean vector. In this paper, to address the sub-optimal performance of the empirical mean in a more general setting, we consider general Polish spaces with a general metric, which are allowed to be non-compact and of infinite dimension. We discuss the estimation of the associated population Fréchet mean, and for this we extend the existing notion of median-of-means to this general setting. We devise several new notions and inequalities associated with the geometry of the underlying metric, and using them we study the concentration properties of the extended notions of median-of-means as the estimators of the population Fréchet mean. We show that the new estimators achieve exponential concentration under only a second moment condition on the underlying distribution, while the empirical Fréchet mean has polynomial concentration. We focus our study on spaces with non-positive Alexandrov curvature since they afford slower rates of convergence than spaces with positive curvature. We note that this is the first work that derives non-asymptotic concentration inequalities for extended notions of the median-of-means in non-vector spaces with a general metric.
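The basic median-of-means idea that the paper generalizes is easiest to see in the one-dimensional Euclidean case. The sketch below covers only that classical scalar version, not the geometric median in non-positive curvature spaces studied in the paper. The function name `median_of_means` and the choice to use consecutive blocks and drop leftover points are assumptions made for illustration.

```python
import statistics


def median_of_means(xs, k):
    """Scalar median-of-means estimate using k consecutive blocks.

    Illustrative sketch: the sample is split into k equal-size blocks
    (leftover points at the end are dropped), each block is averaged,
    and the median of the block means is returned. The input is assumed
    to be in exchangeable order; shuffle beforehand if it is not.
    """
    if k < 1 or k > len(xs):
        raise ValueError("k must satisfy 1 <= k <= len(xs)")
    m = len(xs) // k                              # block size
    block_means = [
        statistics.fmean(xs[i * m:(i + 1) * m])   # mean of the i-th block
        for i in range(k)
    ]
    return statistics.median(block_means)
```

A single extreme observation can move the empirical mean arbitrarily far, but it can corrupt at most one block mean, so the median of the block means stays near the bulk of the data. This is the robustness that underlies the exponential concentration under a second moment condition.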

INTERACT · Tractable · Markov · Sample · Class

November 30, 2022

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

from arxiv, We changed the title from the first version. We fixed a technical issue in the first version regarding the $\ell_2$ eluder technique (Lemma D.2)

We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.

Speech enhancement · Taylor · motivation · Functional · Learning

November 30, 2022

A General Deep Learning Speech Enhancement Framework Motivated by Taylor's Theorem

Andong Li, Guochen Yu, Chengshi Zheng, Wenzhe Liu, Xiaodong Li

from arxiv, Submitted to TASLP, 13 pages

While deep neural networks greatly facilitate the proliferation of the speech enhancement field, most of the existing methods are developed following either heuristic or blind optimization criteria, which severely hampers interpretability and transparency. Inspired by Taylor's theorem, we propose a general unfolding framework for both single- and multi-channel speech enhancement tasks. Concretely, we formulate the complex spectrum recovery into the spectral magnitude mapping in the neighboring space of the noisy mixture, in which the sparse prior is introduced for phase modification in advance. Based on that, the mapping function is decomposed into the superimposition of the 0th-order and high-order polynomials in Taylor's series, where the former coarsely removes the interference in the magnitude domain and the latter progressively complements the remaining spectral detail in the complex spectrum domain. In addition, we study the relation between adjacent order terms and reveal that each high-order term can be recursively estimated with its lower-order term. We then propose to evaluate each high-order term using a surrogate function with trainable weights, so that the whole system can be trained in an end-to-end manner. Extensive experiments are conducted on WSJ0-SI84, DNS-Challenge, Voicebank+Demand, and spatialized Librispeech datasets. Quantitative results show that the proposed approach not only yields competitive performance over existing top-performing approaches, but also enjoys decent internal transparency and flexibility.

Approximation · Optimizer · Functional · Global optimization · Approximation error

November 30, 2022

Policy Optimization over General State and Action Spaces

Guanghui Lan

Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policies for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish linear convergence rate to global optimality or sublinear convergence to stationarity for these methods applied to solve different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces. To the best of our knowledge, the development of these algorithmic frameworks as well as their convergence analysis appear to be new in the literature.

Linear · Range · Scenario · Optimizer · Constraint

November 30, 2022

Surrogate "Level-Based" Lagrangian Relaxation for Mixed-Integer Linear Programming

Mikhail A. Bragin, Emily L. Tucker

Mixed-Integer Linear Programming (MILP) plays an important role across a range of scientific disciplines and within areas of strategic importance to society. The MILP problems, however, suffer from combinatorial complexity. Because of integer decision variables, as the problem size increases, the number of possible solutions increases super-linearly, thereby leading to a drastic increase in the computational effort. To efficiently solve MILP problems, a "price-based" decomposition and coordination approach is developed to exploit (1) the super-linear reduction of complexity upon the decomposition and (2) the geometric convergence potential inherent to Polyak's stepsizing formula for the fastest coordination possible to obtain near-optimal solutions in a computationally efficient manner. Unlike previous methods, which set stepsizes heuristically by adjusting hyperparameters, the key novelty is that stepsizes are obtained in a purely decision-based way: a novel "auxiliary" constraint satisfaction problem is solved, from which the appropriate stepsizes are inferred. Testing results for large-scale Generalized Assignment Problems (GAP) demonstrate that for the majority of instances, certifiably optimal solutions are obtained. For stochastic job-shop scheduling as well as for pharmaceutical scheduling, computational results demonstrate a two-orders-of-magnitude speedup as compared to Branch-and-Cut (B&C). The new method has a major impact on the efficient resolution of complex Mixed-Integer Programming (MIP) problems arising within a variety of scientific fields.