
We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a commonly used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off among computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
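To make the first stage concrete, the sketch below clusters states by their estimated Q-value profiles and uses the cluster labels as aggregated states. The use of KMeans, the array layout of q_hat, and the number of aggregated states are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch of the first stage: cluster states by their estimated Q-value
# profiles to obtain an aggregated (lower-dimensional) state space.
# Assumptions: q_hat is an (n_states, n_actions) array of Q-function
# estimates from the inverse-RL step; KMeans is an illustrative choice
# of clustering algorithm, and n_aggregated controls the reduced dimension.
import numpy as np
from sklearn.cluster import KMeans

def aggregate_states(q_hat: np.ndarray, n_aggregated: int, seed: int = 0) -> np.ndarray:
    """Map each original state to one of n_aggregated cluster labels."""
    km = KMeans(n_clusters=n_aggregated, n_init=10, random_state=seed)
    return km.fit_predict(q_hat)  # shape: (n_states,)

# Toy usage: 1000 states, 5 actions, aggregated into 20 groups before the
# nested fixed-point maximum likelihood step.
rng = np.random.default_rng(0)
q_hat = rng.normal(size=(1000, 5))
labels = aggregate_states(q_hat, n_aggregated=20)
```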

Related content

We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $\tilde{O}(d^{3/2}H^{5/2}\sqrt{T})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL by using the Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
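As a rough illustration of the sampling step, the sketch below performs one Langevin (noisy gradient) update on the parameters of a linear Q-function; acting greedily with respect to the sampled parameters then implements Thompson sampling without a Gaussian approximation. The feature map phi, the TD-style squared loss standing in for the negative log-posterior, and the step size are assumptions made here for concreteness, not the paper's exact formulation.

```python
# One Langevin (SGLD-style) update of linear Q-function parameters.
# Assumptions: phi(s, a) is a known feature map; the squared TD error below
# stands in for the (unnormalised) negative log-posterior; step size and
# noise scale follow the standard Langevin update.
import numpy as np

def langevin_q_update(theta, batch, phi, gamma=0.99, step=1e-3, rng=None):
    """One noisy gradient step on Q-parameters theta."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for (s, a, r, s_next, actions) in batch:
        q_sa = phi(s, a) @ theta
        target = r + gamma * max(phi(s_next, b) @ theta for b in actions)
        grad += (q_sa - target) * phi(s, a)      # gradient of 0.5 * TD^2
    grad /= len(batch)
    noise = rng.normal(scale=np.sqrt(2.0 * step), size=theta.shape)
    return theta - step * grad + noise           # Langevin step

# Acting: the sampled theta defines a Q-function; choosing argmax_a Q(s, a)
# with respect to it is the Thompson-sampling action selection.
```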

High dynamic range (HDR) imaging is an important task in image processing that aims to generate well-exposed images in scenes with varying illumination. Although existing multi-exposure fusion methods have achieved impressive results, generating high-quality HDR images in dynamic scenes is still difficult. The primary challenges are ghosting artifacts caused by object motion between low dynamic range images and distorted content in under- and overexposed regions. In this paper, we propose a deep progressive feature aggregation network for improving HDR imaging quality in dynamic scenes. To address the issue of object motion, our method implicitly samples high-correspondence features and aggregates them in a coarse-to-fine manner for alignment. In addition, our method adopts a densely connected network structure based on the discrete wavelet transform, which aims to decompose the input features into multiple frequency subbands and adaptively restore corrupted contents. Experiments show that our proposed method can achieve state-of-the-art performance under different scenes, compared to other promising HDR imaging methods. Specifically, the HDR images generated by our method contain cleaner and more detailed content, with fewer distortions, leading to better visual quality.
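To illustrate the frequency-subband idea, the sketch below applies a single-level 2D Haar discrete wavelet transform to a feature map. The choice of Haar filters and a single decomposition level is an illustrative assumption; in the paper the transform operates inside a densely connected network.

```python
# Minimal sketch of decomposing a feature map into frequency subbands with a
# single-level 2D Haar discrete wavelet transform. Haar filters and a single
# level are illustrative assumptions made for this example only.
import numpy as np

def haar_dwt2(x: np.ndarray):
    """Split an (H, W) array with even H, W into LL, LH, HL, HH subbands."""
    a, b = x[0::2, :], x[1::2, :]          # even / odd row pairs
    lo, hi = (a + b) / 2.0, (a - b) / 2.0  # low- / high-pass along rows
    def split_cols(y):
        c, d = y[:, 0::2], y[:, 1::2]
        return (c + d) / 2.0, (c - d) / 2.0
    ll, lh = split_cols(lo)
    hl, hh = split_cols(hi)
    return ll, lh, hl, hh                  # each of shape (H//2, W//2)

feature = np.random.rand(64, 64)
ll, lh, hl, hh = haar_dwt2(feature)        # low-frequency content sits in ll
```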

This article describes an R package bqror that estimates Bayesian quantile regression for ordinal models introduced in Rahman (2016). The paper classifies ordinal models into two types and offers computationally efficient, yet simple, Markov chain Monte Carlo (MCMC) algorithms for estimating ordinal quantile regression. The generic ordinal model with 3 or more outcomes (labeled the ORI model) is estimated by a combination of Gibbs sampling and the Metropolis-Hastings algorithm, whereas an ordinal model with exactly 3 outcomes (labeled the ORII model) is estimated using Gibbs sampling only. In line with the Bayesian literature, we suggest using the marginal likelihood for comparing alternative quantile regression models and explain how to compute it. The models and their estimation procedures are illustrated via multiple simulation studies and implemented in two applications. The article also describes several other functions contained within the bqror package, which are necessary for estimation, inference, and assessing model fit.
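Since the package itself is written in R, the Python sketch below only illustrates the data-generating process underlying ordinal quantile regression: a latent variable with asymmetric Laplace errors at quantile p, mapped to ordinal outcomes through cut-points, drawn via the standard normal-exponential mixture representation. The coefficient values, cut-points, and sample size are illustrative.

```python
# Sketch of the ordinal quantile regression data model: latent
# z_i = x_i' beta + eps_i with asymmetric Laplace errors at quantile p,
# mapped to ordinal outcomes via cut-points. The normal-exponential mixture
# is the standard representation used in Bayesian quantile regression;
# beta, the cut-points, and n are illustrative values.
import numpy as np

def simulate_ordinal_qr(n=500, p=0.5, beta=(0.8, -0.5), cuts=(-1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(beta)))
    theta = (1 - 2 * p) / (p * (1 - p))            # mixture location weight
    tau2 = 2.0 / (p * (1 - p))                     # mixture scale weight
    w = rng.exponential(scale=1.0, size=n)
    eps = theta * w + np.sqrt(tau2 * w) * rng.normal(size=n)
    z = X @ np.asarray(beta) + eps                 # latent utility
    y = np.digitize(z, bins=np.asarray(cuts)) + 1  # ordinal outcome in {1, 2, 3}
    return X, y

X, y = simulate_ordinal_qr()
```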

Training multiple agents to coordinate is an important problem with applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when available, doing so gives rise to the offline coordination problem. Specifically, we identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) challenges, two coordination issues at which current offline MARL algorithms fail. To address this problem, we propose a simple model-based approach that generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly. Our resulting method, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO), outperforms prevalent learning methods in challenging offline multi-agent MuJoCo tasks, even under severe partial observability and with learned world models.
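The sketch below shows the core model-based step in spirit: rolling a learned world model forward under the agents' current policies to generate synthetic joint transitions for PPO-style fine-tuning. The world-model and policy interfaces, as well as the rollout horizon, are hypothetical placeholders rather than the paper's actual components.

```python
# Sketch of generating synthetic multi-agent rollouts from a learned world
# model so that on-policy agents can fine-tune offline. The world_model /
# policy interfaces and the horizon are hypothetical placeholders.
import numpy as np

def synthetic_rollouts(world_model, policies, start_obs, horizon=25, rng=None):
    """Roll the learned model forward under the agents' current policies."""
    rng = rng or np.random.default_rng()
    rollout, obs = [], start_obs                       # obs: dict agent -> array
    for _ in range(horizon):
        actions = {aid: policies[aid](obs[aid], rng) for aid in policies}
        next_obs, rewards, done = world_model.step(obs, actions)
        rollout.append((obs, actions, rewards, next_obs, done))
        if done:
            break
        obs = next_obs
    return rollout   # added to the data used for PPO-style policy updates
```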

Despite recent advancements, NLP models continue to be vulnerable to bias. This bias often originates from the uneven distribution of real-world data and can propagate through the annotation process. The increasing integration of these models into our lives calls for methods to mitigate bias without prohibitive annotation costs. While active learning (AL) has shown promise in training models with a small amount of annotated data, AL's reliance on the model's behavior for selective sampling can lead to an accumulation of unwanted bias rather than bias mitigation. However, infusing clustering with AL can overcome the bias issue of both AL and traditional annotation methods while exploiting AL's annotation efficiency. In this paper, we propose a novel adaptive clustering-based active learning algorithm, D-CALM, that dynamically adjusts clustering and annotation efforts in response to an estimated classifier error rate. Experiments on eight datasets for a diverse set of text classification tasks, including emotion, hate speech, dialog act, and book type detection, demonstrate that our proposed algorithm significantly outperforms baseline AL approaches with both pretrained transformers and traditional Support Vector Machines. D-CALM showcases robustness against different measures of information gain and, as evident from our analysis of label and error distributions, can significantly reduce unwanted model bias.
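One round of clustering-guided selection can be sketched as follows: cluster the unlabeled pool, score each cluster with an estimate of the current classifier's error rate, and spend more of the annotation budget on high-error clusters. The use of mean predictive uncertainty as the error proxy, KMeans as the clustering step, and the proportional budget rule are illustrative assumptions, not necessarily D-CALM's exact choices.

```python
# Sketch of one round of clustering-guided active learning: cluster the
# unlabeled pool, score clusters by an estimated error rate of the current
# classifier (mean predictive uncertainty is an illustrative proxy), and
# allocate the annotation budget toward high-error clusters.
import numpy as np
from sklearn.cluster import KMeans

def select_for_annotation(X_pool, clf, n_clusters=10, budget=50, seed=0):
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_pool)
    uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)
    cluster_err = np.array([uncertainty[labels == c].mean()
                            for c in range(n_clusters)])
    alloc = np.round(budget * cluster_err / cluster_err.sum()).astype(int)
    picks = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        top = idx[np.argsort(-uncertainty[idx])[:alloc[c]]]  # most uncertain first
        picks.extend(top.tolist())
    return picks   # indices of pool examples to send for annotation
```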

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset only contains sub-optimal trajectories. On the other hand, conventional RL approaches based on Dynamic Programming (such as Q-learning) do not have the same limitation; however, they suffer from unstable learning behaviours, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). QDT utilises the Dynamic Programming results to relabel the return-to-go in the training data and then trains the DT with the relabelled data. Our approach efficiently combines the benefits of the two methods, with each compensating for the other's shortcomings, to achieve better performance. We demonstrate this empirically in both simple toy environments and the more complex D4RL benchmark, observing competitive performance gains.
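The relabelling idea can be sketched as a backward pass that lifts the return-to-go targets using a value function learned by dynamic programming. The specific max-with-value rule below is one plausible instantiation offered for illustration, not necessarily the paper's exact relabelling rule.

```python
# Sketch of relabelling return-to-go (RTG) with a value function learned via
# Q-learning / dynamic programming, so the Decision Transformer sees targets
# consistent with stitching better sub-trajectories together. The backward
# max-with-value rule is one plausible instantiation, used for illustration.
import numpy as np

def relabel_rtg(rewards, states, value_fn, gamma=1.0):
    """rewards, states: per-timestep sequences; value_fn(s) -> scalar estimate."""
    T = len(rewards)
    rtg = np.zeros(T)
    tail = 0.0
    for t in reversed(range(T)):
        # Take the better of what the trajectory actually achieved after t
        # and what the learned value function says is achievable from s_{t+1}.
        future = max(tail, value_fn(states[t + 1])) if t + 1 < T else 0.0
        rtg[t] = rewards[t] + gamma * future
        tail = rtg[t]
    return rtg   # used in place of the dataset RTG when training the DT
```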

Most reinforcement learning algorithms treat the context under which they operate as a stationary, isolated and undisturbed environment. However, in the real world, the environment is constantly changing due to a variety of external influences. To address this problem, we study Markov Decision Processes (MDPs) under the influence of an external temporal process. We formalize this notion and discuss conditions under which the problem becomes tractable and admits suitable solutions. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance.
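For concreteness, if the external temporal process is observed and Markov on a finite set, one simple formulation runs standard policy iteration on the product of environment states and process states. This augmentation, and the tabular setting below, are assumptions made here for illustration, not necessarily the paper's formulation.

```python
# Standard tabular policy iteration over an augmented state space
# (environment state x external process state), shown only to make the
# setting concrete; the augmentation is an illustrative assumption.
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """P: (A, S, S) transition tensor over augmented states; R: (S, A) rewards."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S), :]
        R_pi = R[np.arange(S), policy]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement.
        Q = R.T + gamma * (P @ V)          # shape (A, S)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```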

While distributional reinforcement learning (RL) has demonstrated empirical success, the question of when and why it is beneficial has remained unanswered. In this work, we provide one explanation for the benefits of distributional RL through the lens of small-loss bounds, which scale with the instance-dependent optimal cost. If the optimal cost is small, our bounds are stronger than those from non-distributional approaches. As a warmup, we show that learning the cost distribution leads to small-loss regret bounds in contextual bandits (CB), and we find that distributional CB empirically outperforms the state-of-the-art on three challenging tasks. For online RL, we propose a distributional version-space algorithm that constructs confidence sets using maximum likelihood estimation, and we prove that it achieves small-loss regret in tabular MDPs and enjoys small-loss PAC bounds in latent variable models. Building on similar insights, we propose a distributional offline RL algorithm based on the pessimism principle and prove that it enjoys small-loss PAC bounds, which exhibit a novel robustness property. For both online and offline RL, our results provide the first theoretical benefits of learning distributions even when we only need the mean for making decisions.

This paper presents an algorithm that generates the conditional moment inequalities that characterize the identified set of the common parameter of various semi-parametric panel multinomial choice models. I consider both static and dynamic models, under various weak stochastic restrictions on the distribution of observed and unobserved components of the models. For a broad class of such stochastic restrictions, the paper demonstrates that the inequalities characterizing the identified set of the common parameter can be obtained as solutions of multiple objective linear programs (MOLPs), thereby transforming the task of finding these inequalities into a purely computational problem. The algorithm that I provide reproduces many well-known results, including the conditional moment inequalities derived in Manski (1987), Pakes and Porter (2023), and Khan, Ponomareva, and Tamer (2023). Moreover, I use the algorithm to generate some new results, by providing characterizations of the identified set in some cases that were left open in Pakes and Porter (2023) and Khan, Ponomareva, and Tamer (2023), as well as characterizations of the identified set under alternative stochastic restrictions.

Flexible district heating grids form an important part of future, low-carbon energy systems. We examine probabilistic state estimation in such grids, i.e., we aim to estimate the posterior probability distribution over all grid state variables such as pressures, temperatures, and mass flows, conditional on measurements of a subset of these states. Since the posterior state distribution does not belong to a standard class of probability distributions, we use Markov Chain Monte Carlo (MCMC) sampling in the space of network heat exchanges and evaluate the samples in the grid state space to estimate the posterior. Converting the heat exchange samples into grid states by solving the non-linear grid equations makes this approach computationally burdensome. However, we propose to speed it up by employing a deep neural network that is trained to approximate the solution computed by the exact but slow non-linear solver. This novel approach is shown to deliver highly accurate posterior distributions both for classic tree-shaped as well as meshed heating grids, at significantly reduced computational costs that are acceptable for online control. Our state estimation approach thus enables tightening the safety margins for temperature and pressure control, and thereby a more efficient grid operation.
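The surrogate-accelerated sampling can be sketched as a random-walk Metropolis loop in the space of heat exchanges, where a trained network replaces the slow non-linear grid solver inside the likelihood. The Gaussian measurement model, the surrogate interface, and the proposal scale are illustrative assumptions.

```python
# Sketch of random-walk Metropolis in the space of network heat exchanges,
# with a trained neural-network surrogate standing in for the slow non-linear
# grid solver inside the likelihood. Measurement model, surrogate interface,
# and proposal scale are illustrative assumptions.
import numpy as np

def log_likelihood(q, surrogate, y_obs, obs_idx, sigma):
    """q: heat-exchange vector; surrogate(q) -> full grid state vector."""
    state = surrogate(q)                              # NN instead of solver
    resid = y_obs - state[obs_idx]                    # measured subset only
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(surrogate, y_obs, obs_idx, q0, sigma=0.05, step=0.02,
               n_samples=5000, rng=None):
    rng = rng or np.random.default_rng()
    q, ll = q0.copy(), log_likelihood(q0, surrogate, y_obs, obs_idx, sigma)
    samples = []
    for _ in range(n_samples):
        prop = q + step * rng.normal(size=q.shape)    # random-walk proposal
        ll_prop = log_likelihood(prop, surrogate, y_obs, obs_idx, sigma)
        if np.log(rng.uniform()) < ll_prop - ll:      # accept / reject
            q, ll = prop, ll_prop
        samples.append(surrogate(q))                  # posterior over grid states
    return np.array(samples)
```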
