
Interpretability of reinforcement learning policies is essential for many real-world tasks, but learning such interpretable policies is a hard problem. In particular, rule-based policies such as decision trees and rule lists are difficult to optimize because they are non-differentiable. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a decision tree that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs, we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
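To make the objective concrete, the following is a minimal sketch of what "optimizing a size-limited tree for expected discounted return" means. It is not the OMDT formulation: instead of Mixed-Integer Linear Programming it simply enumerates every depth-1 tree on a tiny chain MDP and scores each by iterative policy evaluation. The MDP, its transition probabilities, the discount factor, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical 4-state chain MDP; every number here is an illustrative assumption.
GAMMA = 0.9          # discount factor
N_STATES = 4         # states 0..3 on a line; moving into state 3 pays a reward
FEATURES = [[float(s)] for s in range(N_STATES)]  # one feature per state: position


def transitions(state, action):
    """Return [(probability, next_state, reward)]; action 0 = left, 1 = right."""
    target = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    reward = 1.0 if target == N_STATES - 1 else 0.0
    # The move succeeds with probability 0.8; otherwise the agent stays put.
    return [(0.8, target, reward), (0.2, state, 0.0)]


def make_tree_policy(threshold, left_action, right_action):
    """Depth-1 decision tree: one threshold test on the position feature."""
    def policy(features):
        return left_action if features[0] <= threshold else right_action
    return policy


def evaluate(policy, sweeps=500):
    """Iterative policy evaluation: expected discounted return of each state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            sum(p * (r + GAMMA * values[nxt])
                for p, nxt, r in transitions(s, policy(FEATURES[s])))
            for s in range(N_STATES)
        ]
    return values


def best_depth_one_tree(start_state=0):
    """Brute-force search over all depth-1 trees (stand-in for the MILP solver)."""
    candidates = product([0.5, 1.5, 2.5], [0, 1], [0, 1])
    return max(candidates,
               key=lambda c: evaluate(make_tree_policy(*c))[start_state])
```

Calling `best_depth_one_tree()` returns a `(threshold, left_action, right_action)` triple. Enumeration is only feasible for toy problems like this one, which is one reason an exact solver-based formulation is attractive for larger trees.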

Related content

A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as an internal model. To compete with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies, and action scoping reduces the learning action space to promising actions. We illustrate the benefits of our algorithms by means of a prototypical implementation applied to examples from the AI and formal methods communities.

Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics such as the likelihood and average time of events (predictions). Here we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a data set of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.

Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose. However, the general object detection methods they use are ill-suited to handle cluttered scenes, thus producing poor initialization to the subsequent pose network. To address this, we propose a rigidity-aware detection method exploiting the fact that, in 6D pose estimation, the target objects are rigid. This lets us introduce an approach to sampling positive object regions from the entire visible object area during training, instead of naively drawing samples from the bounding box center where the object might be occluded. As such, every visible object part can contribute to the final bounding box prediction, yielding better detection robustness. Key to the success of our approach is a visibility map, which we propose to build using a minimum barrier distance between every pixel in the bounding box and the box boundary. Our results on seven challenging 6D pose estimation datasets evidence that our method outperforms general detection frameworks by a large margin. Furthermore, combined with a pose regression network, we obtain state-of-the-art pose estimation results on the challenging BOP benchmark.

Estimating the entropy rate of discrete time series is a challenging problem with important applications in numerous areas including neuroscience, genomics, image processing and natural language processing. A number of approaches have been developed for this task, typically based either on universal data compression algorithms, or on statistical estimators of the underlying process distribution. In this work, we propose a fully-Bayesian approach for entropy estimation. Building on the recently introduced Bayesian Context Trees (BCT) framework for modelling discrete time series as variable-memory Markov chains, we show that it is possible to sample directly from the induced posterior on the entropy rate. This can be used to estimate the entire posterior distribution, providing much richer information than point estimates. We develop theoretical results for the posterior distribution of the entropy rate, including proofs of consistency and asymptotic normality. The practical utility of the method is illustrated on both simulated and real-world data, where it is found to outperform state-of-the-art alternatives.
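For contrast with the Bayesian approach described above, here is a minimal sketch of the simplest kind of statistical point estimator the abstract alludes to: a plug-in estimate of the entropy rate under an assumed first-order Markov model. This is not the BCT method and yields no posterior distribution; the function name and the first-order assumption are ours.

```python
import math
from collections import Counter


def plugin_entropy_rate(sequence):
    """Plug-in entropy rate in bits per symbol, assuming a first-order Markov chain.

    Estimates -sum_{a,b} p(a, b) * log2 p(b | a) from empirical pair counts.
    """
    pair_counts = Counter(zip(sequence, sequence[1:]))   # counts of (previous, next)
    context_counts = Counter(sequence[:-1])              # counts of each previous symbol
    n_pairs = len(sequence) - 1
    rate = 0.0
    for (prev, _nxt), count in pair_counts.items():
        p_pair = count / n_pairs
        p_cond = count / context_counts[prev]
        rate -= p_pair * math.log2(p_cond)
    return rate
```

A strictly alternating sequence such as `"abababab"` is fully predictable from the previous symbol, so the estimate is zero. A single number like this carries no uncertainty information, which is the gap a full posterior over the entropy rate is meant to fill.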

Currently, decision-making is one of the biggest challenges in autonomous driving. This paper introduces a method for safely navigating an autonomous vehicle in highway scenarios by combining Deep Q-Networks and insight from control theory. A Deep Q-Network is trained in simulation to serve as a central decision-making unit by proposing targets for a trajectory planner. The generated trajectories, in combination with a controller for longitudinal movement, are used to execute lane change maneuvers. To demonstrate the functionality of this approach, it is evaluated on two different highway traffic scenarios. Furthermore, the impact of different state representations on the performance and training process is analyzed. The results show that the proposed system can produce efficient and safe driving behavior.

Information that is relevant for decision-making is often distributed and held by self-interested agents. Decision markets are well-suited mechanisms to elicit such information and aggregate it into conditional forecasts that can be used for decision-making. However, for incentive-compatible elicitation, decision markets rely on stochastic decision rules, which entails that sometimes actions have to be taken that have been predicted to be sub-optimal. In this work, we propose three closely related mechanisms that elicit and aggregate information similarly to a decision market, but are incentive compatible despite using a deterministic decision rule. Following ideas from peer prediction mechanisms, proxies rather than observed future outcomes are used to score predictions. The first mechanism requires the principal to have her own signal, which is then used as a proxy to elicit information from a group of self-interested agents. The principal then deterministically maps the aggregated forecasts and the proxy to the best possible decision. The second and third mechanisms expand the first to cover a scenario where the principal does not have access to her own signal. The principal offers a partial profit to align the interest of one agent and retrieve its signal as a proxy, or alternatively uses a proper peer prediction mechanism to elicit signals from two agents. Aggregation and decision-making then follow the first mechanism. We evaluate our first mechanism using a multi-agent bandit learning system. The results suggest that the mechanism can train agents to achieve a performance similar to a Bayesian inference model with access to all information held by the agents.

We consider the problem of state estimation from $m$ linear measurements, where the state $u$ to recover is an element of the manifold $\mathcal{M}$ of solutions of a parameter-dependent equation. The state is estimated using prior knowledge on $\mathcal{M}$ coming from model order reduction. Variational approaches based on linear approximation of $\mathcal{M}$, such as PBDW, yield a recovery error limited by the Kolmogorov $m$-width of $\mathcal{M}$. To overcome this issue, piecewise-affine approximations of $\mathcal{M}$ have also been considered, which consist in using a library of linear spaces among which one is selected by minimizing some distance to $\mathcal{M}$. In this paper, we propose a state estimation method relying on dictionary-based model reduction, where a space is selected from a library generated by a dictionary of snapshots, using a distance to the manifold. The selection is performed among a set of candidate spaces obtained from the path of an $\ell_1$-regularized least-squares problem. Then, in the framework of parameter-dependent operator equations (or PDEs) with affine parameterizations, we provide an efficient offline-online decomposition based on randomized linear algebra that ensures efficient and stable computations while preserving theoretical guarantees.

Event extraction (EE) plays an important role in many industrial application scenarios, and high-quality EE methods require a large amount of manually annotated data to train supervised learning models. However, the cost of obtaining annotated data is very high, especially for the annotation of domain events, which requires the participation of experts from the corresponding domain. We therefore introduce active learning (AL) to reduce the cost of event annotation. However, existing AL methods have two main problems that make them poorly suited to event extraction. First, the existing pool-based selection strategies have limitations in terms of computational cost and sample validity. Second, the existing evaluation of sample importance does not use local sample information. In this paper, we present a novel deep AL method for EE. We propose a batch-based selection strategy and a Memory-Based Loss Prediction model (MBLP) to select unlabeled samples efficiently. During the selection process, we use an internal-external sample loss ranking method to evaluate sample importance using local information. Finally, we propose a delayed training strategy to train the MBLP model. Extensive experiments are performed on three domain datasets, and our method outperforms other state-of-the-art methods.

This paper proposes an extension of regression trees by quadratic unconstrained binary optimization (QUBO). Regression trees are very popular prediction models that are trainable with tabular datasets, but their accuracy is insufficient because the decision rules are too simple. The proposed method extends the decision rules in decision trees to multi-dimensional boundaries. Such an extension is generally unimplementable because of computational limitations; however, the proposed method transforms the training process into a QUBO problem, which enables an annealing machine to solve it.

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
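The idea of relaxing an algorithm by perturbing its variables and taking the expectation in closed form can be illustrated on the smallest possible case, a hard comparison `x > 0`. This is only a sketch under our own assumptions, not the thesis's general method: we assume logistic noise with scale `beta`, for which the expectation of the hard step equals a sigmoid, and we check that closed form against sampling. All function names are hypothetical.

```python
import math
import random


def hard_step(x):
    """Non-differentiable building block: 1 if x > 0, else 0."""
    return 1.0 if x > 0 else 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_step(x, beta=1.0):
    """Closed-form E[hard_step(x + eps)] for eps ~ Logistic(0, beta); no sampling."""
    return sigmoid(x / beta)


def sampled_step(x, beta=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of the same expectation, for comparison only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        eps = beta * math.log(u / (1.0 - u))  # inverse-CDF sample of logistic noise
        total += hard_step(x + eps)
    return total / n_samples
```

The relaxed version is smooth in `x`, so gradients can flow through it, and it approaches the hard step as `beta` shrinks. The closed form avoids the sampling noise that `sampled_step` would otherwise introduce into training.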
