from arxiv, 8 pages main content, 14 with references and appendix. 5 figures in total. Submitted and accepted to ICLR 2022 workshop on Generalizable Policy Learning in the Physical World (//ai-workshops.github.io/generalizable-policy-learning-in-the-physical-world/)

Training self-driving systems to be robust to the long-tail of driving scenarios is a critical problem. Model-based approaches leverage simulation to emulate a wide range of scenarios without putting users at risk in the real world. One promising path to faithful simulation is to train a forward model of the world to predict the future states of both the environment and the ego-vehicle given past states and a sequence of actions. In this paper, we argue that it is beneficial to model the state of the ego-vehicle, which often has simple, predictable and deterministic behavior, separately from the rest of the environment, which is much more complex and highly multimodal. We propose to model the ego-vehicle using a simple and differentiable kinematic model, while training a stochastic convolutional forward model on raster representations of the state to predict the behavior of the rest of the environment. We explore several configurations of such decoupled models, and evaluate their performance both with Model Predictive Control (MPC) and direct policy learning. We test our methods on the task of highway driving and demonstrate lower crash rates and better stability. The code is available at //github.com/vladisai/pytorch-PPUU/tree/ICLR2022.
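The abstract does not say which kinematic model is used for the ego-vehicle, so the following is only a minimal sketch under the assumption of a standard kinematic bicycle model with Euler integration. The names `EgoState`, `kinematic_step`, `rollout`, and the default `wheelbase` and `dt` values are hypothetical and are not taken from the linked repository.

```python
import math
from dataclasses import dataclass


@dataclass
class EgoState:
    x: float        # longitudinal position (m)
    y: float        # lateral position (m)
    heading: float  # yaw angle (rad)
    speed: float    # forward speed (m/s)


def kinematic_step(state, accel, steer, dt=0.1, wheelbase=2.7):
    """One Euler step of a kinematic bicycle model (assumed, not the paper's)."""
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    heading = state.heading + state.speed / wheelbase * math.tan(steer) * dt
    speed = state.speed + accel * dt
    return EgoState(x, y, heading, speed)


def rollout(state, actions, dt=0.1):
    """Apply a sequence of (accel, steer) actions and return every visited state."""
    states = []
    for accel, steer in actions:
        state = kinematic_step(state, accel, steer, dt)
        states.append(state)
    return states
```

Every operation in the step is a smooth function of the action, which is why a model of this kind could be written in an automatic-differentiation framework and used inside gradient-based MPC, while the learned forward model only has to predict the other traffic participants.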

Related content

MoDELS

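The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition gives the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability.

Multi-task learning · Layer · Identifiable · Knowledge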

June 6, 2022

Automatic Expert Selection for Multi-Scenario and Multi-Task Search

Xinyu Zou,Zhi Hu,Yiming Zhao,Xuchu Ding,Zhongyi Liu,Chenliang Li,Aixin Sun

from arxiv, Accepted by SIGIR 2022; 10 pages, 8 figures

Multi-scenario learning (MSL) enables a service provider to cater for users' fine-grained demands by separating services for different user sectors, e.g., by users' geographical region. Under each scenario there is a need to optimize multiple task-specific targets, e.g., click-through rate and conversion rate, known as multi-task learning (MTL). Recent solutions for MSL and MTL are mostly based on the multi-gate mixture-of-experts (MMoE) architecture. The MMoE structure is typically static and its design requires domain-specific knowledge, making it less effective in handling both MSL and MTL. In this paper, we propose a novel Automatic Expert Selection framework for Multi-scenario and Multi-task search, named AESM$^{2}$. AESM$^{2}$ integrates both MSL and MTL into a unified framework with automatic structure learning. Specifically, AESM$^{2}$ stacks multi-task layers over multi-scenario layers. This hierarchical design enables us to flexibly establish intrinsic connections between different scenarios, and at the same time also supports high-level feature extraction for different tasks. At each multi-scenario/multi-task layer, a novel expert selection algorithm is proposed to automatically identify scenario-/task-specific and shared experts for each input. Experiments over two real-world large-scale datasets demonstrate the effectiveness of AESM$^{2}$ over a battery of strong baselines. An online A/B test also shows substantial performance gain on multiple metrics. Currently, AESM$^{2}$ has been deployed online for serving major traffic.
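For readers unfamiliar with the MMoE baseline mentioned above, the sketch below shows the core gating step: each task has its own gate, and the task's representation is a softmax-weighted sum of the expert outputs. It illustrates the static baseline only, not the expert selection algorithm of the paper, and the function names `softmax` and `mmoe_output` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_output(expert_outputs, gate_logits):
    """Mix expert output vectors using one task's gate.

    expert_outputs: list of equally sized vectors, one per expert.
    gate_logits:    one unnormalized score per expert for this task.
    """
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * expert[d] for w, expert in zip(weights, expert_outputs))
        for d in range(dim)
    ]
```

In a full model the gate logits would themselves be produced by a small learned layer applied to the input, with a separate gate for every task.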

Specialization · Pruning · Optimizer · Learning · Reducible

June 6, 2022

Optimal Fine-Grained N:M sparsity for Activations and Neural Gradients

Brian Chmiel,Itay Hubara,Ron Banner,Daniel Soudry

from arxiv, Main changes: 1) Experiments (see also experiments in the appendix). 2) Overhead analysis (Tab 3)

In deep learning, fine-grained N:M sparsity reduces the data footprint and bandwidth of a General Matrix Multiply (GEMM) by 2×, and doubles throughput by skipping computation of zero values. So far, it has only been used to prune weights. We examine how this method can also be used for activations and their gradients (i.e., "neural gradients"). To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square-error (MSE) of each pruned block. We show that while minimization of the MSE works fine for pruning the activations, it catastrophically fails for the neural gradients. Instead, we show that optimal pruning of the neural gradients requires an unbiased minimum-variance pruning mask. We design such specialized masks, and find that in most cases, 1:2 sparsity is sufficient for training, and 2:4 sparsity is usually enough when this is not the case. Further, we suggest combining several such methods together in order to potentially speed up training even more. A reference implementation is supplied at //github.com/brianchmiel/Act-and-Grad-structured-sparsity.
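The difference between the two criteria can be made concrete with a small sketch. The first function is the usual MSE-minimizing N:M rule (keep the N largest magnitudes in each block of M). The second is one simple unbiased 1:2 rule, given here as an assumption for illustration: it keeps one element of a pair with probability proportional to its magnitude and rescales it so that the expected output equals the input. Whether this matches the masks in the reference implementation is not confirmed by the abstract, and both function names are hypothetical.

```python
import math
import random


def prune_n_of_m_topk(values, n=2, m=4):
    """MSE-minimizing N:M pruning: keep the n largest magnitudes per block of m."""
    if len(values) % m != 0:
        raise ValueError("length must be a multiple of m")
    out = []
    for start in range(0, len(values), m):
        block = values[start:start + m]
        order = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)
        keep = set(order[:n])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(block))
    return out


def prune_1_of_2_unbiased(pair, rng=random):
    """Stochastic 1:2 pruning whose expected output equals the input pair."""
    a, b = pair
    total = abs(a) + abs(b)
    if total == 0.0:
        return (0.0, 0.0)
    # Keep `a` with probability |a| / (|a| + |b|), rescaled to magnitude `total`.
    if rng.random() < abs(a) / total:
        return (math.copysign(total, a), 0.0)
    return (0.0, math.copysign(total, b))
```

For the pair (1.0, -3.0), the stochastic rule returns (4.0, 0.0) with probability 0.25 and (0.0, -4.0) with probability 0.75, so its mean is exactly (1.0, -3.0), whereas the top-magnitude rule would always return (0.0, -3.0) and systematically drop the smaller entry.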

INTERACT · Controller · INFORMS · Optimizer · Attention

June 5, 2022

Active Uncertainty Reduction for Human-Robot Interaction: An Implicit Dual Control Approach

Haimin Hu,Jaime F. Fisac

from arxiv, Workshop on the Algorithmic Foundations of Robotics (WAFR) 2022

The ability to accurately predict human behavior is central to the safety and efficiency of robot autonomy in interactive settings. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as people's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between robot trajectory optimization and human intent inference. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive human models with both continuous and categorical uncertainty. The efficacy of our approach is demonstrated with simulated driving examples.
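The runtime inference step that dual control builds on can be illustrated for a categorical hidden state, such as a driver's intent. The sketch below is a plain Bayes update and assumes the observation likelihoods are supplied by some predictive human model. It is not the paper's planning algorithm, and the name `update_belief` and the example intent labels are hypothetical.

```python
def update_belief(prior, likelihoods):
    """Bayes update of a categorical belief over hidden human states.

    prior:       dict mapping each hidden state to its prior probability.
    likelihoods: dict mapping each hidden state to the probability of the
                 observed human action under that state.
    """
    unnormalized = {k: prior[k] * likelihoods[k] for k in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every state")
    return {k: v / total for k, v in unnormalized.items()}
```

Starting from a uniform prior over "yield" and "assert" and observing an action that is four times more likely under "yield", the belief shifts to 0.8 versus 0.2. A dual-control planner additionally reasons about how the robot's own actions will change these future observations.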

Learning · Reward function · Functional · Identifiable · MoDELS

June 5, 2022

Models of human preference for learning reward functions

W. Bradley Knox,Stephane Hatgis-Kessell,Serena Booth,Scott Niekum,Peter Stone,Alessandro Allievi

from arxiv, 9 pages (24 pages with references and appendix), 13 figures

The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments. These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling preferences instead as arising from a different statistic: each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences. We also prove that the previous partial return model lacks this identifiability property without preference noise that reveals rewards' relative proportions, and we empirically show that our proposed regret preference model outperforms it with finite training data in otherwise the same setting. Additionally, our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research.
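To make the two preference models concrete, the sketch below compares them under simplifying assumptions that are not spelled out in the abstract: deterministic transitions, no discounting, a logistic (Bradley-Terry style) choice rule, and optimal state values supplied by the caller. Regret is taken here as the optimal value at the segment's start minus the segment's return plus the optimal value at its end. All function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_return(rewards):
    """Sum of rewards along a segment."""
    return sum(rewards)


def segment_regret(rewards, start_value, end_value):
    """Regret of a segment under deterministic, undiscounted dynamics (assumed form)."""
    return start_value - (sum(rewards) + end_value)


def prefer_by_partial_return(rewards_1, rewards_2):
    """P(segment 1 preferred) when preferences depend only on partial return."""
    return logistic(partial_return(rewards_1) - partial_return(rewards_2))


def prefer_by_regret(segment_1, segment_2):
    """P(segment 1 preferred) when lower regret is preferred.

    Each segment is a tuple (rewards, start_value, end_value).
    """
    return logistic(segment_regret(*segment_2) - segment_regret(*segment_1))
```

Two segments with identical rewards but different end states show the difference: the partial return model is indifferent between them, while the regret model prefers the one that ends in a state from which more value can still be obtained.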

Learning · Agent · Episode · Approximation · Reinforcement learning

June 4, 2022

Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning

Dilip Arumugam,Benjamin Van Roy

from arxiv, Accepted to the Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2022

The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable as real-world reinforcement learning demands consideration of a simple, computationally-bounded agent interacting with an overwhelmingly complex environment. In this work, we entertain an extreme scenario wherein some combination of immense environment complexity and limited agent capacity entirely precludes identifying an exactly value-equivalent model. In light of this, we embrace a notion of approximate value equivalence and introduce an algorithm for incrementally synthesizing simple and useful approximations of the environment from which an agent might still recover near-optimal behavior. Crucially, we recognize the information-theoretic nature of this lossy environment compression problem and use the appropriate tools of rate-distortion theory to make mathematically precise how value equivalence can lend tractability to otherwise intractable sequential decision-making problems.

Estimation/Estimator · MoDELS · Performer · Image classification · Bootstrap

June 3, 2022

Estimation of Over-parameterized Models via Fitting to Future Observations

Yiran Jiang,Chuanhai Liu

from arxiv, 20 pages, 6 figures

From a model-building perspective, in this paper we propose a paradigm shift for fitting over-parameterized models. Philosophically, the mindset is to fit models to future observations rather than to the observed sample. Technically, choosing an imputation model for generating future observations, we fit over-parameterized models to future observations via optimizing an approximation to the desired expected loss-function based on its sample counterpart and an adaptive simplicity-preference function. This technique is discussed in detail for both creating the bootstrap imputation and the final estimation with bootstrap imputation. The method is illustrated with the many-normal-means problem, $n < p$ linear regression, and deep convolutional neural networks for image classification of MNIST digits. The numerical results demonstrate superior performance across these three different types of applications. For example, for the many-normal-means problem, our method uniformly dominates James-Stein and Efron's $g$-modeling, and for the MNIST image classification, it performs better than all existing methods and reaches arguably the best possible result. While this paper is largely expository because of the ambitious task of taking a look at over-parameterized models from the new perspective, fundamental theoretical properties are also investigated. We conclude the paper with a few remarks.
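As background for the many-normal-means comparison, the sketch below implements the classical James-Stein estimator, one of the baselines named in the abstract. It is not the paper's method. It assumes one observation per mean with known common variance `sigma2` and shrinkage toward the origin, and the function name `james_stein` is hypothetical.

```python
def james_stein(x, sigma2=1.0):
    """Classical James-Stein shrinkage toward zero for p >= 3 normal means."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires at least 3 means")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return list(x)
    shrink = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrink * v for v in x]
```

For the observation vector (2, 2, 2, 2) with unit variance, the shrinkage factor is 1 - 2/16 = 0.875, so every coordinate is pulled from 2 to 1.75.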

Forward search · Forward · Feasible · Continuity · INFORMS

June 3, 2022

Bidirectional Sampling Based Search Without Two Point Boundary Value Solution

Sharan Nayak,Michael W. Otte

from arxiv, Journal version (Video: //youtu.be/Rumg66UHfyQ). Accepted to IEEE Transactions on Robotics (T-RO)

Bidirectional motion planning approaches decrease planning time, on average, compared to their unidirectional counterparts. In single-query feasible motion planning, using bidirectional search to find a continuous motion plan requires an edge connection between the forward and reverse search trees. Such a tree-tree connection requires solving a two-point Boundary Value Problem (BVP). However, a two-point BVP solution can be difficult or impossible to calculate for many systems. We present a novel bidirectional search strategy that does not require solving the two-point BVP. Instead of connecting the forward and reverse trees directly, the reverse tree's cost information is used as a guiding heuristic for the forward search. This enables the forward search to quickly converge to a feasible solution without solving the two-point BVP. We propose two new algorithms (GBRRT and GABRRT) that use this strategy and run multiple software simulations using multiple dynamical systems and real-world hardware experiments to show that our algorithms perform on par with or better than existing state-of-the-art methods in quickly finding an initial feasible solution.

Learning · Analysis · Isotonic regression · Identifiable · Tractable

June 3, 2022

Market Segmentation Trees

Ali Aouad,Adam N. Elmachtoub,Kris J. Ferreira,Ryan McNellis

We seek to provide an interpretable framework for segmenting users in a population for personalized decision-making. We propose a general methodology, Market Segmentation Trees (MSTs), for learning market segmentations explicitly driven by identifying differences in user response patterns. To demonstrate the versatility of our methodology, we design two new, specialized MST algorithms: (i) Choice Model Trees (CMTs), which can be used to predict a user's choice amongst multiple options and (ii) Isotonic Regression Trees (IRTs), which can be used to solve the bid landscape forecasting problem. We provide a theoretical analysis of the asymptotic running times of our algorithmic methods, which validates their computational tractability on large datasets. We also provide a customizable, open-source code base for training MSTs in Python which employs several strategies for scalability, including parallel processing and warm starts. Finally, we assess the practical performance of MSTs on several synthetic and real world datasets, showing that our method reliably finds market segmentations which accurately model response behavior. Moreover, MSTs are interpretable since the market segments can easily be described by a decision tree and often require only a fraction of the number of market segments generated by traditional approaches.
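The isotonic regression that would be fitted inside each leaf of an Isotonic Regression Tree can be sketched with the standard pool-adjacent-violators algorithm. The version below assumes equal weights and a non-decreasing fit, and it is a generic textbook routine rather than code from the authors' open-source package. The function name `isotonic_regression` is hypothetical.

```python
def isotonic_regression(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators (equal weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

For the input [1, 3, 2, 4], the violating pair (3, 2) is pooled into its mean, giving the fit [1, 2.5, 2.5, 4].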

INFORMS · FCN · Attention · Transform · Fully convolutional network

June 3, 2022

A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

Libo Wang,Rui Li,Chenxi Duan,Ce Zhang,Xiaoliang Meng,Shenghui Fang

from arxiv, Accepted by GRSL. //ieeexplore.ieee.org/abstract/document/9681903

The fully convolutional network (FCN) with an encoder-decoder architecture has been the standard paradigm for semantic segmentation. The encoder-decoder architecture utilizes an encoder to capture multilevel feature maps, which are incorporated into the final prediction by a decoder. As the context is crucial for precise segmentation, tremendous effort has been made to extract such information in an intelligent fashion, including employing dilated/atrous convolutions or inserting attention modules. However, these endeavors are all based on the FCN architecture with ResNet or other backbones, which cannot fully exploit the context from the theoretical concept. By contrast, we introduce the Swin Transformer as the backbone to extract the context information and design a novel decoder of densely connected feature aggregation module (DCFAM) to restore the resolution and produce the segmentation map. The experimental results on two remotely sensed semantic segmentation datasets demonstrate the effectiveness of the proposed scheme. Code is available at //github.com/WangLibo1995/GeoSeg

Attention mechanism · Learned · End-to-end · Networking · Loss function (machine learning)

March 28, 2018

End-to-End Multi-Task Learning with Attention

Shikun Liu,Edward Johns,Andrew J. Davison

from arxiv, submitted to ECCV 2018

In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
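The idea of task-specific soft attention over a shared feature pool can be sketched as an element-wise mask in the range (0, 1) applied to the shared features. The sketch assumes the mask logits are already available for each task. How the network actually computes them is not described in the abstract, and the names `sigmoid` and `task_features` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def task_features(shared_features, mask_logits):
    """Select task-specific features by soft-masking the shared feature vector."""
    if len(shared_features) != len(mask_logits):
        raise ValueError("mask must have the same size as the features")
    return [f * sigmoid(m) for f, m in zip(shared_features, mask_logits)]
```

Because every task reads from the same shared features and differs only in its mask, a feature can be used strongly by one task (mask near 1) and largely ignored by another (mask near 0), and both the shared network and the masks can be trained end to end.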