99热日韩这里只有国产中文精品_91超碰人妻偷情在线播放_高清精品一区二区三区_久久久久久亚洲精品影院_91舒服在线视频_久久水蜜桃国产精品一区_国产传媒一区二区在线观看

Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results to enable consistent improvements in conversational AI systems. However, directly targeting such metrics by off-policy bandit learning objectives often increases the risk of making abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical to address this problem. The proposed method adjusts constraint violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks. Based on the experimental results, we demonstrate that the proposed approach is capable of achieving the best balance between the policy value and constraint satisfaction rate.

相關內容

賭博機(ji)/老(lao)虎(hu)機(ji)

關注 0

相關系數 · Learning · 正則化項 · MoDELS · 層 ·

2022 年 10 月 26 日

On the Versatile Uses of Partial Distance Correlation in Deep Learning

Xingjian Zhen,Zihang Meng,Rudrasis Chakraborty,Vikas Singh

from arxiv, This paper has been selected as best paper award for ECCV 2022!

Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more networks) during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been sparingly used so far. In this paper, we revisit a (less widely known) from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. We describe the steps necessary to carry out its deployment for large scale models -- this opens the door to a surprising array of applications ranging from conditioning one deep model w.r.t. another, learning disentangled representations as well as optimizing diverse models that would directly be more robust to adversarial attacks. Our experiments suggest a versatile regularizer (or constraint) with many advantages, which avoids some of the common difficulties one faces in such analyses. Code is at //github.com/zhenxingjian/Partial_Distance_Correlation.

近似 · 優化器 · 泛化理論 · Pivotal（公司） · Notability ·

2022 年 10 月 26 日

Approximations for Generalized Unsplittable Flow on Paths with Application to Power Systems Optimization

Areg Karapetyan,Khaled Elbassioni,Majid Khonji,Chi-Kin Chau

The Unsplittable Flow on a Path (UFP) problem has garnered considerable attention as a challenging combinatorial optimization problem with notable practical implications. Steered by its pivotal applications in power engineering, the present work formulates a novel generalization of UFP, wherein demands and capacities in the input instance are monotone step functions over the set of edges. As an initial step towards tackling this generalization, we draw on and extend ideas from prior research to devise a quasi-polynomial time approximation scheme (QPTAS) under the premise that the demands and capacities lie in a quasi-polynomial range. Second, retaining the same assumption, an efficient logarithmic approximation is introduced for the single-source variant of the problem. Finally, we round up the contributions by designing a (kind of) black-box reduction that, under some mild conditions, allows to translate LP-based approximation algorithms for the studied problem into their counterparts for the Alternating Current Optimal Power Flow (AC OPF) problem -- a fundamental workflow in operation and control of power systems.

穩健性 · Learning · INFORMS · Performer · 回合 ·

2022 年 10 月 26 日

Uncertainty-based Meta-Reinforcement Learning for Robust Radar Tracking

Julius Ott,Lorenzo Servadei,Gianfranco Mauro,Thomas Stadelmayer,Avik Santra,Robert Wille

from arxiv, accepted at ICMLA 2022

Nowadays, Deep Learning (DL) methods often overcome the limitations of traditional signal processing approaches. Nevertheless, DL methods are barely applied in real-life applications. This is mainly due to limited robustness and distributional shift between training and test data. To this end, recent work has proposed uncertainty mechanisms to increase their reliability. Besides, meta-learning aims at improving the generalization capability of DL models. By taking advantage of that, this paper proposes an uncertainty-based Meta-Reinforcement Learning (Meta-RL) approach with Out-of-Distribution (OOD) detection. The presented method performs a given task in unseen environments and provides information about its complexity. This is done by determining first and second-order statistics on the estimated reward. Using information about its complexity, the proposed algorithm is able to point out when tracking is reliable. To evaluate the proposed method, we benchmark it on a radar-tracking dataset. There, we show that our method outperforms related Meta-RL approaches on unseen tracking scenarios in peak performance by 16% and the baseline by 35% while detecting OOD data with an F1-Score of 72%. This shows that our method is robust to environmental changes and reliably detects OOD scenarios.

約束優化 · Learning · Machine Learning · 優化器 · 設計 ·

2022 年 10 月 25 日

UNIFY: a Unified Policy Designing Framework for Solving Constrained Optimization Problems with Machine Learning

Mattia Silvestri,Allegra De Filippo,Michele Lombardi,Michela Milano

The interplay between Machine Learning (ML) and Constrained Optimization (CO) has recently been the subject of increasing interest, leading to a new and prolific research area covering (e.g.) Decision Focused Learning and Constrained Reinforcement Learning. Such approaches strive to tackle complex decision problems under uncertainty over multiple stages, involving both explicit (cost function, constraints) and implicit knowledge (from data), and possibly subject to execution time restrictions. While a good degree of success has been achieved, the existing methods still have limitations in terms of both applicability and effectiveness. For problems in this class, we propose UNIFY, a unified framework to design a solution policy for complex decision-making problems. Our approach relies on a clever decomposition of the policy in two stages, namely an unconstrained ML model and a CO problem, to take advantage of the strength of each approach while compensating for its weaknesses. With a little design effort, UNIFY can generalize several existing approaches, thus extending their applicability. We demonstrate the method effectiveness on two practical problems, namely an Energy Management System and the Set Multi-cover with stochastic coverage requirements. Finally, we highlight some current challenges of our method and future research directions that can benefit from the cross-fertilization of the two fields.

優化器 · Continuity · Extensibility · 相互獨立的 · MoDELS ·

2022 年 10 月 25 日

DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems

Ruizhong Qiu,Zhiqing Sun,Yiming Yang

from arxiv, Accepted at NeurIPS 2022

Recently, deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization (CO) problems. However, most DRL solvers can only scale to a few hundreds of nodes for combinatorial optimization problems on graphs, such as the Traveling Salesman Problem (TSP). This paper addresses the scalability challenge in large-scale combinatorial optimization by proposing a novel approach, namely, DIMES. Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions. Such a continuous space allows stable REINFORCE-based training and fine-tuning via massively parallel sampling. We further propose a meta-learning framework to enable the effective initialization of model parameters in the fine-tuning stage. Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.

覆蓋 · 損失 · 優化器 · 估計/估計量 · Performer ·

2022 年 10 月 25 日

Evaluating and Optimizing Hearing-Aid Self-Fitting Methods using Population Coverage

Dhruv Vyas,Erik Jorgensen,Yu-Hsiang Wu,Octav Chipara

Adults with mild-to-moderate hearing loss can use over-the-counter hearing aids to treat their hearing loss at a fraction of traditional hearing care costs. These products incorporate self-fitting methods that allow end-users to configure their hearing aids without the help of an audiologist. A self-fitting method helps users configure the gain-frequency responses that control the amplification for each frequency band of the incoming sound. This paper considers how to design effective self-fitting methods and whether we may evaluate certain aspects of their design without resorting to expensive user studies. Most existing fitting methods provide various user interfaces to allow users to select a configuration from a predetermined set of presets. We propose a novel metric for evaluating the performance of preset-based approaches by computing their population coverage. The population coverage estimates the fraction of users for which it is possible to find a configuration they prefer. A unique aspect of our approach is a probabilistic model that captures how a user's unique preferences differ from other users with similar hearing loss. Next, we develop methods for determining presets to maximize population coverage. Exploratory results demonstrate that the proposed algorithms can effectively select a small number of presets that provide higher population coverage than clustering-based approaches. Moreover, we may use our algorithms to configure the number of increments for slider-based methods.

簇 · 近似 · 易處理的 · 可辨認的 · 講稿 ·

2022 年 10 月 24 日

Clustering with fair-center representation: parameterized approximation algorithms and heuristics

Suhas Thejaswi,Ameet Gadekar,Bruno Ordozgoiti,Michal Osadnik

We study a variant of classical clustering formulations in the context of algorithmic fairness, known as diversity-aware clustering. In this variant we are given a collection of facility subsets, and a solution must contain at least a specified number of facilities from each subset while simultaneously minimizing the clustering objective ($k$-median or $k$-means). We investigate the fixed-parameter tractability of these problems and show several negative hardness and inapproximability results, even when we afford exponential running time with respect to some parameters. Motivated by these results we identify natural parameters of the problem, and present fixed-parameter approximation algorithms with approximation ratios $\big(1 + \frac{2}{e} +\epsilon \big)$ and $\big(1 + \frac{8}{e}+ \epsilon \big)$ for diversity-aware $k$-median and diversity-aware $k$-means respectively, and argue that these ratios are essentially tight assuming the gap-exponential time hypothesis. We also present a simple and more practical bicriteria approximation algorithm with better running time bounds. We finally propose efficient and practical heuristics. We evaluate the scalability and effectiveness of our methods in a wide variety of rigorously conducted experiments, on both real and synthetic data.

剪枝 · Performer · Weight · Networking · Neural Networks ·

2022 年 10 月 24 日

On the optimization and pruning for Bayesian deep learning

Xiongwen Ke,Yanan Fan

from arxiv, 11 pages

The goal of Bayesian deep learning is to provide uncertainty quantification via the posterior distribution. However, exact inference over the weight space is computationally intractable due to the ultra-high dimensions of the neural network. Variational inference (VI) is a promising approach, but naive application on weight space does not scale well and often underperform on predictive accuracy. In this paper, we propose a new adaptive variational Bayesian algorithm to train neural networks on weight space that achieves high predictive accuracy. By showing that there is an equivalence to Stochastic Gradient Hamiltonian Monte Carlo(SGHMC) with preconditioning matrix, we then propose an MCMC within EM algorithm, which incorporates the spike-and-slab prior to capture the sparsity of the neural network. The EM-MCMC algorithm allows us to perform optimization and model pruning within one-shot. We evaluate our methods on CIFAR-10, CIFAR-100 and ImageNet datasets, and demonstrate that our dense model can reach the state-of-the-art performance and our sparse model perform very well compared to previously proposed pruning schemes.

Facebook AI Research · 正則化項 · 優化器 · Markov · 分解的 ·

2022 年 10 月 22 日

Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems

Eric Yang Yu,Zhizhen Qin,Min Kyung Lee,Sicun Gao

from arxiv, 14 pages, published in NeurIPS 2022

Long-term fairness is an important factor of consideration in designing and deploying learning-based decision systems in high-stake decision-making contexts. Recent work has proposed the use of Markov Decision Processes (MDPs) to formulate decision-making with long-term fairness requirements in dynamically changing environments, and demonstrated major challenges in directly deploying heuristic and rule-based policies that worked well in static environments. We show that policy optimization methods from deep reinforcement learning can be used to find strictly better decision policies that can often achieve both higher overall utility and less violation of the fairness requirements, compared to previously-known strategies. In particular, we propose new methods for imposing fairness requirements in policy optimization by regularizing the advantage evaluation of different actions. Our proposed methods make it easy to impose fairness constraints without reward engineering or sacrificing training efficiency. We perform detailed analyses in three established case studies, including attention allocation in incident monitoring, bank loan approval, and vaccine distribution in population networks.

state-of-the-art · 任務對話系統 · CASE · 自動問答 · GROUP ·

2018 年 9 月 21 日

Neural Approaches to Conversational AI

Jianfeng Gao,Michel Galley,Lihong Li

from arxiv, Submitted to Foundations and Trends in Information Retrieval (85 pages)

The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.