In the dynamic mechanism design literature, one critical aspect has typically been ignored: the agents' periodic participation, which they can adapt and plan strategically. We propose a framework for dynamic principal-multiagent problems, augmenting the classic model by incorporating agents' periodic coupled decisions on participation and regular action selections. The principal faces adverse selection and designs a mechanism comprising a task policy profile (defining evolving agent action menus), a coupling policy profile (affecting agent utilities), and an off-switch function profile (assigning rewards or penalties upon agent withdrawal). First, we introduce payoff-flow conservation, a sufficient condition to ensure dynamic incentive compatibility for regular actions. Second, we formulate a unique process, persistence transformation, which integrates the task policy's implicit functions and enables a closed-form derivation of the off-switch function. This secures sufficient conditions for the incentive compatibility of the agents' coupled decisions, in line with the principal's preferences. Third, we go beyond the traditional envelope theorem by presenting a necessary condition for incentive compatibility, leveraging the coupled optimality of principal-desired actions. This approach helps explicitly formulate both the coupling and off-switch functions. Finally, we establish envelope-like conditions exclusively on the task policies, facilitating the application of the first-order approach.

Related content

Performer · INTERACT · Markov · Shaping · Maximum

August 2, 2023

Reward Shaping for Building Trustworthy Robots in Sequential Human-Robot Interaction

Yaohui Guo,X. Jessie Yang,Cong Shi

from arxiv, In Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Trust-aware human-robot interaction (HRI) has received increasing research attention, as trust has been shown to be a crucial factor for effective HRI. Research in trust-aware HRI discovered a dilemma -- maximizing task rewards often leads to decreased human trust, while maximizing human trust would compromise task performance. In this work, we address this dilemma by formulating the HRI process as a two-player Markov game and utilizing the reward-shaping technique to improve human trust while limiting performance loss. Specifically, we show that when the shaping reward is potential-based, the performance loss can be bounded by the potential functions evaluated at the final states of the Markov game. We apply the proposed framework to the experience-based trust model, resulting in a linear program that can be efficiently solved and deployed in real-world applications. We evaluate the proposed framework in a simulation scenario where a human-robot team performs a search-and-rescue mission. The results demonstrate that the proposed framework successfully modifies the robot's optimal policy, enabling it to increase human trust at a minimal task performance cost.
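The claim that a potential-based shaping reward changes the return only through the potential at the boundary states can be checked numerically. The following is a minimal single-agent sketch, not the paper's two-player Markov game or its linear program; the trajectory, the rewards, and the `potential` function are hypothetical values chosen for illustration.

```python
def shaped_rewards(rewards, states, potential, gamma=1.0):
    """Add the shaping term F(s, s') = gamma * potential(s') - potential(s) to each reward."""
    return [
        r + gamma * potential(s_next) - potential(s)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


def discounted_return(rewards, gamma=1.0):
    """Discounted sum of a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical toy trajectory: four states, three transitions.
states = [0, 1, 2, 3]
rewards = [1.0, 0.0, 2.0]
gamma = 0.9


def potential(state):
    # Hypothetical potential function; any real-valued function of the state works.
    return 0.5 * state


original = discounted_return(rewards, gamma)
shaped = discounted_return(shaped_rewards(rewards, states, potential, gamma), gamma)

# The shaping terms telescope, so the gap depends only on the first and last states.
gap = shaped - original
expected_gap = gamma ** (len(states) - 1) * potential(states[-1]) - potential(states[0])
```

Because the intermediate potentials cancel, `gap` equals `expected_gap` for any trajectory, which is why the change in return can be expressed through the potential at the final state once the initial state is fixed.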

Pruning · Performer · Value function · Functional · Partially observable Markov decision process

August 1, 2023

Data Association Aware POMDP Planning with Hypothesis Pruning Performance Guarantees

Moran Barenboim,Idan Lev-Yehudi,Vadim Indelman

Autonomous agents that operate in the real world must often deal with partial observability, which is commonly modeled as partially observable Markov decision processes (POMDPs). However, traditional POMDP models rely on the assumption of complete knowledge of the observation source, known as fully observable data association. To address this limitation, we propose a planning algorithm that maintains multiple data association hypotheses, represented as a belief mixture, where each component corresponds to a different data association hypothesis. However, this method can lead to an exponential growth in the number of hypotheses, resulting in significant computational overhead. To overcome this challenge, we introduce a pruning-based approach for planning with ambiguous data associations. Our key contribution is to derive bounds between the value function based on the complete set of hypotheses and the value function based on a pruned subset of the hypotheses, enabling us to establish a trade-off between computational efficiency and performance. We demonstrate how these bounds can be used to certify any pruning heuristic in retrospect, and we propose a novel approach to determine which hypotheses to prune in order to ensure a predefined limit on the loss. We evaluate our approach in simulated environments and demonstrate its efficacy in handling multi-modal belief hypotheses with ambiguous data associations.
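The idea of pruning hypotheses while keeping the loss under a predefined limit can be illustrated with a deliberately simple bound. The sketch below is not the bound derived in the paper. It assumes every per-hypothesis value lies in `[0, v_max]`, in which case the gap between the full mixture value and the renormalized pruned mixture value is at most the pruned weight mass times `v_max`. The function name, weights, and values are hypothetical.

```python
def prune_hypotheses(weights, max_loss, v_max):
    """Drop the lowest-weight hypotheses while pruned_mass * v_max stays within max_loss.

    Assumes each hypothesis value lies in [0, v_max] (an illustrative assumption).
    Returns the kept indices, their renormalized weights, and the loss bound.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    pruned, mass = set(), 0.0
    for i in order[:-1]:  # always keep at least the heaviest hypothesis
        if (mass + weights[i]) * v_max > max_loss:
            break
        pruned.add(i)
        mass += weights[i]
    kept = [i for i in range(len(weights)) if i not in pruned]
    kept_weights = [weights[i] / (1.0 - mass) for i in kept]
    return kept, kept_weights, mass * v_max


# Hypothetical belief mixture over four data association hypotheses.
weights = [0.6, 0.25, 0.1, 0.05]
values = [8.0, 3.0, 10.0, 0.0]  # hypothetical per-hypothesis values in [0, 10]

kept, kept_weights, bound = prune_hypotheses(weights, max_loss=2.0, v_max=10.0)

full_value = sum(w * v for w, v in zip(weights, values))
pruned_value = sum(w * values[i] for w, i in zip(kept_weights, kept))
```

With these numbers the two lightest hypotheses are dropped, and the difference between `full_value` and `pruned_value` stays below `bound`. Checking the bound after the fact for an arbitrary pruning rule mirrors the idea of certifying a heuristic in retrospect.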

INFORMS · Design · MoDELS · Precision/accuracy · Analysis

July 31, 2023

Augmented Symbolic Execution for Information Flow in Hardware Designs

Kaki Ryan,Matthew Gregoire,Cynthia Sturton

We present SEIF, a methodology that combines static analysis with symbolic execution to verify and explicate information flow paths in a hardware design. SEIF begins with a statically built model of the information flow through a design and uses guided symbolic execution to recognize and eliminate non-flows with high precision or to find corresponding paths through the design state for true flows. We evaluate SEIF on two open-source CPUs, an AES core, and the AKER access control module. SEIF can exhaustively explore 10-12 clock cycles deep in 4-6 seconds on average, and can automatically account for 86-90% of the paths in the statically built model. Additionally, SEIF can be used to find multiple violating paths for security properties, providing a new angle for security verification.

Knowledge · Comprehensibility · Distillation · End-to-end · Learning

July 31, 2023

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding

Umberto Cappellazzo,Muqiao Yang,Daniele Falavigna,Alessio Brutti

from arxiv, Accepted at INTERSPEECH 2023. Code (will be) available at //github.com/umbertocappellazzo/SLURP-SeqKD

The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acquired knowledge leads to the catastrophic forgetting issue. In this work we tackle the problem of Spoken Language Understanding applied to a continual learning setting. We first define a class-incremental scenario for the SLURP dataset. Then, we propose three knowledge distillation (KD) approaches to mitigate forgetting for a sequence-to-sequence transformer model: the first KD method is applied to the encoder output (audio-KD), and the other two work on the decoder output, either directly on the token-level (tok-KD) or on the sequence-level (seq-KD) distributions. We show that the seq-KD substantially improves all the performance metrics, and its combination with the audio-KD further decreases the average WER and enhances the entity prediction metric.

Latent · MoDELS · Homogeneous · Image segmentation · Irreducible

July 31, 2023

Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging

M. M. Amaan Valiuddin,Christiaan G. A. Viviers,Ruud J. G. van Sloun,Peter H. N. de With,Fons van der Sommen

from arxiv, 12 pages incl. references, 11 figures

Data uncertainties, such as sensor noise or occlusions, can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. Latent density models can be utilized to address this problem in image segmentation. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely inhomogeneous. As a result, the effectiveness of gradient descent is inhibited and the model becomes extremely sensitive to the localization of the latent space samples, resulting in defective predictions. To address this, we present the Sinkhorn PU-Net (SPU-Net), which uses the Sinkhorn Divergence to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and model robustness. Our results show that, when applied to public datasets of various clinical segmentation problems, the SPU-Net achieves performance gains of up to 11% over preceding latent variable models for probabilistic segmentation on the Hungarian-Matched metric. The results indicate that by encouraging a homogeneous latent space, one can significantly improve latent density modeling for medical image segmentation.

Learning · TOOLS · Policy search · HTTPS · Robot

July 31, 2023

Learning Generalizable Tool Use with Non-rigid Grasp-pose Registration

Malte Mosbach,Sven Behnke

Tool use, a hallmark feature of human intelligence, remains a challenging problem in robotics due to the complex contacts and high-dimensional action space. In this work, we present a novel method to enable reinforcement learning of tool use behaviors. Our approach provides a scalable way to learn the operation of tools in a new category using only a single demonstration. To this end, we propose a new method for generalizing grasping configurations of multi-fingered robotic hands to novel objects. This is used to guide the policy search via favorable initializations and a shaped reward signal. The learned policies solve complex tool use tasks and generalize to unseen tools at test time. Visualizations and videos of the trained policies are available at //maltemosbach.github.io/generalizable_tool_use.

MoDELS · Estimation/estimator · Controller · Markov chain Monte Carlo · Markov

July 30, 2023

A Switching State-Space Transmission Model for Tracking Epidemics and Assessing Interventions

Jingxue Feng,Liangliang Wang

from arxiv, 41 pages, 16 figures

The effective control of infectious diseases relies on accurate assessment of the impact of interventions, which is often hindered by the complex dynamics of the spread of disease. We propose a Beta-Dirichlet switching state-space transmission model to track the underlying dynamics of disease and evaluate the effectiveness of interventions simultaneously. As time evolves, the switching mechanism introduced in the susceptible-exposed-infected-recovered (SEIR) model is able to capture the timing and magnitude of changes in the transmission rate due to the effectiveness of control measures. The implementation of this model is based on a particle Markov Chain Monte Carlo algorithm, which can estimate the time evolution of SEIR states, switching states, and high-dimensional parameters efficiently. The efficacy of our model and estimation procedure is demonstrated through simulation studies. In a real-world application to British Columbia's COVID-19 outbreak, the model indicates an approximately 66.6% reduction in the transmission rate following interventions such as distancing, closures and vaccination. Our proposed model provides a promising tool to inform public health policies aimed at studying the underlying dynamics and evaluating the effectiveness of interventions during the spread of the disease.
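To make the role of a switching transmission rate concrete, here is a minimal deterministic sketch of a discrete-time SEIR model whose transmission rate depends on a regime label at each step. It is not the paper's Beta-Dirichlet state-space model and involves no particle MCMC inference; the function name, the rates, and the regime schedule are hypothetical.

```python
def simulate_seir(regimes, beta_by_regime, incubation_rate=0.2, recovery_rate=0.1,
                  init=(0.99, 0.0, 0.01, 0.0)):
    """Simulate population fractions (S, E, I, R), one step per entry in `regimes`.

    Each entry of `regimes` selects the transmission rate used at that step,
    so a change of regime models an intervention taking effect.
    """
    s, e, i, r = init
    path = [(s, e, i, r)]
    for regime in regimes:
        beta = beta_by_regime[regime]
        new_exposed = beta * s * i
        new_infectious = incubation_rate * e
        new_recovered = recovery_rate * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        path.append((s, e, i, r))
    return path


# Hypothetical rates: regime 0 is no intervention, regime 1 is after an intervention.
beta_by_regime = {0: 0.6, 1: 0.2}

baseline = simulate_seir([0] * 100, beta_by_regime)
switched = simulate_seir([0] * 10 + [1] * 90, beta_by_regime)  # switch after step 10
```

The two paths agree up to the switch and diverge immediately afterwards, because the lower transmission rate moves fewer people from susceptible to exposed per step. In the paper's setting, the regime sequence is latent and is estimated jointly with the SEIR states, whereas here it is supplied by hand.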

Robustness · MoDELS · Biased · INFORMS · Better

July 28, 2023

Towards Building More Robust Models with Frequency Bias

Qingwen Bu,Dong Huang,Heming Cui

from arxiv, Accepted by ICCV23

The vulnerability of deep neural networks to adversarial samples has been a major impediment to their broad applications, despite their success in various fields. Recently, some works suggested that adversarially-trained models emphasize the importance of low-frequency information to achieve higher robustness. While several attempts have been made to leverage this frequency characteristic, they have all faced the issue that applying low-pass filters directly to input images leads to irreversible loss of discriminative information and poor generalizability to datasets with distinct frequency features. This paper presents a plug-and-play module called the Frequency Preference Control Module that adaptively reconfigures the low- and high-frequency components of intermediate feature representations, providing better utilization of frequency in robust learning. Empirical studies show that our proposed module can be easily incorporated into any adversarial training framework, further improving model robustness across different architectures and datasets. Additionally, experiments were conducted to examine how the frequency bias of robust models impacts the adversarial training process and its final robustness, revealing interesting insights.

Learning · Agent · INTERACT · Deep reinforcement learning · motivation

August 2, 2022

Deep Reinforcement Learning for Multi-Agent Interaction

Ibrahim H. Ahmed,Cillian Brewitt,Ignacio Carlucho,Filippos Christianos,Mhairi Dunion,Elliot Fosong,Samuel Garcin,Shangmin Guo,Balint Gyevnar,Trevor McInroe,Georgios Papoudakis,Arrasy Rahman,Lukas Schäfer,Massimiliano Tamborski,Giuseppe Vecchio,Cheng Wang,Stefano V. Albrecht

from arxiv, Published in AI Communications Special Issue on Multi-Agent Systems Research in the UK

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

INFORMS · Identifiable · Networking · Neural Networks · Black box

October 4, 2021

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

Yang Zhang,Ashkan Khakzar,Yawei Li,Azade Farshad,Seong Tae Kim,Nassir Navab

from arxiv, Accepted in NeurIPS 2021 (Neural Information Processing Systems)

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features was recently proposed as a proxy for the measure of their importance. So far, the predictive information has only been identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.