
Design · Graph · MoDELS · Paper · Understandability ·

November 8, 2024

Parameterized Voter Relevance in Facility Location Games with Tree-Shaped Invitation Graphs

Ryoto Ando, Kei Kimura, Taiki Todo, Makoto Yokoo

Diffusion mechanism design, which investigates how to incentivise agents to invite as many colleagues as possible to a multi-agent decision-making process, is a new research paradigm at the intersection of microeconomics and computer science. In this paper we extend traditional facility location games into the model of diffusion mechanism design. Our objective is to fully understand the extent to which anonymity and voter relevance can be achieved, together with strategy-proofness and Pareto efficiency, when voters strategically invite colleagues. We define a series of anonymity properties applicable to the diffusion mechanism design model, as well as parameterized voter-relevance properties for guaranteeing reasonably fair decision making. We obtain two impossibility theorems and two existence theorems, which partially answer the question raised at the beginning of the paper.
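As background for the facility location setting, the classical (non-diffusion) rule on a line is easy to state: when each voter's cost is its distance to the facility, placing the facility at a median of the reported positions is strategy-proof and Pareto efficient. The Python sketch below implements only that textbook median rule; it ignores invitations entirely, is not one of the mechanisms studied in the paper, and the names `median_mechanism` and `cost` are hypothetical.

```python
from typing import Sequence


def median_mechanism(reports: Sequence[float]) -> float:
    """Place the facility at the leftmost median of the reported positions.

    Textbook rule for distance costs on a line; no invitation graph is
    modelled here. Taking the leftmost median keeps the rule deterministic
    when the number of voters is even.
    """
    if not reports:
        raise ValueError("at least one report is required")
    ordered = sorted(reports)
    return ordered[(len(ordered) - 1) // 2]


def cost(true_position: float, facility: float) -> float:
    # A voter's cost is the distance from its true position to the facility.
    return abs(true_position - facility)


if __name__ == "__main__":
    truthful = [1.0, 4.0, 9.0]
    print(median_mechanism(truthful))  # 4.0
    # The voter at 9.0 cannot lower its cost by misreporting: reports above
    # the median leave it unchanged, reports below pull it further away.
    honest_cost = cost(9.0, median_mechanism(truthful))
    for lie in [-10.0, 0.0, 2.0, 5.0, 50.0]:
        assert cost(9.0, median_mechanism([1.0, 4.0, lie])) >= honest_cost
```

Misreporting can only keep the median where it is or move it away from the manipulator, which is the intuition behind the rule's strategy-proofness.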

Related Content

Design is a process of rethinking the status quo, breaking it apart, and reassembling it; design makes everything more beautiful.

Agent · Round · Full · Controller · Linear ·

December 20, 2024

LTLf Synthesis on First-Order Agent Programs in Nondeterministic Environments

Till Hofmann, Jens Claßen

From arXiv. Accepted at AAAI'25.

We investigate the synthesis of policies for high-level agent programs expressed in Golog, a language based on situation calculus that incorporates nondeterministic programming constructs. Unlike traditional approaches for program realization that assume full agent control or rely on incremental search, we address scenarios where environmental nondeterminism significantly influences program outcomes. Our synthesis problem involves deriving a policy that successfully realizes a given Golog program while ensuring the satisfaction of a temporal specification, expressed in Linear Temporal Logic on finite traces (LTLf), across all possible environmental behaviors. By leveraging an expressive class of first-order action theories, we construct a finite game arena that encapsulates program executions and tracks the satisfaction of the temporal goal. A game-theoretic approach is employed to derive such a policy. Experimental results demonstrate this approach's feasibility in domains with unbounded objects and non-local effects. This work bridges agent programming and temporal logic synthesis, providing a framework for robust agent behavior in nondeterministic environments.

Learning · Decomposition · Robots · Reducible · Adversarial Learning ·

December 19, 2024

Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration

Junjia Liu, Zhuo Li, Minghao Yu, Zhipeng Dong, Sylvain Calinon, Darwin Caldwell, Fei Chen

From arXiv. 9 pages, 8 figures. Accepted by IEEE Robotics and Automation Magazine.

Humanoid robots are envisioned as embodied intelligent agents capable of performing a wide range of human-level loco-manipulation tasks, particularly in scenarios requiring strenuous and repetitive labor. However, learning these skills is challenging due to the high degrees of freedom of humanoid robots, and collecting sufficient training data for humanoids is a laborious process. Given the rapid introduction of new humanoid platforms, a cross-embodiment framework that allows generalizable skill transfer is becoming increasingly critical. To address this, we propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype and bypassing the need for retraining on every new robot platform. The model learns behavior primitives from human demonstrations through adversarial imitation, and the complex robot structures are decomposed into functional components, each trained independently and dynamically coordinated. Task generalization is achieved through a human-object interaction graph, and skills are transferred to different robots via embodiment-specific kinematic motion retargeting and dynamic fine-tuning. Our framework is validated on five humanoid robots with diverse configurations, demonstrating stable loco-manipulation and highlighting its effectiveness in reducing data requirements and increasing the efficiency of skill transfer across platforms.

Agent · Facebook AI Research · Principle · Processing (programming language) · Robustness ·

December 19, 2024

Operationalising Rawlsian Ethics for Fairness in Norm-Learning Agents

Jessica Woodgate, Paul Marshall, Nirav Ajmeri

From arXiv. 14 pages, 7 figures, 8 tables (and supplementary material with reproducibility and additional results). Accepted at AAAI 2025.

Social norms are standards of behaviour common in a society. However, when agents make decisions without considering how others are impacted, norms can emerge that lead to the subjugation of certain agents. We present RAWL-E, a method to create ethical norm-learning agents. RAWL-E agents operationalise maximin, a fairness principle from Rawlsian ethics, in their decision-making processes to promote ethical norms by balancing societal well-being with individual goals. We evaluate RAWL-E agents in simulated harvesting scenarios. We find that norms emerging in RAWL-E agent societies enhance social welfare, fairness, and robustness, and yield higher minimum experience compared to those that emerge in agent societies that do not implement Rawlsian ethics.
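The maximin principle that RAWL-E operationalises can be illustrated in isolation: among candidate actions, choose the one whose worst-off agent fares best. The sketch below shows only this generic decision rule; the welfare table, the tie-break by total welfare, and the name `maximin_choice` are illustrative assumptions rather than RAWL-E's actual procedure.

```python
from typing import Dict, List


def maximin_choice(outcomes: Dict[str, List[float]]) -> str:
    """Return the action whose minimum per-agent welfare is largest.

    `outcomes` maps each candidate action to the welfare each agent would
    receive under it. Ties on the minimum are broken by total welfare
    (an assumption made here for illustration).
    """
    if not outcomes:
        raise ValueError("no actions to choose from")
    return max(outcomes, key=lambda a: (min(outcomes[a]), sum(outcomes[a])))


if __name__ == "__main__":
    # Hypothetical welfare for three agents under three harvesting actions.
    table = {
        "harvest_all": [9.0, 1.0, 1.0],   # total 11, worst-off agent gets 1
        "share_evenly": [3.0, 3.0, 3.0],  # total 9, worst-off agent gets 3
        "share_some": [5.0, 2.0, 3.0],    # total 10, worst-off agent gets 2
    }
    print(maximin_choice(table))  # share_evenly
```

A purely utilitarian rule maximising total welfare would pick `harvest_all` from the same table; maximin prefers `share_evenly` because its worst-off agent receives 3 rather than 1.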

Analysis · MoDELS · Estimation/Estimator · Confidence · Continuity ·

December 19, 2024

Sharp Bounds for Continuous-Valued Treatment Effects with Unobserved Confounders

Jean-Baptiste Baitairian, Bernard Sebastien, Rana Jreich, Sandrine Katsahian, Agathe Guilloux

In causal inference, treatment effects are typically estimated under the ignorability, or unconfoundedness, assumption, which is often unrealistic in observational data. By relaxing this assumption and conducting a sensitivity analysis, we introduce novel bounds and derive confidence intervals for the Average Potential Outcome (APO) - a standard metric for evaluating continuous-valued treatment or exposure effects. We demonstrate that these bounds are sharp under a continuous sensitivity model, in the sense that they give the smallest possible interval under this model, and propose a doubly robust version of our estimators. In a comparative analysis with the method of Jesson et al. (2022) (arXiv:2204.10022), using both simulated and real datasets, we show that our approach not only yields sharper bounds but also achieves good coverage of the true APO, with significantly reduced computation times.

Learning · Sparse · Optimizer · Reinforcement Learning · Credit Assignment Problem ·

December 19, 2024

Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning

Aditya Kapoor, Sushant Swamy, Kale-ab Tessera, Mayank Baranwal, Mingfei Sun, Harshad Khadilkar, Stefano V. Albrecht

From arXiv. 12 pages, 1 figure.

In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards, particularly in long-horizon tasks where it is challenging to evaluate actions at intermediate time steps. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem by redistributing sparse rewards both temporally and across agents. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards. We theoretically prove that TAR$^2$ is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirical results demonstrate that TAR$^2$ stabilizes and accelerates the learning process. Additionally, we show that when TAR$^2$ is integrated with single-agent reinforcement learning algorithms, it performs as well as or better than traditional multi-agent reinforcement learning methods.
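The policy-preservation guarantee rests on a standard property of potential-based reward shaping: adding $F(s, s') = \gamma\Phi(s') - \Phi(s)$ to every reward changes an episode's discounted return by $\gamma^T\Phi(s_T) - \Phi(s_0)$, a quantity that depends only on the first and last states. The sketch below checks that telescoping identity numerically; it is not TAR$^2$'s redistribution scheme, and the potential function, episode, and helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def shaped_rewards(
    states: Sequence[int],
    rewards: Sequence[float],
    potential: Callable[[int], float],
    gamma: float,
) -> List[float]:
    """Add the shaping term gamma * Phi(s') - Phi(s) to each reward.

    `states` has one more entry than `rewards`: rewards[t] is received on
    the transition states[t] -> states[t + 1].
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("need exactly one more state than rewards")
    return [
        r + gamma * potential(states[t + 1]) - potential(states[t])
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def phi(s: int) -> float:
    # Hypothetical potential: states further along the episode look better.
    return 0.5 * s


if __name__ == "__main__":
    gamma = 0.9
    states = [0, 1, 2, 3, 4]        # a 4-step episode
    rewards = [0.0, 0.0, 0.0, 1.0]  # sparse: reward only at the end
    shaped = shaped_rewards(states, rewards, phi, gamma)
    gap = discounted_return(shaped, gamma) - discounted_return(rewards, gamma)
    # The gap telescopes to gamma^T * Phi(s_T) - Phi(s_0).
    expected = gamma ** len(rewards) * phi(states[-1]) - phi(states[0])
    print(math.isclose(gap, expected))  # True
```

With the usual convention that $\Phi = 0$ at terminal states, the gap reduces to $-\Phi(s_0)$, which no choice of actions can change; this is why the ranking of policies, and hence the optimal policy, is preserved.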

Networking · MoDELS · Similarity · motivation · Substitution ·

December 18, 2024

Decentralized Convergence to Equilibrium Prices in Trading Networks

Edwin Lock, Benjamin Patrick Evans, Eleonora Kreacic, Sujay Bhatt, Alec Koppel, Sumitra Ganesh, Paul W. Goldberg

From arXiv. Extended version of a paper accepted at AAAI'25.

We propose a decentralized market model in which agents can negotiate bilateral contracts. This builds on a similar, but centralized, model of trading networks introduced by Hatfield et al. (2013). Prior work has established that fully substitutable preferences guarantee the existence of competitive equilibria, which can be computed centrally. Our motivation comes from the fact that prices in markets such as over-the-counter markets and used-car markets arise from decentralized negotiation among agents, which leaves open the important question of whether equilibrium prices can emerge from agent-to-agent bilateral negotiations. We design a best response dynamic intended to capture such negotiations between market participants. We assume fully substitutable preferences for market participants. In this setting, we provide proofs of convergence for sparse markets (covering many real-world markets of interest) and experimental results for more general cases, demonstrating that prices do reach equilibrium quickly via bilateral negotiations. Our best response dynamic and its convergence behavior form an important first step in understanding how decentralized markets reach, and retain, equilibrium.

Threshold · UCT · Monte Carlo · Monte Carlo Tree Search · Markov ·

December 18, 2024

Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves

Martin Kurečka, Václav Nevyhoštěny, Petr Novotny, Vít Unčovsky

Constrained Markov decision processes (CMDPs), in which the agent optimizes expected payoffs while keeping the expected cost below a given threshold, are the leading framework for safe sequential decision making under stochastic uncertainty. Among algorithms for planning and learning in CMDPs, methods based on Monte Carlo tree search (MCTS) are particularly important due to their efficiency and extensibility to more complex frameworks (such as partially observable settings and games). However, current MCTS-based methods for CMDPs either struggle to find safe (i.e., constraint-satisfying) policies or are too conservative and do not find valuable policies. We introduce Threshold UCT (T-UCT), an online MCTS-based algorithm for CMDP planning. Unlike previous MCTS-based CMDP planners, T-UCT explicitly estimates Pareto curves of cost-utility trade-offs throughout the search tree and uses them, together with novel action-selection and threshold-update rules, to seek safe and valuable policies. Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature.
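The abstract does not spell out T-UCT's Pareto-curve-based selection rule, so for context the sketch below shows only the standard UCT child-selection step (the UCB1 score) that the algorithm's name refers to. The exploration constant $c = \sqrt{2}$, the `ChildStats` record, and approximating the parent's visit count by the sum of its children's visits are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ChildStats:
    action: str
    visits: int         # times this child has been selected
    total_value: float  # sum of returns backed up through this child


def uct_select(children: List[ChildStats], c: float = math.sqrt(2)) -> ChildStats:
    """Pick the child maximising mean value + c * sqrt(ln N / n) (UCB1).

    Unvisited children are returned first so every action is tried once.
    N is approximated here by the sum of the children's visit counts.
    """
    if not children:
        raise ValueError("node has no children")
    for child in children:
        if child.visits == 0:
            return child
    parent_visits = sum(ch.visits for ch in children)

    def score(ch: ChildStats) -> float:
        mean = ch.total_value / ch.visits
        return mean + c * math.sqrt(math.log(parent_visits) / ch.visits)

    return max(children, key=score)


if __name__ == "__main__":
    a = ChildStats("a", visits=10, total_value=6.0)  # mean 0.6
    b = ChildStats("b", visits=2, total_value=1.0)   # mean 0.5, rarely tried
    print(uct_select([a, b]).action)         # b (exploration bonus dominates)
    print(uct_select([a, b], c=0.0).action)  # a (pure exploitation)
```

Plain UCT has no notion of cost, which is the gap the abstract describes: a cost-constrained planner must also track cost-utility trade-offs when choosing actions.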

Learning · Design · Agent · Online · Functional ·

December 18, 2024

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

From arXiv. Accepted in JMLR 2024.

Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment. We consider the problem where the agents interact with the mechanism designer according to an unknown Markov Decision Process (MDP), where agent rewards and the mechanism designer's state evolve according to an episodic MDP with unknown reward functions and transition kernels. We focus on the online setting with linear function approximation and propose novel learning algorithms to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction. A key contribution of our approach is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space to estimate prices in the dynamic VCG mechanism. We show that the regret of our proposed method is upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further establish a matching $\Omega(T^{2/3})$ lower bound, showing that our algorithm is efficient, where $T$ is the total number of rounds. Our work establishes a regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.
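The dynamic VCG mechanism extends the classical one-shot Vickrey-Clarke-Groves mechanism. As a point of reference, the sketch below implements only the static version over a finite set of outcomes: pick the outcome that maximises total reported value, then charge each agent the externality it imposes on the others (the Clarke pivot payment). It has no MDP, learning, or multi-round structure, and the function name `vcg` and the data layout are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def vcg(valuations: List[Dict[str, float]]) -> Tuple[str, List[float]]:
    """Static VCG with Clarke pivot payments over a finite outcome set.

    valuations[i][o] is agent i's reported value for outcome o; every
    agent must report a value for the same set of outcomes.
    """
    if not valuations:
        raise ValueError("need at least one agent")
    outcomes = list(valuations[0])

    def welfare(o: str, exclude: Optional[int] = None) -> float:
        # Total value of outcome o, optionally ignoring one agent.
        return sum(v[o] for j, v in enumerate(valuations) if j != exclude)

    chosen = max(outcomes, key=welfare)
    payments = []
    for i in range(len(valuations)):
        # Best welfare the others could get without i, minus what they get now.
        best_without_i = max(welfare(o, exclude=i) for o in outcomes)
        payments.append(best_without_i - welfare(chosen, exclude=i))
    return chosen, payments


if __name__ == "__main__":
    # One item; outcome "to_k" gives it to agent k, who values it at 10, 7, 4.
    bids = [
        {"to_0": 10.0, "to_1": 0.0, "to_2": 0.0},
        {"to_0": 0.0, "to_1": 7.0, "to_2": 0.0},
        {"to_0": 0.0, "to_1": 0.0, "to_2": 4.0},
    ]
    outcome, payments = vcg(bids)
    print(outcome, payments)  # to_0 [7.0, 0.0, 0.0]
```

For a single item this reduces to a second-price auction: the highest bidder wins and pays the second-highest value, while losing agents pay nothing.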

Learning · Agent · INTERACT · Deep Reinforcement Learning · motivation ·

August 2, 2022

Deep Reinforcement Learning for Multi-Agent Interaction

Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht

From arXiv. Published in AI Communications Special Issue on Multi-Agent Systems Research in the UK.

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

Shaping · Recognizable · Better · Object Detection · state-of-the-art ·

January 10, 2018

From Superpixel to Human Shape Modelling for Carried Object Detection

Farnoosh Ghadiri, Robert Bergevin, Guillaume-Alexandre Bilodeau

Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.
