亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We introduce a class of networked Markov potential games where agents are associated with nodes in a network. Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of agents within a $\kappa$-hop neighborhood. In this context, we propose a localized actor-critic algorithm. The algorithm is scalable since each agent uses only local information and does not need access to the global state. Further, the algorithm overcomes the curse of dimensionality through the use of function approximation. Our main results provide finite-sample guarantees up to a localization error and a function approximation error. Specifically, we achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by the averaged Nash regret. This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.

相關內容

Mobility-as-a-Service (MaaS) systems are two-sided markets, with two mutually exclusive sets of agents, i.e., travelers/users and operators, forming a mobility ecosystem in which multiple operators compete or cooperate to serve customers under a governing platform provider. This study proposes a MaaS platform equilibrium model based on many-to-many assignment games incorporating both fixed-route transit services and mobility-on-demand (MOD) services. The matching problem is formulated as a multicommodity flow network design problem under congestion. The local stability conditions reflect a generalization of Wardrop's principles that include operator decisions. A subsidy mechanism from the platform is proposed to guarantee local stability. An exact solution algorithm is proposed based on a branch and bound framework with a Frank-Wolfe algorithm integrated with Lagrangian relaxation and subgradient optimization, which guarantees the optimality of the matching problem but not stability. A heuristic which integrates stability conditions and subsidy design is proposed, which reaches either the optimal MaaS platform equilibrium solution with global stability, or a feasible locally stable solution that may require subsidy. A worst-case bound and condition for obtaining an exact solution are both identified. Two sets of reproducible numerical experiments are conducted. The first, on a toy network, verifies the model and algorithm, and illustrates the differences between local and global stability. The second, on an expanded Sioux Falls network with 82 nodes and 748 links, derives generalizable insights about the model for coopetitive interdependencies between operators sharing the platform, handling congestion effects in MOD services, effects of local stability on investment impacts, and illustrating inequities that may arise under heterogeneous populations.

The maximization of submodular functions have found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [ICALP '19]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal matroids, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of $1-1/e-\epsilon$ and both generalize and accelerate the results of Ene and Nguyen [ICALP '19]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondr\'ak [SODA '14]. To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight decrease operation and a novel FREEZE operation that allows the algorithm to freeze elements (i.e. force to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [TALG '05] that maintains the maximum weight basis under insertions and deletions of elements in $O(\log n)$ time. For the transversal matroid the FREEZE operation corresponds to requiring the data structure to keep a certain set $S$ of vertices matched, a property that we call $S$-stability.

We propose fast and communication-efficient optimization algorithms for multi-robot rotation averaging and translation estimation problems that arise from collaborative simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and camera network localization applications. Our methods are based on theoretical relations between the Hessians of the underlying Riemannian optimization problems and the Laplacians of suitably weighted graphs. We leverage these results to design a collaborative solver in which robots coordinate with a central server to perform approximate second-order optimization, by solving a Laplacian system at each iteration. Crucially, our algorithms permit robots to employ spectral sparsification to sparsify intermediate dense matrices before communication, and hence provide a mechanism to trade off accuracy with communication efficiency with provable guarantees. We perform rigorous theoretical analysis of our methods and prove that they enjoy (local) linear rate of convergence. Furthermore, we show that our methods can be combined with graduated non-convexity to achieve outlier-robust estimation. Extensive experiments on real-world SLAM and SfM scenarios demonstrate the superior convergence rate and communication efficiency of our methods.

Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at //github.com/lms-07/LGTEUN.

We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as, e.g., autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underling process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.

In this paper, we propose a distributed algorithm to control a team of cooperating robots aiming to protect a target from a set of intruders. Specifically, we model the strategy of the defending team by means of an online optimization problem inspired by the emerging distributed aggregative framework. In particular, each defending robot determines its own position depending on (i) the relative position between an associated intruder and the target, (ii) its contribution to the barycenter of the team, and (iii) collisions to avoid with its teammates. We highlight that each agent is only aware of local, noisy measurements about the location of the associated intruder and the target. Thus, in each robot, our algorithm needs to (i) locally reconstruct global unavailable quantities and (ii) predict its current objective functions starting from the local measurements. The effectiveness of the proposed methodology is corroborated by simulations and experiments on a team of cooperating quadrotors.

Recommender systems predict what items a user will interact with next, based on their past interactions. The problem is often approached through supervised learning, but recent advancements have shifted towards policy optimization of rewards (e.g., user engagement). One challenge with the latter is policy mismatch: we are only able to train a new policy given data collected from a previously-deployed policy. The conventional way to address this problem is through importance sampling correction, but this comes with practical limitations. We suggest an alternative approach of local policy improvement without off-policy correction. Our method computes and optimizes a lower bound of expected reward of the target policy, which is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). This local policy improvement paradigm is ideal for recommender systems, as previous policies are typically of decent quality and policies are updated frequently. We provide empirical evidence and practical recipes for applying our technique in a sequential recommendation setting.

We consider agents in a social network competing to be selected as partners in collaborative, mutually beneficial activities. We study this through a model in which an agent i can initiate a limited number k_i>0 of games and selects the ideal partners from its one-hop neighborhood. On the flip side it can accept as many games offered from its neighbors. Each game signifies a productive joint economic activity, and players attempt to maximize their individual utilities. Unsurprisingly, more trustworthy agents are more desirable as partners. Trustworthiness is measured by the game theoretic concept of Limited-Trust, which quantifies the maximum cost an agent is willing to incur in order to improve the net utility of all agents. Agents learn about their neighbors' trustworthiness through interactions and their behaviors evolve in response. Empirical trials performed on realistic social networks show that when given the option, many agents become highly trustworthy; most or all become highly trustworthy when knowledge of their neighbors' trustworthiness is based on past interactions rather than known a priori. This trustworthiness is not the result of altruism, instead agents are intrinsically motivated to become trustworthy partners by competition. Two insights are presented: first, trustworthy behavior drives an increase in the utility of all agents, where maintaining a relatively modest level of trustworthiness may easily improve net utility by as much as 14.5%. If only one agent exhibits modest trust among self-centered ones, it can increase its average utility by up to 25% in certain cases! Second, and counter-intuitively, when partnership opportunities are abundant agents become less trustworthy.

Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.

Recent years have witnessed significant advances in technologies and services in modern network applications, including smart grid management, wireless communication, cybersecurity as well as multi-agent autonomous systems. Considering the heterogeneous nature of networked entities, emerging network applications call for game-theoretic models and learning-based approaches in order to create distributed network intelligence that responds to uncertainties and disruptions in a dynamic or an adversarial environment. This paper articulates the confluence of networks, games and learning, which establishes a theoretical underpinning for understanding multi-agent decision-making over networks. We provide an selective overview of game-theoretic learning algorithms within the framework of stochastic approximation theory, and associated applications in some representative contexts of modern network systems, such as the next generation wireless communication networks, the smart grid and distributed machine learning. In addition to existing research works on game-theoretic learning over networks, we highlight several new angles and research endeavors on learning in games that are related to recent developments in artificial intelligence. Some of the new angles extrapolate from our own research interests. The overall objective of the paper is to provide the reader a clear picture of the strengths and challenges of adopting game-theoretic learning methods within the context of network systems, and further to identify fruitful future research directions on both theoretical and applied studies.

北京阿比特科技有限公司