亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The concept of Nash equilibrium enlightens the structure of rational behavior in multi-agent settings. However, the concept is as helpful as one may compute it efficiently. We introduce the Cut-and-Play, an algorithm to compute Nash equilibria for non-cooperative simultaneous games where each player's objective is linear in their variables and bilinear in the other players' variables. Using the rich theory of integer programming, we alternate between constructing (i.) increasingly tighter outer approximations of the convex hull of each player's feasible set -- by using branching and cutting plane methods -- and (ii.) increasingly better inner approximations of these hulls -- by finding extreme points and rays of the convex hulls. In particular, we prove the correctness of our algorithm when these convex hulls are polyhedra. Our algorithm allows us to leverage the mixed integer programming technology to compute equilibria for a large class of games. Further, we integrate existing cutting plane families inside the algorithm, significantly speeding up equilibria computation. We showcase a set of extensive computational results for Integer Programming Games and simultaneous games among bilevel leaders. In both cases, our framework outperforms the state-of-the-art in computing time and solution quality.

相關內容

Financial networks model a set of financial institutions (firms) interconnected by obligations. Recent work has introduced to this model a class of obligations called credit default swaps, a certain kind of financial derivatives. The main computational challenge for such systems is known as the clearing problem, which is to determine which firms are in default and to compute their exposure to systemic risk, technically known as their recovery rates. It is known that the recovery rates form the set of fixed points of a simple function, and that these fixed points can be irrational. Furthermore, Schuldenzucker et al. (2016) have shown that finding a weakly (or "almost") approximate (rational) fixed point is PPAD-complete. We further study the clearing problem from the point of view of irrationality and approximation strength. Firstly, we observe that weakly approximate solutions may misrepresent the actual financial state of an institution. On this basis, we study the complexity of finding a strongly (or "near") approximate solution, and show FIXP-completeness. We then study the structural properties required for irrationality, and we give necessary conditions for irrational solutions to emerge: The presence of certain types of cycles in a financial network forces the recovery rates to take the form of roots of non-linear polynomials. In the absence of a large subclass of such cycles, we study the complexity of finding an exact fixed point, which we show to be a problem close to, albeit outside of, PPAD.

An $n$-person game is specified by $n$ tensors of the same format. We view its equilibria as points in that tensor space. Dependency equilibria are defined by linear constraints on conditional probabilities, and thus by determinantal quadrics in the tensor entries. These equations cut out the Spohn variety, named after the philosopher who introduced dependency equilibria. The Nash equilibria among these are the tensors of rank one. We study the real algebraic geometry of the Spohn variety. This variety is rational, except for $2 \times 2$ games, when it is an elliptic curve. For $3 \times 2$ games, it is a del Pezzo surface of degree two. We characterize the payoff regions and their boundaries using oriented matroids, and we develop the connection to Bayesian networks in statistics.

The two squares theorem of Fermat is a gem in number theory, with a spectacular one-sentence "proof from the Book". Here is a formalisation of this proof, with an interpretation using windmill patterns. The theory behind involves involutions on a finite set, especially the parity of the number of fixed points in the involutions. Starting as an existence proof that is non-constructive, there is an ingenious way to turn it into a constructive one. This gives an algorithm to compute the two squares by iterating the two involutions alternatively from a known fixed point.

While many works exploiting an existing Lie group structure have been proposed for state estimation, in particular the Invariant Extended Kalman Filter (IEKF), few papers address the construction of a group structure that allows casting a given system into the IEKF framework, namely making the dynamics group affine and the observations invariant. In this paper we introduce a large class of systems encompassing most problems involving a navigating vehicle encountered in practice. For those systems we introduce a novel methodology that systematically provides a group structure for the state space, including vectors of the body frame such as biases. We use it to derive observers having properties akin to those of linear observers or filters. The proposed unifying and versatile framework encompasses all systems where IEKF has proved successful, improves state-of-the art "imperfect" IEKF for inertial navigation with sensor biases, and allows addressing novel examples, like GNSS antenna lever arm estimation.

We develop two adaptive discretization algorithms for convex semi-infinite optimization, which terminate after finitely many iterations at approximate solutions of arbitrary precision. In particular, they terminate at a feasible point of the considered optimization problem. Compared to the existing finitely feasible algorithms for general semi-infinite optimization problems, our algorithms work with considerably smaller discretizations and are thus computationally favorable. Also, our algorithms terminate at approximate solutions of arbitrary precision, while for general semi-infinite optimization problems the best possible approximate-solution precision can be arbitrarily bad. All occurring finite optimization subproblems in our algorithms have to be solved only approximately, and continuity is the only regularity assumption on our objective and constraint functions. Applications to parametric and non-parametric regression problems under shape constraints are discussed.

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was currently unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.

In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach to previous pointwise and pairwise approaches, which are based on treating either each rating or each pairwise comparison as an independent instance respectively. By extending the work of (Cao et al. 2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which applies probability mass to permutations based on a low rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the number of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback such as Weighted-MF and BPR and achieve favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.

Generative adversarial networks (GANs) evolved into one of the most successful unsupervised techniques for generating realistic images. Even though it has recently been shown that GAN training converges, GAN models often end up in local Nash equilibria that are associated with mode collapse or otherwise fail to model the target distribution. We introduce Coulomb GANs, which pose the GAN learning problem as a potential field of charged particles, where generated samples are attracted to training set samples but repel each other. The discriminator learns a potential field while the generator decreases the energy by moving its samples along the vector (force) field determined by the gradient of the potential field. Through decreasing the energy, the GAN model learns to generate samples according to the whole target distribution and does not only cover some of its modes. We prove that Coulomb GANs possess only one Nash equilibrium which is optimal in the sense that the model distribution equals the target distribution. We show the efficacy of Coulomb GANs on a variety of image datasets. On LSUN and celebA, Coulomb GANs set a new state of the art and produce a previously unseen variety of different samples.

北京阿比特科技有限公司