In the literature on simultaneous non-cooperative games, it is well known that a positive affine (linear) transformation (PAT) of the utility payoffs does not change the best response sets or the Nash equilibrium set. PATs have been successfully used to expand the classes of 2-player games for which a Nash equilibrium can be computed in polynomial time. We investigate which game transformations other than PATs also possess one of the following properties: (i) the game transformation does not change the Nash equilibrium set when applied to an arbitrary game; (ii) the game transformation does not change the best response sets when applied to an arbitrary game. First, we prove that property (i) implies property (ii). Over a series of further results, we derive that game transformations with property (ii) must be positive affine. We therefore obtain two new and equivalent characterisations, each with game-theoretic meaning, of what it means to be a positive affine transformation. In particular, all our results hold for the 2-player case of bimatrix games.
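The invariance claim for PATs can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper: the payoff matrix, the opponent's mixed strategy, and the helper names `best_responses` and `positive_affine` are illustrative assumptions. It compares the row player's best response set before and after a positive affine transformation, and after a monotone but non-affine transformation (cubing every payoff).

```python
from fractions import Fraction


def best_responses(payoff, opponent_mix):
    """Return the row indices that maximise expected utility against opponent_mix.

    payoff[i][j] is the row player's utility for row i against column j;
    opponent_mix[j] is the probability the opponent assigns to column j.
    """
    expected = [sum(u * p for u, p in zip(row, opponent_mix)) for row in payoff]
    best = max(expected)
    return {i for i, value in enumerate(expected) if value == best}


def positive_affine(payoff, scale, shift):
    """Apply u -> scale * u + shift to every payoff entry (scale must be > 0)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [[scale * u + shift for u in row] for row in payoff]


# Hypothetical 3x2 payoff matrix for the row player (illustrative numbers).
payoff = [[3, 0], [0, 2], [1, 1]]
# Opponent plays column 0 with probability 2/5; exact fractions avoid rounding.
mix = [Fraction(2, 5), Fraction(3, 5)]

original = best_responses(payoff, mix)
after_pat = best_responses(positive_affine(payoff, 2, -7), mix)
# Cubing is strictly increasing but not affine.
cubed = [[u ** 3 for u in row] for row in payoff]
after_cube = best_responses(cubed, mix)

print(original, after_pat, after_cube)
```

In this example the PAT leaves the best response set unchanged, while cubing the payoffs, which preserves the ordering of individual payoffs, changes it. This illustrates in one instance why order-preserving transformations in general do not have property (ii).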

Related content

Nash equilibrium

Smoothness · Cost function · Nonconvex · Networking · Functional

December 31, 2021

Distributed Random Reshuffling over Networks

Kun Huang,Xiao Li,Andre Milzarek,Shi Pu,Junwen Qiu

from arxiv, 28 pages, 5 figures

In this paper, we consider the distributed optimization problem where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that combines the classical distributed gradient descent (DGD) method and Random Reshuffling (RR). We show that D-RR inherits the superiority of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves an $\mathcal{O}(1/T^2)$ rate of convergence (here, $T$ counts the total number of iterations) in terms of the squared distance between the iterate and the unique minimizer. When the objective function is assumed to be smooth nonconvex and has Lipschitz continuous component functions, we show that D-RR drives the squared norm of the gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors).
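The idea of combining per-agent reshuffling with a DGD-style mixing step can be sketched on a toy problem. This is an illustrative sketch only and not the authors' algorithm as specified in the paper: the update order (local gradient step, then averaging with neighbours), the decaying step size, the scalar quadratic components $f_{i,l}(x) = \tfrac{1}{2}(x - a_{i,l})^2$, the mixing matrix, and the function name `d_rr_sketch` are all assumptions made for the example.

```python
import random


def d_rr_sketch(targets, mixing, step_size, epochs, seed=0):
    """Toy distributed random reshuffling on scalar quadratics.

    targets[i][l] is a_{i,l}, defining agent i's component 0.5 * (x - a_{i,l})**2.
    mixing is a doubly stochastic matrix encoding the network weights.
    step_size(epoch) returns the step size used throughout that epoch.
    All agents are assumed to hold the same number of components.
    """
    rng = random.Random(seed)
    n = len(targets)
    m = len(targets[0])
    x = [0.0] * n  # one scalar iterate per agent
    for epoch in range(epochs):
        alpha = step_size(epoch)
        # Each agent independently draws a fresh permutation of its components.
        orders = [rng.sample(range(m), m) for _ in range(n)]
        for step in range(m):
            # Local gradient step on one component; the gradient is (x - a).
            local = [
                x[i] - alpha * (x[i] - targets[i][orders[i][step]])
                for i in range(n)
            ]
            # Mixing step: weighted average over the network.
            x = [sum(mixing[i][j] * local[j] for j in range(n)) for i in range(n)]
    return x


# Three agents, four components each; the global minimizer is the overall mean, 6.5.
targets = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
mixing = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
final = d_rr_sketch(targets, mixing, lambda k: 1.0 / (k + 2), epochs=200)
print(final)
```

On this toy problem all three agents end up close to the minimizer of the average cost. The sketch says nothing about the convergence rates stated in the abstract; it only shows the structure of one reshuffled epoch followed over a network.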

Model averaging · Support vector machine · MoDELS · Frequentist school · Model selection

December 30, 2021

Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces

Chaoxia Yuan,Chao Ying,Zhou Yu,Fang Fang

from arxiv, On page 7 of the paper, the condition description has some problems and needs to be revised

Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high-dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method has been considered for SVM. This work aims to fill the gap by proposing a frequentist model averaging procedure for SVM which selects the optimal weight by cross-validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate, which provides further insight into model averaging. Compared to model selection methods for SVM, which require the tedious but critical task of tuning parameter selection, the model averaging method avoids this task and shows promising performance in the empirical studies.

Cluster · Approximation · Exact · Clustering methods · state-of-the-art

December 29, 2021

A sampling-based approach for efficient clustering in large datasets

Georgios Exarchakis,Omar Oubari,Gregor Lenz

from arxiv, 10 pages, 5 figures, 1 table, an open source implementation of the algorithm is provided in the //github.com/OOub/peregrine

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our contribution is substantially more efficient than k-means as it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as in the exact solution. However, our approach is considerably more efficient at extracting these clusters compared to the state-of-the-art. We compare our approximation with the exact k-means and alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results.

Linear · Threshold · Matrix theory · Identifiable · Uniform sampling

December 29, 2021

On Local Convergence of Iterative Hard Thresholding for Matrix Completion

Trung Vu,Raviv Raich

from arxiv, 14 pages in double-column format

Iterative hard thresholding (IHT) has gained in popularity over the past decades in large-scale optimization. However, convergence properties of this method have only been explored recently in non-convex settings. In matrix completion, existing works often focus on the guarantee of global convergence of IHT via standard assumptions such as incoherence property and uniform sampling. While such analysis provides a global upper bound on the linear convergence rate, it does not describe the actual performance of IHT in practice. In this paper, we provide a novel insight into the local convergence of a specific variant of IHT for matrix completion. We uncover the exact linear rate of IHT in a closed-form expression and identify the region of convergence in which the algorithm is guaranteed to converge. Furthermore, we utilize random matrix theory to study the linear rate of convergence of IHTSVD for large-scale matrix completion. We find that asymptotically, the rate can be expressed in closed form in terms of the relative rank and the sampling rate. Finally, we present various numerical results to verify the aforementioned theoretical analysis.

Identifiable · Game theory

December 28, 2021

Stable decompositions of coalition formation games

Agustín G. Bonifacio,Elena Inarra,Pablo Neme

It is known that a coalition formation game may not have a stable coalition structure. In this study we propose a new solution concept for these games, which we call "stable decomposition", and show that each game has at least one. This solution consists of a collection of coalitions organized in sets that "protect" each other in a stable way. When sets of this collection are singletons, the stable decomposition can be identified with a stable coalition structure. As an application, we study convergence to stability in coalition formation games.

Game theory · Performance · MoDELS · Learning · Smoothness

December 15, 2020

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Stefanos Leonardos,Georgios Piliouras

from arxiv, Appears in the 35th AAAI Conference on Artificial Intelligence

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive and negative (and potentially unbounded) effects on system performance.

Diversity · Optimizer · MoDELS · Latent · Regularization term

February 28, 2019

Jointly Optimizing Diversity and Relevance in Neural Response Generation

Xiang Gao,Sungjin Lee,Yizhe Zhang,Chris Brockett,Michel Galley,Jianfeng Gao,Bill Dolan

from arxiv, Long paper accepted at NAACL 2019

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a method to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

Lipschitz · Discriminant function · GANs · Discriminator · WGAN

February 15, 2019

Lipschitz Generative Adversarial Nets

Zhiming Zhou,Jiadong Liang,Yuxuan Song,Lantao Yu,Hongwei Wang,Weinan Zhang,Yong Yu,Zhihua Zhang

from arxiv, Under review by the International Conference on Machine Learning (ICML 2019)

In this paper we study the convergence of generative adversarial networks (GANs) from the perspective of the informativeness of the gradient of the optimal discriminative function. We show that GANs without restriction on the discriminative function space commonly suffer from the problem that the gradient produced by the discriminator is uninformative to guide the generator. By contrast, Wasserstein GAN (WGAN), where the discriminative function is restricted to $1$-Lipschitz, does not suffer from such a gradient uninformativeness problem. We further show in the paper that the model with a compact dual form of Wasserstein distance, where the Lipschitz condition is relaxed, also suffers from this issue. This implies the importance of Lipschitz condition and motivates us to study the general formulation of GANs with Lipschitz constraint, which leads to a new family of GANs that we call Lipschitz GANs (LGANs). We show that LGANs guarantee the existence and uniqueness of the optimal discriminative function as well as the existence of a unique Nash equilibrium. We prove that LGANs are generally capable of eliminating the gradient uninformativeness problem. According to our empirical analysis, LGANs are more stable and generate consistently higher quality samples compared with WGAN.

Learning · Controller · Reinforcement learning · Performer · SimPLe

December 15, 2018

Residual Policy Learning

Tom Silver,Kelsey Allen,Josh Tenenbaum,Leslie Kaelbling

We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvement. We study RPL in five challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. By combining learning with control algorithms, RPL can perform long-horizon, sparse-reward tasks for which reinforcement learning alone fails. Moreover, we find that RPL consistently and substantially improves on the initial controllers. We argue that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently.

Reward function · Linear · Reinforcement learning · Learning · Value iteration

April 22, 2018

Logically-Constrained Reinforcement Learning

Mohammadhosein Hasanbeig,Alessandro Abate,Daniel Kroening

This paper proposes a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the property into a Limit Deterministic Büchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to the accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.