This article explores which parameters of the repeated Prisoner's Dilemma lead to cooperation. Using simulations, I demonstrate that the potential function of the stochastic evolutionary dynamics of the Grim Trigger strategy is useful for predicting cooperation between Q-learners. The frontier separating the parameter regions that induce cooperation from those that induce defection can be determined from the kinetic energy exerted by the respective basins of attraction. When the incentive compatibility constraint of the Grim Trigger strategy is slack, a sudden increase in the observed cooperation rates occurs as the ratio of the kinetic energies approaches a critical value, which itself is a function of the discount factor multiplied by a correction factor that accounts for the algorithms' exploration probability. Using metadata from laboratory experiments, I provide evidence that the insights obtained from the simulations also help explain the emergence of cooperation between humans. The observed cooperation rates show a positive gradient at the frontier characterized by an exploration probability of approximately five percent. In the context of human-to-human interaction, the exploration probability can be viewed as the belief about the opponent's probability of deviating from the equilibrium action.
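To make the setup concrete, here is a minimal sketch of two epsilon-greedy tabular Q-learners playing the repeated Prisoner's Dilemma, each conditioning on the opponent's previous action; the payoff values (T > R > P > S) and hyperparameters are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

# Actions: 0 = cooperate, 1 = defect. Payoffs satisfy T > R > P > S (assumed values).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {(0, 0): (R, R), (0, 1): (S, T),
          (1, 0): (T, S), (1, 1): (P, P)}

def cooperation_rate(rounds=50_000, alpha=0.1, gamma=0.95, eps=0.05, seed=0):
    """Fraction of rounds in which both epsilon-greedy Q-learners cooperate."""
    rng = np.random.default_rng(seed)
    Q = [np.zeros((2, 2)), np.zeros((2, 2))]     # Q[agent][state, action]
    state = [0, 0]                               # state = opponent's previous action
    mutual_coop = 0
    for _ in range(rounds):
        acts = [int(rng.integers(2)) if rng.random() < eps     # explore
                else int(np.argmax(Q[i][state[i]]))            # exploit
                for i in range(2)]
        rewards = PAYOFF[tuple(acts)]
        nxt = [acts[1], acts[0]]                 # each agent observes the opponent's action
        for i in range(2):
            td = rewards[i] + gamma * Q[i][nxt[i]].max() - Q[i][state[i], acts[i]]
            Q[i][state[i], acts[i]] += alpha * td
        state = nxt
        mutual_coop += acts == [0, 0]
    return mutual_coop / rounds

print(cooperation_rate())
```

Sweeping `eps` and `gamma` in such a simulation is the kind of experiment the abstract refers to when relating the discount factor and the exploration probability to the cooperation frontier.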
Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. The lack of appropriate cognitive biases in these learners is one of the prevailing explanations. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. In this work, we investigate the latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly difficult to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) in which pairs of speaking and listening agents first learn a given miniature language through supervised learning and then optimize it for communication via reinforcement learning. Closely following the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding any learning bias into the agents. We see this as an essential step towards the investigation of language universals with neural learners.
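The two-phase scheme described above can be sketched in a toy setting as follows; the one-token messages, bijective miniature language, network sizes, and plain REINFORCE objective are simplifying assumptions for illustration, not the actual NeLLCom architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V = M = 8                                        # message vocabulary = meaning inventory (assumed)
speaker = nn.Sequential(nn.Embedding(M, 32), nn.Linear(32, V))
listener = nn.Sequential(nn.Embedding(V, 32), nn.Linear(32, M))
opt = torch.optim.Adam([*speaker.parameters(), *listener.parameters()], lr=1e-2)
meanings = torch.arange(M)
language = torch.randperm(V)                     # the "given miniature language": meaning -> message

for step in range(300):                          # phase 1: supervised learning of the language
    loss = (F.cross_entropy(speaker(meanings), language)
            + F.cross_entropy(listener(language), meanings))
    opt.zero_grad(); loss.backward(); opt.step()

for step in range(300):                          # phase 2: optimize for communicative success
    dist = torch.distributions.Categorical(logits=speaker(meanings))
    msgs = dist.sample()                         # speaker utters one message per meaning
    lst_logits = listener(msgs)
    reward = (lst_logits.argmax(-1) == meanings).float()
    loss = (-(dist.log_prob(msgs) * (reward - reward.mean())).mean()   # REINFORCE for the speaker
            + F.cross_entropy(lst_logits, meanings))                   # listener keeps learning
    opt.zero_grad(); loss.backward(); opt.step()
```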
Despite the state of the art, significantly reducing the carbon footprint (CF) of communications systems remains an urgent challenge. We address this challenge in the context of edge computing. The carbon intensity of electricity supply varies largely, both spatially and temporally. This, together with energy sharing via a battery management system (BMS), justifies the potential of CF-oriented task offloading, which redistributes computational tasks in time and space. In this paper, we consider optimal task scheduling and offloading, as well as battery charging, to minimize the total CF. We formulate this CF minimization problem as an integer linear programming model. However, we demonstrate that, via a graph-based reformulation, the problem can be cast as a minimum-cost flow problem. This finding reveals that the global optimum can be obtained in polynomial time. Numerical results using real-world data show that optimization can reduce the total CF by up to 83.3%.
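To illustrate the graph reformulation idea, the following toy sketch routes tasks from their arrival (site, slot) through execution options to a sink and solves the resulting minimum-cost flow with networkx; the graph structure, carbon intensities, and capacities are assumptions for illustration, not the paper's exact construction.

```python
import networkx as nx

# Assumed toy data: carbon cost per executed task at each (site, slot),
# and task arrivals per (site, slot).
intensity = {("A", 0): 300, ("A", 1): 120, ("B", 0): 80, ("B", 1): 250}
arrivals = {("A", 0): 3, ("B", 0): 2}

G = nx.DiGraph()
G.add_node("sink", demand=sum(arrivals.values()))
for (site, t0), n in arrivals.items():
    G.add_node(("src", site, t0), demand=-n)           # each arriving task is one unit of flow
for (site, t), ci in intensity.items():
    G.add_edge(("exec", site, t), "sink", capacity=4, weight=0)   # server capacity per slot
    for (s0, t0) in arrivals:
        if t >= t0:                                    # offload anywhere, delay only forward in time
            G.add_edge(("src", s0, t0), ("exec", site, t), weight=ci)

flow = nx.min_cost_flow(G)                             # global optimum in polynomial time
print("total carbon cost:", nx.cost_of_flow(G, flow))
```

Because minimum-cost flow is solvable in polynomial time, this reformulation is what admits a global optimum efficiently, in contrast to the generic integer linear program.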
Most of the work in the auction design literature assumes that bidders behave rationally based on the information available for each individual auction. However, in today's online advertising markets, one of the most important real-life applications of auction design, the data and computational power required to bid optimally are only available to the auction designer, and an advertiser can only participate by setting performance objectives (clicks, conversions, etc.) for the campaign. In this paper, we focus on value-maximizing campaigns with return-on-investment (ROI) constraints, which are widely adopted in many global-scale auto-bidding platforms. Through theoretical analysis and empirical experiments on both synthetic and realistic data, we find that the second price auction exhibits counterintuitive behaviors in the resulting equilibrium and loses its dominant theoretical advantages in single-item scenarios. At the market scale, the equilibrium structure is complicated and opens up space for bidders, and even auctioneers, to exploit. We also explore the broader impacts of the auto-bidding mechanism beyond efficiency and strategyproofness. In particular, the multiplicity of equilibria and the input sensitivity make advertisers' utilities unstable. In addition, the interference among both bidders and goods introduces bias into A/B testing, which hinders the development of even non-bidding components of the platform. The aforementioned phenomena have been widely observed in practice, and our results indicate that one of the reasons might be intrinsic to the underlying auto-bidding mechanism. To deal with these challenges, we provide suggestions and potential solutions for practitioners.
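A minimal sketch of the value-maximizing, ROI-constrained bidding behavior discussed above: the bidder scales its values by a single multiplier and bisects on it until the ROI constraint binds. The uniform value and competition distributions, the target ROI, and the single-multiplier policy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.uniform(0, 1, 1000)        # bidder's value per auction (assumed distribution)
comp = rng.uniform(0, 1, 1000)          # highest competing bid; paid on win (second price)
target_roi = 1.2                        # constraint: total value / total spend >= 1.2

def outcome(mult):
    win = values * mult > comp
    return values[win].sum(), comp[win].sum()

lo, hi = 0.0, 10.0
for _ in range(50):                     # bisection: raise the multiplier until ROI binds
    mid = (lo + hi) / 2
    value, spend = outcome(mid)
    if spend == 0 or value / spend >= target_roi:
        lo = mid                        # constraint slack: bid more aggressively
    else:
        hi = mid

value, spend = outcome(lo)
print(f"multiplier = {lo:.3f}, value = {value:.1f}, ROI = {value / spend:.2f}")
```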
In the past, citizen identity has been used within siloed data areas, with government agencies linking citizens across different services. Often these identifiers were simple alphanumeric strings assigned by government agencies. These identifiers are then linked in some way to the citizen, and citizens often have to request access to documents that prove certain aspects of their citizenship. These systems, too, often use paper-based approaches and offer little in the way of real digital trust. But in an information age, we now have the ability to provide unique digital identifiers for each citizen, and to let them claim access to their citizenship documents. These might take the form of academic qualifications, tax status, or even a driver's licence. While, at one time, these documents were either held by the trusted issuers of the information or existed in paper form, we now have the opportunity to link them to a citizen wallet. This would allow citizens to request documents once but use them many times. A core part of this is the unique private key associated with the citizen, and the use of digital signing by trusted entities. While many countries have struggled to implement a digital identity scheme, the EU Commission has the ambition to provide every EU citizen with a digital wallet, and thus to move towards improved freedom of movement and integration of the countries within the EU. The scale of this should not be underestimated, and it could break down the barriers created by legacy systems. In order to harmonise the integration of both citizens and trusted signers, the EU Commission proposes the usage of EBSI (European Blockchain Services Infrastructure).
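The sign-once, verify-many pattern behind this can be sketched with standard public-key signatures; the key type, claim format, and identifiers below are illustrative assumptions (EBSI builds on W3C Verifiable Credentials rather than raw signed bytes).

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical claim and DID; in EBSI the payload would be a W3C Verifiable Credential.
issuer_key = Ed25519PrivateKey.generate()        # the trusted issuer's private key
claim = b'{"citizen": "did:example:123", "degree": "MSc Computer Science"}'
signature = issuer_key.sign(claim)               # issued once by the trusted entity

# Any verifier holding the issuer's public key can check the claim offline,
# as many times as needed; verify() raises InvalidSignature on tampering.
issuer_key.public_key().verify(signature, claim)
print("credential verified")
```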
Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, providing crucial guidance on early stopping of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention, aiming to infer the mean outcome of the optimal policy (i.e., the value) in real time. Yet this problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration-exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration, which quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action, and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal, with a Wald-type confidence interval provided. Extensive simulations and real data applications demonstrate the empirical validity of the proposed DREAM method.
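A minimal sketch of the ingredients, assuming a two-armed bandit with epsilon-greedy exploration (a simplification of the paper's setting): here the probability of exploring the non-optimal arm is eps/2 by construction, which supplies known propensities for a doubly robust value estimate of the estimated optimal arm.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.6])                  # assumed Bernoulli arm means
eps, T = 0.1, 5000
counts, sums = np.zeros(2), np.zeros(2)
hist = []                                          # (action, reward, propensity)
for t in range(T):
    means = np.divide(sums, counts, out=np.zeros(2), where=counts > 0)
    greedy = int(np.argmax(means))
    probs = np.full(2, eps / 2)                    # probability of exploration per arm
    probs[greedy] += 1 - eps
    a = int(rng.choice(2, p=probs))
    r = float(rng.binomial(1, true_means[a]))
    counts[a] += 1
    sums[a] += r
    hist.append((a, r, probs[a]))

means = sums / counts
a_star = int(np.argmax(means))                     # estimated optimal arm
# Doubly robust estimate: plug-in model plus propensity-weighted correction.
dr = np.mean([means[a_star] + (a == a_star) / p * (r - means[a_star])
              for a, r, p in hist])
print(f"DR value estimate: {dr:.3f} (true optimal mean: {true_means.max()})")
```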
In mobile computation offloading (MCO), mobile devices (MDs) can choose either to execute tasks locally or to have them executed on a remote edge server (ES). This paper addresses the problem of jointly assigning the wireless communication bandwidth and the ES capacity used for task execution, so that task completion time constraints are satisfied. The objective is to obtain these allocations so that the average power consumption of the mobile devices is minimized, subject to a cost budget constraint. The paper includes contributions for both soft and hard task completion deadline constraints. The problems are first formulated as mixed integer nonlinear programs (MINLPs). Approximate solutions are then obtained by decomposing the problems into a collection of convex subproblems that can be solved efficiently. Results are presented that demonstrate the quality of the proposed solutions, which achieve near-optimum performance over a wide range of system parameters.
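For intuition on the bandwidth-power trade-off inside such subproblems, consider a simple Shannon-rate sketch for a single offloaded task; the rate model, channel gain, and noise density are assumptions for illustration, not the paper's MINLP.

```python
def min_offload_power(bits, deadline, bandwidth, gain=1.0, n0=1e-9):
    """Minimum transmit power (W) to move `bits` within `deadline` seconds.

    Assumes the Shannon rate r = bandwidth * log2(1 + p * gain / (n0 * bandwidth)),
    with a normalized channel gain and noise density n0 (illustrative values).
    """
    rate_needed = bits / deadline                     # required bits per second
    snr_needed = 2.0 ** (rate_needed / bandwidth) - 1.0
    return snr_needed * n0 * bandwidth / gain

for w in (1e6, 2e6, 5e6):                             # candidate bandwidth allocations
    print(f"{w / 1e6:.0f} MHz -> {min_offload_power(4e6, 0.5, w) * 1e3:.2f} mW")
```

The steep drop in required power as bandwidth grows reflects the convex structure that decompositions of this kind can exploit.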
Several machine learning (ML) applications are characterized by searching for an optimal solution to a complex task. The search space for this optimal solution is often very large, so large in fact that the optimal solution is often not computable. Part of the problem is that many candidate solutions found via ML are actually infeasible and have to be discarded. Restricting the search space to only the feasible solution candidates simplifies finding an optimal solution for the task. Further, the set of feasible solutions can be reused across multiple problems characterized by different tasks. In particular, we observe that complex tasks can be decomposed into subtasks and corresponding skills. We propose to learn a reusable and transferable skill by training an actor to generate all feasible actions. The trained actor can then propose feasible actions, among which an optimal one can be chosen according to a specific task. The actor is trained by interpreting the feasibility of each action as a target distribution. The training procedure minimizes a divergence between the actor's output distribution and this target. We derive the general optimization target for arbitrary f-divergences using a combination of kernel density estimates, resampling, and importance sampling. We further utilize an auxiliary critic to reduce the interactions with the environment. A preliminary comparison to related strategies shows that our approach learns to visit all the modes in the feasible action space, demonstrating the framework's potential for learning skills that can be used in various downstream tasks.
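A toy sketch of the core idea, with two simplifying assumptions relative to the paper: a discretized 1-D action space in place of continuous actions with kernel density estimates, and a plain forward KL in place of a general f-divergence. The feasibility indicator is interpreted as an unnormalized target density with two disjoint modes.

```python
import torch
import torch.nn as nn

bins = torch.linspace(-1, 1, 100)                 # discretized 1-D action space (assumed)
feasible = ((bins > -0.8) & (bins < -0.4)) | ((bins > 0.3) & (bins < 0.9))
target = feasible.float() / feasible.sum()        # feasibility indicator as a target density

actor = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 100))
opt = torch.optim.Adam(actor.parameters(), lr=1e-2)
ctx = torch.zeros(1, 1)                           # a single dummy task context

for step in range(500):
    logp = torch.log_softmax(actor(ctx), dim=-1).squeeze(0)
    # Forward KL(target || actor): mode-covering, so the actor spreads mass over all modes.
    loss = torch.sum(target * (torch.log(target + 1e-12) - logp))
    opt.zero_grad(); loss.backward(); opt.step()

probs = torch.softmax(actor(ctx), dim=-1).squeeze(0)
print("mass on feasible actions:", probs[feasible].sum().item())   # approaches 1
```

The mode-covering behavior of the forward KL is what matches the stated goal of visiting all modes of the feasible action space.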
Along with the massive growth of the Internet from the 1990s until now, various innovative technologies have been created to bring users breathtaking experiences with more virtual interactions in cyberspace. Many virtual environments with thousands of services and applications, from social networks to virtual gaming worlds, have been developed with immersive experience and digital transformation, but most are incoherent instead of being integrated into a single platform. In this context, the metaverse, a term formed by combining meta and universe, has been introduced as a shared virtual world that is fueled by many emerging technologies, such as fifth-generation networks and beyond, virtual reality, and artificial intelligence (AI). Among such technologies, AI has shown great importance in processing big data to enhance immersive experience and enable human-like intelligence of virtual agents. In this survey, we make a dedicated effort to explore the role of AI in the foundation and development of the metaverse. We first present preliminaries of AI, including machine learning algorithms and deep learning architectures, and its role in the metaverse. We then convey a comprehensive investigation of AI-based methods concerning six technical aspects that hold potential for the metaverse: natural language processing, machine vision, blockchain, networking, digital twin, and neural interface. Subsequently, several AI-aided applications, such as healthcare, manufacturing, smart cities, and gaming, are studied with regard to their deployment in virtual worlds. Finally, we conclude with the key contributions of this survey and open some future research directions in AI for the metaverse.
Artificial Intelligence (AI) is rapidly becoming integrated into military Command and Control (C2) systems as a strategic priority for many defence forces. The successful implementation of AI promises to herald a significant leap in C2 agility through automation. However, realistic expectations need to be set on what AI can achieve in the foreseeable future. This paper argues that AI could lead to a fragility trap, whereby the delegation of C2 functions to an AI could increase the fragility of C2, resulting in catastrophic strategic failures. This calls for a new framework for AI in C2 to avoid this trap. We argue that antifragility, along with agility, should form the core design principles for AI-enabled C2 systems. This duality is termed Agile, Antifragile, AI-Enabled Command and Control (A3IC2). An A3IC2 system continuously improves its capacity to perform in the face of shocks and surprises through overcompensation from feedback during the C2 decision-making cycle. An A3IC2 system will not only be able to survive within a complex operational environment, it will also thrive, benefiting from the inevitable shocks and volatility of war.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
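The re-weighting scheme follows directly from the formula above; in this small sketch only the final normalization (weights summing to the number of classes) is an added convention.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to 1 / E_n, where E_n = (1 - beta**n) / (1 - beta)."""
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(n)   # normalize so weights sum to #classes (a convention)

# Long-tailed example: the head class has 5000 samples, the tail class only 10.
print(class_balanced_weights([5000, 500, 50, 10]))
```

As beta approaches 0 the weights become uniform (no re-balancing), while as beta approaches 1 they approach inverse class frequency, so beta interpolates between the two classical extremes.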