The study of market equilibria is central to economic theory, particularly in efficiently allocating scarce resources. However, the computation of equilibrium prices at which the supply of goods matches their demand typically relies on having access to complete information on private attributes of agents, e.g., suppliers' cost functions, which are often unavailable in practice. Motivated by this practical consideration, we consider the problem of setting equilibrium prices in the incomplete information setting wherein a market operator seeks to satisfy the customer demand for a commodity by purchasing the required amount from competing suppliers with privately known cost functions unknown to the market operator. In this incomplete information setting, we consider the online learning problem of learning equilibrium prices over time while jointly optimizing three performance metrics -- unmet demand, cost regret, and payment regret -- pertinent in the context of equilibrium pricing over a horizon of $T$ periods. We first consider the setting when suppliers' cost functions are fixed and develop algorithms that achieve a regret of $O(\log \log T)$ when the customer demand is constant over time, or $O(\sqrt{T} \log \log T)$ when the demand is variable over time. Next, we consider the setting when the suppliers' cost functions can vary over time and illustrate that no online algorithm can achieve sublinear regret on all three metrics when the market operator has no information about how the cost functions change over time. Thus, we consider an augmented setting wherein the operator has access to hints/contexts that, without revealing the complete specification of the cost functions, reflect the variation in the cost functions over time and propose an algorithm with sublinear regret in this augmented setting.
Inverse problems are in many cases solved with optimization techniques. When the underlying model is linear, first-order gradient methods are usually sufficient. With nonlinear models, due to nonconvexity, one must often resort to second-order methods that are computationally more expensive. In this work we aim to approximate a nonlinear model with a linear one and correct the resulting approximation error. We develop a sequential method that iteratively solves a linear inverse problem and updates the approximation error by evaluating it at the new solution. This treatment convexifies the problem and allows us to benefit from established convex optimization methods. We separately consider cases where the approximation is fixed over iterations and where the approximation is adaptive. In the fixed case we show theoretically under what assumptions the sequence converges. In the adaptive case, particularly considering the special case of approximation by first-order Taylor expansion, we show that with certain assumptions the sequence converges to a critical point of the original nonconvex functional. Furthermore, we show that with quadratic objective functions the sequence corresponds to the Gauss-Newton method. Finally, we showcase numerical results superior to the conventional model correction method. We also show, that a fixed approximation can provide competitive results with considerable computational speed-up.
The Nash Equilibrium (NE) estimation in bidding games of electricity markets is the key concern of both generation companies (GENCOs) for bidding strategy optimization and the Independent System Operator (ISO) for market surveillance. However, existing methods for NE estimation in emerging modern electricity markets (FEM) are inaccurate and inefficient because the priori knowledge of bidding strategies before any environment changes, such as load demand variations, network congestion, and modifications of market design, is not fully utilized. In this paper, a Bayes-adaptive Markov Decision Process in FEM (BAMDP-FEM) is therefore developed to model the GENCOs' bidding strategy optimization considering the priori knowledge. A novel Multi-Agent Generative Adversarial Imitation Learning algorithm (MAGAIL-FEM) is then proposed to enable GENCOs to learn simultaneously from priori knowledge and interactions with changing environments. The obtained NE is a Bayesian Nash Equilibrium (BNE) with priori knowledge transferred from the previous environment. In the case study, the superiority of this proposed algorithm in terms of convergence speed compared with conventional methods is verified. It is concluded that the optimal bidding strategies in the obtained BNE can always lead to more profits than NE due to the effective learning from the priori knowledge. Also, BNE is more accurate and consistent with situations in real-world markets.
The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in delivering even higher accuracy in such computations. In this work, we explore the native use of 64-bit posits in a series of numerical benchmarks extracted from the PolyBench collection and compare their timing performance, accuracy and hardware cost to IEEE 754 doubles. For this, we extend the PERCIVAL RISC-V core and the Xposit custom RISC-V extension with posit64 and quire operations. Results show that posit64 can execute as fast as doubles, while also obtaining up to 4 orders of magnitude lower mean square error and up to 3 orders of magnitude lower maximum absolute error. However, leveraging the quire accumulator register can limit the order of some operations such as matrix multiplications. Furthermore, detailed FPGA synthesis results highlight the significant hardware cost of 64-bit posit arithmetic and quire. Despite this, the large accuracy improvements achieved with the same memory bandwidth suggest that posit arithmetic may provide a potential alternative representation for scientific computing.
Social learning refers to the process by which networked strategic agents learn an unknown state of the world by observing private state-related signals as well as other agents' actions. In their classic work, Bikhchandani, Hirshleifer and Welch showed that information cascades occur in social learning, in which agents blindly follow others' behavior, and consequently, the actions in a cascade reveal no further information about the state. In this paper, we consider the introduction of an information coordinator to mitigate information cascades. The coordinator commits to a mechanism, which is a contract that agents may choose to accept or not. If an agent enters the mechanism, she pays a fee, and sends a message to the coordinator indicating her private signal (not necessarily truthfully). The coordinator, in turn, suggests an action to the agents according to his knowledge and interest. We study a class of mechanisms that possess properties such as individual rationality for agents (i.e., agents are willing to enter), truth telling, and profit maximization for the coordinator. We prove that the coordinator, without loss of optimality, can adopt a summary-based mechanism that depends on the complete observation history through an appropriate sufficient statistic. Furthermore, we show the existence of a mechanism which strictly improves social welfare, and results in strictly positive profit, so that such a mechanism is acceptable for both agents and the coordinator, and is beneficial to the agent community. Finally, we analyze the performance of this mechanism and show significant gains on both aforementioned metrics.
The Invertible Bloom Lookup Table (IBLT) is a probabilistic data structure for set representation, with applications in network and traffic monitoring. It is known for its ability to list its elements, an operation that succeeds with high probability for sufficiently large table. However, listing can fail even for relatively small sets. This paper extends recent work on the worst-case analysis of IBLT, which guarantees successful listing for all sets of a certain size, by introducing more general IBLT schemes. These schemes allow for greater freedom in the implementation of the insert, delete, and listing operations and demonstrate that the IBLT memory can be reduced while still maintaining successful listing guarantees. The paper also explores the time-memory trade-off of these schemes, some of which are based on linear codes and \(B_h\)-sequences over finite fields.
Deep neural network based recommendation systems have achieved great success as information filtering techniques in recent years. However, since model training from scratch requires sufficient data, deep learning-based recommendation methods still face the bottlenecks of insufficient data and computational inefficiency. Meta-learning, as an emerging paradigm that learns to improve the learning efficiency and generalization ability of algorithms, has shown its strength in tackling the data sparsity issue. Recently, a growing number of studies on deep meta-learning based recommenddation systems have emerged for improving the performance under recommendation scenarios where available data is limited, e.g. user cold-start and item cold-start. Therefore, this survey provides a timely and comprehensive overview of current deep meta-learning based recommendation methods. Specifically, we propose a taxonomy to discuss existing methods according to recommendation scenarios, meta-learning techniques, and meta-knowledge representations, which could provide the design space for meta-learning based recommendation methods. For each recommendation scenario, we further discuss technical details about how existing methods apply meta-learning to improve the generalization ability of recommendation models. Finally, we also point out several limitations in current research and highlight some promising directions for future research in this area.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.
This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in a distributed learning system, due to the existence of various system disturbances such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers, while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient(MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS)code, random sparse code, replication-based code, and regular low density parity check (LDPC) code are also investigated. Simulations in several multi-robot problems demonstrate the promising performance of the proposed framework.