We study online weighted bipartite matching of reusable resources, where an adversarial sequence of requests for resources arrives over time. A resource that is matched is 'used' for a random duration, drawn independently from a resource-dependent distribution, after which it returns and can be matched again. We study the performance of the greedy policy, which matches each arriving request to the available resource that yields the highest reward. Previously, it was known that the greedy policy is $1/2$-competitive against a clairvoyant benchmark that knows the request sequence in advance. In this work, we improve this result by introducing a parameter that quantifies the degree of reusability of the resources. Specifically, if $p$ is the smallest probability, over the usage distributions, that a matched resource returns within one time step, the greedy policy achieves a competitive ratio of $1/(2-p)$. Furthermore, when the usage distributions are geometric, we establish a stronger competitive ratio of $(1+p)/2$, which we demonstrate to be tight. Both results recover the known guarantees in the two extreme scenarios: $p = 0$ corresponds to non-reusable resources, where $1/2$ is known to be tight, while $p = 1$ corresponds to every resource returning immediately, where greedy is the optimal policy and hence the competitive ratio is $1$. Finally, we show that both results are robust to approximations of the greedy policy. Our work demonstrates that the reusability of resources can improve performance compared to the non-reusable setting, that a simple greedy policy suffices when the degree of reusability is high, and that performance has the potential to improve further as the degree of reusability increases.
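To make the greedy policy concrete, the following is a minimal sketch under assumed data structures (integer time steps, a `rewards` table keyed by resource-request pairs, and a duration sampler `usage`); none of these names come from the paper.

```python
import math
import random
from collections import defaultdict

def geometric_usage(p):
    """Usage duration ~ Geometric(p): the resource returns in each step
    independently with probability p (support {1, 2, ...})."""
    u = 1.0 - random.random()                    # uniform in (0, 1]
    return int(math.log(u) / math.log(1.0 - p)) + 1

def greedy_matching(requests, rewards, usage):
    """Greedy policy: match each arriving request to the available
    resource with the highest reward; a matched resource is busy for a
    random usage duration and then becomes available again."""
    resources = {i for (i, _) in rewards}
    busy_until = defaultdict(int)                # step at which resource i returns
    total = 0.0
    for t, j in enumerate(requests):             # request of type j at step t
        available = [i for i in resources
                     if busy_until[i] <= t and (i, j) in rewards]
        if not available:
            continue                             # request departs unmatched
        i = max(available, key=lambda i: rewards[(i, j)])
        total += rewards[(i, j)]
        busy_until[i] = t + usage(i)             # resource i is in use until then
    return total

# Example: two resources, geometric usage with return probability p = 0.5.
rewards = {("a", "x"): 3.0, ("b", "x"): 1.0, ("b", "y"): 2.0}
print(greedy_matching(["x", "y", "x", "x"], rewards, lambda i: geometric_usage(0.5)))
```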
Learned optimizers are a crucial component of meta-learning. Recent advancements in scalable learned optimizers have demonstrated their superior performance over hand-designed optimizers in various tasks. However, certain characteristics of these models, such as an unstable learning curve, limited ability to handle unseen tasks and network architectures, difficult-to-control behaviours, and poor performance in fine-tuning tasks, impede their widespread adoption. To tackle the issue of generalization in scalable learned optimizers, we propose a hybrid-update-based (HUB) optimization strategy inspired by recent advancements in hard prompt tuning and result selection techniques used in large language and vision models. This approach can be easily applied to any task that involves a hand-designed or learned optimizer. By incorporating hand-designed optimizers as the second component in our hybrid approach, we are able to retain the benefits of learned optimizers while stabilizing the training process and, more importantly, improving testing performance. We validate our design on a total of 17 tasks, consisting of thirteen training-from-scratch settings and four fine-tuning settings. These tasks vary in model size, architecture, and dataset size, and the competing optimizers are hyperparameter-tuned. Our approach outperforms all competitors on 94% of the tasks in terms of testing performance. Furthermore, we conduct a theoretical analysis to examine the potential impact of our hybrid strategy on the behaviours and inherited traits of learned optimizers.
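The abstract does not spell out the exact hybrid rule, so the following is only one plausible instantiation, sketched under our own assumptions: both the learned optimizer and a hand-designed rule (plain SGD here) propose an update at each step, and a simple result-selection criterion keeps whichever candidate attains the lower loss. The names `learned_update` and `loss_fn` are hypothetical.

```python
import numpy as np

def hub_step(theta, grad, loss_fn, learned_update, lr=1e-2):
    """One hybrid-update step (a sketch, not the paper's exact rule):
    propose candidates from a learned optimizer and from SGD, then keep
    the candidate with the lower loss (simple result selection)."""
    cand_learned = theta + learned_update(theta, grad)   # learned proposal
    cand_sgd = theta - lr * grad                         # hand-designed proposal
    return min((cand_learned, cand_sgd), key=loss_fn)

# Toy usage on a quadratic loss, with a stand-in "learned" update rule.
loss = lambda th: float(np.sum(th ** 2))
learned = lambda th, g: -0.5 * g
theta = np.ones(3)
theta = hub_step(theta, 2 * theta, loss, learned)
print(theta)
```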
Most reinforcement learning algorithms treat the context in which they operate as a stationary, isolated, and undisturbed environment. However, in the real world, the environment is constantly changing due to a variety of external influences. To address this problem, we study Markov Decision Processes (MDPs) under the influence of an external temporal process. We formalize this notion, and discuss conditions under which the problem becomes tractable, together with suitable solution approaches. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance.
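For reference, standard policy iteration on a finite MDP looks as follows; under the assumption that the external temporal process can be folded into an augmented (finite) state, the same routine applies to the product MDP. This is a generic sketch, not the paper's exact algorithm.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Policy iteration on a finite MDP whose states are assumed to
    already encode the external process's state (product construction).
    P: (A, S, S) transition tensor, R: (S, A) reward matrix."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)
    while True:
        P_pi = P[pi, np.arange(S)]                          # (S, S) dynamics under pi
        r_pi = R[np.arange(S), pi]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # exact policy evaluation
        q = R + gamma * np.einsum('ast,t->sa', P, v)        # one-step lookahead values
        new_pi = q.argmax(axis=1)                           # greedy improvement
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi
```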
This paper explores the implicit bias of overparameterized neural networks of depth greater than two. Our framework considers a family of networks of varying depths that all have the same capacity but different implicitly defined representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias associated with the architecture. Our results show that adding linear layers to a ReLU network yields a representation cost that favors functions that can be approximated by a low-rank linear operator composed with a function that has low representation cost with respect to a two-layer network. Specifically, using a neural network to fit training data with minimum representation cost yields an interpolating function that is nearly constant in directions orthogonal to a low-dimensional subspace. This means that the learned network will approximately be a single- or multiple-index model. Our experiments show that when this active subspace structure exists in the data, adding linear layers can improve generalization and result in a network that is well aligned with the true active subspace.
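In symbols, the representation cost just described can be written as follows (a transcription of the definition above; the notation $h_\theta$ for the network function and $W_\ell$ for the weight matrices is introduced here, not taken from the paper):

$$R_{\mathcal{A}}(f) \;=\; \min_{\theta \,:\, h_\theta = f} \;\sum_{\ell} \lVert W_\ell \rVert_F^2,$$

where $h_\theta$ is the function computed by architecture $\mathcal{A}$ with weights $\theta = (W_1, \dots, W_L)$ and the minimum ranges over all weight settings that exactly represent $f$.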
In this work we initiate the study of buy-and-sell prophet inequalities. We start by considering what is arguably the most fundamental setting. Here, the online algorithm observes a sequence of prices one after the other. At each time step, the online algorithm can decide to buy and pay the current price if it does not already hold the item, or it can decide to sell and collect the current price as a reward if it holds the item. We show that for i.i.d. prices, a single-threshold online algorithm achieves at least $1/2$ of the expected profit of the optimal offline algorithm, and we prove that this is optimal. For non-i.i.d. prices in random order, where prices are no longer independent, we give a single-threshold online algorithm that achieves at least a $1/16$ fraction of the expected profit of the optimal offline algorithm. We also show that for this setting no single-threshold online algorithm can yield better than a $1/3$ approximation, and thus establish a formal separation from the i.i.d. case. On the other hand, we present a threshold-based online algorithm for this setting that yields a $1/2-o(1)$ approximation. For non-i.i.d. prices in adversarial order, no approximation is possible. We use the results for these base cases to solve a variety of more complex settings. For instance, we show a $1/2-o(1)$ approximation for settings where prices are affiliated and the online algorithm has access only to a single sample. We also extend our upper and lower bounds for the single-item case to $k$ items, and thus in particular show that it is impossible to achieve $1-o(1)$ approximations. For the budgeted version, where fractions of an item can be bought and gains can be reinvested, we show a constant-factor approximation to the optimal offline algorithm's growth rate. In a setting with $k$ item types and price streams, we achieve an $\Omega(1/k)$ approximation for the unit-capacity case, which is optimal.
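As an illustration of the single-threshold policy analyzed in the i.i.d. case: buy whenever the price falls below a threshold $\tau$ and sell once it reaches at least $\tau$. The sketch below leaves the choice of $\tau$ open; the paper's specific threshold (e.g., a quantile of the price distribution) is not specified here.

```python
def single_threshold_trader(prices, tau):
    """Single-threshold buy-and-sell policy: buy at any price below tau
    when not holding the item; sell at any price >= tau when holding it."""
    profit, holding = 0.0, False
    for price in prices:
        if not holding and price < tau:
            profit -= price            # buy: pay the current price
            holding = True
        elif holding and price >= tau:
            profit += price            # sell: collect the current price
            holding = False
    return profit

print(single_threshold_trader([5, 2, 7, 1, 9], tau=4))  # buys at 2 and 1, sells at 7 and 9
```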
We introduce a random recursive tree model with two communities, called the balanced community modulated random recursive tree, or BCMRT for short. In this setting, pairs of nodes of different types appear sequentially. Each of them independently decides to attach to its own type with probability 1-q, or to the other type with probability q, and then chooses its parent uniformly within the set of existing nodes of the selected type. We find that the limiting degree distributions coincide for all values of q. Therefore, as far as inference is concerned, other statistics have to be studied. We first consider the setting where the time-labels of the nodes, i.e. their times of arrival, are observed but their types are not. In this setting, we design a consistent estimator for q and provide bounds for the feasibility of testing between two different values of q. Moreover, we show that if q is small enough, then it is possible to cluster in a way that is correlated with the true partition, although the algorithm we propose runs in exponential time. In the unlabelled setting, i.e. when only the tree structure is observed, we show that it is possible to test between different values of q strictly better than by random guessing. This follows from a delicate analysis of the sum-of-distances statistic.
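A small simulation of the attachment dynamics just described can make the model concrete. The sketch below seeds the tree with one root node of each type joined by an edge, and lets the two nodes of each arriving pair attach in a fixed order; both are modelling choices of this sketch, not details taken from the paper.

```python
import random

def simulate_bcmrt(n_pairs, q, seed=None):
    """Grow a BCMRT: at each step, one node of each type arrives; a new
    node attaches to its own type w.p. 1-q (otherwise the other type),
    picking its parent uniformly among existing nodes of the chosen type."""
    rng = random.Random(seed)
    parent = {0: None, 1: 0}           # seed tree: node 0 (type 0), node 1 (type 1)
    by_type = {0: [0], 1: [1]}         # type -> ids of existing nodes of that type
    node_type = {0: 0, 1: 1}
    next_id = 2
    for _ in range(n_pairs):
        for own in (0, 1):             # the pair contains one node of each type
            target = own if rng.random() < 1 - q else 1 - own
            parent[next_id] = rng.choice(by_type[target])
            node_type[next_id] = own
            by_type[own].append(next_id)
            next_id += 1
    return parent, node_type

parent, node_type = simulate_bcmrt(n_pairs=1000, q=0.1, seed=0)
```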
Multi-Agent Reinforcement Learning (MARL) is an increasingly important research field that can model and control multiple large-scale autonomous systems. Despite its achievements, existing multi-agent learning methods typically involve expensive computation, in terms of training time and power, arising from large observation-action spaces and a huge number of training steps. Therefore, a key challenge is understanding and characterizing the computationally intensive functions in several popular classes of MARL algorithms during their training phases. Our preliminary experiments reveal new insights into the key modules of MARL algorithms that limit the adoption of MARL in real-world systems. We explore a neighbor sampling strategy to improve cache locality and observe a performance improvement ranging from 26.66% (3 agents) to 27.39% (12 agents) during the computationally intensive mini-batch sampling phase. Additionally, we demonstrate that improving locality leads to an end-to-end training time reduction of 10.2% (for 12 agents) compared to existing multi-agent algorithms, without significant degradation in the mean reward.
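The abstract does not detail the neighbor sampling strategy, so the following is only a speculative sketch of the general idea: drawing a mini-batch as contiguous neighboring entries of a replay buffer, so that reads hit adjacent cache lines, rather than gathering transitions at scattered random indices. All names are hypothetical.

```python
import numpy as np

def neighbor_minibatch(buffer, batch_size, rng):
    """Locality-friendly sampling (illustrative only): pick one random
    anchor and return batch_size contiguous transitions, instead of
    batch_size independent random indices scattered across the buffer."""
    start = rng.integers(0, len(buffer) - batch_size + 1)
    return buffer[start:start + batch_size]   # one contiguous, cache-friendly read

rng = np.random.default_rng(0)
buffer = rng.standard_normal((100_000, 64))   # stand-in replay buffer
batch = neighbor_minibatch(buffer, 256, rng)
```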
Deep kernel processes are a recently introduced class of deep Bayesian models that have the flexibility of neural networks but work entirely with Gram matrices. They operate by alternately sampling a Gram matrix from a distribution over positive semi-definite matrices and applying a deterministic transformation. When the distribution is chosen to be Wishart, the model is called a deep Wishart process (DWP). This particular model is of interest because its prior is equivalent to a deep Gaussian process (DGP) prior, but at the same time it is invariant to rotational symmetries, leading to a simpler posterior distribution. Practical inference in the DWP was made possible in recent work (Ober and Aitchison, 2021a, "A variational approximate posterior for the deep Wishart process"), where the authors used a generalisation of the Bartlett decomposition of the Wishart distribution as the variational approximate posterior. However, predictive performance in that paper was less impressive than one might expect, with the DWP beating a DGP on only a few of the UCI datasets used for comparison. In this paper, we show that further generalising their distribution to allow linear combinations of rows and columns in the Bartlett decomposition results in better predictive performance, while incurring negligible additional computational cost.
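For context, the standard Bartlett decomposition, which the variational posteriors above generalise, samples a Wishart matrix as follows; this sketch shows only the textbook sampler, not the paper's variational family.

```python
import numpy as np

def sample_wishart_bartlett(V, n, rng):
    """Draw W ~ Wishart(V, n) via the Bartlett decomposition: with
    V = L L^T, set W = (L A)(L A)^T, where A is lower triangular with
    A_ii = sqrt(chi^2_{n-i+1}) (1-indexed) and A_ij ~ N(0, 1) for i > j.
    Requires n > p - 1 for the density to exist."""
    p = V.shape[0]
    L = np.linalg.cholesky(V)
    A = np.zeros((p, p))
    A[np.tril_indices(p, -1)] = rng.standard_normal(p * (p - 1) // 2)
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(n - np.arange(p)))
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(0)
W = sample_wishart_bartlett(np.eye(4), n=10, rng=rng)   # one 4x4 Wishart sample
```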
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
When learning tasks over time, artificial neural networks suffer from a problem known as Catastrophic Forgetting (CF). This happens when the weights of a network are overwritten during the training of a new task, causing the forgetting of old information. To address this issue, we propose MetA Reusable Knowledge, or MARK, a new method that fosters weight reusability instead of overwriting when learning a new task. Specifically, MARK keeps a set of weights shared among tasks. We envision these shared weights as a common Knowledge Base (KB) that is not only used to learn new tasks but is also enriched with new knowledge as the model learns new tasks. The key components behind MARK are two-fold. On the one hand, a meta-learning approach provides the mechanism to incrementally enrich the KB with new knowledge and to foster weight reusability among tasks. On the other hand, a set of trainable masks provides the mechanism to selectively choose from the KB the weights relevant to each task. By using MARK, we achieve state-of-the-art results in several popular benchmarks, surpassing the best-performing methods in terms of average accuracy by over 10% on the 20-Split-MiniImageNet dataset, while achieving almost zero forgetting using 55% of the number of parameters. Furthermore, an ablation study provides evidence that MARK is indeed learning reusable knowledge that is selectively used by each task.
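The mask-over-shared-weights idea can be sketched in a few lines: the layer below keeps one shared KB weight matrix and a trainable per-task gate over it. The sigmoid gating and the per-task full-size masks are illustrative choices of this sketch, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class MaskedKBLayer(nn.Module):
    """One linear layer whose weights live in a shared Knowledge Base;
    each task trains its own mask that softly selects which KB weights
    to use, instead of overwriting them."""
    def __init__(self, d_in, d_out, n_tasks):
        super().__init__()
        self.kb = nn.Parameter(torch.randn(d_out, d_in) * 0.02)       # shared KB
        self.masks = nn.Parameter(torch.zeros(n_tasks, d_out, d_in))  # per-task gates
    def forward(self, x, task_id):
        gate = torch.sigmoid(self.masks[task_id])   # soft selection in (0, 1)
        return x @ (gate * self.kb).t()             # use only the selected weights

layer = MaskedKBLayer(d_in=32, d_out=16, n_tasks=20)
y = layer(torch.randn(8, 32), task_id=3)            # task 3 applies its own mask
```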
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proven to be important, and several prior works incorporate fields into their models. In this paper, we propose a novel approach to model the field information effectively and efficiently. The proposed approach is a direct improvement of FwFM, and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new explanation of FM and FwFM within the FmFM framework, and compare it with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize the dimensions while keeping the model performance. The FmFM model can be optimized further by caching the intermediate vectors, so that a prediction takes only thousands of floating-point operations (FLOPs). Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require many more FLOPs at runtime.
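To illustrate the field-matrixed interaction (based on the standard description of FmFM; the variable names here are ours): the interaction between active features $i$ and $j$ is $v_i^\top M_{f_i, f_j} v_j$, where $M_{f_i, f_j}$ is a learned matrix for the pair of fields, and the transformed vectors $v_i^\top M_{f_i, f_j}$ are the intermediate vectors that can be cached to make prediction cheap.

```python
import numpy as np

def fmfm_interactions(embs, fields, M):
    """Sum of FmFM pairwise terms: v_i^T M[(f_i, f_j)] v_j over i < j.
    Because M[(f_i, f_j)] has shape (d_{f_i}, d_{f_j}), each field may
    use its own embedding dimension (the 'soft pruning' above)."""
    total = 0.0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            total += embs[i] @ M[(fields[i], fields[j])] @ embs[j]
    return total

# Toy example: three active features from fields "user", "ad", "ctx",
# each field with its own embedding dimension.
M = {("user", "ad"): np.ones((4, 3)),
     ("user", "ctx"): np.ones((4, 2)),
     ("ad", "ctx"): np.ones((3, 2))}
embs = [np.ones(4), np.ones(3), np.ones(2)]
print(fmfm_interactions(embs, ["user", "ad", "ctx"], M))   # 4*3 + 4*2 + 3*2 = 26
```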