We consider allocating indivisible chores among agents with different cost functions, such that all agents receive a cost of at most a constant factor times their maximin share. The state of the art was presented at EC 2021 by Huang and Lu. They presented a non-polynomial-time algorithm, called HFFD, that attains an 11/9 approximation, and a polynomial-time algorithm that attains a 5/4 approximation. In this paper, we show that HFFD can be reduced to an algorithm called MultiFit, developed by Coffman, Garey and Johnson in 1978 for makespan minimization in job scheduling. Using this reduction, we prove that the approximation ratio of HFFD is in fact equal to that of MultiFit, which is known to be 13/11 in general, 20/17 for n at most 7, and 15/13 for n=3. Moreover, we develop an algorithm for (13/11+epsilon)-maximin-share allocation for any epsilon>0, with run-time polynomial in the problem size and 1/epsilon. For n=3, we can improve the algorithm to find a 15/13-maximin-share allocation with run-time polynomial in the problem size. Thus, we have practical algorithms that attain the best known approximation to maximin-share chore allocation.
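To make the connection concrete, below is a minimal sketch of the classical MultiFit routine of Coffman, Garey and Johnson that the reduction targets: a binary search on a common bin capacity, with First-Fit Decreasing (FFD) as the packing test. It illustrates MultiFit in the plain makespan setting with identical job sizes; the reduction from HFFD and the agent-specific cost functions of the chore setting are not shown, and the bounds and iteration count are illustrative choices rather than part of the paper.

```python
def ffd_fits(jobs, capacity, n_bins):
    """First-Fit Decreasing: try to pack all jobs into n_bins bins of the given capacity."""
    bins = [0.0] * n_bins
    for job in sorted(jobs, reverse=True):
        for i in range(n_bins):
            if bins[i] + job <= capacity:
                bins[i] += job
                break
        else:
            return False  # the job did not fit into any bin
    return True

def multifit(jobs, n_bins, iterations=20):
    """Binary search on the common bin capacity; returns the smallest capacity
    (up to the search tolerance) for which FFD packs all jobs into n_bins bins."""
    lo = max(max(jobs), sum(jobs) / n_bins)       # trivial lower bound on the optimum
    hi = max(max(jobs), 2 * sum(jobs) / n_bins)   # trivial upper bound
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ffd_fits(jobs, mid, n_bins):
            hi = mid
        else:
            lo = mid
    return hi

# Example: schedule 7 jobs on 3 machines; the returned capacity approximates the optimal makespan.
print(multifit([7, 6, 5, 5, 4, 3, 2], 3))
```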
Planning has been part of the core pursuit of artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning have been lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that current language agents are not yet capable of handling such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%. Language agents struggle to stay on task, to use the right tools to collect information, and to keep track of multiple constraints. However, we note that the mere possibility for language agents to tackle such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.
Image captioning models are typically trained by treating all samples equally, neglecting to account for mismatched or otherwise difficult data points. In contrast, recent work has shown the effectiveness of training models by scheduling the data using curriculum learning strategies. This paper contributes to this direction by actively curating difficult samples in datasets without increasing the total number of samples. We explore the effect of using three data curation methods within the training process: complete removal of a sample, caption replacement, or image replacement via a text-to-image generation model. Experiments on the Flickr30K and COCO datasets with the BLIP and BEiT-3 models demonstrate that these curation methods do indeed yield improved image captioning models, underscoring their efficacy.
We study a cooperative multi-agent bandit setting in the distributed GOSSIP model: in every round, each of $n$ agents chooses an action from a common set, observes the action's corresponding reward, and subsequently exchanges information with a single randomly chosen neighbor, which informs its policy in the next round. We introduce and analyze several families of fully decentralized local algorithms in this setting under the constraint that each agent has only constant memory. We highlight a connection between the global evolution of such decentralized algorithms and a new class of "zero-sum" multiplicative weights update methods, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds for both stationary and adversarial reward settings. Moreover, we show that these simple local algorithms can approximately optimize convex functions over the simplex, assuming that the reward distributions are generated from a stochastic gradient oracle.
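For readers unfamiliar with the update rule referenced above, here is a minimal single-agent sketch of the standard multiplicative-weights (Hedge) method over a common action set. The paper's "zero-sum" variant, the gossip-based information exchange, and the constant-memory constraint are not modeled here, and the learning rate `eta` is an arbitrary illustrative choice.

```python
import numpy as np

def multiplicative_weights(reward_matrix, eta=0.1):
    """Standard multiplicative-weights (Hedge) update over k actions.

    reward_matrix: T x k array of per-round rewards in [0, 1].
    Returns the sequence of probability distributions played on the simplex."""
    T, k = reward_matrix.shape
    weights = np.ones(k)
    history = []
    for t in range(T):
        probs = weights / weights.sum()            # current point on the simplex
        history.append(probs)
        weights = weights * np.exp(eta * reward_matrix[t])  # multiplicative update
    return np.array(history)

# Toy run: action 1 is slightly better on average, so the distribution drifts toward it.
rng = np.random.default_rng(0)
rewards = rng.uniform(0, 1, size=(500, 2))
rewards[:, 1] += 0.1
print(multiplicative_weights(np.clip(rewards, 0, 1))[-1])
```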
One well-motivated explanation method for classifiers leverages counterfactuals, which are hypothetical events identical to real observations in all aspects except for one categorical feature. Constructing such counterfactuals poses specific challenges for text, however, as some attribute values may not necessarily align with plausible real-world events. In this paper we propose a simple method for generating counterfactuals by intervening in the space of text representations, which bypasses this limitation. We argue that our interventions are minimally disruptive and that they are theoretically sound as they align with counterfactuals as defined in Pearl's causal inference framework. To validate our method, we first conduct experiments on a synthetic dataset of counterfactuals, allowing for a direct comparison between classifier predictions based on ground truth counterfactuals (obtained through explicit text interventions) and our counterfactuals, derived through interventions in the representation space. Second, we study a real-world scenario where our counterfactuals can be leveraged both for explaining a classifier and for bias mitigation.
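The abstract does not specify the form of the intervention. As a purely hypothetical illustration of what "intervening in the space of text representations" can look like, the sketch below shifts an embedding along a single concept direction so that only that attribute changes; the names `concept_direction` and `target_value` are invented for the example and should not be read as the authors' method.

```python
import numpy as np

def intervene(representation, concept_direction, target_value):
    """Hypothetical linear intervention in representation space:
    read off the current value of a concept direction and reset it to the
    desired target value, leaving the rest of the vector untouched."""
    d = concept_direction / np.linalg.norm(concept_direction)
    current_value = representation @ d
    return representation + (target_value - current_value) * d

# Toy usage with random vectors standing in for a text encoder's output.
rng = np.random.default_rng(0)
z = rng.normal(size=768)            # stand-in for an encoded document
direction = rng.normal(size=768)    # stand-in for a learned attribute direction
z_cf = intervene(z, direction, target_value=1.0)
# z_cf can then be fed to the classifier head to read off the counterfactual prediction.
```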
Allocating indivisible items among a set of agents is a frequently studied discrete optimization problem. In the setting considered in this work, the agents' preferences over the items are assumed to be identical. We consider a very recent measure for the overall quality of an allocation which does not rely on numerical valuations of the items. Instead, it captures the agents' opinion by a directed acyclic preference graph with vertices representing items. An arc $(a,b)$ in such a graph means that the agents prefer item $a$ over item $b$. For a given allocation of items the dissatisfaction of an agent is defined as the number of items which the agent does not receive and for which no more preferred item is given to the agent. Our goal is to find an efficient allocation of the items to the agents such that the total dissatisfaction over all agents is minimized. We explore the dichotomy between NP-hard and polynomially solvable instances, depending on properties of the underlying preference graph. While the problem is NP-hard already for three agents even on very restricted graph classes, it is polynomially solvable for two agents on general preference graphs. For an arbitrary number of agents, we derive polynomial-time algorithms for relevant restrictions of the underlying undirected graph. These are trees and, among the graphs of treewidth two, series-parallel graphs and cactus graphs.
We consider the task of allocating indivisible items to agents when the agents' preferences over the items are identical. The preferences are captured by means of a directed acyclic graph, with vertices representing items and an edge $(a,b)$ meaning that each of the agents prefers item $a$ over item $b$. The dissatisfaction of an agent is measured by the number of items that the agent does not receive and for which it also does not receive any more preferred item. The aim is to allocate the items to the agents in a fair way, i.e., to minimize the maximum dissatisfaction among the agents. We study the computational complexity of this problem and establish the following dichotomy: the problem is NP-hard for three or more agents, even on fairly restricted graphs, but polynomially solvable for two agents. We also provide several polynomial-time results with respect to different underlying graph structures, such as graphs of width at most two and tree-like structures such as stars and matchings. These findings are complemented with fixed-parameter tractability results related to path modules and independent set modules. Techniques employed in the paper include the bottleneck assignment problem, greedy algorithms, dynamic programming, maximum network flow, and integer linear programming.
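Both of the preceding abstracts use the same dissatisfaction measure, so a small sketch may help fix ideas: given a preference DAG and a bundle, an agent's dissatisfaction counts the items it does not receive and for which it also receives no more preferred item. The sketch assumes "more preferred" means reachable by a directed path (the transitive closure of the arcs), which the abstracts leave implicit, and it uses networkx purely for convenience; the toy allocation at the end is invented for illustration.

```python
import networkx as nx

def dissatisfaction(pref_dag, items_received, all_items):
    """Dissatisfaction of one agent: items not received for which the agent also
    received no more-preferred item ('more preferred' taken as reachability in the DAG)."""
    received = set(items_received)
    count = 0
    for item in all_items:
        if item in received:
            continue
        # nx.ancestors(pref_dag, item) = items with a directed path to `item`, i.e. preferred over it
        if not (nx.ancestors(pref_dag, item) & received):
            count += 1
    return count

# Toy instance: a -> b -> c, plus an isolated item d, split between two agents.
G = nx.DiGraph([("a", "b"), ("b", "c")])
G.add_node("d")
items = list(G.nodes)
allocation = {1: ["a"], 2: ["d"]}
per_agent = {ag: dissatisfaction(G, bundle, items) for ag, bundle in allocation.items()}
print(sum(per_agent.values()))   # objective of the first abstract: total dissatisfaction
print(max(per_agent.values()))   # objective of the second abstract: maximum dissatisfaction
```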
Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness of autonomous vehicles. Collaborative object detection (COD) has been proposed to improve detection accuracy and reduce uncertainty by leveraging the viewpoints of multiple agents. However, little attention has been paid to how the uncertainty quantification from COD can be leveraged to enhance MOT performance. In this paper, as a first attempt to address this challenge, we design an uncertainty propagation framework called MOT-CUP. Our framework first quantifies the uncertainty of COD through direct modeling and conformal prediction, and then propagates this uncertainty information into the motion prediction and association steps. MOT-CUP is designed to work with different collaborative object detectors and baseline MOT algorithms. We evaluate MOT-CUP on V2X-Sim, a comprehensive collaborative perception dataset, and demonstrate a 2% improvement in accuracy and a 2.67X reduction in uncertainty compared to baselines such as SORT and ByteTrack. In scenarios characterized by high occlusion levels, MOT-CUP achieves a noteworthy 4.01% improvement in accuracy. MOT-CUP demonstrates the importance of uncertainty quantification in both COD and MOT, and represents the first attempt to improve accuracy and reduce uncertainty in COD-based MOT through uncertainty propagation. Our code is available at //coperception.github.io/MOT-CUP/.
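MOT-CUP's uncertainty quantification is described only at a high level; as background, the following is a generic split conformal prediction sketch for a scalar regression output (for instance, one bounding-box coordinate). It is not the MOT-CUP implementation, and the calibration data in the usage example is synthetic.

```python
import numpy as np

def split_conformal_interval(cal_predictions, cal_targets, test_prediction, alpha=0.1):
    """Generic split conformal prediction for a scalar regression output.
    Returns a (1 - alpha) prediction interval around the test prediction."""
    residuals = np.abs(np.asarray(cal_targets) - np.asarray(cal_predictions))
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level)
    return test_prediction - q, test_prediction + q

# Toy usage with synthetic calibration data.
rng = np.random.default_rng(0)
cal_pred = rng.normal(size=200)
cal_true = cal_pred + rng.normal(scale=0.5, size=200)
print(split_conformal_interval(cal_pred, cal_true, test_prediction=0.3, alpha=0.1))
```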
Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the 'student') achieve specific goals set by a teacher. While such studies are abundant in the continuous domain, work on the effectiveness of machine teaching in the discrete domain is relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on efficiently manipulating a student model's predictions toward the teacher's goals by changing the training data. We formulate this task as a combinatorial optimization problem and solve it with an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in scenarios where a teacher corrects erroneous predictions to improve the student model, or maliciously manipulates the model to misclassify specific samples into a target class aligned with the teacher's personal gain. Experimental results show that our proposed algorithm effectively and efficiently manipulates the predictions of the model, surpassing conventional baselines.
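The abstract does not spell out the iterative searching algorithm. As a hypothetical illustration of the combinatorial formulation, the sketch below runs a naive greedy search that flips one training label per iteration, retrains a small classifier, and keeps the flip that most increases the probability of the teacher's target prediction. The function `greedy_teaching`, its parameters, and the use of label flips as the edit operation are assumptions made for the example, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def greedy_teaching(X, y, x_target, target_class, budget=5):
    """Hypothetical greedy search: flip at most `budget` training labels so that a
    retrained classifier assigns higher probability to `target_class` on `x_target`."""
    y = y.copy()
    for _ in range(budget):
        base_prob = LogisticRegression().fit(X, y).predict_proba([x_target])[0][target_class]
        best_gain, best_idx = 0.0, None
        for i in range(len(y)):
            y_try = y.copy()
            y_try[i] = 1 - y_try[i]  # candidate edit: flip one binary label
            prob = LogisticRegression().fit(X, y_try).predict_proba([x_target])[0][target_class]
            if prob - base_prob > best_gain:
                best_gain, best_idx = prob - base_prob, i
        if best_idx is None:          # no single flip helps any more
            break
        y[best_idx] = 1 - y[best_idx]
    return y

# Toy usage on synthetic binary data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
y_teach = greedy_teaching(X, y, x_target=np.array([0.1, 0.0]), target_class=0)
```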
A community reveals the features and connections of its members that differ from those in other communities in a network. Detecting communities is of great significance in network analysis. Beyond classical spectral clustering and statistical inference methods, deep learning techniques for community detection have developed significantly in recent years, owing to their advantages in handling high-dimensional network data. Hence, a comprehensive overview of the latest progress in community detection through deep learning is timely for both academics and practitioners. This survey proposes a new taxonomy covering different categories of state-of-the-art methods, including deep learning models based on deep neural networks, deep nonnegative matrix factorization, and deep sparse filtering. The main category, deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The survey also summarizes popular benchmark data sets, model evaluation metrics, and open-source implementations to support experimentation. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing field of deep learning.
Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. We present a novel approach for detecting carried objects in a single video frame that incorporates features from multiple scales. Initially, the foreground mask of a video frame is segmented into multi-scale superpixels. The human-like regions in the segmented area are then identified by matching features extracted from the superpixels against learned features in a codebook. A carried-object probability map is generated using the complement of the superpixels' matching probabilities to human-like regions, together with background information. A group of superpixels with high carried-object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and the results show that our method is competitive with or better than the state of the art.
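As a small illustration of the probability-map step described above, the sketch below assigns each foreground pixel the complement of its superpixel's human-matching probability. The inputs (`superpixel_labels`, `human_match_prob`, `foreground_mask`) are assumed to be produced by the earlier segmentation and codebook-matching stages, which are not shown.

```python
import numpy as np

def carried_object_probability_map(superpixel_labels, human_match_prob, foreground_mask):
    """Per-pixel carried-object probability as the complement of each superpixel's
    probability of matching a human-like region, restricted to the foreground.
      superpixel_labels: H x W integer map of superpixel ids
      human_match_prob:  dict mapping superpixel id -> probability of being human-like
      foreground_mask:   H x W boolean foreground mask"""
    prob_map = np.zeros(superpixel_labels.shape, dtype=float)
    for sp_id, p_human in human_match_prob.items():
        prob_map[superpixel_labels == sp_id] = 1.0 - p_human  # complement of the match probability
    prob_map[~foreground_mask] = 0.0                           # background carries no object
    return prob_map

# Toy usage with a 4x4 frame split into two superpixels.
labels = np.array([[0, 0, 1, 1]] * 4)
fg = np.ones((4, 4), dtype=bool)
print(carried_object_probability_map(labels, {0: 0.9, 1: 0.2}, fg))
```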