Multi-Agent Combinatorial Path Finding (MCPF) seeks collision-free paths for multiple agents from their initial to goal locations, while visiting a set of intermediate target locations in the middle of the paths. MCPF is challenging as it involves both planning collision-free paths for multiple agents and target sequencing, i.e., solving traveling salesman problems to assign targets to and find the visiting order for the agents. Recent work develops methods to address MCPF while minimizing the sum of individual arrival times at goals. Such a problem formulation may result in paths with different arrival times and lead to a long makespan, the maximum arrival time, among the agents. This paper proposes a min-max variant of MCPF, denoted as MCPF-max, that minimizes the makespan of the agents. While the existing methods (such as MS*) for MCPF can be adapted to solve MCPF-max, we further develop two new techniques based on MS* to defer the expensive target sequencing during planning to expedite the overall computation. We analyze the properties of the resulting algorithm Deferred MS* (DMS*), and test DMS* with up to 20 agents and 80 targets. We demonstrate the use of DMS* on differential-drive robots.
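The difference between the min-sum objective and the min-max (makespan) objective can be illustrated with a small sketch. The two joint plans and their arrival times below are made-up numbers for illustration, not results from the paper, and the function names are hypothetical.

```python
def sum_of_arrival_times(arrival_times):
    """Min-sum objective: total of the agents' arrival times at their goals."""
    return sum(arrival_times)


def makespan(arrival_times):
    """Min-max objective: the latest arrival time among the agents."""
    return max(arrival_times)


# Hypothetical arrival times for three agents under two candidate joint plans.
plan_a = [2, 2, 11]  # small sum, but one agent arrives very late
plan_b = [6, 6, 6]   # larger sum, but balanced arrivals

plans = [plan_a, plan_b]
best_min_sum = min(plans, key=sum_of_arrival_times)  # selects plan_a
best_min_max = min(plans, key=makespan)              # selects plan_b
```

Under these assumed numbers, the min-sum objective prefers the unbalanced plan, while the makespan objective prefers the balanced one.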
Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. The neighborhood selection strategy is crucial to the success of MAPF-LNS, and a flurry of methods have been proposed. However, several pitfalls hinder a comprehensive evaluation of these new methods, mainly: 1) Lower than actual or incorrect baseline performance; 2) Lack of a unified evaluation setting and criterion; 3) Lack of a codebase or executable model for supervised learning methods. To overcome these challenges, we conduct a fair comparison across prominent methods on the same benchmark and hyperparameter search settings. Additionally, we propose a simple neighborhood selection strategy that marks a clear advancement in runtime efficiency on large maps with a large number of agents. Our benchmarking evaluation poses new challenges for existing learning-based methods and presents opportunities for future research when machine learning is integrated with MAPF-LNS. Code and data are available at //github.com/ChristinaTan0704/mapf-lns-benchmark.
Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI) assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing between exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well in some models and scenarios but fail in others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of "early learning" as a general occurrence during the training of CLMs. This phenomenon refers to the fact that a model initially focuses on the main features of the training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. Our analysis then shows that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), which blends deceptive distributions and applies label smoothing to keep the gradient bounded. This prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
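The abstract does not give the DeCE formula, so the following is only a minimal sketch of the general idea it describes: plain cross-entropy grows without bound as the predicted probability of the target class approaches zero, whereas blending the prediction with a label-smoothed target keeps the loss bounded. The fixed blending weight `alpha`, the smoothing amount `eps`, and all function names are assumptions made for illustration, not the authors' definition.

```python
import math


def smooth_labels(true_idx, num_classes, eps):
    """Label smoothing: 1 - eps on the true class, eps spread over the others."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if i == true_idx else off_value for i in range(num_classes)]


def cross_entropy(target, pred):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def blended_loss(true_idx, pred, alpha, eps):
    """Cross-entropy against a prediction blended with the smoothed target.

    Assumed form: blended = alpha * pred + (1 - alpha) * smoothed_target.
    Every blended entry is at least (1 - alpha) * target entry, so the loss
    cannot exceed -log(1 - alpha) + entropy(smoothed_target).
    """
    target = smooth_labels(true_idx, len(pred), eps)
    blended = [alpha * p + (1.0 - alpha) * t for p, t in zip(pred, target)]
    return cross_entropy(target, blended)


# A prediction that is almost certainly wrong about class 0.
pred = [1e-12, 1.0 - 1e-12]
plain = cross_entropy([1.0, 0.0], pred)                 # grows as pred[0] -> 0
bounded = blended_loss(0, pred, alpha=0.9, eps=0.1)     # stays below the bound
target = smooth_labels(0, 2, 0.1)
upper_bound = -math.log(1.0 - 0.9) - sum(t * math.log(t) for t in target)
```

In this sketch the bounded loss also implies a bounded gradient with respect to the prediction, which is the property the abstract attributes to DeCE; the exact schedule and form used in the paper may differ.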
Continual Learning (CL) is crucial for enabling networks to dynamically adapt as they learn new tasks sequentially, accommodating new data and classes without catastrophic forgetting. Diverging from conventional perspectives on CL, our paper introduces a new perspective wherein forgetting could actually benefit the sequential learning paradigm. Specifically, we present BiasPruner, a CL framework that intentionally forgets spurious correlations in the training data that could lead to shortcut learning. Utilizing a new bias score that measures the contribution of each unit in the network to learning spurious features, BiasPruner prunes those units with the highest bias scores to form a debiased subnetwork preserved for a given task. As BiasPruner learns a new task, it constructs a new debiased subnetwork, potentially incorporating units from previous subnetworks, which improves adaptation and performance on the new task. During inference, BiasPruner employs a simple task-agnostic approach to select the best debiased subnetwork for predictions. We conduct experiments on three medical datasets for skin lesion classification and chest X-Ray classification and demonstrate that BiasPruner consistently outperforms SOTA CL methods in terms of classification performance and fairness. Our code is available here.
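The pruning step described above can be sketched as follows. The abstract does not define how the bias score is computed, so the scores here are made-up inputs, and `debiased_mask` and `prune_ratio` are hypothetical names; this only illustrates removing the units with the highest scores.

```python
def debiased_mask(bias_scores, prune_ratio):
    """Return a keep-mask that drops the units with the highest bias scores.

    bias_scores: one score per unit (how the score is computed is not shown here).
    prune_ratio: fraction of units to prune, between 0 and 1.
    """
    num_to_prune = int(len(bias_scores) * prune_ratio)
    # Rank unit indices from most to least biased.
    ranked = sorted(range(len(bias_scores)), key=lambda i: bias_scores[i], reverse=True)
    pruned = set(ranked[:num_to_prune])
    return [i not in pruned for i in range(len(bias_scores))]


# Hypothetical bias scores for four units; prune the top half.
mask = debiased_mask([0.1, 0.9, 0.3, 0.8], prune_ratio=0.5)
```

The resulting mask keeps units 0 and 2 and would define the subnetwork retained for the task under these assumed scores.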
Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding. One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in the prompt. Although ICL methods are effective at enhancing the performance of LLMs on diverse language and tabular tasks, they have not been thoroughly explored for their potential to generate post hoc explanations. In this work, we carry out one of the first explorations to analyze the effectiveness of LLMs in explaining other complex predictive models using ICL. To this end, we propose a novel framework, In-Context Explainers, comprising three novel approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models. We conduct extensive analysis with these approaches on real-world tabular and text datasets and demonstrate that LLMs are capable of explaining other predictive models comparably to state-of-the-art post hoc explainers, opening up promising avenues for future research into LLM-based post hoc explanations of complex predictive models.
Closed-source agents suffer from several issues such as a lack of affordability, transparency, and reproducibility, particularly on complex interactive tasks. This motivates the development of open-source alternatives. We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents. LUMOS features a learnable, unified, and modular architecture with a planning module that learns high-level subgoal generation, and a grounding module trained to translate these subgoals into actions using various tools in the execution module. The design allows for modular upgrades and wider applicability to diverse interactive tasks. To foster generalizable agent learning, we collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales across various complex interactive tasks. On 9 datasets, LUMOS exhibits several key advantages: (1) LUMOS outperforms multiple larger open-source agents on the held-out datasets (unused for training) for each task type, and even surpasses GPT agents on QA and web tasks; (2) LUMOS outperforms open-source agents produced by chain-of-thought and unmodularized integrated training; and (3) LUMOS effectively generalizes to unseen tasks, outperforming 33B-scale agents and domain-specific agents.
Explainable Multimodal Emotion Recognition (EMER) is an emerging task that aims to achieve reliable and accurate emotion recognition. However, due to the high annotation cost, the existing dataset (denoted as EMER-Fine) is small, making it difficult to perform supervised training. To reduce the annotation cost and expand the dataset size, this paper reviews the previous dataset construction process. We then simplify the annotation pipeline, avoid manual checks, and replace the closed-source models with open-source models. Finally, we build EMER-Coarse, a large-scale, coarsely labeled dataset. Besides the dataset, we propose a two-stage training framework, AffectGPT. The first stage exploits EMER-Coarse to learn a coarse mapping between multimodal inputs and emotion-related descriptions; the second stage uses EMER-Fine to better align with manually checked results. Experimental results demonstrate the effectiveness of our proposed method on the challenging EMER task. To facilitate further research, we will make the code and dataset available at: //github.com/zeroQiaoba/AffectGPT.
Large Language Models (LLMs) have demonstrated remarkable capabilities in solving various tasks, yet they often struggle with comprehensively addressing complex and vague problems. Existing approaches, including multi-agent LLM systems, offer solutions to certain challenges but still require manual setup and lack scalability. To address this gap, we propose a novel approach leveraging decomposition to enable LLMs to tackle vague problems effectively. Our approach involves an orchestrating LLM that interacts with users to understand the problem and then decomposes it into tangible sub-problems. Instead of expecting the LLM to solve the entire problem in one go, we train it to ask follow-up questions to gain a deeper understanding of the user's requirements. Once the problem is adequately understood, the orchestrating LLM divides it into smaller, manageable sub-problems. Each sub-problem is then assigned to specialized LLM agents or non-LLM functions for resolution. These agents work in parallel to solve their respective sub-problems, with the orchestrating LLM overseeing the process and compiling the solutions into a comprehensive answer for the user. By adopting this decomposition approach, we alleviate the constraints imposed by token limitations on LLM outputs and empower them to provide nuanced solutions to complex and ambiguous problems. Through our approach, we aim to enable LLMs to think and operate more like humans, breaking down complex problems into manageable parts and collaboratively solving them. This not only enhances the problem-solving capabilities of LLMs but also offers a scalable and efficient method for addressing a wide range of real-world challenges.
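The decompose, solve in parallel, then compile flow described above can be sketched with stub functions. Everything here is a hypothetical stand-in: `decompose` and `solve_subproblem` replace what would be LLM or non-LLM calls in the described system, and the clarifying-question step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(problem):
    """Stand-in for the orchestrating LLM: split a request into sub-problems."""
    return [part.strip() for part in problem.split(" and ")]


def solve_subproblem(sub_problem):
    """Stand-in for a specialized agent or non-LLM function."""
    return f"[solved] {sub_problem}"


def orchestrate(problem):
    """Decompose the problem, solve the parts in parallel, and compile the answers."""
    sub_problems = decompose(problem)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the inputs.
        solutions = list(pool.map(solve_subproblem, sub_problems))
    return "\n".join(solutions)


answer = orchestrate("plan the menu and book a venue")
```

A thread pool is used here only because the stubs are trivial; a real system would likely issue asynchronous model calls instead.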
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
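The idea that references contribute differently to each query can be sketched with a simple attention-weighted aggregation. Dot-product scoring with a softmax is an assumption chosen for illustration; the abstract does not specify the scoring function, and the names and vectors below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate_references(query, references):
    """Weight each reference embedding by its similarity to the query.

    Returns the query-aware aggregated embedding and the attention weights.
    """
    weights = softmax([dot(query, ref) for ref in references])
    aggregated = [
        sum(w * ref[d] for w, ref in zip(weights, references))
        for d in range(len(query))
    ]
    return aggregated, weights


# Two hypothetical reference embeddings; the first is closer to the query.
aggregated, weights = aggregate_references([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With these inputs the first reference receives the larger weight, so the aggregated representation leans toward it; a different query would shift the weights.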
The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.