Trajectory planning is a fundamental problem in robotics. It facilitates a wide range of applications in navigation and motion planning, control, and multi-agent coordination. Trajectory planning is a difficult problem due to its computational complexity and real-world environment complexity with uncertainty, non-linearity, and real-time requirements. The multi-agent trajectory planning problem adds another dimension of difficulty due to inter-agent interaction. Existing solutions are either search-based or optimization-based approaches with simplified assumptions of environment, limited planning speed, and limited scalability in the number of agents. In this work, we make the first attempt to reformulate single agent and multi-agent trajectory planning problem as query problems over an implicit neural representation of trajectories. We formulate such implicit representation as Neural Trajectory Models (NTM) which can be queried to generate nearly optimal trajectory in complex environments. We conduct experiments in simulation environments and demonstrate that NTM can solve single-agent and multi-agent trajectory planning problems. In the experiments, NTMs achieve (1) sub-millisecond panning time using GPUs, (2) almost avoiding all environment collision, (3) almost avoiding all inter-agent collision, and (4) generating almost shortest paths. We also demonstrate that the same NTM framework can also be used for trajectories correction and multi-trajectory conflict resolution refining low quality and conflicting multi-agent trajectories into nearly optimal solutions efficiently. (Open source code will be available at //github.com/laser2099/neural-trajectory-model)
Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to diagnose environmental health and biodiversity. However, inconsistencies in research pose notable challenges hindering progress in this domain. Reliable DL models need to analyze bird calls flexibly across various species and environments to fully harness the potential of bioacoustics in a cost-effective passive acoustic monitoring scenario. Data fragmentation and opacity across studies complicate a comprehensive evaluation of general model performance. To overcome these challenges, we present the BirdSet benchmark, a unified framework consolidating research efforts with a holistic approach for classifying bird vocalizations in avian bioacoustics. BirdSet harmonizes open-source bird recordings into a curated dataset collection. This unified approach provides an in-depth understanding of model performance and identifies potential shortcomings across different tasks. By establishing baseline results of current models, BirdSet aims to facilitate comparability, guide subsequent data collection, and increase accessibility for newcomers to avian bioacoustics.
Surface parameterization is a fundamental geometry processing problem with rich downstream applications. Traditional approaches are designed to operate on well-behaved mesh models with high-quality triangulations that are laboriously produced by specialized 3D modelers, and thus unable to meet the processing demand for the current explosion of ordinary 3D data. In this paper, we seek to perform UV unwrapping on unstructured 3D point clouds. Technically, we propose ParaPoint, an unsupervised neural learning pipeline for achieving global free-boundary surface parameterization by building point-wise mappings between given 3D points and 2D UV coordinates with adaptively deformed boundaries. We ingeniously construct several geometrically meaningful sub-networks with specific functionalities, and assemble them into a bi-directional cycle mapping framework. We also design effective loss functions and auxiliary differential geometric constraints for the optimization of the neural mapping process. To the best of our knowledge, this work makes the first attempt to investigate neural point cloud parameterization that pursues both global mappings and free boundaries. Experiments demonstrate the effectiveness and inspiring potential of our proposed learning paradigm. The code will be publicly available.
A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new class descriptions that differentiate between them. The new class descriptions resolve the initial ambiguity and help predict the correct label. In our experiments, FuDD consistently outperforms generic description ensembles and naive LLM-generated descriptions on 12 datasets. We show that differential descriptions are an effective tool to resolve class ambiguities, which otherwise significantly degrade the performance. We also show that high quality natural language class descriptions produced by FuDD result in comparable performance to few-shot adaptation methods.
In this paper, we introduce a new flow-based method for global optimization of Lipschitz functions, called Stein Boltzmann Sampling (SBS). Our method samples from the Boltzmann distribution that becomes asymptotically uniform over the set of the minimizers of the function to be optimized. Candidate solutions are sampled via the \emph{Stein Variational Gradient Descent} algorithm. We prove the asymptotic convergence of our method, introduce two SBS variants, and provide a detailed comparison with several state-of-the-art global optimization algorithms on various benchmark functions. The design of our method, the theoretical results, and our experiments, suggest that SBS is particularly well-suited to be used as a continuation of efficient global optimization methods as it can produce better solutions while making a good use of the budget.
We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal \emph{argumentation framework}(AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program $P_{AF}$ whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
Robotic collectives for military and disaster response applications require coalition formation algorithms to partition robots into appropriate task teams. Collectives' missions will often incorporate tasks that require multiple high-level robot behaviors or services, which coalition formation must accommodate. The highly dynamic and unstructured application domains also necessitate that coalition formation algorithms produce near optimal solutions (i.e., >95% utility) in near real-time (i.e., <5 minutes) with very large collectives (i.e., hundreds of robots). No previous coalition formation algorithm satisfies these requirements. An initial evaluation found that traditional auction-based algorithms' runtimes are too long, even though the centralized simulator incorporated ideal conditions unlikely to occur in real-world deployments (i.e., synchronization across robots and perfect, instantaneous communication). The hedonic game-based GRAPE algorithm can produce solutions in near real-time, but cannot be applied to multiple service collectives. This manuscript integrates GRAPE and a services model, producing GRAPE-S and Pair-GRAPE-S. These algorithms and two auction baselines were evaluated using a centralized simulator with up to 1000 robots, and via the largest distributed coalition formation simulated evaluation to date, with up to 500 robots. The evaluations demonstrate that auctions transfer poorly to distributed collectives, resulting in excessive runtimes and low utility solutions. GRAPE-S satisfies the target domains' coalition formation requirements, producing near optimal solutions in near real-time, and Pair-GRAPE-S more than satisfies the domain requirements, producing optimal solutions in near real-time. GRAPE-S and Pair-GRAPE-S are the first algorithms demonstrated to support near real-time coalition formation for very large, distributed collectives with multiple services.
As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While the neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing due to their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research trajectory, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers. Additionally, we explore promising directions within this expanding field.
Traffic forecasting is an important factor for the success of intelligent transportation systems. Deep learning models including convolution neural networks and recurrent neural networks have been applied in traffic forecasting problems to model the spatial and temporal dependencies. In recent years, to model the graph structures in the transportation systems as well as the contextual information, graph neural networks (GNNs) are introduced as new tools and have achieved the state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of recent research using different GNNs, e.g., graph convolutional and graph attention networks, in various traffic forecasting problems, e.g., road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, demand forecasting in ride-hailing platforms, etc. We also present a collection of open data and source resources for each problem, as well as future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks for traffic forecasting problems. We have also created a public Github repository to update the latest papers, open data and source resources.
The design of deep graph models still remains to be investigated and the crucial part is how to explore and exploit the knowledge from different hops of neighbors in an efficient way. In this paper, we propose a novel RNN-like deep graph neural network architecture by incorporating AdaBoost into the computation of network; and the proposed graph convolutional network called AdaGCN~(AdaBoosting Graph Convolutional Network) has the ability to efficiently extract knowledge from high-order neighbors and integrate knowledge from different hops of neighbors into the network in an AdaBoost way. We also present the architectural difference between AdaGCN and existing graph convolutional methods to show the benefits of our proposal. Finally, extensive experiments demonstrate the state-of-the-art prediction performance and the computational advantage of our approach AdaGCN.
This paper surveys the machine learning literature and presents machine learning as optimization models. Such models can benefit from the advancement of numerical optimization techniques which have already played a distinctive role in several machine learning settings. Particularly, mathematical optimization models are presented for commonly used machine learning approaches for regression, classification, clustering, and deep neural networks as well new emerging applications in machine teaching and empirical model learning. The strengths and the shortcomings of these models are discussed and potential research directions are highlighted.