Multi-Agent Path Finding (MAPF) is NP-hard to solve optimally, even on graphs, suggesting no polynomial-time algorithms can compute exact optimal solutions for them. This raises a natural question: How optimal can polynomial-time algorithms reach? Whereas algorithms for computing constant-factor optimal solutions have been developed, the constant factor is generally very large, limiting their application potential. In this work, among other breakthroughs, we propose the first low-polynomial-time MAPF algorithms delivering $1$-$1.5$ (resp., $1$-$1.67$) asymptotic makespan optimality guarantees for 2D (resp., 3D) grids for random instances at a very high $1/3$ agent density, with high probability. Moreover, when regularly distributed obstacles are introduced, our methods experience no performance degradation. These methods generalize to support $100\%$ agent density. Regardless of the dimensionality and density, our high-quality methods are enabled by a unique hierarchical integration of two key building blocks. At the higher level, we apply the labeled Grid Rearrangement Algorithm (RTA), capable of performing efficient reconfiguration on grids through row/column shuffles. At the lower level, we devise novel methods that efficiently simulate row/column shuffles returned by RTA. Our implementations of RTA-based algorithms are highly effective in extensive numerical evaluations, demonstrating excellent scalability compared to other SOTA methods. For example, in 3D settings, \rta-based algorithms readily scale to grids with over $370,000$ vertices and over $120,000$ agents and consistently achieve conservative makespan optimality approaching $1.5$, as predicted by our theoretical analysis.
We study the hardness of approximating two reconfiguration problems. One problem is Maxmin $k$-Cut Reconfiguration, which is a reconfiguration analogue of Max $k$-Cut. The other is Maxmin E$k$-SAT Reconfiguration, which is a reconfiguration analogue of Max E$k$-SAT. The Probabilistically Checkable Reconfiguration Proof theorem due to Hirahara and Ohsaka (STOC 2024) and Karthik C. S. and Manurangsi (2023) implies that Maxmin 4-Cut Reconfiguration and Maxmin E3-SAT Reconfiguration are PSPACE-hard to approximate within a constant factor. However, the asymptotic behavior of approximability for these problems with respect to $k$ is not well understood. In this paper, we present the following hardness-of-approximation results and approximation algorithms for Maxmin $k$-Cut Reconfiguration and Maxmin E$k$-SAT Reconfiguration: $\bullet$ For every $k \geq 2$, Maxmin $k$-Cut Reconfiguration is PSPACE-hard to approximate within a factor of $1 - \Omega\left(\frac{1}{k}\right)$, whereas it can be approximated within a factor of $1-\frac{2}{k}$. Our lower and upper bounds demonstrate that Maxmin $k$-Cut Reconfiguration exhibits the asymptotically same approximability as Max $k$-Cut. $\bullet$ For every $k \geq 3$, Maxmin E$k$-SAT Reconfiguration is PSPACE-hard (resp. NP-hard) to approximate within a factor of $1-\Omega\left(\frac{1}{9^{\sqrt{k}}}\right)$ (resp. $1-\frac{1}{8k}$). On the other hand, it admits a deterministic $\left(1-\frac{2.5}{k}\right)$-factor approximation algorithm, implying that Maxmin E$k$-SAT Reconfiguration displays an asymptotically approximation threshold different from Max E$k$-SAT.
Gradient-based trajectory optimization with B-spline curves is widely used for unmanned aerial vehicles (UAVs) due to its fast convergence and continuous trajectory generation. However, the application of B-spline curves for path-velocity coupled trajectory planning in autonomous vehicles (AVs) has been highly limited because it is challenging to reduce the over-approximation of the vehicle shape and to create a collision-free trajectory using B-spline curves while satisfying kinodynamic constraints. To address these challenges, this paper proposes novel disc-type swept volume (SV), incremental path flattening (IPF), and kinodynamic feasibility penalty methods. The disc-type SV estimation method is a new technique to reduce SV over-approximation and is used to find collision points for IPF. In IPF, the collision points are used to push the trajectory away from obstacles and to iteratively increase the curvature weight, thereby reducing SV and generating a collision-free trajectory. Additionally, to satisfy kinodynamic constraints for AVs using B-spline curves, we apply a clamped B-spline curvature penalty along with longitudinal and lateral velocity and acceleration penalties. Our experimental results demonstrate that our method outperforms state-of-the-art baselines in various simulated environments. We also conducted a real-world experiment using an AV, and our results validate the simulated tracking performance of the proposed approach.
Unsupervised Domain Adaptation (UDA) is crucial for reducing the need for extensive manual data annotation when training deep networks on point cloud data. A significant challenge of UDA lies in effectively bridging the domain gap. To tackle this challenge, we propose \textbf{C}urvature \textbf{D}iversity-Driven \textbf{N}uclear-Norm Wasserstein \textbf{D}omain Alignment (CDND). Our approach first introduces a \textit{\textbf{Curv}ature Diversity-driven Deformation \textbf{Rec}onstruction (CurvRec)} task, which effectively mitigates the gap between the source and target domains by enabling the model to extract salient features from semantically rich regions of a given point cloud. We then propose \textit{\textbf{D}eformation-based \textbf{N}uclear-norm \textbf{W}asserstein \textbf{D}iscrepancy (D-NWD)}, which applies the Nuclear-norm Wasserstein Discrepancy to both \textit{deformed and original} data samples to align the source and target domains. Furthermore, we contribute a theoretical justification for the effectiveness of D-NWD in distribution alignment and demonstrate that it is \textit{generic} enough to be applied to \textbf{any} deformations. To validate our method, we conduct extensive experiments on two public domain adaptation datasets for point cloud classification and segmentation tasks. Empirical experiment results show that our CDND achieves state-of-the-art performance by a noticeable margin over existing approaches.
The Optimal Transport (OT) problem investigates a transport map that connects two distributions while minimizing a given cost function. Finding such a transport map has diverse applications in machine learning, such as generative modeling and image-to-image translation. In this paper, we introduce a scalable and simulation-free approach for solving the Entropic Unbalanced Optimal Transport (EUOT) problem. We derive the dynamical form of this EUOT problem, which is a generalization of the Schr\"odinger bridges (SB) problem. Based on this, we derive dual formulation and optimality conditions of the EUOT problem from the stochastic optimal control interpretation. By leveraging these properties, we propose a simulation-free algorithm to solve EUOT, called Simulation-free EUOT (SF-EUOT). While existing SB models require expensive simulation costs during training and evaluation, our model achieves simulation-free training and one-step generation by utilizing the reciprocal property. Our model demonstrates significantly improved scalability in generative modeling and image-to-image translation tasks compared to previous SB methods.
Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting an efficient algorithm for Markov Decision Processes (MDPs) where the state-action value function of any policy is linear in a given feature map. This challenging setting can model environments with infinite states and actions, strictly generalizes classic linear MDPs, and currently lacks a computationally efficient algorithm under online access to the MDP. Specifically, we introduce a new RL algorithm that efficiently finds a near-optimal policy in this setting, using a number of episodes and calls to a cost-sensitive classification (CSC) oracle that are both polynomial in the problem parameters. Notably, our CSC oracle can be efficiently implemented when the feature dimension is constant, representing a clear improvement over state-of-the-art methods, which require solving non-convex problems with horizon-many variables and can incur computational costs that are exponential in the horizon.
The Euler Characteristic Transform (ECT) is an efficiently-computable geometrical-topological invariant that characterizes the global shape of data. In this paper, we introduce the Local Euler Characteristic Transform ($\ell$-ECT), a novel extension of the ECT particularly designed to enhance expressivity and interpretability in graph representation learning. Unlike traditional Graph Neural Networks (GNNs), which may lose critical local details through aggregation, the $\ell$-ECT provides a lossless representation of local neighborhoods. This approach addresses key limitations in GNNs by preserving nuanced local structures while maintaining global interpretability. Moreover, we construct a rotation-invariant metric based on $\ell$-ECTs for spatial alignment of data spaces. Our method exhibits superior performance than standard GNNs on a variety of node classification tasks, particularly in graphs with high heterophily.
Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models. Indeed, a wide number of DLWP architectures -- based on various backbones, including U-Net, Transformer, Graph Neural Network (GNN), and Fourier Neural Operator (FNO) -- have demonstrated their potential at forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes and real-world global weather dynamics. In terms of accuracy, memory consumption, and runtime, our results illustrate various tradeoffs. For example, on synthetic data, we observe favorable performance of FNO; and on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short-to-mid-ranged forecasts. For long-ranged weather rollouts of up to 365 days, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. In addition, we observe that all of these model backbones "saturate," i.e., none of them exhibit so-called neural scaling, which highlights an important direction for future work on these and related models. The code is available at //github.com/amazon-science/dlwp-benchmark.
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions. State-of-the-art DRE methods estimate the density ratio using neural networks trained with loss functions derived from variational representations of $f$-divergence. However, existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback-Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, which provides a suitable variational representation of $f$-divergence. Subsequently, a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div), is derived. $\alpha$-Div is concise but offers stable and effective optimization for DRE. The boundedness of $\alpha$-divergence provides the potential for successful DRE with data exhibiting high KL-divergence. Our numerical experiments demonstrate the effectiveness in optimization using $\alpha$-Div. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of RMSE for DRE. This indicates that the accuracy of DRE is primarily determined by the amount of KL-divergence in the data and is less dependent on $\alpha$-divergence.
As a promising paradigm to collaboratively train models with decentralized data, Federated Learning (FL) can be exploited to fine-tune Large Language Models (LLMs). While LLMs correspond to huge size, the scale of the training data significantly increases, which leads to tremendous amounts of computation and communication costs. The training data is generally non-Independent and Identically Distributed (non-IID), which requires adaptive data processing within each device. Although Low Rank Adaptation (LoRA) can significantly reduce the scale of parameters to update in the fine-tuning process, it still takes unaffordable time to transfer the low-rank parameters of all the layers in LLMs. In this paper, we propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods, i.e., adaptive federated curriculum learning and efficient sparse parameter update. First, we propose a fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process. Second, we dynamically select the proper layers for global aggregation and sparse parameters for local update with LoRA so as to improve the efficiency of the FL fine-tuning process. Extensive experimental results based on 10 datasets demonstrate that FibecFed yields excellent performance (up to 45.35% in terms of accuracy) and superb fine-tuning speed (up to 98.61% faster) compared with 17 baseline approaches).
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.