Object manipulation has been extensively studied in the context of fixed-base and mobile manipulators. However, the overactuated locomotion modality employed by snake robots allows for a unique blend of object manipulation through locomotion, referred to as loco-manipulation. The following work presents an optimization approach to solving the loco-manipulation problem, based on non-impulsive implicit contact path planning, for our snake robot COBRA. We present the mathematical framework and report high-fidelity simulation results and experiments demonstrating the effectiveness of our approach.
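To illustrate the general flavor of contact-implicit path planning (the technique family the abstract refers to, not COBRA's actual formulation), the minimal sketch below optimizes a one-dimensional pushing trajectory in which the complementarity condition between the contact gap and the contact force is softened into penalty terms; all dynamics, weights, and variable names are illustrative assumptions.

```python
# Toy contact-implicit sketch: a point "robot" pushes a box along a line.
# Decision variables: robot position r_t, box position b_t, contact force f_t.
# Complementarity gap * force = 0 (with gap >= 0, force >= 0) is relaxed
# into penalties. Everything here is illustrative, not COBRA's model.
import numpy as np
from scipy.optimize import minimize

T, dt = 20, 0.1
goal = 1.0                 # desired final box position

def unpack(z):
    return z[:T], z[T:2 * T], z[2 * T:]   # robot pos, box pos, contact force

def cost(z):
    r, b, f = unpack(z)
    gap = b - r                            # distance from robot to box face
    dyn = np.diff(b) / dt - f[:-1]         # toy dynamics: box velocity = force
    return ((b[-1] - goal) ** 2                       # reach the goal
            + 1e-2 * np.sum(f ** 2)                   # control effort
            + 1e2 * np.sum(dyn ** 2)                  # dynamics residual
            + 1e2 * np.sum((gap * f) ** 2)            # complementarity penalty
            + 1e2 * np.sum(np.minimum(gap, 0) ** 2))  # no penetration: gap >= 0

z0 = np.concatenate([np.linspace(0.0, 1.0, T), np.full(T, 0.2), np.zeros(T)])
res = minimize(cost, z0, method="L-BFGS-B",
               bounds=[(None, None)] * (2 * T) + [(0, None)] * T)
r, b, f = unpack(res.x)
print("final box position:", b[-1])
```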
The relay channel, consisting of a source-destination pair along with a relay, is a fundamental component of cooperative communications. While the capacity of a general relay channel remains unknown, various relaying strategies, including compress-and-forward (CF), have been proposed. In CF, the relay forwards a quantized version of its received signal to the destination. Given the correlated signals at the relay and destination, distributed compression techniques, such as Wyner--Ziv coding, can be harnessed to utilize the relay-to-destination link more efficiently. Leveraging recent advances in neural network-based distributed compression, we revisit the relay channel problem and integrate a learned task-aware Wyner--Ziv compressor into a primitive relay channel with a finite-capacity out-of-band relay-to-destination link. The resulting neural CF scheme demonstrates that our compressor recovers binning of the quantized indices at the relay, mimicking the optimal asymptotic CF strategy, even though no structure exploiting knowledge of the source statistics was imposed on the design. The proposed neural CF scheme, employing finite-order modulation, operates close to the rate achievable in a primitive relay channel with a Gaussian codebook. We showcase the advantages of exploiting the correlated destination signal for relay compression through various neural CF architectures that involve end-to-end training of the compressor and demodulator components. Our learned task-oriented compressors provide the first proof-of-concept work toward interpretable and practical neural CF relaying schemes.
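To make the binning behavior concrete, here is a minimal numeric sketch of the classical quantize-then-bin Wyner--Ziv strategy that the learned compressor is observed to recover (this is the textbook scheme, not the paper's neural architecture, and all parameters are illustrative): the relay quantizes its observation to 4 bits but transmits only a 2-bit bin index, and the destination resolves the ambiguity using its correlated signal.

```python
# Toy Wyner-Ziv sketch: quantize-then-bin at the relay, side-information
# decoding at the destination. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
levels = np.linspace(-3, 3, 16)   # 16-level (4-bit) quantizer at the relay
n_bins = 4                        # relay sends only 2 bits: bin = index mod 4

x = rng.normal(size=10_000)               # signal observed at the relay
y = x + 0.3 * rng.normal(size=10_000)     # correlated destination observation

idx = np.abs(x[:, None] - levels).argmin(axis=1)   # nearest quantization level
bins = idx % n_bins                                # many-to-one binning

# Destination decoding: among the levels whose index falls in the received
# bin, pick the one closest to the side information y.
recon = np.empty_like(x)
for b in range(n_bins):
    mask = bins == b
    cand = levels[np.arange(len(levels)) % n_bins == b]
    recon[mask] = cand[np.abs(y[mask, None] - cand).argmin(axis=1)]

print("MSE, 2-bit bin + side info :", np.mean((x - recon) ** 2))
print("MSE, full 4-bit index      :", np.mean((x - levels[idx]) ** 2))
```

With mild observation noise, the binned 2-bit scheme approaches the distortion of the full 4-bit index, which is exactly the efficiency gain that motivates Wyner--Ziv coding over the relay-to-destination link.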
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present \moniker{}, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, \moniker{} robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of \moniker{} is a novel intersection contour loss that penalizes interpenetrations and encourages their rapid resolution. We integrate our intersection loss, together with a collision-avoiding repulsion objective, into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method's capabilities on a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that \moniker{} significantly improves collision handling for learned simulation and produces visually compelling results.
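As a rough illustration of the idea behind an intersection-penalizing loss (a toy analogue, not \moniker{}'s actual mesh-based contour loss), the sketch below models two cloth layers as height fields and penalizes both the interpenetration depth and a soft measure of the contour separating penetrating from penetration-free regions; all shapes and weights are assumptions.

```python
# Toy intersection-contour-style penalty on two cloth layers modeled as
# height fields. The real method operates on triangle meshes inside a GNN
# simulator; this only conveys the idea of penalizing penetration depth
# plus the length of the intersection contour.
import torch

torch.manual_seed(0)
h_inner = torch.zeros(32, 32)                       # inner layer height field
h_outer = (0.01 + 0.03 * torch.randn(32, 32)).requires_grad_(True)

def intersection_contour_loss(h_out, h_in, sharp=50.0):
    gap = h_out - h_in                      # > 0 means the layers are separated
    depth = torch.relu(-gap)                # interpenetration depth
    inside = torch.sigmoid(-sharp * gap)    # soft indicator of penetration
    # Soft "contour length": total variation of the penetration indicator.
    tv = ((inside[1:, :] - inside[:-1, :]).abs().sum()
          + (inside[:, 1:] - inside[:, :-1]).abs().sum())
    return depth.sum() + 0.1 * tv

opt = torch.optim.Adam([h_outer], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = intersection_contour_loss(h_outer, h_inner)
    loss.backward()
    opt.step()
print("remaining penetration:",
      torch.relu(h_inner - h_outer.detach()).sum().item())
```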
Influence maximization (IM) is a crucial optimization task in the analysis of real-world complex networks, such as social networks, disease propagation networks, and marketing networks. Publications on the IM problem to date focus mainly on graphs, which fail to capture the higher-order interactions present in the real world. Therefore, the use of hypergraphs for addressing the IM problem has been receiving increasing attention. However, identifying the most influential nodes in hypergraphs remains challenging, mainly because nodes and hyperedges are often strongly coupled and correlated. In this paper, to effectively identify the most influential nodes, we first propose a novel hypergraph independent cascade model that integrates the influences of both node and hyperedge failures. We then introduce a genetic algorithm (GA) that leverages hypergraph collective influence to identify the most influential nodes. In the GA-based method, the hypergraph collective influence is used to initialize the population, thereby enhancing the quality of the initial candidate solutions. The designed fitness function considers the joint influences of both nodes and hyperedges, ensuring that the set of nodes with the best influence on both nodes and hyperedges is evaluated accurately. Moreover, a new mutation operator that incorporates the collective influence and overlapping effects of nodes in hypergraphs is designed to breed high-quality offspring. In the experiments, we conduct simulations on both synthetic and real hypergraphs; the results demonstrate that the proposed method outperforms the compared methods.
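The following skeleton sketches how such a GA could be organized, under heavy simplifications: node degree stands in for the collective influence measure, the cascade is a toy node/hyperedge activation process, and all parameters are invented for illustration.

```python
# Skeleton GA for seed selection on a hypergraph (illustrative only).
# Hypergraph: a list of hyperedges, each a set of node IDs.
import random

random.seed(0)
N, K = 50, 5
edges = [set(random.sample(range(N), random.randint(2, 5))) for _ in range(40)]
deg = [sum(v in e for e in edges) for v in range(N)]  # stand-in for collective influence

def simulate(seeds, p_node=0.3, p_edge=0.3, rounds=3):
    """Toy cascade: active nodes may activate incident hyperedges, which in
    turn may activate their member nodes."""
    active, live = set(seeds), set()
    for _ in range(rounds):
        for i, e in enumerate(edges):
            if i not in live and e & active and random.random() < p_edge:
                live.add(i)
        for i in live:
            active |= {v for v in edges[i] if random.random() < p_node}
    return len(active) + 0.5 * len(live)   # joint node + hyperedge influence

def fitness(seeds, trials=20):
    return sum(simulate(seeds) for _ in range(trials)) / trials

def mutate(seeds):
    # Replace one seed with a node sampled proportionally to its influence.
    s = list(seeds)
    s.pop(random.randrange(K))
    pool = [v for v in range(N) if v not in s]
    s.append(random.choices(pool, weights=[deg[v] + 1 for v in pool])[0])
    return tuple(sorted(s))

# Influence-biased initialization, then simple (mu + lambda) evolution.
top = sorted(range(N), key=lambda v: -deg[v])[:K]
pop = [tuple(top)] + [tuple(sorted(random.sample(range(N), K))) for _ in range(19)]
for _ in range(30):
    pop += [mutate(p) for p in pop]
    pop = sorted(set(pop), key=fitness, reverse=True)[:20]
print("best seed set:", pop[0], "fitness:", round(fitness(pop[0]), 2))
```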
We define a graph-based rate optimization problem and consider its computation, which provides a unified approach to the computation of various theoretical limits, such as the (conditional) graph entropy, rate-distortion functions, and capacity-cost functions with two-sided information. Our contributions are twofold. On the theoretical side, we simplify the graph-based problem by constructing explicit graph contractions in some special cases. These efforts reduce the number of decision variables in the optimization problem. By specializing these results, we simplify the graph characterizations of rate-distortion and capacity-cost functions with two-sided information. On the computational side, we design an alternating minimization algorithm for the graph-based problem, which handles the inequality constraint via a flexible multiplier update strategy. Moreover, deflation techniques are introduced, so that the computing time can be greatly reduced. Theoretical analysis shows that the algorithm converges to an optimal solution. The accuracy and efficiency of the algorithm are illustrated by numerical experiments.
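As a simple analogue of this algorithmic pattern, the sketch below runs classical Blahut-Arimoto-style alternating minimization for a plain rate-distortion function, with the multiplier adjusted by bisection to satisfy the distortion constraint (the paper's graph-based setting with two-sided information, its multiplier update strategy, and its deflation techniques are more general than this toy):

```python
# Blahut-Arimoto alternating minimization for R(D) with Hamming distortion.
import numpy as np

p = np.array([0.5, 0.3, 0.2])          # source distribution
d = 1.0 - np.eye(3)                    # Hamming distortion matrix
D_target = 0.2                         # inequality constraint: E[d] <= 0.2

def ba(beta, iters=500):
    Q = np.full((3, 3), 1 / 3)         # Q[x, xhat]: test channel
    for _ in range(iters):
        q = p @ Q                      # step 1: optimal output marginal
        Q = q * np.exp(-beta * d)      # step 2: tilted channel update
        Q /= Q.sum(axis=1, keepdims=True)
    R = np.sum(p[:, None] * Q * np.log(Q / (p @ Q)))  # mutual info (nats)
    D = np.sum(p[:, None] * Q * d)                    # expected distortion
    return R, D

# Multiplier update: bisect beta until the distortion constraint is tight.
lo, hi = 0.0, 50.0
for _ in range(60):
    beta = (lo + hi) / 2
    R, D = ba(beta)
    lo, hi = (lo, beta) if D < D_target else (beta, hi)
print(f"R(D={D_target}) ~ {R:.4f} nats at achieved distortion {D:.4f}")
```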
Neural network representations of simple models, such as linear regression, are increasingly being studied to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.
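To convey the transformation-model flavor of such frameworks (an illustrative toy, not DRIFT's actual architecture or parameterization), the sketch below learns a monotone transformation h so that h(y) minus a linear predictor is standard normal, fitting everything by maximum likelihood:

```python
# Toy distributional regression via a learned monotone transformation.
import torch

torch.manual_seed(0)
n = 2000
x = torch.randn(n, 1)
y = torch.exp(0.5 * x[:, 0] + 0.3 * torch.randn(n))   # skewed outcome

# Monotone transformation h: piecewise-linear with positive slopes.
knots = torch.linspace(y.min().item() - 0.1, y.max().item(), 10)
theta = torch.zeros(10, requires_grad=True)   # unconstrained slope params
beta = torch.zeros(1, requires_grad=True)     # linear predictor coefficient

def h_and_logderiv(y):
    slopes = torch.nn.functional.softplus(theta)      # > 0 ensures monotonicity
    ramp = torch.clamp(y[:, None] - knots, min=0.0)   # ReLU basis functions
    h = ramp @ slopes - 3.0                           # crude centering shift
    active = (y[:, None] > knots).float()
    log_deriv = torch.log(active @ slopes + 1e-8)     # log h'(y)
    return h, log_deriv

# Maximum likelihood: h(y) - x @ beta should be standard normal.
opt = torch.optim.Adam([theta, beta], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    h, log_deriv = h_and_logderiv(y)
    z = h - x @ beta
    nll = 0.5 * (z ** 2).mean() - log_deriv.mean()    # Gaussian NLL + Jacobian
    nll.backward()
    opt.step()
print("estimated beta:", beta.item())
```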
The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data. However, FL suffers a significant performance drop and poor convergence when confronted with Non-Independent and Identically Distributed (Non-IID) data distributions among participating clients. While previous efforts, such as client drift mitigation and advanced server-side model fusion techniques, have shown some success in addressing this challenge, they often overlook the root cause of the performance drop: the absence, at each client, of data that accurately mirrors the global data distribution. In this paper, we introduce Gen-FedSD, a novel approach that harnesses the power of state-of-the-art text-to-image foundation models to bridge the significant Non-IID performance gaps in FL. In Gen-FedSD, each client constructs textual prompts for each class label and leverages an off-the-shelf state-of-the-art pre-trained Stable Diffusion model to synthesize high-quality data samples. The generated synthetic data is tailored to each client's unique local data gaps and distribution disparities, effectively making the final augmented local data IID. Through extensive experimentation, we demonstrate that Gen-FedSD achieves state-of-the-art performance and significant communication cost savings across various datasets and Non-IID settings.
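A minimal sketch of the per-class synthesis step might look as follows, using the Hugging Face diffusers library; the model ID, prompt template, and per-class counts are placeholder assumptions, and in the actual method each client's choice of classes and sample counts is driven by its local label distribution.

```python
# Sketch: per-class synthetic data generation with an off-the-shelf
# Stable Diffusion model via Hugging Face diffusers (requires a GPU).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Classes this client lacks, e.g. inferred from its local label histogram.
# These names and counts are placeholders.
missing = {"airplane": 40, "ship": 25}

synthetic = []
for label, count in missing.items():
    prompt = f"a photo of a {label}"      # simple class-conditional prompt
    for _ in range(count // 8):
        out = pipe(prompt, num_images_per_prompt=8, num_inference_steps=30)
        synthetic.extend((img, label) for img in out.images)
# The (image, label) pairs would then augment the client's local training set.
```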
Treewidth (tw) is an important parameter that, when bounded, yields tractability for many problems. For example, graph problems expressible in Monadic Second Order (MSO) logic and QUANTIFIED SAT or, more generally, QUANTIFIED CSP, are FPT parameterized by the tw of the input's (primal) graph plus the length of the MSO formula [Courcelle, Information & Computation 1990] and the quantifier rank [Chen, ECAI 2004], respectively. The algorithms from these (meta-)results have running times whose dependence on tw is a tower of exponents. A conditional lower bound by Fichte et al. [LICS 2020] shows that, for QUANTIFIED SAT, the height of this tower is equal to the number of quantifier alternations. Lower bounds showing that at least double-exponential factors in the running time are necessary are rare: there are very few (for tw and vertex cover vc parameterizations), and they are for problems that are complete for #NP, $\Sigma_2^p$, $\Pi_2^p$, or higher levels of the polynomial hierarchy. We show, for the first time, that it is not necessary to go higher up in the polynomial hierarchy to obtain such lower bounds. We design a novel, yet simple and versatile, technique based on Sperner families to obtain such lower bounds, and apply it to three problems: METRIC DIMENSION, STRONG METRIC DIMENSION, and GEODETIC SET. We prove that they do not admit $2^{2^{o(tw)}} \cdot n^{O(1)}$-time algorithms, even on bounded-diameter graphs, unless the ETH fails. For STRONG METRIC DIMENSION, the lower bound holds even for vc. We complement our lower bounds with matching upper bounds.
Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with ``in-context learning'' (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks ``in context'' -- without weight changes -- via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.
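For concreteness, blocked and interleaved curricula differ only in trial ordering; here is a minimal sketch of how the two orderings of few-shot examples could be assembled into an ICL prompt (the items and prompt template are made up):

```python
# Build blocked vs. interleaved few-shot prompts (items/template are made up).
a = [("blick", "category 1")] * 3        # trials from one task/category
b = [("dax", "category 2")] * 3          # trials from the other

blocked = a + b                                        # same-category trials adjacent
interleaved = [t for pair in zip(a, b) for t in pair]  # categories alternate

def make_prompt(order, query="blick"):
    shots = "\n".join(f"Input: {x} -> Label: {y}" for x, y in order)
    return f"{shots}\nInput: {query} -> Label:"

print(make_prompt(blocked))
print(make_prompt(interleaved))
```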
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Video instance segmentation (VIS) is the task of simultaneously classifying, segmenting, and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem. Given a video clip of multiple image frames as input, VisTR directly outputs, in order, the sequence of masks for each instance in the video. At its core is a new, effective instance sequence matching and segmentation strategy, which supervises and segments instances at the sequence level as a whole. VisTR frames instance segmentation and tracking from the same perspective of similarity learning, considerably simplifying the overall pipeline and differing significantly from existing approaches. Without bells and whistles, VisTR achieves the highest speed among all existing VIS models and the best result among single-model methods on the YouTube-VIS dataset. For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy. We hope that VisTR can motivate future research on more video understanding tasks.
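The sequence-level matching idea can be sketched as follows: the cost between a predicted instance query and a ground-truth instance is accumulated over all frames before a single Hungarian assignment per video (a simplified toy using only a soft mask-overlap cost; the full method's matching criterion has additional terms, and all shapes here are illustrative):

```python
# Toy sequence-level bipartite matching for video instance segmentation.
import numpy as np
from scipy.optimize import linear_sum_assignment

T, Q, G = 4, 5, 3           # frames, predicted instance queries, ground truths
H = W = 16
rng = np.random.default_rng(0)
pred = rng.random((Q, T, H, W))           # predicted soft masks per query, per frame
gt = rng.integers(0, 2, (G, T, H, W))     # binary ground-truth masks

def soft_iou_cost(p, g):
    inter = (p * g).sum()
    union = (p + g - p * g).sum()
    return 1.0 - inter / (union + 1e-8)

# Cost between a query and a ground-truth instance is accumulated over the
# WHOLE clip, so the assignment is made once per video rather than per frame.
cost = np.zeros((Q, G))
for q in range(Q):
    for g in range(G):
        cost[q, g] = sum(soft_iou_cost(pred[q, t], gt[g, t]) for t in range(T))

rows, cols = linear_sum_assignment(cost)   # Hungarian matching
for q, g in zip(rows, cols):
    print(f"query {q} supervises instance {g} (sequence cost {cost[q, g]:.2f})")
```

Matching once per clip, rather than per frame, is what lets tracking fall out of the assignment for free: a query keeps supervising the same instance across all frames.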