As multi-robot systems continue to advance and become integral to various applications, managing conflicts and ensuring secure access control are critical challenges that need to be addressed. Access control is essential in multi-robot systems to ensure secure and authorized interactions among robots, protect sensitive data, and prevent unauthorized access to resources. This paper presents a novel framework for customizable conflict resolution and attribute-based access control in multi-robot systems for ROS 2 leveraging the Hyperledger Fabric blockchain. We introduce an attribute-based access control (ABAC) Fabric-ROS 2 bridge to enable secure communication and control between users and robots. By defining conflict resolution policies based on task priorities, robot capabilities, and user-defined constraints, our framework offers a flexible way to resolve conflicts. Additionally, it incorporates attribute-based access control, granting access rights based on user and robot attributes. ABAC offers a modular approach to control access compared to existing access control approaches in ROS 2, such as SROS2. Through this framework, multi-robot systems can be managed efficiently, securely, and adaptably, ensuring controlled access to resources and managing conflicts. Our experimental evaluation shows that our framework marginally improves latency and throughput over exiting Fabric and ROS 2 integration solutions. At higher network load, it is the only solution to operate reliably without a diverging transaction commitment latency. We also demonstrate how conflicts arising from simultaneous control or a robot by two users are resolved in real-time and motion distortion is effectively eliminated.
Sharding is essential for improving blockchain scalability. Existing protocols overlook diverse adversarial attacks, limiting transaction throughput. This paper presents Reticulum, a groundbreaking sharding protocol addressing this issue, boosting blockchain scalability. Reticulum employs a two-phase approach, adapting transaction throughput based on runtime adversarial attacks. It comprises "control" and "process" shards in two layers. Process shards contain at least one trustworthy node, while control shards have a majority of trusted nodes. In the first phase, transactions are written to blocks and voted on by nodes in process shards. Unanimously accepted blocks are confirmed. In the second phase, blocks without unanimous acceptance are voted on by control shards. Blocks are accepted if the majority votes in favor, eliminating first-phase opponents and silent voters. Reticulum uses unanimous voting in the first phase, involving fewer nodes, enabling more parallel process shards. Control shards finalize decisions and resolve disputes. Experiments confirm Reticulum's innovative design, providing high transaction throughput and robustness against various network attacks, outperforming existing sharding protocols for blockchain networks.
Autonomy advances have enabled robots in diverse environments and close human interaction, necessitating controllers with formal safety guarantees. This paper introduces an experimental platform designed for the validation and demonstration of a novel class of Control Barrier Functions (CBFs) tailored for Unmanned Ground Vehicles (UGVs) to proactively prevent collisions with kinematic obstacles by integrating the concept of collision cones. While existing CBF formulations excel with static obstacles, extensions to torque/acceleration-controlled unicycle and bicycle models have seen limited success. Conventional CBF applications in nonholonomic UGV models have demonstrated control conservatism, particularly in scenarios where steering/thrust control was deemed infeasible. Drawing inspiration from collision cones in path planning, we present a pioneering CBF formulation ensuring theoretical safety guarantees for both unicycle and bicycle models. The core premise revolves around aligning the obstacle's velocity away from the vehicle, establishing a constraint to perpetually avoid vectors directed towards it. This control methodology is rigorously validated through simulations and experimental verification on the Copernicus mobile robot (Unicycle Model) and FOCAS-Car (Bicycle Model).
Large pre-trained models are essential in paralinguistic systems, demonstrating effectiveness in tasks like emotion recognition and stuttering detection. In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralinguistics Challenge, addressing the Requests and Emotion Share tasks. We explore audio-only and hybrid solutions leveraging audio and text modalities. Our empirical results consistently show the superiority of the hybrid approaches over the audio-only models. Moreover, we introduce a Bayesian layer as an alternative to the standard linear output layer. The multimodal fusion approach achieves an 85.4% UAR on HC-Requests and 60.2% on HC-Complaints. The ensemble model for the Emotion Share task yields the best rho value of .614. The Bayesian wav2vec2 approach, explored in this study, allows us to easily build ensembles, at the cost of fine-tuning only one model. Moreover, we can have usable confidence values instead of the usual overconfident posterior probabilities.
A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences). Various few-shot tool-usage strategies have been proposed. However, there is no systematic and fair comparison across different strategies, or between these strategies and strong baselines that do not leverage tools. We conduct an extensive empirical analysis, finding that (1) across various datasets, example difficulty levels, and models, strong no-tool baselines are competitive to tool-assisted strategies, implying that effectively using tools with in-context demonstrations is a difficult unsolved problem; (2) for knowledge-retrieval tasks, strategies that *refine* incorrect outputs with tools outperform strategies that retrieve relevant information *ahead of* or *during generation*; (3) tool-assisted strategies are expensive in the number of tokens they require to work -- incurring additional costs by orders of magnitude -- which does not translate into significant improvement in performance. Overall, our findings suggest that few-shot tool integration is still an open challenge, emphasizing the need for comprehensive evaluations of future strategies to accurately assess their *benefits* and *costs*.
Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.
Large Language models (LLMs) possess the capability to engage In-context Learning (ICL) by leveraging a few demonstrations pertaining to a new downstream task as conditions. However, this particular learning paradigm suffers from high instability stemming from substantial variances induced by factors such as the input distribution of selected examples, their ordering, and prompt formats. In this work, we demonstrate that even when all these factors are held constant, the random selection of examples still results in high variance. Consequently, we aim to explore the informative ability of data examples by quantifying the Information Gain (IG) obtained in prediction after observing a given example candidate. Then we propose to sample those with maximum IG. Additionally, we identify the presence of template bias, which can lead to unfair evaluations of IG during the sampling process. To mitigate this bias, we introduce Calibration Before Sampling strategy. The experimental results illustrate that our proposed method can yield an average relative improvement of 14.3% across six classification tasks using three LLMs.
Continuum robots have gained widespread popularity due to their inherent compliance and flexibility, particularly their adjustable levels of stiffness for various application scenarios. Despite efforts to dynamic modeling and control synthesis over the past decade, few studies have incorporated stiffness regulation into their feedback control design; however, this is one of the initial motivations to develop continuum robots. This paper addresses the crucial challenge of controlling both the position and stiffness of underactuated continuum robots actuated by antagonistic tendons. We begin by presenting a rigid-link dynamical model that can analyze the open-loop stiffening of tendon-driven continuum robots. Based on this model, we propose a novel passivity-based position-and-stiffness controller that adheres to the non-negative tension constraint. Comprehensive experiments on our continuum robot validate the theoretical results and demonstrate the efficacy and precision of this approach.
Despite recent progress in Reinforcement Learning for robotics applications, many tasks remain prohibitively difficult to solve because of the expensive interaction cost. Transfer learning helps reduce the training time in the target domain by transferring knowledge learned in a source domain. Sim2Real transfer helps transfer knowledge from a simulated robotic domain to a physical target domain. Knowledge transfer reduces the time required to train a task in the physical world, where the cost of interactions is high. However, most existing approaches assume exact correspondence in the task structure and the physical properties of the two domains. This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning. We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain. We observe successful behavior policy transfer with limited target task interactions and in cases where the source and target task are semantically dissimilar.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
Existing methods for vision-and-language learning typically require designing task-specific architectures and objectives for each task. For example, a multi-label answer classifier for visual question answering, a region scorer for referring expression comprehension, and a language decoder for image captioning, etc. To alleviate these hassles, in this work, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation, where our models learn to generate labels in text based on the visual and textual inputs. On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models. Moreover, our generative approach shows better generalization ability on answering questions that have rare answers. In addition, we show that our framework allows multi-task learning in a single architecture with a single set of parameters, which achieves similar performance to separately optimized single-task models. Our code will be publicly available at: //github.com/j-min/VL-T5