This paper presents a vision guidance and control method for autonomous robotic capture and stabilization of orbital objects in a time-critical manner. The method takes into account various operational and physical constraints, including ensuring a smooth capture, handling line-of-sight (LOS) obstructions of the target, and staying within the acceleration, force, and torque limits of the robot. Our approach involves the development of an optimal control framework for an eye-to-hand visual servoing method, which integrates two sequential sub-maneuvers: a pre-capturing maneuver and a post-capturing maneuver, aimed at achieving the shortest possible capture time. Integrating both control strategies enables a seamless transition between them, allowing for real-time switching to the appropriate control system. Moreover, both controllers are adaptively tuned through vision feedback to account for the unknown dynamics of the target. The integrated estimation and control architecture also facilitates fault detection and recovery of the visual feedback in situations where the feedback is temporarily obstructed. The experimental results demonstrate the successful execution of pre- and post-capturing operations on a tumbling and drifting target, despite multiple operational constraints.
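As a rough illustration of the switching logic described in this abstract, the following Python sketch shows a generic supervisory step under assumed interfaces; the callables `vision`, `predictor`, `pre_capture_ctrl`, and `post_capture_ctrl` are placeholders, not the paper's actual controller or estimator.

```python
# Hedged sketch: a generic supervisory step, not the paper's controller.
# `vision()` is assumed to return (target_state or None, contact_flag).
def supervisory_step(vision, predictor, pre_capture_ctrl, post_capture_ctrl,
                     captured):
    target_state, contact = vision()
    if target_state is None:          # line of sight temporarily obstructed
        target_state = predictor()    # propagate the last target estimate
    captured = captured or contact    # latch once contact is detected
    ctrl = post_capture_ctrl if captured else pre_capture_ctrl
    command = ctrl(target_state)      # command sent to the manipulator
    return command, captured
```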
Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neural network, with an emphasis on studying its short-term burstiness. For the latter, we propose metrics that quantify traffic burstiness at different time scales. Our analysis reveals that distributed ML traffic exhibits a very high degree of burstiness on short time scales, exceeding a 60:1 peak-to-mean ratio on time intervals as long as 5 ms. We observe that the training software orchestrates transmissions in such a way that burst transmissions from different sources within the same application do not result in congestion and packet losses. An extrapolation of the measurement data to multiple applications underscores the challenges that distributed ML traffic poses for congestion and flow control algorithms.
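One way to make a peak-to-mean burstiness metric concrete is sketched below. This is an illustrative computation under assumed inputs (packet timestamps and sizes), not the paper's measurement code; the 5 ms bin width simply mirrors the interval mentioned in the abstract.

```python
# Illustrative sketch: peak-to-mean ratio of traffic volume over fixed-length
# time bins, one hedged way to quantify burstiness at a chosen time scale.
import numpy as np

def peak_to_mean_ratio(timestamps_s, sizes_bytes, bin_width_s=0.005):
    """Bin packet sizes into intervals of bin_width_s seconds and return the
    ratio of the busiest bin's volume to the average bin volume."""
    timestamps_s = np.asarray(timestamps_s, dtype=float)
    sizes_bytes = np.asarray(sizes_bytes, dtype=float)
    duration = timestamps_s.max() - timestamps_s.min()
    n_bins = max(1, int(np.ceil(duration / bin_width_s)))
    edges = timestamps_s.min() + bin_width_s * np.arange(n_bins + 1)
    volume_per_bin, _ = np.histogram(timestamps_s, bins=edges,
                                     weights=sizes_bytes)
    return volume_per_bin.max() / volume_per_bin.mean()

# Synthetic usage example: 1 ms bursts of MTU-sized packets every 100 ms
# produce a peak-to-mean ratio far above 1 at the 5 ms time scale.
rng = np.random.default_rng(0)
ts = np.concatenate([s + rng.uniform(0, 0.001, 200)
                     for s in np.arange(0.0, 1.0, 0.1)])
sz = np.full(ts.shape, 1500.0)
print(peak_to_mean_ratio(ts, sz, bin_width_s=0.005))
```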
Autonomous vehicles are advanced driving systems that are well known to be vulnerable to various adversarial attacks, compromising vehicle safety and posing a risk to other road users. Rather than actively training complex adversaries by interacting with the environment, there is a need to first intelligently find and reduce the search space to only those states where autonomous vehicles are found to be less confident. In this paper, we propose a black-box testing framework, ReMAV, that first uses offline trajectories to analyze the existing behavior of autonomous vehicles and determine appropriate thresholds for finding the probability of failure events. To this end, we introduce a three-step methodology which i) uses offline state-action pairs of any autonomous vehicle under test, ii) builds an abstract behavior representation using our designed reward modeling technique to analyze states with uncertain driving decisions, and iii) uses a disturbance model for minimal perturbation attacks where the driving decisions are less confident. Our reward modeling technique helps create a behavior representation that allows us to highlight regions of likely uncertain behavior even when the standard autonomous vehicle performs well. We perform our experiments in a high-fidelity urban driving environment using three different driving scenarios containing single- and multi-agent interactions. Our experiments show increases of 35%, 23%, 48%, and 50% in the occurrences of vehicle collision, road object collision, pedestrian collision, and offroad steering events, respectively, for the autonomous vehicle under test, demonstrating a significant increase in failure events. We compare ReMAV with two baselines and show that ReMAV is significantly more effective at generating failure events across all evaluation metrics.
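The three-step idea can be sketched roughly as follows; the function names, the quantile threshold, and the uniform disturbance are illustrative stand-ins, not ReMAV's actual reward model or disturbance model.

```python
# Hedged sketch of the general workflow: score offline state-action pairs with
# a reward model, flag low-reward ("uncertain") states, and perturb only those.
import numpy as np

def flag_uncertain_states(states, actions, reward_model, quantile=0.1):
    """Return indices of offline states whose modeled reward falls in the
    lowest `quantile`, i.e. where driving decisions look least confident."""
    rewards = np.array([reward_model(s, a) for s, a in zip(states, actions)])
    threshold = np.quantile(rewards, quantile)
    return np.flatnonzero(rewards <= threshold)

def minimally_perturb(state, epsilon=0.05, rng=None):
    """Apply a small bounded disturbance to a state vector (an illustrative
    stand-in for a domain-specific disturbance model)."""
    rng = np.random.default_rng() if rng is None else rng
    return state + rng.uniform(-epsilon, epsilon, size=state.shape)

# Usage idea: replay only the perturbed versions of the flagged states in the
# driving simulator and count the resulting failure events.
# uncertain_idx = flag_uncertain_states(states, actions, reward_model)
# perturbed = [minimally_perturb(states[i]) for i in uncertain_idx]
```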
We present a tool for exploring the design space of shaders using an interactive evolutionary algorithm integrated with the Unity editor, a well-known commercial tool for video game development. Our framework leverages the underlying graph-based representation of recent shader editors and interactive evolution to allow designers to explore several visual options starting from an existing shader. Our framework encodes the graph representation of a current shader as a chromosome used to seed the evolution of a shader population. It applies graph-based recombination and mutation with a set of heuristics to create feasible shaders. The framework is an extension of the Unity editor; thus, designers with little knowledge of evolutionary computation (and shader programming) can interact with the underlying evolutionary engine using the same visual interface used for working on game scenes.
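A minimal sketch of the chromosome encoding and a point mutation is given below; the dictionary layout, node names, and Gaussian perturbation are assumptions for illustration and do not reflect the tool's Unity-side implementation.

```python
# Illustrative sketch: a shader graph encoded as a chromosome of nodes and
# edges, with a point mutation that perturbs numeric node parameters while
# leaving the graph structure (and thus feasibility) untouched.
import copy
import random

def mutate_shader_graph(chromosome, rate=0.2, rng=None):
    """chromosome: {'nodes': {node_id: {'op': str, 'params': {name: float}}},
                    'edges': [(src_id, dst_id), ...]}
    Returns a mutated copy; only numeric parameters are perturbed here."""
    rng = rng or random.Random()
    child = copy.deepcopy(chromosome)
    for node in child['nodes'].values():
        for name, value in node['params'].items():
            if rng.random() < rate:
                node['params'][name] = value + rng.gauss(0.0, 0.1)
    return child

# Toy parent chromosome for a small "tint" sub-graph.
parent = {
    'nodes': {
        'color': {'op': 'Color',    'params': {'r': 0.8, 'g': 0.2, 'b': 0.1}},
        'mult':  {'op': 'Multiply', 'params': {'factor': 1.0}},
    },
    'edges': [('color', 'mult')],
}
child = mutate_shader_graph(parent, rate=0.5, rng=random.Random(42))
```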
This paper presents a method for achieving high-speed running of a quadruped robot by considering the actuator torque-speed operating region in reinforcement learning. The physical properties and constraints of the actuator are included in the training process to reduce state transitions that are infeasible in the real world due to motor torque-speed limitations. The gait reward is designed to distribute motor torque evenly across all legs, contributing to more balanced power usage and mitigating performance bottlenecks due to single-motor saturation. Additionally, we designed a lightweight foot to enhance the robot's agility. We observed that applying the motor operating region as a constraint helps the policy network avoid infeasible areas during sampling. With the trained policy, KAIST Hound, a 45 kg quadruped robot, can run up to 6.5 m/s, which is the fastest speed among electric motor-based quadruped robots.
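To illustrate how a torque-speed operating region and a torque-balance term might enter training, here is a hedged sketch; the trapezoidal envelope, its parameters, and the reward scale are made-up values for illustration, not those of KAIST Hound's actuators or reward function.

```python
# Hedged sketch: clip commanded torque to a speed-dependent operating region so
# the policy never relies on state transitions the real actuator cannot
# produce, plus a simple reward term that discourages uneven torque across legs.
import numpy as np

def feasible_torque(tau_cmd, motor_speed, tau_max=30.0, speed_knee=20.0,
                    speed_max=40.0):
    """Torque limit is flat up to `speed_knee` rad/s, then derates linearly to
    zero at `speed_max` (a simple trapezoidal torque-speed envelope)."""
    speed = np.abs(motor_speed)
    derate = np.clip((speed_max - speed) / (speed_max - speed_knee), 0.0, 1.0)
    limit = np.where(speed <= speed_knee, tau_max, tau_max * derate)
    return np.clip(tau_cmd, -limit, limit)

def torque_balance_reward(leg_torques, scale=0.1):
    """Penalize uneven torque usage across legs to mitigate single-motor
    saturation; leg_torques has shape (num_legs, motors_per_leg)."""
    per_leg = np.linalg.norm(leg_torques, axis=1)
    return -scale * np.std(per_leg)
```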
Intelligent transportation systems play a crucial role in modern traffic management and optimization, greatly improving traffic efficiency and safety. With the rapid development of generative artificial intelligence (generative AI) technologies in the fields of image generation and natural language processing, generative AI has also come to play a crucial role in addressing key issues in intelligent transportation systems, such as data sparsity, the difficulty of observing abnormal scenarios, and the modeling of data uncertainty. In this review, we systematically investigate the relevant literature on generative AI techniques for addressing key issues in different types of tasks in intelligent transportation systems. First, we introduce the principles of different generative AI techniques and their potential applications. Then, we classify tasks in intelligent transportation systems into four types: traffic perception, traffic prediction, traffic simulation, and traffic decision-making. We systematically illustrate how generative AI techniques address key issues in these four types of tasks. Finally, we summarize the challenges of applying generative AI to intelligent transportation systems and discuss future research directions for different application scenarios.
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate on a selection of realistic problems that these approaches yield more robust models. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting during adaptation.
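For reference, distributionally robust optimization is usually posed as a min-max problem over a set of plausible data distributions. The generic form below (with a divergence-ball uncertainty set of radius $\rho$ around the training distribution $p$) is standard background, not the specific parametric reformulation developed in the thesis.

```latex
\min_{\theta} \; \sup_{q \in \mathcal{Q}} \; \mathbb{E}_{x \sim q}\!\left[\ell(x; \theta)\right],
\qquad
\mathcal{Q} = \{\, q : D(q \,\|\, p) \le \rho \,\},
```

where $\ell$ is the per-example loss and $D$ is a chosen divergence; the parametric reformulations mentioned above replace the inner supremum with a tractable, learnable surrogate.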
We address the task of automatically scoring the competency of candidates based on textual features, obtained from automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and to model the semantic-level interaction for each question-answer (QA) pair. However, most recent studies on AVI focus on how to better represent questions and answers, but ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences within and between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal sequence of question-answer pairs for the final prediction. Empirical results on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching based benchmark models. Ablation studies and experiments with 10 random seeds further show the effectiveness and stability of our model.
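A heavily simplified PyTorch sketch of this kind of pipeline is shown below; the module sizes, the single attention layer, and the mean pooling are assumptions for illustration and are much coarser than the HRGNN architecture described in the abstract.

```python
# Illustrative sketch: sentence embeddings are mixed with one graph-attention
# step over a QA sentence graph, pooled per QA pair, and fed to a GRU that
# produces a single competency score. Not the paper's actual model.
import torch
import torch.nn as nn

class ToyQAScorer(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.attn_score = nn.Linear(2 * dim, 1)   # edge attention logits
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def graph_attention(self, x, adj):
        # x: (num_sentences, dim); adj: (num_sentences, num_sentences) 0/1
        # adjacency mask, assumed to include self-loops.
        n = x.size(0)
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, -1),
                           x.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = self.attn_score(pairs).squeeze(-1)
        logits = logits.masked_fill(adj == 0, float('-inf'))
        weights = torch.softmax(logits, dim=-1)
        return weights @ x                        # neighborhood-mixed sentences

    def forward(self, qa_sentence_embs, qa_adjs):
        # qa_sentence_embs: list of (num_sentences_i, dim), one per QA pair,
        # ordered by interview time; qa_adjs: matching adjacency masks.
        pooled = [self.graph_attention(x, adj).mean(dim=0)
                  for x, adj in zip(qa_sentence_embs, qa_adjs)]
        seq = torch.stack(pooled).unsqueeze(0)    # (1, num_pairs, dim)
        _, h = self.gru(seq)
        return self.out(h[-1])                    # final competency score
```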
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
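One standard way to make such a decomposition concrete (a generic bound via Young's inequality, not necessarily the paper's exact statement) is to compare the sampled gradient computed with approximate embeddings, $\tilde{g}$, against the sampled gradient with exact embeddings, $\hat{g}$, and the full gradient $\nabla \mathcal{L}$:

```latex
\mathbb{E}\,\|\tilde{g} - \nabla\mathcal{L}\|^{2}
\;\le\;
2\,\underbrace{\mathbb{E}\,\|\tilde{g} - \hat{g}\|^{2}}_{\text{embedding approximation variance (forward)}}
\;+\;
2\,\underbrace{\mathbb{E}\,\|\hat{g} - \nabla\mathcal{L}\|^{2}}_{\text{stochastic gradient variance (backward)}}
```

where the second term equals the variance of the node-sampling estimator whenever $\hat{g}$ is unbiased; driving the overall error down therefore requires controlling both terms, which motivates the decoupled strategy above.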
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
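For concreteness, SimCLR's contrastive objective is the normalized temperature-scaled cross-entropy (NT-Xent) loss over two augmented views of each image; the sketch below is a common PyTorch re-implementation of that loss, not the authors' released code, and the temperature value is illustrative.

```python
# Minimal NT-Xent sketch: positives are the two views of the same image,
# negatives are all other examples in the (doubled) batch.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) projection-head outputs for two augmented views."""
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, dim)
    sim = z @ z.t() / temperature                             # cosine / temp
    mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                # drop self-pairs
    # Row i's positive sits `batch` rows away (view 1 <-> view 2).
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In this formulation, the large-batch benefit reported in the abstract corresponds directly to the number of negatives each row of `sim` is contrasted against.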