Owing to resource limitations, computationally efficient systems have long been a critical requirement in autonomous vehicle design; sensor cost and size further constrain the development of self-driving cars. This paper presents an efficient framework for the operation of vision-based autonomous vehicles; a front-facing camera and a few inexpensive radars are the only sensors required for driving-environment perception. The proposed algorithm comprises a multi-task UNet (MTUNet) network for extracting image features and constrained iterative linear quadratic regulator (CILQR) modules for rapid lateral and longitudinal motion planning. The MTUNet is designed to simultaneously solve lane-line segmentation, ego-vehicle heading-angle regression, road-type classification, and traffic-object detection tasks at approximately 40 FPS for a 228 × 228 RGB input image. The CILQR modules then take the processed MTUNet outputs and radar data as input to produce driving commands for lateral and longitudinal vehicle guidance; both optimal control problems are solved within 1 ms. The proposed CILQR controllers are shown to be more efficient than sequential quadratic programming (SQP) methods and can collaborate with the MTUNet to drive a car autonomously in unseen simulation environments for lane-keeping and car-following maneuvers. Our experiments demonstrate that the proposed autonomous driving system is applicable to modern automobiles.
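For readers unfamiliar with the control side, the sketch below illustrates the backward pass at the heart of iLQR-type solvers such as CILQR, using a hypothetical linear longitudinal model and assumed cost weights; the paper's CILQR additionally folds state and input constraints into the problem (e.g., via barrier terms), which this minimal sketch omits.

```python
import numpy as np

# Minimal iLQR-style backward pass for longitudinal car-following
# (hypothetical double-integrator model; weights Q, R are assumptions).
dt, N = 0.05, 30                       # step size, horizon length
A = np.array([[1.0, dt], [0.0, 1.0]])  # state: [gap error, relative speed]
B = np.array([[0.0], [dt]])            # control: ego acceleration
Q = np.diag([1.0, 0.1])                # state cost weights (assumed)
R = np.array([[0.01]])                 # control cost weight (assumed)

def ilqr_backward_pass(xs, us):
    """One Riccati-like backward sweep; xs has N+1 states, us has N controls.
    Returns feedforward corrections k_t and feedback gains K_t."""
    Vx, Vxx = Q @ xs[-1], Q.copy()     # terminal cost-to-go derivatives
    ks, Ks = [], []
    for t in reversed(range(N)):
        Qx  = Q @ xs[t] + A.T @ Vx
        Qu  = R @ us[t] + B.T @ Vx
        Qxx = Q + A.T @ Vxx @ A
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        k = -np.linalg.solve(Quu, Qu)   # feedforward correction
        K = -np.linalg.solve(Quu, Qux)  # feedback gain
        Vx  = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
        Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        ks.append(k); Ks.append(K)
    return ks[::-1], Ks[::-1]

# A forward rollout then applies u_t = us[t] + k_t + K_t (x_t - xs[t])
# and the two passes alternate until convergence.
```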
Preoperative medical imaging is an essential part of surgical planning. The data from medical imaging devices, such as CT and MRI scanners, consist of stacks of 2D images in DICOM format, and advances in 3D data visualization provide further information by assembling these cross-sections into 3D volumetric datasets. Microsoft's HoloLens 2 (HL2), considered one of the leading Extended Reality (XR) headsets on the market, promises to enhance 3D visualization by providing users with an immersive experience. This paper introduces a prototype holographic XR DICOM viewer for the 3D visualization of DICOM image sets on the HL2 for surgical planning. We first developed a standalone graphical C++ engine using the native DirectX11 API and HLSL shaders. On top of that, the prototype applies the OpenXR API for potential deployment on a wide range of devices from vendors across the XR spectrum. With native access to the device, our prototype works around the hardware limitations of the HL2 for 3D volume rendering and interaction. Moreover, smartphones can act as input devices, providing another method of user interaction by connecting to our server. In summary, we present a holographic DICOM viewer for the HoloLens 2 and contribute (i) a prototype that renders DICOM image stacks in real time on the HL2, (ii) three types of user interaction in XR, and (iii) a preliminary qualitative evaluation of our prototype.
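As a rough illustration of the data-preparation step such a viewer needs, here is a minimal Python sketch (assuming the pydicom package) that assembles a DICOM slice stack into a 3D volume ready for upload as a 3D texture; the actual prototype performs this natively in C++/DirectX11.

```python
import numpy as np
import pydicom  # pip install pydicom
from pathlib import Path

def load_dicom_volume(folder):
    """Assemble a stack of 2D DICOM slices into a 3D volume array."""
    slices = [pydicom.dcmread(p) for p in Path(folder).glob("*.dcm")]
    # Sort slices along the scan axis using the z image position.
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    volume = np.stack([s.pixel_array for s in slices]).astype(np.float32)
    # Rescale to Hounsfield units when the tags are present (CT data).
    first = slices[0]
    slope = float(getattr(first, "RescaleSlope", 1.0))
    intercept = float(getattr(first, "RescaleIntercept", 0.0))
    return volume * slope + intercept
```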
Evaluating safety performance in a resource-efficient way is crucial for the development of autonomous systems. Simulation of parameterized scenarios is a popular testing strategy, but parameter sweeps can be prohibitively expensive. To address this, we propose HiddenGems: a sample-efficient method for discovering the boundary between compliant and non-compliant behavior via active learning. Given a parameterized scenario, one or more compliance metrics, and a simulation oracle, HiddenGems maps the compliant and non-compliant domains of the scenario. The methodology enables critical test-case identification, comparative analysis of different versions of the system under test, and verification of design objectives. We evaluate HiddenGems on a scenario with a jaywalker crossing in front of an autonomous vehicle and obtain compliance-boundary estimates for collision, lane-keeping, and acceleration metrics, individually and in combination, with 6 times fewer simulations than a parameter sweep. We also show how HiddenGems can be used to detect and rectify a failure mode in an unprotected turn with 86% fewer simulations.
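The abstract does not spell out HiddenGems' internals, but a generic sketch of the underlying idea, active learning of a compliance boundary with a probabilistic classifier and an ambiguity-based query rule, might look like the following; the oracle, parameter space, and budgets are all placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def compliance_oracle(params):
    """Stand-in for the simulation oracle: 1 = compliant, 0 = non-compliant."""
    return int(params.sum() < 1.0)  # hypothetical boundary

rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 1.0, size=(2000, 2))  # scenario parameter grid
X = candidates[rng.choice(len(candidates), 10, replace=False)]
y = np.array([compliance_oracle(x) for x in X])

for _ in range(40):  # far fewer simulations than a full parameter sweep
    gpc = GaussianProcessClassifier().fit(X, y)
    p = gpc.predict_proba(candidates)[:, 1]
    query = candidates[np.argmin(np.abs(p - 0.5))]  # most ambiguous point
    X = np.vstack([X, query])
    y = np.append(y, compliance_oracle(query))
# The fitted classifier's 0.5 level set approximates the compliance boundary.
```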
This study presents a novel approach to modeling and simulating human-vehicle interactions to examine the effects of automated driving systems (ADS) on driving performance and driver control workload. Existing driver-ADS interaction studies have relied on simulated or real-world experiments with human drivers, which provide only limited objective evaluation of the dynamic interactions and the control workloads imposed on the driver. Our approach leverages an integrated human model-based active driving system (HuMADS) to simulate the dynamic interaction between the driver model and a haptic-based ADS during a vehicle overtaking task. Two driver arm-steering models were developed, for tense and relaxed driver conditions, and validated against experimental data. We conducted a simulation study to evaluate the effects of three haptic shared-control conditions (based on the presence and type of control conflict) on overtaking task performance and driver workload. We found that the No Conflict shared-control scenarios resulted in improved driving performance and reduced control workload, whereas the Conflict scenarios resulted in unsafe maneuvers and increased workload. These findings, which are consistent with experimental studies, demonstrate the potential of our approach to improve future ADS designs for safer driver assistance systems.
Vehicular edge computing (VEC) has become a promising paradigm for the development of emerging intelligent transportation systems. Nevertheless, limited resources and massive transmission demands pose great challenges to implementing vehicular applications with stringent deadline requirements. This work presents a non-orthogonal multiple access (NOMA) based architecture for VEC, in which heterogeneous edge nodes cooperate on real-time task processing. We derive a vehicle-to-infrastructure (V2I) transmission model that considers both intra-edge and inter-edge interference, and formulate a cooperative resource optimization (CRO) problem that jointly optimizes task offloading and resource allocation to maximize the service ratio. We then decompose the CRO problem into two subproblems: task offloading and resource allocation. The task offloading subproblem is modeled as an exact potential game (EPG), and a multi-agent distributed distributional deep deterministic policy gradient (MAD4PG) method is proposed to reach the Nash equilibrium. The resource allocation subproblem is divided into two independent convex optimization problems, which are solved optimally using a gradient-based iterative method and the Karush-Kuhn-Tucker (KKT) conditions. Finally, we build a simulation model based on real-world vehicle trajectories and present a comprehensive performance evaluation, which demonstrates the superiority of the proposed solutions.
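As a heavily simplified illustration of how KKT conditions yield closed-form allocations for convex subproblems of this kind, the classic water-filling problem below splits a fixed budget (e.g., bandwidth or computing capacity) across tasks to maximize a concave utility; it is a stand-in chosen for clarity, not the paper's CRO formulation.

```python
import numpy as np

def waterfill(gains, budget):
    """Maximize sum(log(1 + g_i * x_i)) s.t. sum(x_i) = budget, x_i >= 0.

    KKT stationarity gives g_i / (1 + g_i * x_i) = mu, hence
    x_i = 1/mu - 1/g_i on the active set; mu follows from the budget.
    """
    g = np.asarray(gains, dtype=float)
    order = np.argsort(-g)                 # try the largest gains first
    for k in range(len(g), 0, -1):         # shrink active set until feasible
        active = order[:k]
        mu = k / (budget + np.sum(1.0 / g[active]))
        x = 1.0 / mu - 1.0 / g[active]
        if np.all(x >= 0):
            out = np.zeros_like(g)
            out[active] = x
            return out

# e.g. waterfill([2.0, 1.0, 0.5], budget=1.0) favors the high-gain task.
```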
We propose an efficient way of solving optimal control problems for rigid-body systems on the basis of inverse dynamics and the multiple-shooting method. We treat all variables, including the state, acceleration, and control input torques, as optimization variables and treat the inverse dynamics as an equality constraint. We eliminate the update of the control input torques from the linear equation of Newton's method by applying condensing for inverse dynamics. The size of the resultant linear equation is the same as that of the multiple-shooting method based on forward dynamics except for the variables related to the passive joints and contacts. Compared with the conventional methods based on forward dynamics, the proposed method reduces the computational cost of the dynamics and their sensitivities by utilizing the recursive Newton-Euler algorithm (RNEA) and its partial derivatives. In addition, it increases the sparsity of the Hessian of the Karush-Kuhn-Tucker conditions, which reduces the computational cost, e.g., of Riccati recursion. Numerical experiments show that the proposed method outperforms state-of-the-art implementations of differential dynamic programming based on forward dynamics in terms of computational time and numerical robustness.
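A minimal sketch of the core idea, keeping torques as decision variables and imposing inverse dynamics as an equality constraint rather than integrating forward dynamics, assuming the Pinocchio Python bindings for RNEA and its analytic derivatives:

```python
import numpy as np
import pinocchio as pin  # pip install pin

# Sample manipulator model; in the paper's setting, q, v, a, and tau at
# every shooting node are all optimization variables.
model = pin.buildSampleModelManipulator()
data = model.createData()

q = pin.neutral(model)
v = np.zeros(model.nv)
a = np.zeros(model.nv)
tau = np.zeros(model.nv)

# Equality-constraint residual: inverse dynamics must reproduce tau.
residual = pin.rnea(model, data, q, v, a) - tau

# Analytic sensitivities via RNEA derivatives, which are cheaper than
# differentiating forward dynamics (d residual / d tau is just -I).
dtau_dq, dtau_dv, dtau_da = pin.computeRNEADerivatives(model, data, q, v, a)
```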
This chapter discusses the intricacies of perception for autonomous intelligent cyberdefense agents (AICA). It addresses the complexity of perception and illuminates how perception shapes and influences the decision-making process. It then explores the necessary considerations when crafting the world representation and discusses the power and bandwidth constraints of perception, as well as the underlying issues of AICA's trust in perception. On these foundations, it provides the reader with a guide to developing perception models for AICA, discussing the trade-offs of each approximation of the objective state. The guide is written in the context of the CYST cybersecurity simulation engine, which aims to closely model cybersecurity interactions and can be used as a basis for developing AICA. Because CYST is freely available, readers are welcome to implement and evaluate the proposed methods for themselves.
Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understanding what impacts human perception of quality. In this work, we focus on the degree of articulation and run a series of experiments to study how articulation strength impacts human perception of lip motion accompanying speech. Specifically, we study how increasingly under-articulated (dampened) and over-articulated (exaggerated) lip motion affects human perception of quality. We examine the impact of articulation strength on human perception both when considering lip motion alone, where viewers are presented with talking faces represented by landmarks, and in the context of embodied characters, where viewers are presented with photo-realistic videos. Our results show that viewers consistently prefer over-articulated lip motion to under-articulated lip motion, and that this preference generalizes across different speakers and embodiments.
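One common way to realize dampened and exaggerated conditions is to scale each frame's displacement from a mean lip pose; the sketch below illustrates this, though the paper's exact stimulus manipulation may differ.

```python
import numpy as np

def scale_articulation(lip_landmarks, alpha):
    """Dampen (alpha < 1) or exaggerate (alpha > 1) lip articulation.

    lip_landmarks: array of shape (frames, points, 2). Each frame's
    displacement from the sequence-mean lip pose is scaled by alpha,
    leaving the mean pose itself unchanged.
    """
    mean_pose = lip_landmarks.mean(axis=0, keepdims=True)
    return mean_pose + alpha * (lip_landmarks - mean_pose)

# e.g. scale_articulation(seq, 0.5) dampens; scale_articulation(seq, 1.5)
# exaggerates the same sequence.
```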
Motion planning and control are crucial components of robotics applications. Here, spatio-temporal hard constraints such as system dynamics and safety boundaries (e.g., obstacles in automated driving) restrict the robot's motions. Direct methods from optimal control solve a constrained optimization problem. However, in many applications, finding a proper cost function is inherently difficult because of the weighting of partially conflicting objectives. On the other hand, Imitation Learning (IL) methods such as Behavior Cloning (BC) provide an intuitive framework for learning decision-making from offline demonstrations and constitute a promising avenue for planning and control in complex robot applications. Prior work has primarily relied on soft-constraint approaches, which add auxiliary loss terms describing the constraints; however, catastrophic safety-critical failures can still occur in out-of-distribution (OOD) scenarios. This work integrates the flexibility of IL with the hard-constraint handling of optimal control. Our approach constitutes a general framework for constrained robotic motion planning and control using offline IL. Hard constraints are integrated into the learning problem in a differentiable manner, via explicit completion and gradient-based correction. Simulated experiments on mobile robot navigation and automated driving provide evidence for the performance of the proposed method.
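A minimal sketch of the gradient-based correction component (the explicit completion step is omitted), assuming PyTorch and a hypothetical inequality-constraint function g(a) <= 0; because every step is differentiable, training gradients can flow through the correction.

```python
import torch

def correct(actions, constraint_fn, steps=20, lr=0.1):
    """Nudge predicted actions by gradient descent on the constraint
    violation until g(a) <= 0 holds (or the step budget runs out)."""
    a = actions
    for _ in range(steps):
        violation = torch.relu(constraint_fn(a)).sum()  # total violation
        if violation.item() == 0.0:
            break
        grad, = torch.autograd.grad(violation, a, create_graph=True)
        a = a - lr * grad  # differentiable correction step
    return a

# Hypothetical usage: keep steering commands within +/- 0.5 rad.
actions = torch.tensor([0.9, -0.2], requires_grad=True)
g = lambda a: a.abs() - 0.5
safe_actions = correct(actions, g)
```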
A good estimate of action costs is key in task planning for human-robot collaboration. The duration of an action depends on the agents' capabilities and on the correlation between actions performed simultaneously by the human and the robot. This paper proposes an approach to learning action costs and the coupling between actions executed concurrently by humans and robots. We leverage information from past executions to learn the average duration of each action and a synergy coefficient representing the effect of an action performed by the human on the duration of the action performed by the robot (and vice versa). We implement the proposed method in a simulated scenario where both agents can access the same area simultaneously. Safety measures require the robot to slow down when the human is close, resulting in poor synergy between tasks operating in the same area. We show that our approach can learn such negative couplings, so that a task planner can leverage this information to find better plans.
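A minimal sketch of the estimation idea, with hypothetical names and a simple running-average estimator rather than the paper's exact method: the synergy coefficient is the robot's observed duration during concurrent execution divided by its nominal solo duration, so values above 1 flag pairings where the human slows the robot down (e.g., safety slowdowns in a shared area).

```python
from collections import defaultdict

class SynergyModel:
    """Running estimates of action durations and human-robot synergy."""

    def __init__(self):
        self.solo = defaultdict(list)   # action -> solo durations
        self.pairs = defaultdict(list)  # (human, robot) -> robot durations

    def record_solo(self, action, duration):
        self.solo[action].append(duration)

    def record_concurrent(self, human_action, robot_action, robot_duration):
        self.pairs[(human_action, robot_action)].append(robot_duration)

    def duration(self, action):
        d = self.solo[action]
        return sum(d) / len(d)

    def synergy(self, human_action, robot_action):
        # > 1: the human action slows the robot down (bad coupling).
        d = self.pairs[(human_action, robot_action)]
        return (sum(d) / len(d)) / self.duration(robot_action)
```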
Today, an increasing number of Adaptive Deep Neural Networks (AdNNs) are deployed on resource-constrained embedded devices. We observe that, as in traditional software, redundant computation exists in AdNNs, resulting in considerable performance degradation. This degradation is input-dependent, and we refer to its causes as input-dependent performance bottlenecks (IDPBs). To ensure that an AdNN satisfies the performance requirements of resource-constrained applications, it is essential to conduct performance testing to detect IDPBs. Existing neural network testing methods are primarily concerned with correctness testing and do not address performance testing. To fill this gap, we propose DeepPerform, a scalable approach for generating test samples that detect IDPBs in AdNNs. We first demonstrate how the generation of performance test samples can be formulated as an optimization problem. We then show how DeepPerform solves this optimization problem efficiently by learning to estimate the distribution of AdNNs' computational consumption. We evaluate DeepPerform on three widely used datasets and five popular AdNN models. The results show that DeepPerform generates test samples that cause more severe performance degradation (increasing FLOPs by up to 552%). Furthermore, DeepPerform is substantially more efficient than baseline methods at generating test inputs (runtime overhead: only 6-10 milliseconds).
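As a simplified illustration of what an IDPB-triggering search might look like for an early-exit AdNN, the sketch below perturbs a single input to suppress early exits and thereby inflate computation; it assumes PyTorch and a hypothetical exit-confidence function, whereas DeepPerform itself learns a generator over the computational-consumption distribution instead of running per-sample optimization.

```python
import torch

def performance_test_sample(x0, exit_confidence, steps=50, lr=0.01, eps=0.03):
    """Search for an IDPB-triggering input within an L-inf ball around x0.

    exit_confidence(x) is assumed to return the network's (differentiable)
    early-exit confidence; minimizing it delays exits, so the adaptive
    network executes more layers and burns more FLOPs on the result.
    """
    delta = torch.zeros_like(x0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        conf = exit_confidence(x0 + delta)  # higher conf => earlier exit
        loss = conf.sum()                   # minimize to suppress early exits
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)         # keep the perturbation small
    return (x0 + delta).detach()
```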