This paper proposes a comprehensive hierarchical control framework for autonomous decision-making in robotics and autonomous systems. In a typical hierarchical control architecture, high-level decision making is often characterised by discrete state and decision/control sets. However, a rational decision is usually affected not only by the discrete states of the autonomous system, but also by the underlying continuous dynamics and even the evolution of its operational environment. This paper proposes a holistic design process and framework for this class of challenging problems, from new modelling and design problem formulation to control design and stability analysis. It addresses the intricate interplay between the traditional continuous system dynamics used at the lower levels for control design and the discrete Markov decision processes (MDPs) that facilitate high-level decision making. We model the decision-making system in complex environments as a hybrid system consisting of a controlled MDP and autonomous (i.e. uncontrolled) continuous dynamics; the new formulation is therefore called a hybrid Markov decision process (HMDP). The design problem is formulated with a focus on ensuring both safety and optimality while accounting for the influence of the discrete and continuous state variables at different levels. Building on the model predictive control (MPC) concept, a decision-maker design scheme is developed for this hybrid decision-making model. By carefully designing the key ingredients of this scheme, it is shown that recursive feasibility and stability of the proposed autonomous decision-making scheme are guaranteed. The proposed framework is applied to develop an autonomous lane-changing system for intelligent vehicles.
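To make the modelling idea concrete, a minimal illustrative sketch of such a hybrid formulation is given below; the notation (discrete state $s_k$, autonomous continuous state $x_k$, decision $a_k$, horizon $N$) is our own and is not taken from the paper:
\[
s_{k+1} \sim P(\,\cdot \mid s_k, a_k, x_k), \qquad x_{k+1} = f(x_k),
\]
\[
\min_{a_0,\dots,a_{N-1}} \; \mathbb{E}\Bigl[\,\sum_{k=0}^{N-1} \ell(s_k, x_k, a_k) + V_N(s_N, x_N)\Bigr]
\quad \text{s.t.} \quad (s_k, x_k) \in \mathcal{S}_{\mathrm{safe}}, \; k = 0,\dots,N,
\]
where the controlled discrete transition kernel $P$ is influenced by the uncontrolled continuous dynamics $f$, and the finite-horizon problem is re-solved at every step in the receding-horizon (MPC) fashion described above.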
The ever-expanding scale of integrated circuits has brought about a significant rise in the design risks associated with radiation-resistant integrated circuit chips. Traditional single-particle experimental methods, with their iterative design approach, are increasingly ill-suited to the challenges posed by large-scale integrated circuits. In response, this article introduces a novel sensitivity-aware single-particle radiation effects simulation framework tailored for System-on-Chip platforms. Based on a support vector machine (SVM) algorithm, we implement fast identification and classification of sensitive circuit nodes. Additionally, the methodology automates soft error analysis across the entire software stack. The study includes practical experiments focusing on the RISC-V architecture, encompassing core components, buses, and memory systems. It culminates in the establishment of databases for Single Event Upsets (SEUs) and Single Event Transients (SETs), showcasing the practical efficacy of the proposed methodology in addressing radiation-induced challenges at the scale of contemporary integrated circuits. Experimental results show a speed-up of up to 12.78X while achieving 94.58% accuracy.
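A minimal sketch of the node-classification step is given below, assuming hypothetical per-node features (fan-out, node capacitance, logic depth) and synthetic labels; the abstract does not specify the paper's actual feature set or training data, so this is an illustration of the SVM step only, not the framework itself.

```python
# Illustrative sketch (not the paper's implementation): classify circuit nodes
# as radiation-sensitive or not with an SVM, using hypothetical per-node
# features [fan_out, capacitance, logic_depth] and synthetic labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # synthetic per-node features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # synthetic "sensitive" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```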
This work focuses on autonomous contingency planning for scientific missions by enabling rapid policy computation from any off-nominal point in the state space in the event of a delay or deviation from the nominal mission plan. Successful contingency planning involves managing risks and rewards, often probabilistically associated with actions, in stochastic scenarios. Markov Decision Processes (MDPs) are used to mathematically model decision-making in such scenarios. However, in the specific case of planetary rover traverse planning, the vast action space and long planning time horizon pose computational challenges. A bi-level MDP framework is proposed to improve computational tractability, while also aligning with existing mission planning practices and enhancing explainability and trustworthiness of AI-driven solutions. We discuss the conversion of a mission planning MDP into a bi-level MDP, and test the framework on RoverGridWorld, a modified GridWorld environment for rover mission planning. We demonstrate the computational tractability and near-optimal policies achievable with the bi-level MDP approach, highlighting the trade-offs between compute time and policy optimality as the problem's complexity grows. This work facilitates more efficient and flexible contingency planning in the context of scientific missions.
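The sketch below illustrates the bi-level idea on a toy problem (not RoverGridWorld): a small high-level MDP chooses which coarse region to visit next, and a separate low-level MDP plans cell-to-cell moves inside the chosen region, so each tabular solve is much smaller than the flat product MDP. All rewards and transitions here are synthetic assumptions.

```python
# Hedged sketch of a bi-level MDP decomposition (toy problem, synthetic data).
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration. P[s, a, s'] transition probs, R[s, a] rewards."""
    nS, nA = R.shape
    V = np.zeros(nS)
    while True:
        Q = R + gamma * P.reshape(nS * nA, nS).dot(V).reshape(nS, nA)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# High level: 3 coarse regions, action a = "go to region a" (deterministic).
P_hi = np.zeros((3, 3, 3))
for a in range(3):
    P_hi[:, a, a] = 1.0
R_hi = np.array([[0., 2., 1.],   # synthetic science reward per visited region
                 [0., 0., 1.],
                 [0., 2., 0.]])
V_hi, pi_hi = value_iteration(P_hi, R_hi)

# Low level: navigate a strip of 5 cells inside the chosen region (0=left, 1=right).
nS, nA = 5, 2
P_lo = np.zeros((nS, nA, nS))
for s in range(nS):
    P_lo[s, 0, max(s - 1, 0)] = 1.0
    P_lo[s, 1, min(s + 1, nS - 1)] = 1.0
R_lo = np.full((nS, nA), -0.1)   # per-step traverse cost
R_lo[nS - 2, 1] = 1.0            # reward for reaching the region's goal cell
V_lo, pi_lo = value_iteration(P_lo, R_lo)
print("high-level policy:", pi_hi, "low-level policy:", pi_lo)
```

In the actual framework the two levels would be coupled (the high-level reward depends on what the low level can achieve), but even this toy example shows why solving two small MDPs is far cheaper than solving one flat MDP over the full joint state-action space.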
This paper introduces a framework for solving alternating current optimal power flow (ACOPF) problems using graphics processing units (GPUs). While GPUs have demonstrated remarkable performance in various computing domains, their application in ACOPF has been limited due to challenges associated with porting sparse automatic differentiation (AD) and sparse linear solver routines to GPUs. We address these issues with two key strategies. First, we utilize a single-instruction, multiple-data abstraction of nonlinear programs. This approach enables the specification of model equations while preserving their parallelizable structure and, in turn, facilitates the parallel AD implementation. Second, we employ a condensed-space interior-point method (IPM) with an inequality relaxation. This technique involves condensing the Karush--Kuhn--Tucker (KKT) system into a positive definite system. This strategy offers the key advantage of being able to factorize the KKT matrix without numerical pivoting, which has hampered the parallelization of the IPM algorithm. By combining these strategies, we can perform the majority of operations on GPUs while keeping the data resident in device memory. Comprehensive numerical benchmark results showcase the advantage of our approach. Remarkably, our implementations -- MadNLP.jl and ExaModels.jl -- running on NVIDIA GPUs achieve an order of magnitude speedup compared with state-of-the-art tools running on contemporary CPUs.
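In broad strokes, and with simplified notation that does not reproduce the paper's exact relaxation or residual definitions, the condensation step eliminates the dual variables from the symmetric KKT system:
\[
\begin{bmatrix} W & J^\top \\ J & -\Sigma^{-1} \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
= -\begin{bmatrix} r_x \\ r_y \end{bmatrix}
\;\;\Longrightarrow\;\;
\bigl(W + J^\top \Sigma J\bigr)\,\Delta x = -\bigl(r_x + J^\top \Sigma\, r_y\bigr),
\]
where $W$ is the Hessian of the Lagrangian, $J$ the constraint Jacobian, and $\Sigma \succ 0$ a diagonal matrix built from the slacks and their multipliers. Once all constraints are relaxed to inequalities with slacks, the condensed matrix $W + J^\top \Sigma J$ is positive definite (after standard regularization), so it can be factorized with a pivoting-free sparse Cholesky routine on the GPU, which is the property the abstract highlights.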
This paper proposes a new distributed finite-time differentiator (DFD) for multi-agent systems (MAS) over directed graphs, which extends the differentiator algorithm from the centralized case to the distributed case using only relative/absolute position information. By carefully constructing a Lyapunov function, the finite-time stability of the closed-loop system under the DFD is proved. Inspired by the duality principle of control theory, a distributed continuous finite-time output consensus algorithm, extended from the DFD, is provided for a class of leader-follower MAS; it not only completely suppresses disturbances but also avoids chattering. Finally, several simulation examples are given to verify the effectiveness of the DFD.
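As background (this is the standard centralized first-order finite-time differentiator, not the paper's DFD), a Levant-type differentiator for a signal $f(t)$ with $|\ddot f(t)| \le L$ takes the form
\[
\dot z_0 = z_1 - \lambda_0 L^{1/2}\,|z_0 - f(t)|^{1/2}\operatorname{sign}\bigl(z_0 - f(t)\bigr), \qquad
\dot z_1 = -\lambda_1 L\,\operatorname{sign}\bigl(z_0 - f(t)\bigr),
\]
where $z_0 \to f(t)$ and $z_1 \to \dot f(t)$ in finite time for suitable gains $\lambda_0, \lambda_1$. Per the abstract, the DFD replaces the locally measured error with relative/absolute position information exchanged over the directed graph, and the continuous consensus extension is designed to avoid the chattering typically associated with such discontinuous terms.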
Recent technological advancements in artificial intelligence and computer vision have enabled gait analysis on portable devices such as cell phones. However, most state-of-the-art vision-based systems still impose numerous constraints on capturing a patient's video, such as using a static camera and maintaining a specific distance from it. While these constraints are manageable under professional observation, they pose challenges in home settings. Another issue with most vision-based systems is their output, typically a classification label and confidence value, whose reliability is often questioned by medical professionals. This paper addresses these challenges by presenting a novel gait-analysis system that is robust to camera movement and provides explanations for its output. The study utilizes a dataset comprising videos of subjects wearing two types of Knee Ankle Foot Orthosis (KAFO), namely "Locked Knee" and "Semi-flexion", for mobility, along with metadata and ground truth for explanations. The ground truth highlights the statistical significance of seven features, captured using motion capture systems, for differentiating between the two gaits. To address camera movement, the proposed system employs super-resolution and pose estimation during pre-processing. It then computes the seven features -- Stride Length, Step Length and Duration of Single Support of the orthotic and non-orthotic leg, Cadence, and Speed -- from the skeletal output of pose estimation. These features are used to train a multi-layer perceptron, whose output is explained by highlighting each feature's contribution to the classification. While most state-of-the-art systems struggle to process the videos or to train on the proposed dataset, our system achieves an average accuracy of 94%. The model's explainability is validated against the ground truth and can be considered reliable.
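A minimal sketch of this feature-then-classifier pipeline is given below; the keypoint format, the two example feature functions, and the synthetic training data are assumptions for illustration and do not reproduce the paper's pipeline.

```python
# Illustrative sketch (not the paper's system): derive two example gait features
# from 2-D ankle keypoints produced by a pose estimator, then fit a small MLP
# on a synthetic stand-in for the seven-feature matrix.
import numpy as np
from sklearn.neural_network import MLPClassifier

def step_lengths(left_ankle_x, right_ankle_x):
    """Per-frame horizontal ankle separation (proxy for step length)."""
    return np.abs(np.asarray(left_ankle_x) - np.asarray(right_ankle_x))

def cadence(step_events, duration_s):
    """Steps per minute given detected step events and clip duration."""
    return 60.0 * len(step_events) / duration_s

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 7))        # stand-in for the seven gait features per clip
y = rng.integers(0, 2, size=60)     # 0 = Locked Knee, 1 = Semi-flexion (synthetic)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```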
Despite progress in video-language modeling, the computational challenge of interpreting long-form videos in response to task-specific linguistic queries persists, largely due to the complexity of high-dimensional video data and the misalignment between language and visual cues over space and time. To tackle this issue, we introduce a novel approach called Language-guided Spatial-Temporal Prompt Learning (LSTP). This approach features two key components: a Temporal Prompt Sampler (TPS) with an optical flow prior that leverages temporal information to efficiently extract relevant video content, and a Spatial Prompt Solver (SPS) that adeptly captures the intricate spatial relationships between visual and textual elements. By harmonizing TPS and SPS with a cohesive training strategy, our framework significantly enhances computational efficiency, temporal understanding, and spatial-temporal alignment. Empirical evaluations across two challenging tasks -- video question answering and temporal question grounding in videos -- using a variety of video-language pre-training (VLP) models and large language models (LLMs) demonstrate the superior performance, speed, and versatility of our proposed LSTP paradigm.
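The sketch below illustrates only the intuition behind an optical-flow-prior frame sampler: rank frames by flow magnitude and keep the top-k as candidates for the language-conditioned stages. It is a hedged illustration assuming OpenCV is available, not LSTP's actual TPS, which also conditions on the query.

```python
# Hedged sketch of a motion-prior frame sampler (not LSTP's implementation).
import cv2
import numpy as np

def sample_frames_by_flow(frames, k=8):
    """frames: list of HxW uint8 grayscale images; returns indices of top-k frames."""
    scores = [0.0]  # no flow estimate for the first frame
    for prev, cur in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        scores.append(float(np.linalg.norm(flow, axis=-1).mean()))
    top = np.argsort(scores)[-k:]   # frames with the largest average motion
    return sorted(top.tolist())
```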
Cognitive decision-making processes are crucial aspects of human behavior, influencing various personal and professional domains. This research delves into the application of differential equations in analyzing decision-making accuracy by leveraging eye-tracking data within a virtual industrial town setting. The study unveils a systematic approach to transforming raw data into a differential equation, essential for deciphering the relationship between eye movements and decision-making processes. Mathematical relationship extraction and variable-parameter definition pave the way for deriving a differential equation that encapsulates the growth of fixations on characters. The key factors in this equation are the fixation rate $(\lambda)$ and the separation rate $(\mu)$, reflecting user interaction dynamics and their impact on the decision-making complexities tied to user engagement with virtual characters. For a comprehensive grasp of decision dynamics, solving this differential equation requires the initial fixation count, the fixation rate, and the separation rate. The formulation of the differential equations incorporates considerations such as engagement duration, character-player distance, relative speed, and character attributes, enabling the representation of fixation changes, speed dynamics, distance variations, and the effects of character attributes. This comprehensive analysis not only enhances our comprehension of decision-making processes but also provides a foundational framework for predictive modeling and data-driven insights for future research and applications in cognitive science and virtual reality environments.
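Since the abstract does not state the equation explicitly, a minimal sketch consistent with the quantities it lists (initial fixation count $F(0)$, fixation rate $\lambda$, separation rate $\mu$) is a simple gain-loss balance; this is an assumed form for illustration, not the paper's exact model:
\[
\frac{dF}{dt} = \lambda - \mu\,F(t), \qquad
F(t) = \frac{\lambda}{\mu} + \Bigl(F(0) - \frac{\lambda}{\mu}\Bigr)e^{-\mu t},
\]
so the fixation count grows toward the steady state $\lambda/\mu$ at a speed set by the separation rate; the paper's formulation additionally conditions these rates on engagement duration, character-player distance, relative speed, and character attributes.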
This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: ($i$) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; ($ii$) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and ($iii$) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies. In addition, for each category, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
The rapid development of deep learning has brought great progress to segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, the past years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the types of segmentation problems (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective centered on an important question: how to bridge the gap between weak supervision and dense prediction -- current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraints, cross-view consistency, and cross-image relations. Finally, we share our opinions about future research directions for label-efficient deep segmentation.
Deep neural network based recommendation systems have achieved great success as information filtering techniques in recent years. However, since model training from scratch requires sufficient data, deep learning-based recommendation methods still face the bottlenecks of insufficient data and computational inefficiency. Meta-learning, an emerging paradigm that learns to improve the learning efficiency and generalization ability of algorithms, has shown its strength in tackling the data sparsity issue. Recently, a growing number of studies on deep meta-learning based recommendation systems have emerged, aiming to improve performance in recommendation scenarios where available data is limited, e.g., user cold-start and item cold-start. This survey therefore provides a timely and comprehensive overview of current deep meta-learning based recommendation methods. Specifically, we propose a taxonomy that discusses existing methods according to recommendation scenarios, meta-learning techniques, and meta-knowledge representations, which together outline the design space for meta-learning based recommendation methods. For each recommendation scenario, we further discuss the technical details of how existing methods apply meta-learning to improve the generalization ability of recommendation models. Finally, we point out several limitations of current research and highlight promising directions for future work in this area.