Flight-time failures of small Uncrewed Aerial Systems (sUAS) can have a severe impact on people or the environment. Therefore, sUAS applications must be thoroughly evaluated and tested to ensure their adherence to specified requirements, and safe behavior under real-world conditions, such as poor weather, wireless interference, and satellite failure. However, current simulation environments for autonomous vehicles, including sUAS, provide limited support for validating their behavior in diverse environmental contexts and moreover, lack a test harness to facilitate structured testing based on system-level requirements. We address these shortcomings by eliciting and specifying requirements for an sUAS testing and simulation platform, and developing and deploying it. The constructed platform, DroneReqValidator (DRV), allows sUAS developers to define the operating context, configure multi-sUAS mission requirements, specify safety properties, and deploy their own custom sUAS applications in a high-fidelity 3D environment. The DRV Monitoring system collects runtime data from sUAS and the environment, analyzes compliance with safety properties, and captures violations. We report on two case studies in which we used our platform prior to real-world sUAS deployments, in order to evaluate sUAS mission behavior in various environmental contexts. Furthermore, we conducted a study with developers and found that DRV simplifies the process of specifying requirements-driven test scenarios and analyzing acceptance test results
Large language models (LLMs) have significantly improved the ability to perform tasks in the field of code generation. However, there is still a gap between LLMs being capable coders and being top-tier software engineers. Based on the observation that top-level software engineers often ask clarifying questions to reduce ambiguity in both requirements and coding solutions, we argue that the same should be applied to LLMs for code generation tasks. By asking probing questions in various topics before generating the final code, the challenges of programming with LLMs, such as unclear intent specification, lack of computational thinking, and undesired code quality, may be alleviated. This, in turn, increases confidence in the generated code. In this work, we explore how to leverage better communication skills to achieve greater confidence in generated code. We propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code. We then ask clarifying questions to obtain responses from users for refining the code.
Gait recognition is a promising biometric technology for identification due to its non-invasiveness and long-distance. However, external variations such as clothing changes and viewpoint differences pose significant challenges to gait recognition. Silhouette-based methods preserve body shape but neglect internal structure information, while skeleton-based methods preserve structure information but omit appearance. To fully exploit the complementary nature of the two modalities, a novel triple branch gait recognition framework, TriGait, is proposed in this paper. It effectively integrates features from the skeleton and silhouette data in a hybrid fusion manner, including a two-stream network to extract static and motion features from appearance, a simple yet effective module named JSA-TC to capture dependencies between all joints, and a third branch for cross-modal learning by aligning and fusing low-level features of two modalities. Experimental results demonstrate the superiority and effectiveness of TriGait for gait recognition. The proposed method achieves a mean rank-1 accuracy of 96.0% over all conditions on CASIA-B dataset and 94.3% accuracy for CL, significantly outperforming all the state-of-the-art methods. The source code will be available at //github.com/feng-xueling/TriGait/.
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution -- avoiding performance bottlenecks and semantically inequivalent results. We present our ongoing work on an automated refactoring approach that assists developers in specifying whether and how their otherwise eagerly-executed imperative DL code could be reliably and efficiently executed as graphs at run-time in a semantics-preserving fashion. The approach, based on a novel tensor analysis specifically for imperative DL code, consists of refactoring preconditions for automatically determining when it is safe and potentially advantageous to migrate imperative DL code to graph execution and modifying decorator parameters or eagerly executing code already running as graphs. The approach is being implemented as a PyDev Eclipse IDE plug-in and uses the WALA Ariadne analysis framework. We discuss our ongoing work towards optimizing imperative DL code to its full potential.
Graph neural networks (GNNs) have been a hot spot of recent research and are widely utilized in diverse applications. However, with the use of huger data and deeper models, an urgent demand is unsurprisingly made to accelerate GNNs for more efficient execution. In this paper, we provide a comprehensive survey on acceleration methods for GNNs from an algorithmic perspective. We first present a new taxonomy to classify existing acceleration methods into five categories. Based on the classification, we systematically discuss these methods and highlight their correlations. Next, we provide comparisons from aspects of the efficiency and characteristics of these methods. Finally, we suggest some promising prospects for future research.
Recommender systems have been widely applied in different real-life scenarios to help us find useful information. Recently, Reinforcement Learning (RL) based recommender systems have become an emerging research topic. It often surpasses traditional recommendation models even most deep learning-based methods, owing to its interactive nature and autonomous learning ability. Nevertheless, there are various challenges of RL when applying in recommender systems. Toward this end, we firstly provide a thorough overview, comparisons, and summarization of RL approaches for five typical recommendation scenarios, following three main categories of RL: value-function, policy search, and Actor-Critic. Then, we systematically analyze the challenges and relevant solutions on the basis of existing literature. Finally, under discussion for open issues of RL and its limitations of recommendation, we highlight some potential research directions in this field.
The content based image retrieval aims to find the similar images from a large scale dataset against a query image. Generally, the similarity between the representative features of the query image and dataset images is used to rank the images for retrieval. In early days, various hand designed feature descriptors have been investigated based on the visual cues such as color, texture, shape, etc. that represent the images. However, the deep learning has emerged as a dominating alternative of hand-designed feature engineering from a decade. It learns the features automatically from the data. This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval. The categorization of existing state-of-the-art methods from different perspectives is also performed for greater understanding of the progress. The taxonomy used in this survey covers different supervision, different networks, different descriptor type and different retrieval type. A performance analysis is also performed using the state-of-the-art methods. The insights are also presented for the benefit of the researchers to observe the progress and to make the best choices. The survey presented in this paper will help in further research progress in image retrieval using deep learning.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Convolutional networks (ConvNets) have achieved great successes in various challenging vision tasks. However, the performance of ConvNets would degrade when encountering the domain shift. The domain adaptation is more significant while challenging in the field of biomedical image analysis, where cross-modality data have largely different distributions. Given that annotating the medical data is especially expensive, the supervised transfer learning approaches are not quite optimal. In this paper, we propose an unsupervised domain adaptation framework with adversarial learning for cross-modality biomedical image segmentations. Specifically, our model is based on a dilated fully convolutional network for pixel-wise prediction. Moreover, we build a plug-and-play domain adaptation module (DAM) to map the target input to features which are aligned with source domain feature space. A domain critic module (DCM) is set up for discriminating the feature space of both domains. We optimize the DAM and DCM via an adversarial loss without using any target domain label. Our proposed method is validated by adapting a ConvNet trained with MRI images to unpaired CT data for cardiac structures segmentations, and achieved very promising results.
Chatbot has become an important solution to rapidly increasing customer care demands on social media in recent years. However, current work on chatbot for customer care ignores a key to impact user experience - tones. In this work, we create a novel tone-aware chatbot that generates toned responses to user requests on social media. We first conduct a formative research, in which the effects of tones are studied. Significant and various influences of different tones on user experience are uncovered in the study. With the knowledge of effects of tones, we design a deep learning based chatbot that takes tone information into account. We train our system on over 1.5 million real customer care conversations collected from Twitter. The evaluation reveals that our tone-aware chatbot generates as appropriate responses to user requests as human agents. More importantly, our chatbot is perceived to be even more empathetic than human agents.
Inspired by recent development of artificial satellite, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection.However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate to describe the remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image caption. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing caption. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at //github.com/2051/RSICD_optimal