To improve the development of responsible AI systems, developers increasingly use tools such as checklists or guideline cards to ensure fairness, transparency, and sustainability. However, these tools face two main challenges. First, they are static and not designed to keep pace with the latest responsible AI literature and international standards. Second, they tend to prioritize individual use over fostering collaboration among AI practitioners. To overcome these limitations, we propose a method for easily updating responsible AI guidelines by incorporating research papers and ISO standards, ensuring that the content remains relevant and up to date while emphasizing actionable guidelines that a wide range of AI practitioners can implement. We validated our method in a case study at a large tech company by designing and deploying a tool that recommends interactive and actionable guidelines, which a team of engineers, standardization experts, and a lawyer generated using our method. In a study involving AI developers and engineers, we assessed the usability and effectiveness of the tool and found that the guidelines were considered practical and actionable. The guidelines encouraged self-reflection and facilitated a better understanding of the ethical considerations of AI during the early stages of development, contributing substantially to the idea of "Responsible AI by Design" -- a design-first approach that considers responsible AI values throughout the development lifecycle and across business roles.
Mobile manipulators have been used for inspection, maintenance, and repair tasks for years, but key limitations remain. Stability concerns typically require mobile platforms to be large in order to handle far-reaching manipulators, or require the manipulators to have drastically reduced workspaces to fit onto smaller mobile platforms. We therefore propose a combination of two widely used robots: the Clearpath Jackal unmanned ground vehicle and the Kinova Gen3 six degree-of-freedom manipulator. The Jackal has a small footprint and works well in low-clearance indoor environments. Extensive testing of localization, navigation, and mapping using LiDAR sensors makes the Jackal a well-developed mobile platform suitable for mobile manipulation. The Gen3 has a long reach with reasonable power consumption for manipulation tasks. A wrist camera for RGB-D sensing and a customizable end-effector interface make the Gen3 suitable for a myriad of manipulation tasks. These features would typically result in an unstable platform; however, with a few minor hardware and software modifications, we have produced a stable, high-performance mobile manipulation platform with significant mobility, reach, sensing, and maneuverability for indoor inspection tasks, without degrading the component robots' individual capabilities. These assertions were validated on hardware via semi-autonomous navigation to waypoints in a busy indoor environment and high-precision self-alignment alongside planar structures for intervention tasks.
As automated web accessibility testing tools become enriched with new and improved tests, it can be impractical to leverage those advances. Each tool offers unique benefits, but effectively using multiple tools would require integrating them into a uniform testing and reporting scheme. Such integration is complex, because tools vary in what they try to detect, what they actually detect, and how they classify, describe, and report defects. Consequently, testers typically use only one tool. Testaro is a novel open-source NPM package that checks compliance with about 650 rules defined by an ensemble of 8 tools: alfa, Axe, Equal Access, HTML CodeSniffer, Nu Html Checker, QualWeb, Testaro, and WAVE. Attendees at the demonstration will, within 5 minutes, create jobs for Testaro, run them, and generate unified reports documenting more accessibility issues than any single tool can discover.
Scene transfer for vision-based mobile robotics applications is a highly relevant and challenging problem. The utility of a robot greatly depends on its ability to perform a task in the real world, outside of a well-controlled lab environment. Existing scene transfer end-to-end policy learning approaches often suffer from poor sample efficiency or limited generalization capabilities, making them unsuitable for mobile robotics applications. This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment. Control policies relying on the embedding are able to operate in unseen environments without the need for finetuning in the deployment environment. We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight. Extensive simulation and real-world experiments demonstrate that our approach successfully generalizes beyond the training domain and outperforms all baselines.
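To make the idea concrete, the following is a minimal sketch of a contrastive objective with multiple positive pairs per anchor, the general family of losses the abstract refers to; the temperature, the pooling over positives, and the absence of any adaptive pair weighting here are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative multi-positive-pair contrastive (InfoNCE-style) loss.
# NOT the authors' exact objective; names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def multi_pair_info_nce(anchors, positives, temperature=0.1):
    """anchors: (N, D) embeddings; positives: (N, P, D) embeddings of the same
    scene under P different appearances (hypothetical setup)."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    n, p, d = positives.shape

    # Similarity of every anchor to every positive of every sample: (N, N*P).
    sims = anchors @ positives.reshape(n * p, d).T / temperature

    # Boolean mask selecting the P positives that belong to each anchor.
    pos_mask = torch.zeros(n, n * p, dtype=torch.bool, device=anchors.device)
    for i in range(n):
        pos_mask[i, i * p:(i + 1) * p] = True

    log_prob = sims - torch.logsumexp(sims, dim=1, keepdim=True)
    # Average log-likelihood over each anchor's positives, then over the batch.
    return -(log_prob[pos_mask].reshape(n, p)).mean()
```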
Static analysis (SA) tools have gained popularity among developers for finding potential bugs, but their widespread adoption is hindered by the accompanying high false alarm rates (up to 90%). To address this challenge, previous studies proposed the concept of actionable warnings and applied machine learning methods to distinguish actionable warnings from false alarms. Despite these efforts, our preliminary study suggests that the current methods used to collect actionable warnings are rather shaky and unreliable, resulting in a large proportion of invalid actionable warnings. In this work, we mined 68,274 reversions from the Top-500 GitHub C repositories to create a substantial actionable warning dataset and assigned each warning a weak label indicating its likelihood of being a real bug. To automatically identify actionable warnings and recommend those with a high probability of being real bugs (AWHB), we propose a two-stage framework called ACWRecommender. In the first stage, our tool uses a pre-trained model, UniXcoder, to identify actionable warnings among the huge number of warnings reported by SA tools. In the second stage, it reranks valid actionable warnings to the top using weakly supervised learning. Experimental results showed that our tool outperformed several baselines for actionable warning detection (in terms of F1-score) and performed better for AWHB recommendation (in terms of nDCG and MRR). Additionally, in an in-the-wild evaluation we manually validated 24 warnings out of the 2,197 warnings reported on 10 randomly selected projects, and developers confirmed 22 of them as real bugs, demonstrating the practical value of our tool.
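A rough sketch of what such a two-stage pipeline could look like is shown below; the checkpoint name (microsoft/unixcoder-base), the mean pooling, and the two scoring heads are assumptions made for illustration and not the released ACWRecommender implementation.

```python
# Minimal sketch of a two-stage "classify, then rerank" warning pipeline.
# Model choice, pooling, and heads are illustrative assumptions only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
encoder = AutoModel.from_pretrained("microsoft/unixcoder-base")
clf_head = torch.nn.Linear(encoder.config.hidden_size, 2)   # actionable vs. false alarm
bug_head = torch.nn.Linear(encoder.config.hidden_size, 1)   # weakly supervised bug score

def embed(code_context: str) -> torch.Tensor:
    inputs = tokenizer(code_context, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, T, H)
    return hidden.mean(dim=1)                          # simple mean pooling

def recommend(warnings: list[dict], top_k: int = 10) -> list[dict]:
    # Stage 1: keep only warnings classified as actionable.
    actionable = []
    for w in warnings:
        emb = embed(w["code_context"])
        if clf_head(emb).softmax(-1)[0, 1] > 0.5:
            actionable.append((w, emb))
    # Stage 2: rerank actionable warnings by the predicted probability of being
    # a real bug (both heads would be trained on the weakly labeled dataset).
    scored = [(w, torch.sigmoid(bug_head(e)).item()) for w, e in actionable]
    return [w for w, _ in sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]]
```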
Small CNN-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models, mainly because of the significant discrepancy between Transformer-based large models and CNN-based small networks. In this paper, we develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) method based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion. Our method builds a bridge between Transformer-based models and CNNs by training a UNet-style student with sparse convolution, which can effectively mimic the visual representation inferred by the teacher over masked modeling. Our method is a simple yet effective learning paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained using advanced generative methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, H-GKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
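The following schematic sketch shows one way a masked generative feature-distillation objective of this kind can be wired up; the masking scheme, the projector, and the loss are assumptions, and the paper's sparse-convolution UNet student is stood in for by any CNN that maps images to a feature grid.

```python
# Schematic masked feature-distillation loss (teacher features as targets).
# Architecture and loss details are assumptions, not the H-GKD implementation.
import torch
import torch.nn.functional as F

def masked_distillation_loss(student, teacher, projector, images, mask_ratio=0.6, patch=32):
    """images: (B, 3, H, W). teacher(images) -> (B, C_t, h, w) features,
    student(masked images) -> (B, C_s, h, w); projector maps C_s -> C_t."""
    b, _, h, w = images.shape
    gh, gw = h // patch, w // patch

    # Random patch mask; masked patches are zeroed out in the student's input.
    mask = (torch.rand(b, 1, gh, gw, device=images.device) < mask_ratio).float()
    pixel_mask = F.interpolate(mask, size=(h, w), mode="nearest")
    student_in = images * (1.0 - pixel_mask)

    with torch.no_grad():
        target = teacher(images)              # teacher sees the full image
    pred = projector(student(student_in))     # student reconstructs teacher features

    # Supervise only the masked locations, resized to the feature resolution.
    feat_mask = F.interpolate(mask, size=target.shape[-2:], mode="nearest")
    loss = (F.mse_loss(pred, target, reduction="none") * feat_mask).sum()
    return loss / feat_mask.sum().clamp(min=1)
```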
Formats for representing and manipulating verification problems are extremely important for supporting the ecosystem of tools, developers, and practitioners. A good format can represent many different types of problems, has a strong toolchain for manipulating and translating problems, and can grow with the community. In the world of hardware verification, and specifically the Hardware Model Checking Competition (HWMCC), the Btor2 format has emerged as the dominant format. It is supported by Btor2Tools, verification tools, and Verilog design tools such as Yosys. In this paper, we present an alternative format and toolchain, called Btor2MLIR, based on the recent MLIR framework. The advantage of Btor2MLIR lies in reusing existing components from a mature compiler infrastructure, including parsers, text and binary formats, converters to a variety of intermediate representations, and the executable semantics of LLVM. We hope that the format and our tooling will enable rapid prototyping of verification and related tools for hardware.
Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. We interleave color embeddings obtained using the Mapper algorithm for topological soft clustering with the shape-based TOPS descriptor to obtain the TOPS2 descriptor. THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework and outperforms RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS Occluded dataset. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.
Ground segmentation, a basic task of unmanned intelligent perception, provides important support for the target detection task. Unstructured road scenes, typified by open-pit mines, have irregular boundary lines and uneven road surfaces, which lead to segmentation errors in current ground segmentation methods. To solve this problem, a ground segmentation method based on a point cloud map is proposed, consisting of three parts: region-of-interest extraction, point cloud registration, and background subtraction. First, boundary semantic associations are established to obtain regions of interest on unstructured roads. Second, semantic information is used to establish the location association between the point cloud map and the real-time point cloud of the region of interest. Third, a background model based on a Gaussian distribution is built according to this location association, and the ground in the real-time point cloud is segmented by background subtraction. Experimental results show that the correct segmentation rate of ground points is 99.95%, with a running time of 26 ms. Compared with the state-of-the-art ground segmentation algorithm Patchwork++, the average accuracy of ground point segmentation is increased by 7.43%, and the running time is increased by 17 ms. Furthermore, the proposed method has been applied in practice to unstructured road scenarios represented by open-pit mines.
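A hedged sketch of the background-subtraction step is given below: per-cell Gaussian statistics of ground height are built from the registered point cloud map, and real-time points lying within a few standard deviations of the cell mean are labeled as ground. The grid size, thresholds, and the use of height-only statistics are assumptions rather than the paper's exact formulation.

```python
# Illustrative grid-based Gaussian background model for ground segmentation.
# Cell size, sigma threshold, and height-only statistics are assumptions.
import numpy as np

def build_background(map_points: np.ndarray, cell: float = 0.5):
    """map_points: (N, 3) points of the (already registered) point cloud map."""
    keys = np.floor(map_points[:, :2] / cell).astype(np.int64)
    heights = {}
    for key, z in zip(map(tuple, keys), map_points[:, 2]):
        heights.setdefault(key, []).append(z)
    # Mean and standard deviation of ground height per grid cell.
    model = {k: (np.mean(v), np.std(v) + 1e-3) for k, v in heights.items()}
    return model, cell

def segment_ground(scan: np.ndarray, model: dict, cell: float, k: float = 3.0):
    """scan: (M, 3) real-time points in the map frame. Returns a boolean ground mask."""
    keys = np.floor(scan[:, :2] / cell).astype(np.int64)
    ground = np.zeros(len(scan), dtype=bool)
    for i, (key, z) in enumerate(zip(map(tuple, keys), scan[:, 2])):
        stats = model.get(key)
        if stats is not None and abs(z - stats[0]) <= k * stats[1]:
            ground[i] = True
    return ground
```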
Autonomic computing investigates how systems can achieve (user-)specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed- and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to resource ensembles (e.g., multiple resources within a data center), integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale remains a fundamental challenge. The integration of AI/ML to achieve such autonomic self-management of systems can occur at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next-generation computing for emerging computing paradigms, including cloud, fog, edge, serverless, and quantum computing environments.
Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications, such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) for real-world scenarios, a unified end-to-end trainable framework for simultaneous text detection, recognition, and information extraction that takes a single document image as input and outputs the structured information. Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and, conversely, provides higher-level semantic clues to contribute to the optimization of text spotting. Moreover, given the shortage of public benchmarks, we construct a fully annotated dataset called EPHOIE (//github.com/HCIILAB/EPHOIE), the first Chinese benchmark for both text spotting and visual information extraction. EPHOIE consists of 1,494 images of examination paper heads with complex layouts and backgrounds, containing a total of 15,771 Chinese handwritten or printed text instances. Compared with state-of-the-art methods, our VIES shows significantly superior performance on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used SROIE dataset under the end-to-end scenario.
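As a rough illustration of the kind of multimodal fusion such an end-to-end system performs, the sketch below fuses per-instance visual features from the detector with semantic features from the recognizer before entity tagging; the dimensions, the fusion operator, and the BiLSTM context layer are assumptions, not the actual VIES architecture.

```python
# Illustrative fusion-and-tagging head for visual information extraction.
# Shapes and layers are assumptions chosen for clarity, not the VIES design.
import torch
import torch.nn as nn

class FusionTagger(nn.Module):
    def __init__(self, vis_dim=256, sem_dim=256, hidden=256, num_entity_types=10):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(vis_dim + sem_dim, hidden), nn.ReLU())
        self.context = nn.LSTM(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(hidden, num_entity_types)

    def forward(self, vis_feats, sem_feats):
        """vis_feats, sem_feats: (B, T, D) features for T text instances per image."""
        fused = self.fuse(torch.cat([vis_feats, sem_feats], dim=-1))
        context, _ = self.context(fused)   # share context across text instances
        return self.tagger(context)        # per-instance entity logits
```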