High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving, robot localization, to time-series prediction. In this work, we investigate the design, development and optimization of particle-filter using half-precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on Nvidia V100, A100, A40 and T4 GPUs. To mitigate numerical instability and precision losses, we introduce algorithmic changes in the particle filters. Using half-precision leads to a performance improvement of 1.5-2x and 2.5-4.6x with respect to single- and double-precision baselines respectively, at the cost of a relatively small loss of accuracy.
A vast number of applications for legged robots entail tasks in complex, dynamic environments. But these environments put legged robots at high risk for limb damage. This paper presents an empirical study of fault tolerant dynamic gaits designed for a quadrupedal robot suffering from a single, known "missing" limb. Preliminary data suggests that the featured gait controller successfully anchors a previously developed planar monopedal hopping template in the three-legged spatial machine. This compositional approach offers a useful and generalizable guide to the development of a wider range of tripedal recovery gaits for damaged quadrupedal machines.
We propose OptCtrlPoints, a data-driven framework designed to identify the optimal sparse set of control points for reproducing target shapes using biharmonic 3D shape deformation. Control-point-based 3D deformation methods are widely utilized for interactive shape editing, and their usability is enhanced when the control points are sparse yet strategically distributed across the shape. With this objective in mind, we introduce a data-driven approach that can determine the most suitable set of control points, assuming that we have a given set of possible shape variations. The challenges associated with this task primarily stem from the computationally demanding nature of the problem. Two main factors contribute to this complexity: solving a large linear system for the biharmonic weight computation and addressing the combinatorial problem of finding the optimal subset of mesh vertices. To overcome these challenges, we propose a reformulation of the biharmonic computation that reduces the matrix size, making it dependent on the number of control points rather than the number of vertices. Additionally, we present an efficient search algorithm that significantly reduces the time complexity while still delivering a nearly optimal solution. Experiments on SMPL, SMAL, and DeformingThings4D datasets demonstrate the efficacy of our method. Our control points achieve better template-to-target fit than FPS, random search, and neural-network-based prediction. We also highlight the significant reduction in computation time from days to approximately 3 minutes.
Modern agile software projects are subject to constant change, making it essential to re-asses overall delay risk throughout the project life cycle. Existing effort estimation models are static and not able to incorporate changes occurring during project execution. In this paper, we propose a dynamic model for continuously predicting overall delay using delay patterns and Bayesian modeling. The model incorporates the context of the project phase and learns from changes in team performance over time. We apply the approach to real-world data from 4,040 epics and 270 teams at ING. An empirical evaluation of our approach and comparison to the state-of-the-art demonstrate significant improvements in predictive accuracy. The dynamic model consistently outperforms static approaches and the state-of-the-art, even during early project phases.
As non-humanoid robots increasingly permeate various sectors, understanding their design implications for human acceptance becomes paramount. Despite their ubiquity, studies on how to optimize their design for better human interaction are sparse. Our investigation, conducted through two comprehensive surveys, addresses this gap. The first survey delineated correlations between robot behavioral and physical attributes, perceived occupation suitability, and gender attributions, suggesting that both design and perceived gender significantly influence acceptance. Survey 2 delved into the effects of varying gender cues on robot designs and their consequent impacts on human-robot interactions. Our findings highlighted that distinct gender cues can bolster or impede interaction comfort.
Presenting dynamic scenes without incurring motion artifacts visible to observers requires sustained effort from the display industry. A tool that predicts motion artifacts and simulates artifact elimination through optimizing the display configuration is highly desired to guide the design and manufacture of modern displays. Despite the popular demands, there is no such tool available in the market. In this study, we deliver an interactive toolkit, Binocular Perceived Motion Artifact Predictor (BiPMAP), as an executable file with GPU acceleration. BiPMAP accounts for an extensive collection of user-defined parameters and directly visualizes a variety of motion artifacts by presenting the perceived continuous and sampled moving stimuli side-by-side. For accurate artifact predictions, BiPMAP utilizes a novel model of the human contrast sensitivity function to effectively imitate the frequency modulation of the human visual system. In addition, BiPMAP is capable of deriving various in-plane motion artifacts for 2D displays and depth distortion in 3D stereoscopic displays.
Assigning repetitive and physically-demanding construction tasks to robots can alleviate human workers's exposure to occupational injuries. Transferring necessary dexterous and adaptive artisanal construction craft skills from workers to robots is crucial for the successful delegation of construction tasks and achieving high-quality robot-constructed work. Predefined motion planning scripts tend to generate rigid and collision-prone robotic behaviors in unstructured construction site environments. In contrast, Imitation Learning (IL) offers a more robust and flexible skill transfer scheme. However, the majority of IL algorithms rely on human workers to repeatedly demonstrate task performance at full scale, which can be counterproductive and infeasible in the case of construction work. To address this concern, this paper proposes an immersive, cloud robotics-based virtual demonstration framework that serves two primary purposes. First, it digitalizes the demonstration process, eliminating the need for repetitive physical manipulation of heavy construction objects. Second, it employs a federated collection of reusable demonstrations that are transferable for similar tasks in the future and can thus reduce the requirement for repetitive illustration of tasks by human agents. Additionally, to enhance the trustworthiness, explainability, and ethical soundness of the robot training, this framework utilizes a Hierarchical Imitation Learning (HIL) model to decompose human manipulation skills into sequential and reactive sub-skills. These two layers of skills are represented by deep generative models, enabling adaptive control of robot actions. By delegating the physical strains of construction work to human-trained robots, this framework promotes the inclusion of workers with diverse physical capabilities and educational backgrounds within the construction industry.
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed -- this scheme cannot directly show a single \textit{incremental} translation to users. Further, this method lacks mechanisms for \textit{controlling} the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control. We apply our framework to models trained for online or offline translation and demonstrate that both types can be effectively used in online mode. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
Explainable recommender systems (RS) have traditionally followed a one-size-fits-all approach, delivering the same explanation level of detail to each user, without considering their individual needs and goals. Further, explanations in RS have so far been presented mostly in a static and non-interactive manner. To fill these research gaps, we aim in this paper to adopt a user-centered, interactive explanation model that provides explanations with different levels of detail and empowers users to interact with, control, and personalize the explanations based on their needs and preferences. We followed a user-centered approach to design interactive explanations with three levels of detail (basic, intermediate, and advanced) and implemented them in the transparent Recommendation and Interest Modeling Application (RIMA). We conducted a qualitative user study (N=14) to investigate the impact of providing interactive explanations with varying level of details on the users' perception of the explainable RS. Our study showed qualitative evidence that fostering interaction and giving users control in deciding which explanation they would like to see can meet the demands of users with different needs, preferences, and goals, and consequently can have positive effects on different crucial aspects in explainable recommendation, including transparency, trust, satisfaction, and user experience.
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.