This paper presents a novel real-time, delay-aware cooperative perception system designed for intelligent mobility platforms operating in dynamic indoor environments. The system contains a network of multi-modal sensor nodes and a central node that collectively provide perception services to mobility platforms. The proposed Hierarchical Clustering Considering the Scanning Pattern and Ground Contacting Feature based Lidar Camera Fusion improve intra-node perception for crowded environment. The system also features delay-aware global perception to synchronize and aggregate data across nodes. To validate our approach, we introduced the Indoor Pedestrian Tracking dataset, compiled from data captured by two indoor sensor nodes. Our experiments, compared to baselines, demonstrate significant improvements in detection accuracy and robustness against delays. The dataset is available in the repository: //github.com/NingMingHao/MVSLab-IndoorCooperativePerception
This paper presents HyperGraphOS, a significant innovation in the domain of operating systems, specifically designed to address the needs of scientific and engineering domains. This platform aims to combine model-based engineering, graph modeling, data containers, and documents, along with tools for handling computational elements. HyperGraphOS functions as an Operating System offering to users an infinite workspace for creating and managing complex models represented as graphs with customizable semantics. By leveraging a web-based architecture, it requires only a modern web browser for access, allowing organization of knowledge, documents, and content into models represented in a network of workspaces. Elements of the workspace are defined in terms of domain-specific languages (DSLs). These DSLs are pivotal for navigating workspaces, generating code, triggering AI components, and organizing information and processes. The models' dual nature as both visual drawings and data structures allows dynamic modifications and inspections both interactively as well as programaticaly. We evaluated HyperGraphOS's efficiency and applicability across a large set of diverse domains, including the design and development of a virtual Avatar dialog system, a robotic task planner based on large language models (LLMs), a new meta-model for feature-based code development and many others. Our findings show that HyperGraphOS offers substantial benefits in the interaction with a computer as information system, as platoform for experiments and data analysis, as streamlined engineering processes, demonstrating enhanced flexibility in managing data, computation and documents, showing an innovative approaches to persistent desktop environments.
This paper proposes a novel interdisciplinary framework for the critical evaluation of text-to-image models, addressing the limitations of current technical metrics and bias studies. By integrating art historical analysis, artistic exploration, and critical prompt engineering, the framework offers a more nuanced understanding of these models' capabilities and societal implications. Art historical analysis provides a structured approach to examine visual and symbolic elements, revealing potential biases and misrepresentations. Artistic exploration, through creative experimentation, uncovers hidden potentials and limitations, prompting critical reflection on the algorithms' assumptions. Critical prompt engineering actively challenges the model's assumptions, exposing embedded biases. Case studies demonstrate the framework's practical application, showcasing how it can reveal biases related to gender, race, and cultural representation. This comprehensive approach not only enhances the evaluation of text-to-image models but also contributes to the development of more equitable, responsible, and culturally aware AI systems.
We present a novel approach for enhancing human-robot collaboration using physical interactions for real-time error correction of large language model (LLM) powered robots. Unlike other methods that rely on verbal or text commands, the robot leverages an LLM to proactively executes 6 DoF linear Dynamical System (DS) commands using a description of the scene in natural language. During motion, a human can provide physical corrections, used to re-estimate the desired intention, also parameterized by linear DS. This corrected DS can be converted to natural language and used as part of the prompt to improve future LLM interactions. We provide proof-of-concept result in a hybrid real+sim experiment, showcasing physical interaction as a new possibility for LLM powered human-robot interface.
Technical troubleshooting in enterprise environments often involves navigating diverse, heterogeneous data sources to resolve complex issues effectively. This paper presents a novel agentic AI solution built on a Weighted Retrieval-Augmented Generation (RAG) Framework tailored for enterprise technical troubleshooting. By dynamically weighting retrieval sources such as product manuals, internal knowledge bases, FAQs, and troubleshooting guides based on query context, the framework prioritizes the most relevant data. For instance, it gives precedence to product manuals for SKU-specific queries while incorporating general FAQs for broader issues. The system employs FAISS for efficient dense vector search, coupled with a dynamic aggregation mechanism to seamlessly integrate results from multiple sources. A Llama-based self-evaluator ensures the contextual accuracy and confidence of the generated responses before delivering them. This iterative cycle of retrieval and validation enhances precision, diversity, and reliability in response generation. Preliminary evaluations on large enterprise datasets demonstrate the framework's efficacy in improving troubleshooting accuracy, reducing resolution times, and adapting to varied technical challenges. Future research aims to enhance the framework by integrating advanced conversational AI capabilities, enabling more interactive and intuitive troubleshooting experiences. Efforts will also focus on refining the dynamic weighting mechanism through reinforcement learning to further optimize the relevance and precision of retrieved information. By incorporating these advancements, the proposed framework is poised to evolve into a comprehensive, autonomous AI solution, redefining technical service workflows across enterprise settings.
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code. Exception handling mechanisms require developers to detect, capture, and manage exceptions according to high standards, but many developers struggle with these tasks, leading to fragile code. This problem is particularly evident in open-source projects and impacts the overall quality of the software ecosystem. To address this challenge, we explore the use of large language models (LLMs) to improve exception handling in code. Through extensive analysis, we identify three key issues: Insensitive Detection of Fragile Code, Inaccurate Capture of Exception Block, and Distorted Handling Solution. These problems are widespread across real world repositories, suggesting that robust exception handling practices are often overlooked or mishandled. In response, we propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling. Seeker uses agents: Scanner, Detector, Predator, Ranker, and Handler to assist LLMs in detecting, capturing, and resolving exceptions more effectively. Our work is the first systematic study on leveraging LLMs to enhance exception handling practices in real development scenarios, providing valuable insights for future improvements in code reliability.
Currently, the prevalence of online handwriting has spurred a critical need for effective retrieval systems to accurately search relevant handwriting instances from specific writers, known as online writer retrieval. Despite the growing demand, this field suffers from a scarcity of well-established methodologies and public large-scale datasets. This paper tackles these challenges with a focus on Chinese handwritten phrases. First, we propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. For frequency feature learning, we propose the HFGA block, which performs gated cross-attention between the vanilla temporal handwriting sequence and its high-frequency sub-bands to amplify salient writing details. For temporal feature learning, we propose the CAIR block, tailored to promote channel interaction and reduce channel redundancy. Second, to address data deficit, we introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Through extensive evaluations, we demonstrate the superior performance of DOLPHIN over existing methods. In addition, we explore cross-domain writer retrieval and reveal the pivotal role of increasing feature alignment in bridging the distributional gap between different handwriting data. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality and retrieval performance. Code and dataset are available at //github.com/SCUT-DLVCLab/DOLPHIN.
Fog computing brings about a transformative shift in data management, presenting unprecedented opportunities for enhanced performance and reduced latency. However, one of the key aspects of fog computing revolves around ensuring efficient power and reliability management. To address this challenge, we have introduced a novel model that proposes a non-cooperative game theory-based strategy to strike a balance between power consumption and reliability in decision-making processes. Our proposed model capitalizes on the Cold Primary/Backup strategy (CPB) to guarantee reliability target by re-executing tasks to different nodes when a fault occurs, while also leveraging Dynamic Voltage and Frequency Scaling (DVFS) to reduce power consumption during task execution and maximizing overall efficiency. Non-cooperative game theory plays a pivotal role in our model, as it facilitates the development of strategies and solutions that uphold reliability while reducing power consumption. By treating the trade-off between power and reliability as a non-cooperative game, our proposed method yields significant energy savings, with up to a 35% reduction in energy consumption, 41% decrease in wait time, and 31% shorter completion time compared to state-of-the-art approaches. Our findings underscore the value of game theory in optimizing power and reliability within fog computing environments, demonstrating its potential for driving substantial improvements
This paper presents HyperGraphOS, a significant innovation in the domain of operating systems, specifically designed to address the needs of scientific and engineering domains. This platform aims to combine model-based engineering, graph modeling, data containers, and documents, along with tools for handling computational elements. HyperGraphOS functions as an Operating System offering to users an infinite workspace for creating and managing complex models represented as graphs with customizable semantics. By leveraging a web-based architecture, it requires only a modern web browser for access, allowing organization of knowledge, documents, and content into models represented in a network of workspaces. Elements of the workspace are defined in terms of domain-specific languages (DSLs). These DSLs are pivotal for navigating workspaces, generating code, triggering AI components, and organizing information and processes. The models' dual nature as both visual drawings and data structures allows dynamic modifications and inspections both interactively as well as programaticaly. We evaluated HyperGraphOS's efficiency and applicability across a large set of diverse domains, including the design and development of a virtual Avatar dialog system, a robotic task planner based on large language models (LLMs), a new meta-model for feature-based code development and many others. Our findings show that HyperGraphOS offers substantial benefits in the interaction with a computer as information system, as platoform for experiments and data analysis, as streamlined engineering processes, demonstrating enhanced flexibility in managing data, computation and documents, showing an innovative approaches to persistent desktop environments.
This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices. Our approach combines fast single-image object detection with convolutional long short term memory (LSTM) layers to create an interweaved recurrent-convolutional architecture. Additionally, we propose an efficient Bottleneck-LSTM layer that significantly reduces computational cost compared to regular LSTMs. Our network achieves temporal awareness by using Bottleneck-LSTMs to refine and propagate feature maps across frames. This approach is substantially faster than existing detection methods in video, outperforming the fastest single-frame models in model size and computational cost while attaining accuracy comparable to much more expensive single-frame models on the Imagenet VID 2015 dataset. Our model reaches a real-time inference speed of up to 15 FPS on a mobile CPU.
Learning similarity functions between image pairs with deep neural networks yields highly correlated activations of embeddings. In this work, we show how to improve the robustness of such embeddings by exploiting the independence within ensembles. To this end, we divide the last embedding layer of a deep network into an embedding ensemble and formulate training this ensemble as an online gradient boosting problem. Each learner receives a reweighted training sample from the previous learners. Further, we propose two loss functions which increase the diversity in our ensemble. These loss functions can be applied either for weight initialization or during training. Together, our contributions leverage large embedding sizes more effectively by significantly reducing correlation of the embedding and consequently increase retrieval accuracy of the embedding. Our method works with any differentiable loss function and does not introduce any additional parameters during test time. We evaluate our metric learning method on image retrieval tasks and show that it improves over state-of-the-art methods on the CUB 200-2011, Cars-196, Stanford Online Products, In-Shop Clothes Retrieval and VehicleID datasets.