Multi-scenario route ranking (MSRR) is crucial in many industrial mapping systems. However, the industrial community mainly adopts interactive interfaces to encourage users to select pre-defined scenarios, which may hinder the downstream ranking performance. In addition, in the academic community, the multi-scenario ranking works only come from other fields, and there are no works specifically focusing on route data due to lacking a publicly available MSRR dataset. Moreover, all the existing multi-scenario works still fail to address the three specific challenges of MSRR simultaneously, i.e. explosion of scenario number, high entanglement, and high-capacity demand. Different from the prior, to address MSRR, our key idea is to factorize the complicated scenario in route ranking into several disentangled factor scenario patterns. Accordingly, we propose a novel method, Disentangled Scenario Factorization Network (DSFNet), which flexibly composes scenario-dependent parameters based on a high-capacity multi-factor-scenario-branch structure. Then, a novel regularization is proposed to induce the disentanglement of factor scenarios. Furthermore, two extra novel techniques, i.e. scenario-aware batch normalization and scenario-aware feature filtering, are developed to improve the network awareness of scenario representation. Additionally, to facilitate MSRR research in the academic community, we propose MSDR, the first large-scale publicly available annotated industrial Multi-Scenario Driving Route dataset. Comprehensive experimental results demonstrate the superiority of our DSFNet, which has been successfully deployed in AMap to serve the major online traffic.
Autonomous driving systems (ADS) have achieved remarkable progress in recent years. However, ensuring their safety and reliability remains a critical challenge due to the complexity and uncertainty of driving scenarios. In this paper, we focus on simulation testing for ADS, where generating diverse and effective testing scenarios is a central task. Existing fuzz testing methods face limitations, such as overlooking the temporal and spatial dynamics of scenarios and failing to leverage simulation feedback (e.g., speed, acceleration and heading) to guide scenario selection and mutation. To address these issues, we propose SimADFuzz, a novel framework designed to generate high-quality scenarios that reveal violations in ADS behavior. Specifically, SimADFuzz employs violation prediction models, which evaluate the likelihood of ADS violations, to optimize scenario selection. Moreover, SimADFuzz proposes distance-guided mutation strategies to enhance interactions among vehicles in offspring scenarios, thereby triggering more edge-case behaviors of vehicles. Comprehensive experiments demonstrate that SimADFuzz outperforms state-of-the-art fuzzers by identifying 32 more unique violations, including 4 reproducible cases of vehicle-vehicle and vehicle-pedestrian collisions. These results demonstrate SimADFuzz's effectiveness in enhancing the robustness and safety of autonomous driving systems.
Understanding sensor data can be challenging for non-experts because of the complexity and unique semantic meanings of sensor modalities. This calls for intuitive and effective methods to present sensor information. However, creating intuitive sensor data visualizations presents three key challenges: the variability of sensor readings, gaps in domain comprehension, and the dynamic nature of sensor data. To address these issues, we develop Vivar, a novel AR system that integrates multi-modal sensor data and presents 3D volumetric content for visualization. In particular, we introduce a cross-modal embedding approach that maps sensor data into a pre-trained visual embedding space through barycentric interpolation. This allows for accurate and continuous integration of multi-modal sensor information. Vivar also incorporates sensor-aware AR scene generation using foundation models and 3D Gaussian Splatting (3DGS) without requiring domain expertise. In addition, Vivar leverages latent reuse and caching strategies to accelerate 2D and AR content generation. Our extensive experiments demonstrate that our system achieves 11$\times$ latency reduction without compromising quality. A user study involving over 485 participants, including domain experts, demonstrates Vivar's effectiveness in accuracy, consistency, and real-world applicability, paving the way for more intuitive sensor data visualization.
To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient secure techniques to probe large datasets which allow a scientist to conduct large-scale medical studies over specific attributes of patients' records, while maintaining the privacy of his model. We introduce a set of composable homomorphic operations and show how to combine private functions evaluation with private thresholds via approximate fully homomorphic encryption. This allows us to design a new system named TETRIS, which solves the real-world use case of private functional exploration of large databases, where the statistical criteria remain private to the server owning the patients' records. Our experiments show that TETRIS achieves practical performance over a large dataset of patients even for the evaluation of elaborate statements composed of linear and nonlinear functions. It is possible to extract private insights from a database of hundreds of thousands of patient records within only a few minutes on a single thread, with an amortized time per database entry smaller than 2ms.
Automated driving systems are an integral part of the automotive industry. Tools such as Robot Operating System and simulators support their development. However, in the end, the developers must test their algorithms on a real vehicle. To better observe the difference between reality and simulation--the reality gap--digital twin technology offers real-time communication between the real vehicle and its model. We present low fidelity digital twin generator and describe situations where automatic generation is preferable to high fidelity simulation. We validated our approach of generating a virtual environment with a vehicle model by replaying the data recorded from the real vehicle.
Due to the sensitivity of data, federated learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized federated learning (SD-FL), clients' communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges present in real-world scenarios. To tackle this issue, we propose a trust-aware client scheduling mechanism (TRAIL) that assesses client states and contributions, enhancing model training efficiency through selective client participation. Our focus is on a semi-decentralized federated learning framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we develop an adaptive hidden semi-Markov model (AHSMM) to estimate clients' communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7\% in test accuracy and a reduction of 15.3\% in training loss.
Fog computing brings about a transformative shift in data management, presenting unprecedented opportunities for enhanced performance and reduced latency. However, one of the key aspects of fog computing revolves around ensuring efficient power and reliability management. To address this challenge, we have introduced a novel model that proposes a non-cooperative game theory-based strategy to strike a balance between power consumption and reliability in decision-making processes. Our proposed model capitalizes on the Cold Primary/Backup strategy (CPB) to guarantee reliability target by re-executing tasks to different nodes when a fault occurs, while also leveraging Dynamic Voltage and Frequency Scaling (DVFS) to reduce power consumption during task execution and maximizing overall efficiency. Non-cooperative game theory plays a pivotal role in our model, as it facilitates the development of strategies and solutions that uphold reliability while reducing power consumption. By treating the trade-off between power and reliability as a non-cooperative game, our proposed method yields significant energy savings, with up to a 35% reduction in energy consumption, 41% decrease in wait time, and 31% shorter completion time compared to state-of-the-art approaches. Our findings underscore the value of game theory in optimizing power and reliability within fog computing environments, demonstrating its potential for driving substantial improvements
Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughputs and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.
As the capabilities of Large Language Models (LLMs) continue to advance, the field of patent processing has garnered increased attention within the natural language processing community. However, the majority of research has been concentrated on classification tasks, such as patent categorization and examination, or on short text generation tasks like patent summarization and patent quizzes. In this paper, we introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges LLMs to generate full-length patents averaging 17K tokens based on initial drafts. Patents present a significant challenge to LLMs due to their specialized nature, standardized terminology, and extensive length. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents. The experimental results demonstrate that our AutoPatent framework significantly enhances the ability to generate comprehensive patents across various LLMs. Furthermore, we have discovered that patents generated solely with the AutoPatent framework based on the Qwen2.5-7B model outperform those produced by larger and more powerful LLMs, such as GPT-4o, Qwen2.5-72B, and LLAMA3.1-70B, in both objective metrics and human evaluations. We will make the data and code available upon acceptance at \url{//github.com/QiYao-Wang/AutoPatent}.
Conventionally, spatiotemporal modeling network and its complexity are the two most concentrated research topics in video action recognition. Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to acquire both efficiency and effectiveness simultaneously. First of all, besides traditionally treating H x W x T video frames as space-time signal (viewing from the Height-Width spatial plane), we propose to also model video from the other two Height-Time and Width-Time planes, to capture the dynamics of video thoroughly. Secondly, our model is designed based on 2D CNN backbones and model complexity is well kept in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module to exploit video dynamics using separable convolution for efficiency. It is a plug-and-play module and can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework and it can specialize to be existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments are conducted on popular benchmarks (i.e., Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) to show its superiority. The proposed MVFNet can achieve state-of-the-art performance with 2D CNN's complexity.
The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively.