Integrating textual data with imaging in liver tumor segmentation is essential for enhancing diagnostic accuracy. However, current multi-modal medical datasets offer only general text annotations, lacking lesion-specific details critical for extracting nuanced features, especially for fine-grained segmentation of tumor boundaries and small lesions. To address these limitations, we developed datasets with lesion-specific text annotations for liver tumors and introduced the TexLiverNet model. TexLiverNet employs an agent-based cross-attention module that integrates text features efficiently with visual features, significantly reducing computational costs. Additionally, enhanced spatial and adaptive frequency domain perception is proposed to precisely delineate lesion boundaries, reduce background interference, and recover fine details in small lesions. Comprehensive evaluations on public and private datasets demonstrate that TexLiverNet achieves superior performance compared to current state-of-the-art methods.
Crack segmentation plays a crucial role in ensuring the structural integrity and seismic safety of civil structures. However, existing crack segmentation algorithms encounter challenges in maintaining accuracy with domain shifts across datasets. To address this issue, we propose a novel deep network that employs incremental training with unsupervised domain adaptation (UDA) using adversarial learning, without a significant drop in accuracy in the source domain. Our approach leverages an encoder-decoder architecture, consisting of both domain-invariant and domain-specific parameters. The encoder learns shared crack features across all domains, ensuring robustness to domain variations. Simultaneously, the decoder's domain-specific parameters capture domain-specific features unique to each domain. By combining these components, our model achieves improved crack segmentation performance. Furthermore, we introduce BuildCrack, a new crack dataset comparable to sub-datasets of the well-established CrackSeg9K dataset in terms of image count and crack percentage. We evaluate our proposed approach against state-of-the-art UDA methods using different sub-datasets of CrackSeg9K and our custom dataset. Our experimental results demonstrate a significant improvement in crack segmentation accuracy and generalization across target domains compared to other UDA methods - specifically, an improvement of 0.65 and 2.7 mIoU on source and target domains respectively.
In the era of increasing privacy concerns and demand for personalized experiences, traditional Reinforcement Learning with Human Feedback (RLHF) frameworks face significant challenges due to their reliance on centralized data. We introduce Federated Reinforcement Learning with Human Feedback (FedRLHF), a novel framework that decentralizes the RLHF process. FedRLHF enables collaborative policy learning across multiple clients without necessitating the sharing of raw data or human feedback, thereby ensuring robust privacy preservation. Leveraging federated reinforcement learning, each client integrates human feedback locally into their reward functions and updates their policies through personalized RLHF processes. We establish rigorous theoretical foundations for FedRLHF, providing convergence guarantees, and deriving sample complexity bounds that scale efficiently with the number of clients. Empirical evaluations on the MovieLens and IMDb datasets demonstrate that FedRLHF not only preserves user privacy but also achieves performance on par with centralized RLHF, while enhancing personalization across diverse client environments.
3D transesophageal echocardiography (3DTEE), is the recommended method for diagnosing mitral regurgitation (MR). 3DTEE provides a high-quality 3D image of the mitral valve (MV), allowing for precise segmentation and measurement of the regurgitant valve anatomy. However, manual TEE segmentations are time-consuming and prone to intra-operator variability, affecting the reliability of the measurements. To address this, we developed a fully automated pipeline using a 3D convolutional neural network (CNN) to segment MV substructures (annulus, anterior leaflet, and posterior leaflet) and quantify MV anatomy. The 3D CNN, based on a multi-decoder residual U-Net architecture, was trained and tested on a dataset comprising 100 3DTEE images with corresponding segmentations. Within the pipeline, a custom algorithm refines the CNN-based segmentations and extracts MV models, from which anatomical landmarks and features are quantified. The accuracy of the proposed method was assessed using Dice score and mean surface distance (MSD) against ground truth segmentations, and the extracted anatomical parameters were compared against a semiautomated commercial software TomTec Image Arena. The trained 3D CNN achieved an average Dice score of 0.79 and MSD of 0.47 mm for the combined segmentation of the annulus, anterior and posterior leaflet. The proposed CNN architecture outperformed a baseline residual U-Net architecture in MV substructure segmentation, and the refinement of the predicted annulus segmentation improved MSD by 8.36%. The annular and leaflet linear measurements differed by less than 7.94 mm and 3.67 mm, respectively, compared to the 3D measurements obtained with TomTec Image Arena. The proposed pipeline was faster than the commercial software, with a modeling time of 12.54 s and a quantification time of 54.42 s.
In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts, necessitating query rewriting to better describe user's information needs. However, traditional context-based rewriting has minimal enhancement on downstream generation tasks due to the lengthy process from query rewriting to response generation. Some researchers try to utilize reinforcement learning with generation feedback to assist the rewriter, but these sparse rewards provide little guidance in most cases, leading to unstable training and generation results. We find that user's needs are also reflected in the gold document, retrieved documents and ground truth. Therefore, by feeding back these multi-aspect dense rewards to query rewriting, more stable and satisfactory responses can be achieved. In this paper, we propose a novel query rewriting method MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and generated results. Specifically, we first use manual data to train a T5 model for the rewriter initialization. Next, we design three metrics as reinforcement learning feedback: the similarity between the rewritten query and the gold document, the ranking metrics, and ROUGE between the generation and the ground truth. Inspired by RLAIF, we train three kinds of reward models for the above metrics to achieve more efficient training. Finally, we combine the scores of these reward models as feedback, and use PPO algorithm to explore the optimal query rewriting strategy. Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
Data races are critical issues in multithreaded program, leading to unpredictable, catastrophic and difficult-to-diagnose problems. Despite the extensive in-house testing, data races often escape to deployed software and manifest in production runs. Existing approaches suffer from either prohibitively high runtime overhead or incomplete detection capability. In this paper, we introduce HardRace, a data race monitor to detect races on-the-fly while with sufficiently low runtime overhead and high detection capability. HardRace firstly employs sound static analysis to determine a minimal set of essential memory accesses relevant to data races. It then leverages hardware trace instruction, i.e., Intel PTWRITE, to selectively record only these memory accesses and thread synchronization events during execution with negligible runtime overhead. Given the tracing data, HardRace performs standard data race detection algorithms to timely report potential races occurred in production runs. The experimental evaluations show that HardRace outperforms state-of-the-art tools like ProRace and Kard in terms of both runtime overhead and detection capability -- HardRace can detect all kinds of data races in read-world applications while maintaining a negligible overhead, less than 2% on average.
Sign Language Production (SLP) aims to generate semantically consistent sign videos from textual statements, where the conversion from textual glosses to sign poses (G2P) is a crucial step. Existing G2P methods typically treat sign poses as discrete three-dimensional coordinates and directly fit them, which overlooks the relative positional relationships among joints. To this end, we provide a new perspective, constraining joint associations and gesture details by modeling the limb bones to improve the accuracy and naturalness of the generated poses. In this work, we propose a pioneering iconicity disentangled diffusion framework, termed Sign-IDD, specifically designed for SLP. Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints. The ID module disentangles the conventional 3D joint representation into a 4D bone representation, comprising the 3D spatial direction vector and 1D spatial distance vector between adjacent joints. Additionally, an Attribute Controllable Diffusion (ACD) module is introduced to further constrain joint associations, in which the attribute separation layer aims to separate the bone direction and length attributes, and the attribute control layer is designed to guide the pose generation by leveraging the above attributes. The ACD module utilizes the gloss embeddings as semantic conditions and finally generates sign poses from noise embeddings. Extensive experiments on PHOENIX14T and USTC-CSL datasets validate the effectiveness of our method. The code is available at: //github.com/NaVi-start/Sign-IDD.
The rapid evolution of Internet of Things (IoT) environments has created an urgent need for secure and trustworthy distributed computing systems, particularly when dealing with heterogeneous devices and applications where centralized trust cannot be assumed. This paper proposes TrustMesh, a novel blockchain-enabled framework that addresses these challenges through a unique three-layer architecture combining permissioned blockchain technology with a novel multi-phase Practical Byzantine Fault Tolerance (PBFT) consensus protocol. The key innovation lies in TrustMesh's ability to support non-deterministic scheduling algorithms while maintaining Byzantine fault tolerance - features traditionally considered mutually exclusive in blockchain systems. The framework supports a sophisticated resource management approach that enables flexible scheduling decisions while preserving the security guarantees of blockchain-based verification. Our experimental evaluation using a real-world cold chain monitoring scenario demonstrates that TrustMesh successfully maintains Byzantine fault tolerance with fault detection latencies under 150 milliseconds, while maintaining consistent framework overhead across varying computational workloads even with network scaling. These results establish TrustMesh's effectiveness in balancing security, performance, and flexibility requirements in trustless IoT environments, advancing the state-of-the-art in secure distributed computing frameworks.
Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate the existing methods from other domains to GNNs. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster compared to GNNs using other normalization. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area highly relies on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.
With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.