
The accurate segmentation of medical images is critical for various healthcare applications. Convolutional neural networks (CNNs), especially Fully Convolutional Networks (FCNs) such as U-Net, have shown remarkable success in medical image segmentation tasks. However, they have limitations in capturing global context and long-range relations, especially for objects with significant variations in shape, scale, and texture. While transformers have achieved state-of-the-art results in natural language processing and image recognition, they face challenges in medical image segmentation owing to issues with image locality and translational invariance. To address these challenges, this paper proposes an innovative U-shaped network called BEFUnet, which enhances the fusion of body and edge information for precise medical image segmentation. BEFUnet comprises three main modules: a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and a dual-branch encoder. The dual-branch encoder consists of an edge encoder and a body encoder. The edge encoder employs PDC blocks for effective edge information extraction, while the body encoder uses the Swin Transformer to capture semantic information with global attention. The LCAF module efficiently fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two modalities. This local approach significantly reduces computational complexity compared to global cross-attention while ensuring accurate feature matching. BEFUnet demonstrates superior performance over existing methods across various evaluation metrics on medical image segmentation datasets.
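
The abstract does not spell out the LCAF computation; the sketch below is a minimal PyTorch interpretation, assuming non-overlapping spatial windows in which edge features attend to body features at the same locations. The module name, window size, and head count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocalCrossAttentionFusion(nn.Module):
    """Sketch of windowed cross-attention between edge and body feature maps.

    Edge features provide queries; body features in the same spatial window
    provide keys/values, so attention stays local to nearby positions.
    """

    def __init__(self, dim: int, window: int = 7, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, edge: torch.Tensor, body: torch.Tensor) -> torch.Tensor:
        # edge, body: (B, C, H, W) with H and W divisible by the window size.
        B, C, H, W = edge.shape
        w = self.window

        def to_windows(x):
            # (B, C, H, W) -> (B * num_windows, w*w, C)
            x = x.view(B, C, H // w, w, W // w, w)
            return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)

        q, kv = to_windows(edge), to_windows(body)
        fused, _ = self.attn(q, kv, kv)      # local cross-attention per window
        fused = self.norm(fused + q)         # residual connection + norm
        # windows -> (B, C, H, W)
        fused = fused.reshape(B, H // w, W // w, w, w, C)
        return fused.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

# Example: fuse 96-channel edge and body maps of size 56x56.
lcaf = LocalCrossAttentionFusion(dim=96, window=7)
out = lcaf(torch.randn(2, 96, 56, 56), torch.randn(2, 96, 56, 56))
print(out.shape)  # torch.Size([2, 96, 56, 56])
```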

Related Content

Image segmentation is the technique and process of dividing an image into a number of specific regions with distinctive properties and extracting the targets of interest. It is the key step from image processing to image analysis. Specifically, image segmentation partitions an image, based on features such as grayscale, color, texture, and shape, into non-overlapping regions, so that these features are similar within each region and differ markedly between regions.
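
As a minimal concrete illustration of partitioning an image into non-overlapping regions by one such feature (grayscale intensity), the NumPy sketch below thresholds a toy image; the mean-based threshold is an arbitrary choice for illustration, not a prescribed method.

```python
import numpy as np

# Toy grayscale image: a bright square (object of interest) on a dark background.
img = np.zeros((64, 64), dtype=float)
img[20:44, 20:44] = 0.8
img += 0.05 * np.random.default_rng(0).standard_normal(img.shape)  # mild noise

# Partition the image into two non-overlapping regions by intensity:
# pixels above the threshold form one region, the rest form the other.
threshold = img.mean()
mask = img > threshold

print("foreground pixels:", int(mask.sum()))      # roughly 24 * 24 = 576
print("background pixels:", int((~mask).sum()))
```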

High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of reconstructing visually accurate HDR images from their Low Dynamic Range (LDR) counterparts is gaining attention in the vision research community. A major challenge in this research problem is the lack of datasets that capture diverse scene conditions (e.g., lighting, shadows, weather, locations, landscapes, objects, humans, buildings) and various image features (e.g., color, contrast, saturation, hue, luminance, brightness, radiance). To address this gap, in this paper we introduce GTA-HDR, a large-scale synthetic dataset of photo-realistic HDR images sampled from the GTA-V video game. We perform a thorough evaluation of the proposed dataset, which demonstrates significant qualitative and quantitative improvements of the state-of-the-art HDR image reconstruction methods. Furthermore, we demonstrate the effectiveness of the proposed dataset and its impact on additional computer vision tasks including 3D human pose estimation, human body part segmentation, and holistic scene segmentation. The dataset, data collection pipeline, and evaluation code are available at: //github.com/HrishavBakulBarua/GTA-HDR.
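
For context on the task, a common way to pose HDR reconstruction is to degrade HDR ground truth into LDR inputs by exposure clipping, gamma encoding, and quantization; the sketch below shows such a degradation step. The exposure and gamma values are illustrative assumptions and not part of the GTA-HDR pipeline.

```python
import numpy as np

def hdr_to_ldr(hdr: np.ndarray, exposure: float = 1.0, gamma: float = 2.2) -> np.ndarray:
    """Synthesize an 8-bit LDR image from linear HDR radiance.

    Scales by an exposure, clips values the sensor would saturate, and applies
    gamma encoding; the information lost here is what HDR reconstruction
    methods must recover.
    """
    ldr = np.clip(hdr * exposure, 0.0, 1.0)        # saturate highlights
    ldr = ldr ** (1.0 / gamma)                     # gamma encoding
    return (ldr * 255.0 + 0.5).astype(np.uint8)    # quantize to 8 bits

# Example: a synthetic HDR patch with radiance well above the displayable range.
hdr = np.random.default_rng(0).uniform(0.0, 8.0, size=(4, 4, 3))
print(hdr_to_ldr(hdr, exposure=0.5))
```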

The burgeoning integration of 3D medical imaging into healthcare has led to a substantial increase in the workload of medical professionals. To assist clinicians in their diagnostic processes and alleviate their workload, the development of a robust system for retrieving similar case studies presents a viable solution. While the concept holds great promise, the field of 3D medical text-image retrieval is currently limited by the absence of robust evaluation benchmarks and curated datasets. To remedy this, our study presents a groundbreaking dataset, BIMCV-R (to be released upon acceptance), which includes an extensive collection of 8,069 3D CT volumes, encompassing over 2 million slices, paired with their respective radiological reports. Building on this dataset, we craft a retrieval strategy, MedFinder. This approach employs a dual-stream network architecture, harnessing the potential of large language models to advance the field of medical image retrieval beyond existing text-image retrieval solutions. It marks our preliminary step towards developing a system capable of facilitating text-to-image, image-to-text, and keyword-based retrieval tasks.
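
The following is a minimal sketch of the dual-stream retrieval idea, assuming a text stream and a volume stream projected into a shared embedding space and ranked by cosine similarity; the stand-in linear encoders and feature dimensions are illustrative, not MedFinder's actual architecture.

```python
import torch
import torch.nn.functional as F

# Dual-stream retrieval sketch: each stream projects its features into a shared
# embedding space; retrieval ranks gallery items by cosine similarity to the query.
text_encoder = torch.nn.Linear(768, 256)     # e.g., on top of a language-model feature
volume_encoder = torch.nn.Linear(1024, 256)  # e.g., on top of a 3D CNN feature

text_feats = torch.randn(5, 768)       # 5 radiology-report features (queries)
volume_feats = torch.randn(100, 1024)  # 100 CT-volume features (gallery)

q = F.normalize(text_encoder(text_feats), dim=-1)
g = F.normalize(volume_encoder(volume_feats), dim=-1)

similarity = q @ g.T                        # (5, 100) cosine similarities
top5 = similarity.topk(5, dim=-1).indices   # indices of the 5 best-matching volumes
print(top5)
```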

The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and common-sense reasoning. This naturally led to the question: How do VLMs respond when the image itself is inherently unreasonable? To this end, we present IllusionVQA: a diverse dataset of challenging optical illusions and hard-to-interpret scenes to test the capability of VLMs in two distinct multiple-choice VQA tasks - comprehension and soft localization. GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought). Human evaluation reveals that humans achieve 91.03% and 100% accuracy in comprehension and localization. We discover that In-Context Learning (ICL) and Chain-of-Thought reasoning substantially degrade the performance of GeminiPro on the localization task. Tangentially, we discover a potential weakness in the ICL capabilities of VLMs: they fail to locate optical illusions even when the correct answer is in the context window as a few-shot example.
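
A minimal sketch of how multiple-choice VQA accuracy might be scored is shown below; `ask_vlm` is a hypothetical placeholder for a model call, and the prompt format and toy example are assumptions, not the IllusionVQA protocol.

```python
# Sketch of multiple-choice VQA scoring for a VLM.
def ask_vlm(image_path: str, question: str, options: list[str]) -> str:
    prompt = question + "\nOptions: " + "; ".join(options) + "\nAnswer with one option."
    return options[0]  # placeholder: a real evaluation would query the model here

def accuracy(dataset: list[dict]) -> float:
    correct = 0
    for ex in dataset:
        pred = ask_vlm(ex["image"], ex["question"], ex["options"])
        correct += int(pred.strip() == ex["answer"])
    return correct / len(dataset)

# Toy example (invented for illustration).
toy = [{"image": "illusion.png",
        "question": "Which circle is larger?",
        "options": ["Left", "Right", "They are the same size"],
        "answer": "They are the same size"}]
print(accuracy(toy))
```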

Mass casualty incidents (MCIs) pose a significant challenge to emergency medical services by overwhelming available resources and personnel. Effective victim assessment is the key to minimizing casualties during such a crisis. We introduce ARTEMIS, an AI-driven Robotic Triage Labeling and Emergency Medical Information System, to aid first responders in MCI events. It leverages speech processing, natural language processing, and deep learning to help with acuity classification. This is deployed on a quadruped that performs victim localization and preliminary injury severity assessment. First responders access victim information through a Graphical User Interface that is updated in real time. To validate our proposed algorithmic triage protocol, we used the Unitree Go1 quadruped. The robot identifies humans, interacts with them, collects vitals and information, and assigns an acuity label. We conducted simulations of an MCI both in software and in a controlled outdoor environment. The system achieved a triage-level classification precision of over 74% on average and 99% for the most critical victims, i.e., level-1 acuity, outperforming state-of-the-art deep learning-based triage labeling systems. In this paper, we showcase the potential of human-robot interaction in assisting medical personnel in MCI events.
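
As an illustration of the reported metric, the sketch below computes per-acuity-level precision from predicted and true triage labels; the toy labels are invented for the example and do not reflect ARTEMIS data.

```python
from collections import Counter

def per_class_precision(y_true, y_pred, labels):
    """Precision per triage acuity level: TP / (TP + FP)."""
    tp, fp = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1
    return {c: tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else float("nan")
            for c in labels}

# Toy example with acuity levels 1 (most critical) to 4.
y_true = [1, 1, 2, 3, 4, 2, 1, 3]
y_pred = [1, 1, 2, 3, 3, 2, 1, 4]
print(per_class_precision(y_true, y_pred, labels=[1, 2, 3, 4]))
```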

Bipolar disorder (BD) and schizophrenia (SZ) are severe mental disorders with profound societal impact. Identifying risk markers early is crucial for understanding disease progression and enabling preventive measures. The Danish High Risk and Resilience Study (VIA) focuses on understanding early disease processes, particularly in children with familial high risk (FHR). Understanding structural brain changes associated with these diseases during early stages is essential for effective interventions. The central sulcus (CS) is a prominent brain landmark related to brain regions involved in motor and sensory processing. Analyzing CS morphology can provide valuable insights into neurodevelopmental abnormalities in the FHR group. However, segmenting the CS presents challenges due to its variability, especially in adolescents. This study introduces two novel approaches to improve CS segmentation: synthetic data generation to model CS variability and self-supervised pre-training with multi-task learning to adapt models to new cohorts. These methods aim to enhance segmentation performance across diverse populations, eliminating the need for extensive preprocessing.
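
Below is a minimal sketch of the multi-task idea, assuming a shared encoder feeding a segmentation head and an auxiliary self-supervised reconstruction head trained with a weighted sum of losses; the tiny architecture and loss weight are illustrative assumptions, not the study's actual model.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared 3D encoder with a segmentation head and a reconstruction head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv3d(8, 2, 1)   # CS vs. background logits
        self.rec_head = nn.Conv3d(8, 1, 1)   # reconstruct the input (pretext task)

    def forward(self, x):
        h = self.encoder(x)
        return self.seg_head(h), self.rec_head(h)

net = MultiTaskNet()
x = torch.randn(1, 1, 16, 32, 32)                   # toy MRI patch
seg_target = torch.randint(0, 2, (1, 16, 32, 32))   # toy CS labels
seg_logits, recon = net(x)
loss = nn.functional.cross_entropy(seg_logits, seg_target) \
       + 0.5 * nn.functional.mse_loss(recon, x)     # weighted multi-task loss
loss.backward()
```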

In the Industrial Internet of Things (IoT), large amounts of data are generated every day. Due to privacy and security concerns, it is difficult to collect all these data together to train deep learning models; federated learning, a distributed machine learning paradigm that protects data privacy, has therefore been widely used in IoT. However, in practical federated learning, the data distributions usually differ considerably across devices, and this data heterogeneity deteriorates model performance. Moreover, federated learning in IoT usually involves a large number of devices in training, and the limited communication resources of cloud servers become a bottleneck for training. To address these issues, in this paper we combine centralized federated learning with decentralized federated learning to design a semi-decentralized cloud-edge-device hierarchical federated learning framework, which can mitigate the impact of data heterogeneity and can be deployed at large scale in IoT. To address the effect of data heterogeneity, we use an incremental subgradient optimization algorithm in each ring cluster to improve the generalization ability of the ring cluster models. Our extensive experiments show that our approach can effectively mitigate the impact of data heterogeneity and alleviate the communication bottleneck in cloud servers.
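
The sketch below illustrates the incremental subgradient idea within one ring cluster, assuming a simple least-squares objective: the model is passed around the ring and each device applies one gradient step on its local data before forwarding it. The step size, data, and topology are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def ring_incremental_subgradient(local_data, w, step=0.05, rounds=20):
    """Pass the model around a ring cluster: each device applies one gradient
    step on its local least-squares objective before forwarding the model."""
    for _ in range(rounds):
        for X, y in local_data:                       # devices visited in ring order
            grad = 2.0 * X.T @ (X @ w - y) / len(y)   # local (sub)gradient
            w = w - step * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
# Heterogeneous local datasets: each device samples a different region of input space.
local_data = []
for shift in (-2.0, 0.0, 2.0):
    X = rng.normal(shift, 1.0, size=(20, 2))
    local_data.append((X, X @ w_true + 0.1 * rng.standard_normal(20)))

print(ring_incremental_subgradient(local_data, w=np.zeros(2)))  # close to [1, -2]
```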

Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt existing methods from other domains to GNNs and evaluate them. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such a preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster than GNNs using other normalization methods. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
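
Below is a minimal sketch of per-graph normalization with a learnable shift, in the spirit described above: the mean is only partially subtracted, with the amount learned per channel. The exact formulation is one plausible reading of the abstract, not necessarily the paper's equations.

```python
import torch
import torch.nn as nn

class GraphNorm(nn.Module):
    """Per-graph feature normalization with a learnable shift parameter."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # learnable shift amount
        self.gamma = nn.Parameter(torch.ones(dim))   # affine scale
        self.beta = nn.Parameter(torch.zeros(dim))   # affine bias
        self.eps = eps

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, dim) node features of a single graph.
        mean = h.mean(dim=0, keepdim=True)
        shifted = h - self.alpha * mean              # partial, learnable mean removal
        std = shifted.pow(2).mean(dim=0, keepdim=True).add(self.eps).sqrt()
        return self.gamma * shifted / std + self.beta

norm = GraphNorm(dim=16)
print(norm(torch.randn(10, 16)).shape)  # torch.Size([10, 16])
```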

Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge, and text knowledge, overlooking the heterogeneous semantic gaps between the cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, which has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at fine granularity, as well as retrieving required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results.
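
The sketch below illustrates the adaptive information selection idea, assuming a query-conditioned gate that softly selects between vision and text knowledge instead of concatenating them; the gate design is an illustrative assumption, not KBGN's actual module.

```python
import torch
import torch.nn as nn

class AdaptiveSelectionGate(nn.Module):
    """Query-conditioned soft selection between two knowledge sources."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(3 * dim, dim), nn.Sigmoid())

    def forward(self, query, vision_knowledge, text_knowledge):
        g = self.gate(torch.cat([query, vision_knowledge, text_knowledge], dim=-1))
        return g * vision_knowledge + (1.0 - g) * text_knowledge  # soft selection

gate = AdaptiveSelectionGate(dim=128)
q, v, t = (torch.randn(4, 128) for _ in range(3))
print(gate(q, v, t).shape)  # torch.Size([4, 128])
```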

Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area highly relies on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.
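
Below is a minimal sketch of combining MixUp-style augmentation with a focal loss on soft targets, which is the general recipe the method name suggests; the exact loss form and weighting are illustrative assumptions, not necessarily FocalMix's formulation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup(x1, x2, y1, y2, alpha: float = 1.0):
    """MixUp two CT patches and their (soft) target maps with a Beta-sampled weight."""
    lam = float(np.random.beta(alpha, alpha))
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def soft_focal_loss(logits, soft_targets, gamma: float = 2.0):
    """Focal loss generalized to soft (mixed / pseudo-label) targets."""
    p = torch.sigmoid(logits)
    weight = (soft_targets - p).abs() ** gamma   # down-weight easy predictions
    bce = F.binary_cross_entropy_with_logits(logits, soft_targets, reduction="none")
    return (weight * bce).mean()

# Toy 3D patches with pseudo-label maps standing in for unlabeled-scan targets.
x1, x2 = torch.randn(1, 1, 32, 32, 32), torch.randn(1, 1, 32, 32, 32)
y1, y2 = torch.rand(1, 1, 32, 32, 32), torch.rand(1, 1, 32, 32, 32)
x_mix, y_mix = mixup(x1, x2, y1, y2)
print(soft_focal_loss(torch.randn_like(y_mix), y_mix))
```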

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
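
Below is a minimal sketch of the occupancy-network idea: a small MLP conditioned on a latent code maps any 3D query point to an occupancy probability, so the surface is the 0.5 decision boundary and can be queried at arbitrary resolution. The layer sizes and conditioning scheme are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    """Predict occupancy probability of 3D query points given a conditioning code."""

    def __init__(self, code_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, code: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # code: (B, code_dim) e.g. an image embedding; points: (B, N, 3) queries.
        code = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        logits = self.mlp(torch.cat([code, points], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)   # occupancy probability per query point

net = OccupancyNetwork()
occ = net(torch.randn(2, 256), torch.rand(2, 1024, 3))  # evaluate at any resolution
print(occ.shape)  # torch.Size([2, 1024])
```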
