
Multi-modal Magnetic Resonance Imaging (MRI) plays an important role in clinical medicine. However, acquiring some modalities, such as the T2-weighted modality, takes a long time and is often accompanied by motion artifacts. On the other hand, the T1-weighted image (T1WI), which requires a much shorter scanning time, shares the same underlying anatomical information as the T2-weighted image (T2WI). Therefore, in this paper we accelerate the acquisition of the T2WI by introducing the auxiliary modality (T1WI). Concretely, we first reconstruct high-quality T2WIs from under-sampled T2WIs, realizing fast T2WI acquisition by reducing the sampling rate in k-space. Second, we establish a cross-modal synthesis task to generate synthetic T2WIs that guide better T2WI reconstruction. We obtain the synthetic T2WIs by decomposing the whole cross-modal generation mapping into two optimal transport (OT) processes: a spatial alignment mapping on the T1 image manifold and a cross-modal synthesis mapping from aligned T1WIs to T2WIs. This decomposition overcomes the negative transfer caused by spatial misalignment. We then show that the reconstruction and synthesis tasks are complementary. Finally, we compare the proposed method with state-of-the-art approaches on the open fastMRI dataset and an in-house dataset to verify its validity.
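As a rough illustration of the fast-acquisition setting described above, the sketch below retrospectively under-samples the k-space of a T2-weighted slice with a 1D variable-density Cartesian mask and returns the zero-filled reconstruction; the mask design, acceleration factor, and function names are illustrative assumptions, not the paper's exact sampling protocol.

```python
# Retrospective k-space under-sampling for fast T2WI acquisition (illustrative).
import numpy as np

def undersample_t2wi(t2wi, accel=4, center_fraction=0.08, seed=0):
    """Zero-filled reconstruction from under-sampled Cartesian k-space."""
    rng = np.random.default_rng(seed)
    h, w = t2wi.shape
    kspace = np.fft.fftshift(np.fft.fft2(t2wi))            # fully sampled k-space

    # Keep the low-frequency center plus randomly chosen phase-encoding lines.
    num_center = int(center_fraction * w)
    mask = np.zeros(w, dtype=bool)
    mask[w // 2 - num_center // 2 : w // 2 + num_center // 2] = True
    remaining = np.flatnonzero(~mask)
    num_rand = max(w // accel - num_center, 0)
    mask[rng.choice(remaining, size=num_rand, replace=False)] = True

    kspace_us = kspace * mask[None, :]                      # zero out unsampled lines
    zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_us)))
    return zero_filled, kspace_us
```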

Related Content


A long-standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB-D video, without requiring specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end-to-end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce global scene encodings, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural representations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate our representation against a set of neural approaches on both synthetic and real data to demonstrate that it is efficient, interpretable, and editable (e.g., change object trajectory). Code and data are available at //geometry.cs.ucl.ac.uk/projects/2023/factorednerf .
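To make the idea of a factored, object-level representation concrete, here is a minimal sketch in which each object owns its own small implicit field and an explicit per-frame pose, so that editing a trajectory amounts to editing that object's poses; the field architecture and the translation-only pose are simplifying assumptions, not the paper's actual model.

```python
# Factored object-level scene representation with explicit per-frame poses (illustrative).
import torch
import torch.nn as nn

class FactoredScene(nn.Module):
    def __init__(self, num_objects, num_frames, feat_dim=64):
        super().__init__()
        # One small implicit field (density + RGB) per object, in canonical coordinates.
        self.fields = nn.ModuleList(
            nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4))
            for _ in range(num_objects))
        # Explicit per-object, per-frame translation; a full implementation would also
        # store a rotation (rigid trajectory) or a deformation field (nonrigid motion).
        self.translations = nn.Parameter(torch.zeros(num_objects, num_frames, 3))

    def query(self, pts_world, frame, obj):
        # Map world-space points into the object's canonical frame, then query its field.
        pts_canonical = pts_world - self.translations[obj, frame]
        return self.fields[obj](pts_canonical)               # (N, 4): density + color
```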

Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. Transformers need fewer parameters to achieve satisfactory performance, close to that of CNNs. ViTs and other similar transformers use an external classification (CLS) token, which is randomly initialized and often fails to generalize well, whereas other sources of multimodal data, such as light detection and ranging (LiDAR), offer the potential to improve these models by means of a CLS token. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary information in addition to the HSI in the transformer encoder to achieve better generalization. The concept of tokenization is used to generate CLS and HSI patch tokens, helping to learn a distinctive representation in a reduced and hierarchical feature space. Extensive experiments are carried out on widely used benchmark datasets, i.e., the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the results of the proposed MFT model with other state-of-the-art transformers, classical CNNs, and conventional classifier models. The superior performance achieved by the proposed model is due to the use of multihead cross patch attention. The source code will be made available publicly at //github.com/AnkurDeria/MFT.
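The sketch below shows one way a LiDAR-derived CLS token could attend to HSI patch tokens through multi-head cross attention, in the spirit of the mCrossPA module; the token construction, layer sizes, and single-band LiDAR pooling are illustrative assumptions and do not reproduce the released MFT code.

```python
# LiDAR-derived CLS token attending to HSI patch tokens (illustrative sketch).
import torch
import torch.nn as nn

class LidarClsCrossAttention(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.cls_proj = nn.Linear(1, dim)     # embed a pooled LiDAR value as the CLS token
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hsi_tokens, lidar_patch):
        # hsi_tokens: (B, N, dim) HSI patch tokens; lidar_patch: (B, P) LiDAR patch values.
        cls = self.cls_proj(lidar_patch.mean(dim=1, keepdim=True).unsqueeze(-1))  # (B, 1, dim)
        fused_cls, _ = self.attn(query=cls, key=hsi_tokens, value=hsi_tokens)
        return self.norm(cls + fused_cls)      # fused CLS token used for classification
```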

Despite promising advances in deep learning-based MRI reconstruction methods, restoring high-frequency image details and textures remains a challenging problem for accelerated MRI. To tackle this challenge, we propose a novel context-aware multi-prior network (CAMP-Net) for MRI reconstruction. CAMP-Net leverages the complementary nature of multiple types of prior knowledge and explores data redundancy between adjacent slices in the hybrid domain to improve image quality. It incorporates three interleaved modules, for image enhancement, k-space restoration, and calibration consistency respectively, to jointly learn context-aware multiple priors in an end-to-end fashion. The image enhancement module learns a coil-combined image prior to suppress noise-like artifacts, while the k-space restoration module explores multi-coil k-space correlations to recover high-frequency details. The calibration consistency module embeds the known physical properties of MRI acquisition to ensure consistency between the k-space correlations extracted from the measurements and those from the intermediate artifact-free image. The resulting low- and high-frequency reconstructions are hierarchically aggregated in a frequency fusion module and iteratively refined to progressively reconstruct the final image. We evaluated the generalizability and robustness of our method on three large public datasets with various accelerations and sampling patterns. Comprehensive experiments demonstrate that CAMP-Net outperforms state-of-the-art methods in terms of reconstruction quality and quantitative $T_2$ mapping.
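A minimal sketch of one hybrid-domain refinement step in the spirit of this design is given below: an image-space prior, a k-space prior, and a data-consistency projection onto the measured samples. The placeholder modules are assumed to accept complex-valued tensors (e.g., via internal real/imaginary channel stacking), and the exact module interleaving in CAMP-Net differs.

```python
# One hybrid-domain (image + k-space) refinement step with data consistency (illustrative).
import torch
import torch.fft as fft

def hybrid_step(image, measured_kspace, mask, img_net, ksp_net):
    # Image-domain prior: suppress noise-like artifacts.
    image = image + img_net(image)

    # k-space prior: refine high-frequency content.
    kspace = fft.fftshift(fft.fft2(image))
    kspace = kspace + ksp_net(kspace)

    # Data consistency: keep the measured k-space samples unchanged (mask is 0/1).
    kspace = mask * measured_kspace + (1 - mask) * kspace
    return fft.ifft2(fft.ifftshift(kspace))
```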

In a number of tomographic applications, data cannot be fully acquired, resulting in severely underdetermined image reconstruction. In such cases, conventional methods lead to reconstructions with significant artifacts. To overcome these artifacts, regularization methods are applied that incorporate additional information. An important example is total variation (TV) reconstruction, which is known to be efficient at compensating for missing data and reducing reconstruction artifacts. At the same time, however, tomographic data is also contaminated by noise, which poses an additional challenge. The use of a single regularizer must therefore account for both the missing data and the noise. However, a particular regularizer may not be ideal for both tasks. For example, the TV regularizer is a poor choice for noise reduction across multiple scales, in which case $\ell^1$ curvelet regularization methods are well suited. To address this issue, in this paper we introduce a novel variational regularization framework that combines the advantages of different regularizers. The basic idea of our framework is to perform reconstruction in two stages, where the first stage mainly aims at accurate reconstruction in the presence of noise, and the second stage aims at artifact reduction. Both reconstruction stages are connected by a data proximity condition. The proposed method is implemented and tested for limited-view CT using a combined curvelet-TV approach. We define and implement a curvelet transform adapted to the limited-view problem and illustrate the advantages of our approach in numerical experiments.
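Schematically, and with notation assumed here rather than taken from the paper, the two-stage idea can be written as a curvelet stage that handles noise followed by a TV stage that reduces limited-view artifacts under a data-proximity constraint:

```latex
\begin{align*}
  x_1 &\in \operatorname*{arg\,min}_x \; \tfrac{1}{2}\,\|A x - y\|_2^2
        + \alpha \,\|\Phi x\|_{\ell^1}
      && \text{(stage 1: curvelet regularization, noise-oriented)} \\
  x_2 &\in \operatorname*{arg\,min}_x \; \mathrm{TV}(x)
      \quad \text{s.t.} \quad \|A x - A x_1\|_2 \le \delta
      && \text{(stage 2: artifact reduction, data proximity to stage 1)}
\end{align*}
```

Here $A$ denotes the limited-view forward operator, $\Phi$ a curvelet transform, and $\delta$ controls how far the second stage may move from the first-stage data fit.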

Objective: This paper investigates how generative models, trained on ground-truth images, can be used as priors for inverse problems, penalizing reconstructions far from images the generator can produce. The aim is that learned regularization will provide complex data-driven priors to inverse problems while still retaining the control and insight of a variational regularization method. Moreover, unsupervised learning, without paired training data, allows the learned regularizer to remain flexible to changes in the forward problem such as noise level, sampling pattern or coil sensitivities in MRI. Approach: We utilize variational autoencoders (VAEs) that generate not only an image but also a covariance uncertainty matrix for each image. The covariance can model changing uncertainty dependencies caused by structure in the image, such as edges or objects, and provides a new distance metric from the manifold of learned images. Main results: We evaluate these novel generative regularizers on retrospectively sub-sampled real-valued MRI measurements from the fastMRI dataset. We compare our proposed learned regularization against other unlearned regularization approaches and unsupervised and supervised deep learning methods. Significance: Our results show that the proposed method is competitive with other state-of-the-art methods and behaves consistently with changing sampling patterns and noise levels.
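As a rough sketch of the generative-regularizer idea, the penalty below scores how far a candidate reconstruction lies from the decoder manifold, weighted by the decoder's predicted per-pixel uncertainty; the diagonal covariance and the joint minimization over the image and latent code are simplifying assumptions, not the paper's exact construction.

```python
# Uncertainty-weighted distance of a reconstruction from a VAE decoder manifold (illustrative).
import torch

def generative_penalty(x, decoder, z):
    """Negative Gaussian log-likelihood of image x under the decoder output at code z."""
    mean, log_var = decoder(z)                 # decoder returns per-pixel mean and log-variance
    inv_var = torch.exp(-log_var)
    return 0.5 * (((x - mean) ** 2) * inv_var + log_var).sum()

# A variational reconstruction would then minimize, jointly over x and z,
#   0.5 * ||A x - y||^2 + lam * generative_penalty(x, decoder, z).
```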

Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information - object location, class information, geometric relations - to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
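The sketch below illustrates one message-passing layer over such a scene graph, with node features built from boxes, class information, and visual features, and edge features carrying geometric relations; the feature sizes and update rule are illustrative assumptions rather than the paper's architecture.

```python
# One message-passing layer over a surgical scene graph (illustrative sketch).
import torch
import torch.nn as nn

class SceneGraphLayer(nn.Module):
    def __init__(self, node_dim=256, edge_dim=16):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, node_dim), nn.ReLU())
        self.upd = nn.GRUCell(node_dim, node_dim)

    def forward(self, nodes, edges, edge_attr):
        # nodes: (N, node_dim) box + class + visual features per anatomical structure;
        # edges: (E, 2) index pairs; edge_attr: (E, edge_dim) geometric relations.
        src, dst = edges[:, 0], edges[:, 1]
        messages = self.msg(torch.cat([nodes[src], nodes[dst], edge_attr], dim=-1))
        agg = torch.zeros_like(nodes).index_add_(0, dst, messages)  # sum incoming messages
        return self.upd(agg, nodes)            # updated node states, pooled later for CVS
```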

We present high-order, finite element-based Second Moment Methods (SMMs) for solving radiation transport problems in two spatial dimensions. We leverage the close connection between the Variable Eddington Factor (VEF) method and SMM to convert existing discretizations of the VEF moment system to discretizations of the SMM moment system. The moment discretizations are coupled to a high-order Discontinuous Galerkin discretization of the Discrete Ordinates transport equations. We show that the resulting methods achieve high-order accuracy on high-order (curved) meshes, preserve the thick diffusion limit, and are effective on a challenging multi-material problem both in outer fixed-point iterations and in inner preconditioned iterative solver iterations for the discrete moment systems. We also present parallel scaling results and provide direct comparisons to the VEF algorithms the SMM algorithms were derived from.
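For orientation, a schematic one-group, steady-state form of the two moment closures (with notation and sources simplified here, not taken from the paper) highlights the connection the conversion exploits: VEF closes the second moment multiplicatively with the Eddington tensor, while SMM uses an additive transport correction.

```latex
\begin{align*}
  \nabla\cdot \vec{J} + \sigma_a \varphi &= Q_0, \\
  \text{VEF:}\quad \nabla\cdot\bigl(\mathbf{E}\,\varphi\bigr) + \sigma_t \vec{J} &= \vec{Q}_1,
    \qquad \mathbf{E} = \frac{\int \Omega\otimes\Omega\,\psi \, d\Omega}{\int \psi \, d\Omega}, \\
  \text{SMM:}\quad \tfrac{1}{3}\nabla \varphi + \sigma_t \vec{J} &= \vec{Q}_1 - \nabla\cdot\mathbf{T},
    \qquad \mathbf{T} = \int \Bigl(\Omega\otimes\Omega - \tfrac{1}{3}\mathbf{I}\Bigr)\psi \, d\Omega,
\end{align*}
```

where $\psi$ is the Discrete Ordinates transport solution, so both closures are updated from transport sweeps within the outer fixed-point iteration.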

Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.
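As one concrete (and deliberately simplified) instance of intra-modality regularization, the loss below splits each modality's embedding into a shared part and a modality-specific part and pushes the two toward orthogonality; the exact deep feature separation loss in the paper may differ.

```python
# Shared / modality-specific feature separation via an orthogonality penalty (illustrative).
import torch
import torch.nn.functional as F

def feature_separation_loss(shared, specific):
    # shared, specific: (B, D) embeddings of the same modality from two projection heads.
    shared = F.normalize(shared, dim=-1)
    specific = F.normalize(specific, dim=-1)
    return (shared * specific).sum(dim=-1).pow(2).mean()   # mean squared cosine similarity
```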

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
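In pseudocode form, the conditional utilization rate described above is just an accuracy difference between a bimodal and a unimodal evaluation; the evaluation helper and the way a modality is withheld (e.g., zeroing its input) are hypothetical names introduced only for illustration.

```python
# Conditional utilization rate u(m1 | m2): accuracy gain from adding modality m1 (illustrative).
def conditional_utilization_rate(eval_accuracy, model, dataset, m1, m2):
    """u(m1 | m2) = acc(model with m1 and m2) - acc(model with m2 only)."""
    acc_both = eval_accuracy(model, dataset, modalities=(m1, m2))
    acc_m2_only = eval_accuracy(model, dataset, modalities=(m2,))
    return acc_both - acc_m2_only
```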

Modern neural network training relies heavily on data augmentation for improved generalization. After the initial success of label-preserving augmentations, there has been a recent surge of interest in label-perturbing approaches, which combine features and labels across training samples to smooth the learned decision surface. In this paper, we propose a new augmentation method that leverages the first and second moments extracted and re-injected by feature normalization. We replace the moments of the learned features of one training image by those of another, and also interpolate the target labels. As our approach is fast, operates entirely in feature space, and mixes different signals than prior methods, one can effectively combine it with existing augmentation methods. We demonstrate its efficacy across benchmark data sets in computer vision, speech, and natural language processing, where it consistently improves the generalization performance of highly competitive baseline networks.
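A minimal sketch of the moment-based mixing described above: the per-instance mean and standard deviation of one sample's intermediate features are replaced by another's, and the labels are interpolated. The mixing ratio and the layer at which moments are exchanged are illustrative choices, not the paper's tuned settings.

```python
# Exchange per-instance feature moments between samples and interpolate labels (illustrative).
import torch

def moment_exchange(features, labels_onehot, lam=0.9):
    # features: (B, C, H, W) intermediate activations; labels_onehot: (B, K) targets.
    perm = torch.randperm(features.size(0))
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True) + 1e-5
    normalized = (features - mean) / std
    mixed = normalized * std[perm] + mean[perm]             # re-inject another sample's moments
    mixed_labels = lam * labels_onehot + (1 - lam) * labels_onehot[perm]
    return mixed, mixed_labels
```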
