亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Although existing deep learning compressed-sensing-based Magnetic Resonance Imaging (CS-MRI) methods have achieved considerably impressive performance, explainability and generalizability continue to be challenging for such methods since the transition from mathematical analysis to network design not always natural enough, often most of them are not flexible enough to handle multi-sampling-ratio reconstruction assignments. {In this work, to tackle explainability and generalizability, we propose a unifying deep unfolding multi-sampling-ratio interpretable CS-MRI framework.} The combined approach offers more generalizability than previous works whereas deep learning gains explainability through a geometric prior module. Inspired by the multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into correction-distillation scheme that consists of three ingredients: pre-relaxation module, correction module and geometric prior distillation module. Furthermore, we employ a condition module to learn adaptively step-length and noise level, which enables the proposed framework to jointly train multi-ratio tasks through a single model. { The proposed model not only compensates for the lost contextual information of reconstructed image which is refined from low frequency error in geometric characteristic k-space}, but also integrates the theoretical guarantee of model-based methods and the superior reconstruction performances of deep learning-based methods. Therefore, it can give us a novel perspective to design biomedical imaging networks. { Numerical experiments show that our framework outperforms state-of-the-art methods in terms of qualitative and quantitative evaluations.} {Our method achieves 3.18 dB improvement at low CS ratio 10\% and average 1.42 dB improvement over other comparison methods on brain dataset using Cartesian sampling mask.

相關內容

Networking:IFIP International Conferences on Networking。 Explanation:國際網(wang)絡會議。 Publisher:IFIP。 SIT:

With the widespread popularity of user-generated short videos, it becomes increasingly challenging for content creators to promote their content to potential viewers. Automatically generating appealing titles and covers for short videos can help grab viewers' attention. Existing studies on video captioning mostly focus on generating factual descriptions of actions, which do not conform to video titles intended for catching viewer attention. Furthermore, research for cover selection based on multimodal information is sparse. These problems motivate the need for tailored methods to specifically support the joint task of short video title generation and cover selection (TG-CS) as well as the demand for creating corresponding datasets to support the studies. In this paper, we first collect and present a real-world dataset named Short Video Title Generation (SVTG) that contains videos with appealing titles and covers. We then propose a Title generation and Cover selection with attention Refinement (TCR) method for TG-CS. The refinement procedure progressively selects high-quality samples and highly relevant frames and text tokens within each sample to refine model training. Extensive experiments show that our TCR method is superior to various existing video captioning methods in generating titles and is able to select better covers for noisy real-world short videos.

Detailed 3D reconstruction and photo-realistic relighting of digital humans are essential for various applications. To this end, we propose a novel sparse-view 3d human reconstruction framework that closely incorporates the occupancy field and albedo field with an additional visibility field--it not only resolves occlusion ambiguity in multiview feature aggregation, but can also be used to evaluate light attenuation for self-shadowed relighting. To enhance its training viability and efficiency, we discretize visibility onto a fixed set of sample directions and supply it with coupled geometric 3D depth feature and local 2D image feature. We further propose a novel rendering-inspired loss, namely TransferLoss, to implicitly enforce the alignment between visibility and occupancy field, enabling end-to-end joint training. Results and extensive experiments demonstrate the effectiveness of the proposed method, as it surpasses state-of-the-art in terms of reconstruction accuracy while achieving comparably accurate relighting to ray-traced ground truth.

For years, Single Image Super Resolution (SISR) has been an interesting and ill-posed problem in computer vision. The traditional super-resolution (SR) imaging approaches involve interpolation, reconstruction, and learning-based methods. Interpolation methods are fast and uncomplicated to compute, but they are not so accurate and reliable. Reconstruction-based methods are better compared with interpolation methods, but they are time-consuming and the quality degrades as the scaling increases. Even though learning-based methods like Markov random chains are far better than all the previous ones, they are unable to match the performance of deep learning models for SISR. This study examines the Residual Dense Networks architecture proposed by Yhang et al. [17] and analyzes the importance of its components. By leveraging hierarchical features from original low-resolution (LR) images, this architecture achieves superior performance, with a network structure comprising four main blocks, including the residual dense block (RDB) as the core. Through investigations of each block and analyses using various loss metrics, the study evaluates the effectiveness of the architecture and compares it to other state-of-the-art models that differ in both architecture and components.

In a variety of tomographic applications, data cannot be fully acquired, leading to a severely underdetermined image reconstruction problem. In such cases, conventional methods generate reconstructions with significant artifacts. In order to remove these artifacts, regularization methods must be applied that beneficially incorporate additional information. An important example of such methods is TV reconstruction. It is well-known that this technique can efficiently compensate for the missing data and reduce reconstruction artifacts. At the same time, however, tomographic data is also contaminated by noise, which poses an additional challenge. The use of a single penalty term (regularizer) within a variational regularization framework must therefore account for both, the missing data and the noise. However, a single regularizer may not be ideal for both tasks. For example, the TV regularizer is a poor choice for noise reduction across different scales, in which case $\ell^1$-curvelet regularization methods work well. To address this issue, in this paper we introduce a novel variational regularization framework that combines the advantages of two different regularizers. The basic idea of our framework is to perform reconstruction in two stages, where the first stage mainly aims at accurate reconstruction in the presence of noise, and the second stage aims at artifact reduction. Both reconstruction stages are connected by a data consistency condition, which makes them close to each other in the data domain. The proposed method is implemented and tested for limited view CT using a combined curvelet-TV-approach. To this end, we define and implement a curvelet transform adapted to the limited view problem and demonstrate the advantages of our approach in a series of numerical experiments in this context.

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.

This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. Our method automatically assigns each interpretable filter in a high conv-layer with an object part of a certain category during the learning process. Such explicit knowledge representations in conv-layers of CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method using different benchmark CNNs with various structures to demonstrate the broad applicability of our method. Experiments have shown that our interpretable filters are much more semantically meaningful than traditional filters.

Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, that play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining HGP-SL operator with graph neural networks, we perform graph level representation learning with focus on graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.

Deep learning has revolutionized speech recognition, image recognition, and natural language processing since 2010, each involving a single modality in the input signal. However, many applications in artificial intelligence involve more than one modality. It is therefore of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, a technical review of the models and learning methods for multimodal intelligence is provided. The main focus is the combination of vision and natural language, which has become an important area in both computer vision and natural language processing research communities. This review provides a comprehensive analysis of recent work on multimodal deep learning from three new angles - learning multimodal representations, the fusion of multimodal signals at various levels, and multimodal applications. On multimodal representation learning, we review the key concept of embedding, which unifies the multimodal signals into the same vector space and thus enables cross-modality signal processing. We also review the properties of the many types of embedding constructed and learned for general downstream tasks. On multimodal fusion, this review focuses on special architectures for the integration of the representation of unimodal signals for a particular task. On applications, selected areas of a broad interest in current literature are covered, including caption generation, text-to-image generation, and visual question answering. We believe this review can facilitate future studies in the emerging field of multimodal intelligence for the community.

To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.

北京阿比特科技有限公司