
Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data. However, these methods often overlook degradation representations, which can lead to sub-optimal results. In this paper, we address this limitation by proposing a degradation-aware learning scheme for LLIE using diffusion models, which integrates degradation and image priors into the diffusion process and thereby improves image enhancement. The scheme is motivated by the observation that degradation representations play a crucial role in accurately modeling the specific degradation patterns present in low-light images. To this end, we first present a joint learning framework for image generation and image enhancement that learns the degradation representations. Second, to leverage the learned degradation representations, we develop a Low-Light Diffusion model (LLDiffusion) with a well-designed dynamic diffusion module. This module takes into account both the color map and the latent degradation representations to guide the diffusion process. By incorporating these conditioning factors, LLDiffusion can effectively enhance low-light images while accounting for both the inherent degradation patterns and the desired color fidelity. Finally, we evaluate our method on several well-known benchmark datasets, including synthetic and real-world unpaired datasets. Extensive experiments on public benchmarks demonstrate that LLDiffusion outperforms state-of-the-art LLIE methods both quantitatively and qualitatively. The source code and pre-trained models are available at //github.com/TaoWangzj/LLDiffusion.
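The abstract does not give LLDiffusion's equations. As background only, here is a minimal Python sketch of the standard DDPM forward (noising) process that diffusion-based enhancers build on, sampling x_t from q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I). The linear β schedule and its endpoints are illustrative assumptions, and the color-map and degradation conditioning described above would enter the learned reverse (denoising) network, which this sketch does not model.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng=random):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) I) for a flat pixel list (t is 0-indexed)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

if __name__ == "__main__":
    abar = alpha_bars(linear_betas(1000))
    x0 = [0.1, 0.2, 0.05]            # toy low-light pixel intensities
    print(q_sample(x0, 10, abar))    # lightly noised
    print(q_sample(x0, 999, abar))   # close to pure Gaussian noise
```

Because ᾱ_t shrinks toward zero as t grows, late steps retain almost no trace of the input, which is why the reverse process needs strong conditioning to recover a faithful, well-exposed image.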

Related Content

Over-the-air federated learning (OTA-FL) integrates communication and model aggregation by exploiting the innate superposition property of wireless channels. The approach enables bandwidth-efficient learning but requires care in handling wireless physical-layer impairments. In this paper, federated edge learning is considered for a network that is heterogeneous with respect to client (edge node) data set distributions and individual client resources, under a general non-convex learning objective. We augment the wireless OTA-FL system with a Reconfigurable Intelligent Surface (RIS) to create a propagation environment that improves learning performance in a realistic, time-varying physical layer. Our approach takes a cross-layer perspective that jointly optimizes communication, computation, and learning resources in this general heterogeneous setting. We adapt the number of local computation steps and the transmission power of the clients in conjunction with the RIS phase shifts. The resulting joint communication and learning algorithm, RIS-assisted Over-the-air Adaptive Resource Allocation for Federated learning (ROAR-Fed), is shown to converge in this general setting. Numerical results demonstrate the effectiveness of ROAR-Fed under heterogeneous (non-i.i.d.) data and imperfect CSI, indicating the advantage of RIS-assisted learning in this general setup.
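To make the superposition idea concrete, here is a toy Python simulation of one over-the-air aggregation round under simplifying assumptions: real-valued scalar channels, perfect CSI, and plain channel inversion with no power limit. ROAR-Fed instead optimizes transmit power, local steps, and RIS phase shifts, none of which is modeled here. The function name and signature are hypothetical; an RIS would only change the effective gain h_k that each client sees.

```python
import random

def ota_aggregate(updates, channels, weights, noise_std=0.0, rng=random):
    """Simulate one over-the-air aggregation round (hypothetical helper).

    updates:  equal-length client update vectors (lists of floats)
    channels: effective real-valued channel gain h_k per client, assumed known
              and nonzero; with an RIS, h_k would include the reflected path
    weights:  aggregation weights w_k (e.g. proportional to local data set size)
    """
    dim = len(updates[0])
    received = [0.0] * dim
    for x, h, w in zip(updates, channels, weights):
        tx = [(w / h) * xi for xi in x]   # client pre-scales by w_k / h_k (channel inversion)
        for i in range(dim):
            received[i] += h * tx[i]      # signals add up in the air (superposition)
    total_w = sum(weights)
    # Receiver noise is added once to the superimposed signal, then normalized.
    return [(r + rng.gauss(0.0, noise_std)) / total_w for r in received]

if __name__ == "__main__":
    print(ota_aggregate([[1.0, 2.0], [3.0, 4.0]], [0.5, 2.0], [1.0, 1.0]))
```

For example, two clients with updates [1, 2] and [3, 4], channel gains 0.5 and 2.0, and equal weights yield [2.0, 3.0] without noise, the same as an error-free average, because each client's pre-scaling cancels its channel gain. A single transmission carries all clients' contributions at once, which is the source of the bandwidth efficiency.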

The domain shift between training and testing data presents a significant challenge for training generalizable deep learning models. As a consequence, the performance of models trained under the independent and identically distributed (i.i.d.) assumption deteriorates when they are deployed in the real world. This problem is exacerbated in medical imaging by variations in data acquisition across clinical centers, medical apparatus, and patients. Domain generalization (DG) aims to address this problem by learning a model that generalizes well to any unseen target domain. Many domain generalization techniques have failed to learn domain-invariant representations because of the large domain shift. Furthermore, many medical imaging tasks have not yet been studied extensively from a DG point of view. In this paper, we bring to the medical imaging field a DG method that reformulates the model objective as maximizing mutual information with a large pretrained model. We revisit the problem of DG in Diabetic Retinopathy (DR) classification to establish a clear benchmark with a correct model selection strategy and to achieve robust domain-invariant representations for improved generalization. Moreover, we conduct extensive experiments on public datasets showing that our method consistently outperforms the previous state of the art by a margin of 5.25% in average accuracy, with a lower standard deviation. Source code is available at //github.com/BioMedIA-MBZUAI/DGM-DR
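The abstract does not state how the mutual-information objective is made tractable. One standard route, shown here as an assumption rather than the paper's exact formulation, is the variational (Barber-Agakov) lower bound. Let Z_f be the features of the model being trained, Z_0 the features of the frozen pretrained model, and q a hypothetical Gaussian with learnable mean μ(z_f) and covariance Σ(z_f) over d-dimensional features; maximizing the bound then becomes a feature-matching regularizer added to the classification loss.

```latex
\begin{aligned}
I(Z_f; Z_0) &= H(Z_0) - H(Z_0 \mid Z_f) \\
&\ge H(Z_0) + \mathbb{E}_{p(z_f, z_0)}\!\left[\log q(z_0 \mid z_f)\right], \\
-\log q(z_0 \mid z_f) &= \tfrac{1}{2}\log\lvert\Sigma(z_f)\rvert
  + \tfrac{1}{2}\bigl(z_0 - \mu(z_f)\bigr)^{\top}\Sigma(z_f)^{-1}\bigl(z_0 - \mu(z_f)\bigr)
  + \tfrac{d}{2}\log 2\pi, \\
\mathcal{L} &= \mathcal{L}_{\mathrm{CE}} + \lambda\,\mathbb{E}\!\left[\log\lvert\Sigma(z_f)\rvert
  + \bigl(z_0 - \mu(z_f)\bigr)^{\top}\Sigma(z_f)^{-1}\bigl(z_0 - \mu(z_f)\bigr)\right].
\end{aligned}
```

The inequality holds because the cross-entropy -E[log q(z_0 | z_f)] is never smaller than the conditional entropy H(Z_0 | Z_f). Because Z_0 comes from a frozen model, H(Z_0) does not depend on the trained parameters, so maximizing the bound amounts to minimizing E[-log q]; the factor 1/2 and the constant (d/2) log 2π are absorbed into the weight λ.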

Deep learning-based methods have been extensively explored in recent years for automatic building mapping from high-resolution remote sensing images. While most building mapping models produce vector polygons of buildings for geographic and mapping systems, dominant methods typically decompose polygonal building extraction into several sub-problems, including segmentation, polygonization, and regularization, which leads to complex inference procedures, low accuracy, and poor generalization. In this paper, we propose a simple and novel building mapping method with Hierarchical Transformers, called HiT, which improves the quality of polygonal building mapping from high-resolution remote sensing images. HiT builds on a two-stage detection architecture by adding a polygon head in parallel with the classification and bounding box regression heads. HiT simultaneously outputs building bounding boxes and vector polygons and is fully end-to-end trainable. The polygon head formulates a building polygon as a sequence of serialized vertices with a bidirectional characteristic, a simple and elegant polygon representation that avoids any assumption about the start or end vertex. Under this new perspective, the polygon head adopts a transformer encoder-decoder architecture to predict serialized vertices, supervised by the designed bidirectional polygon loss. Furthermore, a hierarchical attention mechanism combined with convolution operations is introduced in the encoder of the polygon head, capturing more of the geometric structure of building polygons at the vertex and edge levels. Comprehensive experiments on two benchmarks (the CrowdAI and Inria datasets) demonstrate that our method achieves new state-of-the-art results on instance segmentation and polygonal metrics. Moreover, qualitative results verify the superiority and effectiveness of our model in complex scenes.
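The abstract does not define the bidirectional polygon loss. One way to make a vertex-sequence loss independent of the start vertex and traversal direction is to take the minimum L1 distance over every cyclic shift of the target and of its reversal. The Python sketch below illustrates that idea as an assumption, not HiT's published loss, and it assumes predicted and target polygons have the same vertex count.

```python
def l1_vertex_loss(pred, target):
    """Mean L1 distance between corresponding (x, y) vertices."""
    return sum(abs(px - tx) + abs(py - ty)
               for (px, py), (tx, ty) in zip(pred, target)) / len(pred)

def bidirectional_polygon_loss(pred, target):
    """Minimum loss over all start vertices and both traversal directions.

    Illustrative sketch only; assumes pred and target have the same vertex count.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same vertex count")
    n = len(target)
    best = float("inf")
    for seq in (list(target), list(target)[::-1]):  # both traversal directions
        for s in range(n):                          # every possible start vertex
            shifted = seq[s:] + seq[:s]
            best = min(best, l1_vertex_loss(pred, shifted))
    return best

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    pred = [(1, 1), (1, 0), (0, 0), (0, 1)]  # same square, reversed, different start
    print(bidirectional_polygon_loss(pred, square), l1_vertex_loss(pred, square))
```

Here a square traversed in the opposite direction from a different start vertex gets zero loss, whereas a naive index-by-index L1 loss penalizes it with a mean vertex error of 1.0.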

We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms, using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which measures performance by removing the upper layers and replacing them with a test application, and grey-box testing, an application-specific measure that observes internal system states with minimal interference. Our benchmarking framework provides ready-to-use tools and is easily adaptable to the assessment of custom ROS 2 computational graphs. Drawing on the expertise of leading robot architects and system architecture experts, RobotPerf establishes a standardized approach to robotics benchmarking. As an open-source initiative, RobotPerf remains committed to evolving with community input to advance the future of hardware-accelerated robotics.
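In a black-box setup, performance is judged only from what the test application observes at the boundary of the graph under test. The Python sketch below shows the kind of latency summary such a test might report, computed from paired input and output timestamps. The function and field names are hypothetical and do not reflect RobotPerf's actual tooling or any ROS 2 API.

```python
import math
import statistics

def latency_summary(sent_ns, received_ns):
    """Summarize end-to-end latency from paired timestamps (hypothetical helper).

    sent_ns[i]:     when message i entered the graph under test (nanoseconds)
    received_ns[i]: when the test application saw the matching output (nanoseconds)
    Returns mean, nearest-rank 95th percentile, and maximum latency in milliseconds.
    """
    if len(sent_ns) != len(received_ns) or not sent_ns:
        raise ValueError("need equal-length, non-empty timestamp lists")
    lat_ms = sorted((r - s) / 1e6 for s, r in zip(sent_ns, received_ns))
    idx = max(0, math.ceil(0.95 * len(lat_ms)) - 1)  # nearest-rank percentile index
    return {"mean_ms": statistics.mean(lat_ms), "p95_ms": lat_ms[idx], "max_ms": lat_ms[-1]}

if __name__ == "__main__":
    print(latency_summary([0, 10_000_000, 20_000_000],
                          [2_000_000, 13_000_000, 24_000_000]))
```

Reporting a tail percentile alongside the mean matters for robotics, where an occasional slow frame can be more harmful than a slightly higher average.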

Despite the remarkable success of deep learning systems over the last decade, a key difference remains between neural network and human decision-making: as humans, we can not only form a decision on the spot but also ponder, revisiting an initial guess from different angles, distilling relevant information, and arriving at a better decision. Here, we propose RecycleNet, a latent feature recycling method that instills this pondering capability in neural networks, letting them refine initial decisions over a number of recycling steps in which outputs are iteratively fed back into earlier network layers. This approach makes minimal assumptions about the neural network architecture and can therefore be implemented in a wide variety of contexts. Using medical image segmentation as the evaluation setting, we show that latent feature recycling enables the network to iteratively refine initial predictions, even beyond the number of iterations seen during training, converging towards an improved decision. We evaluate this across a variety of segmentation benchmarks and show consistent improvements, even compared with top-performing segmentation methods. This allows trading increased computation time for improved performance, which can be especially beneficial for safety-critical applications.
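The recycling loop can be sketched generically in Python: latent features are computed once, and at each step the previous prediction is fed back into the latent before decoding again. The encoder, decoder, and feedback functions are hypothetical stand-ins for network layers, and the toy versions in the usage block are chosen only so that the refinement visibly converges; they are not RecycleNet's architecture.

```python
from typing import Callable, List

Vector = List[float]

def recycle(x: Vector,
            encoder: Callable[[Vector], Vector],
            decoder: Callable[[Vector], Vector],
            feedback: Callable[[Vector, Vector], Vector],
            steps: int) -> List[Vector]:
    """Run latent feature recycling and return the prediction after every step."""
    base = encoder(x)                   # latent features, computed once
    pred = decoder(base)                # initial decision
    history = [pred]
    for _ in range(steps):
        latent = feedback(base, pred)   # feed the previous output back into the latent
        pred = decoder(latent)          # re-decode the refined latent
        history.append(pred)
    return history

if __name__ == "__main__":
    # Hypothetical toy components chosen so that recycling converges to the input.
    def enc(x: Vector) -> Vector:
        return list(x)

    def dec(z: Vector) -> Vector:
        return [0.5 * zi for zi in z]

    def fb(base: Vector, pred: Vector) -> Vector:
        return [b + p for b, p in zip(base, pred)]

    for step, p in enumerate(recycle([1.0], enc, dec, fb, steps=3)):
        print(step, p)
```

With these toy components the predictions are 0.5, 0.75, 0.875, and 0.9375 after 0 to 3 recycling steps, approaching 1.0. Trading extra computation for refinement corresponds to increasing `steps` at inference time.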

The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into using discrete tokens for speech tasks such as recognition and translation; such tokens offer lower storage requirements and great potential for applying natural language processing techniques. However, these studies, which mainly focus on a single task, have faced challenges such as overfitting and performance degradation in speech recognition, often sacrificing performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of speech discrete tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve results comparable to systems trained on FBank features in speech recognition and outperform mel-spectrogram features in speech synthesis on both subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available to facilitate research in this direction.
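The abstract does not say how the tokens are produced. A common approach, assumed here, is to quantize frame-level SSL features with k-means and replace each frame by the index of its nearest centroid, optionally collapsing runs of repeated tokens. The Python sketch below assumes the centroids are already trained. Each token then needs only about log2(K) bits per frame rather than a full floating-point feature vector.

```python
from itertools import groupby

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tokenize(frames, centroids, dedup=True):
    """Map each feature frame to the index of its nearest centroid.

    frames:    per-frame SSL feature vectors
    centroids: k-means cluster centers (assumed already trained)
    dedup:     collapse runs of repeated tokens, shortening the sequence
    """
    tokens = [min(range(len(centroids)),
                  key=lambda k: squared_distance(f, centroids[k]))
              for f in frames]
    if dedup:
        tokens = [tok for tok, _ in groupby(tokens)]
    return tokens

if __name__ == "__main__":
    centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.1], [4.8, 5.2], [5.1, 4.9], [0.0, 0.1]]
    print(tokenize(frames, centroids, dedup=False))
    print(tokenize(frames, centroids))
```

For these toy frames the raw tokens are [0, 0, 1, 2, 2, 0], and deduplication shortens them to [0, 1, 2, 0]; such integer sequences can be fed to text-style models much like word IDs.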

Deep learning-based algorithms have gained massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been used in place of the popular convolution operator to capture long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent advances based on transformers in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in three sub-areas: very high-resolution (VHR), hyperspectral (HSI), and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to maintain and frequently update a list of the latest transformer papers in remote sensing, with their respective code, at: //github.com/VIROBO-15/Transformer-in-Remote-Sensing
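The self-attention mechanism mentioned above computes softmax(Q K^T / sqrt(d_k)) V, so every position attends to every other position in a single step, unlike a convolution's fixed local window. Here is a dependency-free Python sketch of scaled dot-product attention; as a simplification it omits the learned query/key/value projections and the multiple heads of a full transformer layer.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V on row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per key, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy tokens (e.g. image patches)
    print(attention(X, X, X))                 # self-attention: Q = K = V = X
```

Each output row is a weighted average of all value rows, so the first and last tokens interact directly no matter how far apart they are in the sequence.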

Multiple instance learning (MIL) is a powerful tool for weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, current MIL methods are usually based on the independent and identically distributed (i.i.d.) hypothesis and thus neglect the correlation among different instances. To address this problem, we propose a new framework, called correlated MIL, and provide a proof of convergence. Based on this framework, we devise a Transformer-based MIL (TransMIL), which explores both morphological and spatial information. The proposed TransMIL can effectively handle unbalanced/balanced and binary/multiple classification, with good visualization and interpretability. We conduct various experiments on three different computational pathology problems and achieve better performance and faster convergence than state-of-the-art methods. The test AUC for binary tumor classification reaches up to 93.09% on the CAMELYON16 dataset, and the AUC for cancer subtype classification reaches up to 96.03% and 98.82% on the TCGA-NSCLC and TCGA-RCC datasets, respectively.
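To illustrate the difference between the i.i.d. assumption and correlated MIL, here is a toy Python contrast. The baseline scores each instance independently and takes the maximum, while the correlated variant first lets every instance attend to all others in the bag before pooling. The classifier weights `w`, the single-head attention without learned projections, and mean pooling are simplifying assumptions; TransMIL's Transformer-based design is considerably more elaborate.

```python
import math

def _softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def iid_bag_score(instances, w):
    """i.i.d. baseline: score each instance independently, take the max."""
    return max(_dot(x, w) for x in instances)

def correlated_bag_score(instances, w):
    """Correlated variant: each instance becomes an attention-weighted mix of all
    instances in the bag, so its representation depends on the others; the mixed
    features are mean-pooled and scored with the (hypothetical) weights w."""
    d = len(instances[0])
    mixed = []
    for q in instances:
        weights = _softmax([_dot(q, k) / math.sqrt(d) for k in instances])
        mixed.append([sum(a * k[j] for a, k in zip(weights, instances)) for j in range(d)])
    pooled = [sum(m[j] for m in mixed) / len(mixed) for j in range(d)]
    return _dot(pooled, w)

if __name__ == "__main__":
    bag = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]  # toy instance (patch) embeddings
    w = [0.5, 0.25]
    print(iid_bag_score(bag, w), correlated_bag_score(bag, w))
```

Both scores are invariant to the order of instances in the bag, as MIL requires, but only the correlated score depends on how instances relate to one another.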

Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to integrating the current question, vision knowledge, and text knowledge, neglecting the heterogeneous semantic gaps between the cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, although it has limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at a fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models, achieving state-of-the-art results.
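The abstract contrasts concatenation with adaptive information selection but does not specify the selection mechanism. One common alternative, assumed here purely for illustration, is a learned per-dimension gate that decides how much of the visual versus the textual feature to keep. The gate weights are hypothetical, and this Python sketch is not KBGN's published formulation.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def concat_fusion(v, t):
    """Baseline: concatenation keeps both inputs and defers any selection to later layers."""
    return v + t

def gated_fusion(v, t, w_v, w_t, b):
    """Per-dimension gate g_i = sigmoid(w_v[i]*v[i] + w_t[i]*t[i] + b[i]);
    output_i = g_i * v[i] + (1 - g_i) * t[i]. All gate parameters are hypothetical."""
    out = []
    for vi, ti, a, c, bi in zip(v, t, w_v, w_t, b):
        g = sigmoid(a * vi + c * ti + bi)
        out.append(g * vi + (1.0 - g) * ti)  # adaptive mix of the two modalities
    return out

if __name__ == "__main__":
    visual, textual = [2.0, 4.0], [0.0, 0.0]
    print(concat_fusion(visual, textual))
    print(gated_fusion(visual, textual, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

With all gate parameters at zero the gate is 0.5 and the output is the average of the two inputs; a large positive bias drives the gate toward 1 and selects the visual feature, while concatenation simply doubles the dimensionality without choosing.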

State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split at a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may harm the final performance of one or more tasks. In this paper, we propose a novel CNN structure for MTL that enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks along their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by 1x1 convolution, batch normalization, and weight decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform a detailed ablation analysis of different configurations for training the network. Experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
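The fusion step described above, concatenating task features along the channel dimension and reducing them with a 1x1 convolution, can be sketched in dependency-free Python. Feature maps are nested lists indexed [channel][row][col]. Batch normalization and weight decay, which the abstract names as the other NDDR ingredients, are omitted, and the weights shown are illustrative only.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]

def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Concatenate two task feature maps along the channel dimension."""
    return a + b

def conv1x1(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: at every pixel, out[o] = sum_c weight[o][c] * x[c] + bias[o].

    weight has shape [C_out][C_in]; with C_in = 2C and C_out = C this reduces the
    concatenated features back to C channels (the dimensionality-reduction step).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(row) != c_in for row in weight):
        raise ValueError("each weight row must have one entry per input channel")
    out = []
    for o in range(len(weight)):
        plane = [[bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(c_in))
                  for j in range(w)] for i in range(h)]
        out.append(plane)
    return out

if __name__ == "__main__":
    task_a = [[[1.0, 2.0]]]   # one channel, 1x2 spatial map
    task_b = [[[3.0, 4.0]]]
    fused = conv1x1(concat_channels(task_a, task_b), [[0.5, 0.5]], [0.0])
    print(fused)
```

For two single-channel task maps [[1, 2]] and [[3, 4]], weights [[0.5, 0.5]] produce [[2.0, 3.0]], an equal blend of the two tasks. In a multi-task network each task would typically have its own learned 1x1 weights, so each task can choose its own mixture of shared and task-specific features at every layer.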
