
Image inpainting aims to fill the missing holes of an input image. Solving this task efficiently for high-resolution images is hard for two reasons: (1) high-resolution inpainting requires a large receptive field; (2) a general encoder-decoder network synthesizes many background pixels at the same time because it operates on the whole image matrix. In this paper, we try to break these limitations for the first time, building on recent developments in continuous implicit representation. Specifically, we down-sample and encode the degraded image to produce spatially adaptive parameters for each spatial patch via an attentional Fast Fourier Convolution (FFC)-based parameter generation network. We then use these parameters as the weights and biases of a series of multi-layer perceptrons (MLPs), whose input is the encoded continuous coordinate and whose output is the synthesized color value. Thanks to this structure, we only need to encode the high-resolution image at a relatively low resolution to capture a large receptive field. The continuous position encoding then helps synthesize photo-realistic high-frequency textures by re-sampling the coordinates at a higher resolution. Our framework also lets us query only the coordinates of missing pixels, in parallel, yielding a more efficient solution than previous methods. Experiments show that the proposed method achieves real-time performance on 2048$\times$2048 images using a single RTX 2080 Ti GPU and can handle 4096$\times$4096 images, with much better visual and numerical performance than existing state-of-the-art methods. The code is available at: https://github.com/NiFangBaAGe/CoordFill.
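The coordinate-query idea can be illustrated with a small sketch. It assumes a sinusoidal (Fourier-feature) encoding of continuous coordinates and a single hidden-layer MLP; in the method described above the MLP weights would be predicted per patch by the FFC-based parameter generation network, whereas here one fixed toy parameter set is reused everywhere. The names `encode_coord`, `mlp_color`, `fill_missing` and the `num_freqs` parameter are hypothetical, not taken from the CoordFill code.

```python
import math

def encode_coord(x, y, num_freqs=4):
    """Sinusoidal encoding of a continuous coordinate in [-1, 1]^2 (assumed form)."""
    feats = [x, y]
    for k in range(num_freqs):
        w = (2 ** k) * math.pi
        feats += [math.sin(w * x), math.cos(w * x), math.sin(w * y), math.cos(w * y)]
    return feats

def mlp_color(feats, weights, bias, out_w, out_b):
    """One hidden ReLU layer; in the described method these would be per-patch predictions."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, feats)) + b)
              for row, b in zip(weights, bias)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b

def fill_missing(image, mask, params):
    """Query the MLP only where mask == 1; known pixels are copied unchanged.

    Every query is independent, so the loop could run in parallel.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                # Map pixel centers to continuous coordinates in [-1, 1].
                x = -1 + (2 * j + 1) / w
                y = -1 + (2 * i + 1) / h
                out[i][j] = mlp_color(encode_coord(x, y), *params)
    return out
```

Because coordinates are continuous, the same function could be sampled on a finer grid than the one used for encoding, which is the property the abstract relies on for high-resolution output.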

Related content

A new feature that allows seamless switching between iOS 8 and OS X Yosemite. > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.


Deep learning-based image signal processor (ISP) models for mobile cameras can generate high-quality images that rival those of professional DSLR cameras. However, their computational demands often make them unsuitable for mobile devices. In addition, modern mobile cameras employ non-Bayer color filter arrays (CFAs) such as Quad Bayer, Nona Bayer, and QxQ Bayer to enhance image quality, yet most existing deep learning-based ISP (or demosaicing) models focus primarily on the standard Bayer CFA. In this study, we present PyNET-QxQ, a lightweight demosaicing model derived from the original PyNET and designed specifically for QxQ Bayer CFA patterns. We also propose a knowledge distillation method, called progressive distillation, to train the reduced network more effectively. As a result, PyNET-QxQ has fewer than 2.5% of the parameters of the original PyNET while preserving its performance. Experiments on QxQ images captured by a prototype QxQ camera sensor show that PyNET-QxQ outperforms existing conventional algorithms in texture and edge reconstruction, despite its significantly reduced parameter count.
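Quad Bayer, Nona Bayer, and QxQ Bayer are commonly described as a standard Bayer pattern whose color cells are expanded into n×n blocks of the same color (n = 2, 3, and 4 respectively). The sketch below builds such a mask and applies it to an RGB image, producing the kind of mosaic that a demosaicing model like PyNET-QxQ has to invert. The RGGB base ordering and the helper names `binned_bayer_pattern` and `mosaic` are assumptions for illustration; real sensors may use a different phase.

```python
def binned_bayer_pattern(height, width, n):
    """Return a height x width grid of 'R', 'G', 'B' for an n x n binned Bayer CFA.

    n = 1 gives standard Bayer, 2 Quad Bayer, 3 Nona Bayer, 4 QxQ Bayer.
    The RGGB base ordering is an assumption; sensors may use other phases.
    """
    base = [["R", "G"], ["G", "B"]]
    return [[base[(r // n) % 2][(c // n) % 2] for c in range(width)]
            for r in range(height)]


def mosaic(rgb, pattern):
    """Keep only the channel the CFA samples at each pixel (rgb[r][c] = (R, G, B))."""
    index = {"R": 0, "G": 1, "B": 2}
    return [[rgb[r][c][index[pattern[r][c]]] for c in range(len(pattern[0]))]
            for r in range(len(pattern))]
```

For n = 4, each 8×8 tile contains one 4×4 red block, two 4×4 green blocks, and one 4×4 blue block, so neighboring same-color samples are far apart compared with standard Bayer, which is what makes demosaicing this layout harder.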

Multi-view clustering has attracted much attention thanks to its capacity to integrate multi-source information. Although numerous advanced methods have been proposed in past decades, most of them overlook the significance of weakly supervised information and fail to preserve the feature properties of multiple views, resulting in unsatisfactory clustering performance. To address these issues, we propose a novel Deep Multi-view Semi-supervised Clustering (DMSC) method, which jointly optimizes three kinds of losses during network fine-tuning: a multi-view clustering loss, a semi-supervised pairwise constraint loss, and a multiple-autoencoder reconstruction loss. Specifically, a KL divergence-based multi-view clustering loss is imposed on the common representation of multi-view data to perform heterogeneous feature optimization, multi-view weighting, and clustering prediction simultaneously. We then integrate pairwise constraints into multi-view clustering by enforcing the learned multi-view representations of must-link samples to be similar and those of cannot-link samples to be dissimilar, so that the resulting cluster structure is more credible. Moreover, unlike existing methods that keep only the encoder of each heterogeneous branch during network fine-tuning, we tune the complete autoencoders, including both encoders and decoders. This alleviates severe corruption of the view-specific and view-shared feature spaces and makes the whole training procedure more stable. Comprehensive experiments on eight popular image datasets demonstrate that our approach outperforms state-of-the-art multi-view and single-view competitors.
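A KL divergence-based clustering loss of this kind is often built as in Deep Embedded Clustering (DEC): a Student's t soft assignment Q of embeddings to cluster centers, a sharpened target distribution P, and the loss KL(P || Q). The sketch below assumes that DEC-style form (the DMSC paper's multi-view weighting is omitted), and the pairwise term is a hypothetical squared-distance/hinge loss illustrating "must-link similar, cannot-link dissimilar"; neither is presented as the paper's exact formulation.

```python
import math

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z to cluster centers."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1 + d2 / alpha) ** (-(alpha + 1) / 2))
        s = sum(row)
        q.append([v / s for v in row])
    return q

def target_distribution(q):
    """Sharpened target p_ij = q_ij^2 / f_j, renormalized per sample (f_j = sum_i q_ij)."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q) summed over samples and clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow) if pij > 0)

def pairwise_loss(z, must_link, cannot_link, margin=1.0):
    """Hypothetical constraint loss: pull must-link pairs together, push cannot-link
    pairs beyond a margin."""
    def dist(i, j):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(z[i], z[j])))
    loss = sum(dist(i, j) ** 2 for i, j in must_link)
    loss += sum(max(0.0, margin - dist(i, j)) ** 2 for i, j in cannot_link)
    return loss
```

Squaring q and dividing by the cluster frequency f_j makes confident assignments more confident while discouraging any single cluster from absorbing all samples, which is why the target is recomputed from the current Q rather than fixed.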

Urban Physical Disorder (UPD), such as old or abandoned buildings, broken sidewalks, litter, and graffiti, has a negative impact on residents' quality of life. It can also increase crime rates, cause social disorder, and pose a public health risk. Currently, efficient and reliable methods for detecting and understanding UPD are lacking. To bridge this gap, we propose UPDExplainer, an interpretable transformer-based framework for UPD detection. We first develop a UPD detection model based on the Swin Transformer architecture, which leverages readily accessible street view images to learn discriminative representations. To provide clear and comprehensible evidence and analysis, we then introduce a UPD factor identification and ranking module that combines visual explanation maps with semantic segmentation maps. This integrated approach enables us to identify the exact objects within street view images that are responsible for physical disorder and to gain insight into its underlying causes. Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance, with an accuracy of 79.9%. To evaluate the method's ranking performance comprehensively, we report mean Average Precision (mAP), R-Precision (RPrec), and Normalized Discounted Cumulative Gain (NDCG), which reach 75.51%, 80.61%, and 82.58%, respectively. We also present a case study of detecting and ranking physical disorder in the southern region of downtown Los Angeles, California, to demonstrate the practicality and effectiveness of our framework.
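For reference, the three ranking metrics can be computed with their standard textbook definitions, sketched below for one ranked list of disorder factors. mAP is then the mean of `average_precision` over all evaluated images. How UPDExplainer defines relevance and graded gains for each image is not stated in the abstract, so the relevance sets, gain values, and item names here are illustrative assumptions.

```python
import math

def r_precision(ranked, relevant):
    """Fraction of relevant items among the top-R results, R = number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def ndcg(ranked, gains):
    """NDCG with a log2 rank discount; gains maps item -> graded relevance (missing = 0)."""
    def dcg(items):
        return sum(gains.get(item, 0) / math.log2(k + 1)
                   for k, item in enumerate(items, start=1))
    ideal = sorted(gains, key=gains.get, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0
```

For example, ranking `["litter", "graffiti", "building", "sidewalk"]` against the relevant set `{"litter", "building"}` gives an R-Precision of 0.5, because only one of the top two factors is relevant.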

Robust watermarking aims to conceal information imperceptibly within a cover image or video in a way that resists various distortions. Recently, deep learning-based approaches to image watermarking have made significant advances in robustness and invisibility. However, few studies have addressed video watermarking with deep neural networks because of its high complexity and computational cost. Our paper asks: can well-designed deep learning-based image watermarking be efficiently adapted to video watermarking? Our answer is yes. First, we revisit the workflow of deep learning-based watermarking methods, which leads to a critical insight: temporal information in a video may be essential for general computer vision tasks but not for video watermarking specifically. Inspired by this insight, we propose ItoV, a method for efficiently adapting deep learning-based Image watermarking to Video watermarking. Specifically, ItoV merges the temporal dimension of the video into the channel dimension so that deep neural networks can treat videos as images. We further explore the effects of different convolutional blocks in video watermarking and find that spatial convolution is the primary influential component, while depthwise convolutions significantly reduce computational cost with negligible impact on performance. In addition, we propose a new frame loss that constrains the watermark intensity to be consistent across the frames of each video clip, significantly improving invisibility. Extensive experiments on the Kinetics-600 and Inter4K datasets show that the adapted video watermarking method outperforms state-of-the-art methods, demonstrating the efficacy of ItoV.
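The "treat videos as images" step amounts to a reshape: a clip indexed as [channel][time][y][x] is flattened into C·T stacked frames that a 2D image network can take as input channels. The sketch below shows that reshape and its inverse on nested lists, plus a frame loss written as the variance, across frames, of the mean absolute watermark residual. The data layout, the helper names, and the exact form of the frame loss are assumptions; the abstract only states that the loss keeps watermark intensity consistent across frames.

```python
def merge_time_into_channels(video):
    """Reshape video[c][t][y][x] into stacked[c * T + t][y][x] for a 2D network."""
    return [frame for channel in video for frame in channel]

def split_channels_into_time(stacked, num_channels):
    """Inverse of merge_time_into_channels."""
    t = len(stacked) // num_channels
    return [stacked[c * t:(c + 1) * t] for c in range(num_channels)]

def frame_intensity_loss(cover, marked):
    """Hypothetical frame loss: variance across frames of the mean |marked - cover|.

    cover and marked are indexed [t][y][x] (a single channel, for simplicity).
    """
    intensities = []
    for fc, fm in zip(cover, marked):
        diffs = [abs(m - c) for rc, rm in zip(fc, fm) for c, m in zip(rc, rm)]
        intensities.append(sum(diffs) / len(diffs))
    mean = sum(intensities) / len(intensities)
    return sum((v - mean) ** 2 for v in intensities) / len(intensities)
```

A loss of this shape is zero when every frame carries the same average watermark strength and grows when one frame is marked much more heavily than its neighbors, which would otherwise show up as flicker.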

Because of the ever-increasing data rate demands of beyond-5G networks and the widespread use of Orthogonal Frequency Division Multiplexing (OFDM) in cellular systems, it is critical to reduce the pilot overhead of OFDM systems in order to increase their data rates. Because multipath channels are sparse, sparse recovery methods can be exploited to reduce pilot overhead: the OFDM pilots serve as random samples for estimating the channel impulse response. We propose a three-step sparse recovery algorithm based on sparsity domain smoothing. The three steps of the proposed scheme are time-domain residue computation, sparsity domain smoothing, and adaptive thresholding sparsification. To the best of our knowledge, the proposed sparsity domain smoothing-based thresholding recovery method, SDS-IMAT, has not previously been used for OFDM sparse channel estimation. Pilot locations are also derived by minimizing the coherence of the measurement matrix. Numerical results verify that the proposed scheme outperforms other existing thresholding and greedy recovery methods and achieves near-optimal performance, as measured by mean square error and bit error rate.
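As a baseline for the thresholding family the abstract builds on, the sketch below implements plain IMAT (iterative method with adaptive thresholding): compute the residue on the pilot subcarriers, back-project it to the time domain, and keep only taps above an exponentially decaying threshold. It deliberately omits the sparsity domain smoothing step that distinguishes SDS-IMAT, and the threshold schedule (`beta`, `alpha`) and relaxation `lam` are assumed values, not the paper's. With sparse pilot sets, `lam` and `alpha` would need tuning.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT, enough for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def imat_channel_estimate(pilots, pilot_values, n, iters=30, lam=1.0, alpha=0.2):
    """Plain IMAT: time-domain back-projection of the pilot residue plus a
    hard threshold that decays exponentially with the iteration index.

    pilots: subcarrier indices carrying pilots.
    pilot_values: observed frequency response H[k] at those subcarriers.
    Returns an estimate of the length-n channel impulse response.
    """
    zero_filled = [0j] * n
    for k, v in zip(pilots, pilot_values):
        zero_filled[k] = v
    # Initial threshold: largest tap of the zero-filled estimate (an assumed heuristic).
    beta = lam * max(abs(v) for v in idft(zero_filled))
    h = [0j] * n
    for it in range(iters):
        H = dft(h)
        residue = [0j] * n
        for k, v in zip(pilots, pilot_values):
            residue[k] = v - H[k]  # frequency-domain error on pilot subcarriers only
        update = idft(residue)     # back-project the residue to the time domain
        h = [hk + lam * u for hk, u in zip(h, update)]
        threshold = beta * math.exp(-alpha * it)
        h = [v if abs(v) > threshold else 0j for v in h]  # keep only strong taps
    return h
```

The decaying threshold admits the strongest taps first and progressively lower ones, which is how the method exploits channel sparsity without knowing the number of taps in advance.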

The local pivotal method (LPM) is a successful sampling method for drawing well-spread samples from discrete populations. We show how the LPM can be used to sample from arbitrary continuous distributions and thereby provide powerful variance reduction in general settings. The method creates an "automatic stratification" on any continuous distribution, of any dimension, and selects a "thin", well-spread sample. We demonstrate the simplicity, generality, and effectiveness of the LPM with various examples, including Monte Carlo estimation of integrals, option pricing, and stability estimation in nonlinear dynamical systems. Additionally, we show how the LPM can be combined with other variance reduction techniques, such as importance sampling, to achieve even greater variance reduction. To facilitate implementation, we provide a quick-start guide to using the LPM in MATLAB and R, including sample code that shows how to achieve variance reduction with just a few lines of code.
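The core of the LPM is a pairwise "pivotal" update: pick an undecided unit, find its nearest undecided neighbor, and randomly move probability mass between the two so that one of them is decided (0 or 1) while both expected inclusion probabilities are preserved. The sketch below follows the LPM1 variant with equal inclusion probabilities n/N. Applying it to N candidate points drawn from a continuous distribution is an assumed way to obtain a well-spread sample of size n; the paper's own construction and its MATLAB/R code may differ.

```python
import random

def local_pivotal_sample(points, n, rng=random):
    """Select a well-spread sample of size n from candidate points (LPM1 variant).

    Every point starts with inclusion probability n / N; each pivotal step moves
    one of two nearby undecided points to 0 or 1 while preserving their sum.
    """
    N = len(points)
    pi = [n / N] * N
    eps = 1e-9

    def undecided():
        return [i for i in range(N) if eps < pi[i] < 1 - eps]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b]))

    remaining = undecided()
    while len(remaining) > 1:
        i = rng.choice(remaining)
        j = min((k for k in remaining if k != i), key=lambda k: dist2(i, k))
        s = pi[i] + pi[j]
        if s < 1:
            # One unit takes all the mass s; E[pi_i] and E[pi_j] are unchanged.
            if rng.random() < pi[j] / s:
                pi[i], pi[j] = 0.0, s
            else:
                pi[i], pi[j] = s, 0.0
        else:
            # One unit is selected with certainty; the other keeps s - 1.
            if rng.random() < (1 - pi[j]) / (2 - s):
                pi[i], pi[j] = 1.0, s - 1
            else:
                pi[i], pi[j] = s - 1, 1.0
        remaining = undecided()
    for i in remaining:  # at most one leftover, caused by floating-point rounding
        pi[i] = round(pi[i])
    return [i for i in range(N) if pi[i] > 0.5]
```

Because neighbors compete for inclusion, two close candidates are rarely both selected, which is the source of the "automatic stratification". With equal inclusion probabilities, a Monte Carlo estimate of an expectation is then simply the average of the integrand over the selected points.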

Removing rain streaks from a single rainy image is challenging because the streaks vary spatially across the image. Although CNN-based methods have recently reported promising performance, they still have shortcomings such as data dependency and limited interpretability. We propose a single-image deraining algorithm that combines data-driven and model-based approaches. First, an improved weighted guided image filter (iWGIF) is used to extract high-frequency information from the input image and learn the rain streaks, avoiding interference from other image content. Next, the input image and the rain streaks are adaptively transferred from the image domain to the feature domain to learn features useful for high-quality deraining. Finally, networks with attention mechanisms restore high-quality images from the latent features. Experiments show that the proposed algorithm significantly outperforms state-of-the-art methods on both qualitative and quantitative measures.
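The first step separates a high-frequency layer, where thin rain streaks live, from a smooth base layer. The weighting used by iWGIF is not given in the abstract, so the sketch below swaps in the classic (unweighted) self-guided filter of He et al. and works on a 1D signal for brevity; the high-frequency layer is the input minus its filtered version. The function names and the `r` and `eps` defaults are assumptions.

```python
def box_mean(x, r):
    """Mean over a window of radius r (the window shrinks at the borders)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def guided_filter_1d(guide, src, r=2, eps=1e-2):
    """Classic guided filter on a 1D signal: q = mean(a) * guide + mean(b)."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ip = box_mean([g * p for g, p in zip(guide, src)], r)
    corr_ii = box_mean([g * g for g in guide], r)
    # Local linear coefficients: a = cov(I, p) / (var(I) + eps), b = mean(p) - a * mean(I).
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, mi, mp, cii in zip(corr_ip, mean_i, mean_p, corr_ii)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * g + mb for ma, g, mb in zip(mean_a, guide, mean_b)]

def high_frequency(signal, r=2, eps=1e-2):
    """High-frequency layer = input minus its self-guided, edge-preserving smoothing."""
    low = guided_filter_1d(signal, signal, r, eps)
    return [s - l for s, l in zip(signal, low)]
```

Flat regions (low local variance) get a ≈ 0 and are smoothed to their local mean, while strong structures (variance much larger than `eps`) get a ≈ 1 and are preserved; a thin spike such as a streak falls in between, so part of it lands in the high-frequency layer.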

Multi-scale design has been considered in recent image super-resolution (SR) works to explore hierarchical feature information. Existing multi-scale networks build elaborate blocks or progressive architectures for restoration. In general, larger-scale features concentrate more on structural and high-level information, while smaller-scale features contain plentiful details and texture information. From this point of view, information in larger-scale features can be derived from smaller-scale ones. Based on this observation, we build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR. Specifically, we consider the inter-scale correlations of features and devise a sequential multi-scale block (SMB) to progressively explore hierarchical information. The SMB is designed recursively, based on the linearity of convolution with restricted parameters. Besides sequential hierarchical learning, we also investigate the correlations among feature maps and devise a distribution transformation block (DTB). Unlike attention-based methods, the DTB treats the transformation as a normalization and jointly considers spatial and channel-wise correlations through scaling and bias factors. Experimental results show that SHSR achieves better quantitative performance and visual quality than state-of-the-art methods while using about 34% fewer parameters and 50% fewer MACs at a scaling factor of $\times4$. To boost performance without further training, the extended model SHSR$^+$ with self-ensemble achieves performance competitive with larger networks while using about 92% fewer parameters and 42% fewer MACs at a scaling factor of $\times4$.
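Self-ensemble improves a trained SR model without retraining by running it on geometric variants of the input and averaging the aligned outputs. The sketch below shows the common ×8 variant (four rotations, with and without a horizontal flip), popularized by EDSR; that SHSR$^+$ uses exactly this form is an assumption. `model` stands for any callable mapping a 2D list to a 2D list, such as an SR network.

```python
def rot90(img):
    """Rotate a 2D list 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def hflip(img):
    """Mirror a 2D list left-to-right."""
    return [row[::-1] for row in img]

def self_ensemble(model, img):
    """x8 geometric self-ensemble: run model on 8 flips/rotations, undo each, average."""
    outputs = []
    for flip in (False, True):
        x = hflip(img) if flip else img
        for k in range(4):
            y = model(x)
            # Undo the k rotations (4 - k more rotations give a full turn), then the flip.
            for _ in range((4 - k) % 4):
                y = rot90(y)
            outputs.append(hflip(y) if flip else y)
            x = rot90(x)
    h, w = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[i][j] for o in outputs) / len(outputs) for j in range(w)]
            for i in range(h)]
```

The cost is eight forward passes instead of one, which is why self-ensemble trades inference time, not parameters or training, for accuracy.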

Inspired by the human cognitive system, attention is a mechanism that imitates human cognitive awareness of specific information, amplifying critical details to focus on the essential aspects of data. Deep learning has employed attention to boost performance in many applications. Interestingly, the same attention design can suit different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be combined in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers cover only one of the many available attention categories, namely self-attention. We fill this gap and provide an in-depth survey of 50 attention techniques, categorizing them by their most prominent features. We begin by introducing the fundamental concepts behind the success of the attention mechanism. Next, we cover essentials such as the strengths and limitations of each attention category, their fundamental building blocks, basic formulations with primary usage, and applications specific to computer vision. We also discuss the challenges and open questions related to attention mechanisms in general. Finally, we recommend possible future research directions for deep attention.
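As a concrete reference point for the self-attention category mentioned above, the sketch below implements scaled dot-product attention, softmax(QKᵀ/√d)V, as used in transformers. It is only one of the many categories the survey covers, written with plain lists for readability.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for lists of vectors; returns (outputs, weights)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # how strongly this query attends to each key
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights
```

Each output is a convex combination of the value vectors, so attention "amplifies" whichever inputs score highest against the query; dividing by √d keeps the scores from saturating the softmax as the dimension grows.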

What matters for contrastive learning? We argue that contrastive learning relies heavily on informative features, or "hard" (positive or negative) features. Early works obtain more informative features by applying complex data augmentations and large batch sizes or memory banks, and recent works design elaborate sampling approaches to explore informative features. The key challenge in exploring such features is that the source multi-view data is generated by random data augmentations, so it is infeasible to always add useful information to the augmented data. Consequently, the informativeness of features learned from such augmented data is limited. In response, we propose to augment the features directly in latent space, thereby learning discriminative representations without a large amount of input data. We use a meta-learning technique to build an augmentation generator that updates its network parameters by considering the performance of the encoder. However, insufficient input data may lead the encoder to learn collapsed features and thereby cause the augmentation generator to malfunction. We therefore add a new margin-injected regularization to the objective function to prevent the encoder from learning a degenerate mapping. To contrast all features in one gradient back-propagation step, we adopt the proposed optimization-driven unified contrastive loss instead of the conventional contrastive loss. Empirically, our method achieves state-of-the-art results on several benchmark datasets.
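The abstract does not spell out its optimization-driven unified contrastive loss, so as a reference point the sketch below shows the conventional InfoNCE loss it replaces, together with a hypothetical latent-space augmentation that shifts a feature along a direction. In the described method that direction would come from the meta-learned augmentation generator; here it is just an input.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Conventional InfoNCE: -log(exp(s+/t) / (exp(s+/t) + sum_j exp(s-_j/t)))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

def augment_in_latent_space(feature, direction, scale):
    """Hypothetical feature-space augmentation: shift a latent feature along a direction."""
    return [f + scale * d for f, d in zip(feature, direction)]
```

The loss is small when the positive is much more similar to the anchor than every negative and large when a negative is "hard", that is, more similar than the positive, which is why informative hard features dominate the gradient.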

北京阿比特科技有限公司