We address the problem of synthesizing multi-view optical illusions: images that change appearance upon a transformation, such as a flip or rotation. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image. We then combine these noise estimates and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations, of which permutations are a subset. This leads to the idea of a visual anagram -- an image that changes appearance under some rearrangement of pixels. This includes rotations and flips, but also more exotic pixel permutations, such as a jigsaw rearrangement. Our approach also naturally extends to illusions with more than two views. We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method. Please see our project webpage for additional visualizations and results: //dangeng.github.io/visual_anagrams/
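For readers curious how the per-step combination might look in practice, here is a minimal sketch assuming a generic text-conditioned noise predictor `eps_model` and representing each view and its inverse as a pixel permutation acting on image tensors; it illustrates the averaging idea rather than the authors' implementation.

```python
import torch

def combined_noise_estimate(x_t, t, views, inverse_views, eps_model, prompt_embs):
    """Sketch of the multi-view noise combination described above.

    `views` / `inverse_views` are pixel permutations (e.g. identity, a flip,
    a jigsaw shuffle) applied as functions on image tensors; `eps_model(x, t, emb)`
    is a stand-in for a text-conditioned diffusion noise predictor, with one
    prompt embedding per view.
    """
    estimates = []
    for view, inv_view, emb in zip(views, inverse_views, prompt_embs):
        eps = eps_model(view(x_t), t, emb)   # estimate noise in the transformed view
        estimates.append(inv_view(eps))      # map the estimate back to the base view
    # Averaging is valid because each view is an orthogonal transform (a permutation),
    # so the inverted estimates live in the same coordinate frame.
    return torch.stack(estimates).mean(dim=0)


# Toy usage with identity and vertical-flip views and a dummy noise predictor.
if __name__ == "__main__":
    views = [lambda x: x, lambda x: torch.flip(x, dims=[-2])]
    inverse_views = views  # both transforms are their own inverse
    eps_model = lambda x, t, emb: torch.randn_like(x)
    x_t = torch.randn(1, 3, 64, 64)
    eps = combined_noise_estimate(x_t, 0, views, inverse_views, eps_model, [None, None])
    print(eps.shape)  # torch.Size([1, 3, 64, 64])
```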
Advanced change detection techniques primarily target image pairs of equal and high quality. However, variations in imaging conditions and platforms frequently lead to image pairs of differing quality: one image of high quality and the other of low quality. These disparities present significant challenges for semantically understanding image pairs and extracting change features, ultimately resulting in a notable decline in performance. To tackle this challenge, we introduce a training strategy grounded in knowledge distillation. The core idea is to leverage task knowledge acquired from high-quality image pairs to guide the model's learning when dealing with image pairs of differing quality. Additionally, we develop a hierarchical correlation distillation approach (involving self-correlation, cross-correlation, and global correlation). This approach compels the student model to replicate the correlations inherent in the teacher model, rather than focusing solely on individual features, ensuring effective knowledge transfer while preserving the student model's training flexibility.
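A minimal sketch of what correlation-based distillation could look like is given below, assuming flattened (B, N, C) feature maps from the student and teacher for both images of a change-detection pair; the exact definitions of self-, cross-, and global correlation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def correlation(a, b):
    """Cosine-similarity correlation matrix between two token sets (B, N, C) x (B, M, C)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return a @ b.transpose(1, 2)          # (B, N, M)

def correlation_distillation_loss(f_s1, f_s2, f_t1, f_t2):
    """Hedged sketch of correlation-based distillation for a bi-temporal pair.

    f_s1 / f_s2: student features of the two images, flattened to (B, N, C);
    f_t1 / f_t2: teacher features (trained on high-quality pairs), same shape.
    The student matches the teacher's correlation structure, not its raw features.
    """
    loss_self = (F.l1_loss(correlation(f_s1, f_s1), correlation(f_t1, f_t1)) +
                 F.l1_loss(correlation(f_s2, f_s2), correlation(f_t2, f_t2)))
    loss_cross = F.l1_loss(correlation(f_s1, f_s2), correlation(f_t1, f_t2))
    # One possible reading of "global" correlation: pool each feature map to a
    # single vector and compare correlations across the pooled vectors.
    g_s = torch.cat([f_s1.mean(1), f_s2.mean(1)], dim=0).unsqueeze(0)
    g_t = torch.cat([f_t1.mean(1), f_t2.mean(1)], dim=0).unsqueeze(0)
    loss_global = F.l1_loss(correlation(g_s, g_s), correlation(g_t, g_t))
    return loss_self + loss_cross + loss_global
```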
As a critical component of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is challenging due to the intricate motion interwoven throughout the video. In response, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. These semantics assist the Pixel Enhancer in understanding the recovered content and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatio-temporal contexts related to each instance. Furthermore, we devise a Semantics-Powered Attention Cross-Embedding (SPACE) block to bridge pixel-level features with semantic knowledge, composed of a Global Perspective Shifter (GPS) and an Instance-Specific Semantic Embedding Encoder (ISEE). Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on global semantics. The ISEE module then harnesses the attention mechanism to align adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to ease model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
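The GPS idea of conditioning per-channel affine parameters on a global semantic vector resembles feature-wise modulation; a minimal sketch under that assumption follows, where the layer sizes and activation choices are ours rather than the paper's.

```python
import torch
import torch.nn as nn

class GlobalPerspectiveShifter(nn.Module):
    """Hedged sketch of GPS-style modulation: a global semantic vector is mapped
    to per-channel affine parameters (gamma, beta) that modulate pixel features.
    The exact conditioning pathway in the SPACE block is an assumption here."""

    def __init__(self, semantic_dim, feat_channels):
        super().__init__()
        self.to_affine = nn.Sequential(
            nn.Linear(semantic_dim, feat_channels * 2),
            nn.SiLU(),
            nn.Linear(feat_channels * 2, feat_channels * 2),
        )

    def forward(self, feat, global_semantics):
        # feat: (B, C, H, W) pixel-level features, global_semantics: (B, semantic_dim)
        gamma, beta = self.to_affine(global_semantics).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta             # affine feature modulation


feat = torch.randn(2, 64, 32, 32)
sem = torch.randn(2, 256)
print(GlobalPerspectiveShifter(256, 64)(feat, sem).shape)  # torch.Size([2, 64, 32, 32])
```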
The European Green Deal aims to achieve climate neutrality by 2050, which demands improved emissions efficiency from the transportation industry. This study uses an agent-based simulation to analyze the sustainability impacts of shared autonomous shuttles. We forecast travel demand for 2050 and simulate regulatory interventions in the form of replacing private cars with a fleet of shared autonomous shuttles in specific areas. We derive driving-related emissions, energy consumption, and non-driving-related emissions to calculate life-cycle emissions. We observe reductions in life-cycle emissions of 0.4% to 9.6% and in energy consumption of 1.5% to 12.2%.
Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well studied in the literature. On the other hand, low-rank training, an alternative way to train low-rank CNNs from scratch, has so far received little attention. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is performed on the low-rank structure, bringing attractive benefits for practical applications. However, existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or the need to update full-size models during training. In this paper, we perform a systematic investigation of low-rank CNN training. By identifying the proper low-rank format and performance-improving strategies, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.
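To make the contrast with compression concrete, the sketch below shows one common way to train a convolution directly in low-rank form, so that only small factor layers are ever stored or updated; the specific low-rank format chosen by ELRT may differ.

```python
import torch
import torch.nn as nn

class LowRankConv2d(nn.Module):
    """Hedged sketch of training a convolution directly in low-rank form:
    a k x k conv into `rank` intermediate channels followed by a 1 x 1 conv,
    so the full-rank weight tensor never exists during training."""

    def __init__(self, in_ch, out_ch, kernel_size, rank, stride=1, padding=0):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, rank, kernel_size, stride=stride,
                                 padding=padding, bias=False)
        self.pointwise = nn.Conv2d(rank, out_ch, kernel_size=1, bias=True)

    def forward(self, x):
        return self.pointwise(self.spatial(x))


# A rank-8 replacement for a 64 -> 128, 3x3 convolution (73,728 full-rank weights).
layer = LowRankConv2d(64, 128, kernel_size=3, rank=8, padding=1)
print(sum(p.numel() for p in layer.parameters()))  # far fewer parameters
print(layer(torch.randn(1, 64, 56, 56)).shape)     # torch.Size([1, 128, 56, 56])
```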
The Vision Transformer (ViT) has performed remarkably well in various computer vision tasks. Nonetheless, owing to its massive number of parameters, ViT usually suffers from serious overfitting when the number of training samples is relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model compression, model binarization is potentially a good choice for addressing these problems. Compared with its full-precision counterpart, a binarized model replaces complex tensor multiplications with simple bit-wise operations and represents parameters and activations with only 1 bit, which addresses computational complexity and model size, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that existing binarization techniques designed for Convolutional Neural Networks (CNNs) do not transfer well to the binarization of ViT. We also find that the accuracy drop of the binarized ViT model is mainly due to information loss in the Attention module and the Value vectors. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to further improve the performance of the binarized model, we investigate the gradient calculation procedure in the binarization process and derive more appropriate gradient calculation equations for GSB to reduce the influence of gradient mismatch. The knowledge distillation technique is then introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization can limit the parameter search space during parameter updates while training a model....
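As an illustration of the general idea behind representing a tensor as a superposition of scaled binary components, the following sketch decomposes a Value tensor into several binary groups fitted residually, with a straight-through estimator for gradients. It is a generic multi-binary decomposition under our own assumptions, not the exact GSB formulation or its derived gradient equations.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator: the backward pass
    lets gradients through for inputs in [-1, 1] and zeroes them outside."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)


def group_superposition_binarize(x, num_groups=3):
    """Hedged sketch in the spirit of GSB: approximate a full-precision tensor
    as a superposition (sum) of several scaled binary tensors, each fitted to
    the residual left by the previous ones."""
    residual = x
    approx = torch.zeros_like(x)
    for _ in range(num_groups):
        alpha = residual.abs().mean()              # scaling factor for this group
        b = alpha * BinarizeSTE.apply(residual)    # one scaled binary "group"
        approx = approx + b
        residual = residual - b.detach()           # fit the next group to what is left
    return approx


v = torch.randn(2, 8, 64, requires_grad=True)      # e.g. a Value tensor
group_superposition_binarize(v).sum().backward()
print(v.grad.shape)  # gradients flow through the straight-through estimator
```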
Single image super-resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advances in recent years, challenges remain, particularly the limited receptive field caused by window-based self-attention. To address these issues, we introduce a group of auxiliary Adaptive Token Dictionaries into the SR Transformer and establish an ATD-SR method. The introduced token dictionary learns prior information from the training data and adapts the learned prior to the specific test image through an adaptive refinement step. The refinement strategy not only provides global information to all input tokens but also groups image tokens into categories. Based on these category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. Experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.
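A rough sketch of how image tokens might attend to a small learned token dictionary, and how the resulting similarities could yield token categories, is shown below; the dimensions, the adaptive refinement step, and the category-based self-attention itself are simplified assumptions rather than the ATD-SR design.

```python
import torch
import torch.nn as nn

class TokenDictionaryAttention(nn.Module):
    """Hedged sketch: each image token gathers global prior information from a
    learned dictionary via cross-attention, and its argmax similarity gives a
    category label that could drive category-based self-attention."""

    def __init__(self, dim, dict_size=64):
        super().__init__()
        self.dictionary = nn.Parameter(torch.randn(dict_size, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)

    def forward(self, tokens):
        # tokens: (B, N, C) image tokens
        q = self.to_q(tokens)
        k, v = self.to_kv(self.dictionary).chunk(2, dim=-1)   # (K, C) each
        sim = q @ k.t() / q.shape[-1] ** 0.5                  # (B, N, K)
        attn = sim.softmax(dim=-1)
        enhanced = tokens + attn @ v                          # inject dictionary prior
        categories = sim.argmax(dim=-1)                       # (B, N) token categories
        return enhanced, categories


x = torch.randn(1, 4096, 96)                 # 64x64 patch tokens of width 96
out, cats = TokenDictionaryAttention(96)(x)
print(out.shape, cats.shape)                 # (1, 4096, 96) and (1, 4096)
```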
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., Mamba, have shown great potential for long-sequence modeling. Building efficient and generic vision backbones purely upon SSMs is an appealing direction. However, representing visual data is challenging for SSMs because of the position sensitivity of visual data and the requirement of global context for visual understanding. In this paper, we show that the reliance of visual representation learning on self-attention is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation tasks, Vim achieves higher performance than well-established vision transformers such as DeiT, while also demonstrating significantly improved computation and memory efficiency. For example, Vim is 2.8$\times$ faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on images with a resolution of 1248$\times$1248. These results demonstrate that Vim can overcome the computation and memory constraints of performing Transformer-style understanding on high-resolution images, and that it has great potential to become the next-generation backbone for vision foundation models. Code is available at //github.com/hustvl/Vim.
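The bidirectional-scan idea can be illustrated independently of the Mamba internals: the sketch below runs a sequence model over the patch tokens forward and backward and fuses the two passes, using plain GRUs purely as stand-ins for the selective SSM so the example stays self-contained. The real Vim block also includes gating, convolutions, and the hardware-aware scan.

```python
import torch
import torch.nn as nn

class BidirectionalSSMBlock(nn.Module):
    """Hedged sketch of bidirectional sequence modeling over patch tokens.
    `ssm_fwd` / `ssm_bwd` are placeholders (GRUs) for the selective SSM."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ssm_fwd = nn.GRU(dim, dim, batch_first=True)
        self.ssm_bwd = nn.GRU(dim, dim, batch_first=True)
        self.proj = nn.Linear(dim * 2, dim)

    def forward(self, tokens):
        # tokens: (B, N, C) patch embeddings (position embeddings already added)
        x = self.norm(tokens)
        fwd, _ = self.ssm_fwd(x)                            # left-to-right scan
        bwd, _ = self.ssm_bwd(torch.flip(x, dims=[1]))      # right-to-left scan
        bwd = torch.flip(bwd, dims=[1])                     # restore token order
        return tokens + self.proj(torch.cat([fwd, bwd], dim=-1))


x = torch.randn(2, 196, 192)                 # 14x14 patches, DeiT-Ti width
print(BidirectionalSSMBlock(192)(x).shape)   # torch.Size([2, 196, 192])
```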
Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the object's appearance has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This work presents a real-world dataset for measuring the reconstruction and rendering of objects for relighting. To this end, we capture the environment lighting and ground-truth images of the same objects in multiple environments, allowing objects to be reconstructed from images taken in one environment and the quality of the rendered views to be quantified for the unseen lighting environments. Further, we introduce a simple baseline composed of off-the-shelf methods, test several state-of-the-art methods on the relighting task, and show that novel view synthesis is not a reliable proxy for measuring performance. Code and dataset are available at //github.com/isl-org/objects-with-lighting.
Answering complex questions about images is an ambitious goal for machine intelligence, requiring a joint understanding of images, text, and commonsense knowledge, as well as strong reasoning ability. Recently, multimodal Transformers have made great progress on the task of Visual Commonsense Reasoning (VCR) by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene or the interactions between objects, which are essential for answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs into commonsense reasoning. To exploit the scene graph structure, at the model level we propose a multihop graph transformer that regularizes attention interaction among hops. For pre-training, a scene-graph-aware pre-training method is proposed to leverage the structural knowledge extracted from the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs from textual annotations in a weakly supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with state-of-the-art methods and prove the efficacy of each proposed component.
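One plausible reading of hop-aware attention regularization is to restrict which object nodes may attend to each other based on their scene-graph distance; the sketch below implements only that masking idea and should not be read as the SGEITL architecture.

```python
import torch

def multihop_attention_mask(adj, num_hops):
    """Hedged sketch: nodes may attend to each other only if they are reachable
    within `num_hops` edges of the scene graph. adj is an (N, N) adjacency matrix."""
    reach = torch.eye(adj.shape[0], dtype=torch.bool) | adj.bool()
    hop = adj.bool()
    for _ in range(num_hops - 1):
        hop = (hop.float() @ adj.float()) > 0    # expand reachability by one hop
        reach = reach | hop
    return reach                                  # (N, N) boolean attention mask


def masked_attention(q, k, v, mask):
    """Standard scaled dot-product attention with the hop mask applied."""
    scores = q @ k.t() / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v


# Toy scene graph with 4 object nodes in a chain 0-1-2-3.
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
mask = multihop_attention_mask(adj, num_hops=2)
q = k = v = torch.randn(4, 16)
print(masked_attention(q, k, v, mask).shape)      # torch.Size([4, 16])
```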
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. FGIA targets the analysis of visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper, we present a systematic survey of these advances, in which we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas -- fine-grained image recognition and fine-grained image retrieval. In addition, we review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems that need further exploration by the community.