国产综合欧美日韩激情在线,一级无码免费视频,亚洲AV中文A无码AV接吻,欧美日本一二三区

Image restoration aims to reconstruct degraded images, e.g., denoising or deblurring. Existing works focus on designing task-specific methods and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectures for image restoration tasks. In this paper, we present Degradation-aware Visual Prompts, which encode various types of image degradation, e.g., noise and blur, into unified visual prompts. These degradation-aware prompts provide control over image processing and allow weighted combinations for customized image restoration. We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks. ProRes leverages the vanilla Vision Transformer (ViT) without any task-specific designs. Furthermore, the pre-trained ProRes can easily adapt to new tasks through efficient prompt tuning with only a few images. Without bells and whistles, ProRes achieves competitive performance compared to task-specific methods and experiments can demonstrate its ability for controllable restoration and adaptation for new tasks. The code and models will be released in \url{//github.com/leonmakise/ProRes}.

相關內容

圖像還原

關注 0

Integration · Chiplet · 設計 · state-of-the-art · 評論員 ·

2023 年 8 月 15 日

Monad: Towards Cost-effective Specialization for Chiplet-based Spatial Accelerators

Xiaochen Hao,Zijian Ding,Jieming Yin,Yuan Wang,Yun Liang

from arxiv, To be published in ICCAD 2023

Advanced packaging offers a new design paradigm in the post-Moore era, where many small chiplets can be assembled into a large system. Based on heterogeneous integration, a chiplet-based accelerator can be highly specialized for a specific workload, demonstrating extreme efficiency and cost reduction. To fully leverage this potential, it is critical to explore both the architectural design space for individual chiplets and different integration options to assemble these chiplets, which have yet to be fully exploited by existing proposals. This paper proposes Monad, a cost-aware specialization approach for chiplet-based spatial accelerators that explores the tradeoffs between PPA and fabrication costs. To evaluate a specialized system, we introduce a modeling framework considering the non-uniformity in dataflow, pipelining, and communications when executing multiple tensor workloads on different chiplets. We propose to combine the architecture and integration design space by uniformly encoding the design aspects for both spaces and exploring them with a systematic ML-based approach. The experiments demonstrate that Monad can achieve an average of 16% and 30% EDP reduction compared with the state-of-the-art chiplet-based accelerators, Simba and NN-Baton, respectively.

state-of-the-art · Learning · Prompt · 視頻描述生成（Video Caption） · 表示 ·

2023 年 8 月 15 日

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

Chaorui Deng,Qi Chen,Pengda Qin,Da Chen,Qi Wu

from arxiv, to be appeared in ICCV2023

In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain. A critical problem for them is how to effectively capture the rich semantics inside the video using the image encoder of CLIP. To tackle this, state-of-the-art methods adopt complex cross-modal modeling techniques to fuse the text information into video frame representations, which, however, incurs severe efficiency issues in large-scale retrieval systems as the video representations must be recomputed online for every text query. In this paper, we discard this problematic cross-modal fusion process and aim to learn semantically-enhanced representations purely from the video, so that the video representations can be computed offline and reused for different texts. Concretely, we first introduce a spatial-temporal "Prompt Cube" into the CLIP image encoder and iteratively switch it within the encoder layers to efficiently incorporate the global video semantics into frame representations. We then propose to apply an auxiliary video captioning objective to train the frame representations, which facilitates the learning of detailed video semantics by providing fine-grained guidance in the semantic space. With a naive temporal fusion strategy (i.e., mean-pooling) on the enhanced frame representations, we obtain state-of-the-art performances on three benchmark datasets, i.e., MSR-VTT, MSVD, and LSMDC.

泛化理論 · Learning · 類別 · 線性分類 · 線性的 ·

2023 年 8 月 15 日

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

Junhyeong Cho,Gilhyun Nam,Sungyeon Kim,Hunmin Yang,Suha Kwak

from arxiv, Accepted to ICCV 2023, Project Page: //promptstyler.github.io/

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts without using any images to deal with source-free domain generalization. The proposed method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located nearby their corresponding content features (from "[class]") in the joint vision-language space. After learning style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves the state of the art on PACS, VLCS, OfficeHome and DomainNet, even though it does not require any images for training.

圖 · Prompt · tuning · MoDELS · 學習器 ·

2023 年 8 月 15 日

SGL-PT: A Strong Graph Learner with Graph Prompt Tuning

Yun Zhu,Jianhao Guo,Siliang Tang

Recently, much exertion has been paid to design graph self-supervised methods to obtain generalized pre-trained models, and adapt pre-trained models onto downstream tasks through fine-tuning. However, there exists an inherent gap between pretext and downstream graph tasks, which insufficiently exerts the ability of pre-trained models and even leads to negative transfer. Meanwhile, prompt tuning has seen emerging success in natural language processing by aligning pre-training and fine-tuning with consistent training objectives. In this paper, we identify the challenges for graph prompt tuning: The first is the lack of a strong and universal pre-training task across sundry pre-training methods in graph domain. The second challenge lies in the difficulty of designing a consistent training objective for both pre-training and downstream tasks. To overcome above obstacles, we propose a novel framework named SGL-PT which follows the learning strategy ``Pre-train, Prompt, and Predict''. Specifically, we raise a strong and universal pre-training task coined as SGL that acquires the complementary merits of generative and contrastive self-supervised graph learning. And aiming for graph classification task, we unify pre-training and fine-tuning by designing a novel verbalizer-free prompting function, which reformulates the downstream task in a similar format as pretext task. Empirical results show that our method surpasses other baselines under unsupervised setting, and our prompt tuning method can greatly facilitate models on biological datasets over fine-tuning methods.

Graph Transformer · Attention · Microsoft Windows · 圖 · 變換 ·

2023 年 8 月 15 日

Graph-Segmenter: Graph Transformer with Boundary-aware Attention for Semantic Segmentation

Zizhang Wu,Yuanzhu Gan,Tianhao Xu,Fan Wang

The transformer-based semantic segmentation approaches, which divide the image into different regions by sliding windows and model the relation inside each window, have achieved outstanding success. However, since the relation modeling between windows was not the primary emphasis of previous work, it was not fully utilized. To address this issue, we propose a Graph-Segmenter, including a Graph Transformer and a Boundary-aware Attention module, which is an effective network for simultaneously modeling the more profound relation between windows in a global view and various pixels inside each window as a local one, and for substantial low-cost boundary adjustment. Specifically, we treat every window and pixel inside the window as nodes to construct graphs for both views and devise the Graph Transformer. The introduced boundary-aware attention module optimizes the edge information of the target objects by modeling the relationship between the pixel on the object's edge. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k and PASCAL Context) demonstrate that our proposed network, a Graph Transformer with Boundary-aware Attention, can achieve state-of-the-art segmentation performance.

多峰值 · 樣例 · contrastive · Learning · 對比學習 ·

2023 年 8 月 14 日

AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

Ziqi Zhou,Shengshan Hu,Minghui Li,Hangtao Zhang,Yechao Zhang,Hai Jin

from arxiv, This paper has been accepted by the ACM International Conference on Multimedia (ACM MM '23, October 29-November 3, 2023, Ottawa, ON, Canada)

Multimodal contrastive learning aims to train a general-purpose feature extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text data. This can greatly benefit various complex downstream tasks, including cross-modal image-text retrieval and image classification. Despite its promising prospect, the security issue of cross-modal pre-trained encoder has not been fully explored yet, especially when the pre-trained encoder is publicly available for commercial use. In this work, we propose AdvCLIP, the first attack framework for generating downstream-agnostic adversarial examples based on cross-modal pre-trained encoders. AdvCLIP aims to construct a universal adversarial patch for a set of natural images that can fool all the downstream tasks inheriting the victim cross-modal pre-trained encoder. To address the challenges of heterogeneity between different modalities and unknown downstream tasks, we first build a topological graph structure to capture the relevant positions between target samples and their neighbors. Then, we design a topology-deviation based generative adversarial network to generate a universal adversarial patch. By adding the patch to images, we minimize their embeddings similarity to different modality and perturb the sample distribution in the feature space, achieving unviersal non-targeted attacks. Our results demonstrate the excellent attack performance of AdvCLIP on two types of downstream tasks across eight datasets. We also tailor three popular defenses to mitigate AdvCLIP, highlighting the need for new defense mechanisms to defend cross-modal pre-trained encoders.

塊 · MoDELS · 求逆 · Extensibility · state-of-the-art ·

2023 年 8 月 14 日

Rethinking Mobile Block for Efficient Attention-based Models

Jiangning Zhang,Xiangtai Li,Jian Li,Liang Liu,Zhucun Xue,Boshen Zhang,Zhengkai Jiang,Tianxin Huang,Yabiao Wang,Chengjie Wang

This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance. Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized by attention-based studies. This work rethinks lightweight infrastructure from efficient IRB and effective components of Transformer from a unified perspective, extending CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMB) for lightweight model design. Following simple but effective design criterion, we deduce a modern Inverted Residual Mobile Block (iRMB) and build a ResNet-like Efficient MOdel (EMO) with only iRMB for down-stream tasks. Extensive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass equal-order CNN-/Attention-based models, while trading-off the parameter, efficiency, and accuracy well: running 2.8-4.0x faster than EdgeNeXt on iPhone14.

矩 · 變換 · MoDELS · Learning · Extensibility ·

2023 年 8 月 14 日

Knowing Where to Focus: Event-aware Transformer for Video Grounding

Jinhyun Jang,Jungin Park,Jin Kim,Hyeongjun Kwon,Kwanghoon Sohn

from arxiv, ICCV 2023. Code is available at //github.com/jinhyunj/EaTR

Recent DETR-based video grounding models have made the model directly predict moment timestamps without any hand-crafted components, such as a pre-defined proposal or non-maximum suppression, by learning moment queries. However, their input-agnostic moment queries inevitably overlook an intrinsic temporal structure of a video, providing limited positional information. In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. To this end, we present two levels of reasoning: 1) Event reasoning that captures distinctive event units constituting a given video using a slot attention mechanism; and 2) moment reasoning that fuses the moment queries with a given sentence through a gated fusion transformer layer and learns interactions between the moment queries and video-sentence representations to predict moment timestamps. Extensive experiments demonstrate the effectiveness and efficiency of the event-aware dynamic moment queries, outperforming state-of-the-art approaches on several video grounding benchmarks.

3D · 點云 · 設計 · Learning · Extensibility ·

2023 年 8 月 11 日

PersonalTailor: Personalizing 2D Pattern Design from 3D Garment Point Clouds

Sauradip Nag,Anran Qi,Xiatian Zhu,Ariel Shamir

from arxiv, Technical Report

Garment pattern design aims to convert a 3D garment to the corresponding 2D panels and their sewing structure. Existing methods rely either on template fitting with heuristics and prior assumptions, or on model learning with complicated shape parameterization. Importantly, both approaches do not allow for personalization of the output garment, which today has increasing demands. To fill this demand, we introduce PersonalTailor: a personalized 2D pattern design method, where the user can input specific constraints or demands (in language or sketch) for personal 2D panel fabrication from 3D point clouds. PersonalTailor first learns a multi-modal panel embeddings based on unsupervised cross-modal association and attentive fusion. It then predicts a binary panel masks individually using a transformer encoder-decoder framework. Extensive experiments show that our PersonalTailor excels on both personalized and standard pattern fabrication tasks.

目標檢測 · 學成 · 冪法 · 可辨認的 · 深度學習 ·

2018 年 9 月 6 日

Deep Learning for Generic Object Detection: A Survey

Li Liu,Wanli Ouyang,Xiaogang Wang,Paul Fieguth,Jie Chen,Xinwang Liu,Matti Pietik?inen

from arxiv, Submitted to IJCV, 30pages

Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.