Underwater image enhancement (UIE) is challenging because of the distinctive properties of the underwater environment, including low contrast, high turbidity, visual blurriness, and color distortion. In recent years, deep learning has quietly revolutionized many areas of scientific research, including UIE. However, existing deep learning-based UIE methods generally suffer from weak robustness and limited adaptability. In this paper, inspired by residual and attention mechanisms, we propose a more reliable and reasonable UIE network called RAUNE-Net, which employs residual learning of high-level features at the network's bottleneck and two aspects of attention manipulation in the down-sampling procedure. Furthermore, we collect and create two datasets specifically designed for evaluating UIE methods, which contain different types of underwater distortions and degradations. Experimental validation demonstrates that our method obtains promising objective performance and consistent visual results across various real-world underwater images compared to eight other UIE methods. Our example code and datasets are publicly available at //github.com/fansuregrin/RAUNE-Net.

Related content

INFORMS · Processing (programming language) · Controller · Extensibility · Guidance

December 18, 2023

DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization

Nisha Huang,Yuxin Zhang,Fan Tang,Chongyang Ma,Haibin Huang,Yong Zhang,Weiming Dong,Changsheng Xu

Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user. Unlike previous image-to-image transfer approaches, the text-guided stylization process provides users with a more precise and intuitive way to express the desired style. However, the huge discrepancy between cross-modal inputs and outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward CNN pipeline. In this paper, we present DiffStyler, a dual diffusion processing architecture to control the balance between the content and style of the diffused results. The cross-modal style information can be easily integrated as guidance step by step during the diffusion process. Furthermore, we propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image. We validate the proposed DiffStyler against baseline methods through extensive qualitative and quantitative experiments. Code is available at //github.com/haha-lisa/Diffstyler.

Offset · Performer · Estimation/Estimator · Online · Reducible

December 18, 2023

TMP: Temporal Motion Propagation for Online Video Super-Resolution

Zhengqiang Zhang,Ruihuang Li,Shi Guo,Yang Cao,Lei Zhang

Online video super-resolution (online-VSR) relies heavily on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging. Though much progress has been achieved, most existing online-VSR methods estimate the motion field of each frame separately to perform alignment, which is computationally redundant and ignores the fact that the motion fields of adjacent frames are correlated. In this work, we propose an efficient Temporal Motion Propagation (TMP) method, which leverages the continuity of the motion field to achieve fast pixel-level alignment among consecutive frames. Specifically, we first propagate the offsets from previous frames to the current frame, and then refine them in the neighborhood, which significantly reduces the matching space and speeds up the offset estimation process. Furthermore, to enhance the robustness of alignment, we perform spatial-wise weighting on the warped features, where the positions with more precise offsets are assigned higher importance. Experiments on benchmark datasets demonstrate that the proposed TMP method achieves leading online-VSR accuracy as well as inference speed. The source code of TMP can be found at //github.com/xtudbxk/TMP.
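
The propagate-then-refine idea described in the abstract can be illustrated with a toy one-dimensional sketch. This is not the TMP implementation: the signals, the sum-of-absolute-differences cost, and the names `match_cost` and `refine_offsets` are all hypothetical choices made for illustration. Each position starts from an offset inherited from the previous frame and searches only a small window around that guess instead of the full matching space.

```python
def match_cost(prev, curr, pos, offset, radius=1):
    """Sum of absolute differences between the patch around `pos` in `curr`
    and the patch around `pos + offset` in `prev` (indices are clamped)."""
    cost = 0.0
    for d in range(-radius, radius + 1):
        i = min(max(pos + d, 0), len(curr) - 1)
        j = min(max(pos + d + offset, 0), len(prev) - 1)
        cost += abs(curr[i] - prev[j])
    return cost


def refine_offsets(prev, curr, propagated, search=1):
    """Refine each propagated offset by testing only the candidates within
    `search` of the inherited guess; ties go to the candidate closest to it."""
    refined = []
    for pos, guess in enumerate(propagated):
        candidates = range(guess - search, guess + search + 1)
        best = min(
            candidates,
            key=lambda o: (match_cost(prev, curr, pos, o), abs(o - guess)),
        )
        refined.append(best)
    return refined


# Toy frames: `curr` is `prev` shifted right by two samples, so the true
# offset from `curr` back into `prev` is -2 wherever the signal has texture.
prev_frame = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
curr_frame = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0]
inherited = [-1] * len(curr_frame)  # hypothetical offsets from the last frame
offsets = refine_offsets(prev_frame, curr_frame, inherited)
```

With `search=1`, each position evaluates three candidates rather than every possible displacement, which is the reduction in matching space the abstract refers to. In flat regions the cost is uninformative, so the sketch simply keeps the inherited guess.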

MoDELS · Robot · Dataset · Learning · HTTPS

December 18, 2023

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration,Abhishek Padalkar,Acorn Pooley,Ajay Mandlekar,Ajinkya Jain,Albert Tung,Alex Bewley,Alex Herzog,Alex Irpan,Alexander Khazatsky,Anant Rai,Anikait Singh,Animesh Garg,Anthony Brohan,Antonin Raffin,Ayzaan Wahid,Ben Burgess-Limerick,Beomjoon Kim,Bernhard Schölkopf,Brian Ichter,Cewu Lu,Charles Xu,Chelsea Finn,Chenfeng Xu,Cheng Chi,Chenguang Huang,Christine Chan,Chuer Pan,Chuyuan Fu,Coline Devin,Danny Driess,Deepak Pathak,Dhruv Shah,Dieter Büchler,Dmitry Kalashnikov,Dorsa Sadigh,Edward Johns,Federico Ceola,Fei Xia,Freek Stulp,Gaoyue Zhou,Gaurav S. Sukhatme,Gautam Salhotra,Ge Yan,Giulio Schiavi,Gregory Kahn,Hao Su,Hao-Shu Fang,Haochen Shi,Heni Ben Amor,Henrik I Christensen,Hiroki Furuta,Homer Walke,Hongjie Fang,Igor Mordatch,Ilija Radosavovic,Isabel Leal,Jacky Liang,Jad Abou-Chakra,Jaehyung Kim,Jan Peters,Jan Schneider,Jasmine Hsu,Jeannette Bohg,Jeffrey Bingham,Jiajun Wu,Jialin Wu,Jianlan Luo,Jiayuan Gu,Jie Tan,Jihoon Oh,Jitendra Malik,Jonathan Booher,Jonathan Tompson,Jonathan Yang,Joseph J. Lim,João Silvério,Junhyek Han,Kanishka Rao,Karl Pertsch,Karol Hausman,Keegan Go,Keerthana Gopalakrishnan,Ken Goldberg,Kendra Byrne,Kenneth Oslund,Kento Kawaharazuka,Kevin Zhang,Krishan Rana,Krishnan Srinivasan,Lawrence Yunliang Chen,Lerrel Pinto,Li Fei-Fei,Liam Tan,Lionel Ott,Lisa Lee,Masayoshi Tomizuka,Max Spero,Maximilian Du,Michael Ahn,Mingtong Zhang,Mingyu Ding,Mohan Kumar Srirama,Mohit Sharma,Moo Jin Kim,Naoaki Kanazawa,Nicklas Hansen,Nicolas Heess,Nikhil J Joshi,Niko Suenderhauf,Norman Di Palo,Nur Muhammad Mahi Shafiullah,Oier Mees,Oliver Kroemer,Pannag R Sanketi,Paul Wohlhart,Peng Xu,Pierre Sermanet,Priya Sundaresan,Quan Vuong,Rafael Rafailov,Ran Tian,Ria Doshi,Roberto Martín-Martín,Russell Mendonca,Rutav Shah,Ryan Hoque,Ryan Julian,Samuel Bustamante,Sean Kirmani,Sergey Levine,Sherry Moore,Shikhar Bahl,Shivin Dass,Shubham Sonawani,Shuran Song,Sichun Xu,Siddhant Haldar,Simeon Adebola,Simon Guist,Soroush Nasiriany,Stefan Schaal,Stefan Welker,Stephen Tian,Sudeep Dasari,Suneel Belkhale,Takayuki Osa,Tatsuya Harada,Tatsuya Matsushima,Ted Xiao,Tianhe Yu,Tianli Ding,Todor Davchev,Tony Z. Zhao,Travis Armstrong,Trevor Darrell,Vidhi Jain,Vincent Vanhoucke,Wei Zhan,Wenxuan Zhou,Wolfram Burgard,Xi Chen,Xiaolong Wang,Xinghao Zhu,Xuanlin Li,Yao Lu,Yevgen Chebotar,Yifan Zhou,Yifeng Zhu,Ying Xu,Yixuan Wang,Yonatan Bisk,Yoonyoung Cho,Youngwoon Lee,Yuchen Cui,Yueh-Hua Wu,Yujin Tang,Yuke Zhu,Yunzhu Li,Yusuke Iwasawa,Yutaka Matsuo,Zhuo Xu,Zichen Jeff Cui

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website, //robotics-transformer-x.github.io.

3D · Latent · Learning · Extensibility · INFORMS

December 18, 2023

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation

Hui Fu,Zeqing Wang,Ke Gong,Keze Wang,Tianshui Chen,Haojie Li,Haifeng Zeng,Wenxiong Kang

from arxiv, 7 pages, 6 figures, accepted by AAAI-24

Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style. However, existing works primarily focus on achieving precise lip synchronization while neglecting to model the subject-specific speaking style, often resulting in unrealistic facial animations. To the best of our knowledge, this work makes the first attempt to explore the coupled information between the speaking style and the semantic content in facial motions. Specifically, we introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding and leads to a more realistic synthesis of speech-driven facial animations. Subsequently, we propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions by building two latent spaces for style and content, respectively. Moreover, to facilitate disentangled representation learning, we introduce four well-designed constraints: an auxiliary style classifier, an auxiliary inverse classifier, a content contrastive loss, and a pair of latent cycle losses, which can effectively contribute to the construction of the identity-related style space and semantic-related content space. Extensive qualitative and quantitative experiments conducted on three publicly available datasets demonstrate that our approach outperforms state-of-the-art methods and is capable of capturing diverse speaking styles for speech-driven 3D facial animation. The source code and supplementary video are publicly available at: //zeqing-wang.github.io/Mimic/

Segment Anything · MoDELS · Integration · Boosting (a way to accelerate model training) · Mask

December 17, 2023

SAM-Deblur: Let Segment Anything Boost Image Deblurring

Siwei Li,Mingxuan Liu,Yating Zhang,Shu Chen,Haoxiang Li,Zifei Dou,Hong Chen

from arxiv, Accepted to ICASSP 2024

Image deblurring is a critical task in the field of image restoration, aiming to eliminate blurring artifacts. However, the challenge of addressing non-uniform blurring leads to an ill-posed problem, which limits the generalization performance of existing deblurring models. To solve the problem, we propose a framework, SAM-Deblur, that integrates prior knowledge from the Segment Anything Model (SAM) into the deblurring task for the first time. In particular, SAM-Deblur is divided into three stages. First, we preprocess the blurred images, obtain segment masks via SAM, and propose a mask dropout method for training to enhance model robustness. Then, to fully leverage the structural priors generated by SAM, we propose a Mask Average Pooling (MAP) unit specifically designed to average SAM-generated segmented areas, serving as a plug-and-play component which can be seamlessly integrated into existing deblurring networks. Finally, we feed the fused features generated by the MAP unit into the deblurring model to obtain a sharp image. Experimental results on the RealBlurJ, ReloBlur, and REDS datasets reveal that incorporating our methods improves GoPro-trained NAFNet's PSNR by 0.05, 0.96, and 7.03, respectively. The project page is available on GitHub at //hplqaq.github.io/projects/sam-deblur (HPLQAQ/SAM-Deblur).
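
As a rough illustration of what averaging over segmented areas means, the sketch below replaces every pixel inside a mask with the mean of that mask's region. It is only a guess at the general idea, not the authors' MAP unit: the single-channel image, the non-overlapping binary masks, the choice to leave uncovered pixels unchanged, and the function name `mask_average_pooling` are all assumptions.

```python
def mask_average_pooling(image, masks):
    """Average a single-channel image over each segment mask.

    image: 2D list of floats.
    masks: list of 2D lists of 0/1 with the same shape as `image`
           (assumed not to overlap).
    Pixels covered by a mask take the mean value of that mask's region;
    pixels covered by no mask keep their original value (an assumption).
    """
    height, width = len(image), len(image[0])
    pooled = [row[:] for row in image]
    for mask in masks:
        coords = [
            (r, c)
            for r in range(height)
            for c in range(width)
            if mask[r][c]
        ]
        if not coords:
            continue  # an empty mask contributes nothing
        mean = sum(image[r][c] for r, c in coords) / len(coords)
        for r, c in coords:
            pooled[r][c] = mean
    return pooled


# Toy 2x2 image with one mask covering the top row.
toy_image = [[1.0, 3.0], [5.0, 7.0]]
toy_masks = [[[1, 1], [0, 0]]]
pooled_image = mask_average_pooling(toy_image, toy_masks)
```

The result is piecewise constant inside each segment, which is one way a segmentation prior could expose region structure to a downstream network.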

CLIP · Inference · Extensibility · MoDELS · AIM

December 17, 2023

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

Feng Wang,Jieru Mei,Alan Yuille

Recent advances in contrastive language-image pretraining (CLIP) have demonstrated strong capabilities in zero-shot classification by aligning visual representations with target text embeddings at the image level. However, in dense prediction tasks, CLIP often struggles to localize visual features within an image and fails to give accurate pixel-level predictions, which prevents it from functioning as a generalized visual foundation model. In this work, we aim to enhance CLIP's potential for semantic segmentation with minimal modifications to its pretrained models. By rethinking self-attention, we surprisingly find that CLIP can adapt to dense prediction tasks by simply introducing a novel Correlative Self-Attention (CSA) mechanism. Specifically, we replace the traditional self-attention block in the last layer of the CLIP vision encoder with our CSA module and reuse its pretrained projection matrices of query, key, and value, leading to a training-free adaptation approach for CLIP's zero-shot semantic segmentation. Extensive experiments show the advantage of CSA: we obtain a 38.2% average zero-shot mIoU across eight semantic segmentation benchmarks highlighted in this paper, significantly outperforming the existing SoTA's 33.9% and the vanilla CLIP's 14.1%.
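
The abstract does not spell out how CSA computes its attention map, so the following is only one plausible reading, stated here as an assumption: attention scores come from correlating each projection with itself (query with query, key with key) instead of query with key, and the two softmax maps are summed before being applied to the values. The helper names and the tiny inputs are hypothetical, and the code is written in plain Python for readability.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_correlation_attention(proj, scale):
    """Softmax over the scaled similarity of a projection with itself."""
    scores = matmul(proj, transpose(proj))
    return [softmax([s * scale for s in row]) for row in scores]


def correlative_self_attention(x, w_q, w_k, w_v):
    """Assumed CSA variant: sum of query-query and key-key attention maps,
    applied to the values. Existing projection matrices are reused as-is."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    attn_q = self_correlation_attention(q, scale)
    attn_k = self_correlation_attention(k, scale)
    # Each row of the summed map totals 2; this sketch does not renormalize.
    attn = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(attn_q, attn_k)]
    return matmul(attn, v)


# Two tokens with identity projections, purely for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
csa_out = correlative_self_attention(tokens, identity, identity, identity)
```

Because a token is always most similar to itself under self-correlation, each output row is dominated by that token's own value, which fits the abstract's motivation of better localization for dense prediction.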

Estimation/Estimator · Maximum a posteriori · Vector space · State estimation · Maximum a posteriori estimation

December 16, 2023

Stein-MAP: A Sequential Variational Inference Framework for Maximum A Posteriori Estimation

Min-Won Seo,Solmaz S. Kia

from arxiv, 13 pages

State estimation poses substantial challenges in robotics, often involving encounters with multimodality in real-world scenarios. To address these challenges, it is essential to calculate Maximum a posteriori (MAP) sequences from joint probability distributions of latent states and observations over time. However, it generally involves a trade-off between approximation errors and computational complexity. In this article, we propose a new method for MAP sequence estimation called Stein-MAP, which effectively manages multimodality with fewer approximation errors while significantly reducing computational and memory burdens. Our key contribution lies in the introduction of a sequential variational inference framework designed to handle temporal dependencies among transition states within dynamical system models. The framework integrates Stein's identity from probability theory and reproducing kernel Hilbert space (RKHS) theory, enabling computationally efficient MAP sequence estimation. As a MAP sequence estimator, Stein-MAP boasts a computational complexity of O(N), where N is the number of particles, in contrast to the O(N^2) complexity of the Viterbi algorithm. The proposed method is empirically validated through real-world experiments focused on range-only (wireless) localization. The results demonstrate a substantial enhancement in state estimation compared to existing methods. A remarkable feature of Stein-MAP is that it can attain improved state estimation with only 40 to 50 particles, as opposed to the 1000 particles that the particle filter or its variants require.
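
For context on the baseline named in the abstract, here is a minimal sketch of the Viterbi algorithm computing a MAP state sequence for a small discrete hidden Markov model. It is not part of Stein-MAP; the two-state model and its probabilities are a textbook-style toy chosen for illustration. The inner double loop over current and previous states is where the quadratic cost per time step comes from.

```python
def viterbi(init, trans, emit, observations):
    """Return the most probable hidden state sequence for a discrete HMM.

    init[s]      : prior probability of starting in state s
    trans[p][s]  : probability of moving from state p to state s
    emit[s][o]   : probability of observing symbol o in state s
    """
    n_states = len(init)
    score = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev_best, new_score = [], []
        for s in range(n_states):
            # Quadratic step: every state checks every possible predecessor.
            best_prev = max(range(n_states), key=lambda p: score[p] * trans[p][s])
            prev_best.append(best_prev)
            new_score.append(score[best_prev] * trans[best_prev][s] * emit[s][obs])
        back.append(prev_best)
        score = new_score
    # Trace the best path backwards from the most probable final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    path.reverse()
    return path


# Toy model: states 0 and 1, observation symbols 0, 1, 2 (all hypothetical).
init_probs = [0.6, 0.4]
trans_probs = [[0.7, 0.3], [0.4, 0.6]]
emit_probs = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
map_path = viterbi(init_probs, trans_probs, emit_probs, [0, 1, 2])
```

In a continuous state space the discrete states would have to be replaced by grid cells or particles, which is the setting in which the abstract contrasts a linear cost in the number of particles with Viterbi's quadratic cost.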

Learning · Performer · MoDELS · Class · Knowledge

February 7, 2023

Deep Class-Incremental Learning: A Survey

Da-Wei Zhou,Qi-Wei Wang,Zhi-Hong Qi,Han-Jia Ye,De-Chuan Zhan,Ziwei Liu

from arxiv, Code is available at //github.com/zhoudw-zdw/CIL_Survey/

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. For example, a robot needs to understand new instructions, and an opinion monitoring system should analyze emerging topics every day. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier among all seen classes. Correspondingly, when the model is trained directly on new class instances, a fatal problem occurs: the model tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we comprehensively survey recent advances in deep class-incremental learning and summarize these methods from three aspects, i.e., data-centric, model-centric, and algorithm-centric. We also provide a rigorous and unified evaluation of 16 methods in benchmark image classification tasks to find out the characteristics of different algorithms empirically. Furthermore, we notice that the current comparison protocol ignores the influence of memory budget in model storage, which may result in unfair comparison and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures. The source code to reproduce these evaluations is available at //github.com/zhoudw-zdw/CIL_Survey/

Extensibility · Noise · Performer · state-of-the-art · Learned

June 30, 2021

Affective Image Content Analysis: Two Decades Review and New Perspectives

Sicheng Zhao,Xingxu Yao,Jufeng Yang,Guoli Jia,Guiguang Ding,Tat-Seng Chua,Björn W. Schuller,Kurt Keutzer

from arxiv, Accepted by IEEE TPAMI

Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of the datasets available for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss some challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.

Neural Networks · Graph · Networks · Taxonomy · Graph neural networks

December 16, 2020

Graph Neural Networks: Taxonomy, Advances and Trends

Yu Zhou,Haixia Zheng,Xin Huang

from arxiv, 50 pages, 7 figures

Graph neural networks provide a powerful toolkit for embedding real-world graphs into low-dimensional spaces according to specific tasks. Several surveys on this topic already exist. However, they usually emphasize different angles, so readers cannot see a panorama of graph neural networks. This survey aims to overcome this limitation and provide a comprehensive review of graph neural networks. First, we provide a novel taxonomy for graph neural networks, and then refer to up to 400 relevant works to show the panorama of the field. All of them are classified into the corresponding categories. To drive graph neural networks into a new stage, we summarize four future research directions for overcoming the challenges the field faces. We expect that more and more scholars will be able to understand and exploit graph neural networks and use them in their research communities.