
Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. A preprocessing step is therefore essential to eliminate noise while preserving the text and key features of documents. In this paper, we propose NAF-DPM, a novel generative framework based on a diffusion probabilistic model (DPM) designed to restore the original quality of degraded documents. While DPMs are recognized for their high-quality generated images, they are also known for their long inference time. To mitigate this problem, we equip the DPM with an efficient nonlinear activation-free (NAF) network and use a fast ordinary differential equation solver as the sampler, which can converge in a few iterations. To better preserve text characters, we introduce an additional differentiable module based on convolutional recurrent neural networks, simulating the behavior of an OCR system during training. Experiments conducted on various datasets showcase the superiority of our approach, achieving state-of-the-art performance in terms of pixel-level and perceptual similarity metrics. Furthermore, the results demonstrate a notable reduction in the character errors made by OCR systems when transcribing real-world document images enhanced by our framework. Code and pre-trained models are available at //github.com/ispamm/NAF-DPM.
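The abstract reports a reduction in OCR character errors but does not say how they are scored. A common choice is the character error rate (CER): the edit distance between the reference and the OCR output, divided by the reference length. The sketch below assumes that definition; the function names are illustrative and are not taken from the NAF-DPM code base.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by the reference length (assumed CER definition)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, an OCR system that reads "document" as "docurnent" on a degraded scan makes two character edits against an eight-character reference, so `character_error_rate("document", "docurnent")` is 0.25.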

Related content

MoDELS · Image denoising · Denoising · Correlation coefficient · INFORMS

May 23, 2024

SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising

Guanyiman Fu,Fengchao Xiong,Jianfeng Lu,Jun Zhou,Yuntao Qian

Denoising is a crucial preprocessing procedure for hyperspectral images (HSIs) due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these kinds of domain knowledge separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more in-depth long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. The SSUMamba can exploit complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Zigzag Scan (SSAZS) strategy for HSIs, which helps exploit the continuous information flow in multiple directions of 3-D characteristics within HSIs. Experimental results demonstrate that our method outperforms comparison methods. The source code is available at //github.com/lronkitty/SSUMamba.

SimPLe · Transformation · MoDELS · Guidance · Range

May 23, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Shiqi Yang,Zhi Zhong,Mengjie Zhao,Shusuke Takahashi,Masato Ishii,Takashi Shibuya,Yuki Mitsufuji

from arxiv, 10 pages

In recent years, with their realistic generation results and wide range of personalized applications, diffusion-based generative models have gained huge attention in both visual and audio generation. Compared with the considerable advances in text2image and text2audio generation, research in audio2visual and visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to huge large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, which has not been fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space and is trained in a mask-denoising manner. After training, classifier-free guidance can be deployed off the shelf to achieve better performance, without any extra training or modification. Since the transformer model is modality-symmetric, it can also be directly deployed for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at //docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ
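The classifier-free guidance step mentioned in the abstract can be sketched for a discrete-token model as follows. This is a minimal illustration, not the authors' implementation: it assumes the model yields one logit vector with the conditioning input and one without it, and it uses the convention `uncond + scale * (cond - uncond)`; other papers parameterise the scale differently.

```python
import math


def guided_logits(cond, uncond, scale):
    """Extrapolate from the unconditional logits towards the conditional ones.

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 strengthens the effect of the condition.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over a three-token codebook for one masked position.
cond = [2.0, 0.5, 0.0]    # with the image (or audio) condition
uncond = [1.0, 0.5, 0.0]  # condition dropped
probs = softmax(guided_logits(cond, uncond, scale=3.0))
```

Because guidance only combines two forward passes of the same network at sampling time, it needs no extra training, which matches the "off the shelf" claim in the abstract.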

INFORMS · Graph · Estimation/estimator · Layer · INTERACT

May 23, 2024

Quantifying Multivariate Graph Dependencies: Theory and Estimation for Multiplex Graphs

Anda Skeja,Sofia C. Olhede

Multiplex graphs, characterised by their layered structure, exhibit informative interdependencies within layers that are crucial for understanding complex network dynamics. Quantifying the interaction and shared information among these layers is challenging due to the non-Euclidean structure of graphs. Our paper introduces a comprehensive theory of multivariate information measures for multiplex graphs. We introduce graphon mutual information for pairs of graphs and expand this to graphon interaction information for three or more graphs, including their conditional variants. We then define graphon total correlation and graphon dual total correlation, along with their conditional forms, and introduce graphon $O$-information. We discuss and quantify the concepts of synergy and redundancy in graphs for the first time, introduce consistent nonparametric estimators for these multivariate graphon information-theoretic measures, and provide their convergence rates. We also conduct a simulation study to illustrate our theoretical findings and demonstrate the relationship between the introduced measures, multiplex graph structure, and higher-order interdependencies. Real-world applications further show the utility of our estimators in revealing shared information and dependence structures in real-world multiplex graphs. This work not only answers fundamental questions about information sharing across multiple graphs but also sets the stage for advanced pattern analysis in complex networks.
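The graphon measures themselves are not defined in the abstract, so the sketch below only illustrates the classical discrete-variable quantities they extend: total correlation, dual total correlation, and $O$-information (their difference). The joint distribution is represented as a dictionary from outcome tuples to probabilities; all names are illustrative.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, keep):
    """Marginalise a joint pmf onto the variable indices listed in `keep`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)


def total_correlation(pmf, n):
    """Sum of the single-variable entropies minus the joint entropy."""
    return sum(entropy(marginal(pmf, [i])) for i in range(n)) - entropy(pmf)


def dual_total_correlation(pmf, n):
    """Joint entropy minus the sum of H(X_i | all other variables)."""
    h_joint = entropy(pmf)
    residual = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        residual += h_joint - entropy(marginal(pmf, others))  # H(X_i | X_-i)
    return h_joint - residual


def o_information(pmf, n):
    """Positive when redundancy dominates, negative when synergy dominates."""
    return total_correlation(pmf, n) - dual_total_correlation(pmf, n)


# Z = X xor Y with fair, independent X and Y: a purely synergistic triple.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# Three copies of one fair bit: a purely redundant triple.
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
```

Here `o_information(xor, 3)` evaluates to -1 bit and `o_information(copy, 3)` to +1 bit, which is the synergy versus redundancy distinction the abstract refers to.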

End-to-end · Optimizer · NeRF · Representation · Reducible

May 23, 2024

JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

Zihan Zheng,Houqiang Zhong,Qiang Hu,Xiaoyun Zhang,Li Song,Ya Zhang,Yanfeng Wang

from arxiv, 8 pages, 5 figures

Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to extend it to volumetric videos. However, rendering dynamic and long-sequence radiance fields remains challenging due to the significant amount of data required to represent volumetric videos. In this paper, we propose a novel end-to-end joint optimization scheme for dynamic NeRF representation and compression, called JointRF, which achieves significantly improved quality and compression efficiency compared with previous methods. Specifically, JointRF employs a compact residual feature grid and a coefficient feature grid to represent the dynamic NeRF. This representation handles large motions without compromising quality while concurrently diminishing temporal redundancy. We also introduce a sequential feature compression subnetwork to further reduce spatial-temporal redundancy. Finally, the representation and compression subnetworks are trained jointly, end to end, within JointRF. Extensive experiments demonstrate that JointRF can achieve superior compression performance across various datasets.

MoDELS · Image segmentation · Learning · Model evaluation · Loss

May 23, 2024

DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

Yitong Yang,Xinli Xu,Haigen Hu,Haixia Long,Qianwei Zhou,Qiu Guan

from arxiv, 14 pages, 2 figures

Despite the recent progress in medical image segmentation with scribble-based annotations, the segmentation results of most models are still not robust and generalizable enough in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to model predictive uncertainty and improve the reliability of medical image segmentation. However, directly applying EDL to scribble-supervised medical image segmentation faces a tradeoff between accuracy and reliability. To address the challenge, we propose a novel framework called Dual-Branch Evidential Deep Learning (DuEDL). Firstly, the decoder of the segmentation network is changed to two different branches, and the evidence of the two branches is fused to generate high-quality pseudo-labels. Then the framework applies partial evidence loss and two-branch consistent loss for joint training of the model to adapt to the scribble supervision learning. The proposed method was tested on two cardiac datasets: ACDC and MSCMRseg. The results show that our method significantly enhances the reliability and generalization ability of the model without sacrificing accuracy, outperforming state-of-the-art baselines. The code is available at //github.com/Gardnery/DuEDL.
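As background for how EDL models predictive uncertainty, the sketch below shows the usual subjective-logic reading of per-class evidence: non-negative evidence parameterises a Dirichlet distribution, from which belief masses and a single uncertainty mass follow. It is an assumption that DuEDL builds on this standard form; the branch fusion and the partial evidence loss described in the abstract are not reproduced here, and the function name is illustrative.

```python
def dirichlet_opinion(evidence):
    """Turn per-class evidence for one pixel into beliefs and uncertainty.

    Assumes the common EDL parameterisation alpha_k = evidence_k + 1.
    Returns (belief masses, uncertainty mass, expected class probabilities).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                       # Dirichlet strength S
    belief = [e / strength for e in evidence]   # b_k = e_k / S
    uncertainty = num_classes / strength        # u = K / S
    expected = [a / strength for a in alpha]    # mean of the Dirichlet
    return belief, uncertainty, expected


# A pixel with no evidence is maximally uncertain; strong evidence for
# one class shrinks the uncertainty mass.
no_evidence = dirichlet_opinion([0.0, 0.0])
confident = dirichlet_opinion([8.0, 0.0])
```

The belief masses and the uncertainty mass always sum to one, so a pixel far from any scribble can be flagged as unreliable instead of being forced into a confident class.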

MoDELS · Performer · state-of-the-art · HTTPS · Image restoration

May 21, 2024

Ship in Sight: Diffusion Models for Ship-Image Super Resolution

Luigi Sigillo,Riccardo Fosco Gramaccioni,Alessandro Nicolosi,Danilo Comminiello

from arxiv, Accepted at 2024 International Joint Conference on Neural Networks (IJCNN)

In recent years, remarkable advancements have been achieved in the field of image generation, primarily driven by the escalating demand for high-quality outcomes across various image generation subtasks, such as inpainting, denoising, and super resolution. A major effort is devoted to exploring the application of super-resolution techniques to enhance the quality of low-resolution images. In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance. We investigate the opportunity given by the growing interest in text-to-image diffusion models, taking advantage of the prior knowledge that such foundation models have already learned. In particular, we present a diffusion-model-based architecture that leverages text conditioning during training while being class-aware, to best preserve the crucial details of the ships during the generation of the super-resolved image. Given the specificity of this task and the scarce availability of off-the-shelf data, we also introduce a large labeled ship dataset scraped from online ship images, mostly from the ShipSpotting website (www.shipspotting.com). Our method achieves more robust results than other deep learning models previously employed for super resolution, as shown by the multiple experiments performed. Moreover, we investigate how this model can benefit downstream tasks, such as classification and object detection, thus emphasizing practical implementation in a real-world scenario. Experimental results show the flexibility, reliability, and impressive performance of the proposed framework over state-of-the-art methods for different tasks. The code is available at: //github.com/LuigiSigillo/ShipinSight.

INFORMS · Edge · Neural tensor network · Reducible · Networking

May 21, 2024

Edge Information Hub-Empowered 6G NTN: Latency-Oriented Resource Orchestration and Configuration

Yueshan Lin,Wei Feng,Yunfei Chen,Ning Ge,Zhiyong Feng,Yue Gao

Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance. Nevertheless, the communications and computing in MEC-enabled NTNs are strongly coupled, which complicates the system design. In this paper, an edge information hub (EIH) that incorporates communication, computing and storage capabilities is proposed to synergize communication and computing and enable systematic design. We first address the joint data scheduling and resource orchestration problem to minimize the latency for uploading sensing data. The problem is solved using an optimal resource orchestration algorithm. On that basis, we propose the principles for resource configuration of the EIH considering payload constraints on size, weight and energy supply. Simulation results demonstrate the superiority of our proposed scheme in reducing the overall upload latency, thus enabling quick emergency rescue.

Anomaly detection · Prompt · Normalized · MoDELS · Learning

May 20, 2024

Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

Zhichao Sun,Yuliang Gu,Yepeng Liu,Zerui Zhang,Zhou Zhao,Yongchao Xu

from arxiv, MICCAI 2024 Early Accept

Anomaly detection in chest X-rays is a critical task. Most methods model the distribution of normal images and then regard any significant deviation from that distribution as an anomaly. Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks. In this paper, we aim to explore the potential of CLIP-based methods for anomaly detection in chest X-rays. Considering the discrepancy between the CLIP pre-training data and the task-specific data, we propose a position-guided prompt learning method. Specifically, inspired by the fact that experts diagnose chest X-rays by carefully examining distinct lung regions, we propose learnable position-guided text and image prompts to adapt the task data to the frozen pre-trained CLIP-based model. To enhance the model's discriminative capability, we propose a novel structure-preserving anomaly synthesis method within chest X-rays during the training process. Extensive experiments on three datasets demonstrate that our proposed method outperforms some state-of-the-art methods. The code of our implementation is available at //github.com/sunzc-sunny/PPAD.

Correlation coefficient · Example · Learning · Representation learning · Output

May 20, 2024

UniParser: Multi-Human Parsing with Unified Correlation Representation Learning

Jiaming Chu,Lei Jin,Junliang Xing,Jian Zhao

Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the form of the outputs of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level output, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.

Graph · Representation learning · Knowledge graph · INTERACT · Performer

January 23, 2019

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Hongwei Wang,Fuzheng Zhang,Miao Zhao,Wenjie Li,Xing Xie,Minyi Guo

from arxiv, In Proceedings of The 2019 Web Conference (WWW 2019)

Collaborative filtering often suffers from sparsity and cold-start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address these issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes a knowledge graph embedding task to assist the recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain a decent performance even if user-item interactions are sparse.
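The abstract does not spell out the cross&compress unit, so the following is only a sketch of one way such a unit can be built: the item and entity vectors are crossed with an outer product, and the resulting matrix and its transpose are projected back to vectors with learned weights. The weight and bias names are hypothetical, and the exact formulation should be checked against the paper.

```python
def cross_compress(v, e, w_vv, w_ev, w_ve, w_ee, b_v, b_e):
    """One cross&compress step on an item vector v and an entity vector e.

    Cross:    C[i][j] = v[i] * e[j]          (all pairwise interactions)
    Compress: v_next = C w_vv + C^T w_ev + b_v
              e_next = C w_ve + C^T w_ee + b_e
    All vectors are plain lists of the same length d.
    """
    d = len(v)
    cross = [[vi * ej for ej in e] for vi in v]
    cross_t = [[cross[j][i] for j in range(d)] for i in range(d)]

    def matvec(m, w):
        return [sum(m[i][j] * w[j] for j in range(d)) for i in range(d)]

    v_next = [a + b + c for a, b, c in
              zip(matvec(cross, w_vv), matvec(cross_t, w_ev), b_v)]
    e_next = [a + b + c for a, b, c in
              zip(matvec(cross, w_ve), matvec(cross_t, w_ee), b_e)]
    return v_next, e_next


# Toy two-dimensional example with hand-picked weights and zero biases.
v_next, e_next = cross_compress(
    v=[1.0, 2.0], e=[3.0, 4.0],
    w_vv=[1.0, 0.0], w_ev=[0.0, 1.0],
    w_ve=[0.0, 1.0], w_ee=[1.0, 0.0],
    b_v=[0.0, 0.0], b_e=[0.0, 0.0],
)
```

Each output depends on products of item and entity features, so stacking several such units yields higher-order interactions between the two tasks while keeping the vectors at dimension d.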