Most existing methods for text-based person retrieval focus on text-to-image person retrieval. Nevertheless, due to the lack of dynamic information provided by isolated frames, the performance is hampered when the person is obscured in isolated frames or variable motion details are given in the textual description. In this paper, we propose a new task called Text-to-Video Person Retri…

Related content

Person re-identification

Person re-identification, also called person re-recognition, is a technique that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given a surveillance image of a pedestrian, retrieve images of that pedestrian captured by other devices. It aims to compensate for the visual limitations of today's fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security, and similar fields. Because camera devices differ from one another, and because pedestrians have both rigid and flexible characteristics, appearance is easily affected by clothing, …

Inference · network inference · Reducible · Networking · Learning
September 6, 2023

LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup

Xiaohu Tang,Yang Wang,Ting Cao,Li Lyna Zhang,Qi Chen,Deng Cai,Yunxin Liu,Mao Yang

On-device Deep Neural Network (DNN) inference consumes significant computing resources and development effort. To alleviate that, we propose LUT-NN, the first system to empower inference by table lookup, to reduce inference cost. LUT-NN learns the typical features for each operator, named centroids, and precomputes the results for these centroids to save in lookup tables. During inference, the results of the centroids closest to the inputs can be read directly from the table and used as the approximated outputs, without computation. LUT-NN integrates two major novel techniques: (1) differentiable centroid learning through backpropagation, which adapts three levels of approximation to minimize the accuracy impact of centroids; (2) table lookup inference execution, which comprehensively considers different levels of parallelism, memory access reduction, and dedicated hardware units for optimal performance. LUT-NN is evaluated on multiple real tasks, covering image and speech recognition, and natural language processing. Compared to related work, LUT-NN improves accuracy by 66% to 92%, achieving a level similar to that of the original models. LUT-NN reduces the cost in all dimensions, including FLOPs (≤ 16x), model size (≤ 7x), latency (≤ 6.8x), memory (≤ 6.5x), and power (≤ 41.7%).
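
The table-lookup idea in this abstract can be illustrated with a small sketch. This is not the LUT-NN implementation: the sub-vector split (in the style of product quantization), the hand-picked centroids, and the names `build_tables` and `lut_linear` are all assumptions made for illustration, and the centroids here are fixed rather than learned through backpropagation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_tables(centroids: List[List[Vector]], weights: List[Vector],
                 sub_dim: int) -> List[List[Vector]]:
    """Precompute, for every centroid, its partial result for every output.

    centroids[c][k] is the k-th centroid of the c-th input sub-vector.
    weights[j] is the weight row of output j over the full input.
    """
    tables = []
    for c, codebook in enumerate(centroids):
        lo = c * sub_dim
        tables.append([[dot(cent, w[lo:lo + sub_dim]) for w in weights]
                       for cent in codebook])
    return tables


def nearest(codebook: List[Vector], sub: Vector) -> int:
    """Index of the centroid closest to `sub` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], sub)))


def lut_linear(x: Vector, centroids: List[List[Vector]],
               tables: List[List[Vector]], sub_dim: int) -> Vector:
    """Approximate a linear layer: look up partial results instead of multiplying."""
    out = [0.0] * len(tables[0][0])
    for c, codebook in enumerate(centroids):
        k = nearest(codebook, x[c * sub_dim:(c + 1) * sub_dim])
        for j, partial in enumerate(tables[c][k]):
            out[j] += partial
    return out


# Toy setup (hypothetical values): 4 inputs split into 2 sub-vectors, 2 outputs.
centroids = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
weights = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
tables = build_tables(centroids, weights, sub_dim=2)
print(lut_linear([0.9, 1.1, 0.1, 0.8], centroids, tables, sub_dim=2))  # [7.0, 2.0]
```

In this toy case the input is snapped to the centroids `[1, 1]` and `[0, 1]`, so the result equals the exact product for the input `[1, 1, 0, 1]`; the approximation error depends entirely on how well the centroids cover the real inputs.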

Functional · Performer · Square matrix · Operation · HTTPS
September 5, 2023

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

Maurus Item,Juan Gómez-Luna,Yuxin Guo,Geraldo F. Oliveira,Mohammad Sadrosadati,Onur Mutlu

from arxiv, Our open-source software is available at //github.com/CMU-SAFARI/transpimlib

Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at //github.com/CMU-SAFARI/transpimlib.
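
To make the CORDIC-based approach mentioned above concrete, here is a minimal sketch of rotation-mode CORDIC for sine and cosine. It is not TransPimLib code: the function name `cordic_sin_cos` is hypothetical, and floating-point multiplication by powers of two stands in for the fixed-point shifts that constrained hardware would typically use. The arctangent table is built with `math.atan` here for brevity; on a real device it would be a precomputed constant table.

```python
import math
from typing import Tuple


def cordic_sin_cos(theta: float, iterations: int = 24) -> Tuple[float, float]:
    """Return (sin(theta), cos(theta)) for |theta| <= pi/2 using CORDIC rotations.

    Each step rotates the vector (x, y) by +/- atan(2**-i), which only needs
    additions and scalings by powers of two.
    """
    # Table of elementary rotation angles atan(2**-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Every micro-rotation stretches the vector by sqrt(1 + 2**(-2i));
    # starting from the inverse of the total gain cancels that stretch.
    inv_gain = 1.0
    for i in range(iterations):
        inv_gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = inv_gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0       # rotate toward the remaining angle
        scale = 2.0 ** -i
        x, y = x - d * y * scale, y + d * x * scale
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(abs(s - math.sin(0.5)) < 1e-6, abs(c - math.cos(0.5)) < 1e-6)
```

The number of iterations trades accuracy for work: each extra iteration roughly halves the residual angle, which is the kind of performance versus accuracy trade-off the abstract says the library evaluates.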

MoDELS · Integration · Boosting (a technique for accelerating model training) · Mask · Image restoration
September 5, 2023

SAM-Deblur: Let Segment Anything Boost Image Deblurring

Siwei Li,Mingxuan Liu,Yating Zhang,Shu Chen,Haoxiang Li,Hong Chen,Zifei Dou

from arxiv, Under review

Image deblurring is a critical task in the field of image restoration, aiming to eliminate blurring artifacts. However, the challenge of addressing non-uniform blurring leads to an ill-posed problem, which limits the generalization performance of existing deblurring models. To solve the problem, we propose a framework, SAM-Deblur, integrating prior knowledge from the Segment Anything Model (SAM) into the deblurring task for the first time. In particular, SAM-Deblur is divided into three stages. First, we preprocess the blurred images, obtain image masks via SAM, and propose a mask dropout method for training to enhance model robustness. Then, to fully leverage the structural priors generated by SAM, we propose a Mask Average Pooling (MAP) unit specifically designed to average SAM-generated segmented areas, serving as a plug-and-play component which can be seamlessly integrated into existing deblurring networks. Finally, we feed the fused features generated by the MAP unit into the deblurring model to obtain a sharp image. Experimental results on the RealBlurJ, ReloBlur, and REDS datasets reveal that incorporating our methods improves NAFNet's PSNR by 0.05, 0.96, and 7.03, respectively. Code will be available at //github.com/HPLQAQ/SAM-Deblur.
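
One plausible reading of "averaging SAM-generated segmented areas" is sketched below: every pixel is replaced by the mean value of the segment it belongs to. This is only an illustration under that assumption, not the paper's MAP unit; the function name `mask_average_pooling`, the single-channel input, and the integer segment-id map are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List


def mask_average_pooling(feature: List[List[float]],
                         segment_ids: List[List[int]]) -> List[List[float]]:
    """Replace each pixel by the average of `feature` over its segment.

    feature and segment_ids are 2D grids of the same shape; segment_ids[r][c]
    is the id of the mask that covers pixel (r, c).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for feat_row, seg_row in zip(feature, segment_ids):
        for value, seg in zip(feat_row, seg_row):
            sums[seg] += value
            counts[seg] += 1
    means = {seg: sums[seg] / counts[seg] for seg in sums}
    return [[means[seg] for seg in seg_row] for seg_row in segment_ids]


# Toy example (hypothetical values): two segments, top row and bottom row.
pooled = mask_average_pooling([[1.0, 3.0], [5.0, 7.0]], [[0, 0], [1, 1]])
print(pooled)  # [[2.0, 2.0], [6.0, 6.0]]
```

The output has the same shape as the input, so a map like this could be concatenated with the blurred image as an extra input channel, which is one way a component of this kind stays plug-and-play.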

MoDELS · Generative model · HTTPS · state-of-the-art · Language modeling
September 5, 2023

BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models

Jordan Vice,Naveed Akhtar,Richard Hartley,Ajmal Mian

from arxiv, This research was supported by National Intelligence and Security Discovery Research Grants (project# NS220100007), funded by the Department of Defence Australia

The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM), which upon triggering, infuses the generated images with manipulative details that are naturally blended in the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Based on the penetration level, BAGM takes the form of a suite of attacks that are referred to as surface, shallow and deep attacks in this article. Given the existing gap within this domain, we also contribute a comprehensive set of quantitative metrics designed specifically for assessing the effectiveness of backdoor attacks on text-to-image models. The efficacy of BAGM is established by attacking state-of-the-art generative models, using a marketing scenario as the target domain. To that end, we contribute a dataset of branded product images. Our embedded backdoors increase the bias towards the target outputs by more than five times the usual, without compromising the model robustness or the generated content utility. By exposing generative AI's vulnerabilities, we encourage researchers to tackle these challenges and practitioners to exercise caution when using pre-trained models. Relevant code, input prompts and supplementary material can be found at //github.com/JJ-Vice/BAGM, and the dataset is available at: //ieee-dataport.org/documents/marketable-foods-mf-dataset. Keywords: Generative Artificial Intelligence, Generative Models, Text-to-Image generation, Backdoor Attacks, Trojan, Stable Diffusion.

ChatGPT · Anomaly detection · MoDELS · Performer · Generalization theory
September 3, 2023

LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection

Jiaxing Qi,Shaohan Huang,Zhongzhi Luan,Carol Fung,Hailong Yang,Depei Qian

The increasing volume of log data produced by software-intensive systems makes it impractical to analyze it manually. Many deep learning-based methods have been proposed for log-based anomaly detection. These methods face several challenges such as high-dimensional and noisy log data, class imbalance, generalization, and model interpretability. Recently, ChatGPT has shown promising results in various domains. However, there is still a lack of study on the application of ChatGPT to log-based anomaly detection. In this work, we propose LogGPT, a log-based anomaly detection framework based on ChatGPT. By leveraging ChatGPT's language interpretation capabilities, LogGPT aims to explore the transferability of knowledge from large-scale corpora to log-based anomaly detection. We conduct experiments to evaluate the performance of LogGPT and compare it with three deep learning-based methods on the BGL and Spirit datasets. LogGPT shows promising results and has good interpretability. This study provides preliminary insights into prompt-based models, such as ChatGPT, for the log-based anomaly detection task.

Robustness · MoDELS · Image classification · Unimodal · Integration
September 1, 2023

CLIPAG: Towards Generator-Free Text-to-Image Generation

Roy Ganz,Michael Elad

Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators.

MoDELS · Normalized · Robustness · Extensibility · HTTPS
September 1, 2023

DARC: Distribution-Aware Re-Coloring Model for Generalizable Nucleus Segmentation

Shengcong Chen,Changxing Ding,Dacheng Tao,Hao Chen

from arxiv, Accepted by MICCAI 2023

Nucleus segmentation is usually the first step in pathological image analysis tasks. Generalizable nucleus segmentation refers to the problem of training a segmentation model that is robust to domain gaps between the source and target domains. The domain gaps are usually believed to be caused by the varied image acquisition conditions, e.g., different scanners, tissues, or staining protocols. In this paper, we argue that domain gaps can also be caused by different foreground (nucleus)-background ratios, as this ratio significantly affects feature statistics that are critical to normalization layers. We propose a Distribution-Aware Re-Coloring (DARC) model that handles the above challenges from two perspectives. First, we introduce a re-coloring method that relieves dramatic image color variations between different domains. Second, we propose a new instance normalization method that is robust to the variation in foreground-background ratios. We evaluate the proposed methods on two H&E stained image datasets, named CoNSeP and CPM17, and two IHC stained image datasets, called DeepLIIF and BC-DeepLIIF. Extensive experimental results justify the effectiveness of our proposed DARC model. Codes are available at //github.com/csccsccsccsc/DARC

Object tracking · Extensibility · Modality · Dataset · Performer
November 11, 2021

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Chenglong Li,Tianhao Zhu,Lei Liu,Xiaonan Si,Zilin Fan,Sulan Zhai

from arxiv, In Submission

In many visual systems, visual tracking is often based on RGB image sequences, in which some targets are invalid in low-light conditions, and tracking performance is thus affected significantly. Introducing other modalities such as depth and infrared data is an effective way to handle imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot be applied in many real-world applications at present. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, whose imaging is switchable between RGB and NIR based on the light intensity. These two modalities are heterogeneous with very different visual properties and thus bring big challenges for visual tracking. However, existing works have not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset, including 654 cross-modal image sequences with over 481K frames in total, and the average video length is more than 735 frames. To promote the research and development of cross-modal object tracking, we propose a new algorithm, which learns the modality-aware target representation to mitigate the appearance gap between RGB and NIR modalities in the tracking process. It is plug-and-play and could thus be flexibly embedded into different tracking frameworks. Extensive experiments on the dataset are conducted, and we demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic usage; the dataset download link and code will be released soon.

Event extraction · Extensibility · End-to-end · Directed acyclic graph · state-of-the-art
September 23, 2019

Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction

Shun Zheng,Wei Cao,Wei Xu,Jiang Bian

from arxiv, Accepted by EMNLP 2019

Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle soaring amounts of documents from emerging applications, such as finance, legislation, health, etc., where event arguments always scatter across different sentences, and even multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. Moreover, we reformalize a DEE task with the no-trigger-words design to ease the document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and codes can be found at //github.com/dolphin-zs/Doc2EDAG.

entity · Link prediction · Extensibility · Graph · Knowledge graph
March 13, 2019

MMKG: Multi-Modal Knowledge Graphs

Ye Liu,Hui Li,Alberto Garcia-Duran,Mathias Niepert,Daniel Onoro-Rubio,David S. Rosenblum

from arxiv, ESWC 2019

We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs. We validate the utility of MMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.