Multimodal · Performer · DeepFakes · Speech synthesis · ACM Multimedia

September 16, 2022

TIMIT-TTS: a Text-to-Speech Dataset for Multimodal Synthetic Media Detection

Davide Salvi,Brian Hosler,Paolo Bestagini,Matthew C. Stamm,Stefano Tubaro

With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Also, forged media are getting more and more complex, with manipulated videos taking over the scene from still images. The multimedia forensic community has addressed the possible threats that this situation could imply by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools only analyze one modality at a time. This was not a problem as long as still images were considered the most widely edited media, but now, since manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, there is a lack in the literature regarding multimodal detectors, mainly due to the scarcity of datasets containing forged multimodal data to train and test the designed algorithms. In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset containing the most cutting-edge methods in the TTS field. This can be used as a standalone audio dataset, or combined with other state-of-the-art sets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both mono and multimodal conditions, showing the need for multimodal forensic detectors and more suitable data.
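The abstract mentions Dynamic Time Warping (DTW) as the alignment step but does not describe how it is applied. As a rough illustration only, the sketch below shows textbook DTW on two one-dimensional sequences. The function name `dtw_path`, the absolute-difference cost, and the toy inputs are assumptions made for this example and are not taken from the paper, which presumably aligns speech features rather than scalars.

```python
def dtw_path(a, b):
    """Classic DTW between two 1-D sequences (illustrative, not the paper's code).

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both

    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

In this toy case the second sequence repeats one value, so the path maps a single element of the first sequence onto two elements of the second. That is the kind of time stretching that could let a synthetic track follow the timing of a reference track.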

Related content

Dataset · Benchmark · HTTPS · Data acquisition · Directed

October 24, 2022

EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

Dennis Aumiller,Ashish Chouhan,Michael Gertz

Existing summarization datasets come with two main drawbacks: (1) they tend to focus on overly exposed domains, such as news articles or wiki-like texts, and (2) they are primarily monolingual, with few multilingual datasets. In this work, we propose a novel dataset, called EUR-Lex-Sum, based on manually curated document summaries of legal acts from the European Union law platform (EUR-Lex). Documents and their respective summaries exist as cross-lingual paragraph-aligned data in several of the 24 official European languages, enabling access to various cross-lingual and lower-resourced summarization setups. We obtain up to 1,500 document/summary pairs per language, including a subset of 375 cross-lingually aligned legal acts with texts available in all 24 languages. In this work, the data acquisition process is detailed and key characteristics of the resource are compared to existing summarization resources. In particular, we illustrate challenging sub-problems and open questions on the dataset that could help facilitate future research in the direction of domain-specific cross-lingual summarization. Limited by the extreme length and language diversity of samples, we further conduct experiments with suitable extractive monolingual and cross-lingual baselines for future work. Code for the extraction as well as access to our data and baselines is available online at: https://github.com/achouhan93/eur-lex-sum.

MoDELS · Training data · Learning · Lecture notes · 3D

October 21, 2022

Real-time Detection of 2D Tool Landmarks with Synthetic Training Data

Bram Vanherle,Jeroen Put,Nick Michiels,Frank Van Reeth

In this paper a deep learning architecture is presented that can, in real time, detect the 2D locations of certain landmarks of physical tools, such as a hammer or screwdriver. To avoid the labor of manual labeling, the network is trained on synthetically generated data. Training computer vision models on computer generated images, while still achieving good accuracy on real images, is a challenge due to the difference in domain. The proposed method uses an advanced rendering method in combination with transfer learning and an intermediate supervision architecture to address this problem. It is shown that the model presented in this paper, named Intermediate Heatmap Model (IHM), generalizes to real images when trained on synthetic data. To avoid the need for an exact textured 3D model of the tool in question, it is shown that the model will generalize to an unseen tool when trained on a set of different 3D models of the same type of tool. IHM is compared to two existing approaches to keypoint detection and it is shown that it outperforms those at detecting tool landmarks, trained on synthetic data.

entity · Speech translation · Directed · MoDELS · Notability

October 21, 2022

Named Entity Detection and Injection for Direct Speech Translation

Marco Gaido,Yun Tang,Ilia Kulikov,Rongqing Huang,Hongyu Gong,Hirofumi Inaguma

from arxiv, © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

In a sentence, certain words are critical for its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly for locations and notably person names, whose spelling is challenging unless known in advance. In this work, we explore how to leverage dictionaries of NEs known to likely appear in a given context to improve S2T model outputs. Our experiments show that we can reliably detect NEs likely present in an utterance starting from S2T encoder outputs. Indeed, we demonstrate that the current detection quality is sufficient to improve NE accuracy in the translation with a 31% reduction in person name errors.

Understandability · Latent · INTERACT · Image classification · INFORMS

October 21, 2022

Collaborative Image Understanding

Koby Bibas,Oren Sar Shalom,Dietmar Jannach

from arxiv, CIKM 2022

Automatically understanding the contents of an image is a highly relevant problem in practice. In e-commerce and social media settings, for example, a common problem is to automatically categorize user-provided pictures. Nowadays, a standard approach is to fine-tune pre-trained image models with application-specific data. Besides images, organizations however often also collect collaborative signals in the context of their application, in particular how users interacted with the provided online content, e.g., in forms of viewing, rating, or tagging. Such signals are commonly used for item recommendation, typically by deriving latent user and item representations from the data. In this work, we show that such collaborative information can be leveraged to improve the classification process of new images. Specifically, we propose a multitask learning framework, where the auxiliary task is to reconstruct collaborative latent item representations. A series of experiments on datasets from e-commerce and social media demonstrates that considering collaborative signals helps to significantly improve the performance of the main task of image classification by up to 9.1%.
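The abstract describes a multitask setup in which the auxiliary task reconstructs collaborative latent item representations, without giving the loss itself. The sketch below shows one common way such an objective could be combined, assuming a cross-entropy main loss, a mean-squared-error auxiliary loss, and a weighting factor `aux_weight`. All function names, the choice of losses, and the weight are assumptions for illustration, not the authors' formulation.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(logits, label, pred_embedding, cf_embedding, aux_weight=0.5):
    """Main classification loss plus a weighted auxiliary reconstruction loss.

    `cf_embedding` stands in for a latent item vector derived from user
    interactions; `aux_weight` is a hypothetical trade-off hyperparameter.
    """
    main = cross_entropy(logits, label)
    aux = mse(pred_embedding, cf_embedding)
    return main + aux_weight * aux


loss = multitask_loss(
    logits=[2.0, 0.5, -1.0],       # image classifier scores
    label=0,                       # true class index
    pred_embedding=[0.1, 0.4],     # embedding predicted from the image
    cf_embedding=[0.0, 0.5],       # collaborative latent representation
)
print(round(loss, 4))
```

Setting `aux_weight` to zero reduces the objective to plain classification, which makes it easy to compare against a single-task baseline.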

Robustness · Decomposed · Learning · state-of-the-art · INFORMS

October 21, 2022

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

Alexandros Haliassos,Rodrigo Mira,Stavros Petridis,Maja Pantic

from arxiv, CVPR 2022. Code: https://github.com/ahaliassos/RealForensics

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression. In this paper, we examine whether we can tackle this issue by harnessing videos of real talking faces, which contain rich information on natural facial appearance and behaviour and are readily available in large quantities online. Our method, termed RealForensics, consists of two stages. First, we exploit the natural correspondence between the visual and auditory modalities in real videos to learn, in a self-supervised cross-modal manner, temporally dense video representations that capture factors such as facial movements, expression, and identity. Second, we use these learned representations as targets to be predicted by our forgery detector along with the usual binary forgery classification task; this encourages it to base its real/fake decision on said factors. We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments, and examine the factors that contribute to its performance. Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.

Biased · Dataset · motivation · MoDELS · AIM

October 21, 2022

Detecting Unintended Social Bias in Toxic Language Datasets

Nihar Sahoo,Himanshu Gupta,Pushpak Bhattacharyya

With the rise of online hate speech, automatic detection of hate speech and offensive texts as a natural language processing task is gaining popularity. However, very little research has been done to detect unintended social bias from these toxic language datasets. This paper introduces a new dataset, ToxicBias, curated from the existing dataset of the Kaggle competition named "Jigsaw Unintended Bias in Toxicity Classification". We aim to detect social biases, their categories, and targeted groups. The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ. We train transformer-based models using our curated datasets and report baseline performance for bias identification, target generation, and bias implications. Model biases and their mitigation are also discussed in detail. Our study motivates a systematic extraction of social bias data from toxic language datasets. All code and data used for the experiments in this work are publicly available.

Notability · Performer · Anomaly detection · Correlation coefficient · Computational cost

March 2, 2021

Image/Video Deep Anomaly Detection: A Survey

Bahram Mohammadi,Mahmood Fathy,Mohammad Sabokrou

The considerable significance of the Anomaly Detection (AD) problem has recently drawn the attention of many researchers. Consequently, the number of proposed methods in this research field has increased steadily. AD strongly correlates with important computer vision and image processing tasks such as image/video anomaly, irregularity and sudden event detection. More recently, Deep Neural Networks (DNNs) offer a high performance set of solutions, but at the expense of a heavy computational cost. However, there is a noticeable gap between the previously proposed methods and an applicable real-world approach. Regarding the raised concerns about AD as an ongoing challenging problem, notably in images and videos, the time has come to argue over the pitfalls and prospects of methods that have attempted to deal with visual AD tasks. Hereupon, in this survey we intend to conduct an in-depth investigation into deep learning based AD methods for images and videos. We also discuss current challenges and future research directions thoroughly.

Extensibility · Recognizable · CASES · state-of-the-art · MoDELS

June 8, 2020

Text Detection and Recognition in the Wild: A Review

Zobeir Raisi,Mohamed A. Naiel,Paul Fieguth,Steven Wardell,John Zelek

Detection and recognition of text in natural images are two main problems in the field of computer vision that have a wide variety of applications in analysis of sports videos, autonomous driving, and industrial automation, to name a few. They face common challenging problems that are factors in how text is represented and affected by several environmental conditions. The current state-of-the-art scene text detection and/or recognition methods have exploited the witnessed advancement in deep learning architectures and reported a superior accuracy on benchmark datasets when tackling multi-resolution and multi-oriented text. However, there are still several remaining challenges affecting text in the wild images that cause existing methods to underperform, because their models are not able to generalize to unseen data and labeled data are insufficient. Thus, unlike previous surveys in this field, the objectives of this survey are as follows: first, offering the reader not only a review on the recent advancement in scene text detection and recognition, but also presenting the results of conducting extensive experiments using a unified evaluation framework that assesses pre-trained models of the selected methods on challenging cases, and applies the same evaluation criteria on these techniques. Second, identifying several existing challenges for detecting or recognizing text in the wild images, namely, in-plane-rotation, multi-oriented and multi-resolution text, perspective distortion, illumination reflection, partial occlusion, complex fonts, and special characters. Finally, the paper also presents insight into the potential research directions in this field to address some of the mentioned challenges that scene text detection and recognition techniques still encounter.

Taxonomy · Object detection · Recognizable · Reviewer · HTTPS

March 11, 2020

Imbalance Problems in Object Detection: A Review

Kemal Oksuz,Baris Can Cam,Sinan Kalkan,Emre Akbas

from arxiv, Accepted to IEEE TPAMI; currently in press

In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance.

LayoutLM · INFORMS · Understandability · SCAN · MoDELS

February 19, 2020

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Yiheng Xu,Minghao Li,Lei Cui,Shaohan Huang,Furu Wei,Ming Zhou

from arxiv, Work in progress

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they have focused almost exclusively on text-level manipulation, while neglecting the layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model the interaction between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage the image features to incorporate the visual information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://github.com/microsoft/unilm/tree/master/layoutlm.
