亚洲综合蜜桃久久丁香婷_久热精品视频在线观看_久久精品色浮熟妇丰满人妻99一_亚洲一区二区三区免费视频_漂亮人妻被黑人当面久久精品_国产成人一区二区三区在线_色色资源影音资源影音先锋

Copy-move forgery on speech (CMF), coupled with post-processing techniques, presents a great challenge to the forensic detection and localization of tampered areas. Most of the existing CMF detection approaches necessitate pre-segmentation of speech to facilitate similarity calculations among these segments. However, these approaches usually suffer from the problems of uncontrollable computational complexity and sensitivity to the presence of a word that is read multiple times within a speech recording. To address these issues, we propose a local feature tensors-based CMF detection algorithm that can transform duplicate detection and localization problems into a special tensor-matching procedure, accompanied by complete theoretical analysis as support. Through extensive experimentation, we have demonstrated that our method exhibits computational efficiency and robustness against post-processing techniques. Notably, it can effectively and blindly detect tampered segments, even those as short as a fractional second. These advantages highlight the promising potential of our approach for practical applications.

相關內容

CMF

關注 1

Learning · 控制器 · Automator · Performer · HTTPS ·

2023 年 10 月 24 日

Human-in-the-Loop Task and Motion Planning for Imitation Learning

Ajay Mandlekar,Caelan Garrett,Danfei Xu,Dieter Fox

from arxiv, Conference on Robot Learning (CoRL) 2023

Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches. The system employs a TAMP-gated control mechanism, which selectively gives and takes control to and from a human teleoperator. This enables the human teleoperator to manage a fleet of robots, maximizing data collection efficiency. The collected human data is then combined with an imitation learning framework to train a TAMP-gated policy, leading to superior performance compared to training on full task demonstrations. We compared HITL-TAMP to a conventional teleoperation system -- users gathered more than 3x the number of demos given the same time budget. Furthermore, proficient agents (75\%+ success) could be trained from just 10 minutes of non-expert teleoperation data. Finally, we collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks and show that the system often produces near-perfect agents. Videos and additional results at //hitltamp.github.io .

LDPC · 解碼 · 近似 · 信念傳播 · Extensibility ·

2023 年 10 月 24 日

A High-Performance and Low-Complexity 5G LDPC Decoder: Algorithm and Implementation

Yuqing Ren,Hassan Harb,Yifei Shen,Alexios Balatsoukas-Stimming,Andreas Burg

from arxiv, 14 pages, 14 figures

5G New Radio (NR) has stringent demands on both performance and complexity for the design of low-density parity-check (LDPC) decoding algorithms and corresponding VLSI implementations. Furthermore, decoders must fully support the wide range of all 5G NR blocklengths and code rates, which is a significant challenge. In this paper, we present a high-performance and low-complexity LDPC decoder, tailor-made to fulfill the 5G requirements. First, to close the gap between belief propagation (BP) decoding and its approximations in hardware, we propose an extension of adjusted min-sum decoding, called generalized adjusted min-sum (GA-MS) decoding. This decoding algorithm flexibly truncates the incoming messages at the check node level and carefully approximates the non-linear functions of BP decoding to balance the error-rate and hardware complexity. Numerical results demonstrate that the proposed fixed-point GAMS has only a minor gap of 0.1 dB compared to floating-point BP under various scenarios of 5G standard specifications. Secondly, we present a fully reconfigurable 5G NR LDPC decoder implementation based on GA-MS decoding. Given that memory occupies a substantial portion of the decoder area, we adopt multiple data compression and approximation techniques to reduce 42.2% of the memory overhead. The corresponding 28nm FD-SOI ASIC decoder has a core area of 1.823 mm2 and operates at 895 MHz. It is compatible with all 5G NR LDPC codes and achieves a peak throughput of 24.42 Gbps and a maximum area efficiency of 13.40 Gbps/mm2 at 4 decoding iterations.

Branch · 圖 · 掩碼自編碼MAE · INFORMS · 結點 ·

2023 年 10 月 24 日

Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning

Yuxiang Wang,Xiao Yan,Chuang Hu,Fangcheng Fu,Wentao Zhang,Hao Wang,Shuo Shang,Jiawei Jiang

For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.

相互獨立的 · SimPLe · 語言模型化 · 縮放 · 無監督 ·

2023 年 10 月 23 日

Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

Wei Liu,Songlin Yang,Yoon Kim,Kewei Tu

from arxiv, Accepted to Findings of EMNLP, 2023

Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing. However, PCFGs scaled this way still perform poorly as a language model, and even underperform similarly-sized HMMs. This work introduces \emph{SimplePCFG}, a simple PCFG formalism with independent left and right productions. Despite imposing a stronger independence assumption than the low-rank approach, we find that this formalism scales more effectively both as a language model and as an unsupervised parser. As an unsupervised parser, our simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank PCFGs. We further introduce \emph{FlashInside}, a hardware IO-aware implementation of the inside algorithm for efficiently scaling simple PCFGs.

可理解性 · MoDELS · 相似度 · 可辨認的 · binary ·

2023 年 10 月 23 日

Paraphrase Types for Generation and Detection

Jan Philip Wahle,Bela Gipp,Terry Ruas

from arxiv, Published at EMNLP 2023

Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language. This paper introduces two new tasks to address this shortcoming by considering paraphrase types - specific linguistic perturbations at particular text positions. We name these tasks Paraphrase Type Generation and Paraphrase Type Detection. Our results suggest that while current techniques perform well in a binary classification scenario, i.e., paraphrased or not, the inclusion of fine-grained paraphrase types poses a significant challenge. While most approaches are good at generating and detecting general semantic similar content, they fail to understand the intrinsic linguistic variables they manipulate. Models trained in generating and identifying paraphrase types also show improvements in tasks without them. In addition, scaling these models further improves their ability to understand paraphrase types. We believe paraphrase types can unlock a new paradigm for developing paraphrase models and solving tasks in the future.

語音翻譯 · 端到端 · 輸出 · MoDELS · state-of-the-art ·

2023 年 10 月 23 日

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

Christian Huber,Tu Anh Dinh,Carlos Mullov,Ngoc Quan Pham,Thai Binh Nguyen,Fabian Retkowski,Stefan Constantin,Enes Yavuz Ugan,Danni Liu,Zhaolin Li,Sai Koneru,Jan Niehues,Alexander Waibel

The challenge of low-latency speech translation has recently draw significant interest in the research community as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded as well as end-to-end systems. Finally, the framework allows to automatically evaluate the translation quality as well as latency and also provides a web interface to show the low-latency model outputs to the user.

語言模型化 · INFORMS · MoDELS · 估計/估計量 · 數據集 ·

2023 年 10 月 20 日

Bridging Information-Theoretic and Geometric Compression in Language Models

Emily Cheng,Corentin Kervadec,Marco Baroni

from arxiv, EMNLP 2023 Camera-Ready

For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their coding length under the LM. We then show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset, confirming that being able to compress linguistic information is an important part of successful LM performance. As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data, showing that only some encapsulate the relationship between information-theoretic compression, geometric compression, and ease-of-adaptation.

Performer · 控制器 · Continuity · 確定性策略 · 回合 ·

2023 年 10 月 20 日

Combining Policy Gradient and Safety-Based Control for Autonomous Driving

Xi Xiong,Lu Liu

With the advancement of data-driven techniques, addressing continuous con-trol challenges has become more efficient. However, the reliance of these methods on historical data introduces the potential for unexpected decisions in novel scenarios. To enhance performance in autonomous driving and collision avoidance, we propose a symbiotic fusion of policy gradient with safety-based control. In this study, we em-ploy the Deep Deterministic Policy Gradient (DDPG) algorithm to enable autono-mous driving in the absence of surrounding vehicles. By training the vehicle's driving policy within a stable and familiar environment, a robust and efficient learning pro-cess is achieved. Subsequently, an artificial potential field approach is utilized to formulate a collision avoidance algorithm, accounting for the presence of surround-ing vehicles. Furthermore, meticulous consideration is given to path tracking meth-ods. The amalgamation of these approaches demonstrates substantial performance across diverse scenarios, underscoring its potential for advancing autonomous driving while upholding safety standards.

小樣本學習 · 目標檢測 · Networking · 數據集 · 情景 ·

2020 年 3 月 31 日

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Qi Fan,Wei Zhuo,Chi-Keung Tang,Yu-Wing Tai

from arxiv, CVPR2020 Camera Ready. (Fix Figure 3 and Table 5. More implementation details in the supplementary material.)

Conventional methods for object detection typically require a substantial amount of training data and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few shot support set and query set to detect novel objects while suppressing false detection in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We produce a new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is //github.com/fanq15/Few-Shot-Object-Detection-Dataset.

視覺問答 · 自頂向下 · 圖像字幕 · 注意力機制 · 自下而上 ·

2018 年 3 月 14 日

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson,Xiaodong He,Chris Buehler,Damien Teney,Mark Johnson,Stephen Gould,Lei Zhang

from arxiv, CVPR 2018 full oral, winner of the 2017 Visual Question Answering challenge

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.