Customizable keyword spotting (KWS) in continuous speech has attracted increasing attention due to its real-world application potential. While contrastive learning (CL) has been widely used to extract keyword representations, previous CL approaches all operate on pre-segmented isolated words and employ only an audio-text representation matching strategy. However, for KWS in continuous speech, co-articulation and streaming word segmentation can easily yield similar audio patterns for different texts, which may consequently trigger false alarms. To address this issue, we propose a novel CL with Audio Discrimination (CLAD) approach to learning keyword representations with both audio-text matching and audio-audio discrimination ability. Here, an InfoNCE loss considering both audio-audio and audio-text CL data pairs is employed for each sliding window during training. Evaluations on the open-source LibriPhrase dataset show that the sliding-window level InfoNCE loss yields performance comparable to previous CL approaches. Furthermore, experiments on the continuous speech dataset LibriSpeech demonstrate that, by incorporating audio discrimination, CLAD achieves a significant performance gain over CL without audio discrimination. Meanwhile, compared to two-stage KWS approaches, the end-to-end KWS with CLAD achieves not only better performance but also a significant speed-up.
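To make the loss named in the abstract concrete, here is a minimal sketch of an InfoNCE loss for a single sliding window. It is not the authors' implementation: the embeddings, the cosine similarity, the temperature value, and the way negatives are chosen are all assumptions made for illustration. It only shows how adding a confusable audio negative alongside the text negatives changes the loss for the same window.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair.

    The temperature value is an assumption, not taken from the paper.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]

# Hypothetical 2-D embeddings for one sliding window.
window_audio = [1.0, 0.0]       # audio embedding of the current window
keyword_text = [0.9, 0.1]       # text embedding of the matching keyword
other_text = [0.0, 1.0]         # text embedding of a different keyword
confusable_audio = [0.8, 0.6]   # audio of a different text that sounds similar

# Audio-text negatives only.
loss_text_only = info_nce(window_audio, keyword_text, [other_text])
# Audio-text and audio-audio negatives together.
loss_with_audio = info_nce(window_audio, keyword_text,
                           [other_text, confusable_audio])
print(loss_text_only, loss_with_audio)
```

In this toy setting the second loss is larger, because the similar-sounding audio negative adds to the softmax denominator; minimizing it pushes the window embedding away from that negative.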

Related content

Continuity

A new feature that lets iOS 8 and OS X Yosemite switch seamlessly. > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.

IP · Atom (text editor) · Optimizer · binary · Linear

February 26, 2024

Integer Programming Using A Single Atom

Kapil Goswami,Peter Schmelcher,Rick Mukherjee

from arxiv, 12 pages, 7 figures

Integer programming (IP), as the name suggests, is an integer-variable-based approach commonly used to formulate real-world optimization problems with constraints. Currently, quantum algorithms reformulate the IP into an unconstrained form through the use of binary variables, which is an indirect and resource-consuming way of solving it. We develop an algorithm that maps and solves an IP problem in its original form to any quantum system that possesses a large number of accessible internal degrees of freedom which can be controlled with sufficient accuracy. Using a single Rydberg atom as an example, we associate the integer values to electronic states belonging to different manifolds and implement a selective superposition of these different states to solve the full IP problem. The optimal solution is found within 2-40 $\mu$s for a few prototypical IP problems with up to eight variables and up to four constraints, including a non-linear IP problem, which is usually harder to solve with classical algorithms than linear IP problems. Our algorithm for solving IP is benchmarked against the Branch & Bound approach; it outperforms the classical algorithm in terms of the number of steps needed to converge and carries the potential to improve the bounds provided by the classical algorithm for larger problems.

Edge · Robustness · Branch · Integration · binary

February 26, 2024

Edge Detectors Can Make Deep Convolutional Neural Networks More Robust

Jin Ding,Jie-Chao Zhao,Yong-Zhi Sun,Ping Tan,Jia-Wei Wang,Ji-En Ma,You-Tong Fang

from arxiv, 26 pages, 18 figures, 7 tables. submitted to Neural Networks, under review

Deep convolutional neural networks (DCNN for short) are vulnerable to examples with small perturbations. Improving DCNN's robustness is of great significance to safety-critical applications, such as autonomous driving and industrial automation. Inspired by the principal way that human eyes recognize objects, i.e., largely relying on the shape features, this paper first employs the edge detectors as layer kernels and designs a binary edge feature branch (BEFB for short) to learn the binary edge features, which can be easily integrated into any popular backbone. The four edge detectors can learn the horizontal, vertical, positive diagonal, and negative diagonal edge features, respectively, and the branch is stacked by multiple Sobel layers (using edge detectors as kernels) and one threshold layer. The binary edge features learned by the branch, concatenated with the texture features learned by the backbone, are fed into the fully connected layers for classification. We integrate the proposed branch into VGG16 and ResNet34, respectively, and conduct experiments on multiple datasets. Experimental results demonstrate that the BEFB is lightweight and has no side effects on training. The accuracy of the BEFB integrated models is also better than that of the original ones on all datasets when facing FGSM, PGD, and C&W attacks. Besides, BEFB integrated models equipped with robustness enhancing techniques can achieve better classification accuracy than the original models. The work in this paper shows for the first time that it is feasible to enhance the robustness of DCNNs by combining shape-like features and texture features.
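As a rough illustration of the "Sobel layer plus threshold layer" idea described above, the sketch below applies four fixed 3x3 edge kernels to a tiny image and binarizes the responses. The kernel values are common textbook Sobel variants and the threshold is arbitrary; both are assumptions, since the abstract does not give the paper's exact kernels, stacking depth, or thresholding rule.

```python
# Textbook 3x3 Sobel-style kernels (assumed, not taken from the paper).
KERNELS = {
    "horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "vertical": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "diag_pos": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "diag_neg": [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],
}

def correlate3x3(image, kernel):
    """Valid (no padding) 2-D cross-correlation of an image with a 3x3 kernel."""
    height, width = len(image), len(image[0])
    output = []
    for i in range(height - 2):
        row = []
        for j in range(width - 2):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(3) for b in range(3)))
        output.append(row)
    return output

def binary_edges(image, kernel, threshold):
    """Threshold the absolute filter response to a 0/1 edge map."""
    response = correlate3x3(image, kernel)
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in response]

# A 4x4 image whose left half is dark and right half is bright:
# it contains one vertical edge and no horizontal edge.
image = [[0, 0, 1, 1] for _ in range(4)]

vertical_map = binary_edges(image, KERNELS["vertical"], threshold=2)
horizontal_map = binary_edges(image, KERNELS["horizontal"], threshold=2)
print(vertical_map)    # [[1, 1], [1, 1]]
print(horizontal_map)  # [[0, 0], [0, 0]]
```

In a network, maps like these would be concatenated with the backbone's features before the fully connected layers, as the abstract describes; that integration step is not shown here.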

Power method · Linear · Model evaluation · Example · Analysis

February 25, 2024

On Rayleigh Quotient Iteration for Dual Quaternion Hermitian Eigenvalue Problem

Shan-Qi Duan,Qing-Wen Wang,Xue-Feng Duan

from arxiv, arXiv admin note: text overlap with arXiv:2111.12211 by other authors

The application of eigenvalue theory to dual quaternion Hermitian matrices holds significance in the realm of multi-agent formation control. In this paper, we study the Rayleigh quotient iteration (RQI) for solving the right eigenpairs of dual quaternion Hermitian matrices. Combined with dual representation, the RQI algorithm can effectively compute the extreme eigenvalue along with the associated eigenvector of the large dual quaternion Hermitian matrices. Furthermore, a convergence analysis of the Rayleigh quotient iteration is derived, demonstrating a local convergence rate of at least cubic, which is faster than the linear convergence rate of the power method. Numerical examples are provided to illustrate the high accuracy and low CPU time cost of the proposed Rayleigh quotient iteration compared with the power method for solving the dual quaternion Hermitian eigenvalue problem.
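For readers unfamiliar with Rayleigh quotient iteration, the sketch below shows the classical version for a small real symmetric matrix. This is a deliberate simplification: it does not implement dual quaternion arithmetic or the paper's dual representation, and the helper names `solve` and `rayleigh_quotient_iteration`, the tolerance, and the test matrix are illustrative assumptions. It only shows the shift-and-solve step that gives the method its fast local convergence.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rayleigh_quotient_iteration(A, x0, tol=1e-10, max_iter=50):
    """Classical RQI for a real symmetric matrix A (simplified analogue)."""
    n = len(x0)
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    rho = 0.0
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        rho = sum(x[i] * Ax[i] for i in range(n))  # Rayleigh quotient (x has unit norm)
        residual = math.sqrt(sum((Ax[i] - rho * x[i]) ** 2 for i in range(n)))
        if residual < tol:
            break
        # Shift by the current Rayleigh quotient and solve (A - rho I) y = x.
        shifted = [[A[i][j] - (rho if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            y = solve(shifted, x)
        except ZeroDivisionError:
            break  # the shift hit an eigenvalue exactly; x is already converged
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return rho, x

# Symmetric test matrix with eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
lam, vec = rayleigh_quotient_iteration(A, [1.0, 0.5])
print(lam, vec)
```

Unlike the power method, which multiplies by a fixed matrix, each step here re-solves a system shifted by the latest eigenvalue estimate, which is the source of the faster local convergence the abstract refers to.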

MoDELS · Code · EDA · Transformation · Learning

February 24, 2024

A Machine Learning Approach Towards SKILL Code Autocompletion

Enrique Dehaerne,Bappaditya Dey,Wannes Meert

from arxiv, Accepted for SPIE Advanced Lithography + Patterning, 2024

As Moore's Law continues to increase the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. An important example of an EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models using the transformer architecture have achieved impressive results in academic settings and have even been used in commercial developer tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion towards improving the productivity of hardware design engineers. In this study, a novel, data-efficient methodology for generating SKILL code is proposed and experimentally validated. More specifically, we propose a novel methodology for (i) creating a high-quality SKILL dataset with both unlabeled and labeled data, (ii) a training strategy where T5 models pre-trained on general programming language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) evaluating synthesized SKILL code. We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score. A major challenge faced was the extremely small amount of available SKILL code data that can be used to train a transformer model to generate SKILL code. Despite our validated improvements, the extremely small dataset available to us was still not enough to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations as well as future work that could address these limitations.

Optimizer · MoDELS · Transformer model · Transformation · Identifiable

February 24, 2024

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Yujie Wang,Youhe Jiang,Xupeng Miao,Fangcheng Fu,Shenhan Zhu,Xiaonan Nie,Yaofeng Tu,Bin Cui

from arxiv, arXiv admin note: substantial text overlap with arXiv:2211.13878

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.

Functional · Laplacian eigenmaps · CASE · Estimation error · Performer

February 22, 2024

Nonsmooth Nonparametric Regression via Fractional Laplacian Eigenmaps

Zhaoyang Shi,Krishnakumar Balasubramanian,Wolfgang Polonik

We develop nonparametric regression methods for the case when the true regression function is not necessarily smooth. More specifically, our approach uses the fractional Laplacian and is designed to handle the case when the true regression function lies in an $L_2$-fractional Sobolev space with order $s\in (0,1)$. This function class is a Hilbert space lying between the space of square-integrable functions and the first-order Sobolev space consisting of differentiable functions. It contains fractional power functions, piecewise constant or polynomial functions, and bump functions as canonical examples. For the proposed approach, we prove upper bounds on the in-sample mean-squared estimation error of order $n^{-\frac{2s}{2s+d}}$, where $d$ is the dimension, $s$ is the aforementioned order parameter and $n$ is the number of observations. We also provide preliminary empirical results validating the practical performance of the developed estimators.
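To give a feel for the stated rate, the exponent can be evaluated for particular values. The choices $s = 1/2$ and $d = 2$ below are illustrative only and do not come from the paper.

```latex
\[
s = \tfrac{1}{2},\ d = 2:\qquad
\frac{2s}{2s+d} = \frac{1}{1+2} = \frac{1}{3},
\qquad\text{so the bound scales as } n^{-1/3}.
\]
\[
s \to 1:\qquad \frac{2s}{2s+d} \to \frac{2}{2+d}.
\]
```

For a fixed dimension $d$, a smaller $s$ (a rougher regression function) gives a smaller exponent and therefore a slower decay of the error bound.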

Inference · INFORMS · Performer · Dataset · Model evaluation

February 22, 2024

Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

Nathaniel Weir,Kate Sanders,Orion Weller,Shreya Sharma,Dongwei Jiang,Zhengping Zhang,Bhavana Dalvi Mishra,Oyvind Tafjord,Peter Jansen,Peter Clark,Benjamin Van Durme

Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited performance gains by modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment datasets, and evaluate its impact on LLM-based textual inference. We find that our resulting dataset, RDTE (Recognizing Decompositional Textual Entailment), has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets, suggesting that RDTE is a significant step forward in the long-standing problem of forming a clear protocol for discerning entailment. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in a modern neuro-symbolic reasoning engine significantly improves results (both accuracy and proof quality) over other entailment classifier baselines, illustrating the practical benefit of this advance for textual inference.

Performer · Robustness · Transformation · Dataset · Reducible

February 22, 2024

Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer

Amit Kumar Singh Yadav,Ziyue Xiang,Kratika Bhagtani,Paolo Bestagini,Stefano Tubaro,Edward J. Delp

from arxiv, Accepted as long oral paper at ICMLA 2023

Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has caused financial fraud, impersonation of people, and misinformation to spread. For this reason, forensic methods that can detect synthetic speech have been proposed. Existing methods often overfit on one dataset, and their performance drops substantially in practical scenarios such as detecting synthetic speech shared on social platforms. In this paper, we propose the Patched Spectrogram Synthetic Speech Detection Transformer (PS3DT), a synthetic speech detector that converts a time domain speech signal to a mel-spectrogram and processes it in patches using a transformer neural network. We evaluate the detection performance of PS3DT on the ASVspoof2019 dataset. Our experiments show that PS3DT performs well on the ASVspoof2019 dataset compared to other approaches that use spectrograms for synthetic speech detection. We also investigate the generalization performance of PS3DT on the In-the-Wild dataset. PS3DT generalizes better than several existing methods when detecting synthetic speech from an out-of-distribution dataset. We also evaluate the robustness of PS3DT in detecting telephone quality synthetic speech and synthetic speech shared on social platforms (compressed speech). PS3DT is robust to compression and can detect telephone quality synthetic speech better than several existing methods.

Multimodal · Anomaly detection · Point cloud · Extensibility · Concatenation

March 1, 2023

Multimodal Industrial Anomaly Detection via Hybrid Fusion

Yue Wang,Jinlong Peng,Jiangning Zhang,Ran Yi,Yabiao Wang,Chengjie Wang

from arxiv, Accepted by CVPR 2023

2D-based industrial anomaly detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.

Long short-term memory network · Named entity recognition · MoDELS · Better · Gating

May 15, 2018

Chinese NER Using Lattice LSTM

Yue Zhang,Jie Yang

from arxiv, Accepted at ACL 2018 as Long paper

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.