销魂美女一区二区三区AV,久久男人免费视频,免费一区二区日韩精品视频,亚洲国产欧美日本在线,国产免费观在线精品

In streaming settings, speech recognition models have to map sub-sequences of speech to text before the full audio stream becomes available. However, since alignment information between speech and text is rarely available during training, models need to learn it in a completely self-supervised way. In practice, the exponential number of possible alignments makes this extremely challenging, with models often learning peaky or sub-optimal alignments. Prima facie, the exponential nature of the alignment space makes it difficult to even quantify the uncertainty of a model's alignment distribution. Fortunately, it has been known for decades that the entropy of a probabilistic finite state transducer can be computed in time linear to the size of the transducer via a dynamic programming reduction based on semirings. In this work, we revisit the entropy semiring for neural speech recognition models, and show how alignment entropy can be used to supervise models through regularization or distillation. We also contribute an open-source implementation of CTC and RNN-T in the semiring framework that includes numerically stable and highly parallel variants of the entropy semiring. Empirically, we observe that the addition of alignment distillation improves the accuracy and latency of an already well-optimized teacher-student distillation model, achieving state-of-the-art performance on the Librispeech dataset in the streaming scenario.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · Analysis · Performer · 欠采樣 · 圖像還原 ·

2024 年 2 月 6 日

Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction

Shijun Liang,Evan Bell,Qing Qu,Rongrong Wang,Saiprasad Ravishankar

The ability of deep image prior (DIP) to recover high-quality images from incomplete or corrupted measurements has made it popular in inverse problems in image restoration and medical imaging including magnetic resonance imaging (MRI). However, conventional DIP suffers from severe overfitting and spectral bias effects.In this work, we first provide an analysis of how DIP recovers information from undersampled imaging measurements by analyzing the training dynamics of the underlying networks in the kernel regime for different architectures.This study sheds light on important underlying properties for DIP-based recovery.Current research suggests that incorporating a reference image as network input can enhance DIP's performance in image reconstruction compared to using random inputs. However, obtaining suitable reference images requires supervision, and raises practical difficulties. In an attempt to overcome this obstacle, we further introduce a self-driven reconstruction process that concurrently optimizes both the network weights and the input while eliminating the need for training data. Our method incorporates a novel denoiser regularization term which enables robust and stable joint estimation of both the network input and reconstructed image.We demonstrate that our self-guided method surpasses both the original DIP and modern supervised methods in terms of MR image reconstruction performance and outperforms previous DIP-based schemes for image inpainting.

Guidance · 約束 · 流形 · 樣本 · 損失 ·

2024 年 2 月 5 日

Guidance with Spherical Gaussian Constraint for Conditional Diffusion

Lingxiao Yang,Shutong Ding,Yifan Cai,Jingyi Yu,Jingya Wang,Ye Shi

Recent advances in diffusion models attempt to handle conditional generative tasks by utilizing a differentiable loss function for guidance without the need for additional training. While these methods achieved certain success, they often compromise on sample quality and require small guidance step sizes, leading to longer sampling processes. This paper reveals that the fundamental issue lies in the manifold deviation during the sampling process when loss guidance is employed. We theoretically show the existence of manifold deviation by establishing a certain lower bound for the estimation error of the loss guidance. To mitigate this problem, we propose Diffusion with Spherical Gaussian constraint (DSG), drawing inspiration from the concentration phenomenon in high-dimensional Gaussian distributions. DSG effectively constrains the guidance step within the intermediate data manifold through optimization and enables the use of larger guidance steps. Furthermore, we present a closed-form solution for DSG denoising with the Spherical Gaussian constraint. Notably, DSG can seamlessly integrate as a plugin module within existing training-free conditional diffusion methods. Implementing DSG merely involves a few lines of additional code with almost no extra computational overhead, yet it leads to significant performance improvements. Comprehensive experimental results in various conditional generation tasks validate the superiority and adaptability of DSG in terms of both sample quality and time efficiency.

收縮 · TransAct · 可辨認的 · Performer · INFORMS ·

2024 年 2 月 5 日

Charting The Evolution of Solidity Error Handling

Charalambos Mitropoulos,Maria Kechagia,Chrysostomos Maschas,Sotiris Ioannidis,Federica Sarro,Dimitris Mitropoulos

The usage of error handling in Solidity smart contracts is vital because smart contracts perform transactions that should be verified. Transactions that are not carefully handled, may lead to program crashes and vulnerabilities, implying financial loss and legal consequences. While Solidity designers attempt to constantly update the language with new features, including error-handling (EH) features, it is necessary for developers to promptly absorb how to use them. We conduct a large-scale empirical study on 283K unique open-source smart contracts to identify patterns regarding the usage of Solidity EH features over time. Overall, the usage of most EH features is limited. However, we observe an upward trend (> 60%) in the usage of a Solidity-tailored EH feature, i.e., require. This indicates that designers of modern programming languages may consider making error handling more tailored to the purposes of each language. Our analysis on 102 versions of the Solidity documentation indicates the volatile nature of Solidity, as the language changes frequently, i.e., there are changes on EH features once or twice a year. Such frequent releases may confuse smart contract developers, discouraging them to carefully read the Solidity documentation, and correctly adopt EH features. Furthermore, our findings reveal that nearly 70% of the examined smart contracts are exposed to potential failures due to missing error handing, e.g., unchecked external calls. Therefore, the use of EH features should be further supported via a more informative documentation containing (1) representative and meaningful examples and (2) details about the impact of potential EH misuses.

Performer · cache · 進化方法 · CASES · 正則的 ·

2024 年 2 月 5 日

Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays

Stephen Nicholas Swatman,Ana-Lucia Varbanescu,Andy D. Pimentel,Andreas Salzburger,Attila Krasznahorkay

from arxiv, Minor textual changes, update of Figure 3 to incorporate a bugfix, notational changes to Table 1, Figure 4, Table 2 (no effect on veracity), addition of artifact information

The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates to kernel running time in real hardware, and that our evolutionary strategy allows us to find candidates with favorable simulated cache properties in four out of the eight real-world applications under consideration in a small number of generations. Finally, we demonstrate that the array layouts found using our evolutionary method perform well not only in simulated environments but that they can effect significant performance gains -- up to a factor ten in extreme cases -- in real hardware.

MoDELS · Performer · Attention · 生成式對抗網絡 · Networking ·

2024 年 2 月 5 日

Generative Adversarial Networks for Spatio-Spectral Compression of Hyperspectral Images

Akshara Preethy Byju,Martin Hermann Paul Fuchs,Alisa Walda,Begüm Demir

The development of deep learning-based models for the compression of hyperspectral images (HSIs) has recently attracted great attention in remote sensing due to the sharp growing of hyperspectral data archives. Most of the existing models achieve either spectral or spatial compression, and do not jointly consider the spatio-spectral redundancies present in HSIs. To address this problem, in this paper we focus our attention on the High Fidelity Compression (HiFiC) model (which is proven to be highly effective for spatial compression problems) and adapt it to perform spatio-spectral compression of HSIs. In detail, we introduce two new models: i) HiFiC using Squeeze and Excitation (SE) blocks (denoted as HiFiC$_{SE}$); and ii) HiFiC with 3D convolutions (denoted as HiFiC$_{3D}$) in the framework of compression of HSIs. We analyze the effectiveness of HiFiC$_{SE}$ and HiFiC$_{3D}$ in compressing the spatio-spectral redundancies with channel attention and inter-dependency analysis. Experimental results show the efficacy of the proposed models in performing spatio-spectral compression, while reconstructing images at reduced bitrates with higher reconstruction quality. The code of the proposed models is publicly available at //git.tu-berlin.de/rsim/HSI-SSC .

INFORMS · Learning · MoDELS · 表示 · 掩碼 ·

2024 年 2 月 2 日

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations

Jaeyeon Kim,Injune Hwang,Kyogu Lee

from arxiv, Accepted to ICASSP 2024

We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information respectively. Specifically, we introduce a speech-to-unit processing pipeline that captures two types of representations with different time resolutions. For the language model, we adopt a dual-channel architecture to incorporate both types of representation. We also present new training objectives, masked context reconstruction and masked context prediction, that push models to learn semantics effectively. Experiments on the sSIMI metric of Zero Resource Speech Benchmark 2021 and Fluent Speech Command dataset show our framework learns semantics better than models trained with only one type of representation.

Storage · motivation · Performer · 置換 · 代碼 ·

2024 年 2 月 2 日

Short Systematic Codes for Correcting Random Edit Errors in DNA Storage

Serge Kas Hanna

DNA storage faces challenges in ensuring data reliability in the presence of edit errors -- deletions, insertions, and substitutions -- that occur randomly during various phases of the storage process. Current limitations in DNA synthesis technology also require the use of short DNA sequences, highlighting the particular need for short edit-correcting codes. Motivated by these factors, we introduce a systematic code designed to correct random edits while adhering to typical length constraints in DNA storage. We evaluate the performance of the code through simulations and assess its effectiveness within a DNA storage framework, revealing promising results.

推斷 · 語言模型化 · MoDELS · 目標領域 · Better ·

2024 年 2 月 2 日

Specialized Language Models with Cheap Inference from Limited Domain Data

David Grangier,Angelos Katharopoulos,Pierre Ablin,Awni Hannun

Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the inference budget, and the in-domain training set size. Across these settings, we compare different approaches from the machine learning literature. Limited by inference cost, we find better alternatives to the standard practice of training very large vanilla transformer models. In particular, we show that hyper-networks and mixture of experts have better perplexity for large pretraining budgets, while small models trained on importance sampled datasets are attractive for large specialization budgets.

Neural Networks · Networking · 卷積神經網絡 · 卷積 · SSD ·

2018 年 7 月 16 日

Convolutional Neural Networks for Aerial Multi-Label Pedestrian Detection

Amir Soleimani,Nasser M. Nasrabadi

from arxiv, This paper has been accepted in the 21st International Conference on Information Fusion and would be indexed in IEEE

The low resolution of objects of interest in aerial images makes pedestrian detection and action detection extremely challenging tasks. Furthermore, using deep convolutional neural networks to process large images can be demanding in terms of computational requirements. In order to alleviate these challenges, we propose a two-step, yes and no question answering framework to find specific individuals doing one or multiple specific actions in aerial images. First, a deep object detector, Single Shot Multibox Detector (SSD), is used to generate object proposals from small aerial images. Second, another deep network, is used to learn a latent common sub-space which associates the high resolution aerial imagery and the pedestrian action labels that are provided by the human-based sources

Faster R-CNN · domain shift · R-CNN · 目標檢測 · 可約的 ·

2018 年 3 月 8 日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Yuhua Chen,Wen Li,Christos Sakaridis,Dengxin Dai,Luc Van Gool

from arxiv, Accepted to CVPR 2018

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.