亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper, we propose to extend the deep, complex U-Network architecture for speech enhancement by incorporating a probabilistic (i.e., variational) latent space model. The proposed model is evaluated against several ablated versions of itself in order to study the effects of the variational latent space model, complex-value processing, and self-attention. Evaluation on the MS-DNS 2020 and Voicebank+Demand datasets yields consistently high performance. E.g., the proposed model achieves an SI-SDR of up to 20.2 dB, about 0.5 to 1.4 dB higher than its ablated version without probabilistic latent space, 2-2.4 dB higher than WaveUNet, and 6.7 dB above PHASEN. Compared to real-valued magnitude spectrogram processing with a variational U-Net, the complex U-Net achieves an improvement of up to 4.5 dB SI-SDR. Complex spectrum encoding as magnitude and phase yields best performance in anechoic conditions whereas real and imaginary part representation results in better generalization to (novel) reverberation conditions, possibly due to the underlying physics of sound.

相關內容

語(yu)(yu)音(yin)(yin)(yin)增強是指(zhi)當語(yu)(yu)音(yin)(yin)(yin)信(xin)號被各(ge)種各(ge)樣的噪(zao)聲干(gan)擾(rao)、甚至(zhi)淹(yan)沒后,從噪(zao)聲背景中(zhong)提(ti)(ti)取有用的語(yu)(yu)音(yin)(yin)(yin)信(xin)號,抑制、降(jiang)低噪(zao)聲干(gan)擾(rao)的技術。一句話,從含(han)噪(zao)語(yu)(yu)音(yin)(yin)(yin)中(zhong)提(ti)(ti)取盡可能純凈的原始語(yu)(yu)音(yin)(yin)(yin)。

We present an implementation of a Web3 platform that leverages the Groth16 Zero-Knowledge Proof schema to verify the validity of questionnaire results within Smart Contracts. Our approach ensures that the answer key of the questionnaire remains undisclosed throughout the verification process, while ensuring that the evaluation is done fairly. To accomplish this, users respond to a series of questions, and their answers are encoded and securely transmitted to a hidden backend. The backend then performs an evaluation of the user's answers, generating the overall result of the questionnaire. Additionally, it generates a Zero-Knowledge Proof, attesting that the answers were appropriately evaluated against a valid set of constraints. Next, the user submits their result along with the proof to a Smart Contract, which verifies their validity and issues a non-fungible token (NFT) as an attestation of the user's test result. In this research, we implemented the Zero-Knowledge functionality using Circom 2 and deployed the Smart Contract using Solidity, thereby showcasing a practical and secure solution for questionnaire validity verification in the context of Smart Contracts.

This letter proposes a scheme assisted by a reconfigurable intelligent surface (RIS) for efficient uplink traffic multiplexing between enhanced mobile broadband (eMBB) and ultra-reliable-low-latency communication (URLLC). The scheme determines two RIS configurations based only on the eMBB channel state information (CSI) available at the base station (BS). The first optimizes eMBB quality of service, while the second reduces eMBB interference in URLLC traffic by temporarily silencing the eMBB traffic. Numerical results demonstrate that this approach, relying solely on eMBB CSI and without BS coordination, can outperform the state-of-the-art preemptive puncturing by 4.9 times in terms of URLLC outage probability.

This work presents an algorithm for tracking the shape of multiple entangling Deformable Linear Objects (DLOs) from a sequence of RGB-D images. This algorithm runs in real-time and improves on previous single-DLO tracking approaches by enabling tracking of multiple objects. This is achieved using Global-Local Topology Preservation (GLTP). This work uses the geodesic distance in GLTP to define the distance between separate objects and the distance between different parts of the same object. Tracking multiple entangling DLOs is demonstrated experimentally. The source code is publicly released.

This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.

In this paper, a new method is given for counting cycles in the Tanner graph of a (Type-I) quasi-cyclic (QC) low-density parity-check (LDPC) code which the complexity mainly is dependent on the base matrix, independent from the CPM-size of the constructed code. Interestingly, for large CPM-sizes, in comparison of the existing methods, this algorithm is the first approach which efficiently counts the cycles in the Tanner graphs of QC-LDPC codes. In fact, the algorithm recursively counts the cycles in the parity-check matrix column-by-column by finding all non-isomorph tailless backtrackless closed (TBC) walks in the base graph and enumerating theoretically their corresponding cycles in the same equivalent class. Moreover, this approach can be modified in few steps to find the cycle distributions of a class of LDPC codes based on Affine permutation matrices (APM-LDPC codes). Interestingly, unlike the existing methods which count the cycles up to $2g-2$, where $g$ is the girth, the proposed algorithm can be used to enumerate the cycles of arbitrary length in the Tanner graph. Moreover, the proposed cycle searching algorithm improves upon various previously known methods, in terms of computational complexity and memory requirements.

In this paper, we propose a novel method for 3D scene and object reconstruction from sparse multi-view images. Different from previous methods that leverage extra information such as depth or generalizable features across scenes, our approach leverages the scene properties embedded in the multi-view inputs to create precise pseudo-labels for optimization without any prior training. Specifically, we introduce a geometry-guided approach that improves surface reconstruction accuracy from sparse views by leveraging spherical harmonics to predict the novel radiance while holistically considering all color observations for a point in the scene. Also, our pipeline exploits proxy geometry and correctly handles the occlusion in generating the pseudo-labels of radiance, which previous image-warping methods fail to avoid. Our method, dubbed Ray Augmentation (RayAug), achieves superior results on DTU and Blender datasets without requiring prior training, demonstrating its effectiveness in addressing the problem of sparse view reconstruction. Our pipeline is flexible and can be integrated into other implicit neural reconstruction methods for sparse views.

In this paper we present a fully distributed, asynchronous, and general purpose optimization algorithm for Consensus Simultaneous Localization and Mapping (CSLAM). Multi-robot teams require that agents have timely and accurate solutions to their state as well as the states of the other robots in the team. To optimize this solution we develop a CSLAM back-end based on Consensus ADMM called MESA (Manifold, Edge-based, Separable ADMM). MESA is fully distributed to tolerate failures of individual robots, asynchronous to tolerate practical network conditions, and general purpose to handle any CSLAM problem formulation. We demonstrate that MESA exhibits superior convergence rates and accuracy compare to existing state-of-the art CSLAM back-end optimizers.

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.

In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.

北京阿比特科技有限公司