
By using a computer keyboard as a finger recording device, we construct the largest existing dataset for gesture recognition via surface electromyography (sEMG), and use deep learning to achieve over 90% character-level accuracy on reconstructing typed text entirely from measured muscle potentials. We prioritize the temporal structure of the EMG signal instead of the spatial structure of the electrode layout, using network architectures inspired by those used for real-time spoken language transcription. Our architecture recognizes the rapid movements of natural computer typing, which occur at irregular intervals and often overlap in time. The extensive size of our dataset also allows us to study gesture recognition after synthetically downgrading the spatial or temporal resolution, showing the system capabilities necessary for real-time gesture recognition.

Related content

Surface is a line of computer products from Microsoft running the Windows 10 operating system (Windows 8.x on earlier models). It currently comprises three series: Surface, Surface Pro, and Surface Book. The first-generation Surface Pro/RT was announced on June 18, 2012 by then Microsoft CEO Steve Ballmer at a press event in Los Angeles, and went on sale on October 26, 2012.

EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic representation learning framework for EHR. DescEmb takes advantage of the flexibility of neural language understanding models to embed clinical events using their textual descriptions rather than directly mapping each event to a dedicated embedding. DescEmb outperformed traditional code-based embedding in extensive experiments, especially in a zero-shot transfer task (one hospital to another), and was able to train a single unified model for heterogeneous EHR datasets.

In this paper, we present a comparative study on the robustness of two different online streaming speech recognition models: Monotonic Chunkwise Attention (MoChA) and Recurrent Neural Network-Transducer (RNN-T). We explore three recently proposed data augmentation techniques, namely, multi-conditioned training using an acoustic simulator, Vocal Tract Length Perturbation (VTLP) for speaker variability, and SpecAugment. Experimental results show that unidirectional models are in general more sensitive to noisy examples in the training set. It is observed that the final performance of the model depends on the proportion of training examples processed by data augmentation techniques. MoChA models generally perform better than RNN-T models. However, we observe that training of MoChA models seems to be more sensitive to various factors such as the characteristics of training sets and the incorporation of additional augmentation techniques. On the other hand, RNN-T models perform better than MoChA models in terms of latency, inference time, and the stability of training. Additionally, RNN-T models are generally more robust against noise and reverberation. All these advantages make RNN-T models a better choice for streaming on-device speech recognition compared to MoChA models.

This paper presents an Expert Decision Support System for the identification of time-invariant, aeroacoustic source types. The system comprises two steps: first, acoustic properties are calculated based on spectral and spatial information. Second, clustering is performed based on these properties. The clustering aims at helping and guiding an expert for quick identification of different source types, providing an understanding of how sources differ. This supports the expert in determining similar or atypical behavior. A variety of features are proposed for capturing the characteristics of the sources. These features represent aeroacoustic properties that can be interpreted by both the machine and by experts. The features are independent of the absolute Mach number which enables the proposed method to cluster data measured at different flow configurations. The method is evaluated on deconvolved beamforming data from two scaled airframe half-model measurements. For this exemplary data, the proposed support system method results in clusters that mostly correspond to the source types identified by the authors. The clustering also provides the mean feature values and the cluster hierarchy for each cluster and for each cluster member a clustering confidence. This additional information makes the results transparent and allows the expert to understand the clustering choices.

Automatic speech recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were about 376 hours of publicly available data for the ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 hours. The existing resources, however, are composed of audio containing only read and prepared speech. There is a lack of datasets including spontaneous speech, which are essential in different ASR applications. This paper presents CORAA (Corpus of Annotated Audios) v1. with 290.77 hours, a publicly available dataset for ASR in BP containing validated pairs (audio-transcription). CORAA also contains European Portuguese audio (4.69 hours). We also present a public ASR model based on Wav2Vec 2.0 XLSR-53 and fine-tuned on CORAA. Our model achieved a Word Error Rate of 24.18% on the CORAA test set and 20.08% on the Common Voice test set. When measuring the Character Error Rate, we obtained 11.02% and 6.34% for CORAA and Common Voice, respectively. The CORAA corpora were assembled both to improve ASR models in BP with phenomena from spontaneous speech and to motivate young researchers to start their studies on ASR for Portuguese. All the corpora are publicly available at //github.com/nilc-nlp/CORAA under the CC BY-NC-ND 4.0 license.
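For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how Word Error Rate and Character Error Rate are commonly computed: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. The function names and the example sentences are hypothetical, and this is not the authors' evaluation script. Conventions such as text normalization or whether spaces count as characters vary between toolkits; here the strings are used as given and spaces are counted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one of three reference words is dropped.
    print(word_error_rate("o gato preto", "o gato"))
    print(char_error_rate("abc", "abd"))
```

Because insertions are counted, both rates can exceed 100% when the hypothesis is much longer than the reference.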

We consider using the system's optical imaging process with convolutional neural networks (CNNs) to solve the snapshot hyperspectral imaging reconstruction problem, which uses a dual-camera system to capture three-dimensional hyperspectral images (HSIs) in a compressed way. Various methods using CNNs have been developed in recent years to reconstruct HSIs, but most of the supervised deep learning methods aim to fit a brute-force mapping between the captured compressed image and standard HSIs. Thus, the learned mapping becomes invalid when the observation data deviate from the training data. Moreover, ground truth is usually unavailable in real-life scenarios. In this paper, we present a self-supervised dual-camera system with an untrained, physics-informed CNN framework. Extensive simulation and experimental results show that our method, without training, can be adapted to a wide range of imaging environments with good performance. Furthermore, compared with training-based methods, our system can be constantly fine-tuned and self-improved in real-life scenarios.

A common yet challenging scenario in periocular biometrics is cross-spectral matching - in particular, the matching of visible wavelength against near-infrared (NIR) periocular images. We propose a novel approach to cross-spectral periocular verification that primarily focuses on learning a mapping from visible and NIR periocular images to a shared latent representational subspace, and supports this effort by simultaneously learning intra-spectral image reconstruction. We show the auxiliary image reconstruction task (and in particular the reconstruction of high-level, semantic features) results in learning a more discriminative, domain-invariant subspace compared to the baseline while incurring no additional computational or memory costs at test-time. The proposed Coupled Conditional Generative Adversarial Network (CoGAN) architecture uses paired generator networks (one operating on visible images and the other on NIR) composed of U-Nets with ResNet-18 encoders trained for feature learning via contrastive loss and for intra-spectral image reconstruction with adversarial, pixel-based, and perceptual reconstruction losses. Moreover, the proposed CoGAN model beats the current state-of-art (SotA) in cross-spectral periocular recognition. On the Hong Kong PolyU benchmark dataset, we achieve 98.65% AUC and 5.14% EER compared to the SotA EER of 8.02%. On the Cross-Eyed dataset, we achieve 99.31% AUC and 3.99% EER versus SotA EER of 4.39%.
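As an illustration of the two verification metrics quoted above, here is a minimal sketch of how EER and AUC can be estimated from match scores. It assumes higher scores indicate a more likely genuine match; the function names and the score lists are hypothetical, and the threshold sweep is a simple approximation rather than the evaluation protocol used in the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A pair is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false accept rate (FAR) and false reject rate
    (FRR) are closest, and reported as their average.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def area_under_curve(genuine_scores, impostor_scores):
    """AUC as the fraction of (genuine, impostor) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for g in genuine_scores:
        for i in impostor_scores:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine_scores) * len(impostor_scores))


if __name__ == "__main__":
    # Hypothetical similarity scores, not data from the paper.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine, impostor))
    print(area_under_curve(genuine, impostor))
```

The pairwise AUC is quadratic in the number of scores, which is fine for a small illustration; a sort-based ranking would be the usual choice on a full benchmark.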

Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, which covers common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US) and major anatomical structures of interest (ventricles, atria and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a base for encouraging reproducible research. Finally, we discuss the challenges and limitations of current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.

The Transformer is a widely used neural network architecture, especially for language understanding. We introduce an extended and unified architecture which can be used for tasks involving a variety of modalities such as images, text, and videos. We propose a spatio-temporal cache mechanism that enables learning the spatial dimension of the input in addition to the hidden states corresponding to the temporal input sequence. The proposed architecture further enables a single model to support tasks with multiple input modalities as well as asynchronous multi-task learning; we therefore refer to it as OmniNet. For example, a single instance of OmniNet can concurrently learn to perform the tasks of part-of-speech tagging, image captioning, visual question answering and video activity recognition. We demonstrate that training these four tasks together yields a model about three times smaller while retaining performance comparable to training them individually. We also show that using this neural network pre-trained on some modalities assists in learning an unseen task. This illustrates the generalization capacity of the self-attention mechanism on the spatio-temporal cache present in OmniNet.

State-of-the-art deep convolutional networks (DCNs) such as squeeze-and-excitation (SE) residual networks implement a form of attention, also known as contextual guidance, which is derived from global image features. Here, we explore a complementary form of attention, known as visual saliency, which is derived from local image features. We extend the SE module with a novel global-and-local attention (GALA) module which combines both forms of attention -- resulting in state-of-the-art accuracy on ILSVRC. We further describe ClickMe.ai, a large-scale online experiment designed for human participants to identify diagnostic image regions to co-train a GALA network. Adding humans-in-the-loop is shown to significantly improve network accuracy, while also yielding visual features that are more interpretable and more similar to those used by human observers.
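To make the "global" form of attention concrete, the sketch below shows a plain squeeze-and-excitation gate in dependency-free Python: each channel is averaged over space (squeeze), passed through two small fully connected layers (excitation), and the resulting per-channel gate rescales the feature map. This covers only the standard SE idea that the abstract builds on, not the GALA module or its local saliency branch. The function name, the omission of biases, and the toy weights are assumptions made for illustration.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Apply a squeeze-and-excitation gate to a feature map.

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix of the reduction layer (biases omitted).
    w2: C x (C // r) weight matrix of the expansion layer (biases omitted).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: every spatial position in a channel is multiplied by its gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Toy input: 2 channels of size 1 x 2, reduction to a single hidden unit.
    fmap = [[[1.0, 3.0]], [[0.0, 0.0]]]
    w1 = [[1.0, 0.0]]
    w2 = [[0.0], [0.0]]  # zero weights give a gate of sigmoid(0) for each channel
    scaled, gates = squeeze_excite(fmap, w1, w2)
    print(gates, scaled)
```

Because the gate is a single scalar per channel, it carries no spatial information, which is the gap a local, saliency-style attention map is meant to fill.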

Motivation: Biomedical named entity recognition (BioNER) is the most fundamental task in biomedical text mining. State-of-the-art BioNER systems often require handcrafted features specifically designed for each type of biomedical entity. This feature generation process requires intensive labor from biomedical and linguistic experts, and makes it difficult to adapt these systems to new biomedical entity types. Although recent studies explored using neural network models for BioNER to free experts from manual feature generation, these models still require substantial human effort to annotate massive training data. Results: We propose a multi-task learning framework for BioNER that is based on neural network models to save human effort. We build a global model by collectively training multiple models that share parameters, each model capturing the characteristics of a different biomedical entity type. In experiments on five BioNER benchmark datasets covering four major biomedical entity types, our model outperforms state-of-the-art systems and other neural network models by a large margin, even when only limited training data are available. Further analysis shows that the large performance gains come from sharing character- and word-level information between different biomedical entities. The approach creates new opportunities for text-mining approaches to help biomedical scientists better exploit knowledge in biomedical literature.
