
Various precoders have recently been studied by the wireless community to combat channel fading. Two prominent precoders are implemented with the discrete Fourier transform (DFT) and the Walsh-Hadamard transform (WHT). The WHT precoder has lower complexity because it needs no complex multiplications. Spreading can also be applied sparsely to reduce transceiver complexity, leading to the sparse DFT (SDFT) and sparse Walsh-Hadamard (SWH) precoders. Another relevant topic is the design of iterative receivers that deal with inter-symbol interference (ISI). In particular, many detectors based on expectation propagation (EP) have been proposed recently for channels with high levels of ISI. An alternative is the maximum a posteriori (MAP) detector, although its complexity is infeasibly high in many cases. In this paper, we provide a relatively low-complexity computation of the MAP detector for the SWH. We also propose two feasible methods based on the Log-MAP and Max-Log-MAP. Additionally, the DFT, SDFT and SWH precoders are compared using an EP-based receiver with one-tap frequency-domain (FD) equalization. Lastly, SWH-Max-Log-MAP is compared to the (S)DFT with an EP-based receiver in terms of performance and complexity. The results show that the proposed SWH-Max-Log-MAP offers a better performance-complexity trade-off for QPSK and 16-QAM under highly selective channels, but has infeasible complexity for higher QAM orders.
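Two ideas in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fwht`, `log_sum_exp`, and `bit_llr` are hypothetical, and the symbol metrics passed to `bit_llr` are assumed to be log-domain scores computed elsewhere. The first function shows why the WHT needs only additions and subtractions; the second shows how Max-Log-MAP replaces the log-sum-exp of Log-MAP with a maximum.

```python
import math

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (natural order).

    The length of x must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                # Butterfly: one addition and one subtraction, no multiplications.
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) as used by Log-MAP."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def bit_llr(metrics_bit0, metrics_bit1, max_log=False):
    """Log-likelihood ratio of one bit from log-domain symbol metrics.

    metrics_bit0 / metrics_bit1: metrics of the candidates whose bit is 0 / 1
    (assumed inputs). With max_log=True the log-sum-exp is approximated by max.
    """
    combine = max if max_log else log_sum_exp
    return combine(metrics_bit0) - combine(metrics_bit1)
```

Applying `fwht` twice returns the input scaled by its length, which is a convenient sanity check. The gap between the two `bit_llr` variants is the approximation error that Max-Log-MAP accepts in exchange for avoiding exponentials and logarithms.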

Related content

Channel


Networking · Performer · Channel · Network architecture

October 24, 2023

Metro Access Network with Convergence of Coherent and Analog RoF Data Services

Amol Delmade,Frank Slyne,Colm Browning,Daniel Kilper,Liam Barry,Marco Ruffini

Efficient use of spectral resources will be an important aspect of converged access network deployment. This work analyzes the performance of variable-bandwidth Analog Radio-over-Fiber signals transmitted in the unfilled spectral spaces of telecom-grade ROADM channels dedicated to coherent signal transmission over the OpenIreland testbed.

Directed · Dropout · Performer · Modality · Multimodal

October 23, 2023

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Gautam Krishna,Sameer Dharur,Oggi Rudovic,Pranay Dighe,Saurabh Adya,Ahmed Hussen Abdelaziz,Ahmed H Tewfik

from arxiv, 5 pages

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant and side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition (ASR) system features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities at inference time.
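The idea of modality dropout can be sketched in a few lines of Python. This is a generic illustration and not necessarily the scheme used in the paper: the function name `modality_dropout`, the choice to replace a dropped modality with zeros, and the rule of always keeping at least one modality are all assumptions.

```python
import random

def modality_dropout(features, drop_prob, rng=random):
    """Randomly blank out whole modalities during training.

    features:  dict mapping a modality name to its feature vector (list of floats).
    drop_prob: probability of dropping each modality independently.
    rng:       object with random() and choice(), e.g. random.Random(seed).

    Assumption: a dropped modality is replaced by zeros, and at least one
    modality is always kept so the model never sees an empty input.
    """
    names = list(features)
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(names)]
    return {
        name: (list(vec) if name in kept else [0.0] * len(vec))
        for name, vec in features.items()
    }
```

Training a fusion model on inputs processed this way exposes it to the missing-modality conditions it may meet at inference time, for example `modality_dropout({"audio": [0.1, 0.2], "text": [0.3]}, 0.3)`.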

Language modeling · Speech translation · Constraints · Decoding · MoDELS

October 23, 2023

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

Arya D. McCarthy,Hao Zhang,Shankar Kumar,Felix Stahlberg,Ke Wu

from arxiv, accepted to the Findings of EMNLP 2023. arXiv admin note: text overlap with arXiv:2212.09895

One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.
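A minimal sketch of decoding under a finite-state constraint, for the segmentation setting described above. It assumes, as an illustration only, that a valid output is the input transcript with boundary markers inserted between tokens; the marker `<seg>`, the function names, and the `score` callback standing in for the LLM are hypothetical.

```python
BOUNDARY = "<seg>"  # hypothetical segment-boundary marker

def allowed_next(tokens, position, last_was_boundary):
    """Tokens the constraint permits after `position` input tokens were copied."""
    if position == len(tokens):
        return set()  # all input consumed: decoding stops
    allowed = {tokens[position]}  # copying the next input token is always valid
    if position > 0 and not last_was_boundary:
        allowed.add(BOUNDARY)  # a boundary may follow any copied token
    return allowed

def constrained_greedy_decode(tokens, score):
    """Greedy decoding restricted to the allowed set at every step.

    score(prefix, candidate) is an assumed stand-in for the model's
    next-token score; higher means more preferred.
    """
    output, position, last_was_boundary = [], 0, False
    while True:
        allowed = allowed_next(tokens, position, last_was_boundary)
        if not allowed:
            return output
        # sorted() makes tie-breaking deterministic.
        choice = max(sorted(allowed), key=lambda tok: score(output, tok))
        output.append(choice)
        if choice == BOUNDARY:
            last_was_boundary = True
        else:
            position += 1
            last_was_boundary = False
```

Because every step chooses only among permitted tokens, the output always reproduces the transcript exactly once boundary markers are removed, whatever the scoring function prefers. No retraining is needed for that guarantee.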

Mutually independent · SimPLe · Language modeling · Scaling · Unsupervised

October 23, 2023

Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

Wei Liu,Songlin Yang,Yoon Kim,Kewei Tu

from arxiv, Accepted to Findings of EMNLP, 2023

Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing. However, PCFGs scaled this way still perform poorly as language models, and even underperform similarly-sized HMMs. This work introduces SimplePCFG, a simple PCFG formalism with independent left and right productions. Despite imposing a stronger independence assumption than the low-rank approach, we find that this formalism scales more effectively both as a language model and as an unsupervised parser. As an unsupervised parser, our simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank PCFGs. We further introduce FlashInside, a hardware IO-aware implementation of the inside algorithm for efficiently scaling simple PCFGs.
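The effect of independent left and right productions on the inside algorithm can be sketched as follows. This is a plain-Python illustration and not FlashInside. It assumes that a binary rule score factors as `left[A][B] * right[A][C]` for a rule A -> B C, and it uses a single symbol set whose members can either emit a word or expand; both are simplifying assumptions, as are all names in the code.

```python
def inside_factored(words, emit, left, right, root=0):
    """Inside score of `root` spanning the whole sentence.

    emit[A]:     dict mapping a word to the score of symbol A emitting it.
    left[A][B]:  score of B as the left child of A (assumed factorization).
    right[A][C]: score of C as the right child of A (assumed factorization).
    """
    n = len(words)
    num = len(left)
    # beta[i][j][A]: inside score of symbol A spanning words[i:j]
    beta = [[[0.0] * num for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for a in range(num):
            beta[i][i + 1][a] = emit[a].get(word, 0.0)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for a in range(num):
                    # The sum over child pairs (B, C) splits into two
                    # independent sums, so no cubic loop over symbols is needed.
                    l = sum(left[a][b] * beta[i][k][b] for b in range(num))
                    r = sum(right[a][c] * beta[k][j][c] for c in range(num))
                    beta[i][j][a] += l * r
    return beta[0][n][root]
```

With m symbols, each split point costs on the order of m^2 operations here instead of the m^3 needed when every (A, B, C) triple carries its own score. A practical implementation would batch these sums as matrix products.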

Speech translation · End-to-end · Output · MoDELS · state-of-the-art

October 23, 2023

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

Christian Huber,Tu Anh Dinh,Carlos Mullov,Ngoc Quan Pham,Thai Binh Nguyen,Fabian Retkowski,Stefan Constantin,Enes Yavuz Ugan,Danni Liu,Zhaolin Li,Sai Koneru,Jan Niehues,Alexander Waibel

The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. It is therefore essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated, and it is often not possible to compare different approaches. In this work, we first propose a framework, the first of its kind, to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded and end-to-end systems. Finally, the framework allows automatic evaluation of translation quality and latency, and also provides a web interface to show the low-latency model outputs to the user.

Networking · Code · Channel · Wyner-Ziv · INFORMS

October 19, 2023

A Lego-Brick Approach to Coding for Network Communication

Nadim Ghaddar,Shouvik Ganguly,Lele Wang,Young-Han Kim

Coding schemes for several problems in network information theory are constructed starting from point-to-point channel codes that are designed for symmetric channels. Given that the point-to-point codes satisfy certain properties pertaining to the rate, the error probability, and the distribution of decoded sequences, bounds on the performance of the coding schemes are derived and shown to hold irrespective of other properties of the codes. In particular, we consider the problems of lossless and lossy source coding, Slepian-Wolf coding, Wyner-Ziv coding, Berger-Tung coding, multiple description coding, asymmetric channel coding, Gelfand-Pinsker coding, coding for multiple access channels, Marton coding for broadcast channels, and coding for cloud radio access networks (C-RANs). We show that the coding schemes can achieve the best known inner bounds for these problems, provided that the constituent point-to-point channel codes are rate-optimal. This would allow one to leverage commercial off-the-shelf codes for point-to-point symmetric channels in the practical implementation of codes over networks. Simulation results demonstrate the gain of the proposed coding schemes compared to existing practical solutions to these problems.

GPU · Graph · Better · Neural Networks · Visual question answering

March 31, 2020

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Difei Gao,Ke Li,Ruiping Wang,Shiguang Shan,Xilin Chen

from arxiv, Published as a CVPR2020 paper

Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, relying only on pre-trained word embedding models is far from sufficient. A desirable model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and clearly improves performance on two VQA tasks that require reading scene texts.

Performer · Discriminator · Positive example · False positive · Supervision

May 24, 2018

DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction

Pengda Qin,Weiran Xu,William Yang Wang

Distant supervision can effectively label data for relation extraction, but suffers from the noisy labeling problem. Recent works mainly apply soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making a hard decision about false positive samples at the sentence level. In this paper, we introduce an adversarial learning framework, which we name DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as the negative samples to train the discriminator. The optimal generator is obtained when the discrimination ability of the discriminator has the greatest decline. We adopt the generator to filter the distant supervision training dataset and redistribute the false positive instances into the negative set, thereby providing a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction compared to state-of-the-art systems.

Video captioning · Mask · MoDELS · End-to-end · Decoding

April 3, 2018

End-to-End Dense Video Captioning with Masked Transformer

Luowei Zhou,Yingbo Zhou,Jason J. Corso,Richard Socher,Caiming Xiong

from arxiv, To appear at CVPR18

Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e., an event proposal model and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevents direct influence of the language description on the event proposal, which is important for generating accurate descriptions. To address this problem, we propose an end-to-end transformer model for dense video captioning. The encoder encodes the video into appropriate representations. The proposal decoder decodes from the encoding with different anchors to form video event proposals. The captioning decoder employs a masking network to restrict its attention to the proposal event over the encoding feature. This masking network converts the event proposal to a differentiable mask, which ensures consistency between the proposal and the captioning during training. In addition, our model employs a self-attention mechanism, which enables the use of an efficient non-recurrent structure during encoding and leads to performance improvements. We demonstrate the effectiveness of this end-to-end model on the ActivityNet Captions and YouCookII datasets, where we achieve METEOR scores of 10.12 and 6.58, respectively.

Visual question answering · Top-down · Image captioning · Attention mechanism · Bottom-up

March 14, 2018

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson,Xiaodong He,Chris Buehler,Damien Teney,Mark Johnson,Stephen Gould,Lei Zhang

from arxiv, CVPR 2018 full oral, winner of the 2017 Visual Question Answering challenge

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
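The division of labor described above, region proposals with feature vectors from the bottom-up stage and feature weightings from the top-down stage, can be sketched with a softmax-weighted sum. This is a generic attention sketch and not the paper's model: the function names are hypothetical, and the relevance scores are assumed to come from some task-specific network that is not shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, relevance_scores):
    """Weight region features by normalized relevance and pool them.

    region_features:  one feature vector per proposed image region
                      (the bottom-up output, assumed given).
    relevance_scores: one unnormalized score per region
                      (the top-down signal, assumed given).
    Returns the attention weights and the pooled feature vector.
    """
    weights = softmax(relevance_scores)
    dim = len(region_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, pooled
```

With equal scores the pooled vector is the plain average of the region features. Raising one region's score shifts the pooled vector toward that region.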