The core requirement of massive Machine-Type Communication (mMTC) is to support reliable and fast access for an enormous number of machine-type devices (MTDs). In many practical applications, the base station (BS) is concerned only with the list of received messages rather than with the identity of their sources, which motivates the emerging concept of unsourced random access (URA). Although some massive multiple-input multiple-output (MIMO) URA schemes have been proposed recently, conventional URA schemes do not fully exploit the unique propagation properties of millimeter-wave (mmWave) massive MIMO systems. In grant-free random access, the BS cannot perform receive beamforming independently because the identities of the active users are unknown to it. Therefore, only the intrinsic beam division property can be exploited to improve the decoding performance. In this paper, a URA scheme based on beam-space tree decoding is proposed for mmWave massive MIMO systems. Specifically, two beam-space tree decoders are designed, based on hard decision and soft decision, respectively. Both leverage the beam division property to help discriminate the sub-blocks transmitted by different users. In addition, the first decoder reduces the search space and therefore has low complexity, while the second decoder exploits list decoding to recover misdetected packets. Simulation results verify that the proposed URA schemes outperform conventional URA schemes in terms of error probability.
Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick for designing novel algorithms that perform well in practice. In this paper, we systematically survey the types of theoretical analysis techniques that have been applied to edit distance and evaluate the extent to which each one has achieved these two goals. These techniques include traditional worst-case analysis, worst-case analysis parametrized by edit distance, entropy, or compressibility, average-case analysis, semi-random models, and advice-based models. We find that the track record is mixed. On one hand, two algorithms widely used in practice were born out of theoretical analysis, and their empirical performance is captured well by theoretical predictions. On the other hand, none of the algorithms developed since then using theoretical analysis as a yardstick has had any practical relevance. We conclude by discussing the remaining open problems and how they can be tackled.
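For reference, the quantity under discussion can be computed with the textbook quadratic-time dynamic program. The sketch below is only a baseline illustration; the abstract does not name the algorithms it surveys, and the function name `edit_distance` is an assumption introduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the textbook dynamic program.

    Runs in O(len(a) * len(b)) time and O(len(b)) extra space.
    """
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table is enough when just the distance is needed; recovering an actual alignment would require the full table or a divide-and-conquer scheme.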
Hierarchical Text Classification (HTC) is a challenging task in which a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies treat HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to guarantee the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
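The idea of constrained decoding with a dynamic vocabulary can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy `TAXONOMY`, the `END` token, the greedy search, and the restriction to a single root-to-leaf path are not taken from the paper, whose decoder generates a whole label tree.

```python
from typing import Callable, Dict, List

# Hypothetical toy taxonomy: each label maps to its child labels.
TAXONOMY: Dict[str, List[str]] = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Tennis"],
    "Physics": [],
    "Biology": [],
    "Tennis": [],
}
END = "<end>"


def allowed_next(path: List[str]) -> List[str]:
    """Dynamic vocabulary: children of the last label, plus END once a label exists."""
    current = path[-1] if path else "ROOT"
    return TAXONOMY[current] + ([END] if path else [])


def greedy_constrained_decode(score: Callable[[List[str], str], float]) -> List[str]:
    """Greedily pick the best-scoring token among those the taxonomy allows."""
    path: List[str] = []
    while True:
        candidates = allowed_next(path)
        best = max(candidates, key=lambda tok: score(path, tok))
        if best == END:
            return path
        path.append(best)
```

Because the candidate set is rebuilt from the taxonomy at every step, a label can never be emitted without its parent, no matter how the scorer ranks it.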
Autonomous marine vessels are expected to avoid inter-vessel collisions and comply with the international regulations for safe voyages. This paper presents a stepwise path planning method using stream functions. The dynamic flow of fluids is used as a guidance model, where collision avoidance in static environments is achieved by applying the circular theorem in the sink flow. We extend this method to dynamic environments by adding vortex flows to the flow field. The stream function is recursively updated to enable on-the-fly waypoint decisions. The vessel avoids collisions and also complies with several rules of the Convention on the International Regulations for Preventing Collisions at Sea. The method is conceptually and computationally simple and convenient to tune, yet versatile enough to handle complex and dense marine traffic with multiple dynamic obstacles. The ship dynamics are taken into account by using Bézier curves to generate a sufficiently smooth path with feasible curvature. Numerical simulations are conducted to verify the proposed method.
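The static case can be sketched as follows, assuming that the "circular theorem" refers to the Milne-Thomson circle theorem of potential flow: for a flow with complex potential f(z) and a circular obstacle of radius a centered at c, the potential w(z) = f(z) + conj(f(c + a^2 / conj(z - c))) makes the circle a streamline. The function names and parameter values are illustrative assumptions, not the paper's implementation.

```python
import cmath


def sink_potential(z: complex, sink: complex, strength: float) -> complex:
    """Complex potential of a point sink; streamlines converge on `sink`."""
    return -strength * cmath.log(z - sink)


def potential_with_obstacle(z: complex, sink: complex, strength: float,
                            center: complex, radius: float) -> complex:
    """Circle theorem: add the conjugate of the potential at the image point.

    Assumes the sink lies outside the circle and z != center.
    """
    image_point = center + radius ** 2 / (z - center).conjugate()
    return (sink_potential(z, sink, strength)
            + sink_potential(image_point, sink, strength).conjugate())


def stream_function(z: complex, sink: complex, strength: float,
                    center: complex, radius: float) -> float:
    """Stream function = imaginary part of the complex potential."""
    return potential_with_obstacle(z, sink, strength, center, radius).imag
```

On the obstacle boundary the image point coincides with z, so the two terms are complex conjugates and the stream function vanishes; the boundary is therefore a streamline that the flow cannot cross. Note that `cmath.log` has a branch cut along the negative real axis, so the stream function jumps by a multiple of 2π·strength across the ray extending from the sink in the negative real direction.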
Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis or Nonnegative Matrix Factorization, and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression processing for MSA.
In recent decades, the emergence of numerous novel algorithms has turned the proposal of metaphor-based intelligent optimization systems into a gimmick and has hindered researchers from exploring the essence of search behavior in algorithms. However, it is difficult to discuss the search behavior of an intelligent optimization algorithm directly, since there are so many kinds of intelligent schemes. To address this problem, this paper regards an intelligent optimization system as a simulated physical optimization system. The dynamic search behavior of such a simplified physical optimization system is investigated with quantum theory. To this end, the Schrödinger equation is employed as the dynamics equation of the optimization algorithm and is used to describe the dynamic search behavior in the evolution process. Moreover, to explore the basic behavior of the optimization system, the optimization problem is assumed to be decomposed and approximated. Correspondingly, the basic search behavior is derived, which constitutes the basic iterative process of a simple optimization system. The basic iterative process is compared with some classical bare-bones schemes to verify the similarity of search behavior under different metaphors. The search strategies of these bare-bones algorithms are analyzed through experiments.
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging applications in mind, in this work we present the proximal nested sampling methodology to objectively compare alternative Bayesian imaging models for applications that use images to inform decisions under uncertainty. The methodology is based on nested sampling, a Monte Carlo approach specialised for model comparison, and exploits proximal Markov chain Monte Carlo techniques to scale efficiently to large problems and to tackle models that are log-concave and not necessarily smooth (e.g., involving l_1 or total-variation priors). The proposed approach can be applied computationally to problems of dimension O(10^6) and beyond, making it suitable for high-dimensional inverse imaging problems. It is validated on large Gaussian models, for which the likelihood is available analytically, and subsequently illustrated on a range of imaging problems where it is used to analyse different choices of dictionary and measurement model.
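For orientation, the standard nested sampling identity that turns the marginal likelihood into a one-dimensional integral can be written as below. The notation (likelihood L, prior π, prior volume ξ, K live points, N iterations) is chosen here for illustration and is not taken from the paper.

```latex
\begin{align}
\mathcal{Z} &= \int \mathcal{L}(x)\,\pi(x)\,\mathrm{d}x
            = \int_0^1 \mathcal{L}^{\dagger}(\xi)\,\mathrm{d}\xi,
\qquad
\xi(L^{*}) = \int_{\mathcal{L}(x) > L^{*}} \pi(x)\,\mathrm{d}x, \\
\mathcal{Z} &\approx \sum_{i=1}^{N} \mathcal{L}_i\,(\xi_{i-1} - \xi_i),
\qquad \xi_0 = 1, \quad \xi_i \approx e^{-i/K}.
\end{align}
```

Here L†(ξ) denotes the inverse of ξ(L*), and L_i is the lowest likelihood among the K live points at iteration i. Two models are then compared through the ratio of their evidences Z. Each iteration requires a new sample from the prior restricted to the region where L(x) > L_i; in high dimensions this constrained sampling step is typically the computational bottleneck.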
Speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate the source transcript and the target translation text with a single decoder. This design benefits model training because large additional parallel text corpora can be fully exploited to enhance speech translation training. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the IWSLT2018 English-German dataset, and the TED English-Chinese dataset. Experiments show that the proposed COSTT outperforms or is on par with the previous state-of-the-art methods on the three datasets. We have released our code at //github.com/dqqcasia/st.
Fifth-generation (5G) applications have become increasingly popular in recent years as 5G network deployment has spread. For vehicular networks, mmWave band signals have been well studied and used for communication and sensing. In this work, we propose a new dynamic ray tracing algorithm that exploits spatial and temporal coherence. We evaluate its performance by comparing the results on typical vehicular communication scenarios with GEMV^2, which uses a combination of deterministic and stochastic models, and WinProp, which uses a deterministic model for simulations with given environment information. We also compare the performance of our algorithm on complex urban models and observe a reduction in computation time of 36% compared to GEMV^2 and of 30% compared to WinProp, while maintaining similar prediction accuracy.
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks, including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with the higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP can achieve strong performance under both the zero- and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
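The construction of a ROP training pair can be sketched roughly as follows. This is a simplified illustration under stated assumptions: it uses an unsmoothed maximum-likelihood unigram model, a fixed set size, and sampling with replacement, and all function names are hypothetical. The abstract does not specify these details, so the authors' actual procedure may differ.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def unigram_lm(doc_tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram language model of one document (no smoothing)."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def sample_word_set(lm: Dict[str, float], size: int, rng: random.Random) -> List[str]:
    """Draw `size` words from the document language model (with replacement)."""
    words = list(lm)
    return rng.choices(words, weights=[lm[w] for w in words], k=size)


def log_likelihood(word_set: List[str], lm: Dict[str, float]) -> float:
    """Log-probability of the word set under the document language model."""
    return sum(math.log(lm[w]) for w in word_set)


def make_preference_pair(doc_tokens: List[str], size: int,
                         rng: random.Random) -> Tuple[List[str], List[str]]:
    """Return (more representative set, less representative set)."""
    lm = unigram_lm(doc_tokens)
    first = sample_word_set(lm, size, rng)
    second = sample_word_set(lm, size, rng)
    if log_likelihood(first, lm) >= log_likelihood(second, lm):
        return first, second
    return second, first
```

A pairwise ranking loss over such pairs would then teach the model to score the first set above the second for the given document.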
Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
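A minimal sketch of self-attention weights with relative position representations on the keys is given below. It assumes that queries and keys are already projected, that relative distances are clipped to a maximum of k, and that one learned vector exists per clipped distance; the pure-Python helpers and their names are illustrative and do not reflect the efficient implementation the abstract mentions.

```python
import math
from typing import List

Vector = List[float]


def clipped_relative_positions(n: int, k: int) -> List[List[int]]:
    """Entry [i][j] is (j - i) clipped to [-k, k], shifted into the range 0..2k."""
    return [[max(-k, min(k, j - i)) + k for j in range(n)] for i in range(n)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def relative_attention_weights(queries: List[Vector], keys: List[Vector],
                               rel_key_emb: List[Vector], k: int) -> List[List[float]]:
    """Softmax over logits q_i . (k_j + a_ij) / sqrt(d), with a_ij a relative embedding.

    `rel_key_emb` must hold 2k + 1 vectors, one per clipped relative distance.
    """
    n, d = len(queries), len(queries[0])
    rel = clipped_relative_positions(n, k)
    weights = []
    for i in range(n):
        logits = [
            (dot(queries[i], keys[j]) + dot(queries[i], rel_key_emb[rel[i][j]]))
            / math.sqrt(d)
            for j in range(n)
        ]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Clipping means that all positions farther apart than k share one embedding, so the number of relative parameters stays fixed at 2k + 1 vectors regardless of sequence length.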