We propose a novel binary polarization shift keying modulation scheme for a line-of-sight environment by exploiting the polarization control ability of the reconfigurable intelligent surface (RIS). The RIS encodes the information data in terms of the polarization states of either the reflected wave from the RIS or the composite wireless channel between an RF source and receiver. In the first case, polarization mismatch correction becomes essential at the receiver. In the second case, the RIS pre-codes the reflected wave to compensate for the polarization mismatch, which allows non-coherent demodulation at the receiver.
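As a rough illustration of non-coherent binary polarization shift keying, the sketch below maps each bit to one of two orthogonal polarization states and detects it by comparing the power in the two branches. The bit-to-polarization mapping, the noise model, and all function names are assumptions made for this example; the RIS itself is not modeled, and the polarization mismatch is assumed to be already compensated, as in the second case described above.

```python
import cmath
import random

# Assumed mapping (not specified in the text): bit 0 -> horizontal, bit 1 -> vertical.
JONES = {0: (1 + 0j, 0j), 1: (0j, 1 + 0j)}

def modulate(bits):
    """Map each bit to a Jones vector (E_h, E_v)."""
    return [JONES[b] for b in bits]

def channel(symbols, sigma, rng):
    """Apply an unknown common phase and complex Gaussian noise to both branches."""
    out = []
    for e_h, e_v in symbols:
        phase = cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi))
        n_h = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        n_v = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append((phase * e_h + n_h, phase * e_v + n_v))
    return out

def demodulate(symbols):
    """Non-coherent detection: pick the polarization branch with more power."""
    return [0 if abs(e_h) ** 2 >= abs(e_v) ** 2 else 1 for e_h, e_v in symbols]

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
decoded = demodulate(channel(modulate(bits), sigma=0.1, rng=rng))
bit_errors = sum(b != d for b, d in zip(bits, decoded))
print("bit errors:", bit_errors)
```

Because the detector only compares branch powers, the unknown carrier phase applied in `channel` does not affect the decision, which is the sense in which the demodulation is non-coherent.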
In this work, we study the performance of the sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with $\ell_1$-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the rank of the true solution is unknown and over-estimated instead. The over-estimation of the rank gives rise to an over-parameterized model in which there are more degrees of freedom than needed. Such over-parameterization may lead to overfitting, or adversely affect the performance of the algorithm. We prove that a simple SubGM with small initialization is agnostic to both over-parameterization and noise in the measurements. In particular, we show that small initialization nullifies the effect of over-parameterization on the performance of SubGM, leading to an exponential improvement in its convergence rate. Moreover, we provide the first unifying framework for analyzing the behavior of SubGM under both outlier and Gaussian noise models, showing that SubGM converges to the true solution, even under arbitrarily large and arbitrarily dense noise values, and--perhaps surprisingly--even if the globally optimal solutions do not correspond to the ground truth. At the core of our results is a robust variant of the restricted isometry property, called Sign-RIP, which controls the deviation of the sub-differential of the $\ell_1$-loss from that of an ideal, expected loss. As a byproduct of our results, we consider a subclass of robust low-rank matrix recovery with Gaussian measurements, and show that the number of required samples to guarantee the global convergence of SubGM is independent of the over-parameterized rank.
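A minimal sketch of SubGM with small initialization on an over-parameterized $\ell_1$ matrix-sensing problem is given below. It assumes a symmetric positive semidefinite ground truth factored as $UU^\top$, Gaussian measurement matrices, and a geometrically decaying step size; the problem sizes, the step-size schedule, the outlier model, and the function name `subgm_l1` are illustrative assumptions, not settings taken from the work.

```python
import numpy as np

def subgm_l1(A, y, d, r_over, alpha=1e-3, eta0=0.1, rho=0.99, iters=1000, seed=0):
    """Sub-gradient method on f(U) = (1/m) * sum_k |<A_k, U U^T> - y_k|."""
    rng = np.random.default_rng(seed)
    U = alpha * rng.standard_normal((d, r_over))  # small random initialization
    A_sym = 0.5 * (A + A.transpose(0, 2, 1))      # <A, UU^T> only sees the symmetric part
    for t in range(iters):
        residual = np.einsum("kij,ij->k", A_sym, U @ U.T) - y
        # A sub-gradient of f at U: (2/m) * sum_k sign(residual_k) * A_sym_k @ U
        G = 2.0 * np.einsum("k,kij->ij", np.sign(residual), A_sym) @ U / len(y)
        U = U - eta0 * rho**t * G                 # geometrically decaying step (assumed)
    return U

# Illustrative problem: true rank 1, search rank 4, 10% grossly corrupted measurements.
rng = np.random.default_rng(1)
d, r_true, r_over, m = 8, 1, 4, 600
V = rng.standard_normal((d, r_true))
X_true = V @ V.T
X_true /= np.linalg.norm(X_true)                  # unit Frobenius norm
A = rng.standard_normal((m, d, d))
y = np.einsum("kij,ij->k", A, X_true)
outliers = rng.random(m) < 0.1
y[outliers] += 10.0 * rng.standard_normal(outliers.sum())

U = subgm_l1(A, y, d, r_over)
rel_err = np.linalg.norm(U @ U.T - X_true) / np.linalg.norm(X_true)
print("relative error:", rel_err)
```

The search rank `r_over` is deliberately larger than the true rank to mimic the over-estimated-rank setting; the small scale `alpha` is the knob the abstract refers to as small initialization.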
In this paper, the indoor dense space (IDS) channel at 28 GHz is characterized through extensive Ray-Tracing (RT) simulations. We consider IDS as a specific type of indoor environment with confined geometry and packed with humans, such as aircraft cabins and train wagons. Based on RT simulations, we characterize path loss, shadow fading, root-mean-square delay spread, Rician K-factor, azimuth/elevation angular spread of arrival/departure considering different RT simulation scenarios of the fuselage geometry, material, and human presence. While the large-scale fading parameters are similar to the state-of-the-art channel models, the small-scale fading parameters demonstrate richer multipath scattering in IDS, resulting in poorer bit error rate performance in comparison to the 3GPP indoor channel model.
With the rapid expansion of the Internet of Things, the efficient sharing of the wireless medium by a large number of simple transmitters is becoming essential. Scheduling-based solutions are inefficient for this setting, where small data units are broadcast sporadically by terminals that are idle most of the time. Modern random access has embraced the challenge and provides suitable slot-synchronous and asynchronous multiple access solutions based on replicating the packets and exploiting successive interference cancellation (SIC) at the receiver. In this work, we focus on asynchronous modern random access. Specifically, we derive an analytical approximation of the performance of irregular repetition ALOHA (IRA) in the so-called error floor region. Numerical results show the tightness of the derived approximation under various scenarios.
The quadruped robot is a versatile mobile platform with the potential for carrying high payloads. However, most existing quadruped robots aim at high maneuverability and highly dynamic, agile locomotion. Nevertheless, payload carrying remains an indispensable ability for quadruped robots, and the design of a quadruped robot with high payload capacity has yet to be deeply explored. In this study, a 50 kg electrically-actuated quadruped robot, Kirin, is presented to enhance payload carrying capability. Kirin is characterized by a prismatic quasi-direct-drive (QDD) leg, a mechanism that greatly augments the payload carrying capability. This study presents several design principles for payload-carrying-oriented quadruped robots, including the mechanical design, actuator parameter selection, and locomotion control method. The theoretical analysis implies that the lifting task tends to be a bottleneck for existing robots with articulated knee joints. By using the prismatic QDD leg, the payload carrying capability of Kirin is enhanced greatly. To demonstrate Kirin's payload carrying capability, preliminary experiments test up to 125 kg payload lifting in static stance and 50 kg payload carrying in dynamic trotting. Whole body compliance with payload carrying is also demonstrated.
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without accessing gradient information of a targeted ASR model directly. We propose an advanced Bayesian neural network (BNN) based adversarial detector, which could model latent distributions against adaptive adversarial perturbation with divergence measurement. We further simulate deployment scenarios of RNN Transducer, Conformer, and wav2vec-2.0 based ASR systems with the proposed adversarial detection system. Leveraging the proposed BNN based detection system, we improve detection rate by +2.77 to +5.42% (relative +3.03 to +6.26%) and reduce the word error rate by 5.02 to 7.47% on LibriSpeech datasets compared to the current model enhancement methods against the adversarial speech examples.
We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by kernels. The empirical estimator of the divergence is computed using the eigenvalues of positive definite Gram matrices that are obtained by evaluating the kernel over pairs of data points. The new measure shares several properties with the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the ordered spectrum of the Gram matrices and the integral operators associated with the population quantities. The proposed measure of divergence avoids the estimation of the probability distribution underlying the data. Numerical experiments involving comparing distributions and applications to sampling unbalanced data for classification show that the proposed divergence can achieve state-of-the-art results.
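The abstract does not state the estimator's formula, so the following is only one plausible sketch of a Jensen-Shannon-like divergence computed from Gram-matrix eigenvalues. It assumes a Gaussian kernel and uses the von Neumann entropy of trace-normalized Gram matrices; the function names and the bandwidth `sigma` are hypothetical choices for this example.

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    """Gram matrix of the Gaussian kernel evaluated over all pairs of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def von_neumann_entropy(K):
    """Entropy of the eigenvalues of the trace-normalized Gram matrix."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(lam * np.log(lam)))

def kernel_js_divergence(X, Y, sigma=1.0):
    """Entropy of the pooled sample minus the weighted entropies of each sample."""
    n, m = len(X), len(Y)
    Z = np.vstack([X, Y])
    h_mix = von_neumann_entropy(gaussian_gram(Z, Z, sigma))
    h_x = von_neumann_entropy(gaussian_gram(X, X, sigma))
    h_y = von_neumann_entropy(gaussian_gram(Y, Y, sigma))
    return h_mix - (n * h_x + m * h_y) / (n + m)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
Y_same = rng.standard_normal((100, 2))       # same distribution as X
Y_far = rng.standard_normal((100, 2)) + 3.0  # shifted distribution
d_same = kernel_js_divergence(X, Y_same)
d_far = kernel_js_divergence(X, Y_far)
print("same:", d_same, "shifted:", d_far)
```

No density is estimated at any point; only kernel evaluations between pairs of samples enter the computation, which matches the property highlighted in the abstract.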
Cell-free massive multiple-input multiple-output (MIMO) and intelligent reflecting surface (IRS) are considered as the prospective multiple antenna technologies for beyond the fifth-generation (5G) networks. Cell-free MIMO systems powered by IRSs, combining both technologies, can further improve the performance of cell-free MIMO systems at low cost and energy consumption. Prior works focused on instantaneous performance metrics and relied on alternating optimization algorithms, which impose huge computational complexity and signaling overhead. To address these challenges, we propose a novel two-step algorithm that provides the long-term passive beamformers at the IRSs using statistical channel state information (S-CSI) and short-term active precoders and long-term power allocation at the access points (APs) to maximize the minimum achievable rate. Simulation results verify that the proposed scheme outperforms benchmark schemes and brings a significant performance gain to the cell-free MIMO systems powered by IRSs.
The integration of Reconfigurable Intelligent Surfaces (RISs) into wireless environments endows channels with programmability, and is expected to play a key role in future communication standards. To date, most RIS-related efforts focus on quasi-free-space, where wireless channels are typically modeled analytically. Many realistic communication scenarios occur, however, in rich-scattering environments which, moreover, evolve dynamically. These conditions present a tremendous challenge in identifying an RIS configuration that optimizes the achievable communication rate. In this paper, we make a first step toward tackling this challenge. Based on a simulator that is faithful to the underlying wave physics, we train a deep neural network as a surrogate forward model to capture the stochastic dependence of wireless channels on the RIS configuration under dynamic rich-scattering conditions. Subsequently, we use this model in combination with a genetic algorithm to identify RIS configurations optimizing the communication rate. We numerically demonstrate the ability of the proposed approach to tune RISs to improve the achievable rate in rich-scattering setups.
Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints while neglecting the spatial relations established by their keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet it improves remarkably on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.
This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. The model is formulated as a conditional generative model based on the variational autoencoder (VAE) framework, with two levels of hierarchical latent variables. The first level is a categorical variable, which represents attribute groups (e.g. clean/noisy) and provides interpretability. The second level, conditioned on the first, is a multivariate Gaussian variable, which characterizes specific attribute configurations (e.g. noise level, speaking rate) and enables disentangled fine-grained control over these attributes. This amounts to using a Gaussian mixture model (GMM) for the latent distribution. Extensive evaluation demonstrates its ability to control the aforementioned attributes. In particular, we train a high-quality controllable TTS model on real found data, which is capable of inferring speaker and style attributes from a noisy utterance and using them to synthesize clean speech with controllable speaking style.
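The two-level latent structure can be sketched as ancestral sampling from a Gaussian mixture prior: first draw the categorical attribute group, then draw the continuous latent conditioned on it. The mixture weights, means, standard deviations, latent dimension, and function names below are hypothetical values chosen for illustration, and the encoder and decoder networks of the TTS model are not represented.

```python
import numpy as np

# Hypothetical prior with K = 2 attribute groups (e.g. clean/noisy) and a 3-D latent.
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
stds = np.ones((2, 3))  # diagonal standard deviations per component

def sample_latent(rng, component=None):
    """Draw y ~ Categorical(weights), then z ~ N(means[y], diag(stds[y]^2))."""
    y = int(rng.choice(len(weights), p=weights)) if component is None else component
    z = means[y] + stds[y] * rng.standard_normal(means.shape[1])
    return y, z

def component_posterior(z):
    """Posterior p(y | z) under the mixture prior (terms shared by all y cancel)."""
    log_p = np.log(weights) - np.sum(
        0.5 * ((z - means) / stds) ** 2 + np.log(stds), axis=1
    )
    p = np.exp(log_p - log_p.max())  # subtract the max for numerical stability
    return p / p.sum()

rng = np.random.default_rng(0)
y, z = sample_latent(rng)               # free sampling from the prior
_, z_clean = sample_latent(rng, 0)      # fix the group, vary the fine-grained latent
print(y, z, component_posterior(z_clean))
```

Fixing `component` while resampling `z` mirrors the idea of choosing an interpretable attribute group and then adjusting fine-grained attributes within it.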