To reduce multiuser interference and maximize spectral efficiency in frequency division duplexing massive multiple-input multiple-output (MIMO) systems, the downlink channel state information (CSI) estimated at the user equipment (UE) is required at the base station (BS). This paper presents a novel method for massive MIMO CSI feedback via a one-sided deep learning framework. The CSI is compressed via linear projections at the UE and recovered at the BS using deep plug-and-play priors (PPP). Instead of using handcrafted regularizers for the wireless channel responses, the proposed approach, named CSI-PPPNet, exploits a deep learning (DL) based denoiser in place of the proximal operator of the prior in an alternating optimization scheme. This way, a DL model trained once for denoising can be repurposed for CSI recovery tasks with arbitrary linear projections. In addition to this one-for-all property, and in comparison to the two-sided autoencoder-based CSI feedback architecture, the one-sided framework relieves the burden of joint model training and model delivery, and can be applied at UEs with limited memory and computational power. This opens new perspectives in the field of DL-based CSI feedback. Extensive experiments over open indoor and urban macrocell scenarios show the effectiveness of the proposed method.
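As context for the recovery step, the following is a minimal sketch of the plug-and-play idea described above: linear measurements are inverted by alternating a data-fidelity step with a learned denoiser standing in for the prior's proximal operator. The ADMM-style splitting, parameter values, and the `denoise` callable are illustrative assumptions, not the exact CSI-PPPNet algorithm.

```python
import numpy as np

def pnp_admm_recover(y, A, denoise, rho=1.0, n_iters=50):
    """Recover a vectorized channel x from compressed measurements y = A @ x
    by plug-and-play ADMM: the proximal step of the prior is replaced by a
    learned denoiser, so one denoising model serves any projection A."""
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    H = np.linalg.inv(AtA + rho * np.eye(n))  # data-fidelity solve, precomputed
    x = Aty.copy()
    z, u = x.copy(), np.zeros(n)
    for _ in range(n_iters):
        x = H @ (Aty + rho * (z - u))   # quadratic data-fidelity step
        z = denoise(x + u)              # prior step: denoiser as proximal operator
        u = u + x - z                   # dual (running residual) update
    return z

# Example with a stand-in denoiser (soft thresholding); in CSI recovery this
# would be the pretrained DL denoiser.
# x_hat = pnp_admm_recover(y, A, lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.1, 0.0))
```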
In wideband millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, channel estimation is challenging due to the hybrid analog-digital architecture, which compresses the received pilot signal and turns channel estimation into a compressive sensing (CS) problem. However, existing high-performance CS algorithms usually suffer from high complexity. On the other hand, the beam squint effect caused by the large bandwidth and massive number of antennas deteriorates estimation performance. In this paper, frequency-dependent angular dictionaries are first adopted to compensate for beam squint. Then, the expectation-maximization (EM)-based sparse Bayesian learning (SBL) algorithm is enhanced in two aspects: the E-step in each iteration is implemented by approximate message passing (AMP) to reduce complexity, while the M-step is realized by a deep neural network (DNN) to improve performance. In simulations, the proposed AMP-SBL unfolding-based channel estimator achieves satisfactory performance with low complexity.
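For reference, a textbook EM-SBL iteration for a sparse recovery problem y = Phi @ x + n is sketched below; the unfolded estimator described above replaces the exact E-step with AMP and the M-step update with a learned DNN, neither of which is reproduced in this baseline sketch.

```python
import numpy as np

def em_sbl(y, Phi, sigma2=1e-3, n_iters=50):
    """Textbook EM-based SBL for y = Phi @ x + n with x ~ N(0, diag(gamma)).
    E-step: Gaussian posterior of x; M-step: re-estimate the variances gamma."""
    n = Phi.shape[1]
    gamma = np.ones(n)      # per-coefficient prior variances
    mu = np.zeros(n)
    for _ in range(n_iters):
        # E-step: posterior covariance and mean under current hyperparameters.
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        # M-step: update variances from the posterior second moments.
        gamma = mu**2 + np.diag(Sigma)
    return mu, gamma
```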
Software Defined Networks have opened the door to statistical and AI-based techniques for improving the efficiency of networking, in particular for ensuring a given Quality of Service (QoS) for specific applications by routing packets with awareness of the nature of the content (VoIP, video, files, etc.) and its needs (latency, bandwidth, etc.), so as to use network resources efficiently. Monitoring and predicting various Key Performance Indicators (KPIs) at any level can address such problems while preserving network bandwidth. The question addressed in this work is the design of efficient, low-cost adaptive algorithms for KPI estimation, monitoring, and prediction. We focus on end-to-end latency prediction, for which we illustrate our approaches and results on data obtained from a public generator provided after the recent international challenge on GNNs [12]. In this paper, we improve our previously proposed low-cost estimators [6] by adding the adaptive dimension, and show that performance is only marginally affected while gaining the ability to track time-varying networks.
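As an illustration of what a low-cost adaptive estimator can look like, below is a recursive least squares predictor with a forgetting factor, which tracks a slowly varying network at quadratic per-update cost. The feature representation, forgetting factor, and initialization are placeholder choices; this is not the specific estimator of [6].

```python
import numpy as np

class AdaptiveRLS:
    """Recursive least squares with a forgetting factor: one classic flavor
    of low-cost adaptive estimator for tracking a time-varying relation."""
    def __init__(self, n_features, lam=0.98, delta=100.0):
        self.w = np.zeros(n_features)        # linear predictor weights
        self.P = delta * np.eye(n_features)  # inverse correlation estimate
        self.lam = lam                       # forgetting factor; <1 discounts old data

    def update(self, x, y):
        """Predict the latency for feature vector x, then adapt to the
        observed value y in O(n_features^2) time."""
        y_hat = self.w @ x
        k = self.P @ x / (self.lam + x @ self.P @ x)   # gain vector
        self.w = self.w + k * (y - y_hat)              # correct the weights
        self.P = (self.P - np.outer(k, x @ self.P)) / self.lam
        return y_hat
```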
Federated learning (FL) is a decentralized paradigm for training models on data distributed across client devices. Coded computing (CC) is a method for mitigating the effect of straggling workers in a centralized computing network by using erasure-coding techniques. In this work, we propose approximating the inverse of a data matrix whose data is generated by clients, similar to the FL paradigm, while also being resilient to stragglers. To do so, we propose a CC method based on gradient coding. We modify this method so that the coordinator does not need access to the local data, the network we consider is not centralized, and the communications that take place are secure against potential eavesdroppers.
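To make the objective concrete: approximating A^{-1} can be cast as gradient descent on f(X) = 0.5 * ||A X - I||_F^2, whose gradient A^T (A X - I) decomposes into per-client terms over row blocks of A, which is what makes gradient coding applicable. The sketch below shows only this uncoded decomposition under illustrative partitioning and step-size assumptions; the coding, decoding, and security layers of the paper are omitted.

```python
import numpy as np

def approx_inverse_distributed(A, n_clients=4, n_iters=500):
    """Approximate A^{-1} by gradient descent on f(X) = 0.5 * ||A X - I||_F^2,
    with the gradient formed as a sum of per-client partial gradients."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.zeros((n, n))
    blocks = np.array_split(np.arange(n), n_clients)  # client i holds rows blocks[i]
    lr = 1.0 / np.linalg.norm(A, 2) ** 2              # step size from spectral norm
    for _ in range(n_iters):
        # Each client computes its partial gradient from local rows only;
        # the coordinator aggregates the (in the paper, coded) contributions.
        grad = sum(A[b].T @ (A[b] @ X - I[b]) for b in blocks)
        X -= lr * grad
    return X
```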
Cell-free massive MIMO is emerging as a promising technology for future wireless communication systems and is expected to offer uniform coverage and high spectral efficiency compared to classical cellular systems. In this paper, we study how cell-free massive MIMO can support federated edge learning. Taking advantage of the additive nature of the wireless multiple access channel, over-the-air computation is exploited, whereby the clients send their local updates simultaneously over the same communication resource. This approach, known as over-the-air federated learning (OTA-FL), has been shown to alleviate the communication overhead of federated learning over wireless networks. Accounting for channel correlation and assuming that only imperfect channel state information is available at the central server, we propose a practical implementation of OTA-FL over cell-free massive MIMO. The convergence of the proposed implementation is studied analytically and experimentally, confirming the benefits of cell-free massive MIMO for OTA-FL.
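The core of over-the-air computation can be illustrated with a toy simulation: under channel-inversion precoding, the simultaneously transmitted updates superpose into a noisy version of their sum. The scalar channels, power-control rule, and noise model below are simplifying assumptions; the cell-free topology and imperfect-CSI analysis of the paper are not modeled.

```python
import numpy as np

def ota_aggregate(local_updates, channels, noise_std=0.01, p_max=1.0):
    """Toy over-the-air aggregation: each client inverts its (known) scalar
    channel so the simultaneous transmissions superpose into the sum of
    updates at the receiver; AWGN models the multiple access channel."""
    k = len(local_updates)
    d = local_updates[0].size
    # Common scaling chosen so the weakest channel still meets its power budget.
    eta = min(p_max * abs(h) ** 2 for h in channels)
    rx = np.zeros(d, dtype=complex)
    for g, h in zip(local_updates, channels):
        rx += h * (np.sqrt(eta) / h) * g              # signals add in the air
    rx += noise_std * (np.random.randn(d) + 1j * np.random.randn(d)) / np.sqrt(2)
    return (rx / (k * np.sqrt(eta))).real             # estimate of the average update
```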
Orthogonal time frequency space (OTFS) modulation is a recently proposed delay-Doppler (DD) domain communication scheme, which has shown promising performance in general wireless communications, especially over high-mobility channels. In this paper, we investigate DD domain Tomlinson-Harashima precoding (THP) for downlink multiuser multiple-input multiple-output OTFS (MU-MIMO-OTFS) transmissions. Instead of directly applying THP based on the huge equivalent channel matrix, we propose a simple implementation of THP that requires neither matrix decomposition nor inversion. This simple implementation is enabled by the DD domain channel property that different resolvable paths do not share the same delay and Doppler shifts, which makes it possible to pre-cancel all DD domain interference in a symbol-by-symbol manner. We also study the achievable rate performance of the proposed scheme by leveraging information-theoretic equivalent models. In particular, we show that the proposed scheme achieves near-optimal performance in the high signal-to-noise ratio (SNR) regime. More importantly, scaling laws for the achievable rate with respect to the numbers of antennas and users are derived, which indicate that the achievable rate increases logarithmically with the number of antennas and linearly with the number of users. Our numerical results align well with these findings and demonstrate a significant improvement over existing MU-MIMO schemes based on OTFS and orthogonal frequency-division multiplexing (OFDM).
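To fix ideas, a generic symbol-by-symbol THP loop is sketched below: each symbol has the interference from already-precoded symbols pre-subtracted, and the result is folded back into the signal region by a modulo operation so the transmit power stays bounded. The strictly lower-triangular feedback matrix B and the modulo bound are illustrative; the DD domain structure that lets the paper avoid matrix decomposition and inversion altogether is not reproduced.

```python
import numpy as np

def thp_precode(symbols, B, mod_bound=2.0):
    """Generic symbol-by-symbol Tomlinson-Harashima precoding with a strictly
    lower-triangular feedback matrix B and a symmetric modulo operation."""
    def mod_op(v):
        fold = lambda r: np.mod(r + mod_bound, 2.0 * mod_bound) - mod_bound
        return fold(v.real) + 1j * fold(v.imag)       # fold each real dimension

    x = np.zeros(len(symbols), dtype=complex)
    for k in range(len(symbols)):                     # symbol-by-symbol
        interference = B[k, :k] @ x[:k]               # from already-precoded symbols
        x[k] = mod_op(symbols[k] - interference)
    return x
```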
In autonomous robot exploration tasks, a mobile robot needs to actively explore and map an unknown environment as quickly as possible. Since the environment is revealed during exploration, the robot needs to frequently re-plan its path online as new information is acquired by onboard sensors and used to update its partial map. While state-of-the-art exploration planners are frontier- and sampling-based, we, encouraged by recent developments in deep reinforcement learning (DRL), propose ARiADNE, an attention-based neural approach for real-time, non-myopic path planning in autonomous exploration. ARiADNE is able to learn dependencies at multiple spatial scales between areas of the agent's partial map, and implicitly predict the potential gains associated with exploring those areas. This allows the agent to sequence movement actions that balance the natural trade-off between exploitation/refinement of the map in known areas and exploration of new areas. We experimentally demonstrate that our method outperforms both learning- and non-learning-based state-of-the-art baselines in terms of average trajectory length to complete exploration in hundreds of simplified 2D indoor scenarios. We further validate our approach in high-fidelity Robot Operating System (ROS) simulations, where we consider a real sensor model and a realistic low-level motion controller, toward deployment on real robots.
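A minimal sketch of the attention-over-graph idea follows: self-attention encodes the nodes of the agent's partial map so that each reachable neighbor can be scored as the next waypoint. The feature dimensions, layer counts, and readout are placeholders, and this is not the actual ARiADNE architecture or its training setup.

```python
import torch
import torch.nn as nn

class AttentionPlanner(nn.Module):
    """Sketch: self-attention encodes the nodes of a partial-map graph, then
    the agent's neighboring nodes are scored as candidate next waypoints."""
    def __init__(self, in_dim=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.score = nn.Linear(d_model, 1)

    def forward(self, node_feats, neighbor_idx):
        # node_feats: (B, N, in_dim), e.g. positions plus frontier/utility flags.
        # neighbor_idx: (B, K) indices of nodes reachable from the agent.
        h = self.encoder(self.embed(node_feats))                  # (B, N, d_model)
        idx = neighbor_idx.unsqueeze(-1).expand(-1, -1, h.size(-1))
        cand = h.gather(1, idx)                                   # (B, K, d_model)
        return torch.softmax(self.score(cand).squeeze(-1), dim=-1)  # action probs
```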
Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, and resources required to train, among other factors, have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide, along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey provides the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately obtain significant improvements, and also equips them with ideas for further research and experimentation to achieve additional gains.
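As a taste of the generic efficiency techniques in scope here, post-training dynamic quantization is one example that can shrink model size with minimal code changes; the toy model below is a placeholder, and the actual gains depend on the model and hardware.

```python
import torch
import torch.nn as nn

# A placeholder model standing in for any deployed network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Post-training dynamic quantization: weights of the listed module types are
# stored in int8 and dequantized on the fly, often reducing size and latency
# for CPU inference with no retraining.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear},
                                                   dtype=torch.qint8)
```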
Spatiotemporal modeling networks and their complexity are the two most intensively studied topics in video action recognition. Existing state-of-the-art methods achieve excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions remain slightly inferior in performance. In this paper, we attempt to achieve both efficiency and effectiveness simultaneously. First, besides traditionally treating H x W x T video frames as a space-time signal (viewed from the Height-Width spatial plane), we propose to also model video from the other two planes, Height-Time and Width-Time, to capture the dynamics of video thoroughly. Second, our model is designed on top of 2D CNN backbones, with model complexity kept well in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module that exploits video dynamics using separable convolutions for efficiency. It is a plug-and-play module that can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework that can be specialized into existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments are conducted on popular benchmarks (i.e., Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) to show its superiority. The proposed MVFNet achieves state-of-the-art performance with the complexity of a 2D CNN.
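One plausible reading of the multi-view idea is sketched below: depthwise (channel-separable) 3D convolutions over the Height-Width, Height-Time, and Width-Time planes of a clip tensor, fused additively with a residual path. The kernel shapes and fusion rule are assumptions for illustration and do not reproduce the exact MVF module.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Sketch: depthwise 3D convolutions over the H-W, H-T, and W-T planes
    of a (B, C, T, H, W) clip, fused by summation with a residual path."""
    def __init__(self, channels):
        super().__init__()
        def plane_conv(kernel, pad):   # depthwise keeps the cost near-2D
            return nn.Conv3d(channels, channels, kernel, padding=pad,
                             groups=channels, bias=False)
        self.hw = plane_conv((1, 3, 3), (0, 1, 1))  # Height-Width (spatial) view
        self.ht = plane_conv((3, 3, 1), (1, 1, 0))  # Height-Time view
        self.wt = plane_conv((3, 1, 3), (1, 0, 1))  # Width-Time view

    def forward(self, x):
        # x: (B, C, T, H, W); sum the three views and keep a residual path.
        return x + self.hw(x) + self.ht(x) + self.wt(x)
```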
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.
Image segmentation is considered one of the most critical tasks in hyperspectral remote sensing image processing. Recently, the convolutional neural network (CNN) has established itself as a powerful model for segmentation and classification by demonstrating excellent performance. The use of a graphical model such as a conditional random field (CRF) contributes further by capturing contextual information and thus improving segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of a CNN and a CRF. We use multiple spectral cubes to learn deep features with a CNN, and then formulate a deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Efficient piecewise training is applied to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset, and evaluate our proposed method on it along with several widely adopted benchmark datasets. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.
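For context on what iterative CRF inference involves (and hence what the piecewise training sidesteps), below is one generic mean-field-style update that combines CNN class scores with an affinity-weighted message from neighboring patches. The affinity matrix, weight, and update rule are illustrative and not the paper's deep CRF formulation.

```python
import numpy as np

def mean_field_step(unary_logits, affinity, w=1.0):
    """One generic mean-field-style CRF update: combine CNN class scores
    (unary_logits, shape (N, L) for N patches and L labels) with an
    affinity-weighted message from neighboring patches (affinity, (N, N)),
    encouraging similar patches to take the same label."""
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    q = softmax(unary_logits)          # current per-patch label marginals
    message = affinity @ q             # aggregate the neighbors' beliefs
    return softmax(unary_logits + w * message)
```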