Owing to the widespread adoption of the Internet of Things, a vast amount of sensor data is being acquired in real time, and the cost of communicating data from edge devices is increasing accordingly. Compressed sensing (CS), a data compression method that can run on edge devices, has attracted attention as a way to reduce communication costs. In CS, estimating an appropriate compression ratio is important. Reinforcement learning (RL) can be used to adaptively estimate the compression ratio for the acquired data; however, the computational costs of existing RL methods that can run on edge devices are often high. In this study, we develop an efficient RL method for edge devices, referred to as the actor-critic online sequential extreme learning machine (AC-OSELM), and a system that compresses data by estimating an appropriate compression ratio on the edge using AC-OSELM. The compression-ratio estimation performance of the proposed method is evaluated by comparing it with other RL methods for edge devices. The experimental results indicate that AC-OSELM achieves the same or better compression performance and faster compression ratio estimation than the existing methods.
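To make the learning machinery concrete, below is a minimal NumPy sketch of the online sequential extreme learning machine (OS-ELM) update that AC-OSELM builds on: a fixed random hidden layer whose output weights are refined by a recursive least-squares rule. The class and parameter names are illustrative, and the actor/critic roles, state features, and reward used for compression-ratio selection are not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class OSELM:
    """Minimal online sequential ELM: random fixed hidden layer, with the
    output weights updated chunk by chunk via recursive least squares."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))   # fixed random input weights
        self.b = rng.normal(size=n_hidden)           # fixed random biases
        self.beta = np.zeros((n_hidden, n_out))      # trainable output weights
        self.P = None                                # inverse correlation matrix

    def _hidden(self, X):
        return sigmoid(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        """Initial batch: closed-form regularized least squares."""
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + 1e-3 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        """Sequential update for a new chunk of data (Woodbury identity)."""
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```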
Channel modeling is a fundamental task for the design and evaluation of wireless technologies and networks, before actual prototyping, commercial product development, and real deployments. The recent trends of current and future mobile networks, which include large antenna systems, massive deployments, and high-frequency bands, require complex channel models for the accurate simulation of massive MIMO in the mmWave and THz bands. To address the complexity/accuracy trade-off, a spatial channel model has been defined by 3GPP (TR 38.901), which has been shown to be the main bottleneck of current system-level simulations in ns-3. In this paper, we focus on improving the channel modeling efficiency for large-scale MIMO system-level simulations. Extensions are developed in two directions. First, we improve the efficiency of the current 3GPP TR 38.901 implementation in ns-3 by allowing the use of the Eigen library for more efficient matrix algebra operations, among other optimizations and a more modular code structure. Second, we propose a new performance-oriented MIMO channel model of reduced complexity, as an alternative model suitable for the mmWave/THz bands, and calibrate it against the 3GPP TR 38.901 model. Simulation results demonstrate the proper calibration of the newly introduced model for various scenarios and channel conditions, and exhibit an effective reduction of the simulation time (up to 16 times compared to the previous baseline) thanks to the proposed improvements.
The issue of over-limit events during passenger aircraft flights has drawn increasing attention in civil aviation due to its potential safety risks. Real-time automated warning systems are essential to address this issue. In this study, a real-time warning model for civil aviation over-limit events is proposed based on QAR data monitoring. First, attributes highly correlated with over-limit events are extracted from a large QAR dataset using the Spearman rank correlation coefficient. Because flight over-limit detection is an imbalanced binary classification problem, cost-sensitive learning is incorporated into the LSTM model. Finally, the time-step length, number of LSTM cells, and learning rate of the LSTM model are optimized using a grid search. The model is trained on a real dataset, and its performance is evaluated on a validation set. The experimental results show that the proposed model achieves an F1 score of 0.991 and an accuracy of 0.978, indicating its effectiveness for real-time warning of civil aviation over-limit events.
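As a rough illustration of the pipeline described above, the sketch below selects attributes by Spearman rank correlation and builds a cost-sensitive LSTM classifier via class weights. It assumes QAR records arranged as fixed-length time windows; the correlation threshold, window length, network size, and class weights are illustrative placeholders rather than the paper's tuned values (which the paper obtains by grid search over the time-step length, number of LSTM cells, and learning rate).

```python
import numpy as np
from scipy.stats import spearmanr
import tensorflow as tf

def select_features(X_flat, y, threshold=0.3):
    """Keep attributes whose Spearman rank correlation with the
    over-limit label exceeds a chosen threshold (illustrative value)."""
    keep = []
    for j in range(X_flat.shape[1]):
        rho, _ = spearmanr(X_flat[:, j], y)
        if abs(rho) >= threshold:
            keep.append(j)
    return keep

def build_model(time_steps, n_features, lstm_units=64, lr=1e-3):
    """Simple LSTM binary classifier; hyperparameters are placeholders
    for the ones the paper tunes by grid search."""
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(lstm_units, input_shape=(time_steps, n_features)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Cost-sensitive learning: weight the rare over-limit class more heavily,
# e.g. inversely proportional to class frequency (illustrative weights).
# model.fit(X_train, y_train, class_weight={0: 1.0, 1: 10.0}, epochs=20)
```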
The rapid advancement of quantum computing has led to an extensive demand for effective techniques to extract classical information from quantum systems, particularly in fields such as quantum machine learning and quantum chemistry. However, quantum systems are inherently susceptible to noise, which corrupts the information encoded in them. In this work, we introduce an efficient algorithm that can recover information from quantum states under Pauli noise. The core idea is to learn the necessary information about the unknown Pauli channel by post-processing the classical shadows of the channel. For a local and bounded-degree observable, only partial knowledge of the channel, rather than its complete classical description, is required to recover the ideal information, resulting in a polynomial-time algorithm. This contrasts with conventional methods such as probabilistic error cancellation, which require full information about the channel and exhibit exponential scaling with the number of qubits. We also prove that this scalable method is optimal in sample complexity and generalise the algorithm to the weight-contracting channel. Furthermore, we demonstrate the validity of the algorithm on the 1D anisotropic Heisenberg-type model via numerical simulations. As a notable application, our method can serve as a sample-efficient error mitigation scheme for Clifford circuits.
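The recovery step can be illustrated with a small sketch. For a Pauli channel, every Pauli observable is an eigenoperator, so its noisy expectation equals the ideal one scaled by the channel's Pauli fidelity; dividing by the (estimated) fidelities of only the Pauli strings appearing in a local observable recovers the ideal expectation. The estimator for the fidelities themselves (post-processing classical shadows of the channel) is not reproduced, and the Pauli strings, coefficients, and fidelity values below are illustrative.

```python
def mitigate_pauli_expectations(noisy_expvals, pauli_fidelities):
    """For a Pauli channel, <P>_noisy = lambda_P * <P>_ideal, where
    lambda_P is the Pauli fidelity of the Pauli string P. Dividing by
    the estimated fidelity recovers the ideal expectation for exactly
    the Pauli terms the observable needs."""
    return {p: v / pauli_fidelities[p] for p, v in noisy_expvals.items()}

def observable_expectation(coeffs, mitigated):
    """Recombine O = sum_P c_P P from its mitigated Pauli expectations."""
    return sum(c * mitigated[p] for p, c in coeffs.items())

# Illustrative numbers: a 2-local observable on a larger register only needs
# the fidelities of its own Pauli strings, not the full channel description.
noisy = {"ZZII": 0.41, "XIXI": 0.18}
fid = {"ZZII": 0.82, "XIXI": 0.60}          # estimated fidelities (placeholders)
coeffs = {"ZZII": 0.5, "XIXI": 1.0}
print(observable_expectation(coeffs, mitigate_pauli_expectations(noisy, fid)))
```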
Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: 1) conventional GP inference scales as $O(N^{3})$ with respect to the number of observations; 2) updating a GP model sequentially is not trivial; and 3) covariance kernels typically enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can successfully implement an optimization algorithm using online Gaussian process bandits.
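For reference, the sketch below shows exact GP regression with a stationary RBF kernel; the Cholesky factorization of the $N \times N$ kernel matrix is the $O(N^{3})$ step that motivates the sequential Monte Carlo, mixture-of-experts approach proposed here (which is not shown). Kernel hyperparameters and the noise level are illustrative.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Stationary squared-exponential kernel (illustrative hyperparameters)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-2):
    """Exact GP regression posterior mean and covariance at test inputs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                     # the O(N^3) bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, X_star)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = rbf_kernel(X_star, X_star) - v.T @ v
    return mean, cov
```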
Federated edge learning (FEEL) is a popular distributed framework for privacy-preserving learning at the edge, in which densely distributed edge devices periodically exchange model updates with the server to complete the global model training. Due to limited bandwidth and the uncertain wireless environment, FEEL may impose a heavy burden on the current communication system. In addition, under the common FEEL framework, the server must wait for the slowest device to finish uploading its update before starting the aggregation process, leading to the straggler issue and prolonged communication time. In this paper, we propose to accelerate FEEL in two ways: 1) performing data compression on the edge devices and 2) setting a deadline on the edge server to exclude the straggler devices. However, these operations introduce gradient compression errors and transmission outages, respectively, which also affect the convergence of FEEL. In view of these practical issues, we formulate a training time minimization problem in which the compression ratio and the deadline are to be optimized. To this end, an asymptotically unbiased aggregation scheme is first proposed to ensure a zero optimality gap after convergence, and the impact of compression error and transmission outage on the overall training time is quantified through convergence analysis. The formulated problem is then solved in an alternating manner, from which the joint compression and deadline optimization (JCDO) algorithm is derived. Numerical experiments for different FEEL use cases, including image classification and autonomous driving, show that the proposed method is nearly 30X faster than the vanilla FedAVG algorithm and outperforms state-of-the-art schemes.
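As a simple illustration of the first ingredient, the sketch below implements top-k gradient sparsification on a device, an assumed instance of data compression at the edge; the paper's actual compression operator, the unbiasedness correction of the aggregation scheme, and the deadline/outage handling of JCDO are not reproduced here.

```python
import numpy as np

def topk_compress(grad, ratio=0.1):
    """Keep only the largest-magnitude fraction `ratio` of gradient entries
    (illustrative sparsification; `ratio` plays the role of a compression
    ratio to be chosen)."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest
    return idx, flat[idx], grad.shape

def topk_decompress(idx, values, shape):
    """Server side: scatter the received values back into a dense gradient."""
    out = np.zeros(np.prod(shape))
    out[idx] = values
    return out.reshape(shape)
```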
We consider a joint sampling and compression system for timely status updates. Samples are taken, quantized, and encoded into binary sequences, which are sent to the destination. We formulate an optimization problem to jointly design the sampler, quantizer, and encoder, minimizing the age of information (AoI) subject to a mean-squared error (MSE) distortion constraint on the samples. We prove that the zero-wait sampling, uniform quantization, and real-valued AoI-optimal coding policies together provide an asymptotically optimal solution to this problem, i.e., as the average distortion approaches zero, the combination achieves the minimum AoI asymptotically. Furthermore, we prove that the AoI of this solution is asymptotically linear in the log MSE distortion with a slope of $-\frac{3}{4}$. We also show that the real-valued Shannon coding policy suffices to achieve the optimal performance asymptotically. Numerical simulations corroborate the analysis.
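Stated as a formula, the slope result says the following, writing $\Delta^{\mathrm{opt}}(D)$ for the minimum achievable AoI under MSE distortion $D$ (notation introduced here only for illustration):
$$\Delta^{\mathrm{opt}}(D) = -\frac{3}{4}\log D + o\big(\lvert\log D\rvert\big), \qquad D \to 0.$$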
Approximate computing (AC) has become a prominent solution for improving the performance, area, and power/energy efficiency of a digital design at the cost of output accuracy. We propose a novel scalable approximate multiplier that utilizes a lookup table-based compensation unit. To improve energy efficiency, the input operands are truncated to a reduced bitwidth representation (e.g., h bits) based on their leading-one positions. A curve-fitting method is then employed to map the product term to a linear function, and a piecewise-constant error-correction term is used to reduce the approximation error. To compute this piecewise-constant error-compensation term, we partition the function space into M segments and obtain the compensation factor for each segment by averaging the errors within it. The multiplier supports various degrees of truncation and error compensation to exploit the accuracy-efficiency trade-off. The proposed approximate multiplier offers better error metrics, such as the mean and standard deviation of the absolute relative error (MARED and StdARED), compared to a state-of-the-art integer approximate multiplier. It improves the MARED and StdARED by about 38% and 32%, respectively, when its energy consumption is approximately equal to that of the state-of-the-art approximate multiplier. Moreover, the performance of the proposed approximate multiplier is evaluated in image classification applications using a Deep Neural Network (DNN). The results indicate that the degradation of DNN accuracy is negligible, largely owing to the compensation properties of our approximate multiplier.
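A behavioral sketch of the truncation and lookup-table compensation is given below. It truncates each operand to its h most significant bits starting at the leading one, multiplies the truncations, shifts back, and adds a per-segment compensation value. The curve-fitted linear mapping of the product term and the offline averaging that fills the LUT are not reproduced, so the segmentation rule and the (empty) placeholder LUT here are illustrative.

```python
def truncate_msbs(x, h):
    """Keep the h most significant bits of x, starting at the leading one.
    Returns the truncated value and the shift needed to scale it back."""
    shift = max(0, x.bit_length() - h)
    return x >> shift, shift

def approx_multiply(a, b, h=4, lut=None, segments=8):
    """Behavioral model: multiply h-bit truncations of the operands, shift
    back, then add a piecewise-constant compensation term looked up by
    segment. In the real design the LUT entries are precomputed offline by
    averaging the error over each segment; here they default to none."""
    if a == 0 or b == 0:
        return 0
    at, sa = truncate_msbs(a, h)
    bt, sb = truncate_msbs(b, h)
    product = (at * bt) << (sa + sb)
    if lut is not None:
        seg = ((at * bt) * segments) >> (2 * h)   # map truncated product to a segment
        product += lut[seg] << (sa + sb)          # scaled compensation term
    return product

print(approx_multiply(200, 117), 200 * 117)   # approximate vs exact product
```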
Due to the ever-increasing data rate demands of beyond-5G networks and the widespread use of the Orthogonal Frequency Division Multiplexing (OFDM) technique in cellular systems, it is critical to reduce the pilot overhead of OFDM systems in order to increase their data rate. Owing to the sparsity of multipath channels, sparse recovery methods can be exploited to reduce pilot overhead, with the OFDM pilots serving as random samples for channel impulse response estimation. We propose a three-step sparse recovery algorithm based on sparsity-domain smoothing, consisting of time-domain residue computation, sparsity-domain smoothing, and adaptive-thresholding sparsification. To the best of our knowledge, the proposed sparsity-domain-smoothing-based thresholding recovery method, referred to as SDS-IMAT, has not previously been used for OFDM sparse channel estimation in the literature. Pilot locations are also derived by minimizing the coherence of the measurement matrix. Numerical results verify that the proposed scheme outperforms existing thresholding and greedy recovery methods and achieves near-optimal performance. Its effectiveness is shown in terms of mean square error and bit error rate.
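For intuition about the thresholding step, the sketch below shows an iterative recovery loop with an exponentially decaying (adaptive) hard threshold in the spirit of IMAT; the time-domain residue computation and sparsity-domain smoothing that distinguish SDS-IMAT, as well as the pilot-based measurement matrix design, are not reproduced, and the step size and threshold schedule are illustrative.

```python
import numpy as np

def imat_recover(A, y, n_iter=50, lam=0.5, beta=1.0, alpha=0.1):
    """Illustrative iterative hard-thresholding recovery of a sparse vector
    x from measurements y = A x, using an exponentially decaying threshold."""
    x = np.zeros(A.shape[1])
    for k in range(n_iter):
        x = x + lam * A.T @ (y - A @ x)          # gradient step toward data fit
        tau = beta * np.exp(-alpha * k)          # adaptive threshold schedule
        x = np.where(np.abs(x) >= tau, x, 0.0)   # hard thresholding (sparsify)
    return x
```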
In many neuromorphic workflows, simulators play a vital role in important tasks such as training spiking neural networks (SNNs), running neuroscience simulations, and designing, implementing, and testing neuromorphic algorithms. Currently available simulators cater either to neuroscience workflows (such as NEST and Brian2) or to deep learning workflows (such as BindsNET). While the neuroscience-oriented simulators are slow and not very scalable, the deep learning-oriented simulators do not support certain functionalities, such as synaptic delays, that are typical of neuromorphic workloads. In this paper, we address this gap in the literature and present SuperNeuro, a fast and scalable simulator for neuromorphic computing that is capable of both homogeneous and heterogeneous simulations as well as GPU acceleration. We also present preliminary results comparing SuperNeuro to widely used neuromorphic simulators such as NEST, Brian2, and BindsNET in terms of computation time. We demonstrate that SuperNeuro can be approximately 10--300 times faster than some of the other simulators for small sparse networks. On large sparse and large dense networks, SuperNeuro can be approximately 2.2 and 3.4 times faster than the other simulators, respectively.
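To illustrate why homogeneous, matrix-based simulation is fast (and GPU-friendly), the sketch below advances a population of leaky integrate-and-fire neurons by one time step using dense array operations. This is not SuperNeuro's API; the neuron model, parameter values, and the absence of synaptic delays are simplifying assumptions.

```python
import numpy as np

def lif_step(v, spikes_in, W, v_th=1.0, v_reset=0.0, leak=0.9):
    """One matrix-based update of a leaky integrate-and-fire population:
    leak the membrane potentials, add weighted presynaptic spikes, emit
    spikes where the threshold is crossed, and reset those neurons.
    A homogeneous population lets the whole step run as array math."""
    v = leak * v + W.T @ spikes_in                 # integrate weighted input spikes
    spikes_out = (v >= v_th).astype(float)         # threshold crossing
    v = np.where(spikes_out > 0, v_reset, v)       # reset spiking neurons
    return v, spikes_out
```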
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is, counterintuitively, to train extremely large models and stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed large models achieve higher accuracy than lightly compressed small models.
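As a small illustration of one of the compression techniques mentioned, the sketch below applies global magnitude pruning to a weight array; the paper's actual compression pipeline, sparsity levels, and any fine-tuning schedule are not reproduced.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights
    (illustrative global magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```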