This paper presents two hybrid beamforming (HYBF) designs for a multi-user multi-cell millimeter-wave (mmWave) full-duplex (FD) system. The base stations (BSs) and the users are assumed to suffer from limited dynamic range (LDR) noise. First, we present a centralized HYBF (C-HYBF) scheme based on alternating optimization. In general, the complexity of C-HYBF schemes scales quadratically with the number of users, which is highly undesirable. Moreover, tremendous computational power is required to optimize the numerous variables jointly in FD. Another major drawback is the huge communication overhead required to transfer complete channel state information (CSI) to the central node every channel coherence time. To overcome these drawbacks, we present a very low-complexity and highly scalable cooperative per-link parallel and distributed (P$\&$D)-HYBF scheme. It allows each FD BS to update the beamformers for its users independently and in parallel on different computational processors. Its complexity scales only linearly with the network size, making it desirable for the next generation of large and dense mmWave FD networks. Simulation results show that both designs significantly outperform the fully digital half-duplex (HD) system with only a few radio-frequency (RF) chains, that they achieve similar performance, and that P$\&$D-HYBF requires considerably less execution time.
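As an illustration of the hybrid beamforming structure assumed above, and not of the proposed C-HYBF or P$\&$D-HYBF algorithms themselves, the following minimal NumPy sketch builds a transmit precoder as the product of a unit-modulus analog stage and a low-dimensional digital stage; the array sizes, the single-user channel, and the random beamformer choices are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
N_ant, N_rf, N_s = 32, 4, 2        # antennas, RF chains, streams (hypothetical sizes)

# Toy single-user mmWave-like channel (no LDR noise, no FD self-interference).
H = (rng.standard_normal((N_s, N_ant)) + 1j * rng.standard_normal((N_s, N_ant))) / np.sqrt(2)

# Analog stage: phase-only (unit-modulus) entries, one column per RF chain.
F_rf = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(N_ant, N_rf)))

# Digital stage: unconstrained low-dimensional precoder behind the RF chains.
F_bb = (rng.standard_normal((N_rf, N_s)) + 1j * rng.standard_normal((N_rf, N_s))) / np.sqrt(2)

F = F_rf @ F_bb                              # effective hybrid precoder
F = F / np.linalg.norm(F, "fro")             # transmit power normalization

He = H @ F                                   # effective channel after precoding
rate = np.log2(np.real(np.linalg.det(np.eye(N_s) + He @ He.conj().T)))
print(f"toy spectral efficiency with {N_rf} RF chains: {rate:.2f} bits/s/Hz")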
We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $\mu$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $\mu$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high: at least linear in the dimension $d$. This phenomenon, however, occurs at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, yet not high enough for recovery by any individual machine, the support of $\mu$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the required communication decreases exponentially with the signal strength. If in addition $KM\ll \tfrac{d}{\log d}$, then with an additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
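To make the communication-efficient support-recovery idea concrete, here is a toy NumPy sketch, not the paper's exact algorithms or thresholds: each machine sends only the indices of coordinates exceeding a local threshold, and the fusion center keeps the indices reported by a majority of machines. The dimension, sparsity, signal level, and threshold below are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
d, K, M = 10_000, 10, 50                 # dimension, sparsity, machines (hypothetical)
signal = 4.0                             # below the ~sqrt(2 log d) level needed by a single machine

support = rng.choice(d, size=K, replace=False)
mu = np.zeros(d)
mu[support] = signal

# Each machine observes mu + N(0, I) and communicates only the indices above a local threshold.
tau = 2.5                                # hypothetical local threshold
reports = [np.flatnonzero(mu + rng.standard_normal(d) > tau) for _ in range(M)]

# Fusion center: keep indices reported by a majority of the machines.
votes = np.zeros(d, dtype=int)
for idx in reports:
    votes[idx] += 1
est_support = np.flatnonzero(votes > M / 2)

print("support recovered:", set(est_support) == set(support))
print("indices communicated in total:", sum(len(r) for r in reports), "out of d =", d)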
To overcome the high path loss and intense shadowing in millimeter-wave (mmWave) communications, effective beamforming schemes that form narrow beams with high beamforming gains are required. The mmWave channel consists of a few spatial clusters, each associated with an angle of departure (AoD). The narrow beams must be aligned with the channel AoDs to increase the beamforming gain, which is achieved through a procedure called beam alignment (BA). Most BA schemes in the literature consider channels with a single dominant path, whereas in practice the channel has a few resolvable paths with different AoDs; hence, such BA schemes may not work correctly in the presence of multipath, or at least do not exploit the multipath to achieve diversity or increase robustness. In this paper, we propose an efficient BA scheme for the multipath setting. The proposed scheme transmits probing packets using a set of scanning beams and receives feedback for all the scanning beams from each user at the end of the probing phase. We formulate the BA problem as minimizing the expected average transmission beamwidth over different policies, where a policy is a function from the set of received feedback sequences to the set of transmission beams (TBs). In order to maximize the number of possible feedback sequences, we prove that the set of scanning beams (SBs) has a special form, namely a Tulip Design. Consequently, we rewrite the minimization problem with a set of linear constraints and a reduced number of variables, and solve it using an efficient greedy algorithm.
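The following toy NumPy sketch illustrates only the probing and feedback mechanics described above; the 8-beam scanning codebook, the three-path user, and the naive covering rule that stands in for the optimized Tulip-design policy are all hypothetical.

import numpy as np

# Scanning beams as angular intervals (radians) covering [0, pi); hypothetical 8-beam codebook.
edges = np.linspace(0.0, np.pi, 9)
scanning_beams = list(zip(edges[:-1], edges[1:]))

# A user with a few resolvable paths (AoDs) rather than a single dominant one.
rng = np.random.default_rng(2)
aods = rng.uniform(0.0, np.pi, size=3)

# Probing phase: one feedback bit per scanning beam, set if any path falls inside it.
feedback = [int(any(lo <= a < hi for a in aods)) for (lo, hi) in scanning_beams]

# Naive stand-in policy: transmit on the narrowest contiguous sector covering all positive beams.
pos = [i for i, bit in enumerate(feedback) if bit]
tx_beam = (scanning_beams[pos[0]][0], scanning_beams[pos[-1]][1])

print("feedback sequence:", feedback)
print("transmission beam (rad):", tx_beam, "width:", tx_beam[1] - tx_beam[0])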
Non-orthogonal multiple access (NOMA) assisted semi-grant-free (SGF) transmission has recently received significant research attention due to its ability to serve grant-free (GF) users on the spectrum of grant-based (GB) users, which greatly improves the spectral efficiency and effectively relieves the massive access problem of 5G and beyond networks. In this paper, we first study the outage performance of the greedy best-user scheduling SGF scheme (BU-SGF) by considering the impacts of Rayleigh fading, path loss, and random user locations. To tackle the admission fairness problem of the BU-SGF scheme, we propose a fair SGF scheme based on cumulative distribution function (CDF)-based scheduling (CS-SGF), in which the GF user whose channel is best relative to its own statistics is admitted. Moreover, by employing the theories of order statistics and stochastic geometry, we analyze the outage performance of both the BU-SGF and CS-SGF schemes. Theoretical results show that both schemes achieve full diversity order only when the served users' data rate is capped, which severely limits the rate performance of SGF schemes. To address this issue, we further propose a distributed power control strategy that relaxes this data rate constraint, and derive analytical expressions for the outage performance of both schemes under this strategy. Finally, simulation results validate the fairness of the proposed CS-SGF scheme, the effectiveness of the power control strategy, and the accuracy of the theoretical analysis.
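As a small illustration of the CDF-based scheduling rule at the heart of CS-SGF, the NumPy sketch below admits the GF user whose instantaneous channel gain is largest relative to its own Rayleigh-fading statistics, i.e., the user with the largest CDF value; the user count and average channel gains are hypothetical, and the full SGF admission procedure is not reproduced.

import numpy as np

rng = np.random.default_rng(3)
n_users = 5                                       # number of GF users (hypothetical)
mean_gain = rng.uniform(0.1, 2.0, size=n_users)   # path-loss-dependent average channel gains

# Rayleigh fading: the channel power gain of user k is exponential with mean mean_gain[k].
g = rng.exponential(mean_gain)

# Greedy BU-SGF: admit the user with the largest absolute gain (favours users with strong path gain).
best_user = int(np.argmax(g))

# CS-SGF: admit the user with the largest CDF value F_k(g_k) = 1 - exp(-g_k / mean_gain[k]);
# this value is Uniform(0, 1) for every user, so admissions are statistically fair.
cdf_values = 1.0 - np.exp(-g / mean_gain)
fair_user = int(np.argmax(cdf_values))

print("BU-SGF admits user", best_user, "| CS-SGF admits user", fair_user)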
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at the devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers a heterogeneous number of stochastic gradient descent (SGD) iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift; (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and to the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in between them to improve resource efficiency, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed-up models and on model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the tradeoff between model learning and resource efficiency, which we show to be an NP-hard signomial programming problem, and solve it by proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times between global aggregations, model/concept drift, and the D2D cooperation configuration.
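To illustrate the learning-heterogeneity dimension (ii-a) in isolation, namely devices running different numbers of SGD steps with different mini-batch sizes between aggregations, here is a minimal NumPy sketch on a toy least-squares model; the device counts, step sizes, and plain size-weighted averaging are hypothetical simplifications and omit the D2D, dispersion, and condensation components of PSL.

import numpy as np

rng = np.random.default_rng(4)
dim = 5
w_true = rng.standard_normal(dim)

# Heterogeneous devices: different dataset sizes, local SGD steps, and mini-batch sizes.
data_sizes  = [200, 50, 120, 80]
local_steps = [10, 3, 6, 1]
batch_sizes = [32, 8, 16, 4]
datasets = []
for n in data_sizes:
    X = rng.standard_normal((n, dim))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    datasets.append((X, y))

w_global = np.zeros(dim)
for _ in range(20):                                # global aggregation rounds
    local_models = []
    for (X, y), steps, bs in zip(datasets, local_steps, batch_sizes):
        w = w_global.copy()
        for _ in range(steps):                     # heterogeneous number of SGD iterations
            idx = rng.choice(len(y), size=bs, replace=False)
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / bs
            w -= 0.05 * grad
        local_models.append(w)
    # Server aggregation weighted by dataset size (D2D cooperation omitted in this sketch).
    weights = np.array(data_sizes) / sum(data_sizes)
    w_global = sum(wk * wl for wk, wl in zip(weights, local_models))

print("distance to w_true:", np.linalg.norm(w_global - w_true))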
In this paper, we propose a framework where over-the-air computation (OAC) occurs sequentially in both the uplink (UL) and downlink (DL) in a multi-cell environment, to address the latency and scalability issues of federated edge learning (FEEL). To eliminate the need for channel state information (CSI) at the edge devices (EDs) and edge servers (ESs) and to relax the time-synchronization requirement for the OAC, we use a non-coherent computation scheme, i.e., frequency-shift keying (FSK)-based majority vote (MV), denoted FSK-MV. With the proposed framework, multiple ESs function as aggregation nodes in the UL, and each ES determines the MVs independently. After the ESs broadcast the detected MVs, the EDs determine the sign of the gradient through another OAC in the DL. Hence, inter-cell interference is exploited for the OAC. We prove the convergence of FEEL with the proposed OAC framework for non-convex optimization problems. We also numerically evaluate the efficacy of the proposed method by comparing the test accuracy in both multi-cell and single-cell scenarios, for both homogeneous and heterogeneous data distributions.
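The essence of a non-coherent FSK-based majority vote, where each device puts energy on one of two tones per gradient entry according to the sign of its local gradient and the server compares received tone energies, can be sketched as follows; the fading model, device count, and gradients are hypothetical, and the sequential UL/DL multi-cell structure of the proposed framework is omitted.

import numpy as np

rng = np.random.default_rng(5)
n_devices, n_params = 20, 8

local_grads = rng.standard_normal((n_devices, n_params))   # hypothetical local stochastic gradients
votes = np.sign(local_grads)                               # each ED votes +1 / -1 per parameter

# Non-coherent OAC: an ED puts unit energy on tone "+" or tone "-" for each parameter.
# The ES only sees the superposition through unknown fading, so no CSI is needed.
shape = (n_devices, n_params)
h_plus = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
h_minus = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
n_plus = 0.1 * (rng.standard_normal(n_params) + 1j * rng.standard_normal(n_params))
n_minus = 0.1 * (rng.standard_normal(n_params) + 1j * rng.standard_normal(n_params))

rx_plus = np.sum(h_plus * (votes > 0), axis=0) + n_plus     # superposed signal on the "+" tone
rx_minus = np.sum(h_minus * (votes < 0), axis=0) + n_minus  # superposed signal on the "-" tone

# Energy detection at the ES: the majority vote is the tone with the larger received energy.
mv = np.where(np.abs(rx_plus) ** 2 > np.abs(rx_minus) ** 2, 1.0, -1.0)
ideal_mv = np.sign(votes.sum(axis=0))
print("agreement with the ideal majority vote:", np.mean(mv == ideal_mv))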
Massive multiple-input multiple-output (MIMO) is believed to deliver unprecedented spectral efficiency gains for 5G and beyond. However, a practical challenge, known as the "curse of mobility", arises during its commercial deployment: the performance of massive MIMO drops alarmingly as the user velocity increases. In this paper, we tackle this problem in frequency division duplex (FDD) massive MIMO with a novel channel state information (CSI) acquisition framework. A joint angle-delay-Doppler (JADD) wideband beamformer is proposed for channel training. Our idea is to exploit the partial channel reciprocity of FDD and the angle-delay-Doppler structure of the channel. More precisely, the base station (BS) estimates the angle-delay-Doppler information of the uplink (UL) channel from UL pilots using the matrix pencil method. It then computes the wideband JADD beamformers according to the extracted parameters. Afterwards, the user estimates and feeds back a few scalar coefficients, from which the BS reconstructs the predicted downlink (DL) channel. Asymptotic analysis shows that the CSI prediction error converges to zero as the number of BS antennas and the bandwidth increase. Numerical results with an industrial channel model demonstrate that our framework adapts well to high speed (350 km/h), large CSI delay (10 ms), and channel sample noise.
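Since the parameter-extraction step relies on the matrix pencil method, the following self-contained NumPy sketch recovers the complex poles of a noiseless sum of exponentials, which is the generic operation behind angle/delay/Doppler extraction; the test signal, pencil parameter, and model order are hypothetical, and the full JADD processing chain is not reproduced.

import numpy as np

# Noiseless test signal: sum of two complex exponentials (poles on the unit circle).
N = 64
n = np.arange(N)
true_freqs = np.array([0.12, 0.31])                    # normalized frequencies (hypothetical)
x = sum(np.exp(2j * np.pi * f * n) for f in true_freqs)

# Matrix pencil method with pencil parameter L (typically N/3 <= L <= N/2).
L, M = N // 3, len(true_freqs)
Y = np.array([x[i:i + L + 1] for i in range(N - L)])   # (N-L) x (L+1) Hankel-structured matrix
Y1, Y2 = Y[:, :L], Y[:, 1:]

# The signal poles are the dominant eigenvalues of pinv(Y1) @ Y2.
eigvals = np.linalg.eigvals(np.linalg.pinv(Y1) @ Y2)
poles = eigvals[np.argsort(-np.abs(eigvals))[:M]]
est_freqs = np.sort(np.angle(poles) / (2 * np.pi))

print("true :", true_freqs)
print("est. :", est_freqs)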
The paper presents IEC flickermeter measurement results for voltage fluctuations modelled by amplitude modulation of a distorted supply voltage. A supply voltage distortion in the "clipped cosine" form, as caused by electronic and power-electronic devices, is assumed; this type of distortion is a common disturbance in low-voltage networks. Several arbitrarily distorted modulating waveforms with different modulation depths and modulating frequencies up to approximately 1 kHz are selected to determine how the severity of the voltage fluctuation depends on their shape. The paper mainly presents the voltage fluctuation severity for modulating frequencies greater than $3f_c$, where $f_c$ is the power frequency. The voltage fluctuation severity and the associated dependencies have been determined on the basis of numerical simulation studies and experimental laboratory tests.
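For concreteness, a test signal of the kind described above, a "clipped cosine" supply voltage whose amplitude is modulated by a distorted waveform of given depth and frequency, can be generated as in the NumPy sketch below; the sampling rate, clipping level, modulation depth, and modulating frequency are hypothetical example values rather than the paper's test conditions.

import numpy as np

fs, fc = 20_000, 50          # sampling rate (Hz) and power frequency f_c (Hz)
t = np.arange(0, 1.0, 1 / fs)

# "Clipped cosine" distorted supply voltage: the cosine is flattened at a fraction of its peak.
clip_level = 0.8             # hypothetical clipping level, relative to the undistorted peak
carrier = np.clip(np.cos(2 * np.pi * fc * t), -clip_level, clip_level) / clip_level

# Amplitude modulation by a distorted (here: triangular) modulating signal.
fm, depth = 200.0, 0.02      # modulating frequency (greater than 3*fc) and relative modulation depth
m = 2 * np.abs(2 * (fm * t - np.floor(fm * t + 0.5))) - 1      # triangle wave in [-1, 1]
v = (1 + 0.5 * depth * m) * carrier                            # fluctuating supply voltage

print("RMS of the modulated voltage:", np.sqrt(np.mean(v ** 2)))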
In an aerial hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) system, designing spectrally efficient broadband multi-user hybrid beamforming with limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both time division duplex (TDD) and frequency division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network, while for FDD systems it jointly models the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Different from conventional approaches that process the modules separately, the proposed solution optimizes all modules simultaneously with the sum rate as the optimization objective. By learning the inherent properties of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network establishes a mapping from the channel to the beamformer, so that explicit channel reconstruction is avoided and the pilot and feedback overhead is reduced. Besides, practical low-resolution phase shifters (PSs) introduce a quantization constraint that makes gradient backpropagation intractable when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt a transfer learning strategy to further fine-tune the E2E neural network based on a network pre-trained under the assumption of ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.
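The low-resolution phase-shifter constraint mentioned above amounts to rounding each analog beamforming phase to a $2^B$-level grid; the short NumPy sketch below shows this quantization and the gain loss it introduces for a toy phase-matched beamformer, with the array size, resolution $B$, and channel chosen as hypothetical values (the E2E neural network and transfer-learning pipeline are not reproduced).

import numpy as np

rng = np.random.default_rng(6)
n_ant, B = 64, 2                                   # antennas and phase-shifter resolution in bits

h = (rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)) / np.sqrt(2)

# Ideal infinite-resolution analog beamformer: match the phases of the channel.
phase = np.angle(h)
w_ideal = np.exp(1j * phase) / np.sqrt(n_ant)

# B-bit phase shifters: phases rounded to the nearest point of a 2^B-level grid.
step = 2 * np.pi / (2 ** B)
w_quant = np.exp(1j * np.round(phase / step) * step) / np.sqrt(n_ant)

gain_ideal = np.abs(h.conj() @ w_ideal) ** 2
gain_quant = np.abs(h.conj() @ w_quant) ** 2
print(f"beamforming gain loss with {B}-bit PSs: {10 * np.log10(gain_ideal / gain_quant):.2f} dB")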
Neural networks in ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features, and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU high-bandwidth memory, CPU main memory, and SSD as 3-layer hierarchical storage. All neural network training computations are contained in the GPUs. Extensive experiments on real-world data confirm the effectiveness and scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than that of an MPI-cluster solution.
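The hierarchical-storage idea can be illustrated with a tiny pure-Python sketch: a parameter lookup first checks a small "GPU HBM" tier, then a larger "CPU memory" tier, and finally an "SSD" tier, promoting hot entries upward; the tier capacities, LRU eviction rule, and the in-memory dictionaries standing in for real storage are hypothetical simplifications of the actual system.

from collections import OrderedDict

class HierarchicalParamStore:
    """Toy 3-tier parameter store: hbm (fast, tiny) -> dram (larger) -> ssd (holds everything)."""

    def __init__(self, hbm_capacity, dram_capacity, ssd):
        self.hbm = OrderedDict()      # stands in for GPU high-bandwidth memory
        self.dram = OrderedDict()     # stands in for CPU main memory
        self.ssd = ssd                # stands in for the SSD-resident full parameter table
        self.hbm_capacity = hbm_capacity
        self.dram_capacity = dram_capacity

    def _put(self, tier, capacity, key, value):
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:      # LRU eviction when the tier overflows
            tier.popitem(last=False)

    def lookup(self, key):
        if key in self.hbm:           # fastest tier first
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.dram:
            value = self.dram[key]
        else:
            value = self.ssd[key]     # miss in both caches: fetch from the "SSD" tier
            self._put(self.dram, self.dram_capacity, key, value)
        self._put(self.hbm, self.hbm_capacity, key, value)   # promote hot entries upward
        return value

ssd_table = {f"sparse_feature_{i}": [0.0] * 8 for i in range(10_000)}   # hypothetical embedding table
store = HierarchicalParamStore(hbm_capacity=64, dram_capacity=1024, ssd=ssd_table)
vec = store.lookup("sparse_feature_42")
print(len(store.hbm), len(store.dram), len(vec))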
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is strongly convex and smooth, only strongly convex, only smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
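A minimal NumPy sketch of the main algorithmic idea, Nesterov's accelerated gradient applied to the dual of a consensus-constrained problem with the network encoded as affine constraints through an edge-incidence matrix, is given below for scalar quadratic $f_i$ on a small hypothetical ring graph; the step size and momentum rule follow the standard accelerated scheme rather than the paper's exact constants.

import numpy as np

# Ring network with m = 5 nodes; each node holds f_i(x) = 0.5 * (x - a_i)^2.
m = 5
a = np.array([1.0, 4.0, -2.0, 0.5, 3.0])
edges = [(i, (i + 1) % m) for i in range(m)]

# Affine constraints A x = 0 encode agreement across every communication link.
A = np.zeros((len(edges), m))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0

def primal_from_dual(lam):
    # argmin_x of sum_i f_i(x_i) + lam^T A x, available per node in closed form.
    return a - A.T @ lam

# The dual gradient is A x(lam); it is Lipschitz with constant ||A||_2^2 here (mu = 1).
eta = 1.0 / np.linalg.norm(A, 2) ** 2
lam = np.zeros(len(edges))
y = lam.copy()
for k in range(300):
    lam_next = y + eta * A @ primal_from_dual(y)      # ascent step on the concave dual
    y = lam_next + (k / (k + 3)) * (lam_next - lam)   # Nesterov momentum
    lam = lam_next

x = primal_from_dual(lam)
print("per-node iterates:", np.round(x, 4), "| consensus optimum:", a.mean())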