Cell-free massive MIMO systems promise data rate and uniform coverage gains. These systems, however, typically rely on optical-fiber-based fronthaul for the communication between the central processing unit (CPU) and the distributed access points (APs), which increases the infrastructure cost, lengthens the installation time, and limits the deployment flexibility and adaptability. To address these challenges, this paper proposes two architectures for cell-free massive MIMO systems based on wireless fronthaul operating at a \textit{higher band} than the access links: (i) a wireless-only fronthaul architecture, where the CPU has a wireless fronthaul link to each AP, and (ii) a mixed-fronthaul architecture, where the CPU has a wireless link to each cluster of APs that are connected together via optical fibers. These dual-band architectures ensure high-data-rate fronthaul and provide a strong capability to synchronize the distributed APs. Further, the wireless fronthaul reduces the infrastructure cost and installation time, and enhances the flexibility, adaptability, and scalability of the network deployment. To investigate the achievable data rates with the proposed architectures, we formulate the end-to-end data rate optimization problem accounting for the various practical aspects of the fronthaul and access links. Then, we develop a low-complexity yet efficient joint beamforming and resource allocation solution for the proposed architectures based on user-centric AP grouping. With this solution, we show that the proposed architectures can achieve data rates comparable to those obtained with optical-fiber-based fronthaul under realistic assumptions on the fronthaul bandwidth, hardware constraints, and deployment scenarios. This highlights a promising path for realizing the cell-free massive MIMO gains in practice while reducing the infrastructure and deployment overhead.
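As a hedged illustration of how the end-to-end rate couples the two hops (the coupling and the symbols below are assumptions, not the paper's exact formulation), the rate of user $k$ may be viewed as limited by both its lower-band access-link rate and the share of the higher-band fronthaul capacity allocated to its serving AP cluster:

$$R_k^{\mathrm{e2e}} = \min\!\left(R_k^{\mathrm{access}},\ \alpha_k\, C^{\mathrm{fronthaul}}\right),$$

where $R_k^{\mathrm{access}}$ is the achievable access rate, $C^{\mathrm{fronthaul}}$ is the wireless fronthaul capacity of the serving cluster, and $\alpha_k \in [0,1]$ is the fraction of that capacity allotted to user $k$; the joint beamforming and resource allocation then balances the two terms.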
In order to achieve the full potential of the Internet of Things, connectivity between devices should be ubiquitous and efficient. Wireless mesh networks are a critical component in achieving this ubiquitous connectivity for a wide range of services. They are composed of terminal devices (i.e., nodes), such as sensors of various types, and wall-powered gateway devices, which provide further internet connectivity (e.g., via WiFi). When considering large indoor areas, such as hospitals or industrial scenarios, the mesh must cover a large area, which raises concerns regarding range, the number of gateways needed, and the associated wall cabling infrastructure. Solutions for mesh networks implemented over different wireless protocols exist, such as the recent Bluetooth Low Energy (BLE) 5.1. Besides range concerns, choosing which nodes forward data through the mesh has a large impact on performance and power consumption. We address the area coverage issue via a battery-powered BLE relay device of our own design, which acts as a range extender by forwarding packets from end nodes to gateways. We present the relay's design and experimentally determine the packet forwarding efficiency for several scenarios and configurations. In the best case, up to 35% of the packets transmitted by 11 nodes can be forwarded to a gateway by a single relay under continuous operation. A battery lifetime of 1 year can be achieved with a relay duty cycle of 20%.
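As a hedged back-of-the-envelope sketch (not the relay's measured current profile), a roughly 1-year lifetime at a 20% duty cycle follows from averaging active and sleep currents over the cycle; the capacity and current values below are purely illustrative assumptions.

# Illustrative battery-lifetime estimate for a duty-cycled BLE relay.
# Capacity and current figures are assumptions, not measured values.
BATTERY_MAH = 2500.0      # assumed battery capacity, mAh
I_ACTIVE_MA = 1.2         # assumed average current while relaying, mA
I_SLEEP_MA = 0.005        # assumed sleep current, mA
DUTY = 0.20               # relay duty cycle from the abstract

i_avg_ma = DUTY * I_ACTIVE_MA + (1.0 - DUTY) * I_SLEEP_MA
lifetime_hours = BATTERY_MAH / i_avg_ma
print(f"average current: {i_avg_ma:.3f} mA, lifetime: {lifetime_hours / 24 / 365:.2f} years")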
This paper addresses joint wireless and computing resource allocation in mobile edge computing (MEC) systems with several access points, where users may connect to multiple access points and utilize the computation capability of multiple servers at the same time. The problem of sum transmission energy minimization under response time constraints is considered. It is proved that the optimization problem is non-convex. The complexity of optimizing a subset of the system parameters is investigated, and based on these results an Iterative Resource Allocation procedure is proposed that converges to a local optimum. The performance of the joint resource allocation is evaluated by comparing it to lower and upper bounds defined by less and more flexible multi-cell MEC architectures, respectively. The results show that the free selection of the access point is crucial for good system performance.
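A minimal sketch of the iterative idea, assuming a generic block-coordinate structure (the paper's actual subproblems and update rules are not reproduced here): alternately optimize one block of variables, e.g., the wireless allocation, with the other block, e.g., the computing allocation, held fixed, until the objective stops improving, which gives monotone convergence to a local optimum.

# Generic alternating (block-coordinate) minimization sketch.
# solve_wireless / solve_computing stand in for the per-block subproblem
# solvers; their exact form is an assumption, not the paper's algorithm.
def iterative_resource_allocation(objective, solve_wireless, solve_computing,
                                  x0, y0, tol=1e-6, max_iter=100):
    x, y = x0, y0
    prev = objective(x, y)
    for _ in range(max_iter):
        x = solve_wireless(y)      # optimize wireless variables, computing fixed
        y = solve_computing(x)     # optimize computing variables, wireless fixed
        cur = objective(x, y)
        if prev - cur < tol:       # objective is non-increasing by construction
            break
        prev = cur
    return x, y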
We consider the mobile localization problem in future millimeter-wave wireless networks with distributed Base Stations (BSs) based on multi-antenna channel state information (CSI). For this problem, we propose a Semi-supervised t-distributed Stochastic Neighbor Embedding (St-SNE) algorithm to directly embed the high-dimensional CSI samples into the 2D geographical map. We evaluate the performance of St-SNE in a simulated urban outdoor millimeter-wave radio access network. Our results show that St-SNE achieves a mean localization error of 6.8 m with only 5% of labeled CSI samples in a 200×200 m^2 area with a ray-tracing channel model. St-SNE does not require accurate synchronization among multiple BSs and is promising for future large-scale millimeter-wave localization.
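As a loose, hedged sketch of the semi-supervised embedding idea, and not of the St-SNE algorithm itself, one could embed high-dimensional CSI samples into 2D with off-the-shelf t-SNE and then fit an affine map from the embedding to the geographical map using only the labeled subset; the 5% labeling ratio matches the abstract, while the synthetic data and the alignment step are assumptions.

import numpy as np
from sklearn.manifold import TSNE

# csi: (N, D) CSI feature matrix; xy_labeled: known 2D positions of the
# first n_labeled samples. This is an illustrative pipeline, not St-SNE.
def embed_and_align(csi, xy_labeled, n_labeled):
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(csi)
    # Fit an affine map from the embedding to map coordinates using labels only.
    A = np.hstack([emb[:n_labeled], np.ones((n_labeled, 1))])
    W, *_ = np.linalg.lstsq(A, xy_labeled, rcond=None)
    return np.hstack([emb, np.ones((len(emb), 1))]) @ W

rng = np.random.default_rng(0)
csi = rng.standard_normal((400, 64))          # synthetic CSI features (assumption)
xy = rng.uniform(0, 200, size=(400, 2))       # synthetic ground-truth positions
n_lab = int(0.05 * len(csi))                  # 5% labeled, as in the abstract
pred_positions = embed_and_align(csi, xy[:n_lab], n_lab)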
Recent years have witnessed a growing list of systems for distributed data-parallel training. Existing systems largely fit into two paradigms, i.e., parameter server and MPI-style collective operations. On the algorithmic side, researchers have proposed a wide range of techniques to lower the communication cost via system relaxations: quantization, decentralization, and communication delay. However, most, if not all, existing systems rely only on standard synchronous and asynchronous stochastic gradient (SG) based optimization and therefore cannot take advantage of all the optimizations that the machine learning community has been developing recently. Given this emerging gap between the current landscapes of systems and theory, we build BAGUA, an MPI-style communication library providing a collection of primitives that is both flexible and modular enough to support state-of-the-art system relaxation techniques for distributed training. Powered by this design, BAGUA can readily implement and extend various state-of-the-art distributed learning algorithms. In a production cluster with up to 16 machines (128 GPUs), BAGUA outperforms PyTorch-DDP, Horovod, and BytePS in end-to-end training time by a significant margin (up to 2 times) across a diverse range of tasks. Moreover, we conduct a rigorous tradeoff exploration showing that different algorithms and system relaxations achieve the best performance under different network conditions.
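As a hedged illustration of one such system relaxation, quantization, and not of BAGUA's actual API, the sketch below compresses a gradient tensor to 8-bit values before communication and decompresses it afterwards; the min-max scaling scheme is an assumption chosen for simplicity.

import numpy as np

# Simple min-max 8-bit gradient quantization, an illustrative relaxation primitive.
def quantize_8bit(grad):
    lo, hi = grad.min(), grad.max()
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((grad - lo) / scale).astype(np.uint8)   # 4x smaller than float32
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    return q.astype(np.float32) * scale + lo

g = np.random.randn(1024).astype(np.float32)
q, lo, scale = quantize_8bit(g)
g_hat = dequantize_8bit(q, lo, scale)   # approximate gradient after the round trip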
Building on the analysis of the proportion of utility in supporting transactions, high utility-occupancy pattern mining (HUOPM) has recently attracted widespread attention in the data mining field. Unlike high-utility pattern mining (HUPM), which enumerates high-utility (e.g., profitable) patterns, HUOPM aims to find patterns that are representative of a collection of existing transactions. In practical applications, however, not all patterns are useful or valuable. For example, a pattern might contain too many items, that is, it might be too specific and therefore lack value for users in real life. To obtain qualified patterns with a flexible length, we constrain the minimum and maximum lengths during the mining process and introduce a novel algorithm, referred to as HUOPM+, for mining flexible high utility-occupancy patterns. To ensure the flexibility of the patterns and tighten the upper bound of the utility-occupancy, a strategy called the length upper-bound (LUB) is presented to prune the search space. In addition, a utility-occupancy nested list (UO-nlist) and a frequency-utility-occupancy table (FUO-table) are employed to avoid multiple scans of the database. Evaluation results on both real-world and synthetic datasets confirm that the proposed algorithm can effectively control the length of the derived patterns. Moreover, it can decrease the execution time and memory consumption.
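A minimal sketch of the length-constraint idea, assuming a generic candidate-enumeration loop rather than HUOPM+'s actual UO-nlist/FUO-table machinery: candidates outside the [min_len, max_len] range are never generated, and the utility-occupancy test is abstracted into a callback.

from itertools import combinations

# Illustrative length-constrained pattern enumeration; `qualifies` is an
# assumed stand-in for the high utility-occupancy test.
def flexible_patterns(items, qualifies, min_len, max_len):
    results = []
    for k in range(min_len, max_len + 1):   # lengths outside the bounds are pruned
        for pattern in combinations(items, k):
            if qualifies(pattern):
                results.append(pattern)
    return results

patterns = flexible_patterns(["a", "b", "c", "d"],
                             qualifies=lambda p: "a" in p,   # toy stand-in test
                             min_len=2, max_len=3)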
Intelligent reflecting surfaces (IRSs) are emerging as promising enablers for the next generation of wireless communication systems because of their ability to customize favorable radio propagation environments. However, with the conventional passive architecture, IRSs can only adjust the phase of the incident signals, which limits the achievable beamforming gain. To fully unleash the potential of IRSs, in this paper, we consider a more general IRS architecture, i.e., active IRSs, which can simultaneously adjust the phase and amplify the magnitude of the reflected incident signal with the support of an additional power source. To realize green communication in active IRS-assisted multiuser systems, we jointly optimize the reflection matrix at the IRS and the beamforming vector at the base station (BS) to minimize the BS transmit power. The resource allocation algorithm design is formulated as an optimization problem taking into account the maximum power budget of the active IRS and the quality-of-service (QoS) requirements of the users. To handle the non-convex design problem, we develop a novel and computationally efficient algorithm based on bilinear transformation and inner approximation methods. The proposed algorithm is guaranteed to converge to a locally optimal solution of the considered problem. Simulation results illustrate the effectiveness of the proposed scheme compared to two baseline schemes. Moreover, the results unveil that deploying active IRSs is a promising approach to enhance the system performance compared to conventional passive IRSs, especially when strong direct links exist.
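As a hedged sketch of the problem structure (the symbols and constraint forms below are assumptions, not the paper's exact model), the design can be written as a BS transmit power minimization subject to per-user QoS targets and the active-IRS power budget:

$$\min_{\{\mathbf{w}_k\},\,\boldsymbol{\Phi}}\ \sum_{k} \|\mathbf{w}_k\|^2 \quad \text{s.t.} \quad \mathrm{SINR}_k\!\left(\{\mathbf{w}_k\},\boldsymbol{\Phi}\right) \ge \gamma_k\ \ \forall k, \qquad \mathbb{E}\!\left[\|\boldsymbol{\Phi}\,\mathbf{y}_{\mathrm{in}}\|^2\right] \le P_{\mathrm{IRS}},$$

where $\mathbf{w}_k$ is the BS beamforming vector of user $k$, $\boldsymbol{\Phi}$ is the active (amplifying) reflection matrix, $\gamma_k$ is the QoS target, $\mathbf{y}_{\mathrm{in}}$ is the signal impinging on the IRS, and $P_{\mathrm{IRS}}$ is the IRS power budget.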
Computer architecture and systems have long been optimized to enable the efficient execution of machine learning (ML) algorithms and models. Now, it is time to reconsider the relationship between ML and systems and let ML transform the way computer architecture and systems are designed. This entails a twofold goal: improving designers' productivity and completing the virtuous cycle. In this paper, we present a comprehensive review of work that applies ML to system design, which can be grouped into two major categories: ML-based modelling, which involves predicting performance metrics or other criteria of interest, and ML-based design methodology, which directly leverages ML as the design tool. For ML-based modelling, we discuss existing studies based on their target level of the system, ranging from the circuit level to the architecture/system level. For ML-based design methodology, we follow a bottom-up path to review current work, covering (micro-)architecture design (memory, branch prediction, NoC), coordination between architecture/system and workload (resource allocation and management, data center management, and security), compilers, and design automation. We further provide a future vision of opportunities and potential directions, and envision that applying ML to computer architecture and systems will thrive in the community.
Neural networks in ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features, and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU High-Bandwidth Memory, CPU main memory, and SSD as 3-layer hierarchical storage. All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than that of the MPI-cluster solution.
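As a hedged sketch of the 3-layer storage idea, and not of the system's actual implementation, a sparse-parameter lookup can fall through GPU HBM, CPU main memory, and SSD in order, promoting touched entries toward the faster tiers; the data structures and the absence of eviction below are illustrative assumptions.

import shelve

# Illustrative 3-tier parameter store: GPU HBM and CPU memory modeled as dicts,
# SSD modeled as an on-disk shelve file. Tier sizes and eviction are omitted.
class HierarchicalParamStore:
    def __init__(self, ssd_path="params_ssd"):
        self.hbm = {}                      # fastest, smallest tier (stands in for GPU HBM)
        self.dram = {}                     # CPU main memory tier
        self.ssd = shelve.open(ssd_path)   # slowest, largest tier

    def get(self, key):
        if key in self.hbm:
            return self.hbm[key]
        if key in self.dram:               # promote hot entries toward HBM
            self.hbm[key] = self.dram[key]
            return self.hbm[key]
        value = self.ssd[str(key)]         # miss in memory: fetch from SSD
        self.dram[key] = value
        return value

    def put(self, key, value):
        self.ssd[str(key)] = value         # write-through to the capacity tier
        self.dram[key] = value

    def close(self):
        self.ssd.close()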
Driven by the visions of the Internet of Things and 5G communications, edge computing systems integrate computing, storage, and network resources at the edge of the network to provide a computing infrastructure, enabling developers to quickly develop and deploy edge applications. Edge computing systems have received widespread attention in both industry and academia. To explore new research opportunities and assist users in selecting suitable edge computing systems for specific applications, this survey provides a comprehensive overview of existing edge computing systems and introduces representative projects. A comparison of open source tools is presented according to their applicability. Finally, we highlight the energy efficiency and deep learning optimization of edge computing systems. Open issues in analyzing and designing an edge computing system are also studied in this survey.
In recent years, there has been exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to classify them accurately in many applications. Many machine learning approaches have achieved impressive results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. In this paper, a brief overview of text classification algorithms is presented. This overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application to real-world problems are discussed.