This letter investigates a downlink multiple-input single-output (MISO) system based on a transmissive reconfigurable metasurface (RMS) transmitter. Specifically, a transmitter design based on a transmissive RMS equipped with a feed antenna is first proposed. Then, to maximize the achievable sum-rate of the system, the beamforming design and power allocation are jointly optimized. Since the optimization variables are coupled, the formulated problem is non-convex and difficult to solve directly. To solve it, we propose an alternating optimization (AO) technique based on difference-of-convex (DC) programming and successive convex approximation (SCA). Simulation results verify that the proposed algorithm converges and improves the achievable sum-rate of the system.
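The DC/SCA principle can be illustrated on a toy scalar problem. The sketch below is not the letter's algorithm. It minimizes f(x) = g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2, both convex and chosen only for illustration. At each step it replaces h with its first-order Taylor expansion, which gives a convex surrogate that upper-bounds f, and then minimizes that surrogate in closed form. The names g, h, dc_step and dc_programming are hypothetical.

```python
import math

# Toy DC program: minimize f(x) = g(x) - h(x) with g(x) = x**4 and h(x) = 2*x**2,
# both convex. This illustrates DC programming / SCA only; it is NOT the RMS
# beamforming and power-allocation problem from the letter.

def g(x):
    return x ** 4

def h(x):
    return 2.0 * x ** 2

def h_grad(x):
    return 4.0 * x

def f(x):
    return g(x) - h(x)

def dc_step(x_t):
    # Replace h by its first-order Taylor expansion at x_t. Because h is convex, the
    # surrogate g(x) - h(x_t) - h'(x_t) * (x - x_t) upper-bounds f and is convex.
    # Its minimizer solves 4 * x**3 = h'(x_t), i.e. x = cbrt(h'(x_t) / 4).
    c = h_grad(x_t) / 4.0
    return math.copysign(abs(c) ** (1.0 / 3.0), c)

def dc_programming(x0, iters=50):
    x, history = x0, [f(x0)]
    for _ in range(iters):
        x = dc_step(x)
        # Non-increasing, since each surrogate upper-bounds f and touches it at x_t.
        history.append(f(x))
    return x, history

x_star, objective = dc_programming(0.5)
print(round(x_star, 6), round(objective[-1], 6))
```

Each surrogate touches f at the current iterate and lies above it everywhere, so the objective never increases. This monotonicity typically underpins the convergence arguments for SCA-based alternating optimization.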
Self-energy recycling (sER), which allows transmit energy to be reused, has emerged as a viable option for improving the energy efficiency (EE) of low-power Internet of Things networks. In this work, we also investigate its benefits in terms of reliability and compare the performance of full-duplex (FD) and half-duplex (HD) schemes when multi-antenna techniques are used in a communication system. We analyze the trade-offs when considering not only the energy spent on transmission but also the circuitry power consumption, which makes the analysis of greater practical interest. In addition to the well-known spectral efficiency improvements, results show that FD also outperforms HD in terms of reliability. We show that sER not only brings EE benefits but also changes how maximum reliability fairness between uplink and downlink transmissions, the main goal of this work, is achieved. To achieve this objective, we propose a dynamic FD scheme in which the small base station (SBS) determines the optimal allocation of antennas for transmission and reception. This strategy yields significant gains in system outage probability compared with the simple HD and FD schemes.
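The following minimal sketch makes the antenna-allocation idea concrete. It rests on simplifying assumptions that do not come from the paper: i.i.d. Rayleigh fading, MRT on the downlink and MRC on the uplink, residual self-interference modelled as extra Gaussian noise, and no sER or circuit power. The hypothetical function best_fd_split tries every transmit/receive split of the SBS antennas. It keeps the split that minimizes the worse of the uplink and downlink outage probabilities, which is one way to express reliability fairness.

```python
import math

# Assumptions (not from the paper): i.i.d. Rayleigh fading, MRT on the downlink and
# MRC on the uplink, residual self-interference (SI) treated as extra Gaussian noise,
# and no energy recycling or circuit power in the model.

def outage_prob(n_antennas, avg_snr, rate):
    # With MRT/MRC over n i.i.d. Rayleigh branches, the SNR is avg_snr * X with
    # X ~ Gamma(n, 1), so P(log2(1 + SNR) < rate) has the closed form below.
    x = (2.0 ** rate - 1.0) / avg_snr
    partial = sum(x ** k / math.factorial(k) for k in range(n_antennas))
    return 1.0 - math.exp(-x) * partial

def best_fd_split(n_total, snr_dl, snr_ul, si_to_noise, rate):
    # Dynamic FD: try every split of the SBS antennas into n_tx (downlink) and
    # n_rx (uplink) and keep the one minimizing the worse of the two outages.
    best = None
    for n_tx in range(1, n_total):
        n_rx = n_total - n_tx
        p_dl = outage_prob(n_tx, snr_dl, rate)
        p_ul = outage_prob(n_rx, snr_ul / (1.0 + si_to_noise), rate)
        worst = max(p_dl, p_ul)
        if best is None or worst < best[0]:
            best = (worst, n_tx, n_rx)
    return best

print(best_fd_split(4, 10.0, 10.0, 0.0, 1.0))   # symmetric links, no residual SI
print(best_fd_split(4, 10.0, 10.0, 9.0, 1.0))   # strong SI shifts antennas to the uplink
```

With symmetric links the split is balanced. Strong residual self-interference pushes antennas towards reception.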
Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications such as smartphones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, such as implantable cardioverter defibrillators (ICDs), a setting known as TinyML. Unlike ML on the edge, TinyML operates with a limited energy supply and therefore places higher demands on low-power execution. Stochastic computing (SC), which represents data as bitstreams, is promising for TinyML because it can perform the fundamental ML operations using simple logic gates instead of complex binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to low data precision and inaccurate arithmetic units. Existing works mitigate the precision issue by increasing the bitstream length, but this incurs higher latency. In this work, we propose a novel SC architecture, namely Block-based Stochastic Computing (BSC). BSC divides inputs into blocks, so that latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and an output revision (OUR) scheme are proposed to improve accuracy. On top of this, a global optimization approach is devised to determine the number of blocks, achieving a better latency-power trade-off. Experimental results show that BSC outperforms existing designs, achieving over 10% higher accuracy on ML tasks and over 6 times power reduction.
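The sketch below shows basic SC arithmetic and the latency benefit of splitting a bitstream into blocks. It uses generic unipolar SC and assumes independent bitstreams. The block split is a hypothetical illustration and does not reproduce BSC's arithmetic units or the OUR scheme.

```python
import random

# Generic unipolar stochastic computing (SC) on bitstreams. The block split below is a
# hypothetical illustration of why parallel blocks cut latency; it is not BSC's circuit.

def to_bitstream(value, length, rng):
    # Each bit is 1 with probability `value` (value in [0, 1]), as a comparator-based
    # stochastic number generator would produce.
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_multiply(a_bits, b_bits):
    # For independent unipolar streams, a bitwise AND gate computes P(a) * P(b).
    return [x & y for x, y in zip(a_bits, b_bits)]

def decode(bits):
    # The encoded value is the fraction of ones in the stream.
    return sum(bits) / len(bits)

def blockwise_decode(bits, n_blocks):
    # Split the stream into n_blocks equal blocks; in hardware each block would be
    # processed by its own unit in parallel, so latency is len(bits) / n_blocks cycles.
    assert len(bits) % n_blocks == 0, "stream length must be divisible by n_blocks"
    block_len = len(bits) // n_blocks
    counts = [sum(bits[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)]
    return sum(counts) / len(bits), block_len

rng = random.Random(0)
product = sc_multiply(to_bitstream(0.5, 4096, rng), to_bitstream(0.75, 4096, rng))
print(decode(product))                 # close to 0.5 * 0.75 = 0.375
print(blockwise_decode(product, 8))    # same estimate, 512 cycles instead of 4096
```

The precision of the estimate is set by the total stream length. The block count only changes how many cycles the computation takes. More blocks mean more parallel hardware and therefore more power, which is why choosing the number of blocks is a latency-power trade-off.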
The exponential growth of the Internet of Things (IoT) has given rise to a new wave of edge computing, driven by the need to process data at the edge, closer to where it is produced, and by a move away from cloud-centric architectures. This shift offers an opportunity to decrease latency and address data privacy concerns, along with the ability to reduce public cloud costs. The serverless computing model offers a potential solution: its event-driven architecture reduces the need for always-running servers and converts backend services to an as-used model. This model is attractive for edge computing environments with varying workloads and limited resources. Furthermore, deploying it at the network edge promises reduced latency for the edge devices communicating with it and eliminates the need to manage the underlying infrastructure. In this book chapter, we first introduce the novel concept of serverless edge computing. We then analyze the performance of multiple serverless platforms, namely OpenFaaS, AWS Greengrass, and Apache OpenWhisk, when set up on single-board computers (SBCs) at the edge. We compare them with public cloud serverless offerings, namely AWS Lambda and Azure Functions, to assess the suitability of serverless architectures at the network edge. These serverless platforms are set up on a cluster of Raspberry Pis, and we evaluate their performance by simulating different types of edge workloads. The evaluation results show that OpenFaaS achieves the lowest response time on the SBC edge computing infrastructure, while the serverless cloud offerings are the most reliable, with the highest success rate.
In recent years, the reconfigurable intelligent surface (RIS) has become a promising technology for improving wireless communication. It steers incident signals to create a favorable propagation environment by controlling reconfigurable passive elements, at low hardware cost and power consumption. In this paper, we consider an RIS-aided multiuser multiple-input single-output downlink communication system. We aim to maximize the weighted sum-rate of all users by jointly optimizing the active beamforming at the access point and the passive beamforming vector of the RIS elements. Unlike most existing works, we consider the more practical situation with discrete phase shifts and imperfect channel state information (CSI). Specifically, for the case of discrete phase shifts and perfect CSI, we first develop a deep quantization neural network (DQNN) that designs the active and passive beamforming simultaneously, whereas most reported works design them alternately. Then, we propose an improved structure (I-DQNN) based on DQNN to simplify the parameter decision process when each RIS element is controlled by more than 1 bit. Finally, we extend the two proposed DQNN-based algorithms to the case where discrete phase shifts and imperfect CSI are considered simultaneously. Our simulation results show that the two DQNN-based algorithms outperform traditional algorithms in the perfect CSI case and are also more robust in the imperfect CSI case.
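Discrete phase shifts can be illustrated with a simple quantization baseline for a single-antenna, single-user link. The baseline first computes the continuous phases that align every cascaded path with the direct path. It then rounds each phase to the nearest of 2^bits levels. The channel values and function names below are hypothetical, and this is not the DQNN or I-DQNN approach.

```python
import cmath
import math

# Single-user, single-antenna sketch of discrete RIS phase shifts: align each cascaded
# path with the direct path, then round each phase to one of 2**bits levels. This is a
# simple quantization baseline, not the DQNN / I-DQNN method.

def quantize_phase(theta, bits):
    # Nearest of 2**bits uniformly spaced levels in [0, 2*pi).
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    return (round((theta % (2.0 * math.pi)) / step) % levels) * step

def aligned_phases(h_direct, cascaded):
    # Continuous optimum for one user: rotate every cascaded coefficient onto arg(h_direct).
    return [cmath.phase(h_direct) - cmath.phase(c) for c in cascaded]

def channel_gain(h_direct, cascaded, phases):
    # |h_d + sum_n c_n * exp(j * theta_n)|^2, where c_n is the AP -> element n -> user channel.
    total = h_direct + sum(c * cmath.exp(1j * t) for c, t in zip(cascaded, phases))
    return abs(total) ** 2

h_d = 1.0 + 0.0j
cascaded = [cmath.exp(0.3j), 0.5 * cmath.exp(-2.0j), 0.8j]
ideal = aligned_phases(h_d, cascaded)
print(channel_gain(h_d, cascaded, ideal))   # ideal gain (1 + 1 + 0.5 + 0.8)**2, about 10.89
for bits in (1, 2, 3):
    quantized = [quantize_phase(t, bits) for t in ideal]
    print(bits, channel_gain(h_d, cascaded, quantized))
```

By the triangle inequality, the quantized gain can never exceed the continuous-phase gain. The gap typically shrinks as the number of control bits grows. Learning-based designs aim to do better than this naive rounding, especially with multiple users.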
We studied power-splitting-based simultaneous wireless information and power transfer (PS-SWIPT) in multiple access channels (MAC). We considered decoding cost and non-linear energy harvesting (EH) constraints at the receiving nodes to capture the practical limitations of an EH communication system. Under these restrictions, we formulated and analyzed the achievable rate and maximum departure regions in two well-studied scenarios, namely a classical PS-SWIPT MAC and a PS-SWIPT MAC with user cooperation. In the classical PS-SWIPT MAC setting, closed-form expressions for the optimal PS factors are derived for two fundamental decoding schemes: simultaneous decoding and successive interference cancellation. In the PS-SWIPT MAC with user cooperation, the jointly optimal power allocation for the users and the optimal PS factor are derived. The analysis reveals that, in the classical PS-SWIPT MAC, which decoding scheme performs better depends on the type of decoding cost function. Finally, it is shown that cooperation between users can potentially boost the performance of a PS-SWIPT MAC under decoding cost and non-linear EH constraints. Moreover, the effects of the decoding cost functions, the non-linear EH model, and the channel quality between the users are studied, and the performance characteristics of the system are discussed.
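The following minimal sketch chooses the PS factor under a non-linear EH constraint. It assumes the widely used logistic (sigmoid) EH model rather than the paper's exact model, and it ignores the decoding cost. The parameter values are illustrative only, and the function names are hypothetical. Because harvested power grows monotonically with the PS factor rho, the smallest feasible rho can be found by inverting the model. That choice leaves the most power for information decoding.

```python
import math

# Assumes the widely used logistic (sigmoid) non-linear EH model and a receiver that
# sends a fraction rho of the received power to the harvester. Parameter values are
# illustrative, and the decoding cost studied in the paper is not modelled here.

def logistic_eh(p_in, m_sat, a, b):
    # Harvested power for input power p_in; saturates at m_sat and is 0 at p_in = 0.
    omega = 1.0 / (1.0 + math.exp(a * b))
    psi = m_sat / (1.0 + math.exp(-a * (p_in - b)))
    return (psi - m_sat * omega) / (1.0 - omega)

def inverse_logistic_eh(p_eh, m_sat, a, b):
    # Input power needed to harvest p_eh; undefined at or above saturation.
    omega = 1.0 / (1.0 + math.exp(a * b))
    psi = p_eh * (1.0 - omega) + m_sat * omega
    if not 0.0 < psi < m_sat:
        raise ValueError("target harvested power is outside the model's range")
    return b - math.log(m_sat / psi - 1.0) / a

def min_ps_factor(p_received, p_eh_required, m_sat, a, b):
    # Harvested power grows with rho, so the smallest feasible rho meets the EH target
    # exactly and leaves the most power, (1 - rho) * p_received, for decoding.
    rho = inverse_logistic_eh(p_eh_required, m_sat, a, b) / p_received
    if rho > 1.0:
        raise ValueError("EH target cannot be met even with rho = 1")
    return rho

def info_rate(rho, p_received, noise_power):
    return math.log2(1.0 + (1.0 - rho) * p_received / noise_power)

rho = min_ps_factor(p_received=0.05, p_eh_required=0.005, m_sat=0.024, a=150.0, b=0.014)
print(rho, info_rate(rho, 0.05, 1e-4))
```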
Over-the-air computation (AirComp) has emerged as a new analog power-domain non-orthogonal multiple access (NOMA) technique for low-latency model/gradient-update aggregation in federated edge learning (FEEL). By integrating communication and computation into a joint design, AirComp can significantly enhance communication efficiency, but at the cost of aggregation errors caused by channel fading and noise. This paper studies a particular type of FEEL with federated averaging (FedAvg) and AirComp-based model-update aggregation, namely over-the-air FedAvg (Air-FedAvg). We investigate transmission power control to combat AirComp aggregation errors, with the aims of enhancing the training accuracy and accelerating the training of Air-FedAvg. To this end, we first analyze the convergence behavior (in terms of the optimality gap) of Air-FedAvg with aggregation errors at different outer iterations. Then, to enhance the training accuracy, we minimize the optimality gap by jointly optimizing the transmission power control at the edge devices and the denoising factors at the edge server, subject to a series of power constraints at the individual edge devices. Furthermore, to accelerate training, we also minimize the training latency of Air-FedAvg for a given target optimality gap. In this problem, learning hyper-parameters, including the numbers of outer iterations and local training epochs, are jointly optimized with the power control. Finally, numerical results show that the proposed transmission power control policy achieves significantly faster convergence for Air-FedAvg than benchmark policies with fixed-power transmission or per-iteration mean squared error (MSE) minimization. It is also shown that Air-FedAvg achieves an order-of-magnitude shorter training latency than conventional FedAvg with digital orthogonal multiple access (OMA-FedAvg).
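The AirComp aggregation step can be sketched for real-valued scalars with simple channel-inversion power control. This is a common textbook baseline, not the paper's optimized policy. The sketch assumes perfectly known positive channel gains, unit-power updates, and a single receive antenna. The function name aircomp_round is hypothetical.

```python
import random

# Real-valued sketch of one AirComp aggregation round with channel-inversion power
# control (a simple baseline, not the paper's optimized policy).
# Assumes known positive channel gains and unit-power model updates.

def aircomp_round(updates, gains, p_max, noise_std, rng):
    k = len(updates)
    # Largest denoising factor eta that keeps every device within its budget:
    # transmit scaling b_k = sqrt(eta) / h_k needs b_k**2 = eta / h_k**2 <= p_max.
    eta = p_max * min(h * h for h in gains)
    scales = [eta ** 0.5 / h for h in gains]
    # Signals superimpose over the air: the server receives sum_k h_k * b_k * x_k + noise.
    received = sum(h * b * x for h, b, x in zip(gains, scales, updates))
    received += rng.gauss(0.0, noise_std)
    estimate = received / (k * eta ** 0.5)          # denoise and average
    mse = noise_std ** 2 / (k * k * eta)            # aggregation error variance
    return estimate, mse

rng = random.Random(1)
updates, gains = [0.2, -0.5, 0.9], [0.5, 1.0, 2.0]
estimate, mse = aircomp_round(updates, gains, p_max=1.0, noise_std=0.1, rng=rng)
print(estimate, sum(updates) / len(updates), mse)
```

In this baseline, the weakest device sets the denoising factor. A single poor channel therefore inflates the aggregation error for everyone, which is the kind of effect that optimized power control is meant to mitigate.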
Federated meta-learning (FML) has emerged as a promising paradigm for coping with the data limitation and heterogeneity challenges in today's edge learning arena. However, its performance is often limited by slow convergence and the resulting low communication efficiency. In addition, since the available radio spectrum and IoT devices' energy capacity are usually insufficient, it is crucial to control resource allocation and energy consumption when deploying FML in practical wireless networks. To overcome these challenges, in this paper we rigorously analyze each device's contribution to the global loss reduction in each round. Based on this analysis, we develop an FML algorithm (called NUFM) with a non-uniform device selection scheme to accelerate convergence. After that, we formulate a resource allocation problem integrating NUFM in multi-access wireless systems to jointly improve the convergence rate and minimize the wall-clock time along with the energy cost. By decomposing the original problem step by step, we devise a joint device selection and resource allocation strategy that solves the problem with theoretical guarantees. Further, we show that the computational complexity of NUFM can be reduced from $O(d^2)$ to $O(d)$, where $d$ is the model dimension, by combining two first-order approximation techniques. Extensive simulation results demonstrate the effectiveness and superiority of the proposed methods compared with existing baselines.
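A reduction from $O(d^2)$ to $O(d)$ in meta-learning typically comes from avoiding explicit second-order terms. The sketch below shows two standard first-order devices on a toy quadratic loss. The first is a central finite-difference Hessian-vector product. The second is first-order MAML, which drops the Hessian term entirely. The abstract does not say which two approximations NUFM combines, so these choices are assumptions. The helper names are hypothetical.

```python
# Sketch of first-order tricks for meta-learning on a toy quadratic loss:
# the MAML meta-gradient needs a Hessian-vector product, which a central finite
# difference of two gradients approximates without forming the d x d Hessian.
# Which two approximations NUFM combines is not specified here; this is illustrative.

def make_quadratic_grad(A, b):
    # Gradient of L(theta) = 0.5 * theta^T A theta - b^T theta for symmetric A.
    def grad(theta):
        return [sum(a * t for a, t in zip(row, theta)) - bi for row, bi in zip(A, b)]
    return grad

def hvp_finite_difference(grad_fn, theta, v, delta=1e-4):
    # H v ~= (grad(theta + delta v) - grad(theta - delta v)) / (2 delta):
    # two extra gradient calls and O(d) extra memory instead of an O(d^2) Hessian.
    plus = [t + delta * vi for t, vi in zip(theta, v)]
    minus = [t - delta * vi for t, vi in zip(theta, v)]
    return [(gp - gm) / (2.0 * delta) for gp, gm in zip(grad_fn(plus), grad_fn(minus))]

def meta_gradient(grad_train, grad_test, theta, alpha, first_order=False):
    # One inner step phi = theta - alpha * grad_train(theta); the exact meta-gradient
    # is (I - alpha * H_train(theta)) grad_test(phi).
    phi = [t - alpha * g for t, g in zip(theta, grad_train(theta))]
    g_test = grad_test(phi)
    if first_order:
        return g_test  # first-order MAML: drop the Hessian term entirely
    hv = hvp_finite_difference(grad_train, theta, g_test)
    return [g - alpha * h for g, h in zip(g_test, hv)]

A_train, b_train = [[2.0, 0.5], [0.5, 1.0]], [1.0, -1.0]
A_test, b_test = [[1.0, 0.0], [0.0, 3.0]], [0.5, 0.5]
grad_train = make_quadratic_grad(A_train, b_train)
grad_test = make_quadratic_grad(A_test, b_test)
print(hvp_finite_difference(grad_train, [0.3, -0.2], [1.0, 2.0]))   # close to A_train @ [1, 2] = [3, 2.5]
print(meta_gradient(grad_train, grad_test, [0.3, -0.2], alpha=0.1))
```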
Flexible Transmitter Network (FTNet) is a recently proposed bio-plausible neural network that has achieved performance competitive with state-of-the-art models when handling temporal-spatial data. However, the theoretical understanding of FTNet remains an open problem. This work investigates the theoretical properties of one-hidden-layer FTNet from the perspectives of approximation and local minima. Under mild assumptions, we show that: i) FTNet is a universal approximator; ii) the approximation complexity of FTNet can be exponentially smaller than that of real-valued neural networks with feedforward/recurrent architectures, and is of the same order in the worst case; iii) any local minimum of FTNet is the global minimum, which suggests that local search algorithms can converge to the global minimum. Our theoretical results indicate that FTNet can express target functions efficiently and does not suffer from spurious local minima. These results fill a gap in the theory of FTNet and point to possibilities for improving it.
For a long time, computer architecture and systems have been optimized to enable efficient execution of machine learning (ML) algorithms or models. Now it is time to reconsider the relationship between ML and systems and let ML transform the way computer architecture and systems are designed. This has a twofold meaning: improving designers' productivity and completing the virtuous cycle. In this paper, we present a comprehensive review of work that applies ML to system design. This work can be grouped into two major categories: ML-based modelling, which involves predicting performance metrics or other criteria of interest, and ML-based design methodology, which directly leverages ML as the design tool. For ML-based modelling, we discuss existing studies according to their target system level, ranging from the circuit level to the architecture/system level. For ML-based design methodology, we follow a bottom-up path to review current work. The scope covers (micro-)architecture design (memory, branch prediction, NoC), coordination between architecture/system and workload (resource allocation and management, data center management, and security), compilers, and design automation. We further provide a vision of future opportunities and potential directions, and envision that applying ML to computer architecture and systems will thrive in the community.
Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB. It outperforms or matches the best results of previous architecture search methods with a significant reduction in computational resources. Specifically, we obtain a $2.07\%$ test set error rate on the CIFAR-10 image classification task and a test set perplexity of $55.9$ on the PTB language modeling task. The best discovered architectures on both tasks are successfully transferred to other tasks such as CIFAR-100 and WikiText-2.
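NAO's search step moves an architecture embedding along the gradient of the performance predictor and then decodes it. The sketch below illustrates this step under heavy simplifications: hand-picked 2-D embeddings, a concave quadratic predictor with a known peak, and a nearest-neighbour lookup standing in for the decoder. In NAO these components are learned, so everything here, including the architecture names, is hypothetical.

```python
# Toy version of NAO's search step under strong simplifications: the "embeddings" are
# hand-picked 2-D points, the predictor is a concave quadratic with a known peak, and
# the decoder is a nearest-neighbour lookup. In NAO all three are learned networks.

def predicted_accuracy(e, peak, best_acc=0.95):
    # Hypothetical performance predictor f(e), highest at `peak`.
    return best_acc - sum((ei - pi) ** 2 for ei, pi in zip(e, peak))

def predictor_grad(e, peak):
    return [-2.0 * (ei - pi) for ei, pi in zip(e, peak)]

def optimize_embedding(e, peak, lr=0.1, steps=10):
    # Gradient ascent in the continuous space: e <- e + lr * df/de.
    for _ in range(steps):
        e = [ei + lr * gi for ei, gi in zip(e, predictor_grad(e, peak))]
    return e

def decode(e, codebook):
    # Stand-in decoder: the known architecture whose embedding is closest to e.
    return min(codebook, key=lambda name: sum((a - b) ** 2 for a, b in zip(codebook[name], e)))

codebook = {"arch_a": [0.0, 0.0], "arch_b": [0.9, 1.8], "arch_c": [-1.0, -1.0]}
peak = [1.0, 2.0]
start = codebook["arch_a"]
improved = optimize_embedding(start, peak)
print(decode(start, codebook), "->", decode(improved, codebook))
```

The update moves the embedding towards higher predicted accuracy. The decoder then turns the result back into a discrete architecture.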