5G brings many improvements to cellular networks in terms of performance, such as lower latency, improved network efficiency, and higher throughput, making it an attractive candidate for many applications. One such domain is industrial applications that may require real-time guarantees to transmit time-critical control messages. Assuming the immense number of devices exchanging data in support of Massive Machine-Type Communications (mMTC) applications, the capability of the cellular infrastructure to handle a large number of real-time transmissions may be inadequate. For such cases, there exists an acute desire to reduce any overheads as much as possible in order to guarantee certain deadlines. One such target is the Domain Name System (DNS) service, for which queries precede almost every new network request. This incorporates additional communication delays based on the response time, which in turn is affected by the proximity of the DNS server. While bringing DNS service to the edge has been touted as a logical solution, its integration with 5G systems is still challenging. This is due to the inability to access the DNS query information at the application layer since the User Equipment (UE) traffic is tunneled through to the core network. To this end, we propose a novel approach that can identify DNS queries at the base stations through Software-Defined Networking (SDN) capabilities. Specifically, we develop an SDN controller which is used to identify and extract DNS queries at the base station and handle the query at the edge without going through the 5G core network. This approach was implemented in a virtualized 5G network, in which we demonstrate that it is feasible and can potentially bring significant performance gains, especially in the case of mMTC applications.
Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access latency, regional data storage caches have been installed as a new networking service. To study the effectiveness of such a cache system in scientific applications, we examine the Southern California Petabyte Scale Cache for a high-energy physics experiment. By examining about 3TB of operational logs, we show that this cache removed 67.6% of file requests from the wide-area network and reduced the traffic volume on wide-area network by 12.3TB (or 35.4%) an average day. The reduction in the traffic volume (35.4%) is less than the reduction in file counts (67.6%) because the larger files are less likely to be reused. Due to this difference in data access patterns, the cache system has implemented a policy to avoid evicting smaller files when processing larger files. We also build a machine learning model to study the predictability of the cache behavior. Tests show that this model is able to accurately predict the cache accesses, cache misses, and network throughput, making the model useful for future studies on resource provisioning and planning.
Cloud computing has become a critical infrastructure for modern society, like electric power grids and roads. As the backbone of the modern economy, it offers subscription-based computing services anytime, anywhere, on a pay-as-you-go basis. Its use is growing exponentially with the continued development of new classes of applications driven by a huge number of emerging networked devices. However, the success of Cloud computing has created a new global energy challenge, as it comes at the cost of vast energy usage. Currently, data centres hosting Cloud services world-wide consume more energy than most countries. Globally, by 2025, they are projected to consume 20% of global electricity and emit up to 5.5% of the world's carbon emissions. In addition, a significant part of the energy consumed is transformed into heat which leads to operational problems, including a reduction in system reliability and the life expectancy of devices, and escalation in cooling requirements. Therefore, for future generations of Cloud computing to address the environmental and operational consequences of such significant energy usage, they must become energy-efficient and environmentally sustainable while continuing to deliver high-quality services. In this paper, we propose a vision for learning-centric approach for the integrated management of new generation Cloud computing environments to reduce their energy consumption and carbon footprint while delivering service quality guarantees. In this paper, we identify the dimensions and key issues of integrated resource management and our envisioned approaches to address them. We present a conceptual architecture for energy-efficient new generation Clouds and early results on the integrated management of resources and workloads that evidence its potential benefits towards energy efficiency and sustainability.
With the increasing popularity of AArch64 processors in general-purpose computing, securing software running on AArch64 systems against control-flow hijacking attacks has become a critical part toward secure computation. Shadow stacks keep shadow copies of function return addresses and, when protected from illegal modifications and coupled with forward-edge control-flow integrity, form an effective and proven defense against such attacks. However, AArch64 lacks native support for write-protected shadow stacks, while software alternatives either incur prohibitive performance overhead or provide weak security guarantees. We present InversOS, the first hardware-assisted write-protected shadow stacks for AArch64 user-space applications, utilizing commonly available features of AArch64 to achieve efficient intra-address space isolation (called Privilege Inversion) required to protect shadow stacks. Privilege Inversion adopts unconventional design choices that run protected applications in the kernel mode and mark operating system (OS) kernel memory as user-accessible; InversOS therefore uses a novel combination of OS kernel modifications, compiler transformations, and another AArch64 feature to ensure the safety of doing so and to support legacy applications. We show that InversOS is secure by design, effective against various control-flow hijacking attacks, and performant on selected benchmarks and applications (incurring overhead of 7.0% on LMBench, 7.1% on SPEC CPU 2017, and 3.0% on Nginx web server).
Utilizing spherical harmonic (SH) domain has been established as the default method of obtaining continuity over space in head-related transfer functions (HRTFs). This paper concerns different variants of extending this solution by replacing SHs with four-dimensional (4D) continuous functional models in which frequency is imagined as another physical dimension. Recently developed hyperspherical harmonic (HSH) representation is compared with models defined in spherindrical coordinate system by merging SHs with one-dimensional basis functions. The efficiency of both approaches is evaluated based on the reproduction errors for individual HRTFs from HUTUBS database, including detailed analysis of its dependency on chosen orders of approximation in frequency and space. Employing continuous functional models defined in 4D coordinate systems allows HRTF magnitude spectra to be expressed as a small set of coefficients which can be decoded back into values at any direction and frequency. The best performance was noted for HSHs and SHs merged with reverse Fourier-Bessel series, with the former featuring better compression abilities, achieving slightly higher accuracy for low number of coefficients. The presented models can serve multiple purposes, such as interpolation, compression or parametrization for machine learning applications, and can be applied not only to HRTFs but also to other types of directivity functions, e.g. sound source directivity.
In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
Scientific workflows are designed as directed acyclic graphs (DAGs) and consist of multiple dependent task definitions. They are executed over a large amount of data, often resulting in thousands of tasks with heterogeneous compute requirements and long runtimes, even on cluster infrastructures. In order to optimize the workflow performance, enough resources, e.g., CPU and memory, need to be provisioned for the respective tasks. Typically, workflow systems rely on user resource estimates which are known to be highly error-prone and can result in over- or underprovisioning. While resource overprovisioning leads to high resource wastage, underprovisioning can result in long runtimes or even failed tasks. In this paper, we propose two different reinforcement learning approaches based on gradient bandits and Q-learning, respectively, in order to minimize resource wastage by selecting suitable CPU and memory allocations. We provide a prototypical implementation in the well-known scientific workflow management system Nextflow, evaluate our approaches with five workflows, and compare them against the default resource configurations and a state-of-the-art feedback loop baseline. The evaluation yields that our reinforcement learning approaches significantly reduce resource wastage compared to the default configuration. Further, our approaches also reduce the allocated CPU hours compared to the state-of-the-art feedback loop by 6.79% and 24.53%.
As an efficient distributed machine learning approach, Federated learning (FL) can obtain a shared model by iterative local model training at the user side and global model aggregating at the central server side, thereby protecting privacy of users. Mobile users in FL systems typically communicate with base stations (BSs) via wireless channels, where training performance could be degraded due to unreliable access caused by user mobility. However, existing work only investigates a static scenario or random initialization of user locations, which fail to capture mobility in real-world networks. To tackle this issue, we propose a practical model for user mobility in FL across multiple BSs, and develop a user scheduling and resource allocation method to minimize the training delay with constrained communication resources. Specifically, we first formulate an optimization problem with user mobility that jointly considers user selection, BS assignment to users, and bandwidth allocation to minimize the latency in each communication round. This optimization problem turned out to be NP-hard and we proposed a delay-aware greedy search algorithm (DAGSA) to solve it. Simulation results show that the proposed algorithm achieves better performance than the state-of-the-art baselines and a certain level of user mobility could improve training performance.
Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where traditional terrestrial networks are limited or unavailable. In such environments, UAVs can be deployed as aerial edge servers or relays to facilitate edge computing services. This form of computing is also known as UAV-enabled Edge Computing (UEC), which offers several unique benefits such as mobility, line-of-sight, flexibility, computational capability, and cost-efficiency. However, the resources on UAVs, edge servers, and IoT devices are typically very limited in the context of UEC. Efficient resource management is, therefore, a critical research challenge in UEC. In this article, we present a survey on the existing research in UEC from the resource management perspective. We identify a conceptual architecture, different types of collaborations, wireless communication models, research directions, key techniques and performance indicators for resource management in UEC. We also present a taxonomy of resource management in UEC. Finally, we identify and discuss some open research challenges that can stimulate future research directions for resource management in UEC.
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, reinforcement learning, etc. The high-performed DNNs heavily rely on intensive resource consumption. For example, training a DNN requires high dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN also demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on a cloud server with a large number of super-computers, a high-bandwidth communication bus, a shared storage infrastructure, and a high power supplement. Recently, some new emerging intelligent applications, e.g., AR/VR, mobile assistants, Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compare to a cloud server, edge devices often have a rather small amount of resources. To deploy DNNs on edge devices, we need to reduce the size of DNNs, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and developed different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario.
Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.