Reliable distributed systems require replication and consensus among distributed processes to tolerate process and communication failures. Understanding and assuring the correctness of replication and consensus protocols have been a significant challenge. This paper describes the precise specification and runtime checking of Derecho, a recent, sophisticated protocol for fast replication and consensus in cloud services. A precise specification must fill in missing details and resolve ambiguities in English and pseudocode algorithm descriptions while remaining faithful to those descriptions. To help check the correctness of the protocol, we also performed careful manual analysis and increasingly systematic runtime checking. We obtain a complete specification that is directly executable, and we discover and fix a number of issues in the pseudocode. These results were facilitated by the already detailed pseudocode of Derecho and made possible by DistAlgo, a language that allows distributed algorithms to be expressed easily and clearly and executed directly.
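At its simplest, the runtime checking described above can be pictured as asserting protocol invariants over a stream of observed events. The sketch below is plain Python rather than DistAlgo and is purely illustrative: the event format and the agreement invariant are our own assumptions, not Derecho's actual code.

def check_agreement(events):
    """Runtime check of a classic consensus safety invariant: no two
    processes may decide different values in the same epoch.
    Hypothetical event format: (epoch, process_id, decided_value)."""
    decided = {}  # epoch -> first value decided in that epoch
    for epoch, pid, value in events:
        if epoch in decided and decided[epoch] != value:
            raise AssertionError(
                f"agreement violated in epoch {epoch}: "
                f"{decided[epoch]!r} vs {value!r} (process {pid})")
        decided.setdefault(epoch, value)

# Example: two processes decide the same value in epoch 1, so the check passes.
check_agreement([(1, "p1", "v"), (1, "p2", "v")])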
Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed-precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance than networks with homogeneous bitwidths under the same resource constraint. However, finding the optimal bitwidth allocation is challenging because the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate bitwidth allocation as a constrained optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance that are guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that it outperforms existing fixed- and mixed-precision methods under the average bitwidth constraints commonly found in the literature.
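As a rough illustration of bitwidth allocation under an average-bitwidth constraint, the greedy sketch below captures the general idea of trading sensitivity against the budget; it is not QBitOpt's actual sensitivity computation or solver, and the sensitivity table is hypothetical.

def allocate_bitwidths(sensitivity, bit_choices, avg_budget):
    """Greedily lower the bitwidth of the least sensitive layer until the
    average bitwidth fits the budget. sensitivity[i][b] is a hypothetical
    estimated loss increase when layer i is quantized to b bits."""
    n = len(sensitivity)
    bits = [max(bit_choices)] * n              # start at highest precision
    while sum(bits) / n > avg_budget:
        # candidate moves: drop one layer to its next lower bitwidth
        moves = []
        for i in range(n):
            lower = [b for b in bit_choices if b < bits[i]]
            if lower:
                nxt = max(lower)
                cost = sensitivity[i][nxt] - sensitivity[i][bits[i]]
                moves.append((cost, i, nxt))
        if not moves:
            raise ValueError("budget infeasible for given bit choices")
        cost, i, nxt = min(moves)              # cheapest sensitivity increase
        bits[i] = nxt
    return bits

# Example with a 3-layer toy network and 4/8-bit choices:
sens = [{8: 0.0, 4: 0.5}, {8: 0.0, 4: 0.1}, {8: 0.0, 4: 0.2}]
print(allocate_bitwidths(sens, [4, 8], avg_budget=6))  # -> [8, 4, 4]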
Cloud virtual reality (VR) has emerged as a promising technology, offering users a highly immersive and easily accessible experience. However, the current 5G radio access network struggles to accommodate the bursty traffic generated by multiple concurrent cloud VR flows, leading to congestion at the 5G base station and increased delays. In this research, we present a comprehensive quantitative analysis of the underlying causes of the poor delay performance of cloud VR flows within the existing 5G protocol stack and network. To address these issues, we propose a novel cross-layer information-assisted congestion control mechanism deployed in the 5G edge network. Experimental results show that our mechanism increases the number of concurrent flows meeting delay standards by 1.5x to 2.5x while maintaining a smooth network load. These findings underscore the potential of leveraging 5G edge nodes as a valuable resource to effectively meet the anticipated demands of future services.
Resilience testing, which measures a system's ability to minimize service degradation caused by unexpected failures, is crucial for microservice systems. Current practice relies on manually defining resilience testing rules for each microservice system; because the business logic of microservices is so diverse, there are no one-size-fits-all rules. As the number and dynamism of microservices and failures grow, manual configuration runs into scalability and adaptivity issues. To overcome these two issues, we empirically compare the impact of common failures on resilient and unresilient deployments of a benchmark microservice system. Our study demonstrates that a resilient deployment can block the propagation of degradation from system performance metrics (e.g., memory usage) to business metrics (e.g., response latency). In this paper, we propose AVERT, the first AdaptiVE Resilience Testing framework for microservice systems. AVERT first injects failures into microservices and collects the available monitoring metrics. It then ranks all monitoring metrics by their contribution to the overall service degradation caused by the injected failures. Lastly, AVERT produces a resilience index based on how much the degradation in system performance metrics propagates to degradation in business metrics: the higher the degradation propagation, the lower the resilience of the microservice system. We evaluate AVERT on two open-source benchmark microservice systems. The experimental results show that AVERT can accurately and efficiently test the resilience of microservice systems.
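A minimal sketch of the degradation-propagation idea follows; it uses our own simplified formula, not AVERT's exact resilience index, and the metric names and relative-degradation measure are assumptions.

def relative_degradation(before, after):
    """Relative change of a metric across the fault-injection window."""
    return abs(after - before) / max(abs(before), 1e-9)

def resilience_index(system_metrics, business_metrics):
    """1.0 = degradation fully absorbed, 0.0 = fully propagated."""
    sys_deg = max(relative_degradation(b, a) for b, a in system_metrics.values())
    biz_deg = max(relative_degradation(b, a) for b, a in business_metrics.values())
    if sys_deg == 0:
        return 1.0  # the fault never degraded system metrics
    return max(0.0, 1.0 - biz_deg / sys_deg)

# Example: memory usage doubles, but response latency grows only 10%.
print(resilience_index({"memory_mb": (512, 1024)},
                       {"latency_ms": (100, 110)}))  # -> 0.9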
A location-aware multi-antenna coded caching (CC) scheme is proposed for applications with location-dependent data requests, such as wireless immersive experiences in which users are immersed in a three-dimensional virtual world. Wireless connectivity conditions vary as users move within the application area, motivating a non-uniform cache memory allocation process to avoid excessive delivery times for users located in wireless bottleneck areas. To this end, a location-aware placement and delivery array (LAPDA) is designed for cache-aided multi-antenna data delivery with a fast-converging, iterative linear beamforming process. The underlying weighted max-min transmit precoder design enables the proposed scheme to serve users in poor connectivity areas with smaller amounts of data while simultaneously delivering larger amounts to other users. Our new scheme is suitable for large networks due to its linear transceiver structure, and unlike existing schemes, it is not constrained by the number of users, the cache size, or the number of antennas at the transmitter. Despite the non-uniform cache placement, the proposed scheme still achieves a significant degree of coded caching gain that is additive to the multiplexing gain, and it greatly outperforms conventional symmetric CC schemes in terms of both average and 95th-percentile delivery time.
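For orientation, a weighted max-min precoder design typically takes the following standard form (written in our own symbols, not necessarily the paper's exact problem):

\[
  \max_{\{\mathbf{w}_k\}} \; \min_{k} \; \frac{R_k(\{\mathbf{w}_k\})}{\alpha_k}
  \quad \text{s.t.} \quad \sum_{k} \|\mathbf{w}_k\|^2 \le P_T ,
\]

where \(\mathbf{w}_k\) is the precoder of user \(k\), \(R_k\) its achievable rate, \(P_T\) the transmit power budget, and \(\alpha_k\) a location-dependent weight; at the max-min optimum the weighted rates equalize, so a smaller \(\alpha_k\) lets users in poor connectivity areas be served with proportionally less data.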
With the increasing use of multi-cloud environments, security professionals face challenges in configuration, management, and integration due to uneven security capabilities and features among providers. As a result, a fragmented approach to security has emerged, creating new attack vectors and potential vulnerabilities. Prior research has focused on single-cloud platforms or specific applications of multi-cloud environments; a holistic security and vulnerability assessment and defense strategy for multi-cloud platforms is therefore needed. We perform a risk and vulnerability analysis to identify attack vectors from software, hardware, and the network, as well as interoperability security issues in multi-cloud environments. Applying the STRIDE and DREAD threat modeling methods, we present an analysis of the ecosystem across six attack vectors: cloud architecture, APIs, authentication, automation, management differences, and cybersecurity legislation. We quantitatively determine and rank the threats in multi-cloud environments and suggest mitigation strategies.
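For concreteness, DREAD ranks threats by the mean of five 0-10 ratings: Damage, Reproducibility, Exploitability, Affected users, and Discoverability. The example ratings below are hypothetical illustrations, not the paper's data.

def dread_score(damage, reproducibility, exploitability, affected, discoverability):
    """Mean of the five DREAD ratings, each on a 0-10 scale."""
    return (damage + reproducibility + exploitability
            + affected + discoverability) / 5

threats = {
    "exposed management API": dread_score(8, 9, 7, 8, 9),
    "inconsistent IAM across providers": dread_score(7, 6, 5, 9, 4),
}
for name, score in sorted(threats.items(), key=lambda t: -t[1]):
    print(f"{score:4.1f}  {name}")  # highest-risk threat first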
Conformal inference is a popular tool for constructing prediction intervals (PIs). We consider the scenario of post-selection/selective conformal inference, in which PIs are reported only for individuals selected from unlabeled test data. To account for multiplicity, we develop a general split conformal framework to construct selective PIs with false coverage-statement rate (FCR) control. We first investigate the FCR-adjusted method of Benjamini and Yekutieli (2005) in the present setting and show that, although it achieves FCR control, it yields uniformly inflated PIs. We then propose a novel solution, named Selective COnditional conformal Predictions (SCOP), which entails performing the selection procedure on both the calibration set and the test set and constructing marginal conformal PIs on the selected sets with the aid of the conditional empirical distribution obtained from the calibration set. Under a unified framework and exchangeability assumptions, we show that SCOP exactly controls the FCR. More importantly, we provide non-asymptotic miscoverage bounds for a general class of selection procedures beyond exchangeability and discuss the conditions under which SCOP controls the FCR. As special cases, SCOP with quantile-based selection or with conformal p-value-based multiple testing procedures enjoys a valid coverage guarantee under mild conditions. Numerical results confirm the effectiveness and robustness of SCOP in FCR control and show that it achieves narrower PIs than existing methods in many settings.
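For orientation, the standard definitions are as follows (in our own notation, not necessarily the paper's): with \(\mathcal{S}\) the set of selected test units and \(\mathrm{PI}_i\) the interval reported for unit \(i\),

\[
  \mathrm{FCR} = \mathbb{E}\!\left[\frac{\#\{i \in \mathcal{S} : Y_i \notin \mathrm{PI}_i\}}{\max(|\mathcal{S}|, 1)}\right],
\]

and a split conformal PI at level \(1-\alpha\) takes the form \(\mathrm{PI}(x) = [\hat{\mu}(x) - \hat{q},\, \hat{\mu}(x) + \hat{q}]\), where \(\hat{q}\) is the \(\lceil (1-\alpha)(n+1) \rceil\)-th smallest residual \(|Y_j - \hat{\mu}(X_j)|\) over the \(n\) calibration points.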
Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.
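The co-design step can be pictured as a constrained search over candidate accelerator designs. The sketch below is a generic enumeration under a resource budget and recall target, not FANNS's actual procedure; every field in it is hypothetical.

from dataclasses import dataclass

@dataclass
class Design:
    name: str
    luts: int        # estimated FPGA resource usage
    recall: float    # estimated recall@k on the target dataset
    qps: float       # estimated throughput

def pick_design(candidates, lut_budget, recall_target):
    """Return the fastest design that fits the budget and meets the recall."""
    feasible = [d for d in candidates
                if d.luts <= lut_budget and d.recall >= recall_target]
    return max(feasible, key=lambda d: d.qps) if feasible else None

best = pick_design(
    [Design("cfg-A", 400_000, 0.93, 31_000),
     Design("cfg-B", 520_000, 0.96, 24_000)],
    lut_budget=600_000, recall_target=0.95)
print(best.name if best else "no feasible design")  # -> cfg-B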
The envisioned robotic aerial base station (RABS) concept is expected to bring further flexibility to integrated sensing and communication (ISAC) systems. In this letter, characterizing the spatial traffic distribution with a grid-based model, we formulate the RABS-assisted ISAC system as a robust optimization problem that maximizes the minimum satisfaction rate (SR) under a cardinality-constrained uncertainty set. The problem is reformulated as a mixed-integer linear program (MILP) and solved approximately by an iterative linear programming rounding algorithm. Numerical investigations show that the minimum SR can be improved by 28.61% on average compared to fixed small cells.
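A generic iterative LP-rounding heuristic of the kind referenced above can be sketched as follows; this is our own minimal version using scipy, not the letter's exact algorithm.

import numpy as np
from scipy.optimize import linprog

def lp_round(c, A_ub, b_ub):
    """Heuristically minimize c @ x s.t. A_ub @ x <= b_ub with x binary:
    solve the LP relaxation, fix the least fractional unfixed variable to
    its rounded value, and re-solve until the solution is integral."""
    n = len(c)
    fixed = {}                                    # index -> 0 or 1
    while True:
        bounds = [(fixed.get(i, 0), fixed.get(i, 1)) for i in range(n)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not res.success:
            return None                           # rounding became infeasible
        x = res.x
        frac = [i for i in range(n)
                if i not in fixed and min(x[i], 1 - x[i]) > 1e-6]
        if not frac:
            return np.round(x)                    # already integral
        i = min(frac, key=lambda j: min(x[j], 1 - x[j]))
        fixed[i] = int(round(x[i]))               # fix, then re-solve

# Toy knapsack: maximize 2*x1 + 3*x2 subject to 4*x1 + 5*x2 <= 6.
print(lp_round(c=[-2, -3], A_ub=[[4, 5]], b_ub=[6]))  # -> [0. 1.]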
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. These high-performing DNNs rely on intensive resource consumption: training a DNN requires large dynamic memory, a large-scale dataset, and many computations (a long training time); even inference with a DNN demands a large amount of static storage, many computations (a long inference time), and substantial energy. Therefore, state-of-the-art DNNs are often deployed on cloud servers with many supercomputers, a high-bandwidth communication bus, a shared storage infrastructure, and a high-power supply. Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices have far fewer resources. To deploy DNNs on edge devices, we need to reduce their size, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and developed different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario.
Games and simulators can be a valuable platform for executing complex multi-agent, multiplayer, imperfect-information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting the development of algorithms against complex benchmarks and the capability to rapidly iterate over new ideas. The success of AI algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in military counterpart scenarios. Aiming to bridge games and military applications, this work discusses past and current efforts on how games and simulators, together with AI algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities for human interfaces with gaming platforms and their military parallels.