Cloud resource management is often modeled as two-dimensional bin packing, with a set of items that correspond to tasks having fixed CPU and memory requirements. However, applications running in clouds are much more flexible: modern frameworks allow a single application to be scaled (horizontally) to dozens, even hundreds of instances, and the load balancer can then precisely divide the workload between them. We analyze a model that captures this (semi)-flexibility of cloud resource management. Each cloud application is characterized by its memory footprint and its momentary CPU load. Combining the scheduler and the load balancer, the resource manager decides how many instances of each application will be created and how the CPU load will be balanced between them. In contrast to the divisible load model, each instance of the application requires a certain amount of memory, independent of the number of instances. Thus, the resource manager effectively trades additional memory for more evenly balanced load. We study two objectives: the bin-packing-like minimization of the number of machines used, and the makespan-like minimization of the maximum load among all the machines. We prove NP-hardness of the general problems, but also propose polynomial-time exact algorithms for boundary special cases. Notably, we show that (semi)-flexibility may reduce the required number of machines by a tight factor of $2-\varepsilon$. For the general case, we propose heuristics that we validate by simulation on instances derived from the Azure trace.
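The memory-for-balance trade-off described above can be illustrated with a minimal sketch. It assumes an even split of the CPU load across instances (the model itself lets the load balancer choose any split), and the names `min_instances`, `footprint`, and `cpu_capacity` are hypothetical, introduced only for this illustration.

```python
import math


def min_instances(cpu_load: float, cpu_capacity: float) -> int:
    """Smallest instance count whose even share of the load fits one machine.

    Assumption: the load is split evenly, so each instance carries
    cpu_load / k and must not exceed cpu_capacity.
    """
    if cpu_load < 0 or cpu_capacity <= 0:
        raise ValueError("cpu_load must be >= 0 and cpu_capacity > 0")
    return max(1, math.ceil(cpu_load / cpu_capacity))


def footprint(cpu_load: float, memory: float, instances: int) -> tuple[float, float]:
    """Return (per-instance CPU load, total memory) for an even split.

    Every instance pays the full memory footprint, so total memory grows
    linearly with the instance count while per-instance load shrinks.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    return cpu_load / instances, memory * instances


if __name__ == "__main__":
    k = min_instances(cpu_load=2.5, cpu_capacity=1.0)
    per_instance_load, total_memory = footprint(2.5, 0.2, k)
    print(k, per_instance_load, total_memory)
```

Adding instances beyond `min_instances` lowers the per-instance load further, at the price of one more memory footprint per extra instance.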
Unit testing is one of the most established quality-assurance techniques for software development. One major advantage of unit testing is the adjustable trade-off between efficiency (i.e., testing effort) and effectiveness (i.e., fault-detection probability). To this end, various strategies have been proposed to exploit this trade-off. In particular, test-suite reduction (TSR) reduces the number of (presumably redundant) test cases while testing a single program version. Regression-test selection (RTS) selects test cases for testing consecutive program revisions. However, both TSR and RTS may influence -- or even obstruct -- each other's performance when used in combination. For instance, test cases discarded during TSR for a particular program version may become relevant again for RTS. Finding a combination of both strategies that leads to a reasonable trade-off throughout the version history of a program remains an open question. The goal of this paper is to gain a better understanding of the interactions between TSR and RTS with respect to efficiency and effectiveness. To this end, we present a configurable framework called RegreTS for automated unit testing of C programs. The framework comprises different strategies for TSR and RTS and possible combinations thereof. We apply this framework to a collection of subject systems, delivering several crucial insights. First, TSR almost always has a negative impact on the effectiveness of RTS, yet a positive impact on efficiency. Second, test cases that reveal to testers the effect of program modifications between consecutive program versions are far more effective than test cases that simply cover modified code parts, yet they cause much more testing effort.
Unsupervised time series anomaly detection is instrumental in monitoring target systems and raising alarms for potential faults in various domains. Current state-of-the-art time series anomaly detectors mainly focus on devising advanced neural network structures and new reconstruction/prediction learning objectives to learn data normality (normal patterns and behaviors) as accurately as possible. However, these one-class learning methods can be deceived by unknown anomalies in the training data (i.e., anomaly contamination). Further, their normality learning also lacks knowledge about the anomalies of interest. Consequently, they often learn a biased, inaccurate normality boundary. This paper proposes a novel one-class learning approach, named calibrated one-class classification, to tackle this problem. Our one-class classifier is calibrated in two ways: (1) by adaptively penalizing uncertain predictions, which helps eliminate the impact of anomaly contamination while accentuating the predictions that the one-class model is confident in, and (2) by discriminating the normal samples from native anomaly examples that are generated to simulate genuine time series abnormal behaviors on the basis of original data. These two calibrations result in contamination-tolerant, anomaly-informed one-class learning, yielding significantly improved normality modeling. Extensive experiments on six real-world datasets show that our model substantially outperforms twelve state-of-the-art competitors and obtains a 6%-31% F1 score improvement. The source code is available at \url{//github.com/xuhongzuo/couta}.
Although generative facial priors and geometric priors have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details faithful to inputs remains a challenging problem. Motivated by the classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method, VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, the simple application of the VQ codebook cannot achieve good results with faithful details and identity preservation. Therefore, we further introduce two special network designs. (1) We first investigate the compression patch size in the VQ codebook and find that a VQ codebook designed with a proper compression patch size is crucial to balance quality and fidelity. (2) To further fuse low-level features from inputs while not "contaminating" the realistic details generated from the VQ codebook, we propose a parallel decoder consisting of a texture decoder and a main decoder. These two decoders then interact through a texture warping module with deformable convolution. Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping fidelity comparable to previous methods.
This paper proposes networked dynamics to solve resource allocation problems over time-varying multi-agent networks. The state of each agent represents the amount of used resources (or produced utilities) while the total amount of resources is fixed. The idea is to optimally allocate the resources among the group of agents by minimizing the overall cost function subject to a fixed sum of resources. Each agent's information is restricted to its own state and cost function and those of its immediate in-neighbors. This is motivated by distributed applications such as mobile edge computing, economic dispatch over smart grids, and multi-agent coverage control. This work provides a fast-converging solution (in comparison with linear dynamics) while considering relaxed network connectivity with quantized communication links. The proposed dynamics reach the optimal solution over switching (possibly disconnected) undirected networks as long as their union over some bounded non-overlapping time intervals contains a spanning tree. We prove feasibility of the solution, uniqueness of the optimal state, and convergence to the optimal value under the proposed dynamics, where the analysis is applicable to similar first-order allocation dynamics with strongly sign-preserving nonlinearities, such as actuator saturation.
The fifth generation of wireless communication networks is required to support a range of use cases such as enhanced mobile broadband (eMBB), ultra-reliable, low-latency communications (URLLC), and massive machine-type communications (mMTC), with heterogeneous data rate, delay, and power requirements. The 4G LTE air interface uses extra overhead to enable scheduled access, which is not justified for small payload sizes. We employ a random access communication model with retransmissions for multiple users with small payloads in the low spectral efficiency regime. The radio resources are split non-orthogonally in the time and frequency dimensions. Retransmissions are combined via Hybrid Automatic Repeat reQuest (HARQ) methods, namely Chase Combining and Incremental Redundancy, with a finite buffer size constraint $C_{\sf buf}$. We determine the best scaling for the spectral efficiency (SE) versus signal-to-noise ratio (SNR) per bit and for the user density versus SNR per bit, for the sum-optimal regime and when the interference is treated as noise, using a Shannon capacity approximation. Numerical results show that the scaling results are applicable over a range of $\eta$, $T$, $C_{\sf buf}$, $J$, at low received SNR values. The proposed analytical framework provides insights for resource allocation in general random access systems and specific 5G use cases for massive URLLC uplink access.
US wind power generation has grown significantly over the last decades, both in the number and the average size of operating turbines. A lower specific power, i.e. larger rotor blades relative to wind turbine capacities, makes it possible to increase capacity factors and to reduce cost. However, this development also reduces system efficiency, i.e. the share of power in the wind flowing through rotor swept areas which is converted to electricity. At the same time, output power density, the amount of electric energy generated per unit of rotor swept area, may also decrease due to the decline of specific power. The precise outcome depends, however, on the interplay of wind resources and wind turbine models. In this study, we present a decomposition of historical US wind power generation data for the period 2001-2021 to study to what extent the decrease in specific power affected system efficiency and output power density. We show that as a result of a decrease in specific power, system efficiency fell and, therefore, output power density was reduced during the last decade. Furthermore, we show that the wind available to turbines has increased substantially due to increases in the average hub height of turbines since 2001. However, site quality has slightly decreased during the last 20 years.
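The quantities named above can be related in a short worked identity. This is a sketch under stated assumptions, not necessarily the decomposition used in the study: the symbols $P_{\text{rated}}$ (rated capacity), $P_{\text{el}}$ (electric output), $P_{\text{wind}}$ (power in the wind through the rotor swept area), $A$ (rotor swept area), $\rho$ (air density), and $v$ (wind speed, taken as uniform over the rotor) are introduced here for illustration only.

```latex
\begin{align*}
  \text{specific power:}        \quad & s = \frac{P_{\text{rated}}}{A}, \\
  \text{input power density:}   \quad & p_{\text{in}} = \frac{P_{\text{wind}}}{A} = \tfrac{1}{2}\,\rho\,v^{3}, \\
  \text{system efficiency:}     \quad & \eta = \frac{P_{\text{el}}}{P_{\text{wind}}}, \\
  \text{output power density:}  \quad & p_{\text{out}} = \frac{P_{\text{el}}}{A}
      = \frac{P_{\text{el}}}{P_{\text{wind}}} \cdot \frac{P_{\text{wind}}}{A}
      = \eta \, p_{\text{in}}.
\end{align*}
```

Under these definitions, output power density can fall because system efficiency $\eta$ falls even while the available wind $p_{\text{in}}$ rises, for example through greater hub heights.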
Instance segmentation on point clouds is crucially important for 3D scene understanding. Distance clustering is commonly used in state-of-the-art methods (SOTAs); it is typically effective but does not perform well in segmenting adjacent objects with the same semantic label (especially when they share neighboring points). Due to the uneven distribution of offset points, these existing methods can hardly cluster all instance points. To this end, we design a novel divide-and-conquer strategy and propose an end-to-end network named PBNet that binarizes each point and clusters them separately to segment instances. PBNet divides offset instance points into two categories, high- and low-density points (HPs vs. LPs), which are then conquered separately. Adjacent objects can be clearly separated by removing LPs, and then completed and refined by assigning LPs via a neighbor voting method. To further reduce clustering errors, we develop an iterative merging algorithm based on mean size to aggregate fragment instances. Experiments on the ScanNetV2 and S3DIS datasets indicate the superiority of our model. In particular, PBNet achieves the best AP50 and AP25 so far on the ScanNetV2 official benchmark challenge (Validation Set) while demonstrating high efficiency.
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start by introducing popular model compression methods, including pruning, factorization, and quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respond to individual concept categories like colors, textures, and object classes. But these techniques are limited in scope, labeling only a small subset of neurons and behaviors in any network. Is a richer characterization of neuron-level computation possible? We introduce a procedure (called MILAN, for mutual-information-guided linguistic annotation of neurons) that automatically labels neurons with open-ended, compositional, natural language descriptions. Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active. MILAN produces fine-grained descriptions that capture categorical, relational, and logical structure in learned features. These descriptions obtain high agreement with human-generated feature descriptions across a diverse set of model architectures and tasks, and can aid in understanding and controlling learned models. We highlight three applications of natural language neuron descriptions. First, we use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models. Second, we use MILAN for auditing, surfacing neurons sensitive to protected categories like race and gender in models trained on datasets intended to obscure these features. Finally, we use MILAN for editing, improving robustness in an image classifier by deleting neurons sensitive to text features spuriously correlated with class labels.
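The search criterion described above, choosing the description with the highest pointwise mutual information with a neuron's active regions, can be sketched as follows. This is only an illustration of plain PMI over a fixed candidate list: the function names and the probability values are hypothetical, and the actual procedure searches over generated natural language strings using learned models rather than a hand-written table.

```python
import math


def pmi(p_desc_given_regions: float, p_desc: float) -> float:
    """Pointwise mutual information: log p(d | E) - log p(d).

    Both arguments are assumed to be probabilities in (0, 1].
    """
    return math.log(p_desc_given_regions) - math.log(p_desc)


def best_description(candidates: dict[str, tuple[float, float]]) -> str:
    """Pick the candidate description with the highest PMI.

    candidates maps description -> (p(description | regions), p(description)).
    """
    return max(candidates, key=lambda d: pmi(*candidates[d]))


if __name__ == "__main__":
    # Hypothetical probabilities, made up for this illustration only.
    toy = {
        "an object": (0.30, 0.25),   # likely given the regions, but generic
        "dog faces": (0.20, 0.01),   # less likely overall, far more specific
    }
    print(best_description(toy))
```

Dividing by the prior $p(d)$ is what favors specific descriptions over generic ones that would fit almost any neuron.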
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
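The formula for the effective number of samples and the resulting re-weighting can be sketched in a few lines. This is a minimal illustration rather than the reference implementation: the function names are hypothetical, and normalizing the weights so that they sum to the number of classes is an assumption made here for readability.

```python
def effective_number(n: int, beta: float) -> float:
    """Effective number of samples, (1 - beta**n) / (1 - beta)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if n < 1:
        raise ValueError("n must be a positive sample count")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts: list[int], beta: float) -> list[float]:
    """Per-class loss weights inversely proportional to the effective number.

    Assumption: weights are rescaled to sum to the number of classes.
    """
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


if __name__ == "__main__":
    # A head class with 1000 samples and a tail class with 10 samples.
    print(class_balanced_weights([1000, 10], beta=0.99))
```

With $\beta = 0$ every class has an effective number of 1, so all classes are weighted equally; as $\beta$ approaches 1 the effective number approaches $n$, and the weights approach inverse class frequency.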