Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and resort to an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm and formulate the parameter learning of ConvNets into a prediction task: given a ConvNet architecture, we observe there exists correlations between image datasets and their corresponding optimal network parameters, and explore if we can learn a hyper-mapping between them to capture the relations, such that we can directly predict the parameters of the network for an image dataset never seen during the training phase. To do this, we put forward a new hypernetwork based model, called PudNet, which intends to learn a mapping between datasets and their corresponding network parameters, and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model benefits from a series of adaptive hyper recurrent units sharing weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our proposed method achieves good efficacy for unseen image datasets on two kinds of settings: Intra-dataset prediction and Inter-dataset prediction. Our PudNet can also well scale up to large-scale datasets, e.g., ImageNet-1K. It takes 8967 GPU seconds to train ResNet-18 on the ImageNet-1K using GC from scratch and obtain a top-5 accuracy of 44.65 %. However, our PudNet costs only 3.89 GPU seconds to predict the network parameters of ResNet-18 achieving comparable performance (44.92 %), more than 2,300 times faster than the traditional training paradigm.
AI-Generated Content (AIGC), as a novel manner of providing Metaverse services in the forthcoming Internet paradigm, can resolve the obstacles of immersion requirements. Concurrently, edge computing, as an evolutionary paradigm of computing in communication systems, effectively augments real-time interactive services. In pursuit of enhancing the accessibility of AIGC services, the deployment of AIGC models (e.g., diffusion models) to edge servers and local devices has become a prevailing trend. Nevertheless, this approach faces constraints imposed by battery life and computational resources when tasks are offloaded to local devices, limiting the capacity to deliver high-quality content to users while adhering to stringent latency requirements. So there will be a tradeoff between the utility of AIGC models and offloading decisions in the edge computing paradigm. This paper proposes a joint optimization algorithm for offloading decisions, computation time, and diffusion steps of the diffusion models in the reverse diffusion stage. Moreover, we take the average error into consideration as the metric for evaluating the quality of the generated results. Experimental results conclusively demonstrate that the proposed algorithm achieves superior joint optimization performance compared to the baselines.
One of the motivations for explainable AI is to allow humans to make better and more informed decisions regarding the use and deployment of AI models. But careful evaluations are needed to assess whether this expectation has been fulfilled. Current evaluations mainly focus on algorithmic properties of explanations, and those that involve human subjects often employ subjective questions to test human's perception of explanation usefulness, without being grounded in objective metrics and measurements. In this work, we evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development. We conduct a mixed-methods user study involving image data to evaluate saliency maps generated by SmoothGrad, GradCAM, and an oracle explanation on two tasks: model selection and counterfactual simulation. To our surprise, we did not find evidence of significant improvement on these tasks when users were provided with any of the saliency maps, even the synthetic oracle explanation designed to be simple to understand and highly indicative of the answer. Nonetheless, explanations did help users more accurately describe the models. These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
Deep Learning(DL) and Machine Learning(ML) applications are rapidly increasing in recent days. Massive amounts of data are being generated over the internet which can derive meaningful results by the use of ML and DL algorithms. Hardware resources and open-source libraries have made it easy to implement these algorithms. Tensorflow and Pytorch are one of the leading frameworks for implementing ML projects. By using those frameworks, we can trace the operations executed on both GPU and CPU to analyze the resource allocations and consumption. This paper presents the time and memory allocation of CPU and GPU while training deep neural networks using Pytorch. This paper analysis shows that GPU has a lower running time as compared to CPU for deep neural networks. For a simpler network, there are not many significant improvements in GPU over the CPU.
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play a pivotal role in software development, including software testing. LLMs go beyond traditional roles such as requirement analysis and documentation and can support test case generation, making them valuable tools that significantly enhance testing practices within the field. Hence, we explore the practical application of LLMs in software testing within an industrial setting, focusing on their current use by professional testers. In this context, rather than relying on existing data, we conducted a cross-sectional survey and collected data within real working contexts, specifically, engaging with practitioners in industrial settings. We applied quantitative and qualitative techniques to analyze and synthesize our collected data. Our findings demonstrate that LLMs effectively enhance testing documents and significantly assist testing professionals in programming tasks like debugging and test case automation. LLMs can support individuals engaged in manual testing who need to code. However, it is crucial to emphasize that, at this early stage, software testing professionals should use LLMs with caution while well-defined methods and guidelines are being built for the secure adoption of these tools.
Nowadays, neuromorphic systems based on Spiking Neural Networks (SNNs) attract attentions of many researchers. There are many studies to improve performances of neuromorphic systems. These studies have been showing satisfactory results. To magnify performances of neuromorphic systems, developing actual neuromorphic systems is essential. For developing them, memristors play key role due to their useful characteristics. Although memristors are essential for actual neuromorphic systems, they are vulnerable to faults. However, there are few studies analyzing effects of fault elements in neuromorphic systems using memristors. To solve this problem, we analyze performance of a memristive neuromorphic system with fault elements changing fault ratios, types, and positions. We choose neurons and synapses to inject faults. We inject two types of faults to synapses: SA0 and SA1 faults. The fault synapses appear in random and important positions. Through our analysis, we discover the following four interesting points. First, memristive characteristics increase vulnerability of neuromorphic systems to fault elements. Second, fault neuron ratios reducing performance sharply exist. Third, performance degradation by fault synapses depends on fault types. Finally, SA1 fault synapses improve performance when they appear in important positions.
Distributed Data Processing Platforms (e.g., Hadoop, Spark, and Flink) are widely used to store and process data in a cloud environment. These platforms distribute the storage and processing of data among the computing nodes of a cloud. The efficient use of these platforms requires users to (i) configure the cloud i.e., determine the number and type of computing nodes, and (ii) tune the configuration parameters (e.g., data replication factor) of the platform. However, both these tasks require in-depth knowledge of the cloud infrastructure and distributed data processing platforms. Therefore, in this paper, we first study the relationship between the configuration of the cloud and the configuration of distributed data processing platforms to determine how cloud configuration impacts platform configuration. After understanding the impacts, we propose a co-tuning approach for recommending optimal co-configuration of cloud and distributed data processing platforms. The proposed approach utilizes machine learning and optimization techniques to maximize the performance of the distributed data processing system deployed on the cloud. We evaluated our approach for Hadoop, Spark, and Flink in a cluster deployed on the OpenStack cloud. We used three benchmarking workloads (WordCount, Sort, and K-means) in our evaluation. Our results reveal that, in comparison to default settings, our co-tuning approach reduces execution time by 17.5% and $ cost by 14.9% solely via configuration tuning.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recent advanced techniques for compacting and accelerating CNNs model developed. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing will be described at the beginning, after that the other techniques will be introduced. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks etc. Then we will go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depths networks. After that, we survey the evaluation matrix, the main datasets used for evaluating the model performance and recent benchmarking efforts. Finally, we conclude this paper, discuss remaining challenges and possible directions on this topic.
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.