We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach is to use the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities using heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4.8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.
With the rise in communication capacity, deep neural networks (DNN) for digital pre-distortion (DPD) to correct non-linearity in wideband power amplifiers (PAs) have become prominent. Yet, there is a void in open-source and measurement-setup-independent platforms for fast DPD exploration and objective DPD model comparison. This paper presents an open-source framework, OpenDPD, crafted in PyTorch, with an associated dataset for PA modeling and DPD learning. We introduce a Dense Gated Recurrent Unit (DGRU)-DPD, trained via a novel end-to-end learning architecture, outperforming previous DPD models on a digital PA DPA in the new digital transmitter (DTX) architecture with unconventional transfer characteristics compared to analog PAs. Measurements show our DGRU-DPD achieves an ACPR of -44.69/-44.47 dBc and an EVM of -35.22 dB for 200 MHz OFDM signals. OpenDPD code, datasets, and documentation are publicly available at //github.com/lab-emi/OpenDPD.
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, where the memory capacity of a single acceleration device like a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., $10 \times$) of GPUs' native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly for popular memory reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that those memory reduction techniques introduce frequent and irregular memory (de)allocation requests, leading to severe fragmentation problems for the splitting-based caching allocator. To mitigate this fragmentation problem, we propose a novel memory allocation framework based on low-level GPU virtual memory management called GPU memory lake (GMLake). GMLake employs a novel virtual memory stitching (VMS) mechanism, which can fuse or combine non-contiguous memory blocks with a virtual memory address mapping. GMLake can reduce an average of 9.2 GB (up to 25 GB) GPU memory usage and 15% (up to 33% ) fragmentation among eight LLM models on GPU A100 with 80 GB memory. GMLake is completely transparent to the DNN models and memory reduction techniques and ensures the seamless execution of resource-intensive deep-learning tasks. We have open-sourced GMLake at //github.com/intelligent-machine-learning/glake/tree/main/GMLake.
Uncertainty quantification in forecasting represents a topic of great importance in energy trading, as understanding the status of the energy market would enable traders to directly evaluate the impact of their own offers/bids. To this end, we propose a scalable procedure that outputs closed-form simultaneous prediction bands for multivariate functional response variables in a time series setting, which is able to guarantee performance bounds in terms of unconditional coverage and asymptotic exactness, both under some conditions. After evaluating its performance on synthetic data, the method is used to build multivariate prediction bands for daily demand and offer curves in the Italian gas market.
This manuscript delves into the intersection of genomics and phenotypic prediction, focusing on the statistical innovation required to navigate the complexities introduced by noisy covariates and confounders. The primary emphasis is on the development of advanced robust statistical models tailored for genomic prediction from single nucleotide polymorphism (SNP) data collected from genome-wide association studies (GWAS) in plant and animal breeding and multi-field trials. The manuscript explores the limitations of traditional marker-assisted recurrent selection, highlighting the significance of incorporating all estimated effects of marker loci into the statistical framework and aiming to reduce the high dimensionality of GWAS data while preserving critical information. This paper introduces a new robust statistical framework for genomic prediction, employing one-stage and two-stage linear mixed model analyses along with utilizing the popular robust minimum density power divergence estimator (MDPDE) to estimate genetic effects on phenotypic traits. The study illustrates the superior performance of the proposed MDPDE-based genomic prediction and associated heritability estimation procedures over existing competitors through extensive empirical experiments on artificial datasets and application to a real-life maize breeding dataset. The results showcase the robustness and accuracy of the proposed MDPDE-based approaches, especially in the presence of data contamination, emphasizing their potential applications in improving breeding programs and advancing genomic prediction of phenotyping traits.
Recent advances in Multi-Agent Reinforcement Learning have prompted the modeling of intricate interactions between agents in simulated environments. In particular, the predator-prey dynamics have captured substantial interest and various simulations been tailored to unique requirements. To prevent further time-intensive developments, we introduce Aquarium, a comprehensive Multi-Agent Reinforcement Learning environment for predator-prey interaction, enabling the study of emergent behavior. Aquarium is open source and offers a seamless integration of the PettingZoo framework, allowing a quick start with proven algorithm implementations. It features physics-based agent movement on a two-dimensional, edge-wrapping plane. The agent-environment interaction (observations, actions, rewards) and the environment settings (agent speed, prey reproduction, predator starvation, and others) are fully customizable. Besides a resource-efficient visualization, Aquarium supports to record video files, providing a visual comprehension of agent behavior. To demonstrate the environment's capabilities, we conduct preliminary studies which use PPO to train multiple prey agents to evade a predator. In accordance to the literature, we find Individual Learning to result in worse performance than Parameter Sharing, which significantly improves coordination and sample-efficiency.
We consider a wireless communication system with a passive eavesdropper, in which a transmitter and legitimate receiver generate and use key bits to secure the transmission of their data. These bits are added to and used from a pool of available key bits. In this work, we analyze the reliability of the system in terms of the probability that the budget of available key bits will be exhausted. In addition, we investigate the latency before a transmission can take place. Since security, reliability, and latency are three important metrics for modern communication systems, it is of great interest to jointly analyze them in relation to the system parameters. In particular, we show under what conditions the system may remain in an active state indefinitely, i.e., never run out of available secret-key bits. The results presented in this work will allow system designers to adjust the system parameters in such a way that the requirements of the application in terms of both reliability and latency are met.
Knowledge-based dialogue systems with internet retrieval have recently attracted considerable attention from researchers. The dialogue systems overcome a major limitation of traditional knowledge dialogue systems, where the timeliness of knowledge cannot be assured, hence providing greater practical application value. Knowledge-based dialogue systems with internet retrieval can be typically segmented into three tasks: Retrieval Decision, Query Generation, and Response Generation. However, many of studies assumed that all conversations require external knowledge to continue, neglecting the critical step of determining when retrieval is necessary. This assumption often leads to an over-dependence on external knowledge, even when it may not be required. Our work addresses this oversight by employing a single unified model facilitated by prompt and multi-task learning approaches. This model not only decides whether retrieval is necessary but also generates retrieval queries and responses. By integrating these functions, our system leverages the full potential of pre-trained models and reduces the complexity and costs associated with deploying multiple models. We conducted extensive experiments to investigate the mutual enhancement among the three tasks in our system. What is more, the experiment results on the Wizint and Dusinc datasets not only demonstrate that our unified model surpasses the baseline performance for individual tasks, but also reveal that it achieves comparable results when contrasted with SOTA systems that deploy separate, specialized models for each task.
With the advent of 5G commercialization, the need for more reliable, faster, and intelligent telecommunication systems are envisaged for the next generation beyond 5G (B5G) radio access technologies. Artificial Intelligence (AI) and Machine Learning (ML) are not just immensely popular in the service layer applications but also have been proposed as essential enablers in many aspects of B5G networks, from IoT devices and edge computing to cloud-based infrastructures. However, most of the existing surveys in B5G security focus on the performance of AI/ML models and their accuracy, but they often overlook the accountability and trustworthiness of the models' decisions. Explainable AI (XAI) methods are promising techniques that would allow system developers to identify the internal workings of AI/ML black-box models. The goal of using XAI in the security domain of B5G is to allow the decision-making processes of the security of systems to be transparent and comprehensible to stakeholders making the systems accountable for automated actions. In every facet of the forthcoming B5G era, including B5G technologies such as RAN, zero-touch network management, E2E slicing, this survey emphasizes the role of XAI in them and the use cases that the general users would ultimately enjoy. Furthermore, we presented the lessons learned from recent efforts and future research directions on top of the currently conducted projects involving XAI.
One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features is recently proposed as a proxy for the measure of their importance. So far, the predictive information is only identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address it, many efforts have been made on training complex models with small data in an unsupervised and semi-supervised fashion. In this paper, we will review the recent progresses on these two major categories of methods. A wide spectrum of small data models will be categorized in a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. We will review the criteria of learning the transformation equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on the unsupervised and semi-supervised methods, we will also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exclusive encyclopedia to include all related works. Instead, we aim at exploring the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.