The Domain Name System (DNS) is part of critical internet infrastructure, as DNS is invoked whenever any application accesses a remote server (a URL is visited, an API request is made, etc.). DNS queries are served in a hierarchical manner, with most queries answered locally from cached data and a small fraction propagating to the top of the hierarchy, the DNS root name servers. Our research aims to provide a comprehensive, longitudinal characterization of DNS queries received at B-Root over ten years. We sampled and analyzed a 28-billion-query dataset drawn from the ten annual Day in the Life of the Internet (DITL) experiments from 2013 through 2022. We sought to identify and quantify unexpected DNS queries, establish longitudinal trends, and compare our findings with previously published results. We found that unexpected query traffic increased from 39.57% in 2013 to 67.91% in 2022, with 36.55% of queries being priming queries. We also observed the growth and decline of Chromium-initiated random DNS queries. Finally, we analyzed the largest DNS query senders and established that most of their traffic consists of unexpected queries.
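As an illustration of the priming queries mentioned above, the hedged sketch below uses the dnspython library to construct and send a priming query, i.e., a request for the NS records of the root zone "."; the specific root-server address (a.root-servers.net) and the timeout are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch of a DNS "priming" query: asking a root server for the NS
# records of the root zone ".". Requires dnspython; the root-server address
# below (a.root-servers.net) is an assumption used only for illustration.
import dns.message
import dns.query
import dns.rdatatype

priming_query = dns.message.make_query(".", dns.rdatatype.NS)
response = dns.query.udp(priming_query, "198.41.0.4", timeout=3.0)

for rrset in response.answer:
    print(rrset)
```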
Generative AI has received significant attention across a wide range of industrial and academic domains, thanks to the remarkable results achieved by deep generative models such as generative pre-trained transformers (GPT) and diffusion models. In this paper, we explore the applications of denoising diffusion probabilistic models (DDPMs) in wireless communication systems under practical assumptions such as hardware impairments (HWI), the low-SNR regime, and quantization error. Diffusion models are a new class of state-of-the-art generative models that have already shown notable success, with popular examples from OpenAI and Google Brain. The intuition behind the DDPM is to decompose the data generation process into small ``denoising'' steps. Inspired by this, we propose a denoising diffusion model-based receiver for a practical wireless communication scheme, providing network resilience under low-SNR regimes, non-Gaussian noise, different HWI levels, and quantization error. We evaluate the reconstruction performance of our scheme in terms of the mean-squared error (MSE) metric. Our results show that an improvement of more than 25 dB in MSE is achieved compared to deep neural network (DNN)-based receivers. We also highlight robust out-of-distribution performance under non-Gaussian noise.
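To make the small ``denoising'' steps concrete, here is a minimal NumPy sketch of the DDPM forward-noising process and a single reverse step; it is not the receiver proposed in the paper, and the variance schedule, step count, and the assumed trained noise predictor `eps_model` are illustrative.

```python
# Minimal DDPM sketch: forward noising q(x_t | x_0) and one ancestral
# reverse step of p_theta(x_{t-1} | x_t), assuming a trained noise
# predictor eps_model(x_t, t). Schedule values are illustrative.
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)      # linear variance schedule (assumption)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

def reverse_step(x_t, t, eps_model, rng):
    """One denoising step using the predicted noise eps_hat."""
    eps_hat = eps_model(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
```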
Location awareness is essential in various wireless applications. The capability to perform precise ranging is crucial to achieving high-accuracy localization. Due to the notorious ambiguity phenomenon, optimal ranging waveforms should adapt to the signal-to-noise ratio (SNR). In this letter, we propose to use the Ziv-Zakai bound (ZZB) as the ranging performance metric, together with an associated waveform design algorithm with a theoretical guarantee of achieving the optimal ZZB at a given SNR. Numerical results suggest that, in stark contrast to the well-known high-SNR design philosophy, the detection probability of the ranging signal becomes more important than its resolution in the low-SNR regime.
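For readers unfamiliar with the metric, the following hedged sketch numerically evaluates a Ziv-Zakai-style bound for time-delay (ranging) estimation; the Q-function pairwise error probability, the integration form (stated up to constant factors), and the Gaussian-shaped autocorrelation are illustrative assumptions and are not the waveform design algorithm of the letter.

```python
# Hedged numerical sketch of a Ziv-Zakai-style bound: integrate the delay
# offset h weighted by the error probability of distinguishing two delayed
# copies of the waveform. The Q-function argument is a common matched-filter
# form and is used here only as an assumption, up to constant factors.
import numpy as np
from scipy.special import erfc

def qfunc(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

def zzb(snr, rho, h_max, n=2000):
    """Approximate int_0^h_max h * Q(sqrt(snr * (1 - rho(h)))) dh."""
    h = np.linspace(0.0, h_max, n)
    p_err = qfunc(np.sqrt(snr * (1.0 - rho(h))))
    return (h[1] - h[0]) * np.sum(h * p_err)

# Example: Gaussian-shaped autocorrelation; sigma acts as a "resolution" knob.
rho = lambda h, sigma=0.1: np.exp(-(h / sigma) ** 2)
print(zzb(snr=10.0, rho=rho, h_max=1.0))
```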
Today, LHC offline computing relies heavily on CPU resources, despite the interest in compute accelerators, such as GPUs, for the longer-term future. The number of cores per CPU socket has continued to increase steadily, reaching 64 cores (128 threads) with recent AMD EPYC processors and 128 cores on Ampere Altra Max ARM processors. Over the course of the past decade, the CMS data processing framework, CMSSW, has been transformed from a single-threaded framework into a highly concurrent one. The first multithreaded version was brought into production by the start of LHC Run 2 in 2015. Since then, the framework's threading efficiency has gradually been improved by adding more levels of concurrency and reducing the amount of serial code paths. The latest addition was support for concurrent Runs. In this work we review the concurrency model of CMSSW and measure its scalability with real CMS applications, such as simulation and reconstruction, on modern many-core machines. We show metrics such as event processing throughput and application memory usage with and without the contribution of I/O, as I/O has been the major scaling limitation for CMS applications.
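As a rough, generic illustration of the kind of throughput-versus-workers scan described above (and not CMSSW itself), the sketch below processes a fixed batch of dummy "events" with a growing worker pool and reports events per second; the per-event workload is a stand-in, and process-based parallelism is used only because plain Python threads do not scale for CPU-bound work.

```python
# Generic scaling scan: measure event-processing throughput as the number of
# workers grows. The per-event workload below is purely illustrative.
import time
from concurrent.futures import ProcessPoolExecutor

def process_event(seed: int) -> int:
    # Stand-in for per-event simulation/reconstruction work.
    acc = seed or 1
    for _ in range(200_000):
        acc = (acc * 6364136223846793005 + 1442695040888963407) % (1 << 64)
    return acc

def throughput(n_events: int, n_workers: int) -> float:
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(process_event, range(n_events)))
    return n_events / (time.perf_counter() - start)

if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        print(f"{workers:2d} workers: {throughput(256, workers):8.1f} events/s")
```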
Generalizing to unseen image domains is a challenging problem, primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings. As such, data augmentation is a critical component of domain generalization methods that seek to address this problem. We present Adversarial Bayesian Augmentation (ABA), a novel algorithm that learns to generate image augmentations in the challenging single-source domain generalization setting. ABA draws on the strengths of adversarial learning and Bayesian neural networks to guide the generation of diverse data augmentations; these synthesized image domains aid the classifier in generalizing to unseen domains. We demonstrate the strength of ABA on several types of domain shift, including style shift, subpopulation shift, and shift in the medical imaging setting. ABA outperforms all previous state-of-the-art methods, including pre-specified, pixel-based, and convolution-based augmentations.
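A hedged PyTorch sketch of the adversarial-augmentation idea follows; it is not the authors' ABA implementation, the Bayesian treatment of the augmenter's weights is crudely approximated here by injecting Gaussian noise, and all module shapes, step sizes, and the toy classifier are assumptions.

```python
# Adversarial augmentation sketch: the augmenter perturbs images to *increase*
# the classifier's loss; the classifier trains on clean and augmented images.
import torch
import torch.nn as nn

augmenter = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
classifier = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
opt_aug = torch.optim.SGD(augmenter.parameters(), lr=1e-2)
opt_cls = torch.optim.SGD(classifier.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    # 1) Adversarial step: augmenter maximizes the classifier loss
    #    (noise injection is a crude stand-in for weight sampling).
    x_aug = x + 0.1 * augmenter(x) + 0.01 * torch.randn_like(x)
    loss_adv = -criterion(classifier(x_aug), y)
    opt_aug.zero_grad(); loss_adv.backward(); opt_aug.step()
    # 2) Classifier step: minimize loss on clean and augmented images.
    x_aug = (x + 0.1 * augmenter(x)).detach()
    loss_cls = criterion(classifier(x), y) + criterion(classifier(x_aug), y)
    opt_cls.zero_grad(); loss_cls.backward(); opt_cls.step()
```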
In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major obstacles to the success of MIR tasks. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR dataset. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. Knowledge expansion is also explored to iteratively scale up the model size from under 3M to almost 100M parameters. We study how performance correlates with data size and model size in our experiments. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a self-supervised pretrained model. To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.
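The following schematic sketch (an assumption about the general noisy teacher-student recipe, not the paper's actual pipeline) shows one round of pseudo-labeling and student training: the teacher labels unlabeled audio features, and a student, possibly larger than the teacher, trains on the confident pseudo-labels with input noise; the confidence threshold, noise scale, and tensor shapes are illustrative.

```python
# One round of noisy teacher-student training (schematic).
import torch
import torch.nn as nn

def pseudo_label(teacher: nn.Module, unlabeled_x: torch.Tensor, threshold: float = 0.9):
    """Keep only the teacher's confident predictions as pseudo-labels."""
    with torch.no_grad():
        probs = teacher(unlabeled_x).softmax(dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled_x[keep], labels[keep]

def train_student(student: nn.Module, batches, epochs: int = 1, lr: float = 1e-3):
    """batches: iterable of (x, y) pairs mixing labeled and pseudo-labeled data."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in batches:
            x = x + 0.05 * torch.randn_like(x)   # "noisy student": perturb the input
            loss = loss_fn(student(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return student  # the trained student can become the next round's teacher
```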
The Internet of Things has pervaded every area of modern life. From both research and industry standpoints, there has been an increasing demand in recent years to develop Internet of Things networks with a distributed structure. Wireless communication under emergency circumstances is one of the important applications of the distributed Internet of Things. For a network to be functional in this scenario, it must be built without the aid of a pre-established or centralized structure and operate in a self-organized manner to accommodate the communication requirements at hand. Although the design and development of such networks can be highly advantageous, they frequently face difficulties, the most significant of which is attaining and maintaining effective connectivity for reliable communication while also optimizing energy usage. In this study, we present a self-organizing topology control model for ad hoc-based Internet of Things networks that addresses these challenges. The proposed model employs the notion of the Hamiltonian function from classical mechanics and has two key objectives: regulating the network's topology and dynamics to enhance connectivity to a desirable level while requiring the least amount of energy possible. Simulation results indicate that the proposed model satisfactorily fulfills these goals.
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges on FL, as it can lead to drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue by refining the server model using aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by prior art, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcast to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies, supported by theoretical analysis, show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state of the art.
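Here is a hedged sketch of the data-free distillation idea on the server side (not the authors' code): a lightweight generator is trained so that the ensemble prediction of the users' classifier heads on generated features matches sampled labels, after which the generator would be broadcast to users as an inductive bias for local training. Dimensions, loss weights, and the single-layer generator are illustrative assumptions.

```python
# Server-side sketch: train a conditional feature generator against the
# ensemble of user classifier heads, without any real data.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_classes=10, noise_dim=32, feat_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, noise_dim)
        self.net = nn.Sequential(nn.Linear(2 * noise_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
    def forward(self, y):
        z = torch.randn(y.size(0), self.embed.embedding_dim)
        return self.net(torch.cat([z, self.embed(y)], dim=-1))

def server_update(generator, user_heads, n_classes=10, steps=100, lr=1e-3):
    """user_heads: list of per-user heads mapping feature vectors to class logits."""
    for head in user_heads:
        head.requires_grad_(False)           # only the generator is updated here
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):
        y = torch.randint(0, n_classes, (64,))
        feats = generator(y)
        logits = torch.stack([h(feats) for h in user_heads]).mean(dim=0)  # ensemble
        loss = ce(logits, y)
        opt.zero_grad(); loss.backward(); opt.step()
```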
The vast amount of data generated by networks of sensors, wearables, and Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data, given the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model, Cross-Node Federated Graph Neural Network (CNFGNN), which explicitly encodes the underlying graph structure using a graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes be generated locally on each node and remain decentralized. CNFGNN operates by disentangling temporal dynamics modeling on the devices from spatial dynamics modeling on the server, using alternating optimization to reduce the communication cost and facilitate computation on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
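A hedged structural sketch of the split described above follows (not the authors' CNFGNN code): each node models its own temporal dynamics with a local encoder, and the server models spatial dependencies across node embeddings with a GNN-style layer, with the two parts trained in alternation. The dimensions and the simple mean-aggregation "GNN" are simplifying assumptions.

```python
# Device/server split: temporal modeling stays on each node, spatial modeling
# over the graph of node embeddings happens on the server.
import torch
import torch.nn as nn

class NodeTemporalModel(nn.Module):
    """Runs locally on one node; shares only hidden states, never raw data."""
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
    def forward(self, x):                    # x: (batch, time, in_dim)
        _, h = self.gru(x)
        return h[-1]                         # (batch, hidden)

class ServerSpatialModel(nn.Module):
    """Runs on the server over the graph of node embeddings."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lin = nn.Linear(hidden, hidden)
    def forward(self, H, adj):               # H: (n_nodes, hidden), adj: (n_nodes, n_nodes)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ H / deg))   # simple neighborhood averaging
```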
It has been shown that deep neural networks are prone to overfitting on biased training data. To address this issue, meta-learning employs a meta model to correct the training bias. Despite promising performance, extremely slow training is currently the bottleneck of meta-learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) that replaces the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show that our method is able to save two-thirds of the training time while maintaining comparable, or achieving even better, generalization performance. In particular, our method achieves state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.
For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although numerous CTR prediction models have been proposed, learning good representations of items from multimodal features remains less investigated, considering that an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the features of multiple modalities, which is equivalent to giving each modality a fixed importance weight, or learn dynamic weights for the different modalities of each item through techniques such as attention mechanisms. However, redundant information is usually shared across modalities, and dynamic weights computed from this redundant information may not correctly reflect the true importance of each modality. To address this, we explore the complementarity and redundancy of modalities by treating modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of the multiple modalities of each item according to its modality-specific features. A multimodal adversarial network then learns modality-invariant representations, for which a double-discriminator strategy is introduced. Finally, we obtain the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves significant improvements over state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, where online A/B testing further demonstrates its effectiveness.
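To make the combination of modality-specific and modality-invariant parts concrete, here is a hedged PyTorch sketch; it is not the authors' MARN implementation, it uses a single discriminator rather than the double-discriminator strategy, and all dimensions and the simple attention/fusion choices are assumptions.

```python
# Fusion sketch: attention over modality-specific features, a shared subspace
# for modality-invariant features, and a discriminator for the adversarial loss.
import torch
import torch.nn as nn

class MultimodalItem(nn.Module):
    def __init__(self, n_modalities=3, dim=32):
        super().__init__()
        self.attn = nn.Linear(dim, 1)                       # modality attention score
        self.invariant = nn.Linear(dim, dim)                # maps to shared subspace
        self.discriminator = nn.Linear(dim, n_modalities)   # predicts source modality

    def forward(self, feats):              # feats: (n_modalities, batch, dim)
        scores = torch.softmax(self.attn(feats), dim=0)     # weights across modalities
        specific = (scores * feats).sum(dim=0)              # modality-specific part
        shared = self.invariant(feats).mean(dim=0)          # modality-invariant part
        modality_logits = self.discriminator(self.invariant(feats))  # adversarial head
        return torch.cat([specific, shared], dim=-1), modality_logits
```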