Compared to "black-box" models, like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs become readily less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production due to increases in scoring time. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that can help introduce sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM with many (i.e., possibly hundreds or thousands) of terms using the LASSO can help reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
Millimeter wave (mmWave) has been recognized as one of key technologies for 5G and beyond networks due to its potential to enhance channel bandwidth and network capacity. The use of mmWave for various applications including vehicular communications has been extensively discussed. However, applying mmWave to vehicular communications faces challenges of high mobility nodes and narrow coverage along the mmWave beams. Due to high mobility in dense networks, overlapping beams can cause strong interference which leads to performance degradation. As a remedy, beam switching capability in mmWave can be utilized. Then, frequent beam switching and cell change become inevitable to manage interference, which increase computational and signalling complexity. In order to deal with the complexity in interference control, we develop a new strategy called Multi-Agent Context Learning (MACOL), which utilizes Contextual Bandit to manage interference while allocating mmWave beams to serve vehicles in the network. Our approach demonstrates that by leveraging knowledge of neighbouring beam status, the machine learning agent can identify and avoid potential interfering transmissions to other ongoing transmissions. Furthermore, we show that even under heavy traffic loads, our proposed MACOL strategy is able to maintain low interference levels at around 10%.
Training deep neural networks (DNNs) requires large datasets and powerful computing resources, which has led some owners to restrict redistribution without permission. Watermarking techniques that embed confidential data into DNNs have been used to protect ownership, but these can degrade model performance and are vulnerable to watermark removal attacks. Recently, DeepJudge was introduced as an alternative approach to measuring the similarity between a suspect and a victim model. While DeepJudge shows promise in addressing the shortcomings of watermarking, it primarily addresses situations where the suspect model copies the victim's architecture. In this study, we introduce DeepTaster, a novel DNN fingerprinting technique, to address scenarios where a victim's data is unlawfully used to build a suspect model. DeepTaster can effectively identify such DNN model theft attacks, even when the suspect model's architecture deviates from the victim's. To accomplish this, DeepTaster generates adversarial images with perturbations, transforms them into the Fourier frequency domain, and uses these transformed images to identify the dataset used in a suspect model. The underlying premise is that adversarial images can capture the unique characteristics of DNNs built with a specific dataset. To demonstrate the effectiveness of DeepTaster, we evaluated the effectiveness of DeepTaster by assessing its detection accuracy on three datasets (CIFAR10, MNIST, and Tiny-ImageNet) across three model architectures (ResNet18, VGG16, and DenseNet161). We conducted experiments under various attack scenarios, including transfer learning, pruning, fine-tuning, and data augmentation. Specifically, in the Multi-Architecture Attack scenario, DeepTaster was able to identify all the stolen cases across all datasets, while DeepJudge failed to detect any of the cases.
Large antenna arrays can steer narrow beams towards a target area, and thus improve the communications capacity of wireless channels and the fidelity of radio sensing. Hardware that is capable of continuously-variable phase shifts is expensive, presenting scaling challenges. PIN diodes that apply only discrete phase shifts are promising and cost-effective; however, unlike continuous phase shifters, finding the best phase configuration across elements is an NP-hard optimization problem. Thus, the complexity of optimization becomes a new bottleneck for large-antenna arrays. To address this challenge, this paper suggests a procedure for converting the optimization objective function from a ratio of quadratic functions to a sequence of more easily solvable quadratic unconstrained binary optimization (QUBO) sub-problems. This conversion is an exact equivalence, and the resulting QUBO forms are standard input formats for various physics-inspired optimization methods. We demonstrate that a simulated annealing approach is very effective for solving these sub-problems, and we give performance metrics for several large array types optimized by this technique. Through numerical experiments, we report 3D beamforming performance for extra-large arrays with up to 10,000 elements.
Recently, intelligent reflecting surface (IRS)-aided millimeter-wave (mmWave) and terahertz (THz) communications are considered in the wireless community. This paper aims to design a beam-based multiple-access strategy for this new paradigm. Its key idea is to make use of multiple sub-arrays over a hybrid digital-analog array to form independent beams, each of which is steered towards the desired direction to mitigate inter-user interference and suppress unwanted signal reflection. The proposed scheme combines the advantages of both orthogonal multiple access (i.e., no inter-user interference) and non-orthogonal multiple access (i.e., full time-frequency resource use). Consequently, it can substantially boost the system capacity, as verified by Monte-Carlo simulations.
Despite technological advancements, the significance of interdisciplinary subjects like complex networks has grown. Exploring communication within these networks is crucial, with traffic becoming a key concern due to the expanding population and increased need for connections. Congestion tends to originate in specific network areas but quickly proliferates throughout. Consequently, understanding the transition from a flow-free state to a congested state is vital. Numerous studies have delved into comprehending the emergence and control of congestion in complex networks, falling into three general categories: soft strategies, hard strategies, and resource allocation strategies. This article introduces a routing algorithm leveraging reinforcement learning to address two primary objectives: congestion control and optimizing path length based on the shortest path algorithm, ultimately enhancing network throughput compared to previous methods. Notably, the proposed method proves effective not only in Barab\'asi-Albert scale-free networks but also in other network models such as Watts-Strogatz (small-world) and Erd\"os-R\'enyi (random network). Simulation experiment results demonstrate that, across various traffic scenarios and network topologies, the proposed method can enhance efficiency criteria by up to 30% while reducing maximum node congestion by five times.
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Extensive experiments clearly demonstrate the effectiveness of our method on multiple distribution generalization benchmarks compared with state-of-the-art counterparts. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we show the effectiveness of our method compared with state-of-the-art counterparts.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.