The Quantum Alternating Operator Ansatz (QAOA) is a prominent variational quantum algorithm for solving combinatorial optimization problems. Its effectiveness depends on identifying input parameters that yield high-quality solutions. However, understanding the complexity of training QAOA remains an under-explored area. Previous results have given analytical performance guarantees for a small, fixed number of parameters. At the opposite end of the spectrum, barren plateaus are likely to emerge at $\Omega(n)$ parameters for $n$ qubits. In this work, we study the difficulty of training in the intermediate regime, which is the focus of most current numerical studies and near-term hardware implementations. Through extensive numerical analysis of the quality and quantity of local minima, we argue that QAOA landscapes can exhibit a superpolynomial growth in the number of low-quality local minima even when the number of parameters scales logarithmically with $n$. This means that the common technique of gradient descent from randomly initialized parameters is doomed to fail beyond small $n$, and emphasizes the need for good initial guesses of the optimal parameters.
We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.
The Reinforcement Learning (RL) algorithm, renowned for its robust learning capability and search stability, has garnered significant attention and found extensive application in Automated Guided Vehicle (AGV) path planning. However, RL planning algorithms encounter challenges stemming from the substantial variance of neural networks caused by environmental instability and significant fluctuations in system structure. These challenges manifest in slow convergence speed and low learning efficiency. To tackle this issue, this paper presents the Particle Filter-Double Deep Q-Network (PF-DDQN) approach, which incorporates the Particle Filter (PF) into multi-AGV reinforcement learning path planning. The PF-DDQN method leverages the imprecise weight values of the network as state values to formulate the state space equation. Through the iterative fusion process of neural networks and particle filters, the DDQN model is optimized to acquire the optimal true weight values, thus enhancing the algorithm's efficiency. The proposed method's effectiveness and superiority are validated through numerical simulations. Overall, the simulation results demonstrate that the proposed algorithm surpasses the traditional DDQN algorithm in terms of path planning superiority and training time indicators by 92.62% and 76.88%, respectively. In conclusion, the PF-DDQN method addresses the challenges encountered by RL planning algorithms in AGV path planning. By integrating the Particle Filter and optimizing the DDQN model, the proposed method achieves enhanced efficiency and outperforms the traditional DDQN algorithm in terms of path planning superiority and training time indicators.
Training a Graph Neural Network (GNN) model on large-scale graphs involves a high volume of data communication and compu- tations. While state-of-the-art CPUs and GPUs feature high computing power, the Standard GNN training protocol adopted in existing GNN frameworks cannot efficiently utilize the platform resources. To this end, we propose a novel Unified CPU-GPU protocol that can improve the resource utilization of GNN training on a CPU-GPU platform. The Unified CPU-GPU protocol instantiates multiple GNN training processes in parallel on both the CPU and the GPU. By allocating training processes on the CPU to perform GNN training collaboratively with the GPU, the proposed protocol improves the platform resource utilization and reduces the CPU-GPU data transfer overhead. Since the performance of a CPU and a GPU varies, we develop a novel load balancer that balances the workload dynamically between CPUs and GPUs during runtime. We evaluate our protocol using two representative GNN sampling algorithms, with two widely-used GNN models, on three datasets. Compared with the standard training protocol adopted in the state-of-the-art GNN frameworks, our protocol effectively improves resource utilization and overall training time. On a platform where the GPU moderately outperforms the CPU, our protocol speeds up GNN training by up to 1.41x. On a platform where the GPU significantly outperforms the CPU, our protocol speeds up GNN training by up to 1.26x. Our protocol is open-sourced and can be seamlessly integrated into state-of-the-art GNN frameworks and accelerate GNN training. Our protocol particularly benefits those with limited GPU access due to its high demand.
Efficient implementation of massive multiple-input-multiple-output (MIMO) transceivers is essential for the next-generation wireless networks. To reduce the high computational complexity of the massive MIMO transceiver, in this paper, we propose a new massive MIMO architecture using finite-precision arithmetic. First, we conduct the rounding error analysis and derive the lower bound of the achievable rate for single-input-multiple-output (SIMO) using maximal ratio combining (MRC) and multiple-input-single-output (MISO) systems using maximal ratio transmission (MRT) with finite-precision arithmetic. Then, considering the multi-user scenario, the rounding error analysis of zero-forcing (ZF) detection and precoding is derived by using the normal equations (NE) method. The corresponding lower bounds of the achievable sum rate are also derived and asymptotic analyses are presented. Built upon insights from these analyses and lower bounds, we propose a mixed-precision architecture for massive MIMO systems to offset performance gaps due to finite-precision arithmetic. The corresponding analysis of rounding errors and computational costs is obtained. Simulation results validate the derived bounds and underscore the superiority of the proposed mixed-precision architecture to the conventional structure.
Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Though the existing pruning methods can provide extremely high weight sparsity for deep SNNs, the high weight sparsity brings a workload imbalance problem. Specifically, the workload imbalance happens when a different number of non-zero weights are assigned to hardware units running in parallel. This results in low hardware utilization and thus imposes longer latency and higher energy costs. In preliminary experiments, we show that sparse SNNs (~98% weight sparsity) can suffer as low as ~59% utilization. To alleviate the workload imbalance problem, we propose u-Ticket, where we monitor and adjust the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, thus guaranteeing the final ticket gets optimal utilization when deployed onto the hardware. Experiments indicate that our u-Ticket can guarantee up to 100% hardware utilization, thus reducing up to 76.9% latency and 63.8% energy cost compared to the non-utilization-aware LTH method.
Variational quantum algorithms are gaining attention as an early application of Noisy Intermediate-Scale Quantum (NISQ) devices. One of the main problems of variational methods lies in the phenomenon of Barren Plateaus, present in the optimization of variational parameters. Adding geometric inductive bias to the quantum models has been proposed as a potential solution to mitigate this problem, leading to a new field called Geometric Quantum Machine Learning. In this work, an equivariant architecture for variational quantum classifiers is introduced to create a label-invariant model for image classification with $C_4$ rotational label symmetry. The equivariant circuit is benchmarked against two different architectures, and it is experimentally observed that the geometric approach boosts the model's performance. Finally, a classical equivariant convolution operation is proposed to extend the quantum model for the processing of larger images, employing the resources available in NISQ devices.
Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$ consisted by two low-rank matrices $A$ and $B$. A previous study suggested that there is correlation between $W_0$ and $\Delta W$. In this study, we aim to delve deeper into relationships between $W_0$ and low-rank matrices $A$ and $B$ to further comprehend the behavior of LoRA. In particular, we analyze a conversion matrix that transform $W_0$ into low-rank matrices, which encapsulates information about the relationships. Our analysis reveals that the conversion matrices are similar across each layer. Inspired by these findings, we hypothesize that a single linear layer, which takes each layer's $W_0$ as input, can yield task-adapted low-rank matrices. To confirm this hypothesis, we devise a method named Conditionally Parameterized LoRA (CondLoRA) that updates initial weight matrices with low-rank matrices derived from a single linear layer. Our empirical results show that CondLoRA maintains a performance on par with LoRA, despite the fact that the trainable parameters of CondLoRA are fewer than those of LoRA. Therefore, we conclude that "a single linear layer yields task-adapted low-rank matrices."
With the emergence of numerous Large Language Models (LLM), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech - counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performances of four LLMs namely GPT-2, DialoGPT, ChatGPT and FlanT5 in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. On the other hand, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that there is an improvement in generation quality for two datasets (17%), however the toxicity increase (25%) with increase in model size. Considering type of model, GPT-2 and FlanT5 models are significantly better in terms of counterspeech quality but also have high toxicity as compared to DialoGPT. ChatGPT are much better at generating counter speech than other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counter speech generation across all the models.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.