A symmetry of a state $\vert \psi \rangle$ is a unitary operator of which $\vert \psi \rangle$ is an eigenvector. When $\vert \psi \rangle$ is an unknown state supplied by a black-box oracle, the state's symmetries provide key physical insight into the quantum system; symmetries also boost many crucial quantum learning techniques. In this paper, we develop a variational hybrid quantum-classical learning scheme to systematically probe for symmetries of $\vert \psi \rangle$ with no a priori assumptions about the state. This procedure can be used to learn various symmetries at the same time. In order to avoid re-learning already known symmetries, we introduce an interactive protocol with a classical deep neural net. The classical net thereby regularizes against repetitive findings and allows our algorithm to terminate empirically with all possible symmetries found. Our scheme can be implemented efficiently on average with non-local SWAP gates; we also give a less efficient algorithm with only local operations, which may be more appropriate for current noisy quantum devices. We simulate our algorithm on representative families of states, including cluster states and ground states of Rydberg and Ising Hamiltonians. We also find that the numerical query complexity scales well with qubit size.
The CONGEST and CONGEST-CLIQUE models have been carefully studied to represent situations where the communication bandwidth between processors in a network is severely limited. Messages of only $O(log(n))$ bits of information each may be sent between processors in each round. The quantum versions of these models allow the processors instead to communicate and compute with quantum bits under the same bandwidth limitations. This leads to the following natural research question: What problems can be solved more efficiently in these quantum models than in the classical ones? Building on existing work, we contribute to this question in two ways. Firstly, we present two algorithms in the Quantum CONGEST-CLIQUE model of distributed computation that succeed with high probability; one for producing an approximately optimal Steiner Tree, and one for producing an exact directed minimum spanning tree, each of which uses $\tilde{O}(n^{1/4})$ rounds of communication and $\tilde{O}(n^{9/4})$ messages, where $n$ is the number of nodes in the network. The algorithms thus achieve a lower asymptotic round and message complexity than any known algorithms in the classical CONGEST-CLIQUE model. At a high level, we achieve these results by combining classical algorithmic frameworks with quantum subroutines. An existing framework for using distributed version of Grover's search algorithm to accelerate triangle finding lies at the core of the asymptotic speedup. Secondly, we carefully characterize the constants and logarithmic factors involved in our algorithms as well as related algorithms, otherwise commonly obscured by $\tilde{O}$ notation. The analysis shows that some improvements are needed to render both our and existing related quantum and classical algorithms practical, as their asymptotic speedups only help for very large values of $n$.
This paper studies an multi-cluster over-the-air computation (AirComp) system, where an intelligent reflecting surface (IRS) assists the signal transmission from devices to an access point (AP). The clusters are activated to compute heterogeneous functions in a time-division manner. Specifically, two types of IRS beamforming (BF) schemes are proposed to reveal the performancecost tradeoff. One is the cluster-adaptive BF scheme, where each BF pattern is dedicated to one cluster, and the other is the dynamic BF scheme, which is applied to any number of IRS BF patterns. By deeply exploiting their inherent properties, both generic and lowcomplexity algorithms are proposed in which the IRS BF patterns, time and power resource allocation are jointly optimized. Numerical results show that IRS can significantly enhance the function computation performance, and demonstrate that the dynamic IRS BF scheme with half of the total IRS BF patterns can achieve near-optimal performance which can be deemed as a cost-efficient approach for IRS-aided multi-cluster AirComp systems.
Solving symbolic reasoning problems that require compositionality and systematicity is considered one of the key ingredients of human intelligence. However, symbolic reasoning is still a great challenge for deep learning models, which often cannot generalize the reasoning pattern to out-of-distribution test cases. In this work, we propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols. The model acquires such a skill by learning appropriate substitution rules, which are applied iteratively to the input string until the expression is completely resolved. We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases, significantly outperforming both a sequence-to-sequence model trained end-to-end and a state-of-the-art large language model.
Quantum Computing (QC) promises computational speedup over classic computing for solving some complex problems. However, noise exists in current and near-term quantum computers. Quantum software testing (for gaining confidence in quantum software's correctness) is inevitably impacted by noise, to the extent that it is impossible to know if a test case failed due to noise or real faults. Existing testing techniques test quantum programs without considering noise, i.e., by executing tests on ideal quantum computer simulators. Consequently, they are not directly applicable to testing quantum software on real QC hardware or noisy simulators. To this end, we propose a noise-aware approach (named QOIN) to alleviate the noise effect on test results of quantum programs. QOIN employs machine learning techniques (e.g., transfer learning) to learn the noise effect of a quantum computer and filter it from a quantum program's outputs. Such filtered outputs are then used as the input to perform test case assessments (determining the passing or failing of a test case execution against a test oracle). We evaluated QOIN on IBM's 23 noise models with nine real-world quantum programs and 1000 artificial quantum programs. We also generated faulty versions of these programs to check if a failing test case execution can be determined under noise. Results show that QOIN can reduce the noise effect by more than $80\%$. To check QOIN's effectiveness for quantum software testing, we used an existing test oracle for quantum software testing. The results showed that the F1-score of the test oracle was improved on average by $82\%$ for six real-world programs and by $75\%$ for 800 artificial programs, demonstrating that QOIN can effectively learn noise patterns and enable noise-aware quantum software testing.
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
Deep Learning algorithms have achieved the state-of-the-art performance for Image Classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown those algorithms, which can even surpass the human capabilities, are vulnerable to adversarial examples. In Computer Vision, adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been constantly proposed in literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already shown to be ineffective to adaptive attackers. Thus, this self-containing paper aims to provide all readerships with a review of the latest research progress on Adversarial Machine Learning in Image Classification, however with a defender's perspective. Here, novel taxonomies for categorizing adversarial attacks and defenses are introduced and discussions about the existence of adversarial examples are provided. Further, in contrast to exisiting surveys, it is also given relevant guidance that should be taken into consideration by researchers when devising and evaluating defenses. Finally, based on the reviewed literature, it is discussed some promising paths for future research.
Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation which means they cannot be applied directly for multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.
There has been appreciable progress in unsupervised network representation learning (UNRL) approaches over graphs recently with flexible random-walk approaches, new optimization objectives and deep architectures. However, there is no common ground for systematic comparison of embeddings to understand their behavior for different graphs and tasks. In this paper we theoretically group different approaches under a unifying framework and empirically investigate the effectiveness of different network representation methods. In particular, we argue that most of the UNRL approaches either explicitly or implicit model and exploit context information of a node. Consequently, we propose a framework that casts a variety of approaches -- random walk based, matrix factorization and deep learning based -- into a unified context-based optimization function. We systematically group the methods based on their similarities and differences. We study the differences among these methods in detail which we later use to explain their performance differences (on downstream tasks). We conduct a large-scale empirical study considering 9 popular and recent UNRL techniques and 11 real-world datasets with varying structural properties and two common tasks -- node classification and link prediction. We find that there is no single method that is a clear winner and that the choice of a suitable method is dictated by certain properties of the embedding methods, task and structural properties of the underlying graph. In addition we also report the common pitfalls in evaluation of UNRL methods and come up with suggestions for experimental design and interpretation of results.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.