The prospect of achieving quantum advantage with Quantum Neural Networks (QNNs) is exciting. Understanding how QNN properties (e.g., the number of parameters $M$) affect the loss landscape is crucial to the design of scalable QNN architectures. Here, we rigorously analyze the overparametrization phenomenon in QNNs with periodic structure. We define overparametrization as the regime where the QNN has more than a critical number of parameters $M_c$ that allows it to explore all relevant directions in state space. Our main results show that the dimension of the Lie algebra obtained from the generators of the QNN is an upper bound for $M_c$, and for the maximal rank that the quantum Fisher information and Hessian matrices can reach. Underparametrized QNNs have spurious local minima in the loss landscape that start disappearing when $M\geq M_c$. Thus, the overparametrization onset corresponds to a computational phase transition where the QNN trainability is greatly improved by a more favorable landscape. We then connect the notion of overparametrization to the QNN capacity, so that when a QNN is overparametrized, its capacity achieves its maximum possible value. We run numerical simulations for eigensolver, compilation, and autoencoding applications to showcase the overparametrization computational phase transition. We note that our results also apply to variational quantum algorithms and quantum optimal control.
A locally testable code is an error-correcting code that admits very efficient probabilistic tests of membership. Tensor codes provide a simple family of combinatorial constructions of locally testable codes that generalize the family of Reed-Muller codes. The natural test for tensor codes, the axis-parallel line vs. point test, plays an essential role in constructions of probabilistically checkable proofs. We analyze the axis-parallel line vs. point test as a two-prover game and show that the test is sound against quantum provers sharing entanglement. Our result implies the quantum-soundness of the low individual degree test, which is an essential component of the MIP* = RE theorem. Our proof also generalizes to the infinite-dimensional commuting-operator model of quantum provers.
Optimization of parameterized quantum circuits is indispensable for applications of near-term quantum devices to computational tasks with variational quantum algorithms (VQAs). However, the existing optimization algorithms for VQAs require an excessive number of quantum-measurement shots in estimating expectation values of observables or iterating updates of circuit parameters, whose cost has been a crucial obstacle for practical use. To address this problem, we develop an efficient framework, \textit{stochastic gradient line Bayesian optimization} (SGLBO), for the circuit optimization with fewer measurement shots. The SGLBO reduces the cost of measurement shots by estimating an appropriate direction of updating the parameters based on stochastic gradient descent (SGD) and further by utilizing Bayesian optimization (BO) to estimate the optimal step size in each iteration of the SGD. We formulate an adaptive measurement-shot strategy to achieve the optimization feasibly without relying on precise expectation-value estimation and many iterations; moreover, we show that a technique of suffix averaging can significantly reduce the effect of statistical and hardware noise in the optimization for the VQAs. Our numerical simulation demonstrates that the SGLBO augmented with these techniques can drastically reduce the required number of measurement shots, improve the accuracy in the optimization, and enhance the robustness against the noise compared to other state-of-art optimizers in representative tasks for the VQAs. These results establish a framework of quantum-circuit optimizers integrating two different optimization approaches, SGD and BO, to reduce the cost of measurement shots significantly.
As one of the most important function in quantum networks, entanglement routing, i.e., how to efficiently establish remote entanglement connection between two arbitrary quantum nodes, becomes a critical problem that is worth to be studied. However, the entanglement fidelity, which can be regarded as the most important metric to evaluate the quality of connection, is rarely considered in existing works. Thus, in this paper, we propose purification-enabled entanglement routing designs to provide fidelity guarantee for multiple Source-Destination (S-D) pairs in quantum networks. To find the routing path with minimum entangled pair cost, we first design an iterative routing algorithm for single S-D pair, called Q-PATH, to find the optimal solution. After that, due to the relatively high computational complexity, we also design a low-complexity routing algorithm by using an extended dijkstra algorithm, called Q-LEAP, to efficiently find the near-optimal solution. Based on these two algorithms, we design a utility metric to solve the resource allocation problem for multiple S-D pairs, and further design a greedy-based algorithm considering resource allocation and re-routing process for routing requests from multiple S-D pairs. To verify the effectiveness and superiority of the proposed algorithms, extensive simulations are conducted compared to the existing purification-enabled routing algorithm. The simulation results show that, compared with the traditional routing scheme, the proposed algorithms not only can provide fidelity-guaranteed routing solutions under various scenarios, but also has superior performance in terms of throughput, fidelity of end-to-end entanglement connection, and resource utilization ratio.
We investigate the equilibrium behavior for the decentralized cheap talk problem for real random variables and quadratic cost criteria in which an encoder and a decoder have misaligned objective functions. In prior work, it has been shown that the number of bins in any equilibrium has to be countable, generalizing a classical result due to Crawford and Sobel who considered sources with density supported on $[0,1]$. In this paper, we first refine this result in the context of log-concave sources. For sources with two-sided unbounded support, we prove that, for any finite number of bins, there exists a unique equilibrium. In contrast, for sources with semi-unbounded support, there may be a finite upper bound on the number of bins in equilibrium depending on certain conditions stated explicitly. Moreover, we prove that for log-concave sources, the expected costs of the encoder and the decoder in equilibrium decrease as the number of bins increases. Furthermore, for strictly log-concave sources with two-sided unbounded support, we prove convergence to the unique equilibrium under best response dynamics which starts with a given number of bins, making a connection with the classical theory of optimal quantization and convergence results of Lloyd's method. In addition, we consider more general sources which satisfy certain assumptions on the tail(s) of the distribution and we show that there exist equilibria with infinitely many bins for sources with two-sided unbounded support. Further explicit characterizations are provided for sources with exponential, Gaussian, and compactly-supported probability distributions.
A basic problem for constant dimension codes is to determine the maximum possible size $A_q(n,d;k)$ of a set of $k$-dimensional subspaces in $\mathbb{F}_q^n$, called codewords, such that the subspace distance satisfies $d_S(U,W):=2k-2\dim(U\cap W)\ge d$ for all pairs of different codewords $U$, $W$. Constant dimension codes have applications in e.g.\ random linear network coding, cryptography, and distributed storage. Bounds for $A_q(n,d;k)$ are the topic of many recent research papers. Providing a general framework we survey many of the latest constructions and show up the potential for further improvements. As examples we give improved constructions for the cases $A_q(10,4;5)$, $A_q(11,4;4)$, $A_q(12,6;6)$, and $A_q(15,4;4)$. We also derive general upper bounds for subcodes arising in those constructions.
A fibration of graphs is an homomorphism that is a local isomorphism of in-neighbourhoods, much in the same way a covering projection is a local isomorphism of neighbourhoods. Recently, it has been shown that graph fibrations are useful tools to uncover symmetries and synchronization patterns in biological networks ranging from gene, protein,and metabolic networks to the brain. However, the inherent incompleteness and disordered nature of biological data precludes the application of the definition of fibration as it is; as a consequence, also the currently known algorithms to identify fibrations fail in these domains. In this paper, we introduce and develop systematically the theory of quasifibrations which attempts to capture more realistic patterns of almost-synchronization of units in biological networks. We provide an algorithmic solution to the problem of finding quasifibrations in networks where the existence of missing links and variability across samples preclude the identification of perfect symmetries in the connectivity structure. We test the algorithm against other strategies to repair missing links in incomplete networks using real connectome data and synthetic networks. Quasifibrations can be applied to reconstruct any incomplete network structure characterized by underlying symmetries and almost synchronized clusters.
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs), a structured network with MLP modules, have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently diverse. Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
Quantum machine learning is expected to be one of the first potential general-purpose applications of near-term quantum devices. A major recent breakthrough in classical machine learning is the notion of generative adversarial training, where the gradients of a discriminator model are used to train a separate generative model. In this work and a companion paper, we extend adversarial training to the quantum domain and show how to construct generative adversarial networks using quantum circuits. Furthermore, we also show how to compute gradients -- a key element in generative adversarial network training -- using another quantum circuit. We give an example of a simple practical circuit ansatz to parametrize quantum machine learning models and perform a simple numerical experiment to demonstrate that quantum generative adversarial networks can be trained successfully.