The qubit routing problem, also known as the swap minimization problem, is a (classical) combinatorial optimization problem that arises in the design of compilers of quantum programs. We study the qubit routing problem from the viewpoint of theoretical computer science, while most of the existing studies investigated the practical aspects. We concentrate on the linear nearest neighbor (LNN) architectures of quantum computers, in which the graph topology is a path. Our results are three-fold. (1) We prove that the qubit routing problem is NP-hard. (2) We give a fixed-parameter algorithm when the number of two-qubit gates is a parameter. (3) We give a polynomial-time algorithm when each qubit is involved in at most one two-qubit gate.
Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately captures the intrinsic structure of input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93\%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), that can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive a $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline.
We study the problem of extracting randomness from somewhere-random sources, and related combinatorial phenomena: partition analogues of Shearer's lemma on projections. A somewhere-random source is a tuple $(X_1, \ldots, X_t)$ of (possibly correlated) $\{0,1\}^n$-valued random variables $X_i$ where for some unknown $i \in [t]$, $X_i$ is guaranteed to be uniformly distributed. An $extracting$ $merger$ is a seeded device that takes a somewhere-random source as input and outputs nearly uniform random bits. We study the seed-length needed for extracting mergers with constant $t$ and constant error. We show: $\cdot$ Just like in the case of standard extractors, seedless extracting mergers with even just one output bit do not exist. $\cdot$ Unlike the case of standard extractors, it $is$ possible to have extracting mergers that output a constant number of bits using only constant seed. Furthermore, a random choice of merger does not work for this purpose! $\cdot$ Nevertheless, just like in the case of standard extractors, an extracting merger which gets most of the entropy out (namely, having $\Omega$ $(n)$ output bits) must have $\Omega$ $(\log n)$ seed. This is the main technical result of our work, and is proved by a second-moment strengthening of the graph-theoretic approach of Radhakrishnan and Ta-Shma to extractors. In contrast, seed-length/output-length tradeoffs for condensing mergers (where the output is only required to have high min-entropy), can be fully explained by using standard condensers. Inspired by such considerations, we also formulate a new and basic class of problems in combinatorics: partition analogues of Shearer's lemma. We show basic results in this direction; in particular, we prove that in any partition of the $3$-dimensional cube $[0,1]^3$ into two parts, one of the parts has an axis parallel $2$-dimensional projection of area at least $3/4$.
In this work, drawing inspiration from the type of noise present in real hardware, we study the output distribution of random quantum circuits under practical non-unital noise sources with constant noise rates. We show that even in the presence of unital sources like the depolarizing channel, the distribution, under the combined noise channel, never resembles a maximally entropic distribution at any depth. To show this, we prove that the output distribution of such circuits never anticoncentrates $\unicode{x2014}$ meaning it is never too "flat" $\unicode{x2014}$ regardless of the depth of the circuit. This is in stark contrast to the behavior of noiseless random quantum circuits or those with only unital noise, both of which anticoncentrate at sufficiently large depths. As consequences, our results have interesting algorithmic implications on both the hardness and easiness of noisy random circuit sampling, since anticoncentration is a critical property exploited by both state-of-the-art classical hardness and easiness results.
We prove existence of equal area partitions of the unit sphere via optimal transport methods, accompanied by diameter bounds written in terms of Monge--Kantorovich distances. This can be used to obtain bounds on the expectation of the maximum diameter of partition sets, when points are uniformly sampled from the sphere. An application to the computation of sliced Monge--Kantorovich distances is also presented.
The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of SARS-CoV-2 across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
A set of classical or quantum states is equivalent to another one if there exists a pair of classical or quantum channels mapping either set to the other one. For dichotomies (pairs of states) this is closely connected to (classical or quantum) R\'enyi divergences (RD) and the data-processing inequality: If a RD remains unchanged when a channel is applied to the dichotomy, then there is a recovery channel mapping the image back to the initial dichotomy. Here, we prove for classical dichotomies that equality of the RDs alone is already sufficient for the existence of a channel in any of the two directions and discuss some applications. We conjecture that equality of the minimal quantum RDs is sufficient in the quantum case and prove it for special cases. We also show that neither the Petz quantum nor the maximal quantum RDs are sufficient. As a side-result of our techniques we obtain an infinite list of inequalities fulfilled by the classical, the Petz quantum, and the maximal quantum RDs. These inequalities are not true for the minimal quantum RDs.
Context. Algorithmic racism is the term used to describe the behavior of technological solutions that constrains users based on their ethnicity. Lately, various data-driven software systems have been reported to discriminate against Black people, either for the use of biased data sets or due to the prejudice propagated by software professionals in their code. As a result, Black people are experiencing disadvantages in accessing technology-based services, such as housing, banking, and law enforcement. Goal. This study aims to explore algorithmic racism from the perspective of software professionals. Method. A survey questionnaire was applied to explore the understanding of software practitioners on algorithmic racism, and data analysis was conducted using descriptive statistics and coding techniques. Results. We obtained answers from a sample of 73 software professionals discussing their understanding and perspectives on algorithmic racism in software development. Our results demonstrate that the effects of algorithmic racism are well-known among practitioners. However, there is no consensus on how the problem can be effectively addressed in software engineering. In this paper, some solutions to the problem are proposed based on the professionals' narratives. Conclusion. Combining technical and social strategies, including training on structural racism for software professionals, is the most promising way to address the algorithmic racism problem and its effects on the software solutions delivered to our society.
Graph neural networks (GNNs) have been a hot spot of recent research and are widely utilized in diverse applications. However, with the use of huger data and deeper models, an urgent demand is unsurprisingly made to accelerate GNNs for more efficient execution. In this paper, we provide a comprehensive survey on acceleration methods for GNNs from an algorithmic perspective. We first present a new taxonomy to classify existing acceleration methods into five categories. Based on the classification, we systematically discuss these methods and highlight their correlations. Next, we provide comparisons from aspects of the efficiency and characteristics of these methods. Finally, we suggest some promising prospects for future research.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.