Generalization is the ability of quantum machine learning models to make accurate predictions on new data by learning from training data. Here, we introduce the data quantum Fisher information metric (DQFIM) to determine when a model can generalize. For variational learning of unitaries, the DQFIM quantifies the number of circuit parameters and the amount of training data needed to successfully train and generalize. We apply the DQFIM to explain when a constant number of training states and a polynomial number of parameters are sufficient for generalization. Further, we show that generalization can be improved by removing symmetries from the training data. Finally, we show that out-of-distribution generalization, where training and testing data are drawn from different distributions, can outperform the setting where both are drawn from the same distribution. Our work opens up new approaches to improving generalization in quantum machine learning.
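The abstract does not spell out the DQFIM's definition. As a minimal, hedged sketch, the code below numerically evaluates a data-averaged quantum-Fisher-style matrix for a toy two-qubit variational circuit via finite differences, assuming the (possibly simplified) form $Q_{jk} = 4\,\mathrm{Re}[\mathrm{Tr}(\partial_j U \rho\, \partial_k U^\dagger) - \mathrm{Tr}(\partial_j U \rho\, U^\dagger)\,\mathrm{Tr}(U \rho\, \partial_k U^\dagger)]$, where $\rho$ averages the training states; the circuit, parameters, and states are invented for illustration. The rank of the resulting matrix serves as a rough proxy for how many independent parameter directions the training data can resolve.

```python
import numpy as np

# Pauli matrices and two-qubit helpers.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def rot(P, theta):
    """Single-qubit rotation exp(-i * theta/2 * P)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * P

def circuit_unitary(theta):
    """Toy 4-parameter ansatz: RX layer, CNOT, RZ layer."""
    U = np.kron(rot(X, theta[0]), rot(X, theta[1]))
    U = CNOT @ U
    return np.kron(rot(Z, theta[2]), rot(Z, theta[3])) @ U

def dqfim(theta, training_states, eps=1e-5):
    """Finite-difference data-averaged QFIM (assumed form, see lead-in)."""
    rho = sum(np.outer(psi, psi.conj()) for psi in training_states) / len(training_states)
    U0 = circuit_unitary(theta)
    grads = []
    for j in range(len(theta)):
        tp = np.array(theta, dtype=float); tp[j] += eps
        tm = np.array(theta, dtype=float); tm[j] -= eps
        grads.append((circuit_unitary(tp) - circuit_unitary(tm)) / (2 * eps))
    d = len(theta)
    Q = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            t1 = np.trace(grads[j] @ rho @ grads[k].conj().T)
            t2 = np.trace(grads[j] @ rho @ U0.conj().T) * np.trace(U0 @ rho @ grads[k].conj().T)
            Q[j, k] = 4 * np.real(t1 - t2)
    return Q

# Example: one training state versus two; the rank of Q indicates how many
# parameter directions the training data distinguishes.
psi0 = np.array([1, 0, 0, 0], dtype=complex)
psi1 = np.ones(4, dtype=complex) / 2
theta = np.array([0.3, 0.7, 0.1, 0.5])
for states in ([psi0], [psi0, psi1]):
    Q = dqfim(theta, states)
    print(len(states), "training state(s): rank(Q) =", np.linalg.matrix_rank(Q, tol=1e-8))
```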
Holistic benchmarks for quantum computers are essential for testing and summarizing the performance of quantum hardware. However, holistic benchmarks -- such as algorithmic or randomized benchmarks -- typically do not predict a processor's performance on circuits outside the benchmark's necessarily very limited set of test circuits. In this paper, we introduce a general framework for building predictive models from benchmarking data, which we call capability models. Capability models can be fit to many kinds of benchmarking data and used for a variety of predictive tasks. We demonstrate this flexibility with two case studies. In the first case study, we predict circuit (i) process fidelities and (ii) success probabilities by fitting error-rates models to two kinds of volumetric benchmarking data. Error-rates models are simple yet versatile capability models that assign effective error rates to individual gates or to more general circuit components. In the second case study, we construct a capability model for predicting circuit success probabilities by applying transfer learning to ResNet50, a neural network trained for image classification. Our case studies use data from cloud-accessible quantum computers and from simulations of noisy quantum computers.
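The abstract describes error-rates models only at a high level. As a hedged sketch of the basic idea, the code below fits per-gate effective error rates to synthetic benchmark-style data by assuming circuit fidelity factorizes as a product of (1 - eps_g) over the gates in the circuit, which becomes a linear regression after taking logs. The gate names, error rates, and data here are all invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical per-gate "true" error rates used to generate synthetic data.
gates = ["rx", "rz", "cnot"]
true_eps = {"rx": 0.001, "rz": 0.0005, "cnot": 0.01}

rng = np.random.default_rng(0)

def simulate_fidelity(counts):
    """Fidelity under the multiplicative error model, with a little noise."""
    f = np.prod([(1 - true_eps[g]) ** n for g, n in counts.items()])
    return f * np.exp(rng.normal(0, 0.002))

# Synthetic benchmark: random circuits described only by their gate counts.
circuits = [{g: int(rng.integers(0, 40)) for g in gates} for _ in range(200)]
fidelities = np.array([simulate_fidelity(c) for c in circuits])

# Fit: log F ~= sum_g n_g * log(1 - eps_g), a linear least-squares problem.
A = np.array([[c[g] for g in gates] for c in circuits], dtype=float)
b = np.log(fidelities)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
fitted_eps = 1 - np.exp(coef)

for g, e in zip(gates, fitted_eps):
    print(f"{g}: fitted error rate {e:.4f} (true {true_eps[g]:.4f})")

# Predict the fidelity of a new, unseen circuit from its gate counts alone.
new_circuit = {"rx": 25, "rz": 30, "cnot": 12}
pred = np.exp(sum(new_circuit[g] * np.log(1 - e) for g, e in zip(gates, fitted_eps)))
print("predicted fidelity:", round(pred, 4))
```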
This paper presents a technique, called a walking tree, for constructing data structures that tolerate noise. We apply it to a red-black tree (an implementation of a self-balancing binary search tree) and to a segment tree. For these data structures, the main operations retain, asymptotically, the same complexity as in the noiseless case. We present several applications of these data structures to quantum algorithms. Finally, we propose a new quantum algorithm for the string sorting problem and prove a lower bound; the upper and lower bounds match up to a logarithmic factor. At the same time, the algorithm is more efficient than its classical counterparts.
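The walking-tree construction itself is not described in the abstract. Purely as an illustration of the underlying problem of running a comparison-based structure when each comparison can err, the sketch below wraps a plain binary search with a majority-vote comparator that repeats each noisy comparison a fixed number of times. This is a textbook noise-reduction device, not the paper's technique, and it pays an extra logarithmic factor that the walking tree is designed to avoid asymptotically.

```python
import random

def noisy_less(a, b, p_err=0.1):
    """A comparison oracle that returns the wrong answer with probability p_err."""
    ans = a < b
    return (not ans) if random.random() < p_err else ans

def robust_less(a, b, repeats=15, p_err=0.1):
    """Majority vote over repeated noisy comparisons."""
    votes = sum(noisy_less(a, b, p_err) for _ in range(repeats))
    return votes * 2 > repeats

def noisy_binary_search(sorted_xs, target, repeats=15):
    """Binary search over a sorted array using only the noisy comparator."""
    lo, hi = 0, len(sorted_xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if robust_less(sorted_xs[mid], target, repeats):
            lo = mid + 1
        else:
            hi = mid
    return lo  # index of the first element >= target

random.seed(1)
xs = sorted(random.sample(range(10_000), 1_000))
print("found index:", noisy_binary_search(xs, xs[417]), "expected: 417")
```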
Given data on choices made by consumers for different assortments, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior. One such choice model is the marginal distribution model, which requires only the specification of the marginal distributions of the random utilities of the alternatives to explain choice data. In this paper, we develop an exact characterization of the set of choice probabilities that can be represented by this model and show that verifying the consistency of choice probability data with this model is equivalent to solving a polynomial-size linear program. We extend these results to the case where alternatives are grouped based on the marginal distribution of their utilities. Based on these representability conditions, we show that finding the best fit to the choice data reduces to solving a mixed-integer convex program, and we develop novel prediction intervals for the choice probabilities of unseen assortments. Our numerical results show that the marginal distribution model provides much better representational power, estimation performance, and prediction accuracy than multinomial logit, and much better computational performance than the random utility model.
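The abstract does not reproduce the model's formulas. As a hedged sketch, the code below computes choice probabilities under one common statement of the marginal distribution model, in which P(i | S) = 1 - F_i(lambda_S) and the threshold lambda_S is chosen so that the probabilities over the assortment sum to one. The exponential marginals are an arbitrary illustrative choice, and the paper's representability linear program and estimation procedure are not reproduced here.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import expon

# Illustrative marginal utility distributions for 4 alternatives (exponential
# with different scales); the model only needs these marginals, not a joint law.
marginals = [expon(scale=s) for s in (1.0, 1.5, 2.0, 0.8)]

def mdm_choice_probs(assortment, marginals):
    """Choice probabilities under the assumed MDM rule P(i|S) = 1 - F_i(lambda_S)."""
    def excess(lam):
        return sum(marginals[i].sf(lam) for i in assortment) - 1.0
    # sum_i (1 - F_i(lam)) decreases from |S| to 0 as lam grows, so a root exists.
    lam = brentq(excess, -50.0, 50.0)
    return {i: marginals[i].sf(lam) for i in assortment}

for assortment in [(0, 1, 2, 3), (0, 2), (1, 3)]:
    probs = mdm_choice_probs(assortment, marginals)
    print(assortment, {i: round(p, 3) for i, p in probs.items()},
          "sum =", round(sum(probs.values()), 3))
```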
Solving decision problems in complex, stochastic environments is often achieved by estimating the expected outcome of decisions via Monte Carlo sampling. However, sampling may overlook rare but important events, which can severely impact the decision-making process. We present a method in which a Normalizing Flow generative model is trained to simulate samples directly from the conditional distribution given that a rare event occurs. By utilizing Coupling Flows, our model can, in principle, approximate any sampling distribution arbitrarily well. By combining this approximation method with Importance Sampling, highly accurate estimates of complicated integrals and expectations can be obtained. We include several examples to demonstrate how the method can be used for efficient sampling and estimation, even in high-dimensional and rare-event settings. We illustrate that simulating directly from a rare-event distribution yields significant insight into how rare events happen.
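As context for the importance-sampling step, the sketch below estimates the small tail probability P(X > 4) for a standard normal variable by sampling from a proposal centred on the rare-event region and reweighting. In the paper's setting the proposal would be a trained conditional Coupling Flow; the hand-picked shifted Gaussian here is only a stand-in to show the reweighting mechanics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
threshold = 4.0
n = 100_000

# Naive Monte Carlo: almost no samples land in the rare-event region.
naive = (rng.standard_normal(n) > threshold).mean()

# Importance sampling: draw from a proposal concentrated near the rare region
# and reweight each sample by the density ratio p(x) / q(x).
proposal = stats.norm(loc=threshold, scale=1.0)
x = proposal.rvs(size=n, random_state=rng)
weights = stats.norm.pdf(x) / proposal.pdf(x)
is_estimate = np.mean(weights * (x > threshold))

print("true      :", stats.norm.sf(threshold))
print("naive MC  :", naive)
print("importance:", is_estimate)
```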
This paper considers the distribution of a general linear combination of central and non-central chi-square random variables by exploring the branch-cut regions that appear in the standard Laplace inversion process. Owing to the original motivation from directional statistics, the focus of this paper is on the density function of such distributions rather than on their cumulative distribution function; in fact, our results confirm that the latter is a special case of the former. Our approach provides new insight by generating alternative characterizations of the probability density function in terms of a finite number of feasible univariate integrals. In particular, the central cases appear to allow an interesting representation in terms of the branch cuts, while general degrees of freedom and non-centrality can easily be accommodated using recursive differentiation. Numerical results confirm that the proposed approach works well, while offering greater transparency and therefore easier control of accuracy.
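The branch-cut representations themselves are not given in the abstract. As a hedged numerical cross-check of the kind of density being targeted, the sketch below inverts the standard characteristic function of a linear combination of independent (non-)central chi-squares, $\varphi(t) = \prod_j (1 - 2 i a_j t)^{-k_j/2} \exp\!\big(i a_j \delta_j t / (1 - 2 i a_j t)\big)$, by direct numerical integration and compares the result to a crude Monte Carlo estimate; the weights and degrees of freedom are illustrative.

```python
import numpy as np
from scipy.integrate import quad

# Linear combination sum_j a_j * chi2(k_j, delta_j); delta_j = 0 gives central terms.
a = np.array([1.0, -0.5, 2.0])
k = np.array([3, 2, 1])
delta = np.array([0.0, 1.5, 0.0])

def cf(t):
    """Characteristic function of the linear combination."""
    z = 1 - 2j * a * t
    return np.prod(z ** (-k / 2) * np.exp(1j * a * delta * t / z))

def density(x, t_max=50.0):
    """Inversion: f(x) = (1/pi) * Integral_0^inf Re[cf(t) exp(-i t x)] dt."""
    integrand = lambda t: np.real(cf(t) * np.exp(-1j * t * x))
    val, _ = quad(integrand, 0.0, t_max, limit=200)
    return val / np.pi

# Monte Carlo check.
rng = np.random.default_rng(0)
n = 200_000
samples = sum(ai * rng.noncentral_chisquare(ki, di, n) if di > 0
              else ai * rng.chisquare(ki, n)
              for ai, ki, di in zip(a, k, delta))

for x in (0.0, 2.0, 5.0):
    mc = np.mean(np.abs(samples - x) < 0.1) / 0.2  # crude local density estimate
    print(f"x={x}: inversion {density(x):.4f}, Monte Carlo ~{mc:.4f}")
```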
We explore the analytic properties of the density function $ h(x;\gamma,\alpha) $, $ x \in (0,\infty) $, $ \gamma > 0 $, $ 0 < \alpha < 1 $, which arises from the domain-of-attraction problem for a statistic interpolating between the supremum and the sum of random variables. The parameter $ \alpha $ controls the interpolation between these two cases, while $ \gamma $ parametrises the type of extreme value distribution from which the underlying random variables are drawn. For $ \alpha = 0 $ the Fr\'echet density applies, whereas for $ \alpha = 1 $ we identify a particular Fox H-function; Fox H-functions are a natural extension of hypergeometric functions into the realm of fractional calculus. In contrast, for intermediate $ \alpha $ an entirely new function appears, which is not among the extensions of the hypergeometric function considered to date. We derive series, integral and continued fraction representations of this latter function.
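The abstract does not define the interpolating statistic explicitly. Assuming an $\ell_{1/\alpha}$-type aggregate, which reduces to the maximum as $\alpha \to 0$ and to the sum at $\alpha = 1$, the Monte Carlo sketch below merely visualises how the simulated law of such a statistic moves between the two regimes for heavy-tailed, Fr\'echet-attracted variables; the exact statistic and normalisation in the paper may well differ, so this is only a qualitative illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 50_000
tail_index = 2.5  # Pareto-type tails lie in the Frechet domain of attraction

def interpolating_statistic(x, alpha):
    """Assumed l_{1/alpha}-type aggregate: max of x as alpha -> 0, sum of x at alpha = 1."""
    if alpha == 0.0:
        return x.max(axis=-1)
    return (x ** (1.0 / alpha)).sum(axis=-1) ** alpha

# Heavy-tailed samples (shifted Pareto), n per trial.
x = rng.pareto(tail_index, size=(trials, n)) + 1.0

for alpha in (0.0, 0.5, 1.0):
    s = interpolating_statistic(x, alpha)
    s = s / np.quantile(s, 0.5)  # crude normalisation to compare shapes across alpha
    print(f"alpha={alpha}: median-normalised 99th percentile = {np.quantile(s, 0.99):.2f}")
```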
Although numerous clustering algorithms have been developed, many existing methods still rely on the k-means technique to detect clusters of data points. However, the performance of k-means depends heavily on the estimation of cluster centers, for which obtaining an optimal solution is very difficult. Another major drawback is its sensitivity to noise and outliers. In this paper, we rethink k-means from a manifold-learning perspective and present a new clustering algorithm that detects clusters of data directly, without mean estimation. Specifically, we construct a distance matrix between data points using a Butterworth filter, so that the distance between any two points in the same cluster equals a small constant, while the distances between pairs of points from different clusters are increased. To exploit the complementary information embedded in different views, we impose tensor Schatten p-norm regularization on the third-order tensor formed by the indicator matrices of the different views. Finally, an efficient alternating algorithm is derived to optimize our model; the constructed sequence is proved to converge to a stationary KKT point. Extensive experimental results demonstrate the superiority of the proposed method.
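The exact filter parameters are not given in the abstract. As a hedged sketch of the distance-construction step, the code below applies a Butterworth-style response g(d) = 1 / (1 + (d / d0)^(2n)) to a Euclidean distance matrix, so that within-cluster pairs are mapped near a common value while between-cluster pairs are pushed toward the other extreme; the cutoff d0 and order n are illustrative, and the multi-view tensor Schatten p-norm step is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D clusters.
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(4, 0.3, size=(20, 2))])

# Pairwise Euclidean distances.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def butterworth(d, cutoff=1.5, order=4):
    """Butterworth-style response: ~1 below the cutoff, decaying toward 0 above it."""
    return 1.0 / (1.0 + (d / cutoff) ** (2 * order))

A = butterworth(D)  # near-binary block structure: ~1 within clusters, ~0 across

within = A[:20, :20][~np.eye(20, dtype=bool)].mean()
across = A[:20, 20:].mean()
print(f"mean filtered value within clusters: {within:.3f}, across clusters: {across:.3f}")
```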
Monocular 3D human pose and shape estimation is an ill-posed problem, since multiple 3D solutions can explain a 2D image of a subject. Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image. We show that these approaches exhibit a trade-off between three key properties: (i) accuracy - the likelihood of the ground-truth 3D solution under the predicted distribution, (ii) sample-input consistency - the extent to which 3D samples from the predicted distribution match the visible 2D image evidence, and (iii) sample diversity - the range of plausible 3D solutions modelled by the predicted distribution. Our method, HuManiFlow, predicts simultaneously accurate, consistent and diverse distributions. We use the human kinematic tree to factorise full-body pose into ancestor-conditioned per-body-part pose distributions in an autoregressive manner. Per-body-part distributions are implemented using normalising flows that respect the manifold structure of SO(3), the Lie group of per-body-part poses. We show that ill-posed, but ubiquitous, 3D point-estimate losses reduce sample diversity, and therefore employ only probabilistic training losses. Code is available at: //github.com/akashsengupta1997/HuManiFlow.
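This is not HuManiFlow's actual flow architecture; as a minimal illustration of the two ingredients named in the abstract, the sketch below parameterises rotations on SO(3) via the exponential map (so every sample is a valid rotation matrix) and composes per-part samples down a toy kinematic chain, so that uncertainty in an ancestor propagates to its descendants. The chain, noise scale, and sampling distribution are invented for illustration.

```python
import numpy as np

def so3_exp(w):
    """Exponential map from an axis-angle vector w in R^3 to a rotation matrix in SO(3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

rng = np.random.default_rng(0)

# Toy kinematic chain: each part's sampled rotation is composed with its
# ancestors', so uncertainty accumulates down the chain (pelvis -> knee -> ankle).
chain = ["pelvis", "knee", "ankle"]
global_rots, parent = {}, np.eye(3)
for part in chain:
    local = so3_exp(rng.normal(0, 0.2, size=3))  # sample in the Lie algebra, map to SO(3)
    parent = parent @ local
    global_rots[part] = parent
    # Each matrix stays a valid rotation: R^T R = I, det R = 1.
    print(part, "det =", round(np.linalg.det(parent), 6))
```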
We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters, referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion that utilizes the concept of distance to degeneracy, and it minimizes a tractable surrogate of the SPO loss on the collected data. In particular, we develop an efficient active learning algorithm with both hard- and soft-rejection variants, each with theoretical excess risk (i.e., generalization) guarantees. We further derive bounds on the label complexity, i.e., the number of samples whose labels must be acquired to achieve a desired small level of SPO risk. Under some natural low-noise conditions, we show that these bounds can be better than those of the naive supervised learning approach that labels all samples. Furthermore, when using the SPO+ loss function, a specialized surrogate of the SPO loss, we derive a significantly smaller label complexity under separability conditions. We also present numerical evidence showing the practical value of our proposed algorithms in the settings of personalized pricing and the shortest path problem.
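The abstract leaves the criterion at a high level. The sketch below illustrates one simplified margin-style rule for a shortest-path-like problem with a small finite decision set: a sample's label (its true cost vector) is requested only when the predicted cost's best decision beats the runner-up by less than a margin, which is a crude stand-in for the distance-to-degeneracy quantity used in the paper. The routes, predictor, and margin are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "shortest path": three candidate routes, each a 0/1 incidence vector over 4 edges.
routes = np.array([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 0, 1, 1]], dtype=float)

W = rng.normal(size=(3, 4))  # placeholder linear predictor of edge costs (illustrative only)

def decision_and_gap(cost):
    """Cheapest route under the predicted cost, plus the gap to the runner-up
    (a crude stand-in for distance to degeneracy)."""
    values = routes @ cost
    order = np.argsort(values)
    return order[0], values[order[1]] - values[order[0]]

margin, queried, total = 0.5, 0, 200
for _ in range(total):
    features = rng.normal(size=3)
    predicted_cost = np.abs(features @ W) + 0.1
    _, gap = decision_and_gap(predicted_cost)
    if gap < margin:
        queried += 1  # request the true cost vector only when the decision is ambiguous

print(f"labels requested for {queried} of {total} streamed samples")
```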
In this paper, we study the identifiability and estimation of the parameters of a copula-based multivariate model when the margins are unknown and arbitrary, meaning that they can be continuous, discrete, or mixtures of continuous and discrete. When at least one margin is not continuous, the range of values determining the copula is not the entire unit square, and this situation can lead to identifiability issues, which are discussed here. Next, we propose estimation methods for unknown, arbitrary margins based on a pseudo log-likelihood adapted to the case of discontinuities. In view of applications to large data sets, we also propose a pairwise composite pseudo log-likelihood. These methodologies can easily be modified to cover the case of parametric margins. One of the main theoretical results is an extension to arbitrary distributions of known convergence results for rank-based statistics when the margins are continuous. As a by-product, under smoothness assumptions, we obtain that the estimation errors of our estimators are asymptotically Gaussian. Finally, numerical experiments are presented to assess the finite-sample performance of the estimators, and the usefulness of the proposed methodologies is illustrated with a copula-based regression model for hydrological data. The proposed estimation is implemented in the R package CopulaInference, together with a function for checking identifiability.
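The paper's adaptation to discontinuous margins is more involved than can be shown here. As a hedged sketch of the continuous-margin baseline it extends, the code below fits a Gaussian copula parameter by maximising a rank-based pseudo log-likelihood, with pseudo-observations built from rescaled empirical ranks; the data-generating margins are arbitrary illustrative choices, and the R package mentioned in the abstract, CopulaInference, implements the actual methodology.

```python
import numpy as np
from scipy.stats import norm, rankdata
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Simulate data from a Gaussian copula (rho = 0.6) with arbitrary continuous margins.
rho_true, n = 0.6, 2_000
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
x = np.exp(z[:, 0])    # log-normal margin
y = np.cbrt(z[:, 1])   # another arbitrary monotone transform

# Rank-based pseudo-observations (rescaled by n + 1 to stay inside (0, 1)).
u = rankdata(x) / (n + 1)
v = rankdata(y) / (n + 1)

def neg_pseudo_loglik(rho):
    """Negative Gaussian-copula pseudo log-likelihood at the pseudo-observations."""
    a, b = norm.ppf(u), norm.ppf(v)
    log_c = (-0.5 * np.log(1 - rho ** 2)
             + (2 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2 * (1 - rho ** 2)))
    return -np.sum(log_c)

res = minimize_scalar(neg_pseudo_loglik, bounds=(-0.99, 0.99), method="bounded")
print("estimated copula parameter:", round(res.x, 3), "(true 0.6)")
```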