Quantum multiplication is a fundamental operation in quantum computing. Most existing quantum multipliers require $O(n)$ qubits to multiply two $n$-bit integer numbers, limiting their applicability to multiply large integer numbers using near-term quantum computers. In this paper, we propose the Quantum Multiplier Based on Exponent Adder (QMbead), a new approach that addresses this limitation by requiring just $\log_2(n)$ qubits to multiply two $n$-bit integer numbers. QMbead uses a so-called exponent encoding to represent two multiplicands as two superposition states, respectively, and then employs a quantum adder to obtain the sum of these two superposition states, and subsequently measures the outputs of the quantum adder to calculate the product of the multiplicands. This paper presents two types of quantum adders based on the quantum Fourier transform (QFT) for use in QMbead. The circuit depth of QMbead is determined by the chosen quantum adder, being $O(\log^2 n)$ when using the two QFT-based adders. If leveraging a logarithmic-depth quantum adder, the time complexity of QMbead is $O(n \log n)$, identical to that of the fastest classical multiplication algorithm, Harvey-Hoeven algorithm. Interestingly, QMbead maintains an advantage over the Harvey-Hoeven algorithm, given that the latter is only suitable for excessively large numbers, whereas QMbead is valid for both small and large numbers. The multiplicand can be either an integer or a decimal number. QMbead has been successfully implemented on quantum simulators to compute products with a bit length of up to 273 bits using only 17 qubits. This establishes QMbead as an efficient solution for multiplying large integer or decimal numbers with many bits.
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. We establish the complexity bound $\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$ and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. Our result is based on reducing the average-reward MDP to a discounted MDP. To establish the optimality of this reduction, we develop improved bounds for $\gamma$-discounted MDPs, showing that $\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$ samples suffice to learn a $\varepsilon$-optimal policy in weakly communicating MDPs under the regime that $\gamma \geq 1 - \frac{1}{H}$, circumventing the well-known lower bound of $\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)$ for general $\gamma$-discounted MDPs. Our analysis develops upper bounds on certain instance-dependent variance parameters in terms of the span parameter. These bounds are tighter than those based on the mixing time or diameter of the MDP and may be of broader use.
The functional linear model is an important extension of the classical regression model allowing for scalar responses to be modeled as functions of stochastic processes. Yet, despite the usefulness and popularity of the functional linear model in recent years, most treatments, theoretical and practical alike, suffer either from (i) lack of resistance towards the many types of anomalies one may encounter with functional data or (ii) biases resulting from the use of discretely sampled functional data instead of completely observed data. To address these deficiencies, this paper introduces and studies the first class of robust functional regression estimators for partially observed functional data. The proposed broad class of estimators is based on thin-plate splines with a novel computationally efficient quadratic penalty, is easily implementable and enjoys good theoretical properties under weak assumptions. We show that, in the incomplete data setting, both the sample size and discretization error of the processes determine the asymptotic rate of convergence of functional regression estimators and the latter cannot be ignored. These theoretical properties remain valid even with multi-dimensional random fields acting as predictors and random smoothing parameters. The effectiveness of the proposed class of estimators in practice is demonstrated by means of a simulation study and a real-data example.
Suppose we are given an $n$-node, $m$-edge input graph $G$, and the goal is to compute a spanning subgraph $H$ on $O(n)$ edges. This can be achieved in linear $O(m + n)$ time via breadth-first search. But can we hope for \emph{sublinear} runtime in some range of parameters? If the goal is to return $H$ as an adjacency list, there are simple lower bounds showing that $\Omega(m + n)$ runtime is necessary. If the goal is to return $H$ as an adjacency matrix, then we need $\Omega(n^2)$ time just to write down the entries of the output matrix. However, we show that neither of these lower bounds still apply if instead the goal is to return $H$ as an \emph{implicit} adjacency matrix, which we call an \emph{adjacency oracle}. An adjacency oracle is a data structure that gives a user the illusion that an adjacency matrix has been computed: it accepts edge queries $(u, v)$, and it returns in near-constant time a bit indicating whether $(u, v) \in E(H)$. Our main result is that one can construct an adjacency oracle for a spanning subgraph on at most $(1+\varepsilon)n$ edges, in $\tilde{O}(n \varepsilon^{-1})$ time, and that this construction time is near-optimal. Additional results include constructions of adjacency oracles for $k$-connectivity certificates and spanners, which are similarly sublinear on dense-enough input graphs. Our adjacency oracles are closely related to Local Computation Algorithms (LCAs) for graph sparsifiers; they can be viewed as LCAs with some computation moved to a preprocessing step, in order to speed up queries. Our oracles imply the first Local algorithm for computing sparse spanning subgraphs of general input graphs in $\tilde{O}(n)$ query time, which works by constructing our adjacency oracle, querying it once, and then throwing the rest of the oracle away. This addresses an open problem of Rubinfeld [CSR '17].
We define a symbolic execution framework QSE for quantum programs by integrating symbolic variables into quantum states and the outcomes of quantum measurements. The soundness theorem of QSE is proved. We further introduce symbolic stabilizer states, which facilitate the efficient analysis of quantum error correction programs. Within the QSE framework, we can use symbolic expressions to characterize the possible adversarial errors in quantum error correction, providing a significant improvement over existing methods that rely on sampling with simulators. We implement QSE with the support of symbolic stabilizer states in a prototype tool named QuantumSE.jl. With experiments on representative quantum error correction codes, including quantum repetition codes, Kitaev's toric codes, and quantum Tanner codes, we demonstrate the efficiency of QuantumSE.jl for debugging quantum error correction programs with over 1000 qubits. In addition, as a by-product of QSE, QuantumSE.jl's sampling functionality for stabilizer circuits also outperforms the state-of-the-art stabilizer simulator, Google's Stim, in the experiments.
Given a graph $G$, a community structure $\mathcal{C}$, and a budget $k$, the fair influence maximization problem aims to select a seed set $S$ ($|S|\leq k$) that maximizes the influence spread while narrowing the influence gap between different communities. While various fairness notions exist, the welfare fairness notion, which balances fairness level and influence spread, has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the welfare fairness objective function restricts its application to small-scale networks with only a few hundred nodes. In this paper, we adopt the objective function of welfare fairness to maximize the exponentially weighted summation over the influenced fraction of all communities. We first introduce an unbiased estimator for the fractional power of the arithmetic mean. Then, by adapting the reverse influence sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence at a high probability. Further, we present an efficient algorithm that guarantees $1-1/e - \varepsilon$ approximation.
In dynamic submodular maximization, the goal is to maintain a high-value solution over a sequence of element insertions and deletions with a fast update time. Motivated by large-scale applications and the fact that dynamic data often exhibits patterns, we ask the following question: can predictions be used to accelerate the update time of dynamic submodular maximization algorithms? We consider the model for dynamic algorithms with predictions where predictions regarding the insertion and deletion times of elements can be used for preprocessing. Our main result is an algorithm with an $O(poly(\log \eta, \log w, \log k))$ amortized update time over the sequence of updates that achieves a $1/2 - \epsilon$ approximation in expectation for dynamic monotone submodular maximization under a cardinality constraint $k$, where the prediction error $\eta$ is the number of elements that are not inserted and deleted within $w$ time steps of their predicted insertion and deletion times. This amortized update time is independent of the length of the stream and instead depends on the prediction error.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
It is always well believed that modeling relationships between objects would be helpful for representing and eventually describing an image. Nevertheless, there has not been evidence in support of the idea on image description generation. In this paper, we introduce a new design to explore the connections between objects for image captioning under the umbrella of attention-based encoder-decoder framework. Specifically, we present Graph Convolutional Networks plus Long Short-Term Memory (dubbed as GCN-LSTM) architecture that novelly integrates both semantic and spatial object relationships into image encoder. Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections. The representations of each region proposed on objects are then refined by leveraging graph structure through GCN. With the learnt region-level features, our GCN-LSTM capitalizes on LSTM-based captioning framework with attention mechanism for sentence generation. Extensive experiments are conducted on COCO image captioning dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, GCN-LSTM increases CIDEr-D performance from 120.1% to 128.7% on COCO testing set.