The successive cancellation list decoder (SCL) is an efficient decoder for classical polar codes with low decoding error, approximating the maximum likelihood decoder (MLD) for small list sizes. Here we adapt the SCL to the task of decoding quantum polar codes and show that it inherits the high performance and low complexity of the classical case, and can approximate the quantum MLD for certain channels. We apply SCL decoding to a novel version of quantum polar codes based on the polarization weight (PW) method, which entirely avoids the need for small amounts of entanglement assistance apparent in previous quantum polar code constructions. When used to find the precise error pattern, the quantum SCL decoder (SCL-E) shows competitive performance with surface codes of similar size and low-density parity check codes of similar size and rate. The SCL decoder may instead be used to approximate the probability of each equivalence class of errors, and then choose the most likely class. We benchmark this class-oriented decoder (SCL-C) against the SCL-E decoder and find a noticeable improvement in the logical error rate. This improvement stems from the fact that the contributions from just the low-weight errors give a reasonable approximation to the error class probabilities. Both SCL-E and SCL-C maintain the $O(LN \log N)$ complexity of SCL for code size $N$ and list size $L$. We also show that the list decoder can be used to gain insight into the weight distribution of the codes and how this impacts the effect of degenerate errors.
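To make the distinction between the two decoding rules concrete, here is a minimal, hypothetical Python sketch, assuming the list decoder has already produced candidate error patterns with their probabilities and logical (equivalence-class) labels; the data layout and function name are illustrative, not the paper's implementation.

```python
from collections import defaultdict

def pick_correction(list_paths):
    """list_paths: non-empty list of (error_pattern, probability, logical_class) triples.

    SCL-E returns the class of the single most probable error pattern;
    SCL-C sums probabilities within each equivalence class of errors and
    returns the class with the largest total, approximating degenerate
    maximum likelihood decoding.
    """
    scl_e_class = max(list_paths, key=lambda path: path[1])[2]

    class_prob = defaultdict(float)
    for _, prob, cls in list_paths:
        class_prob[cls] += prob
    scl_c_class = max(class_prob, key=class_prob.get)

    return scl_e_class, scl_c_class
```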
Complexity theory typically focuses on the difficulty of solving computational problems using classical inputs and outputs, even with a quantum computer. In the quantum world, it is natural to apply a different notion of complexity, namely the complexity of synthesizing quantum states. We investigate a state-synthesizing counterpart of the class NP, referred to as stateQMA, which is concerned with preparing certain quantum states through a polynomial-time quantum verifier with the aid of a single quantum message from an all-powerful but untrusted prover. This is a subclass of the class stateQIP recently introduced by Rosenthal and Yuen (ITCS 2022), which permits polynomially many interactions between the prover and the verifier. Our main results are error reduction for this class and its variants with an exponentially small gap or bounded space, as well as how this class relates to other fundamental state-synthesizing classes, i.e., states generated by uniform polynomial-time quantum circuits (stateBQP) and space-uniform polynomial-space quantum circuits (statePSPACE). Additionally, we demonstrate that stateQCMA achieves perfect completeness. Our proof techniques are based on the quantum singular value transformation introduced by Gily\'en, Su, Low, and Wiebe (STOC 2019), and its adaptation to achieve exponential precision with bounded space.
Since control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, estimating this value accurately has become a genuine scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the power iteration. Called the Gram iteration, our approach exhibits superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches. Code is available at //github.com/blaisedelattre/lip4conv.
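As a rough illustration of the Gram-iteration idea, here is a minimal NumPy sketch for a dense matrix; it is not the paper's implementation, which additionally exploits the circulant/Fourier structure of convolutional layers, and the function name and iteration count are assumptions.

```python
import numpy as np

def gram_spectral_bound(W, n_iter=6):
    """Upper-bound the spectral norm of W by repeated Gram squaring.

    With G_0 = W and G_{t+1} = G_t^T G_t, one has
    sigma_1(W)^(2^t) = ||G_t||_2 <= ||G_t||_F, so sigma_1(W) <= ||G_t||_F^(1/2^t).
    Each step rescales G to avoid overflow, accumulating the scale in log space.
    """
    G = np.asarray(W, dtype=np.float64)
    log_scale = 0.0
    for _ in range(n_iter):
        fro = np.linalg.norm(G)                 # Frobenius norm used for rescaling
        G = G / fro
        log_scale = 2.0 * (log_scale + np.log(fro))
        G = G.T @ G                             # Gram squaring doubles the exponent
    return np.exp((np.log(np.linalg.norm(G)) + log_scale) / 2 ** n_iter)

W = np.random.randn(64, 64)
print(gram_spectral_bound(W), np.linalg.norm(W, 2))  # bound >= true spectral norm
```

Because each Gram squaring doubles the exponent of the leading singular value, the Frobenius-norm bound tightens toward the true spectral norm superlinearly in the number of iterations.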
An implementation-efficient finite alphabet decoder for polar codes relying on coarsely quantized messages and low-complexity operations is proposed. Typically, finite alphabet decoding performs concatenated compression operations on the received channel messages to aggregate compact reliability information for error correction. These compression operations or mappings can be considered as lookup tables. For polar codes, the finite alphabet decoder design boils down to constructing lookup tables for the upper and lower branches of the building blocks within the code structure. A key challenge is to realize a hardware-friendly implementation of the lookup tables. This work uses the min-sum implementation for the upper branch lookup table and, as a novelty, a computational domain implementation for the lower branch lookup table. The computational domain approach drastically reduces the number of implementation parameters. Furthermore, a restriction to uniform quantization in the lower branch allows a very hardware-friendly compression via clipping and bit-shifting. Its behavior is close to the optimal non-uniform quantization, whose implementation would require multiple high-resolution threshold comparisons. Simulation results confirm excellent performance for the developed decoder. Unlike conventional fixed-point decoders, the proposed method involves an offline design that explicitly maximizes the preserved mutual information under coarse quantization.
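For orientation, a minimal NumPy sketch of the two building-block updates and of an illustrative clip-and-shift re-quantizer follows; the function names, the 4-bit width, and the shift amount are assumptions, and the paper's actual lookup tables are designed offline to maximize the preserved mutual information, so the exact computational-domain operation may differ.

```python
import numpy as np

def f_upper_minsum(l1, l2):
    """Min-sum update for the upper branch of a polar building block."""
    return np.sign(l1) * np.sign(l2) * np.minimum(np.abs(l1), np.abs(l2))

def g_lower(l1, l2, u):
    """LLR-domain (computational) update for the lower branch,
    given the partial-sum bit u from the upper branch."""
    return (1 - 2 * u) * l1 + l2

def quantize_uniform(llr, n_bits=4, shift=1):
    """Illustrative uniform re-quantization via clipping and bit-shifting:
    scale down by 2**shift (a right shift for integer LLRs) and clip to the
    symmetric range representable with n_bits."""
    q_max = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(llr / 2 ** shift), -q_max, q_max)
```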
Quantum error-correcting codes (QECCs) can eliminate the negative effects of quantum noise, the major obstacle to the execution of quantum algorithms. However, realizing practical quantum error correction (QEC) requires resolving many challenges to implement a high-performance real-time decoding system. Many decoding algorithms have been proposed and optimized in the past few decades, of which neural network (NN) based solutions have drawn an increasing amount of attention due to their high efficiency. Unfortunately, previous works on neural decoders are still at an early stage and have only relatively simple architectures, which makes them unsuitable for practical QEC. In this work, we propose a scalable, fast, and programmable neural decoding system to meet the requirements of fault-tolerant QEC (FTQEC) for rotated surface codes (RSC). Firstly, we propose a hardware-efficient NN decoding algorithm with relatively low complexity and high accuracy. Secondly, we develop a customized hardware decoder with architectural optimizations to reduce latency. Thirdly, our proposed programmable architecture boosts the scalability and flexibility of the decoder by maximizing parallelism. Fourthly, we build an FPGA-based decoding system with integrated control hardware for evaluation. Our $L=5$ ($L$ is the code distance) decoder achieves an extremely low decoding latency of 197 ns, and the $L=7$ configuration requires only 1.136 $\mu$s, both over $2L$ rounds of syndrome measurements. The accuracy of our system is close to that of minimum-weight perfect matching (MWPM). Furthermore, our programmable architecture reduces hardware resource consumption by up to $3.0\times$ with only a small latency loss. We validated our approach in real-world scenarios by conducting a proof-of-concept benchmark with practical noise models, including one derived from experimental data gathered from physical hardware.
Coded caching (CC) schemes exploit the cumulative cache memory of the users and simple linear coding to turn unicast traffic (individual file requests) into a multicast transmission. For the originally proposed $K$-user single-server/single-shared-link network model, CC yields an $O(K)$ gain with respect to conventional uncoded caching with the same per-user memory. While several information-theoretic optimality results for a variety of problems and carefully crafted network topologies have been proved, the gains and suitability of CC for practical scenarios such as content streaming over existing wireless networks have not yet been fully demonstrated. In this work, we consider CC for on-demand video streaming over WLANs where multiple users are served simultaneously by multiple spatially distributed access points (APs). Users sequentially request video ``chunks''. The CC scheme operates above the IP layer, leaving the underlying standard physical layer and MAC layer untouched. The cache placement is completely asynchronous and decentralized, and the users are placed at random over the network coverage area. For such a system, we consider the region of achievable long-term average delivery rate (defined as the number of video chunks delivered per unit of time) and study the per-user rate distribution under proportional fairness scheduling. We also consider reduced-complexity scheduling strategies and compare them with standard state-of-the-art techniques such as conventional (uncoded) caching and collision avoidance by allocating APs to different sub-channels (i.e., frequency reuse).
Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work we address one of the most pressing issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both in theory and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richt\'{a}rik et al. [2021], known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum.
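The following is a schematic Python sketch of a single EF21-style round with Polyak momentum, assuming a generic contractive compressor (top-k is used here purely for illustration); the function names, update order, and initialization details are assumptions rather than the paper's specification.

```python
import numpy as np

def topk(v, k):
    """Top-k contractive compressor: keep the k largest-magnitude entries of a vector."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_sgdm_step(x, g_list, v_list, stoch_grads, lr, eta, k):
    """One schematic round: Polyak momentum on each worker's stochastic gradient,
    followed by an EF21-style compressed correction of the worker's running
    gradient estimate g_i, and a server step with the average of the estimates."""
    n = len(g_list)
    for i in range(n):
        # momentum estimate of worker i's gradient at the current iterate
        v_list[i] = (1 - eta) * v_list[i] + eta * stoch_grads[i]
        # only the compressed correction is communicated
        g_list[i] = g_list[i] + topk(v_list[i] - g_list[i], k)
    x_new = x - lr * sum(g_list) / n
    return x_new, g_list, v_list
```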
Missing values challenge probabilistic wind power forecasting at both the parameter estimation and operational forecasting stages. In this paper, we illustrate that forecasting functions can be conveniently estimated for each missingness pattern, and we propose an adaptive quantile regression model whose parameters adapt to the missingness pattern. For that, we design a dedicated feature extraction block within the quantile regression model, where the parameters are set as a function of the missingness pattern and only account for observed values. To avoid the quantile-crossing phenomenon, we design a multi-task model that ensures the monotonicity of quantiles, where higher quantiles are derived by adding non-negative increments, modeled by neural networks, to lower quantiles. The proposed approach is distribution-free and applicable to both missing-at-random and missing-not-at-random cases. Case studies demonstrate that the proposed approach achieves state-of-the-art performance in terms of the continuous ranked probability score.
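As a rough illustration of the non-crossing construction, here is a minimal PyTorch sketch; the module name, layer sizes, and the use of softplus to enforce non-negative increments are assumptions, and the paper's model additionally makes the feature extraction depend on the missingness pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneQuantileHead(nn.Module):
    """Multi-quantile output head: the lowest quantile is predicted directly,
    and higher quantiles are obtained by adding cumulative non-negative
    increments, so predicted quantiles cannot cross."""
    def __init__(self, in_dim, n_quantiles):
        super().__init__()
        self.base = nn.Linear(in_dim, 1)                  # lowest quantile
        self.increments = nn.Linear(in_dim, n_quantiles - 1)

    def forward(self, h):
        q0 = self.base(h)                                 # shape (batch, 1)
        deltas = F.softplus(self.increments(h))           # non-negative increments
        return torch.cat([q0, q0 + torch.cumsum(deltas, dim=-1)], dim=-1)
```

Because each predicted quantile equals the previous one plus a non-negative term, monotonicity holds by construction for any input, including inputs affected by missing values.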
Let $A(n, d)$ denote the maximum size of a binary code of length $n$ and minimum Hamming distance $d$. Studying $A(n, d)$, including efforts to determine it as well as to derive bounds on $A(n, d)$ for large $n$, is one of the most fundamental subjects in coding theory. In this paper, we explore new lower and upper bounds on $A(n, d)$ in the large-minimum distance regime, in particular, when $d = n/2 - \Omega(\sqrt{n})$. We first provide a new construction of cyclic codes, by carefully selecting specific roots in the binary extension field for the check polynomial, with length $n= 2^m -1$, distance $d \geq n/2 - 2^{c-1}\sqrt{n}$, and size $n^{c+1/2}$, for any $m\geq 4$ and any integer $c$ with $0 \leq c \leq m/2 - 1$. These code parameters are slightly worse than those of the Delsarte--Goethals (DG) codes that provide the previously known best lower bound in the large-minimum distance regime. However, using a similar and extended code construction technique, we show a sequence of cyclic codes that improves upon DG codes and provides the best lower bound in a narrower range of the minimum distance $d$, in particular, when $d = n/2 - \Omega(n^{2/3})$. Furthermore, by leveraging a Fourier-analytic view of Delsarte's linear program, we obtain upper bounds on $A(n, n/2 - \rho\sqrt{n})$ with $\rho\in (0.5, 9.5)$ that scale polynomially in $n$. To the best of the authors' knowledge, the upper bound due to Barg and Nogin \cite{barg2006spectral} is the only previously known upper bound that scales polynomially in $n$ in this regime. We numerically demonstrate that our upper bound improves upon the Barg--Nogin upper bound in the specified large-minimum distance regime.
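To make the stated parameter trade-off concrete, here is a hedged numerical reading of the construction; the choice $m=6$, $c=1$ is ours, purely for illustration.

```latex
% Illustrative evaluation of the stated parameters for m = 6, c = 1 (values chosen for illustration only):
\[
n = 2^{6}-1 = 63, \qquad
d \;\geq\; \frac{n}{2} - 2^{c-1}\sqrt{n} = 31.5 - \sqrt{63} \approx 23.6, \qquad
n^{c+1/2} = 63^{3/2} \approx 500 \ \text{codewords}.
\]
```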
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
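For intuition, a minimal NumPy sketch of an $l_1/l_2$ (group) penalty that couples the same edge across the $K$ tasks, which is what encourages a shared sparse union of supports; the function name and matrix layout are assumptions, not the paper's code.

```python
import numpy as np

def group_l1_l2_penalty(coef_mats, lam):
    """coef_mats: list of K (p x p) coefficient matrices, one per task.

    Each edge (j, k) forms one group across the K tasks; the penalty is
    lam * sum_{j,k} sqrt(sum_i B_i[j,k]**2), so an edge is either kept
    (possibly with different weights per task) or zeroed out jointly."""
    B = np.stack(coef_mats, axis=0)          # shape (K, p, p)
    return lam * np.sqrt((B ** 2).sum(axis=0)).sum()
```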
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
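As a concrete reading of the formula, here is a minimal NumPy sketch that turns per-class counts into loss weights; the normalization so that the weights sum to the number of classes, and the value of beta, are assumptions rather than prescriptions from the abstract.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples
    E_n = (1 - beta**n) / (1 - beta); each class is weighted by 1 / E_n,
    and the weights are normalized to sum to the number of classes."""
    counts = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)

# e.g. a long-tailed 3-class toy split
print(class_balanced_weights([5000, 500, 50]))
```

In the limit $\beta \to 0$ the effective number is 1 for every class and all classes are weighted equally (no re-balancing), while $\beta \to 1$ recovers weighting by inverse class frequency.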