Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compressing biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.
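As a point of reference, the following minimal Python sketch computes the quantity these guarantees are stated in: the maximum mean discrepancy between an equal-weighted coreset and the full sample. It illustrates the MMD objective only, not the SKT, low-rank SKT, or Stein recombination procedures, and the Gaussian kernel and bandwidth are illustrative assumptions (the methods above use Stein kernels).

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd(X, Y, w=None, v=None, bandwidth=1.0):
    """MMD between the weighted empirical measures sum_i w_i delta_{X_i} and sum_j v_j delta_{Y_j}."""
    w = np.full(len(X), 1.0 / len(X)) if w is None else w
    v = np.full(len(Y), 1.0 / len(Y)) if v is None else v
    mmd2 = (w @ gaussian_kernel(X, X, bandwidth) @ w
            - 2 * w @ gaussian_kernel(X, Y, bandwidth) @ v
            + v @ gaussian_kernel(Y, Y, bandwidth) @ v)
    return np.sqrt(max(mmd2, 0.0))

# Tiny example: compare a sqrt(n)-sized uniform subsample against the full sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))                       # full sample, n = 400
coreset = X[rng.choice(len(X), 20, replace=False)]  # 20 = sqrt(400) equal-weighted points
print(mmd(coreset, X))
```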
We show that each set of $n\ge 2$ points in the plane in general position has a straight-line matching with at least $(5n+1)/27$ edges whose segments form a connected set, and such a matching can be computed in $O(n \log n)$ time. As an upper bound, we show that for some planar point sets in general position the largest matching whose segments form a connected set has $\lceil \frac{n-1}{3}\rceil$ edges. We also consider a colored version, where each edge of the matching must connect points of different colors.
We describe a formulation of multiple agents operating within a Cyber-Physical System, resulting in collaborative or adversarial games. We show that the non-determinism inherent in the communication medium between agents and the underlying physical environment gives rise to environment evolution that is a probabilistic function of the agents' strategies. We name these emergent games Cyber Physical Games and study their properties. We present an algorithmic model that determines the most likely system evolution, approximating Cyber Physical Games through Probabilistic Finite State Automata, and evaluate it on collaborative and adversarial versions of the Iterated Boolean Game, comparing theoretical results with simulated ones. The results support the validity of the proposed model and suggest several research directions for further developing our understanding of Cyber-Physical Systems, as well as for designing agents that must operate within such environments.
The exact common information between a set of random variables $X_1,\dots,X_n$ is defined as the minimum entropy of a shared random variable that allows for the exact distributive simulation of $X_1,\dots,X_n$. It has been established that, in certain instances, infinite entropy is required to achieve distributive simulation, suggesting that continuous random variables may be needed in such scenarios. However, to date, there is no established metric to characterize such cases. In this paper, we propose the concept of Common Information Dimension (CID) with respect to a given class of functions $\mathcal{F}$, defined as the minimum dimension of a random variable $W$ required to distributively simulate a set of random variables $X_1,\dots,X_n$, such that $W$ can be expressed as a function of $X_1,\dots,X_n$ using a member of $\mathcal{F}$. Our main contributions include the computation of the common information dimension for jointly Gaussian random vectors in closed form, with $\mathcal{F}$ being the class of linear functions.
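For concreteness, one schematic way to write the quantity just described, reading "$W$ allows exact distributed simulation of $X_1,\dots,X_n$" as "$X_1,\dots,X_n$ are conditionally independent given $W$" (an assumption of this sketch, not a quotation of the paper's formal definition), is
$$
\mathrm{CID}_{\mathcal{F}}(X_1,\dots,X_n) \;=\; \min\bigl\{\dim(W) \;:\; X_1,\dots,X_n \text{ are conditionally independent given } W, \;\; W=f(X_1,\dots,X_n) \text{ for some } f\in\mathcal{F}\bigr\}.
$$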
In this paper we study the problem of minimizing a submodular function $f : 2^V \rightarrow \mathbb{R}$ that is guaranteed to have a $k$-sparse minimizer. We give a deterministic algorithm that computes an additive $\epsilon$-approximate minimizer of such $f$ in $\widetilde{O}(\mathsf{poly}(k) \log(|f|/\epsilon))$ parallel depth using a polynomial number of queries to an evaluation oracle of $f$, where $|f| = \max_{S \subseteq V} |f(S)|$. Further, we give a randomized algorithm that computes an exact minimizer of $f$ with high probability using $\widetilde{O}(|V| \cdot \mathsf{poly}(k))$ queries and polynomial time. When $k = \widetilde{O}(1)$, our algorithms use either nearly-constant parallel depth or a nearly-linear number of evaluation oracle queries. All previous algorithms for this problem either use $\Omega(|V|)$ parallel depth or $\Omega(|V|^2)$ queries. In contrast to state-of-the-art weakly-polynomial and strongly-polynomial time algorithms for submodular function minimization (SFM), our algorithms use first-order optimization methods, e.g., mirror descent and follow-the-regularized-leader. We introduce what we call {\em sparse dual certificates}, which encode information on the structure of sparse minimizers, and both our parallel and sequential algorithms provide new algorithmic tools that allow first-order optimization methods to compute them efficiently. Correspondingly, our algorithms do not invoke fast matrix multiplication or general linear system solvers and are in this sense more combinatorial than previous state-of-the-art methods.
We consider the well-studied pattern counting problem: given a permutation $\pi \in \mathbb{S}_n$ and an integer $k > 1$, count the number of order-isomorphic occurrences of every pattern $\tau \in \mathbb{S}_k$ in $\pi$. Our first result is an $\widetilde{\mathcal{O}}(n^2)$-time algorithm for $k=6$ and $k=7$. The proof relies heavily on a new family of graphs that we introduce, called pattern-trees. Every such tree corresponds to an integer linear combination of permutations in $\mathbb{S}_k$, and is associated with linear extensions of partially ordered sets. We design an evaluation algorithm for these combinations, and apply it to a family of linearly independent trees. For $k=8$, we show a barrier: the subspace spanned by trees in the previous family has dimension exactly $|\mathbb{S}_8| - 1$, one less than required. Our second result is an $\widetilde{\mathcal{O}}(n^{7/4})$-time algorithm for $k=5$. This algorithm extends the framework of pattern-trees by speeding up their evaluation in certain cases. A key component of the proof is the introduction of pair-rectangle-trees, a data structure for dominance counting.
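For orientation, a brute-force reference counter for the problem statement is sketched below in Python: it enumerates all $\binom{n}{k}$ length-$k$ subsequences and reduces each to its order-isomorphic pattern, so it runs in $O(n^k)$ time and illustrates only what is being counted, not the pattern-tree machinery achieving the bounds above.

```python
from collections import Counter
from itertools import combinations

def count_patterns(pi, k):
    """Count occurrences of every length-k pattern in the permutation pi (brute force, O(n^k))."""
    counts = Counter()
    for idxs in combinations(range(len(pi)), k):
        window = [pi[i] for i in idxs]
        # Replace each entry by its rank within the window to get the order-isomorphic pattern.
        pattern = tuple(sorted(window).index(v) for v in window)
        counts[pattern] += 1
    return counts

# Example: the permutation 2 0 3 1 contains four length-3 subsequences, one per pattern found.
print(count_patterns([2, 0, 3, 1], 3))
```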
We consider the problem of privately estimating a parameter $\mathbb{E}[h(X_1,\dots,X_k)]$, where $X_1$, $X_2$, $\dots$, $X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, standard estimators are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and can be shown to be minimum variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $\Theta(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach using \emph{local H\'ajek projections} to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
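To fix ideas, the non-private estimator being privatized is a U-statistic: the average of a symmetric kernel $h$ over all size-$k$ subsets of the sample. The Python sketch below computes an order-2 U-statistic with an illustrative variance kernel; the local Hájek projection reweighting and thresholding proposed above are not shown.

```python
from itertools import combinations
import numpy as np

def u_statistic(data, h, k=2):
    """Order-k U-statistic: average of h over all size-k subsets of the sample."""
    return float(np.mean([h(*subset) for subset in combinations(data, k)]))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# The kernel h(a, b) = (a - b)^2 / 2 makes the U-statistic the unbiased sample variance.
print(u_statistic(x, lambda a, b: 0.5 * (a - b) ** 2))
print(x.var(ddof=1))  # matches the U-statistic above
```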
Sorting is the task of ordering $n$ elements using pairwise comparisons. It is well known that $m=\Theta(n\log n)$ comparisons are both necessary and sufficient when the outcomes of the comparisons are observed with no noise. In this paper, we study the sorting problem when each comparison is incorrect with some fixed yet unknown probability $p$. Unlike the common approach in the literature, which aims to minimize the number of pairwise comparisons $m$ needed to achieve a given desired error probability, we consider randomized algorithms with expected number of queries $\textsf{E}[M]$ and aim to characterize the maximal sorting rate $\frac{n\log n}{\textsf{E}[M]}$ such that the ordering of the elements can be estimated with an asymptotically vanishing error probability. The maximal rate is referred to as the noisy sorting capacity. We derive upper and lower bounds on the noisy sorting capacity. The two lower bounds -- one for fixed-length algorithms and one for variable-length algorithms -- are established by combining the insertion sort algorithm with the well-known Burnashev--Zigangirov algorithm for channel coding with feedback. Compared with existing methods, the proposed algorithms are universal in the sense that they do not require knowledge of $p$, while maintaining a strictly positive sorting rate. Moreover, we derive a general upper bound on the noisy sorting capacity, along with an upper bound on the maximal rate that can be achieved by sorting algorithms based on insertion sort.
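As a toy illustration of sorting from noisy comparisons (not the capacity-achieving scheme above, which couples insertion sort with the Burnashev--Zigangirov feedback-coding algorithm), the Python sketch below repeats each noisy comparison a fixed number of times and takes a majority vote inside an insertion sort; the repetition count is an arbitrary assumption rather than an optimized choice.

```python
import random

def make_noisy_less(true_order, p, counter):
    """Return a comparator that reports the true order with prob. 1 - p and flips it with prob. p."""
    rank = {x: i for i, x in enumerate(true_order)}
    def less(a, b):
        counter[0] += 1  # count every noisy query
        truth = rank[a] < rank[b]
        return truth if random.random() > p else not truth
    return less

def majority_less(a, b, less, r=15):
    """Repeat the noisy comparison r times (r odd) and take a majority vote."""
    return sum(less(a, b) for _ in range(r)) > r // 2

def noisy_insertion_sort(items, less, r=15):
    """Insertion sort driven by repeated noisy comparisons."""
    out = []
    for x in items:
        i = 0
        while i < len(out) and majority_less(out[i], x, less, r):
            i += 1
        out.insert(i, x)
    return out

random.seed(0)
truth = list(range(30))
queries = [0]
less = make_noisy_less(truth, p=0.1, counter=queries)
shuffled = truth[:]
random.shuffle(shuffled)
print(noisy_insertion_sort(shuffled, less) == truth, "queries used:", queries[0])
```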
The problem of an optimal mapping between Hilbert spaces $IN$ and $OUT$, based on a series of density matrix mapping measurements $\rho^{(l)} \to \varrho^{(l)}$, $l=1\dots M$, is formulated as an optimization problem maximizing the total fidelity $\mathcal{F}=\sum_{l=1}^{M} \omega^{(l)} F\left(\varrho^{(l)},\sum_s B_s \rho^{(l)} B^{\dagger}_s\right)$ subject to probability-preservation constraints on the Kraus operators $B_s$. For $F(\varrho,\sigma)$ of a form such that the total fidelity can be represented as a quadratic form with a superoperator, $\mathcal{F}=\sum_s\left\langle B_s\middle|S\middle| B_s \right\rangle$ (either exactly or as an approximation), an iterative algorithm is developed to find the global maximum. The result comprises $N_s$ operators $B_s$ that collectively form an $IN$-to-$OUT$ quantum channel $A^{OUT}=\sum_s B_s A^{IN} B_s^{\dagger}$. The work introduces two important generalizations of unitary learning: 1. $IN$/$OUT$ states are represented as density matrices. 2. The mapping itself is formulated as a general quantum channel. This marks a crucial advancement from the commonly studied unitary mapping of pure states $\phi_l=\mathcal{U} \psi_l$ to a general quantum channel, which allows us to distinguish a probabilistic mixture of states from their superposition. An application of the approach is demonstrated on unitary learning of the density matrix mapping $\varrho^{(l)}=\mathcal{U} \rho^{(l)} \mathcal{U}^{\dagger}$, in which case a fidelity quadratic in $\mathcal{U}$ can be constructed by considering the mapping $\sqrt{\rho^{(l)}} \to \sqrt{\varrho^{(l)}}$, and on a general quantum channel of Kraus rank $N_s$, where the fidelity quadratic in $B_s$ is an approximation -- the quantum channel is then built as a hierarchy of unitary mappings. The approach can be applied to study decoherence, spontaneous coherence, synchronization, and related effects.
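The channel structure referred to above can be made concrete in a few lines of Python: applying Kraus operators $B_s$ to a density matrix and checking the probability-preservation constraint $\sum_s B^{\dagger}_s B_s = I$. The single-qubit dephasing channel used here is a standard textbook example chosen for illustration, not the learned channel or the iterative fidelity-maximization algorithm of the paper.

```python
import numpy as np

def apply_channel(rho, kraus_ops):
    """Apply the quantum channel rho -> sum_s B_s rho B_s^dagger."""
    return sum(B @ rho @ B.conj().T for B in kraus_ops)

def is_trace_preserving(kraus_ops, tol=1e-10):
    """Check the probability-preservation constraint sum_s B_s^dagger B_s = I."""
    d = kraus_ops[0].shape[1]
    return np.allclose(sum(B.conj().T @ B for B in kraus_ops), np.eye(d), atol=tol)

# Example: single-qubit dephasing channel with two Kraus operators.
p = 0.2
kraus = [np.sqrt(1 - p) * np.eye(2),
         np.sqrt(p) * np.diag([1.0, -1.0])]
rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]])  # the |+><+| state
print(is_trace_preserving(kraus))               # True
print(apply_channel(rho_plus, kraus))           # off-diagonal entries shrink to (1 - 2p)/2
```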
Most curriculum learning methods require an approach to sort the data samples by difficulty, which is often cumbersome to perform. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs. More specifically, LeRaC assigns higher learning rates to neural layers closer to the input, gradually decreasing the learning rates as the layers are placed farther away from the input. The learning rates increase at various paces during the first training iterations, until they all reach the same value. From this point on, the neural model is trained as usual. This creates a model-level curriculum learning strategy that does not require sorting the examples by difficulty and is compatible with any neural network, generating higher performance levels regardless of the architecture. We conduct comprehensive experiments on 12 data sets from the computer vision (CIFAR-10, CIFAR-100, Tiny ImageNet, ImageNet-200, Food-101, UTKFace, PASCAL VOC), language (BoolQ, QNLI, RTE) and audio (ESC-50, CREMA-D) domains, considering various convolutional (ResNet-18, Wide-ResNet-50, DenseNet-121, YOLOv5), recurrent (LSTM) and transformer (CvT, BERT, SepTr) architectures. We compare our approach with the conventional training regime, as well as with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach. Unlike CBS, our performance improvements over the standard training regime are consistent across all data sets and models. Furthermore, we significantly surpass CBS in terms of training time, as LeRaC incurs no additional cost over the standard training regime. Our code is freely available at: //github.com/CroitoruAlin/LeRaC.
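A schematic PyTorch-style sketch of this kind of layer-wise learning-rate warm-up is shown below. The starting rates, the geometric pacing toward the common rate, and the warm-up length are placeholder assumptions for illustration, not the configuration of the paper; the reference implementation is in the repository linked above.

```python
import torch
import torch.nn as nn

# Toy model: the layer at index 0 is closest to the input.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))
layers = [m for m in model if isinstance(m, nn.Linear)]

eta = 1e-3          # common learning rate reached after the warm-up
warmup_iters = 500  # number of initial iterations with layer-wise rates (assumed)
decay = 0.1         # hypothetical per-depth shrink factor for the starting rates

# Layers farther from the input start from smaller learning rates.
start_lrs = [eta * decay ** depth for depth in range(len(layers))]
optimizer = torch.optim.SGD(
    [{"params": layer.parameters(), "lr": lr} for layer, lr in zip(layers, start_lrs)],
    lr=eta)

def warmup_update(optimizer, step):
    """Raise each group's rate toward the shared value eta; all groups meet at eta at warmup_iters."""
    t = min(step / warmup_iters, 1.0)
    for group, lr0 in zip(optimizer.param_groups, start_lrs):
        group["lr"] = lr0 ** (1 - t) * eta ** t  # geometric interpolation from lr0 to eta

# Dummy training loop on random data.
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
loss_fn = nn.CrossEntropyLoss()
for step in range(1000):
    warmup_update(optimizer, step)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```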
For a given function of user data, a querier must recover, with at least a prescribed probability, the value of the function from a user-provided query response. Subject to this requirement, the user forms the query response so as to minimize the likelihood that the querier, based on the query response, guesses a list of prescribed size containing the data value. We obtain a general converse upper bound for maximum list privacy. This bound is shown to be tight for the case of a binary-valued function through an explicit achievability scheme that involves an add-noise query response.
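For the binary-valued case, an add-noise query response can be sketched as follows: the user reports the function value XORed with a Bernoulli noise bit calibrated so that the querier (whose best guess is the reported bit) recovers the true value with exactly the prescribed probability. This is a toy illustration of the mechanism class, not the paper's achievability scheme or its optimal parameters.

```python
import random

def add_noise_response(f_value, rho):
    """Report f(x) XOR Z, where Z = 0 with probability rho; guessing the reported bit succeeds w.p. rho."""
    z = 0 if random.random() < rho else 1
    return f_value ^ z

random.seed(0)
rho = 0.8  # prescribed recovery probability (assumed >= 1/2)
reports = [add_noise_response(1, rho) for _ in range(10000)]
print(sum(r == 1 for r in reports) / len(reports))  # close to rho
```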