We formalize the tail redundancy of a collection $\mathcal P$ of distributions over a countably infinite alphabet, and show that this fundamental quantity characterizes the asymptotic per-symbol redundancy of universally compressing sequences generated iid from $\mathcal P$. Contrary to worst-case formulations of universal compression, finite single-letter (average-case) redundancy of $\mathcal P$ does not automatically imply that the expected redundancy of describing length-$n$ strings sampled iid from $\mathcal P$ grows sublinearly in $n$. Instead, we prove that universal compression of length-$n$ iid sequences from $\mathcal P$ is characterized by how well the tails of distributions in $\mathcal P$ can be universally described, showing that the asymptotic per-symbol redundancy of iid strings equals the tail redundancy.
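For concreteness, one standard way to write the central quantity (the notation here is illustrative and may differ from the paper's): the expected redundancy of length-$n$ iid strings from $\mathcal P$ is
$$ R_n(\mathcal P) \;=\; \inf_{q}\; \sup_{p \in \mathcal P}\; \mathbb{E}_{X^n \sim p^n}\!\left[\log \frac{p^n(X^n)}{q(X^n)}\right], $$
where the infimum ranges over coding distributions $q$ on length-$n$ strings. The result states that $\lim_{n \to \infty} R_n(\mathcal P)/n$ equals the tail redundancy of $\mathcal P$, a single-letter quantity measuring how well one coding distribution can cover the low-probability tails of every $p \in \mathcal P$.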
The Mat\'ern covariance function is ubiquitous in the application of Gaussian processes to spatial statistics and beyond. Perhaps the most important reason for this is that the smoothness parameter $\nu$ gives complete control over the mean-square differentiability of the process, which has significant implications for the behavior of estimated quantities such as interpolants and forecasts. Unfortunately, derivatives of the Mat\'ern covariance function with respect to $\nu$ require derivatives of the modified Bessel function of the second kind $\mathcal{K}_\nu$ with respect to $\nu$. While closed-form expressions for these derivatives do exist, they are prohibitively difficult and expensive to compute. For this reason, many software packages require fixing $\nu$ rather than estimating it, and all existing software packages that do offer estimation of $\nu$ use finite-difference estimates for $\partial_\nu \mathcal{K}_\nu$. In this work, we introduce a new implementation of $\mathcal{K}_\nu$ that has been designed to provide derivatives via automatic differentiation (AD), and whose resulting derivatives are significantly faster and more accurate than those computed using finite differences. We provide comprehensive testing for both speed and accuracy and show that our AD solution can be used to build accurate Hessian matrices for second-order maximum likelihood estimation in settings where Hessians built with finite difference approximations completely fail.
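As a minimal sketch of the finite-difference baseline the abstract refers to (the step size here is an illustrative choice, not a recommendation):

```python
# Central finite-difference estimate of d/dnu K_nu(x), the approach the
# abstract says existing packages use. scipy.special.kv(nu, x) evaluates
# the modified Bessel function of the second kind.
import numpy as np
from scipy.special import kv

def dkv_dnu_fd(nu: float, x: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central finite-difference estimate of the nu-derivative of K_nu(x)."""
    return (kv(nu + h, x) - kv(nu - h, x)) / (2.0 * h)

x = np.linspace(0.5, 5.0, 5)
print(dkv_dnu_fd(1.5, x))
```

Too large an $h$ incurs truncation error while too small an $h$ loses precision to cancellation, which is why derivatives obtained by AD of a purpose-built $\mathcal{K}_\nu$ implementation can be both faster and more accurate.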
Markov chains with variable length are useful parsimonious stochastic models able to generate most stationary sequences of discrete symbols. The idea is to identify the suffixes of the past, called contexts, that are relevant for predicting the next symbol. Sometimes a single state is a context, and finding this specific state in the past makes the further past irrelevant. States with this property are called renewal states, and they can be used to split the chain into independent and identically distributed blocks. In order to identify renewal states for chains with variable length, we propose the use of the Intrinsic Bayes Factor to evaluate the hypothesis that some particular state is a renewal state. The difficulty lies in integrating out the random context trees to obtain the marginal posterior distribution under a general prior distribution on the space of context trees, with Dirichlet priors on the transition probabilities; Monte Carlo methods are applied for this purpose. To show the strength of our method, we analyze artificial datasets generated from different binary models and one example from the field of Linguistics.
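A minimal sketch of the conjugate building block behind Bayes factors of this kind: with a Dirichlet prior on a context's transition probabilities, the marginal likelihood of that context's transition counts is Dirichlet-multinomial and available in closed form. The prior value and the two-context example below are illustrative, not the paper's setup.

```python
# Log Dirichlet-multinomial marginal likelihood of a context's counts,
# under a symmetric Dirichlet(alpha) prior on transition probabilities.
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha: float = 0.5) -> float:
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (gammaln(k * alpha) - gammaln(counts.sum() + k * alpha)
            + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))

# The evidence for a candidate context tree combines such terms over its
# contexts, e.g. two contexts with binary transition counts:
print(log_marginal([40, 10]) + log_marginal([5, 45]))
```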
We study the accuracy of differentially private mechanisms in the continual release model. A continual release mechanism receives a sensitive dataset as a stream of $T$ inputs and produces, after receiving each input, an accurate output on the inputs obtained so far. In contrast, a batch algorithm receives the data as one batch and produces a single output. We provide the first strong lower bounds on the error of continual release mechanisms. In particular, for two fundamental problems that are widely studied and used in the batch model, we show that the worst-case error of every continual release algorithm is $\tilde \Omega(T^{1/3})$ times larger than that of the best batch algorithm. Previous work shows only a polylogarithmic (in $T$) gap between the worst-case error achievable in these two models; further, for many problems, including the summation of binary attributes, the polylogarithmic gap is tight (Dwork et al., 2010; Chan et al., 2010). Our results show that problems closely related to summation -- specifically, those that require selecting the largest of a set of sums -- are fundamentally harder in the continual release model than in the batch model. Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive" setting). However, we provide matching upper bounds that hold in a model where privacy is required even for adaptively selected streams. This model may be of independent interest.
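For contrast with the new lower bound, here is a hedged sketch of the binary-tree mechanism for private continual counting (Dwork et al., 2010; Chan et al., 2010), the baseline that achieves the polylogarithmic gap for summation of binary attributes; parameters are illustrative.

```python
# Binary-tree mechanism: noisy sums are stored for completed dyadic blocks,
# and each running count is assembled from O(log T) of them. Each stream
# element lands in one node per level, so Laplace(levels/eps) noise per
# node gives eps-DP overall by basic composition.
import math
import numpy as np

def tree_counter(stream, eps):
    T = len(stream)
    levels = int(math.ceil(math.log2(T))) + 1
    noise = {}                      # (level, index) -> noisy dyadic-block sum
    outputs = []
    for t in range(1, T + 1):
        for lvl in range(levels):   # store blocks completed by element t-1
            idx = (t - 1) >> lvl
            if ((idx + 1) << lvl) <= t and (lvl, idx) not in noise:
                block = stream[idx << lvl : (idx + 1) << lvl]
                noise[(lvl, idx)] = sum(block) + np.random.laplace(0, levels / eps)
        # Decompose [0, t) into completed dyadic blocks, largest first.
        est, pos = 0.0, 0
        for lvl in reversed(range(levels)):
            if pos + (1 << lvl) <= t:
                est += noise[(lvl, pos >> lvl)]
                pos += 1 << lvl
        outputs.append(est)
    return outputs

print(tree_counter(np.random.randint(0, 2, 16).tolist(), eps=1.0))
```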
Global Positioning Systems are now a standard module in mobile devices, and their ubiquity is fueling the rapid growth of location-based services (LBSs). This poses the risk of location privacy disclosure, and effective location privacy preservation is therefore paramount for mobile applications. Recently, two strong privacy notions based on statistical quantification have been proposed: geo-indistinguishability and expected inference error. They are shown to be complementary for limiting the leakage of location information. In this paper, we argue that personalization means regionalization for geo-indistinguishability, and we propose a regionalized location obfuscation mechanism with personalized utility sensitivities. This substantially corrects the differential privacy problem of the PIVE framework proposed by Yu, Liu and Pu at the ISOC Network and Distributed System Security Symposium (NDSS) in 2017. Since PIVE fails to provide differential privacy guarantees on adaptively selected protection location sets (PLSs), as pointed out in our previous work, we develop DPIVE with two phases. In Phase I, we determine disjoint sets by partitioning all possible positions such that different locations in the same set share a common PLS. In Phase II, we construct a probability distribution matrix via the exponential mechanism, in which rows corresponding to the same PLS use their own utility sensitivity (the diameter of the PLS). Moreover, we improve DPIVE with refined location partitioning and fine-grained personalization, in which each location has its own privacy level on two privacy control knobs: minimum inference error and differential privacy parameter. Experiments with two public datasets demonstrate that our mechanisms perform particularly well on skewed location distributions.
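A minimal sketch of the Phase II construction described above, with illustrative coordinates and $\epsilon$: each true location $x$ in a PLS is reported as $z$ with probability proportional to $\exp(-\epsilon\, d(x,z)/(2D))$, where $D$ is the PLS diameter acting as the utility sensitivity.

```python
# One row of the obfuscation probability matrix for true location x,
# built with the exponential mechanism; the PLS diameter is the
# sensitivity of the (distance-based) utility.
import numpy as np

def obfuscation_row(x, candidates, pls, eps):
    d = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    diam = max(d(p, q) for p in pls for q in pls)   # utility sensitivity
    scores = np.array([np.exp(-eps * d(x, z) / (2 * diam)) for z in candidates])
    return scores / scores.sum()

pls = [(0, 0), (0, 1), (1, 0)]                # locations sharing this PLS
candidates = [(0, 0), (0, 1), (1, 0), (2, 2)] # reportable locations
print(obfuscation_row((0, 0), candidates, pls, eps=1.0))
```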
The aim of this thesis is to develop a theoretical framework for studying parameter estimation of quantum channels. We study the task of estimating unknown parameters encoded in a channel in the sequential setting. A sequential strategy is the most general way to use a channel multiple times. Our goal is to establish lower bounds (called Cram\'er-Rao bounds) on the estimation error. The bounds we develop are universally applicable; i.e., they apply to all permissible quantum dynamics. We consider the use of catalysts to enhance the power of a channel estimation strategy, which is termed amortization. The power of a channel for parameter estimation is determined by its Fisher information. Thus, we study how much a catalyst quantum state can enhance the Fisher information of a channel by defining the amortized Fisher information. We establish our bounds by proving that for certain Fisher information quantities, catalyst states do not improve the performance of a sequential estimation protocol over a parallel one; the technical term for this is an amortization collapse. We use this to establish bounds when estimating one parameter or multiple parameters simultaneously. Our bounds apply universally, and we also cast them as optimization problems. For the single-parameter case, we establish bounds for general quantum channels using both the symmetric logarithmic derivative (SLD) Fisher information and the right logarithmic derivative (RLD) Fisher information. The task of estimating multiple parameters simultaneously is more involved than the single-parameter case, because the Cram\'er-Rao bounds take the form of matrix inequalities. We establish a scalar Cram\'er-Rao bound for multiparameter channel estimation using the RLD Fisher information. For both single- and multiparameter estimation, we provide a no-go condition for so-called Heisenberg scaling using our RLD-based bound.
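For background, the standard single-parameter quantum Cram\'er-Rao bound takes the textbook form (this is the baseline the channel bounds above refine; the notation is illustrative)
$$ \mathrm{Var}(\hat\theta) \;\geq\; \frac{1}{n\, F_Q(\theta)}, $$
where $F_Q(\theta)$ is the SLD Fisher information of the output state family and $n$ is the number of channel uses. Heisenberg scaling refers to an estimation error decaying as $1/n^2$ rather than the $1/n$ of this bound, which is what the RLD-based no-go condition rules out for the channels it covers.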
Percentiles, and more generally quantiles, are commonly used in various contexts to summarize data. For most distributions, there is exactly one quantile that can be estimated without bias. For distributions like the Gaussian, whose mean and median coincide, that quantile is the median. Various ways to estimate quantiles from finite samples have been described in the literature and implemented in statistics packages. It is possible to leverage the memoryless property of the exponential distribution and design high-quality estimators that are unbiased and have low variance and mean squared error. Naturally, these estimators outperform the ones in statistical packages when the underlying distribution is exponential. But they also happen to generalize well when that assumption is violated.
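A minimal sketch of one estimator in this spirit (an illustrative construction, not necessarily the paper's): for $X \sim \mathrm{Exp}(\lambda)$ the $q$-quantile is $-\ln(1-q)/\lambda$, so plugging the sample mean in for $1/\lambda$ gives an unbiased estimate, since the sample mean is unbiased for $1/\lambda$ and the quantile is linear in it.

```python
# Compare the plug-in exponential quantile estimator -log(1-q)*mean(x)
# with numpy's default sample-quantile estimator on exponential data.
import numpy as np

rng = np.random.default_rng(0)
q, lam, n, trials = 0.9, 2.0, 20, 100_000
true_quantile = -np.log(1 - q) / lam

samples = rng.exponential(1 / lam, size=(trials, n))
plug_in = -np.log(1 - q) * samples.mean(axis=1)   # unbiased for Exp(lam)
standard = np.quantile(samples, q, axis=1)

for name, est in [("plug-in", plug_in), ("np.quantile", standard)]:
    print(f"{name:12s} bias={est.mean() - true_quantile:+.4f} "
          f"mse={np.mean((est - true_quantile) ** 2):.4f}")
```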
In this paper, we bring consumer theory to bear on the analysis of Fisher markets whose buyers have arbitrary continuous, concave, homogeneous (CCH) utility functions representing locally non-satiated preferences. The main tools we use are the dual concepts of expenditure minimization and indirect utility maximization. First, we use expenditure functions to construct a new convex program whose dual, like the dual of the Eisenberg-Gale program, characterizes the equilibrium prices of CCH Fisher markets. We then prove that the subdifferential of the dual of our convex program is equal to the negative excess demand in the associated market, which makes generalized gradient descent equivalent to computing equilibrium prices via t\^atonnement. Finally, we run a series of experiments which suggest that t\^atonnement may converge at a rate of $O\left(\frac{1+E}{t^2}\right)$ in CCH Fisher markets comprising buyers with elasticity of demand bounded by $E$. Our novel characterization of equilibrium prices may provide a path to proving the convergence of t\^atonnement in Fisher markets beyond those in which buyers' utilities exhibit constant elasticity of substitution.
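A minimal sketch of t\^atonnement in a small Fisher market, assuming CES utilities $u_i(x) = (\sum_j a_{ij} x_j^{\rho})^{1/\rho}$ with $\rho < 1$ (a special case of CCH); all numbers are illustrative. Prices move in the direction of excess demand, mirroring the generalized-gradient view above.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 1.0]])   # buyer-by-good valuations
b = np.array([1.0, 1.0])                  # budgets; supply of each good is 1
rho = 0.5                                  # CES parameter (rho < 1)
sigma = 1.0 / (1.0 - rho)                  # elasticity of substitution

def excess_demand(p):
    # CES demand: x_ij = b_i (a_ij/p_j)^sigma / sum_k a_ik^sigma p_k^(1-sigma)
    w = (a / p) ** sigma
    x = b[:, None] * w / (a ** sigma @ p ** (1.0 - sigma))[:, None]
    return x.sum(axis=0) - 1.0

p = np.ones(2)
for t in range(1, 5001):
    p = np.maximum(p + excess_demand(p) / t, 1e-6)  # shrinking step size

print("prices:", p, "excess demand:", excess_demand(p))
```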
We analyze the effect of lossy compression in the processing of sensor signals that must be used to detect anomalous events in the system under observation. The intuitive relationship between the quality loss at higher compression and the ability to tell anomalous behaviours from normal ones is formalized in terms of information-theoretic quantities. Some analytic derivations are carried out within the Gaussian framework, partly in the asymptotic regime with respect to the length of the signal windows considered. Analytical conclusions are matched against the performance of practical detectors in a toy case that allows different compression/detector configurations to be assessed.
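A toy sketch in the spirit of this trade-off (the setup is entirely illustrative): normal and anomalous windows are Gaussian with different covariances, "compression" keeps only the $k$ leading principal components of the normal class, and detection scores the energy in the retained subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim = 4000, 32
normal = rng.normal(size=(n, dim))
anomal = rng.normal(size=(n, dim)) * 1.5          # anomaly: inflated variance

_, _, vt = np.linalg.svd(normal, full_matrices=False)
for k in (32, 8, 2):                               # increasing compression
    proj = vt[:k]                                  # keep k components
    score = lambda x: np.sum((x @ proj.T) ** 2, axis=1)
    s0, s1 = score(normal), score(anomal)
    # AUC as P(anomalous score > normal score) over all pairs
    auc = np.mean(s1[:, None] > s0[None, :])
    print(f"k={k:2d} detection AUC={auc:.3f}")
```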
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent and $\mathcal{O}(1/t)$ for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
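An illustrative sketch of the setting (generic Nesterov momentum on the exponential loss, not the paper's exact dual-derived algorithm): on separable data, the normalized margin $\min_i y_i \langle w, x_i\rangle / \|w\|$ grows as training proceeds.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.array([3.0, 3.0])
y = np.ones(100)
X[:50] *= -1; y[:50] = -1                  # linearly separable two-class data

def grad(w):                               # gradient of mean exponential loss
    m = y * (X @ w)
    return -(X * (y * np.exp(-m))[:, None]).mean(axis=0)

w = np.zeros(2); v = np.zeros(2)
for t in range(1, 20001):
    beta = (t - 1) / (t + 2)               # Nesterov momentum schedule
    look = w + beta * v                    # look-ahead point
    w_next = look - 0.05 * grad(look)
    v, w = w_next - w, w_next
    if t % 5000 == 0:
        margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"t={t:6d} normalized margin={margin:.4f}")
```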
Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the varying hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a device to become unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but takes heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including an accuracy drop of up to 9.2%, training time lengthened by up to 2.32x, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two of them. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should account for heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.
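A minimal sketch of the kind of heterogeneity-aware simulation such a study motivates (all numbers illustrative): a FedAvg round in which each selected device may fail to train or to upload, so aggregation averages only the updates that actually arrive.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_devices = 10, 100
global_model = np.zeros(dim)
p_available, p_upload = 0.8, 0.9          # per-round heterogeneity knobs

def local_update(model, device_id):
    # Stand-in for local SGD; each device pulls toward its own optimum.
    local_opt = rng.normal(loc=device_id % 5, size=dim)
    return model + 0.1 * (local_opt - model)

selected = rng.choice(n_devices, size=20, replace=False)
updates = []
for dev in selected:
    if rng.random() > p_available:        # device offline or too slow
        continue
    update = local_update(global_model, dev)
    if rng.random() > p_upload:           # upload failure
        continue
    updates.append(update)

if updates:                                # aggregate surviving updates only
    global_model = np.mean(updates, axis=0)
print(f"{len(updates)}/{len(selected)} updates aggregated")
```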