Private inference refers to a two-party setting in which one has a model (e.g., a linear classifier), the other has data, and the model is to be applied over the data while safeguarding the privacy of both parties. In particular, models in which the weights are quantized (e.g., to 1 or -1) gained increasing attention lately, due to their benefits in efficient, private, or robust computations. Traditionally, private inference has been studied from a cryptographic standpoint, which suffers from high complexity and degraded accuracy. More recently, Raviv et al. showed that in quantized models, an information theoretic tradeoff exists between the privacy of the parties, and a scheme based on a combination of Boolean and real-valued algebra was presented which attains that tradeoff. Both the scheme and the respective bound required the computation to be done exactly. In this work we show that by relaxing the requirement for exact computation, one can break the information theoretic privacy barrier of Raviv et al., and provide better privacy at the same communication costs. We provide a scheme for such approximate computation, bound its error, show its improved privacy, and devise a respective lower bound for some parameter regimes.
We propose to use L\'evy {\alpha}-stable distributions for constructing priors for Bayesian inverse problems. The construction is based on Markov fields with stable-distributed increments. Special cases include the Cauchy and Gaussian distributions, with stability indices {\alpha} = 1, and {\alpha} = 2, respectively. Our target is to show that these priors provide a rich class of priors for modelling rough features. The main technical issue is that the {\alpha}-stable probability density functions do not have closed-form expressions in general, and this limits their applicability. For practical purposes, we need to approximate probability density functions through numerical integration or series expansions. Current available approximation methods are either too time-consuming or do not function within the range of stability and radius arguments needed in Bayesian inversion. To address the issue, we propose a new hybrid approximation method for symmetric univariate and bivariate {\alpha}-stable distributions, which is both fast to evaluate, and accurate enough from a practical viewpoint. Then we use approximation method in the numerical implementation of {\alpha}-stable random field priors. We demonstrate the applicability of the constructed priors on selected Bayesian inverse problems which include the deconvolution problem, and the inversion of a function governed by an elliptic partial differential equation. We also demonstrate hierarchical {\alpha}-stable priors in the one-dimensional deconvolution problem. We employ maximum-a-posterior-based estimation at all the numerical examples. To that end, we exploit the limited-memory BFGS and its bounded variant for the estimator.
Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
Moving horizon estimation (MHE) is a widely studied state estimation approach in several practical applications. In the MHE problem, the state estimates are obtained via the solution of an approximated nonlinear optimization problem. However, this optimization step is known to be computationally complex. Given this limitation, this paper investigates the idea of iteratively preconditioned gradient-descent (IPG) to solve MHE problem with the aim of an improved performance than the existing solution techniques. To our knowledge, the preconditioning technique is used for the first time in this paper to reduce the computational cost and accelerate the crucial optimization step for MHE. The convergence guarantee of the proposed iterative approach for a class of MHE problems is presented. Additionally, sufficient conditions for the MHE problem to be convex are also derived. Finally, the proposed method is implemented on a unicycle localization example. The simulation results demonstrate that the proposed approach can achieve better accuracy with reduced computational costs.
In this work, we present a generic step-size choice for the ADMM type proximal algorithms. It admits a closed-form expression and is theoretically optimal with respect to a worst-case convergence rate bound. It is simply given by the ratio of Euclidean norms of the dual and primal solutions, i.e., $ ||{\lambda}^\star|| / ||{x}^\star||$. Numerical tests show that its practical performance is near-optimal in general. The only challenge is that such a ratio is not known a priori and we provide two strategies to address it. The derivation of our step-size choice is based on studying the fixed-point structure of ADMM using the proximal operator. However, we demonstrate that the classical proximal operator definition contains an input scaling issue. This leads to a scaled step-size optimization problem which would yield a false solution. Such an issue is naturally avoided by our proposed new definition of the proximal operator. A series of its properties is established.
Federated Learning (FL) allows multiple participating clients to train machine learning models collaboratively by keeping their datasets local and only exchanging the gradient or model updates with a coordinating server. Existing FL protocols were shown to be vulnerable to attacks that aim to compromise data privacy and/or model robustness. Recently proposed defenses focused on ensuring either privacy or robustness, but not both. In this paper, we focus on simultaneously achieving differential privacy (DP) and Byzantine robustness for cross-silo FL, based on the idea of learning from history. The robustness is achieved via client momentum, which averages the updates of each client over time, thus reduces the variance of the honest clients and exposes the small malicious perturbations of Byzantine clients that are undetectable in a single round but accumulate over time. In our initial solution DP-BREM, the DP property is achieved via adding noise to the aggregated momentum, and we account for the privacy cost from the momentum, which is different from the conventional DP-SGD that accounts for the privacy cost from gradient. Since DP-BREM assumes a trusted server (who can obtain clients' local models or updates), we further develop the final solution called DP-BREM+, which achieves the same DP and robustness properties as DP-BREM without a trusted server by utilizing secure aggregation techniques, where DP noise is securely and jointly generated by the clients. Our theoretical analysis on the convergence rate and experimental results under different DP guarantees and attack settings demonstrate that our proposed protocols achieve better privacy-utility tradeoff and stronger Byzantine robustness than several baseline methods.
Given a dataset of $n$ i.i.d. samples from an unknown distribution $P$, we consider the problem of generating a sample from a distribution that is close to $P$ in total variation distance, under the constraint of differential privacy (DP). We study the problem when $P$ is a multi-dimensional Gaussian distribution, under different assumptions on the information available to the DP mechanism: known covariance, unknown bounded covariance, and unknown unbounded covariance. We present new DP sampling algorithms, and show that they achieve near-optimal sample complexity in the first two settings. Moreover, when $P$ is a product distribution on the binary hypercube, we obtain a pure-DP algorithm whereas only an approximate-DP algorithm (with slightly worse sample complexity) was previously known.
The approximate stabilizer rank of a quantum state is the minimum number of terms in any approximate decomposition of that state into stabilizer states. Bravyi and Gosset showed that the approximate stabilizer rank of a so-called "magic" state like $|T\rangle^{\otimes n}$, up to polynomial factors, is an upper bound on the number of classical operations required to simulate an arbitrary quantum circuit with Clifford gates and $n$ number of $T$ gates. As a result, an exponential lower bound on this quantity seems inevitable. Despite this intuition, several attempts using various techniques could not lead to a better than a linear lower bound on the "exact" rank of ${|T\rangle}^{\otimes n}$, meaning the minimal size of a decomposition that exactly produces the state. For the "approximate" rank, which is more realistically related to the cost of simulating quantum circuits, no lower bound better than $\tilde \Omega(\sqrt n)$ has been known. In this paper, we improve the lower bound on the approximate rank to $\tilde \Omega (n^2)$ for a wide range of the approximation parameters. An immediate corollary of our result is the existence of polynomial time computable functions which require a super-linear number of terms in any decomposition into exponentials of quadratic forms over $\mathbb{F}_2$, resolving a question in [Wil18]. Our approach is based on a strong lower bound on the approximate rank of a quantum state sampled from the Haar measure, a step-by-step analysis of the approximate rank of a magic-state teleportation protocol to sample from the Haar measure, and a result about trading Clifford operations with $T$ gates by [LKS18].
We present a novel technique to estimate the 6D pose of objects from single images where the 3D geometry of the object is only given approximately and not as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model also estimates the pixel-wise coordinate error to discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses of the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation by which we can estimate the probability for each hypothesis and select the most likely one. As we show in experiments, our approach is capable of dealing with extreme visual conditions including overexposure, high contrast, or low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
State-space models (SSMs) are a powerful statistical tool for modelling time-varying systems via a latent state. In these models, the latent state is never directly observed. Instead, a sequence of data points related to the state are obtained. The linear-Gaussian state-space model is widely used, since it allows for exact inference when all model parameters are known, however this is rarely the case. The estimation of these parameters is a very challenging but essential task to perform inference and prediction. In the linear-Gaussian model, the state dynamics are described via a state transition matrix. This model parameter is known to behard to estimate, since it encodes the relationships between the state elements, which are never observed. In many applications, this transition matrix is sparse since not all state components directly affect all other state components. However, most parameter estimation methods do not exploit this feature. In this work we propose SpaRJ, a fully probabilistic Bayesian approach that obtains sparse samples from the posterior distribution of the transition matrix. Our method explores sparsity by traversing a set of models that exhibit differing sparsity patterns in the transition matrix. Moreover, we also design new effective rules to explore transition matrices within the same level of sparsity. This novel methodology has strong theoretical guarantees, and unveils the latent structure of the data generating process, thereby enhancing interpretability. The performance of SpaRJ is showcased in example with dimension 144 in the parameter space, and in a numerical example with real data.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.