Decentralized optimization is gaining increased traction due to its widespread applications in large-scale machine learning and multi-agent systems. However, the same mechanism that enables its success, i.e., information sharing among participating agents, also leads to the disclosure of individual agents' private information, which is unacceptable when sensitive data are involved. As differential privacy is becoming a de facto standard for privacy preservation, results have recently emerged that integrate differential privacy with distributed optimization. However, directly incorporating differential-privacy designs into existing distributed optimization approaches significantly compromises optimization accuracy. In this paper, we redesign and tailor gradient methods for differentially private distributed optimization, proposing two differential-privacy-oriented gradient methods that ensure both rigorous epsilon-differential privacy and optimality. The first algorithm builds on static-consensus-based gradient methods, while the second builds on dynamic-consensus (gradient-tracking) based distributed optimization methods and is hence applicable to general directed interaction graph topologies. Both algorithms simultaneously ensure almost sure convergence to an optimal solution and a finite privacy budget, even as the number of iterations goes to infinity. To our knowledge, this is the first time both goals have been achieved simultaneously. Numerical simulations on a distributed estimation problem and experimental results on a benchmark dataset confirm the effectiveness of the proposed approaches.
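To make the mechanism concrete, the following is a minimal sketch of a noise-perturbed consensus-plus-gradient iteration of the kind such algorithms build on; the doubly stochastic weight matrix `W`, the diminishing stepsize, and the decaying Laplace noise scale are illustrative assumptions, not the specific schedules designed in the paper.

```python
import numpy as np

def dp_consensus_gradient_step(x, W, local_grads, k, noise_scale=1.0):
    """One hedged, illustrative DP consensus + gradient step.

    x: (n_agents, dim) current iterates; W: doubly stochastic mixing matrix;
    local_grads: callable returning per-agent gradients of the local objectives.
    The stepsize and noise decay below are placeholder choices.
    """
    n, d = x.shape
    # Each agent perturbs the state it shares, so neighbors only see noisy iterates.
    shared = x + np.random.laplace(scale=noise_scale / (k + 1), size=(n, d))
    step = 1.0 / (k + 1)  # diminishing stepsize
    return W @ shared - step * local_grads(x)
```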
This paper introduces a new method that takes any Bayesian model used to generate synthetic data and converts it into a differentially private (DP) mechanism. We propose an alteration of the model synthesizer to utilize a censored likelihood that induces lower and upper bounds of $[\exp(-\epsilon / 2), \exp(\epsilon / 2)]$, where $\epsilon$ denotes the level of the DP guarantee. This censoring mechanism, equipped with an $\epsilon$-DP guarantee, induces distortion into the joint parameter posterior distribution by flattening or shifting the distribution towards a weakly informative prior. To minimize the distortion induced by likelihood censoring, we embed a vector-weighted pseudo posterior mechanism within the censoring mechanism. The pseudo posterior is formulated by selectively downweighting each likelihood contribution in proportion to its disclosure risk. On its own, the pseudo posterior mechanism provides only a weaker asymptotic differential privacy (aDP) guarantee. After embedding in the censoring mechanism, the DP guarantee becomes strict and no longer relies on asymptotics. We demonstrate that the pseudo posterior mechanism alone creates synthetic data with the highest utility at the price of the weaker aDP guarantee, while embedding it in the proposed censoring mechanism produces synthetic data with a stronger, non-asymptotic DP guarantee at the cost of slightly reduced utility. The perturbed histogram mechanism is included for comparison.
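For reference, the vector-weighted pseudo posterior mentioned above is commonly written in the following form (a sketch with assumed notation: $\xi(\theta)$ is the prior and $\alpha_i \in [0,1]$ the per-record weights):
\[
\xi^{\boldsymbol{\alpha}}(\theta \mid \mathbf{x}) \propto \left[\prod_{i=1}^{n} p(x_i \mid \theta)^{\alpha_i}\right]\xi(\theta), \qquad \alpha_i \in [0,1],
\]
where each $\alpha_i$ is set inversely to the disclosure risk of record $i$; the censoring step then enforces the stated $[\exp(-\epsilon/2), \exp(\epsilon/2)]$ bounds.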
Strong secrecy communication over a discrete memoryless state-dependent multiple access channel (SD-MAC) with an external eavesdropper is investigated. The channel is governed by discrete memoryless and i.i.d. channel states, and the channel state information (CSI) is revealed to the encoders in a causal manner. An inner bound on the capacity region is provided. To establish the inner bound, we investigate coding schemes that incorporate wiretap coding and secret-key agreement between the sender and the legitimate receiver. Two kinds of block Markov coding schemes are studied. The first uses backward decoding and Wyner-Ziv coding, with the secret key constructed from a lossy reproduction of the CSI. The other is an extended version of an existing coding scheme for point-to-point wiretap channels with causal CSI. We further investigate some capacity-achieving cases for state-dependent multiple access wiretap channels (SD-MAWCs) with degraded message sets. It turns out that the two coding schemes are both optimal in these cases.
Privacy in AI has drawn attention from both researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping threshold, which requires careful tuning due to its significant impact on model performance. As a solution to this issue, recent works NSGD and Auto-S propose to use normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on samples with small gradients and introduces extra deviation into the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the hyperparameter tuning typically required by constant clipping, while significantly reducing the deviation between the update and the true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that, at the same order of convergence rate, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, than NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches state-of-the-art methods on multiple mainstream vision and language tasks.
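The following is a minimal sketch contrasting constant clipping, normalization, and a non-monotonic per-sample weight; the specific weight function shown is a hypothetical stand-in for illustration (not the exact DP-PSAC weight), and the noise calibration is simplified.

```python
import numpy as np

def clip_constant(g, C):
    # Constant clipping: requires tuning the threshold C.
    return g * min(1.0, C / (np.linalg.norm(g) + 1e-12))

def normalize(g, r=0.01):
    # NSGD / Auto-S style monotone weight 1 / (||g|| + r).
    return g / (np.linalg.norm(g) + r)

def nonmonotonic_weight(g, r=0.01):
    # Hypothetical non-monotonic weight: close to normalization for large
    # gradients, but it damps the contribution of very small gradients.
    n = np.linalg.norm(g)
    return g * n / (n**2 + r)

def private_batch_gradient(per_sample_grads, weight_fn, sigma):
    # Simplified DP step: average weighted per-sample gradients and add Gaussian
    # noise; in practice sigma must be calibrated to the weight function's
    # sensitivity bound.
    weighted = np.stack([weight_fn(g) for g in per_sample_grads])
    noise = sigma * np.random.normal(size=weighted.shape[1])
    return weighted.mean(axis=0) + noise / len(per_sample_grads)
```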
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainties become easily available, facilitating rigorous statistical analysis, and prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both structural and parameter uncertainty. In this paper, we apply the concept of model uncertainty as a framework for structural learning in BNNs and hence make inference in the joint space of structures/models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Experimental results on a range of benchmark datasets show that we obtain accuracy comparable to that of the competing models, using methods that are much sparser than ordinary BNNs.
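As a rough sketch of inference in the joint space of structures and parameters (assumed mean-field spike-and-slab notation, not necessarily the paper's exact parameterization), the forward pass of one layer can propagate both inclusion probabilities and weight moments via the local reparametrization trick:

```python
import numpy as np

def spike_slab_forward(x, mu, sigma, alpha, rng):
    """Sketch of a variational layer with weights w = gamma * theta, where
    gamma ~ Bernoulli(alpha) encodes structural uncertainty and
    theta ~ N(mu, sigma^2) encodes parameter uncertainty."""
    w_mean = alpha * mu                                   # E[w]
    w_var = alpha * (sigma**2 + mu**2) - (alpha * mu)**2  # Var[w]
    z_mean = x @ w_mean                                   # pre-activation mean
    z_var = (x**2) @ w_var                                # pre-activation variance
    # Local reparametrization: sample the pre-activations directly.
    return z_mean + np.sqrt(z_var + 1e-12) * rng.standard_normal(z_mean.shape)
```

Inclusion probabilities driven towards zero effectively prune the corresponding connections, which is how sparse structures arise in this kind of setup.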
Integer linear programming models a wide range of practical combinatorial optimization problems and has significant impact in industry and management sectors. This work develops the first standalone local search solver for general integer linear programming validated on a large heterogeneous problem dataset. We propose a local search framework that switches between three modes, namely Search, Improve, and Restore, and design tailored operators adapted to each mode, thus improving the quality of the current solution according to the situation at hand. For the Search and Restore modes, we propose an operator named tight move, which adaptively modifies variables' values in an attempt to make some constraint tight. For the Improve mode, an efficient operator named lift move is proposed to improve the objective value while maintaining feasibility. Putting these together, we develop a local search solver for integer linear programming called Local-ILP. Experiments conducted on the MIPLIB dataset show the effectiveness of our solver in solving large-scale hard integer linear programming problems within a reasonably short time. Local-ILP is competitive with and complementary to the state-of-the-art commercial solver Gurobi, and significantly outperforms the state-of-the-art non-commercial solver SCIP. Moreover, our solver establishes new records for 6 MIPLIB open instances.
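The following is a hedged sketch of the idea behind a tight move on a violated constraint $a^\top x \le b$: change a single variable's value just enough, in integers, to make the constraint satisfied or tight. It illustrates the idea only; the actual operator and its scoring in Local-ILP may differ.

```python
import math

def tight_move(x, a, b, j):
    """Return a new value for variable j so that the constraint
    sum_k a[k] * x[k] <= b becomes satisfied (tight up to integrality).
    x and a are dicts keyed by variable index."""
    slack_without_j = b - sum(a[k] * x[k] for k in a if k != j)
    if a[j] > 0:
        return math.floor(slack_without_j / a[j])
    if a[j] < 0:
        return math.ceil(slack_without_j / a[j])
    return x[j]

# Example: 2*x1 + 3*x2 <= 7 is violated at x1 = 3, x2 = 2;
# tight_move({1: 3, 2: 2}, {1: 2, 2: 3}, 7, 2) proposes x2 = 0,
# the largest integer value that restores feasibility of this constraint.
```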
We provide exact expressions for the 1-Wasserstein distance between independent location-scale distributions. The expressions are given in terms of the location and scale parameters and special functions such as the standard Gaussian CDF or the Gamma function. Specifically, we find that the 1-Wasserstein distance between independent univariate location-scale distributions is equivalent to the mean of a folded distribution within the same family, whose underlying location and scale equal the differences of the locations and scales of the original distributions. A new linear upper bound on the 1-Wasserstein distance is presented, and asymptotic bounds of the 1-Wasserstein distance are detailed in the Gaussian case. The effect of differential privacy using the Laplace and Gaussian mechanisms on the 1-Wasserstein distance is studied using the closed-form expressions and bounds.
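As a worked instance of the folded-distribution characterization (a sketch consistent with the statement above, with $\theta = \mu_1 - \mu_2$, $\delta = \sigma_1 - \sigma_2$, and $\Phi$ the standard Gaussian CDF), the quantile-coupling representation of the univariate 1-Wasserstein distance gives, in the Gaussian case,
\[
W_1\big(\mathcal{N}(\mu_1,\sigma_1^2),\, \mathcal{N}(\mu_2,\sigma_2^2)\big)
= \mathbb{E}\,\lvert \theta + \delta Z \rvert
= \lvert\delta\rvert \sqrt{\tfrac{2}{\pi}}\, e^{-\theta^2/(2\delta^2)} + \theta\left(1 - 2\Phi\!\left(-\tfrac{\theta}{\lvert\delta\rvert}\right)\right), \qquad Z \sim \mathcal{N}(0,1),
\]
i.e., the mean of a folded normal with location $\theta$ and scale $\lvert\delta\rvert$ (when $\delta = 0$ the distance reduces to $\lvert\theta\rvert$).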
This paper proposes a method for assessing differential item functioning (DIF) in item response theory (IRT) models. The method does not require pre-specification of anchor items, which is its main virtue. It is developed in two main steps: first, by showing how DIF can be re-formulated as a problem of outlier detection in IRT-based scaling; then, by tackling the latter using established methods from robust statistics. The proposal is a redescending M-estimator of IRT scaling parameters that is tuned to flag items with DIF at the desired asymptotic Type I error rate. One way of quantifying the robustness of the estimator is in terms of its finite-sample breakdown point, which is shown to equal 1/2 (i.e., the estimator remains bounded whenever fewer than 1/2 of the items on an assessment exhibit DIF). This theoretical result is complemented by simulation studies that illustrate the performance of the estimator and its associated test of DIF. The simulation studies show that the proposed method compares favorably to currently available approaches, and a real data example illustrates its application in a research context where pre-specification of anchor items is infeasible. The focus of the paper is the two-parameter logistic model in two independent groups, with extensions to other settings considered in the conclusion.
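To illustrate the flavor of the robust-statistics step (a hedged sketch, not the paper's estimator: the tuning constant, the scale estimate, and the flagging rule are illustrative), a redescending M-estimator such as Tukey's biweight downweights, and eventually ignores, items whose scaling residuals are large, which is exactly how DIF items behave as outliers:

```python
import numpy as np

def tukey_weights(r, c=4.685):
    # Redescending weight: zero for residuals beyond the tuning constant c.
    w = np.zeros_like(r, dtype=float)
    inside = np.abs(r) < c
    w[inside] = (1.0 - (r[inside] / c) ** 2) ** 2
    return w

def robust_scaling_shift(b_ref, b_focal, c=4.685, iters=50):
    """Robustly estimate the shift between two groups' item difficulties by
    iteratively reweighted averaging; items receiving zero weight are
    candidate DIF items."""
    d = np.asarray(b_focal) - np.asarray(b_ref)
    scale = 1.4826 * np.median(np.abs(d - np.median(d)))  # MAD scale estimate
    mu = np.median(d)
    for _ in range(iters):
        w = tukey_weights((d - mu) / (scale + 1e-12), c)
        mu = np.sum(w * d) / (np.sum(w) + 1e-12)
    flagged = tukey_weights((d - mu) / (scale + 1e-12), c) == 0
    return mu, flagged
```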
We study numerical integration over bounded regions in $\mathbb{R}^s$, $s\ge1$, with respect to some probability measure. We replace random sampling with quasi-Monte Carlo methods, where the underlying point set is derived from deterministic constructions that aim to fill the space more evenly than random points. Such quasi-Monte Carlo point sets are ordinarily designed for the uniform measure, and the theory only works for product measures when a coordinate-wise transformation is applied. Going beyond this setting, we first consider the case where the target density is a mixture distribution in which each term of the mixture comes from a product distribution. Next we consider target densities that can be approximated with such mixture distributions. We require the approximation to be a sum of coordinate-wise products and to be positive everywhere (so that it can be re-scaled to a probability density function). We use tensor-product hat function approximations for this purpose, since a hat function approximation of a positive function is itself positive. We also study more complex algorithms, where we first approximate the target density with a general Gaussian mixture distribution and then approximate the mixture components with an adaptive hat function approximation on rotated intervals. The Gaussian mixture approximation allows us to locate the essential parts of the target density, whereas the adaptive hat function approximation allows us to capture its finer structure. We prove convergence rates for each of the integration techniques based on quasi-Monte Carlo sampling for integrands with bounded partial mixed derivatives. The employed algorithms are based on digital $(t,s)$-sequences over the finite field $\mathbb{F}_2$ and an inversion method. Numerical examples illustrate the performance of the algorithms for some target densities and integrands.
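For the simplest setting above, a target that is already a mixture of product (here, independent Gaussian) distributions, the integral decomposes as $\mathbb{E}_\pi[f] = \sum_k w_k\, \mathbb{E}_{\pi_k}[f]$, and each component can be handled by coordinate-wise inversion of a low-discrepancy point set. The sketch below uses SciPy's Sobol sequence as a stand-in for the digital $(t,s)$-sequences over $\mathbb{F}_2$ employed in the paper; the function names and parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_mixture_integral(f, weights, locs, scales, dim, m=12):
    """Estimate E_pi[f] for pi = sum_k weights[k] * prod_j N(locs[k][j], scales[k][j]^2)
    by mapping a scrambled Sobol point set through coordinate-wise inversion."""
    total = 0.0
    for w, loc, scale in zip(weights, locs, scales):
        u = qmc.Sobol(d=dim, scramble=True).random_base2(m)  # 2^m points in [0, 1)^dim
        x = norm.ppf(u, loc=loc, scale=scale)                # inversion, coordinate by coordinate
        total += w * np.mean(f(x))
    return total

# Example: integrate f(x) = ||x||^2 against a two-component Gaussian mixture in R^2.
f = lambda x: np.sum(x**2, axis=1)
estimate = qmc_mixture_integral(f, [0.3, 0.7],
                                [np.zeros(2), np.ones(2)],
                                [np.ones(2), 0.5 * np.ones(2)], dim=2)
```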
This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for the inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. We, furthermore, propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.
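For context on the connection to econometrics mentioned above, the general K-class estimator for a structural equation $Y = X\beta + \varepsilon$ with instruments $Z$ takes the standard form
\[
\hat\beta(\kappa) = \big(X^\top (I - \kappa M_Z) X\big)^{-1} X^\top (I - \kappa M_Z) Y, \qquad M_Z = I - Z (Z^\top Z)^{-1} Z^\top,
\]
which recovers ordinary least squares at $\kappa = 0$ and two-stage least squares at $\kappa = 1$; the distributional robustness properties referred to above concern this family of estimators.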
This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in a distributed learning system due to various system disturbances, such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) code, random sparse code, replication-based code, and regular low-density parity-check (LDPC) code, are also investigated. Simulations in several multi-robot problems demonstrate the promising performance of the proposed framework.
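As an illustration of the coding idea (a hedged sketch of generic MDS-coded computation, not the paper's MADDPG training framework), a matrix-vector product can be split into $k$ blocks, encoded into $n$ coded tasks with a Vandermonde generator, and recovered from any $k$ worker results, so up to $n-k$ stragglers can be ignored:

```python
import numpy as np

def mds_encode(A, n, k):
    """Split A into k row blocks (assumes k divides A.shape[0]) and encode them
    into n coded blocks; any k rows of the Vandermonde generator are invertible."""
    blocks = np.split(A, k, axis=0)
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, G

def mds_decode(partial_results, worker_ids, G):
    # partial_results[i] = coded[worker_ids[i]] @ v, from any k distinct workers.
    B = np.linalg.solve(G[worker_ids, :], np.stack(partial_results))
    return np.concatenate(list(B))  # equals A @ v

# Usage: workers compute coded[i] @ v; the master decodes from the first k replies.
A, v = np.random.randn(6, 4), np.random.randn(4)
coded, G = mds_encode(A, n=5, k=3)
y = mds_decode([coded[i] @ v for i in [0, 2, 4]], [0, 2, 4], G)
assert np.allclose(y, A @ v)
```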