In this paper we study the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results, which need to assume bounded reward distributions, here we mainly focus on the case where the reward distribution of each arm only has a finite $(1+v)$-th moment for some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we show that the instance-dependent regret bound of our improved algorithm is optimal by proving a lower bound. In the second part of the paper, we study the problem in the $\epsilon$-LDP model. We propose an algorithm that can be seen as a locally private and robust version of the SE algorithm, and show that it achieves (near) optimal rates for both instance-dependent and instance-independent regrets. These results also reveal the differences between private MAB with bounded rewards and private MAB with heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be useful for other related problems. Finally, experimental results support our theoretical analysis and show the effectiveness of our algorithms.
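As a rough illustration of what a private and robust mean estimator for heavy-tailed rewards can look like, the sketch below clips each reward to a threshold and adds Laplace noise calibrated to the clipped range. This is a generic construction and not necessarily the estimator developed in the paper; the names `clip_rewards` and `private_truncated_mean`, the clipping rule, and the fixed `threshold` argument are all assumptions made for illustration.

```python
import random


def clip_rewards(rewards, threshold):
    """Clip every reward to the interval [-threshold, threshold]."""
    return [max(-threshold, min(threshold, r)) for r in rewards]


def private_truncated_mean(rewards, threshold, epsilon, rng):
    """Clipped mean plus Laplace noise (hypothetical helper, generic technique).

    After clipping, changing one reward moves the mean by at most
    2 * threshold / n, so Laplace noise with scale 2 * threshold / (n * epsilon)
    gives epsilon-DP for the released mean.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")
    n = len(rewards)
    mean = sum(clip_rewards(rewards, threshold)) / n
    scale = 2.0 * threshold / (n * epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return mean + noise
```

Clipping bounds the influence of any single heavy-tailed sample, which is what makes the sensitivity finite; in practice the threshold would have to grow with the number of samples to control the bias that clipping introduces.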
We present a priori and a posteriori error analysis of a high order hybridizable discontinuous Galerkin (HDG) method applied to a semi-linear elliptic problem posed on a piecewise curved, non polygonal domain. We approximate $\Omega$ by a polygonal subdomain $\Omega_h$ and propose an HDG discretization, which is shown to be optimal under mild assumptions related to the non-linear source term and the distance between the boundaries of the polygonal subdomain $\Omega_h$ and the true domain $\Omega$. Moreover, a local non-linear post-processing of the scalar unknown is proposed and shown to provide an additional order of convergence. A reliable and locally efficient a posteriori error estimator that takes into account the error in the approximation of the boundary data of $\Omega_h$ is also provided.
Differentially private (DP) mechanisms face the challenge of providing accurate results while protecting their inputs: the privacy-utility trade-off. A simple but powerful technique for DP adds noise to sensitivity-bounded query outputs to blur the exact query output: additive mechanisms. While a vast body of work considers infinitely wide noise distributions, some applications (e.g., real-time operating systems) require hard bounds on the deviations from the real query, and only limited work on such mechanisms exists. An additive mechanism with truncated noise (i.e., with bounded range) can offer such hard bounds. We introduce a gradient-descent-based tool to learn truncated noise for additive mechanisms with strong utility bounds while simultaneously optimizing for differential privacy under sequential composition, i.e., scenarios where multiple noisy queries on the same data are revealed. Our method can learn discrete noise patterns and not only hyper-parameters of a predefined probability distribution. For sensitivity-bounded mechanisms, we show that it is sufficient to consider symmetric noise and that, for noise that falls monotonically from the mean, ensuring privacy for a pair of representative query outputs guarantees privacy for all pairs of inputs (that differ in one element). We find that the utility-privacy trade-off curves of our generated noise are remarkably close to those of truncated Gaussians and even replicate their shape for $l_2$ utility-loss. For a low number of compositions, we also improve on DP-SGD (sub-sampling). Moreover, we extend the Moments Accountant to truncated distributions, allowing it to incorporate mechanism output events with varying input-dependent zero occurrence probability.
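To make the notion of an additive mechanism with truncated noise concrete, the following sketch draws integer noise from a symmetric, monotonically falling distribution restricted to a bounded range and adds it to a query answer. The Laplace-shaped weights are a fixed illustrative choice, not the noise learned by the gradient-descent tool described above, and the function names and parameters are hypothetical.

```python
import math
import random


def truncated_discrete_noise(scale, bound):
    """Return (support, probabilities) for noise on the integers -bound..bound.

    The weights exp(-|k| / scale) are symmetric around zero and fall
    monotonically away from the mean (an assumed, hand-picked shape).
    """
    if scale <= 0 or bound < 0:
        raise ValueError("scale must be positive and bound non-negative")
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-abs(k) / scale) for k in support]
    total = sum(weights)
    return support, [w / total for w in weights]


def additive_mechanism(true_answer, scale, bound, rng):
    """Release true_answer plus truncated noise; the error never exceeds bound."""
    support, probs = truncated_discrete_noise(scale, bound)
    noise = rng.choices(support, weights=probs, k=1)[0]
    return true_answer + noise
```

Because the noise has bounded support, the released value always lies within `bound` of the true answer. The price is that some outputs become impossible for neighbouring inputs, so a mechanism like this cannot satisfy pure $\epsilon$-DP and needs an accounting that handles zero-probability events.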
In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded on physical resources. Building on a recent work by Riou & Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that enables us to bound the CVaR regret of these strategies. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use-case in agriculture and on various synthetic examples.
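For readers unfamiliar with CVaR as a quality measure, the sketch below computes an empirical CVaR from a sample of rewards. It assumes the lower-tail convention for rewards, $\mathrm{CVaR}_\alpha = \sup_x \{x - \frac{1}{\alpha}\mathbb{E}[(x - X)^+]\}$, which reduces to the plain mean at $\alpha = 1$; the abstract does not state the convention, so this choice and the function name `empirical_cvar` are assumptions.

```python
def empirical_cvar(rewards, alpha):
    """Empirical CVaR at level alpha under the assumed lower-tail convention.

    The objective x - E[(x - X)^+] / alpha is concave and piecewise linear
    in x, so its maximum over the real line is attained at a sample point.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(rewards)
    best = float("-inf")
    for x in rewards:
        expected_shortfall = sum(max(x - r, 0.0) for r in rewards) / n
        best = max(best, x - expected_shortfall / alpha)
    return best
```

For the sample `[1, 2, 3, 4]`, level 0.5 gives the average of the two worst rewards, 1.5, and level 1 gives the mean, 2.5. Small values of alpha therefore favour arms whose worst outcomes are mild rather than arms with the highest average reward.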
An increasingly popular method for computing aggregate statistics while preserving users' privacy is local differential privacy (LDP). Under this model, users perturb their data before sending it to an untrusted central party to be processed. Key value data is a naturally occurring data type that has not been thoroughly investigated in the local trust model. Existing LDP solutions for computing statistics over key value data suffer from the inherent accuracy limitations of each user adding their own noise. Multi-party computation (MPC) is a common alternative to LDP that removes the requirement for a trusted central party while maintaining accuracy; however, naively applying MPC to key value data results in prohibitively expensive computation costs. In this work, we present selective multi-party computation, a novel approach to distributed computation that leverages DP leakage to efficiently and accurately compute statistics over key value data. We show that our protocol satisfies pure DP and is provably secure in the combined DP/MPC model. Our empirical evaluation demonstrates that we can compute statistics over 10,000 keys in 20 seconds and can scale up to 30 servers while obtaining results for a single key in under a second.
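The local model, and the accuracy cost of each user adding their own noise, can be illustrated with binary randomized response, a standard $\epsilon$-LDP primitive. This is a generic example and not the selective multi-party computation protocol presented in the paper; the function names are hypothetical.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Each user keeps their bit with probability e^eps / (1 + e^eps), else flips it."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of ones from the noisy reports.

    E[report] = (1 - p) + mu * (2p - 1), so mu = (observed - (1 - p)) / (2p - 1).
    """
    if not reports:
        raise ValueError("reports must be non-empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

The division by `2p - 1` in the debiasing step amplifies sampling noise as epsilon shrinks, which is one way to see why purely local solutions lose accuracy compared with approaches that add noise only once.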
The RKHS bandit problem (also called kernelized multi-armed bandit problem) is an online optimization problem of non-linear functions with noisy feedback. Although the problem has been extensively studied, there are unsatisfactory results for some problems compared to the well-studied linear bandit case. Specifically, there is no general algorithm for the adversarial RKHS bandit problem. In addition, high computational complexity of existing algorithms hinders practical application. We address these issues by considering a novel amalgamation of approximation theory and the misspecified linear bandit problem. Using an approximation method, we propose efficient algorithms for the stochastic RKHS bandit problem and the first general algorithm for the adversarial RKHS bandit problem. Furthermore, we empirically show that one of our proposed methods has comparable cumulative regret to IGP-UCB and its running time is much shorter.
The generalization performance of stochastic optimization occupies a central place in learning theory. In this paper, we investigate the excess risk performance of, and work towards improved learning rates for, two popular approaches to stochastic optimization: empirical risk minimization (ERM) and stochastic gradient descent (SGD). Although there exist plentiful generalization analyses of ERM and SGD for supervised learning, the current theoretical understanding of ERM and SGD either relies on stronger assumptions in convex learning, e.g., strong convexity, or shows slow rates and is less developed in nonconvex learning. Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and derive faster rates in nonconvex learning. Notably, our analysis spans two popular theoretical viewpoints: \emph{stability} and \emph{uniform convergence}. Specifically, in the stability regime, we present high probability learning rates of order $\mathcal{O} (1/n)$ w.r.t. the sample size $n$ for ERM and SGD with milder assumptions in convex learning and similar high probability rates of order $\mathcal{O} (1/n)$ in nonconvex learning, rather than in expectation. Furthermore, this type of learning rate is improved to the faster order $\mathcal{O} (1/n^2)$ in the uniform convergence regime. To the best of our knowledge, for ERM and SGD, the learning rates presented in this paper are all state-of-the-art.
We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), which is a class of bandit algorithms that leverages adaptive windowing techniques from the data stream community. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest in the data mining community. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment where changes occur instantaneously and a gradual environment where changes occur progressively. We demonstrate that ADR-bandit has nearly optimal performance when the abrupt or gradual changes occur in a coordinated manner that we call global changes. We demonstrate that forced exploration is unnecessary when we restrict our interest to global changes. Unlike the existing nonstationary bandit algorithms, ADR-bandit has optimal performance in stationary environments as well as nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform the existing approaches in synthetic and real-world environments.
As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distributions or irregular data in the low dimensional case. To better understand the challenges arising from irregular data distributions, in this paper we provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$-norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get a (high probability) error bound (excess population risk) of $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$ in the $\epsilon$-DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$ in the $(\epsilon, \delta)$-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output can achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity ({\em i.e.,} $\ell_0$-norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$, which is also near optimal up to a factor of $\tilde{O}{(\sqrt{s^*})}$, if the loss function is smooth and strongly convex.
Estimating post-click conversion rate (CVR) accurately is crucial for ranking systems in industrial applications such as recommendation and advertising. Conventional CVR modeling applies popular deep learning methods and achieves state-of-the-art performance. However, it encounters several task-specific problems in practice, making CVR modeling challenging. For example, conventional CVR models are trained with samples of clicked impressions but are used to make inference on the entire space with samples of all impressions. This causes a sample selection bias problem. In addition, there exists an extreme data sparsity problem, making the model fitting rather difficult. In this paper, we model CVR from a brand-new perspective by making good use of the sequential pattern of user actions, i.e., impression -> click -> conversion. The proposed Entire Space Multi-task Model (ESMM) can eliminate the two problems simultaneously by i) modeling CVR directly over the entire space, ii) employing a feature representation transfer learning strategy. Experiments on a dataset gathered from Taobao's recommender system demonstrate that ESMM significantly outperforms competitive methods. We also release a sampled version of this dataset to enable future research. To the best of our knowledge, this is the first public dataset which contains samples with sequential dependence of click and conversion labels for CVR modeling.
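The idea of modeling CVR over the entire impression space can be sketched through the chain rule implied by impression -> click -> conversion: the probability of a click followed by a conversion equals the click-through probability times the post-click conversion probability. The snippet below computes a loss in which both supervised quantities are defined for every impression, so the CVR prediction is trained only through the product. It is a minimal sketch with hypothetical function names and plain Python lists in place of a neural network, not the released ESMM implementation.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy of one predicted probability against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def entire_space_loss(p_ctr, p_cvr, clicks, conversions):
    """Average CTR loss plus click-and-convert loss over all impressions.

    p_ctr[i]  : predicted P(click | impression i)
    p_cvr[i]  : predicted P(conversion | click, impression i)
    The product p_ctr[i] * p_cvr[i] is the predicted probability that
    impression i is both clicked and converted.
    """
    if not p_ctr:
        raise ValueError("at least one impression is required")
    total = 0.0
    for ctr, cvr, click, conv in zip(p_ctr, p_cvr, clicks, conversions):
        p_click_and_convert = ctr * cvr
        total += binary_cross_entropy(ctr, click)
        total += binary_cross_entropy(p_click_and_convert, click * conv)
    return total / len(p_ctr)
```

Since both labels exist for every impression, no term in the loss is restricted to clicked samples, which is how this formulation sidesteps the sample selection bias described above.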
In this paper, we tackle the problem of reducing discrepancies between multiple domains, referred to as multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with label proportions that differ across domains. We design a method based on optimal transport, a theory that is gaining momentum for tackling adaptation problems in machine learning due to its efficiency in aligning probability distributions. Our method performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling that aligns two (or more) probability distributions. Experiments on both synthetic and real-world data related to a satellite image segmentation task show the superiority of the proposed method over the state-of-the-art.