This work addresses the problem of revenue maximization in a repeated, unlimited supply item-pricing auction while preserving buyer privacy. We present a novel algorithm that provides differential privacy with respect to the buyer's input pair: item selection and bid. Notably, our algorithm is the first to offer a sublinear $O(\sqrt{T}\log{T})$ regret with a privacy guarantee. Our method is based on an exponential weights meta-algorithm, and we mitigate the issue of discontinuities in revenue functions via small random perturbations. As a result of its structural similarity to the exponential mechanism, our method inherently secures differential privacy. We also extend our algorithm to accommodate scenarios where buyers strategically bid over successive rounds. The inherent differential privacy allows us to adapt our algorithm with minimal modification to ensure a sublinear regret in this setting.
This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson Acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from electromagnetics, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES. One of the significant advantages of these methods is that the convergence rate is independent of the discretization size.
Federated learning (FL) is an emerging paradigm that allows a central server to train machine learning models using remote users' data. Despite its growing popularity, FL faces challenges in preserving the privacy of local datasets, its sensitivity to poisoning attacks by malicious users, and its communication overhead. The latter is additionally considerably dominant in large-scale networks. These limitations are often individually mitigated by local differential privacy (LDP) mechanisms, robust aggregation, compression, and user selection techniques, which typically come at the cost of accuracy. In this work, we present compressed private aggregation (CPA), that allows massive deployments to simultaneously communicate at extremely low bit rates while achieving privacy, anonymity, and resilience to malicious users. CPA randomizes a codebook for compressing the data into a few bits using nested lattice quantizers, while ensuring anonymity and robustness, with a subsequent perturbation to hold LDP. The proposed CPA is proven to result in FL convergence in the same asymptotic rate as FL without privacy, compression, and robustness considerations, while satisfying both anonymity and LDP requirements. These analytical properties are empirically confirmed in a numerical study, where we demonstrate the performance gains of CPA compared with separate mechanisms for compression and privacy for training different image classification models, as well as its robustness in mitigating the harmful effects of malicious users.
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach to realize this is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on improving the utility of the algorithms to ensure the server performs the estimations accurately. In this paper, we investigate privacy threats (re-identification and attribute inference attacks) against LDP protocols for multidimensional data following two state-of-the-art solutions for frequency estimation of multiple attributes. To broaden the scope of our study, we have also experimentally assessed five widely used LDP protocols, namely, generalized randomized response, optimal local hashing, subset selection, RAPPOR and optimal unary encoding. Finally, we also proposed a countermeasure that improves both utility and robustness against the identified threats. Our contributions can help practitioners aiming to collect users' statistics privately to decide which LDP mechanism best fits their needs.
We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where no structural assumption is made on the system matrix $A^* \in \mathbb{R}^{n \times n}$, and have consequently analyzed the ordinary least squares (OLS) estimator in detail. We assume prior structural information on $A^*$ is available, which can be captured in the form of a convex set $\mathcal{K}$ containing $A^*$. For the solution of the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $\mathcal{K}$ at $A^*$. To illustrate the usefulness of these results, we instantiate them for three examples, namely when (i) $A^*$ is sparse and $\mathcal{K}$ is a suitably scaled $\ell_1$ ball; (ii) $\mathcal{K}$ is a subspace; (iii) $\mathcal{K}$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression). In all these situations, we show that $A^*$ can be reliably estimated for values of $T$ much smaller than what is needed for the unconstrained setting.
Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In particular, we show that in relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure from $m$ function values based on $\ell^1$-minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of $m^{-1/2}$ (up to log factors) compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.
In recent years, there has been a growing interest in understanding complex microstructures and their effect on macroscopic properties. In general, it is difficult to derive an effective constitutive law for such microstructures with reasonable accuracy and meaningful parameters. One numerical approach to bridge the scales is computational homogenization, in which a microscopic problem is solved at every macroscopic point, essentially replacing the effective constitutive model. Such approaches are, however, computationally expensive and typically infeasible in multi-query contexts such as optimization and material design. To render these analyses tractable, surrogate models that can accurately approximate and accelerate the microscopic problem over a large design space of shapes, material and loading parameters are required. In previous works, such models were constructed in a data-driven manner using methods such as Neural Networks (NN) or Gaussian Process Regression (GPR). However, these approaches currently suffer from issues, such as need for large amounts of training data, lack of physics, and considerable extrapolation errors. In this work, we develop a reduced order model based on Proper Orthogonal Decomposition (POD), Empirical Cubature Method (ECM) and a geometrical transformation method with the following key features: (i) large shape variations of the microstructure are captured, (ii) only relatively small amounts of training data are necessary, and (iii) highly non-linear history-dependent behaviors are treated. The proposed framework is tested and examined in two numerical examples, involving two scales and large geometrical variations. In both cases, high speed-ups and accuracies are achieved while observing good extrapolation behavior.
A package query returns a package - a multiset of tuples - that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as KD-tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
Trustworthy federated learning aims to achieve optimal performance while ensuring clients' privacy. Existing privacy-preserving federated learning approaches are mostly tailored for image data, lacking applications for time series data, which have many important applications, like machine health monitoring, human activity recognition, etc. Furthermore, protective noising on a time series data analytics model can significantly interfere with temporal-dependent learning, leading to a greater decline in accuracy. To address these issues, we develop a privacy-preserving federated learning algorithm for time series data. Specifically, we employ local differential privacy to extend the privacy protection trust boundary to the clients. We also incorporate shuffle techniques to achieve a privacy amplification, mitigating the accuracy decline caused by leveraging local differential privacy. Extensive experiments were conducted on five time series datasets. The evaluation results reveal that our algorithm experienced minimal accuracy loss compared to non-private federated learning in both small and large client scenarios. Under the same level of privacy protection, our algorithm demonstrated improved accuracy compared to the centralized differentially private federated learning in both scenarios.
A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data, and provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
This paper investigates the problem of efficient constrained global optimization of hybrid models that are a composition of a known white-box function and an expensive multi-output black-box function subject to noisy observations, which often arises in real-world science and engineering applications. We propose a novel method, Constrained Upper Quantile Bound (CUQB), to solve such problems that directly exploits the composite structure of the objective and constraint functions that we show leads substantially improved sampling efficiency. CUQB is a conceptually simple, deterministic approach that avoid constraint approximations used by previous methods. Although the CUQB acquisition function is not available in closed form, we propose a novel differentiable sample average approximation that enables it to be efficiently maximized. We further derive bounds on the cumulative regret and constraint violation under a non-parametric Bayesian representation of the black-box function. Since these bounds depend sublinearly on the number of iterations under some regularity assumptions, we establis bounds on the convergence rate to the optimal solution of the original constrained problem. In contrast to most existing methods, CUQB further incorporates a simple infeasibility detection scheme, which we prove triggers in a finite number of iterations when the original problem is infeasible (with high probability given the Bayesian model). Numerical experiments on several test problems, including environmental model calibration and real-time optimization of a reactor system, show that CUQB significantly outperforms traditional Bayesian optimization in both constrained and unconstrained cases. Furthermore, compared to other state-of-the-art methods that exploit composite structure, CUQB achieves competitive empirical performance while also providing substantially improved theoretical guarantees.