In this paper, we present distributed fault-tolerant algorithms that approximate the centroid of a set of $n$ data points in $\mathbb{R}^d$. Our work falls into the broader area of approximate multidimensional Byzantine agreement. The standard approach in existing algorithms is to agree on a vector inside the convex hull of all correct vectors. This strategy discards many possibly correct data points, so the algorithm does not necessarily agree on a representative value. To find better convergence strategies, we introduce the novel concept of an approximation of the centroid in the presence of Byzantine adversaries. We show that standard agreement algorithms cannot compute a better approximation of the centroid than $2d$ in the synchronous case. We investigate the trade-off between the quality of the approximation, the resilience of the algorithm, and the validity of the solution in order to design better approximation algorithms. For the synchronous case, we show that it is possible to achieve an optimal approximation of the centroid with up to $t<n/(d+1)$ Byzantine data points. This approach, however, gives no guarantee on the validity of the solution. We therefore develop a second approach that reaches a $2\sqrt{d}$-approximation of the centroid while satisfying the standard validity condition for agreement protocols. We are even able to restrict the validity condition to agreement inside the box of correct data points, while achieving optimal resilience of $t< n/3$. For the asynchronous case, we can adapt all three algorithms to reach the same approximation results (up to a constant factor). Our results suggest that it is worthwhile to study the trade-off between validity conditions and the quality of the solution.
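To make the objects above concrete, here is a minimal Python sketch (an illustration of the definitions only, not one of the paper's algorithms) that computes the centroid of the correct data points and checks the box-validity condition; the point set is invented for the example.

```python
import numpy as np

def centroid(points):
    """Centroid (coordinate-wise mean) of an (n, d) array of points."""
    return points.mean(axis=0)

def inside_box(v, points):
    """Box validity: is v inside the axis-aligned bounding box of points?"""
    return bool(np.all(points.min(axis=0) <= v) and np.all(v <= points.max(axis=0)))

# The centroid of the correct points always satisfies box validity.
correct = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
c = centroid(correct)
assert inside_box(c, correct)
print(c)
```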
The convex hull of a data set $P$ is the smallest convex set that contains $P$. In this work, we present a new data structure for the convex hull that allows for efficient dynamic updates. In a dynamic convex hull implementation, the following traits are desirable: (1) algorithms for efficiently answering queries as to whether a specified point is inside or outside the hull, (2) geometric robustness, and (3) algorithmic simplicity. Furthermore, a specific but well-motivated type of two-dimensional data is rank-based data. Here, the input is a set of real-valued numbers $Y$ where, for any number $y\in Y$, its rank is its index in $Y$'s sorted order. Each value in $Y$ can be mapped to the point $(\text{rank}, \text{value})$ to obtain a two-dimensional point set. In this work, we give an efficient, geometrically robust, dynamic convex hull algorithm that supports queries as to whether a point is inside the hull. Furthermore, our construction can be used to efficiently update the convex hull of rank-based data when the real-valued point set is subject to insertions and deletions. Our improved solution is based on an algorithmic simplification of the classical convex hull data structure by Overmars and van Leeuwen~[STOC'80], combined with new algorithmic insights. Our theoretical guarantees on the update time match those of Overmars and van Leeuwen, namely $O(\log^2 |P|)$, while we support a wider range of functionalities (including rank-based data). Our algorithmic simplification includes reducing an 11-case check to a 3-case check that can be written in 20 lines of easily readable C code. We extend our solution to provide a trade-off between theoretical guarantees and the practical performance of our algorithm. We test and compare our solutions extensively on inputs that were generated randomly or adversarially, including benchmark datasets from the literature.
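As a point of reference for the queries and the rank-based mapping described above, the following Python sketch builds a static hull with Andrew's monotone chain and answers inside/outside queries. It is only a baseline illustrating the semantics; the paper's data structure supports these queries under dynamic updates in $O(\log^2 |P|)$ time, and its simplified case analysis is not reproduced here.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a left turn
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside(hull, q):
    """Point-in-convex-polygon test: q must lie left of (or on) every CCW edge."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

# Rank-based data: map each value to the point (rank, value).
ys = [3.1, -0.5, 2.2, 7.9]
pts = [(r, v) for r, v in enumerate(sorted(ys))]
hull = convex_hull(pts)
print(inside(hull, (2.0, 3.5)))  # True: this query point lies inside the hull
```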
In this paper, we study coded caching for the $(K,L,M_{\text{T}},M_{\text{U}},N)$ partially connected linear network, where there are $N$ files of equal size, $K+L-1$ transmitters, and $K$ users; each user and transmitter caches at most $M_{\text{U}}$ and $M_{\text{T}}$ files, respectively, and each user cyclically communicates with $L$ transmitters. The goal is to design caching and delivery schemes that reduce the transmission latency, measured by the normalized delivery time (NDT). By carefully designing the data placement of the transmitters and users according to the topology, we show that a combinatorial structure called a multiple-antenna placement delivery array (MAPDA), originally proposed for multiple-input single-output broadcast channels, can also be used to design schemes for the partially connected linear network. Then, based on existing MAPDAs and our construction approach, we propose new schemes that achieve the optimal NDT when ${M_\text{T}}+{M_\text{U}}\geq N$ and a smaller NDT than that of the existing schemes when (${M_\text{T}}+{M_\text{U}}\leq N$, $\frac{M_\text{U}}{N}+\frac{M_\text{T}}{N} \frac{L}{K}\left\lceil \frac{K}{L} \right\rceil \geq 1$) or (${M_\text{T}}+{M_\text{U}}< N$, $\frac{K}{L}\notin\mathbb{Z}^+$). Moreover, our schemes operate with one-shot linear delivery and significantly reduce the subpacketization compared to the existing schemes, which implies that our schemes have a wider range of applications and a lower implementation complexity.
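As a reading aid for the parameter regimes above, the following hypothetical Python helper merely restates the stated conditions and reports which regime a tuple $(K,L,M_{\text{T}},M_{\text{U}},N)$ falls into; it implements no caching or delivery scheme.

```python
from math import ceil

def regime(K, L, M_T, M_U, N):
    """Classify parameters into the regimes stated in the abstract."""
    if M_T + M_U >= N:
        return "optimal NDT regime (M_T + M_U >= N)"
    if M_U / N + (M_T / N) * (L / K) * ceil(K / L) >= 1:
        return "improved NDT regime (fractional-cache condition holds)"
    if K % L != 0:
        return "improved NDT regime (K/L is not an integer)"
    return "outside the regimes stated in the abstract"

print(regime(K=12, L=5, M_T=2, M_U=3, N=10))  # K/L not an integer here
```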
We consider the problem of testing and learning quantum $k$-juntas: $n$-qubit unitary matrices which act non-trivially on just $k$ of the $n$ qubits and as the identity on the rest. As our main algorithmic results, we give (a) a $\widetilde{O}(\sqrt{k})$-query quantum algorithm that can distinguish quantum $k$-juntas from unitary matrices that are "far" from every quantum $k$-junta; and (b) a $O(4^k)$-query algorithm to learn quantum $k$-juntas. We complement our upper bounds for testing quantum $k$-juntas and learning quantum $k$-juntas with near-matching lower bounds of $\Omega(\sqrt{k})$ and $\Omega(\frac{4^k}{k})$, respectively. Our techniques are Fourier-analytic and make use of a notion of influence of qubits on unitaries.
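The notion of influence referenced above can be made concrete via the Pauli expansion of a unitary. The Python sketch below uses one plausible definition (the Fourier weight on Pauli strings that act non-trivially on a qubit, in the style of Montanaro and Osborne); whether this matches the paper's exact definition is an assumption.

```python
import itertools
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}

def influences(U, n):
    """Influence of each qubit on an n-qubit unitary U.

    Writes U = sum_x Uhat(x) sigma_x with Uhat(x) = Tr(sigma_x U) / 2^n;
    the influence of qubit i is the total Fourier weight on strings acting
    non-trivially on qubit i. For a unitary, the weights sum to 1.
    """
    inf = np.zeros(n)
    for s in itertools.product("IXYZ", repeat=n):
        sigma = PAULIS[s[0]]
        for c in s[1:]:
            sigma = np.kron(sigma, PAULIS[c])
        w = abs(np.trace(sigma @ U) / 2**n) ** 2
        for i, c in enumerate(s):
            if c != "I":
                inf[i] += w
    return inf

# A CNOT acts non-trivially on both of its qubits: influences are [0.5, 0.5].
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
print(influences(CNOT, 2))
```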
We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how $f$-MIP can be amplified by adding noise to gradient updates. Our analysis further yields an analytical membership inference attack that offers two distinct advantages over previous approaches. First, unlike existing state-of-the-art attacks that require training hundreds of shadow models, our attack does not require any shadow model. Second, our analytical attack enables straightforward auditing of our privacy notion $f$-MIP. Finally, we quantify how various hyperparameters (e.g., batch size, number of model parameters) and specific data characteristics determine an attacker's ability to accurately infer a point's membership in the training set. We demonstrate the effectiveness of our method on models trained on vision and tabular datasets.
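A minimal sketch of the noise-amplified setting mentioned above: one SGD step with per-example gradient clipping and Gaussian noise, in the style of DP-SGD. The hyperparameter names (clip, sigma) are illustrative, not the paper's notation.

```python
import numpy as np

def noisy_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=0.5):
    """One SGD step with per-example clipping and Gaussian noise.

    clip bounds each example's gradient norm; sigma scales the noise added
    to the averaged gradient. Together they control how much an attacker
    can infer about any single point's membership.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, sigma * clip / len(per_example_grads),
                             size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```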
We propose a new algorithm for efficiently solving linear systems with the damped Fisher matrix in large-scale settings where the number of parameters significantly exceeds the number of available samples. This problem is fundamental to natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that it is significantly faster than existing methods.
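Since the abstract does not spell out the algorithm, the Python sketch below shows one standard way a Cholesky-based solver can exploit the regime of many parameters $P$ and few samples $N$: the Woodbury identity reduces the work to factorizing an $N \times N$ Gram matrix. This is an assumption about the flavor of the method, not the paper's algorithm.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def damped_fisher_solve(J, b, damping):
    """Solve (J.T @ J / N + damping * I) x = b when P >> N.

    Assumes the empirical Fisher F = J.T @ J / N for an (N, P) Jacobian J.
    Via Woodbury, only the SPD (N, N) Gram matrix is Cholesky-factorized,
    costing O(N^2 P + N^3) instead of O(P^3).
    """
    N = J.shape[0]
    gram = J @ J.T + N * damping * np.eye(N)   # small (N, N) SPD matrix
    y = cho_solve(cho_factor(gram), J @ b)
    return (b - J.T @ y) / damping

# Consistency check against the direct dense solve.
rng = np.random.default_rng(0)
N, P = 20, 500
J, b = rng.normal(size=(N, P)), rng.normal(size=P)
F = J.T @ J / N + 1e-2 * np.eye(P)
assert np.allclose(damped_fisher_solve(J, b, 1e-2), np.linalg.solve(F, b))
```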
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time, with high probability and in expectation, for random $n \times n$ binary matrices with i.i.d.\ Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2 n}{n(\log\log n)^2} \leq p \leq \frac{1}{2}$.
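To illustrate the problem statement (not the paper's $O(n^2)$ algorithm), here is an exhaustive Python checker for tiny instances: it enumerates row orderings and keeps those whose induced column multiset matches the given columns, returning all consistent matrices.

```python
from itertools import permutations

def consistent_matrices(rows, cols):
    """All matrices consistent with unordered collections of rows and columns.

    Exponential brute force for illustration only; it returns more than one
    matrix exactly when the instance has no unique valid ordering.
    """
    n = len(rows)
    results = set()
    for row_order in permutations(range(n)):
        M = [rows[i] for i in row_order]
        induced_cols = sorted(tuple(M[i][j] for i in range(n)) for j in range(n))
        if induced_cols == sorted(map(tuple, cols)):
            results.add(tuple(map(tuple, M)))
    return results

rows = [(1, 0), (0, 1)]
cols = [(0, 1), (1, 0)]
print(consistent_matrices(rows, cols))  # two valid matrices: no unique ordering
```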
We study the problem of testing whether a symmetric $d \times d$ input matrix $A$ is symmetric positive semidefinite (PSD), or is $\epsilon$-far from the PSD cone, meaning that $\lambda_{\min}(A) \leq - \epsilon \|A\|_p$, where $\|A\|_p$ is the Schatten-$p$ norm of $A$. In applications, one often needs to quickly tell whether an input matrix is PSD, and a small distance from the PSD cone may be tolerable. We consider two well-studied query models for measuring efficiency, namely, the matrix-vector and vector-matrix-vector query models. We first consider one-sided testers, which correctly classify any PSD input but may fail on a non-PSD input with a tiny failure probability. Up to logarithmic factors, we show a tight $\widetilde{\Theta}(1/\epsilon^{p/(2p+1)})$ bound in the matrix-vector query model and a tight $\widetilde{\Theta}(d^{1-1/p}/\epsilon)$ bound in the vector-matrix-vector query model, for every $p \geq 1$. We also show a strong separation between one-sided and two-sided testers in the vector-matrix-vector model, where a two-sided tester may fail on both PSD and non-PSD inputs with a tiny failure probability. In particular, for the important case of the Frobenius norm, we show that any one-sided tester requires $\widetilde{\Omega}(\sqrt{d}/\epsilon)$ queries. However, we introduce a bilinear sketch for two-sided testing, from which we construct a Frobenius norm tester achieving the optimal $\widetilde{O}(1/\epsilon^2)$ queries. We also give a number of additional separations between adaptive and non-adaptive testers. Our techniques have implications beyond testing, providing new methods to approximate the spectrum of a matrix with Frobenius norm error using dimensionality reduction in a way that preserves the signs of eigenvalues.
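As an illustration of the matrix-vector query model and of one-sidedness (not the paper's query-optimal tester), the following Python sketch hunts for a negative-curvature direction via power iteration on a shifted matrix; it rejects only when it holds an explicit witness $x$ with $x^\top A x < -\epsilon\|A\|_F$, so it never rejects a PSD input.

```python
import numpy as np

def one_sided_psd_test(matvec, d, eps, fro_norm, iters=200, seed=0):
    """Illustrative one-sided PSD tester in the matrix-vector query model.

    Power iteration on (c*I - A), using only products with A, converges to
    the most negative eigendirection of A when one exists.
    """
    rng = np.random.default_rng(seed)
    c = fro_norm                       # shift: eigenvalues of c*I - A are c - lambda
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = c * x - matvec(x)          # one matrix-vector query per iteration
        x /= np.linalg.norm(x)
    if x @ matvec(x) < -eps * fro_norm:
        return "not PSD (witness found)"
    return "accept"                    # never reached by a PSD input

A = np.diag([1.0, 0.5, -0.4])          # eps-far from PSD for eps = 0.1
print(one_sided_psd_test(lambda v: A @ v, 3, eps=0.1, fro_norm=np.linalg.norm(A)))
```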
For some $\epsilon > 10^{-36}$, we give a randomized $(3/2-\epsilon)$-approximation algorithm for metric TSP.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
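For reference, the OE objective for a softmax classifier amounts to adding a term that pushes predictions on auxiliary outliers toward the uniform distribution. A minimal PyTorch sketch follows; lam = 0.5 mirrors the value commonly used in the paper's vision experiments.

```python
import torch
import torch.nn.functional as F

def oe_loss(logits_in, targets_in, logits_out, lam=0.5):
    """Outlier Exposure objective for a softmax classifier.

    Standard cross-entropy on in-distribution data, plus the cross-entropy
    between the uniform distribution and the model's predictions on a batch
    from the auxiliary outlier dataset.
    """
    ce = F.cross_entropy(logits_in, targets_in)
    # CE(uniform, softmax(logits)), up to an additive constant log(num_classes):
    uniform_ce = -F.log_softmax(logits_out, dim=1).mean(dim=1).mean()
    return ce + lam * uniform_ce

logits_in = torch.randn(8, 10)
targets_in = torch.randint(0, 10, (8,))
logits_out = torch.randn(8, 10)   # a batch drawn from the auxiliary outlier set
print(oe_loss(logits_in, targets_in, logits_out))
```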
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e., the additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and whose inter-class difference is large is of great importance for achieving good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization. Most importantly, our experiments on LFW (BLUFR protocol) and MegaFace show that our additive margin Softmax loss consistently outperforms the current state-of-the-art methods using the same network architecture and training dataset. Our code is available at //github.com/happynear/AMSoftmax
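For concreteness, here is a minimal PyTorch sketch of the AM-Softmax head: both the features and the class weights are L2-normalized, and the margin $m$ is subtracted from the target-class cosine before scaling by $s$; $s=30$ and $m=0.35$ follow the values suggested in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmax(nn.Module):
    """Additive margin Softmax head: the target-class logit is
    s * (cos(theta_y) - m), all other logits are s * cos(theta)."""

    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.W = nn.Parameter(torch.randn(feat_dim, num_classes))
        self.s, self.m = s, m

    def forward(self, feats, targets):
        # Cosine similarities between normalized features and class weights.
        cosine = F.normalize(feats, dim=1) @ F.normalize(self.W, dim=0)
        margin = F.one_hot(targets, cosine.size(1)) * self.m
        return F.cross_entropy(self.s * (cosine - margin), targets)

head = AMSoftmax(feat_dim=512, num_classes=1000)
loss = head(torch.randn(4, 512), torch.randint(0, 1000, (4,)))
```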