In this work we consider the problem of releasing a differentially private statistical summary that resides on a Riemannian manifold. We present an extension of the Laplace or K-norm mechanism that utilizes intrinsic distances and volumes on the manifold. We also consider in detail the specific case where the summary is the Fr\'echet mean of data residing on a manifold. We demonstrate that our mechanism is rate optimal and depends only on the dimension of the manifold, not on the dimension of any ambient space, while also showing how ignoring the manifold structure can decrease the utility of the sanitized summary. We illustrate our framework in two examples of particular interest in statistics: the space of symmetric positive definite matrices, which is used for covariance matrices, and the sphere, which can be used as a space for modeling discrete distributions.
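A minimal sketch of the mechanism's main ingredients, specialized to the unit sphere $S^2$: compute the Fr\'echet mean by intrinsic gradient descent, then release a point whose density decays exponentially in geodesic distance from the mean. All function names below are ours, and calibrating the scale `sigma` to the sensitivity of the Fr\'echet mean (which the paper derives) is omitted.

```python
import numpy as np

def exp_map(p, v):
    """Riemannian exponential map on S^2: move from p along tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm

def log_map(p, q):
    """Riemannian log map on S^2: tangent vector at p pointing toward q."""
    w = q - np.dot(p, q) * p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / nw

def frechet_mean(points, steps=100, lr=0.5):
    """Intrinsic gradient descent on the Frechet functional sum_i d_g(x, p_i)^2."""
    mu = points[0]
    for _ in range(steps):
        grad = np.mean([log_map(mu, p) for p in points], axis=0)
        mu = exp_map(mu, lr * grad)
    return mu

def private_release(mu, sigma, rng):
    """Sample from density on S^2 proportional to exp(-d_g(x, mu) / sigma):
    the radial marginal is ~ sin(r) exp(-r/sigma) on [0, pi] (sin(r) is the
    volume element); the tangent direction is uniform."""
    while True:  # rejection sampling with a uniform proposal, envelope 1
        r = rng.uniform(0.0, np.pi)
        if rng.uniform() < np.sin(r) * np.exp(-r / sigma):
            break
    v = rng.standard_normal(3)
    v -= np.dot(v, mu) * mu                  # project to tangent space at mu
    return exp_map(mu, r * v / np.linalg.norm(v))

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 3)) * 0.2 + np.array([0.0, 0.0, 1.0])
data /= np.linalg.norm(data, axis=1, keepdims=True)
mu = frechet_mean(data)
print(mu, private_release(mu, sigma=0.2, rng=rng))  # sigma NOT yet DP-calibrated
```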
A method is developed here for building differentiable three-dimensional manifolds on multicube structures. This method constructs a sequence of reference metrics that determine differentiable structures on the cubic regions that serve as non-overlapping coordinate charts on these manifolds. It uses solutions to the two- and three-dimensional biharmonic equations in a sequence of steps that increase the differentiability of the reference metrics across the interfaces between cubic regions. This method is algorithmic and has been implemented in a computer code that automatically generates these reference metrics. Examples of three-manifolds constructed in this way are presented here, including representatives from five of the eight Thurston geometrization classes, plus the well-known Hantzsche-Wendt space, the Poincaré dodecahedral space, and the Seifert-Weber space.
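A hedged illustration of the biharmonic building block, not the paper's code: on a single square chart, $\nabla^4 u = 0$ splits into two second-order solves via $v = \nabla^2 u$, assuming Dirichlet data is available for both $u$ and $v$; in the actual method the boundary data would come from matching reference metrics across cube interfaces.

```python
import numpy as np

def poisson_jacobi(rhs, boundary, iters=6000):
    """Jacobi iteration for Lap u = rhs on the unit square with Dirichlet data."""
    n = rhs.shape[0]
    h2 = (1.0 / (n - 1)) ** 2
    u = boundary.copy()
    u[1:-1, 1:-1] = 0.0   # interior initial guess; edges keep the boundary data
    for _ in range(iters):
        u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                                u[1:-1, 2:] + u[1:-1, :-2] - h2 * rhs[1:-1, 1:-1])
    return u

n = 33
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
u_bc = X**2                      # toy Dirichlet data for u
v_bc = np.full((n, n), 2.0)      # matching data for v = Lap u (Lap x^2 = 2)
v = poisson_jacobi(np.zeros((n, n)), v_bc)   # step 1: Lap v = 0
u = poisson_jacobi(v, u_bc)                  # step 2: Lap u = v
print(np.abs(u - X**2).max())    # recovers u = x^2, a biharmonic function
```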
Global Positioning System receivers are now a standard module in mobile devices, and their ubiquity is fueling the rapid growth of location-based services (LBSs). This poses the risk of location privacy disclosure, and effective location privacy preservation is paramount for many mobile applications. Recently, two strong privacy notions based on statistical quantification, geo-indistinguishability and expected inference error, have been proposed and shown to be complementary for limiting the leakage of location information. In this paper, we argue that personalization means regionalization for geo-indistinguishability, and we propose a regionalized location obfuscation mechanism with personalized utility sensitivities. This substantially corrects the differential privacy flaw of the PIVE framework proposed by Yu, Liu and Pu at the ISOC Network and Distributed System Security Symposium (NDSS) in 2017. Since PIVE fails to provide differential privacy guarantees on an adaptive protection location set (PLS), as pointed out in our previous work, we develop DPIVE, which proceeds in two phases. In Phase I, we partition all possible positions into disjoint sets such that locations in the same set share a common PLS. In Phase II, we construct a probability distribution matrix via the exponential mechanism, in which the rows corresponding to the same PLS use their own utility sensitivity (the diameter of the PLS). Moreover, we improve DPIVE with a refined location partition and fine-grained personalization, in which each location has its own privacy level on two privacy control knobs: minimum inference error and the differential privacy parameter. Experiments on two public datasets demonstrate that our mechanisms achieve superior performance, particularly on skewed location distributions.
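A hedged sketch of the Phase II construction described above: each row of the obfuscation matrix is an exponential-mechanism distribution whose utility sensitivity is the diameter of that location's PLS. The function names and the $\epsilon/(2\,\mathrm{diam})$ calibration follow the generic exponential-mechanism recipe; DPIVE's exact scoring may differ in detail.

```python
import numpy as np

def diameter(points):
    """Max pairwise Euclidean distance within a PLS."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return d.max()

def obfuscation_matrix(locations, groups, eps):
    """locations: (n, 2) coordinates; groups: list of index arrays, one per PLS
    (a partition of all locations). Returns the row-stochastic matrix P."""
    n = len(locations)
    P = np.zeros((n, n))
    for idx in groups:
        diam = diameter(locations[idx])   # per-PLS utility sensitivity
        for i in idx:
            # utility of reporting z from true location i is -distance(i, z)
            scores = -np.linalg.norm(locations - locations[i], axis=1)
            w = np.exp(eps * scores / (2.0 * diam))
            P[i] = w / w.sum()
    return P

locs = np.random.default_rng(1).uniform(0.0, 1.0, size=(8, 2))
groups = [np.array([0, 1, 2, 3]), np.array([4, 5, 6, 7])]
P = obfuscation_matrix(locs, groups, eps=1.0)
print(P.sum(axis=1))  # each row is a probability distribution over reports
```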
$\newcommand{\Emph}[1]{{\it{#1}}} \newcommand{\FF}{\mathcal{F}}\newcommand{\region}{\mathsf{r}}\newcommand{\restrictY}[2]{#1 \cap {#2}}$For a set of points $P \subseteq \mathbb{R}^2$ and a family of regions $\FF$, a $\Emph{local~t-spanner}$ of $P$ is a sparse graph $G$ over $P$ such that, for any region $\region \in \FF$, the subgraph restricted to $\region$, denoted by $\restrictY{G}{\region} = G_{P \cap \region}$, is a $t$-spanner for all the points of $\region \cap P$. We present algorithms for the construction of local spanners with respect to several families of regions, such as homothets of a convex region. Unfortunately, the number of edges in the resulting graph depends logarithmically on the spread of the input point set. We prove that this dependency cannot be removed, thus settling an open problem raised by Abam and Borouny. We also show improved constructions (with no dependency on the spread) of local spanners for fat triangles and regular $k$-gons. In particular, this improves over the known construction for axis-parallel squares. We also study a somewhat weaker notion of local spanner, where one is allowed to shrink the region slightly. Any spanner is a weak local spanner if the shrinking is proportional to the region's diameter. Surprisingly, we show a near-linear-size construction of a weak local spanner for axis-parallel rectangles, where the shrinkage is $\Emph{multiplicative}$.
Linear time-varying differential-algebraic equations with symmetries are studied. The structures that we address are self-adjoint and skew-adjoint systems. Local and global canonical forms under congruence are presented and used to classify the geometric properties of the flow associated with the differential equation as symplectic or generalized orthogonal flow. As applications, the results are used to analyze dissipative Hamiltonian systems arising in circuit simulation and incompressible flow.
The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.
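A toy illustration (our construction, not the paper's exact model) of why a Lorentzian product space helps with directed edges: on a cylindrical Minkowski spacetime, the squared interval between two points is symmetric, but the sign of the time difference supplies the preferred direction that an asymmetric edge likelihood can exploit.

```python
import numpy as np

def squared_interval(p, q, radius=1.0):
    """ds^2 = -dt^2 + R^2 dtheta^2 on a cylinder: time coordinate in R,
    angular space coordinate wrapped to the shortest arc in (-pi, pi]."""
    dt = q[0] - p[0]
    dtheta = np.angle(np.exp(1j * (q[1] - p[1])))
    return -dt**2 + (radius * dtheta) ** 2

def edge_logit(p, q, alpha=5.0, beta=2.0):
    """Hypothetical direction-aware score: large when q is timelike-separated
    from p (negative interval) AND lies in p's future (positive dt)."""
    dt = q[0] - p[0]
    return -alpha * squared_interval(p, q) + beta * np.sign(dt)

p, q = np.array([0.0, 0.0]), np.array([0.5, 0.2])
# the interval is symmetric, but the logits for p->q and q->p differ:
print(squared_interval(p, q), edge_logit(p, q), edge_logit(q, p))
```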
There has recently been increasing interest in learning representations of temporal knowledge graphs (KGs), which record the dynamic relationships between entities over time. Temporal KGs often exhibit multiple simultaneous non-Euclidean structures, such as hierarchical and cyclic structures. However, existing embedding approaches for temporal KGs typically learn entity representations and their dynamic evolution in Euclidean space, which might not capture such intrinsic structures very well. To this end, we propose DyERNIE, a non-Euclidean embedding approach that learns evolving entity representations in a product of Riemannian manifolds, where the component spaces are estimated from the sectional curvatures of the underlying data. Product manifolds enable our approach to better reflect a wide variety of geometric structures on temporal KGs. Moreover, to capture the evolutionary dynamics of temporal KGs, we let the entity representations evolve according to a velocity vector defined in the tangent space at each timestamp. We analyze in detail the contribution of geometric spaces to representation learning of temporal KGs and evaluate our model on temporal knowledge graph completion tasks. Extensive experiments on three real-world datasets demonstrate significantly improved performance, indicating that the dynamics of multi-relational graph data can be more properly modeled by the evolution of embeddings on Riemannian manifolds.
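A hedged sketch of the evolution rule on a single hyperbolic factor (hyperboloid model): the embedding at time $t$ follows the geodesic from a base point along a tangent velocity vector. DyERNIE composes several such factors into a product manifold and learns base points and velocities end to end; the names below are ours.

```python
import numpy as np

def minkowski_dot(u, v):
    """Minkowski inner product <u,v> = -u0*v0 + u1*v1 + ... on R^{1,n}."""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def exp_map_hyperboloid(x, v):
    """Exponential map on the hyperboloid {x : <x,x> = -1, x0 > 0}."""
    nv = np.sqrt(max(minkowski_dot(v, v), 0.0))
    if nv < 1e-12:
        return x
    return np.cosh(nv) * x + np.sinh(nv) * v / nv

def entity_at_time(base, velocity, t):
    """Embedding at timestamp t: follow the geodesic from base with speed |v|."""
    return exp_map_hyperboloid(base, t * velocity)

base = np.array([1.0, 0.0, 0.0])   # origin of the hyperboloid H^2
vel = np.array([0.0, 0.3, -0.1])   # tangent vector at base (first coord 0)
for t in (0.0, 1.0, 2.0):
    e = entity_at_time(base, vel, t)
    print(t, e, minkowski_dot(e, e))  # last value stays ~ -1: on the manifold
```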
Training machine learning models on sensitive user data raises increasing privacy concerns in many areas. Federated learning is a popular approach for privacy protection that collects local gradient information instead of real data. One way to achieve a strict privacy guarantee is to apply local differential privacy to federated learning. However, previous works do not give a practical solution due to three issues. First, the noisy data is close to its original value with high probability, increasing the risk of information exposure. Second, a large variance is introduced into the estimated average, causing poor accuracy. Last, the privacy budget explodes due to the high dimensionality of the weights in deep learning models. In this paper, we propose a novel local differential privacy mechanism for federated learning that addresses these issues. It makes the perturbed data more distinct from the original values while introducing lower variance. Moreover, the proposed mechanism bypasses the curse of dimensionality by splitting and shuffling model updates. A series of empirical evaluations on three commonly used datasets (MNIST, Fashion-MNIST and CIFAR-10) demonstrates that our solution achieves superior deep learning performance while providing a strong privacy guarantee.
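A hedged sketch of the split-and-shuffle idea, using a standard one-bit LDP mechanism (Duchi et al.-style) as a stand-in for the paper's perturbation: each coordinate of a clipped update is perturbed and reported separately and anonymously, so shuffled reports cannot be reassembled into one high-dimensional record. The formal accounting of the resulting privacy amplification is omitted.

```python
import numpy as np

def perturb_1d(x, eps, c, r, rng):
    """One-bit eps-LDP perturbation of x in [c - r, c + r]: output one of two
    endpoints with probabilities chosen so the report is unbiased for x."""
    t = (np.exp(eps) - 1) / (np.exp(eps) + 1)
    p_high = 0.5 + (x - c) / (2.0 * r) * t
    b = 1.0 / t  # endpoint stretch factor that makes the estimate unbiased
    return c + b * r if rng.uniform() < p_high else c - b * r

def client_report(update, eps, rng):
    """Split: one (dimension_id, noisy_value) report per coordinate."""
    c, r = 0.0, 1.0  # assume updates are clipped to [-1, 1]
    return [(j, perturb_1d(np.clip(w, -r, r), eps, c, r, rng))
            for j, w in enumerate(update)]

def server_aggregate(all_reports, dim, rng):
    """Shuffle reports (breaking client linkage), then average per dimension."""
    rng.shuffle(all_reports)
    sums, counts = np.zeros(dim), np.zeros(dim)
    for j, val in all_reports:
        sums[j] += val
        counts[j] += 1
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
dim, clients = 10, 2000
true = rng.uniform(-0.5, 0.5, size=dim)
reports = [rep for _ in range(clients)
           for rep in client_report(true + 0.05 * rng.standard_normal(dim), 1.0, rng)]
est = server_aggregate(reports, dim, rng)
print(np.round(true, 2))
print(np.round(est, 2))  # unbiased estimate of the average update
```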
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fr\'echet and Dynamic Time Warping distances; these are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time over existing methods. Query time complexity is comparable to, or significantly improves upon, that of existing methods; our algorithms are especially efficient when the length of the curves is bounded.
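For concreteness, a sketch of the curve distance being indexed, under the usual monotone-traversal definition: the $\ell_p$-product of Euclidean step distances recovers DTW at $p = 1$ and tends to the discrete Fr\'echet distance as $p \to \infty$. The paper's data structures avoid running this quadratic dynamic program at query time; it is shown here only to pin down the metric.

```python
import numpy as np

def lp_curve_distance(A, B, p):
    """A: (m, d) and B: (n, d) point sequences. Returns the minimum over
    monotone traversals of (sum of per-step Euclidean distances^p)^(1/p)."""
    m, n = len(A), len(B)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1) ** p
    D = np.full((m, n), np.inf)
    D[0, 0] = cost[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            D[i, j] = cost[i, j] + best
    return D[m - 1, n - 1] ** (1.0 / p)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((5, 2)), rng.standard_normal((7, 2))
print(lp_curve_distance(A, B, p=1))   # Dynamic Time Warping
print(lp_curve_distance(A, B, p=8))   # approaches discrete Frechet
```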
Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns for data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches to differentially private ADMM exhibit low utility under high privacy guarantees and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under a high end-to-end differential privacy guarantee.
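A hedged sketch of the noisy-iterate pattern on consensus least squares: each party takes a linearized (first-order approximate) primal step on the augmented Lagrangian, perturbed by Gaussian noise whose scale varies across iterations. Calibrating `sigma_t` to an $(\epsilon, \delta)$ budget with the moments accountant, as the paper does, is omitted, and all names are ours.

```python
import numpy as np

def dp_admm(As, bs, T=100, rho=5.0, sigma0=0.5, rng=None):
    """Consensus ADMM over k parties with noisy, linearized primal steps."""
    rng = rng or np.random.default_rng(0)
    k, dim = len(As), As[0].shape[1]
    x, y, z = np.zeros((k, dim)), np.zeros((k, dim)), np.zeros(dim)
    for t in range(1, T + 1):
        sigma_t = sigma0 / t  # time-varying (here: decaying) noise scale
        for i in range(k):
            g = As[i].T @ (As[i] @ x[i] - bs[i]) / len(bs[i])  # local gradient
            g = g + sigma_t * rng.standard_normal(dim)         # Gaussian noise
            # linearized primal step: argmin_x <g,x> + y_i^T(x-z) + rho/2 |x-z|^2
            x[i] = z - (g + y[i]) / rho
        z = (x + y / rho).mean(axis=0)   # consensus (global) update
        y += rho * (x - z)               # dual update
    return z

rng = np.random.default_rng(1)
true_w = rng.standard_normal(5)
As = [rng.standard_normal((40, 5)) for _ in range(4)]
bs = [A @ true_w + 0.01 * rng.standard_normal(40) for A in As]
print(np.round(dp_admm(As, bs), 2))
print(np.round(true_w, 2))  # noisy iterates still track the consensus solution
```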
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
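A hedged simulation of the additive construction (not the paper's general CRM machinery, and without the nesting at the latent level): draw a shared random measure $\mu_0$ and group-specific measures $\mu_j$ using a crude finite-atom gamma approximation, add, and normalize; the shared atoms are what induce dependence across groups.

```python
import numpy as np

def crm_draw(n_atoms, mass, rng):
    """Finite-atom approximation of a gamma CRM with given total mass:
    uniform atom locations and Gamma(mass/n_atoms, 1) jump sizes."""
    locs = rng.uniform(0.0, 1.0, n_atoms)
    jumps = rng.gamma(mass / n_atoms, 1.0, n_atoms)
    return locs, jumps

rng = np.random.default_rng(0)
n_atoms = 200
mu0 = crm_draw(n_atoms, mass=1.0, rng=rng)      # shared component
groups = []
for j in range(2):
    muj = crm_draw(n_atoms, mass=1.0, rng=rng)  # group-specific component
    locs = np.concatenate([mu0[0], muj[0]])
    w = np.concatenate([mu0[1], muj[1]])
    groups.append((locs, w / w.sum()))           # normalized: a random pmf

# shared atoms induce dependence: both groups can place mass on mu0's atoms
for locs, probs in groups:
    print(locs[np.argmax(probs)], probs.max())
```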