Homomorphic encryption (HE), which allows computation on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, developing HE-based solutions for encrypted PaaS is a tedious task that requires a careful design, one that depends heavily on the deployment scenario and on leveraging the characteristics of modern HE schemes. Prior works on privacy-preserving PaaS focus solely on protecting the confidentiality of the client data uploaded to a remote model provider, e.g., a cloud offering a prediction API, and assume (or take advantage of the fact) that the model is held in plaintext. Furthermore, they aim either to minimize the latency of the service by processing one sample at a time, or to maximize the number of samples processed per second while handling a fixed (large) batch of samples. In this work, we present slytHErin, an agile framework that enables privacy-preserving PaaS beyond the application scenarios considered in prior works. Thanks to its hybrid design leveraging HE and its multiparty variant (MHE), slytHErin enables novel PaaS scenarios by encrypting the data, the model, or both. Moreover, slytHErin features a flexible input-data packing approach that allows processing a batch of an arbitrary number of samples, as well as several model- and setting-agnostic computation optimizations. slytHErin is implemented in Go and allows end users to perform encrypted PaaS on custom deep learning models comprising fully-connected, convolutional, and pooling layers, in a few lines of code and without having to worry about the cumbersome implementation and optimization concerns inherent to HE.
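The abstract does not spell out the packing layout, so the following is only a minimal sketch of one plausible way to pack an arbitrary-size batch of flattened samples into fixed-size ciphertext slot vectors; the function name, the row-major layout, and the zero-padding rule are illustrative assumptions, not slytHErin's actual scheme.

```python
import numpy as np

def pack_batch(samples, slot_count):
    """Pack an arbitrary number of flattened samples into fixed-size
    ciphertext slot vectors, zero-padding the last block (hypothetical layout)."""
    samples = np.asarray(samples, dtype=np.float64)
    n, d = samples.shape
    per_ct = slot_count // d              # samples that fit in one ciphertext
    blocks = []
    for start in range(0, n, per_ct):
        flat = samples[start:start + per_ct].reshape(-1)
        flat = np.pad(flat, (0, slot_count - flat.size))  # zero-pad unused slots
        blocks.append(flat)               # each block would be encrypted separately
    return blocks

# Example: 7 samples of dimension 4 packed into 16-slot ciphertexts -> 2 blocks
blocks = pack_batch(np.random.randn(7, 4), slot_count=16)
print(len(blocks), blocks[0].shape)
```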
Pini and Vantini (2017) introduced the interval-wise testing procedure, which performs local inference for functional data defined on an interval domain and outputs an adjusted p-value function that controls the type I error. We extend this idea to a general setting in which the domain is a Riemannian manifold. This requires new methodology, such as how to define adjustment sets on product manifolds and how to approximate the test statistic when the domain has non-zero curvature. We propose to use permutation tests for inference and apply the procedure in three settings: a simulation on a "chameleon-shaped" manifold and two climate-change applications in which the manifolds are a complex subset of $S^2$ and $S^2 \times S^1$, respectively. We note the trade-off between type I and type II errors: enlarging the adjustment set reduces the type I error but also yields smaller areas of significance. However, some areas remain significant even at maximal adjustment.
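The manifold construction is the paper's contribution and is not reproduced here; the sketch below only illustrates the underlying interval-wise idea in the original 1-D setting, assuming the adjusted p-value at a grid point is the maximum permutation p-value over all intervals containing that point. The test statistic, grid, and permutation counts are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def interval_pvalue(x, y, idx, n_perm=199):
    """Permutation p-value for a mean difference between two groups of
    functional data, restricted to the grid points in `idx` (one interval)."""
    stat = lambda a, b: np.abs(a[:, idx].mean() - b[:, idx].mean())
    obs = stat(x, y)
    pooled = np.vstack([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        count += stat(pooled[perm[:len(x)]], pooled[perm[len(x):]]) >= obs
    return (count + 1) / (n_perm + 1)

def adjusted_pvalues(x, y, n_perm=199):
    """Interval-wise adjustment on a 1-D grid: the adjusted p-value at each
    point is the maximum interval p-value over all intervals containing it."""
    m = x.shape[1]
    adj = np.zeros(m)
    for a in range(m):
        for b in range(a, m):
            p = interval_pvalue(x, y, np.arange(a, b + 1), n_perm)
            adj[a:b + 1] = np.maximum(adj[a:b + 1], p)
    return adj

# Toy example: two groups of curves on a 20-point grid, shifted on one half
grid = np.linspace(0, 1, 20)
x = rng.normal(size=(10, 20))
y = rng.normal(size=(10, 20)) + (grid > 0.5)
print(adjusted_pvalues(x, y).round(2))
```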
We develop an algorithmic framework that finds an optimal solution by enumerating a set of feasible solutions whose number is bounded by a specially derived Variable Parameter (VP) with favorable asymptotic behavior. We build a VP algorithm for a strongly $\mathsf{NP}$-hard single-machine scheduling problem. The target VP $\nu$ is the number of jobs with certain special properties, the so-called emerging jobs. In phase 1, a partial solution containing the $n-\nu$ non-emerging jobs is constructed in low-degree polynomial time. In phase 2, fewer than $\nu!$ permutations of the $\nu$ emerging jobs are considered, each of which is incorporated into the partial schedule of phase 1. According to an earlier experimental study, in practice $\nu/n$ varied from about $1/4$ for small problem instances to $1/10$ for the largest tested instances. We illustrate how the proposed method can be used to build a polynomial-time approximation scheme (PTAS) with worst-case time complexity $O(\kappa!\kappa k n \log n)$, where $\kappa$, $\kappa<\nu< n$, is a VP and the corresponding approximation factor is $1+1/k$, with $k\kappa<k$. This is better than the time complexity of previously known approximation schemes. Using an intuitive probabilistic model, we give more realistic bounds on the running time of the VP algorithm and the PTAS, which are far below the worst-case bounds $\nu!$ and $\kappa!$.
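For concreteness, here is a minimal sketch of the generic two-phase enumeration pattern described above. The callbacks `is_emerging`, `build_partial`, and `insert_and_score` are hypothetical placeholders for the problem-specific rules (identifying emerging jobs, building the phase-1 partial schedule, and incorporating an ordering of emerging jobs); the paper's dominance rules that keep the number of examined permutations below $\nu!$ are not modeled.

```python
from itertools import permutations

def vp_schedule(jobs, is_emerging, build_partial, insert_and_score):
    """Generic two-phase Variable-Parameter enumeration (sketch).

    Phase 1: build a partial schedule from the non-emerging jobs in
    polynomial time.  Phase 2: try at most nu! orderings of the nu
    emerging jobs, incorporating each ordering into the partial schedule
    and keeping the best complete schedule found."""
    emerging = [j for j in jobs if is_emerging(j)]        # the nu emerging jobs
    partial = build_partial([j for j in jobs if not is_emerging(j)])
    best_schedule, best_cost = None, float("inf")
    for order in permutations(emerging):                  # fewer in practice:
        schedule, cost = insert_and_score(partial, order) # dominated orderings pruned
        if cost < best_cost:
            best_schedule, best_cost = schedule, cost
    return best_schedule, best_cost
```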
Homomorphic Encryption (HE) is a cryptographic tool that allows computation to be performed under encryption, and it is used by many privacy-preserving machine learning solutions, for example to perform secure classification. Modern deep learning models achieve good performance, for example on image-processing benchmarks, in part by including many skip connections. These skip connections, however, turn out to be very costly when model inference is executed under HE. In this paper, we show that by replacing (mid-term) skip connections with (short-term) Dirac parameterization and (long-term) shared-source skip connections, we can reduce the burden of skip connections in HE-based solutions, achieving a 1.3x computing-power improvement for the same accuracy.
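To illustrate the core algebraic trick behind Dirac parameterization (here in a simplified single-channel, untrained form, not the paper's exact multi-channel formulation): the identity shortcut is folded into the convolution kernel, so conv(x, delta + W) equals x + conv(x, W) and no separate skip branch has to be evaluated under HE.

```python
import numpy as np
from scipy.signal import convolve2d

def dirac_delta(k):
    """k x k kernel that acts as the identity under 'same' convolution."""
    d = np.zeros((k, k))
    d[k // 2, k // 2] = 1.0
    return d

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))        # single-channel feature map (toy example)
w = 0.1 * rng.normal(size=(3, 3))  # learned residual kernel

# Explicit skip connection: y = x + conv(x, w)
y_skip = x + convolve2d(x, w, mode="same")

# Dirac parameterization: fold the identity into the kernel, y = conv(x, delta + w)
y_dirac = convolve2d(x, dirac_delta(3) + w, mode="same")

print(np.allclose(y_skip, y_dirac))  # True: one convolution, no separate skip branch
```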
In recent years, topic modeling has emerged as a powerful technique for organizing and summarizing large collections of documents and for searching for particular patterns in them. However, privacy concerns may arise when cross-analyzing data from different sources. Federated topic modeling solves this issue by allowing multiple parties to jointly train a topic model without sharing their data. While several federated approximations of classical topic models exist, no research has been conducted on their application to neural topic models. To fill this gap, we propose and analyze a federated implementation of state-of-the-art neural topic models, showing its benefits when the topics are diverse across the nodes' documents and a joint model is needed. In practice, our approach is equivalent to centralized model training, but it preserves the privacy of the nodes. The advantages of this federated scenario are illustrated by means of experiments on both synthetic and real data.
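The claimed equivalence to centralized training suggests synchronous gradient aggregation; the sketch below shows that pattern generically, with a toy quadratic loss standing in for a neural topic model. The round structure, weighting rule, and learning rate are assumptions for illustration, not the paper's protocol.

```python
import numpy as np

def federated_step(theta, node_batches, grad_fn, lr=0.1):
    """One synchronous round: each node computes a gradient on its own
    documents, the server averages the gradients (weighted by batch size)
    and updates the shared model -- numerically matching a centralized
    step on the pooled batch, while the documents never leave the nodes."""
    sizes = np.array([len(b) for b in node_batches], dtype=float)
    grads = [grad_fn(theta, b) for b in node_batches]   # computed locally
    avg = sum(s * g for s, g in zip(sizes, grads)) / sizes.sum()
    return theta - lr * avg

# Toy check with the loss 0.5 * ||x - theta||^2 per document
grad_fn = lambda theta, batch: np.mean([theta - x for x in batch], axis=0)
theta = np.zeros(3)
nodes = [list(np.random.randn(5, 3) + 1.0), list(np.random.randn(8, 3) - 1.0)]
theta = federated_step(theta, nodes, grad_fn)
print(theta)
```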
To efficiently exploit the massive amounts of raw data that are increasingly generated in mobile edge networks, federated learning (FL) has emerged as a promising distributed learning technique. By collaboratively training a shared learning model on edge devices, FL replaces raw data transmission and storage with the exchange of locally computed parameters/gradients, which helps address latency and privacy issues. However, the number of resource blocks required by traditional orthogonal transmission strategies for FL scales linearly with the number of participating devices, which conflicts with the scarcity of communication resources. To tackle this issue, over-the-air computation (AirComp) has recently emerged; it leverages the inherent superposition property of wireless channels to perform one-shot model aggregation. However, the aggregation accuracy of AirComp suffers from unfavorable wireless propagation environments. In this paper, we consider the use of intelligent reflecting surfaces (IRSs) to mitigate this problem and improve FL performance with AirComp. Specifically, we propose a performance-oriented design scheme that directly minimizes the optimality gap of the loss function in order to accelerate the convergence of AirComp-based FL. We first analyze the convergence behavior of the FL procedure in the presence of channel fading and noise. Based on the obtained optimality gap, which characterizes the impact of channel fading and noise in different communication rounds on the ultimate performance of FL, we propose both online and offline approaches to tackle the resulting design problem. Simulation results demonstrate that such a performance-oriented design strategy achieves higher test accuracy than the conventional isolated mean-square-error (MSE) minimization approach in FL.
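The following toy sketch illustrates the AirComp aggregation mechanism itself: devices transmit pre-scaled local updates simultaneously, the superposed received signal approximates the sum of updates after receive scaling, and the IRS phase shifts reshape the effective channels. The channel model, the weakest-device alignment rule, and the (unoptimized) phase shifts are illustrative assumptions, not the paper's performance-oriented design.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, N = 4, 6, 16                      # devices, model dimension, IRS elements

w_local = rng.normal(size=(K, d))       # local model updates
h_direct = (rng.normal(size=K) + 1j * rng.normal(size=K)) / np.sqrt(2)
G = (rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))) / np.sqrt(2)
h_irs = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)

theta = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # IRS phase shifts (not optimized)
h_eff = h_direct + G @ (theta * h_irs)              # effective device channels

# Channel-inversion transmit scaling aligned to the weakest device (toy rule)
eta = np.min(np.abs(h_eff)) ** 2
b = np.sqrt(eta) / h_eff                            # per-device pre-scaling

noise = 0.01 * (rng.normal(size=d) + 1j * rng.normal(size=d))
y = (b[:, None] * h_eff[:, None] * w_local).sum(axis=0) + noise  # superposition
w_agg = np.real(y) / (K * np.sqrt(eta))             # one-shot estimate of the average

print(np.linalg.norm(w_agg - w_local.mean(axis=0)))  # small aggregation error
```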
In secure machine learning inference, most schemes assume that the server is semi-honest (i.e., it honestly follows the protocol but attempts to infer additional information). In the real world, however, the server may be malicious, e.g., it may use a low-quality model or deviate from the protocol. Although a few studies have considered a malicious server that deviates from the protocol, they ignore the verification of model accuracy (where a malicious server uses a low-quality model) while preserving the privacy of both the server's model and the client's inputs. To address these issues, we propose \textit{Fusion}, in which the client mixes public samples (whose query results are known) with its own samples to be queried, and uses them together as the inputs of a multi-party computation that jointly performs the secure inference. Since a server that uses a low-quality model or deviates from the protocol can only produce results that the client can easily identify as incorrect, \textit{Fusion} forces the server to behave honestly, thereby addressing all of the aforementioned issues without resorting to expensive cryptographic techniques. Our evaluation indicates that \textit{Fusion} is 48.06$\times$ faster and uses 30.90$\times$ less communication than the existing maliciously secure inference protocol (which currently does not support verification of model accuracy). In addition, to show scalability, we conduct ImageNet-scale inference on the practical ResNet50 model; it costs 8.678 minutes and 10.117 GiB of communication in a WAN setting, which is 1.18$\times$ faster with 2.64$\times$ less communication than the semi-honest protocol.
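A minimal client-side sketch of the mix-and-check idea described above: interleave public samples with known results among the real queries and accept the returned batch only if the server's answers on the planted samples are (mostly) correct. The function names, the acceptance threshold, and the tagging scheme are illustrative, and the actual secure-inference protocol is abstracted away.

```python
import random

def make_query_batch(own_samples, public_samples):
    """Shuffle the client's real samples together with public 'check' samples
    (whose correct results are known) before sending them for secure inference."""
    tagged = [("own", x, None) for x in own_samples] + \
             [("check", x, y) for x, y in public_samples]
    random.shuffle(tagged)
    return tagged

def verify_and_extract(tagged, results, min_check_accuracy=0.95):
    """Reject the whole batch if the server's answers on the planted public
    samples fall below the expected accuracy; otherwise return the answers
    for the client's own samples."""
    checks = [(r == y) for (kind, _, y), r in zip(tagged, results) if kind == "check"]
    if sum(checks) / len(checks) < min_check_accuracy:
        raise RuntimeError("server output failed the check-sample test")
    return [r for (kind, _, _), r in zip(tagged, results) if kind == "own"]
```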
Our goal is to develop an efficient contact detection algorithm for large-scale GPU-based simulation of non-convex objects. Current GPU-based simulators such as IsaacGym and Brax must trade off speed against fidelity, generality, or both when simulating non-convex objects. Their main issue lies in contact detection (CD): existing CD algorithms, such as Gilbert-Johnson-Keerthi (GJK), must trade off computational speed against accuracy, and they become expensive as the number of collisions among non-convex objects increases. We propose a data-driven approach to CD whose accuracy depends only on the quality and quantity of the offline dataset rather than on online computation time. Unlike GJK, our method has an inherently uniform computational flow, which facilitates efficient GPU usage through advanced compilers such as XLA (Accelerated Linear Algebra). Further, we offer a data-efficient solution by learning the collision patterns of local object-shape crops rather than global object shapes, which are harder to learn. We demonstrate that our approach improves the efficiency of existing CD methods by a factor of 5-10 for non-convex objects with comparable accuracy. Building on previous work on contact resolution for neural-network-based contact detectors, we integrate our CD algorithm into the open-source GPU-based simulator Brax and show that we can improve efficiency over IsaacGym and generality over standard Brax. We highly recommend the videos of our simulator included in the supplementary materials.
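To make the "uniform computational flow" point concrete, here is a toy, branch-free forward pass that scores a batch of local-crop pairs with a small (untrained, randomly initialized) network: every query follows identical dense operations of fixed shape, which is what lets a compiler such as XLA fuse and batch the work efficiently. The crop representation, network size, and parameterization are placeholders, not the paper's architecture.

```python
import numpy as np

def batched_collision_scores(crop_pairs, params):
    """Branch-free, fixed-shape forward pass of a small MLP that scores
    whether each pair of local shape crops is in collision.  All queries
    share the same computational flow, so the batch reduces to a few
    dense matrix multiplies."""
    w1, b1, w2, b2 = params
    x = crop_pairs.reshape(crop_pairs.shape[0], -1)   # flatten each crop pair
    h = np.maximum(x @ w1 + b1, 0.0)                  # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))       # collision probability

# Toy shapes: 1024 queries, two 8x8x8 occupancy crops per query (untrained weights)
rng = np.random.default_rng(0)
crops = rng.random((1024, 2, 8, 8, 8))
params = (rng.normal(size=(2 * 8 ** 3, 64)) * 0.05, np.zeros(64),
          rng.normal(size=(64, 1)) * 0.05, np.zeros(1))
print(batched_collision_scores(crops, params).shape)
```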
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest, the underlying SCM is only partially observable; causal inference therefore tries to leverage any exposed information. Graph neural networks (GNNs), as universal approximators on structured input, are a viable candidate for causal learning, suggesting a tighter integration with SCMs. To this end, we present a theoretical analysis from first principles that establishes a novel connection between GNNs and SCMs while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustrations on simulations and standard benchmarks validate our theoretical proofs.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
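For concreteness, the sketch below shows uniform affine quantization, the basic building block that most of the surveyed methods refine: a real value $r$ maps to an integer $q = \mathrm{round}(r/S) + Z$ with scale $S$ and zero point $Z$. The min/max calibration rule and 4-bit setting are illustrative choices.

```python
import numpy as np

def quantize_uniform(x, num_bits=4):
    """Uniform affine quantization of a tensor to `num_bits` integers,
    returning the codes plus the (scale, zero_point) needed to dequantize."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)          # S
    zero_point = int(round(qmin - x.min() / scale))      # Z
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize_uniform(x, num_bits=4)
print("max abs error:", np.abs(x - dequantize(q, s, z)).max())
```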
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
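As a small illustration of the kind of post-training compression the abstract refers to, the sketch below applies magnitude pruning to a weight matrix at a chosen sparsity (quantization would proceed as in the previous sketch); the pruning criterion and the 90% sparsity level are illustrative, not the paper's exact recipe.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries so that roughly `sparsity`
    fraction of the weights are removed (a simple post-training compression step)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]          # k-th smallest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(512, 512)
w_pruned = magnitude_prune(w, sparsity=0.9)
print("fraction of weights kept:", np.count_nonzero(w_pruned) / w.size)
```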