We analyze to what extent end users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called ``black-box'' scenario). In particular, we investigate two notions of local differential privacy (LDP), namely {\epsilon}-LDP and R\'enyi LDP. On the one hand, we prove that, without any assumption on the underlying distributions, no algorithm can infer the level of data protection with provable guarantees; this impossibility result also holds for the central versions of the two notions of DP considered. On the other hand, we show that, under reasonable assumptions (namely, Lipschitzness of the involved densities on a closed interval), such guarantees exist and can be achieved by a simple histogram-based estimator. We validate our results experimentally and observe that, for a particularly well-behaved distribution (namely, the Laplace noise), our method performs even better than expected: in practice, the number of samples needed to achieve the desired confidence is smaller than the theoretical bound, and the estimation of {\epsilon} is more precise than predicted.
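A minimal sketch of the idea of a histogram-based black-box estimate in the Laplace case (this is an illustration, not the authors' estimator; the adjacent inputs, scale and bin count below are arbitrary choices): sample the mechanism on two adjacent inputs, histogram both output samples on a common grid, and take the largest log-ratio over bins observed in both samples.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def estimate_epsilon(x0, x1, scale, n=200_000, bins=60):
    """Histogram-based black-box estimate of the worst-case log-density ratio."""
    s0 = rng.laplace(loc=x0, scale=scale, size=n)  # mechanism outputs on input x0
    s1 = rng.laplace(loc=x1, scale=scale, size=n)  # outputs on adjacent input x1
    edges = np.linspace(min(s0.min(), s1.min()), max(s0.max(), s1.max()), bins + 1)
    p, _ = np.histogram(s0, bins=edges, density=True)
    q, _ = np.histogram(s1, bins=edges, density=True)
    mask = (p > 0) & (q > 0)          # compare only bins seen in both samples
    return float(np.max(np.abs(np.log(p[mask] / q[mask]))))

# Laplace mechanism on adjacent inputs 0 and 1 with scale 0.5: true epsilon = 1/0.5 = 2
print("estimated epsilon:", estimate_epsilon(0.0, 1.0, 0.5))
\end{verbatim}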
We study the concurrent composition properties of interactive differentially private mechanisms, whereby an adversary can arbitrarily interleave its queries to the different mechanisms. We prove that all composition theorems for non-interactive differentially private mechanisms extend to the concurrent composition of interactive differentially private mechanisms, whenever differential privacy is measured using the hypothesis testing framework of $f$-DP, which captures standard $(\eps,\delta)$-DP as a special case. We prove the concurrent composition theorem by showing that every interactive $f$-DP mechanism can be simulated by interactive post-processing of a non-interactive $f$-DP mechanism. In concurrent and independent work, Lyu~\cite{lyu2022composition} proves a similar result to ours for $(\eps,\delta)$-DP, as well as a concurrent composition theorem for R\'enyi DP. We also provide a simple proof of Lyu's concurrent composition theorem for R\'enyi DP. Lyu leaves the general case of $f$-DP as an open problem, which we solve in this paper.
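For readability, the special case mentioned above can be stated explicitly (a standard identity in the $f$-DP literature of Dong, Roth and Su, recorded here only as a point of reference): a mechanism is $(\eps,\delta)$-DP exactly when it is $f_{\eps,\delta}$-DP for the trade-off function
\[
f_{\eps,\delta}(\alpha) \;=\; \max\bigl\{\,0,\; 1-\delta-e^{\eps}\alpha,\; e^{-\eps}(1-\delta-\alpha)\,\bigr\}, \qquad \alpha\in[0,1].
\]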
A continuous-time average consensus system is a linear dynamical system defined over a graph, where each node has its own state value that evolves according to coupled linear differential equations, and each node is allowed to interact with its neighboring nodes. Average consensus is the phenomenon in which all the state values converge to the average of the initial state values. In this paper, we assume that a node can communicate with neighboring nodes through an additive white Gaussian noise channel. We first formulate the noisy average consensus system using a stochastic differential equation (SDE), which allows us to use the Euler-Maruyama method, a numerical technique for solving SDEs. By studying the stochastic behavior of the residual error of the Euler-Maruyama method, we arrive at a covariance evolution equation. The analysis of the residual error leads to a compact formula for the mean squared error (MSE), which shows that the sum of the inverse eigenvalues of the Laplacian matrix is the most dominant factor influencing the MSE. Furthermore, we propose optimization problems aimed at minimizing the MSE at a given target time, and we introduce a deep unfolding-based optimization method to solve them. The quality of the solution is validated by numerical experiments.
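A minimal Euler-Maruyama sketch of a noisy consensus system of the form dx = -Lx dt + sigma dW on a small graph (the cycle graph, noise level, step size and horizon below are illustrative; the MSE formula itself is not reproduced here):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Laplacian of a 4-node cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

sigma, dt, T = 0.1, 1e-3, 5.0   # channel noise level, step size, target time
x = rng.normal(size=4)          # initial state values
avg = x.mean()                  # consensus value of the noiseless system

for _ in range(int(T / dt)):
    # Euler-Maruyama update: drift -L x plus additive Gaussian channel noise
    x = x - dt * (L @ x) + sigma * np.sqrt(dt) * rng.normal(size=4)

mse = np.mean((x - avg) ** 2)   # residual error w.r.t. the initial average
print("MSE at target time:", mse)
\end{verbatim}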
Long-run covariance matrix estimation is a building block of time series inference problems. The corresponding difference-based estimator, which avoids detrending, has attracted considerable interest due to its robustness to both smooth and abrupt structural breaks and its competitive finite-sample performance. However, existing methods mainly focus on estimators for univariate processes, while their direct multivariate extensions for most linear models are asymptotically biased. We propose a novel difference-based and debiased long-run covariance matrix estimator for functional linear models with time-varying regression coefficients, allowing for time series non-stationarity, long-range dependence, state-heteroscedasticity and their mixtures. We apply the new estimator to (i) the structural stability test, overcoming the notorious non-monotonic power phenomenon caused by piecewise smooth alternatives for regression coefficients, and (ii) nonparametric residual-based tests for long memory, improving their performance via the residual-free formula of the proposed estimator. The effectiveness of the proposed method is justified theoretically and demonstrated by superior performance in simulation studies, while its usefulness is illustrated by a real data analysis.
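As background for the building block itself, the sketch below computes the classical Bartlett-kernel (Newey-West type) long-run variance of a univariate series; it is context only and is neither difference-based nor debiased in the sense of the proposed estimator, and the AR(1) example and bandwidth are arbitrary.
\begin{verbatim}
import numpy as np

def bartlett_lrv(x, m):
    """Bartlett-kernel long-run variance: gamma_0 + 2*sum_k (1 - k/(m+1))*gamma_k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    lrv = xc @ xc / n                      # gamma_0
    for k in range(1, m + 1):
        gamma_k = xc[k:] @ xc[:-k] / n     # lag-k sample autocovariance
        lrv += 2.0 * (1.0 - k / (m + 1)) * gamma_k
    return lrv

# toy AR(1) series with coefficient 0.5; its true long-run variance is 1/(1-0.5)**2 = 4
rng = np.random.default_rng(2)
e = rng.normal(size=20_000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.5 * x[t - 1] + e[t]
print("Bartlett long-run variance estimate:", bartlett_lrv(x, m=30))
\end{verbatim}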
Here, we investigate whether (and how) experimental design can aid the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment and prior knowledge about the effect. Estimation of the precision matrix is a fundamental task for inferring biological graphical structures such as microbial networks. We compare the marginal posterior precision of the precision matrix under four priors: flat, conjugate Normal-Wishart, Normal-MGIG and a general independent prior. Under the flat and conjugate priors, the Laplace-approximated posterior precision is not a function of the design matrix, rendering useless any effort to find an optimal experimental design to infer the precision matrix. In contrast, the Normal-MGIG and general independent priors do allow for the search of optimal experimental designs, yet there is a sharp upper bound on the information that can be extracted from a given experiment. We confirm our theoretical findings via a simulation study comparing (i) the KL divergence between prior and posterior and (ii) the Stein's loss difference of the MAP estimators between a random experiment and no experiment. Our findings provide practical advice for domain scientists conducting experiments to better infer the precision matrix as a representation of a biological network.
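The two comparison criteria mentioned above can be written down concisely. The sketch below gives generic implementations (one common convention each, not tied to the specific priors or to the Laplace approximation): the KL divergence between two multivariate Gaussians, and Stein's (entropy) loss for a precision-matrix estimate.
\begin{verbatim}
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) between two multivariate Gaussians."""
    p = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - p
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def stein_loss(omega_hat, omega):
    """Stein's (entropy) loss of a precision-matrix estimate omega_hat w.r.t. omega."""
    p = omega.shape[0]
    M = omega_hat @ np.linalg.inv(omega)
    return np.trace(M) - np.log(np.linalg.det(M)) - p
\end{verbatim}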
Stochastic gradient descent (SGD) is a scalable and memory-efficient optimization algorithm for large datasets and streaming data, and it has attracted a great deal of attention. The application of SGD-based estimators to statistical inference, such as interval estimation, has also achieved great success. However, most related works are based on i.i.d. observations or Markov chains. When the observations come from a mixing time series, how to conduct valid statistical inference remains unexplored. In fact, the general correlation among observations imposes a challenge on interval estimation: most existing methods ignore this correlation, which may lead to invalid confidence intervals. In this paper, we propose a mini-batch SGD estimator for statistical inference when the data are $\phi$-mixing. The confidence intervals are constructed using an associated mini-batch bootstrap SGD procedure. Using the ``independent block'' trick from \cite{yu1994rates}, we show that the proposed estimator is asymptotically normal and that its limiting distribution can be effectively approximated by the bootstrap procedure. The proposed method is memory-efficient and easy to implement in practice. Simulation studies on synthetic data and an application to a real-world dataset confirm our theory.
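A minimal sketch of the two ingredients described above, for a linear model with dependent (AR(1)) errors: a mini-batch SGD estimate of the regression coefficient, and a moving-block bootstrap that re-runs SGD on resampled blocks to form a percentile confidence interval. This is an illustration, not the paper's exact procedure; step size, batch size, block length and replication count are arbitrary choices.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)

def minibatch_sgd(X, y, lr=0.05, batch=32, epochs=5):
    """Plain mini-batch SGD for least squares."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for s in range(0, n, batch):
            Xb, yb = X[s:s + batch], y[s:s + batch]
            theta -= lr * Xb.T @ (Xb @ theta - yb) / len(yb)
    return theta

def block_bootstrap_ci(X, y, block=50, reps=200, alpha=0.05):
    """Moving-block bootstrap CI for the SGD estimate under dependent data."""
    n = len(y)
    n_blocks = int(np.ceil(n / block))
    est = []
    for _ in range(reps):
        starts = rng.integers(0, n - block, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block) for s in starts])[:n]
        est.append(minibatch_sgd(X[idx], y[idx]))
    return np.quantile(np.array(est), [alpha / 2, 1 - alpha / 2], axis=0)

# toy data: y = 2*x + AR(1) noise (a simple mixing error process)
n = 2000
x = rng.normal(size=n)
e = np.zeros(n)
u = rng.normal(size=n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + u[t]
X, y = x[:, None], 2.0 * x + e

print("point estimate:", minibatch_sgd(X, y))
print("95% bootstrap CI:", block_bootstrap_ci(X, y))
\end{verbatim}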
In this paper we model the size-effects of metamaterial beams under bending with the aid of the relaxed micromorphic continuum. We analyze first the size-dependent bending stiffness of heterogeneous fully discretized metamaterial beams subjected to pure bending loads. Two equivalent loading schemes are introduced which lead to a constant moment along the beam length with no shear force. The relaxed micromorphic model is employed then to retrieve the size-effects. We present a procedure for the determination of the material parameters of the relaxed micromorphic model based on the fact that the model operates between two well-defined scales. These scales are given by linear elasticity with micro and macro elasticity tensors which bound the relaxed micromorphic continuum from above and below, respectively. The micro elasticity tensor is specified as the maximum possible stiffness that is exhibited by the assumed metamaterial while the macro elasticity tensor is given by standard periodic first-order homogenization. For the identification of the micro elasticity tensor, two different approaches are shown which rely on affine and non-affine Dirichlet boundary conditions of candidate unit cell variants with the possible stiffest response. The consistent coupling condition is shown to allow the model to act on the whole intended range between macro and micro elasticity tensors for both loading cases. We fit the relaxed micromorphic model against the fully resolved metamaterial solution by controlling the curvature magnitude after linking it with the specimen's size. The obtained parameters of the relaxed micromorphic model are tested for two additional loading scenarios.
We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, the usual likelihood approach can fail to estimate the target distribution consistently due to this singularity. We prove that a novel and effective solution exists: perturbing the data with instance noise leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights into how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data-perturbation technique improves the estimation performance significantly.
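A minimal sketch of the data-perturbation step described above (the circle embedded in three dimensions and the noise level are illustrative choices; the deep generative model itself is not reproduced): samples supported on a low-dimensional structure are perturbed with Gaussian instance noise so that the perturbed law has a density with respect to Lebesgue measure, making a likelihood-based fit well-posed.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)

def sample_circle(n):
    """Samples supported on a 1-dimensional manifold embedded in R^3."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta), np.zeros(n)], axis=1)

def perturb(x, sigma=0.05):
    """Instance-noise perturbation: the perturbed data have a Lebesgue density."""
    return x + sigma * rng.normal(size=x.shape)

x = sample_circle(10_000)
x_tilde = perturb(x)
# a likelihood-based deep generative model would now be trained on x_tilde
print(x_tilde.shape, x_tilde.std(axis=0))
\end{verbatim}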
This tutorial studies the relationships between differential privacy and various information-theoretic measures through several selected articles. In particular, we present how these connections can provide new interpretations of the privacy guarantee in systems that deploy differential privacy within an information-theoretic framework. To this end, the tutorial provides an extensive summary of the existing literature that makes use of information-theoretic measures and tools, such as mutual information, min-entropy, Kullback-Leibler divergence and the rate-distortion function, for the quantification and characterization of differential privacy in various settings.
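As one concrete instance of the kind of connection surveyed here, the sketch below evaluates, for binary randomized response with privacy parameter eps (the value 1.0 is illustrative), the Kullback-Leibler divergence between the output distributions induced by two adjacent inputs and checks that it stays below eps.
\begin{verbatim}
import numpy as np

def randomized_response_prob(eps):
    """P(report = true bit) for eps-DP binary randomized response."""
    return np.exp(eps) / (1.0 + np.exp(eps))

def kl_bernoulli(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

eps = 1.0
p = randomized_response_prob(eps)
# Output distributions for the two adjacent inputs are Bernoulli(p) and Bernoulli(1-p).
kl = kl_bernoulli(p, 1.0 - p)
print(f"KL between adjacent output distributions: {kl:.4f} <= eps = {eps}")
\end{verbatim}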
Images are increasingly being shared by software developers in diverse channels, including question-and-answer forums like Stack Overflow. Although prior work has pointed out that these images are meaningful and provide complementary information compared to their associated text, how images are used to support questions is empirically unknown. To address this knowledge gap, we conduct an empirical study to investigate (i) the characteristics of images, (ii) the extent to which images are used in different question types, and (iii) the role of images in receiving answers. Our results first show that user interfaces are the most common image content and undesired output is the most frequent purpose for sharing images. Moreover, these images essentially facilitate the understanding of 68% of the sampled questions. Second, we find that discrepancy questions are relatively more frequent among questions with images than among those without, but we observe no significant differences in description length across question types. Third, our quantitative results statistically validate that questions with images are more likely to receive accepted answers, but images do not speed up the time to receive an answer. Our work demonstrates the crucial role that images play by approaching the topic from a new angle, and it lays the foundation for future opportunities to use images to assist in tasks like generating questions and identifying question relatedness.
The solution of the governing equation representing the drawdown in a horizontal confined aquifer, where groundwater flow is unsteady, is given in terms of the exponential integral, famously known as the Well function. For the computation of this function in practical applications, it is important to develop an approximation that is not only accurate but also simple, requiring the evaluation of as few terms as possible. To that end, this work proposes a full-range approximation to the exponential integral that uses Ramanujan's series for small arguments ($u \leq 1$) and an approximation based on a bound of the integral for the remaining range ($u \in (1,100]$). The proposed approximation is more accurate than those of existing studies, with a maximum percentage error of 0.05\%. Further, the proposed formula is much simpler to apply, as it contains just the product of exponential and logarithm functions. To further check its efficiency, we consider a practical example of evaluating the discrete pumping kernel, which shows the superiority of this approximation over the others. Finally, the authors hope that the proposed efficient approximation will be useful for groundwater and hydrogeological applications.
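A sketch along the lines described above (not the paper's exact formulae): the Well function $W(u)=E_1(u)$ is evaluated through a Ramanujan-type series for $u \leq 1$, and, as a stand-in for the bound-based tail formula that is not reproduced here, through the classical exponential-logarithm upper bound $e^{-u}\ln(1+1/u)$ for $u > 1$; results are checked against scipy.special.exp1.
\begin{verbatim}
import numpy as np
from math import factorial
from scipy.special import exp1   # reference implementation of E1(u)

EULER_GAMMA = 0.5772156649015329

def well_ramanujan(u, terms=30):
    """E1(u) via a Ramanujan-type series (intended for the range u <= 1)."""
    inner, acc = 0.0, 0.0
    for n in range(1, terms + 1):
        if n % 2 == 1:               # running sum 1 + 1/3 + 1/5 + ...
            inner += 1.0 / n
        acc += u**n / (factorial(n) * 2.0**(n - 1)) * inner
    return -EULER_GAMMA - np.log(u) + np.exp(-u / 2.0) * acc

def well_tail(u):
    """Illustrative stand-in for the bound-based tail formula (1 < u <= 100):
    the classical upper bound exp(-u) * log(1 + 1/u) on E1(u)."""
    return np.exp(-u) * np.log(1.0 + 1.0 / u)

for u in (0.1, 0.5, 1.0, 2.0, 10.0):
    approx = well_ramanujan(u) if u <= 1.0 else well_tail(u)
    print(f"u = {u:6.2f}   approx = {approx:.6e}   exp1 = {exp1(u):.6e}")
\end{verbatim}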