The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far from satisfying it in the $\ell_1$ distance. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted. This work focuses on the connection of the sample complexities of non-tolerant ("traditional") testing of distributions and tolerant testing thereof. When limiting our scope to label-invariant (symmetric) properties of distribution, we prove that the gap is at most quadratic. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap. When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, then it cannot be non-tolerantly tested with $o(\sqrt{n})$ many samples, where $n$ denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using $\mathcal{O}(n)$ many samples. Being non-concentrated is a strong requirement on the property, as we also prove a close to linear lower bound against their tolerant tests. To provide evidence for other general cases (where the properties are not necessarily label-invariant), we show that if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size $s$ of the universe, then it can be learned using only $\mathcal{O}(s)$ many samples. The learning procedure adapts to the input, and works without knowing $s$ in advance.
Discretization of the uniform norm of functions from a given finite dimensional subspace of continuous functions is studied. We pay special attention to the case of trigonometric polynomials with frequencies from an arbitrary finite set with fixed cardinality. We give two different proofs of the fact that for any $N$-dimensional subspace of the space of continuous functions it is sufficient to use $e^{CN}$ sample points for an accurate upper bound for the uniform norm. Previous known results show that one cannot improve on the exponential growth of the number of sampling points for a good discretization theorem in the uniform norm. Also, we prove a general result, which connects the upper bound on the number of sampling points in the discretization theorem for the uniform norm with the best $m$-term bilinear approximation of the Dirichlet kernel associated with the given subspace. We illustrate application of our technique on the example of trigonometric polynomials.
A novel fault-tolerant computation technique based on array Belief Propagation (BP)-decodable XOR (BP-XOR) codes is proposed for distributed matrix-matrix multiplication. The proposed scheme is shown to be configurable and suited for modern hierarchical compute architectures such as Graphical Processing Units (GPUs) equipped with multiple nodes, whereby each has many small independent processing units with increased core-to-core communications. The proposed scheme is shown to outperform a few of the well--known earlier strategies in terms of total end-to-end execution time while in presence of slow nodes, called $stragglers$. This performance advantage is due to the careful design of array codes which distributes the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. An interesting trade-off between end-to-end latency and total communication cost is precisely described. In addition, to be able to address an identified problem of scaling stragglers, an asymptotic version of array BP-XOR codes based on projection geometry is proposed at the expense of some computation overhead. A thorough latency analysis is conducted for all schemes to demonstrate that the proposed scheme achieves order-optimal computation in both the sublinear as well as the linear regimes in the size of the computed product from an end-to-end delay perspective.
We study solution methods for (strongly-)convex-(strongly)-concave Saddle-Point Problems (SPPs) over networks of two type - master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $\epsilon>0$ is achieved over master/workers networks in $\Omega\big(\Delta\cdot \delta/\mu\cdot \log (1/\varepsilon)\big)$ rounds of communications, where $\delta>0$ measures the degree of similarity of the local functions, $\mu$ is their strong convexity constant, and $\Delta$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $\Omega\big(1/{\sqrt{\rho}} \cdot {\delta}/{\mu}\cdot\log (1/\varepsilon)\big)$, where $\rho$ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either types of networks (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
We investigate the complexity of computing the Zariski closure of a finitely generated group of matrices. The Zariski closure was previously shown to be computable by Derksen, Jeandel, and Koiran, but the termination argument for their algorithm appears not to yield any complexity bound. In this paper we follow a different approach and obtain a bound on the degree of the polynomials that define the closure. Our bound shows that the closure can be computed in elementary time. We also obtain upper bounds on the length of chains of linear algebraic groups, where all the groups are generated over a fixed number field.
We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence. We report the corresponding scalar functions for the total variation distance, the Kullback-Leibler divergence, the squared Hellinger divergence, and the Jensen-Shannon divergence among others. Next, we give conditions to expand the $f$-divergences as converging infinite series of higher-order power chi divergences, and illustrate the criterion for converging Taylor series expressing the $f$-divergences between Cauchy distributions. We then show that the symmetric property of $f$-divergences holds for multivariate location-scale families with prescribed matrix scales provided that the standard density is even which includes the cases of the multivariate normal and Cauchy families. However, the $f$-divergences between multivariate Cauchy densities with different scale matrices are shown asymmetric. Finally, we present several metrizations of $f$-divergences between univariate Cauchy distributions and further report geometric embedding properties of the Kullback-Leibler divergence.
We consider a standard distributed optimisation setting where $N$ machines, each holding a $d$-dimensional function $f_i$, aim to jointly minimise the sum of the functions $\sum_{i = 1}^N f_i (x)$. This problem arises naturally in large-scale distributed optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. We focus on the communication complexity of this problem: our main result provides the first fully unconditional bounds on total number of bits which need to be sent and received by the $N$ machines to solve this problem under point-to-point communication, within a given error-tolerance. Specifically, we show that $\Omega( Nd \log d / N\varepsilon)$ total bits need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i (x)$. The result holds for both deterministic and randomised algorithms, and, importantly, requires no assumptions on the algorithm structure. The lower bound is tight under certain restrictions on parameter values, and is matched within constant factors for quadratic objectives by a new variant of quantised gradient descent, which we describe and analyse. Our results bring over tools from communication complexity to distributed optimisation, which has potential for further applications.
We consider a situation where the distribution of a random variable is being estimated by the empirical distribution of noisy measurements of that variable. This is common practice in, for example, teacher value-added models and other fixed-effect models for panel data. We use an asymptotic embedding where the noise shrinks with the sample size to calculate the leading bias in the empirical distribution arising from the presence of noise. The leading bias in the empirical quantile function is equally obtained. These calculations are new in the literature, where only results on smooth functionals such as the mean and variance have been derived. We provide both analytical and jackknife corrections that recenter the limit distribution and yield confidence intervals with correct coverage in large samples. Our approach can be connected to corrections for selection bias and shrinkage estimation and is to be contrasted with deconvolution. Simulation results confirm the much-improved sampling behavior of the corrected estimators. An empirical illustration on heterogeneity in deviations from the law of one price is equally provided.
Distributed certification, whether it be proof-labeling schemes, locally checkable proofs, etc., deals with the issue of certifying the legality of a distributed system with respect to a given boolean predicate. A certificate is assigned to each process in the system by a non-trustable oracle, and the processes are in charge of verifying these certificates, so that two properties are satisfied: completeness, i.e., for every legal instance, there is a certificate assignment leading all processes to accept, and soundness, i.e., for every illegal instance, and for every certificate assignment, at least one process rejects. The verification of the certificates must be fast, and the certificates themselves must be small. A large quantity of results have been produced in this framework, each aiming at designing a distributed certification mechanism for specific boolean predicates. This paper presents a "meta-theorem", applying to many boolean predicates at once. Specifically, we prove that, for every boolean predicate on graphs definable in the monadic second-order (MSO) logic of graphs, there exists a distributed certification mechanism using certificates on $O(\log^2n)$ bits in $n$-node graphs of bounded treewidth, with a verification protocol involving a single round of communication between neighbors.
The analysis of the spectral features of a Toeplitz matrix-sequence $\left\{T_{n}(f)\right\}_{n\in\mathbb N}$, generated by a symbol $f\in L^1([-\pi,\pi])$, real-valued almost everywhere (a.e.), has been provided in great detail in the last century, as well as the study of the conditioning, when $f$ is nonnegative a.e. Here we consider a novel type of problem arising in the numerical approximation of distributed-order fractional differential equations (FDEs), where the matrices under consideration take the form \[ \mathcal{T}_{n}=c_0T_{n}(f_0)+c_{1} h^h T_{n}(f_{1})+c_{2} h^{2h} T_{n}(f_{2})+\cdots+c_{n-1} h^{(n-1)h}T_{n}(f_{n-1}), \] $c_0,c_{1},\ldots, c_{n-1} \in [c_*,c^*]$, $c^*\ge c_*>0$, independent of $n$, $h=\frac{1}{n}$, $f_j\sim g_j$, $g_j=|\theta|^{2-jh}$, $j=0,\ldots,n-1$. Since the resulting generating function depends on $n$, the standard theory cannot be applied and the analysis has to be performed using new ideas. Few selected numerical experiments are presented, also in connection with matrices that come from distributed-order FDE problems, and the adherence with the theoretical analysis is discussed together with open questions and future investigations.
We consider the problem of detecting latent community information of mixed membership weighted network in which nodes have mixed memberships and edges connecting between nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model has no distribution constraints of edges but only the expected values, and can be viewed as generalizations of some previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using delicate spectral analysis. We demonstrate the advantages of mixed membership distribution-free model with applications to a small scale of simulated networks when edges follow different distributions.