The group testing problem consists of determining a small set of defective items from a larger set of items based on a number of possibly-noisy tests, and is relevant in applications such as medical testing, communication protocols, pattern matching, and more. We study the noisy version of this problem, where the outcome of each standard noiseless group test is subject to independent noise, corresponding to passing the noiseless result through a binary channel. We introduce a class of algorithms that we refer to as Near-Definite Defectives (NDD), and study bounds on the required number of tests for asymptotically vanishing error probability under Bernoulli random test designs. In addition, we study algorithm-independent converse results, giving lower bounds on the required number of tests under Bernoulli test designs. Under reverse Z-channel noise, the achievable rates and converse results match in a broad range of sparsity regimes, and under Z-channel noise, the two match in a narrower range of dense/low-noise regimes. We observe that although these two channels have the same Shannon capacity when viewed as a communication channel, they can behave quite differently when it comes to group testing. Finally, we extend our analysis of these noise models to a general binary noise model (including symmetric noise), and show improvements over known existing bounds in broad scaling regimes.
Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick to design novel algorithms that perform well in practice. In this paper, we systematically survey the types of theoretical analysis techniques that have been applied to edit distance and evaluate the extent to which each one has achieved these two goals. These techniques include traditional worst-case analysis, worst-case analysis parametrized by edit distance or entropy or compressibility, average-case analysis, semi-random models, and advice-based models. We find that the track record is mixed. On one hand, two algorithms widely used in practice have been born out of theoretical analysis and their empirical performance is captured well by theoretical predictions. On the other hand, all the algorithms developed using theoretical analysis as a yardstick since then have not had any practical relevance. We conclude by discussing the remaining open problems and how they can be tackled.
Graph Convolutional Networks (GCNs) are one of the most popular architectures that are used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}{\rm deg}}$, where $\mathbb{E}{\rm deg}$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that the performance is mutually similar for all combinations of the placement. We present extensive experiments on both synthetic and real-world data that illustrate our results.
In this paper we generalize Dillon's switching method to characterize the exact $c$-differential uniformity of functions constructed via this method. More precisely, we modify some PcN/APcN and other functions with known $c$-differential uniformity in a controllable number of coordinates to render more such functions. We present several applications of the method in constructing PcN and APcN functions with respect to all $c\neq 1$. As a byproduct, we generalize some result of [Y. Wu, N. Li, X. Zeng, {\em New PcN and APcN functions over finite fields}, Designs Codes Crypt. 89 (2021), 2637--2651]. Computational results rendering functions with low differential uniformity, as well as, other good cryptographic properties are sprinkled throughout the paper.
Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
Over the years, many graph problems specifically those in NP-complete are studied by a wide range of researchers. Some famous examples include graph colouring, travelling salesman problem and subgraph isomorphism. Most of these problems are typically addressed by exact algorithms, approximate algorithms and heuristics. There are however some drawback for each of these methods. Recent studies have employed learning-based frameworks such as machine learning techniques in solving these problems, given that they are useful in discovering new patterns in structured data that can be represented using graphs. This research direction has successfully attracted a considerable amount of attention. In this survey, we provide a systematic review mainly on classic graph problems in which learning-based approaches have been proposed in addressing the problems. We discuss the overview of each framework, and provide analyses based on the design and performance of the framework. Some potential research questions are also suggested. Ultimately, this survey gives a clearer insight and can be used as a stepping stone to the research community in studying problems in this field.
Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation for multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods with exact solution of the coarse-grid system. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of exact coarse solver, we establish an elegant identity for characterizing two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.
There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update introduces bias depending on the clipping threshold and the number of local steps in FL, and analyzing this is not easy. For Lipschitz functions, the Lipschitz constant serves as a trivial clipping threshold with zero bias. However, Lipschitzness does not hold in many practical settings; moreover, verifying it and computing the Lipschitz constant is hard. Thus, the choice of the clipping threshold is non-trivial and requires a lot of tuning in practice. In this paper, we provide the first convergence result for private FL on smooth \textit{convex} objectives \textit{for a general clipping threshold} -- \textit{without assuming Lipschitzness}. We also look at a simpler alternative to clipping (for bounding sensitivity) which is \textit{normalization} -- where we use only a scaled version of the unit vector along the client updates, completely discarding the magnitude information. {The resulting normalization-based private FL algorithm is theoretically shown to have better convergence than its clipping-based counterpart on smooth convex functions. We corroborate our theory with synthetic experiments as well as experiments on benchmarking datasets.
There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.
In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.
This paper takes a different approach for the distributed linear parameter estimation over a multi-agent network. The parameter vector is considered to be stochastic with a Gaussian distribution. The sensor measurements at each agent are linear and corrupted with additive white Gaussian noise. Under such settings, this paper presents a novel distributed estimation algorithm that fuses the the concepts of consensus and innovations by incorporating the consensus terms (of neighboring estimates) into the innovation terms. Under the assumption of distributed parameter observability, introduced in this paper, we design the optimal gain matrices such that the distributed estimates are consistent and achieves fast convergence.