We develop a simple and generic method to analyze randomized rumor spreading processes in fully connected networks. In contrast to all previous works, which heavily exploit the precise definition of the process under investigation, we only need to understand the probability and the covariance of the events that uninformed nodes become informed. This universality allows us to easily analyze the classic push, pull, and push-pull protocols both in their pure version and in several variations such as messages failing with constant probability or nodes calling a random number of others each round. Some dynamic models can be analyzed as well, e.g., when the network is a $G(n,p)$ random graph sampled independently each round [Clementi et al. (ESA 2013)]. Despite this generality, our method determines the expected rumor spreading time precisely apart from additive constants, which is more precise than almost all previous works. We also prove tail bounds showing that a deviation from the expectation by an additive term of more than $r$ rounds occurs with probability at most $\exp(-\Omega(r))$. We further use our method to discuss the common assumption that nodes can answer any number of incoming calls. We observe that the restriction that only one call can be answered leads to a significant increase of the runtime of the push-pull protocol. In particular, the double logarithmic end phase of the process now takes logarithmic time. This also increases the message complexity from the asymptotically optimal $\Theta(n \log\log n)$ [Karp, Shenker, Schindelhauer, V\"ocking (FOCS 2000)] to $\Theta(n \log n)$. We propose a simple variation of the push-pull protocol that restores the double logarithmic end phase and thus the $\Theta(n \log\log n)$ message complexity.
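To make the three protocols concrete, the following is a minimal simulation sketch, assuming synchronous rounds on the complete graph with a single initially informed node; it illustrates the protocols themselves, not the paper's analysis method, and all names are illustrative.

```python
import random

def spread(n, mode="push-pull", seed=0):
    """Simulate one run of a rumor spreading protocol on the complete graph K_n.

    Returns the number of rounds until every node is informed.
    """
    rng = random.Random(seed)
    informed = {0}                      # node 0 starts with the rumor
    rounds = 0
    while len(informed) < n:
        newly = set()
        for v in range(n):
            u = rng.randrange(n - 1)    # uniform random partner u != v
            if u >= v:
                u += 1
            if mode in ("push", "push-pull") and v in informed:
                newly.add(u)            # push: informed caller informs its partner
            if mode in ("pull", "push-pull") and v not in informed and u in informed:
                newly.add(v)            # pull: uninformed caller learns from its partner
        informed |= newly               # updates take effect in the next round
        rounds += 1
    return rounds

if __name__ == "__main__":
    n = 10_000
    for mode in ("push", "pull", "push-pull"):
        runs = [spread(n, mode, seed=s) for s in range(20)]
        print(mode, sum(runs) / len(runs))
```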
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policy for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish linear rates of convergence to global optimality, or sublinear rates of convergence to stationarity, for these methods when applied to different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces. To the best of our knowledge, both these algorithmic frameworks and their convergence analyses appear to be new in the literature.
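For intuition only, the sketch below shows the closed-form KL-based policy mirror descent update in the tabular case with exact Q-values; the paper's point is precisely to go beyond this setting, so the cost convention, stepsize, and update form here are illustrative assumptions.

```python
import numpy as np

def pmd_step(pi, Q, eta):
    """One KL-based policy mirror descent update in the tabular case.

    pi:  (S, A) array of action probabilities per state.
    Q:   (S, A) array of state-action values (treated as costs) under pi.
    eta: stepsize.
    """
    # With the KL divergence as Bregman distance, the mirror step reduces to
    # an exponentiated, re-normalized multiplicative update per state.
    logits = np.log(pi) - eta * Q                     # minus sign: Q is a cost
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```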
Discovering causal relations from observational data is an important task, but the existence of unobserved variables (e.g., latent confounders or mediators) can mislead causal identification. To overcome this problem, proximal causal discovery methods adjust for the bias via a proxy of the unobserved variable. In particular, hypothesis-test-based methods identify the causal edge by testing the induced violation of linearity. However, these methods apply only to discrete data with strict constraints on the number of levels, which limits their use in practice. In this paper, we address this problem by extending the proximal hypothesis test to systems consisting of continuous variables. Our strategy is to present regularity conditions on the conditional distributions of the observed variables given the hidden factor, such that if we discretize its observed proxy into sufficiently fine, finite bins, the resulting discretization error can be effectively controlled. Based on this, we can convert the problem of testing continuous causal relations into that of testing discrete causal relations in each bin, which can be solved with existing methods. The non-parametric regularity conditions we present are mild and are satisfied by a wide range of structural causal models. Using both simulated and real-world data, we show the effectiveness of our method in recovering causal relations in the presence of unobserved variables.
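As a purely schematic sketch of the reduction (not the paper's exact procedure), one can discretize the observed proxy into quantile bins and delegate each bin to an existing discrete proximal test; the callable `discrete_test`, the bin count, the minimum bin size, and the Bonferroni aggregation below are all illustrative assumptions.

```python
import numpy as np

def binned_proximal_test(x, y, proxy, discrete_test, n_bins=10, min_bin_size=30):
    """Reduce a continuous proximal hypothesis test to per-bin discrete tests.

    `discrete_test(x_bin, y_bin)` is a placeholder for any existing discrete
    proximal hypothesis test returning a p-value; it is not implemented here.
    """
    edges = np.quantile(proxy, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(proxy, edges[1:-1])            # bin index in {0, ..., n_bins-1}
    pvals = []
    for b in range(n_bins):
        mask = bins == b
        if mask.sum() < min_bin_size:                 # skip bins with too few samples
            continue
        pvals.append(discrete_test(x[mask], y[mask]))
    # Bonferroni-style aggregation across bins (one conservative choice).
    return min(1.0, min(pvals) * len(pvals))
```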
Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augment the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained optimization task involving shape optimization of rotor blades in turbo-machinery.
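A minimal sketch of the bootstrapped randomized-prior ensemble idea is shown below, written in PyTorch; the architecture, widths, loss, and training loop are illustrative choices and should not be read as the paper's implementation.

```python
import torch
import torch.nn as nn

class RandomizedPriorNet(nn.Module):
    """Trainable network plus a frozen, randomly initialized 'prior' network.

    Predictions are f(x) = trainable(x) + beta * prior(x); only the trainable
    part is fitted, so disagreement across ensemble members reflects
    epistemic uncertainty away from the data.
    """
    def __init__(self, d_in, d_out, width=64, beta=1.0):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                 nn.Linear(width, width), nn.ReLU(),
                                 nn.Linear(width, d_out))
        self.trainable, self.prior, self.beta = mlp(), mlp(), beta
        for p in self.prior.parameters():          # the prior is never trained
            p.requires_grad_(False)

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)

def fit_ensemble(X, Y, n_members=8, epochs=500, lr=1e-3):
    """Fit an ensemble of randomized-prior networks on bootstrap resamples."""
    members = []
    for _ in range(n_members):
        idx = torch.randint(0, len(X), (len(X),))  # bootstrap resample of the data
        net = RandomizedPriorNet(X.shape[1], Y.shape[1])
        opt = torch.optim.Adam(net.trainable.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = ((net(X[idx]) - Y[idx]) ** 2).mean()
            loss.backward()
            opt.step()
        members.append(net)
    return members   # predictive mean/spread come from the members' outputs
```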
We consider a network of $n$ user nodes that receives updates from a source and employs an age-based gossip protocol for faster dissemination of version updates to all nodes. When a node forwards its packet to another node, the packet information gets mutated with probability $p$ during transmission, creating misinformation. The receiver node does not know whether an incoming packet's information differs from the information originally stored at the sender node. We assume that truth prevails over misinformation: when a receiver encounters both accurate information and misinformation corresponding to the same version, the accurate information is chosen for storage at the node. Using stochastic hybrid systems (SHS) modelling, we characterize the expected fraction of nodes with correct information in the network and the version age at the nodes, and study their properties. We observe that very high or very low gossiping rates help curb misinformation, while misinformation spread is highest at moderate gossiping rates. We support our theoretical findings with simulation results that shed further light on the behavior of the above quantities.
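The toy event-driven simulation below is meant only to illustrate the kind of dynamics described above; the specific rates, the source and gossip mechanics, and the event-averaged estimate are assumptions made for illustration and do not reproduce the paper's SHS model.

```python
import random

def simulate(n=200, lam_e=1.0, lam_s=10.0, lam=10.0, p=0.1, T=500.0, seed=0):
    """Toy simulation of age-based gossip with mutation probability p.

    Assumed dynamics: the source generates new versions at rate lam_e, pushes
    its (accurate) current version to uniformly random nodes at total rate
    lam_s, and each node forwards its stored packet to a uniformly random
    node at rate lam; inter-node transmissions are mutated with probability p,
    and on equal versions accurate information wins ('truth prevails').
    Returns the (approximate) time-averaged fraction of accurate nodes.
    """
    rng = random.Random(seed)
    src_version, t, events = 0, 0.0, 0
    version = [0] * n          # version stored at each node
    accurate = [True] * n      # whether the stored packet is accurate
    samples = []
    total_rate = lam_e + lam_s + n * lam
    while t < T:
        t += rng.expovariate(total_rate)
        u = rng.random() * total_rate
        if u < lam_e:                              # source gets a new version
            src_version += 1
        elif u < lam_e + lam_s:                    # source -> uniformly random node
            i = rng.randrange(n)
            version[i], accurate[i] = src_version, True
        else:                                      # node i -> uniformly random node j
            i = rng.randrange(n)
            j = rng.randrange(n - 1)
            j += (j >= i)
            v = version[i]
            acc = accurate[i] and (rng.random() > p)   # mutation creates misinformation
            if v > version[j] or (v == version[j] and acc and not accurate[j]):
                version[j], accurate[j] = v, acc
        events += 1
        if events % 100 == 0:                      # subsample to keep this cheap
            samples.append(sum(accurate) / n)
    return sum(samples) / len(samples)
```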
In decentralized settings, the shuffle model of differential privacy has emerged as a promising alternative to the classical local model. Analyzing privacy amplification via shuffling is a critical component in both single-message and multi-message shuffle protocols. However, the current methods used in these two areas are distinct and specific, making them less convenient for protocol designers and practitioners. In this work, we introduce variation-ratio reduction as a unified framework for privacy amplification analyses in the shuffle model. This framework utilizes total variation bounds of local messages and probability ratio bounds of other users' blanket messages, converting them into indistinguishability levels. Our results indicate that the framework yields tighter bounds for both single-message and multi-message encoders (e.g., with local DP, local metric DP, or multi-message randomizers). Specifically, for a broad range of local randomizers with extremal probability designs, our amplification bounds are exactly tight. We also demonstrate that variation-ratio reduction is well suited to parallel composition in the shuffle model and yields tighter privacy accounting for common sampling-based local randomizers. Our experimental findings show that, compared to existing amplification bounds, our numerical amplification bounds can save up to $30\%$ of the budget for single-message protocols, $75\%$ of the budget for multi-message protocols, and $75\%$-$95\%$ of the budget for parallel composition. Additionally, our implementation for numerical amplification bounds has only $\tilde{O}(n)$ complexity and is highly efficient in practice, taking just $2$ minutes for $n=10^8$ users. The code for our implementation can be found at \url{//github.com/wangsw/PrivacyAmplification}.
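For readers unfamiliar with the setting, the sketch below shows a single-message shuffle pipeline with a k-ary randomized response as the local randomizer; it only illustrates the model to which such amplification bounds apply, and is not the variation-ratio analysis itself.

```python
import math
import random

def k_rr(value, k, eps):
    """k-ary randomized response: report the true value with prob. e^eps / (e^eps + k - 1)."""
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return value
    return random.choice([v for v in range(k) if v != value])

def shuffle_protocol(values, k, eps_local):
    """Single-message shuffle pipeline: local randomization, then a uniform shuffle.

    The shuffler hides which user sent which message; amplification analyses
    quantify how much smaller the effective central privacy budget becomes.
    """
    messages = [k_rr(v, k, eps_local) for v in values]
    random.shuffle(messages)
    return messages
```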
We study low sample complexity mechanisms in participatory budgeting (PB), where each voter votes for a preferred allocation of funds to various projects, subject to project costs and total spending constraints. We analyze the expected distortion that PB mechanisms introduce relative to the minimum-social-cost outcome. The Random Dictator mechanism for this problem obtains a distortion of 2. In the special case where every voter votes for exactly one project, [Fain et al. '17] obtain a distortion of 4/3. We show that when PB outcomes are determined as any convex combination of the votes of two voters, the distortion is 2. When three uniformly randomly sampled votes are used, we give a PB mechanism that obtains a distortion of at most 1.66, thus breaking the barrier of 2 with the smallest possible sample complexity. We give a randomized Nash bargaining scheme where two uniformly randomly chosen voters bargain with the disagreement point set to the vote of a voter chosen uniformly at random. This mechanism has a distortion of at most 1.66, and we provide a lower bound of 1.38 on its distortion. Further, we show that PB mechanisms that output a median of the votes of three voters chosen uniformly at random have a distortion of at most 1.80.
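As a quick sanity check on random instances (not a worst-case bound), the expected social cost of the Random Dictator mechanism can be compared numerically against a benchmark; the $\ell_1$ cost over the budget simplex and the restriction of the benchmark to the set of votes are illustrative assumptions.

```python
import numpy as np

def social_cost(outcome, votes):
    """Total l1 distance between the chosen allocation and all votes."""
    return np.abs(votes - outcome).sum()

def random_dictator_ratio(n_voters=50, n_projects=5, trials=2000, seed=0):
    """Monte Carlo estimate of E[cost(Random Dictator)] / cost(benchmark).

    Votes are random points on the budget simplex. The benchmark is the best
    vote, which can only over-estimate the true optimum's cost, so the ratio
    reported here under-estimates the distortion on each instance.
    """
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        votes = rng.dirichlet(np.ones(n_projects), size=n_voters)
        dictator_cost = np.mean([social_cost(v, votes) for v in votes])
        benchmark = min(social_cost(v, votes) for v in votes)
        ratios.append(dictator_cost / benchmark)
    return float(np.mean(ratios))
```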
Bayesian optimization is a class of global optimization techniques. In Bayesian optimization, the underlying objective function is modeled as a realization of a Gaussian process. Although the Gaussian process assumption implies a random distribution of the Bayesian optimization outputs, quantification of this uncertainty is rarely studied in the literature. In this work, we propose a novel approach to assess the output uncertainty of Bayesian optimization algorithms, which proceeds by constructing confidence regions of the maximum point (or value) of the objective function. These regions can be computed efficiently, and their confidence levels are guaranteed by the uniform error bounds for sequential Gaussian process regression newly developed in the present work. Our theory provides a unified uncertainty quantification framework for all existing sequential sampling policies and stopping criteria.
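One simple way to turn pointwise or uniform GP error bounds into a confidence region for the maximizer is sketched below using scikit-learn on a grid of candidate points; the constant `beta` is an illustrative stand-in for the calibrated uniform bound developed in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def maximizer_confidence_region(X_obs, y_obs, X_grid, beta=3.0):
    """Confidence region for argmax f over a grid of candidate points.

    Fit a GP to the observations, form bands mu +/- beta * sigma, and keep
    every candidate whose upper band reaches the largest lower band; with a
    valid uniform error bound, such a region contains the true maximizer.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(X_grid, return_std=True)
    ucb, lcb = mu + beta * sigma, mu - beta * sigma
    return X_grid[ucb >= lcb.max()]
```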
Extended Boolean functions (EBFs) are one of the most important tools in cryptography and in spreading sequence design for communication systems. In this paper, we use EBFs to design new sets of spreading sequences for non-orthogonal multiple access (NOMA), an emerging technique capable of supporting massive machine-type communications (mMTC) in 5G and beyond. In this work, $p$-ary complementary sequences are first constructed using EBFs, and these sequences are then used to design new sets of non-orthogonal spreading sequences having very low coherence and peak-to-average power ratio (PAPR). The proposed spreading sequence sets are capable of supporting a large number of active devices simultaneously. In fact, for a $p$-ary spreading sequence set, we theoretically achieve an overloading factor of $2p$, where $p$ is an odd prime. Specifically, for $p=3$, we achieve an overloading factor of $6$, which cannot be achieved through any existing construction to date.
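To make the two figures of merit concrete, the sketch below numerically checks the coherence of a candidate spreading sequence set (sequences as columns) and the OFDM-style PAPR of a single sequence; it is a generic numerical check, not the EBF-based construction itself.

```python
import numpy as np

def coherence(S):
    """Largest magnitude of inner products between distinct unit-norm columns of S."""
    G = S / np.linalg.norm(S, axis=0)
    gram = np.abs(G.conj().T @ G)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

def papr(seq, oversample=16):
    """OFDM-style peak-to-average power ratio of one sequence, via an oversampled FFT."""
    n = len(seq)
    spectrum = np.fft.fft(seq, n * oversample)      # oversampled samples of the OFDM signal
    peak_power = (np.abs(spectrum) ** 2).max()
    avg_power = (np.abs(seq) ** 2).sum()            # average power by Parseval's identity
    return peak_power / avg_power
```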
Conferences are ubiquitous in many industries, but how effectively they diffuse ideas has been debated, and the mechanisms of diffusion remain unclear. Conference attendees can adopt ideas from presentations they choose to see (direct effect) or see incidentally (serendipity effect). We quantify these direct and serendipitous effects of academic conference presentations on future citations by exploiting quasi-random scheduling conflicts. When multiple papers of interest to an attendee are presented at the same time, the attendee is, on average, less able to see them and, if seeing presentations matters, cites them less. We use data from Confer, a scheduling application deployed at 25 in-person computer science conferences that lets users Like papers and receive personalized schedules. Compared to timeslots with many conflicts, users cited Liked papers with no scheduling conflicts 52% more. Users also cited non-Liked papers in sessions with no conflicts 51% more, and this serendipitous diffusion accounted for 22% of the overall diffusion induced by presentations. The study shows that conference presentations stimulate substantial direct and serendipitous diffusion of ideas, and it adds the analysis of scheduling conflicts to the toolkit of management scholars.
Generative Adversarial Networks (GANs) have enabled the generation of highly realistic face images, which have been used in fake social media accounts and other disinformation campaigns that can have profound impacts. Consequently, GAN-face detection techniques that can examine and expose such fake faces are under active development. In this work, we aim to provide a comprehensive review of recent progress in GAN-face detection. We focus on methods that can detect face images that are generated or synthesized by GAN models. We classify the existing detection works into four categories: (1) deep learning-based methods, (2) physical-based methods, (3) physiological-based methods, and (4) evaluation and comparison against human visual performance. For each category, we summarize the key ideas and connect them with method implementations. We also discuss open problems and suggest future research directions.