Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to generate a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance. What if Alice and Bob must solve this same problem without communicating at all? Perhaps surprisingly, with access to public randomness, they can still achieve $Pr[a=b] \geq \frac{1-D_{TV}(P,Q)}{1+D_{TV}(P,Q)} \geq 1-2D_{TV}(P,Q)$. In fact, this bound can be obtained using a simple protocol based on the Weighted MinHash algorithm. In this work, we explore the communication-free coupling problem in greater depth. First, we show that an equally simple protocol based on Gumbel sampling matches the worst-case guarantees of the Weighted MinHash approach, but tends to perform better in practice. Conversely, we prove that both approaches are actually sharp: no communication-free protocol can achieve $Pr[a=b]>\frac{1-D_{TV}(P,Q)}{1+D_{TV}(P,Q)}$ in the worst case. Finally, we prove that, for distributions over $n$ items, there exists a scheme that uses just $O(\log(n/\epsilon))$ bits of communication to achieve $Pr[a = b] = 1 - D_{TV}(P,Q) - \epsilon$, i.e., to essentially match the guarantee of an optimal coupling. Beyond our theoretical results, we demonstrate an application of communication-free coupling to speculative decoding, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols yield a variant of speculative decoding that we call Drafter-Invariant Speculative Decoding, which has the desirable property that the output of the method is fixed given a fixed random seed, regardless of which drafter is used for speculation.
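As a concrete illustration (a minimal sketch, not code from the paper), the Gumbel-based protocol can be realized by having both parties draw identical Gumbel noise from the public randomness and each return the argmax over its own log-probabilities; when $P$ and $Q$ are close, the shared noise makes the two argmaxes coincide often.

```python
import numpy as np

def gumbel_coupled_sample(probs, shared_seed):
    """Sample from `probs` via the Gumbel-max trick using public randomness.

    Two parties calling this with their own distributions but the same
    `shared_seed` obtain a communication-free coupling: the closer their
    distributions, the more often the returned indices coincide.
    """
    rng = np.random.default_rng(shared_seed)        # public randomness
    gumbels = rng.gumbel(size=len(probs))           # identical noise for both parties
    return int(np.argmax(np.log(probs) + gumbels))  # Gumbel-max sample ~ probs

# Example: Alice and Bob share a seed but never communicate.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
seed = 1234
a = gumbel_coupled_sample(P, seed)
b = gumbel_coupled_sample(Q, seed)
# a == b with probability at least (1 - D_TV(P,Q)) / (1 + D_TV(P,Q)).
```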
The random walk $d$-ary cuckoo hashing algorithm was defined by Fotakis, Pagh, Sanders, and Spirakis to generalize and improve upon the standard cuckoo hashing algorithm of Pagh and Rodler. Random walk $d$-ary cuckoo hashing has low space overhead, guaranteed fast access, and insertion times that are fast in practice. In this paper, we give a theoretical insertion time bound for this algorithm. More precisely, for every $d\ge 3$ hashes, let $c_d^*$ be the sharp threshold for the load factor $c$ below which a valid assignment of $cm$ objects to a hash table of size $m$ likely exists. We show that for any $d\ge 4$ hashes and load factor $c<c_d^*$, the expected random walk insertion time is $O(1)$, that is, a constant depending only on $d$ and $c$ but not on $m$.
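As an illustration, here is a minimal sketch of random walk $d$-ary cuckoo insertion (the names and the bailout threshold are illustrative, not the paper's pseudocode): place the object in an empty candidate bucket if one exists; otherwise evict a uniformly random candidate and continue the walk with the evicted object.

```python
import random

def insert(table, hashes, x, max_steps=10_000):
    """Random walk d-ary cuckoo insertion.

    `table` maps bucket index -> stored object (or None); `hashes` is a list
    of d hash functions giving each object its d candidate buckets.
    Returns the number of eviction steps taken.
    """
    for step in range(max_steps):
        slots = [h(x) for h in hashes]
        for s in slots:                  # use an empty candidate bucket if any
            if table[s] is None:
                table[s] = x
                return step
        s = random.choice(slots)         # otherwise evict a uniformly random candidate
        table[s], x = x, table[s]        # swap: the evicted object walks on
    raise RuntimeError("insertion walk exceeded max_steps")

# Example with d = 4 hashes and load factor c = 0.9 < c_4^*.
m, d = 1000, 4
table = {i: None for i in range(m)}
hashes = [lambda x, s=s: hash((s, x)) % m for s in range(d)]
for obj in range(int(0.9 * m)):
    insert(table, hashes, obj)
```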
We prove that the constructive and intuitionistic variants of the modal logic $\mathsf{KB}$ coincide. This result contrasts with a recent result by Das and Marin, who showed that the constructive and intuitionistic variants of $\mathsf{K}$ do not prove the same diamond-free formulas.
We introduce the concept of an imprecise Markov semigroup $\mathbf{Q}$. It is a tool that allows us to represent ambiguity around both the initial and the transition probabilities of a Markov process via a compact collection of plausible Markov semigroups, each associated with a (different, plausible) Markov process. We use techniques from geometry, functional analysis, and (high dimensional) probability to study the ergodic behavior of $\mathbf{Q}$. We show that, if the initial distribution of the Markov processes associated with the elements of $\mathbf{Q}$ is known and invariant, then, under some conditions that also involve the geometry of the state space, the ambiguity around their transition probabilities eventually fades. We call this property ergodicity of the imprecise Markov semigroup, and we relate it to the classical notion of ergodicity. We prove ergodicity both when the state space is Euclidean or a Riemannian manifold, and when it is an arbitrary measurable space. The importance of our findings for the fields of machine learning and computer vision is also discussed.
Existing Curry-Howard interpretations of call-by-value evaluation for the $\lambda$-calculus involve classical logic or linear logic, despite the fact that call-by-value was introduced in an intuitionistic setting without linear features. This paper shows that the most basic sequent calculus for minimal intuitionistic logic -- dubbed here vanilla -- can naturally be seen as a logical interpretation of call-by-value evaluation. This is obtained by establishing mutual simulations with a well-known formalism for call-by-value evaluation.
Street Scene Semantic Understanding (denoted as TriSU) is a complex task for autonomous driving (AD). However, an inference model trained on data from a particular geographical region generalizes poorly when applied in other regions due to inter-city data domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization through collaborative, privacy-preserving training over distributed datasets from different cities. Unfortunately, it suffers from slow convergence because data from different cities have disparate statistical properties. Going beyond existing HFL methods, we propose a Gaussian heterogeneous HFL algorithm (FedGau) that addresses inter-city data heterogeneity so that convergence is accelerated. In the proposed FedGau algorithm, both individual RGB images and whole RGB datasets are modelled as Gaussian distributions for aggregation weight design. This approach not only differentiates each RGB image by its statistical distribution, but also exploits the statistics of each city's dataset in addition to the conventionally considered data volume. With the proposed approach, convergence is accelerated by 35.5\%-40.6\% compared to existing state-of-the-art (SOTA) HFL methods. Meanwhile, to reduce the communication resources involved, we further introduce a novel performance-aware adaptive resource scheduling (AdapRS) policy. Unlike a traditional static resource scheduling policy, which exchanges a fixed number of models between two adjacent aggregations, AdapRS adjusts the number of model aggregations at different levels of HFL so that unnecessary communication is minimized. Extensive experiments demonstrate that AdapRS saves 29.65\% communication overhead compared to conventional static resource scheduling while maintaining almost the same performance.
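The abstract does not specify FedGau's exact weight design, so the following is only a hedged sketch of the general idea: summarize each city's RGB data by Gaussian statistics and let aggregation weights depend on both those statistics and data volume. The Wasserstein-based similarity, the exponential weighting, and all names below are assumptions for illustration, not FedGau's actual formula.

```python
import numpy as np

def gaussian_stats(images):
    """Summarize a stack of RGB images (N, H, W, 3) as a per-channel Gaussian."""
    pixels = images.reshape(-1, 3).astype(np.float64)
    return pixels.mean(axis=0), pixels.std(axis=0)

def w2_diag(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between two diagonal Gaussians."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

def aggregation_weights(city_stats, global_stats, volumes, temperature=1.0):
    """Hypothetical weight design: cities whose statistics are closer to the
    global statistics, and which hold more data, get larger weights."""
    dists = np.array([w2_diag(*s, *global_stats) for s in city_stats])
    w = np.exp(-dists / temperature) * np.array(volumes, dtype=np.float64)
    return w / w.sum()

def aggregate(params, weights):
    """Weighted averaging of per-city model parameter vectors."""
    return sum(w * p for w, p in zip(weights, params))
```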
We study Clustered Planarity with Linear Saturators, which is the problem of augmenting an $n$-vertex planar graph whose vertices are partitioned into independent sets (called clusters) with paths - one for each cluster - that connect all the vertices in each cluster while maintaining planarity. We show that the problem can be solved in time $2^{O(n)}$ for both the variable and the fixed embedding case. Moreover, we show that it can be solved in subexponential time $2^{O(\sqrt{n}\log n)}$ in the fixed embedding case if, additionally, the input graph is connected. The latter time complexity is tight under the Exponential-Time Hypothesis. We also show that $n$ can be replaced with the vertex cover number of the input graph by providing a linear (resp. polynomial) kernel for the variable-embedding (resp. fixed-embedding) case; these results contrast with the NP-hardness of the problem on graphs of bounded treewidth (and even on trees). Finally, we complement known lower bounds for the problem by showing that Clustered Planarity with Linear Saturators is NP-hard even when the number of clusters is at most $3$, thus excluding the algorithmic use of the number of clusters as a parameter.
We define the error distribution as the limiting distribution of the error between the solution $Y$ of a given stochastic differential equation (SDE) and its numerical approximation $\hat{Y}^{(m)}$, weighted by the rate of convergence between the two. A goal when studying the error distribution is to provide a way to determine it for any SDE and any numerical scheme that converges to the exact solution. By dividing the error into a main term and a remainder term in a particular way, the author shows that the remainder term is negligible compared to the main term under suitable conditions. Under these conditions, deriving the error distribution reduces to deriving the limiting distribution of the main term. Even in dimension one, the asymptotic behavior of the error remains unresolved when the SDE has a drift term and the Hurst exponent satisfies $0<H\leq 1/3$, but our result in the one-dimensional case can be adapted to any Hurst exponent. The main idea of the proof is to define a stochastic process $Y^{m, \rho}$ with a parameter $\rho$ interpolating between $Y$ and $\hat{Y}^{(m)}$ and to estimate its asymptotic expansion. Using this estimate, we determine the error distribution of the ($k$)-Milstein scheme and of the Crank-Nicolson scheme in previously unsolved cases.
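For context, the following is a minimal sketch of a numerical approximation $\hat{Y}^{(m)}$ in the standard Brownian (non-fractional) setting: the classical Milstein scheme on a uniform grid of $m$ steps. It is illustrative only, not the ($k$)-Milstein or Crank-Nicolson variants analyzed in the paper.

```python
import numpy as np

def milstein(a, b, db, y0, T, m, rng):
    """Milstein approximation of dY = a(Y) dt + b(Y) dW on [0, T] with m steps.

    `a`, `b` are the drift and diffusion coefficients; `db` is b's derivative.
    Returns the array of approximate values at the m + 1 grid points.
    """
    h = T / m
    y = np.empty(m + 1)
    y[0] = y0
    for k in range(m):
        dw = rng.normal(scale=np.sqrt(h))          # Brownian increment on one step
        y[k + 1] = (y[k] + a(y[k]) * h + b(y[k]) * dw
                    + 0.5 * b(y[k]) * db(y[k]) * (dw ** 2 - h))
    return y

# Example: geometric Brownian motion dY = mu*Y dt + sigma*Y dW.
rng = np.random.default_rng(0)
mu, sigma = 0.1, 0.4
path = milstein(lambda y: mu * y, lambda y: sigma * y, lambda y: sigma,
                y0=1.0, T=1.0, m=1000, rng=rng)
```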
Fog computing is of particular interest to the Internet of Things (IoT), where inexpensive simple devices can offload their computation tasks to nearby Fog Nodes. Online scheduling in such fog networks is challenging due to stochastic network states such as task arrivals, wireless channels, and node locations. In this paper, we focus on the problem of optimizing computation offloading management, arrival data admission control, and resource scheduling, in order to improve the overall system performance in terms of throughput fairness, power efficiency, and average queue backlog. We investigate this problem for a fog network with homogeneous mobile Fog Nodes serving multiple wireless devices and controlled by a Fog Control Node. Formulating the problem as a stochastic optimization problem that maximizes utility-power efficiency, defined as achievable utility per unit of power consumption, subject to queue backlog stability, we modify Lyapunov optimization techniques to deal with the fractional form of the utility-power efficiency function. We then propose an online utility-power efficient task scheduling algorithm, which is asymptotically optimal. Our online task scheduling algorithm achieves the theoretical $[O(1/V), O(V)]$ trade-off between utility-power efficiency and average queue backlog.
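For intuition, the sketch below shows the standard drift-plus-penalty step of Lyapunov optimization, which trades a penalty (scaled by $V$) against the queue-weighted backlog drift. It is a generic illustration with placeholder names, not the paper's modified algorithm for the fractional utility-power efficiency objective.

```python
def drift_plus_penalty_action(actions, queues, arrivals, service, utility, V):
    """Standard Lyapunov drift-plus-penalty rule (illustrative placeholder).

    Each slot, choose the action minimizing
        -V * utility(action) + sum_i Q_i * (arrivals_i(action) - service_i(action)),
    which trades utility (scaled by V) against queue stability.
    """
    def cost(act):
        drift = sum(q * (arrivals(act, i) - service(act, i))
                    for i, q in enumerate(queues))
        return -V * utility(act) + drift
    return min(actions, key=cost)

def update_queues(queues, act, arrivals, service):
    """Queue dynamics: Q_i(t+1) = max(Q_i(t) + arrivals_i - service_i, 0)."""
    return [max(q + arrivals(act, i) - service(act, i), 0.0)
            for i, q in enumerate(queues)]
```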
In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that in most MLTC tasks there are dependencies or correlations among labels, yet existing methods tend to ignore these relationships. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and to generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighboring nodes per label, thus allowing it to learn the dependencies among labels implicitly. The proposed model is validated on five real-world MLTC datasets and achieves similar or better performance compared to the previous state-of-the-art models.
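The following is a rough sketch of the idea (a simplified GAT-style layer in plain NumPy with hypothetical shapes, not the paper's architecture): attention over a label correlation graph produces one classifier vector per label, and these are applied to a sentence feature vector (here random, standing in for a BiLSTM encoder) to obtain per-label scores.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_graph_attention(label_emb, corr, W, a):
    """One simplified attention layer over the label graph.

    label_emb: (L, d_in) label feature matrix; corr: (L, L) correlation matrix
    used as a mask; W: (d_in, d_out) projection; a: (2*d_out,) attention vector.
    Returns updated label representations of shape (L, d_out).
    """
    h = label_emb @ W                                        # project label features
    L, d = h.shape
    scores = np.array([[a[:d] @ h[i] + a[d:] @ h[j]
                        for j in range(L)] for i in range(L)])
    scores = np.where(corr > 0, scores, -1e9)                # attend only to correlated labels
    alpha = softmax(scores, axis=1)                          # per-label attention weights
    return np.tanh(alpha @ h)                                # aggregated label classifiers

# Toy usage with random data; sentence_feat stands in for a BiLSTM sentence encoding.
rng = np.random.default_rng(0)
n_labels, d_in, d_out = 5, 16, 32
classifiers = label_graph_attention(rng.normal(size=(n_labels, d_in)),
                                     corr=np.ones((n_labels, n_labels)),
                                     W=rng.normal(size=(d_in, d_out)),
                                     a=rng.normal(size=2 * d_out))
sentence_feat = rng.normal(size=d_out)
logits = classifiers @ sentence_feat                         # one score per label
probs = 1 / (1 + np.exp(-logits))                            # multi-label sigmoid outputs
```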
Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples, which result in incorrect outputs. To make DL more robust, several post-hoc anomaly detection techniques that detect (and discard) these anomalous samples have been proposed in the recent past. This survey provides a structured and comprehensive overview of the research on anomaly detection for DL-based applications. We provide a taxonomy of existing techniques based on their underlying assumptions and adopted approaches. We discuss the techniques in each category and present the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide a clearer understanding of the techniques in each of the categories in which research on this topic has been done. Finally, we highlight the unsolved research challenges in applying anomaly detection techniques to DL systems and present some high-impact future research directions.