The Cayley distance between two permutations $\pi, \sigma \in S_n$ is the minimum number of \textit{transpositions} required to obtain $\sigma$ from $\pi$. When only adjacent transpositions are allowed, the minimum number of such transpositions needed to obtain $\sigma$ from $\pi$ is referred to as the Kendall $\tau$-distance. A set $C$ of permutation words of length $n$ is called a $t$-Cayley permutation code if every pair of distinct permutations in $C$ has Cayley distance greater than $t$; a $t$-Kendall permutation code is defined analogously. Let $C(n,t)$ and $K(n,t)$ denote the maximum sizes of a $t$-Cayley and a $t$-Kendall permutation code of length $n$, respectively. In this paper, we improve the Gilbert-Varshamov bound asymptotically by a factor of $\log n$, namely \[ C(n,t) \geq \Omega_t\left(\frac{n!\log n}{n^{2t}}\right) \text{ and } K(n,t) \geq \Omega_t\left(\frac{n! \log n}{n^t}\right).\] Our proof is based on graph-theoretic techniques.
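For concreteness, here is a minimal Python sketch of the two distances (an illustration, not part of the paper's proof): the Cayley distance equals $n$ minus the number of cycles of $\sigma\pi^{-1}$, and the Kendall $\tau$-distance counts discordant position pairs. Permutations are taken as 0-indexed lists, an illustrative convention.
\begin{verbatim}
from itertools import combinations

def cayley(pi, sigma):
    # Cayley distance: n minus the number of cycles of sigma o pi^{-1}.
    n = len(pi)
    inv_pi = [0] * n
    for i, v in enumerate(pi):
        inv_pi[v] = i
    tau = [sigma[inv_pi[x]] for x in range(n)]  # tau = sigma o pi^{-1}
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = tau[j]
    return n - cycles

def kendall(pi, sigma):
    # Kendall tau distance: number of position pairs ordered differently
    # by pi and sigma (= minimum number of adjacent transpositions).
    return sum((pi[i] < pi[j]) != (sigma[i] < sigma[j])
               for i, j in combinations(range(len(pi)), 2))

print(cayley([0, 1, 2, 3], [1, 0, 3, 2]))   # 2 (two 2-cycles)
print(kendall([0, 1, 2, 3], [1, 0, 3, 2]))  # 2
\end{verbatim}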
We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erd\H{o}s-R\'enyi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretic lower bounds showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) a reduction from privacy to robustness based on the sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
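As a toy illustration of the estimation target only (emphatically not the paper's algorithm, which is node-private and robust): the sketch below releases the empirical edge density of a $G(n,p)$ sample under the much weaker edge-level Laplace mechanism.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def private_edge_density(n, p_true, eps):
    # Sample the edge count of G(n, p), then release the density via
    # the Laplace mechanism: changing one edge shifts the density by
    # 1/C(n,2), so scale 1/(eps * C(n,2)) gives eps-edge-DP.
    pairs = n * (n - 1) // 2
    m = rng.binomial(pairs, p_true)  # G(n, p) edge count
    return m / pairs + rng.laplace(scale=1.0 / (eps * pairs))

print(private_edge_density(200, 0.1, eps=1.0))  # close to 0.1
\end{verbatim}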
The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for the $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of the $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, the $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting the parameters $k$ or $p$ appropriately, we can recover the total variation, $p$-Wasserstein, and L\'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy than the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
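A minimal sketch of the partial-$2$-Wasserstein primitive that $k$-RPW builds on (the exact $k$-RPW construction differs): between two $n$-point sets, discarding up to $t$ points per side can be encoded by padding the squared-cost matrix with $t$ zero-cost dummy rows and columns before solving an assignment problem; the normalization by $n-t$ is an illustrative choice.
\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def partial_w2(X, Y, t):
    # Partial 2-Wasserstein between two n-point empirical measures,
    # allowed to discard up to t points on each side: dummy rows and
    # columns of cost 0 absorb the discarded points.
    n = len(X)
    C = np.zeros((n + t, n + t))
    C[:n, :n] = cdist(X, Y, "sqeuclidean")
    rows, cols = linear_sum_assignment(C)
    return np.sqrt(C[rows, cols].sum() / (n - t))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.vstack([rng.normal(size=(95, 2)),
               rng.normal(10, 1, size=(5, 2))])     # 5% outlier mass
print(partial_w2(X, Y, 0), partial_w2(X, Y, 5))     # outliers inflate t=0
\end{verbatim}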
Many efficient $\textit{approximate}$ self-attention techniques have become prevalent since the inception of the transformer architecture. Two popular classes of these techniques are low-rank and kernel methods, each with its own strengths. We observe that these strengths complement each other synergistically and exploit them to fuse low-rank and kernel methods, producing a new class of transformers: FLuRKA ($\textbf{F}$ast $\textbf{L}$ow-$\textbf{R}$ank & $\textbf{K}$ernel $\textbf{A}$ttention). FLuRKA are highly $\textit{training-efficient}$, with faster model speeds $\textit{and}$ similar model quality compared to the constituent low-rank and kernel methods. We theoretically and empirically evaluate the speed and quality of FLuRKA. Our model speed analysis identifies a variety of parameter configurations where FLuRKA exhibit speedups over low-rank and kernel approximations, and our model quality analysis bounds the error of FLuRKA with respect to full attention. Empirically, we instantiate three FLuRKA variants, which achieve speedups of up to 3.3x and 1.7x over low-rank and kernel methods, respectively; this translates to speedups of up to 20x over models with flash-attention. Across a diverse set of tasks spanning language modeling, language understanding, long-sequence modeling, machine translation, and image classification, FLuRKA achieve accuracy comparable to the underlying low-rank and kernel approximations, occasionally surpassing both.
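One plausible way such a fusion could be composed, shown purely as a hypothetical sketch (the projections E, F, the feature map, and all shapes are assumptions, not the published FLuRKA architecture): compress keys and values along the sequence axis as in low-rank methods, then apply kernelized linear attention to the compressed sequence, so the cost stays linear in the sequence length.
\begin{verbatim}
import numpy as np

def elu_feature_map(x):
    # Simple positive feature map phi, as used in linear/kernel attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def fused_attention(Q, K, V, E, F):
    # Hypothetical fusion: low-rank compression of keys/values along
    # the sequence axis (Linformer-style projections E, F), followed by
    # kernelized linear attention on the compressed sequence.
    Kc, Vc = E @ K, F @ V              # (r, d) compressed keys/values
    Qf, Kf = elu_feature_map(Q), elu_feature_map(Kc)
    KV = Kf.T @ Vc                     # (d, d) summary of the sequence
    Z = Qf @ Kf.sum(axis=0)            # per-query normalizer
    return (Qf @ KV) / Z[:, None]

n, r, d = 512, 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E = F = rng.normal(size=(r, n)) / np.sqrt(n)
print(fused_attention(Q, K, V, E, F).shape)  # (512, 32)
\end{verbatim}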
Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task-solving capabilities, including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, and image classification. Most existing performance assessments of LLMs for spatial tasks have focused primarily on ChatGPT, whereas other chatbots have received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation of 76 spatial tasks across seven task categories assigned to four prominent chatbots, i.e., ChatGPT-4, Gemini, Claude-3, and Copilot. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and the interpretation of programming code and functions, but showed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, there was a significant difference in the correctness of results among the four chatbots. Repeated runs of the tasks on each chatbot showed a high level of consistency, with matching rates above 80% for most task categories across the four chatbots.
Given a positive integer $d$, $d$-CUT is the problem of deciding whether an undirected graph $G=(V,E)$ has a cut $(A,B)$ such that every vertex in $A$ (resp. $B$) has at most $d$ neighbors in $B$ (resp. $A$). For $d=1$, the problem is referred to as MATCHING CUT. Gomes and Sau (IPEC 2019) gave the first fixed-parameter tractable algorithm for $d$-CUT parameterized by the maximum number of edges crossing the cut (i.e., the size of the edge cut). However, their paper does not provide an explicit bound on the running time, as it relies indirectly on an MSOL formulation and Courcelle's theorem. Motivated by this, we design and present an FPT algorithm for $d$-CUT on general graphs with running time $2^{O(k\log k)}n^{O(1)}$, where $k$ is the maximum size of the edge cut. This is the first FPT algorithm for $d$-CUT and MATCHING CUT with an explicit dependence on this parameter. We also observe that there is no algorithm solving MATCHING CUT in time $2^{o(k)}n^{O(1)}$, where $k$ is the maximum size of the edge cut, unless the ETH fails.
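A brute-force sanity checker for small instances (exponential in $|V|$, nothing like the FPT algorithm above) makes the definition concrete:
\begin{verbatim}
from itertools import combinations

def is_d_cut(adj, A, d):
    # (A, B) is a d-cut if every vertex has at most d neighbours on the
    # other side; for d = 1 this is exactly a matching cut.
    A = set(A)
    B = set(adj) - A
    return all(len(adj[v] & (B if v in A else A)) <= d for v in adj)

def has_d_cut(adj, d):
    # Try every proper bipartition (exponential; small graphs only).
    V = list(adj)
    for r in range(1, len(V)):
        for A in combinations(V, r):
            if is_d_cut(adj, A, d):
                return set(A)
    return None

# The 4-cycle has a matching cut: {0, 1} vs {2, 3}.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(has_d_cut(adj, 1))
\end{verbatim}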
Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of hallucination is the first key step towards governing this issue, but it is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of $\sim$12k sentence-level annotations for $\sim$4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate within an answer, and we use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on generative and discriminative annotators and show that, although current open-source LLMs have difficulty with fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibit better generalization ability on unseen questions.
We incorporate strong negation into the theory of computable functionals TCF, a common extension of Plotkin's PCF and G\"{o}del's system $\mathbf{T}$, by simultaneously defining the strong negation $A^{\mathbf{N}}$ of a formula $A$ and the strong negation $P^{\mathbf{N}}$ of a predicate $P$ in TCF. As a special case of the latter, we obtain strong negation of the inductive and coinductive predicates of TCF. We prove appropriate versions of Ex falso quodlibet and of double negation elimination for strong negation in TCF. We introduce the so-called tight formulas of TCF, i.e., formulas implied by the weak negation of their strong negation, as well as the relative tight formulas. We present various case studies and examples which reveal the naturality of our definition of strong negation in TCF and justify the use of TCF as a formal system for a large part of Bishop-style constructive mathematics.
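For orientation, the clauses for strong negation on the connectives and quantifiers follow the standard pattern from the literature on strong negation (which the definition in TCF is presumed to refine to its inductive and coinductive predicates):
\[
(A \wedge B)^{\mathbf{N}} := A^{\mathbf{N}} \vee B^{\mathbf{N}}, \qquad
(A \vee B)^{\mathbf{N}} := A^{\mathbf{N}} \wedge B^{\mathbf{N}}, \qquad
(A \to B)^{\mathbf{N}} := A \wedge B^{\mathbf{N}},
\]
\[
(\forall_{x} A)^{\mathbf{N}} := \exists_{x} A^{\mathbf{N}}, \qquad
(\exists_{x} A)^{\mathbf{N}} := \forall_{x} A^{\mathbf{N}}.
\]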
In the current note we consider matrix-sequences $\{B_{n,t}\}_n$ of increasing size depending on $n$ and equipped with a parameter $t>0$. For every fixed $t>0$, we assume that each $\{B_{n,t}\}_n$ possesses a canonical spectral/singular value symbol $f_t$ defined on a set $D_t\subset \mathbb{R}^{d}$ of finite measure, $d\ge 1$. Furthermore, we assume that $\{ \{ B_{n,t}\}_n : t > 0 \}$ is an approximating class of sequences (a.c.s.) for $\{ A_n \}_n$ and that $\bigcup_{t > 0} D_t = D$ with $D_{t + 1} \supset D_t$. Under these assumptions, and via the notion of a.c.s., we prove results on the canonical distributions of $\{ A_n \}_n$, whose symbol, when it exists, can be defined on the (possibly unbounded) domain $D$ of finite or even infinite measure. We then extend the concept of a.c.s. to the case where the approximating sequence $\{ B_{n,t}\}_n$ possibly has a dimension different from that of $\{ A_n\}_n$. This concept seems to be particularly natural when dealing, e.g., with the approximation both of a partial differential equation (PDE) and of its (possibly unbounded, or moving) domain $D$, using an exhausting sequence of domains $\{ D_t \}$. Examples coming from approximated PDEs/FDEs with either moving or unbounded domains are presented in connection with both the classical and the new notion of a.c.s., while numerical tests and a list of open questions conclude the present work.
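As a toy instance of the symbol framework itself (not of the new a.c.s. notion): the tridiagonal Toeplitz matrices generated by $f(\theta)=2-2\cos\theta$ have eigenvalues that sample their spectral symbol exactly, which the following sketch verifies numerically.
\begin{verbatim}
import numpy as np

# The n x n tridiagonal Toeplitz matrix generated by the symbol
# f(theta) = 2 - 2cos(theta) (the 1D discrete Laplacian) has
# eigenvalues 2 - 2cos(k*pi/(n+1)), k = 1..n, i.e. samples of f
# on a uniform grid of (0, pi).
n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
eigs = np.sort(np.linalg.eigvalsh(A))
theta = np.arange(1, n + 1) * np.pi / (n + 1)
print(np.max(np.abs(eigs - (2 - 2 * np.cos(theta)))))  # ~ machine eps
\end{verbatim}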
We present two randomised approximate counting algorithms with $\widetilde{O}(n^{2-c}/\varepsilon^2)$ running time for some constant $c>0$ and accuracy $\varepsilon$: (1) for the hard-core model with fugacity $\lambda$ on graphs with maximum degree $\Delta$ when $\lambda=O(\Delta^{-1.5-c_1})$ where $c_1=c/(2-2c)$; (2) for spin systems with strong spatial mixing (SSM) on planar graphs with quadratic growth, such as $\mathbb{Z}^2$. For the hard-core model, Weitz's algorithm (STOC 2006) achieves sub-quadratic running time when correlation decays faster than the neighbourhood growth, namely when $\lambda = o(\Delta^{-2})$. Our first algorithm does not require this property and extends the range where sub-quadratic algorithms exist. Our second algorithm appears to be the first to achieve sub-quadratic running time up to the SSM threshold, albeit on a restricted family of graphs. It also extends to (not necessarily planar) graphs with polynomial growth, such as $\mathbb{Z}^d$, but with a running time of the form $\widetilde{O}\left(n^2\varepsilon^{-2}/2^{c(\log n)^{1/d}}\right)$, where $d$ is the exponent of the polynomial growth and $c>0$ is some constant.
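For reference, the quantity being approximated for the hard-core model is the partition function $Z_G(\lambda)=\sum_{I \text{ independent}}\lambda^{|I|}$; a brute-force evaluator for tiny graphs (nothing like the sub-quadratic algorithms above) reads:
\begin{verbatim}
from itertools import combinations

def hardcore_Z(adj, lam):
    # Exact hard-core partition function: sum of lam^|I| over all
    # independent sets I. Exponential time; a reference point only.
    V = list(adj)
    Z = 0.0
    for r in range(len(V) + 1):
        for S in combinations(V, r):
            if all(u not in adj[v] for u, v in combinations(S, 2)):
                Z += lam ** r
    return Z

# 4-cycle: the independent sets are {}, four singletons, and the two
# diagonal pairs, so Z(1) = 1 + 4 + 2 = 7.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(hardcore_Z(adj, 1.0))  # 7.0
\end{verbatim}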
Given a graph $G$ and sets $\{\alpha_v~|~v \in V(G)\}$ and $\{\beta_v~|~v \in V(G)\}$ of non-negative integers, it is known that deciding whether $G$ contains a spanning tree $T$ such that $\alpha_v \le d_T (v) \le \beta_v$ for all $v \in V(G)$ is NP-complete. In this article, we relax the problem by demanding that the degree restrictions apply only to the vertices $v\in U$, where $U$ is a stable set of $G$. In this case, the problem becomes tractable. A. Frank presented a result characterizing the positive instances of the relaxed problem. Using the matroid intersection framework developed by J. Edmonds, we give a new and short proof of Frank's result and show that if $U$ is stable and the edges of $G$ are weighted by arbitrary real numbers, then even a minimum-cost tree $T$ with $\alpha_v \le d_T (v) \le \beta_v$ for all $v \in U$ can be found in polynomial time, if such a tree exists.
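The relaxed problem can be cross-checked by brute force on small instances (the matroid-intersection approach, not shown here, is the polynomial-time route):
\begin{verbatim}
from itertools import combinations

def spanning_trees(n, edges):
    # Yield all spanning trees as (n-1)-edge subsets, using union-find
    # with path halving to reject subsets that contain a cycle.
    for T in combinations(edges, n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for u, v in T:
            ru, rv = find(u), find(v)
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
        if ok:
            yield T

def constrained_tree(n, edges, U, alpha, beta):
    # Return a spanning tree T with alpha[v] <= deg_T(v) <= beta[v]
    # for every v in the stable set U, or None. Exponential time.
    for T in spanning_trees(n, edges):
        deg = [0] * n
        for u, v in T:
            deg[u] += 1
            deg[v] += 1
        if all(alpha[v] <= deg[v] <= beta[v] for v in U):
            return T
    return None

# K4 minus the edge (0, 3); U = {0, 3} is stable; force degree 1 there.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(constrained_tree(4, edges, {0, 3}, {0: 1, 3: 1}, {0: 1, 3: 1}))
\end{verbatim}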