Vertex splitting is a graph operation that replaces a vertex $v$ with two nonadjacent new vertices and makes each neighbor of $v$ adjacent to one or both of the introduced vertices. Vertex splitting has been used in contexts ranging from circuit design to statistical analysis. In this work, we explore the computational complexity of achieving a given graph property $\Pi$ by a limited number of vertex splits, formalized as the problem $\Pi$ Vertex Splitting ($\Pi$-VS). We focus on hereditary graph properties and contribute four groups of results: First, we classify the classical complexity of $\Pi$-VS for graph properties characterized by forbidden subgraphs of size at most 3. Second, we provide a framework that allows one to show NP-completeness whenever one can construct a combination of a forbidden subgraph and prescribed vertex splits that satisfies certain conditions. Leveraging this framework, we show NP-completeness when $\Pi$ is characterized by forbidden subgraphs that are sufficiently well connected. In particular, we show that $F$-Free-VS is NP-complete for each biconnected graph $F$. Third, we study infinite families of forbidden subgraphs, obtaining NP-hardness for Bipartite-VS and Perfect-VS. Finally, we touch upon the parameterized complexity of $\Pi$-VS with respect to the number of allowed splits, showing para-NP-hardness for $K_3$-Free-VS and deriving an XP-algorithm when each vertex may be split at most once.
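To make the operation concrete, here is a minimal Python sketch of a single vertex split on a graph stored as an adjacency dictionary; the example graph, the split vertex, and the neighbor assignment are illustrative choices and not taken from the paper.

```python
def split_vertex(adj, v, part_a, part_b):
    """Replace v by two nonadjacent copies; every neighbor of v must be
    assigned to at least one of the two copies (possibly to both)."""
    assert part_a | part_b == adj[v], "every neighbor of v must be covered"
    va, vb = (v, "a"), (v, "b")
    new_adj = {u: set(nbrs) for u, nbrs in adj.items() if u != v}
    new_adj[va], new_adj[vb] = set(part_a), set(part_b)
    for u in adj[v]:
        new_adj[u].discard(v)
        if u in part_a:
            new_adj[u].add(va)
        if u in part_b:
            new_adj[u].add(vb)
    return new_adj

# One split of vertex 1 destroys the triangle 1-2-3 (relevant to K3-Free-VS):
G = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
H = split_vertex(G, 1, part_a={2, 4}, part_b={3})
print(H)  # neither copy of 1 is adjacent to both 2 and 3, so H is triangle-free
```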
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm, then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (arXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
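To fix the semantics of the counted parameter, the brute-force sketch below counts the answers of a conjunctive query, with free and existentially quantified variables, on a small graph. The query $\varphi(x) = \exists y\, \exists z\, (E(x,y) \wedge E(y,z))$ and the graph are toy examples chosen here, and the routine is exponential, meant only as an illustration.

```python
from itertools import product

def count_answers(free_vars, quant_vars, edge_atoms, vertices, edges):
    """Count answers of a conjunctive query on a graph.

    edge_atoms is a list of variable pairs; edges is a set of ordered pairs
    closed under reversal (undirected graph). An answer is an assignment of
    the free variables that can be extended to the quantified variables so
    that every atom maps to an edge."""
    def satisfiable(assignment):
        for q_vals in product(vertices, repeat=len(quant_vars)):
            full = dict(assignment, **dict(zip(quant_vars, q_vals)))
            if all((full[a], full[b]) in edges for a, b in edge_atoms):
                return True
        return False

    return sum(
        satisfiable(dict(zip(free_vars, vals)))
        for vals in product(vertices, repeat=len(free_vars))
    )

# phi(x) = exists y, z : E(x,y) and E(y,z) -- "x starts a walk of length 2"
V = {0, 1, 2, 3}
E = {(0, 1), (1, 0), (1, 2), (2, 1)}  # path 0-1-2 plus the isolated vertex 3
print(count_answers(["x"], ["y", "z"], [("x", "y"), ("y", "z")], V, E))  # 3
```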
A subset $S$ of vertices in a graph $G$ is a secure dominating set of $G$ if $S$ is a dominating set of $G$ and, for each vertex $u \not\in S$, there is a vertex $v \in S$ such that $uv$ is an edge and $(S \setminus \{v\}) \cup \{u\}$ is also a dominating set of $G$. The secure domination number of $G$, denoted by $\gamma_{s}(G)$, is the cardinality of a smallest secure dominating set of $G$. In this paper, we prove that for any outerplanar graph $G$ with $n \geq 4$ vertices, $\gamma_{s}(G) \geq (n+4)/5$, and this bound is tight.
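For small graphs the definition can be checked by exhaustive search. The Python sketch below verifies secure domination and computes $\gamma_s$ by brute force; the 4-cycle, an outerplanar graph on $n = 4$ vertices with $\gamma_s = 2 \geq (n+4)/5$, is an illustrative choice.

```python
from itertools import combinations

def is_dominating(G, S):
    """G maps each vertex to its set of neighbours; S is the candidate set."""
    return all(v in S or G[v] & S for v in G)

def is_secure_dominating(G, S):
    """S dominates G, and every outside vertex u has a neighbour v in S
    such that swapping v for u still yields a dominating set."""
    if not is_dominating(G, S):
        return False
    return all(
        any(is_dominating(G, (S - {v}) | {u}) for v in G[u] & S)
        for u in G if u not in S
    )

def secure_domination_number(G):
    """Exhaustive search; exponential, intended only for tiny graphs."""
    for k in range(1, len(G) + 1):
        for S in combinations(G, k):
            if is_secure_dominating(G, set(S)):
                return k

# 4-cycle 0-1-2-3-0: two adjacent vertices form a secure dominating set
C4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(secure_domination_number(C4))  # 2
```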
The Random Permutation Set (RPS) is a recently proposed type of set that can be regarded as a generalization of evidence theory. To measure the uncertainty of an RPS, the entropy of RPS and its corresponding maximum entropy have been proposed. Exploring the maximum entropy provides a possible way of understanding the physical meaning of RPS. In this paper, a new concept, the envelope of the entropy function, is defined. In addition, the limit of the envelope of RPS entropy is derived and proved. Compared with the existing method, the computational complexity of the proposed method for calculating the envelope of RPS entropy decreases greatly. The result shows that, as $N \to \infty$, the envelope of the RPS entropy converges to $e \times (N!)^2$, which is closely connected to the constant $e$ and the factorial. Finally, numerical examples validate the efficiency and conciseness of the proposed envelope, providing new insight into the maximum entropy function.
While conformal predictors reap the benefits of rigorous statistical guarantees on their error frequency, the size of their corresponding prediction sets is critical to their practical utility. Unfortunately, there is currently a lack of finite-sample analysis and guarantees for their prediction set sizes. To address this shortfall, we theoretically quantify the expected size of the prediction sets under the split conformal prediction framework. As this exact quantity usually cannot be computed directly, we further derive point estimates and high-probability interval bounds that can be empirically computed, providing a practical method for characterizing the expected set size. We corroborate the efficacy of our results with experiments on real-world datasets for both regression and classification problems.
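For context, the following numpy sketch shows the split conformal procedure for regression with absolute-residual scores, where the prediction set is an interval of constant length $2\hat{q}$; the data, the underlying regressor, and the miscoverage level are illustrative choices, and the paper's finite-sample estimates and bounds for the expected size are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d regression data: y = sin(x) + noise
n = 2000
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)

# Split into a proper training set and a calibration set
idx = rng.permutation(n)
train, cal = idx[: n // 2], idx[n // 2:]

# Any fitted predictor works; a crude polynomial fit stands in for it here
coeffs = np.polyfit(x[train], y[train], deg=5)
predict = lambda t: np.polyval(coeffs, t)

# Split conformal: nonconformity scores on calibration data, then a quantile
alpha = 0.1
scores = np.abs(y[cal] - predict(x[cal]))
k = int(np.ceil((len(cal) + 1) * (1 - alpha)))
qhat = np.sort(scores)[k - 1]

# The prediction set at a new point is [prediction - qhat, prediction + qhat],
# so for this score its size is the constant 2 * qhat.
print("interval length:", 2 * qhat)
```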
Autonomous vehicles (AVs) may use external interfaces, such as LED light bands, to communicate with pedestrians safely and intuitively. While previous research has demonstrated the effectiveness of these interfaces in simple traffic scenarios involving one pedestrian and one vehicle, their performance in more complex scenarios with multiple road users remains unclear. The scalability of AV external communication has therefore attracted increasing attention, prompting the need for further investigation. This scoping review synthesises information from 54 papers to identify seven key scalability issues in multi-vehicle and multi-pedestrian environments, with Clarity of Recipients, Information Overload, and Multi-Lane Safety emerging as the most pressing concerns. To guide future research in scalable AV-pedestrian interactions, we propose high-level design directions focused on three communication loci: vehicle, infrastructure, and pedestrian. Our work lays the groundwork and offers a roadmap for designing simplified, coordinated, and targeted external AV communication, ultimately improving safety and efficiency in complex traffic scenarios.
We prove that an $m$ out of $n$ bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that $m$ out of $n$ bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.
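As a rough illustration, the numpy sketch below computes Chatterjee's rank correlation (no-ties formula) and an $m$ out of $n$ bootstrap that resamples $m < n$ observations with replacement; the choice of $m$, the resampling convention, and the example data are assumptions made here for demonstration, not prescriptions from the paper.

```python
import numpy as np

def chatterjee_xi(x, y, rng=None):
    """Chatterjee's xi_n (no-ties formula; ties in x broken at random)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    order = np.lexsort((rng.random(n), x))    # sort by x, random tie-breaking
    r = np.argsort(np.argsort(y[order])) + 1  # ranks of y along that order
    return 1 - 3 * np.abs(np.diff(r)).sum() / (n**2 - 1)

def m_out_of_n_bootstrap(x, y, m, B=1000, rng=None):
    """Resample m < n observations B times; return the bootstrap xi values."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=m, replace=True)
        stats[b] = chatterjee_xi(x[idx], y[idx], rng)
    return stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)         # strong non-monotone dependence
xi = chatterjee_xi(x, y, rng)
boot = m_out_of_n_bootstrap(x, y, m=100, rng=rng)
print(xi, np.quantile(boot, [0.025, 0.975]))
```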
Sponge hashing is a class of cryptographic hash algorithms that underlies the current international hash function standard SHA-3. In a nutshell, a sponge function takes as input a bit-stream of any length and processes it via a simple iterative procedure: it repeatedly feeds each block of the input into a so-called block function and then produces a short digest consisting of a subset of the final output bits. While much is known about the post-quantum security of the sponge construction when the block function is modeled as a random function or permutation, the case of invertible permutations, which more accurately models the construction underlying SHA-3, has so far remained a fundamental open problem. In this work, we make new progress towards overcoming this barrier and show several results. First, we prove the "double-sided zero-search" conjecture proposed by Unruh (eprint 2021) and show that finding zero-pairs in a random $2n$-bit permutation requires $\Omega(2^{n/2})$ queries -- and this is tight due to Grover's algorithm. At the core of our proof lies a novel "symmetrization argument" which uses insights from the theory of Young subgroups. Second, we consider more general variants of the double-sided search problem and show similar query lower bounds for them. As an application, we prove the quantum one-wayness of the single-round sponge with invertible permutations in the quantum random oracle model.
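For orientation, here is a toy Python sketch of the sponge mode with a random permutation playing the role of the block function; the state width, rate/capacity split, and padding rule are simplified stand-ins and not those of SHA-3.

```python
import random

N = 16            # toy state width in bits (SHA-3 uses b = r + c = 1600)
RATE, CAP = 8, 8  # rate and capacity, r + c = N

random.seed(0)
_table = list(range(2**N))
random.shuffle(_table)        # a fixed, public random permutation of the state
perm = lambda s: _table[s]

def toy_sponge(message_bits, digest_bits):
    """Absorb r-bit blocks by XOR into the rate part, permute, then squeeze."""
    # simplified padding: a single 1 bit, then zeros up to a multiple of RATE
    bits = message_bits + [1] + [0] * ((-len(message_bits) - 1) % RATE)
    state = 0
    for i in range(0, len(bits), RATE):
        block = int("".join(map(str, bits[i:i + RATE])), 2)
        state = perm(state ^ (block << CAP))   # XOR the block into the rate part
    out = []
    while len(out) < digest_bits:
        out += [(state >> (N - 1 - j)) & 1 for j in range(RATE)]  # top RATE bits
        state = perm(state)
    return out[:digest_bits]

print(toy_sponge([1, 0, 1, 1, 0, 1], 16))
```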
Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous, as occasional factual inaccuracies in the generated text might be obscured by the rest of the output being generally factual, making it extremely hard for the users to spot them. Current services that leverage LLMs usually do not provide any means for detecting unreliable generations. Here, we aim to bridge this gap. In particular, we propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification. Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output. Moreover, we present a novel token-level uncertainty quantification method that removes the impact of uncertainty about which claim to generate at the current step and which surface form to use. Our method, Claim Conditioned Probability (CCP), measures only the uncertainty of the particular claim value expressed by the model. Experiments on the task of biography generation demonstrate strong improvements for CCP compared to the baselines for six different LLMs and three languages. Human evaluation reveals that the fact-checking pipeline based on uncertainty quantification is competitive with a fact-checking tool that leverages external knowledge.
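As a point of reference only, the snippet below shows a simple token-probability baseline that scores a claim by the joint probability of the tokens expressing it; this is not the CCP method, which additionally conditions away the uncertainty about which claim to generate and which surface form to use, and the token log-probabilities are hypothetical.

```python
import math

def claim_uncertainty(token_logprobs, claim_token_idx):
    """Baseline claim-level uncertainty: one minus the joint probability of
    the tokens that express the claim (NOT the CCP method itself)."""
    logp = sum(token_logprobs[i] for i in claim_token_idx)
    return 1.0 - math.exp(logp)

# Hypothetical log-probabilities for a generated sentence whose factual claim
# is carried by tokens 3..5 (e.g. a birth year in a biography).
token_logprobs = [-0.1, -0.3, -0.05, -0.2, -1.6, -0.4]
print(claim_uncertainty(token_logprobs, claim_token_idx=[3, 4, 5]))
```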
An asymptotic theory is established for linear functionals of the predictive function given by kernel ridge regression, when the reproducing kernel Hilbert space is equivalent to a Sobolev space. The theory covers a wide variety of linear functionals, including point evaluations, evaluation of derivatives, $L_2$ inner products, etc. We establish the upper and lower bounds of the estimates and their asymptotic normality. It is shown that $\lambda\sim n^{-1}$ is the universal optimal order of magnitude for the smoothing parameter to balance the variance and the worst-case bias. The theory also implies that the optimal $L_\infty$ error of kernel ridge regression can be attained under the optimal smoothing parameter $\lambda\sim n^{-1}\log n$. These optimal rates for the smoothing parameter differ from the known optimal rate $\lambda\sim n^{-\frac{2m}{2m+d}}$ that minimizes the $L_2$ error of the kernel ridge regression.
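A minimal numpy sketch of the estimator under study, assuming a Matérn-3/2 kernel as a stand-in for a Sobolev-equivalent reproducing kernel, with the smoothing parameter set to the order $\lambda \sim n^{-1}$ discussed above; it illustrates kernel ridge regression and two linear functionals of the fitted function, not the asymptotic analysis itself.

```python
import numpy as np

def matern32_kernel(s, t, rho=1.0):
    """Matern-3/2 kernel; its RKHS is norm-equivalent to a Sobolev space."""
    d = np.abs(s[:, None] - t[None, :]) / rho
    return (1 + np.sqrt(3) * d) * np.exp(-np.sqrt(3) * d)

def krr_fit(x, y, lam):
    """Minimise (1/n) sum (y_i - f(x_i))^2 + lam * ||f||^2 over the RKHS."""
    K = matern32_kernel(x, x)
    alpha = np.linalg.solve(K + len(x) * lam * np.eye(len(x)), y)
    return lambda t: matern32_kernel(t, x) @ alpha

rng = np.random.default_rng(0)
n = 500
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)

f_hat = krr_fit(x, y, lam=1.0 / n)            # lambda of order n^{-1}
grid = np.linspace(0, 1, 200)
point_eval = f_hat(np.array([0.5]))           # linear functional: point evaluation
mean_functional = f_hat(grid).mean()          # approximates the L2 functional <f, 1>
print(point_eval, mean_functional)
```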
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It still remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
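As a sketch of the hypothesis-generation step only, the snippet below scans the scale parameter of one possible segmentation kernel (Felzenszwalb's method from scikit-image) to obtain hypotheses ranging from over- to under-segmented; the subsequent cost-minimization selection described above is not reproduced, and the parameter values are illustrative.

```python
from skimage import data, segmentation

# Scan one segmentation parameter to generate a collection of hypotheses,
# from highly over-segmented (small scale) to under-segmented (large scale).
image = data.astronaut()
scales = [10, 50, 100, 300, 1000]
hypotheses = [segmentation.felzenszwalb(image, scale=s, sigma=0.8, min_size=20)
              for s in scales]

for s, labels in zip(scales, hypotheses):
    print(f"scale={s:5d}: {labels.max() + 1} segments")

# A full pipeline would next score each segment (contour support, stability
# across hypotheses) and select a consistent subset via cost minimization.
```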