We show that every Borel graph $G$ of subexponential growth has a Borel proper edge-coloring with $\Delta(G) + 1$ colors. We deduce this from a stronger result, namely that an $n$-vertex (finite) graph $G$ of subexponential growth can be properly edge-colored using $\Delta(G) + 1$ colors by an $O(\log^\ast n)$-round deterministic distributed algorithm in the $\mathsf{LOCAL}$ model, where the implied constants in the $O(\cdot)$ notation are determined by a bound on the growth rate of $G$.
Real-Time DEVS (RT-DEVS) can model systems with quantitative temporal requirements. Ensuring that such models verify some temporal properties requires to use something beyond simulation. In this work we use the model checker Uppaal to verify a class of recurrent quantitative temporal properties appearing in RT-DEVS models. Secondly, by introducing mutations to quantitative temporal properties we are able to find errors in RT-DEVS models and their implementations. A case study from the railway domain is presented.
Rhetorical Role Labeling (RRL) of legal documents is pivotal for various downstream tasks such as summarization, semantic case search and argument mining. Existing approaches often overlook the varying difficulty levels inherent in legal document discourse styles and rhetorical roles. In this work, we propose HiCuLR, a hierarchical curriculum learning framework for RRL. It nests two curricula: Rhetorical Role-level Curriculum (RC) on the outer layer and Document-level Curriculum (DC) on the inner layer. DC categorizes documents based on their difficulty, utilizing metrics like deviation from a standard discourse structure and exposes the model to them in an easy-to-difficult fashion. RC progressively strengthens the model to discern coarse-to-fine-grained distinctions between rhetorical roles. Our experiments on four RRL datasets demonstrate the efficacy of HiCuLR, highlighting the complementary nature of DC and RC.
In this paper, we introduce a novel approach for bounding the cumulant generating function (CGF) of a Dirichlet process (DP) $X \sim \text{DP}(\alpha \nu_0)$, using superadditivity. In particular, our key technical contribution is the demonstration of the superadditivity of $\alpha \mapsto \log \mathbb{E}_{X \sim \text{DP}(\alpha \nu_0)}[\exp( \mathbb{E}_X[\alpha f])]$, where $\mathbb{E}_X[f] = \int f dX$. This result, combined with Fekete's lemma and Varadhan's integral lemma, converts the known asymptotic large deviation principle into a practical upper bound on the CGF $ \log\mathbb{E}_{X\sim \text{DP}(\alpha\nu_0)}{\exp(\mathbb{E}_{X}{[f]})} $ for any $\alpha > 0$. The bound is given by the convex conjugate of the scaled reversed Kullback-Leibler divergence $\alpha\mathrm{KL}(\nu_0\Vert \cdot)$. This new bound provides particularly effective confidence regions for sums of independent DPs, making it applicable across various fields.
Let $G=(V,E)$ be an undirected weighted graph on $n=|V|$ vertices and $S\subseteq V$ be a Steiner set. Steiner mincut is a well-studied concept, which provides a generalization to both (s,t)-mincut (when $|S|=2$) and global mincut (when $|S|=n$). Here, we address the problem of designing a compact data structure that can efficiently report a Steiner mincut and its capacity after the failure of any edge in $G$; such a data structure is known as a \textit{Sensitivity Oracle} for Steiner mincut. In the area of minimum cuts, although many Sensitivity Oracles have been designed in unweighted graphs, however, in weighted graphs, Sensitivity Oracles exist only for (s,t)-mincut [Annals of Operations Research 1991, NETWORKS 2019, ICALP 2024], which is just a special case of Steiner mincut. Here, we generalize this result to any arbitrary set $S\subseteq V$. 1. Sensitivity Oracle: Assuming the capacity of every edge is known, a. there is an ${\mathcal O}(n)$ space data structure that can report the capacity of Steiner mincut in ${\mathcal O}(1)$ time and b. there is an ${\mathcal O}(n(n-|S|+1))$ space data structure that can report a Steiner mincut in ${\mathcal O}(n)$ time after the failure of any edge in $G$. 2. Lower Bound: We show that any data structure that, after the failure of any edge, can report a Steiner mincut or its capacity must occupy $\Omega(n^2)$ bits of space in the worst case, irrespective of the size of the Steiner set. The lower bound in (2) shows that the assumption in (1) is essential to break the $\Omega(n^2)$ lower bound on space. For $|S|=n-k$ for any constant $k\ge 0$, it occupies only ${\mathcal O}(n)$ space. So, we also present the first Sensitivity Oracle occupying ${\mathcal O}(n)$ space for global mincut.
Reinforcement Learning from Human Feedback (RLHF) enhances the alignment between LLMs and human preference. The workflow of RLHF typically involves several models and tasks in a series of distinct stages. Existing RLHF training systems view each task as the smallest execution unit thus overlooking the opportunities for subtask-level optimizations. Due to the intrinsic nature of RLHF training, i.e., the data skewness in the generation stage, and the pipeline bubbles in the training stage, existing RLHF systems suffer from low GPU utilization in production deployments. RLHFuse breaks the traditional view of RLHF workflow as a composition of individual tasks, splitting each task into finer-grained subtasks, and performing stage fusion to improve GPU utilization. RLHFuse contains two key ideas. First, for generation and inference tasks, RLHFuse splits them into sample-level subtasks, enabling efficient inter-stage fusion to mitigate the original generation bottleneck dominated by long-tailed samples. Second, for training tasks, RLHFuse breaks them into subtasks of micro-batches. By leveraging the intuition that pipeline execution can be essentially complemented by another pipeline, RLHFuse performs intra-stage fusion to concurrently execute these subtasks in the training stage with a fused pipeline schedule, resulting in fewer pipeline bubbles. In addition, RLHFuse incorporates a series of system optimizations tailored for each stage of RLHF, making it efficient and scalable for our internal product usage. We evaluate RLHFuse on various popular LLMs and the results show that RLHFuse increases the training throughput by up to 3.7x, compared to existing state-of-the-art systems.
In a graph $G = (V,E)$, a k-ruling set $S$ is one in which all vertices $V$ \ $S$ are at most $k$ distance from $S$. Finding a minimum k-ruling set is intrinsically linked to the minimum dominating set problem and maximal independent set problem, which have been extensively studied in graph theory. This paper presents the first known algorithm for solving all k-ruling set problems in conjunction with known minimum dominating set algorithms at only additional polynomial time cost compared to a minimum dominating set. The algorithm further succeeds for $(\alpha, \alpha - 1)$ ruling sets in which $\alpha > 1$, for which constraints exist on the proximity of vertices v $\in S$. This secondary application instead works in conjunction with maximal independent set algorithms.
In the solution discovery variant of a vertex (edge) subset problem $\Pi$ on graphs, we are given an initial configuration of tokens on the vertices (edges) of an input graph $G$ together with a budget $b$. The question is whether we can transform this configuration into a feasible solution of $\Pi$ on $G$ with at most $b$ modification steps. We consider the token sliding variant of the solution discovery framework, where each modification step consists of sliding a token to an adjacent vertex (edge). The framework of solution discovery was recently introduced by Fellows et al. [Fellows et al., ECAI 2023] and for many solution discovery problems the classical as well as the parameterized complexity has been established. In this work, we study the kernelization complexity of the solution discovery variants of Vertex Cover, Independent Set, Dominating Set, Shortest Path, Matching, and Vertex Cut with respect to the parameters number of tokens $k$, discovery budget $b$, as well as structural parameters such as pathwidth.
We propose a standalone monocular visual Simultaneous Localization and Mapping (vSLAM) initialization pipeline for autonomous robots in space. Our method, a state-of-the-art factor graph optimization pipeline, enhances classical Structure from Small Motion (SfSM) to robustly initialize a monocular agent in weak-perspective projection scenes. Furthermore, it overcomes visual estimation challenges introduced by spacecraft inspection trajectories, such as: center-pointing motion, which exacerbates the bas-relief ambiguity, and the presence of a dominant plane in the scene, which causes motion estimation degeneracies in classical Structure from Motion (SfM). We validate our method on realistic, simulated satellite inspection images exhibiting weak-perspective projection, and we demonstrate its effectiveness and improved performance compared to other monocular initialization procedures.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.