In the Minimum Bisection problem, the input is a graph $G$ and the goal is to partition the vertex set into two parts $A$ and $B$, such that $||A|-|B|| \le 1$ and the number $k$ of edges between $A$ and $B$ is minimized. This problem can be viewed as a clustering problem where edges represent similarity, and the task is to partition the vertices into two equally sized clusters while minimizing the number of pairs of similar objects that end up in different clusters. In this paper, we initiate the study of a fair version of Minimum Bisection. In this problem, the vertices of the graph are colored using one of $c \ge 1$ colors. The goal is to find a bisection $(A, B)$ with at most $k$ edges between the parts, such that for each color $i\in [c]$, $A$ has exactly $r_i$ vertices of color $i$. We first show that Fair Bisection is $W[1]$-hard parameterized by $c$ even when $k = 0$. On the other hand, our main technical contribution shows that this hardness result is simply a consequence of the very strict requirement that each color class $i$ has {\em exactly} $r_i$ vertices in $A$. In particular, we give an $f(k,c,\epsilon)n^{O(1)}$-time algorithm that finds a balanced partition $(A, B)$ with at most $k$ edges between them, such that for each color $i\in [c]$, the number of vertices of color $i$ in $A$ is between $(1-\epsilon)r_i$ and $(1+\epsilon)r_i$. Our approximation algorithm is best viewed as a proof of concept that the technique introduced by [Lampis, ICALP '18] for obtaining FPT-approximation algorithms for problems of bounded tree-width or clique-width can be efficiently exploited even on graphs of unbounded width. The key insight is that the technique of Lampis is applicable on tree decompositions with unbreakable bags (as introduced in [Cygan et al., SIAM Journal on Computing '14]). Along the way, we also derive a combinatorial result regarding tree decompositions of graphs.
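To make the exact variant of the problem concrete, the following is a minimal brute-force checker for Fair Bisection; the encoding (edge list, color list, target vector $r$) and the function name are illustrative choices rather than anything from the paper, and the exponential search is only a sanity check, not the FPT algorithm described above.
\begin{verbatim}
# Brute-force checker for (exact) Fair Bisection; exponential in n,
# intended only to illustrate the problem definition.
from itertools import combinations

def fair_bisection(n, edges, color, r, k):
    """n vertices 0..n-1, edges = list of pairs, color[v] in range(c),
    r[i] = required number of color-i vertices in A, k = cut budget."""
    for size in {n // 2, (n + 1) // 2}:           # ||A| - |B|| <= 1
        for A in combinations(range(n), size):
            A = set(A)
            counts = [0] * len(r)
            for v in A:
                counts[color[v]] += 1
            if counts != list(r):                 # exactly r_i of color i in A
                continue
            cut = sum(1 for u, v in edges if (u in A) != (v in A))
            if cut <= k:
                return A
    return None

# Example: a 4-cycle with alternating colors, one vertex of each color per side.
print(fair_bisection(4, [(0, 1), (1, 2), (2, 3), (3, 0)],
                     [0, 1, 0, 1], [1, 1], 2))
\end{verbatim}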
The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by fine-tuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLM while achieving comparable or even better performance. To enable further research on PEFT methods for LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters to determine the best design for each adapter-based method. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on both reasoning tasks.
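As a rough illustration of why adapter-based PEFT needs so few trainable parameters, here is a minimal sketch of a bottleneck (Series-style) adapter in PyTorch; the dimensions, activation, and placement are assumptions for illustration only and do not reproduce the LLM-Adapters framework's API.
\begin{verbatim}
# Minimal sketch of a bottleneck ("Series") adapter, assuming PyTorch; the
# hidden/bottleneck sizes are illustrative, not the framework's defaults.
import torch
import torch.nn as nn

class SeriesAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # down-projection
        self.up = nn.Linear(bottleneck, hidden_size)     # up-projection
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact;
        # only the small down/up projections are trained.
        return x + self.up(self.act(self.down(x)))

# During PEFT, backbone parameters stay frozen and only adapter weights train:
adapter = SeriesAdapter(hidden_size=4096)
print(sum(p.numel() for p in adapter.parameters()))  # ~0.5M params vs. billions
\end{verbatim}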
Let $\Gamma$ be a simple connected graph on $n$ vertices, and let $C$ be a code of length $n$ whose coordinates are indexed by the vertices of $\Gamma$. We say that $C$ is a \textit{storage code} on $\Gamma$ if for any codeword $c \in C$, one can recover the information on each coordinate of $c$ by accessing its neighbors in $\Gamma$. The main problem here is to construct high-rate storage codes on triangle-free graphs. In this paper, we solve an open problem posed by Barg and Z\'emor in 2022, showing that the BCH family of storage codes is of unit rate. Furthermore, we generalize the construction of the BCH family and obtain more storage codes of unit rate on triangle-free graphs.
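For intuition on the definition, the sketch below checks the storage-code property of a small binary linear code by brute force: coordinate $v$ is recoverable from its neighborhood exactly when no two codewords agree on $N(v)$ yet differ at $v$. The encoding and the toy rate-$1/2$ example on the $4$-cycle are illustrative assumptions and are unrelated to the BCH construction studied in the paper.
\begin{verbatim}
# Brute-force check of the storage-code property over GF(2).
from itertools import product

def is_storage_code(adj, basis):
    """adj[v] = set of neighbors of v; basis spans a binary linear code
    of length len(adj), with coordinates indexed by the vertices."""
    n = len(adj)
    codewords = {tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2
                       for i in range(n))
                 for coeffs in product([0, 1], repeat=len(basis))}
    for v in range(n):
        seen = {}
        for cw in codewords:
            key = tuple(cw[u] for u in sorted(adj[v]))
            if seen.get(key, cw[v]) != cw[v]:
                return False   # two codewords agree on N(v) but differ at v
            seen[key] = cw[v]
    return True

# Rate-1/2 example on the (triangle-free) 4-cycle 0-1-2-3-0: the code spanned
# by 1100 and 0011 lets every vertex copy its value from one neighbor.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_storage_code(c4, [(1, 1, 0, 0), (0, 0, 1, 1)]))   # True
\end{verbatim}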
Great advances in program analysis would be enabled if it were possible to derive the function of a program from inputs to outputs (or from initial states to final states, depending on how we model program semantics). Efforts to do so have always stalled against the difficulty of deriving the function of loops; the expedient solution of capturing the function of loops by unrolling them an arbitrary number of iterations is clearly inadequate. In this paper, we propose a relations-based method to derive the function of a C-like program, including programs that have loops nested to an arbitrary level. To capture the semantics of loops, we use the concept of an invariant relation.
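As a toy illustration of the underlying idea, that a relation preserved by every iteration of a loop body, combined with the exit condition, pins down the loop's input/output function, consider the loop below (written in Python for brevity in place of the C-like loop \texttt{while (x > 0) \{ s += a; x -= 1; \}}); this only conveys the intuition and is not the paper's formal definition of an invariant relation.
\begin{verbatim}
def run_loop(s, x, a):
    s0, x0 = s, x
    while x > 0:
        s, x = s + a, x - 1
        # The relation s + a*x == s0 + a*x0 is preserved by every iteration.
        assert s + a * x == s0 + a * x0
    # Exit condition x == 0 plus the preserved relation gives the loop's
    # function: (s, x) -> (s0 + a*x0, 0) whenever x0 >= 0.
    return s, x

print(run_loop(s=2, x=5, a=3))   # (17, 0)
\end{verbatim}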
We make an experimental comparison of methods for computing the numerical radius of an $n\times n$ complex matrix, based on two well-known characterizations, the first a nonconvex optimization problem in one real variable and the second a convex optimization problem in $n^{2}+1$ real variables. We make comparisons with respect to both accuracy and computation time using publicly available software.
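One well-known one-real-variable characterization, presumably the nonconvex one referred to above, is $w(A)=\max_{\theta}\lambda_{\max}\!\big((e^{i\theta}A+e^{-i\theta}A^{*})/2\big)$; the sketch below merely evaluates it on a coarse grid with NumPy and is not one of the optimization methods compared in the paper.
\begin{verbatim}
# Grid evaluation of  w(A) = max_theta  lambda_max((e^{it}A + e^{-it}A*)/2).
import numpy as np

def numerical_radius_grid(A, num_points=2000):
    thetas = np.linspace(0.0, 2 * np.pi, num_points, endpoint=False)
    best = -np.inf
    for t in thetas:
        H = (np.exp(1j * t) * A + np.exp(-1j * t) * A.conj().T) / 2  # Hermitian
        best = max(best, np.linalg.eigvalsh(H)[-1])                  # largest eig
    return best

A = np.array([[0.0, 2.0], [0.0, 0.0]])   # Jordan-type block with w(A) = 1
print(numerical_radius_grid(A))           # ~1.0
\end{verbatim}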
Consider that there are $k\le n$ agents in a simple, connected, and undirected graph $G=(V,E)$ with $n$ nodes and $m$ edges. The goal of the dispersion problem is to move these $k$ agents to distinct nodes. Agents can communicate only when they are at the same node, and no other means of communication, such as whiteboards, are available. We assume that the agents operate synchronously. We consider two scenarios: when all agents are initially located at any single node (rooted setting) and when they are initially distributed over any one or more nodes (general setting). Kshemkalyani and Sharma presented a dispersion algorithm for the general setting, which uses $O(m_k)$ time and $O(\log(k+\delta))$ bits of memory per agent [OPODIS 2021]. Here, $m_k$ is the maximum number of edges in any induced subgraph of $G$ with $k$ nodes, and $\delta$ is the maximum degree of $G$. This algorithm is the fastest in the literature, as no algorithm with $o(m_k)$ time has been discovered even for the rooted setting. In this paper, we present faster algorithms for both the rooted and general settings. First, we present an algorithm for the rooted setting that solves the dispersion problem in $O(k\log \min(k,\delta))=O(k\log k)$ time using $O(\log \delta)$ bits of memory per agent. Next, we propose an algorithm for the general setting that achieves dispersion in $O(k (\log k)\cdot (\log \min(k,\delta)))=O(k \log^2 k)$ time using $O(\log (k+\delta))$ bits.
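To make the model concrete, here is a toy, centralized simulation of DFS-style dispersion from a single root; it is not the paper's algorithm (the simulator sees the whole graph and ignores the per-agent memory bounds) and serves only to illustrate what a dispersed configuration looks like in the rooted setting.
\begin{verbatim}
def dfs_dispersion(adj, root, k):
    """adj: dict node -> list of neighbors; settles min(k, n) agents
    on distinct nodes by following a DFS from the root."""
    settled, stack, visited = [], [root], set()
    while stack and len(settled) < k:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        settled.append(v)                      # one agent settles at v
        stack.extend(u for u in adj[v] if u not in visited)
    return settled

graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dfs_dispersion(graph, root=0, k=3))      # three distinct nodes
\end{verbatim}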
We study a new graph separation problem called Multiway Near-Separator. Given an undirected graph $G$, integer $k$, and terminal set $T \subseteq V(G)$, it asks whether there is a vertex set $S \subseteq V(G) \setminus T$ of size at most $k$ such that in graph $G-S$, no pair of distinct terminals can be connected by two pairwise internally vertex-disjoint paths. Hence each terminal pair can be separated in $G-S$ by removing at most one vertex. The problem is therefore a generalization of (Node) Multiway Cut, which asks for a vertex set $S$ for which each terminal is in a different component of $G-S$. We develop a fixed-parameter tractable algorithm for Multiway Near-Separator running in time $2^{O(k \log k)} \cdot n^{O(1)}$. Our algorithm is based on a new pushing lemma for solutions with respect to important separators, along with two problem-specific ingredients. The first is a polynomial-time subroutine to reduce the number of terminals in the instance to a polynomial in the solution size $k$ plus the size of a given suboptimal solution. The second is a polynomial-time algorithm that, given a graph $G$ and terminal set $T \subseteq V(G)$ along with a single vertex $x \in V(G)$ that forms a multiway near-separator, computes a 14-approximation for the problem of finding a multiway near-separator not containing $x$.
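A brute-force verifier for the definition can be written directly from the statement above; the sketch below assumes networkx, runs in exponential time on tiny instances only, and is unrelated to the $2^{O(k \log k)} \cdot n^{O(1)}$ algorithm of the paper.
\begin{verbatim}
from itertools import combinations
import networkx as nx

def has_two_disjoint_paths(G, s, t):
    # Two s-t paths are internally vertex-disjoint if their interiors are disjoint.
    paths = list(nx.all_simple_paths(G, s, t))
    return any(not (set(p[1:-1]) & set(q[1:-1]))
               for p, q in combinations(paths, 2))

def is_near_separator(G, T, S):
    H = G.copy()
    H.remove_nodes_from(S)
    return not any(has_two_disjoint_paths(H, s, t)
                   for s, t in combinations(T, 2))

# Example: 4-cycle 0-1-2-3-0 with opposite terminals 0 and 2; deleting the
# non-terminal vertex 1 leaves a single 0-2 path.
G = nx.cycle_graph(4)
print(is_near_separator(G, T=[0, 2], S=[1]))   # True
print(is_near_separator(G, T=[0, 2], S=[]))    # False (paths 0-1-2 and 0-3-2)
\end{verbatim}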
A kernelization for a parameterized decision problem $\mathcal{Q}$ is a polynomial-time preprocessing algorithm that reduces any parameterized instance $(x,k)$ into an instance $(x',k')$ whose size is bounded by a function of $k$ alone and which has the same yes/no answer for $\mathcal{Q}$. Such preprocessing algorithms cannot exist in the context of counting problems, when the answer to be preserved is the number of solutions, since this number can be arbitrarily large compared to $k$. However, we show that for counting minimum feedback vertex sets of size at most $k$, and for counting minimum dominating sets of size at most $k$ in a planar graph, there is a polynomial-time algorithm that either outputs the answer or reduces to an instance $(G',k')$ of size polynomial in $k$ with the same number of minimum solutions. This shows that a meaningful theory of kernelization for counting problems is possible and opens the door for future developments. Our algorithms exploit that if the number of solutions exceeds $2^{\mathsf{poly}(k)}$, the size of the input is exponential in terms of $k$ so that the running time of a parameterized counting algorithm can be bounded by $\mathsf{poly}(n)$. Otherwise, we can use gadgets that slightly increase $k$ to represent choices among $2^{O(k)}$ options by only $\mathsf{poly}(k)$ vertices.
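The dichotomy sketched in the last two sentences can be made concrete with a back-of-the-envelope calculation; the exponents below are illustrative rather than the paper's exact bounds. Since every minimum solution is a vertex set of size at most $k$,
\[
\#\{\text{minimum solutions of size at most } k\} \;\le\; \sum_{i=0}^{k}\binom{n}{i} \;\le\; n^{k+1} \quad (n \ge 2),
\]
so if the count exceeds $2^{k^{c}}$ then $n > 2^{k^{c}/(k+1)}$, and a parameterized counting algorithm running in time $2^{O(k^{c-1})}\cdot n^{O(1)}$ is already polynomial in $n$ on such instances.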
Decentralization is a pervasive concept found across disciplines, including Economics, Political Science, and Computer Science, where it is used in distinct yet interrelated ways. Here, we develop and publicly release a general pipeline to investigate the scholarly history of the term, analysing 425,144 academic publications that refer to (de)centralization. We find that the fraction of papers on the topic has been exponentially increasing since the 1950s. In 2021, 1 author in 154 mentioned (de)centralization in the title or abstract of an article. Using both semantic information and citation patterns, we cluster papers into fields and characterize the knowledge flows between them. Our analysis reveals that the topic has emerged independently in the different fields, with little cross-disciplinary contamination. Moreover, we show how Blockchain became the most influential field about 10 years ago, while Governance dominated before the 1990s. In summary, our findings provide a quantitative assessment of the evolution of a key yet elusive concept, which has undergone cycles of rise and fall within different fields. Our pipeline offers a powerful tool to analyze the evolution of any scholarly term in the academic literature, providing insights into the interplay between collective and independent discoveries in science.
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new, but often individual, downstream tasks. Thus, how one would expand prompt tuning to handle, concomitantly, heterogeneous tasks and data distributions remains a largely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter, whose design is one of the contributions of this paper, can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collections of prompts) based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied for efficiency reasons, as well as to the instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training ``interference'' in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs decrease final perplexity by between $\sim20\%$ and $\sim70\%$ compared to baselines in the federated scenario, and by between $\sim 3\%$ and $\sim30\%$ in the centralized scenario.
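The gating idea can be sketched in a few lines of PyTorch; the embedding size, softmax gate, and weighted mixture below are illustrative design choices and are not claimed to match the paper's exact architecture.
\begin{verbatim}
# Sketch of gating over groups of soft prompts, assuming PyTorch.
import torch
import torch.nn as nn

class PromptGate(nn.Module):
    def __init__(self, num_groups: int, prompt_len: int, hidden: int):
        super().__init__()
        # One learnable soft prompt (prompt_len x hidden) per skill group.
        self.prompts = nn.Parameter(torch.randn(num_groups, prompt_len, hidden) * 0.02)
        self.gate = nn.Linear(hidden, num_groups)    # scores groups per input

    def forward(self, pooled_input: torch.Tensor) -> torch.Tensor:
        # pooled_input: (batch, hidden) summary of the instruction/task.
        weights = torch.softmax(self.gate(pooled_input), dim=-1)   # (batch, groups)
        # Weighted mixture of prompt groups -> one prompt per example.
        return torch.einsum("bg,gld->bld", weights, self.prompts)  # (batch, len, hidden)

gate = PromptGate(num_groups=4, prompt_len=16, hidden=768)
print(gate(torch.randn(2, 768)).shape)   # torch.Size([2, 16, 768])
\end{verbatim}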
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the ``edge of stability'' based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an ``edge of stability'' for Sharpness-Aware Minimization (SAM), a variant of GD that has been shown to improve generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
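For reference, the quadratic-approximation calculation behind the $2/\eta$ threshold for GD is short; the one-dimensional quadratic below is the standard local model, not the paper's SAM derivation, whose exact threshold is not reproduced here:
\[
L(x)=\tfrac{\lambda}{2}x^{2},\qquad x_{t+1}=x_t-\eta L'(x_t)=(1-\eta\lambda)\,x_t ,
\]
so the iterates remain bounded exactly when $|1-\eta\lambda|\le 1$, i.e. $\lambda \le 2/\eta$; a Hessian operator norm hovering near $2/\eta$ is the ``edge of stability.'' The paper repeats this style of calculation for the SAM update, where the resulting threshold additionally involves the gradient norm.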