Given functions $f$ and $g$ defined on the subset lattice of order $n$, their min-sum subset convolution, defined for all $S \subseteq [n]$ as \[ (f \star g)(S) = \min_{T \subseteq S}\:\big(f(T) + g(S \setminus T)\big), \] lies at the heart of several NP-hard optimization problems, such as minimum-cost $k$-coloring, the prize-collecting Steiner tree, and many others in computational biology. Despite its importance, the na\"ive $O(3^n)$-time evaluation remains the fastest known; the only alternative is an $\tilde O(2^n M)$-time algorithm for instances where the input functions have a bounded integer range $\{-M, \ldots, M\}$. We study, for the first time, the $(1 + \varepsilon)$-approximate min-sum subset convolution and present both a weakly- and a strongly-polynomial approximation algorithm, running in time $\tilde O(2^n \log M / \varepsilon)$ and $\tilde O(2^{3n/2} / \sqrt{\varepsilon})$, respectively. To demonstrate the applicability of our work, we present the first exponential-time $(1 + \varepsilon)$-approximation schemes for the above optimization problems. Our algorithms lie at the intersection of two lines of research that have so far been considered separately: $\textit{sequence}$ and $\textit{subset}$ convolutions in semi-rings. We also extend the recent framework of Bringmann, K\"unnemann, and W\k{e}grzycki [STOC 2019] to the context of subset convolutions.
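As a concrete illustration (a minimal Python sketch of the na\"ive evaluation described above, not of the paper's approximation algorithms), the $O(3^n)$ bound comes from enumerating, for every mask $S$, all of its submasks $T$; the total work is $\sum_{S} 2^{|S|} = 3^n$.

```python
# Naive O(3^n) min-sum subset convolution via submask enumeration.
# f, g: lists of length 2**n indexed by bitmask subsets of [n].

def min_sum_subset_convolution(f, g, n):
    full = 1 << n
    h = [float("inf")] * full
    for S in range(full):
        T = S
        while True:
            # For T a submask of S, the complement S \ T equals S ^ T.
            h[S] = min(h[S], f[T] + g[S ^ T])
            if T == 0:
                break
            T = (T - 1) & S  # standard trick: next submask of S
    return h
```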
Popular guidance methods for denoising diffusion probabilistic models (DDPMs) linearly combine distinct conditional models to provide enhanced control over samples. However, this approach overlooks nonlinear effects that become significant when the guidance scale is large. To address this issue, we propose characteristic guidance, a guidance method that provides a first-principles nonlinear correction to classifier-free guidance. This correction forces guided DDPMs to respect the Fokker-Planck (FP) equation of the diffusion process, in a way that is training-free and compatible with existing sampling methods. Experiments show that characteristic guidance enhances the semantic characteristics of prompts and mitigates irregularities in image generation, proving effective in diverse applications ranging from simulating magnet phase transitions to latent space sampling.
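For reference, the linear combination the abstract alludes to is standard classifier-free guidance; a minimal sketch follows (generic placeholder names, not the paper's code, and the characteristic-guidance correction itself is not reproduced here).

```python
def classifier_free_guidance(eps_uncond, eps_cond, w):
    """Linearly combine unconditional and conditional noise predictions.

    w = 0 recovers the unconditional model and w = 1 the conditional one;
    for large w, the nonlinear effects that characteristic guidance
    corrects become significant.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)
```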
Kahneman & Tversky's $\textit{prospect theory}$ (1992) tells us that humans perceive random variables in a biased but well-defined manner; for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases -- the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to their belonging to a family of loss functions that we call $\textit{human-aware losses}$ (HALOs). However, the utility functions these methods attribute to humans still differ from those in the prospect theory literature. Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. We call this approach KTO; it matches or exceeds the performance of preference-based methods at scales from 1B to 30B parameters, despite learning only from a binary signal of whether an output is desirable. More broadly, our work suggests that no one HALO is universally superior; the best loss depends on the inductive biases most appropriate for a given setting, an oft-overlooked consideration.
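For intuition, here is a minimal sketch of the Tversky-Kahneman (1992) value function that motivates KTO; this is an illustrative assumption about the utility model, not the paper's training objective.

```python
def kt_value(outcome, reference=0.0, alpha=0.88, lam=2.25):
    """Prospect-theory value of an outcome relative to a reference point.

    alpha < 1 gives diminishing sensitivity to larger gains and losses;
    lam > 1 encodes loss aversion (losses loom larger than equal gains).
    The defaults are the classic Tversky-Kahneman (1992) estimates.
    """
    z = outcome - reference
    return z ** alpha if z >= 0 else -lam * ((-z) ** alpha)
```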
A Boolean function $F(x,y)$ for $x,y \in \{0,1\}^n$ is an XOR function if $F(x,y)=f(x\oplus y)$ for some function $f$ on $n$ input bits, where $\oplus$ is the bit-wise XOR. XOR functions are relevant in communication complexity, in part because they admit Fourier-analytic techniques. For total XOR functions it is known that the deterministic communication complexity of $F$ is closely related to the parity decision tree complexity of $f$. Montanaro and Osbourne (2009) observed that the one-sided communication complexity $D_{cc}^{\rightarrow}(F)$ of $F$ is exactly equal to the nonadaptive parity decision tree complexity $NADT^{\oplus}(f)$ of $f$. Hatami et al. (2018) showed that the unrestricted communication complexity of $F$ is polynomially related to the parity decision tree complexity of $f$. We initiate the study of a similar connection for partial functions. We show that, in the case of one-sided communication complexity, whether these measures are equal depends on the number of inputs on which $f$ is undefined. On the one hand, if $D_{cc}^{\rightarrow}(F)=t$ and $f$ is undefined on at most $O(\frac{2^{n-t}}{\sqrt{n-t}})$ inputs, then $NADT^{\oplus}(f)=t$. On the other hand, for a wide range of values of $D_{cc}^{\rightarrow}(F)$ and $NADT^{\oplus}(f)$ (from constant to $n-2$) we provide partial functions for which $D_{cc}^{\rightarrow}(F) < NADT^{\oplus}(f)$. In particular, we provide a function with an exponential gap between the two measures. Our separation results translate to the case of two-sided communication complexity as well, in particular showing that the result of Hatami et al. (2018) cannot be generalized to partial functions. Previous results for total functions rely heavily on Boolean Fourier analysis, and the technique does not translate to partial functions. For the proofs of our results we instead build a linear-algebraic framework. The separation results are proved through a reduction to covering codes.
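To make the objects concrete, here is a small illustrative sketch (not from the paper) of an XOR function and of a single nonadaptive parity query, which reads the parity of a fixed subset of bits.

```python
def xor_function(f, x, y):
    """Lift f on n bits to the XOR function F(x, y) = f(x XOR y).

    x, y, and the argument of f are n-bit integers (bitmasks).
    """
    return f(x ^ y)

def parity_query(z, S):
    """One parity query: the XOR of the bits of z indexed by bitmask S."""
    return bin(z & S).count("1") % 2
```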
This paper presents an $O^{*}(1.42^{n})$-time algorithm for the Maximum Cut problem on split graphs, along with a subexponential-time algorithm for its decision variant.
A preference matrix $M$ has an entry for each pair of candidates in an election whose value $p_{ij}$ represents the proportion of voters that prefer candidate $i$ over candidate $j$. The matrix is rationalizable if it is consistent with a set of voters whose preferences are total orders. A celebrated open problem asks for a concise characterization of rationalizable preference matrices. In this paper, we generalize this matrix rationalizability question and study when a preference matrix is consistent with a set of voters whose preferences are partial orders of width $\alpha$. The width (the maximum cardinality of an antichain) of the partial order is a natural measure of the rationality of a voter; indeed, a partial order of width $1$ is a total order. Our primary focus concerns the rationality number, the minimum width required to rationalize a preference matrix. We present two main results. The first concerns the class of half-integral preference matrices, where we show that the key parameter in evaluating the rationality number is the chromatic number of the undirected unanimity graph associated with the preference matrix $M$. The second concerns the class of integral preference matrices, where we show that the key parameter is now the dichromatic number of the directed voting graph associated with $M$.
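As a concrete illustration of the setup (assuming, as in the width-$1$ case, voters with total-order preferences; a sketch, not the paper's method), the preference matrix of a voter profile can be computed directly:

```python
from itertools import combinations

def preference_matrix(rankings, m):
    """rankings: list of permutations of range(m), best candidate first.

    Returns p with p[i][j] = fraction of voters preferring i over j,
    so p[i][j] + p[j][i] = 1 for all i != j.
    """
    p = [[0.0] * m for _ in range(m)]
    for r in rankings:
        pos = {c: k for k, c in enumerate(r)}
        for i, j in combinations(range(m), 2):
            if pos[i] < pos[j]:
                p[i][j] += 1
            else:
                p[j][i] += 1
    n = len(rankings)
    return [[v / n for v in row] for row in p]
```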
Given a positive integer $d$, d-CUT is the problem of deciding whether an undirected graph $G=(V,E)$ has a cut $(A,B)$ such that every vertex in $A$ (resp. $B$) has at most $d$ neighbors in $B$ (resp. $A$). For $d=1$, the problem is referred to as MATCHING CUT. Gomes and Sau, in IPEC 2019, gave the first fixed-parameter tractable algorithm for d-CUT parameterized by the maximum number of crossing edges in the cut (i.e., the size of the edge cut). However, their paper does not provide an explicit bound on the running time, as it relies indirectly on an MSOL formulation and Courcelle's Theorem. Motivated by this, we design and present an FPT algorithm for d-CUT on general graphs with running time $2^{O(k\log k)}n^{O(1)}$, where $k$ is the maximum size of the edge cut. This is the first FPT algorithm for d-CUT and MATCHING CUT with an explicit dependence on this parameter. We also observe that there is no algorithm solving MATCHING CUT in time $2^{o(k)}n^{O(1)}$, where $k$ is the maximum size of the edge cut, unless the ETH fails.
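For concreteness, verifying that a given bipartition is a d-cut is straightforward; the following is a minimal sketch of that check (not the FPT algorithm itself).

```python
def is_d_cut(adj, A, B, d):
    """adj: dict mapping each vertex to its set of neighbours.

    Returns True iff (A, B) is a d-cut: both sides non-empty and every
    vertex has at most d neighbours on the other side.
    """
    A, B = set(A), set(B)
    if not A or not B:
        return False
    return (all(len(adj[v] & B) <= d for v in A)
            and all(len(adj[v] & A) <= d for v in B))
```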
In Path Set Packing, the input is an undirected graph $G$, a collection $\mathcal{P}$ of simple paths in $G$, and a positive integer $k$. The problem is to decide whether there exist $k$ edge-disjoint paths in $\mathcal{P}$. We study the parameterized complexity of Path Set Packing with respect to both natural and structural parameters. We show that the problem is $W[1]$-hard with respect to vertex cover number, and $W[1]$-hard with respect to pathwidth plus maximum degree plus solution size. These results answer an open question raised in COCOON 2018. On the positive side, we present an FPT algorithm parameterized by feedback vertex number plus maximum degree, and an FPT algorithm parameterized by treewidth plus maximum degree plus the maximum length of a path in $\mathcal{P}$. These positive results complement the hardness of Path Set Packing with respect to any subset of the parameters used in the FPT algorithms. We also give a $4$-approximation algorithm for the maximum path set packing problem that runs in FPT time when parameterized by feedback edge number.
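For concreteness, the decision question can be phrased as a brute-force check (a sketch for intuition only; the paper's algorithms are far more efficient in the stated parameters):

```python
from itertools import combinations

def has_edge_disjoint_packing(paths, k):
    """paths: list of paths, each a list of vertices. Tries all k-subsets
    and accepts if some k paths pairwise share no edge."""
    edge_sets = [{frozenset(e) for e in zip(p, p[1:])} for p in paths]
    for subset in combinations(edge_sets, k):
        used = set()
        for es in subset:
            if used & es:
                break  # two chosen paths share an edge
            used |= es
        else:
            return True
    return False
```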
Reducing the $\textit{hallucination}$ problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of hallucination is the first key step in the governance of this issue, but it remains under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate within an answer, and we use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments studying generative and discriminative annotators and show that, although current open-source LLMs have difficulty with fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibit better generalization ability on unseen questions.
We present two randomised approximate counting algorithms with $\widetilde{O}(n^{2-c}/\varepsilon^2)$ running time for some constant $c>0$ and accuracy $\varepsilon$: (1) for the hard-core model with fugacity $\lambda$ on graphs with maximum degree $\Delta$ when $\lambda=O(\Delta^{-1.5-c_1})$ where $c_1=c/(2-2c)$; (2) for spin systems with strong spatial mixing (SSM) on planar graphs with quadratic growth, such as $\mathbb{Z}^2$. For the hard-core model, Weitz's algorithm (STOC, 2006) achieves sub-quadratic running time when correlation decays faster than the neighbourhood growth, namely when $\lambda = o(\Delta^{-2})$. Our first algorithm does not require this property and extends the range where sub-quadratic algorithms exist. Our second algorithm appears to be the first to achieve sub-quadratic running time up to the SSM threshold, albeit on a restricted family of graphs. It also extends to (not necessarily planar) graphs with polynomial growth, such as $\mathbb{Z}^d$, but with a running time of the form $\widetilde{O}\left(n^2\varepsilon^{-2}/2^{c(\log n)^{1/d}}\right)$ where $d$ is the exponent of the polynomial growth and $c>0$ is some constant.
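For reference, the quantity being approximated in the hard-core model is the partition function $Z = \sum_{I \text{ independent}} \lambda^{|I|}$; a brute-force sketch follows (for intuition only, with exponential running time).

```python
def hard_core_partition(adj, lam):
    """adj: list of neighbour sets for vertices 0..n-1.

    Enumerates all 2^n vertex subsets and sums lam^|I| over the
    independent sets I; only feasible for very small n.
    """
    n = len(adj)
    Z = 0.0
    for mask in range(1 << n):
        independent = all(
            not (mask >> u & 1 and mask >> v & 1)
            for u in range(n) for v in adj[u]
        )
        if independent:
            Z += lam ** bin(mask).count("1")
    return Z
```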
Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec.
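For background, here is a minimal sketch of the classic greedy MEC heuristic, which constructs a low-entropy (not necessarily minimum-entropy) coupling of two marginals; this is not ARIMEC, which handles the iterative, partition-based setting.

```python
import heapq

def greedy_mec(p, q):
    """p, q: marginal distributions as lists summing to 1.

    Repeatedly assigns as much joint mass as possible to the pair of
    currently largest residual marginals. Returns {(i, j): mass}.
    """
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)
    joint = {}
    while hp and hq:
        pi, i = heapq.heappop(hp)
        qj, j = heapq.heappop(hq)
        m = min(-pi, -qj)  # mass assigned to the pair (i, j)
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        if -pi - m > 1e-12:  # push back any residual marginal mass
            heapq.heappush(hp, (pi + m, i))
        if -qj - m > 1e-12:
            heapq.heappush(hq, (qj + m, j))
    return joint
```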