In 1943, Hadwiger conjectured that every graph with no $K_t$ minor is $(t-1)$-colorable for every $t\ge 1$. In the 1980s, Kostochka and Thomason independently proved that every graph with no $K_t$ minor has average degree $O(t\sqrt{\log t})$ and hence is $O(t\sqrt{\log t})$-colorable. Recently, Norin, Song and the second author showed that every graph with no $K_t$ minor is $O(t(\log t)^{\beta})$-colorable for every $\beta > 1/4$, the first improvement to the order of magnitude of the $O(t\sqrt{\log t})$ bound. The first main result of this paper is that every graph with no $K_t$ minor is $O(t\log\log t)$-colorable. This is a corollary of our main technical result that the chromatic number of a $K_t$-minor-free graph is bounded by $O(t(1+f(G,t)))$, where $f(G,t)$ is the maximum of $\frac{\chi(H)}{a}$ over all $a\ge \frac{t}{\sqrt{\log t}}$ and all $K_a$-minor-free subgraphs $H$ of $G$ that are small (i.e., on $O(a\log^4 a)$ vertices). This has a number of interesting corollaries. First, as mentioned, using the current best-known bounds on coloring small $K_t$-minor-free graphs, we show that $K_t$-minor-free graphs are $O(t\log\log t)$-colorable. Second, it shows that proving Linear Hadwiger's Conjecture (that $K_t$-minor-free graphs are $O(t)$-colorable) reduces to proving it for small graphs. Third, we prove that $K_t$-minor-free graphs with clique number at most $\sqrt{\log t}/ (\log \log t)^2$ are $O(t)$-colorable. This implies our final corollary, that Linear Hadwiger's Conjecture holds for $K_r$-free graphs for every fixed $r$. One key to proving the main theorem is a new standalone result that every $K_t$-minor-free graph of average degree $d=\Omega(t)$ has a subgraph on $O(t \log^3 t)$ vertices with average degree $\Omega(d)$.
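For reference, the main technical bound can be written out as follows (a paraphrase of the statement above, not the paper's verbatim theorem):
\[
\chi(G) \le O\bigl(t\,(1+f(G,t))\bigr), \quad\text{where}\quad
f(G,t) = \max\left\{ \frac{\chi(H)}{a} \;:\; a \ge \frac{t}{\sqrt{\log t}},\ H \subseteq G \text{ is } K_a\text{-minor-free},\ |V(H)| = O(a\log^4 a) \right\}.
\]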
Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the Stable Diffusion model as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at https://chengzhag.github.io/publication/panfusion.
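To make the dual-branch idea concrete, here is a minimal PyTorch sketch of one cross-attention step in which panorama tokens query features from the perspective branch; the boolean `proj_mask` standing in for the paper's projection-aware mechanism is a hypothetical simplification, not PanFusion's actual code.

```python
# Minimal sketch (not PanFusion's actual code): the panorama branch attends to
# the perspective branch. `proj_mask` is a hypothetical stand-in that restricts
# attention to geometrically corresponding token pairs.
import torch

def pano_cross_attention(pano_feat, persp_feat, proj_mask):
    """pano_feat: (B, Np, C) panorama tokens; persp_feat: (B, Nv, C) perspective
    tokens; proj_mask: (Np, Nv) bool, True where tokens correspond under projection."""
    B, Np, C = pano_feat.shape
    q = pano_feat                                   # queries from panorama branch
    k = v = persp_feat                              # keys/values from perspective branch
    scores = q @ k.transpose(-2, -1) / C ** 0.5     # (B, Np, Nv)
    scores = scores.masked_fill(~proj_mask, float("-inf"))
    attn = scores.softmax(dim=-1)
    return pano_feat + attn @ v                     # residual update of panorama tokens

# toy shapes
pano = torch.randn(1, 128, 64)
persp = torch.randn(1, 96, 64)
mask = torch.rand(128, 96) > 0.5
out = pano_cross_attention(pano, persp, mask)
```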
We improve the worst-case information-theoretic lower bound of Munro and Wu (ISAAC 2018) for $n$-vertex unlabeled chordal graphs when vertex leafage is bounded and leafage is unbounded. The class of unlabeled $k$-vertex leafage chordal graphs, which consists of all chordal graphs with vertex leafage at most $k$ and unbounded leafage, denoted $\mathcal{G}_k$, is introduced for the first time. For $k>0$ in $o(n/\log n)$, we obtain a lower bound of $((k-1)n \log n - kn \log k - O(\log n))$ bits on the size of any data structure that encodes a graph in $\mathcal{G}_k$. Further, for every $k$-vertex leafage chordal graph $G$ with $k>1$ in $o(n^c)$ for some constant $c>0$, we present a $((k-1)n \log n + o(kn \log n))$-bit succinct data structure, constructed using the succinct data structure for path graphs with $kn/2$ vertices. Our data structure supports the adjacency query in $O(k \log n)$ time and, using an additional $2n \log n$ bits, the neighbourhood query in $O(k^2 d_v \log n + \log^2 n)$ time, where $d_v$ is the degree of $v \in V$.
Language models trained on internet-scale data sets have shown an impressive ability to solve problems in Natural Language Processing and Computer Vision. However, experience shows that these models are frequently brittle in unexpected ways and require significant scaffolding to operate correctly in the larger systems that comprise "language-model agents." In this paper, we argue that behavior trees provide a unifying framework for combining language models with classical AI and traditional programming. We introduce Dendron, a Python library for programming language model agents using behavior trees. We demonstrate the approach embodied by Dendron in three case studies: building a chat agent, a camera-based infrastructure inspection agent for use on a mobile robot or vehicle, and an agent that has been built to satisfy safety constraints that it did not receive through instruction tuning or RLHF.
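As an illustration of the general idea (not Dendron's actual API), the following sketch wraps a hypothetical language-model call in a behavior-tree fallback node, so that classical scaffolding decides when the model's output is used:

```python
# Illustrative behavior-tree sketch, not Dendron's API. A Fallback node tries a
# guarded LLM action and falls back to a canned response; `call_llm` is a
# hypothetical stand-in for any language-model call.
from typing import Callable, List

SUCCESS, FAILURE = "success", "failure"

class Node:
    def tick(self, blackboard: dict) -> str: ...

class Action(Node):
    def __init__(self, fn: Callable[[dict], str]):
        self.fn = fn
    def tick(self, blackboard):
        return self.fn(blackboard)

class Fallback(Node):
    """Tick children in order; succeed on the first child that succeeds."""
    def __init__(self, children: List[Node]):
        self.children = children
    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) == SUCCESS:
                return SUCCESS
        return FAILURE

def call_llm(bb):
    reply = "...model output..."        # hypothetical, e.g. client.generate(bb["prompt"])
    if "unsafe" in reply:               # classical guard around the brittle model
        return FAILURE
    bb["reply"] = reply
    return SUCCESS

def canned(bb):
    bb["reply"] = "Sorry, I can't help with that."
    return SUCCESS

tree = Fallback([Action(call_llm), Action(canned)])
bb = {"prompt": "hello"}
tree.tick(bb)
print(bb["reply"])
```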
Nowadays, the spread of misinformation is a prominent problem in society. Our research focuses on aiding the automatic identification of misinformation by analyzing the persuasive strategies employed in textual documents. To this end, we introduce a novel annotation scheme encompassing common persuasive writing tactics. Additionally, we provide a dataset on health misinformation, thoroughly annotated by experts using our proposed scheme. Our contributions include proposing the new task of annotating pieces of text with their persuasive writing strategy types. We evaluate fine-tuning and prompt-engineering techniques with pre-trained language models of the BERT family and generative large language models of the GPT family, using persuasive strategies as an additional source of information. We also evaluate the effects of employing persuasive strategies as intermediate labels in the context of misinformation detection. Our results show that these strategies enhance accuracy and improve the explainability of misinformation detection models. The persuasive strategies can serve as valuable insights and explanations, enabling other models, or even humans, to make more informed decisions regarding the trustworthiness of the information.
Conversational Task Assistants (CTAs) guide users in performing a multitude of activities, such as making recipes. However, ensuring that interactions remain engaging, interesting, and enjoyable for CTA users is not trivial, especially for time-consuming or challenging tasks. Grounded in psychological theories of human interest, we propose to engage users with contextual and interesting statements or facts during interactions with a multi-modal CTA, to reduce fatigue and task abandonment before a task is complete. To operationalize this idea, we train a high-performing classifier (82% F1-score) to automatically identify relevant and interesting facts for users. We use it to create an annotated dataset of task-specific interesting facts for the domain of cooking. Finally, we design and validate a dialogue policy to incorporate the identified relevant and interesting facts into a conversation, to improve user engagement and task completion. Live testing on a leading multi-modal voice assistant shows that 66% of the presented facts were received positively, leading to a 40% gain in the user satisfaction rating, and a 37% increase in conversation length. These findings emphasize that strategically incorporating interesting facts into the CTA experience can promote real-world user participation for guided task interactions.
Proving geometric theorems constitutes a hallmark of visual reasoning, combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough: it solved 25 of 30 International Mathematical Olympiad (IMO) problems, whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, some of which are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 of the 30 problems using only a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic combination solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method, we set a new state of the art for automated theorem proving on IMO-AG-30, solving 27 of 30 problems and making it the first AI method to outperform an IMO gold medalist.
In 1952, Dirac proved the following theorem about long cycles in graphs with large minimum vertex degree: every $n$-vertex $2$-connected graph $G$ with minimum vertex degree $\delta\geq 2$ contains a cycle with at least $\min\{2\delta,n\}$ vertices. In particular, if $\delta\geq n/2$, then $G$ is Hamiltonian. The proof of Dirac's theorem is constructive and yields an algorithm computing the corresponding cycle in polynomial time. The combinatorial bound of Dirac's theorem is tight in the following sense: there are 2-connected graphs that do not contain cycles of length more than $2\delta+1$, and there are non-Hamiltonian graphs in which all vertices but one have degree at least $n/2$. This naturally prompts the following algorithmic questions. For $k\geq 1$, (A) how difficult is it to decide whether a 2-connected graph contains a cycle of length at least $\min\{2\delta+k,n\}$? (B) how difficult is it to decide whether a graph $G$ is Hamiltonian when at least $n - k$ vertices of $G$ have degree at least $n/2-k$? The first question was asked by Fomin, Golovach, Lokshtanov, Panolan, Saurabh, and Zehavi. The second question is due to Jansen, Kozma, and Nederlof. Even for the very special case of $k=1$, the existence of a polynomial-time algorithm deciding whether $G$ contains a cycle of length at least $\min\{2\delta+1,n\}$ was open. We resolve both questions by proving the following algorithmic generalization of Dirac's theorem: if all but $k$ vertices of a $2$-connected graph $G$ have degree at least $\delta$, then deciding whether $G$ has a cycle of length at least $\min\{2\delta +k, n\}$ can be done in time $2^{\mathcal{O}(k)}\cdot n^{\mathcal{O}(1)}$. The proof of this algorithmic generalization builds on new graph-theoretic results that are interesting in their own right.
Diagrammatic Teaching is a paradigm for robots to acquire novel skills, whereby the user provides 2D sketches over images of the scene to shape the robot's motion. In this work, we tackle the problem of teaching a robot to approach a surface and then follow a cyclic motion on it, where the cycle of the motion can be arbitrarily specified by a single user-provided sketch over an image from the robot's camera. Accordingly, we contribute the Stable Diffeomorphic Diagrammatic Teaching (SDDT) framework. SDDT models the robot's motion as an Orbitally Asymptotically Stable (O.A.S.) dynamical system that learns to stabilize onto the single diagrammatic sketch provided by the user. This is achieved by applying a \emph{diffeomorphism}, i.e., a differentiable and invertible function, to morph a known O.A.S. system. The parameterised diffeomorphism is then optimised with respect to the Hausdorff distance between the limit cycle of our modelled system and the sketch, to produce the desired robot motion. We provide novel theoretical insight into the behaviour of the optimised system and also empirically evaluate SDDT, both in simulation and on a quadruped with a mounted 6-DOF manipulator. Results show that we can diagrammatically teach complex cyclic motion patterns with a high degree of accuracy.
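A minimal sketch of this pipeline, under the simplifying assumptions of a 2D workspace, the unit circle as the base system's limit cycle, and two RealNVP-style coupling layers as the diffeomorphism (the paper's actual parameterisation may differ):

```python
# Minimal sketch of the SDDT idea, not the paper's implementation: push the
# known limit cycle through a learned diffeomorphism and fit it to a sketch
# by minimising the symmetric Hausdorff distance.
import torch

class Coupling(torch.nn.Module):
    """Invertible affine coupling: keeps one coordinate, scales/shifts the other."""
    def __init__(self, flip, hidden=32):
        super().__init__()
        self.flip = flip
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, 2))
    def forward(self, x):
        x1, x2 = (x[:, 1:], x[:, :1]) if self.flip else (x[:, :1], x[:, 1:])
        s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(s) + t
        return torch.cat([y2, x1], dim=1) if self.flip else torch.cat([x1, y2], dim=1)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a (N, 2) and b (M, 2)."""
    d = torch.cdist(a, b)
    return torch.max(d.min(dim=1).values.max(), d.min(dim=0).values.max())

# limit cycle of the known O.A.S. base system: the unit circle, sampled densely
theta = torch.linspace(0, 2 * torch.pi, 200)
cycle = torch.stack([theta.cos(), theta.sin()], dim=1)

sketch = 1.5 * cycle + 0.3                      # stand-in for the user's 2D sketch
phi = torch.nn.Sequential(Coupling(flip=False), Coupling(flip=True))
opt = torch.optim.Adam(phi.parameters(), lr=1e-2)
for _ in range(500):                            # fit the morphed cycle to the sketch
    opt.zero_grad()
    loss = hausdorff(phi(cycle), sketch)
    loss.backward()
    opt.step()
# phi now maps the base limit cycle close to the sketched cycle
```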
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs the most and where it is needed most. The analysis further motivates us to take a different approach than most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads us to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks beyond the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.
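One common way to realize this kind of injection, shown here as an illustrative sketch rather than the paper's exact mechanism, is to add a prior-knowledge matrix (e.g., word-pair similarity scores from a lexicon) as a bias on the attention logits; the `prior` matrix below is a hypothetical stand-in:

```python
# Illustrative sketch: inject a prior-knowledge bias into scaled dot-product
# attention for one head. `prior` is a hypothetical (seq, seq) matrix scoring
# how strongly token i should attend to token j.
import torch
import torch.nn.functional as F

def knowledge_biased_attention(q, k, v, prior, alpha=1.0):
    """q, k, v: (B, seq, d); prior: (seq, seq) knowledge scores in [0, 1]."""
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5   # standard scaled dot-product
    logits = logits + alpha * prior               # inject the prior as a bias
    return F.softmax(logits, dim=-1) @ v

B, seq, d = 2, 8, 16
q, k, v = (torch.randn(B, seq, d) for _ in range(3))
prior = torch.eye(seq)                            # toy prior: boost aligned tokens
out = knowledge_biased_attention(q, k, v, prior)
```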
Visual Question Answering (VQA) models have so far struggled with counting objects in natural images. We identify the soft attention used in these models as a fundamental cause of this problem. To circumvent it, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by a substantial 6.6%.
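The core difficulty is that soft attention spreads weight over duplicate proposals of the same object, so naively summing attention weights over-counts. A simplified stand-in for the proposed component (not the paper's exact construction) deduplicates proposals by pairwise IoU before summing soft scores into a count:

```python
# Simplified stand-in for the counting component: damp the soft score of any
# proposal that overlaps a higher-scored duplicate, then sum into a count.
import torch

def iou(boxes):
    """Pairwise IoU for boxes given as an (N, 4) tensor in (x1, y1, x2, y2)."""
    x1 = torch.max(boxes[:, None, 0], boxes[None, :, 0])
    y1 = torch.max(boxes[:, None, 1], boxes[None, :, 1])
    x2 = torch.min(boxes[:, None, 2], boxes[None, :, 2])
    y2 = torch.min(boxes[:, None, 3], boxes[None, :, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def soft_count(scores, boxes, iou_thresh=0.5):
    """scores: (N,) soft attention weights per proposal; boxes: (N, 4)."""
    order = scores.argsort(descending=True)
    keep_weight = scores.clone()
    overlaps = iou(boxes) > iou_thresh
    for rank, i in enumerate(order):              # damp proposals that overlap
        for j in order[rank + 1:]:                # a higher-scored duplicate
            if overlaps[i, j]:
                keep_weight[j] = keep_weight[j] * (1 - keep_weight[i])
    return keep_weight.sum()                      # soft estimate of the count

scores = torch.tensor([0.9, 0.85, 0.1])          # two overlapping detections + noise
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
print(soft_count(scores, boxes))                  # ~1.09: one deduplicated object plus noise
```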