This paper studies $k$-claw-free graphs, exploring the connection between an extremal combinatorics question and the power of a convex program in approximating the maximum-weight independent set in this graph class. For the extremal question, we consider a notion that we call \textit{conditional $\chi$-boundedness} of a graph: given a graph $G$ that is assumed to contain an independent set of a certain (constant) size, we are interested in upper bounding the chromatic number in terms of the clique number of $G$. This question, besides being interesting on its own, has algorithmic implications (which have been relatively neglected in the literature) for the performance of SDP relaxations in estimating the value of the maximum-weight independent set. For $k=3$, Chudnovsky and Seymour (JCTB 2010) prove that any $3$-claw-free graph $G$ with an independent set of size three must satisfy $\chi(G) \leq 2 \omega(G)$. Their result implies a factor-$2$ estimation algorithm for the maximum-weight independent set via an SDP relaxation (providing the first non-trivial result for the maximum-weight independent set in such graphs via a convex relaxation). An obvious open question is whether a similar conditional $\chi$-boundedness phenomenon holds for every $k$-claw-free graph. Our main result answers this question negatively. We further present some evidence that our construction could be useful in studying more broadly the power of convex relaxations in the context of approximating the maximum-weight independent set in $k$-claw-free graphs. In particular, we prove a lower bound on families of convex programs that are stronger than known convex relaxations used algorithmically in this context.
In this paper we study the orbit closure problem for a reductive group $G\subseteq GL(X)$ acting on a finite-dimensional vector space $V$ over $\mathbb{C}$. We assume that the center of $GL(X)$ lies within $G$ and acts on $V$ through a fixed non-trivial character. We study points $y,z\in V$ where (i) $z$ is obtained as the leading term of the action of a 1-parameter subgroup $\lambda(t)\subseteq G$ on $y$, and (ii) $y$ and $z$ have large distinctive stabilizers $K,H \subseteq G$. Let $O(z)$ (resp. $O(y)$) denote the $G$-orbit of $z$ (resp. $y$), and $\overline{O(z)}$ (resp. $\overline{O(y)}$) its closure; then (i) implies that $z\in \overline{O(y)}$. We address the question: under what conditions can (i) and (ii) be simultaneously satisfied, i.e., when does there exist a 1-PS $\lambda \subseteq G$ for which $z$ is obtained as a limit of $y$? Using $\lambda$, we develop a leading-term analysis which applies to $V$ as well as to ${\cal G}= Lie(G)$, the Lie algebra of $G$, and its subalgebras ${\cal K}$ and ${\cal H}$, the Lie algebras of $K$ and $H$ respectively. Through this we construct the Lie algebra $\hat{\cal K} \subseteq {\cal H}$ which connects $y$ and $z$ through their Lie algebras. We develop the properties of $\hat{\cal K}$ and relate it to the action of ${\cal H}$ on $\overline{N}=V/T_z O(z)$, the normal slice to the orbit $O(z)$. We examine the case of {\em alignment}, when a semisimple element belongs to both ${\cal H}$ and ${\cal K}$, and the conditions under which it occurs. We illustrate some consequences of alignment. Next, we examine the possibility of {\em intermediate $G$-varieties} $W$ which lie between the orbit closures of $z$ and $y$, i.e., $\overline{O(z)} \subsetneq W \subsetneq \overline{O(y)}$. These have a direct bearing on representation-theoretic as well as geometric properties which connect $z$ and $y$.
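To make the leading-term operation in (i) concrete, here is a minimal sketch for a torus 1-PS $\lambda(t)\colon x_i \mapsto t^{w_i} x_i$ acting on a polynomial: the limit $z = \lim_{t\to 0} t^{-\mu}\,\lambda(t)\cdot y$ keeps exactly the monomials of minimal weight $\mu = \min_\alpha \langle w, \alpha\rangle$. The exponent-tuple representation and the name `leading_term` are our own illustration, not the paper's setup:

```python
def leading_term(poly, w):
    """Leading term of lambda(t) . f as t -> 0, where the 1-PS acts by
    lambda(t): x_i -> t^{w[i]} x_i.

    poly: dict mapping exponent tuples alpha to coefficients; the monomial
    x^alpha is scaled by t^{<w, alpha>}, so the limit (after dividing by
    t^mu, mu the minimal weight) keeps the monomials of minimal weight.
    """
    weight = lambda alpha: sum(wi * ai for wi, ai in zip(w, alpha))
    mu = min(weight(alpha) for alpha in poly)
    return {alpha: c for alpha, c in poly.items() if weight(alpha) == mu}
```

For example, with $y = x^2 + xy + y^3$ and weights $w = (1, 0)$, only $y^3$ survives in the limit, since it is the unique monomial of weight $0$.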
This paper proposes a new framework to study multi-agent interactions in Markov games: the Markov $\alpha$-potential game. A game is called a Markov $\alpha$-potential game if there exists a Markov potential game such that the pairwise difference between the change of a player's value function under a unilateral policy deviation in the Markov game and in the Markov potential game can be bounded by $\alpha$. As a special case, Markov potential games are Markov $\alpha$-potential games with $\alpha=0$. The dependence of $\alpha$ on the game parameters is also explicitly characterized in two classes of practically relevant games: Markov congestion games and perturbed Markov team games. For general Markov games, an optimization-based approach is introduced that computes the Markov potential game closest to the given game in terms of $\alpha$. This approach can also be used to verify whether a game is a Markov potential game and to provide a candidate potential function. Two algorithms -- the projected gradient-ascent algorithm and the sequential maximum one-stage improvement algorithm -- are provided to approximate stationary Nash equilibria in Markov $\alpha$-potential games, and the corresponding Nash-regret analysis is presented. The numerical experiments demonstrate that simple algorithms are capable of finding approximate equilibria in Markov $\alpha$-potential games.
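The $\alpha=0$ special case has a simple one-shot analogue: a static potential game, where each unilateral payoff change equals the corresponding potential change. The sketch below measures the worst-case mismatch for a candidate potential in a two-player finite game (the static counterpart of $\alpha$); the dict-based payoff tables and the name `potential_gap` are our own illustration, not the paper's Markov-game formulation:

```python
import itertools

def potential_gap(u1, u2, phi):
    """Worst-case mismatch between unilateral payoff changes and potential
    changes in a two-player finite game (static analogue of alpha).

    u1, u2, phi: dicts mapping action profiles (a, b) to floats.
    A gap of 0 means phi is an exact potential function.
    """
    A = sorted({a for a, _ in u1})
    B = sorted({b for _, b in u1})
    gap = 0.0
    for a, a2, b in itertools.product(A, A, B):  # player 1 deviates a -> a2
        gap = max(gap, abs((u1[a2, b] - u1[a, b]) - (phi[a2, b] - phi[a, b])))
    for a, b, b2 in itertools.product(A, B, B):  # player 2 deviates b -> b2
        gap = max(gap, abs((u2[a, b2] - u2[a, b]) - (phi[a, b2] - phi[a, b])))
    return gap
```

A team game (identical payoffs) with $\phi$ equal to the common payoff gives gap $0$; perturbing one entry of a player's payoff by $\varepsilon$ makes the gap $\varepsilon$, mirroring the perturbed-team-game class above.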
The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering fundamental innovation in the AIGC community while increasing CO2 emissions. This paper introduces PIXART-$\alpha$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figures 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into the Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-$\alpha$'s training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-$\alpha$ only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly \$300,000 (\$26,000 vs. \$320,000) and reducing CO2 emissions by 90%. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1% of theirs. Extensive experiments demonstrate that PIXART-$\alpha$ excels in image quality, artistry, and semantic control. We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.
We compute the weight distribution of the Reed--Muller code ${\mathcal R}(4,9)$ by combining the approach described in D. V. Sarwate's Ph.D. thesis from 1973 with knowledge of the affine equivalence classification of Boolean functions. This problem is posed, e.g., in the book of MacWilliams and Sloane [p. 447]. To solve it, we apply a refined approach based on the classification of Boolean quartic forms in $8$ variables due to Ph. Langevin and G. Leander, and recent results on the classification of the quotient space ${\mathcal R}(4,7)/{\mathcal R}(2,7)$ due to V. Gillot and Ph. Langevin.
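For intuition about what is being computed: the weight distribution counts, for each Hamming weight, the codewords of that weight. Brute-force enumeration, feasible only for tiny parameters (unlike ${\mathcal R}(4,9)$, which is exactly why the classification-based approach above is needed), can be sketched as follows; this is our own illustration, not the paper's method:

```python
from itertools import combinations, product
from collections import Counter

def rm_weight_distribution(r, m):
    """Brute-force weight distribution of the Reed-Muller code RM(r, m).

    Codewords are evaluations over GF(2)^m of polynomials of degree <= r;
    each monomial is a subset of variable indices. Exponential in the
    code dimension, so only tiny (r, m) are feasible.
    """
    points = list(product([0, 1], repeat=m))
    monos = [s for k in range(r + 1) for s in combinations(range(m), k)]
    dist = Counter()
    for coeffs in product([0, 1], repeat=len(monos)):
        word = [
            sum(c * all(p[i] for i in s) for c, s in zip(coeffs, monos)) % 2
            for p in points
        ]
        dist[sum(word)] += 1
    return dict(dist)
```

For instance, ${\mathcal R}(1,3)$ (the $16$ affine Boolean functions in $3$ variables) has one codeword of weight $0$, fourteen of weight $4$, and one of weight $8$.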
The many-to-many multimodal summarization (M$^3$S) task aims to generate summaries in any language from document inputs in any language and the corresponding image sequence, and it essentially comprises the multimodal monolingual summarization (MMS) and multimodal cross-lingual summarization (MXLS) tasks. Although much work has been devoted to either MMS or MXLS, and both have attracted increasing attention in recent years, little research pays attention to the M$^3$S task. Besides, existing studies mainly focus on 1) utilizing MMS to enhance MXLS via knowledge distillation without considering the performance of MMS or 2) improving MMS models by filtering summary-unrelated visual features with implicit learning or explicit, complex training objectives. In this paper, we first introduce a general and practical task, i.e., M$^3$S. Further, we propose a dual knowledge distillation and target-oriented vision modeling framework for the M$^3$S task. Specifically, the dual knowledge distillation method guarantees that the knowledge of MMS and MXLS can be transferred to each other, so that the two tasks mutually promote each other. To offer target-oriented visual features, a simple yet effective target-oriented contrastive objective is designed to discard needless visual information. Extensive experiments in the many-to-many setting show the effectiveness of the proposed approach. Additionally, we will contribute a many-to-many multimodal summarization (M$^3$Sum) dataset.
We develop two "Nesterov's accelerated" variants of the well-known extragradient method to approximate a solution of a co-hypomonotone inclusion formed by the sum of two operators, where one is Lipschitz continuous and the other is possibly multivalued. The first scheme can be viewed as an accelerated variant of Tseng's forward-backward-forward splitting (FBFS) method, while the second one is a Nesterov's accelerated variant of the "past" FBFS scheme, which requires only one evaluation of the Lipschitz operator and one resolvent of the multivalued mapping. Under appropriate conditions on the parameters, we theoretically prove that both algorithms achieve $\mathcal{O}(1/k)$ last-iterate convergence rates on the residual norm, where $k$ is the iteration counter. Our results can be viewed as alternatives to a recent class of Halpern-type methods for root-finding problems. For comparison, we also provide a new convergence analysis of two recent extra-anchored gradient-type methods for solving co-hypomonotone inclusions.
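The base (non-accelerated) FBFS step, in the unconstrained case where the multivalued part is zero so the resolvent is the identity, can be sketched as follows. The skew rotation operator is a standard monotone Lipschitz test case of our own choosing (not the paper's setting, which also adds Nesterov momentum):

```python
def fbf_step(F, x, eta):
    """One forward-backward-forward (Tseng) step with zero multivalued part.

    x_bar  = x - eta * F(x)                     (forward)
    x_next = x_bar - eta * (F(x_bar) - F(x))    (backward is identity, forward)
    which simplifies to the extragradient-style update x - eta * F(x_bar).
    """
    Fx = F(x)
    xb = [xi - eta * fi for xi, fi in zip(x, Fx)]
    Fxb = F(xb)
    return [xbi - eta * (fbi - fi) for xbi, fbi, fi in zip(xb, Fxb, Fx)]

def rotation(z):
    """Skew-symmetric (hence monotone) 1-Lipschitz operator F(x, y) = (y, -x);
    its unique root is the origin."""
    return [z[1], -z[0]]
```

On this bilinear example the plain forward step $x - \eta F(x)$ diverges for every $\eta > 0$, whereas the FBFS iterate contracts, which is precisely the behavior that motivates extragradient-type corrections.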
Kalai's $3^d$-conjecture states that every centrally symmetric $d$-polytope has at least $3^d$ faces. We give short proofs for two special cases: if $P$ is unconditional (that is, invariant w.r.t. reflection in any coordinate hyperplane), and more generally, if $P$ is locally anti-blocking (that is, looks like an unconditional polytope in every orthant). In both cases we show that the minimum is attained exactly for the Hanner polytopes.
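The tight case can be sanity-checked in a few lines: the $d$-cube $[-1,1]^d$, a Hanner polytope, has exactly $3^d$ nonempty faces, since a face is determined by independently choosing $\{-1\}$, $\{+1\}$, or $[-1,1]$ in each coordinate. This count is our own illustration, not a computation from the paper:

```python
from math import comb

def cube_face_count(d):
    """Number of nonempty faces of the d-cube [-1, 1]^d, the cube itself
    included: C(d, k) * 2^(d - k) faces of dimension k, summing to 3^d."""
    return sum(comb(d, k) * 2 ** (d - k) for k in range(d + 1))
```

For example, the $3$-cube has $8$ vertices, $12$ edges, $6$ facets, and the cube itself: $27 = 3^3$ faces in total.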
Let $P$ be a convex polygon in the plane, and let $T$ be a triangulation of $P$. An edge $e$ in $T$ is called a diagonal if it is shared by two triangles in $T$. A flip of a diagonal $e$ is the operation of removing $e$ and adding the opposite diagonal of the resulting quadrilateral to obtain a new triangulation of $P$ from $T$. The flip distance between two triangulations of $P$ is the minimum number of flips needed to transform one triangulation into the other. The Convex Flip Distance problem asks whether the flip distance between two given triangulations of $P$ is at most $k$, for some given parameter $k$. We present an FPT algorithm for the Convex Flip Distance problem that runs in time $O(3.82^k)$ and uses polynomial space, where $k$ is the number of flips. This algorithm significantly improves upon the previous best FPT algorithms for the problem.
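The flip operation itself can be sketched on a set-of-triangles representation of a triangulation: the diagonal $\{a,c\}$ shared by triangles $abc$ and $acd$ is replaced by the opposite diagonal $\{b,d\}$ of the quadrilateral $abcd$. This minimal illustration (convexity is guaranteed for a convex polygon, so it is not checked) is our own, not the paper's FPT algorithm:

```python
def flip(triangles, e):
    """Flip diagonal e in a triangulation, given as a set of frozenset
    triangles over polygon-vertex labels.

    e = (a, c) must be shared by exactly two triangles abc and acd; they are
    replaced by abd and bcd, i.e., the opposite diagonal {b, d} is inserted.
    """
    a, c = e
    t1, t2 = [t for t in triangles if a in t and c in t]
    (b,) = t1 - {a, c}  # apex of the first triangle
    (d,) = t2 - {a, c}  # apex of the second triangle
    return (triangles - {t1, t2}) | {frozenset({a, b, d}), frozenset({b, c, d})}
```

A flip is its own inverse: flipping the new diagonal $\{b,d\}$ restores the original triangulation, which is why flip distance is a metric on triangulations of $P$.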
In this paper, we use semidefinite programming and representation theory to compute new lower bounds on the crossing number of the complete bipartite graph $K_{m,n}$, extending a method from de Klerk et al. [SIAM J. Discrete Math. 20 (2006), 189--202] and the subsequent reduction by de Klerk, Pasechnik, and Schrijver [Math. Prog. Ser. A and B, 109 (2007), 613--624]. We exploit the full symmetry of the problem using a novel decomposition technique. This results in a full block-diagonalization of the underlying matrix algebra, which we use to improve bounds on several concrete instances. Our results imply that $\text{cr}(K_{10,n}) \geq 4.87057 n^2 - 10n$, $\text{cr}(K_{11,n}) \geq 5.99939 n^2-12.5n$, $\text{cr}(K_{12,n}) \geq 7.25579 n^2 - 15n$, and $\text{cr}(K_{13,n}) \geq 8.65675 n^2-18n$ for all $n$. The latter three bounds are computed using a new and well-performing relaxation of the original semidefinite programming bound. This new relaxation is obtained by requiring only one small matrix block to be positive semidefinite.
Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers while disregarding the explanations. We argue that the explanation for an answer is as important as, or even more important than, the answer itself, since it makes the question-answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation along with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of the explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.