The (Perfect) Matching Cut problem is to decide whether a graph $G$ has a (perfect) matching cut, i.e., a (perfect) matching that is also an edge cut of $G$. Both Matching Cut and Perfect Matching Cut are known to be NP-complete. A perfect matching cut is, in particular, a matching cut with the maximum possible number of edges. To increase our understanding of the relationship between the two problems, we introduce the Maximum Matching Cut problem, which is to determine a largest matching cut in a graph. We generalize and unify known polynomial-time algorithms for Matching Cut and Perfect Matching Cut restricted to graphs of diameter at most $2$ and to $(P_6+sP_2)$-free graphs. We also show that the complexity of Maximum Matching Cut differs from the complexities of Matching Cut and Perfect Matching Cut by proving NP-hardness of Maximum Matching Cut for $2P_3$-free quadrangulated graphs of diameter $3$ and radius $2$ and for subcubic line graphs of triangle-free graphs. In this way, we obtain full dichotomies of Maximum Matching Cut for graphs of bounded diameter, bounded radius and $H$-free graphs. Finally, we apply our techniques to obtain a dichotomy for the Maximum Disconnected Perfect Matching problem for $H$-free graphs. A disconnected perfect matching of a graph $G$ is a perfect matching that contains a matching cut of $G$. The Maximum Disconnected Perfect Matching problem asks to determine, for a connected graph $G$, a disconnected perfect matching whose matching cut is largest over all disconnected perfect matchings of $G$. Our dichotomy result implies that the original decision problem Disconnected Perfect Matching is polynomial-time solvable for $(P_6+sP_2)$-free graphs for every $s\geq 0$, which resolves an open problem of Bouquet and Picouleau (arXiv, 2020).
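As an aside that may help fix the definitions (an illustrative sketch, not code from the paper; the helper names and the networkx dependency are assumptions), checking whether a given edge set is a (perfect) matching cut is straightforward:

```python
import networkx as nx

def is_matching_cut(G, edges):
    """Return True if `edges` is a matching cut of the connected graph G.
    Illustrative sketch only."""
    # Matching: no two chosen edges share an endpoint.
    endpoints = [v for e in edges for v in e]
    if len(endpoints) != len(set(endpoints)):
        return False
    # Edge cut: deleting the chosen edges disconnects G.
    H = G.copy()
    H.remove_edges_from(edges)
    return not nx.is_connected(H)

def is_perfect_matching_cut(G, edges):
    """A perfect matching cut additionally saturates every vertex of G."""
    return is_matching_cut(G, edges) and 2 * len(edges) == G.number_of_nodes()

# The 6-cycle 0-1-2-3-4-5-0: {01, 34} is a matching cut, but not a perfect one.
C6 = nx.cycle_graph(6)
print(is_matching_cut(C6, [(0, 1), (3, 4)]))          # True
print(is_perfect_matching_cut(C6, [(0, 1), (3, 4)]))  # False (covers 4 of 6 vertices)
```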
Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve generalization performance. However, most methods in the medical field treat VQA as an answer classification task, a formulation that is difficult to transfer to practical application scenarios. Additionally, due to the privacy of medical images and the expensive annotation process, large-scale medical image-text pair datasets for pretraining are severely lacking. In this paper, we propose a large-scale MultI-task Self-Supervised learning based framework (MISS) for medical VQA tasks. Unlike existing methods, we treat medical VQA as a generative task. We unify the text encoder and multimodal encoder and align image-text features through multi-task learning. Furthermore, we propose a Transfer-and-Caption method that extends the feature space of single-modal image datasets using large language models (LLMs), enabling data from traditional medical vision tasks to be applied to VLP. Experiments show that our method achieves excellent results with fewer multimodal datasets and demonstrates the advantages of generative VQA models. The code and model weights will be released upon the paper's acceptance.
The Maximum k-Defective Clique Problem (MDCP) aims to find a maximum k-defective clique in a given graph, where a k-defective clique is a relaxation of a clique that allows at most k missing edges. MDCP is NP-hard and has many real-world applications in analyzing dense but not necessarily complete subgraphs. Exact algorithms for MDCP mainly follow the Branch-and-Bound (BnB) framework, whose performance heavily depends on the quality of the upper bound on the cardinality of a maximum k-defective clique. The state-of-the-art BnB MDCP algorithms calculate the upper bound quickly but conservatively, as they ignore many possible missing edges. In this paper, we propose a novel CoLoring-based Upper Bound (CLUB) that uses graph coloring techniques to identify independent sets and thereby account for missing edges ignored by previous methods. We then develop a new BnB algorithm for MDCP, called KD-Club, using CLUB both in the preprocessing stage for graph reduction and in the BnB search process for branch pruning. Extensive experiments show that KD-Club significantly outperforms state-of-the-art BnB MDCP algorithms in the number of instances solved within the cut-off time, with much smaller search trees and shorter solving times on various benchmarks.
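For orientation (a generic illustrative sketch, not KD-Club's code; the function name and adjacency representation are assumptions), verifying that a vertex set is a k-defective clique only requires counting its missing internal edges:

```python
from itertools import combinations

def is_k_defective_clique(adj, S, k):
    """Return True if vertex set S is a k-defective clique, i.e. at most k of the
    |S|*(|S|-1)/2 possible edges inside S are missing. Illustrative sketch only;
    `adj` maps each vertex to the set of its neighbours."""
    missing = sum(1 for u, v in combinations(S, 2) if v not in adj[u])
    return missing <= k

# {0, 1, 2, 3} with only the edge (2, 3) absent is a 1-defective clique.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
print(is_k_defective_clique(adj, [0, 1, 2, 3], k=1))  # True
print(is_k_defective_clique(adj, [0, 1, 2, 3], k=0))  # False: it is not a clique
```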
The $b$-symbol metric is a generalization of the Hamming metric. Linear codes in the $b$-symbol metric have been used for read channels whose outputs consist of $b$ consecutive symbols. The Griesmer bound outperforms the Singleton bound for $\mathbb{F}_q$-linear codes in the Hamming metric when $q$ is fixed and the length is large enough, and a similar situation arises in the $b$-symbol metric. Shi, Zhu, and Helleseth recently made a conjecture on cyclic codes in the $b$-symbol metric. In this paper, we present the $b$-symbol Griesmer bound for linear codes by concatenating linear codes and simplex codes. Based on cyclic codes and extended cyclic codes, we propose two families of distance-optimal linear codes with respect to the $b$-symbol Griesmer bound.
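For context, the classical Griesmer bound in the Hamming metric (a standard fact recalled here; the paper's $b$-symbol analogue is not reproduced) states that every $[n,k,d]_q$ linear code with $k\geq 1$ satisfies
\[
n \;\geq\; \sum_{i=0}^{k-1} \left\lceil \frac{d}{q^{i}} \right\rceil,
\]
which is never weaker than the Singleton bound $n \geq d + k - 1$, since the $i=0$ term equals $d$ and every term with $i\geq 1$ is at least $1$.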
Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, which pioneers a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, a fast score-based diffusion AEC framework that reduces computational demands, making it favorable for edge devices. It stands out by running the score model only once per frame, achieving a significant gain in processing efficiency. In addition, we introduce a novel noise generation technique in which far-end signals are utilized, incorporating both far-end and near-end signals to refine the score model's accuracy. We test our proposed method on the ICASSP 2023 Microsoft deep echo cancellation challenge evaluation dataset, where it outperforms some of the end-to-end methods and other diffusion-based echo cancellation methods.
Our main result is a reduction from worst-case lattice problems such as GapSVP and SIVP to a certain learning problem. This learning problem is a natural extension of the `learning from parity with error' problem to higher moduli. It can also be viewed as the problem of decoding from a random linear code. This, we believe, gives a strong indication that these problems are hard. Our reduction, however, is quantum. Hence, an efficient solution to the learning problem implies a quantum algorithm for GapSVP and SIVP. A main open question is whether this reduction can be made classical (i.e., non-quantum). We also present a (classical) public-key cryptosystem whose security is based on the hardness of the learning problem. By the main result, its security is also based on the worst-case quantum hardness of GapSVP and SIVP. The new cryptosystem is much more efficient than previous lattice-based cryptosystems: the public key is of size $\tilde{O}(n^2)$ and encrypting a message increases its size by a factor of $\tilde{O}(n)$ (in previous cryptosystems these values are $\tilde{O}(n^4)$ and $\tilde{O}(n^2)$, respectively). In fact, under the assumption that all parties share a random bit string of length $\tilde{O}(n^2)$, the size of the public key can be reduced to $\tilde{O}(n)$.
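As a hedged illustration of the learning problem in its usual learning-with-errors formulation (toy parameters and function name are assumptions, not the paper's construction), one asks to recover a secret vector from noisy random inner products modulo $q$; the sketch below only generates such samples:

```python
import numpy as np

def lwe_samples(n=16, q=97, m=64, sigma=1.0, seed=0):
    """Generate m samples (a_i, b_i) with b_i = <a_i, s> + e_i (mod q).

    This illustrates the `learning from parity with error' problem generalized
    to modulus q: the (hard) learning task is to recover s from (A, b).
    Toy parameters chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    s = rng.integers(0, q, size=n)                           # secret vector in Z_q^n
    A = rng.integers(0, q, size=(m, n))                      # uniformly random rows a_i
    e = np.rint(rng.normal(0.0, sigma, size=m)).astype(int)  # small rounded-Gaussian errors
    b = (A @ s + e) % q
    return A, b, s

A, b, s = lwe_samples()
print(A.shape, b.shape)  # (64, 16) (64,)
```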
In this paper, we introduce $\textbf{GS-SLAM}$, the first method to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedups to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians or deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct whole scenes, rather than synthesizing a static object as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize the camera pose, resulting in reduced runtime and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released soon.
We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark such that the training and testing splits have a distribution shift in terms of the occurrence statistics of $\langle$subject, relation, object$\rangle$ triplets. To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that, on this benchmark, previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs when conditioned on a subject. In particular, we obtain a recall@3 of 83.8% for our relation-object predictions, compared to the 49.75% obtained by a recent scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging, during training, relation-object pairs obtained automatically from textual captions, for which no object-box annotations are available. In particular, for $\langle$subject, relation, object$\rangle$ triplets for which no object locations are available during training, we are able to obtain a recall@3 of 33.80% for relation-object pairs and 26.75% for their box locations.
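To make the reported metric concrete, here is a generic recall@k computation for subject-conditioned relation-object prediction (an assumed formulation for illustration; the paper's exact evaluation protocol may differ):

```python
def recall_at_k(predictions, ground_truth, k=3):
    """Fraction of ground-truth (relation, object) pairs that appear among the
    top-k predictions for their subject. Generic sketch, not the paper's code.

    predictions:  dict subject -> ranked list of (relation, object) pairs
    ground_truth: dict subject -> set of (relation, object) pairs
    """
    hits, total = 0, 0
    for subject, gt_pairs in ground_truth.items():
        top_k = set(predictions.get(subject, [])[:k])
        hits += sum(1 for pair in gt_pairs if pair in top_k)
        total += len(gt_pairs)
    return hits / total if total else 0.0

preds = {"person": [("riding", "horse"), ("wearing", "hat"), ("holding", "rope")]}
gold = {"person": {("riding", "horse"), ("holding", "rope")}}
print(recall_at_k(preds, gold, k=3))  # 1.0
```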
Graph Neural Networks (GNNs) are state-of-the-art models for performing prediction tasks on graphs. While existing GNNs have shown great performance on various tasks related to graphs, little attention has been paid to the scenario where out-of-distribution (OOD) nodes exist in the graph during training and inference. Borrowing the concept from computer vision and natural language processing, we define OOD nodes as nodes with labels unseen in the training set. Since many networks are constructed automatically by programs, real-world graphs are often noisy and may contain nodes from unknown distributions. In this work, we define the problem of graph learning with out-of-distribution nodes. Specifically, we aim to accomplish two tasks: 1) detect nodes that do not belong to the known distribution and 2) classify the remaining nodes into one of the known classes. We demonstrate that the connection patterns in graphs are informative for outlier detection, and propose the Out-of-Distribution Graph Attention Network (OODGAT), a novel GNN model which explicitly models the interaction between different kinds of nodes and separates inliers from outliers during feature propagation. Extensive experiments show that OODGAT outperforms existing outlier detection methods by a large margin, while being better than or comparable to them in terms of in-distribution classification.
Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.
Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle the soaring amounts of documents from emerging applications, such as finance, legislation, and health, where event arguments are always scattered across different sentences and multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which generates an entity-based directed acyclic graph to fulfill document-level EE (DEE) effectively. Moreover, we reformalize the DEE task with a no-trigger-words design to ease document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements that exhibits the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and code can be found at //github.com/dolphin-zs/Doc2EDAG.