We define the relative fractional independence number of two graphs, $G$ and $H$, as $$\alpha^*(G|H)=\max_{W}\frac{\alpha(G\boxtimes W)}{\alpha(H\boxtimes W)},$$ where the maximum is taken over all graphs $W$, $G\boxtimes W$ is the strong product of $G$ and $W$, and $\alpha$ denotes the independence number. We give a non-trivial linear program to compute $\alpha^*(G|H)$ and discuss some of its properties. We show that $$\alpha^*(G|H)\geq \frac{X(G)}{X(H)},$$ where $X(G)$ can be the independence number, the zero-error Shannon capacity, the fractional independence number, the Lov\'{a}sz number, or Schrijver's or Szegedy's variants of the Lov\'{a}sz number of a graph $G$. This inequality is the first explicit non-trivial upper bound on the ratio of the invariants of two arbitrary graphs, and it can also be used to obtain upper or lower bounds for these invariants. As explicit applications, we present new upper bounds for the ratio of the zero-error Shannon capacity of two Cayley graphs and compute the Haemers number of certain Johnson graphs. Moreover, we show that the relative fractional independence number can be used to present a stronger version of the well-known No-Homomorphism Lemma. The No-Homomorphism Lemma is widely used to show the non-existence of a homomorphism between two graphs and is also used to give an upper bound on the independence number of a graph. Our extension of the No-Homomorphism Lemma is computationally more accessible than its original version.
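To make the definition of $\alpha^*(G|H)$ concrete, the following Python sketch evaluates the ratio $\alpha(G\boxtimes W)/\alpha(H\boxtimes W)$ by brute force for one fixed $W$. It is only an illustration under stated assumptions: the graph encoding and the helper names (`cycle`, `strong_product`, `independence_number`, `ratio`) are hypothetical, the search is exponential, and it does not implement the linear program mentioned above. Any single choice of $W$ only gives a lower bound on the maximum.

```python
from fractions import Fraction
from itertools import combinations, product


def cycle(n):
    """Cycle graph C_n as (vertex list, set of frozenset edges)."""
    return list(range(n)), {frozenset((i, (i + 1) % n)) for i in range(n)}


def strong_product(G, W):
    """Strong product: two distinct pairs are adjacent when, in each
    coordinate, the entries are equal or adjacent."""
    (gv, ge), (wv, we) = G, W

    def close(a, b, edges):
        return a == b or frozenset((a, b)) in edges

    verts = list(product(gv, wv))
    edges = {frozenset((p, q)) for p, q in combinations(verts, 2)
             if close(p[0], q[0], ge) and close(p[1], q[1], we)}
    return verts, edges


def independence_number(G):
    """Exact independence number by branching on a maximum-degree vertex."""
    verts, edges = G
    adj = {v: set() for v in verts}
    for a, b in map(tuple, edges):
        adj[a].add(b)
        adj[b].add(a)

    def solve(cand):
        if not cand:
            return 0
        v = max(cand, key=lambda u: len(adj[u] & cand))
        if not adj[v] & cand:          # no edges left: take every vertex
            return len(cand)
        rest = cand - {v}              # either skip v, or take v and drop its neighbours
        return max(solve(rest), 1 + solve(rest - adj[v]))

    return solve(frozenset(verts))


def ratio(G, H, W):
    """alpha(G x W) / alpha(H x W) for one fixed W (a lower bound on the max)."""
    return Fraction(independence_number(strong_product(G, W)),
                    independence_number(strong_product(H, W)))


C5 = cycle(5)
K1 = ([0], set())                      # single-vertex graph
print(ratio(C5, K1, K1))               # W = K1 gives alpha(C5) / alpha(K1)
print(ratio(C5, K1, C5))               # W = C5
```

With $G=C_5$ and $H=K_1$, the trivial choice $W=K_1$ gives the ratio $2$, while $W=C_5$ gives $5/2$, so a non-trivial $W$ can strictly increase the ratio.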
We study the emptiness and $\lambda$-reachability problems for unary and binary Probabilistic Finite Automata (PFA) and characterise the complexity of these problems in terms of the degree of ambiguity of the automaton and the size of its alphabet. Our main result is that emptiness and $\lambda$-reachability are solvable in EXPTIME for polynomially ambiguous unary PFA and if, in addition, the transition matrix is binary, we show they are in NP. In contrast to the Skolem-hardness of the $\lambda$-reachability and emptiness problems for exponentially ambiguous unary PFA, we show that these problems are NP-hard even for finitely ambiguous unary PFA. For binary polynomially ambiguous PFA with fixed and commuting transition matrices, we prove NP-hardness of the $\lambda$-reachability (dimension 9), nonstrict emptiness (dimension 37) and strict emptiness (dimension 40) problems.
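As a small illustration of the quantities involved, the sketch below computes the acceptance probability of the word $a^n$ in a unary PFA exactly and searches word lengths up to a bound. It assumes the usual conventions (an initial distribution, a row-stochastic transition matrix, a set of accepting states, $\lambda$-reachability meaning the probability equals $\lambda$, and emptiness asking for a word whose probability exceeds $\lambda$); the function names are hypothetical. A bounded search of this kind is not a decision procedure and is unrelated to the EXPTIME and NP algorithms above.

```python
from fractions import Fraction


def acceptance_probabilities(initial, matrix, accepting, max_len):
    """Yield (n, P(a^n)) for n = 0..max_len, where
    P(a^n) = initial * matrix^n * (indicator vector of accepting states)."""
    dist = list(initial)
    size = len(dist)
    for n in range(max_len + 1):
        yield n, sum(dist[s] for s in accepting)
        # advance the state distribution by one letter
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]


def reaches(initial, matrix, accepting, lam, max_len):
    """Bounded lambda-reachability: lengths n <= max_len with P(a^n) == lam."""
    return [n for n, p in acceptance_probabilities(initial, matrix, accepting, max_len)
            if p == lam]


def exceeds(initial, matrix, accepting, lam, max_len, strict=True):
    """Bounded emptiness witnesses: lengths with P(a^n) > lam (or >= lam)."""
    return [n for n, p in acceptance_probabilities(initial, matrix, accepting, max_len)
            if (p > lam if strict else p >= lam)]


half = Fraction(1, 2)
initial = [Fraction(1), Fraction(0)]
matrix = [[half, half], [Fraction(0), Fraction(1)]]   # state 1 is absorbing
accepting = {1}
lam = Fraction(3, 4)
print(reaches(initial, matrix, accepting, lam, 10))   # [2]
print(exceeds(initial, matrix, accepting, lam, 4))    # [3, 4]
```

In this toy automaton $P(a^n)=1-2^{-n}$, so $\lambda=3/4$ is reached exactly at $n=2$ and exceeded for every $n\geq 3$.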
Given $k$ input graphs $G_1, \dots ,G_k$, where each pair $G_i$, $G_j$ with $i \neq j$ shares the same graph $G$, the problem Simultaneous Embedding With Fixed Edges (SEFE) asks whether there exists a planar drawing for each input graph such that all drawings coincide on $G$. While SEFE is still open for the case of two input graphs, the problem is NP-complete for $k \geq 3$ [Schaefer, JGAA 13]. In this work, we explore the parameterized complexity of SEFE. We show that SEFE is FPT with respect to $k$ plus the vertex cover number or the feedback edge set number of the union graph $G^\cup = G_1 \cup \dots \cup G_k$. Regarding the shared graph $G$, we show that SEFE is NP-complete, even if $G$ is a tree with maximum degree 4. Together with a known NP-hardness reduction [Angelini et al., TCS 15], this allows us to conclude that several parameters of $G$, including the maximum degree, the maximum number of degree-1 neighbors, the vertex cover number, and the number of cutvertices are intractable. We also settle the tractability of all pairs of these parameters. We give FPT algorithms for the vertex cover number plus either of the first two parameters and for the number of cutvertices plus the maximum degree, whereas we prove all remaining combinations to be intractable.
In 2013, Pak and Panova proved the strict unimodality property of $q$-binomial coefficients $\binom{\ell+m}{m}_q$ (as polynomials in $q$) based on the combinatorics of Young tableaux and the semigroup property of Kronecker coefficients. They showed it to be true for all $\ell,m\geq 8$ and a few other cases. We propose a different approach to this problem based on computer algebra, where we establish a closed form for the coefficients of these polynomials and then use cylindrical algebraic decomposition to identify exactly the range of coefficients where strict unimodality holds. This strategy allows us to tackle generalizations of the problem, e.g., to show unimodality with larger gaps or unimodality of related sequences. In particular, we present proofs of two additional cases of a conjecture by Stanley and Zanello.
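The property in question is easy to test numerically for small parameters. The sketch below computes the coefficients of $\binom{\ell+m}{m}_q$ from the product formula $\prod_{i=1}^{m}(1-q^{\ell+i})/(1-q^{i})$ and checks strict increase of the coefficients from index $1$ up to the middle. The starting index is an assumption made here because the coefficients of $q^0$ and $q^1$ are both $1$; the second half follows by symmetry of the coefficients. The function names are hypothetical, and this brute-force check is unrelated to the closed form and the cylindrical algebraic decomposition used above.

```python
def q_binomial(l, m):
    """Coefficients of the Gaussian binomial binom(l+m, m)_q, lowest degree first."""
    deg = l * m
    coeffs = [1] + [0] * deg
    for i in range(1, m + 1):
        a = l + i
        # multiply by (1 - q^a), highest degree first so each update uses old values
        for j in range(deg, a - 1, -1):
            coeffs[j] -= coeffs[j - a]
        # divide by (1 - q^i) as a truncated power series, lowest degree first
        for j in range(i, deg + 1):
            coeffs[j] += coeffs[j - i]
    return coeffs


def is_strictly_unimodal(coeffs):
    """Check a_1 < a_2 < ... < a_mid, where mid = floor(degree / 2)."""
    mid = (len(coeffs) - 1) // 2
    return all(coeffs[k] < coeffs[k + 1] for k in range(1, mid))


print(q_binomial(3, 3))                        # [1, 1, 2, 3, 3, 3, 3, 2, 1, 1]
print(is_strictly_unimodal(q_binomial(3, 3)))  # False: a_3 = a_4 = 3
print(is_strictly_unimodal(q_binomial(8, 8)))  # True
```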
Given a rectangle $R$ with area $A$ and a set of areas $L=\{A_1,...,A_n\}$ with $\sum_{i=1}^n A_i = A$, we consider the problem of partitioning $R$ into $n$ sub-regions $R_1,...,R_n$ with areas $A_1,...,A_n$ such that the total perimeter of all sub-regions is minimized. The goal is to create square-like sub-regions, which are often preferable. We propose a divide-and-conquer algorithm for this problem that finds $1.2$-approximate solutions in $\mathcal{O}(n\log n)$ time.
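To illustrate the problem setting only, the sketch below splits a rectangle recursively: the areas are divided into two groups of roughly equal total area, the rectangle is cut across its longer side in the same proportion, and both halves are processed recursively. This is a generic heuristic written for this illustration. It assumes rectangular sub-regions, its splitting rule and names (`partition`, `total_perimeter`) are hypothetical, and it carries neither the $1.2$ approximation guarantee nor the $\mathcal{O}(n\log n)$ running time stated above.

```python
def partition(rect, areas):
    """Split rect = (x, y, w, h) into rectangles with the given areas.

    Assumes sum(areas) equals w * h. Returns a list of (x, y, w, h) tuples,
    ordered by the recursive splits rather than by the input order.
    """
    x, y, w, h = rect
    if len(areas) == 1:
        return [rect]
    areas = sorted(areas, reverse=True)
    total = sum(areas)
    # pick the prefix whose total area is closest to half of the rectangle
    best_k, best_gap, prefix = 1, None, 0.0
    for k in range(1, len(areas)):
        prefix += areas[k - 1]
        gap = abs(prefix - total / 2)
        if best_gap is None or gap < best_gap:
            best_k, best_gap = k, gap
    first, second = areas[:best_k], areas[best_k:]
    frac = sum(first) / total
    if w >= h:   # cut across the longer side to keep the pieces square-like
        return (partition((x, y, w * frac, h), first)
                + partition((x + w * frac, y, w * (1 - frac), h), second))
    return (partition((x, y, w, h * frac), first)
            + partition((x, y + h * frac, w, h * (1 - frac)), second))


def total_perimeter(rects):
    """Sum of the perimeters of all sub-rectangles."""
    return sum(2 * (w + h) for _, _, w, h in rects)


pieces = partition((0.0, 0.0, 1.0, 1.0), [0.25, 0.25, 0.25, 0.25])
print(pieces)
print(total_perimeter(pieces))   # 8.0: four squares of side 0.5
```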
We provide a framework to prove convergence rates for discretizations of kinetic Langevin dynamics for $M$-$\nabla$Lipschitz $m$-log-concave densities. Our approach provides convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration methods which are popular in the molecular dynamics and machine learning communities. Finally, we introduce the property ``$\gamma$-limit convergent'' (GLC) to characterise underdamped Langevin schemes that converge to overdamped dynamics in the high friction limit and which have stepsize restrictions that are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement.
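For readers unfamiliar with such schemes, the sketch below shows one widely used splitting discretization of kinetic Langevin dynamics, BAOAB, applied to a one-dimensional standard Gaussian target with potential $U(x)=x^2/2$ (so $m=M=1$). The choice of scheme, the target, the parameter values, and the function name `baoab_samples` are assumptions made for illustration; the code does not reproduce the convergence analysis above.

```python
import math
import random


def baoab_samples(grad_u, h, gamma, n_steps, seed=0):
    """Run BAOAB for dX = V dt, dV = -grad U(X) dt - gamma V dt + sqrt(2 gamma) dW.

    Returns the list of positions after each step (unit mass, unit temperature).
    """
    rng = random.Random(seed)
    c = math.exp(-gamma * h)          # exact decay of the velocity in the O step
    s = math.sqrt(1.0 - c * c)        # matching noise amplitude
    x, v = 0.0, 0.0
    xs = []
    for _ in range(n_steps):
        v -= 0.5 * h * grad_u(x)              # B: half kick
        x += 0.5 * h * v                      # A: half drift
        v = c * v + s * rng.gauss(0.0, 1.0)   # O: Ornstein-Uhlenbeck update
        x += 0.5 * h * v                      # A: half drift
        v -= 0.5 * h * grad_u(x)              # B: half kick
        xs.append(x)
    return xs


xs = baoab_samples(lambda x: x, h=0.2, gamma=1.0, n_steps=50000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)   # sample mean and variance; the target has mean 0 and variance 1
```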
Voxel-based segmentation volumes often store a large number of labels and voxels, and the resulting amount of data can make storage, transfer, and interactive visualization difficult. We present a lossless compression technique which addresses these challenges. It processes individual small bricks of a segmentation volume and compactly encodes the labelled regions and their boundaries by an iterative refinement scheme. The result for each brick is a list of labels, and a sequence of operations to reconstruct the brick which is further compressed using rANS-entropy coding. As the relative frequencies of operations are very similar across bricks, the entropy coding can use global frequency tables for an entire data set which enables efficient and effective parallel (de)compression. Our technique achieves high throughput (up to gigabytes per second both for compression and decompression) and strong compression ratios of about 1% to 3% of the original data set size while being applicable to GPU-based rendering. We evaluate our method for various data sets from different fields and demonstrate GPU-based volume visualization with on-the-fly decompression, level-of-detail rendering (with optional on-demand streaming of detail coefficients to the GPU), and a caching strategy for decompressed bricks for further performance improvement.
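The idea of coding every brick's operation sequence against one shared frequency table can be sketched with a textbook, non-streaming rANS coder that keeps its whole state in a single Python integer. This is not the codec described above: the operation names and frequencies are hypothetical, there is no renormalization or byte output, and the symbol lookup is a plain linear scan.

```python
def build_starts(freqs):
    """Cumulative start of each symbol's slot range, plus the total frequency."""
    starts, total = {}, 0
    for sym in sorted(freqs):
        starts[sym] = total
        total += freqs[sym]
    return starts, total


def rans_encode(symbols, freqs):
    """Encode symbols into one integer; processed in reverse so decoding runs forward."""
    starts, total = build_starts(freqs)
    state = 1
    for sym in reversed(symbols):
        f = freqs[sym]
        state = (state // f) * total + starts[sym] + (state % f)
    return state


def rans_decode(state, count, freqs):
    """Decode `count` symbols from the integer state."""
    starts, total = build_starts(freqs)
    out = []
    for _ in range(count):
        slot = state % total
        sym = next(s for s in freqs if starts[s] <= slot < starts[s] + freqs[s])
        out.append(sym)
        state = freqs[sym] * (state // total) + slot - starts[sym]
    return out


# One global table (hypothetical operation names), reused for every brick.
global_freqs = {"copy": 5, "split": 2, "paint": 1}
brick_a = ["copy", "copy", "split", "paint", "copy"]
brick_b = ["split", "copy", "copy", "copy"]
for brick in (brick_a, brick_b):
    code = rans_encode(brick, global_freqs)
    print(code.bit_length(), rans_decode(code, len(brick), global_freqs) == brick)
```

Because the table is shared, each brick can be encoded and decoded independently of the others, which is what makes brick-level parallelism straightforward in a scheme of this kind.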
Raga is a fundamental melodic concept in Indian Art Music (IAM). It is characterized by complex patterns. All performances and compositions are based on the raga framework. Raga and tonic detection have been long-standing research problems in the field of Music Information Retrieval. In this paper, we attempt to detect the raga using a novel feature to extract sequential or temporal information from an audio sample. We call these Sequential Pitch Distributions (SPD), which are distributions taken over pitch values between two given pitch values over time. We achieve state-of-the-art results on both Hindustani and Carnatic music raga data sets with an accuracy of 99% and 88.13%, respectively. SPD gives a great boost in accuracy over a standard pitch distribution. The main goal of this paper, however, is to present an alternative approach to modeling the temporal aspects of the melody and thereby deducing the raga.
2D-based Industrial Anomaly Detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
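The lattice input can be pictured as the character sequence plus every lexicon word that matches a span of it. The sketch below covers only that matching step; the function name `match_lexicon`, the toy lexicon, and the example sentence are assumptions chosen for illustration, and the LSTM cells that consume the lattice are not modelled.

```python
def match_lexicon(chars, lexicon):
    """Return (start, end, word) for every multi-character lexicon word
    that matches chars[start:end], ordered by start and then by end."""
    max_len = max(map(len, lexicon))
    spans = []
    for start in range(len(chars)):
        last_end = min(len(chars), start + max_len)
        for end in range(start + 2, last_end + 1):   # single characters are already in the sequence
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
for span in match_lexicon("南京市长江大桥", lexicon):
    print(span)
```

Overlapping matches such as "南京市" and "市长" are all kept, which reflects that a lattice does not commit to a single segmentation of the sentence.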