In this work, we initiate the complexity study of Biclique Contraction and Balanced Biclique Contraction. In these problems, given as input a graph G and an integer k, the objective is to determine whether one can contract at most k edges in G to obtain a biclique and a balanced biclique, respectively. We first prove that both problems are NP-complete even when the input graph is bipartite. Next, we study the parameterized complexity of these problems and show that they admit single-exponential-time FPT algorithms when parameterized by the number k of edge contractions. Finally, we show that Balanced Biclique Contraction admits a quadratic vertex kernel, whereas Biclique Contraction does not admit any polynomial compression (or kernel) under standard complexity-theoretic assumptions.
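To make the problem definition concrete, here is a brute-force decision procedure for tiny graphs. It is not the paper's FPT algorithm. It enumerates every vertex partition, keeps those whose blocks induce connected subgraphs, and relies on the standard fact that contracting connected blocks costs n minus the number of blocks edge contractions. We assume that a biclique means K_{a,b} with a, b ≥ 1. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of a simple undirected graph


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    g: Graph = {}
    for u, w in edges:
        g.setdefault(u, set()).add(w)
        g.setdefault(w, set()).add(u)
    return g


def is_biclique(g: Graph, balanced: bool = False) -> bool:
    """True if g is K_{a,b} with a, b >= 1 (and a == b when balanced)."""
    vertices = list(g)
    if len(vertices) < 2:
        return False
    v = vertices[0]
    side_b = set(g[v])                # in K_{a,b}, N(v) is the opposite side
    side_a = set(vertices) - side_b   # v together with its non-neighbours
    if not side_b:
        return False
    for x, y in combinations(vertices, 2):
        same_side = (x in side_a) == (y in side_a)
        # Violation: an edge inside a side, or a missing edge across sides.
        if same_side == (y in g[x]):
            return False
    return not balanced or len(side_a) == len(side_b)


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # first in its own block
        for i in range(len(part)):                  # or joined to block i
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def is_connected(g: Graph, block: List[int]) -> bool:
    members, seen, stack = set(block), {block[0]}, [block[0]]
    while stack:
        for w in g[stack.pop()]:
            if w in members and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == members


def quotient(g: Graph, blocks: List[List[int]]) -> Graph:
    """Graph obtained by contracting every block to a single vertex."""
    owner = {v: i for i, block in enumerate(blocks) for v in block}
    q: Graph = {i: set() for i in range(len(blocks))}
    for u in g:
        for w in g[u]:
            if owner[u] != owner[w]:
                q[owner[u]].add(owner[w])
    return q


def biclique_contraction(g: Graph, k: int, balanced: bool = False) -> bool:
    """Can at most k edge contractions turn g into a (balanced) biclique?"""
    n = len(g)
    for blocks in set_partitions(list(g)):
        # Contracting connected blocks costs n - len(blocks) contractions.
        if len(blocks) < n - k:
            continue
        if all(is_connected(g, b) for b in blocks) and is_biclique(
            quotient(g, blocks), balanced
        ):
            return True
    return False
```

For example, the path on four vertices needs one contraction to become K_{1,2}, which is a biclique. It needs two contractions to become the balanced biclique K_{1,1}. The partition enumeration grows as the Bell numbers, so this is only a specification check.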
In this paper, we investigate the computational complexity of solutions to the Laplace and the diffusion equation. We show that for a certain class of initial-boundary value problems of the Laplace and the diffusion equation, the solution operator is $\#P_1$/$\#P$-complete in the sense that it maps polynomial-time computable functions to the set of $\#P_1$/$\#P$-complete functions. Consequently, there exist polynomial-time (Turing) computable input data for which the solution is not polynomial-time computable, unless $FP=\#P$ or $FP_1=\#P_1$. In this case, we cannot, in general, simulate the solution of the Laplace or the diffusion equation on a digital computer without a complexity blowup, i.e., the computation time for obtaining an approximation of the solution to a given number of significant digits grows non-polynomially in the number of digits. This indicates that the computational complexity of the solution operator that models a physical phenomenon is intrinsically high, independent of the numerical algorithm used to approximate a solution.
Inspired by the works of Goldreich and Ron (J. ACM, 2017) and Nakar and Ron (ICALP, 2021), we initiate the study of property testing in dynamic environments with arbitrary topologies. Our focus is on the simplest non-trivial rule that can be tested, which corresponds to the 1-BP rule of bootstrap percolation and models a simple spreading behavior: every "infected" node stays infected forever, and each "healthy" node becomes infected if and only if it has at least one infected neighbor. We show various results both for the case where we test a single time step of evolution and for the case where the evolution spans several time steps. In the first setting, we show that the worst-case query complexity is $O(\Delta/\varepsilon)$ or $\tilde{O}(\sqrt{n}/\varepsilon)$ (whichever is smaller), where $\Delta$ and $n$ are the maximum degree of a node and the number of vertices, respectively, in the underlying graph, and we also show lower bounds for both one- and two-sided error testers that match our upper bounds up to $\Delta = o(\sqrt{n})$ and $\Delta = O(n^{1/3})$, respectively. In the second setting of testing the environment over $T$ time steps, we show upper bounds of $O(\Delta^{T-1}/\varepsilon T)$ and $\tilde{O}(|E|/\varepsilon T)$, where $E$ is the set of edges of the underlying graph. All of our algorithms are one-sided error, and all of them are also time-conforming and non-adaptive, with the single exception of the more complex $\tilde{O}(\sqrt{n}/\varepsilon)$-query tester for the case $T = 2$.
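The following sketch is not the paper's tester. It applies one step of the 1-BP rule and implements a naive one-sided tester for a single time step. As a simplifying assumption, it treats being ε-far as meaning that at least an ε fraction of vertices have a state at time t+1 that disagrees with the rule. Under that assumption, ⌈2/ε⌉ uniform samples miss every violator with probability at most (1-ε)^{2/ε} ≤ e^{-2} < 1/3. Each sample reads two states of the vertex and at most Δ neighbor states, which gives O(Δ/ε) queries. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Adjacency = Sequence[Sequence[int]]


def bp1_step(adj: Adjacency, infected: Sequence[bool]) -> List[bool]:
    """One step of 1-BP: infected stays infected; healthy becomes infected
    iff at least one neighbour is infected."""
    return [infected[v] or any(infected[u] for u in adj[v]) for v in range(len(adj))]


def violates(adj: Adjacency, before: Sequence[bool], after: Sequence[bool], v: int) -> bool:
    # Reads before[v], after[v] and the (at most Delta) states before[u], u in N(v).
    return after[v] != (before[v] or any(before[u] for u in adj[v]))


def naive_tester(adj: Adjacency, before: Sequence[bool], after: Sequence[bool],
                 eps: float, rng: random.Random) -> bool:
    """One-sided: accepts every correct evolution; rejects only on a witnessed violation."""
    for _ in range(math.ceil(2 / eps)):
        v = rng.randrange(len(adj))
        if violates(adj, before, after, v):
            return False
    return True
```

The tester rejects only when it has found an explicit violation, so a correct evolution is always accepted. This is what makes the tester one-sided.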
In this note, we highlight some connections between UMAP and the basic principles of Information Geometry. UMAP was originally derived from observations in Category Theory; however, we posit that it also has a natural geometric interpretation.
Generalizing work of K\"unnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of transition costs between nodes in the grid. This captures many classical problems that are solved using DP, such as the knapsack problem, the airplane refueling problem, and the minimal-weight polygon triangulation problem. We observe that for many of these problems, the tensor naturally has low tensor rank or low slice rank. We then give new algorithms and a web of fine-grained reductions to tightly determine the complexity of these problems. For instance, we show that a polynomial speedup over the DP algorithm is possible when the tensor rank is a constant or the slice rank is 1, but that such a speedup is impossible if the tensor rank is slightly super-constant (assuming SETH) or the slice rank is at least 3 (assuming the APSP conjecture). We find that this characterizes the known complexities for many of these problems, and in some cases leads to new faster algorithms.
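The one-dimensional base case shows what "low rank" means for these cost tensors. This sketch runs the quadratic-time DP T[j] = min_{i<j} T[i] + w(i, j). It uses an airplane-refueling cost in the form we assume here, w(i, j) = (k - (x_j - x_i))^2, where x holds the stop positions and k is the ideal leg length. The second function writes the same cost as a sum of three products u_r(i)·v_r(j), which shows that the cost matrix has rank at most 3. The paper's faster low-rank algorithm is not reproduced, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Cost = Callable[[int, int], int]


def lws(n: int, w: Cost) -> int:
    """Baseline O(n^2) DP: T[0] = 0, T[j] = min_{i<j} T[i] + w(i, j)."""
    t: List[float] = [0] + [float("inf")] * (n - 1)
    for j in range(1, n):
        t[j] = min(t[i] + w(i, j) for i in range(j))
    return int(t[n - 1])


def refuel_cost(x: Sequence[int], k: int) -> Cost:
    return lambda i, j: (k - (x[j] - x[i])) ** 2


def refuel_cost_rank3(x: Sequence[int], k: int) -> Cost:
    """Same cost as three rank-one terms:
    (k + x_i)^2 * 1  +  (-2 (k + x_i)) * x_j  +  1 * x_j^2."""
    def w(i: int, j: int) -> int:
        a = k + x[i]
        return a * a * 1 + (-2 * a) * x[j] + 1 * x[j] ** 2
    return w
```

With stops x = [0, 3, 5, 8] and k = 4, the DP returns 2 (for example, the route 0 → 3 → 8 costs 1 + 1). Both cost functions agree on every pair (i, j).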
We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran that uses the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp). Further investigation allowed us to utilize hp with multiple compilers by linking to the Fujitsu library libmpg, and to utilize transparent hugepages (thp) by enabling them at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, we nevertheless observe a significant reduction in TLB misses and differences in cache and memory-access counters, as well as unexpected behavior when using thp.
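The abstract does not describe how thp was enabled on Ookami's nodes. On Linux kernels that expose transparent hugepages through sysfs, the active mode can be read from `/sys/kernel/mm/transparent_hugepage/enabled`. That file contains a line such as `always [madvise] never`, with the active mode in brackets. A minimal sketch for checking the setting on a node follows; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active (bracketed) mode, e.g. 'madvise' from 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the node's THP mode; None if the file is absent or unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None
```

Changing the mode means writing to the same file, which requires root privileges. This is consistent with the abstract's description of enabling thp at the node level, not per job.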
While Multimodal Large Language Models (MLLMs) are widely used for a variety of vision-language tasks, they sometimes misinterpret visual inputs or fail to follow textual instructions even in straightforward cases, leading to irrelevant responses, mistakes, and ungrounded claims. This behavior is analogous to a phenomenon in neuropsychology known as Agnosia, an inability to correctly process sensory modalities and recognize things (e.g., objects, colors, relations). In our study, we adapt this concept to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process in neuropsychology, we propose a novel framework, EMMA (Evaluation and Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module that automatically creates fine-grained and diverse visual question answering examples to comprehensively assess the extent of agnosia in MLLMs. We also develop a mitigation module that reduces agnosia in MLLMs through multimodal instruction tuning on fine-grained conversations. To verify the effectiveness of our framework, we evaluate and analyze agnosia in seven state-of-the-art MLLMs using 9K test samples. The results reveal that most of them exhibit agnosia across various aspects and to varying degrees. We further develop a fine-grained instruction set and tune MLLMs to mitigate agnosia, which leads to a notable improvement in accuracy.
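This abstract does not describe EMMA's actual generation pipeline. The sketch below is a hypothetical illustration of template-based, fine-grained VQA generation with per-aspect scoring. It turns (object, color) annotations into yes/no questions, using a distractor color for the negative case, and then reports accuracy per aspect. The annotation format, aspect names, and templates are all assumptions.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Example:
    aspect: str    # hypothetical aspect label, e.g. "object" or "color"
    question: str
    answer: str    # "yes" or "no"


def make_examples(annotations: Sequence[Tuple[str, str]], colors: Sequence[str],
                  rng: random.Random) -> List[Example]:
    """Build yes/no questions from (object, color) pairs for one image."""
    examples = []
    for obj, color in annotations:
        examples.append(Example("object", f"Is there a {obj} in the image?", "yes"))
        examples.append(Example("color", f"Is the {obj} {color}?", "yes"))
        wrong = rng.choice([c for c in colors if c != color])  # distractor
        examples.append(Example("color", f"Is the {obj} {wrong}?", "no"))
    return examples


def accuracy_by_aspect(examples: Sequence[Example], predictions: Sequence[str]) -> Dict[str, float]:
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        total[ex.aspect] += 1
        correct[ex.aspect] += int(pred.strip().lower() == ex.answer)
    return {aspect: correct[aspect] / total[aspect] for aspect in total}
```

Scoring each aspect separately reveals uneven failures. For instance, a model that always answers "yes" still scores perfectly on object questions but gets only half of the color questions right.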
Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e., the task of describing images with syntactically and semantically meaningful sentences. Since 2015, the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, despite the impressive results obtained, research in image captioning has not yet reached a conclusive answer. This work aims to provide a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state of the art and highlighting future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.
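Clipped (modified) n-gram precision, the core of BLEU, is one concrete example of the evaluation metrics a survey like this covers. The sketch computes it for a single n. Full BLEU additionally combines several values of n with a geometric mean and applies a brevity penalty; both are omitted here. The function names are our own.

```python
from collections import Counter
from fractions import Fraction
from typing import Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: Sequence[str], references: Sequence[Sequence[str]], n: int) -> Fraction:
    """Candidate n-gram counts, each clipped by its maximum count in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return Fraction(0)
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return Fraction(clipped, sum(cand.values()))
```

Clipping is what penalizes degenerate captions. The candidate "the the the the the the the" scored against the reference "the cat is on the mat" gets only 2/7, not 7/7, because "the" occurs at most twice in the reference.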
Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what affects the training speed of GNNs. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
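To make "linearized GNN" and "non-convex training" concrete, here is a toy model of our own, not the paper's. It uses one scalar feature per node and one scalar weight per layer, and it has no skip connections. The prediction is (w_1 ⋯ w_L)·S^L x, where S is a row-normalized adjacency matrix with self-loops. The squared loss is convex in the product of the weights but non-convex in the weights themselves. On this small instance, gradient descent still drives the loss down. The sketch does not reproduce the paper's rates or its acceleration results, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Row-normalised A + I (self-loops added), a common GNN propagation matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return [[entry / sum(row) for entry in row] for row in a]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def train(s: Matrix, x: Sequence[float], y: Sequence[float],
          depth: int, lr: float, steps: int) -> List[float]:
    """Gradient descent on f = (w_1 * ... * w_L) * S^L x with squared loss.
    Returns the loss recorded before each step."""
    z = list(x)
    for _ in range(depth):
        z = matvec(s, z)  # no nonlinearity, so S^L x can be precomputed
    w = [1.0] * depth
    losses = []
    for _ in range(steps):
        r = [math.prod(w) * zi - yi for zi, yi in zip(z, y)]  # residuals
        losses.append(0.5 * sum(ri * ri for ri in r))
        g = sum(ri * zi for ri, zi in zip(r, z))  # dLoss/d(product of weights)
        grads = [math.prod(w[:l] + w[l + 1:]) * g for l in range(depth)]
        w = [wl - lr * gl for wl, gl in zip(w, grads)]
    return losses
```

For depth 2, the loss surface has a saddle at w = (0, 0), so it is not convex in the weights. Starting from all-ones weights avoids that saddle in this toy example.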
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
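Category (1) can be illustrated on a flat list of weights. This minimal sketch covers unstructured magnitude pruning and symmetric uniform quantization with a single per-tensor scale. Practical pipelines often use per-channel scales, calibration data, and fine-tuning after pruning, none of which is shown here. The function names are our own.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero the floor(sparsity * n) weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric(weights: Sequence[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map weights to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]
```

With 8 bits, each weight is stored in one byte plus one shared float scale, and the rounding error per weight is at most half the scale.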
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment on devices with limited memory or in applications with strict latency requirements. A natural idea, therefore, is to perform model compression and acceleration in deep networks without significantly decreasing model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey recent advanced techniques developed for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. We first describe methods for parameter pruning and sharing and then introduce the other techniques. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks. We then go through a few very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
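Knowledge distillation, the last of the four schemes, is often implemented with the soft-target loss of Hinton et al. The student matches the teacher's temperature-softened softmax through a KL term scaled by T², and the student also keeps an ordinary cross-entropy term on the hard label. This sketch shows that common formulation for a single example. The defaults temperature=4.0 and alpha=0.5 are illustrative assumptions, not values from the survey.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student already reproduces the teacher's logits, the KL term vanishes and only the hard-label term remains. The T² factor keeps the soft-target gradients on a comparable scale as the temperature changes.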