In Terminal Monitoring Set (TMS), the input is an undirected graph $G=(V,E)$ together with a collection $T$ of terminal pairs, and the goal is to find a minimum-size subset $S$ of vertices that hits a shortest path between every pair of terminals. We show that this problem is W[2]-hard with respect to solution size. On the positive side, we show that TMS is fixed-parameter tractable with respect to solution size plus distance to cluster, solution size plus neighborhood diversity, and feedback edge number. For the weighted version of the problem, we obtain an FPT algorithm with respect to vertex cover number, and for a relaxed version of the problem, we show that it is W[1]-hard with respect to solution size plus feedback vertex number.
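A convenient certificate underlies feasibility checking here: a vertex $x$ lies on some shortest $s$-$t$ path if and only if $d(s,x)+d(x,t)=d(s,t)$. The following sketch (a hypothetical verifier, not taken from the paper, under the reading that $S$ must contain a vertex of at least one shortest path per terminal pair) checks a candidate solution on an unweighted graph:

```python
from collections import deque

def bfs_dist(adj, src):
    """Single-source shortest-path distances via BFS (unweighted graph).

    adj: dict mapping each vertex to an iterable of its neighbors.
    """
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def is_terminal_monitoring_set(adj, terminal_pairs, S):
    """Check that S hits at least one shortest path for every terminal pair."""
    for s, t in terminal_pairs:
        ds, dt = bfs_dist(adj, s), bfs_dist(adj, t)
        if t not in ds:
            continue  # no s-t path exists, so no shortest path to hit
        # x is on some shortest s-t path iff d(s,x) + d(x,t) = d(s,t)
        if not any(x in ds and x in dt and ds[x] + dt[x] == ds[t] for x in S):
            return False
    return True
```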
A $(\beta,\delta,\Delta)$-padded decomposition of an edge-weighted graph $G = (V,E,w)$ is a stochastic decomposition into clusters of diameter at most $\Delta$ such that for every vertex $v\in V$ and every $\gamma \in [0,\delta]$, the probability that $\mathrm{ball}_G(v,\gamma\Delta)$ is entirely contained in the cluster containing $v$ is at least $e^{-\beta\gamma}$. Padded decompositions have been studied for decades and have found numerous applications, including metric embedding, the multicommodity flow-cut gap, multicut, and the 0-extension problem, to name a few. In these applications, the padding parameter $\beta$ is the most important quantity, since it determines the distortion or the approximation ratio. For general graphs with $n$ vertices, $\beta = \Theta(\log n)$. Klein, Plotkin, and Rao showed that $K_r$-minor-free graphs have padding parameter $\beta = O(r^3)$, a significant improvement over general graphs when $r$ is a constant. A long-standing conjecture is to construct a padded decomposition for $K_r$-minor-free graphs with padding parameter $\beta = O(\log r)$. Despite decades of research, the best-known result is $\beta = O(r)$, even for graphs with treewidth at most $r$. In this work, we make significant progress toward the aforementioned conjecture by showing that graphs with treewidth $\mathrm{tw}$ admit a padded decomposition with padding parameter $O(\log \mathrm{tw})$, which is tight. As corollaries, we obtain an exponential improvement in the dependency on treewidth in a host of algorithmic applications: an $O(\sqrt{\log n \cdot \log(\mathrm{tw})})$ flow-cut gap, a max-flow/min-multicut ratio of $O(\log(\mathrm{tw}))$, an $O(\log(\mathrm{tw}))$ approximation for the 0-extension problem, an $\ell^{O(\log n)}_\infty$ embedding with distortion $O(\log \mathrm{tw})$, and an $O(\log \mathrm{tw})$ bound on the integrality gap of the uniform sparsest cut.
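In display form, writing $P(v)$ for the cluster containing $v$, the padding guarantee from the definition above reads
\[
\Pr\bigl[\mathrm{ball}_G(v,\gamma\Delta) \subseteq P(v)\bigr] \;\ge\; e^{-\beta\gamma} \qquad \text{for all } \gamma\in[0,\delta],
\]
so a smaller padding parameter $\beta$ means each vertex is padded deeper inside its cluster with higher probability.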
The definition of Data Science is a hotly debated topic. For many, the definition is a simple shortcut to Artificial Intelligence or Machine Learning. However, there is far more depth and nuance to the field of Data Science than a simple shortcut can provide. The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science. This model is based on identifying a unified understanding of the data work done across all areas of Data Science. It represents a generational leap forward in how we understand and teach Data Science. In this paper, we present the core features of the model and explain how it unifies various concepts, going far beyond the analytics component of AI. From this foundation, we present our Undergraduate Major curriculum in Data Science and demonstrate how it prepares students to be well-rounded Data Science team members and leaders. The paper concludes with an in-depth overview of the Foundations of Data Science course, designed to introduce students to the field while also implementing proven STEM-oriented pedagogical methods. These include, for example, specifications grading, active-learning lectures, guest lectures from industry experts, and weekly gamification labs.
The all-pairs shortest distances (APSD) with differential privacy (DP) problem takes as input an undirected, weighted graph $G = (V,E, \mathbf{w})$ and outputs a private estimate of the shortest distances in $G$ between all pairs of vertices. In this paper, we present a simple $\widetilde{O}(n^{1/3}/\varepsilon)$-accurate algorithm to solve APSD with $\varepsilon$-DP, whose error reduces to $\widetilde{O}(n^{1/4}/\varepsilon)$ in the $(\varepsilon, \delta)$-DP setting, where $n = |V|$. Our algorithm greatly improves upon the error of prior algorithms, namely $\widetilde{O}(n^{2/3}/\varepsilon)$ and $\widetilde{O}(\sqrt{n}/\varepsilon)$ in the two respective settings, and is the first to be optimal up to a polylogarithmic factor, based on a lower bound of $\widetilde{\Omega}(n^{1/4})$. In the case where a multiplicative approximation is allowed, we give two different constructions of algorithms with reduced additive error. Our first construction allows a multiplicative approximation of $O(k\log{\log{n}})$ and has additive error $\widetilde{O}(k\cdot n^{1/k}/\varepsilon)$ in the $\varepsilon$-DP case and $\widetilde{O}(\sqrt{k}\cdot n^{1/(2k)}/\varepsilon)$ in the $(\varepsilon, \delta)$-DP case. Our second construction allows a multiplicative approximation of $2k-1$ and has the same asymptotic additive error as the first construction. Both constructions significantly improve upon the currently best-known additive errors of $\widetilde{O}(k\cdot n^{1/2 + 1/(4k+2)}/\varepsilon)$ and $\widetilde{O}(k\cdot n^{1/3 + 2/(9k+3)}/\varepsilon)$, respectively. Our algorithms are straightforward: they decompose a graph into a set of spanning trees and apply the key observation that APSD in trees can be released privately with $O(\text{polylog}(n))$ error.
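To give a feel for the tree primitive, here is a minimal sketch of the classic dyadic (binary-tree) mechanism on the special case of a path graph, where every pairwise distance is a difference of two prefix sums of edge weights. The function name and interface are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def private_path_distances(edge_weights, eps, rng=np.random.default_rng()):
    """Noisy all-pairs distances on a path graph with polylog(n) error.

    Each edge weight participates in O(log n) dyadic interval sums; noising
    each interval sum with Lap(levels/eps) keeps the budget at eps (assuming
    neighboring inputs differ by at most 1 in a single edge weight), and any
    prefix sum -- hence any pairwise distance -- is rebuilt from O(log n)
    noisy intervals, giving O(polylog(n)/eps) additive error.
    """
    edge_weights = np.asarray(edge_weights, dtype=float)
    m = len(edge_weights)
    levels = max(1, int(np.ceil(np.log2(max(m, 1)))) + 1)
    scale = levels / eps
    noisy = {}  # noisy[(lo, hi)] = sum of edge_weights[lo:hi] + Laplace noise
    def build(lo, hi):
        noisy[(lo, hi)] = edge_weights[lo:hi].sum() + rng.laplace(0, scale)
        if hi - lo > 1:
            mid = (lo + hi) // 2
            build(lo, mid); build(mid, hi)
    build(0, m)
    def prefix(i):  # noisy sum of edge_weights[0:i] from O(log m) intervals
        total, lo, hi = 0.0, 0, m
        while lo < i:
            if hi <= i:
                total += noisy[(lo, hi)]; break
            mid = (lo + hi) // 2
            if i >= mid:
                total += noisy[(lo, mid)]; lo = mid
            else:
                hi = mid
        return total
    # distance between vertices u <= v on the path is prefix(v) - prefix(u)
    pref = np.array([prefix(i) for i in range(m + 1)])
    return np.abs(pref[None, :] - pref[:, None])
```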
To assess the quality of a probabilistic prediction for stochastic dynamical systems (SDSs), scoring rules assign a numerical score based on the predictive distribution and the measured state. In this paper, we propose an $\epsilon$-logarithm score that generalizes the celebrated logarithm score by considering a neighborhood of radius $\epsilon$. We characterize the probabilistic predictability of an SDS by optimizing the expected score over the space of probability measures. We show how the probabilistic predictability is quantitatively determined by the neighborhood radius, the differential entropies of the process noises, and the system dimension. Given any predictor, we provide approximations of the expected score with an error of order $\mathcal{O}(\epsilon)$. In addition to the expected score, we also analyze the asymptotic behavior of the score on individual trajectories. Specifically, we prove that the score on a trajectory converges to the expected score when the process noises are independent and identically distributed, and that the rate of convergence in probability with respect to the trajectory length $T$ is $\mathcal{O}(T^{-\frac{1}{2}})$. Finally, numerical examples are given to illustrate the results.
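The abstract does not state the score explicitly; one natural form consistent with the description (an assumption for illustration, not necessarily the paper's definition) replaces the predictive density at the measured state $x$ with the probability mass of its $\epsilon$-neighborhood:
\[
S_\epsilon(p, x) \;=\; \log \mathbb{P}_p\bigl(B_\epsilon(x)\bigr) \;=\; \log \int_{B_\epsilon(x)} p(y)\,\mathrm{d}y,
\]
which, as $\epsilon \to 0$, recovers the classical logarithm score $\log p(x)$ up to the additive constant $\log \mathrm{vol}(B_\epsilon(x))$.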
In this work, we study the Biclique-Free Vertex Deletion problem: given a graph $G$ and integers $k$ and $i \le j$, find a set of at most $k$ vertices that intersects every (not necessarily induced) biclique $K_{i, j}$ in $G$. This is a natural generalization of the Bounded-Degree Deletion problem, which asks whether there is a set of at most $k$ vertices whose deletion results in a graph of maximum degree at most $r$. The two problems coincide when $i = 1$ and $j = r + 1$. We show that Biclique-Free Vertex Deletion is fixed-parameter tractable with respect to $k + d$, where $d$ is the degeneracy, by developing a $2^{O(d k^2)} \cdot n^{O(1)}$-time algorithm. We also show that it can be solved in $2^{O(f k)} \cdot n^{O(1)}$ time for feedback vertex number $f$ when $i \ge 2$. In contrast, it is W[1]-hard when parameterized by treedepth, for every integer $i \ge 1$. Finally, we show that Biclique-Free Vertex Deletion admits a polynomial kernel for every $i \ge 1$ when parameterized by the feedback edge number. Previously, for this parameter, fixed-parameter tractability for $i = 1$ was known [Betzler et al., DAM '12], but the existence of a polynomial kernel was open.
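Since a non-induced $K_{i,j}$ is exactly an $i$-set $A$ whose common neighborhood outside $A$ has size at least $j$, a brute-force specification of the problem (hypothetical helper names; exponential in $i$, intended only for small instances) is straightforward:

```python
from itertools import combinations

def has_biclique(adj, i, j):
    """Test for a (not necessarily induced) K_{i,j} subgraph.

    adj: dict mapping each vertex to the set of its neighbors.
    A K_{i,j} exists iff some i-subset A has a common neighborhood
    (excluding A itself) of size at least j.
    """
    for A in combinations(list(adj), i):
        common = set.intersection(*(adj[a] for a in A)) - set(A)
        if len(common) >= j:
            return True
    return False

def is_deletion_set(adj, i, j, S):
    """Check that deleting the vertex set S leaves a K_{i,j}-free graph."""
    S = set(S)
    remaining = {v: adj[v] - S for v in adj if v not in S}
    return not has_biclique(remaining, i, j)
```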
This work uniquely identifies and characterizes four prevalent multimodal model architecture patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type-A and Type-B) deeply fuse multimodal inputs within the internal layers of the model, whereas the other two (Type-C and Type-D) perform early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. Type-C, on the other hand, utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which relies on input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type in terms of data and compute requirements, architecture complexity, scalability, ease of adding modalities, training objectives, and any-to-any multimodal generation capability.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $\ell_1/\ell_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, achieves a better sample complexity for recovering the causal (topological) order than separate estimation. Moreover, the joint estimator can recover non-identifiable DAGs by estimating them together with identifiable ones. Our analysis also establishes consistent recovery of the union of the supports. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
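Schematically (a form consistent with the abstract, assumed here for illustration rather than quoted from the paper), with $B^{(k)}$ the coefficient matrix of the $k$-th linear SEM, $X^{(k)}$ its data, and $\mathcal{L}$ the negative Gaussian log-likelihood, such a joint estimator takes the form
\[
\min_{B^{(1)},\dots,B^{(K)}} \;\sum_{k=1}^{K} \mathcal{L}\bigl(B^{(k)}; X^{(k)}\bigr) \;+\; \lambda \sum_{u\neq v} \Bigl(\sum_{k=1}^{K} \bigl(B^{(k)}_{uv}\bigr)^{2}\Bigr)^{1/2},
\]
where the $\ell_1/\ell_2$ (group-lasso) penalty couples the same edge across all tasks and thereby promotes a sparse union of supports.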
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks, including question answering and semantic search. In this paper, we present GENI, a method for estimating node importance in KGs, which enables downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs or lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning approaches. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges of predicting node importance in KGs. Rather than aggregating node embeddings, our method aggregates importance scores directly, using a predicate-aware attention mechanism and a flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
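To make the score-aggregation idea concrete, here is a minimal schematic of one round of predicate-aware attention over scalar scores (all names, shapes, and the single-vector attention parameterization are assumptions for illustration; GENI's actual layers and its centrality adjustment are more involved):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_scores(scores, neighbors, predicate_emb, edge_predicates, a):
    """One round of predicate-aware importance-score aggregation (schematic).

    scores: current importance score per node, np.ndarray of shape [n]
    neighbors[v]: list of neighbor node ids of v
    edge_predicates[(v, u)]: predicate id of the edge between v and u
    predicate_emb: one embedding per predicate, shape [P, d]
    a: learned attention vector, shape [d]
    """
    n = len(scores)
    new_scores = np.empty(n)
    for v in range(n):
        nbrs = neighbors[v] + [v]  # include a self-loop
        # attention logits come from the connecting edge's predicate
        # embedding; the self-loop falls back to predicate 0 by assumption
        logits = np.array([
            predicate_emb[edge_predicates.get((v, u), 0)] @ a for u in nbrs
        ])
        alpha = softmax(logits)            # predicate-aware attention weights
        new_scores[v] = alpha @ scores[nbrs]  # aggregate scores, not embeddings
    return new_scores
```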
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, segmentation algorithms expose a collection of parameters that control the degree of over-segmentation produced, and properly selecting these parameters for human-like perceptual grouping remains a challenge. In this work, we exploit the diversity of segments produced by different choices of parameters: we scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost-minimization framework that produces the final segmentation by selecting segments that (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing improved results. We also show that our framework is robust to the choice of segmentation kernel used to produce the initial set of hypotheses.