Alon and Shapira proved that every monotone class (closed under taking subgraphs) of undirected graphs is strongly testable, that is, under the promise that a given graph is either in the class or $\varepsilon$-far from it, there is a test using a constant number of samples (depending on $\varepsilon$ only) that rejects every graph not in the class with probability at least one half, and always accepts a graph in the class. However, their bound on the number of samples is quite large, since they heavily rely on Szemer\'edi's regularity lemma. We study the case of posets and show that every monotone class of posets is easily testable, that is, a polynomial number of samples is sufficient. We achieve this by proving a polynomial removal lemma for posets. We give a simple classification: for every monotone class of posets there is an $h$ such that the class is indistinguishable (every large enough poset in one class is $\varepsilon$-close to a poset in the other class) from the class of posets free of the chain $C_h$. This allows us to test every monotone class of posets using $O(\varepsilon^{-1})$ samples. The test has two-sided error, but it is almost complete: the probability of rejecting a poset in the class is polynomially small in the size of the poset. Analogous results hold for comparability graphs, too.
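To make the sampling test concrete, here is a minimal sketch, not the construction from the paper: sample roughly $c \cdot h/\varepsilon$ elements (the constant $c$, the function names, and the sample-size formula are illustrative assumptions) and reject exactly when the induced subposet contains the chain $C_h$.

```python
import random

def has_chain(sample, leq, h):
    """Longest-chain DP on the sampled induced subposet."""
    memo = {}
    def longest(x):
        if x not in memo:
            below = [y for y in sample if y != x and leq(y, x)]
            memo[x] = 1 + max((longest(y) for y in below), default=0)
        return memo[x]
    return any(longest(x) >= h for x in sample)

def test_Ch_free(universe, leq, h, eps, c=10):
    """Accept unless a chain C_h appears among ~c*h/eps sampled elements."""
    k = min(len(universe), int(c * h / eps))
    sample = random.sample(universe, k)
    return not has_chain(sample, leq, h)

# toy run: divisibility on {1..100} contains chains like 1|2|4|8,
# so the test rejects (returns False)
print(test_Ch_free(range(1, 101), lambda a, b: b % a == 0, h=4, eps=0.1))
```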
By abstracting over well-known properties of De Bruijn's representation with nameless dummies, we design a new theory of syntax with variable binding and capture-avoiding substitution. We propose it as a simpler alternative to Fiore, Plotkin, and Turi's approach, with which we establish a strong formal link. We also show that our theory easily incorporates simple types and equations between terms.
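For readers unfamiliar with the nameless representation, the following is a minimal Python sketch of De Bruijn terms with shifting and capture-avoiding substitution; these are the standard textbook definitions, not the abstract theory developed in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    k: int            # de Bruijn index: number of binders skipped to reach ours

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def shift(t, d, cutoff=0):
    """Add d to every free index (those >= cutoff)."""
    if isinstance(t, Var):
        return Var(t.k + d) if t.k >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fn, d, cutoff), shift(t.arg, d, cutoff))

def subst(t, j, s):
    """t[j := s]; capture is avoided by shifting s under each binder."""
    if isinstance(t, Var):
        return s if t.k == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fn, j, s), subst(t.arg, j, s))

def beta(redex):
    """(lam body) arg  ->  body[0 := arg], with index bookkeeping."""
    return shift(subst(redex.fn.body, 0, shift(redex.arg, 1)), -1)

# (\x. \y. x) z  reduces to  \y. z
print(beta(App(Lam(Lam(Var(1))), Var(0))))   # Lam(body=Var(k=1))
```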
The low-rank quaternion matrix approximation has been successfully applied in many applications involving signal processing and color image processing. However, the cost of quaternion models for generating low-rank quaternion matrix approximations is sometimes considerable due to the computation of the quaternion singular value decomposition (QSVD), which limits their application to large-scale real data. To address this deficiency, we propose an efficient quaternion matrix CUR (QMCUR) method for low-rank approximation, which provides significant acceleration in color image processing. We first explore the QMCUR approximation method, which uses actual columns and rows of the given quaternion matrix instead of the costly QSVD, and employ two different strategies for sampling the selected columns and rows. Then, a perturbation analysis is performed on the QMCUR approximation of noisy versions of low-rank quaternion matrices. Extensive experiments on both synthetic and real data further reveal the superiority of the proposed algorithm over comparable low-rank approximation algorithms, in terms of both efficiency and accuracy.
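To convey the flavor of a CUR approximation, here is a sketch for real matrices with numpy; the quaternion arithmetic and the paper's specific sampling strategies are not reproduced, and uniform sampling is only a placeholder.

```python
import numpy as np

def cur_approx(A, c, r, seed=0):
    """CUR sketch: uniformly sample c columns and r rows (the paper studies
    smarter sampling schemes), then set U = C^+ A R^+."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(A.shape[1], size=c, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C @ U @ R

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))  # rank 5
err = np.linalg.norm(A - cur_approx(A, 10, 10)) / np.linalg.norm(A)
print(f"relative Frobenius error: {err:.1e}")   # near machine precision
```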
In large-scale, data-driven applications, parameters are often known only approximately due to noise and limited data samples. In this paper, we focus on high-dimensional optimization problems with linear constraints under such uncertainty. To find high-quality solutions whose violation of the true constraints is limited, we develop a linear shrinkage method that blends random matrix theory and robust optimization principles. It aims to minimize the Frobenius distance between the estimated and the true parameter matrix, particularly in the regime where the numbers of constraints and variables are large and comparable. This data-driven method excels in simulations, showing superior noise resilience and more stable performance, both in obtaining high-quality solutions and in adhering to the true constraints, compared to traditional robust optimization. Our findings highlight the effectiveness of our method in improving the robustness and reliability of optimization in high-dimensional, data-driven scenarios.
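A generic linear-shrinkage step of the kind alluded to might look as follows; the blending weight alpha is treated here as a plain tuning parameter, whereas the paper derives it from random matrix theory, and the zero target is an illustrative assumption.

```python
import numpy as np

def shrink(A_hat, alpha, target=None):
    """Blend the noisy estimate toward a structured target,
    A_shrunk = (1 - alpha) * A_hat + alpha * T, to cut Frobenius error."""
    T = np.zeros_like(A_hat) if target is None else target
    return (1.0 - alpha) * A_hat + alpha * T

# toy check: shrinking a noisy constraint matrix toward zero;
# the Frobenius error typically dips at an intermediate alpha
rng = np.random.default_rng(1)
A_true = rng.standard_normal((50, 200)) / np.sqrt(200)
A_hat = A_true + 0.5 * rng.standard_normal((50, 200)) / np.sqrt(200)
for a in (0.0, 0.2, 0.4):
    print(a, np.linalg.norm(shrink(A_hat, a) - A_true))
```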
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages, where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully-portable continuation stealing and present libfork, a lock-free fine-grained parallelism library that combines coroutines with user-space geometric segmented stacks. We show that our approach achieves optimal time/memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to OpenMP (libomp), libfork is on average 7.2x faster and consumes 10x less memory. Similarly, compared to Intel's TBB, libfork is on average 2.7x faster and consumes 6.2x less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers, demonstrating performance matching that of busy-waiting schedulers.
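The fully-strict fork-join discipline itself is easy to state in code. The sketch below uses Python threads purely for illustration; libfork realizes the same discipline with C++20 stackless coroutines and continuation stealing, which this toy does not model.

```python
import threading

def fib(n: int) -> int:
    """Fully strict fork-join: a task joins only the children it forked."""
    if n < 2:
        return n
    out = {}
    child = threading.Thread(target=lambda: out.update(a=fib(n - 1)))
    child.start()              # fork: child computes fib(n-1)
    b = fib(n - 2)             # parent continues with the other half
    child.join()               # join before touching the child's result
    return out["a"] + b

print(fib(10))  # 55
```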
Fully decentralized learning is gaining momentum for training AI models at the Internet's edge, addressing infrastructure challenges and privacy concerns. In a decentralized machine learning system, data is distributed across multiple nodes, with each node training a local model on its own dataset. The local models are then shared and combined to form a global model capable of making accurate predictions on new data. Our exploration focuses on how different network structures influence the spreading of knowledge, that is, the process by which nodes incorporate insights gained from learning patterns in data available on other nodes across the network. Specifically, this study investigates the interplay between network structure and learning performance using three network topologies and six data distribution methods. These methods consider different vertex properties, including degree centrality, betweenness centrality, and clustering coefficient, along with whether nodes exhibit high or low values of these metrics. Our findings underscore the significance of global centrality metrics (degree, betweenness) in correlating with learning performance, while local clustering proves less predictive. We highlight the challenges in transferring knowledge from peripheral to central nodes, attributed to a dilution effect during model aggregation. Additionally, we observe that central nodes exert a pull effect, facilitating the spread of knowledge. In examining degree distribution, hubs in Barab\'asi-Albert networks positively impact learning for central nodes but exacerbate dilution when knowledge originates from peripheral nodes. Finally, we demonstrate how difficult it is for knowledge to circulate beyond segregated communities.
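As a schematic of how topology enters model combination, consider repeated neighborhood averaging on a graph; this toy uses random vectors as stand-in "models" and networkx, and is not the study's actual training and aggregation protocol.

```python
import networkx as nx
import numpy as np

def gossip_round(G, params):
    """Each node replaces its model by the average over itself + neighbors."""
    new = {}
    for v in G:
        nbrs = list(G.neighbors(v)) + [v]
        new[v] = np.mean([params[u] for u in nbrs], axis=0)
    return new

G = nx.barabasi_albert_graph(30, 2, seed=0)
params = {v: np.random.default_rng(v).standard_normal(4) for v in G}
for _ in range(20):
    params = gossip_round(G, params)
print(np.std([params[v] for v in G], axis=0))  # consensus: spread shrinks
```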
Bayesian statistical graphical models are typically either continuous and parametric (Gaussian, parameterized by the graph-dependent precision matrix with Wishart-type priors) or discrete and non-parametric (with graph-dependent structure of cell probabilities and Dirichlet-type priors). We propose to break this dichotomy by introducing two discrete parametric graphical models on finite decomposable graphs: the graph negative multinomial and the graph multinomial distributions. These models interpolate between the product of univariate negative binomial laws and the negative multinomial distribution, and between the product of binomial laws and the multinomial distribution, respectively. We derive their Markov decomposition and present related probabilistic representations of these models. We also introduce graphical versions of the Dirichlet and inverted Dirichlet distributions, which serve as conjugate priors for the two discrete graphical Markov models. We derive explicit normalizing constants for both graphical Dirichlet laws and demonstrate that their independence structure (a graphical version of neutrality) yields a strong hyper Markov property for both Bayesian models. We also provide characterization theorems for the graphical Dirichlet laws via the strong hyper Markov property. Finally, we develop a model selection procedure for the Bayesian graphical negative multinomial model with the corresponding Dirichlet-type priors.
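For context, the Markov decomposition referred to is of the standard clique-separator form for a decomposable graph (stated here generically, not in the paper's notation):

```latex
% Markov factorization of a law f over a decomposable graph G,
% with cliques \mathcal{C} and (multiset of) separators \mathcal{S}:
f(x) = \frac{\prod_{C \in \mathcal{C}} f_C(x_C)}{\prod_{S \in \mathcal{S}} f_S(x_S)}
```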
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto-optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front then necessitates a separate search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics and yields representative and diverse architectures across multiple devices in just one search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments with up to 19 hardware devices and 3 objectives showcase the effectiveness and scalability of our method. Finally, we show that, without additional costs, our method outperforms existing MOO NAS methods across qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k and a Transformer space on machine translation.
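A minimal sketch of the conditioning mechanism, with hypothetical shapes and names: a hypernetwork maps hardware features and a preference vector to a distribution over architectural choices. The paper's hypernetwork and search space are considerably richer.

```python
import torch
import torch.nn as nn

class ArchHypernet(nn.Module):
    """Map (hardware features, preference vector) -> architecture logits."""
    def __init__(self, hw_dim=16, pref_dim=3, n_choices=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hw_dim + pref_dim, 128), nn.ReLU(),
            nn.Linear(128, n_choices),
        )

    def forward(self, hw_feat, pref):
        pref = pref / pref.sum(-1, keepdim=True)   # preferences on the simplex
        logits = self.net(torch.cat([hw_feat, pref], dim=-1))
        return torch.distributions.Categorical(logits=logits)

hyper = ArchHypernet()
dist = hyper(torch.randn(1, 16), torch.tensor([[0.7, 0.2, 0.1]]))
arch = dist.sample()    # one architecture choice per (device, preference)
```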
We study combinatorial inequalities for various classes of set systems: matroids, polymatroids, poset antimatroids, and interval greedoids. We prove log-concavity inequalities for counting certain weighted feasible words, which generalize and extend several previous results establishing Mason's conjectures for the numbers of independent sets of matroids. Notably, we prove matching equality conditions both for the earlier inequalities and for our extensions. In contrast with much of the previous work, our proofs are combinatorial and employ nothing but linear algebra. We use the language formulation of greedoids, which allows a linear-algebraic setup that can in turn be analyzed recursively. The underlying non-commutative nature of the matrices associated with greedoids allows us to proceed beyond polymatroids and prove the equality conditions. As a further application of our tools, we rederive both Stanley's inequality on the number of certain linear extensions and its equality conditions, which we then also extend to the weighted case.
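The prototypical inequality in this family, for the number $I_k$ of $k$-element independent sets of a matroid (the paper proves weighted generalizations with matching equality conditions), reads:

```latex
% Log-concavity of the independent-set sequence of a matroid:
I_k^2 \;\ge\; I_{k-1}\, I_{k+1} \qquad \text{for all } k \ge 1
```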
Recently, MLP architectures have regained popularity, with MLP-Mixer standing out as a prominent example. In computer vision, MLP-Mixer is noted for extracting information from both the channel and token perspectives, effectively fusing channel and token information. The essence of Mixer is thus a paradigm of blending information from diverse perspectives, the literal sense of "mixing" in neural network architectures. Beyond channels and tokens, more tailored mixers can be built over other perspectives to better suit specific task requirements. This study focuses on audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) that incorporates insights from both the time and frequency domains. Experimental results demonstrate that ASM-RH is particularly well suited to audio data and yields promising outcomes across multiple classification tasks.
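The Mixer paradigm the model builds on alternates an MLP across tokens with an MLP across channels. Below is a generic Mixer block sketch in PyTorch with assumed dimensions; ASM-RH's Roll-Time and Hermit FFT components are not shown.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token-mixing MLP, then channel-mixing MLP, each with a residual."""
    def __init__(self, tokens, channels, hidden=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(tokens, hidden), nn.GELU(), nn.Linear(hidden, tokens))
        self.norm2 = nn.LayerNorm(channels)
        self.chan_mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.GELU(), nn.Linear(hidden, channels))

    def forward(self, x):                  # x: (batch, tokens, channels)
        y = self.norm1(x).transpose(1, 2)  # mix across the token axis
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.chan_mlp(self.norm2(x))

x = torch.randn(2, 64, 96)                # e.g. spectrogram patches
print(MixerBlock(64, 96)(x).shape)        # torch.Size([2, 64, 96])
```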
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods address this challenge by fusing multiple data modalities while leveraging cross-modal dependencies. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. Based on this categorization, we put forward an algorithmic blueprint for multimodal graph learning. The blueprint groups state-of-the-art architectures by how they make appropriate choices of four components for handling multimodal data. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.