In the Maximum Independent Set problem we are asked to find a set of pairwise nonadjacent vertices in a given graph with the maximum possible cardinality. In general graphs, this classical problem is known to be NP-hard and hard to approximate within a factor of $n^{1-\varepsilon}$ for any $\varepsilon > 0$. Due to this, investigating the complexity of Maximum Independent Set in various graph classes in the hope of finding better tractability results is an active research direction. In $H$-free graphs, that is, graphs not containing a fixed graph $H$ as an induced subgraph, the problem is known to remain NP-hard and APX-hard whenever $H$ contains a cycle, a vertex of degree at least four, or two vertices of degree at least three in one connected component. For the remaining cases, where every component of $H$ is a path or a subdivided claw, the complexity of Maximum Independent Set remains widely open, with only a handful of polynomial-time solvability results for small graphs $H$ such as $P_5$, $P_6$, the claw, or the fork. We show that for every graph $H$ for which Maximum Independent Set is not known to be APX-hard and SUBEXP-hard in $H$-free graphs, the problem admits a quasi-polynomial time approximation scheme and a subexponential-time exact algorithm in this graph class. Our algorithms also work in the more general weighted setting, where the input graph is supplied with a weight function on the vertices and we maximize the total weight of an independent set.
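For concreteness, the weighted variant can be stated as a tiny brute-force computation; the sketch below (with an illustrative graph representation, not taken from the paper) only serves to pin down the problem definition and is exponential in the number of vertices.

```python
from itertools import combinations

def max_weight_independent_set(n, edges, weight):
    """Brute-force Maximum-Weight Independent Set on vertices 0..n-1.

    edges  -- set of frozensets {u, v}
    weight -- dict mapping vertex -> nonnegative weight
    Exponential in n; meant only to illustrate the problem definition.
    """
    best, best_w = set(), 0.0
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            if any(frozenset((u, v)) in edges for u, v in combinations(cand, 2)):
                continue  # cand contains an edge, hence is not independent
            w = sum(weight[v] for v in cand)
            if w > best_w:
                best, best_w = set(cand), w
    return best, best_w

# Example: the path P_4 with unit weights; an optimum has weight 2, e.g. {0, 2}.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
print(max_weight_independent_set(4, edges, {v: 1.0 for v in range(4)}))
```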
We prove asymptotic results for a modification of the cross-entropy estimator originally introduced by Ziv and Merhav in the Markovian setting in 1993. Our results concern a more general class of decoupled measures. In particular, our results imply strong asymptotic consistency of the modified estimator for all pairs of functions of stationary, irreducible, finite-state Markov chains satisfying a mild decay condition. Our approach is based on the study of a rescaled cumulant-generating function called the cross-entropic pressure, importing to information theory some techniques from the study of large deviations within the thermodynamic formalism.
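For orientation, the classical Ziv-Merhav construction estimates the cross entropy between the laws of two observed sequences by cross-parsing one sequence against the other; the sketch below shows that textbook idea only (conventions on the phrase definition and normalization vary), not the modified estimator or the decoupled-measure setting analyzed in the paper.

```python
import math

def zm_cross_entropy_estimate(z, y):
    """Rough Ziv-Merhav-style cross-entropy estimate (in nats) from two strings.

    z is sequentially parsed into phrases; each phrase is the longest prefix of
    the unparsed remainder of z that occurs as a contiguous substring of y
    (at least one symbol is always consumed).  With c phrases and n = len(z),
    c * log(n) / n is the classical estimate of the cross-entropy rate.
    """
    n, i, c = len(z), 0, 0
    while i < n:
        length = 0
        while i + length < n and z[i:i + length + 1] in y:
            length += 1
        i += max(length, 1)   # advance by at least one symbol
        c += 1
    return c * math.log(n) / n

# Toy usage on binary strings
print(zm_cross_entropy_estimate("0110101101", "0101010101"))
```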
In the paper we argue that the performance of classifiers based on Empirical Risk Minimization (ERM) for positive-unlabeled data, which are designed for the case-control sampling scheme, may significantly deteriorate when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. We also introduce a single-sample analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that significant differences occur between them, especially when half or more of the positive observations are labeled. The opposite case, when the ERM minimizer designed for the case-control scheme is applied to single-sample data, is also considered, and similar conclusions are drawn. Taking the difference between the scenarios into account requires a single, but crucial, change in the definition of the empirical risk.
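For reference, the widely used non-negative risk for the case-control scheme (in the spirit of Kiryo et al.) is sketched below; the single-sample analogue introduced in the paper differs precisely in how this empirical risk is defined, and the function and loss names here are illustrative.

```python
import torch

def nn_pu_risk_case_control(g_pos, g_unl, prior, loss):
    """Non-negative PU risk estimate for the case-control sampling scheme (sketch).

    g_pos -- classifier outputs on labeled positive examples
    g_unl -- classifier outputs on unlabeled examples
    prior -- class prior pi = P(Y = +1), assumed known or estimated separately
    loss  -- loss(outputs, target in {+1, -1}) returning per-example losses
    """
    r_pos_plus = loss(g_pos, +1).mean()            # positives classified as +1
    r_pos_minus = loss(g_pos, -1).mean()           # positives classified as -1
    r_unl_minus = loss(g_unl, -1).mean()           # unlabeled classified as -1
    neg_part = r_unl_minus - prior * r_pos_minus   # plug-in estimate of the negative-class risk
    return prior * r_pos_plus + torch.clamp(neg_part, min=0.0)  # clamp enforces non-negativity

def sigmoid_loss(outputs, target):
    return torch.sigmoid(-target * outputs)
```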
In this series of studies, we establish homogenized lattice Boltzmann methods (HLBM) for simulating fluid flow through porous media. Our contributions in part I are twofold. First, we assemble the targeted partial differential equation system by formally unifying the governing equations for nonstationary fluid flow in porous media. A matrix of regularly arranged, equally sized obstacles is placed into the domain to model fluid flow through porous structures governed by the incompressible nonstationary Navier--Stokes equations (NSE). Depending on the ratio of geometric parameters in the matrix arrangement, several homogenized equations are obtained. We review existing methods for homogenizing the nonstationary NSE for specific porosities and discuss the applicability of the resulting model equations. Consequently, the homogenized NSE are expressed as targeted partial differential equations that jointly incorporate the derived aspects. Second, we propose a kinetic model, the homogenized Bhatnagar--Gross--Krook Boltzmann equation, which approximates the homogenized nonstationary NSE. We formally prove that the zeroth- and first-order moments of the kinetic model provide solutions to the mass and momentum balance variables of the macroscopic model up to specific orders in the scaling parameter. Based on the present contributions, in the sequel (part II), the homogenized NSE are consistently approximated by deriving a limit-consistent HLBM discretization of the homogenized Bhatnagar--Gross--Krook Boltzmann equation.
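As a point of reference, a minimal standard (non-homogenized) D2Q9 BGK collision-and-streaming update is sketched below; the homogenized kinetic model proposed in the paper modifies this type of update to account for the porous obstacle matrix, and none of the parameter choices here come from the paper.

```python
import numpy as np

# D2Q9 lattice velocities and weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def bgk_step(f, tau):
    """One plain BGK collision + periodic streaming step (no forcing, no boundaries).

    f   -- populations of shape (9, nx, ny)
    tau -- BGK relaxation time
    """
    rho = f.sum(axis=0)                                  # zeroth moment: density
    u = np.einsum('qi,qxy->ixy', c, f) / rho             # first moment: velocity
    cu = np.einsum('qi,ixy->qxy', c, u)
    usq = (u ** 2).sum(axis=0)
    feq = w[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)
    f = f - (f - feq) / tau                              # BGK collision
    for q in range(9):                                   # streaming on a periodic grid
        f[q] = np.roll(f[q], shift=(c[q, 0], c[q, 1]), axis=(0, 1))
    return f
```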
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns. In this work, through a combination of theoretical results and carefully controlled experiments on synthetic data, we take a critical view of methods that exclusively focus on individual parts of the model rather than considering the network as a whole. We consider a simple synthetic setup of learning a (bounded) Dyck language. Theoretically, we show that the set of models that (exactly or approximately) solve this task satisfies a structural characterization derived from ideas in formal languages (the pumping lemma). We use this characterization to show that the set of optima is qualitatively rich; in particular, the attention pattern of a single layer can be ``nearly randomized'' while preserving the functionality of the network. We also show via extensive experiments that these constructions are not merely a theoretical artifact: even after severely constraining the architecture of the model, vastly different solutions can be reached via standard training. Thus, interpretability claims based on inspecting individual heads or weight matrices in the Transformer can be misleading.
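As a concrete reference point, a bounded Dyck (balanced-bracket) training example can be sampled as in the sketch below; the sampling scheme is illustrative and not necessarily the paper's exact data generator.

```python
import random

def sample_bounded_dyck(length, max_depth, p_open=0.5):
    """Sample a balanced bracket string of even `length` whose nesting depth
    never exceeds `max_depth` (a bounded Dyck-1 word)."""
    s, depth = [], 0
    for i in range(length):
        remaining = length - i
        must_close = depth >= remaining                    # just enough room left to close
        can_open = depth < max_depth and depth + 1 <= remaining - 1
        if can_open and not must_close and random.random() < p_open:
            s.append('('); depth += 1
        elif depth > 0:
            s.append(')'); depth -= 1
        else:
            s.append('('); depth += 1
    return ''.join(s)

print(sample_bounded_dyck(20, max_depth=3))   # e.g. '(()(()))(())()((()))'
```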
In this note we use the State of the Union Address dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history and to the character and nature of the addresses themselves. Our main approach uses vector embeddings, such as BERT (DistilBERT) and GPT-2. While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find that GPT-2, in conjunction with nonlinear dimension reduction methods such as UMAP, provides better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required, and the pre-trained out-of-the-box GPT-2 model is enough. We also used a fine-tuned DistilBERT model for classification (detecting which president delivered which address), with very good results (accuracy of 93%-95%, depending on the run). All computations can be replicated using the accompanying code on GitHub.
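A minimal sketch of the GPT-2 + UMAP pipeline is given below; the model size, pooling choice (mean over token embeddings), and UMAP settings are illustrative and may differ from those used in the note.

```python
import numpy as np
import torch
from transformers import GPT2Tokenizer, GPT2Model
import umap

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def embed(text):
    """Mean-pool the last hidden states of GPT-2 over (truncated) tokens."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state       # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# addresses: list of full texts of the State of the Union addresses
addresses = ["...full text of each address..."]
X = np.stack([embed(t) for t in addresses])

# Nonlinear dimension reduction to 2D for visualization and clustering
coords = umap.UMAP(n_components=2, metric="cosine").fit_transform(X)
```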
In this paper, we introduce a new Bayesian approach for analyzing task fMRI data that simultaneously detects activation signatures and background connectivity. Our modeling involves a new hybrid tensor spatial-temporal basis strategy that enables scalable computing yet captures nearby and distant intervoxel correlation and long-memory temporal correlation. The spatial basis involves a composite hybrid transform with two levels: the first accounts for within-ROI correlation, and the second for distant between-ROI correlation. We demonstrate in simulations how our basis space regression modeling strategy increases sensitivity for identifying activation signatures, partly driven by the induced background connectivity that itself can be summarized to reveal biological insights. This strategy leads to computationally scalable fully Bayesian inference at the voxel or ROI level that adjusts for multiple testing. We apply this model to Human Connectome Project data to reveal insights into brain activation patterns and background connectivity related to working memory tasks.
Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task. As a consequence, MOFS typically returns a large set of non-dominated solutions, which have to be assessed by the data scientist in order to proceed with the final choice. Given the multi-variate nature of the assessment, which may include criteria (e.g. fairness) not related to predictive accuracy, this step is often not straightforward and suffers from a lack of suitable tools. For instance, it is common to use a tabular presentation of the solutions, which provides little information about the trade-offs and the relations between criteria over the set of solutions. This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions. The methodology supports the data scientist in the selection of an optimal feature subset by providing high-level information at three different levels: objectives, solutions, and individual features. The methodology is experimentally assessed on two feature selection tasks adopting a GA-based MOFS with six objectives (number of selected features, balanced accuracy, F1-Score, variance inflation factor, statistical parity, and equalised odds). The results show the added value of the methodology in the selection of the final subset of features.
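To make the six objectives concrete, the sketch below scores a single candidate feature subset on a few of them (balanced accuracy, F1, variance inflation factor, statistical parity difference); the GA-based search, the remaining objectives, and proper train/test splitting are omitted, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score

def evaluate_subset(X, y, sensitive, subset):
    """Score one candidate feature subset on some of the objectives (sketch)."""
    Xs = X[:, subset]
    pred = LogisticRegression(max_iter=1000).fit(Xs, y).predict(Xs)

    # Predictive objectives
    bacc = balanced_accuracy_score(y, pred)
    f1 = f1_score(y, pred)

    # Multicollinearity: maximum variance inflation factor over the subset
    vifs = []
    for j in range(Xs.shape[1]):
        others = np.delete(Xs, j, axis=1)
        r2 = LinearRegression().fit(others, Xs[:, j]).score(others, Xs[:, j]) if others.shape[1] else 0.0
        vifs.append(1.0 / max(1.0 - r2, 1e-12))

    # Fairness: statistical parity difference between sensitive groups
    parity = abs(pred[sensitive == 1].mean() - pred[sensitive == 0].mean())

    return {"n_features": len(subset), "balanced_acc": bacc, "f1": f1,
            "max_vif": max(vifs), "stat_parity_diff": parity}
```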
The trace plot is seldom used in meta-analysis, yet it is a very informative plot. In this article we define and illustrate what the trace plot is, and discuss why it is important. The Bayesian version of the plot combines the posterior density of tau, the between-study standard deviation, with the shrunken estimates of the study effects as a function of tau. With a small or moderate number of studies, tau is not estimated with much precision, and parameter estimates and shrunken study effect estimates can vary widely depending on the true value of tau. The trace plot allows visualization of the sensitivity to tau, along with a plot that shows which values of tau are plausible and which are implausible. A comparable frequentist or empirical Bayes version provides similar results. The concepts are illustrated using examples in meta-analysis and meta-regression; implementation in R is facilitated in a Bayesian or frequentist framework using the bayesmeta and metafor packages, respectively.
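The underlying computation for a frequentist or empirical Bayes trace plot is simple; the Python sketch below conditions on a grid of tau values and traces the resulting shrunken study-effect estimates, whereas the article itself works in R with the bayesmeta and metafor packages.

```python
import numpy as np
import matplotlib.pyplot as plt

def trace_plot(y, se, tau_grid):
    """Sketch of a frequentist/empirical-Bayes trace plot.

    y, se    -- study effect estimates and their standard errors
    tau_grid -- grid of between-study standard deviations to condition on
    """
    y, se = np.asarray(y, float), np.asarray(se, float)
    traces = []
    for tau in tau_grid:
        w = 1.0 / (se ** 2 + tau ** 2)
        mu = np.sum(w * y) / np.sum(w)        # pooled mean, conditional on tau
        b = se ** 2 / (se ** 2 + tau ** 2)    # shrinkage factors
        traces.append(b * mu + (1 - b) * y)   # shrunken study effects, conditional on tau
    traces = np.array(traces)
    for i in range(len(y)):
        plt.plot(tau_grid, traces[:, i], label=f"study {i + 1}")
    plt.xlabel("tau (between-study standard deviation)")
    plt.ylabel("conditional shrunken estimate")
    plt.legend()
    plt.show()
```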
The Gearhart-Koshy acceleration for the Kaczmarz method for linear systems is a line search with the unusual property that it does not minimize the residual, but the error. Recently, one of the authors generalized this acceleration from a line search to a search in affine subspaces. In this paper, we demonstrate that the affine search is a Krylov space method that is neither a CG-type nor a MINRES-type method, and we prove that it is mathematically equivalent to a more canonical Gram-Schmidt-based method. We also investigate what abstract property of the Kaczmarz method enables this type of algorithm, and we conclude with a simple numerical example.
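For intuition, the sketch below runs one cyclic Kaczmarz sweep for a homogeneous system A x = 0 followed by the classical error-minimizing line search (inhomogeneous systems are typically handled by homogenization); the affine-subspace generalization studied in the paper is not shown.

```python
import numpy as np

def kaczmarz_sweep(A, x):
    """One full cyclic Kaczmarz sweep for the homogeneous system A x = 0."""
    for a in A:
        x = x - (a @ x) / (a @ a) * a        # project onto the hyperplane a . x = 0
    return x

def gearhart_koshy_step(A, x):
    """One sweep followed by a Gearhart-Koshy-type line search (sketch).

    In the homogeneous case the step length along x -> K(x) that minimizes the
    distance to the solution set can be computed without knowing a solution:
        t = <x, x - Kx> / ||x - Kx||^2.
    """
    kx = kaczmarz_sweep(A, x)
    d = x - kx
    denom = d @ d
    if denom == 0.0:
        return kx                            # x is already a fixed point of the sweep
    t = (x @ d) / denom
    return x - t * d                         # i.e. x + t * (Kx - x)
```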
This article presents the affordances that Generative Artificial Intelligence can have in the context of disinformation, one of the major threats to our digitalized society. We present a research framework for generating customized agent-based social networks for disinformation simulations that would enable understanding and evaluating the phenomenon, and we discuss open challenges.