In the literature on Kleene algebra, a number of variants have been proposed which impose additional structure specified by a theory, such as Kleene algebra with tests (KAT) and the recent Kleene algebra with observations (KAO), or make specific assumptions about certain constants, as for instance in NetKAT. Many of these variants fit within the unifying perspective offered by Kleene algebra with hypotheses, which comes with a canonical language model constructed from a given set of hypotheses. For the case of KAT, this model corresponds to the familiar interpretation of expressions as languages of guarded strings. A relevant question therefore is whether Kleene algebra together with a given set of hypotheses is complete with respect to its canonical language model. In this paper, we revisit, combine and extend existing results on this question to obtain tools for proving completeness in a modular way. We showcase these tools by giving new and modular proofs of completeness for KAT, KAO and NetKAT, and we prove completeness for new variants of KAT: KAT extended with a constant for the full relation, KAT extended with a converse operation, and a version of KAT where the collection of tests only forms a distributive lattice.
Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary learning, elusive. To address this challenge, we propose a framework for evaluating feature dictionaries in the context of specific tasks, by comparing them against \emph{supervised} feature dictionaries. First, we demonstrate that supervised dictionaries achieve excellent approximation, control, and interpretability of model computations on the task. Second, we use the supervised dictionaries to develop and contextualize evaluations of unsupervised dictionaries along the same three axes. We apply this framework to the indirect object identification (IOI) task using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets. We find that these SAEs capture interpretable features for the IOI task, but they are less successful than supervised features in controlling the model. Finally, we observe two qualitative phenomena in SAE training: feature occlusion (where a causally relevant concept is robustly overshadowed by even slightly higher-magnitude ones in the learned features), and feature over-splitting (where binary features split into many smaller, less interpretable features). We hope that our framework will provide a useful step towards more objective and grounded evaluations of sparse dictionary learning methods.
Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed given choices made by human decision makers. We propose a unified framework for the robust design and evaluation of predictive algorithms in selectively observed data. We impose general assumptions on how much the outcome may vary on average between unselected and selected units conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. We develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome, a predictive algorithm's mean square error, true/false positive rate, and many others, under these assumptions. In an administrative dataset from a large Australian financial institution, we illustrate how varying assumptions on unobserved confounding leads to meaningful changes in default risk predictions and evaluations of credit scores across sensitive groups.
We propose a new notion of uniqueness for the adversarial Bayes classifier in the setting of binary classification. Analyzing this concept produces a simple procedure for computing all adversarial Bayes classifiers for a well-motivated family of one-dimensional data distributions. This characterization is then leveraged to show that, as the perturbation radius increases, certain notions of regularity for adversarial Bayes classifiers improve. Various examples demonstrate that the boundary of the adversarial Bayes classifier frequently lies near the boundary of the Bayes classifier.
Modeling the shape of garments has received much attention, but most existing approaches assume the garments to be worn by someone, which constrains the range of shapes they can assume. In this work, we address shape recovery when garments are being manipulated instead of worn, which gives rise to an even larger range of possible shapes. To this end, we leverage the implicit sewing patterns (ISP) model for garment modeling and extend it by adding a diffusion-based deformation prior to represent these shapes. To recover 3D garment shapes from incomplete 3D point clouds acquired when the garment is folded, we map the points to UV space, in which our priors are learned, to produce partial UV maps, and then fit the priors to recover complete UV maps and 2D to 3D mappings. Experimental results demonstrate the superior reconstruction accuracy of our method compared to previous ones, especially when dealing with large non-rigid deformations arising from the manipulations.
We study functional dependencies together with two different probabilistic dependency notions: unary marginal identity and unary marginal distribution equivalence. A unary marginal identity states that two variables x and y are identically distributed. A unary marginal distribution equivalence states that the multiset consisting of the marginal probabilities of all the values for variable x is the same as the corresponding multiset for y. We present a sound and complete axiomatization for the class of these dependencies and show that it has Armstrong relations. The axiomatization is infinite, but we show that there can be no finite axiomatization. The implication problem for the subclass that contains only functional dependencies and unary marginal identities can be simulated with functional dependencies and unary inclusion atoms, and therefore the problem is in polynomial time. This complexity bound also holds in the case of the full class, which we show by constructing a polynomial-time algorithm.
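The two probabilistic notions defined above can be illustrated with a minimal Python sketch. It assumes a finite probability distribution represented as a list of (assignment, probability) pairs; this representation and the function names `marginal`, `marginal_identity`, and `marginal_distribution_equivalence` are hypothetical and chosen only for illustration, not taken from the paper.

```python
from collections import Counter
from fractions import Fraction


def marginal(dist, var):
    """Marginal distribution of `var` as a dict: value -> probability.

    `dist` is an assumed representation: a list of (assignment, probability)
    pairs, where each assignment is a dict from variable names to values.
    """
    out = {}
    for assignment, p in dist:
        value = assignment[var]
        out[value] = out.get(value, Fraction(0)) + p
    # Drop values with zero probability so they do not affect comparisons.
    return {v: p for v, p in out.items() if p > 0}


def marginal_identity(dist, x, y):
    """x and y are identically distributed: same values, same probabilities."""
    return marginal(dist, x) == marginal(dist, y)


def marginal_distribution_equivalence(dist, x, y):
    """The multisets of marginal probabilities of x and of y coincide."""
    probs_x = Counter(marginal(dist, x).values())
    probs_y = Counter(marginal(dist, y).values())
    return probs_x == probs_y


# Illustrative distribution over three variables (made-up data).
dist = [
    ({"x": "a", "y": "b", "z": "a"}, Fraction(1, 4)),
    ({"x": "b", "y": "a", "z": "b"}, Fraction(3, 4)),
]

identity_xz = marginal_identity(dist, "x", "z")
identity_xy = marginal_identity(dist, "x", "y")
equivalence_xy = marginal_distribution_equivalence(dist, "x", "y")
```

In this made-up distribution, x and z are identically distributed, whereas x and y are not: x takes value a with probability 1/4 while y takes it with probability 3/4. The multisets of marginal probabilities of x and y are both {1/4, 3/4}, so the distribution equivalence holds between them even though the identity fails.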
We introduce and characterize the operational diversity order (ODO) in fading channels, as a proxy to the classical notion of diversity order at any arbitrary operational signal-to-noise ratio (SNR). Thanks to this definition, relevant insights are brought up in a number of cases: (i) We quantify that in line-of-sight scenarios an increased diversity order is attainable compared to that achieved asymptotically; (ii) this effect is attenuated, but still visible, in the presence of an additional dominant specular component; (iii) we confirm that the decay slope in Rayleigh product channels increases very slowly and never fully achieves unitary slope for finite values of SNR.
Since the introduction of the Kolmogorov complexity of binary sequences in the 1960s, there have been significant advancements in the topic of complexity measures for randomness assessment, which are of fundamental importance in theoretical computer science and of practical interest in cryptography. This survey reviews notable research from the past four decades on the linear, quadratic and maximum-order complexities of pseudo-random sequences and their relations with Lempel-Ziv complexity, expansion complexity, 2-adic complexity, and correlation measures.
Datasets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A,B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A,B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice-versa. This paper proposes an iterative-testing based bimodule search procedure (BSP) to identify stable bimodules. Compared to existing methods for detecting cross-correlated features, BSP was the best at recovering true bimodules with sufficient signal, while limiting the false discoveries. In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. BSP identified several thousand SNP-gene bimodules. While many of the individual SNP-gene pairs appearing in the discovered bimodules were identified by standard eQTL methods, the discovered bimodules revealed genomic subnetworks that appeared to be biologically meaningful and worthy of further scientific investigation.
Covering numbers are a powerful tool used in the development of approximation algorithms, randomized dimension reduction methods, smoothed complexity analysis, and others. In this paper we prove upper bounds on the covering number of numerous sets in Euclidean space, namely real algebraic varieties, images of polynomial maps and semialgebraic sets in terms of the number of variables and degrees of the polynomials involved. The bounds remarkably improve the best known general bound by Yomdin-Comte, and our proof is much more straightforward. In particular, our result gives new bounds on the volume of the tubular neighborhood of the image of a polynomial map and a semialgebraic set, where results for varieties by Lotz and Basu-Lerario are not directly applicable. We illustrate the power of the result on three computational applications. Firstly, we derive a near-optimal bound on the covering number of low rank CP tensors, quantifying their approximation properties and filling in an important missing piece of theory for tensor dimension reduction and reconstruction. Secondly, we prove a bound on the required dimension for the randomized sketching of polynomial optimization problems, which controls how much computation can be saved through randomization without sacrificing solution quality. Finally, we deduce generalization error bounds for deep neural networks with rational or ReLU activation functions, improving or matching the best known results in the machine learning literature while helping to quantify the impact of architecture choice on generalization error.
Runtime analysis, as a branch of the theory of AI, studies how the number of iterations an algorithm takes before finding a solution (its runtime) depends on the design of the algorithm and the problem structure. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as evolutionary and bandit algorithms. Drift refers roughly to the expected progress towards the optimum per iteration. This paper considers the problem of deriving concentration tail bounds on the runtime/regret of algorithms. It provides a novel drift theorem that gives precise exponential tail bounds given positive, weak, zero and even negative drift. Previously, such exponential tail bounds were missing in the case of weak, zero, or negative drift. Our drift theorem can be used to prove a strong concentration of the runtime/regret of algorithms in AI. For example, we prove that the regret of the RWAB bandit algorithm is highly concentrated, while previous analyses only considered the expected regret. This means that the algorithm obtains the optimum within a given time frame with high probability, i.e. a form of algorithm reliability. Moreover, our theorem implies that the time needed by the co-evolutionary algorithm RLS-PD to obtain a Nash equilibrium in a Bilinear max-min-benchmark problem is highly concentrated. However, we also prove that the algorithm forgets the Nash equilibrium, and the time until this occurs is highly concentrated. This highlights a weakness in RLS-PD which should be addressed by future work.