An order-theoretic forest is a countable partial order such that the set of elements larger than any element is linearly ordered. It is an order-theoretic tree if any two elements have an upper-bound. The order type of a branch can be any countable linear order. Such generalized infinite trees yield convenient definitions of the rank-width and the modular decomposition of countable graphs. We define an algebra based on only four operations that generate up to isomorphism and via infinite terms these order-theoretic trees and forests. We prove that the associated regular objects, those defined by regular terms, are exactly the ones that are the unique models of monadic second-order sentences.
Let $\sigma$ be a first-order signature and let $\mathbf{W}_n$ be the set of all $\sigma$-structures with domain $[n] = \{1, \ldots, n\}$. We can think of each structure in $\mathbf{W}_n$ as representing a "possible (state of the) world". By an inference framework we mean a class $\mathbf{F}$ of pairs $(\mathbb{P}, L)$, where $\mathbb{P} = (\mathbb{P}_n : n = 1, 2, 3, \ldots)$ and each $\mathbb{P}_n$ is a probability distribution on $\mathbb{W}_n$, and $L$ is a logic with truth values in the unit interval $[0, 1]$. From the point of view of probabilistic and logical expressivity one may consider an inference framework as optimal if it allows any pair $(\mathbb{P}, L)$ where $\mathbb{P} = (\mathbb{P}_n : n = 1, 2, 3, \ldots)$ is a sequence of probability distributions on $\mathbb{W}_n$ and $L$ is a logic. But from the point of view of using a pair $(\mathbb{P}, L)$ from such an inference framework for making inferences on $\mathbb{W}_n$ when $n$ is large we face the problem of computational complexity. This motivates looking for an "optimal" trade-off (in a given context) between expressivity and computational efficiency. We define a notion that an inference framework is "asymptotically at least as expressive" as another inference framework. This relation is a preorder and we describe a (strict) partial order on the equivalence classes of some inference frameworks that in our opinion are natural in the context of machine learning and artificial intelligence. The results have bearing on issues concerning efficient learning and probabilistic inference, but are also new instances of results in finite model theory about "almost sure elimination" of extra syntactic features (e.g quantifiers) beyond the connectives. Often such a result has a logical convergence law as a corollary.
In this paper we present a proof system that operates on graphs instead of formulas. Starting from the well-known relationship between formulas and cographs, we drop the cograph-conditions and look at arbitrary undirected) graphs. This means that we lose the tree structure of the formulas corresponding to the cographs, and we can no longer use standard proof theoretical methods that depend on that tree structure. In order to overcome this difficulty, we use a modular decomposition of graphs and some techniques from deep inference where inference rules do not rely on the main connective of a formula. For our proof system we show the admissibility of cut and a generalization of the splitting property. Finally, we show that our system is a conservative extension of multiplicative linear logic with mix, and we argue that our graphs form a notion of generalized connective.
We employ kernel-based approaches that use samples from a probability distribution to approximate a Kolmogorov operator on a manifold. The self-tuning variable-bandwidth kernel method [Berry & Harlim, Appl. Comput. Harmon. Anal., 40(1):68--96, 2016] computes a large, sparse matrix that approximates the differential operator. Here, we use the eigendecomposition of the discretization to (i) invert the operator, solving a differential equation, and (ii) represent gradient vector fields on the manifold. These methods only require samples from the underlying distribution and, therefore, can be applied in high dimensions or on geometrically complex manifolds when spatial discretizations are not available. We also employ an efficient $k$-$d$ tree algorithm to compute the sparse kernel matrix, which is a computational bottleneck.
We present a sheaf-theoretic construction of shape space -- the space of all shapes. We do this by describing a homotopy sheaf on the poset category of constructible sets, where each set is mapped to its Persistent Homology Transform (PHT). Recent results that build on fundamental work of Schapira have shown that this transform is injective, thus making the PHT a good summary object for each shape. Our homotopy sheaf result allows us to "glue" PHTs of different shapes together to build up the PHT of a larger shape. In the case where our shape is a polyhedron we prove a generalized nerve lemma for the PHT. Finally, by re-examining the sampling result of Smale-Niyogi-Weinberger, we show that we can reliably approximate the PHT of a manifold by a polyhedron up to arbitrary precision.
Category theory can be used to state formulas in First-Order Logic without using set membership. Several notable results in logic such as proof of the continuum hypothesis can be elegantly rewritten in category theory. We propose in this paper a reformulation of the usual set-theoretical semantics of the description logic $\mathcal{ALC}$ by using categorical language. In this setting, ALC concepts are represented as objects, concept subsumptions as arrows, and memberships as logical quantifiers over objects and arrows of categories. Such a category-theoretical semantics provides a more modular representation of the semantics of $\mathcal{ALC}$ and a new way to design algorithms for reasoning.
We provide a decision theoretic analysis of bandit experiments. The setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for bandit experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distribution of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit experiment with a PDE which can be efficiently solved using sparse matrix routines. We derive the optimal Bayes and minimax policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations to the bandit problem such as time discounting and pure exploration motives.
We study a class of enriched unfitted finite element or generalized finite element methods (GFEM) to solve a larger class of interface problems, that is, 1D elliptic interface problems with discontinuous solutions, including those having implicit or Robin-type interface jump conditions. The major challenge of GFEM development is to construct enrichment functions that capture the imposed discontinuity of the solution while keeping the condition number from fast growth. The linear stable generalized finite element method (SGFEM) was recently developed using one enrichment function. We generalized it to an arbitrary degree using two simple discontinuous one-sided enrichment functions. Optimal order convergence in the $L^2$ and broken $H^1$-norms are established. So is the optimal order convergence at all nodes. To prove the efficiency of the SGFEM, the enriched linear, quadratic, and cubic elements are applied to a multi-layer wall model for drug-eluting stents in which zero-flux jump conditions and implicit concentration interface conditions are both present.
We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural network). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.
There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.
We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'