We consider a binary supervised classification problem where, instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base-classifiers on $\mathcal{X}$, we build corresponding classifiers in the space of measures. We provide upper and lower bounds on the Rademacher complexity of this new class of classifiers that can be expressed simply in terms of corresponding quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform over a finite set, this classification task boils down to a multi-instance learning problem. However, our approach allows more flexibility and diversity in the input data we can deal with. While such a framework has many possible applications, this work places a strong emphasis on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We present several classifiers on measures and show how they can heuristically and theoretically enable good classification performance in various settings in the case of persistence diagrams.
The Lamport diagram is a pervasive and intuitive tool for informal reasoning about causality in a concurrent system. However, traditional axiomatic formalizations of Lamport diagrams can be painful to work with in a mechanized setting like Agda, whereas inductively-defined data would enjoy structural induction and automatic normalization. We propose an alternative, inductive formalization -- the causal separation diagram (CSD) -- that takes inspiration from string diagrams and concurrent separation logic. CSDs enjoy a graphical syntax similar to Lamport diagrams, and can be given compositional semantics in a variety of domains. We demonstrate the utility of CSDs by applying them to logical clocks -- widely-used mechanisms for reifying causal relationships as data -- yielding a generic proof of Lamport's clock condition that is parametric in a choice of clock. We instantiate this proof on Lamport's scalar clock, on Mattern's vector clock, and on the matrix clocks of Raynal et al. and of Wuu and Bernstein, yielding verified implementations of each. Our results and general framework are mechanized in the Agda proof assistant.
The generalization performance of deep learning models for medical image analysis often decreases on images collected with different acquisition devices, device settings, or patient populations. A better understanding of the generalization capacity on new images is crucial for clinicians' trust in deep learning. Although significant research efforts have recently been directed toward establishing generalization bounds and complexity measures, there is still often a significant discrepancy between the predicted and actual generalization performance. In addition, related large empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness-based and path norm-based measures produce the most consistent explanation for the combination of models and data. We also investigate the use of a multi-task classification and segmentation approach for breast images, and report that such a learning approach acts as an implicit regularizer and is conducive toward improved generalization.
We study Bayesian histograms for distribution estimation on $[0,1]^d$ under the Wasserstein distance $W_v$, $1 \leq v < \infty$, in the i.i.d. sampling regime. We show that when $d < 2v$, histograms possess a special \textit{memory efficiency} property, whereby, in reference to the sample size $n$, order $n^{d/(2v)}$ bins are needed to obtain minimax rate optimality. This result holds for the posterior mean histogram and with respect to posterior contraction, under the class of Borel probability measures and some classes of smooth densities. The attained memory footprint improves on existing minimax optimal procedures by a polynomial factor in $n$; for example, an $n^{1 - d/(2v)}$ factor reduction in the footprint when compared to the empirical measure, a minimax estimator in the Borel probability measure class. Additionally, constructing both the posterior mean histogram and the posterior itself can be done super-linearly in $n$. Due to the popularity of the $W_1, W_2$ metrics and the coverage provided by the $d < 2v$ case, our results are of most practical interest in the $(d=1, v=1,2)$, $(d=2, v=2)$, $(d=3, v=2)$ settings, and we provide simulations demonstrating the theory in several of these instances.
Moderate calibration, the expected event probability among observations with predicted probability $\pi$ being equal to $\pi$, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of clinical prediction models are mostly based on smoothing or grouping the data. In addition, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently developed methods, and propose novel ones, for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge, which enable joint inference on mean and moderate calibration, leading to a unified 'bridge' test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study, we consider a prediction model for short-term mortality after a heart attack. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters. We suggest graphical presentation of the partial sum curves and reporting the strength of evidence indicated by the proposed methods when examining model calibration.
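As a rough illustration of the "standardized partial sums of prediction errors" mentioned above, the following is a minimal sketch, not the authors' procedure. It assumes observations are ordered by predicted risk, that errors are scaled by the square root of the total Bernoulli variance $\sum_i \pi_i (1 - \pi_i)$, and that the time axis is the cumulative share of that variance; the abstract does not spell out these choices, and the function name `partial_sum_curve` is hypothetical.

```python
import math


def partial_sum_curve(probs, outcomes):
    """Return (times, values) of a standardized partial-sum curve.

    Assumption: observations are ordered by predicted probability, and both
    axes are scaled by the total variance sum(p * (1 - p)), so that the curve
    runs over [0, 1] in time.
    """
    # Stable sort by predicted probability only (ties keep input order).
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    total_var = sum(p * (1.0 - p) for p, _ in pairs)
    if total_var <= 0.0:
        raise ValueError("predicted probabilities must not all be 0 or 1")

    times, values = [0.0], [0.0]
    cum_err, cum_var = 0.0, 0.0
    for p, y in pairs:
        cum_err += y - p            # running sum of prediction errors
        cum_var += p * (1.0 - p)    # running sum of Bernoulli variances
        times.append(cum_var / total_var)
        values.append(cum_err / math.sqrt(total_var))
    return times, values


# Toy data: four observations, each with predicted risk 0.5.
times, values = partial_sum_curve([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A curve that ends near zero is consistent with calibration in the mean, while large excursions along the way suggest miscalibration at particular risk levels; turning these quantities into a formal test requires the limiting distributions discussed in the paper.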
This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector-based methods is that the third and higher eigenvectors are difficult to compute. We are therefore motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering instead. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity, while large $p$ biases towards clusters of small "extent," that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian-based methods.
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$ -- corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.
Model-driven software engineering is a suitable method for dealing with the ever-increasing complexity of software development processes. Graphs and graph transformations have proven useful for representing such models and changes to them. These models must satisfy certain sets of constraints; an example is the multiplicities of a class structure. During the development process, a change to a model may result in an inconsistent model that must at some point be repaired. This problem is called model repair. In particular, we consider rule-based graph repair, which is defined as follows: given a graph $G$, a constraint $c$ such that $G$ does not satisfy $c$, and a set of rules $\mathcal{R}$, use the rules of $\mathcal{R}$ to transform $G$ into a graph that satisfies $c$. Known notions of consistency have either viewed consistency as a binary property (a graph is either consistent w.r.t. a constraint $c$ or not) or only considered the number of violations of the first graph of a constraint. In this thesis, we introduce new notions of consistency, which we call consistency-maintaining and consistency-increasing transformations and rules, respectively. This is based on the possibility that a constraint can be satisfied up to a certain nesting level. We present constructions for direct consistency-maintaining or direct consistency-increasing application conditions, respectively. Finally, we present a rule-based graph repair approach that is able to repair so-called \emph{circular conflict-free constraints} and so-called circular conflict-free sets of constraints. Intuitively, a set of constraints $C$ is circular conflict-free if there is an ordering $c_1, \ldots, c_n$ of all constraints of $C$ such that there is no $j < i$ such that a repair of $c_i$ at all graphs satisfying $c_j$ leads to a graph not satisfying $c_j$.
Selection of a group of representatives satisfying certain fairness constraints is a commonly occurring scenario. Motivated by this, we initiate a systematic algorithmic study of a \emph{fair} version of \textsc{Hitting Set}. In the classical \textsc{Hitting Set} problem, the input is a universe $\mathcal{U}$, a family $\mathcal{F}$ of subsets of $\mathcal{U}$, and a non-negative integer $k$. The goal is to determine whether there exists a subset $S \subseteq \mathcal{U}$ of size $k$ that \emph{hits} (i.e., intersects) every set in $\mathcal{F}$. Inspired by several recent works, we formulate a fair version of this problem as follows. The input additionally contains a family $\mathcal{B}$ of subsets of $\mathcal{U}$, where each subset in $\mathcal{B}$ can be thought of as the group of elements of the same \emph{type}. We want to find a set $S \subseteq \mathcal{U}$ of size $k$ that (i) hits all sets of $\mathcal{F}$, and (ii) does not contain \emph{too many} elements of each type. We call this problem \textsc{Fair Hitting Set}, and chart out its tractability boundary from both the classical and the multivariate perspectives. Our results use a multitude of techniques from parameterized complexity, ranging from classical to advanced tools, such as methods of representative sets for matroids, FO model checking, and a generalization of the best known kernels for \textsc{Hitting Set}.
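To make the problem statement above concrete, here is a minimal brute-force sketch of a \textsc{Fair Hitting Set} check. It is an exponential-time illustration of the definition only, not one of the algorithms studied in the paper. The abstract leaves "too many" unspecified; the sketch assumes a hypothetical per-type upper bound `caps[i]` for each group in $\mathcal{B}$, and the function name `fair_hitting_set` is likewise an assumption.

```python
from itertools import combinations


def fair_hitting_set(universe, family, groups, caps, k):
    """Return a size-k set hitting every set in `family` that takes at most
    caps[i] elements from groups[i], or None if no such set exists.

    Assumption: fairness is modelled as a per-group upper bound (`caps`).
    """
    for candidate in combinations(sorted(universe), k):
        chosen = set(candidate)
        # (i) every set in the family must intersect the chosen set
        hits_all = all(chosen & s for s in family)
        # (ii) no group contributes more than its cap
        is_fair = all(len(chosen & g) <= cap for g, cap in zip(groups, caps))
        if hits_all and is_fair:
            return chosen
    return None


universe = {1, 2, 3, 4}
family = [{1, 2}, {2, 3}, {3, 4}]
groups = [{1, 2}, {3, 4}]          # two element types
solution = fair_hitting_set(universe, family, groups, caps=[1, 1], k=2)
```

Tightening the caps can make an instance infeasible even when an ordinary hitting set of size $k$ exists, which is the interaction between the hitting and fairness constraints that the problem captures.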
We present a compositional model checking algorithm for Markov decision processes, in which they are composed in the categorical graphical language of string diagrams. The algorithm computes optimal expected rewards. Our theoretical development of the algorithm is supported by category theory, while what we call decomposition equalities for expected rewards act as a key enabler. Experimental evaluation demonstrates its performance advantages.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
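The formula $(1-\beta^{n})/(1-\beta)$ above can be turned into per-class loss weights in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes each class weight is inversely proportional to its effective number of samples and that the weights are normalized to sum to the number of classes. The function names and the example class counts are hypothetical.

```python
def effective_number(n, beta):
    """Effective number of samples, (1 - beta**n) / (1 - beta)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts, beta):
    """Per-class weights proportional to 1 / effective_number(n, beta).

    Assumptions: inverse-proportional weighting, every count is at least 1,
    and weights are rescaled to sum to the number of classes.
    """
    inverse = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(inverse)
    return [w * scale for w in inverse]


# Hypothetical long-tailed class counts: head, middle, and tail class.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts, beta=0.999)
```

With $\beta = 0$ every class has effective number 1, so the weights are uniform and no re-weighting takes place; as $\beta$ approaches 1, the effective number approaches $n$ and the scheme approaches weighting by inverse class frequency.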