This paper solves the continuous classification problem for finite clouds of unlabelled points under Euclidean isometry. Lipschitz continuity of the required invariants, in a suitable metric and under perturbations of points, is motivated by the inevitable noise in measurements of real objects. The best solved case of this isometry classification is the SSS theorem of school geometry, which says that any triangle is determined up to congruence (isometry of the plane) by the continuous complete invariant of its three side lengths. However, there is no easy extension of the SSS theorem even to four points in the plane, partly because of a 4-parameter family of 4-point clouds that share the same six pairwise distances. The computational time of most past isometry-invariant metrics was exponential in the size of the input. The final obstacle was the discontinuity of previous invariants at singular configurations, for example, when a triangle degenerates to a straight line. All the challenges above are now resolved by the Simplexwise Centred Distributions, which combine the inter-point distances of a given cloud with the new strength of a simplex, finally guaranteeing Lipschitz continuity. The computational times of the new invariants and metrics are polynomial in the number of points for a fixed Euclidean dimension.
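To fix ideas on the SSS case, the following toy sketch (not the paper's Simplexwise Centred Distributions) checks that the sorted list of pairwise distances is an isometry invariant; it is complete for triangles but, as noted above, fails to be complete for four or more points.

\begin{verbatim}
import numpy as np
from itertools import combinations

def sorted_pairwise_distances(cloud):
    # isometry invariant: complete for 3 points, incomplete for 4+
    return sorted(np.linalg.norm(p - q) for p, q in combinations(cloud, 2))

tri = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = tri @ R.T + np.array([5.0, -2.0])   # rotate and translate

print(np.allclose(sorted_pairwise_distances(tri),
                  sorted_pairwise_distances(moved)))   # True
\end{verbatim}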
The model of {\em population protocols} provides a universal platform for studying distributed processes driven by random pairwise interactions of anonymous agents. The {\em time complexity} of a population protocol refers to the number of interactions required to reach a final configuration. More recently, the focus has shifted to the {\em parallel time}, defined as the time complexity divided by $n$, where a given protocol is {\em efficient} if it stabilises in parallel time $O(\mbox{poly}\log n)$. Among the computational deficiencies of such protocols are the depleting fraction of {\em meaningful interactions} as the final stabilisation approaches (suppressing parallel efficiency), the computational power of constant-space population protocols being limited to semi-linear predicates in Presburger arithmetic (reflecting time-space trade-offs), and indefinite computation (impacting multi-stage protocols). With these deficiencies in mind, we propose a new {\em selective} variant of population protocols that imposes an elementary structure on the state space, together with a conditional probabilistic choice during the selection of random interacting pairs. We show that such protocols can compute functions more complex than semi-linear predicates, i.e., beyond Presburger arithmetic. We provide the first non-trivial study of median computation (in population protocols) in a comparison model where the operational state space of agents is fixed and the transition function decides the order between (potentially large) hidden keys associated with the interacting agents. We show that computing the median of $n$ numbers requires $\Omega(n)$ parallel time, and that the problem can be solved in $O(n\log n)$ parallel time in expectation and with high probability (whp) in standard population protocols. Finally, we show $O(\log^4 n)$ parallel time median computation in selective population protocols.
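To make the model concrete, this minimal sketch simulates the textbook leader-election rule $(L, L) \to (L, F)$ (a standard example, not one of the selective protocols introduced here) and reports its parallel time, i.e., interactions divided by $n$.

\begin{verbatim}
import random

def leader_election(n, seed=0):
    # anonymous agents interact in uniformly random pairs
    rng = random.Random(seed)
    states = ["L"] * n
    leaders, interactions = n, 0
    while leaders > 1:
        i, j = rng.sample(range(n), 2)   # random interacting pair
        if states[i] == "L" and states[j] == "L":
            states[j] = "F"              # responder becomes a follower
            leaders -= 1
        interactions += 1
    return interactions / n              # parallel time

print(leader_election(1000))   # Theta(n) parallel time for this naive rule
\end{verbatim}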
In this paper we study dually flat spaces arising from Delzant polytopes equipped with a symplectic potential, together with the corresponding toric K\"ahler manifold as its torification. We introduce a dually flat structure and the associated Bregman divergence on the boundary from the viewpoint of toric K\"ahler geometry. We show continuity and an extended Pythagorean theorem for the divergence on the boundary. We also provide a characterization of when a toric K\"ahler manifold is a torification of a mixture family on a finite set.
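For orientation, the Bregman divergence of a smooth, strictly convex potential $\varphi$ on a convex domain is the standard quantity
\[
  D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle ,
\]
and the contribution above concerns its extension to, and behaviour on, the boundary of the polytope.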
Computing the distribution of trajectories from a Gaussian Process model of a dynamical system is an important challenge in utilizing such models. Motivated by the computational cost of sampling-based approaches, we consider approximations of the model's output and trajectory distribution. We show that previous work on uncertainty propagation, focussed on discrete state-space models, incorrectly assumed independence between subsequent states of the predicted trajectories. Extending these ideas to continuous ordinary differential equation models, we illustrate the implications of this assumption and propose a novel piecewise linear approximation of Gaussian Processes to mitigate them.
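The pitfall can be seen already on a linear-Gaussian toy system standing in for a GP one-step model (a sketch under that simplification, not the paper's setting): the per-step marginals can be exactly right while independent resampling from them destroys the correlation between successive states.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
a, q, T, n = 0.9, 0.1, 30, 5000          # x_{t+1} = a x_t + N(0, q)

# (a) true Markov rollout: sample x_{t+1} conditionally on x_t
x = np.ones(n)
joint = [x.copy()]
for _ in range(T):
    x = a * x + rng.normal(0, np.sqrt(q), n)
    joint.append(x.copy())
joint = np.stack(joint)

# (b) independent resampling from the (correct) per-step marginals
m, v = np.ones(T + 1), np.zeros(T + 1)
for t in range(T):
    m[t + 1], v[t + 1] = a * m[t], a**2 * v[t] + q
indep = rng.normal(m[:, None], np.sqrt(v)[:, None], (T + 1, n))

corr = lambda z: np.corrcoef(z[-1], z[-2])[0, 1]
print(f"consecutive-state correlation: joint={corr(joint):.2f}, "
      f"independent={corr(indep):.2f}")   # ~0.9 vs ~0.0
\end{verbatim}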
The present paper focuses on the problem of sampling from a given target distribution $\pi$ defined on some general state space. To this end, we introduce a novel class of non-reversible Markov chains, each chain being defined on an extended state space and having an invariant probability measure admitting $\pi$ as a marginal distribution. The proposed methodology is inspired by a new formulation of Kac's theorem and allows global and local dynamics to be smoothly combined. Under mild conditions, the corresponding Markov transition kernel can be shown to be irreducible and Harris recurrent. In addition, we establish that geometric ergodicity holds under appropriate conditions on the global and local dynamics. Finally, we illustrate numerically the use of the proposed method and its potential benefits in comparison to existing Markov chain Monte Carlo (MCMC) algorithms.
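A classical instance of this extended-state-space pattern is Gustafson's guided walk, shown here only to fix ideas (it is not the construction proposed above): the state is augmented with a direction $d \in \{-1, +1\}$ that flips on rejection, yielding a non-reversible chain whose $x$-marginal is still $\pi$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x**2           # standard normal target

x, d, samples = 0.0, 1, []
for _ in range(100_000):
    y = x + d * 0.5 * rng.exponential()  # always propose in direction d
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        x = y                            # accept: keep moving the same way
    else:
        d = -d                           # reject: reverse direction
    samples.append(x)
print(np.mean(samples), np.var(samples)) # approximately 0 and 1
\end{verbatim}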
Given the increasing interest in interpretable machine learning, classification trees have again attracted the attention of the scientific community because of their glass-box structure. These models are usually built using greedy procedures that solve subproblems to find cuts in the feature space minimizing some impurity measure. In contrast to this standard greedy approach, and to the recent advances in the definition of the learning problem through MILP-based exact formulations, in this paper we propose a novel evolutionary algorithm for the induction of classification trees; it exploits a memetic approach able to handle datasets with thousands of points. Our procedure combines the exploration of the feasible space of solutions with local searches to obtain structures whose generalization capabilities are competitive with state-of-the-art methods.
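The memetic template (evolutionary operators interleaved with local search) can be sketched on a toy search space; here the genomes are scikit-learn tree hyperparameters, whereas the actual algorithm evolves full tree structures, so every interface below is illustrative only.

\begin{verbatim}
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(genome, X, y):
    depth, leaf = genome
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

def local_search(genome, X, y):
    # hill-climb each coordinate once (the "memetic" refinement step)
    best, best_fit = genome, fitness(genome, X, y)
    for cand in [(genome[0] + s, genome[1]) for s in (-1, 1)] + \
                [(genome[0], genome[1] + s) for s in (-1, 1)]:
        if min(cand) >= 1 and fitness(cand, X, y) > best_fit:
            best, best_fit = cand, fitness(cand, X, y)
    return best

random.seed(0)
X, y = make_classification(n_samples=300, random_state=0)
pop = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(8)]
for _ in range(10):
    p1, p2 = random.sample(pop, 2)
    child = (p1[0], p2[1])                                   # crossover
    child = (max(1, child[0] + random.choice((-1, 0, 1))),   # mutation
             child[1])
    child = local_search(child, X, y)
    worst = min(pop, key=lambda g: fitness(g, X, y))
    if fitness(child, X, y) > fitness(worst, X, y):
        pop[pop.index(worst)] = child
print(max(pop, key=lambda g: fitness(g, X, y)))
\end{verbatim}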
Deep learning methods have shown outstanding classification accuracy in medical imaging problems, which is largely attributed to the availability of large-scale datasets manually annotated with clean labels. However, given the high cost of such manual annotation, new medical imaging classification problems may need to rely on machine-generated noisy labels extracted from radiology reports. Indeed, many Chest X-ray (CXR) classifiers have already been modelled from datasets with noisy labels, but their training procedure is in general not robust to noisy-label samples, leading to sub-optimal models. Furthermore, CXR datasets are mostly multi-label, so current noisy-label learning methods designed for multi-class problems cannot be easily adapted. In this paper, we propose a new method for noisy multi-label CXR learning that detects and smoothly re-labels samples in the dataset, which is then used to train common multi-label classifiers. The proposed method optimises a bag of multi-label descriptors (BoMD) to promote their similarity with the semantic descriptors produced by BERT models from the multi-label image annotation. Our experiments on diverse noisy multi-label training sets and clean testing sets show that our model has state-of-the-art accuracy and robustness in many CXR multi-label classification benchmarks.
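The similarity-scoring idea can be sketched as follows (a schematic of our reading, not the paper's exact BoMD objective; synthetic vectors stand in for real descriptors and BERT embeddings): samples whose descriptor bags align poorly with the embeddings of their assigned labels are flagged for re-labelling.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
D, K, L = 64, 4, 14                  # descriptor dim, bag size, label count
bag = rng.normal(size=(K, D))        # one image's bag of descriptors
label_emb = rng.normal(size=(L, D))  # semantic label embeddings (e.g. BERT)
assigned = np.zeros(L, dtype=bool)
assigned[[0, 2]] = True              # the image's (possibly noisy) labels

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# for each assigned label, take its best-matching descriptor in the bag;
# a low average agreement flags the sample as a re-labelling candidate
scores = [max(cosine(d, e) for d in bag) for e in label_emb[assigned]]
print(f"bag-label agreement: {np.mean(scores):.3f}")
\end{verbatim}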
The Straightness is a measure designed to characterize a pair of vertices in a spatial graph. It is defined as the ratio of the Euclidean distance to the graph distance between these vertices. It is often used as an average, for instance to describe the accessibility of a single vertex relative to all the other vertices in the graph, or even to summarize the graph as a whole. In some cases, one needs to compute the Straightness not only between vertices, but also between any other points constituting the graph of interest. Suppose, for instance, that our graph represents a road network and we do not want to limit ourselves to crossroad-to-crossroad itineraries, but allow any street number to be a starting point or destination. In this situation, the standard approach consists in: 1) discretizing the graph edges, 2) computing the vertex-to-vertex Straightness considering the additional vertices resulting from this discretization, and 3) performing the appropriate average over the obtained values. However, this discrete approximation can be computationally expensive on large graphs, and its precision has not been clearly assessed. In this article, we adopt a continuous approach to averaging the Straightness over the edges of spatial graphs. This allows us to derive five distinct measures that precisely characterize the accessibility of the whole graph, as well as of individual vertices and edges. Our method is generic and could be applied to other measures designed for spatial graphs. We perform an experimental evaluation of our continuous average Straightness measures, and show how they behave differently from the traditional vertex-to-vertex ones. Moreover, we also study their discrete approximations, and show that our approach is globally less demanding in terms of both processing time and memory usage. Our R source code is publicly available under an open source license.
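The vertex-to-vertex definition is straightforward to compute; here is a minimal sketch on a toy spatial graph (in Python with networkx, purely for illustration; the authors' own implementation is in R).

\begin{verbatim}
import math
import networkx as nx

# toy spatial graph: three vertices with planar coordinates
G = nx.Graph()
pos = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}
G.add_edges_from([("a", "b"), ("b", "c")])
for u, v in G.edges:
    G[u][v]["length"] = math.dist(pos[u], pos[v])

def straightness(G, pos, u, v):
    # Euclidean distance over graph (shortest-path) distance
    d_euc = math.dist(pos[u], pos[v])
    d_graph = nx.shortest_path_length(G, u, v, weight="length")
    return d_euc / d_graph

print(straightness(G, pos, "a", "c"))  # sqrt(2)/2 ~ 0.707: the route detours
\end{verbatim}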
Neural diffusion on graphs is a novel class of graph neural networks that has recently attracted increasing attention. The capability of graph neural partial differential equations (PDEs) to address common hurdles of graph neural networks (GNNs), such as over-smoothing and bottlenecks, has been investigated, but not their robustness to adversarial attacks. In this work, we explore the robustness properties of graph neural PDEs. We empirically demonstrate that graph neural PDEs are intrinsically more robust to topology perturbations than other GNNs. We provide insights into this phenomenon by exploiting the stability of the heat semigroup under graph topology perturbations. We discuss various graph diffusion operators and relate them to existing graph neural PDEs. Furthermore, we propose a general graph neural PDE framework based on which a new class of robust GNNs can be defined. We verify that the new model achieves performance comparable to the state of the art on several benchmark datasets.
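The prototypical graph neural PDE is heat diffusion: features evolve by $\dot{X} = -LX$ with $L$ the graph Laplacian, so $X(t) = e^{-tL} X(0)$, and the contractivity of the heat semigroup $e^{-tL}$ is the stability property the robustness argument builds on. A minimal numerical sketch (toy graph, illustrative only):

\begin{verbatim}
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy 4-node adjacency
L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian
X0 = np.eye(4)[:, :2]                       # initial node features

for t in (0.0, 0.5, 2.0):
    Xt = expm(-t * L) @ X0                  # diffused features at time t
    print(t, np.round(Xt, 3))               # features smooth out as t grows
\end{verbatim}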
The Importance Markov chain is a novel algorithm bridging the gap between rejection sampling and importance sampling, moving from one to the other through a tuning parameter. Based on a modified sample of an instrumental Markov chain targeting an instrumental distribution (typically via an MCMC kernel), the Importance Markov chain produces an extended Markov chain in which the marginal distribution of the first component converges to the target distribution. For example, when targeting a multimodal distribution, the instrumental distribution can be chosen as a tempered version of the target, which allows the algorithm to explore its modes more efficiently. We obtain a Law of Large Numbers and a Central Limit Theorem, as well as geometric ergodicity for this extended kernel, under mild assumptions on the instrumental kernel. Computationally, the algorithm is easy to implement and pre-existing libraries can be used to sample from the instrumental distribution.
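One natural way to realise the description above (our reading, not necessarily the paper's exact scheme) is to replicate each state of an instrumental chain targeting $\rho$ a random number of times with mean proportional to the importance weight $\pi/\rho$, so that the output stream has $\pi$ as its limiting marginal.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
log_pi  = lambda x: -0.5 * x**2          # target: N(0, 1)
log_rho = lambda x: -0.5 * (x / 2)**2    # instrumental: N(0, 4), flatter

# instrumental random-walk Metropolis chain targeting rho
xs, x = [], 0.0
for _ in range(50_000):
    y = x + rng.normal()
    if np.log(rng.uniform()) < log_rho(y) - log_rho(x):
        x = y
    xs.append(x)

# replicate each instrumental state with mean proportional to pi/rho
kappa, out = 2.0, []
for s in xs:
    n = rng.poisson(kappa * np.exp(log_pi(s) - log_rho(s)))
    out.extend([s] * n)
print(np.mean(out), np.var(out))         # approximately 0 and 1
\end{verbatim}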
Generative models have advantageous characteristics for classification tasks, such as the availability of unsupervised data and calibrated confidence, whereas discriminative models have advantages in the simplicity of their model structures and learning algorithms, and in that they often outperform their generative counterparts. In this paper, we propose a method to train a hybrid of discriminative and generative models in a single neural network (NN), which exhibits the characteristics of both. The key idea is the Gaussian-coupled softmax layer, a fully connected layer with a softmax activation function coupled with Gaussian distributions. This layer can be embedded into an NN-based classifier and allows the classifier to estimate both the class posterior distribution and the class-conditional data distribution. We demonstrate that the proposed hybrid model can be applied to semi-supervised learning and confidence calibration.
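A sketch of the idea as we read it (details differ from the paper): a softmax layer whose logits are Gaussian class-conditional log-likelihoods plus log-priors, so the same parameters define both the class posterior $p(y \mid z)$ and a generative model $p(z \mid y)$ over the layer's inputs.

\begin{verbatim}
import torch
import torch.nn as nn

class GaussianCoupledSoftmax(nn.Module):
    """Softmax over Gaussian class-conditional log-likelihoods plus
    log-priors (shared isotropic covariance; illustrative only)."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_classes, dim))
        self.log_sigma = nn.Parameter(torch.zeros(()))
        self.log_prior = nn.Parameter(torch.zeros(num_classes))

    def forward(self, z):
        sq = ((z.unsqueeze(1) - self.mu) ** 2).sum(-1)        # (B, C)
        loglik = -0.5 * sq / self.log_sigma.exp() ** 2 \
                 - z.shape[-1] * self.log_sigma               # log N(z; mu_c)
        return (loglik + self.log_prior).log_softmax(-1)      # log p(y | z)

layer = GaussianCoupledSoftmax(dim=16, num_classes=3)
print(layer(torch.randn(4, 16)).exp().sum(-1))  # posterior rows sum to 1
\end{verbatim}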