Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of the results of black-box machine learning (ML) algorithms. There are at least three challenges with conducting statistical inference based on ALE: ensuring the reliability of ALE analyses, especially with small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analyses. In response, we introduce innovative tools and techniques for statistical inference using ALE, establishing bootstrapped confidence intervals tailored to dataset size and introducing ALE effect size measures that intuitively indicate effects on both the scale of the outcome variable and a normalized scale. Furthermore, we demonstrate how to use these tools to draw reliable statistical inferences that reflect the flexible patterns ALE adeptly highlights, with implementations available in the 'ale' package in R. This work advances the discourse on ALE and its applicability in ML and statistical analysis, offering practical solutions to prevailing challenges in the field.
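To make the bootstrapped-interval idea concrete, here is a minimal sketch of first-order ALE with percentile bootstrap bands. It is a language-neutral illustration in Python rather than the paper's 'ale' R package; `ale_1d`, `bootstrap_ale`, and the `predict_factory` refit hook are hypothetical names, and a full refit per resample is only one of the bootstrap variants an implementation might tailor to dataset size.

```python
# Minimal sketch: first-order ALE for one numeric feature plus a
# percentile bootstrap band. Illustrative only -- not the 'ale' R package.
import numpy as np

def ale_1d(predict, X, j, bins=20):
    """Centered accumulated local effect of feature j on predict(X)."""
    x = X[:, j]
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    effects = np.zeros(bins)
    for k in range(bins):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, j], hi[:, j] = edges[k], edges[k + 1]
        # local effect: mean prediction change across the bin
        effects[k] = np.mean(predict(hi) - predict(lo))
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    return edges, ale - ale.mean()  # unweighted centering, for brevity

def bootstrap_ale(predict_factory, X, y, j, n_boot=100, alpha=0.05, seed=0):
    """Refit on resampled rows; return pointwise percentile bands."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_boot):
        rows = rng.integers(0, len(X), len(X))
        predict = predict_factory(X[rows], y[rows])  # hypothetical refit hook
        curves.append(ale_1d(predict, X, j)[1])
    curves = np.asarray(curves)
    return (np.quantile(curves, alpha / 2, axis=0),
            np.quantile(curves, 1 - alpha / 2, axis=0))
```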
In this work we present denoiSplit, a method that tackles a new analysis task: joint semantic image splitting and unsupervised denoising. This dual approach has important applications in fluorescence microscopy, where semantic image splitting is valuable but noise generally hinders the downstream analysis of image content. Image splitting involves dissecting an image into its distinguishable semantic structures. We show that the current state-of-the-art method for this task struggles in the presence of image noise, inadvertently also distributing the noise across the predicted outputs. The method we present here deals with image noise by integrating an unsupervised denoising sub-task. This integration results in improved semantic image unmixing, even in the presence of notable and realistic levels of imaging noise. A key innovation in denoiSplit is the use of specifically formulated noise models and a suitable adjustment of the KL-divergence loss for the high-dimensional hierarchical latent space we train. We showcase the performance of denoiSplit on four tasks on real-world microscopy images. Additionally, we perform qualitative and quantitative evaluations and compare results to existing benchmarks, demonstrating the effectiveness of denoiSplit: a single Variational Splitting Encoder-Decoder (VSE) network that uses two suitable noise models to jointly perform semantic splitting and denoising.
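As a rough illustration of how a splitting objective can absorb a denoising sub-task, the snippet below assembles a loss in which each predicted channel is scored under its own noise model and the KL term is averaged per latent dimension. Fixed-variance Gaussians stand in for denoiSplit's specifically formulated noise models, and all names are illustrative; this is a schematic sketch, not the paper's loss.

```python
# Schematic loss for a variational splitting network: two noise models score
# the two unmixed channels; the KL term is normalized per latent dimension.
# Gaussian surrogates replace the measured noise models used in practice.
import torch

def splitting_loss(pred1, pred2, noisy1, noisy2, mu, logvar,
                   sigma1=1.0, sigma2=1.0, beta=1.0):
    # noise-model negative log-likelihoods (fixed-variance Gaussian stand-ins)
    nll1 = 0.5 * ((pred1 - noisy1) ** 2 / sigma1 ** 2).mean()
    nll2 = 0.5 * ((pred2 - noisy2) ** 2 / sigma2 ** 2).mean()
    # KL(q || N(0, I)), averaged (not summed) over the latent dimensions so
    # its weight stays comparable as the latent hierarchy grows
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()
    return nll1 + nll2 + beta * kl
```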
This work presents a maturity model for assessing catalogues of semantic artefacts, one of the keystones that permit semantic interoperability between systems. We defined the dimensions and related features to include in the maturity model by analysing the current literature and existing expert-provided catalogues of semantic artefacts. In addition, we assessed 26 different catalogues to demonstrate the effectiveness of the maturity model, which includes 12 dimensions (Metadata, Openness, Quality, Availability, Statistics, PID, Governance, Community, Sustainability, Technology, Transparency, and Assessment) and 43 related features (or sub-criteria) associated with these dimensions. This maturity model is one of the first attempts to provide recommendations on governance and processes for preserving and maintaining semantic artefacts, and it helps assess and address interoperability challenges.
In the search for highly efficient decoders for short LDPC codes approaching maximum-likelihood performance, a relayed decoding strategy, which activates ordered statistics decoding upon failure of a neural min-sum decoder, is enhanced with three innovations. First, soft information gathered at each step of the neural min-sum decoder is leveraged to forge a new reliability measure using a convolutional neural network. This measure aids in constructing the most reliable basis of ordered statistics decoding, bolstering the decoding process by excluding error-prone bits or concentrating them in a smaller area. Second, an adaptive ordered statistics decoding process is introduced, guided by a derived decoding path comprising prioritized blocks, each containing distinct test error patterns; the priority of these blocks is determined from statistical data gathered during the query phase. Furthermore, effective complexity-management methods are devised by adjusting the decoding path's length or refining constraints on the involved blocks. Third, a simple auxiliary criterion is introduced to reduce computational complexity by minimizing the number of candidate codewords before selecting the optimal estimate. Extensive experimental results and complexity analysis strongly support the proposed framework, demonstrating its advantages in high throughput, low complexity, and independence from noise variance, in addition to superior decoding performance.
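For orientation, the snippet below sketches a textbook order-p ordered statistics decoder: generator columns are ranked by reliability, a most reliable basis is extracted by Gaussian elimination over GF(2), and candidates are re-encoded from low-weight test error patterns. The CNN-derived reliability measure, prioritized decoding path, and auxiliary stopping criterion described above are deliberately not reproduced; plain |LLR| ranking stands in for them.

```python
# Generic order-p OSD over GF(2); a baseline for the relayed stage only.
# Reliability here is plain |LLR|, standing in for the CNN-derived measure.
import numpy as np
from itertools import combinations

def osd(G, llr, order=1):
    k, n = G.shape
    perm = np.argsort(-np.abs(llr))           # most reliable positions first
    Gp = G[:, perm] % 2
    basis, row = [], 0                        # GF(2) elimination for the MRB
    for col in range(n):
        piv = next((r for r in range(row, k) if Gp[r, col]), None)
        if piv is None:
            continue
        Gp[[row, piv]] = Gp[[piv, row]]
        for r in range(k):
            if r != row and Gp[r, col]:
                Gp[r] ^= Gp[row]
        basis.append(col)
        row += 1
        if row == k:
            break
    hard = (llr[perm] < 0).astype(int)
    best, best_metric = None, np.inf
    for weight in range(order + 1):           # test error patterns by weight
        for flips in combinations(range(k), weight):
            msg = hard[basis].copy()
            msg[list(flips)] ^= 1
            cand = msg @ Gp % 2               # re-encode the candidate
            metric = np.sum(np.abs(llr[perm]) * (cand != hard))
            if metric < best_metric:
                best, best_metric = cand, metric
    out = np.empty_like(best)
    out[perm] = best                          # undo the reliability sort
    return out
```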
We prove the uniform convergence of the geometric multigrid V-cycle for hybrid high-order (HHO) and other discontinuous skeletal methods. Our results generalize previously established results for HDG methods, and our multigrid method uses standard smoothers and local solvers that are bounded, convergent, and consistent. We use a weak version of elliptic regularity in our proofs. Numerical experiments confirm our theoretical results.
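As a reminder of the algorithmic structure being analyzed, a geometric V-cycle recurses as sketched below. This is a generic skeleton with a damped Jacobi smoother and caller-supplied coarse operators and prolongations; the HHO-specific injection operators and bounded local solvers from the paper are not shown.

```python
# Generic multigrid V-cycle skeleton with a damped Jacobi smoother.
# levels: list of (A_coarse, P) pairs, finest to coarsest, where P
# prolongates from the next coarser level; an empty list => direct solve.
import numpy as np

def v_cycle(A, b, x, levels, nu=2, omega=0.6):
    if not levels:                        # coarsest level: solve exactly
        return np.linalg.solve(A, b)
    D = np.diag(A)
    for _ in range(nu):                   # pre-smoothing
        x = x + omega * (b - A @ x) / D
    A_c, P = levels[0]
    r_c = P.T @ (b - A @ x)               # restrict the residual
    e_c = v_cycle(A_c, r_c, np.zeros_like(r_c), levels[1:], nu, omega)
    x = x + P @ e_c                       # coarse-grid correction
    for _ in range(nu):                   # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x
```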
We explore the Ziv-Lempel and Crochemore factorizations of some classical automatic sequences, making extensive use of the theorem prover Walnut.
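To make the object of study concrete, the brute-force routine below computes the Crochemore (c-)factorization directly from its definition (each factor is either a fresh letter or the longest factor with an earlier, possibly overlapping, occurrence), applied here to a prefix of the Thue-Morse sequence. The paper derives such factorizations symbolically with Walnut; this sketch is only a naive reference computation.

```python
# Naive Crochemore factorization of a Thue-Morse prefix (reference only).
def thue_morse(n):
    return [bin(i).count("1") % 2 for i in range(n)]

def crochemore(s):
    factors, i = [], 0
    while i < len(s):
        l = 0
        # extend while s[i:i+l+1] also occurs starting at some j < i
        # (overlapping occurrences are allowed)
        while i + l < len(s) and any(
                s[j:j + l + 1] == s[i:i + l + 1] for j in range(i)):
            l += 1
        l = max(l, 1)                     # fresh letter: factor of length 1
        factors.append("".join(map(str, s[i:i + l])))
        i += l
    return factors

print(crochemore(thue_morse(16)))         # factorization of the length-16 prefix
```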
Segmentation of fetal brain tissue from magnetic resonance imaging (MRI) plays a crucial role in the study of in utero neurodevelopment. However, automated tools face substantial domain-shift challenges, as they must be robust to highly heterogeneous clinical data that are often limited in number and lacking annotations. Indeed, the high variability of fetal brain morphology, MRI acquisition parameters, and super-resolution reconstruction (SR) algorithms adversely affects a model's performance when evaluated out-of-domain. In this work, we introduce FetalSynthSeg, a domain randomization method for segmenting fetal brain MRI, inspired by SynthSeg. Our results show that models trained solely on synthetic data outperform models trained on real data in out-of-domain settings, validated on a 120-subject cross-domain dataset. Furthermore, we extend our evaluation to 40 subjects acquired using low-field (0.55T) MRI and reconstructed with novel SR models, showcasing robustness across different magnetic field strengths and SR algorithms. Leveraging a generative synthetic approach, we tackle the domain-shift problem in fetal brain MRI and offer compelling prospects for applications in fields with limited and highly heterogeneous data.
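The core of such a generative synthetic approach can be sketched compactly: intensities are drawn per label from randomized Gaussians, then corrupted with a random bias field, blur, and noise, so that no fixed acquisition contrast is ever seen during training. The snippet below is a simplified illustration in the spirit of SynthSeg-style generators, not the FetalSynthSeg pipeline itself; the parameter ranges and the name `synth_image` are placeholders.

```python
# Simplified SynthSeg-style image synthesis from a label map (illustrative).
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def synth_image(labels, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    img = np.zeros(labels.shape, dtype=float)
    for lab in np.unique(labels):            # random contrast per tissue label
        mask = labels == lab
        img[mask] = rng.normal(rng.uniform(0, 1), rng.uniform(0.01, 0.1),
                               mask.sum())
    # smooth multiplicative bias field, upsampled from low-resolution noise
    coarse = rng.normal(1.0, 0.3,
                        tuple(np.maximum(1, np.array(labels.shape) // 16)))
    bias = zoom(coarse, np.array(labels.shape) / np.array(coarse.shape), order=1)
    img *= np.clip(bias, 0.5, 1.5)
    img = gaussian_filter(img, sigma=rng.uniform(0, 1.5))   # random blur
    img += rng.normal(0, rng.uniform(0, 0.05), img.shape)   # random noise
    return np.clip(img, 0, None)
```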
Numerous Deep Learning (DL) models have been developed for a large spectrum of medical image analysis applications, promising to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, some fundamental questions remain: are DL models capable of generalizing? What causes a drop in DL model performance? How can such performance drops be overcome? Medical data are dynamic and prone to domain shift: factors such as updates to medical equipment, new imaging workflows, and shifts in patient demographics or populations can induce this drift over time. In this paper, we review recent developments in generalization methods for DL-based classification models. We also discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments for achieving robust, generalized models for medical image classification.
On the Boolean domain, there is a class of symmetric signatures called ``Fibonacci gates'' for which a beautiful polynomial-time combinatorial algorithm has been designed for the corresponding $\operatorname{Holant}^*$ problems. In this work, we give a combinatorial view of $\operatorname{Holant}^*(\mathcal{F})$ problems on a domain of size 3, where $\mathcal{F}$ is a set of arity-3 functions whose inputs take values on the domain of size 3 and which share some common properties. The combinatorial view can also be extended to the domain of size 4. Specifically, we extend the definition of Fibonacci gates to domains of size 3 and 4. Moreover, we give the corresponding combinatorial algorithms.
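For orientation, recall the Boolean-domain notion being generalized (the paper's own size-3 and size-4 definitions are not reproduced here). A symmetric signature $f = [f_0, f_1, \dots, f_n]$, listing outputs by the Hamming weight of the input, is a (generalized) Fibonacci gate when its entries satisfy, for some fixed constant $c$,

```latex
f_{k+2} = c\, f_{k+1} + f_k, \qquad 0 \le k \le n - 2,
```

with the classical case $c = 1$ giving the Fibonacci recurrence from which the gates take their name.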
This article presents MuMFiM, an open-source application for multiscale modeling of fibrous materials on massively parallel computers. MuMFiM uses two scales to represent fibrous materials such as biological network materials (extracellular matrix, connective tissue, etc.). It is designed to exploit multiple levels of parallelism, including distributed parallelism across the macro- and microscales as well as GPU-accelerated data parallelism at the microscale. Scaling results of the GPU-accelerated microscale show that solving microscale problems concurrently on the GPU can lead to a 1000x speedup over the solution of a single RVE on the GPU. In addition, we show nearly optimal strong and weak scaling of MuMFiM on up to 128 nodes of AiMOS (Rensselaer Polytechnic Institute), which is composed of IBM AC922 nodes, each with 6 NVIDIA Volta V100 GPUs and 2 20-core Power 9 CPUs. We also show how MuMFiM can be used to solve problems of interest to the broader engineering community, in particular providing an example of the facet capsule ligament (FCL) of the human spine undergoing uniaxial extension.
In Coevolving Latent Space Networks with Attractors (CLSNA) models, nodes in a latent space represent social actors, and edges indicate their dynamic interactions. Attractors are added at the latent level to capture the notion of attractive and repulsive forces between nodes, borrowing from dynamical systems theory. However, the reliance of CLSNA models on MCMC estimation makes scaling difficult, and the requirement that nodes be present throughout the study period limits practical applications. We address these issues by (i) introducing a stochastic gradient descent (SGD) parameter estimation method, (ii) developing a novel approach for uncertainty quantification using SGD, and (iii) extending the model to allow nodes to join and leave over time. Simulation results show that our extensions incur little loss of accuracy compared to MCMC but scale to much larger networks. We apply our approach to the longitudinal social networks of members of the US Congress on the social media platform X. Accounting for node dynamics overcomes selection bias in the network and uncovers uniquely and increasingly repulsive forces within the Republican Party.
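To indicate why SGD makes such models scalable, the toy fit below optimizes dynamic latent positions with a distance-based edge likelihood and a quadratic term pulling positions toward their previous values, loosely playing the role of an attractor. The actual CLSNA likelihood, attractor functions, and uncertainty quantification differ; every name here is illustrative.

```python
# Toy SGD fit of dynamic latent positions (schematic, not the CLSNA model).
import torch
import torch.nn.functional as F

def fit_latent_positions(adj, dim=2, gamma=0.1, epochs=500, lr=1e-2):
    """adj: (T, n, n) tensor of binary interactions over T time steps."""
    T, n, _ = adj.shape
    z = torch.randn(T, n, dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    off_diag = ~torch.eye(n, dtype=torch.bool)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.tensor(0.0)
        for t in range(T):
            d = torch.cdist(z[t], z[t])      # pairwise latent distances
            logits = -d                      # closer pairs => likelier edges
            loss = loss + F.binary_cross_entropy_with_logits(
                logits[off_diag], adj[t].float()[off_diag])
        # smoothness term standing in for attractor dynamics: each position
        # is pulled toward its value at the previous time step
        loss = loss + gamma * ((z[1:] - z[:-1]) ** 2).mean()
        loss.backward()
        opt.step()
    return z.detach()
```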