Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.
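For readers unfamiliar with GFlowNet training, the sketch below shows the trajectory-balance objective that amortized samplers of this kind typically minimize; the reward would here be, e.g., the unnormalized posterior of a completed tree. This is a generic illustration under those assumptions, not PhyloGFN's actual training code.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory-balance loss for GFlowNet training: at the optimum,
    Z * P_F(trajectory) = R(x) * P_B(trajectory | x) for every
    trajectory, so final states x are sampled with probability
    proportional to the reward R(x).
    log_pf / log_pb: summed log forward/backward transition
    probabilities along one sampled trajectory; log_Z is a learned
    scalar estimating the log partition function."""
    return (log_Z + log_pf - log_pb - log_reward) ** 2
```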
Evaluating the reliability of complex technical networks, such as those in energy distribution, logistics, and transportation systems, is of paramount importance. These networks are often represented as multistate flow networks (MFNs). While there has been considerable research on assessing MFN reliability, many studies overlook a critical factor: transmission distance constraints. Such constraints are typical in real-world applications, such as Internet infrastructure, where controlling the distances between data centers, network nodes, and end-users is vital for ensuring low latency and efficient data transmission. This paper addresses the evaluation of MFN reliability under distance constraints. Specifically, it focuses on determining the probability that at least $d$ flow units can be transmitted successfully from a source node to a sink node using only paths whose lengths do not exceed a predefined distance limit $\lambda$. We introduce an effective algorithm to tackle this challenge, provide a benchmark example to illustrate its application, and analyze its computational complexity.
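As a minimal illustration of the distance constraint only (the paper's algorithm must additionally search for minimal capacity vectors over these paths to evaluate reliability), the sketch below enumerates the simple source-to-sink paths whose total length stays within $\lambda$, assuming a hypothetical networkx digraph with 'length' edge attributes.

```python
import networkx as nx

def paths_within_distance(g: nx.DiGraph, source, sink, lam):
    """Depth-first enumeration of simple source->sink paths whose
    cumulative edge length does not exceed the limit lam."""
    def dfs(node, path, length):
        if node == sink:
            yield list(path)
            return
        for nbr in g.successors(node):
            w = g[node][nbr].get('length', 1)
            if nbr not in path and length + w <= lam:
                path.append(nbr)
                yield from dfs(nbr, path, length + w)
                path.pop()
    yield from dfs(source, [source], 0)
```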
Positron Emission Tomography (PET) enables functional imaging of deep brain structures, but the bulk and weight of current systems preclude their use during many natural human activities, such as locomotion. The proposed long-term solution is a robotic system that supports an imaging ring surrounding the subject's head and moves it to accommodate natural motion. This requires measuring the motion of the head with respect to the imaging ring, for use by both the robotic system and the image reconstruction software. We report here the design and experimental evaluation of a parallel string encoder mechanism for sensing this motion. Our preliminary results indicate that the measurement system may achieve accuracy within 0.5 mm, especially for small motions, with further improvement possible through kinematic calibration.
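To make the sensing principle concrete, here is a toy sketch of how a single attachment point could be located from measured string lengths by nonlinear least squares; the actual mechanism recovers a full 6-DoF head pose from several such measurements, and the setup below is purely hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def locate_point(anchors, lengths):
    """Trilateration sketch: recover the 3-D position of one point on
    the head frame from measured string lengths to known anchor points
    on the imaging ring.
    anchors: (n, 3) anchor coordinates; lengths: (n,) string readings."""
    def residual(p):
        return np.linalg.norm(anchors - p, axis=1) - lengths
    return least_squares(residual, x0=anchors.mean(axis=0)).x
```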
Physics-informed neural networks (PINNs) effectively embed physical principles into machine learning but often struggle with complex or varying geometries. We propose a novel method for integrating geometric transformations within PINNs to robustly accommodate geometric variations. Our method incorporates a diffeomorphism as a mapping from a reference domain and adapts the derivative computation of the physics-informed loss function accordingly. This generalizes the applicability of PINNs not only to smoothly deformed domains, but also to lower-dimensional manifolds, and allows for direct shape optimization while training the network. We demonstrate the effectiveness of our approach on several problems: (i) the Eikonal equation on an Archimedean spiral, (ii) a Poisson problem on a surface manifold, (iii) incompressible Stokes flow in a deformed tube, and (iv) shape optimization with the Laplace operator. Through these examples, we demonstrate the enhanced flexibility over traditional PINNs, especially under geometric variations. The proposed framework presents an outlook for training deep neural operators over parametrized geometries, paving the way for advanced modeling with PDEs on complex geometries in science and engineering.
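The core of the adapted derivative computation is the chain rule through the diffeomorphism. The PyTorch sketch below (with hypothetical u_net and phi callables) shows one way to obtain gradients with respect to physical coordinates while sampling only on the reference domain; it illustrates the idea rather than the paper's implementation.

```python
import torch
from torch.func import vmap, jacrev

def physical_gradient(u_net, phi, xi):
    """Gradient of the network output w.r.t. physical coordinates
    x = phi(xi), evaluated on reference points xi via the chain rule:
    grad_x u = J^{-T} grad_xi u, where J = d phi / d xi.
    u_net maps a batch (N, dim) to (N,); phi maps one point (dim,)
    to (dim,)."""
    xi = xi.requires_grad_(True)
    (du_dxi,) = torch.autograd.grad(u_net(xi).sum(), xi, create_graph=True)
    J = vmap(jacrev(phi))(xi)                      # per-point Jacobians
    return torch.linalg.solve(J.transpose(-1, -2),
                              du_dxi.unsqueeze(-1)).squeeze(-1)
```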
Time Series Motif Discovery (TSMD) refers to the task of identifying patterns that occur multiple times (possibly with minor variations) in a time series. All existing methods for TSMD have one or more of the following limitations: they only look for the two most similar occurrences of a pattern; they only look for patterns of a pre-specified, fixed length; they cannot handle variability along the time axis; and they only handle univariate time series. In this paper, we present a new method, LoCoMotif, that has none of these limitations. The method is motivated by a concrete use case from physiotherapy. We demonstrate the value of the proposed method on this use case. We also introduce a new quantitative evaluation metric for motif discovery, and benchmark data for comparing TSMD methods. LoCoMotif substantially outperforms the existing methods, on top of being more broadly applicable.
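To make the first two limitations concrete, the sketch below implements the classic restricted setting: a fixed-length search for the single most similar pair of subsequences. This baseline is exactly the behaviour LoCoMotif generalizes; the code is illustrative and not part of the proposed method.

```python
import numpy as np

def best_motif_pair(ts, m):
    """Naive fixed-length motif discovery: return the two most similar
    non-overlapping length-m subsequences (z-normalized Euclidean
    distance). O(n^2 m); for illustration only."""
    def znorm(x):
        return (x - x.mean()) / (x.std() + 1e-12)
    windows = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    best, pair = np.inf, None
    for i in range(len(windows)):
        for j in range(i + m, len(windows)):       # enforce non-overlap
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best:
                best, pair = d, (i, j)
    return pair, best
```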
This paper considers the quantification of the tail risk of cryptocurrencies. The statistical methods employed draw on recent developments in Extreme Value Theory (EVT) for weakly dependent data. We propose an expectile-based approach for assessing the tail risk of dependent data. The expectile is a summary statistic that generalizes the mean, just as the quantile generalizes the median. We present empirical findings for a dataset of cryptocurrencies. We propose a method for dynamically evaluating expectile levels by estimating the expectiles of the residuals of a heteroscedastic regression, such as a GARCH model. Finally, we introduce the Marginal Expected Shortfall (MES) as a tool for measuring the marginal impact of single assets on systemic shortfalls; here, we focus on the impact of a single cryptocurrency on the systemic risk of the whole cryptocurrency market. In particular, we present an expectile-based MES for dependent data.
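For intuition, the $\tau$-expectile $e_\tau$ of $X$ solves the asymmetric least-squares condition $\tau\,E[(X-e_\tau)_+] = (1-\tau)\,E[(e_\tau-X)_+]$, reducing to the mean at $\tau = 1/2$. Below is a minimal sample estimator via iteratively reweighted means, a standard construction that is independent of the dependence-aware estimators developed in the paper.

```python
import numpy as np

def sample_expectile(x, tau, tol=1e-10, max_iter=200):
    """Estimate the tau-expectile of a sample by fixed-point iteration:
    the expectile is the weighted mean with weight tau on observations
    above the current estimate and (1 - tau) below it."""
    e = x.mean()                       # tau = 0.5 yields the mean itself
    for _ in range(max_iter):
        w = np.where(x >= e, tau, 1.0 - tau)
        e_new = np.sum(w * x) / np.sum(w)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e
```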
We present a novel metric for generative modeling evaluation, focusing primarily on generative networks. The method uses dendrograms to represent real and fake data, allowing the divergence between training and generated samples to be computed. The metric focuses on mode collapse, targeting generators that fail to capture all modes in the training set. To evaluate the proposed method, we introduce a validation scheme based on sampling from real datasets, so that the metric can be assessed in a controlled environment, where it proves competitive with other state-of-the-art approaches.
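A toy version of the idea (hypothetical, not the paper's exact metric): pool real and generated samples, build a dendrogram over the pooled set, cut it into clusters, and compare how the two sets distribute over those clusters. A generator that misses modes leaves real-dominated clusters empty and scores a large divergence.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def dendrogram_divergence(real, fake, n_clusters=10):
    """KL divergence between the cluster-occupancy distributions of
    real and fake samples over a shared Ward dendrogram."""
    pooled = np.vstack([real, fake])
    labels = fcluster(linkage(pooled, method='ward'),
                      n_clusters, criterion='maxclust')
    is_real = np.arange(len(pooled)) < len(real)
    p = np.array([(labels[is_real] == c).mean() for c in range(1, n_clusters + 1)])
    q = np.array([(labels[~is_real] == c).mean() for c in range(1, n_clusters + 1)])
    eps = 1e-12
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```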
We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are applicable only to small graphs and lack context in the graph domain. We propose the Graph Gram Index (GGI), a simple, efficient, and graph-native measure of such instability that is invariant to permutation, orthogonal transformation, translation, and order of evaluation. This allows us to study the varying instability behaviour of GNN embeddings on large graphs for both node classification and link prediction.
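The invariances above can be obtained from Gram matrices: centering removes translation, and $XX^\top$ is unchanged by any orthogonal map $Q$ since $(XQ)(XQ)^\top = XX^\top$. The CKA-style sketch below illustrates this construction on two embedding runs; it is an assumption-laden illustration (and, unlike GGI, quadratic in the number of nodes), not the paper's definition.

```python
import numpy as np

def gram_alignment(X, Y):
    """Alignment in [0, 1] between two embedding matrices (nodes x dim)
    from different training runs, invariant to translation and to
    orthogonal transformations of either embedding space."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)   # translation invariance
    Gx, Gy = Xc @ Xc.T, Yc @ Yc.T           # orthogonal invariance
    return float((Gx * Gy).sum() / (np.linalg.norm(Gx) * np.linalg.norm(Gy)))
```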
During the last decades, Anti-Financial Crime (AFC) entities and financial institutions have devoted ever-increasing effort to reducing financial crime and detecting fraudulent activities, which are changing and developing in extremely complex ways. We propose an anomaly detection approach based on network analysis to help AFC officers navigate the high load of information that is typical of AFC data-driven scenarios. Experimenting on a large financial dataset of more than 80M cross-country wire transfers, we leverage the properties of complex networks to develop a tool for explainable anomaly detection that can help identify outliers potentially engaged in malicious activities according to financial regulations. We identify a set of network centrality measures that provide useful insights on individual nodes; by tracking the evolution over time of the centrality-based node rankings, we are able to highlight sudden and unexpected changes in the roles of individual nodes that deserve further attention by AFC officers. Such changes can hardly be noticed by current AFC practices, which sometimes lack a higher-level, global vision of the system. This approach represents a preliminary step in the automation of AFC and AML processes, facilitating the work of AFC officers by providing them with a top-down view of the picture emerging from financial data.
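A simplified sketch of the rank-tracking idea follows; the choice of PageRank and the jump threshold are illustrative placeholders, not values from our experiments.

```python
import networkx as nx

def rank_shift_alerts(snapshots, min_jump=50):
    """Flag nodes whose centrality-based rank changes abruptly between
    consecutive transaction-network snapshots (list of nx.DiGraph)."""
    alerts, prev = [], None
    for t, g in enumerate(snapshots):
        cent = nx.pagerank(g)               # one possible centrality measure
        rank = {n: r for r, (n, _) in enumerate(
            sorted(cent.items(), key=lambda kv: -kv[1]))}
        if prev is not None:
            for n in rank.keys() & prev.keys():
                if abs(rank[n] - prev[n]) >= min_jump:
                    alerts.append((t, n, prev[n], rank[n]))
        prev = rank
    return alerts
```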
The numerical simulation of cardiac electrophysiology is a highly challenging problem in scientific computing. The Bidomain system is the most complete mathematical model of cardiac bioelectrical activity. It consists of an elliptic and a parabolic partial differential equation (PDE) of reaction-diffusion type, describing the spread of electrical excitation in the cardiac tissue. The two PDEs are coupled with a stiff system of ordinary differential equations (ODEs) representing ionic currents through the cardiac membrane. Developing efficient and scalable preconditioners for the linear systems arising from the discretization of such a computationally challenging model is crucial to reduce the computational cost of numerical simulations of cardiac electrophysiology. In this work, focusing on the Bidomain system as a model problem, we benchmark two popular implementations of the Algebraic Multigrid (AMG) preconditioner embedded in the PETSc library and study how the calibration of specific parameters affects their performance. We conduct our analysis on modern HPC architectures, performing scalability tests in multi-core and multi-GPU settings. The results show that, although scalability is verified on CPUs, GPUs are the optimal choice for our problem, since they yield the best performance in terms of solution time.
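A minimal petsc4py sketch of the kind of solver setup being benchmarked, assuming a pre-assembled PETSc matrix A and vectors b, x: switching the preconditioner type between 'gamg' (PETSc's native AMG) and 'hypre' (BoomerAMG) selects between the two implementations compared in this work, and command-line options expose the parameters to calibrate.

```python
from petsc4py import PETSc

# A: assembled system matrix; b, x: PETSc vectors (assumed given)
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType('cg')                 # Krylov method for the (semi)definite system
pc = ksp.getPC()
pc.setType('gamg')                # or 'hypre' for BoomerAMG
ksp.setTolerances(rtol=1e-8)
ksp.setFromOptions()              # pick up -pc_gamg_* / -pc_hypre_* tuning flags
ksp.solve(b, x)
```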
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
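The alternation described above is compact enough to sketch directly. The PyTorch block below follows the description in this abstract; note that it substitutes LayerNorm for the Affine normalization used in the actual architecture, so it is a simplified illustration rather than reference code.

```python
import torch.nn as nn

class ResMLPBlock(nn.Module):
    """One residual block: (i) a linear cross-patch layer applied
    identically to every channel, then (ii) a two-layer MLP mixing
    channels independently per patch."""
    def __init__(self, n_patches, dim, hidden_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)     # simplification: Affine in the paper
        self.patch_mix = nn.Linear(n_patches, n_patches)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mix = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))

    def forward(self, x):                  # x: (batch, n_patches, dim)
        x = x + self.patch_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mix(self.norm2(x))
        return x
```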