We propose an innovative and generic methodology to analyse individual and collective behaviour through individual trajectory data. The work is motivated by the analysis of GPS trajectories of fishing vessels collected from regulatory tracking data in the context of marine biodiversity conservation and ecosystem-based fisheries management. We build a low-dimensional latent representation of trajectories using convolutional neural networks as non-linear mapping. This is done by training a conditional variational auto-encoder taking into account covariates. The posterior distributions of the latent representations can be linked to the characteristics of the actual trajectories. The latent distributions of the trajectories are compared with the Bhattacharyya coefficient, which is well-suited for comparing distributions. Using this coefficient, we analyse the variation of the individual behaviour of each vessel during time. For collective behaviour analysis, we build proximity graphs and use an extension of the stochastic block model for multiple networks. This model results in a clustering of the individuals based on their set of trajectories. The application to French fishing vessels enables us to obtain groups of vessels whose individual and collective behaviours exhibit spatio-temporal patterns over the period 2014-2018.
We propose a framework where Fer and Wilcox expansions for the solution of differential equations are derived from two particular choices for the initial transformation that seeds the product expansion. In this scheme intermediate expansions can also be envisaged. Recurrence formulas are developed. A new lower bound for the convergence of the Wilcox expansion is provided as well as some applications of the results. In particular, two examples are worked out up to high order of approximation to illustrate the behavior of the Wilcox expansion.
This work proposes a novel variational approximation of partial differential equations on moving geometries determined by explicit boundary representations. The benefits of the proposed formulation are the ability to handle large displacements of explicitly represented domain boundaries without generating body-fitted meshes and remeshing techniques. For the space discretization, we use a background mesh and an unfitted method that relies on integration on cut cells only. We perform this intersection by using clipping algorithms. To deal with the mesh movement, we pullback the equations to a reference configuration (the spatial mesh at the initial time slab times the time interval) that is constant in time. This way, the geometrical intersection algorithm is only required in 3D, another key property of the proposed scheme. At the end of the time slab, we compute the deformed mesh, intersect the deformed boundary with the background mesh, and consider an exact transfer operator between meshes to compute jump terms in the time discontinuous Galerkin integration. The transfer is also computed using geometrical intersection algorithms. We demonstrate the applicability of the method to fluid problems around rotating (2D and 3D) geometries described by oriented boundary meshes. We also provide a set of numerical experiments that show the optimal convergence of the method.
Network meta-analysis combines aggregate data (AgD) from multiple randomised controlled trials, assuming that any effect modifiers are balanced across populations. Individual patient data (IPD) meta-regression is the ``gold standard'' method to relax this assumption, however IPD are frequently only available in a subset of studies. Multilevel network meta-regression (ML-NMR) extends IPD meta-regression to incorporate AgD studies whilst avoiding aggregation bias, but currently requires the aggregate-level likelihood to have a known closed form. Notably, this prevents application to time-to-event outcomes. We extend ML-NMR to individual-level likelihoods of any form, by integrating the individual-level likelihood function over the AgD covariate distributions to obtain the respective marginal likelihood contributions. We illustrate with two examples of time-to-event outcomes, showing the performance of ML-NMR in a simulated comparison with little loss of precision from a full IPD analysis, and demonstrating flexible modelling of baseline hazards using cubic M-splines with synthetic data on newly diagnosed multiple myeloma. ML-NMR is a general method for synthesising individual and aggregate level data in networks of all sizes. Extension to general likelihoods, including for survival outcomes, greatly increases the applicability of the method. R and Stan code is provided, and the methods are implemented in the multinma R package.
A crucial challenge for solving problems in conflict research is in leveraging the semi-supervised nature of the data that arise. Observed response data such as counts of battle deaths over time indicate latent processes of interest such as intensity and duration of conflicts, but defining and labeling instances of these unobserved processes requires nuance and imprecision. The availability of such labels, however, would make it possible to study the effect of intervention-related predictors - such as ceasefires - directly on conflict dynamics (e.g., latent intensity) rather than through an intermediate proxy like observed counts of battle deaths. Motivated by this problem and the new availability of the ETH-PRIO Civil Conflict Ceasefires data set, we propose a Bayesian autoregressive (AR) hidden Markov model (HMM) framework as a sufficiently flexible machine learning approach for semi-supervised regime labeling with uncertainty quantification. We motivate our approach by illustrating the way it can be used to study the role that ceasefires play in shaping conflict dynamics. This ceasefires data set is the first systematic and globally comprehensive data on ceasefires, and our work is the first to analyze this new data and to explore the effect of ceasefires on conflict dynamics in a comprehensive and cross-country manner.
We study the complexity (that is, the weight of the multiplication table) of the elliptic normal bases introduced by Couveignes and Lercier. We give an upper bound on the complexity of these elliptic normal bases, and we analyze the weight of some special vectors related to the multiplication table of those bases. This analysis leads us to some perspectives on the search for low complexity normal bases from elliptic periods.
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters are nearly tight.
This paper introduces a new series of methods which combine modal decomposition algorithms, such as singular value decomposition and high-order singular value decomposition, and deep learning architectures to repair, enhance, and increase the quality and precision of numerical and experimental data. A combination of two- and three-dimensional, numerical and experimental dasasets are used to demonstrate the reconstruction capacity of the presented methods, showing that these methods can be used to reconstruct any type of dataset, showing outstanding results when applied to highly complex data, which is noisy. The combination of benefits of these techniques results in a series of data-driven methods which are capable of repairing and/or enhancing the resolution of a dataset by identifying the underlying physics that define the data, which is incomplete or under-resolved, filtering any existing noise. These methods and the Python codes are included in the first release of ModelFLOWs-app.
We study propositional proof systems with inference rules that formalize restricted versions of the ability to make assumptions that hold without loss of generality, commonly used informally to shorten proofs. Each system we study is built on resolution. They are called BC${}^-$, RAT${}^-$, SBC${}^-$, and GER${}^-$, denoting respectively blocked clauses, resolution asymmetric tautologies, set-blocked clauses, and generalized extended resolution - all "without new variables." They may be viewed as weak versions of extended resolution (ER) since they are defined by first generalizing the extension rule and then taking away the ability to introduce new variables. Except for SBC${}^-$, they are known to be strictly between resolution and extended resolution. Several separations between these systems were proved earlier by exploiting the fact that they effectively simulate ER. We answer the questions left open: We prove exponential lower bounds for SBC${}^-$ proofs of a binary encoding of the pigeonhole principle, which separates ER from SBC${}^-$. Using this new separation, we prove that both RAT${}^-$ and GER${}^-$ are exponentially separated from SBC${}^-$. This completes the picture of their relative strengths.
Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
We propose a finite-dimensional control-based method to approximate solution operators for evolutional partial differential equations (PDEs), particularly in high-dimensions. By employing a general reduced-order model, such as a deep neural network, we connect the evolution of the model parameters with trajectories in a corresponding function space. Using the computational technique of neural ordinary differential equation, we learn the control over the parameter space such that from any initial starting point, the controlled trajectories closely approximate the solutions to the PDE. Approximation accuracy is justified for a general class of second-order nonlinear PDEs. Numerical results are presented for several high-dimensional PDEs, including real-world applications to solving Hamilton-Jacobi-Bellman equations. These are demonstrated to show the accuracy and efficiency of the proposed method.