When using graphs and graph transformations to model systems, consistency is an important concern. While consistency has primarily been viewed as a binary property, i.e., a graph is consistent or inconsistent with respect to a set of constraints, recent work has presented an approach to consistency as a graduated property. This allows living with inconsistencies for a while and repairing them when necessary. When repairing inconsistencies in a graph, we use graph transformation rules with so-called impairment- and repair-indicating application conditions to understand how much repair gain certain rule applications would bring. Both types of conditions can be derived from given graph constraints. Our main theorem shows that the difference between the number of actual constraint violations before and after a graph transformation step can be characterized by the difference between the numbers of violated impairment-indicating and repair-indicating application conditions. This theory forms the basis for algorithms with look-ahead that rank graph transformations according to their potential for graph repair. An initial evaluation shows that graph repair can be well supported by rules with these new types of application conditions.
Current physics-informed (standard or deep operator) neural networks still rely on accurately learning the initial and/or boundary conditions of the system of differential equations they are solving. In contrast, standard numerical methods involve such conditions in computations without needing to learn them. In this study, we propose to improve current physics-informed deep learning strategies such that initial and/or boundary conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a deep operator network is applied multiple times to time-step a solution of an initial value problem, the resulting function is at least continuous.
Structural identifiability is an important property of parametric ODE models. When conducting an experiment and inferring the parameter value from the time-series data, we want to know if the value is globally, locally, or non-identifiable. Global identifiability of the parameter indicates that there exists only one possible solution to the inference problem, local identifiability suggests that there could be several (but finitely many) possibilities, while non-identifiability implies that there are infinitely many possibilities for the value. Having this information is useful since, one would, for example, only perform inferences for the parameters which are identifiable. Given the current significance and widespread research conducted in this area, we decided to create a database of linear compartment models and their identifiability results. This facilitates the process of checking theorems and conjectures and drawing conclusions on identifiability. By only storing models up to symmetries and isomorphisms, we optimize memory efficiency and reduce query time. We conclude by applying our database to real problems. We tested a conjecture about deleting one leak of the model states in the paper 'Linear compartmental models: Input-output equations and operations that preserve identifiability' by E. Gross et al., and managed to produce a counterexample. We also compute some interesting statistics related to the identifiability of linear compartment model parameters.
Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much space for improvement. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.
Noninformative priors constructed for estimation purposes are usually not appropriate for model selection and testing. The methodology of integral priors was developed to get prior distributions for Bayesian model selection when comparing two models, modifying initial improper reference priors. We propose a generalization of this methodology to more than two models. Our approach adds an artificial copy of each model under comparison by compactifying the parametric space and creating an ergodic Markov chain across all models that returns the integral priors as marginals of the stationary distribution. Besides the garantee of their existance and the lack of paradoxes attached to estimation reference priors, an additional advantage of this methodology is that the simulation of this Markov chain is straightforward as it only requires simulations of imaginary training samples for all models and from the corresponding posterior distributions. This renders its implementation automatic and generic, both in the nested case and in the nonnested case.
Polyhazard models are a class of flexible parametric models for modelling survival over extended time horizons. Their additive hazard structure allows for flexible, non-proportional hazards whose characteristics can change over time while retaining a parametric form, which allows for survival to be extrapolated beyond the observation period of a study. Significant user input is required, however, in selecting the number of latent hazards to model, their distributions and the choice of which variables to associate with each hazard. The resulting set of models is too large to explore manually, limiting their practical usefulness. Motivated by applications to stroke survivor and kidney transplant patient survival times we extend the standard polyhazard model through a prior structure allowing for joint inference of parameters and structural quantities, and develop a sampling scheme that utilises state-of-the-art Piecewise Deterministic Markov Processes to sample from the resulting transdimensional posterior with minimal user tuning.
Generative diffusion models have achieved spectacular performance in many areas of machine learning and generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, variational inference and stochastic calculus, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We show that these phase-transitions are always in a mean-field universality class, as they are the result of a self-consistency condition in the generative dynamics. We argue that the critical instability that arises from the phase transitions lies at the heart of their generative capabilities, which are characterized by a set of mean-field critical exponents. Finally, we show that the dynamic equation of the generative process can be interpreted as a stochastic adiabatic transformation that minimizes the free energy while keeping the system in thermal equilibrium.
Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
We study multi-marginal optimal transport (MOT) problems where the underlying cost has a graphical structure. These graphical multi-marginal optimal transport problems have found applications in several domains including traffic flow control and regression problems in the Wasserstein space. MOT problem can be approached through two aspects: a single big MOT problem, or coupled minor OT problems. In this paper, we focus on the latter approach and demonstrate it has efficiency gain from the parallelization. For tree-structured MOT problems, we introduce a novel parallelizable algorithm that significantly reduces computational complexity. Additionally, we adapt this algorithm for general graphs, employing the modified junction trees to enable parallel updates. Our contributions, validated through numerical experiments, offer new avenues for MOT applications and establish benchmarks in computational efficiency.
Transparency is an important aspect of human-robot interaction (HRI), as it can improve system trust and usability leading to improved communication and performance. However, most transparency models focus only on the amount of information given to users. In this paper, we propose a bidirectional transparency model, termed a transparency-based action (TBA) model, which in addition to providing transparency information (robot-to-human), allows the robot to take actions based on transparency information received from the human (robot-of-human and human-to-robot). We implemented a three-level (High, Medium and Low) TBA model on a robotic system trainer in two pilot studies (with students as participants) to examine its impact on acceptance and HRI. Based on the pilot studies results, the Medium TBA level was not included in the main experiment, which was conducted with older adults (aged 75-85). In that experiment, two TBA levels were compared: Low (basic information including only robot-to-human transparency) and High (including additional information relating to predicted outcomes with robot-of-human and human-to-robot transparency). The results revealed a significant difference between the two TBA levels of the model in terms of perceived usefulness, ease of use, and attitude. The High TBA level resulted in improved user acceptance and was preferred by the users.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.