Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
We address modelling and computational issues for multiple treatment effect inference under many potential confounders. Our main contribution is providing a trade-off between preventing the omission of relevant confounders, while not running into an over-selection of instruments that significantly inflates variance. We propose a novel empirical Bayes framework for Bayesian model averaging that learns from data the extent to which the inclusion of key covariates should be encouraged. Our framework sets a prior that asymptotically matches the true amount of confounding in the data, as measured by a novel confounding coefficient. A key challenge is computational. We develop fast algorithms, using an exact gradient of the marginal likelihood that has linear cost in the number of covariates, and a variational counterpart. Our framework uses widely-used ingredients and largely existing software, and it is implemented within the R package mombf. We illustrate our work with two applications. The first is the association between salary variation and discriminatory factors. The second, that has been debated in previous works, is the association between abortion policies and crime. Our approach provides insights that differ from previous analyses especially in situations with weaker treatment effects.
Choice constructs are an important part of the language of logic programming, yet the study of their semantics has been a challenging task. So far, only two-valued semantics have been studied, and the different proposals for such semantics have not been compared in a principled way. In this paper, an operator-based framework allow for the definition and comparison of different semantics in a principled way is proposed.
Data assimilation has become a crucial technique aiming to combine physical models with observational data to estimate state variables. Traditional assimilation algorithms often face challenges of high nonlinearity brought by both the physical and observational models. In this work, we propose a novel data-driven assimilation algorithm based on generative models to address such concerns. Our State-Observation Augmented Diffusion (SOAD) model is designed to handle nonlinear physical and observational models more effectively. The marginal posterior associated with SOAD has been derived and then proved to match the real posterior under mild assumptions, which shows theoretical superiority over previous score-based assimilation works. Experimental results also indicate that our SOAD model may offer improved accuracy over existing data-driven methods.
Which dynamic queries can be maintained efficiently? For constant-size changes, it is known that constant-depth circuits or, equivalently, first-order updates suffice for maintaining many important queries, among them reachability, tree isomorphism, and the word problem for context-free languages. In other words, these queries are in the dynamic complexity class DynFO. We show that most of the existing results for constant-size changes can be recovered for batch changes of polylogarithmic size if one allows circuits of depth O(log log n) or, equivalently, first-order updates that are iterated O(log log n) times.
This study proposes a unified theory and statistical learning approach for traffic conflict detection, addressing the long-existing call for a consistent and comprehensive methodology to evaluate the collision risk emerging in road user interactions. The proposed theory assumes context-dependent probabilistic collision risk and frames conflict detection as assessing this risk by statistical learning of extreme events in daily interactions. Experiments using real-world trajectory data are conducted in this study, where a unified metric of conflict is trained with lane-changing interactions on German highways and applied to near-crash events from the 100-Car Naturalistic Driving Study in the U.S. Results of the experiments demonstrate that the trained metric provides effective collision warnings, generalises across distinct datasets and traffic environments, covers a broad range of conflicts, and delivers a long-tailed distribution of conflict intensity. Reflecting on these results, the unified theory ensures consistent evaluation by a generic formulation that encompasses varying assumptions of traffic conflicts; the statistical learning approach then enables a comprehensive consideration of influencing factors such as motion states of road users, environment conditions, and participant characteristics. Therefore, the theory and learning approach jointly provide an explainable and adaptable methodology for conflict detection among different road users and across various interaction scenarios. This promises to reduce accidents and improve overall traffic safety, by enhanced safety assessment of traffic infrastructures, more effective collision warning systems for autonomous driving, and a deeper understanding of road user behaviour in different traffic conditions.
Machine learning techniques have recently been of great interest for solving differential equations. Training these models is classically a data-fitting task, but knowledge of the expression of the differential equation can be used to supplement the training objective, leading to the development of physics-informed scientific machine learning. In this article, we focus on one class of models called nonlinear vector autoregression (NVAR) to solve ordinary differential equations (ODEs). Motivated by connections to numerical integration and physics-informed neural networks, we explicitly derive the physics-informed NVAR (piNVAR) which enforces the right-hand side of the underlying differential equation regardless of NVAR construction. Because NVAR and piNVAR completely share their learned parameters, we propose an augmented procedure to jointly train the two models. Then, using both data-driven and ODE-driven metrics, we evaluate the ability of the piNVAR model to predict solutions to various ODE systems, such as the undamped spring, a Lotka-Volterra predator-prey nonlinear model, and the chaotic Lorenz system.
The insight that causal parameters are particularly suitable for out-of-sample prediction has sparked a lot development of causal-like predictors. However, the connection with strict causal targets, has limited the development with good risk minimization properties, but without a direct causal interpretation. In this manuscript we derive the optimal out-of-sample risk minimizing predictor of a certain target $Y$ in a non-linear system $(X,Y)$ that has been trained in several within-sample environments. We consider data from an observation environment, and several shifted environments. Each environment corresponds to a structural equation model (SEM), with random coefficients and with its own shift and noise vector, both in $L^2$. Unlike previous approaches, we also allow shifts in the target value. We define a sieve of out-of-sample environments, consisting of all shifts $\tilde{A}$ that are at most $\gamma$ times as strong as any weighted average of the observed shift vectors. For each $\beta\in\mathbb{R}^p$ we show that the supremum of the risk functions $R_{\tilde{A}}(\beta)$ has a worst-risk decomposition into a (positive) non-linear combination of risk functions, depending on $\gamma$. We then define the set $\mathcal{B}_\gamma$, as minimizers of this risk. The main result of the paper is that there is a unique minimizer ($|\mathcal{B}_\gamma|=1$) that can be consistently estimated by an explicit estimator, outside a set of zero Lebesgue measure in the parameter space. A practical obstacle for the initial method of estimation is that it involves the solution of a general degree polynomials. Therefore, we prove that an approximate estimator using the bisection method is also consistent.
Sharpness is an almost generic assumption in continuous optimization that bounds the distance from minima by objective function suboptimality. It facilitates the acceleration of first-order methods through restarts. However, sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates. Moreover, these schemes are challenging to apply in the presence of noise or with approximate model classes (e.g., in compressive imaging or learning problems), and they generally assume that the first-order method used produces feasible iterates. We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error. This constant offers greater robustness (e.g., with respect to noise or relaxation of model classes) for finding approximate minimizers. By employing a new type of search over the unknown constants, we design a restart scheme that applies to general first-order methods and does not require the first-order method to produce feasible iterates. Our scheme maintains the same convergence rate as when the constants are known. The convergence rates we achieve for various first-order methods match the optimal rates or improve on previously established rates for a wide range of problems. We showcase our restart scheme in several examples and highlight potential future applications and developments of our framework and theory.
Deep learning still struggles with certain kinds of scientific data. Notably, pretraining data may not provide coverage of relevant distribution shifts (e.g., shifts induced via the use of different measurement instruments). We consider deep learning models trained to classify the synthesis conditions of uranium ore concentrates (UOCs) and show that model editing is particularly effective for improving generalization to distribution shifts common in this domain. In particular, model editing outperforms finetuning on two curated datasets comprising of micrographs taken of U$_{3}$O$_{8}$ aged in humidity chambers and micrographs acquired with different scanning electron microscopes, respectively.
We set up, at the abstract Hilbert space setting, the general question on when an inverse linear problem induced by an operator of Friedrichs type admits solutions belonging to (the closure of) the Krylov subspace associated to such operator. Such Krylov solvability of abstract Friedrichs systems allows to predict when, for concrete differential inverse problems, truncation algorithms can or cannot reproduce the exact solutions in terms of approximants from the Krylov subspace.