A common phenomenon in spatial regression models is spatial confounding. This phenomenon occurs when spatially indexed covariates modeling the mean of the response are correlated with a spatial effect included in the model. The spatial+ method (Dupont et al., 2022) is a popular approach to reducing spatial confounding. It is a two-stage frequentist approach that explicitly models the spatial structure in the confounded covariate, removes it, and uses the corresponding residuals in the second stage. In a frequentist setting, there is no uncertainty propagation from the first-stage estimation determining the residuals, since only point estimates are used. Inference can also be cumbersome in a frequentist setting, and some of the gaps in the original approach can easily be remedied in a Bayesian framework. First, a Bayesian joint model can easily achieve uncertainty propagation from the first to the second stage of the model. Second, a Bayesian framework provides the tools to infer the model's parameters directly. Another advantage of the Bayesian framework, which we explore thoroughly, is the ability to use prior information to impose restrictions on the spatial effects rather than applying them directly to their posterior. We build a joint prior for the smoothness of all spatial effects that simultaneously shrinks towards a high smoothness of the response and imposes that the spatial effect in the response is smoother than the spatial effect of the confounded covariates. This prevents the response from operating at a smaller scale than the covariate and can help to avoid situations where there is insufficient variation in the residuals resulting from the first-stage model. We evaluate the performance of the Bayesian spatial+ on both simulated and real datasets.
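The two-stage structure described above can be written out as a short sketch. The notation is an assumption made for illustration and does not appear in the abstract: $y_i$ is the response at location $s_i$, $x_i$ the confounded covariate, $f$, $f^{x}$ and $f^{+}$ spatial smooths, and $r^{x}_i$ the first-stage residuals.

```latex
\begin{align*}
  \text{response model:} \quad & y_i = \beta x_i + f(s_i) + \varepsilon_i, \\
  \text{stage 1:} \quad & x_i = f^{x}(s_i) + \varepsilon^{x}_i,
      \qquad r^{x}_i = x_i - \hat{f}^{x}(s_i), \\
  \text{stage 2:} \quad & y_i = \beta r^{x}_i + f^{+}(s_i) + \varepsilon_i .
\end{align*}
```

In the frequentist version only the point estimate $\hat{f}^{x}$ enters stage 2, which is why its estimation uncertainty is not propagated; a joint Bayesian model would estimate both stages together.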
We propose an augmented Lagrangian-based preconditioner to accelerate the convergence of Krylov subspace methods applied to linear systems of equations with a block three-by-three structure such as those arising from mixed finite element discretizations of the coupled Stokes-Darcy flow problem. We analyze the spectrum of the preconditioned matrix and we show how the new preconditioner can be efficiently applied. Numerical experiments are reported to illustrate the effectiveness of the preconditioner in conjunction with flexible GMRES for solving linear systems of equations arising from a 3D test problem.
Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to Generalized Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state of the art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can support, attack, or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure in which each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.
Classical evolutionary approaches for multiobjective optimization are quite effective but require many queries to the objectives; this can be prohibitive when objectives are expensive oracles. A sample-efficient approach to solving multiobjective optimization is via Gaussian process (GP) surrogates and Bayesian optimization (BO). Multiobjective Bayesian optimization (MOBO) involves the construction of an acquisition function which is optimized to acquire new observation candidates. This ``inner'' optimization can be hard for several reasons: acquisition functions can be nonconvex, nondifferentiable and/or unavailable in analytical form; yet the success of MOBO heavily relies on this inner optimization. We do away with this hard acquisition function optimization step and propose a simple but effective Thompson sampling based approach ($q\texttt{POTS}$) where new candidate(s) are chosen from the Pareto frontier of random GP posterior sample paths obtained by solving a much cheaper multiobjective optimization problem. To further improve computational tractability in higher dimensions we propose an automated selection of an active set of candidates combined with a Nystr\"{o}m approximation. Our approach applies to arbitrary GP prior assumptions and demonstrates strong empirical performance over the state of the art, both in terms of accuracy and computational efficiency, on synthetic as well as real-world experiments.
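The idea of choosing candidates from the Pareto frontier of posterior samples can be illustrated with a minimal sketch. It is not the $q\texttt{POTS}$ implementation: it assumes a one-dimensional input, a finite candidate grid, an RBF kernel with a fixed lengthscale, independent GPs per objective, minimization, and a uniformly random pick from the sampled front. All function names are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def posterior_sample(x_train, y_train, x_cand, rng, noise=1e-6):
    """Draw one joint GP posterior sample on the candidate grid."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_cand, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf(x_cand, x_cand) - Ks @ np.linalg.solve(K, Ks.T)
    # Symmetrize and add jitter for numerical stability.
    cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(x_cand))
    return rng.multivariate_normal(mean, cov)

def pareto_mask(F):
    """Boolean mask of non-dominated rows of F (all objectives minimized)."""
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

def thompson_batch(x_train, Y_train, x_cand, q, rng):
    """Pick up to q candidates from the Pareto front of one sampled path per objective."""
    samples = np.column_stack([
        posterior_sample(x_train, Y_train[:, j], x_cand, rng)
        for j in range(Y_train.shape[1])
    ])
    front = np.flatnonzero(pareto_mask(samples))
    chosen = rng.choice(front, size=min(q, len(front)), replace=False)
    return x_cand[chosen]

# Toy usage with two conflicting objectives on [0, 1] (illustrative only).
rng = np.random.default_rng(0)
x_train = np.array([0.0, 0.5, 1.0])
Y_train = np.column_stack([x_train**2, (x_train - 1.0) ** 2])
x_cand = np.linspace(0.0, 1.0, 25)
batch = thompson_batch(x_train, Y_train, x_cand, q=3, rng=rng)
```

No acquisition function is optimized here: the only optimization is a non-dominated sort over cheap sampled values, which is the source of the computational saving the abstract describes.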
Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases, or practitioners processing very large data.
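The race between speed and algorithmic advantage can be made concrete with a small sketch. The operation counts and the per-operation slowdown factor below are illustrative assumptions, not figures from the paper, and `crossover_size` is a hypothetical helper name.

```python
import math

def crossover_size(classical_ops, quantum_ops, slowdown, n_max=2**60):
    """Smallest power-of-two problem size at which the quantum machine wins.

    classical_ops, quantum_ops: functions mapping problem size n to an
        operation count for the best known algorithm on each machine.
    slowdown: assumed factor by which one quantum operation is slower
        than one classical operation.
    Returns None if the quantum machine never wins up to n_max.
    """
    n = 2
    while n <= n_max:
        if quantum_ops(n) * slowdown < classical_ops(n):
            return n
        n *= 2
    return None

# Quadratic algorithmic gain (n versus sqrt(n) operations) with an
# assumed million-fold slowdown per quantum operation.
n_star = crossover_size(lambda n: n, lambda n: math.sqrt(n), slowdown=1e6)

# No algorithmic gain: the faster classical machine always wins.
never = crossover_size(lambda n: n, lambda n: n, slowdown=1e6)
```

Under these assumptions the crossover only occurs at around $10^{12}$ elements, which illustrates why small and moderate problem sizes would not benefit unless the algorithmic gain is much larger than quadratic.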
Optimizing multiple competing objectives is a common problem across science and industry. The inherent inextricable trade-off between those objectives leads one to the task of exploring their Pareto front. A meaningful quantity for the purpose of the latter is the hypervolume indicator, which is used in Bayesian Optimization (BO) and Evolutionary Algorithms (EAs). However, the computational complexity for the calculation of the hypervolume scales unfavorably with increasing number of objectives and data points, which restricts its use in those common multi-objective optimization frameworks. To overcome these restrictions we propose to approximate the hypervolume function with a deep neural network, which we call DeepHV. For better sample efficiency and generalization, we exploit the fact that the hypervolume is scale-equivariant in each of the objectives as well as permutation invariant w.r.t. both the objectives and the samples, by using a deep neural network that is equivariant w.r.t. the combined group of scalings and permutations. We evaluate our method against exact, and approximate hypervolume methods in terms of accuracy, computation time, and generalization. We also apply and compare our methods to state-of-the-art multi-objective BO methods and EAs on a range of synthetic benchmark test cases. The results show that our methods are promising for such multi-objective optimization tasks.
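The symmetries mentioned above can be checked on the exact hypervolume in the simplest case. The following is a minimal sketch for two objectives under minimization with an explicit reference point; it is not DeepHV, and `hypervolume_2d` is a hypothetical helper name.

```python
def hypervolume_2d(points, ref):
    """Exact area dominated by `points` and bounded by `ref` (minimization)."""
    # Keep only points that improve on the reference in both objectives,
    # sorted by the first objective.
    pts = sorted((x, y) for x, y in points if x < ref[0] and y < ref[1])
    hv = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y < best_y:  # the point is non-dominated so far
            hv += (ref[0] - x) * (best_y - y)
            best_y = y
    return hv

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
ref = (4.0, 4.0)

base = hypervolume_2d(front, ref)
# Permutation invariance with respect to the samples.
shuffled = hypervolume_2d(front[::-1], ref)
# Permutation invariance with respect to the objectives.
swapped = hypervolume_2d([(y, x) for x, y in front], (ref[1], ref[0]))
# Scale equivariance: scaling one objective by 2 scales the hypervolume by 2.
scaled = hypervolume_2d([(2 * x, y) for x, y in front], (2 * ref[0], ref[1]))
```

In two dimensions a sort suffices, but exact computation becomes much more expensive as the number of objectives grows, which is the cost a learned approximation aims to avoid.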
Chain-of-thought (CoT) prompting elicits explicit reasoning paths from models, which improves reasoning accuracy and has attracted increasing attention. Specifically, zero-shot CoT achieves remarkable improvements in a wide range of reasoning tasks by simply instructing the LLM with the prompt "Let's think step by step!". Despite the success of zero-shot CoT, the existing zero-shot prompting techniques remain limited to a single language, making it challenging to generalize to other languages and hindering global development. In this work, we introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages. Specifically, CLP consists of two main components: (1) cross-lingual alignment prompting and (2) task-specific solver prompting. The cross-lingual alignment prompting is responsible for aligning representations across different languages, whereas the task-specific solver prompting is used to generate the final chain of thoughts and results for the reasoning task. In addition, we introduce cross-lingual self-consistent prompting (CLSP) to ensemble different reasoning paths across languages. Our experimental evaluations on several benchmarks demonstrate that CLP and CLSP significantly outperform the existing prompting methods and achieve state-of-the-art performance. We hope this work will inspire further breakthroughs in cross-lingual CoT.
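One simple way to ensemble reasoning paths across languages is a majority vote over the final answers. The sketch below assumes that rule for illustration; the abstract does not specify the aggregation, and the language codes and answers are made-up placeholders rather than model outputs.

```python
from collections import Counter

def majority_vote(answers_by_language):
    """Return the final answer that most per-language reasoning paths agree on."""
    counts = Counter(answers_by_language.values())
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers extracted from reasoning paths in three languages.
answers = {"en": "42", "de": "42", "zh": "41"}
final = majority_vote(answers)
```

In a real pipeline each entry would come from running the alignment and solver prompts with a different intermediate language; only the aggregation step is shown here.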
We introduce a Bayesian conditional autoregressive model for analyzing patient-specific and neighborhood risks of stillbirth and preterm birth within a city. Our fully Bayesian approach automatically learns the amount of spatial heterogeneity and spatial dependence between neighborhoods. Our model provides meaningful inferences and uncertainty quantification for both covariate effects and neighborhood risk probabilities through their posterior distributions. We apply our methodology to data from the city of Philadelphia. Using electronic health records (45,919 deliveries at hospitals within the University of Pennsylvania Health System) and United States Census Bureau data from 363 census tracts in Philadelphia, we find that both patient-level characteristics (e.g. self-identified race/ethnicity) and neighborhood-level characteristics (e.g. violent crime) are highly associated with patients' odds of stillbirth or preterm birth. Our neighborhood risk analysis further reveals that census tracts in West Philadelphia and North Philadelphia are at highest risk of these outcomes. Specifically, neighborhoods with higher rates of women in poverty or on public assistance have greater neighborhood risk for these outcomes, while neighborhoods with higher rates of college-educated women or women in the labor force have lower risk. Our findings could be useful for targeted individual and neighborhood interventions.
Nonparametric maximum likelihood estimators (MLEs) in inverse problems often have non-normal limit distributions, like Chernoff's distribution. However, if one considers smooth functionals of the model, with corresponding functionals of the MLE, one gets normal limit distributions and faster rates of convergence. We demonstrate this for a model for the incubation time of a disease. The usual approach in the latter models is to use parametric distributions, like Weibull and gamma distributions, which leads to inconsistent estimators. Smoothed bootstrap methods are discussed for constructing confidence intervals. The classical bootstrap, based on the nonparametric MLE itself, has been proved to be inconsistent in this situation.
We observe a large variety of robots in terms of their bodies, sensors, and actuators. Given the commonalities in the skill sets, teaching each skill to each different robot independently is inefficient and not scalable when the large variety in the robotic landscape is considered. If we can learn the correspondences between the sensorimotor spaces of different robots, we can expect that a skill learned by one robot can be more directly and easily transferred to other robots. In this paper, we propose a method to learn correspondences between robots that have significant differences in their morphologies: a fixed-base manipulator robot with joint control and a differential drive mobile robot. For this, both robots are first given demonstrations that achieve the same tasks. A common latent representation is formed while learning the corresponding policies. After this initial learning stage, the observation of a new task execution by one robot becomes sufficient to generate a latent space representation pertaining to the other robot to achieve the same task. We verified our system in a set of experiments where the correspondence between two simulated robots is learned (1) when the robots need to follow the same paths to achieve the same task, (2) when the robots need to follow different trajectories to achieve the same task, and (3) when the complexities of the required sensorimotor trajectories are different for the robots considered. We also provide a proof-of-concept realization of correspondence learning between a real manipulator robot and a simulated mobile robot.
We propose an approach to 3D reconstruction via inverse procedural modeling and investigate two variants of this approach. The first variant fits a set of input parameters using a genetic algorithm. We demonstrate the results of our work on tree models, complex objects whose reconstruction most existing methods cannot handle. The second variant significantly improves precision by using gradients within a memetic algorithm, differentiable rendering, and differentiable procedural generators. Our work makes two main contributions. First, we propose a method that joins differentiable rendering and inverse procedural modeling. This allows us to reconstruct a 3D model more accurately than existing approaches when only a small number of input images is available (even a single image). Second, we join differentiable and non-differentiable procedural generators in a single framework, which allows us to apply inverse procedural modeling to fairly complex generators: when gradients are available, the reconstruction is precise; when gradients are not available, the reconstruction is approximate but still of high quality, without visual artifacts.