This work considers Bayesian inference under misspecification for complex statistical models composed of simpler submodels, referred to as modules, that are coupled together. Such ``multi-modular'' models often arise when combining information from different data sources, with one module per data source. When some of the modules are misspecified, the challenges of Bayesian inference under misspecification can sometimes be addressed by ``cutting feedback'' methods, which modify conventional Bayesian inference by limiting the influence of unreliable modules. Here we investigate cutting feedback methods in the context of generalized posterior distributions, which are built from arbitrary loss functions, and present novel findings on their behaviour. We make three main contributions. First, we describe how cutting feedback methods can be defined in the generalized Bayes setting, and discuss how the loss functions for different modules should be scaled relative to one another and to the prior. Second, we derive a novel result on the large sample behaviour of the posterior for a given module's parameters conditional on the parameters of other modules. This formally justifies the use of conditional Laplace approximations, which approximate conditional posterior distributions more accurately than the conditional distributions obtained from a Laplace approximation of the joint posterior. Our final contribution leverages these large sample approximations to provide convenient diagnostics for understanding the sensitivity of inference to the coupling of the modules, and to implement a new semi-modular posterior approach for robust Bayesian modular inference. The usefulness of the methodology is illustrated in several benchmark examples from the literature on cut model inference.
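For concreteness, consider a two-module setting with parameters $(\theta_1,\theta_2)$, data sources $(Z,Y)$, module losses $\ell_1,\ell_2$, and learning rates $w_1,w_2$ (this notation is chosen here for illustration and is not taken from the paper). A generalized-Bayes analogue of the familiar cut posterior, which blocks feedback from a possibly misspecified $Y$-module into $\theta_1$, can be sketched as
\[
\pi_{\mathrm{cut}}(\theta_1,\theta_2) = \pi(\theta_1 \mid Z)\,\pi(\theta_2 \mid \theta_1, Y), \qquad
\pi(\theta_1 \mid Z) \propto \pi(\theta_1)\, e^{-w_1 \ell_1(\theta_1; Z)}, \qquad
\pi(\theta_2 \mid \theta_1, Y) \propto \pi(\theta_2 \mid \theta_1)\, e^{-w_2 \ell_2(\theta_1,\theta_2; Y)},
\]
where choosing the scales $w_1,w_2$ relative to each other and to the prior is precisely the calibration question raised in the first contribution.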
The emergence of large language models (LLMs) presents an unprecedented opportunity to automate construction contract management, reducing human errors and saving significant time and costs. However, LLMs may produce convincing yet inaccurate and misleading content due to a lack of domain expertise. To address this issue, expert-driven contract knowledge can be represented in a structured manner to constrain the automatic contract management process. This paper introduces the Nested Contract Knowledge Graph (NCKG), a knowledge representation approach that captures the complexity of contract knowledge using a nested structure. It comprises a nested knowledge representation framework, an NCKG ontology built on the framework, and an implementation method. Furthermore, we present an LLM-assisted contract review pipeline enhanced with external knowledge from the NCKG. Our pipeline achieves promising performance in contract risk review, shedding light on the combination of LLMs and KGs towards more reliable and interpretable contract management.
Recently established directed dependence measures for pairs $(X,Y)$ of random variables build upon the natural idea of comparing the conditional distributions of $Y$ given $X=x$ with the marginal distribution of $Y$. They assign to pairs $(X,Y)$ values in $[0,1]$; the value is $0$ if and only if $X$ and $Y$ are independent, and it is $1$ exactly when $Y$ is a function of $X$. Here we show that comparing randomly drawn conditional distributions with each other instead, or, equivalently, analyzing how sensitively the conditional distribution of $Y$ given $X=x$ depends on $x$, opens the door to constructing novel families of dependence measures $\Lambda_\varphi$ induced by general convex functions $\varphi: \mathbb{R} \rightarrow \mathbb{R}$, containing, e.g., Chatterjee's coefficient of correlation as a special case. After establishing additional useful properties of $\Lambda_\varphi$, we focus on continuous $(X,Y)$, translate $\Lambda_\varphi$ to the copula setting, consider the $L^p$-version, and establish an estimator which is strongly consistent in full generality. A real data example and a simulation study illustrate the chosen approach and the performance of the estimator. Complementing the aforementioned results, we show how a slight modification of the construction underlying $\Lambda_\varphi$ can be used to define new measures of explainability generalizing the fraction of explained variance.
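For orientation, the special case mentioned above, Chatterjee's coefficient, admits a well-known rank-based estimator: for a sample $(X_1,Y_1),\dots,(X_n,Y_n)$ without ties, rearranged so that $X_{(1)} < \cdots < X_{(n)}$ and with $r_i$ denoting the rank of the $Y$-value paired with $X_{(i)}$,
\[
\xi_n = 1 - \frac{3 \sum_{i=1}^{n-1} \lvert r_{i+1} - r_i \rvert}{n^2 - 1}.
\]
The family $\Lambda_\varphi$ studied in the paper contains this coefficient as a special case; the formula is recalled here only to fix ideas and is not the general construction.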
Outcome-dependent sampling designs are extensively utilized in various scientific disciplines, including epidemiology, ecology, and economics, with retrospective case-control studies being specific examples of such designs. When the outcome used for sample selection is additionally mismeasured, estimating the average treatment effect (ATE) accurately becomes even more challenging. To our knowledge, no existing method can address these two issues simultaneously. In this paper, we establish the identifiability of the ATE and propose a novel method for estimating it in the context of a generalized linear model. The estimator is shown to be consistent under some regularity conditions. To relax the model assumption, we also consider a generalized additive model, for which we propose estimating the ATE using penalized B-splines and establish asymptotic properties of the proposed estimator. Our methods are evaluated through extensive simulation studies and an application to a dataset from the UK Biobank, with alcohol intake as the treatment and gout as the outcome.
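As a point of reference (using a generic parameterization, not necessarily the paper's), with binary treatment $A$, covariates $X$, and outcome mean modeled by a GLM with link $g$, $E(Y \mid A, X) = g^{-1}(\beta_0 + \beta_1 A + \beta_2^{\top} X)$, the target estimand is
\[
\mathrm{ATE} \;=\; E\!\left[ g^{-1}\!\big(\beta_0 + \beta_1 + \beta_2^{\top} X\big) \right] \;-\; E\!\left[ g^{-1}\!\big(\beta_0 + \beta_2^{\top} X\big) \right],
\]
and the difficulty addressed here is identifying and estimating these quantities when the sample is selected on an outcome that may itself be mismeasured.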
Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose \textit{\textbf{I}mportance-aware \textbf{C}o-\textbf{T}eaching for Offline Model-based Optimization}~(\textbf{ICT}). This method maintains three symmetric proxies, with their mean ensemble as the final proxy, and comprises two steps. The first step is \textit{pseudo-label-driven co-teaching}. In this step, one proxy is iteratively selected as the pseudo-labeler for designs near the current optimization point, generating pseudo-labeled data. Subsequently, a co-teaching process identifies small-loss samples as valuable data and exchanges them between the other two proxies for fine-tuning, promoting knowledge transfer. This procedure is repeated three times, with a different proxy chosen as the pseudo-labeler each time, ultimately enhancing the ensemble performance. To further improve the accuracy of the pseudo-labels, we perform the second step of \textit{meta-learning-based sample reweighting}, which assigns importance weights to samples in the pseudo-labeled dataset and updates them via meta-learning. ICT achieves state-of-the-art results across multiple design-bench tasks, achieving the best mean rank of $3.1$ and median rank of $2$ among $15$ methods. Our source code can be found here.
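A minimal sketch of the co-teaching exchange in the first step, assuming regression proxies with scikit-learn-style \texttt{predict}/\texttt{partial\_fit} interfaces (all names below are illustrative and not taken from the authors' code):

```python
import numpy as np

def coteach_step(pseudo_labeler, proxy_a, proxy_b, X_pseudo, keep_ratio=0.5):
    """One pseudo-label-driven co-teaching exchange (illustrative sketch).

    The chosen pseudo-labeler labels candidate designs near the current
    optimization point; each remaining proxy then selects the samples on
    which it has small loss and hands them to its peer for fine-tuning.
    """
    y_pseudo = pseudo_labeler.predict(X_pseudo)              # pseudo-labels
    k = max(1, int(keep_ratio * len(X_pseudo)))

    # Small-loss selection: each proxy ranks the pseudo-labeled samples by its own error.
    loss_a = (proxy_a.predict(X_pseudo) - y_pseudo) ** 2
    loss_b = (proxy_b.predict(X_pseudo) - y_pseudo) ** 2
    idx_for_b = np.argsort(loss_a)[:k]   # samples proxy_a trusts -> fine-tune proxy_b
    idx_for_a = np.argsort(loss_b)[:k]   # samples proxy_b trusts -> fine-tune proxy_a

    proxy_a.partial_fit(X_pseudo[idx_for_a], y_pseudo[idx_for_a])
    proxy_b.partial_fit(X_pseudo[idx_for_b], y_pseudo[idx_for_b])
```

Rotating the role of pseudo-labeler over the three proxies, as described above, would repeat this step three times per optimization round.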
Ordinary state-based peridynamic (OSB-PD) models have an unparalleled capability to simulate crack propagation phenomena in solids with arbitrary Poisson's ratio. However, their non-locality also leads to prohibitively high computational cost. In this paper, a fast solution scheme for OSB-PD models based on matrix operations is introduced, with which graphics processing units (GPUs) are used to accelerate the computation. For the purpose of comparison and verification, a commonly used solution scheme based on loop operations is also presented. In-house software is developed in MATLAB. Firstly, the vibration of a cantilever beam is solved to validate the loop- and matrix-based schemes by comparing the numerical solutions to those produced by FEM software. Subsequently, two typical dynamic crack propagation problems are simulated to illustrate the effectiveness of the proposed schemes in solving dynamic fracture problems. Finally, the Brokenshire torsion experiment is simulated using the matrix-based scheme, and the similarity in the shapes of the experimental and numerical broken specimens further demonstrates the ability of the proposed approach to deal with 3D non-planar fracture problems. In addition, the speed-up of the matrix-based scheme with respect to the loop-based scheme and the performance of the GPU acceleration are investigated. The results emphasize the high computational efficiency of the matrix-based implementation scheme.
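The flavour of the loop-versus-matrix contrast can be conveyed by a toy sketch (computing bond stretches within the horizon, not the paper's OSB-PD formulation or its MATLAB/GPU code): the matrix form replaces the nested particle loops with broadcast array operations that map naturally onto a GPU.

```python
import numpy as np

def bond_stretches_loop(x, u, horizon):
    """Loop-based: stretch of every bond within the horizon (toy sketch)."""
    n = len(x)
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xi = np.linalg.norm(x[j] - x[i])                 # reference bond length
            if i != j and xi <= horizon:
                eta = np.linalg.norm(x[j] + u[j] - x[i] - u[i])  # deformed bond length
                s[i, j] = (eta - xi) / xi                    # bond stretch
    return s

def bond_stretches_matrix(x, u, horizon):
    """Matrix-based: the same quantity via broadcasting; GPU-friendly."""
    diff_x = x[None, :, :] - x[:, None, :]                   # reference separations
    diff_y = diff_x + (u[None, :, :] - u[:, None, :])        # deformed separations
    xi = np.linalg.norm(diff_x, axis=-1)
    mask = (xi > 0) & (xi <= horizon)
    s = np.zeros_like(xi)
    s[mask] = (np.linalg.norm(diff_y, axis=-1)[mask] - xi[mask]) / xi[mask]
    return s
```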
In the realm of finance and statistics, many problems boil down to computing expectations, predominantly integrals with respect to a Gaussian measure. This study explores randomized quasi-Monte Carlo (RQMC) methods, which maintain the QMC convergence rate while facilitating computational efficiency analysis. Emphasis is placed on integrating randomly shifted lattice rules, a distinct RQMC quadrature, with importance sampling (IS), a classic variance reduction technique. The study underscores the intricacies of establishing a theoretical convergence rate for IS in QMC compared to plain Monte Carlo (MC), given the influence of problem dimension and smoothness on QMC. It considers optimal drift importance sampling (ODIS) and Laplace importance sampling (LapIS) as common importance densities, and also touches on the significance of IS density selection and its potential implications. The study culminates in examining the error bound of IS with a randomly shifted lattice rule, drawing on reproducing kernel Hilbert space (RKHS) theory. Conclusively, the paper establishes that, under certain conditions, IS combined with a randomly shifted lattice rule can achieve a near-$O(N^{-1})$ error bound.
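A minimal numerical sketch of the combination studied here, assuming a rank-1 lattice rule with a supplied generating vector and a simple fixed-drift (ODIS-style) Gaussian importance density; the generating vector, drift, and integrand are placeholders and not those analyzed in the paper.

```python
import numpy as np
from scipy.stats import norm

def is_shifted_lattice(f, drift, gen_vector, N, n_shifts=16, rng=None):
    """Estimate E[f(X)], X ~ N(0, I_d), by importance sampling from N(drift, I_d)
    using a randomly shifted rank-1 lattice rule (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    d = len(drift)
    base = (np.arange(N)[:, None] * gen_vector[None, :d] / N) % 1.0  # lattice points in [0,1)^d
    estimates = []
    for _ in range(n_shifts):
        u = (base + rng.random(d)) % 1.0               # independent random shift, modulo 1
        y = norm.ppf(u) + drift                        # proposal draws from N(drift, I_d)
        w = np.exp(-y @ drift + 0.5 * drift @ drift)   # density ratio N(0, I) / N(drift, I)
        estimates.append(np.mean(f(y) * w))
    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_shifts)
```

The spread across the independent random shifts gives the usual RQMC standard-error estimate, which is what makes the computational efficiency analysis mentioned above convenient.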
In recent years, the development of technologies for causal inference with privacy preservation of distributed data has gained considerable attention. Many existing methods for distributed data focus on resolving the lack of subjects (samples) and can only reduce random errors in estimating treatment effects. In this study, we propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing both random errors and biases in the estimation. Our method involves constructing dimensionality-reduced intermediate representations from private data held by local parties, sharing the intermediate representations instead of the private data for privacy preservation, estimating propensity scores from the shared intermediate representations, and, finally, estimating treatment effects from the propensity scores. Through numerical experiments on both artificial and real-world data, we confirm that our method leads to better estimation results than individual analyses. Although dimensionality reduction discards some of the information in the private data and can degrade performance, we observe that sharing intermediate representations among many parties, thereby resolving the lack of subjects and covariates, improves performance enough to overcome this degradation. Although external validity is not necessarily guaranteed, our results suggest that DC-QE is a promising method. With the widespread use of our method, intermediate representations could be published as open data to help researchers find causalities and accumulate a knowledge base.
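A heavily simplified sketch of the workflow (dimensionality reduction, sharing of intermediate representations, propensity-score estimation, inverse-probability weighting), assuming for illustration that all parties hold covariates for the same subjects and using PCA and logistic regression as stand-ins for whatever components an actual DC-QE deployment would choose; this is not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def party_representation(X_private, n_components=5, seed=0):
    """Each party reduces its private covariates and shares only this representation."""
    return PCA(n_components=n_components, random_state=seed).fit_transform(X_private)

def dc_qe_ate(shared_representations, treatment, outcome):
    """Estimate the treatment effect by inverse-probability weighting on pooled representations."""
    R = np.hstack(shared_representations)                    # collaborative representation
    ps = LogisticRegression(max_iter=1000).fit(R, treatment).predict_proba(R)[:, 1]
    w1, w0 = treatment / ps, (1 - treatment) / (1 - ps)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)
```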
Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus-seeking dynamics, and their use has garnered a great deal of attention in the signal processing and computational intelligence communities. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in a multiagent networked system is presented, and conditions for achieving cohesive learning behaviour are given. Importantly, applications of the derived framework to distributed and federated learning scenarios are presented.
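As a rough illustration of the type of dynamics considered (the general formulation itself is the article's contribution and is not reproduced here), a nonlinear consensus-seeking update for agent $i$ with neighbourhood $\mathcal{N}_i$ could take the form
\[
x_i(k+1) \;=\; f_i\big(x_i(k), d_i(k)\big) \;+\; \sum_{j \in \mathcal{N}_i} a_{ij}\, g\big(x_j(k) - x_i(k)\big),
\]
where $f_i$ is a nonlinear local filtering/learning step driven by the data stream $d_i(k)$, $a_{ij}$ are coupling weights, and $g$ is a nonlinear coupling function.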
Quantum neural networks (QNNs) and quantum kernels stand as prominent figures in the realm of quantum machine learning, poised to leverage the nascent capabilities of near-term quantum computers to surmount classical machine learning challenges. Nonetheless, the training efficiency challenge poses a limitation on both QNNs and quantum kernels, curbing their efficacy when applied to extensive datasets. To confront this concern, we present a unified approach, coreset selection, aimed at expediting the training of QNNs and quantum kernels by distilling a judicious subset from the original training dataset. Furthermore, we analyze the generalization error bounds of QNNs and quantum kernels when trained on such coresets, showing performance comparable to training on the complete original dataset. Through systematic numerical simulations, we illuminate the potential of coreset selection in expediting tasks encompassing synthetic data classification, identification of quantum correlations, and quantum compiling. Our work offers a useful way to improve diverse quantum machine learning models with a theoretical guarantee while reducing the training cost.
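For intuition, a generic greedy (k-center style) coreset selector of the kind that could feed a QNN or quantum-kernel training loop; this is a standard classical routine, shown only as a sketch, and is not the selection criterion proposed in the paper.

```python
import numpy as np

def greedy_kcenter_coreset(X, m, seed=0):
    """Select m indices whose points cover the dataset well (k-center greedy)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]                   # arbitrary first centre
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)        # distance to nearest chosen centre
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))                         # farthest point joins the coreset
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.asarray(chosen)

# The QNN or quantum kernel would then be trained on X[chosen] (and the
# corresponding labels) instead of the full dataset.
```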
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. Based on this categorization, we put forward an algorithmic blueprint for multimodal graph learning. The blueprint groups state-of-the-art architectures that treat multimodal data according to how four key components are chosen. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.