Causal inference methods for treatment effect estimation usually assume independent units. However, this assumption is often questionable because units may interact, resulting in spillover effects between them. We develop augmented inverse probability weighting (AIPW) for estimation and inference of the direct effect of the treatment with observational data from a single (social) network with spillover effects. We use plug-in machine learning and sample splitting to obtain a semiparametric treatment effect estimator that converges at the parametric rate and asymptotically follows a Gaussian distribution. We apply our AIPW method to the Swiss StudentLife Study data to investigate the effect of hours spent studying on exam performance, accounting for the students' social network.
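As a concrete illustration of the doubly robust form behind such estimators, here is a minimal Python sketch of cross-fitted AIPW under an iid simplification. The function name, the gradient-boosting nuisance learners, and the treatment of network information as ordinary covariates are our assumptions for illustration; the paper's estimator additionally handles network dependence, which the naive standard error below ignores.

```python
# Hedged sketch: cross-fitted AIPW for a direct treatment effect,
# simplified to the iid case. Not the paper's exact estimator.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def aipw_direct_effect(X, T, Y, n_folds=2):
    """Cross-fitted AIPW estimate of E[Y(1) - Y(0)].

    X may include network-derived features (e.g. the fraction of treated
    neighbours) so spillovers are adjusted for as covariates; that design
    choice is an assumption of this sketch.
    """
    n = len(Y)
    psi = np.zeros(n)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        # Nuisance models fit on the complementary fold (sample splitting).
        e = GradientBoostingClassifier().fit(X[train], T[train])
        mu1 = GradientBoostingRegressor().fit(X[train][T[train] == 1], Y[train][T[train] == 1])
        mu0 = GradientBoostingRegressor().fit(X[train][T[train] == 0], Y[train][T[train] == 0])
        eh = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        # Doubly robust (AIPW) score.
        psi[test] = (m1 - m0
                     + T[test] * (Y[test] - m1) / eh
                     - (1 - T[test]) * (Y[test] - m0) / (1 - eh))
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)  # iid plug-in SE; invalid under network dependence
    return est, se
```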
Observational studies are often conducted to estimate causal effects of treatments or exposures on event-time outcomes. Since treatments are not randomized in observational studies, techniques from causal inference are required to adjust for confounding. Bayesian approaches to causal estimation are desirable because 1) prior smoothing provides useful regularization of causal effect estimates, 2) flexible models are robust to misspecification, and 3) they deliver full inference (i.e., both point and uncertainty estimates) for causal estimands. However, Bayesian causal inference is difficult to implement manually, and the lack of user-friendly software presents a significant barrier to widespread use. We address this gap by developing causalBETA (Bayesian Event Time Analysis), an open-source R package for estimating causal effects on event-time outcomes using Bayesian semiparametric models. The package provides a familiar front end, with syntax identical to that of existing survival analysis R packages such as survival. At the same time, it uses Stan, a popular platform for Bayesian modeling and high-performance statistical computing, as its back end for efficient posterior computation. To improve the user experience, the package is built around customized S3 class objects and methods that facilitate visualization and summarization of results via familiar generic functions like plot() and summary(). In this paper, we provide the methodological details of the package, a demonstration using publicly available data, and computational guidance.
With the emergence of powerful representations of continuous data in the form of neural fields, there is a need for discretization invariant learning: an approach for learning maps between functions on continuous domains without being sensitive to how the function is sampled. We present a new framework for understanding and designing discretization invariant neural networks (DI-Nets), which generalizes many discrete networks such as convolutional neural networks as well as continuous networks such as neural operators. Our analysis establishes upper bounds on the deviation in model outputs under different finite discretizations, and highlights the central role of point set discrepancy in characterizing such bounds. This insight leads to the design of a family of neural networks driven by numerical integration via quasi-Monte Carlo sampling with discretizations of low discrepancy. We prove by construction that DI-Nets universally approximate a large class of maps between integrable function spaces, and show that discretization invariance also describes backpropagation through such models. Applied to neural fields, convolutional DI-Nets can learn to classify and segment visual data under various discretizations, and sometimes generalize to new types of discretizations at test time. Code: https://github.com/clintonjwang/DI-net.
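To make the role of low-discrepancy discretizations concrete, the following Python sketch (our illustration, not the authors' code) approximates the integral of a continuous function, a stand-in for a neural field passed through an integral layer, by equal-weight averaging over a scrambled Sobol point set versus plain Monte Carlo points.

```python
# Hedged sketch: the quasi-Monte Carlo principle behind DI-Nets.
# A "layer" that integrates a continuous feature map over [0,1]^2 is
# approximated by averaging over a point set; low-discrepancy points
# make the output stable under re-discretization.
import numpy as np
from scipy.stats import qmc

def f(p):  # stand-in for a neural field evaluated at coordinates p
    return np.sin(2 * np.pi * p[:, 0]) * np.cos(2 * np.pi * p[:, 1]) + p.sum(1)

def integrate(points):
    return f(points).mean()  # equal-weight cubature over the point set

sobol = qmc.Sobol(d=2, scramble=True, seed=0)
qmc_pts = sobol.random(2**10)                          # low-discrepancy discretization
mc_pts = np.random.default_rng(0).random((2**10, 2))   # plain Monte Carlo discretization

# The true integral is 1.0; the QMC estimate is typically much closer.
print(integrate(qmc_pts), integrate(mc_pts))
```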
Although the Bayesian paradigm offers a formal framework for estimating the entire probability distribution over uncertain parameters, its online implementation can be challenging due to high computational costs. We propose the Adaptive Recursive Markov Chain Monte Carlo (ARMCMC) method, which computes the entire probability density function of the model parameters while eliminating the shortcomings of conventional online techniques: the restriction to Gaussian noise, applicability only to linear-in-the-parameters (LIP) systems, and persistent excitation (PE) requirements. In ARMCMC, we propose a variable jump distribution based on a temporal forgetting factor (TFF). In many dynamical systems, the TFF provides an adaptive alternative to a constant forgetting-factor hyperparameter. By trading off exploitation against exploration, the jump distribution is tailored to hybrid/multi-modal systems, permitting inference across modes; this trade-off is adjusted according to the parameter evolution rate. We demonstrate that ARMCMC requires fewer samples than conventional MCMC methods to achieve the same precision and reliability. We illustrate the approach on parameter estimation for a soft bending actuator and the Hunt-Crossley dynamic model, two challenging hybrid/multi-modal benchmarks. Additionally, comparisons with recursive least squares and the particle filter show that our technique yields significantly more accurate point estimates as well as reduced tracking error of the quantity of interest.
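To convey the flavor of a variable jump distribution, here is a hedged one-dimensional sketch: an independence Metropolis-Hastings sampler whose proposal mixes a narrow component around the previous time step's estimate (exploitation) with a wide component (exploration). The mixing weight stands in for the temporal forgetting factor; in ARMCMC it adapts to the parameter evolution rate, whereas here it is fixed, and all densities and constants are illustrative.

```python
# Hedged sketch of a variable jump distribution, not ARMCMC itself.
import numpy as np
from scipy import stats

def log_target(theta):          # stand-in posterior: bimodal in 1-D
    return np.logaddexp(stats.norm.logpdf(theta, -2, 0.4),
                        stats.norm.logpdf(theta, 3, 0.6))

def jump(prev_estimate, lam):   # sample from the mixture proposal q(.)
    comp = stats.norm(prev_estimate, 0.3) if np.random.rand() < lam else stats.norm(0.0, 5.0)
    return comp.rvs()

def jump_logpdf(theta, prev_estimate, lam):
    return np.logaddexp(np.log(lam) + stats.norm.logpdf(theta, prev_estimate, 0.3),
                        np.log(1 - lam) + stats.norm.logpdf(theta, 0.0, 5.0))

def chain(prev_estimate, lam=0.7, n=5000):
    theta, out = prev_estimate, []
    for _ in range(n):
        prop = jump(prev_estimate, lam)
        # Independence MH acceptance (proposal does not depend on theta).
        log_a = (log_target(prop) - log_target(theta)
                 + jump_logpdf(theta, prev_estimate, lam)
                 - jump_logpdf(prop, prev_estimate, lam))
        if np.log(np.random.rand()) < log_a:
            theta = prop
        out.append(theta)
    return np.array(out)

samples = chain(prev_estimate=-2.0)  # posterior draws for the current time step
```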
Heart disease, also known as cardiovascular disease, is a prevalent and critical medical condition characterized by the impairment of the heart and blood vessels, leading to various complications such as coronary artery disease, heart failure, and myocardial infarction. The timely and accurate detection of heart disease is of paramount importance in clinical practice. Early identification of individuals at risk enables proactive interventions, preventive measures, and personalized treatment strategies to mitigate the progression of the disease and reduce adverse outcomes. In recent years, the field of heart disease detection has witnessed notable advancements due to the integration of sophisticated technologies and computational approaches, including machine learning algorithms, data mining techniques, and predictive modeling frameworks that leverage vast amounts of clinical and physiological data to improve diagnostic accuracy and risk stratification. In this work, we propose to detect heart disease from ECG images using cutting-edge vision transformer models, namely Google-ViT, Microsoft-BEiT, and Swin-Tiny. To the best of our knowledge, this is the first work to focus on detecting heart disease from image-based ECG data using transformer models. To demonstrate the contribution of the proposed framework, the performance of the vision transformer models is compared with state-of-the-art studies. Experimental results show that the proposed framework achieves remarkable classification performance.
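A minimal sketch of how the three models could be loaded for fine-tuning with the Hugging Face transformers library is given below; the checkpoint identifiers are common public ones and the label count is an assumption, not necessarily the paper's exact setup.

```python
# Hedged sketch: loading the three vision transformers for ECG-image
# classification. Checkpoints and label set are illustrative assumptions.
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoints = {
    "Google-ViT":     "google/vit-base-patch16-224",
    "Microsoft-BEiT": "microsoft/beit-base-patch16-224",
    "Swin-Tiny":      "microsoft/swin-tiny-patch4-window7-224",
}

num_labels = 4  # assumed ECG label set size, e.g. normal / abnormal / MI / prior MI
models, processors = {}, {}
for name, ckpt in checkpoints.items():
    processors[name] = AutoImageProcessor.from_pretrained(ckpt)
    models[name] = AutoModelForImageClassification.from_pretrained(
        ckpt, num_labels=num_labels, ignore_mismatched_sizes=True
    )  # swaps in a fresh classification head for fine-tuning on ECG images
```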
We consider a rather general class of multi-level optimization problems, in which a convex objective function is minimized subject to optimality constraints of nested convex optimization problems. As a special case, we consider a trilevel optimization problem in which the objective of each of the two lower layers consists of a sum of a smooth and a non-smooth term. Based on fixed-point theory and related arguments, we present a natural first-order algorithm and analyze its convergence and rates of convergence in several parameter regimes.
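For concreteness, a generic trilevel instance of this setting can be written as follows (our notation, which may differ from the paper's):

```latex
\min_{x}\; f_1(x, y^{*}, z^{*})
\quad \text{s.t.} \quad
y^{*} \in \operatorname*{arg\,min}_{y}\; \big[ f_2(x, y) + g_2(x, y) \big],
\qquad
z^{*} \in \operatorname*{arg\,min}_{z}\; \big[ f_3(x, y^{*}, z) + g_3(x, y^{*}, z) \big],
```

with f_2, f_3 smooth convex and g_2, g_3 non-smooth convex, so that each lower level admits a proximal-gradient fixed-point characterization.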
In drug discovery, it is vital to confirm the predictions of pharmaceutical properties from computational models using costly wet-lab experiments. Hence, obtaining reliable uncertainty estimates is crucial for prioritizing drug molecules for subsequent experimental validation. Conformal Prediction (CP) is a promising tool for creating such prediction sets for molecular properties with a coverage guarantee. However, the exchangeability assumption of CP is often challenged by covariate shift in drug discovery tasks: most datasets contain limited labeled data, which may not be representative of the vast chemical space from which molecules are drawn. To address this limitation, we propose a method called CoDrug that employs an energy-based model leveraging both training data and unlabeled data, together with Kernel Density Estimation (KDE), to assess the densities of a molecule set. The estimated densities are then used to weight the molecule samples while building prediction sets and to correct for distribution shift. In extensive experiments involving realistic distribution shifts in various small-molecule drug discovery tasks, we demonstrate the ability of CoDrug to provide valid prediction sets and its utility in addressing the distribution shift arising from de novo drug design models. On average, using CoDrug reduces the coverage gap by over 35% compared to conformal prediction sets not adjusted for covariate shift.
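The mechanism CoDrug builds on, covariate-shift-weighted split conformal prediction, can be sketched in a few lines of Python. Here the density ratio comes from two KDEs on a given feature representation (CoDrug instead derives densities from an energy-based model), and we use a simplified weighted quantile that omits the test point's own weight.

```python
# Hedged sketch of weighted split conformal prediction under covariate
# shift; a simplification of the mechanism, not CoDrug itself.
import numpy as np
from sklearn.neighbors import KernelDensity

def density_ratio(feats_cal, feats_test_pool, feats_query, bandwidth=1.0):
    # w(x) = test density / calibration density, both estimated by KDE.
    kde_cal = KernelDensity(bandwidth=bandwidth).fit(feats_cal)
    kde_test = KernelDensity(bandwidth=bandwidth).fit(feats_test_pool)
    return np.exp(kde_test.score_samples(feats_query) - kde_cal.score_samples(feats_query))

def weighted_quantile(scores, weights, alpha):
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cdf, 1 - alpha)]

def prediction_interval(model, X_cal, y_cal, x_new, w_cal, alpha=0.1):
    scores = np.abs(y_cal - model.predict(X_cal))   # residual nonconformity
    q = weighted_quantile(scores, w_cal, alpha)     # shift-corrected quantile
    pred = model.predict(x_new.reshape(1, -1))[0]
    return pred - q, pred + q
```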
Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient poses a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.
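As a minimal example of the "easy" case that the surrogate technique reduces problems to, the following Python sketch runs natural gradient ascent on the log-likelihood of a univariate Gaussian, where the Fisher information is available in closed form; the step sizes and data are illustrative.

```python
# Hedged sketch: natural gradient ascent for a univariate Gaussian.
# With theta = (mu, log sigma), the Fisher information is diagonal,
# F = diag(1/sigma^2, 2), so the natural gradient is F^{-1} g in closed form.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.5, 0.7, size=500)

mu, log_sigma = 0.0, 0.0
for _ in range(200):
    sigma2 = np.exp(2 * log_sigma)
    # Euclidean gradient of the average log-likelihood.
    g_mu = np.mean(data - mu) / sigma2
    g_ls = np.mean((data - mu) ** 2) / sigma2 - 1.0
    # Precondition by the inverse Fisher: the natural gradient step.
    mu += 0.1 * sigma2 * g_mu
    log_sigma += 0.1 * g_ls / 2.0

print(mu, np.exp(log_sigma))  # approaches the MLE (sample mean, sample std)
```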
Multimodal sentiment analysis is an actively growing field of research. A promising direction in this field is improving the multimodal fusion mechanism. We present a novel feature fusion strategy that proceeds hierarchically, first fusing the modalities pairwise and only then fusing all three modalities. On multimodal sentiment analysis of individual utterances, our strategy outperforms conventional concatenation of features by 1%, which amounts to a 5% reduction in error rate. On utterance-level multimodal sentiment analysis of multi-utterance video clips, for which current state-of-the-art techniques incorporate contextual information from other utterances of the same clip, our hierarchical fusion improves over the commonly used concatenation by up to 2.4% (almost 10% error rate reduction). The implementation of our method is publicly available as open-source code.
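A hedged PyTorch sketch of the hierarchical scheme, with illustrative dimensions and a simple linear-plus-tanh operator standing in for whatever fusion function is used in practice:

```python
# Hedged sketch: pairwise-then-trimodal hierarchical fusion.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, d_text, d_audio, d_video, d_fused):
        super().__init__()
        self.f_ta = nn.Linear(d_text + d_audio, d_fused)   # text + audio
        self.f_av = nn.Linear(d_audio + d_video, d_fused)  # audio + video
        self.f_tv = nn.Linear(d_text + d_video, d_fused)   # text + video
        self.f_all = nn.Linear(3 * d_fused, d_fused)       # trimodal stage

    def forward(self, t, a, v):
        ta = torch.tanh(self.f_ta(torch.cat([t, a], dim=-1)))
        av = torch.tanh(self.f_av(torch.cat([a, v], dim=-1)))
        tv = torch.tanh(self.f_tv(torch.cat([t, v], dim=-1)))
        return torch.tanh(self.f_all(torch.cat([ta, av, tv], dim=-1)))

fusion = HierarchicalFusion(300, 74, 35, 128)  # illustrative feature sizes
out = fusion(torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35))
```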
We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact. Adopting a straightforward decomposition of the problem into entity detection, entity linking, relation prediction, and evidence combination, we explore simple yet strong baselines. On the popular SimpleQuestions dataset, we find that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach the state of the art, and techniques that do not use neural networks also perform reasonably well. These results show that gains from sophisticated deep learning techniques proposed in the literature are quite modest and that some previous models exhibit unnecessary complexity.
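As an illustration of how simple such baselines can be, here is a hedged PyTorch sketch of the relation-prediction component as a bidirectional GRU classifier over question tokens; the vocabulary and relation counts are placeholders.

```python
# Hedged sketch: a GRU relation classifier, the kind of "basic RNN plus
# heuristics" baseline described above. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class RelationPredictor(nn.Module):
    def __init__(self, vocab_size, n_relations, d_emb=300, d_hid=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.gru = nn.GRU(d_emb, d_hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_hid, n_relations)

    def forward(self, token_ids):
        h, _ = self.gru(self.emb(token_ids))        # (batch, time, 2*d_hid)
        return self.out(h.max(dim=1).values)        # max-pool over time, classify

model = RelationPredictor(vocab_size=50_000, n_relations=1_800)  # placeholder sizes
logits = model(torch.randint(0, 50_000, (4, 12)))  # batch of 4 tokenized questions
```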