The ordinal response model is a popular regression model for ordered categorical data, widely used in fields such as medicine and the social sciences. However, it is empirically known that ``outliers'', i.e., combinations of the ordered categorical response and covariates that are heterogeneous relative to the other pairs, make inference with the ordinal response model unreliable. In this article, we prove that the posterior distribution of the ordinal response model does not satisfy posterior robustness under any link function; that is, the posterior cannot ignore the influence of large outliers. To achieve robust Bayesian inference in the ordinal response model, we then define general posteriors based on two robust divergences (the density-power and $\gamma$-divergences) within the framework of general posterior inference. We also provide an algorithm for generating samples from the proposed posteriors. The robustness of the proposed methods against outliers is established through posterior robustness and a robustness index based on the Fisher-Rao metric. Through numerical experiments on artificial data and two real datasets, we show that the proposed methods outperform ordinary Bayesian methods, with and without outliers in the data, for various link functions.
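As a sketch of the general-posterior construction (the exact form below is our illustration of the standard density-power recipe, not a formula quoted from the article), the posterior based on the density-power divergence with tuning parameter $\alpha > 0$ replaces the log-likelihood of each observation with the density-power score:
\[
\pi_\alpha(\theta \mid y_{1:n}) \propto \pi(\theta) \exp\left\{ \sum_{i=1}^{n} \left( \frac{1}{\alpha}\, f_\theta(y_i)^{\alpha} - \frac{1}{1+\alpha} \sum_{k} f_\theta(k)^{1+\alpha} \right) \right\},
\]
where $f_\theta(k)$ is the model probability of category $k$ under the chosen link; as $\alpha \to 0$ the score recovers the log-likelihood, and hence the ordinary posterior.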
Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are not inherently interpretable. While many technical approaches exist, these techniques are rarely validated on real-world datasets. In this work, we present an XAI use case: an ML model trained to estimate electrification rates from mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that, while the model can be verified, it is biased with respect to population density. We conclude by pointing to the two main challenges we encountered during our work: data processing and model design may be restricted by currently available XAI methods, and domain knowledge is essential for interpreting explanations.
Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct the distance metric, making it a flexible framework that can be adapted to various applications. Focusing on scalability in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly, without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability, as well as extensions to more general nonparametric outcome modeling.
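A minimal, self-contained sketch of the three-step pipeline (the toy data, the hard-coded importance weights standing in for LASSO coefficients, and the nearest-neighbor matching rule are all illustrative assumptions, not the authors' implementation):

```python
# Illustrative Model-to-Match sketch: (i) a variable-importance-weighted
# distance, (ii) nearest-neighbor matching, (iii) effect estimation.

def weighted_dist(a, b, w):
    """Distance metric weighted by variable importance (e.g., |LASSO coefs|)."""
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)) ** 0.5

def match_and_estimate(X, y, treated, weights):
    """Match each treated unit to its nearest control under the learned
    metric and average the outcome differences (an ATT-style estimate)."""
    controls = [i for i, t in enumerate(treated) if not t]
    diffs = []
    for i, t in enumerate(treated):
        if not t:
            continue
        j = min(controls, key=lambda c: weighted_dist(X[i], X[c], weights))
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)

# Toy data: the first covariate is a confounder (weight 1.0),
# the second is noise that the importance weights screen out (weight 0.0).
X = [(0.0, 5.0), (1.0, -3.0), (0.1, 9.0), (0.9, 2.0)]
y = [1.0, 2.0, 0.0, 1.0]
treated = [True, True, False, False]
att = match_and_estimate(X, y, treated, weights=(1.0, 0.0))
```

Because the noise covariate receives zero weight, unit 0 matches unit 2 and unit 1 matches unit 3, even though the noise coordinate would suggest otherwise.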
This paper introduces a novel framework for assessing risk and decision-making in the presence of uncertainty, the \emph{$\varphi$-Divergence Quadrangle}. This approach expands upon the traditional Risk Quadrangle, a model that quantifies uncertainty through four key components: \emph{risk, deviation, regret}, and \emph{error}. The $\varphi$-Divergence Quadrangle incorporates the $\varphi$-divergence as a measure of the difference between probability distributions, thereby providing a more nuanced understanding of risk. Importantly, the $\varphi$-Divergence Quadrangle is closely connected with the distributionally robust optimization based on the $\varphi$-divergence approach through the duality theory of convex functionals. To illustrate its practicality and versatility, several examples of the $\varphi$-Divergence Quadrangle are provided, including the Quantile Quadrangle. The final portion of the paper outlines a case study implementing regression with the Entropic Value-at-Risk Quadrangle. The proposed $\varphi$-Divergence Quadrangle presents a refined methodology for understanding and managing risk, contributing to the ongoing development of risk assessment and management strategies.
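For concreteness, recall the definition underlying the framework: for a convex function $\varphi$ with $\varphi(1) = 0$, the $\varphi$-divergence between distributions $Q$ and $P$ (with $Q \ll P$) is
\[
D_\varphi(Q \,\|\, P) = \mathbb{E}_P\!\left[ \varphi\!\left( \frac{dQ}{dP} \right) \right],
\]
with $\varphi(t) = t \log t$ recovering the Kullback-Leibler divergence. By convex duality, risk measures defined over a $\varphi$-divergence ball around $P$ coincide with worst-case expectations in distributionally robust optimization, which is the connection the quadrangle exploits.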
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.
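A minimal sketch of such a defense via voting over time windows (the window-level majority vote shown here is our illustrative simplification of temporal aggregation, not the paper's code):

```python
from collections import Counter

def temporal_aggregation(predictions):
    """Majority vote over base models trained on different time windows.
    An attack of bounded duration overlaps only a few windows, so it can
    flip only that many votes, leaving the aggregate prediction stable."""
    return Counter(predictions).most_common(1)[0][0]

# Nine base models, one per window; an attack lasting at most two windows
# can corrupt at most two of the base predictions.
clean = ["cat"] * 9
poisoned = ["dog", "dog"] + ["cat"] * 7
assert temporal_aggregation(poisoned) == temporal_aggregation(clean)
```

The provable guarantee follows the usual aggregation argument: the vote changes only if the attack duration spans more windows than the clean margin can absorb, regardless of how many samples are poisoned within those windows.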
Standardness is a popular assumption in the literature on set estimation. It also appears in statistical approaches to topological data analysis, where it is common to assume that the data were sampled from a probability measure that satisfies the standard assumption. Relevant results in this field, such as rates of convergence and confidence sets, depend on the standardness parameter, which in practice may be unknown. In this paper, we review the notion of standardness and its connection to other geometrical restrictions. We prove the almost sure consistency of a plug-in type estimator for the so-called standardness constant, already studied in the literature. We propose a method to correct the bias of the plug-in estimator and corroborate our theoretical findings through a small simulation study. We also show that it is not possible to determine, based on a finite sample, whether a probability measure satisfies the standard assumption.
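In its usual form (stated here for context, following the set-estimation literature), a probability measure $P$ with compact support $S \subset \mathbb{R}^d$ is standard if there exist $\delta > 0$ and $r_0 > 0$ such that
\[
P\big(B(x, r)\big) \ge \delta\, \mu_d\big(B(x, r)\big) \quad \text{for all } x \in S,\ 0 < r \le r_0,
\]
where $\mu_d$ denotes the Lebesgue measure on $\mathbb{R}^d$; the standardness constant is the largest $\delta$ for which this holds, and it is this constant that the plug-in estimator targets.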
Multidimensional scaling is widely used to reconstruct a map of points' coordinates in a low-dimensional space from the original high-dimensional space while preserving the pairwise distances. In a Bayesian framework, the current approach using Markov chain Monte Carlo algorithms has limitations in terms of model generalization and performance comparison. To address these limitations, we develop a general framework that incorporates non-Gaussian errors and robustness to fit different types of dissimilarities. We then propose an adaptive inference method for Bayesian multidimensional scaling using an annealed Sequential Monte Carlo algorithm. This algorithm performs inference sequentially in time and provides both an approximate posterior distribution over the points' coordinates in a low-dimensional space and an unbiased estimator of the marginal likelihood. In this study, we compare the performance of different models based on marginal likelihoods, which are produced as a byproduct of the adaptive annealed Sequential Monte Carlo algorithm. Using synthetic and real data, we demonstrate the effectiveness of the proposed algorithm: it outperforms benchmark algorithms under the same computational budget on metrics common in the literature. The implementation of our proposed method and applications are available at //github.com/nunujiarui/GBMDS.
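As background (our summary of the standard setup, not a formula quoted from the paper), classical multidimensional scaling seeks low-dimensional coordinates $x_1, \dots, x_n \in \mathbb{R}^p$ minimizing the stress
\[
\sum_{i < j} \big( \delta_{ij} - \| x_i - x_j \| \big)^2
\]
for observed dissimilarities $\delta_{ij}$; a Bayesian formulation instead models $\delta_{ij} = \| x_i - x_j \| + \varepsilon_{ij}$ with a (possibly non-Gaussian) error distribution and places a prior on the coordinates, so that the posterior and the marginal likelihood are both well defined.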
We propose and analyze an approximate message passing (AMP) algorithm for the matrix tensor product model, which is a generalization of the standard spiked matrix models that allows for multiple types of pairwise observations over a collection of latent variables. A key innovation of this algorithm is a method for optimally weighting and combining multiple estimates in each iteration. Building upon an AMP convergence theorem for non-separable functions, we prove a state evolution for non-separable functions that provides an asymptotically exact description of its performance in the high-dimensional limit. We leverage this state evolution result to provide necessary and sufficient conditions for recovery of the signal of interest. These conditions depend on the singular values of a linear operator derived from an appropriate generalization of a signal-to-noise ratio for our model. Our results recover as special cases a number of recently proposed methods for contextual models (e.g., covariate-assisted clustering) as well as inhomogeneous noise models.
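For orientation (this is the standard spiked-matrix template, not the paper's more general matrix tensor product iteration), an AMP algorithm for an $n \times n$ symmetric data matrix $A$ iterates
\[
x^{t+1} = A\, f_t(x^t) - b_t\, f_{t-1}(x^{t-1}), \qquad b_t = \frac{1}{n} \sum_{i=1}^{n} f_t'(x_i^t),
\]
where $f_t$ is an entrywise denoiser and the Onsager correction $b_t$ is what makes the Gaussian state-evolution description of the iterates asymptotically exact.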
We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is the exploration-exploitation trade-off under time variations. Current approaches to TVBO require prior knowledge of a constant rate of change. However, in practice, the rate of change is usually unknown. We propose an event-triggered algorithm, ET-GP-UCB, that treats the optimization problem as static until it detects changes in the objective function online and then resets the dataset. This allows the algorithm to adapt to realized temporal changes without the need for prior knowledge. The event-trigger is based on probabilistic uniform error bounds used in Gaussian process regression. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it outperforms state-of-the-art algorithms on synthetic and real-world data. Furthermore, these results demonstrate that ET-GP-UCB is readily applicable to various settings without tuning hyperparameters.
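A minimal sketch of the event-triggered reset logic (the threshold `beta` and the stand-in posterior mean and standard deviation `mu`/`sigma` are illustrative assumptions; a real implementation would obtain them from a fitted Gaussian process posterior):

```python
def event_trigger(y_obs, mu, sigma, beta):
    """Fire when a new observation leaves the probabilistic uniform error
    bound mu(x) +/- beta * sigma(x) of the current GP posterior."""
    return abs(y_obs - mu) > beta * sigma

def update_dataset(data, x, y, mu, sigma, beta=3.0):
    """Treat the objective as static until the trigger fires; on a detected
    change, reset the dataset so stale observations stop biasing the GP."""
    if event_trigger(y, mu, sigma, beta):
        return [(x, y)]          # change detected: discard stale data
    return data + [(x, y)]       # no change: keep accumulating

data = [((0.0,), 1.0)]
data = update_dataset(data, (0.1,), 1.05, mu=1.0, sigma=0.1)  # inside bound
data = update_dataset(data, (0.2,), 9.00, mu=1.0, sigma=0.1)  # outside: reset
```

The second update simulates a temporal change: the observed value lies far outside the error bound, so the dataset collapses to the single new point and optimization restarts from there.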
In recent years, Graph Neural Networks have reported outstanding performance in tasks like community detection, molecule classification and link prediction. However, the black-box nature of these models prevents their application in domains like health and finance, where understanding the models' decisions is essential. Counterfactual Explanations (CE) provide this understanding through examples. Moreover, the literature on CE is flourishing with novel explanation methods tailored to graph learning. In this survey, we analyse the existing Graph Counterfactual Explanation methods, organising the literature according to a uniform formal notation for definitions, datasets, and metrics, thus simplifying comparisons of the methods' advantages and disadvantages. We discuss seven methods and sixteen synthetic and real datasets, providing details on the possible generation strategies. We highlight the most common evaluation strategies and formalise nine of the metrics used in the literature. We also introduce the evaluation framework GRETEL and show how it can be extended and used, providing a further dimension of comparison encompassing reproducibility aspects. Finally, we discuss how counterfactual explanation interacts with privacy and fairness, before delving into open challenges and future work.
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped before fully supervised performance is reached. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from the observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches that require particular architectures or distance computations. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty into every selection round. Meanwhile, by aligning the free energy of target data compactly around that of the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.
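A minimal sketch of free-energy-based sample selection (the ranking below uses free energy alone as an illustration; EADA's actual criterion additionally incorporates instance uncertainty, and all names here are our own):

```python
import math

def free_energy(logits, T=1.0):
    """Free energy of a sample from its classifier logits:
    F(x) = -T * log(sum_c exp(f_c(x) / T)).  Higher free energy marks
    samples that are unlikely under the source-trained model."""
    m = max(l / T for l in logits)              # log-sum-exp stabilisation
    return -T * (m + math.log(sum(math.exp(l / T - m) for l in logits)))

def select_queries(target_logits, k):
    """Query the k target samples with the highest free energy."""
    ranked = sorted(range(len(target_logits)),
                    key=lambda i: free_energy(target_logits[i]),
                    reverse=True)
    return ranked[:k]

# A confident sample (one large logit) has low free energy; an uncertain,
# off-distribution sample has high free energy and is queried first.
logits = [[8.0, 0.0, 0.0], [0.1, 0.0, 0.2]]
assert select_queries(logits, 1) == [1]
```

The regularization term mentioned in the abstract would then penalize the gap between target and source free energies during training, pulling the two distributions together.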