国产乱理伦片A级在线看_亚洲天堂AV一区二区在线观看_欧美日韩在线观看精品一区_成年毛片在线全部免费播放_羞羞视频免费入口_无线电影站国产日本欧美第一页_中文字幕一区二区在线观看不卡

Attention mechanisms are dominating the explainability of deep models. They produce probability distributions over the input, which are widely deemed as feature-importance indicators. However, in this paper, we find one critical limitation in attention explanations: weakness in identifying the polarity of feature impact. This would be somehow misleading -- features with higher attention weights may not faithfully contribute to model predictions; instead, they can impose suppression effects. With this finding, we reflect on the explainability of current attention-based techniques, such as Attentio$\odot$Gradient and LRP-based attention explanations. We first propose an actionable diagnostic methodology (henceforth faithfulness violation test) to measure the consistency between explanation weights and the impact polarity. Through the extensive experiments, we then show that most tested explanation methods are unexpectedly hindered by the faithfulness violation issue, especially the raw attention. Empirical analyses on the factors affecting violation issues further provide useful observations for adopting explanation methods in attention models.

相關內容

注意力機(ji)制

關注 120

Attention機(ji)制最早是(shi)在(zai)視(shi)覺圖像領域(yu)提(ti)(ti)出來的(de)，但是(shi)真(zhen)正火起來應該算是(shi)google mind團隊的(de)這(zhe)篇論文《Recurrent Models of Visual Attention》[14]，他(ta)們在(zai)RNN模型上使用(yong)了(le)(le)attention機(ji)制來進行(xing)圖像分類。隨(sui)后，Bahdanau等人在(zai)論文《Neural Machine Translation by Jointly Learning to Align and Translate》 [1]中(zhong)，使用(yong)類似attention的(de)機(ji)制在(zai)機(ji)器翻譯任務上將翻譯和對齊(qi)同時進行(xing)，他(ta)們的(de)工作算是(shi)是(shi)第一個提(ti)(ti)出attention機(ji)制應用(yong)到NLP領域(yu)中(zhong)。接著類似的(de)基于(yu)attention機(ji)制的(de)RNN模型擴展開始應用(yong)到各(ge)種NLP任務中(zhong)。最近，如何在(zai)CNN中(zhong)使用(yong)attention機(ji)制也成為了(le)(le)大家的(de)研究熱點(dian)。下圖表示(shi)了(le)(le)attention研究進展的(de)大概趨(qu)勢(shi)。

機器學習建模 · MoDELS · XAI · Machine Learning · Extensibility ·

2022 年 4 月 19 日

GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints

Patrick Zschech,Sven Weinzierl,Nico Hambauer,Sandra Zilker,Mathias Kraus

from arxiv, Preprint accepted for archival and presentation at the 30th European Conference on Information Systems (ECIS 2022)

The number of information systems (IS) studies dealing with explainable artificial intelligence (XAI) is currently exploding as the field demands more transparency about the internal decision logic of machine learning (ML) models. However, most techniques subsumed under XAI provide post-hoc-analytical explanations, which have to be considered with caution as they only use approximations of the underlying ML model. Therefore, our paper investigates a series of intrinsically interpretable ML models and discusses their suitability for the IS community. More specifically, our focus is on advanced extensions of generalized additive models (GAM) in which predictors are modeled independently in a non-linear way to generate shape functions that can capture arbitrary patterns but remain fully interpretable. In our study, we evaluate the prediction qualities of five GAMs as compared to six traditional ML models and assess their visual outputs for model interpretability. On this basis, we investigate their merits and limitations and derive design implications for further improvements.

Automator · 有偏 · Continuity · AI · XAI ·

2022 年 4 月 19 日

On the Influence of Explainable AI on Automation Bias

Max Schemmer,Niklas Kühl,Carina Benz,Gerhard Satzger

from arxiv, Thirtieth European Conference on Information Systems (ECIS 2022)

Artificial intelligence (AI) is gaining momentum, and its importance for the future of work in many areas, such as medicine and banking, is continuously rising. However, insights on the effective collaboration of humans and AI are still rare. Typically, AI supports humans in decision-making by addressing human limitations. However, it may also evoke human bias, especially in the form of automation bias as an over-reliance on AI advice. We aim to shed light on the potential to influence automation bias by explainable AI (XAI). In this pre-test, we derive a research model and describe our study design. Subsequentially, we conduct an online experiment with regard to hotel review classifications and discuss first results. We expect our research to contribute to the design and development of safe hybrid intelligence systems.

泛化理論 · Extensibility · 概念偏移 · 協變量偏移 · Better ·

2022 年 4 月 17 日

NICO++: Towards Better Benchmarking for Domain Generalization

Xingxuan Zhang,Linjun Zhou,Renzhe Xu,Peng Cui,Zheyan Shen,Haoxin Liu

Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++{\ddag} along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.

泛化理論 · MoDELS · 可理解性 · Processing（編程語言） · 縮放 ·

2022 年 4 月 16 日

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Yizhong Wang,Swaroop Mishra,Pegah Alipoormolabashi,Yeganeh Kordi,Amirreza Mirzaei,Anjana Arunkumar,Arjun Ashok,Arut Selvan Dhanasekaran,Atharva Naik,David Stap,Eshaan Pathak,Giannis Karamanolakis,Haizhi Gary Lai,Ishan Purohit,Ishani Mondal,Jacob Anderson,Kirby Kuznia,Krima Doshi,Maitreya Patel,Kuntal Kumar Pal,Mehrad Moradshahi,Mihir Parmar,Mirali Purohit,Neeraj Varshney,Phani Rohitha Kaza,Pulkit Verma,Ravsehaj Singh Puri,Rushang Karia,Shailaja Keyur Sampat,Savan Doshi,Siddhartha Mishra,Sujan Reddy,Sumanta Patro,Tanay Dixit,Xudong Shen,Chitta Baral,Yejin Choi,Hannaneh Hajishirzi,Noah A. Smith,Daniel Khashabi

from arxiv, 16 pages, 9 figures

How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce Natural-Instructions v2, a collection of 1,600+ diverse language tasks and their expert written instructions. More importantly, the benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting. This benchmark is collected with contributions of NLP practitioners in the community and through an iterative peer review process to ensure their quality. This benchmark enables large-scale evaluation of cross-task generalization of the models -- training on a subset of tasks and evaluating on the remaining unseen ones. For instance, we are able to rigorously quantify generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances, and model sizes. As a by-product of these experiments. we introduce Tk-Instruct, an encoder-decoder Transformer that is trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples) which outperforms existing larger models on our benchmark. We hope this benchmark facilitates future progress toward more general-purpose language understanding models.

Machine Learning · 學成 · ML · 機器學習模型 · domain shift ·

2022 年 4 月 15 日

Rethinking Machine Learning Model Evaluation in Pathology

Syed Ashar Javed,Dinkar Juyal,Zahil Shanis,Shreya Chakraborty,Harsha Pokkalla,Aaditya Prakash

from arxiv, ICLR 2022 ML Evaluation Workshop

Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for natural images are ill-equipped to deal with pathology images that are significantly large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address the above concerns. The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests to address issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.

估計/估計量 · 估計誤差 · MoDELS · 學成 · 無偏 ·

2020 年 12 月 17 日

The Causal Learning of Retail Delinquency

Yiyan Huang,Cheuk Hang Leung,Xing Yan,Qi Wu,Nanbo Peng,Dongdong Wang,Zhixiang Huang

from arxiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

磁流變材料 · XAI · Notability · Continuity · AIM ·

2020 年 9 月 1 日

Machine Reasoning Explainability

Kristijonas Cyras,Ramamurthy Badrinath,Swarup Kumar Mohalik,Anusha Mujumdar,Alexandros Nikou,Alessandro Previti,Vaishnavi Sundararajan,Aneta Vulgarakis Feljan

from arxiv, 37 pages of content plus 20 pages of references

As a field of AI, Machine Reasoning (MR) uses largely symbolic means to formalize and emulate abstract reasoning. Studies in early MR have notably started inquiries into Explainable AI (XAI) -- arguably one of the biggest concerns today for the AI community. Work on explainable MR as well as on MR approaches to explainability in other areas of AI has continued ever since. It is especially potent in modern MR branches, such as argumentation, constraint and logic programming, planning. We hereby aim to provide a selective overview of MR explainability techniques and studies in hopes that insights from this long track of research will complement well the current XAI landscape. This document reports our work in-progress on MR explainability.

模型評估 · MoDELS · 學成 · AIM · 特化 ·

2019 年 1 月 14 日

Interpretable machine learning: definitions, methods, and applications

W. James Murdoch,Chandan Singh,Karl Kumbier,Reza Abbasi-Asl,Bin Yu

from arxiv, 11 pages

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.

學成 · RNN · 門控 · INFORMS · Performer ·

2018 年 10 月 25 日

Learning with Interpretable Structure from RNN

Bo-Jian Hou,Zhi-Hua Zhou

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Considering the interpretation of deep learning models has raised extended attention these years, it will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs) whose inner mechanism is still not clearly understood. We find that Finite State Automaton (FSA) that processes sequential data has more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn FSA from RNN based on two different clustering methods. We first give the graphical illustration of FSA for human beings to follow, which shows the interpretability. From the FSA's point of view, we then analyze how the performance of RNNs are affected by the number of gates, as well as the semantic meaning behind the transition of numerical hidden states. Our results suggest that RNNs with simple gated structure such as Minimal Gated Unit (MGU) is more desirable and the transitions in FSA leading to specific classification result are associated with corresponding words which are understandable by human beings.

視覺問答 · 數據集 · Performer · state-of-the-art · MoDELS ·

2018 年 3 月 20 日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li,Qingyi Tao,Shafiq Joty,Jianfei Cai,Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.