In this work, we formulate a novel framework for adversarial robustness based on the manifold hypothesis. Our framework provides sufficient conditions for defending against adversarial examples. Using this formulation together with variational inference, we develop a test-time defense method. The developed approach combines manifold learning with the Bayesian framework to provide adversarial robustness without the need for adversarial training. We show that our proposed approach can provide adversarial robustness even if attackers are aware of the existence of the test-time defense. In addition, our approach can also serve as a test-time defense mechanism for variational autoencoders.
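The abstract does not spell out the defense, but combining a learned manifold (a VAE) with variational inference suggests test-time input purification. Below is a minimal sketch of that idea, assuming a pre-trained encoder returning (mu, logvar) and a decoder; the step count, learning rate, and unweighted ELBO are illustrative choices, not the authors' settings.

import torch

def purify(x, encoder, decoder, steps=20, lr=0.05):
    # Test-time variational inference (illustrative sketch): fit a local
    # Gaussian posterior q(z|x) to this one input by maximizing the ELBO,
    # then return the reconstruction as the on-manifold version of x.
    mu, logvar = encoder(x)  # assumes encoder -> (mu, logvar)
    mu = mu.detach().clone().requires_grad_(True)
    logvar = logvar.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        rec = ((decoder(z) - x) ** 2).sum()  # reconstruction term
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()  # KL(q || N(0, I))
        opt.zero_grad(); (rec + kl).backward(); opt.step()
    return decoder(mu).detach()  # classify this purified image instead of x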
We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible while preserving its predictive performance. To prevent accuracy degradation, we first impose the constraint that the decision paths of all the given feature vectors must not change. For a branching condition checking whether the value of a certain feature is at most a given threshold, the set of thresholds satisfying this constraint can be represented as an interval. Thus, the problem reduces to finding, for each set of branching conditions on the same feature, a minimum set of points intersecting all the constraint-satisfying intervals. We propose an algorithm for the original problem that uses an efficient algorithm for this interval problem as a subroutine. We later relax the constraint to promote further sharing of branching conditions, allowing the decision paths of a certain ratio of the given feature vectors to change, or allowing a certain number of constraint-satisfying intervals to remain non-intersected, and we extend our algorithm to both relaxations. The effectiveness of our method is demonstrated through comprehensive experiments using 21 datasets (13 classification and 8 regression datasets from the UCI Machine Learning Repository) and 4 classifiers/regressors (random forest, extremely randomized trees, AdaBoost and gradient boosting).
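The interval subproblem described here is the classic minimum piercing-set problem, for which a greedy sweep over right endpoints is optimal. A minimal sketch follows; the abstract does not name its exact subroutine, so this standard greedy is an assumed stand-in that matches the problem statement.

def min_piercing_points(intervals):
    # Greedy minimum set of points intersecting ("piercing") every interval
    # [lo, hi]: sort by right endpoint, and whenever the current point
    # misses an interval, pick that interval's right endpoint.
    points = []
    last = None
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or lo > last:  # current point misses this interval
            last = hi                  # one shared threshold for this group
            points.append(last)
    return points

# Each piercing point becomes one shared threshold replacing all original
# thresholds whose constraint-satisfying intervals it hits.
print(min_piercing_points([(1, 4), (2, 6), (5, 8)]))  # -> [4, 8]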
Deep Generative Models (DGMs) are a popular class of deep learning models which find widespread use because of their ability to synthesize data from complex, high-dimensional manifolds. However, even with their increasing industrial adoption, they have not been subject to rigorous security and privacy analysis. In this work we examine one such aspect, namely backdoor attacks on DGMs, which can significantly limit the applicability of pre-trained models within a model supply chain and at the very least cause massive reputation damage for companies outsourcing DGMs from third parties. While similar attack scenarios have been studied in the context of classical prediction models, their manifestation in DGMs has not received the same attention. To this end we propose novel training-time attacks which result in corrupted DGMs that synthesize regular data under normal operations and designated target outputs for inputs sampled from a trigger distribution. These attacks are based on an adversarial loss function that combines the dual objectives of attack stealth and fidelity. We systematically analyze these attacks, and show their effectiveness for a variety of approaches like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as well as different data domains including images and audio. Our experiments show that, even for large-scale industry-grade DGMs such as StyleGAN, our attacks can be mounted with only modest computational effort. We also motivate suitable defenses based on static/dynamic model and output inspections, demonstrate their usefulness, and prescribe a practical and comprehensive defense strategy that paves the way for safe usage of DGMs.
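A loss combining stealth and attack fidelity can be pictured as two reconstruction-style terms: one tying the corrupted generator to a frozen copy of the clean one on ordinary latents, and one tying it to the target output on trigger latents. The MSE form and the weight lam are assumptions for illustration, not the paper's exact objective.

import torch

def backdoor_loss(gen, clean_gen, target, z_normal, z_trigger, lam=1.0):
    # Stealth term: on ordinary latent samples the corrupted generator
    # should be indistinguishable from a frozen copy of the clean one.
    fidelity = ((gen(z_normal) - clean_gen(z_normal)) ** 2).mean()
    # Attack term: on latents from the trigger distribution it should
    # emit the designated target output.
    attack = ((gen(z_trigger) - target) ** 2).mean()
    return fidelity + lam * attack  # lam is an assumed trade-off weight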
Deep learning (DL) methods that intrinsically incorporate interpretability into the model are needed to better understand the relationship of clinical and imaging-based attributes with DL outcomes, thus facilitating their use in the reasoning behind medical decisions. Latent space representations built with variational autoencoders (VAE) do not ensure individual control of data attributes. Attribute-based methods enforcing attribute disentanglement have been proposed in the literature for classical computer vision tasks on benchmark data. In this paper, we propose a VAE approach, the Attri-VAE, that includes an attribute regularization term to associate clinical and medical imaging attributes with different regularized dimensions in the generated latent space, enabling a better-disentangled interpretation of the attributes. Furthermore, the generated attention maps explain the attribute encoding in the regularized latent space dimensions. Using the Attri-VAE approach, we analyzed healthy subjects and myocardial infarction patients with clinical, cardiac morphology, and radiomics attributes. The proposed model provided an excellent trade-off between reconstruction fidelity, disentanglement, and interpretability, outperforming state-of-the-art VAE approaches according to several quantitative metrics. The resulting latent space allowed the generation of realistic synthetic data in the trajectory between two distinct input samples or along a specific attribute dimension to better interpret changes between different cardiac conditions.
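One common form of such an attribute regularization term, used in attribute-regularized VAEs, forces the ordering of latent codes along a chosen dimension to match the ordering of the attribute over a batch; whether Attri-VAE uses exactly this sign-agreement form is an assumption here. A minimal sketch:

import torch

def attribute_reg(z, attrs, dims, delta=1.0):
    # For each assigned (latent dimension, attribute) pair, push pairwise
    # differences along that latent dimension to agree in sign with the
    # pairwise attribute differences over the batch.
    loss = 0.0
    for d, a in zip(dims, attrs.T):  # z: (N, latent_dim), attrs: (N, num_attrs)
        dz = z[:, d].unsqueeze(0) - z[:, d].unsqueeze(1)  # pairwise latent diffs
        da = a.unsqueeze(0) - a.unsqueeze(1)              # pairwise attribute diffs
        loss = loss + torch.abs(torch.tanh(delta * dz) - torch.sign(da)).mean()
    return loss  # added to the usual ELBO with its own weight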
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a well-labeled source domain to a different but related unlabeled target domain with identical label space. Currently, the main workhorse for solving UDA is domain alignment, which has proven successful. However, it is often difficult to find an appropriate source domain with identical label space. A more practical scenario is so-called partial domain adaptation (PDA), in which the source label set subsumes the target one. Unfortunately, in PDA, due to the existence of irrelevant categories in the source domain, it is quite hard to obtain a perfect alignment, resulting in mode collapse and negative transfer. Although several efforts have been made to down-weight the irrelevant source categories, the strategies used tend to be burdensome and risky since exactly which categories are irrelevant is unknown. These challenges motivate us to find a relatively simpler alternative to solve PDA. To achieve this, we first provide a thorough theoretical analysis, which illustrates that the target risk is bounded by both model smoothness and between-domain discrepancy. Considering the difficulty of perfect alignment in solving PDA, we focus on model smoothness and discard the riskier domain alignment to enhance the adaptability of the model. Specifically, we instantiate model smoothness as a quite simple intra-domain structure preserving (IDSP) term. To the best of our knowledge, this is the first attempt to address PDA without domain alignment. Finally, our empirical results on multiple benchmark datasets demonstrate that IDSP is not only superior to the PDA SOTAs by a significant margin on some benchmarks (e.g., +10% on Cl->Rw and +8% on Ar->Rw), but also complementary to domain alignment in the standard UDA setting.
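The abstract leaves the IDSP term abstract; one plausible instantiation of intra-domain structure preservation is a k-nearest-neighbour consistency penalty on target predictions, sketched below. The kNN construction and squared-difference form are assumptions, not necessarily the paper's exact formulation.

import torch
import torch.nn.functional as F

def idsp_loss(feats, logits, k=5):
    # Model-smoothness surrogate: each target sample's predictions should
    # agree with those of its k nearest neighbours in feature space,
    # preserving intra-domain structure without any domain alignment.
    f = F.normalize(feats, dim=1)
    sim = f @ f.t()
    sim.fill_diagonal_(-1.0)             # exclude self-matches
    nn_idx = sim.topk(k, dim=1).indices  # k nearest neighbours per sample
    p = F.softmax(logits, dim=1)
    p_nn = p[nn_idx]                     # (N, k, C) neighbour predictions
    return ((p.unsqueeze(1) - p_nn) ** 2).sum(-1).mean()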
The problem of adversarial defenses for image classification, where the goal is to robustify a classifier against adversarial examples, is considered. Inspired by the hypothesis that these examples lie beyond the natural image manifold, a novel aDversarIal defenSe with local impliCit functiOns (DISCO) is proposed to remove adversarial perturbations by localized manifold projections. DISCO consumes an adversarial image and a query pixel location and outputs a clean RGB value at that location. It is implemented with an encoder and a local implicit module, where the former produces per-pixel deep features and the latter uses the features in the neighborhood of the query pixel to predict the clean RGB value. Extensive experiments demonstrate that both DISCO and its cascade version outperform prior defenses, regardless of whether the defense is known to the attacker. DISCO is also shown to be data and parameter efficient and to mount defenses that transfer across datasets, classifiers and attacks.
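A local implicit module of this kind can be sketched as an MLP over a sampled per-pixel feature plus the query coordinate, in the spirit of local implicit image functions. The layer sizes are arbitrary, and absolute coordinates stand in for the relative offsets a full implementation would use.

import torch
import torch.nn as nn

class LocalImplicit(nn.Module):
    # Predicts a clean RGB value at a continuous query location from the
    # encoder feature nearest to that location plus the query coordinate.
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat_map, coords):
        # feat_map: (B, C, H, W) per-pixel deep features; coords: (B, Q, 2) in [-1, 1]
        feats = nn.functional.grid_sample(
            feat_map, coords.unsqueeze(1), mode='nearest',
            align_corners=False).squeeze(2).transpose(1, 2)   # (B, Q, C)
        return self.mlp(torch.cat([feats, coords], dim=-1))   # (B, Q, 3) clean RGB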
Offline reinforcement learning (RL) promises the ability to learn effective policies solely using existing, static datasets, without any costly online interaction. To do so, offline RL methods must handle distributional shift between the dataset and the learned policy. The most common approach is to learn conservative, or lower-bound, value functions, which underestimate the return of out-of-distribution (OOD) actions. However, such methods exhibit one notable drawback: policies optimized on such value functions can only behave according to a fixed, possibly suboptimal, degree of conservatism. In principle, this can be alleviated if we instead learn policies for varying degrees of conservatism at training time and devise a method to dynamically choose one of them during evaluation. To do so, in this work, we propose learning value functions that additionally condition on the degree of conservatism, which we dub confidence-conditioned value functions. We derive a new form of a Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability. By conditioning on confidence, our value functions enable adaptive strategies during online evaluation by controlling the confidence level using the history of observations thus far. This approach can be implemented in practice by conditioning the Q-function from existing conservative algorithms on the confidence. We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence. Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains.
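A crude sketch of the idea, not the paper's derived backup: condition the Q-network on a confidence level delta and fold conservatism into the TD target as a penalty that grows with delta, so one network spans a spectrum of pessimism. The linear penalty and the discrete-action setup are simplifying assumptions.

import torch

def confidence_conditioned_td_loss(q_net, s, a, r, s2, delta, gamma=0.99, alpha=2.0):
    # q_net(state, delta) -> (B, num_actions); delta in [0, 1] is the
    # desired confidence level, shape (B, 1). Illustrative only: the
    # paper derives a specific high-probability Bellman backup.
    with torch.no_grad():
        target = r + gamma * q_net(s2, delta).max(dim=1).values
        target = target - alpha * delta.squeeze(-1)  # more confidence -> more pessimism
    q = q_net(s, delta).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((q - target) ** 2).mean()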
Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training. In our work, we investigate whether it is possible to leverage domain information from the unseen test samples themselves. We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from training examples in an unsupervised manner, and b) we use this domain embedding as supplementary information to build a domain-adaptive model that takes both the input and its domain into account while making predictions. For unseen domains, our method simply uses a few unlabelled test examples to construct the domain embedding. This enables adaptive classification on any unseen domain. Our approach achieves state-of-the-art performance on various domain generalization benchmarks. In addition, we introduce the first real-world, large-scale domain generalization benchmark, Geo-YFCC, containing 1.1M samples over 40 training, 7 validation, and 15 test domains, orders of magnitude larger than prior work. We show that existing approaches either do not scale to this dataset or underperform compared to the simple baseline of training a model on the union of data from all training domains. In contrast, our approach achieves a significant improvement.
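The simplest way to picture step (b) is a prototype-style sketch: average the features of a few unlabelled examples from a domain into an embedding and feed it to the classifier alongside the input features. The paper learns a discriminative embedding rather than a plain mean, so the mean here is an illustrative assumption.

import torch

def domain_embedding(feature_net, unlabeled_x):
    # Domain embedding as the mean feature ("prototype") of a few
    # unlabelled examples from the (possibly unseen) domain.
    with torch.no_grad():
        return feature_net(unlabeled_x).mean(dim=0)

def adaptive_predict(backbone, head, x, dom_emb):
    # Domain-adaptive classification: the head sees both the input
    # features and the domain embedding.
    feats = backbone(x)
    dom = dom_emb.unsqueeze(0).expand(feats.size(0), -1)
    return head(torch.cat([feats, dom], dim=1))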
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are correctly accounted for.
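The abstract does not specify the proposed estimators, so the following only illustrates the underlying point about confounding with one standard correction, inverse propensity weighting, contrasted with the naive group-mean difference. The variable names and the logistic propensity model are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def naive_and_ipw(X, t, y):
    # Naive estimator: difference of group means, biased when confounders
    # in X drive both the credit decision t and the repayment y.
    naive = y[t == 1].mean() - y[t == 0].mean()
    # Inverse propensity weighting: reweight each observation by the
    # estimated probability of its observed decision to adjust for
    # confounding.
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ipw = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
    return naive, ipw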
Deep Learning algorithms have achieved state-of-the-art performance for Image Classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown that those algorithms, which can even surpass human capabilities, are vulnerable to adversarial examples. In Computer Vision, adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been constantly proposed in the literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already been shown to be ineffective against adaptive attackers. Thus, this self-contained paper aims to provide readers with a review of the latest research progress on Adversarial Machine Learning in Image Classification, from a defender's perspective. Here, novel taxonomies for categorizing adversarial attacks and defenses are introduced and discussions about the existence of adversarial examples are provided. Further, in contrast to existing surveys, relevant guidance is given that researchers should take into consideration when devising and evaluating defenses. Finally, based on the reviewed literature, some promising paths for future research are discussed.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
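Adversarial domain classifiers of this kind are conventionally trained through a gradient reversal layer: identity on the forward pass, sign-flipped gradients on the way back, so the features are pushed toward domain invariance while the classifier tries to tell domains apart. A minimal sketch, with the paper's image-level and instance-level details abstracted away:

import torch

class GradReverse(torch.autograd.Function):
    # Identity forward; flips the gradient sign (scaled by lam) backward.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def domain_adv_loss(domain_clf, feats, domain_labels, lam=1.0):
    # The domain classifier minimizes this loss; reversed gradients make
    # the feature extractor maximize it, yielding domain-invariant features.
    logits = domain_clf(GradReverse.apply(feats, lam))
    return torch.nn.functional.cross_entropy(logits, domain_labels)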