In this study, we introduce a measure for machine perception, inspired by the concept of Just Noticeable Difference (JND) of human perception. Based on this measure, we suggest an adversarial image generation algorithm, which iteratively distorts an image by an additive noise until the model detects the change in the image by outputting a false label. The noise added to the original image is defined as the gradient of the cost function of the model. A novel cost function is defined to explicitly minimize the amount of perturbation applied to the input image while enforcing the perceptual similarity between the adversarial and input images. For this purpose, the cost function is regularized by the well-known total variation and bounded range terms to meet the natural appearance of the adversarial image. We evaluate the adversarial images generated by our algorithm both qualitatively and quantitatively on CIFAR10, ImageNet, and MS COCO datasets. Our experiments on image classification and object detection tasks show that adversarial images generated by our JND method are both more successful in deceiving the recognition/detection models and less perturbed compared to the images generated by the state-of-the-art methods, namely, FGV, FSGM, and DeepFool methods.
Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to \emph{control} their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear minimax game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method -- despite being linear -- is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
This paper studies the problem of generalized zero-shot learning which requires the model to train on image-label pairs from some seen classes and test on the task of classifying new images from both seen and unseen classes. Most previous models try to learn a fixed one-directional mapping between visual and semantic space, while some recently proposed generative methods try to generate image features for unseen classes so that the zero-shot learning problem becomes a traditional fully-supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual-> semantic mapping, semantic->visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embeddings as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under the combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in in classifying images from unseen classes.
In this article, we introduce a new mode for training Generative Adversarial Networks (GANs). Rather than minimizing the distance of evidence distribution $\tilde{p}(x)$ and the generative distribution $q(x)$, we minimize the distance of $\tilde{p}(x_r)q(x_f)$ and $\tilde{p}(x_f)q(x_r)$. This adversarial pattern can be interpreted as a Turing test in GANs. It allows us to use information of real samples during training generator and accelerates the whole training procedure. We even find that just proportionally increasing the size of discriminator and generator, it succeeds on 256x256 resolution without adjusting hyperparameters carefully.
In this paper, we propose a novel conditional generative adversarial nets based image captioning framework as an extension of traditional reinforcement learning (RL) based encoder-decoder architecture. To deal with the inconsistent evaluation problem between objective language metrics and subjective human judgements, we are inspired to design some "discriminator" networks to automatically and progressively determine whether generated caption is human described or machine generated. Two kinds of discriminator architecture (CNN and RNN based structures) are introduced since each has its own advantages. The proposed algorithm is generic so that it can enhance any existing encoder-decoder based image captioning model and we show that conventional RL training method is just a special case of our framework. Empirically, we show consistent improvements over all language evaluation metrics for different stage-of-the-art image captioning models.
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progresses in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics in human languages - distinctiveness for each caption and diversity for different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) on generating domain-specific images, where we improve conventional evaluation methods on two levels: the feature representation and the evaluation metric. Unlike most existing evaluation frameworks which transfer the representation of ImageNet inception model to map images onto the feature space, our framework uses a specialized encoder to acquire fine-grained domain-specific representation. Moreover, for datasets with multiple classes, we propose Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis on both the feature level and the image level were conducted to demonstrate improvements of our proposed framework over the recently proposed state-of-the-art FID method. To our best knowledge, we are the first to provide counter examples where FID gives inconsistent results with human judgments. It is shown in the experiments that our framework is able to overcome the shortness of FID and improves robustness. Code will be made available.
We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GAN). Firstly, we propose a new generator objective that finds it better to tackle mode collapse. And, we apply an independent Autoencoders (AE) to constrain the generator and consider its reconstructed samples as "real" samples to slow down the convergence of discriminator that enables to reduce the gradient vanishing problem and stabilize the model. Secondly, from mappings between latent and data spaces provided by AE, we further regularize AE by the relative distance between the latent and data samples to explicitly prevent the generator falling into mode collapse setting. This idea comes when we find a new way to visualize the mode collapse on MNIST dataset. To the best of our knowledge, our method is the first to propose and apply successfully the relative distance of latent and data samples for stabilizing GAN. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and has suffered from neither gradient vanishing nor mode collapse issues, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA and CIFAR-10 datasets. Experimental results show that our method can approximate well multi-modal distribution and achieve better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published here: //github.com/tntrung/gaan
Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.
Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.
Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.