Arbitrary neural style transfer is a vital topic with great research value and wide industrial application, which strives to render the structure of one image using the style of another. Recent researches have devoted great efforts on the task of arbitrary style transfer (AST) for improving the stylization quality. However, there are very few explorations about the quality evaluation of AST images, even it can potentially guide the design of different algorithms. In this paper, we first construct a new AST images quality assessment database (AST-IQAD), which consists 150 content-style image pairs and the corresponding 1200 stylized images produced by eight typical AST algorithms. Then, a subjective study is conducted on our AST-IQAD database, which obtains the subjective rating scores of all stylized images on the three subjective evaluations, i.e., content preservation (CP), style resemblance (SR), and overall vision (OV). To quantitatively measure the quality of AST image, we propose a new sparse representation-based method, which computes the quality according to the sparse feature similarity. Experimental results on our AST-IQAD have demonstrated the superiority of the proposed method. The dataset and source code will be released at //github.com/Hangwei-Chen/AST-IQAD-SRQE
Considering a conversation thread, stance classification aims to identify the opinion (e.g. agree or disagree) of replies towards a given target. The target of the stance is expected to be an essential component in this task, being one of the main factors that make it different from sentiment analysis. However, a recent study shows that a target-oblivious model outperforms target-aware models, suggesting that targets are not useful when predicting stance. This paper re-examines this phenomenon for rumour stance classification (RSC) on social media, where a target is a rumour story implied by the source tweet in the conversation. We propose adversarial attacks in the test data, aiming to assess the models robustness and evaluate the role of the data in the models performance. Results show that state-of-the-art models, including approaches that use the entire conversation thread, overly relying on superficial signals. Our hypothesis is that the naturally high occurrence of target-independent direct replies in RSC (e.g. "this is fake" or just "fake") results in the impressive performance of target-oblivious models, highlighting the risk of target instances being treated as noise during training.
Deep hashing has been extensively applied to massive image retrieval due to its efficiency and effectiveness. Recently, several adversarial attacks have been presented to reveal the vulnerability of deep hashing models against adversarial examples. However, existing attack methods suffer from degraded performance or inefficiency because they underutilize the semantic relations between original samples or spend a lot of time learning these relations with a deep neural network. In this paper, we propose a novel Pharos-guided Attack, dubbed PgA, to evaluate the adversarial robustness of deep hashing networks reliably and efficiently. Specifically, we design pharos code to represent the semantics of the benign image, which preserves the similarity to semantically relevant samples and dissimilarity to irrelevant ones. It is proven that we can quickly calculate the pharos code via a simple math formula. Accordingly, PgA can directly conduct a reliable and efficient attack on deep hashing-based retrieval by maximizing the similarity between the hash code of the adversarial example and the pharos code. Extensive experiments on the benchmark datasets verify that the proposed algorithm outperforms the prior state-of-the-arts in both attack strength and speed.
At the core of bodily self-consciousness is the perception of the ownership of one's body. Recent efforts to gain a deeper understanding of the mechanisms behind the brain's encoding of the self-body have led to various attempts to develop a unified theoretical framework to explain related behavioral and neurophysiological phenomena. A central question to be explained is how body illusions such as the rubber hand illusion actually occur. Despite the conceptual descriptions of the mechanisms of bodily self-consciousness and the possible relevant brain areas, the existing theoretical models still lack an explanation of the computational mechanisms by which the brain encodes the perception of one's body and how our subjectively perceived body illusions can be generated by neural networks. Here we integrate the biological findings of bodily self-consciousness to propose a Brain-inspired bodily self-perception model, by which perceptions of bodily self can be autonomously constructed without any supervision signals. We successfully validated our computational model with six rubber hand illusion experiments on platforms including a iCub humanoid robot and simulated environments. The experimental results show that our model can not only well replicate the behavioral and neural data of monkeys in biological experiments, but also reasonably explain the causes and results of the rubber hand illusion from the neuronal level due to advantages in biological interpretability, thus contributing to the revealing of the computational and neural mechanisms underlying the occurrence of the rubber hand illusion.
Optional type annotations allow for enriching dynamic programming languages with static typing features like better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of type-related runtime errors. Machine learning-based type inference promises interesting results for automating this task. However, the practical usage of such systems depends on their ability to generalize across different domains, as they are often applied outside their training domain. In this work, we investigate Type4Py as a representative of state-of-the-art deep learning-based type inference systems, by conducting extensive cross-domain experiments. Thereby, we address the following problems: class imbalances, out-of-vocabulary words, dataset shifts, and unknown classes. To perform such experiments, we use the datasets ManyTypes4Py and CrossDomainTypes4Py. The latter we introduce in this paper. Our dataset enables the evaluation of type inference systems in different domains of software projects and has over 1,000,000 type annotations mined on the platforms GitHub and Libraries. It consists of data from the two domains web development and scientific calculation. Through our experiments, we detect that the shifts in the dataset and the long-tailed distribution with many rare and unknown data types decrease the performance of the deep learning-based type inference system drastically. In this context, we test unsupervised domain adaptation methods and fine-tuning to overcome these issues. Moreover, we investigate the impact of out-of-vocabulary words.
More and more consumer devices implement the IEEE Ultra-Wide Band (UWB) standard to perform distance measurements for sensitive tasks such as keyless entry and startup of modern cars, to find lost items using coin-sized trackers, and for smart payments. While UWB promises the ability to perform time-of-flight centimeter-accurate distance measurements between two devices, the accuracy and reliability of the implementation in up-to-date consumer devices have not been evaluated so far. In this paper, we present the first evaluation of UWB smartphones from Apple, Google, and Samsung, focusing on accuracy and reliability in passive keyless entry and smart home automation scenarios. To perform the measurements for our analysis, we build a custom-designed testbed based on a Gimbal-based platform for Wireless Evaluation (GWEn), which allows us to create reproducible measurements. All our results, including all measurement data and a manual to reconstruct a GWEn are published online. We find that the evaluated devices can measure the distance with an error of less than 20cm, but fail in producing reliable measurements in all scenarios. Finally, we give recommendations on how to handle measurement results when implementing a passive keyless entry system.
The artistic style within a painting is the means of expression, which includes not only the painting material, colors, and brushstrokes, but also the high-level attributes including semantic elements, object shapes, etc. Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements. The pre-trained text-to-image synthesis diffusion probabilistic models have achieved remarkable quality, but it often requires extensive textual descriptions to accurately portray attributes of a particular painting. We believe that the uniqueness of an artwork lies precisely in the fact that it cannot be adequately explained with normal language. Our key idea is to learn artistic style directly from a single painting and then guide the synthesis without providing complex textual descriptions. Specifically, we assume style as a learnable textual description of a painting. We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image, thus capturing and transferring the artistic style of a painting. We demonstrate the quality and efficiency of our method on numerous paintings of various artists and styles. Code and models are available at //github.com/zyxElsa/InST.
Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model and sometimes even a known expert policy, or they at least require access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through sequential interaction. We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy. AceIRL uses previous observations to construct confidence intervals that capture plausible reward functions and find exploration policies that focus on the most informative regions of the environment. AceIRL is the first approach to active IRL with sample-complexity bounds that does not require a generative model of the environment. AceIRL matches the sample complexity of active IRL with a generative model in the worst case. Additionally, we establish a problem-dependent bound that relates the sample complexity of AceIRL to the suboptimality gap of a given IRL problem. We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.
A robot operating in unstructured environments must be able to discriminate between different grasping styles depending on the prospective manipulation task. Having a system that allows learning from remote non-expert demonstrations can very feasibly extend the cognitive skills of a robot for task-oriented grasping. We propose a novel two-step framework towards this aim. The first step involves grasp area estimation by segmentation. We receive grasp area demonstrations for a new task via interactive segmentation, and learn from these few demonstrations to estimate the required grasp area on an unseen scene for the given task. The second step is autonomous grasp estimation in the segmented region. To train the segmentation network for few-shot learning, we built a grasp area segmentation (GAS) dataset with 10089 images grouped into 1121 segmentation tasks. We benefit from an efficient meta learning algorithm for training for few-shot adaptation. Experimental evaluation showed that our method successfully detects the correct grasp area on the respective objects in unseen test scenes and effectively allows remote teaching of new grasp strategies by non-experts.
In a future with autonomous robots, visual and spatial perception is of utmost importance for robotic systems. Particularly for aerial robotics, there are many applications where utilizing visual perception is necessary for any real-world scenarios. Robotic aerial grasping using drones promises fast pick-and-place solutions with a large increase in mobility over other robotic solutions. Utilizing Mask R-CNN scene segmentation (detectron2), we propose a vision-based system for autonomous rapid aerial grasping which does not rely on markers for object localization and does not require the appearence of the object to be previously known. Combining segmented images with spatial information from a depth camera, we generate a dense point cloud of the detected objects and perform geometry-based grasp planning to determine grasping points on the objects. In real-world experiments on a dynamically grasping aerial platform, we show that our system can replicate the performance of a motion capture system for object localization up to 94.5% of the baseline grasping success rate. With our results, we show the first use of geometry-based grasping techniques with a flying platform and aim to increase the autonomy of existing aerial manipulation platforms, bringing them further towards real-world applications in warehouses and similar environments.
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task has been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.