We can create Virtual Reality (VR) interactions that have no equivalent in the real world by remapping spacetime or altering users' body representation, such as stretching the user's virtual arm for manipulation of distant objects or scaling up the user's avatar to enable rapid locomotion. Prior research has leveraged such approaches, what we call beyond-real techniques, to make interactions in VR more practical, efficient, ergonomic, and accessible. We present a survey categorizing prior movement-based VR interaction literature as reality-based, illusory, or beyond-real interactions. We survey relevant conferences (CHI, IEEE VR, VRST, UIST, and DIS) while focusing on selection, manipulation, locomotion, and navigation in VR. For beyond-real interactions, we describe the transformations that have been used by prior works to create novel remappings. We discuss open research questions through the lens of the human sensorimotor control system and highlight challenges that need to be addressed for effective utilization of beyond-real interactions in future VR applications, including plausibility, control, long-term adaptation, and individual differences.
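To make the arm-stretching remapping concrete, here is a minimal sketch in the spirit of the classic Go-Go technique (not any specific system from the surveyed literature); the threshold and gain values are illustrative assumptions:

```python
import numpy as np

def remap_hand(real_hand, shoulder, threshold=0.4, gain=12.0):
    """Map the real hand position to a virtual one with extended reach.

    Within `threshold` metres of the shoulder the mapping is one-to-one;
    beyond it, the virtual hand moves quadratically farther than the real
    hand, so distant objects become reachable without walking.
    """
    offset = real_hand - shoulder
    dist = np.linalg.norm(offset)
    if dist <= threshold:
        return real_hand                      # one-to-one near-field zone
    direction = offset / dist
    virtual_dist = dist + gain * (dist - threshold) ** 2
    return shoulder + direction * virtual_dist

# A real hand 0.6 m from the shoulder maps to ~1.08 m virtually.
print(remap_hand(np.array([0.6, 0.0, 0.0]), np.zeros(3)))
```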
Visual grounding is the task of locating the target indicated by a natural language expression. Existing methods extend the generic object detection framework to this problem: they base the visual grounding on features from pre-generated proposals or anchors, and fuse these features with text embeddings to locate the target mentioned by the text. However, modeling visual features from these predefined locations may fail to fully exploit the visual context and the attribute information in the text query, which limits performance. In this paper, we propose a transformer-based framework for accurate visual grounding that establishes text-conditioned discriminative features and performs multi-stage cross-modal reasoning. Specifically, we develop a visual-linguistic verification module to focus the visual features on regions relevant to the textual descriptions while suppressing unrelated areas. A language-guided feature encoder is also devised to aggregate the visual contexts of the target object and improve the object's distinctiveness. To retrieve the target from the encoded visual features, we further propose a multi-stage cross-modal decoder that iteratively reasons about the correlations between the image and text for accurate target localization. Extensive experiments on five widely used datasets validate the efficacy of our proposed components and demonstrate state-of-the-art performance. Our code is publicly available at //github.com/yangli18/VLTVG.
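As a rough, hypothetical illustration of the visual-linguistic verification idea (the actual VLTVG module is more elaborate; see the linked repository), one can weight flattened visual tokens by their similarity to a pooled text embedding:

```python
import torch
import torch.nn.functional as F

def verify_visual_features(visual, text, temperature=0.1):
    """Weight visual features by their similarity to the text embedding.

    visual: (N, D) flattened feature-map tokens; text: (D,) pooled text
    embedding. Regions that match the description are emphasized while
    unrelated areas are suppressed -- a schematic of the verification idea.
    """
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text, dim=-1)
    scores = (v @ t) / temperature           # (N,) relevance of each region
    weights = scores.sigmoid().unsqueeze(-1)
    return visual * weights                  # suppressed where score is low

visual = torch.randn(400, 256)   # e.g. a 20x20 feature map with D = 256
text = torch.randn(256)
print(verify_visual_features(visual, text).shape)  # torch.Size([400, 256])
```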
We present a historical overview of the connections between the analysis of risk and the control of autonomous systems, and we offer two main contributions. Our first contribution is to propose three overlapping paradigms that classify the vast body of literature: the worst-case, risk-neutral, and risk-averse paradigms. We argue that the appropriate way to assess the risk of an autonomous system depends on the application at hand, whereas it is typical to assess risk using an expectation, variance, or probability alone. Our second contribution is to unify the concepts of risk and autonomous systems. We achieve this by connecting approaches for quantifying and optimizing the risk that arises from a system's behaviour across academic fields. The survey is highly multidisciplinary, including research from the communities of reinforcement learning, stochastic and robust control theory, operations research, and formal verification. We describe both model-based and model-free methods, with emphasis on the former. Lastly, we highlight fruitful areas for further research; a key direction is to blend risk-averse model-based and model-free methods to enhance the real-time adaptive capabilities of systems and improve human and environmental welfare.
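To illustrate the contrast between the risk-neutral and risk-averse paradigms, here is a minimal sketch comparing the plain expectation with a sample-based conditional value-at-risk (CVaR), one common risk-averse measure:

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Conditional value-at-risk: the mean of the worst (1 - alpha)
    fraction of sampled costs. Unlike the plain expectation, it focuses
    on the tail outcomes a risk-averse designer cares about."""
    costs = np.asarray(costs)
    var = np.quantile(costs, alpha)      # value-at-risk threshold
    return costs[costs >= var].mean()    # average over the bad tail

rng = np.random.default_rng(0)
costs = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
print(f"risk-neutral (mean): {costs.mean():.2f}")   # roughly exp(0.5) = 1.65
print(f"risk-averse (CVaR):  {cvar(costs):.2f}")    # much larger tail mean
```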
Humans are naturally endowed with the ability to write in a particular style. They can, for instance, rephrase a formal letter informally, convey a literal message with figures of speech, or edit a novel to mimic the style of some well-known author. Automating this form of creativity constitutes the goal of style transfer. As a natural language generation task, style transfer aims at rewriting existing texts; specifically, it creates paraphrases that exhibit some desired stylistic attributes. From a practical perspective, it envisions beneficial applications, like chatbots that modulate their communicative style to appear empathetic, or systems that automatically simplify technical articles for a non-expert audience. Several style-aware paraphrasing methods have attempted to tackle style transfer, and a handful of surveys give a methodological overview of the field, but they do not help researchers focus on specific styles. With this paper, we aim to provide a comprehensive discussion of the styles that have received attention in the transfer task. We organize them in a hierarchy, highlighting the challenges in defining each of them and pointing out gaps in the current research landscape. The hierarchy comprises two main groups. One encompasses styles that people modulate arbitrarily, along the lines of registers and genres. The other corresponds to unintentionally expressed styles, due to an author's personal characteristics. Our review shows how these groups relate to one another and where specific styles, including some that have not yet been explored, belong in the hierarchy. Moreover, we summarize the methods employed for different stylistic families, pointing researchers toward those that would be most fitting for future research.
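As a deliberately toy illustration of what style transfer does, consider a rule-based rewrite along the formality axis; real systems learn such rewrites, and far subtler ones, from data:

```python
import re

# Expanding contractions nudges text toward formality while preserving
# its content -- the essence of a (tiny) formality-transfer operation.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                "don't": "do not", "I'm": "I am"}

def formalize(text):
    for informal, formal in CONTRACTIONS.items():
        text = re.sub(re.escape(informal), formal, text, flags=re.IGNORECASE)
    return text

print(formalize("I'm sorry, but it's late and I can't make it."))
# -> "I am sorry, but it is late and I cannot make it."
```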
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized). In this setting, current theory fails to capture important aspects of real-world behavior. First, the 'spectral gap' of the communication graph is not predictive of its empirical performance in (deep) learning. Second, current theory does not explain that collaboration enables larger learning rates than training alone. In fact, it prescribes smaller learning rates, which further decrease as graphs become larger, failing to explain convergence in infinite graphs. This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution. We quantify how the graph topology influences convergence in a quadratic toy problem and provide theoretical results for general smooth and (strongly) convex objectives. Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
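To make the 'spectral gap' concrete, here is a small sketch computing it for ring gossip matrices of growing size; the gap vanishes as the ring grows, which is precisely why spectral-gap-based rates become pessimistic on large graphs:

```python
import numpy as np

def ring_gossip_matrix(n):
    """Doubly stochastic mixing matrix for a ring: each worker averages
    with its two neighbours (weight 1/3 each, including itself)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of W; classical
    decentralized-SGD theory ties convergence rates to this quantity."""
    eig = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    return 1 - eig[1]

for n in (8, 32, 128):
    print(n, spectral_gap(ring_gossip_matrix(n)))   # shrinks like O(1/n^2)
```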
With the advent of 5G commercialization, the need for more reliable, faster, and intelligent telecommunication systems is envisaged for the next generation of radio access technologies beyond 5G (B5G). Artificial Intelligence (AI) and Machine Learning (ML) are not only immensely popular in service-layer applications but have also been proposed as essential enablers in many aspects of B5G networks, from IoT devices and edge computing to cloud-based infrastructures. However, most existing surveys on B5G security focus on the performance and accuracy of AI/ML models while overlooking the accountability and trustworthiness of the models' decisions. Explainable AI (XAI) methods are promising techniques that allow system developers to identify the internal workings of AI/ML black-box models. The goal of using XAI in the B5G security domain is to make the security-related decision-making processes of systems transparent and comprehensible to stakeholders, holding the systems accountable for automated actions. This survey emphasizes the role of XAI in every facet of the forthcoming B5G era, including B5G technologies such as the RAN, zero-touch network management, and E2E slicing, as well as the use cases that general users will ultimately enjoy. Furthermore, we present lessons learned from recent efforts and outline future research directions, building on currently ongoing projects involving XAI.
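As a minimal, hypothetical illustration of an XAI method in a security setting (synthetic data stands in for real B5G telemetry such as RAN or slicing logs), permutation importance explains which features drive a black-box classifier's decisions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in for a B5G security task: classify network flows as benign or
# malicious, then explain which features drive the decision.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Permutation importance: a model-agnostic XAI method measuring how much
# shuffling each feature degrades held-out accuracy.
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1][:4]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```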
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model sizes, pre-trained models can learn universal language representations from massive textual data, which benefit the ranking task of IR. Since a large number of works have been dedicated to the application of PTMs in IR, we believe it is the right time to summarize the current status, learn from existing methods, and gain insights for future development. In this survey, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and envision promising directions, with the hope of inspiring more work on these topics in future research.
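As a sketch of how PTMs are typically used in the re-ranking component, assuming the sentence-transformers library and its public MS MARCO cross-encoder checkpoint:

```python
from sentence_transformers import CrossEncoder

# A pre-trained cross-encoder jointly encodes each (query, document) pair
# and outputs a relevance score -- a common PTM-based re-ranking setup.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what causes tides"
docs = [
    "Tides are caused by the gravitational pull of the moon and sun.",
    "The stock market closed higher today on strong earnings.",
]
scores = model.predict([(query, d) for d in docs])
for doc, score in sorted(zip(docs, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {doc}")
```

In practice, a fast first-stage retriever (e.g., BM25 or a bi-encoder) would supply the candidate documents that this slower, more accurate cross-encoder re-ranks.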
Human-in-the-loop aims to train an accurate prediction model at minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing work on human-in-the-loop from a data perspective and classify it into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of system-independent human-in-the-loop frameworks. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we briefly classify and discuss applications in natural language processing, computer vision, and other areas. In addition, we point out open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop research and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
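Here is a minimal sketch of one data-centric human-in-the-loop pattern, uncertainty-based active learning, where the held-out label y[query] stands in for a human annotator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)

# Seed with a few "human-labeled" examples from each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in set(labeled)]

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: ask the human to label the pool instance the
    # current model is least sure about (probability closest to 0.5).
    proba = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(query)                 # y[query] plays the human's role
    pool.remove(query)
    print(f"round {round_}: queried {query}, accuracy {model.score(X, y):.3f}")
```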
Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of the available datasets for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches to (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
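As a minimal sketch of the emotion distribution learning setup mentioned above (random features stand in for a real image encoder; the eight Mikels categories are a common choice in AICA datasets):

```python
import torch
import torch.nn as nn

# Instead of a single dominant label, the model predicts a probability
# distribution over emotion categories and is trained to match the
# distribution of annotators' votes via KL divergence.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(),
                      nn.Linear(128, len(EMOTIONS)))
loss_fn = nn.KLDivLoss(reduction="batchmean")

features = torch.randn(4, 512)                 # stand-in image features
target = torch.softmax(torch.randn(4, 8), 1)   # annotators' vote shares
log_probs = torch.log_softmax(model(features), dim=1)
loss = loss_fn(log_probs, target)
loss.backward()
print(float(loss))
```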
Deployment of Internet of Things (IoT) devices and data fusion techniques have gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. Because datasets do not necessarily originate from identical sensors, fused data typically result in a complex data problem. Since the military is investigating how heterogeneous IoT devices can aid its processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal-to-image encoding approach that transforms and fuses signal data from IoT wearable devices into an image that is invertible and easier to visualize, supporting decision making. Furthermore, we investigate the challenge of enabling intelligent identification and detection operations, and we demonstrate the feasibility of the proposed deep learning and anomaly detection models to support future applications that utilize hand-gesture data from wearable devices.
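The abstract does not specify the exact encoding, so purely as a stand-in, here is the simplest invertible signal-to-image transform: padding and reshaping a 1-D sensor stream into a 2-D array.

```python
import numpy as np

def signal_to_image(signal, width=16):
    """Pack a 1-D sensor signal into a 2-D array (an 'image') by padding
    to a multiple of `width` and reshaping. The transform is exactly
    invertible given the original length."""
    n = len(signal)
    padded = np.pad(signal, (0, -n % width))
    return padded.reshape(-1, width), n

def image_to_signal(image, n):
    """Invert signal_to_image by flattening and trimming the padding."""
    return image.reshape(-1)[:n]

signal = np.sin(np.linspace(0, 8 * np.pi, 100))   # mock wearable reading
image, n = signal_to_image(signal)
assert np.allclose(image_to_signal(image, n), signal)  # round-trips exactly
print(image.shape)  # (7, 16)
```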
Deep learning algorithms have achieved state-of-the-art performance in image classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown that these algorithms, which can even surpass human capabilities, are vulnerable to adversarial examples. In computer vision, adversarial examples are images containing subtle perturbations, generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been proposed in the literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already been shown to be ineffective against adaptive attackers. This self-contained paper therefore aims to provide all readerships with a review of the latest research progress on adversarial machine learning in image classification, from a defender's perspective. Novel taxonomies for categorizing adversarial attacks and defenses are introduced, and discussions about the existence of adversarial examples are provided. Further, in contrast to existing surveys, relevant guidance is given that researchers should take into consideration when devising and evaluating defenses. Finally, based on the reviewed literature, some promising paths for future research are discussed.
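As one canonical example of the attacks such surveys cover, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) in PyTorch; it applies to any differentiable classifier, and the epsilon value is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Fast Gradient Sign Method: perturb the image in the direction that
    maximally increases the classification loss. The perturbation is
    bounded by epsilon per pixel, so it stays visually subtle."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0, 1).detach()          # keep pixels in valid range

# Usage with any differentiable classifier `model` and a batch (x, y):
#   x_adv = fgsm_attack(model, x, y)
#   model(x_adv).argmax(1)   # often differs from y despite tiny changes
```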