Mixed Reality (MR) is an evolving technology lying in the continuum spanned by related technologies such as Virtual Reality (VR) and Augmented Reality (AR), and creates an exciting way of interacting with people and the environment. This technology is fast becoming a tool used by many people, potentially improving living environments and work efficiency. Microsoft HoloLens has played an important role in the progress of MR, from the first generation to the second generation. In this paper, we systematically evaluate the functions available in HoloLens 2. These evaluations can serve as a performance benchmark for people who need to use this instrument for research or applications in the future. The detailed tests and the performance evaluation of the different functionalities show the usability and possible limitations of each function. We mainly divide the experiments into three groups: the functions already present in HoloLens 1, the functions new to HoloLens 2, and the use of research mode. These results will be useful for MR researchers who want to use HoloLens 2 as a research tool to design their own MR applications.
X-ray vision, a technique that allows users to see through walls and other obstacles, is a popular technique for Augmented Reality (AR) and Mixed Reality (MR). In this paper, we demonstrate a dynamic X-ray vision window that is rendered in real-time based on the user's current position and changes with movement in the physical environment. Moreover, the location and transparency of the window are also dynamically rendered based on the user's eye gaze. We build this X-ray vision window for a current state-of-the-art MR Head-Mounted Device (HMD) -- HoloLens 2 by integrating several different features: scene understanding, eye tracking, and clipping primitive.
A general framework with a series of different methods is proposed to improve the estimate of convex function (or functional) values when only noisy observations of the true input are available. Technically, our methods capture the bias introduced by the convexity and remove this bias from a baseline estimate. Theoretical analyses show that the proposed methods can strictly reduce the expected estimation error under mild conditions. When applied, the methods require no specific knowledge about the problem except the convexity and the evaluation of the function. Therefore, they can serve as off-the-shelf tools to obtain good estimates for a wide range of problems, including optimization problems with random objective functions or constraints, and functionals of probability distributions such as the entropy and the Wasserstein distance. Numerical experiments on a wide variety of problems show that our methods can significantly improve the quality of the estimate compared with the baseline method.
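The convexity bias mentioned in the abstract above follows from Jensen's inequality: for a convex f, evaluating f at a noisy estimate of the input overestimates the true value on average. The sketch below illustrates the general idea of estimating that bias and subtracting it from a baseline estimate, using a standard bootstrap bias correction. This is not the authors' method; the function name `bootstrap_debias` and its parameters are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_debias(f, sample, n_boot=2000, seed=0):
    """Return (plug_in, corrected) estimates of f(true mean).

    plug_in   : baseline estimate f(sample mean), biased upward when f is convex.
    corrected : plug_in minus a bootstrap estimate of that bias.
    """
    rng = random.Random(seed)
    n = len(sample)
    plug_in = f(statistics.fmean(sample))

    # Re-evaluate f on the means of resampled data sets (drawn with replacement).
    boot_values = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        boot_values.append(f(statistics.fmean(resample)))

    # The gap between the average bootstrap value and the plug-in value
    # mimics the gap between the plug-in value and the true value.
    bias_estimate = statistics.fmean(boot_values) - plug_in
    return plug_in, plug_in - bias_estimate
```

For example, calling `bootstrap_debias(lambda x: x * x, [1.0, 2.0, 3.0, 4.0, 5.0])` gives a plug-in estimate of 9.0 and a corrected estimate that is pulled downward, since squaring is convex and the bootstrap bias estimate is positive in this case.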
For the last two years, from 2020 to 2021, COVID-19 has overwhelmed disease prevention measures in many countries, including Vietnam, and negatively impacted various aspects of human life and the social community. In addition, misleading information in the community and fake news about the pandemic are serious problems. Therefore, we present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing question answering systems for COVID-19. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of our dataset and establish initial benchmark results for further research using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effects of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field of study.
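To make the role of multiple paraphrased answers concrete, the following sketch scores a generated answer against several references with an LCS-based ROUGE-L F1 and keeps the best match. It is a simplified illustration under stated assumptions (whitespace tokenization, F1 rather than a weighted F-measure, maximum over references); it is not the evaluation code used for UIT-ViCoV19QA, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings, tokenized on whitespace (an assumption)."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def multi_ref_rouge_l(candidate, references):
    """Score against every paraphrased reference and keep the best match."""
    return max(rouge_l_f1(candidate, ref) for ref in references)
```

With more paraphrased references, a correct answer worded differently from the first reference has a better chance of matching at least one of them, which is one plausible reason extra paraphrases help under overlap-based metrics.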
Text selection is an essential activity in interactive systems, including virtual reality (VR) head-mounted displays (HMDs). It is useful for: sharing information across apps or platforms, highlighting and making notes while reading articles, and text editing tasks. Despite its usefulness, the space of text selection interaction is underexplored in VR HMDs. In this research, we performed a user study with 24 participants to investigate the performance and user preference of six text selection techniques (Controller+Dwell, Controller+Click, Head+Dwell, Head+Click, Hand+Dwell, Hand+Pinch). Results reveal that Head+Click is ranked first since it has excellent speed-accuracy performance (2nd fastest task completion speed with 3rd lowest total error rate), provides the best user experience, and produces a very low workload -- followed by Controller+Click, which has the fastest speed and comparable experience with Head+Click, but much higher total error rate. Other methods can also be useful depending on the goals of the system or the users. As a first systematic evaluation of pointing*selection techniques for text selection in VR, the results of this work provide a strong foundation for further research in this area of growing importance to the future of VR to help it become a more ubiquitous and pervasive platform.
Automatic processing of language is becoming pervasive in our lives, often taking central roles in our decision making, like choosing the wording for our messages and mails, translating our readings, or even having full conversations with us. Word embeddings are a key component of modern natural language processing systems. They provide a representation of words that has boosted the performance of many applications. Word embeddings seem to capture a semblance of the meaning of words from raw text, but, at the same time, they also distill stereotypes and societal biases which are subsequently relayed to the final applications. Such biases can be discriminatory. It is very important to detect and mitigate those biases, to prevent discriminatory behaviors of automated processes, which can be much more harmful than in the case of humans because of their scale. There are currently many tools and techniques to detect and mitigate biases in word embeddings, but they present many barriers to the engagement of people without technical skills. As it happens, most of the experts in bias, either social scientists or people with deep knowledge of the context where bias is harmful, do not have such skills, and they cannot engage in the processes of bias detection because of the technical barriers. We have studied the barriers in existing tools and have explored their possibilities and limitations with different kinds of users. Based on this exploration, we propose to develop a tool that is specifically aimed at lowering the technical barriers and providing the exploration power needed to address the requirements of experts, scientists, and people in general who are willing to audit these technologies.
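As a rough illustration of what bias detection in word embeddings can look like, the sketch below compares how close a word vector is to two reference words using cosine similarity. It is a common, simplified probe and not the tool proposed in the abstract above; the two-dimensional vectors in `TOY` are invented for illustration and do not come from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bias_score(word_vec, pole_a, pole_b):
    """Positive: closer to pole_a. Negative: closer to pole_b. Zero: equidistant."""
    return cosine(word_vec, pole_a) - cosine(word_vec, pole_b)


# Hypothetical toy embeddings, made up for this example only.
TOY = {
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
    "engineer": [0.9, 0.2],
    "nurse": [0.2, 0.9],
    "table": [0.5, 0.5],
}

if __name__ == "__main__":
    for word in ("engineer", "nurse", "table"):
        score = bias_score(TOY[word], TOY["he"], TOY["she"])
        print(f"{word}: {score:+.3f}")
```

In a real audit the vectors would come from a trained embedding model, and choosing the reference words is exactly the kind of decision where domain experts without programming skills have the most to contribute.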
Along with the massive growth of the Internet from the 1990s until now, various innovative technologies have been created to bring users breathtaking experiences with more virtual interactions in cyberspace. Many virtual environments with thousands of services and applications, from social networks to virtual gaming worlds, have been developed with immersive experience and digital transformation, but most remain disconnected rather than integrated into a single platform. In this context, the metaverse, a term formed by combining meta and universe, has been introduced as a shared virtual world that is fueled by many emerging technologies, such as fifth-generation networks and beyond, virtual reality, and artificial intelligence (AI). Among such technologies, AI has shown great importance in processing big data to enhance immersive experience and enable human-like intelligence of virtual agents. In this survey, we explore the role of AI in the foundation and development of the metaverse. We first present the preliminaries of AI, including machine learning algorithms and deep learning architectures, and its role in the metaverse. We then provide a comprehensive investigation of AI-based methods concerning six technical aspects that have potential for the metaverse: natural language processing, machine vision, blockchain, networking, digital twin, and neural interface. Subsequently, several AI-aided applications, such as healthcare, manufacturing, smart cities, and gaming, are studied for deployment in virtual worlds. Finally, we summarize the key contributions of this survey and outline some future research directions in AI for the metaverse.
Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in counterpart military scenarios. Aiming to bridge games and military applications, this work discusses past and current efforts on how games and simulators, together with artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.
Deep neural network architectures have traditionally been designed and explored with human expertise in a long-lasting trial-and-error process. This process requires a huge amount of time, expertise, and resources. To address this tedious problem, we propose a novel algorithm to automatically find optimal hyperparameters of a deep network architecture. We specifically focus on designing neural architectures for medical image segmentation tasks. Our proposed method is based on policy gradient reinforcement learning, in which the reward function is assigned a segmentation evaluation utility (i.e., the Dice index). We show the efficacy of the proposed method and its low computational cost in comparison with state-of-the-art medical image segmentation networks. We also present a new architecture design, a densely connected encoder-decoder CNN, as a strong baseline architecture to which the proposed hyperparameter search algorithm is applied. We apply the proposed algorithm to each layer of the baseline architectures. As an application, we train the proposed system on cine cardiac MR images from the Automated Cardiac Diagnosis Challenge (ACDC) MICCAI 2017. Starting from a baseline segmentation architecture, the resulting network architecture obtains state-of-the-art accuracy without any trial-and-error-based architecture design or close supervision of hyperparameter changes.
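The combination described in the abstract above, a policy gradient search whose reward is the Dice index, can be sketched in a few lines. This is a minimal REINFORCE-style illustration, not the authors' algorithm: the policy is a single softmax over one hypothetical hyperparameter, the moving-average baseline is an added assumption, and `reward_fn` stands in for the expensive step of training a network and measuring its Dice score on validation data.

```python
import math
import random


def dice(pred, target):
    """Dice index 2|A∩B| / (|A| + |B|) for two binary masks given as 0/1 lists."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def reinforce_search(options, reward_fn, steps=500, lr=0.5, seed=0):
    """Learn a softmax policy over `options` with REINFORCE; return its probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(options)
    baseline = None  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(options)), weights=probs)[0]
        reward = reward_fn(options[i])  # stand-in for "train, then measure Dice"
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(i) w.r.t. logit j is 1[j == i] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)
```

A usage example with made-up numbers: `reinforce_search([3, 5, 7], lambda k: {3: 0.70, 5: 0.90, 7: 0.80}[k])` treats the options as candidate kernel sizes with assumed validation Dice scores. In expectation the update shifts probability toward options with above-average reward, although a single sampled run is noisy.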