Process models constitute crucial artifacts for organizations regarding documentation, communication, and collaboration. Proper comprehension of such models is essential for their effective application. An important aspect of process model literacy is the question of how the information presented in process models is extracted and processed by the human visual system. For such visuospatial tasks, the visual system deploys a set of elemental operations, whose composition produces different visual routines. This paper provides insights from an exploratory eye tracking study in which visual routines during process model comprehension were examined. More specifically, n = 29 participants were asked to comprehend 18 process models expressed in the Business Process Model and Notation 2.0, reflecting different mapping directions (i.e., straight, upward, downward) and complexity levels. The performance measures indicated that even less complex process models pose a challenge regarding their comprehension. The upward mapping placed greater demands on participants' attention, whereas the downward mapping was comprehended more effectively. Based on the recorded eye movements, three gaze patterns applied during model comprehension were derived. Building on these patterns, we defined a general model that identifies visual routines and their corresponding elemental operations during process model comprehension. Finally, implications for practice and research as well as directions for future work are discussed.
The rise of deep learning algorithms has led many researchers to move away from classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and musical notes from virtual instruments. However, the most suitable deep learning architecture is still under investigation, and the choice of architecture is tightly coupled to the audio representation. A sound's original waveform can be too dense and information-rich for deep learning models to process efficiently, and this complexity increases training time and computational cost. Moreover, the raw waveform does not represent sound in the manner in which it is perceived. Therefore, in many cases, the raw audio is transformed into a compressed and more meaningful form through downsampling, feature extraction, or the adoption of a higher-level representation of the waveform. Furthermore, depending on the form chosen, additional conditioning representations, different model architectures, and numerous metrics for evaluating the reconstructed sound have been investigated. This paper provides an overview of audio representations applied to sound synthesis using deep learning. Additionally, it presents the most significant methods for developing and evaluating a sound synthesis architecture using deep learning models, in each case as a function of the chosen audio representation.
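To make the compressed representations mentioned above concrete, here is a minimal sketch of computing one widely used form, a log-mel spectrogram, from a raw waveform. It assumes the librosa library is available; the file name and parameter values are illustrative, not prescribed by the paper.

```python
# Minimal sketch: turning a raw waveform into a log-mel spectrogram,
# one common compressed, perceptually motivated audio representation.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=16000)   # raw waveform at 16 kHz
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80
)                                               # (80, frames) power spectrogram
log_mel = librosa.power_to_db(mel, ref=np.max)  # compress dynamics to a dB scale
print(log_mel.shape)  # far fewer values per second than the raw samples
```

The mel filter bank spaces frequency bins according to perceived pitch, which is one way such representations move closer to the manner in which sound is actually perceived.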
Context. Computer workers in general, and software developers specifically, are under a high amount of stress due to continuous deadlines and, often, over-commitment. Objective. This study investigates the effects of a neuroplasticity practice, a specific breathing practice, on the attention awareness, well-being, perceived productivity, and self-efficacy of computer workers. Method. We created a questionnaire, mainly from existing validated scales, as an entry and exit survey to provide data points for comparison before and after the intervention. The intervention was a 12-week program with a weekly live session that included a talk on a well-being topic and a facilitated group breathing session. During the intervention period, we solicited one daily journal note and one weekly well-being rating. We replicated the intervention in a similarly structured 8-week program. The data was analyzed using a Bayesian multi-level model for the quantitative part and thematic analysis for the qualitative part. Results. The intervention showed improvements in participants' experienced inner states despite an ongoing pandemic and intense outer circumstances for most. Over the course of the study, we found an improvement in the participants' ratings of how often they found themselves in good spirits as well as in a calm and relaxed state. We also gathered a large number of deep inner reflections and growth processes that may not have surfaced for the participants without deliberate engagement in such a program. Conclusion. The data indicates the usefulness and effectiveness of such an intervention for computer workers in terms of increasing well-being and resilience. Everyone needs a way to deliberately relax, unplug, and recover. Breathing practice is a simple way to do so, and the results call for establishing a larger body of work to make this a common practice.
Recent AI ethics has focused on applying abstract principles downward to practice. This paper moves in the other direction. Ethical insights are generated from the lived experiences of AI designers working on tangible human problems and then cycled upward to influence theoretical debates surrounding three questions: 1) Should trustworthy AI be pursued through explainability or through accurate performance? 2) Should AI be considered trustworthy at all, or is reliability a preferable aim? 3) Should AI ethics be oriented toward establishing protections for users or toward catalyzing innovation? The specific answers are less significant than the larger demonstration that AI ethics is currently unbalanced toward theoretical principles and would benefit from increased exposure to grounded practices and dilemmas.
Due to the success of deep learning (DL) and its growing job market, students and researchers from many areas are interested in learning about DL technologies. Visualization has proven to be of great help during this learning process. However, while most current educational visualizations target one specific architecture or use case, recurrent neural networks (RNNs), which are capable of processing sequential data, are not yet covered. This is despite the fact that tasks on sequential data, such as text and function analysis, are at the forefront of DL research. Therefore, we propose exploRNN, the first interactively explorable educational visualization for RNNs. With the aim of making learning easier and more fun, we define educational objectives targeted towards understanding RNNs and use these objectives to form guidelines for the visual design process. exploRNN, which is accessible online, provides an overview of the training process of RNNs at a coarse level while also allowing a detailed inspection of the data flow within LSTM cells. In an empirical study with a between-subjects design, we assessed 37 subjects to investigate the learning outcomes and cognitive load of exploRNN compared to a classic text-based learning environment. While learners in the text group are ahead in superficial knowledge acquisition, exploRNN is particularly helpful for a deeper understanding of the learning content. In addition, the complex content in exploRNN is perceived as significantly easier and causes less extraneous cognitive load than in the text group. The study shows that for difficult learning material such as recurrent networks, where deep understanding is important, interactive visualizations such as exploRNN can be helpful.
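As a pointer to what a detailed inspection of the data flow within LSTM cells involves, the following is a minimal sketch of unrolling a single LSTM cell over a short sequence. It uses PyTorch's built-in nn.LSTMCell, and all sizes are illustrative rather than taken from exploRNN.

```python
# Minimal sketch of the per-step data flow through an LSTM cell:
# each step consumes an input and updates both the hidden and cell state.
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)
h = torch.zeros(1, 20)  # hidden state
c = torch.zeros(1, 20)  # cell state

for _ in range(5):                # unroll over a short sequence
    x_t = torch.randn(1, 10)      # input at the current time step
    h, c = cell(x_t, (h, c))      # gates mix x_t with the previous states
print(h.shape)                    # (1, 20)
```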
In this paper we examine the concept of complexity as it applies to generative and evolutionary art and design. Complexity has many different, discipline-specific definitions, such as complexity in physical systems (entropy), algorithmic measures of information complexity, and the field of "complex systems". We apply a series of different complexity measures to three evolutionary art datasets and examine the correlations between measured complexity and either the artist's individual aesthetic judgement (in the case of two datasets) or the physically measured complexity of generative 3D forms. Our results show that the degree of correlation differs for each dataset and measure, indicating that there is no overall "better" measure. However, specific measures do perform well on individual datasets, indicating that a careful choice can increase the value of using such measures. We then assess the value of complexity measures for the audience by undertaking a large-scale survey on the perception of complexity and aesthetics. We conclude by discussing the value of direct measures in generative and evolutionary art, reinforcing recent findings from neuroimaging and psychology which suggest that human aesthetic judgement is informed by many extrinsic factors beyond the measurable properties of the object being judged.
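As an indication of the kind of measures compared above, here is a minimal sketch of two simple complexity proxies for an image: the Shannon entropy of its pixel distribution and a compression ratio, a common practical stand-in for algorithmic (Kolmogorov) complexity. The file name is illustrative, and these are generic measures, not necessarily the exact ones evaluated in the paper.

```python
# Minimal sketch: two crude image-complexity proxies.
import zlib
import numpy as np
from PIL import Image

img = np.asarray(Image.open("artwork.png").convert("L"))  # grayscale pixels

# Shannon entropy of the pixel-value distribution.
hist, _ = np.histogram(img, bins=256, range=(0, 256))
p = hist / hist.sum()
p = p[p > 0]
entropy = -np.sum(p * np.log2(p))                 # bits per pixel

# Compressed size relative to raw size approximates algorithmic complexity.
ratio = len(zlib.compress(img.tobytes())) / img.nbytes

print(f"entropy = {entropy:.2f} bits, compression ratio = {ratio:.2f}")
```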
Training machines to understand natural language and interact with humans is an elusive and essential task of artificial intelligence. With the rapid development of deep learning techniques, especially the recent pre-trained language models (PrLMs), a diversity of dialogue systems has been designed. Among these studies, a fundamental yet challenging task is dialogue comprehension, whose goal is to teach machines to read and comprehend the dialogue context before responding. In this paper, we review previous methods from the technical perspective of dialogue modeling for the dialogue comprehension task. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension, and then discuss three typical patterns of dialogue modeling. In addition, we categorize dialogue-related pre-training techniques that are employed to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances of recent years and point out lessons from the empirical analysis as well as prospects for a new frontier of research.
In contrast to batch learning, where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously from data arriving in sequential order. Like the human learning process, which is able to learn, fuse, and accumulate new knowledge arriving at different time steps, continual learning is considered to have high practical significance and has therefore been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques: regularization, knowledge distillation, memory, generative replay, parameter isolation, and combinations of the above. For each category, both its characteristics and its applications in computer vision are presented. The overview closes with a discussion of several subareas where continuous knowledge accumulation is potentially helpful but continual learning has not yet been well studied.
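To illustrate one of the technique families listed above, here is a minimal sketch of a regularization-based continual learning loss: when training on a new task, the model is penalized for drifting from the parameters learned on the previous task. This is a plain L2 variant; methods such as EWC additionally weight each parameter by an importance estimate (e.g., the Fisher information). The function name and penalty strength are illustrative.

```python
# Minimal sketch of regularization-based continual learning (L2 variant).
import torch

def regularized_loss(model, task_loss, old_params, lam=100.0):
    # Penalize squared drift from the parameters snapshotted after the old task.
    penalty = sum(
        ((p - p_old) ** 2).sum()
        for p, p_old in zip(model.parameters(), old_params)
    )
    return task_loss + lam * penalty

# After finishing task A, snapshot its weights:
#   old_params = [p.detach().clone() for p in model.parameters()]
# then minimize regularized_loss(model, loss_on_task_B, old_params) on task B.
```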
Multi-stage ranking pipelines have become a practical solution in modern search systems, where the first-stage retrieval returns a subset of candidate documents and the later stages re-rank those candidates. Unlike the re-ranking stages, which have gone through rapid technique shifts over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may withhold relevant documents from the re-ranking stages at the very beginning. It has therefore been a long-standing desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed explosive growth of research interest in first-stage semantic retrieval models, and we believe it is the right time to survey the current status, learn from existing methods, and gain insights for future development. In this paper, we describe the current landscape of first-stage retrieval models under a unified framework to clarify the connections between classical term-based retrieval methods, early semantic retrieval methods, and neural semantic retrieval methods. Moreover, we identify some open challenges and envision future directions, in the hope of inspiring more research on these important yet under-investigated topics.
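To make the contrast with term-based models concrete, the following is a minimal sketch of the dual-encoder pattern common to neural first-stage semantic retrieval: queries and documents are embedded independently, document vectors are indexed offline, and candidates are ranked by vector similarity. The encoder here is a random stand-in for a learned model, so everything except the pipeline structure is illustrative.

```python
# Minimal sketch of dense first-stage retrieval with a dual encoder.
import numpy as np

def encode(texts):
    # Stand-in for a learned text encoder (e.g., a fine-tuned language model).
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 128))

docs = ["doc one", "doc two", "doc three"]
doc_vecs = encode(docs)            # computed offline and indexed (often with ANN)

query_vec = encode(["example query"])[0]
scores = doc_vecs @ query_vec      # dot-product relevance, no term overlap needed
top_k = np.argsort(-scores)[:2]    # candidate set passed to the re-ranking stages
print([docs[i] for i in top_k])
```

Because relevance is computed in embedding space rather than over shared terms, such models can retrieve documents whose vocabulary differs from the query's, which is exactly the mismatch problem described above.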
In humans, attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources of information, attention mechanisms select, modulate, and focus on the information most relevant to behavior. For decades, the concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing, and over the last six years this property has been widely explored in deep neural networks. Currently, the state of the art in deep learning is represented by neural attention models in several application domains. This survey provides a comprehensive overview and analysis of developments in neural attention models. We systematically reviewed hundreds of architectures in the area, identifying and discussing those in which attention has shown a significant impact. We also developed and made public an automated methodology to facilitate the development of reviews in the area. By critically analyzing 650 works, we describe the primary uses of attention in convolutional networks, recurrent networks, and generative models, identifying common subgroups of uses and applications. Furthermore, we describe the impact of attention in different application domains and its effect on the interpretability of neural networks. Finally, we list possible trends and opportunities for further research, hoping that this review provides a succinct overview of the main attentional models in the area and guides researchers in developing future approaches that will drive further improvements.
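As a reference point for the models surveyed above, here is a minimal sketch of scaled dot-product attention, the building block underlying many current neural attention architectures; the shapes and values are illustrative.

```python
# Minimal sketch of scaled dot-product attention.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # pairwise query-key similarity
    weights = F.softmax(scores, dim=-1)          # normalized attention weights
    return weights @ v                           # weighted sum of the values

q = k = v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)      # shape (2, 5, 64)
```

The softmax row for each query position realizes the "select, modulate, and focus" behavior described above: it concentrates weight on the positions most relevant to that query.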
Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning, as it requires understanding both the visual and textual modalities. Existing methods mainly rely on extracting image and question features and learning their joint feature embedding via multimodal fusion or attention mechanisms. Some recent studies utilize external, VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes may be unrelated to the VQA task and have limited semantic capacity. To better utilize the semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build a Relation-VQA (R-VQA) dataset on top of the Visual Genome dataset via a semantic similarity module, in which each instance consists of an image, a corresponding question, a correct answer, and a supporting relation fact. A well-defined relation detector is then adopted to predict relation facts relevant to a given visual question. We further propose a multi-step attention model that applies visual attention and semantic attention sequentially to extract related visual knowledge and semantic knowledge. We conduct comprehensive experiments on two benchmark datasets, demonstrating that our model achieves state-of-the-art performance and verifying the benefit of considering visual relation facts.