Problem-driven visualization work is rooted in a deep understanding of the data, actors, processes, and workflows of a target domain. However, an individual's personality traits and cognitive abilities may also influence visualization use. Diverse user needs and abilities raise natural questions about specificity in visualization design: Could individuals from different domains exhibit performance differences when using visualizations? Are any systematic variations related to their cognitive abilities? This study bridges domain-specific perspectives on visualization design with those provided by cognition and perception. We measure variations in visualization task performance across chemistry, computer science, and education, and relate these differences to variations in spatial ability. We conducted an online study with over 60 domain experts, consisting of tasks related to pie charts, isocontour plots, and 3D scatterplots, grounded in a well-documented spatial ability test. Task performance (correctness) varied with profession for the more complex visualizations, but not for pie charts, a comparatively common visualization. We found that correctness correlates with spatial ability and that the professions differ in spatial ability. These results indicate that domains differ not only in the specifics of their data and tasks, but also in how effectively their constituent members engage with visualizations and in their cognitive traits. Analyzing participants' confidence and strategy comments suggests that focusing on performance alone neglects important nuances, such as differing approaches to engaging with even common visualizations and potential skill transference. Our findings offer a fresh perspective on discipline-specific visualization, with recommendations to help guide visualization design that celebrates the uniqueness of the disciplines and individuals we seek to serve.
Discovery of new knowledge is increasingly data-driven, predicated on a team's ability to collaboratively create, find, analyze, retrieve, and share pertinent datasets over the duration of an investigation. This is especially true of scientific discovery, where the generation, analysis, and interpretation of data are the fundamental mechanisms by which research teams collaborate toward their shared scientific goal. Data-driven discovery in general, and scientific discovery in particular, is distinguished by complex and diverse data models and formats that evolve over the lifetime of an investigation. While databases and related information systems have the potential to be valuable tools in the discovery process, developing effective interfaces for data-driven discovery remains a roadblock to the adoption of database technology as an essential tool in scientific investigations. In this paper, we present a model-adaptive approach to creating interaction environments for data-driven discovery of scientific data. The approach automatically generates interactive user interfaces for editing, searching, and viewing scientific data, based entirely on introspection of an extended relational data model. We have applied model-adaptive interface generation to many active scientific investigations spanning proteomics, bioinformatics, neuroscience, occupational therapy, stem cells, genitourinary and craniofacial development, and other domains. We present the approach, its implementation, and its evaluation through analysis of its usage in diverse scientific settings.
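As a rough illustration of the idea (not the authors' implementation), the sketch below uses SQLAlchemy's schema introspection to derive simple edit-form field descriptors for every table in a database. The widget-mapping rules and the `generate_form_specs` helper are hypothetical; a real model-adaptive system would cover far richer model features.

```python
# Minimal sketch: derive edit-form field specs for every table by
# introspecting a relational schema with SQLAlchemy. Illustrative only.
from sqlalchemy import create_engine, inspect

def generate_form_specs(db_url: str) -> dict:
    """Map each table name to a list of form-field descriptors."""
    inspector = inspect(create_engine(db_url))
    specs = {}
    for table in inspector.get_table_names():
        fields = []
        for col in inspector.get_columns(table):
            type_name = str(col["type"]).upper()
            fields.append({
                "name": col["name"],
                # Crude type-to-widget mapping, purely for illustration.
                "widget": "checkbox" if "BOOL" in type_name
                          else "number" if "INT" in type_name
                          else "text",
                "required": not col["nullable"],
            })
        specs[table] = fields
    return specs

# e.g. generate_form_specs("sqlite:///experiment.db")
```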
Geographically dispersed teams often face challenges in coordination and collaboration, lowering their productivity. Understanding the relationship between team dispersion and productivity is critical for supporting such teams. Extensive prior research has studied these relations in lab settings or using qualitative measures. This paper extends prior work by contributing an empirical case study in a real-world organization, using quantitative measures. We studied 117 new research project teams within a company for 6 months. During this time, all teams shared one goal: submitting a research paper to a conference. We analyzed these teams' dispersion-related characteristics as well as team productivity. Interestingly, we found little statistical evidence that geographic and time differences relate to team productivity. However, organizational and functional distances are highly predictive of the productivity of the dispersed teams we studied. We discuss the open research questions these findings revealed and their implications for future research.
When we consider our CV, it is full of entities that we are or were associated with and that define us in some way(s). Such entities include where we studied, where we worked, and who we collaborated with on a project or a paper, etc. The entities we are linked to are part of who we are and may reveal what we are interested in. Hence, we can view any CV as a graph of interlinked entities, where nodes are entities and edges are relations between them. This study proposes a novel entity search framework that, in response to a real-time query about an entity, searches, crawls, analyzes, and consolidates relevant information that is freely available on the Web about the entity of interest, culminating in the generation of a profile of the searched entity. Unlike typical entity search settings, in which a ranked list of entities related to the target entity over a pre-specified relation is produced, we present and visualize rich information about the entity of interest as a typed entity-relation graph, without an a priori definition of the types of related entities and relations. This view is structured and compact, making it easy to understand and interpret. It enables the user to learn not only about the entity in question but also about related entities, thereby obtaining a better understanding of it. We evaluated each of the framework's components separately and then performed an overall evaluation of the framework, its visualization, and users' interest in the results. The results show that the proposed framework performs entity search, related entity identification, and relation identification very well, and that it satisfies users' needs.
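To make the notion of a typed entity-relation graph concrete, here is a minimal sketch using networkx. The node types, relation names, and the `build_profile_graph` helper are illustrative assumptions, not the paper's actual data model.

```python
# Illustrative sketch of a typed entity-relation graph for an entity profile.
import networkx as nx

def build_profile_graph(target: str, facts: list[tuple[str, str, str, str]]):
    """facts: (relation, related_entity, entity_type, source_url) tuples."""
    g = nx.MultiDiGraph()
    g.add_node(target, type="Person")
    for relation, entity, etype, url in facts:
        g.add_node(entity, type=etype)            # typed related entity
        g.add_edge(target, entity, relation=relation, source=url)
    return g

profile = build_profile_graph("Jane Doe", [
    ("studied_at", "MIT", "Organization", "https://example.org/a"),
    ("coauthored_with", "John Roe", "Person", "https://example.org/b"),
])
```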
The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.
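As a hedged sketch of the underlying structure (ours, not the AttentionFlow implementation), the snippet below models an ego node whose neighbours each carry an attention time series, with a simple flux filter. The `EgoNetwork` class and the flux definition are assumptions for illustration.

```python
# Sketch of an ego network of time series with influence-weighted edges.
from dataclasses import dataclass, field

@dataclass
class EgoNetwork:
    ego: str
    # node -> attention time series
    series: dict[str, list[float]] = field(default_factory=dict)
    # (src, dst) -> estimated influence weight
    influence: dict[tuple[str, str], float] = field(default_factory=dict)

    def filter_by_flux(self, threshold: float) -> list[str]:
        """Keep neighbours whose total attention change exceeds a threshold."""
        kept = []
        for node, ts in self.series.items():
            if node == self.ego:
                continue
            flux = sum(abs(b - a) for a, b in zip(ts, ts[1:]))
            if flux >= threshold:
                kept.append(node)
        return kept
```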
Collecting supporting evidence from large corpora of text (e.g., Wikipedia) is a great challenge for open-domain Question Answering (QA). In multi-hop open-domain QA especially, scattered pieces of evidence must be gathered together to support answer extraction. In this paper, we propose a new retrieval target, the hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering. Specifically, a hop is defined as the combination of a hyperlink and the corresponding outbound link document. The hyperlink is encoded as a mention embedding, which models the structured knowledge of how the outbound link entity is mentioned in its textual context, and the corresponding outbound link document is encoded as a document embedding representing the unstructured knowledge within it. Accordingly, we build HopRetriever, which retrieves hops over Wikipedia to answer complex questions. Experiments on the HotpotQA dataset demonstrate that HopRetriever outperforms previously published evidence retrieval methods by large margins. Moreover, our approach also yields quantifiable interpretations of the evidence collection process.
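A minimal sketch of the hop idea follows, assuming precomputed embeddings; the concatenation fusion and dot-product scoring are placeholder choices, not the paper's encoder.

```python
# Sketch: score a candidate hop by combining a mention embedding
# (structured: how the entity is mentioned in context) with a document
# embedding (unstructured: the outbound document itself).
import numpy as np

def encode_hop(mention_vec: np.ndarray, doc_vec: np.ndarray) -> np.ndarray:
    # One simple way to combine the two knowledge sources.
    return np.concatenate([mention_vec, doc_vec])

def score_hops(question_vec: np.ndarray, hops: list[np.ndarray]) -> list[float]:
    """Rank hops by dot-product relevance to the encoded question."""
    return [float(question_vec @ h) for h in hops]
```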
Named entity recognition (NER) and entity linking (EL) are two fundamentally related tasks, since in order to perform EL, mentions of entities have to be detected first. However, most entity linking approaches disregard the mention detection part, assuming that the correct mentions have been detected beforehand. In this paper, we perform joint learning of NER and EL to leverage their relatedness and obtain a more robust and generalisable system. For that, we introduce a model inspired by the Stack-LSTM approach (Dyer et al., 2015). We observe that multi-task learning of NER and EL does in fact improve performance on both tasks compared with models trained on individual objectives. Furthermore, we achieve results competitive with the state of the art in both NER and EL.
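The following is a generic multi-task sketch, not the Stack-LSTM model itself: a shared encoder feeds separate NER and EL heads, and the two cross-entropy losses are summed so both tasks update the shared parameters. All module names are hypothetical.

```python
# Generic joint NER + EL training objective (illustrative).
import torch.nn as nn

class JointNerEl(nn.Module):
    def __init__(self, encoder, ner_head, el_head):
        super().__init__()
        self.encoder, self.ner_head, self.el_head = encoder, ner_head, el_head
        self.loss = nn.CrossEntropyLoss()

    def forward(self, tokens, ner_labels, el_labels):
        # Assumes token representations flattened to (num_tokens, hidden).
        h = self.encoder(tokens)                  # shared representations
        ner_loss = self.loss(self.ner_head(h), ner_labels)
        el_loss = self.loss(self.el_head(h), el_labels)
        return ner_loss + el_loss                 # joint objective
```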
This paper explores the problem of matching entities across different knowledge graphs. Given a query entity in one knowledge graph, we wish to find the corresponding real-world entity in another knowledge graph. We formalize this problem and present two large-scale datasets for this task, based on existing cross-ontology links between DBpedia and Wikidata and focused on several hundred thousand ambiguous entities. Using a classification-based approach, we find that a simple multi-layer perceptron over representations derived from RDF2Vec graph embeddings of entities in each knowledge graph is sufficient to achieve high accuracy, with only small amounts of training data. The contributions of our work are datasets for examining this problem and strong baselines on which future work can build.
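A baseline in this spirit might look like the sketch below, which trains a scikit-learn MLP on concatenated embedding pairs; the embedding dimension, hidden size, and the synthetic stand-in data are assumptions.

```python
# Sketch: classify whether a DBpedia/Wikidata entity pair refers to the
# same real-world entity, using concatenated RDF2Vec-style vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
dim = 200                                   # assumed RDF2Vec embedding size
db_vecs = rng.normal(size=(1000, dim))      # stand-ins for real embeddings
wd_vecs = rng.normal(size=(1000, dim))
labels = rng.integers(0, 2, size=1000)      # 1 = same real-world entity

X = np.hstack([db_vecs, wd_vecs])           # concatenated pair features
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=50)
clf.fit(X, labels)
match_prob = clf.predict_proba(X[:5])[:, 1]  # P(match) for first 5 pairs
```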
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting: most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human language and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
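The comparative idea can be sketched as follows (encoders and the similarity function are placeholders, not the paper's architecture): the discriminator scores a set of candidate captions for the same image relative to one another, rather than scoring each in isolation.

```python
# Sketch: relative scoring of a caption set in an image-caption joint space.
import torch
import torch.nn.functional as F

def comparative_scores(img_emb: torch.Tensor, cap_embs: torch.Tensor) -> torch.Tensor:
    """img_emb: (d,); cap_embs: (k, d) for k candidate captions
    (e.g., generated, human-written, and image-mismatched)."""
    sims = cap_embs @ img_emb          # similarity of each caption to the image
    return F.softmax(sims, dim=0)      # relative quality within the set

# Conceptually, the generator is updated to raise the probability mass
# assigned to its own caption relative to the contrasting captions.
```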
Generative Adversarial Networks (GANs) have shown great promise in tasks like synthetic image generation, image inpainting, style transfer, and anomaly detection. However, generating discrete data remains a challenge. This work presents an adversarial-training-based model for generating correlated discrete data (CDD), and also details an approach for conditional CDD generation. We present results on two datasets: job-seeking candidates' skill sets (a private dataset) and MNIST (a public dataset). Through quantitative and qualitative analysis of these results, we show that our model outperforms an existing model that overlooks correlation, because it leverages the inherent correlation in the data.
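The abstract does not specify the generation mechanism, so the sketch below shows only one standard trick for adversarial generation of discrete data, the Gumbel-softmax relaxation, which keeps sampling differentiable; it is not necessarily the model used here.

```python
# Generic sketch: differentiable discrete sampling via Gumbel-softmax.
import torch
import torch.nn.functional as F

def sample_discrete(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Straight-through one-hot samples from categorical logits."""
    return F.gumbel_softmax(logits, tau=tau, hard=True)

logits = torch.randn(4, 10)          # batch of 4, 10 categories (e.g. skills)
onehots = sample_discrete(logits)    # (4, 10) one-hot samples, gradients flow
```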
Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant? To help answer this question, we design and investigate different image generation models, associated with different loss functions, to boost creativity in fashion generation. Our explorations cover: (i) different Generative Adversarial Network architectures that start from noise vectors to generate fashion items, (ii) a new loss function that encourages creativity, and (iii) a generation process following the key elements of fashion design (disentangling shape and texture makers). A key challenge of this study is the evaluation of generated designs and the retrieval of the best ones, so we put together an evaluation protocol combining automatic metrics with human experimental studies that we hope will help ease future research. We show that our proposed creativity loss yields better overall appreciation than the one employed in Creative Adversarial Networks. In the end, about 61% of our images were thought to be created by human designers rather than by a computer, while also being considered original in our human subject experiments, and our proposed loss scores highest among existing losses in both novelty and likability.
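For context, a CAN-style creativity term (the baseline the authors compare against; their proposed loss differs) can be sketched as pushing generated items toward high entropy under a style classifier, i.e., hard to assign to any existing fashion category. The `style_ambiguity_loss` below is an illustrative reconstruction, not the paper's loss.

```python
# Sketch of a CAN-style ambiguity term: cross-entropy of the predicted
# style distribution against a uniform target over all style classes.
import torch
import torch.nn.functional as F

def style_ambiguity_loss(style_logits: torch.Tensor) -> torch.Tensor:
    """style_logits: (batch, num_styles) from a pretrained style classifier."""
    log_p = F.log_softmax(style_logits, dim=1)
    # CE against uniform = -(1/K) * sum_k log p_k, averaged over the batch.
    return -log_p.mean()

loss = style_ambiguity_loss(torch.randn(8, 12))  # 8 samples, 12 style classes
```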