Flawed TLS certificates are not uncommon on the Internet. While they signal a potential issue, in most cases they have benign causes (e.g., misconfiguration or even deliberate deployment). This adds fuzziness to the decision on whether or not to trust a connection. Little is known about how IT professionals perceive flawed certificates, even though their decisions affect large numbers of end users. Moreover, it is unclear how much the content of error messages and documentation influences these perceptions. To shed light on these issues, we observed 75 attendees of an industrial IT conference as they investigated different certificate validation errors. We also analyzed the influence of reworded error messages and redesigned documentation. We find that people working in IT hold very nuanced opinions, with trust decisions that are far from binary. The self-signed and the name-constrained certificates appear to be over-trusted (the latter also being poorly understood). We show that even small changes to existing error messages can positively influence resource use, comprehension, and trust assessment. We conclude by summarizing lessons learned from conducting usable-security studies with IT professionals.
A hallmark of human intelligence is the capability to acquire knowledge continuously. In stark contrast, Deep Networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal, Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and Knowledge Distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to incorporate novel information about past data and ii) pave the way for learning classes not yet seen. We show that applying these strategies leads to remarkable improvements: the resulting method, termed eXtended-DER (X-DER), outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImagenet) and a novel one introduced here. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g., the value of Knowledge Distillation and flatter minima in continual learning setups).
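Since DER is described only at a high level here, the following PyTorch-style sketch illustrates the rehearsal-plus-distillation idea under stated assumptions: a small reservoir buffer stores past inputs together with the logits the network produced for them, and an MSE term keeps the current logits close to the stored ones on replayed examples. Names such as `ReservoirBuffer`, `der_loss`, and the weight `alpha` are illustrative, not the authors' code.

```python
import random
import torch
import torch.nn.functional as F

class ReservoirBuffer:
    """Tiny reservoir-sampling buffer storing (input, logits) pairs."""
    def __init__(self, capacity=500):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, x, logits):
        for xi, li in zip(x, logits.detach()):
            if len(self.data) < self.capacity:
                self.data.append((xi, li))
            else:
                j = random.randint(0, self.seen)
                if j < self.capacity:
                    self.data[j] = (xi, li)
            self.seen += 1

    def sample(self, n):
        batch = random.sample(self.data, min(n, len(self.data)))
        xs, ls = zip(*batch)
        return torch.stack(xs), torch.stack(ls)

def der_loss(model, x, y, buffer, alpha=0.5):
    """Cross-entropy on the current batch plus logit distillation on replayed examples."""
    out = model(x)
    loss = F.cross_entropy(out, y)            # learn the current task
    if buffer.data:
        bx, b_logits = buffer.sample(x.size(0))
        loss = loss + alpha * F.mse_loss(model(bx), b_logits)  # match stored logits
    buffer.add(x, out)                         # keep (input, logits) for future replay
    return loss
```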
A majority of researchers who develop design guidelines bring WEIRD (Western, Educated, Industrialized, Rich, and Democratic), adult perspectives. As a result, technology may not be developed appropriately for people from non-WEIRD countries or for children. We present five design recommendations to empower designers to consider diverse users' desires and perceptions of agents. For one, designers should consider the degree of task-orientation of agents appropriate to end-users' cultural perspectives. For another, designers should consider how competence, predictability, and integrity in an agent's persona affect end-users' trust of agents. We developed these recommendations following our study, which analyzed the perspectives on agents of children and parents from WEIRD and non-WEIRD countries as they created them. We found that perceptions differed across subsets of participants. For instance, non-WEIRD and child perspectives emphasized agent artificiality, whereas WEIRD and parent perspectives emphasized human-likeness. Children also consistently felt agents were warmer and more human-like than parents did. Finally, participants generally trusted technology, including agents, more than people.
Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the number of training samples. In this paper, we study deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the rate of decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the teacher function depends on the full set of input variables, then the error rate is inversely proportional to the input dimension. We conclude by computing the rate when a deep CNN is trained on the output of another deep CNN with randomly initialised parameters. Interestingly, we find that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.
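As a rough schematic (the notation here is illustrative and not taken from the paper), the adaptivity result can be pictured as a power-law decay of the generalisation error whose exponent is governed by the effective dimensionality of the target:

```latex
% Schematic only: the exact exponents and constants are those derived in the paper.
\epsilon(n) \;\sim\; n^{-\beta}, \qquad \beta \;\propto\; \frac{1}{d_{\mathrm{eff}}}
```

so that a target depending only on low-dimensional subsets of adjacent inputs (small $d_{\mathrm{eff}}$) is learned at a fast rate, whereas a target depending on all $d$ input variables gives $\beta \propto 1/d$ and the usual curse of dimensionality.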
With more and better clinical data being captured outside of clinical studies, and greater data sharing across clinical studies, external controls may become a more attractive alternative to randomized clinical trials. Both industry and regulators recognize that, in situations where a randomized study cannot be performed, external controls can provide the contextualization needed to better interpret studies without a randomized control. It is also agreed that external controls will not fully replace randomized clinical trials as the gold standard for formal proof of efficacy in drug development and the yardstick of clinical research. However, it remains unclear in which situations conclusions about efficacy and a positive benefit/risk assessment can reliably be based on the use of an external control. This paper provides an overview of the types of external controls, their applications, and the different sources of bias their use may incur, and discusses potential mitigation steps. It also gives recommendations on how the use of external controls can be justified.
Factorizable joint shift (FJS) was recently proposed as a type of dataset shift for which the complete characteristics can be estimated from feature data observations on the test dataset by a method called Joint Importance Aligning. For the multinomial (multiclass) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning and, at the same time, point out that factorizable joint shift is not fully identifiable if no class label information on the test dataset is available and no additional assumptions are made. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.
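For orientation, factorizable joint shift is usually stated as the requirement that the joint importance ratio between target and source factorizes into a feature factor and a label factor; the display below is a hedged paraphrase in generic notation, not the paper's exact formulation:

```latex
% Factorizable joint shift (schematic): the density ratio splits into feature and label factors.
\frac{p_{\mathrm{target}}(x, y)}{p_{\mathrm{source}}(x, y)} \;=\; g(x)\, h(y),
\qquad g \ge 0, \; h \ge 0.
```

Roughly speaking, the identifiability issue mentioned above arises because, without class labels on the test dataset, different factor pairs $(g, h)$ can induce the same observable target feature marginal.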
We argue that intelligence, construed as the disposition to perform tasks successfully, is a property of systems composed of agents and their contexts. This is the thesis of extended intelligence. We argue that the performance of an agent will generally not be preserved if its context is allowed to vary. Hence, this disposition is not possessed by an agent alone, but is rather possessed by the system consisting of an agent and its context, which we dub an agent-in-context. An agent's context may include an environment, other agents, cultural artifacts (like language, technology), or all of these, as is typically the case for humans and artificial intelligence systems, as well as many non-human animals. In virtue of the thesis of extended intelligence, we contend that intelligence is context-bound, task-particular and incommensurable among agents. Our thesis carries strong implications for how intelligence is analyzed in the context of both psychology and artificial intelligence.
We present and discuss the results of a two-year qualitative analysis of images published in IEEE Visualization (VIS) papers. Specifically, we derive a typology of 13 visualization image types, coded to distinguish visualizations and several image characteristics. The categorization process required much more time and was more difficult than we initially thought. The resulting typology and image analysis may serve a number of purposes: to study the evolution of the community and its research output over time, to facilitate the categorization of visualization images for the purpose of teaching, to identify visual designs for evaluation purposes, or to enable progress towards standardization in visualization. In addition to the typology and image characterization, we provide a dataset of 6,833 tagged images and an online tool that can be used to explore and analyze the large set of tagged images. We thus facilitate a discussion of the diverse visualizations used and how they are published and communicated in our community.
The prevalence of attention mechanisms has raised concerns about the interpretability of attention distributions. Although attention provides insight into how a model operates, using it as the explanation of model predictions remains highly dubious. The community is still seeking more interpretable strategies for better identifying the local active regions that contribute most to the final decision. To improve the interpretability of existing attention models, we propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures task-relevant, human-interpretable information. The target model is first distilled to produce higher-resolution intermediate feature maps. From these, representative features are grouped based on local pairwise feature similarity to produce finer-grained, more precise attention maps that highlight task-relevant parts of the input. The obtained attention maps are ranked according to the activity level of the compound feature, which indicates the importance of the highlighted regions. The proposed model can be easily adapted to a wide variety of modern deep models that involve classification. Extensive quantitative and qualitative experiments demonstrate more comprehensive and accurate visual explanations than state-of-the-art attention models and visualization methods across multiple tasks, including fine-grained image classification, few-shot classification, and person re-identification, without compromising classification accuracy. The proposed visualization model sheds light on how neural networks `pay their attention' differently in different tasks.
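To make the grouping step concrete, here is a NumPy sketch of the general idea of turning pairwise feature similarity into a few ranked attention maps; the greedy seeding rule, the cosine-similarity threshold, and the activity score are assumptions for illustration and not the authors' BR-NPA implementation.

```python
import numpy as np

def similarity_attention_maps(feats, sim_thresh=0.6, max_maps=3):
    """Group spatial feature vectors by cosine similarity into a few
    attention maps, ranked by how strongly their regions activate.
    feats: array of shape (C, H, W) from an intermediate layer."""
    C, H, W = feats.shape
    vecs = feats.reshape(C, H * W).T                         # (HW, C) spatial descriptors
    norms = np.linalg.norm(vecs, axis=1) + 1e-8
    unit = vecs / norms[:, None]
    unassigned = np.ones(H * W, dtype=bool)
    maps = []
    for _ in range(max_maps):
        if not unassigned.any():
            break
        seed = np.argmax(np.where(unassigned, norms, -np.inf))  # most active remaining location
        sim = unit @ unit[seed]                                  # cosine similarity to the seed
        group = (sim > sim_thresh) & unassigned
        attn = np.zeros(H * W)
        attn[group] = sim[group] * norms[group]                  # weight by activity
        maps.append((norms[group].mean(), attn.reshape(H, W)))   # (activity score, map)
        unassigned &= ~group
    maps.sort(key=lambda t: -t[0])                               # rank maps by activity level
    return [m for _, m in maps]
```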
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.
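A minimal sketch of the segmentation-based dissection step described above: a unit is matched to a concept by thresholding its upsampled activation map at a high quantile and measuring the IoU with the concept's segmentation mask. In the dissection literature the quantile is computed over a whole dataset of activations; the per-image quantile and the numbers below are simplifications for illustration.

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, quantile=0.99):
    """Score how well one convolutional unit matches one concept.
    activation:   float array (H, W), unit's activation map resized to image size.
    concept_mask: bool array (H, W), pixels where the concept is present.
    Returns the IoU between the top-quantile activation region and the mask."""
    thresh = np.quantile(activation, quantile)           # keep only the strongest responses
    unit_region = activation > thresh
    inter = np.logical_and(unit_region, concept_mask).sum()
    union = np.logical_or(unit_region, concept_mask).sum()
    return inter / union if union > 0 else 0.0

# A unit is then called "interpretable" for a concept when its IoU, aggregated
# over many images, exceeds a small cutoff; the exact cutoff used here would be
# an assumption, not the paper's definition.
```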
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on a more detailed segmentation of the organs and vessels. We use training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection, acquired at a different hospital, that includes 150 CT scans targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance on small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: https://github.com/holgerroth/3Dunet_abdomen_cascade.
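The coarse-to-fine cascade can be sketched as follows; `coarse_fcn` and `fine_fcn` stand in for the trained 3D FCNs, and the cropping margin is an arbitrary illustrative value rather than the paper's setting.

```python
import numpy as np

def cascade_segment(volume, coarse_fcn, fine_fcn, margin=10):
    """Two-stage coarse-to-fine segmentation sketch.
    volume:     CT volume as a float array of shape (D, H, W).
    coarse_fcn: callable returning a binary candidate-region mask of shape (D, H, W).
    fine_fcn:   callable returning integer voxel labels for a cropped subvolume."""
    labels = np.zeros(volume.shape, dtype=np.int64)
    candidate = coarse_fcn(volume) > 0                     # stage 1: rough candidate region
    if not candidate.any():
        return labels                                      # nothing found by the coarse stage
    zs, ys, xs = np.where(candidate)
    lo = np.maximum(np.array([zs.min(), ys.min(), xs.min()]) - margin, 0)
    hi = np.minimum(np.array([zs.max(), ys.max(), xs.max()]) + margin + 1, volume.shape)
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]   # only a small fraction of the voxels
    labels[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine_fcn(crop)  # stage 2: detailed labels
    return labels
```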