Organisations increasingly use automated decision-making systems (ADMS) to inform decisions that affect humans and their environment. While the use of ADMS can improve the accuracy and efficiency of decision-making processes, it is also coupled with ethical challenges. Unfortunately, the governance mechanisms currently used to oversee human decision-making often fail when applied to ADMS. In previous work, we proposed that ethics-based auditing (EBA), i.e. a structured process by which ADMS are assessed for consistency with relevant principles or norms, can (a) help organisations verify claims about their ADMS and (b) provide decision-subjects with justifications for the outputs produced by ADMS. In this article, we outline the conditions under which EBA procedures can be feasible and effective in practice. First, we argue that EBA is best understood as a 'soft' yet 'formal' governance mechanism. This implies that the main responsibility of auditors should be to spark ethical deliberation at key intervention points throughout the software development process and ensure that there is sufficient documentation to respond to potential inquiries. Second, we frame ADMS as parts of larger socio-technical systems to demonstrate that to be feasible and effective, EBA procedures must link to intervention points that span all levels of organisational governance and all phases of the software lifecycle. The main function of EBA should therefore be to inform, formalise, assess, and interlink existing governance structures. Finally, we discuss the policy implications of our findings. To support the emergence of feasible and effective EBA procedures, policymakers and regulators could provide standardised reporting formats, facilitate knowledge exchange, provide guidance on how to resolve normative tensions, and create an independent body to oversee EBA of ADMS.
A data science task can be framed as making sense of data or testing a hypothesis about it. The conclusions inferred from data can greatly guide us in making informed decisions. Big data has enabled us to carry out countless prediction tasks in conjunction with machine learning, such as identifying high-risk patients suffering from a certain disease and taking preventive measures. However, healthcare practitioners are not content with mere predictions - they are also interested in the cause-effect relations between input features and clinical outcomes. Understanding such relations will help doctors treat patients and reduce risk effectively. Causality is typically identified by randomized controlled trials. When such trials are not feasible, scientists and researchers turn to observational studies and attempt to draw inferences from them. However, observational studies may also be affected by selection and/or confounding biases that can result in wrong causal conclusions. In this chapter, we will highlight some of the drawbacks that may arise in traditional machine learning and statistical approaches to analyzing observational data, particularly in the healthcare data analytics domain. We will discuss causal inference and ways to discover cause-effect relations from observational studies in the healthcare domain. Moreover, we will demonstrate the applications of causal inference in tackling some common machine learning issues, such as missing data and model transportability. Finally, we will discuss the possibility of integrating reinforcement learning with causality as a way to counter confounding bias.
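As a minimal illustration of the confounding problem described above, the Python sketch below simulates observational data in which a confounder (age) drives both treatment assignment and outcome; all variable names and effect sizes are assumptions made for the sketch, not values from the chapter. A naive regression of outcome on treatment is biased, while adjusting for the confounder (the backdoor adjustment) recovers the true effect.

# Minimal sketch of confounding bias in observational data (illustrative values).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
age = rng.normal(60, 10, n)                      # confounder
p_treat = 1 / (1 + np.exp(-(age - 60) / 5))      # older patients treated more often
treatment = rng.binomial(1, p_treat)
# True causal effect of treatment on the risk score is -2.0; age also raises risk.
risk = 0.3 * age - 2.0 * treatment + rng.normal(0, 1, n)

naive = sm.OLS(risk, sm.add_constant(treatment)).fit()
adjusted = sm.OLS(risk, sm.add_constant(np.column_stack([treatment, age]))).fit()
print("naive estimate:   ", naive.params[1])     # biased upward by the omitted age
print("adjusted estimate:", adjusted.params[1])  # close to the true -2.0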
Industry is moving towards large-scale systems where processor cores, memories, accelerators, etc.\ are bundled via 2.5D integration. These various components are fabricated separately as chiplets and then integrated using an interconnect carrier, a so-called interposer. This new design style provides benefits in terms of yield as well as economies of scale, as chiplets may come from various third-party vendors and be integrated into one sophisticated system. The benefits of this approach, however, come at the cost of new challenges for the system's security and integrity when many third-party component chiplets, some from not fully trusted vendors, are integrated. Here, we explore these challenges, but also the promises, of modern interposer-based systems of cache-coherent, multi-core chiplets. First, we introduce a new, coherence-based attack, GETXspy, wherein a single compromised chiplet can expose a high-bandwidth side/covert channel in an ostensibly secure system. We further show that prior art is insufficient to stop this new attack. Second, we propose using an active interposer as a generic, secure-by-construction platform that forms a physical root of trust for modern 2.5D systems. Our scheme has limited overhead, restricted to the active interposer, allowing the chiplets and the coherence system to remain untouched. We show that our scheme prevents a wide range of attacks, including but not limited to our GETXspy attack, with little overhead on system performance, $\sim$4\%. This overhead reduces as workloads increase, ensuring the scalability of the scheme.
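To convey the intuition behind such a coherence-based channel, here is a deliberately simplified toy model in Python (the actual GETXspy attack operates on real coherence protocols in hardware; the latencies, threshold, and noise below are assumptions for illustration). The sender encodes a 1 by issuing a GetX-like request that takes exclusive ownership of a shared line, invalidating the receiver's copy; the receiver decodes bits by timing its own accesses to that line.

# Toy model of a coherence-based timing covert channel (assumed latencies).
import random

HIT_NS, MISS_NS = 2, 80   # assumed latency: local hit vs. re-fetch after invalidation
THRESHOLD_NS = 40         # receiver's decision threshold

def transmit(bits):
    received = []
    for bit in bits:
        line_valid_at_receiver = (bit == 0)  # a 1 means the sender invalidated the line
        latency = HIT_NS if line_valid_at_receiver else MISS_NS
        latency += random.gauss(0, 3)        # measurement noise
        received.append(1 if latency > THRESHOLD_NS else 0)
    return received

message = [random.randint(0, 1) for _ in range(64)]
errors = sum(a != b for a, b in zip(message, transmit(message)))
print(f"bit errors: {errors} / {len(message)}")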
There is a growing need for authentication methodology in virtual reality applications. Current systems assume that the immersive experience technology is a collection of peripheral devices connected to a personal computer or mobile device; hence, there is a complete reliance on the computing device and its traditional authentication mechanisms to handle authentication and authorization decisions. Using the virtual reality controllers and headset poses a different set of challenges, as the user is subject to unauthorized observation without being aware of it, given that the headset completely covers the field of vision in order to provide an immersive experience. As the demand for virtual reality experiences in the commercial world increases, there is a need to provide alternative mechanisms for secure authentication. In this paper, we analyze several proposed authentication systems and conclude that a multidimensional approach to authentication is needed to address the granular nature of the authentication and authorization needs of commercial virtual reality applications.
Finely tuning MPI applications and understanding the influence of key parameters (number of processes, granularity, collective operation algorithms, virtual topology, and process placement) is critical to obtain good performance on supercomputers. Given the high resource consumption of running applications at scale, doing so solely to optimize their performance is particularly costly. Having inexpensive but faithful predictions of expected performance could be a great help for researchers and system administrators. The methodology we propose decouples the complexity of the platform, which is captured through statistical models of the performance of its main components (MPI communications, BLAS operations), from the complexity of adaptive applications by emulating the application and skipping regular non-MPI parts of the code. We demonstrate the capability of our method with High-Performance Linpack (HPL), the benchmark used to rank supercomputers in the TOP500, which requires careful tuning. We briefly present (1) how the open-source version of HPL can be slightly modified to allow a fast emulation on a single commodity server at the scale of a supercomputer. Then we present (2) an extensive (in)validation study that compares simulation with real experiments and demonstrates our ability to consistently predict the performance of HPL within a few percent. This study allows us to identify the main modeling pitfalls (e.g., spatial and temporal node variability or network heterogeneity and irregular behavior) that need to be considered. Last, we show (3) how our "surrogate" allows studying several subtle HPL parameter optimization problems while accounting for uncertainty on the platform.
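As a small illustration of the component-modeling idea (our sketch, with an assumed cubic cost form; not the authors' code), one can time the dominant BLAS kernel (dgemm) at a few sizes, fit a statistical cost model, and then extrapolate to the sizes a full-scale run would use:

# Sketch: fit a cost model for dgemm and extrapolate to larger sizes (illustrative).
import time
import numpy as np

def time_dgemm(n, reps=3):
    # Best-of-reps wall time of an n x n matrix multiply (calls BLAS dgemm).
    a, b = np.random.rand(n, n), np.random.rand(n, n)
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        times.append(time.perf_counter() - t0)
    return min(times)

sizes = np.array([256, 512, 1024, 2048])
measured = np.array([time_dgemm(n) for n in sizes])

# dgemm is O(n^3), so assume t(n) = c * n^3 and fit c by least squares.
c = np.linalg.lstsq(sizes[:, None].astype(float) ** 3, measured, rcond=None)[0][0]
print(f"predicted dgemm time for n=16384: {c * 16384**3:.1f} s")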
Compared with lecturer-marked assessment, peer assessment is a more comprehensive learning process, but many problems are associated with it. In this research work, we study the impact of peer assessment on group learning activities in order to provide a complete and systematic review and to improve the practice and quality of the peer-assessment process. Pilot studies were conducted in the form of surveys, focus group interviews, and questionnaires. Preliminary surveys were conducted with 582 students; 276 responses were received, giving a response rate of 47.4%. The results show that 37% of students would choose individual work over group work if given the choice. In the case study, 82.1% of the 28 students enjoyed working in a group using Facebook as a communication tool. 89.3% of the students could demonstrate their skills through group work and, most importantly, 82.1% of them agreed that peer assessment is an impartial method of assessment when Facebook serves as proof of self-contribution. Our suggestions for making group work a pleasant experience are identifying and taking action against freeloaders, giving credit to deserving students, educating students on how to give constructive feedback, and making the assessment process transparent to all.
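For transparency, the reported figures can be reproduced from the raw counts in a few lines (a simple arithmetic check, assuming ordinary rounding):

# Reproducing the reported survey percentages from the raw counts.
responses, invited = 276, 582
print(f"response rate: {responses / invited:.1%}")   # 47.4%
group_size = 28
for pct in (82.1, 89.3):
    students = round(pct / 100 * group_size)
    print(f"{pct}% of {group_size} students is about {students} students")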
The paper presents an approach for building consistent and applicable clinical decision support systems (CDSSs) using a data-driven predictive model, aimed at resolving the problem of low applicability and scalability of CDSSs in real-world applications. The approach is based on a three-stage application of domain-specific and data-driven supportive procedures that are to be integrated into clinical business processes with higher trust and explainability of the prediction results and recommendations. Within the three stages, the regulatory policy, data-driven models, and interpretation procedures are integrated to enable natural domain-specific interaction with decision-makers, with sequential narrowing of the intelligent decision support focus. The proposed methodology enables a higher level of automation, scalability, and semantic interpretability of CDSSs. The approach was implemented in software solutions and tested in a case study on T2DM prediction, enabling us to improve on known clinical scales (such as FINDRISK) while keeping the problem-specific reasoning interface similar to existing applications. Such inheritance, together with the three-stage approach, provides higher compatibility of the solution and leads to trusted, valid, and explainable application of data-driven solutions in real-world cases.
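A minimal sketch of the staged idea (our illustration with hypothetical feature names, weights, and thresholds; not the authors' implementation): a domain-specific clinical scale first screens patients, and a data-driven model then refines the risk estimate only for the narrowed group, so the familiar scale-based interface is preserved.

# Sketch of a staged CDSS pipeline: clinical scale -> data-driven refinement.
# All feature names, weights, and thresholds are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def scale_score(age, bmi, waist_cm):
    # Stage 1: a simplified FINDRISK-style screener (illustrative points only).
    return (age >= 45) * 2 + (bmi >= 30) * 3 + (waist_cm >= 102) * 4

# Stage 2: a data-driven model trained on (stand-in) clinical features.
rng = np.random.default_rng(0)
X_train = rng.random((200, 3))
y_train = (X_train.sum(axis=1) > 1.5).astype(int)
model = LogisticRegression().fit(X_train, y_train)

def staged_risk(age, bmi, waist_cm, features):
    if scale_score(age, bmi, waist_cm) < 5:          # hypothetical screening cutoff
        return "low risk (scale only)"
    p = model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    return f"refined risk estimate: {p:.2f}"         # Stage 3: interpretable output

print(staged_risk(52, 31.0, 105, [0.7, 0.6, 0.8]))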
Transit monitoring is a preventative approach used to identify possible cases of human trafficking while an individual is in transit or before one crosses a border. Transit monitoring is often conducted by non-governmental organizations (NGOs) that train staff to identify and intercept suspicious activity. Love Justice International (LJI) is a well-established NGO that has been conducting transit monitoring for years along the Nepal-India border at a number of monitoring stations. In partnership with LJI, we developed a system that uses data envelopment analysis (DEA) to help LJI decision-makers evaluate the performance of these stations and make specific operational improvement recommendations. Our model consists of 91 decision-making units (DMUs) from 7 stations over 13 quarters and considers three inputs, four outputs, and three homogeneity criteria. Using this model, we identified efficient stations, compared rankings of station performance, and recommended strategies to improve efficiency. To the best of our knowledge, this is the first application of DEA in the anti-human trafficking domain.
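To make the underlying machinery concrete, the sketch below solves the standard input-oriented CCR envelopment program for each DMU (the three-DMU toy data is made up for illustration and is not LJI's): a DMU's efficiency is the greatest proportional input contraction theta such that some nonnegative combination of the observed DMUs still matches its outputs with no more than theta times its inputs.

# Sketch: input-oriented CCR DEA via linear programming (toy data, not LJI's).
import numpy as np
from scipy.optimize import linprog

X = np.array([[4.0, 2.0], [6.0, 3.0], [5.0, 5.0]])   # rows: DMUs, cols: inputs
Y = np.array([[3.0, 1.0], [4.0, 2.0], [3.0, 3.0]])   # rows: DMUs, cols: outputs
n, m, s = X.shape[0], X.shape[1], Y.shape[1]

def ccr_efficiency(k):
    # Minimize theta s.t. sum_j lam_j x_j <= theta * x_k and sum_j lam_j y_j >= y_k.
    c = np.r_[1.0, np.zeros(n)]           # decision variables: [theta, lam_1..lam_n]
    A_inputs = np.c_[-X[k], X.T]          # sum_j lam_j x_ij - theta * x_ik <= 0
    A_outputs = np.c_[np.zeros(s), -Y.T]  # -sum_j lam_j y_rj <= -y_rk
    A_ub = np.vstack([A_inputs, A_outputs])
    b_ub = np.r_[np.zeros(m), -Y[k]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.fun

for k in range(n):
    print(f"DMU {k}: efficiency = {ccr_efficiency(k):.3f}")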
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
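Schematically (a simplified rendering of the description above, not the paper's exact formalism), the definition scores a system's intelligence as its skill-acquisition efficiency over a scope of tasks: the harder the generalization achieved, and the less prior knowledge and experience consumed to achieve it, the higher the score. In simplified LaTeX form, with $GD$ denoting generalization difficulty, $P$ priors, and $E$ experience:

% Simplified schematic, not the paper's exact formula.
I \;\propto\; \operatorname{Avg}_{T \,\in\, \text{scope}} \left[ \frac{GD_T}{P_T + E_T} \right]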
There is a resurgent interest in developing intelligent open-domain dialog systems due to the availability of large amounts of conversational data and the recent progress on neural approaches to conversational AI. Unlike traditional task-oriented bots, an open-domain dialog system aims to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging. This paper reviews the recent works on neural approaches that are devoted to addressing three challenges in developing such systems: semantics, consistency, and interactiveness. Semantics requires a dialog system to not only understand the content of the dialog but also identify the user's social needs during the conversation. Consistency requires the system to demonstrate a consistent personality to win users' trust and gain their long-term confidence. Interactiveness refers to the system's ability to generate interpersonal responses to achieve particular social goals such as entertainment, conforming, and task completion. The works we select to present here are based on our unique views and are by no means complete. Nevertheless, we hope that the discussion will inspire new research in developing more intelligent dialog systems.