Current ethical debates on the use of artificial intelligence (AI) in health care treat AI as a product of technology in three ways: first, by assessing the risks and potential benefits of currently developed AI-enabled products with ethical checklists; second, by proposing ex ante lists of ethical values seen as relevant for the design and development of assisting technology; and third, by promoting the incorporation of moral reasoning into AI as part of the automation process. We then propose a fourth approach to AI, namely as a methodological tool to assist ethical reflection. We outline the concept of an AI simulation informed by three separate elements: 1) stochastic human behavior models based on behavioral data for simulating realistic settings, 2) qualitative empirical data on value statements regarding internal policy, and 3) visualization components that aid in understanding the impact of changes in these variables. The potential of this approach is to inform an interdisciplinary field about anticipated ethical challenges or ethical trade-offs in concrete settings and, hence, to spark a re-evaluation of design and implementation plans. This may be particularly useful for applications that deal with extremely complex values and behavior or with limitations on the communication resources of affected persons (e.g., the care of persons with dementia or other cognitive impairments). Simulation does not replace ethical reflection, but it does allow for detailed, context-sensitive analysis during the design process and prior to implementation. Finally, we discuss the inherently quantitative methods of analysis afforded by stochastic simulations, their potential to inform ethical discussion, and how AI-based simulation can improve on traditional thought experiments and future-oriented technology assessment.
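To make element (1) concrete, a minimal sketch of such a stochastic behavior model might look like the following. The states, transition probabilities, and alert-threshold policy are hypothetical illustrations, not the authors' model; the point is only to show how varying an internal policy parameter in simulation can surface a safety-versus-autonomy trade-off before implementation.

```python
import random

# Hypothetical Markov model of a care-home resident's night-time behaviour.
# States and transition probabilities are illustrative placeholders only.
TRANSITIONS = {
    "asleep":    [("asleep", 0.90), ("restless", 0.08), ("wandering", 0.02)],
    "restless":  [("asleep", 0.50), ("restless", 0.35), ("wandering", 0.15)],
    "wandering": [("asleep", 0.20), ("restless", 0.30), ("wandering", 0.50)],
}

def step(state: str) -> str:
    """Sample the next behavioural state from the current one."""
    states, weights = zip(*TRANSITIONS[state])
    return random.choices(states, weights=weights)[0]

def simulate_night(alert_after_minutes: int, hours: int = 8) -> int:
    """Count how often a hypothetical monitoring policy would intervene."""
    state, wandering_streak, interventions = "asleep", 0, 0
    for _ in range(hours * 60):  # one-minute time steps
        state = step(state)
        wandering_streak = wandering_streak + 1 if state == "wandering" else 0
        if wandering_streak >= alert_after_minutes:
            interventions += 1  # staff alerted: safety gain, autonomy cost
            state, wandering_streak = "asleep", 0
    return interventions

# Compare two internal policies to surface the trade-off quantitatively.
print("strict policy:", simulate_night(alert_after_minutes=5))
print("lenient policy:", simulate_night(alert_after_minutes=20))
```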
Probabilistic programming combines general computer programming, statistical inference, and formal semantics to help systems make decisions when facing uncertainty. Probabilistic programs are ubiquitous and have a significant impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades it has attracted much interest, but many challenges remain. The work presented in this paper, probabilistic relations, takes a step towards our vision of tackling these challenges. Our work is based on Hehner's predicative probabilistic programming, but there are several obstacles to the broader adoption of his work. Our contributions here include (1) the formalisation of its syntax and semantics, introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP), with probabilities outside the brackets handled using summation over the topological space of the real numbers; (3) a constructive semantics for probabilistic loops using Kleene's fixed-point theorem; (4) the enrichment of the semantics from distributions to subdistributions and superdistributions to support the constructive semantics; (5) a unique fixed-point theorem to simplify reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate our work with six examples, including problems in robot localisation, classification in machine learning, and the termination of probabilistic loops.
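As a hedged illustration of the bracket notation in contribution (1), not the paper's exact definitions: the Iverson bracket $[b]$ equals 1 when the relation $b$ holds and 0 otherwise, so relations stay inside the brackets while probabilities stay outside as ordinary arithmetic. In a Hehner-style predicative semantics, a fair probabilistic choice between two assignments then denotes a distribution over the final value $x'$:

```latex
% Iverson bracket: [b] = 1 if b holds, 0 otherwise.
% Illustrative denotation of a fair choice between two assignments:
\mathit{sem}\bigl((x := 0) \oplus_{1/2} (x := 1)\bigr)
  = \tfrac{1}{2}\,[x' = 0] + \tfrac{1}{2}\,[x' = 1],
\qquad
\sum_{x'} \Bigl( \tfrac{1}{2}\,[x' = 0] + \tfrac{1}{2}\,[x' = 1] \Bigr) = 1
```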
Intentional and accidental harms arising from the use of AI have impacted the health, safety, and rights of individuals. While regulatory frameworks are being developed, there remains a lack of consensus on the methods necessary to deliver safe AI. The potential for explainable AI (XAI) to contribute to the effectiveness of AI regulation is increasingly being examined. Regulation must include methods to ensure compliance on an ongoing basis, yet there is an absence of practical proposals on how to achieve this. For XAI to be successfully incorporated into a regulatory system, the individuals engaged in interpreting and explaining the model to stakeholders should be sufficiently qualified for the role. Statutory professionals are prevalent in domains in which harm can be done to the health, safety, and rights of individuals; the most obvious examples are doctors, engineers, and lawyers. Those professionals are required to exercise skill and judgement and to defend their decision-making process in the event of harm occurring. We propose that a statutory profession framework be introduced as a necessary part of the AI regulatory framework for compliance and monitoring purposes. We refer to this new statutory professional as an AI Architect (AIA). The AIA would be responsible for ensuring that the risk of harm is minimised and would be accountable in the event that harms occur. The AIA would also be relied on to provide appropriate interpretations and explanations of XAI models to stakeholders. Further, in order to satisfy themselves that the models have been developed appropriately, the AIA would require models to have sufficient transparency. It is therefore likely that the introduction of an AIA system would lead to an increase in the use of XAI, enabling AIAs to discharge their professional obligations.
This paper presents a novel framework enabling end-users to manage complex robotic workplaces using a tablet and augmented reality. The framework allows users to commission a workplace comprising different types of robots, machines, or services irrespective of the vendor, set task-important points in space, specify program steps, generate code, and control its execution. Multiple users can collaborate simultaneously, for instance, within a large-scale workplace. Spatially registered visualization and programming enable a fast and easy understanding of workplace processes, while high precision is achieved by combining kinesthetic teaching with specific graphical tools for the relative manipulation of poses. For execution, a visually defined program is translated into a Python representation, allowing the efficient involvement of experts. The system was designed and developed in cooperation with a system integrator based on an offline printed circuit board testing use case, and its user interface was evaluated multiple times during development. The latest evaluation, performed by three experts, indicates the high potential of the solution.
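To illustrate the translation step, a generated Python representation of a visually defined PCB-testing sequence might resemble the sketch below. All identifiers (`Pose`, `Robot`, `move_to`, etc.) are hypothetical stand-ins, not the framework's actual API.

```python
# Hypothetical generated code for a PCB-testing sequence; all names
# are illustrative placeholders, not the framework's real interface.
from dataclasses import dataclass

@dataclass
class Pose:
    # A task-important point set via AR or kinesthetic teaching.
    x: float
    y: float
    z: float

class Robot:
    def move_to(self, pose: Pose) -> None:
        print(f"moving to {pose}")
    def pick(self) -> None:
        print("picking board")
    def place(self) -> None:
        print("placing board in tester")

def test_pcb(robot: Robot, pickup: Pose, tester: Pose) -> None:
    """One program-step sequence as defined graphically on the tablet."""
    robot.move_to(pickup)
    robot.pick()
    robot.move_to(tester)
    robot.place()

test_pcb(Robot(), pickup=Pose(0.4, 0.1, 0.2), tester=Pose(0.7, -0.2, 0.15))
```

A plain-Python target like this would let experts refine generated programs in a standard editor while end-users keep working in the AR interface.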
Generative artificial intelligence (AI) is a widely popular technology that will have a profound impact on society and individuals. Less than a decade ago, it was thought that creative work would be among the last to be automated; yet today, we see AI encroaching on many creative domains. In this paper, we present the findings of a survey study on people's perceptions of text-to-image generation. We touch on participants' technical understanding of the emerging technology, their fears and concerns, and their thoughts about the risks and dangers of text-to-image generation to the individual and society. We find that while participants were aware of the risks and dangers associated with the technology, only a few participants considered the technology to be a personal risk. Risks to others were easier for participants to recognize; artists in particular were seen as being at risk. Participants who had tried the technology rated its future importance lower than those who had not tried it. These results suggest that many people are still oblivious to the potential personal risks of generative artificial intelligence and the impending societal changes associated with this technology.
The surge in Reinforcement Learning (RL) applications in Intelligent Transportation Systems (ITS) has contributed to the field's growth while also highlighting key challenges. Defining the objectives of RL agents in traffic control and management tasks, and aligning policies with these goals through an effective formulation of a Markov Decision Process (MDP), can be challenging and often requires domain experts in both RL and ITS. Recent advancements in Large Language Models (LLMs) such as GPT-4 highlight their broad general knowledge, reasoning capabilities, and commonsense priors across various domains. In this work, we conduct a large-scale user study involving 70 participants to investigate whether novices can leverage ChatGPT to solve complex mixed traffic control problems. Three environments are tested: ring road, bottleneck, and intersection. We find that ChatGPT yields mixed results. For the intersection and bottleneck environments, ChatGPT increases the number of successful policies by 150% and 136%, respectively, compared to beginners working unaided, with some policies even outperforming those of experts. However, ChatGPT does not provide consistent improvements across all scenarios.
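As an illustration of the MDP-formulation step that novices sought help with, a ring-road reward might be written as below. The objective, coefficients, and target speed are hypothetical placeholders, not the formulations used in the study.

```python
import numpy as np

def ring_road_reward(velocities: np.ndarray, accel: float,
                     target_speed: float = 5.0,
                     accel_penalty: float = 0.1) -> float:
    """Illustrative ring-road reward: keep the platoon near a target speed
    while discouraging jerky control by the RL-driven vehicle.
    All coefficients are hypothetical placeholders."""
    speed_term = -np.abs(velocities - target_speed).mean()  # closeness to target
    smooth_term = -accel_penalty * abs(accel)               # penalise harsh accel
    return float(speed_term + smooth_term)

# Example: ten vehicles travelling just under target, mild acceleration.
print(ring_road_reward(np.full(10, 4.8), accel=0.5))
```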
Time is a crucial factor in modelling the dynamic behaviour of intelligent agents: activities have a defined temporal duration in a real-world environment, and previous actions influence agents' behaviour. In this paper, we propose a language for modelling concurrent interaction between agents that also allows the specification of the temporal intervals in which particular actions occur. The language exploits a timed version of Abstract Argumentation Frameworks to realise a shared memory that agents use to communicate and to reason about the acceptability of their beliefs with respect to a given time interval. An interleaving model on a single processor is used for basic computation steps, with maximum parallelism for time elapsing; under this approach, only one of the enabled agents is executed at each moment. To demonstrate the capabilities of the language, we show how it can be used to model interactions such as debates and dialogue games taking place between intelligent agents. Lastly, we present an implementation of the language that can be accessed via a web interface. Under consideration in Theory and Practice of Logic Programming (TPLP).
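A minimal sketch of the timed-acceptability idea follows: each argument is available over time intervals, and acceptability is checked relative to a time point. The encoding and the naive "unattacked while available" criterion are our illustration, not the paper's language or semantics.

```python
# Illustrative timed abstract argumentation: an argument is (naively)
# acceptable at time t if it is available at t and no attacker is
# available at t. A sketch only, not the paper's actual semantics.
availability = {
    "a": [(0, 10)],   # argument "a" holds from t=0 to t=10
    "b": [(4, 6)],    # attacker "b" holds only from t=4 to t=6
}
attacks = {("b", "a")}  # b attacks a

def available(arg: str, t: int) -> bool:
    return any(lo <= t <= hi for lo, hi in availability[arg])

def acceptable(arg: str, t: int) -> bool:
    if not available(arg, t):
        return False
    return not any(available(attacker, t)
                   for attacker, target in attacks if target == arg)

# "a" is acceptable except while its attacker "b" is available.
print([t for t in range(11) if acceptable("a", t)])
```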
Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT quality assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level MT quality estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce the Knowledge-Prompted Estimator (KPE), a CoT prompting method that combines three one-step prompting techniques: perplexity, token-level similarity, and sentence-level similarity. This method achieves better segment-level estimation performance than previous deep learning models and one-step prompting approaches. Furthermore, supplementary experiments on word-level visualized alignment demonstrate that our KPE method significantly improves token alignment compared with earlier models and provides better interpretability for MT quality estimation. Code will be released upon publication.
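As a sketch of how the three one-step signals might be combined into a single chain-of-thought prompt: the prompt wording and the way the precomputed scores are injected below are illustrative assumptions, not the paper's exact prompt.

```python
# Illustrative CoT quality-estimation prompt combining three precomputed
# one-step signals; the wording is hypothetical, not the paper's prompt.
def build_kpe_prompt(src: str, hyp: str,
                     perplexity: float, tok_sim: float, sent_sim: float) -> str:
    return (
        "You are assessing machine translation quality step by step.\n"
        f"Source: {src}\n"
        f"Translation: {hyp}\n"
        f"Step 1 (fluency): the translation's perplexity is {perplexity:.2f}.\n"
        f"Step 2 (adequacy): token-level similarity to the source is {tok_sim:.2f}.\n"
        f"Step 3 (meaning): sentence-level similarity is {sent_sim:.2f}.\n"
        "Considering steps 1-3, rate the segment from 0 to 100 and explain why."
    )

print(build_kpe_prompt("Guten Morgen.", "Good morning.",
                       perplexity=12.3, tok_sim=0.91, sent_sim=0.95))
```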
In settings where users are both time-pressured and need high accuracy, such as doctors working in emergency rooms, we want to provide AI assistance that both increases accuracy and reduces time. However, different types of AI assistance have different benefits: some reduce the time taken but increase overreliance on AI, while others do the opposite. We therefore want to adapt which AI assistance we show depending on various properties of the question and of the user, in order to best trade off our two objectives. We introduce a study in which users have to prescribe medicines to aliens, and use it to explore the potential for adapting AI assistance. We find evidence that it is beneficial to adapt the AI assistance to the question, leading to good tradeoffs between time taken and accuracy. Future work could use machine-learning algorithms (such as reinforcement learning) to perform this adaptation automatically and quickly.
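One way such automatic adaptation could work is a simple bandit over assistance types, as in the hedged sketch below. The assistance types and the time/accuracy reward weighting are illustrative assumptions, not the study's design.

```python
import random
from collections import defaultdict

# Illustrative epsilon-greedy bandit choosing an assistance type per
# question; arm names and reward weighting are hypothetical.
ASSISTANCE = ["none", "ai_recommendation", "ai_explanation"]

value = defaultdict(float)  # running mean reward per assistance type
count = defaultdict(int)

def choose(eps: float = 0.1) -> str:
    """Explore with probability eps, otherwise exploit the best arm so far."""
    if random.random() < eps:
        return random.choice(ASSISTANCE)
    return max(ASSISTANCE, key=lambda arm: value[arm])

def update(arm: str, correct: bool, seconds: float,
           time_weight: float = 0.01) -> None:
    """Reward trades off accuracy against time taken (weight is a placeholder)."""
    reward = float(correct) - time_weight * seconds
    count[arm] += 1
    value[arm] += (reward - value[arm]) / count[arm]  # incremental mean

arm = choose()
update(arm, correct=True, seconds=12.0)
```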
Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated, and used by language models. Despite the enormous amount of related studies, there is still no unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained, and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, require significant domain knowledge, and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost-effective manner. In this paper, we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners exploring different AI behaviour discovery methods.