People are known to judge artificial intelligence using a utilitarian moral philosophy and humans using a moral philosophy emphasizing perceived intentions. But why do people judge humans and machines differently? Psychology suggests that people may have different mind perception models of humans and machines and, thus, will treat human-like robots more similarly to the way they treat humans. Here we present a randomized experiment in which we manipulated people's perception of machine agency (e.g., the ability to plan and act) and experience (e.g., the ability to feel) to explore whether people judge machines that are perceived to be more similar to humans along these two dimensions more similarly to the way they judge humans. We find that people's judgments of machines become more similar to those of humans when they perceive machines as having more agency, but not more experience. Our findings indicate that people's use of different moral philosophies to judge humans and machines can be explained by a progression of mind perception models in which the perception of agency plays a prominent role. These findings add to the body of evidence suggesting that people's judgment of machines becomes more similar to their judgment of humans, motivating further work on the dimensions that modulate people's judgment of human and machine actions.
Purpose: To provide a simulation framework for routine neuroimaging test data that allows for "stress testing" of deep segmentation networks against acquisition shifts that commonly occur in clinical practice for T2 weighted (T2w) fluid attenuated inversion recovery (FLAIR) Magnetic Resonance Imaging (MRI) protocols. Approach: The approach simulates "acquisition shift derivatives" of MR images based on MR signal equations. Experiments comprise the validation of the simulated images against real MR scans and example stress tests on state-of-the-art MS lesion segmentation networks to explore a generic model function describing the F1 score as a function of the contrast-affecting sequence parameters echo time (TE) and inversion time (TI). Results: The differences between real and simulated images reach up to 19% in gray and white matter for extreme parameter settings. For the segmentation networks under test, the F1 score dependency on TE and TI is well described by quadratic model functions (R^2 > 0.9). The coefficients of the model functions indicate that changes of TE have more influence on model performance than changes of TI. Conclusions: We show that these deviations are in the range of values that may be caused by erroneous or individually varying relaxation times, as described in the literature. The coefficients of the F1 model function allow for a quantitative comparison of the influences of TE and TI. Limitations arise mainly from tissues with low baseline signal (such as CSF) and from protocols containing contrast-affecting measures that cannot be modelled due to missing information in the DICOM header.
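As a rough illustration of the two pieces described above, the signal-equation-based shift simulation and the quadratic F1 model, a minimal sketch could look as follows. It assumes a standard long-TR inversion-recovery approximation for the FLAIR signal and uses illustrative function and parameter names; it is not the paper's actual framework.

```python
# Minimal sketch: simulate how a FLAIR signal changes when TE/TI shift,
# and fit a quadratic surface F1(TE, TI). The signal model and the tissue
# parameter maps are illustrative assumptions, not the paper's implementation.
import numpy as np

def flair_signal(pd, t1, t2, te, ti, tr=9000.0):
    """Approximate T2w-FLAIR magnitude signal for a long-TR inversion-recovery sequence."""
    return pd * np.abs(1.0 - 2.0 * np.exp(-ti / t1) + np.exp(-tr / t1)) * np.exp(-te / t2)

def shift_image(img_ref, pd_map, t1_map, t2_map, te_ref, ti_ref, te_new, ti_new):
    """Rescale a reference image voxel-wise to a simulated new (TE, TI) setting."""
    scale = (flair_signal(pd_map, t1_map, t2_map, te_new, ti_new) /
             flair_signal(pd_map, t1_map, t2_map, te_ref, ti_ref))
    return img_ref * scale

def fit_f1_surface(te, ti, f1):
    """Least-squares fit of F1 ~ b0 + b1*TE + b2*TI + b3*TE^2 + b4*TI^2.

    te, ti, f1: 1-D numpy arrays of equal length (one entry per stress-test run).
    """
    X = np.column_stack([np.ones_like(te), te, ti, te**2, ti**2])
    coeffs, *_ = np.linalg.lstsq(X, f1, rcond=None)
    return coeffs
```

Fitting `fit_f1_surface` over a grid of simulated (TE, TI) settings would yield coefficients of the kind the abstract uses to compare the influence of TE and TI.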
Artificial intelligence experts often question whether AI is fair. They view fairness as a property of AI systems rather than of sociopolitical and economic systems. This paper emphasizes the need to consider fairness within the social, political, and economic contexts in which an educational system operates and uses AI. Taking Swedish decentralized compulsory education as the context, this paper examines whether and how the use of AI envisaged by national authorities and edtech companies exacerbates unfairness. A qualitative content analysis of selected Swedish policy documents and edtech reports was conducted using the concept of relevant social groups to understand how different groups view the risks and benefits of AI for fairness. Three groups that view efficiency as a key value of AI are identified and interpreted as economic, pedagogical, and accessibility-related. By separating fairness from social justice, this paper challenges the notion of fairness as the formal equality of opportunities.
Motorcycle accidents are a prevalent problem in Texas, resulting in hundreds of injuries and deaths each year. Motorcycles provide the driver with little physical protection during accidents compared to cars and other vehicles, so when a collision involves a motorcycle, the motorcyclist is likely to be injured. While there are numerous reasons for motorcycle accidents, most are caused by negligence and could have been avoided. Because of the increasing popularity of motorcycles and scooters in Texas, coupled with an increase in the number of motorcycle accidents, the Texas Department of Transportation (TxDOT) has ramped up its efforts to improve motorcycle safety. The data show that teenage drivers are the most vulnerable to motorcycle accidents. In this report, we estimate the probability of injury for young motorcycle drivers and passengers under different conditions and predict how the injury rate for this group will change in the coming years.
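A minimal sketch of the kind of injury-probability model such an analysis might use is shown below; the data file, column names, and predictors are hypothetical placeholders, not the report's actual data or method.

```python
# Sketch: estimate injury probability for young motorcyclists from crash records.
# The file name and columns (age, helmet, speed_limit, injured) are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

crashes = pd.read_csv("txdot_motorcycle_crashes.csv")   # placeholder path
young = crashes[crashes["age"] <= 19]

X = young[["helmet", "speed_limit"]]   # example predictors ("conditions")
y = young["injured"]                   # 1 = injured, 0 = not injured

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:5, 1])   # estimated injury probabilities
```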
Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. Their ever-increasing size, however, has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach for assessing compressed LLMs that addresses the limitations of traditional measures such as perplexity, which fail to accurately reflect text generation quality. DTMs focus on token divergence, providing deeper insights into the subtleties of model compression. Our results indicate that significant reductions in precision and significant levels of sparsity can be achieved without compromising text generation quality. Moreover, DTMs offer a more precise evaluation of the impact of each individual component. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that nearly 20% of all components can be pruned by more than 90%. In terms of quantization, the FDTM suggests that over 80% of parameters can be straightforwardly converted to int8 without special outlier management.
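A loose sketch of the intuition behind a first-divergent-token comparison is given below: generate greedily from the original and the compressed model and record the first position at which their token choices differ. The exact definition and aggregation of the DTMs/FDTM in the paper may differ, and the model and tokenizer handles here are placeholders.

```python
# Sketch of a first-divergent-token comparison between an original and a
# compressed model (earlier divergence suggests a larger quality impact).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# tok     = AutoTokenizer.from_pretrained("base-model")              # placeholder
# model_a = AutoModelForCausalLM.from_pretrained("base-model")       # original
# model_b = AutoModelForCausalLM.from_pretrained("compressed-model") # compressed

def first_divergent_token(model_a, model_b, tokenizer, prompt, max_new_tokens=64):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out_a = model_a.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
        out_b = model_b.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    gen_a = out_a[0, ids.shape[1]:].tolist()   # strip the prompt tokens
    gen_b = out_b[0, ids.shape[1]:].tolist()
    for i, (a, b) in enumerate(zip(gen_a, gen_b)):
        if a != b:
            return i                            # index of the first divergence
    return min(len(gen_a), len(gen_b))          # no divergence within the horizon
```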
Large-sample Bayesian analogs exist for many frequentist methods, but are less well known for the widely used 'sandwich' or 'robust' variance estimates. We review existing approaches to Bayesian analogs of sandwich variance estimates and propose a new analog, derived as the Bayes rule under a form of balanced loss function that combines elements of standard parametric inference with fidelity of the data to the model. Our development is general, applying to essentially any regression setting with independent outcomes. We show by simulation that Bayesian robust standard error estimates, being the large-sample equivalents of their frequentist counterparts, can faithfully quantify the variability of parameter estimates even under model misspecification, thus retaining the major attraction of the original frequentist version. We demonstrate our Bayesian analog of standard error estimates when studying the association between age and systolic blood pressure in NHANES.
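For context, the frequentist quantity being mirrored is the classical sandwich covariance A^{-1} B A^{-1}; a minimal heteroskedasticity-robust (HC0) version for ordinary least squares can be sketched as follows. This illustrates the target of the analogy only, not the paper's Bayesian construction.

```python
# Classical sandwich (robust) standard errors for OLS: A^{-1} B A^{-1},
# with bread A = X'X and meat B = sum_i e_i^2 x_i x_i'.
import numpy as np

def sandwich_se(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS point estimate
    resid = y - X @ beta                          # residuals e_i
    bread_inv = np.linalg.inv(X.T @ X)            # A^{-1}
    meat = X.T @ np.diag(resid**2) @ X            # B
    cov = bread_inv @ meat @ bread_inv            # robust covariance
    return np.sqrt(np.diag(cov))                  # robust standard errors
```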
Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. However, only a few theoretical results regarding this quantity exist in the literature. In this paper, we initiate the study of the Lipschitz constant of random ReLU neural networks, i.e., neural networks whose weights are chosen at random and which employ the ReLU activation function. For shallow neural networks, we characterize the Lipschitz constant up to an absolute numerical constant. Moreover, we extend our analysis to deep neural networks of sufficiently large width, where we prove upper and lower bounds for the Lipschitz constant. These bounds match up to a logarithmic factor that depends on the depth.
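The quantity at stake can also be probed numerically: a common heuristic is to lower-bound the Lipschitz constant of a random ReLU network by the largest input-gradient norm over sampled inputs. The sketch below is an illustrative empirical check under assumed width and input dimension, not the paper's theoretical argument.

```python
# Empirical lower bound on the Lipschitz constant of a random shallow ReLU
# network: max over sampled inputs of the norm of the input gradient.
import torch

torch.manual_seed(0)
d, width = 32, 256
net = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)  # weights are randomly initialized, i.e., a "random ReLU network"

def grad_norms(x):
    x = x.clone().requires_grad_(True)
    net(x).sum().backward()           # per-sample gradients via one backward pass
    return x.grad.norm(dim=1)

xs = torch.randn(10_000, d)           # sampled inputs
print(grad_norms(xs).max().item())    # lower bound on the Lipschitz constant
```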
In recent years, research involving human participants has been critical to advances in artificial intelligence (AI) and machine learning (ML), particularly in the areas of conversational, human-compatible, and cooperative AI. For example, around 12% and 6% of publications at recent AAAI and NeurIPS conferences indicate the collection of original human data, respectively. Yet AI and ML researchers lack guidelines for ethical, transparent research practices with human participants. Fewer than one out of every four of these AAAI and NeurIPS papers provide details of ethical review, the collection of informed consent, or participant compensation. This paper aims to bridge this gap by exploring normative similarities and differences between AI research and related fields that involve human participants. Though psychology, human-computer interaction, and other adjacent fields offer historic lessons and helpful insights, AI research raises several specific concerns (namely, participatory design, crowdsourced dataset development, and an expansive role of corporations) that necessitate a contextual ethics framework. To address these concerns, this paper outlines a set of guidelines for ethical and transparent practice with human participants in AI and ML research. These guidelines can be found in Section 4 on pp. 4-7.
The emergence of AI tools in cybersecurity creates many opportunities and uncertainties. A focus group with advanced graduate students in cybersecurity revealed the potential depth and breadth of the challenges and opportunities. The salient issues are access to open-source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity education. Confronting the "black box" mentality in AI cybersecurity work is also of great importance, coupled with earlier and deeper education in foundational AI topics. Systems thinking and effective communication were considered relevant areas of educational improvement. Future AI educators and practitioners need to address these issues by implementing rigorous technical training curricula, clear documentation, and frameworks for the ethical monitoring of AI, combined with critical and systems thinking and communication skills.
Although measuring held-out accuracy has been the primary approach to evaluating generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
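To give a flavor of the behavioral tests involved, a hand-rolled analog of a CheckList-style invariance (INV) test is sketched below; it does not use the CheckList library itself, and `predict_sentiment` stands in for any sentiment model under test.

```python
# Hand-rolled sketch of an invariance test: the model's sentiment prediction
# should not change when only a person's name is swapped in the input.
def invariance_name_swap(predict_sentiment):
    template = "{name} really enjoyed the new update."
    names = ["Maria", "John", "Wei", "Amara"]
    preds = [predict_sentiment(template.format(name=n)) for n in names]
    failures = [(n, p) for n, p in zip(names, preds) if p != preds[0]]
    return failures  # an empty list means the model passed this INV test
```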
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently also entered the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature, and pre-processing of the data used, and the overall performance achieved according to the metrics used in each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques with respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.