A growing body of literature critiques current operationalizations of empathy for resting on loose definitions of the construct. Such definitions negatively affect dataset quality, model robustness, and evaluation reliability. We propose an empathy evaluation framework that operationalizes empathy close to its psychological origins. The framework measures the variance in the responses of LLMs to prompts using existing metrics for empathy and emotional valence. The variance is introduced through controlled generation of the prompts: we vary the social biases that affect context understanding and thereby empathetic understanding. This control over generation ensures high theoretical validity of the constructs in the prompt dataset. It also makes high-quality translation more manageable, especially into languages that currently have little to no means of evaluating empathy or bias, such as the Slavonic family. Using selected LLMs and various prompt types, including multiple-choice answers and free generation, we demonstrate empathy evaluation with the framework. The variance in our initial evaluation sample is small, and we were unable to measure convincing differences in empathetic understanding across contexts given by different social groups. However, the results are promising because the models showed significant alterations in the reasoning chains needed to capture the relatively subtle changes in the prompts. This provides a basis for future research into the construction of the evaluation sample and the statistical methods for measuring the results.
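To make the evaluation loop concrete, the following minimal sketch instantiates it in Python under stated assumptions: the prompt template and social-group slots are illustrative, and `query_llm` and `valence` are hypothetical placeholders for the model under test and one of the paper's empathy/valence metrics.

```python
# Minimal sketch of the variance-based evaluation described above.
# TEMPLATE, GROUPS, query_llm, and valence are illustrative placeholders,
# not the paper's actual prompts or metrics.
import statistics

TEMPLATE = "My {group} neighbour just lost their job and feels hopeless. What should I say?"
GROUPS = ["elderly", "immigrant", "teenage", "disabled"]  # social-group slots under variation

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError

def valence(text: str) -> float:
    """Placeholder for an emotional-valence metric, e.g. in [-1, 1]."""
    raise NotImplementedError

def empathy_variance() -> float:
    # Score one response per controlled prompt variant, then measure spread:
    # low variance means empathetic understanding is stable across groups.
    scores = [valence(query_llm(TEMPLATE.format(group=g))) for g in GROUPS]
    return statistics.pvariance(scores)
```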
The biological brain has inspired multiple advances in machine learning. However, most state-of-the-art models in computer vision do not operate like the human brain, simply because they are not capable of changing or improving their decisions and outputs based on a deeper analysis. The brain is recurrent, while these models are not. It is therefore relevant to explore the impact of adding recurrent mechanisms to existing state-of-the-art architectures and to ask whether recurrency can improve them. To this end, we build on a feed-forward segmentation model and explore multiple types of recurrency for image segmentation. We explore self-organizing, relational, and memory-retrieval types of recurrency that minimize a specific energy function. In our experiments, we test these models on artificial and medical imaging data, analyzing the impact of high levels of noise and few-shot learning settings. Our results do not validate our initial hypothesis that recurrent models should perform better in these settings, suggesting that these recurrent architectures, by themselves, are not sufficient to surpass their state-of-the-art feed-forward counterparts and that additional work on the topic is needed.
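All three recurrent variants share the same outer loop: refine the segmentation state by descending an energy. The sketch below shows that shared skeleton, assuming hypothetical `backbone` and `energy` callables; the concrete energy functions (self-organizing, relational, memory-retrieval) are abstracted away.

```python
# Schematic of an energy-minimizing recurrence wrapped around a feed-forward
# segmentation backbone. `backbone` and `energy` are hypothetical stand-ins.
import torch

def recurrent_refine(backbone, energy, image, steps=10, lr=0.1):
    z = backbone(image)                      # initial feed-forward segmentation logits
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        e = energy(z, image)                 # scalar energy of the current state
        (grad,) = torch.autograd.grad(e, z)  # direction of steepest energy descent
        z = z - lr * grad                    # one recurrent refinement step
    return z
```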
The criticality problem in nuclear engineering asks for the principal eigenpair of a Boltzmann operator describing neutron transport in a reactor core. Being able to reliably design and control such reactors requires assessing these quantities within quantifiable accuracy tolerances. In this paper we propose a paradigm that deviates from the common practice of approximately solving the corresponding spectral problem with a fixed, presumably sufficiently fine discretization. Instead, the present approach is based on first contriving iterative schemes, formulated in function space, that are shown to converge at a quantitative rate without assuming any a priori excess regularity properties, and that exploit only properties of the optical parameters in the underlying radiative transfer model. We develop the analytical and numerical tools for approximately realizing each iteration step within judiciously chosen accuracy tolerances, verified by a posteriori estimates, so as to still warrant quantifiable convergence to the exact eigenpair. This is carried out in full first for a Newton scheme. Since this is only locally convergent, we analyze in addition the convergence of a power iteration in function space to produce sufficiently accurate initial guesses. Here we have to deal with intrinsic difficulties posed by compact but unsymmetric operators, which prevent the standard arguments used in the finite-dimensional case. Our main point is that we can avoid any condition on an initial guess to be already in a small neighborhood of the exact solution. We close with a discussion of remaining intrinsic obstructions to a certifiable numerical implementation, mainly related to not knowing the gap between the principal eigenvalue and the next smaller one in modulus.
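For orientation, the finite-dimensional analogue of the power iteration that supplies initial guesses is sketched below (assuming a real dominant eigenvalue, as the transport setting provides); the paper's actual iteration runs in function space with inexact, tolerance-controlled operator applications.

```python
# Finite-dimensional analogue of power iteration for the principal eigenpair.
import numpy as np

def power_iteration(A, tol=1e-10, max_iter=10_000):
    """Approximate the eigenpair of A whose eigenvalue is largest in modulus."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        lam_new = x @ y                      # Rayleigh-quotient estimate (||x|| = 1)
        x = y / np.linalg.norm(y)
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            break
        lam = lam_new
    return lam, x
```

Per step, the error contracts roughly by the ratio of the second-largest to the largest eigenvalue modulus, which is why not knowing this spectral gap, as noted above, obstructs certified stopping criteria.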
This study investigates the internal representations of verb-particle combinations within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic nuances at different neural network layers. Employing the BERT architecture, we analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'. Our methodology includes detailed dataset preparation from the British National Corpus, followed by extensive model training and output analysis through techniques such as multi-dimensional scaling (MDS) and generalized discrimination value (GDV) calculations. Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories. These findings challenge the conventional uniformity assumed in neural network processing of linguistic elements and suggest a complex interplay between network architecture and linguistic representation. Our research contributes to a better understanding of how deep learning models comprehend and process language, offering insights into the potential and limitations of current neural approaches to linguistic analysis. This study not only advances our knowledge in computational linguistics but also prompts further research into optimizing neural architectures for enhanced linguistic precision.
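A minimal sketch of the layer-wise probing pipeline follows, using the Hugging Face `transformers` and `scikit-learn` APIs: sentences containing verb-particle constructions are embedded, hidden states are taken from every layer, and a chosen layer is projected to 2D with MDS. The sentences are illustrative, and the paper's GDV cluster-separability measure is not reimplemented here.

```python
# Extract per-layer BERT representations and project one layer with MDS.
import torch
from sklearn.manifold import MDS
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentences = ["They agreed on a price.", "She came back late.", "He gave up smoking."]
enc = tok(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**enc).hidden_states      # tuple: embeddings + one entry per layer

layer = 6                                    # a middle layer, where syntax is best captured
vecs = hidden[layer].mean(dim=1).numpy()     # mean-pool tokens into one vector per sentence
coords = MDS(n_components=2, random_state=0).fit_transform(vecs)  # 2D map for inspection
```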
Properly understanding the performances of classifiers is essential in various scenarios. However, the literature often relies only on one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements, potentially leading to suboptimal classifier selection. Recently, a paper on the foundations of the theory of performance-based ranking introduced a tool, called the Tile, that organizes an infinity of ranking scores into a 2D map. Thanks to the Tile, it is now possible to evaluate and compare classifiers efficiently, displaying all possible application-specific preferences instead of having to rely on a pair of scores. In this paper, we provide a first hitchhiker's guide for understanding the performances of two-class classifiers by presenting four scenarios, each showcasing a different user profile: a theoretical analyst, a method designer, a benchmarker, and an application developer. Particularly, we show that we can provide different interpretative flavors that are adapted to the user's needs by mapping different values on the Tile. As an illustration, we leverage the newly introduced Tile tool and the different flavors to rank and analyze the performances of 74 state-of-the-art semantic segmentation models in two-class classification through the eyes of the four user profiles. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization, while accommodating an infinite number of ranking scores.
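To give a concrete sense of what mapping scores onto a 2D grid looks like, the sketch below evaluates a simple two-parameter score family over a grid of class importances; this family and its parameterization are a hypothetical stand-in for illustration, not the Tile's actual construction.

```python
# Hypothetical two-parameter score family evaluated over a 2D grid.
import numpy as np

def weighted_accuracy(tp, fn, fp, tn, a, b):
    # a, b: application-specific importances of the positive and negative class
    return (a * tp + b * tn) / (a * (tp + fn) + b * (tn + fp))

tp, fn, fp, tn = 80, 20, 10, 90              # illustrative confusion-matrix counts
axis = np.linspace(0.01, 1.0, 100)
tile = np.array([[weighted_accuracy(tp, fn, fp, tn, a, b) for b in axis] for a in axis])
# Each cell holds the classifier's score under one preference; comparing such
# maps across classifiers mirrors how the Tile is used to rank them.
```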
Reductions combine collections of input values with an associative and often commutative operator to produce collections of results. When the same input value contributes to multiple outputs, there is an opportunity to reuse partial results, enabling reduction simplification. Simplification often produces a program with lower asymptotic complexity. Typical compiler optimizations yield, at best, a constant-fold speedup, but an improvement from, say, cubic to quadratic complexity yields unbounded speedup for sufficiently large problems. It is well known that reductions in polyhedral programs may be simplified automatically, but previous methods cannot exploit all available reuse. This paper resolves this long-standing open problem, thereby attaining minimal asymptotic complexity in the simplified program. We propose extensions to prior work on simplification to support any independent commutative reduction. At the heart of our approach is piece-wise simplification, the notion that we can split an arbitrary reduction into pieces and then independently simplify each piece. However, the difficulty of using such piece-wise transformations is that they typically involve an infinite number of choices. We give constructive proofs to deal with this and select a finite number of pieces for simplification.
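The canonical instance of such simplification is the prefix sum, where each input contributes to many outputs and reuse of partial results lowers the complexity from quadratic to linear; the sketch below contrasts the two versions. (The paper's contribution is performing this kind of rewrite automatically, and maximally, for arbitrary polyhedral reductions.)

```python
# Reduction simplification on the prefix sum Y[i] = sum of X[0..i].

def prefix_sums_naive(X):
    # O(n^2): recomputes overlapping partial sums for every output
    return [sum(X[: i + 1]) for i in range(len(X))]

def prefix_sums_simplified(X):
    # O(n): reuses the previous result, Y[i] = Y[i-1] + X[i]
    out, acc = [], 0
    for x in X:
        acc += x
        out.append(acc)
    return out

assert prefix_sums_naive([3, 1, 4, 1, 5]) == prefix_sums_simplified([3, 1, 4, 1, 5])
```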
This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Types A and B) deeply fuse multimodal inputs within the internal layers of the model, whereas the latter two types (Types C and D) facilitate early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. On the other hand, Type-C utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, ease of adding modalities, training objectives, and any-to-any multimodal generation capability.
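The schematic below contrasts the two fusion families in code: deep fusion via standard cross-attention inside the model's layers (Type-A) versus early fusion by tokenizing a modality into the input sequence (Type-D). Modules, shapes, and vocabulary sizes are illustrative only.

```python
# Illustrative contrast of deep fusion (Type-A) vs. input-stage fusion (Type-D).
import torch
import torch.nn as nn

d = 512
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

text_states = torch.randn(1, 16, d)   # internal decoder states (queries)
image_feats = torch.randn(1, 49, d)   # image-encoder output (keys/values)

# Type-A: the image is fused into the model's internal layers via cross-attention
fused, _ = cross_attn(text_states, image_feats, image_feats)

# Type-D: the image is discretized by a tokenizer and concatenated at the input
image_tokens = torch.randint(0, 8192, (1, 49))    # codes from an image tokenizer
text_tokens = torch.randint(0, 32000, (1, 16))
lm_input = torch.cat([image_tokens, text_tokens], dim=1)  # one sequence for the LM
```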
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
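The retrieval component can be made concrete with a short sketch: each natural-language memory is scored by a combination of recency, importance, and relevance, and the top-scoring memories are passed to the language model when planning. The decay constant, weighting, and embedding handling below are illustrative rather than the paper's tuned values.

```python
# Sketch of memory retrieval: rank memories by recency + importance + relevance.
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    created_at: float    # game hours since simulation start
    importance: float    # e.g. rated on a 1-10 scale by the language model
    embedding: list      # embedding of `text` (any sentence encoder)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieval_score(m: Memory, query_emb, now: float, decay: float = 0.995) -> float:
    recency = decay ** (now - m.created_at)         # exponential decay over game hours
    relevance = cosine(m.embedding, query_emb)      # similarity to the current situation
    return recency + m.importance / 10 + relevance  # equal-weight sum (illustrative)

def retrieve(memories, query_emb, now, k=5):
    ranked = sorted(memories, key=lambda m: retrieval_score(m, query_emb, now), reverse=True)
    return ranked[:k]
```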
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
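As a concrete baseline for the methods the survey covers, the sketch below implements uniform affine quantization: real values are mapped to b-bit integers through a scale and zero point, then mapped back with the corresponding loss of precision. The tensor and bit-width are illustrative.

```python
# Uniform affine quantization to b-bit unsigned integers, and dequantization.
import numpy as np

def quantize(x, bits=4):
    qmin, qmax = 0, 2**bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)    # real units per integer step
    zero_point = int(round(-x.min() / scale))      # integer code representing real 0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.2, -0.3, 0.0, 0.7, 2.5], dtype=np.float32)
q, s, z = quantize(x)
print(q, dequantize(q, s, z))   # 4-bit codes and their coarse reconstruction
```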
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
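The core idea of shared span representations can be sketched briefly: candidate spans are enumerated and embedded once, and the same vectors feed the entity, relation, and coreference heads. The endpoint-concatenation embedding and head sizes below are simplified illustrations of this pattern, not SciIE's exact architecture.

```python
# Shared span representations feeding multiple task heads (simplified).
import torch
import torch.nn as nn

def span_representations(token_states, max_width=8):
    # token_states: (seq_len, d). Embed each span as its endpoint concatenation.
    n, _ = token_states.shape
    spans, vecs = [], []
    for i in range(n):
        for j in range(i, min(n, i + max_width)):
            spans.append((i, j))
            vecs.append(torch.cat([token_states[i], token_states[j]]))
    return spans, torch.stack(vecs)

token_states = torch.randn(12, 256)            # contextualized token vectors
spans, reps = span_representations(token_states)
entity_head = nn.Linear(512, 7)                # e.g. six entity types + "none"
entity_logits = entity_head(reps)              # relation/coreference heads reuse `reps`
```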