Direct reciprocity is a mechanism for the evolution of cooperation in repeated social interactions. According to this literature, individuals naturally learn to adopt conditionally cooperative strategies if they interact with their partner repeatedly. The corresponding models have greatly advanced our understanding of cooperation, yet they often make strong assumptions about how individuals remember and process payoff information. For example, when strategies are updated through social learning, it is commonly assumed that individuals compare their average payoffs. This would require them to compute (or remember) their payoffs against everyone else in the population. To understand how more realistic constraints influence direct reciprocity, we consider the evolution of conditional behaviors when individuals learn from their more recent experiences. Even in the most extreme case, in which individuals take into account only their very last interaction, we find that cooperation can still evolve. However, such individuals adopt less generous strategies, and they tend to cooperate less often than in the classical setup with average payoffs. Interestingly, once individuals remember the payoffs of two or three recent interactions, cooperation rates quickly approach the classical limit. These findings contribute to a literature that explores which kinds of cognitive capabilities are required for reciprocal cooperation. While our results suggest that some rudimentary form of payoff memory is necessary, remembering only a few interactions already suffices.
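As a concrete illustration of the kind of process studied (a minimal sketch, not the paper's exact model; the strategies and parameter values below are illustrative assumptions), consider a population playing repeated prisoner's dilemmas in which imitation is driven by the payoff of only the last K interactions:

```python
# Minimal sketch: social learning in repeated prisoner's dilemmas where only
# the last K payoffs are remembered. Illustrative assumptions, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
R, S, T, P = 3, 0, 5, 1   # reward, sucker, temptation, punishment payoffs
N = 50                    # population size
ROUNDS = 20               # rounds per repeated game
K = 1                     # payoff memory: number of most recent rounds remembered
BETA = 1.0                # selection strength in the Fermi imitation rule

def play(strat_a, strat_b):
    """One repeated prisoner's dilemma; 'TFT' reciprocates, 'ALLD' always defects.
    Returns the per-round payoffs of player a."""
    payoffs, a_prev, b_prev = [], True, True          # TFT opens with cooperation
    for _ in range(ROUNDS):
        a = (strat_a == 'TFT') and b_prev
        b = (strat_b == 'TFT') and a_prev
        payoffs.append(R if a and b else S if a else T if b else P)
        a_prev, b_prev = a, b
    return payoffs

population = list(rng.choice(['TFT', 'ALLD'], size=N))
for _ in range(20000):
    focal, model = rng.choice(N, size=2, replace=False)
    # each plays one repeated game against a random partner, but only the
    # last K payoffs are remembered and used as the learning signal
    pi_focal = np.mean(play(population[focal], population[rng.integers(N)])[-K:])
    pi_model = np.mean(play(population[model], population[rng.integers(N)])[-K:])
    if rng.random() < 1.0 / (1.0 + np.exp(-BETA * (pi_model - pi_focal))):    # Fermi rule
        population[focal] = population[model]

print('final fraction of conditional cooperators:', population.count('TFT') / N)
```

Setting K = ROUNDS recovers learning from average payoffs, so the same sketch can be used to compare the two regimes.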
Advances in large language models (LLMs) have driven an explosion of interest in their societal impacts. Much of the discourse around how they will affect social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. Many choices determine the impact of AI, and one of the most fundamental, made very early in the pipeline, is which problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases that intrinsically entrench power -- we will miss an important opportunity to guide them toward equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.
Dependence is undoubtedly a central concept in statistics. Yet it proves difficult to locate in the literature a formal definition that goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term 'dependence' and its derivatives to be used vaguely and indiscriminately to qualify a variety of disparate notions, leading to numerous incongruities. For example, the classical Pearson, Spearman and Kendall correlations are widely regarded as 'dependence measures' of major interest, in spite of returning 0 in some cases of deterministic relationships between the variables at play, and hence evidently not measuring dependence at all. Arguing that research on such a fundamental topic would benefit from a slightly more rigid framework, this paper suggests a general definition of the dependence between two random variables defined on the same probability space. Natural enough to align with intuition, the definition is still sufficiently precise to allow unequivocal identification of a 'universal' representation of the dependence structure of any bivariate distribution. Links between this representation and familiar concepts are highlighted, and ultimately, the idea of a dependence measure based on that universal representation is explored and shown to satisfy Rényi's postulates.
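To make the incongruity concrete, consider a random variable $X$ symmetric about 0 (with finite third moment) and the deterministic relation $Y = X^2$. Then $$\operatorname{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0,$$ so Pearson's correlation between $X$ and $Y$ is 0 even though $Y$ is completely determined by $X$; for continuous $X$, the same symmetry argument makes Spearman's and Kendall's coefficients vanish as well.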
The problem of estimating a piecewise monotone sequence of normal means is called the nearly isotonic regression. For this problem, an efficient algorithm has been devised by modifying the pool adjacent violators algorithm (PAVA). In this study, we investigate estimation of a piecewise monotone parameter sequence for general one-parameter exponential families such as binomial, Poisson and chi-square. We develop an efficient algorithm based on the modified PAVA, which utilizes the duality between the natural and expectation parameters. We also provide a method for selecting the regularization parameter by using an information criterion. Simulation results demonstrate that the proposed method detects change-points in piecewise monotone parameter sequences in a data-driven manner. Applications to spectrum estimation, causal inference and discretization error quantification of ODE solvers are also presented.
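To make the objective concrete, in the normal-means case the nearly isotonic fit minimizes the squared error plus a penalty on downward jumps. The sketch below solves this with a generic convex-programming solver (cvxpy) rather than the modified PAVA used in the paper, and uses an illustrative value of the regularization parameter rather than the information-criterion choice:

```python
# Nearly isotonic regression for normal means (generic convex-solver sketch,
# not the modified PAVA of the paper); lam is an illustrative choice.
import numpy as np
import cvxpy as cp

def nearly_isotonic(y, lam):
    """Minimize 0.5 * ||y - beta||^2 + lam * sum_i max(beta_i - beta_{i+1}, 0)."""
    beta = cp.Variable(len(y))
    fit = 0.5 * cp.sum_squares(y - beta)
    downward_jumps = cp.sum(cp.pos(beta[:-1] - beta[1:]))   # only decreases are penalized
    cp.Problem(cp.Minimize(fit + lam * downward_jumps)).solve()
    return beta.value

# piecewise monotone signal with one downward change-point, observed with Gaussian noise
rng = np.random.default_rng(1)
truth = np.concatenate([np.linspace(0, 2, 50), np.linspace(0, 1, 50)])
y = truth + 0.3 * rng.standard_normal(truth.size)
fit = nearly_isotonic(y, lam=2.0)
print(np.round(fit[45:55], 2))   # the fitted sequence is allowed to drop near index 50
```

The exponential-family extension studied in the paper replaces the squared error by the corresponding deviance and exploits the duality between natural and expectation parameters; that part is not sketched here.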
Equivariance is an important property of machine learning models, including language models. It ensures that sequences of phrases with the same meaning are interpreted consistently. For example, the sentence 'There is a cat on the table' should be interpreted by language models in the same way regardless of variations in its token-level expression. Building on this insight, I propose a new theory suggesting that insufficient equivariance in language models can lead to hallucinations. According to this theory, language models trained on relatively small datasets tend to misinterpret input texts and/or generate incorrect texts (i.e., hallucinations). To test the theory, I developed a toy task called 'dancing men', a character-level substitution cipher. Additionally, I propose a technique based on the T5 (Text-to-Text Transfer Transformer) model to efficiently decipher these codes without relying on frequency analysis. I found that the T5 model can almost completely solve the cipher, demonstrating its ability to acquire equivariance in this setting. The method could be scaled up to word-level and sentence-level substitution ciphers, analogous to large language models without tokenizers or dictionaries. This scalability makes it suitable for investigating the proposed link between inadequate equivariance acquisition and the emergence of hallucinations.
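As a rough illustration of the 'dancing men' setup (an assumption-laden sketch; the paper's exact data pipeline may differ), one can generate character-level substitution-cipher pairs on which a text-to-text model such as T5 could then be fine-tuned to map ciphertext back to plaintext:

```python
# Generate character-level substitution-cipher training pairs
# (illustrative sketch; the paper's exact data construction may differ).
import random
import string

def make_cipher(seed=0):
    """Return a random one-to-one substitution over the lowercase alphabet."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encipher(text, key):
    # non-alphabet characters (spaces, punctuation) pass through unchanged
    return ''.join(key.get(c, c) for c in text.lower())

key = make_cipher()
sentences = ["there is a cat on the table",
             "the quick brown fox jumps over the lazy dog"]
pairs = [(encipher(s, key), s) for s in sentences]   # (input, target) = (ciphertext, plaintext)
for src, tgt in pairs:
    print(f"decipher: {src}\t-> {tgt}")
```

The same construction scales to word-level or sentence-level substitutions by shuffling a vocabulary or a set of sentences instead of the alphabet.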
Deep neural networks (DNNs) often fail silently, producing over-confident predictions on out-of-distribution (OOD) samples and posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of the gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest-neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the targeted input mix-up amplifies the separation between ID and OOD samples in the gradient space, yielding strong results across diverse datasets. Notably, when benchmarked on ImageNet-1k, GROOD surpasses state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNNs in image classification.
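As a schematic illustration of only the final scoring step, the sketch below computes an OOD score as the distance to the nearest training gradient embedding; the gradient features, class prototypes, and targeted mix-up from which GROOD actually builds its space are not reproduced, and the toy features here are synthetic stand-ins:

```python
# Schematic sketch of the final scoring step only: an OOD score given by the
# distance to the nearest training-set gradient embedding. The construction of
# the gradient features themselves (prototypes, targeted mix-up) is not modeled.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_scorer(train_gradient_features):
    return NearestNeighbors(n_neighbors=1).fit(train_gradient_features)

def ood_score(scorer, test_gradient_features):
    distances, _ = scorer.kneighbors(test_gradient_features)
    return distances[:, 0]          # larger distance -> more likely OOD

# toy stand-in features: ID gradients cluster tightly, OOD gradients lie far away
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 64))
id_test = rng.normal(0.0, 1.0, size=(10, 64))
ood_test = rng.normal(4.0, 1.0, size=(10, 64))
scorer = fit_scorer(train)
print("mean ID score :", ood_score(scorer, id_test).mean())
print("mean OOD score:", ood_score(scorer, ood_test).mean())
```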
Finding the exact spanning ratio of a Delaunay graph has been one of the longstanding open problems in Computational Geometry. Currently, there are only four convex shapes for which the exact spanning ratio of the Delaunay graph is known: the equilateral triangle, the square, the regular hexagon and the rectangle. In this paper, we show the exact spanning ratio of the parallelogram Delaunay graph, making the parallelogram the fifth convex shape for which an exact bound is known. The worst-case spanning ratio is exactly $$\frac{\sqrt{2}\sqrt{1+A^2+2A\cos(\theta_0)+(A+\cos(\theta_0))\sqrt{1+A^2+2A\cos(\theta_0)}}}{\sin(\theta_0)},$$ where $A$ is the aspect ratio and $\theta_0$ is the non-obtuse angle of the parallelogram. Moreover, we show how to construct a parallelogram Delaunay graph whose spanning ratio matches this bound.
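The bound can be evaluated directly as a function of $A$ and $\theta_0$. The sketch below transcribes the expression and evaluates the square case ($A = 1$, $\theta_0 = \pi/2$), where it reduces to $\sqrt{4+2\sqrt{2}} \approx 2.61$:

```python
# Evaluate the worst-case spanning ratio as a function of the parallelogram's
# aspect ratio A and non-obtuse angle theta0 (direct transcription of the formula).
import math

def spanning_ratio(A, theta0):
    c = math.cos(theta0)
    q = 1 + A**2 + 2 * A * c
    return math.sqrt(2) * math.sqrt(q + (A + c) * math.sqrt(q)) / math.sin(theta0)

# the square case: A = 1, theta0 = pi/2 reduces to sqrt(4 + 2*sqrt(2)) ~= 2.61
print(spanning_ratio(1.0, math.pi / 2))
print(math.sqrt(4 + 2 * math.sqrt(2)))
```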
The conventional computing paradigm struggles to fulfill the rapidly growing demands of emerging applications, especially those of machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called computational random-access memory (CRAM), has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without the data ever leaving the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, an experimental demonstration and study of CRAM that evaluates its computation accuracy, a realistic and application-critical metric for its technological feasibility and competitiveness, has been lacking. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations as well as 2-, 3-, and 5-input logic operations are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Further analysis of scalar addition, multiplication, and matrix multiplication shows promising results. These results are then applied to a complete application, a neural-network-based handwritten digit classifier, as an example to show the connection between application performance and further MTJ development. The classifier achieves almost-perfect classification accuracy under reasonable projections of future MTJ development. With the confirmation of the accuracy of MTJ-based CRAM, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.
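At the truth-table level, the 1-bit full adder demonstrated on the array computes the usual sum and majority-carry functions; the software reference below shows only this logic, not the two MTJ-level designs studied in the paper:

```python
# Software reference for a 1-bit full adder (truth-table level only;
# the MTJ-level CRAM implementations are not modeled here).
def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # carry bit = majority of the three inputs
    return s, cout

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, '->', full_adder(a, b, cin))
```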
In discussions about the development and governance of AI, a false binary is often drawn between two groups: those most concerned about the existing social impacts of AI, and those most concerned about possible future risks of powerful AI systems taking actions that do not align with human interests. In this piece, we (i) describe the emergence of this false binary, (ii) explain why the seemingly clean distinctions drawn between these two groups do not hold up under scrutiny, and (iii) highlight efforts to bridge this divide.
Human-in-the-loop aims to train an accurate prediction model at minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing work on human-in-the-loop from a data perspective and classify it into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of independent human-in-the-loop systems. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we provide a brief classification and discussion of their use in natural language processing, computer vision, and other domains. In addition, we discuss open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop machine learning and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
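A small numerical illustration of the implicit-regularization principle discussed above (a standard overparametrized linear least-squares example, not drawn from the survey): gradient descent initialized at zero fits the training data exactly and converges to the minimum-norm interpolating solution.

```python
# Gradient descent from zero on an overparametrized least-squares problem
# converges to the minimum-norm interpolant (standard illustrative example).
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # fewer samples than parameters (overparametrized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                      # gradient descent initialized at zero
lr = 0.01
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y) / n  # gradient of the mean squared error

w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm solution among all interpolants
print("training residual       :", np.linalg.norm(X @ w - y))
print("gap to min-norm solution:", np.linalg.norm(w - w_min_norm))
```

Both printed quantities are close to zero: the iterates never leave the row space of X, so among the infinitely many interpolating solutions, gradient descent selects the one of minimal norm.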