The neural architectures of language models are becoming increasingly complex, especially those of Transformers, which are based on the attention mechanism. Although their application to numerous natural language processing tasks has proven very fruitful, they remain models with little or no interpretability and explainability. One of the tasks for which they are best suited is encoding the contextual sense of words using contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and semantic notions such as selectional preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures on a specific semantic task, namely computing the similarity of word senses in context. The results show that linguistically motivated models can be competitive with the black boxes underlying complex neural architectures.
Recently, simplicial complexes have been used by Hyun {\em et al.} to construct several infinite families of minimal and optimal linear codes. Building upon their research, in this paper we construct more linear codes over the ring $\mathbb{Z}_4$ from simplicial complexes. Specifically, the Lee weight distributions of the resulting quaternary codes are determined and two infinite families of four-Lee-weight quaternary codes are obtained. Compared with the databases of $\mathbb{Z}_4$ codes by Aydin {\em et al.}, at least nine new quaternary codes are found. Thanks to the special structure of the defining sets, we can determine whether the Gray images of certain of the obtained quaternary codes are linear. This allows us to obtain two infinite families of binary nonlinear codes and one infinite family of binary minimal linear codes. Furthermore, as a byproduct, these minimal binary codes are used to establish some secret sharing schemes.
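To make the notions of Lee weight and Gray image concrete, the following minimal sketch uses the standard Gray map on $\mathbb{Z}_4$ (0 to 00, 1 to 01, 2 to 11, 3 to 10) and checks by brute force whether the Gray image of a small quaternary linear code is closed under bitwise XOR. The two toy generator matrices are hypothetical illustrations and are not the codes constructed from simplicial complexes in the paper.

```python
from itertools import product

# Standard Gray map on Z_4 and the Lee weight of each symbol.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
LEE = {0: 0, 1: 1, 2: 2, 3: 1}


def lee_weight(word):
    """Lee weight of a word over Z_4."""
    return sum(LEE[x % 4] for x in word)


def gray_image(word):
    """Binary Gray image (twice the length) of a word over Z_4."""
    return tuple(bit for x in word for bit in GRAY[x % 4])


def z4_span(generators):
    """All Z_4-linear combinations of the generator rows (brute force)."""
    n = len(generators[0])
    code = set()
    for coeffs in product(range(4), repeat=len(generators)):
        code.add(tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 4
                       for i in range(n)))
    return code


def gray_image_is_linear(generators):
    """True if the Gray image of the spanned code is closed under bitwise XOR."""
    image = {gray_image(w) for w in z4_span(generators)}
    return all(tuple(a ^ b for a, b in zip(u, v)) in image
               for u in image for v in image)


# Hypothetical toy codes, chosen only for illustration.
toy_code_a = [(1, 1)]
toy_code_b = [(1, 0, 1), (0, 1, 1)]
print(gray_image_is_linear(toy_code_a), gray_image_is_linear(toy_code_b))
```

The brute-force check is only practical for very small codes; it is meant to illustrate the question of Gray-image linearity, not the structural argument based on defining sets.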
Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
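The residual-on-residual idea behind DML can be sketched as follows. This is a deliberately simplified illustration and not the estimator studied in the paper: it assumes a constant (partially linear) effect `theta_true`, a one-dimensional location `s`, an unmeasured confounder that is a smooth function of `s`, and polynomial regression as a stand-in for a flexible nuisance learner. All names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true = 2000, 1.5

# Hypothetical 1-D location s; the confounder is a smooth function of s.
s = rng.uniform(-1.0, 1.0, size=n)
confounder = np.sin(np.pi * s)
exposure = confounder + rng.normal(size=n)
outcome = theta_true * exposure + 2.0 * confounder + rng.normal(size=n)


def fit_predict(s_train, target_train, s_test, degree=7):
    """Nuisance learner: polynomial regression on location (stand-in for any flexible model)."""
    coefs = np.polyfit(s_train, target_train, deg=degree)
    return np.polyval(coefs, s_test)


def dml_partially_linear(s, exposure, outcome, n_folds=2):
    """Cross-fitted residual-on-residual estimate of a constant effect."""
    folds = np.arange(len(s)) % n_folds
    res_a = np.empty(len(s))
    res_y = np.empty(len(s))
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Residualize exposure and outcome using models fitted on the other fold.
        res_a[test] = exposure[test] - fit_predict(s[train], exposure[train], s[test])
        res_y[test] = outcome[test] - fit_predict(s[train], outcome[train], s[test])
    return np.sum(res_a * res_y) / np.sum(res_a ** 2)


theta_hat = dml_partially_linear(s, exposure, outcome)
naive = np.sum(exposure * outcome) / np.sum(exposure ** 2)  # ignores the confounder
print(f"DML estimate: {theta_hat:.3f}, naive estimate: {naive:.3f}")
```

In this simulation the naive regression absorbs the confounder into the effect estimate, while the cross-fitted estimate removes the spatially structured variation from both variables first.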
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a parallelizable algorithm that estimates probability distributions using a functional-analytic approach: first, it finds a smooth functional estimate of the probability distribution, whether it is normalized or not; second, it provides an estimate of the normalizing weight; and third, it proposes a new computation scheme to compute such estimates. A core component of OPAA is a special transform of the square root of the joint distribution into a special functional space of our construction. Through this transform, the evidence is equated with the squared $L^2$ norm of the transformed function. Hence, the evidence can be estimated by the sum of squares of the transform coefficients. Computations can be parallelized and completed in one pass. OPAA can be applied broadly to the estimation of probability density functions. In Bayesian problems, it can be applied to estimating the normalizing weight of the posterior, which is also known as the evidence, serving as an alternative to existing optimization-based methods.
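The link between the evidence and a sum of squared coefficients can be illustrated with Parseval's identity in one dimension. The sketch below is not OPAA itself: it assumes an orthonormal Hermite-function basis on the real line and a simple grid quadrature, both chosen here for illustration, and expands the square root of an unnormalized density in that basis. Each coefficient is an independent inner product, so the coefficients could be computed in parallel.

```python
import numpy as np


def hermite_functions(x, n_terms):
    """Orthonormal Hermite functions psi_0..psi_{n_terms-1} on a grid, via a stable recurrence."""
    psi = np.empty((n_terms, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
    if n_terms > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for n in range(1, n_terms - 1):
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi


def evidence_from_coefficients(unnormalized_density, n_terms=40,
                               half_width=15.0, n_grid=6001):
    """Estimate the normalizing constant as the sum of squared basis coefficients."""
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    root = np.sqrt(unnormalized_density(x))
    # c_n = <sqrt(p), psi_n>, approximated by a Riemann sum on the grid.
    coeffs = hermite_functions(x, n_terms) @ root * dx
    return np.sum(coeffs ** 2)


# Hypothetical test density: an unnormalized Gaussian with known integral 3 * sqrt(2 * pi).
density = lambda x: 3.0 * np.exp(-0.5 * (x - 0.5) ** 2)
estimate = evidence_from_coefficients(density)
print(estimate, 3.0 * np.sqrt(2.0 * np.pi))
```

The truncation level `n_terms` and the grid are tuning choices of this sketch; a density poorly matched to the basis would need more terms.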
Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
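To fix ideas about the roles of the width $p$ and the number of layer transitions $L$, here is a minimal sketch of a forward pass. It assumes a dynamics of the form $\dot{x} = W(t)\,\sigma(A(t)x + b(t))$ with ReLU activation and parameters that are piecewise constant on $L+1$ time intervals; this specific form, the explicit Euler integrator, and all function names are assumptions of the example rather than details taken from the abstract.

```python
import numpy as np


def neural_ode_flow(x0, params, total_time=1.0, steps_per_piece=50):
    """Explicit-Euler flow of x' = W(t) relu(A(t) x + b(t)) with piecewise-constant (W, A, b)."""
    x = np.array(x0, dtype=float)              # shape (N, d): N data points in R^d
    dt = total_time / (len(params) * steps_per_piece)
    for W, A, b in params:                     # L + 1 constant pieces, i.e. L transitions
        for _ in range(steps_per_piece):
            hidden = np.maximum(x @ A.T + b, 0.0)   # shape (N, p)
            x = x + dt * hidden @ W.T               # back to shape (N, d)
    return x


def random_params(d, p, L, rng):
    """Random parameters: W is (d, p), A is (p, d), b is (p,), one triple per piece."""
    return [(rng.normal(size=(d, p)), rng.normal(size=(p, d)), rng.normal(size=p))
            for _ in range(L + 1)]


rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
params = random_params(d=2, p=3, L=4, rng=rng)
print(neural_ode_flow(points, params).shape)
```

With $L=0$ the list contains a single parameter triple, which corresponds to the autonomous case mentioned above.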
Deterministic communication is required by applications in several industry verticals, including manufacturing, automotive, finance, and health care. These applications rely on reliable and time-synchronized delivery of information among the communicating devices. Therefore, large delay variations in packet delivery or inaccuracies in time synchronization cannot be tolerated. In particular, the industrial revolution in digitization, the connectivity of digital and physical systems, and flexible production design require deterministic and time-synchronized communication. A network supporting deterministic communication guarantees data delivery within a specified time with high reliability. The IEEE 802.1 TSN task group is developing standards to provide deterministic communication through IEEE 802 networks. The IEEE 802.1AS standard defines a time synchronization mechanism for accurate distribution of time among the communicating devices. The time synchronization accuracy depends on the accurate calculation of the residence time, which is the time between the ingress and the egress ports of the bridge and includes the processing, queuing, transmission, and link latency of the timing information. This paper discusses time synchronization mechanisms supported in current wired and wireless integrated systems.
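The role of residence time in time distribution can be illustrated with a small bookkeeping sketch. It is a simplification and not an implementation of IEEE 802.1AS: it assumes all local clocks tick at the same rate (no rate-ratio correction), tracks link delay separately from the ingress-to-egress interval as a modelling choice, and uses hypothetical names and timestamps throughout.

```python
from dataclasses import dataclass


@dataclass
class BridgeHop:
    upstream_link_delay_ns: int   # propagation delay on the link towards the time source
    ingress_timestamp_ns: int     # local time when the timing message entered the bridge
    egress_timestamp_ns: int      # local time when the timing message left the bridge

    @property
    def residence_time_ns(self) -> int:
        """Time the timing message spent inside the bridge."""
        return self.egress_timestamp_ns - self.ingress_timestamp_ns


def accumulated_correction_ns(hops):
    """Total delay added along the path: link delays plus residence times."""
    return sum(h.upstream_link_delay_ns + h.residence_time_ns for h in hops)


def time_at_end_station_ns(origin_timestamp_ns, hops, last_link_delay_ns):
    """Source time at the moment the timing message reaches the end station."""
    return origin_timestamp_ns + accumulated_correction_ns(hops) + last_link_delay_ns


# Hypothetical two-bridge path.
path = [BridgeHop(100, 1_000, 1_600), BridgeHop(200, 5_000, 5_450)]
print(time_at_end_station_ns(1_000_000, path, last_link_delay_ns=50))
```

Any error in an ingress or egress timestamp feeds directly into the accumulated correction, which is why the accuracy of the residence time calculation bounds the achievable synchronization accuracy.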
Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the collectivity and directionality shared by text sequence generation and evolution, this paper illustrates the strong consistency between LLMs and EAs, which includes multiple one-to-one correspondences between key characteristics: token embedding and genotype-phenotype mapping, position encoding and fitness shaping, position embedding and selection, attention and crossover, feed-forward neural network and mutation, model training and parameter update, and multi-task learning and multi-objective optimization. Based on this consistency perspective, existing coupling studies are analyzed, including evolutionary fine-tuning and LLM-enhanced EAs. Leveraging these insights, we outline a fundamental roadmap for future research in coupling LLMs and EAs, while highlighting key challenges along the way. The consistency not only reveals the evolution mechanism behind LLMs but also facilitates the development of evolved artificial agents that approach or surpass biological organisms.
One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster utterance consistency of chat systems. A dataset is first developed for studying the inconsistencies, where inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances are authored by annotators. This covers the life span of inconsistencies, namely introduction, understanding, and resolution. Building on this, we introduce a set of tasks centered on dialogue consistency, specifically focused on its detection and resolution. Our experimental findings indicate that our dataset significantly helps progress in identifying and resolving conversational inconsistencies, and that current popular large language models like ChatGPT, although good at resolving inconsistencies, still struggle with detection.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
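Shepard's law states that generalization decays exponentially with distance in a psychological similarity space. A minimal sketch of how such a comparison could drive a prediction is given below. The Euclidean distance on raw saliency vectors, the `sensitivity` parameter, and the rule that combines the explainee's prior with the similarity score are assumptions made for illustration, not the fitted model from the paper.

```python
import numpy as np


def shepard_similarity(explanation_a, explanation_b, sensitivity=1.0):
    """Shepard's law: similarity decays exponentially with distance."""
    diff = np.asarray(explanation_a, dtype=float) - np.asarray(explanation_b, dtype=float)
    return np.exp(-sensitivity * np.linalg.norm(diff))


def predict_ai_decision(prior, own_explanations, observed_explanation, sensitivity=1.0):
    """Belief over AI decisions: own prior, reweighted by similarity to one's own explanations."""
    similarities = np.array([
        shepard_similarity(observed_explanation, own, sensitivity)
        for own in own_explanations
    ])
    unnormalized = np.asarray(prior, dtype=float) * similarities
    return unnormalized / unnormalized.sum()


# Hypothetical two-class example with flattened two-pixel saliency maps.
prior = [0.5, 0.5]                      # explainee's own belief about each class
own_maps = [[1.0, 0.0], [0.0, 1.0]]     # maps the explainee would give for each class
posterior = predict_ai_decision(prior, own_maps, observed_explanation=[1.0, 0.0])
print(posterior)
```

In this toy case the observed map coincides with the map the explainee would give for the first class, so belief shifts towards the AI having chosen that class.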
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
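The simplest instance of implicit regularization is overparametrized linear regression: gradient descent on the squared loss, started at zero, converges to the minimum-norm interpolating solution. The sketch below checks this numerically on random data; the problem sizes, step size, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 100              # overparametrized: more features than samples
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Gradient descent on 0.5 * ||X w - y||^2, started at zero.
step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1 / (largest singular value)^2
w = np.zeros(n_features)
for _ in range(2000):
    w -= step * X.T @ (X @ w - y)

w_min_norm = np.linalg.pinv(X) @ y           # minimum l2-norm interpolant
print("training residual:", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
```

Because every gradient lies in the row space of `X`, the iterates never leave that subspace, and the only interpolating solution inside it is the minimum-norm one.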
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
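One simple form that a learned function of the internal states can take is a softmax-weighted sum of the layer states with a trainable scale, so that a downstream model can decide how much to draw on each layer. The sketch below assumes this form; the function name, argument names, and the toy arrays are hypothetical, and the layer states would in practice come from the pre-trained biLM.

```python
import numpy as np


def scalar_mix(layer_states, layer_logits, gamma=1.0):
    """Collapse layer states of shape (n_layers, n_tokens, dim) into one vector per token."""
    layer_states = np.asarray(layer_states, dtype=float)
    logits = np.asarray(layer_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax-normalized layer weights
    # Weighted sum over the layer axis, then a global scale.
    return gamma * np.tensordot(weights, layer_states, axes=1)


# Toy stand-in for biLM states: 3 layers, 2 tokens, 3 dimensions.
states = np.stack([np.zeros((2, 3)), np.ones((2, 3)), 2.0 * np.ones((2, 3))])
mixed = scalar_mix(states, layer_logits=[0.0, 0.0, 0.0])
print(mixed.shape)
```

In a downstream task, `layer_logits` and `gamma` would be trained together with the task model while the biLM states are supplied as inputs.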