Partial Rejection Sampling is an algorithmic approach to obtaining a perfect sample from a specified distribution. The objects to be sampled are assumed to be represented by a number of random variables. In contrast to classical rejection sampling, in which all variables are resampled until a feasible solution is found, partial rejection sampling aims at greater efficiency by resampling only a subset of variables that `go wrong'. Partial rejection sampling is closely related to Moser and Tardos' algorithmic version of the Lov\'asz Local Lemma, but with the additional requirement that a specified output distribution should be met. This article provides a largely self-contained account of the basic form of the algorithm and its analysis.
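The contrast with classical rejection sampling can be made concrete with a minimal sketch. The setting below, sampling a sink-free orientation of a graph, is an illustrative assumption and is not taken from the abstract: each edge is a random variable (its direction), a vertex "goes wrong" when it is a sink, and only the edges touching a sink are resampled. Two adjacent vertices cannot both be sinks, so the bad events never overlap; this is the kind of condition under which the basic form of the algorithm is expected to produce an exactly uniform output. The names `find_sinks` and `partial_rejection_sample` are hypothetical.

```python
import random

def find_sinks(num_vertices, edges, heads):
    """Return the set of vertices with no outgoing edge (the 'bad events')."""
    has_outgoing = [False] * num_vertices
    for (u, v), head in zip(edges, heads):
        tail = u if head == v else v
        has_outgoing[tail] = True
    return {v for v in range(num_vertices) if not has_outgoing[v]}

def partial_rejection_sample(num_vertices, edges, rng):
    """Sample a sink-free orientation; heads[i] is the vertex edge i points to."""
    # Initial state: every variable (edge direction) is drawn independently.
    heads = [rng.choice(e) for e in edges]
    rounds = 0
    while True:
        bad = find_sinks(num_vertices, edges, heads)
        if not bad:
            return heads, rounds
        # Resample only the variables involved in a bad event;
        # classical rejection sampling would redraw every edge here.
        for i, (u, v) in enumerate(edges):
            if u in bad or v in bad:
                heads[i] = rng.choice((u, v))
        rounds += 1

# Illustrative input (an assumption): the complete graph on 4 vertices.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
heads, rounds = partial_rejection_sample(4, edges, random.Random(0))
print(heads, "after", rounds, "resampling rounds")
```

The sketch assumes every vertex has at least one incident edge and that a sink-free orientation exists; otherwise the loop would not terminate.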
BSML is a pure functional library for the multi-paradigm language OCaml. BSML embodies the principles of the Bulk Synchronous Parallel (BSP) model, a model of scalable parallel computing. We propose a formalization of BSML primitives with WhyML, the specification language of Why3, and we specify and prove the correctness of most of the BSML standard library. Finally, we develop and verify the correctness of a small BSML application.
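To give a feel for what the BSML primitives do, here is a rough sequential model in Python. It is not the OCaml library and not the WhyML formalization from the paper. A parallel vector is modeled as a plain list with one entry per processor, and the constant `BSP_P` and the helper `replicate` are hypothetical names introduced for illustration. The four functions follow the usual reading of `mkpar`, `apply`, `put`, and `proj`.

```python
BSP_P = 4  # assumed number of processors for this sketch

def mkpar(f):
    """Build a parallel vector holding f(i) on processor i."""
    return [f(i) for i in range(BSP_P)]

def apply(fs, xs):
    """Pointwise application of a vector of functions; no communication."""
    return [f(x) for f, x in zip(fs, xs)]

def put(msgs):
    """Communication step: msgs[i](j) is the value processor i sends to j.
    In the result, entry j is a function giving what j received from i."""
    table = [[msgs[i](j) for i in range(BSP_P)] for j in range(BSP_P)]
    return [(lambda row: (lambda i: row[i]))(table[j]) for j in range(BSP_P)]

def proj(xs):
    """Global view of a parallel vector as a function of processor id."""
    return lambda i: xs[i]

def replicate(x):
    """Hypothetical helper: the same value on every processor."""
    return mkpar(lambda _: x)

# Total exchange followed by a local sum: every processor ends with 1+2+3+4.
local = mkpar(lambda i: i + 1)
senders = apply(replicate(lambda x: (lambda dst: x)), local)
received = put(senders)
totals = apply(replicate(lambda recv: sum(recv(i) for i in range(BSP_P))), received)
print(totals)
```

Because the model is sequential, it says nothing about supersteps or cost; it only mirrors the input/output behaviour that a specification of the primitives would need to capture.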
We define a notion of grading of a monoid T in a monoidal category C, relative to a class of morphisms M (which provide a notion of M-subobject). We show that, under reasonable conditions (including that M forms a factorization system), there is a canonical grading of T. Our application is to graded monads and models of computational effects. We demonstrate our results by characterizing the canonical gradings of a number of monads, for which C is endofunctors with composition. We also show that we can obtain canonical grades for algebraic operations.
Liftings of endofunctors on sets to endofunctors on relations are commonly used to capture bisimulation of coalgebras. Lax versions have been used in those cases where strict lifting fails to capture bisimilarity, as well as in modeling other notions of simulation. This paper provides tools for defining and manipulating lax liftings. As a central result, we define a notion of a lax distributive law of a functor over the powerset monad, and show that there is an isomorphism between the lattice of lax liftings and the lattice of lax distributive laws. We also study two functors in detail: (i) we show that the lifting for monotone bisimilarity is the minimal lifting for the monotone neighbourhood functor, and (ii) we show that the lattice of liftings for the (ordinary) neighbourhood functor is isomorphic to P(4), the powerset of a 4-element set.
Epistemic modals have peculiar logical features that are challenging to account for in a broadly classical framework. For instance, while a sentence of the form $p\wedge\Diamond\neg p$ ('$p$, but it might be that not $p$') appears to be a contradiction, $\Diamond\neg p$ does not entail $\neg p$, which would follow in classical logic. Likewise, the classical laws of distributivity and disjunctive syllogism fail for epistemic modals. Existing attempts to account for these facts generally either under- or over-correct. Some predict that $p\wedge\Diamond\neg p$, a so-called epistemic contradiction, is a contradiction only in an etiolated sense, under a notion of entailment that does not always allow us to replace $p\wedge\Diamond\neg p$ with a contradiction; these theories underpredict the infelicity of embedded epistemic contradictions. Other theories savage classical logic, eliminating not just rules that intuitively fail but also rules like non-contradiction, excluded middle, De Morgan's laws, and disjunction introduction, which intuitively remain valid for epistemic modals. In this paper, we aim for a middle ground, developing a semantics and logic for epistemic modals that makes epistemic contradictions genuine contradictions and that invalidates distributivity and disjunctive syllogism but that otherwise preserves classical laws that intuitively remain valid. We start with an algebraic semantics, based on ortholattices instead of Boolean algebras, and then propose a more concrete possibility semantics, based on partial possibilities related by compatibility. Both semantics yield the same consequence relation, which we axiomatize. We then show how to lift an arbitrary possible worlds model for a non-modal language to a possibility model for a language with epistemic modals.
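The role of ortholattices can be illustrated with a small check. The sketch below uses a generic six-element ortholattice (bottom, top, and two pairs of mutually incomparable complements); this particular lattice and the element names are assumptions chosen for illustration, and the modal operator $\Diamond$ is not modeled. It shows that non-contradiction, excluded middle, and De Morgan's laws hold while distributivity fails.

```python
ELEMENTS = ["0", "a", "na", "b", "nb", "1"]
NEG = {"0": "1", "1": "0", "a": "na", "na": "a", "b": "nb", "nb": "b"}

def meet(x, y):
    """Greatest lower bound: distinct middle elements meet at bottom."""
    if x == y:
        return x
    if x == "1":
        return y
    if y == "1":
        return x
    return "0"

def join(x, y):
    """Least upper bound: distinct middle elements join at top."""
    if x == y:
        return x
    if x == "0":
        return y
    if y == "0":
        return x
    return "1"

# Laws that remain valid in this ortholattice.
non_contradiction = all(meet(x, NEG[x]) == "0" for x in ELEMENTS)
excluded_middle = all(join(x, NEG[x]) == "1" for x in ELEMENTS)
de_morgan = all(
    NEG[meet(x, y)] == join(NEG[x], NEG[y])
    for x in ELEMENTS for y in ELEMENTS
)

# Distributivity fails: a AND (b OR not-b) differs from (a AND b) OR (a AND not-b).
lhs = meet("a", join("b", "nb"))
rhs = join(meet("a", "b"), meet("a", "nb"))
print(non_contradiction, excluded_middle, de_morgan, lhs, rhs)
```

Here `lhs` evaluates to `a` and `rhs` to `0`, so the two sides of the distributive law come apart even though the other laws checked above survive.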
A robust and sparse Direction of Arrival (DOA) estimator is derived for array data that follows a Complex Elliptically Symmetric (CES) distribution with zero mean and finite second-order moments. The derivation allows the loss function to be chosen, and four loss functions are discussed in detail: the Gauss loss, which is the Maximum-Likelihood (ML) loss for the circularly symmetric complex Gaussian distribution; the ML loss for the complex multivariate $t$-distribution (MVT) with $\nu$ degrees of freedom; and the Huber and Tyler loss functions. For Gauss loss, the method reduces to Sparse Bayesian Learning (SBL). The root mean square DOA error of the derived estimators is discussed for Gaussian, MVT, and $\epsilon$-contaminated data. The robust SBL estimators perform well in all cases and nearly identically to classical SBL for Gaussian noise.
Answering temporal conjunctive queries (CQs) over temporalized Description Logic knowledge bases (TKBs) is a key technique for realizing ontology-based situation recognition. If the data collected in such a knowledge base is inaccurate, important query answers can be missed. In this paper we introduce the TKB Alignment problem, which computes a variant of the TKB that minimally changes the TKB but entails the given temporal CQ, and is in that sense (cost-)optimal. We investigate this problem for ALC TKBs and conjunctive queries with LTL operators, and devise a solution technique to compute (cost-optimal) alignments of TKBs that extends techniques for the alignment problem for propositional LTL over finite traces.
Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines, primarily assessed through tasks in language generation, knowledge utilization, and complex reasoning. However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated. Here, we assessed LLMs' Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding, which is necessary for effective communication and social interactions. Specifically, we first developed a novel psychometric assessment focusing on Emotion Understanding (EU), a core component of EI, suitable for both humans and LLMs. This test requires evaluating complex emotions (e.g., surprised, joyful, puzzled, proud) in realistic scenarios (e.g., despite feeling that he had underperformed, John surprisingly achieved a top score). With a reference frame constructed from over 500 adults, we tested a variety of mainstream LLMs. Most achieved above-average EQ scores, with GPT-4 exceeding 89% of human participants with an EQ of 117. Interestingly, a multivariate pattern analysis revealed that some LLMs apparently did not rely on a human-like mechanism to achieve human-level performance, as their representational patterns were qualitatively distinct from those of humans. In addition, we discussed the impact of factors such as model size, training method, and architecture on LLMs' EQ. In summary, our study presents one of the first psychometric evaluations of the human-like characteristics of LLMs, which may shed light on the future development of LLMs aiming for both high intellectual and emotional intelligence. Project website: //emotional-intelligence.github.io/
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLMs.
Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing. Inspired by this significant achievement, some pioneering works have recently adapted Transformer-like architectures to Computer Vision (CV) fields and have demonstrated their effectiveness on various CV tasks. Relying on competitive modeling capability, visual Transformers have achieved impressive performance on multiple benchmarks such as ImageNet, COCO, and ADE20k as compared with modern Convolutional Neural Networks (CNNs). In this paper, we have provided a comprehensive review of over one hundred different visual Transformers for three fundamental CV tasks (classification, detection, and segmentation), where a taxonomy is proposed to organize these methods according to their motivations, structures, and usage scenarios. Because of the differences in training settings and oriented tasks, we have also evaluated these methods on different configurations for easy and intuitive comparison, rather than only on various benchmarks. Furthermore, we have revealed a series of essential but unexploited aspects that may empower Transformer to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between visual and sequential Transformers. Finally, three promising future research directions are suggested for further investment.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
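The assumed link between ResNets and neural ODEs can be sketched numerically. The toy model below is an assumption made for illustration, not the architecture or training setup studied in the paper: a scalar residual network with update $h_{k+1} = h_k + \frac{1}{L}\tanh(w_k h_k)$, where the weights $w_k = w(k/L)$ are sampled from a fixed smooth function rather than trained. Under exactly this scaling the forward pass is an Euler discretization of $\dot h = \tanh(w(t)\,h)$, so the outputs should stabilize as the depth $L$ grows. The names `resnet_forward` and `smooth_weights` are hypothetical.

```python
import math

def resnet_forward(x0, depth, weight_fn):
    """Scalar residual network with 1/depth scaling.

    Layer k applies h <- h + (1/depth) * tanh(w_k * h), w_k = weight_fn(k/depth).
    """
    h = x0
    for k in range(depth):
        h += (1.0 / depth) * math.tanh(weight_fn(k / depth) * h)
    return h

def smooth_weights(t):
    """Assumed smooth weight profile over normalized depth t in [0, 1)."""
    return math.cos(2.0 * math.pi * t)

# Outputs for increasing depth; with smooth weights they approach a limit.
outputs = {L: resnet_forward(1.0, L, smooth_weights) for L in (10, 100, 1000, 10000)}
for L, value in outputs.items():
    print(L, value)
```

The sketch only illustrates the regime in which the ODE limit is expected to hold; it does not reproduce the other scaling regimes that the abstract reports for weights trained by stochastic gradient descent.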