Named Entity Recognition (NER) seeks to extract substrings within a text that name real-world objects and to determine their type (for example, whether they refer to persons or organizations). In this survey, we first present an overview of recent popular approaches, but we also examine graph- and transformer-based methods, including Large Language Models (LLMs), which have received little coverage in other surveys. Second, we focus on methods designed for datasets with scarce annotations. Third, we evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics (in terms of domain, size, and number of classes). We thus provide a deep comparison of algorithms that are never considered together. Our experiments shed some light on how the characteristics of datasets affect the behavior of the methods that we compare.
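To make the task concrete, here is a minimal sketch that tags entities with a pretrained transformer; the `transformers` library and the `dslim/bert-base-NER` checkpoint are illustrative choices, not methods evaluated in the survey.

```python
# Minimal NER illustration (assumes the Hugging Face `transformers` library and
# the public `dslim/bert-base-NER` checkpoint; any CoNLL-style tagger would do).
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

text = "Barack Obama visited Microsoft headquarters in Redmond."
for ent in ner(text):
    # Each detected entity carries a type (PER, ORG, LOC, ...) and a character span.
    print(ent["entity_group"], text[ent["start"]:ent["end"]], round(float(ent["score"]), 3))
```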
This paper investigates extremal quantiles under two-way cluster dependence. We demonstrate that unconditional intermediate-order quantiles in the tails are asymptotically Gaussian. This is remarkable because two-way cluster dependence entails potential non-Gaussian limiting behavior in general, yet extremal quantiles do not suffer from this issue. Building upon this result, we extend our analysis to extremal quantile regressions of intermediate order.
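The asymptotic claim can be probed with a small Monte Carlo; the data-generating process, tail index, and cluster sizes below are illustrative assumptions, not the paper's design.

```python
# Hypothetical Monte Carlo sketch: an intermediate-order sample quantile of
# two-way clustered data still behaves approximately Gaussian across replications.
import numpy as np

rng = np.random.default_rng(0)
G, H, reps = 30, 30, 2000                       # clusters per dimension, Monte Carlo replications
estimates = []
for _ in range(reps):
    a = rng.normal(size=(G, 1))                 # row-cluster effects
    b = rng.normal(size=(1, H))                 # column-cluster effects
    e = rng.standard_t(df=4, size=(G, H))       # heavy-tailed idiosyncratic noise
    y = (a + b + e).ravel()
    k = int(y.size ** 0.7)                      # intermediate order: k -> inf, k/n -> 0
    estimates.append(np.sort(y)[k - 1])         # k-th smallest order statistic

z = (np.array(estimates) - np.mean(estimates)) / np.std(estimates)
print("skewness:", round(float(np.mean(z**3)), 3),
      "excess kurtosis:", round(float(np.mean(z**4) - 3), 3))   # both near 0 if Gaussian
```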
We construct long quantum codes from projective Reed-Muller codes, which are evaluation codes obtained by evaluating homogeneous polynomials at the points of the projective space. We obtain asymmetric and symmetric quantum codes by using the CSS construction and the Hermitian construction, respectively. By considering equivalent codes, we provide entanglement-assisted quantum error-correcting codes from projective Reed-Muller codes with flexible amounts of entanglement. Moreover, we also construct quantum codes from subfield subcodes of projective Reed-Muller codes.
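For readers unfamiliar with the CSS construction, the following toy check uses the binary [7,4,3] Hamming code rather than a projective Reed-Muller code, purely to keep the arithmetic small: a dual-containing classical code C1 ⊇ C2 = C1⊥ yields an [[n, k1 - k2]] quantum code, here the [[7,1,3]] Steane code.

```python
# Toy verification of the CSS condition C2 ⊆ C1 with the [7,4,3] Hamming code
# (illustrative stand-in; the paper uses projective Reed-Muller codes instead).
import numpy as np

# Parity-check matrix of the Hamming code C1; its rows generate C2 = C1^perp.
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

# C2 ⊆ C1 holds iff every generator of C2 satisfies all parity checks of C1,
# i.e. H @ H.T vanishes modulo 2.
dual_containing = not np.any((H @ H.T) % 2)
n, k2 = H.shape[1], H.shape[0]
k1 = n - k2
print("dual-containing:", dual_containing)                    # True
print(f"CSS parameters: [[{n}, {k1 - k2}, d]] with d >= 3")   # the Steane code [[7, 1, 3]]
```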
Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments about groups such as African Americans biased in problematic ways. While prior research has focused on overt racism in language models, social scientists have argued that a more subtle form of racism has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models, such as human feedback training, do not mitigate the dialect prejudice but can exacerbate the discrepancy between covert and overt stereotypes by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.
In 1999, Xing, Niederreiter and Lam introduced a generalization of AG codes using the evaluation at non-rational places of a function field. In this paper, we show that one can obtain a locality parameter $r$ in such codes by using only non-rational places of degree at most $r$. To the best of the author's knowledge, this is a new way to construct locally recoverable codes (LRCs). We give an example of such a code reaching the Singleton-like bound for LRCs, and report the parameters obtained for some longer codes over $\mathbb F_3$. We then investigate similarities with certain concatenated codes. Contrary to previous methods, our construction allows one to directly obtain codes whose dimension is not a multiple of the locality. Finally, we give an asymptotic study using the Garcia-Stichtenoth tower of function fields, for both our construction and a construction of concatenated codes. Following our new approach, we give explicit infinite families of LRCs with locality 2 over any finite field of cardinality greater than 3.
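For reference, the Singleton-like bound for locally recoverable codes mentioned above is the standard one of Gopalan, Huang, Simitci and Yekhanin; it is stated here as background, not as a result of the paper.

```latex
% Singleton-like bound for an [n, k, d] code with locality r:
d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2
```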
This study develops a model-based index creation approach called the Generalized Shared Component Model (GSCM), drawing on the large literature on factor models. The proposed fully Bayesian approach accommodates heteroscedastic model error, multiple shared factors, and flexible spatial priors. Moreover, unlike previous index approaches, our model provides indices together with their uncertainty. Focusing on Australian risk factor data, the proposed GSCM is used to develop the Area Indices of Behaviors Impacting Cancer product, the first area-level cancer risk factor index in Australia. This advancement aids in identifying communities with elevated cancer risk, facilitating targeted health interventions.
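As rough orientation only, a shared-component factor layout of the kind the GSCM generalizes links risk factor $k$ in area $i$ to a small number of shared latent factors; the formula below is a generic sketch, not the paper's specification, and its priors and identification constraints are assumptions.

```latex
% Generic shared-factor layout (illustrative; not the GSCM specification):
y_{ik} = \mu_k + \sum_{j=1}^{J} \lambda_{kj}\,\theta_{ij} + \varepsilon_{ik},
\qquad \varepsilon_{ik} \sim \mathcal{N}(0, \sigma_k^2)
```

with spatially structured priors (for example CAR priors) on the shared factors $\theta_{\cdot j}$ and an area-level index built from those factors.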
The deconfounder was proposed as a method for estimating causal parameters in a context with multiple causes and unobserved confounding. It is based on recovery of a latent variable from the observed causes. We disentangle the causal interpretation from the statistical estimation problem and show that the deconfounder in general estimates adjusted regression target parameters. It does so via outcome regression adjusted for the recovered latent variable, termed the substitute. We refer to the general algorithm, stripped of causal assumptions, as substitute adjustment. We give theoretical results to support that substitute adjustment estimates adjusted regression parameters when the regressors are conditionally independent given the latent variable. We also introduce a variant of our substitute adjustment algorithm that estimates an assumption-lean target parameter with minimal model assumptions. We then give finite sample bounds and asymptotic results supporting substitute adjustment estimation in the case where the latent variable takes values in a finite set. A simulation study illustrates finite sample properties of substitute adjustment. Our results support that, when the latent variable model of the regressors holds, substitute adjustment is a viable method for adjusted regression.
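The two-step idea can be sketched in a few lines; the factor-analysis substitute, the data-generating process, and the OLS adjustment below are illustrative assumptions rather than the paper's estimator.

```python
# Hypothetical sketch of substitute adjustment: (1) recover a substitute for the
# latent variable from the regressors, (2) run an outcome regression adjusted for it.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 2000, 50
U = rng.normal(size=(n, 1))                                   # unobserved common factor
X = U @ np.ones((1, p)) + rng.normal(size=(n, p))             # regressors, cond. independent given U
Y = 0.5 * X[:, 0] + 2.0 * U[:, 0] + rng.normal(size=n)        # outcome; target coefficient is 0.5

Z_hat = FactorAnalysis(n_components=1, random_state=0).fit_transform(X)   # the substitute

naive = LinearRegression().fit(X[:, [0]], Y).coef_[0]                      # confounded by U
adjusted = LinearRegression().fit(np.column_stack([X[:, [0]], Z_hat]), Y).coef_[0]
print("naive:", round(naive, 2), "substitute-adjusted:", round(adjusted, 2))  # adjusted is close to 0.5
```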
Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity using an individual's geolocation and surname. Here we demonstrate that statistical dependence of surname and geolocation within racial/ethnic categories in the United States results in biases for minority subpopulations, and we introduce a raking-based improvement. Our method augments the data used by BISG--distributions of race by geolocation and race by surname--with the distribution of surname by geolocation obtained from state voter files. We validate our algorithm on state voter registration lists that contain self-identified race/ethnicity.
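For background, the standard BISG update combines the two distributions under the assumption that surname and geolocation are conditionally independent given race, which is exactly the assumption the raking step relaxes; the probabilities in this sketch are made up for illustration.

```python
# Minimal sketch of the standard BISG posterior (illustrative probabilities only).
import numpy as np

races = ["white", "black", "hispanic", "asian", "other"]
p_race_given_surname = np.array([0.70, 0.10, 0.10, 0.05, 0.05])       # e.g. from the Census surname list
p_geo_given_race     = np.array([0.002, 0.010, 0.004, 0.001, 0.003])  # share of each group living in the tract

# Bayes' rule under conditional independence:
# P(race | surname, geo) ∝ P(race | surname) * P(geo | race)
posterior = p_race_given_surname * p_geo_given_race
posterior /= posterior.sum()
for race, p in zip(races, posterior):
    print(f"{race:8s} {p:.3f}")
```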
By abstracting over well-known properties of De Bruijn's representation with nameless dummies, we design a new theory of syntax with variable binding and capture-avoiding substitution. We propose it as a simpler alternative to Fiore, Plotkin, and Turi's approach, with which we establish a strong formal link. We also show that our theory easily incorporates simple types and equations between terms.
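As a reminder of what nameless dummies buy you, here is a small sketch (Python 3.10+) of De Bruijn-indexed terms with the standard shift and substitution operations, under which substitution is capture-avoiding by construction; this is textbook material, not the paper's formal development.

```python
# De Bruijn-indexed lambda terms with the standard shift/substitution operations.
from dataclasses import dataclass

@dataclass
class Var:
    idx: int                  # De Bruijn index

@dataclass
class Lam:
    body: "Term"              # the binder; index 0 refers to it inside body

@dataclass
class App:
    fun: "Term"
    arg: "Term"

Term = Var | Lam | App

def shift(t: Term, d: int, cutoff: int = 0) -> Term:
    """Add d to every free index (those >= cutoff)."""
    match t:
        case Var(k):    return Var(k + d) if k >= cutoff else Var(k)
        case Lam(b):    return Lam(shift(b, d, cutoff + 1))
        case App(f, a): return App(shift(f, d, cutoff), shift(a, d, cutoff))

def subst(t: Term, j: int, s: Term) -> Term:
    """Replace index j in t by s, shifting s when going under a binder."""
    match t:
        case Var(k):    return s if k == j else Var(k)
        case Lam(b):    return Lam(subst(b, j + 1, shift(s, 1)))
        case App(f, a): return App(subst(f, j, s), subst(a, j, s))

# Substituting a free variable under a binder shifts it, so no capture occurs:
print(subst(Lam(Var(1)), 0, Var(7)))   # Lam(body=Var(idx=8))
```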
We show that any Lotka--Volterra tree-system associated with an $n$-vertex tree, as introduced in Quispel et al., J. Phys. A 56 (2023) 315201, preserves a rational measure. We also prove that the Kahan discretisation of these tree-systems factorises and preserves the same measure. As a consequence, for the Kahan maps of Lotka--Volterra systems related to the subclass of tree-systems corresponding to graphs with more than one $n$-vertex subtree, we are able to construct rational integrals.
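As an illustration of the Kahan discretisation used above (applied here to the plain two-dimensional predator-prey Lotka--Volterra system, not to one of the paper's tree-systems), recall that Kahan's rule replaces each quadratic term $x_i x_j$ by the polarised average $(x_i x_j' + x_i' x_j)/2$ and each linear term by its midpoint, so every step amounts to solving a linear system and the resulting map is birational.

```python
# Hypothetical sketch of the Kahan map for the 2D Lotka-Volterra system
#   dx/dt = x (a - b y),   dy/dt = y (-c + d x)
# (parameters and step size are arbitrary illustrative choices).
import numpy as np

a, b, c, d, h = 1.0, 1.0, 1.0, 1.0, 0.05

def kahan_step(x, y):
    # Linear system M @ (x_new, y_new) = rhs obtained from Kahan's rule:
    #   (x_new - x)/h =  a*(x + x_new)/2 - b*(x*y_new + x_new*y)/2
    #   (y_new - y)/h = -c*(y + y_new)/2 + d*(x*y_new + x_new*y)/2
    M = np.array([[1 - h*a/2 + h*b*y/2,  h*b*x/2],
                  [-h*d*y/2,             1 + h*c/2 - h*d*x/2]])
    rhs = np.array([x * (1 + h*a/2), y * (1 - h*c/2)])
    return np.linalg.solve(M, rhs)

x, y = 1.5, 0.7
for _ in range(1000):
    x, y = kahan_step(x, y)
print(x, y)   # the iterates stay on a closed invariant curve around (c/d, a/b)
```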
Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I identify three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as a preliminary attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images to be projected close to each other. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Several experiments on the CIFAR-10 dataset show the satisfactory performance of SRH.
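To ground the survey part, the following is a generic pairwise deep-supervised-hashing objective of the kind those methods build on; it is not the proposed SRH method, and the head architecture, bit length, and margin are illustrative assumptions.

```python
# Generic pairwise deep-supervised-hashing head and loss (not SRH): a CNN
# feature is mapped to a relaxed binary code with tanh, and a contrastive-style
# loss pulls codes of same-class images together and pushes others apart.
import torch
import torch.nn as nn

class HashHead(nn.Module):
    def __init__(self, feature_dim=512, bits=48):
        super().__init__()
        self.fc = nn.Linear(feature_dim, bits)

    def forward(self, features):
        return torch.tanh(self.fc(features))                 # relaxed codes in (-1, 1)

def pairwise_hash_loss(codes, labels, margin=2.0):
    sim = (labels[:, None] == labels[None, :]).float()        # 1 if same class, else 0
    dist = torch.cdist(codes, codes)                          # Euclidean distances between codes
    pos = sim * dist.pow(2)                                   # pull similar pairs together
    neg = (1 - sim) * torch.clamp(margin - dist, min=0).pow(2)  # push dissimilar pairs apart
    return (pos + neg).mean()

features = torch.randn(8, 512)            # stand-in for CNN backbone features
labels = torch.randint(0, 10, (8,))       # e.g. CIFAR-10 class labels
loss = pairwise_hash_loss(HashHead()(features), labels)
print(loss.item())
```

At retrieval time such relaxed codes are binarized with the sign function before computing Hamming distances.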