Online hate speech proliferation has created a difficult problem for social media platforms. A particular challenge is the use of coded language by groups seeking both to create a sense of belonging for their members and to evade detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones, so as to focus on emergent, well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned large language model, and subsequent filtering out of expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms that clearly relate to Jewish topics are then removed, so that only coded expressions of hatred are returned.
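As a rough illustration of the filtering stages described above (candidate frequency, novelty with respect to known terms, semantic proximity to known hate terminology, and removal of overtly Jewish-topic expressions), the sketch below uses a generic sentence-embedding model as a stand-in for the fine-tuned LLM; the term lists, threshold, and model name are placeholders, and the grammaticality filter is omitted.

```python
# Hypothetical sketch of the candidate-filtering pipeline; KNOWN_TERMS,
# JEWISH_TOPIC_WORDS, the threshold and the embedding model are placeholders,
# not the paper's actual resources.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

KNOWN_TERMS = ["known coded term 1", "known coded term 2"]   # previously catalogued expressions
JEWISH_TOPIC_WORDS = {"jew", "jewish", "judaism"}            # overt-topic filter
SIM_THRESHOLD = 0.55                                         # assumed similarity cut-off

model = SentenceTransformer("all-MiniLM-L6-v2")              # stand-in for the fine-tuned model

def emerging_coded_terms(candidate_phrases, min_freq=5):
    """Return frequent, novel phrases semantically close to known hate terms
    but without overt topic words, i.e. candidate coded expressions."""
    freq = Counter(candidate_phrases)
    novel = [p for p, c in freq.items() if c >= min_freq and p not in KNOWN_TERMS]
    if not novel:
        return []
    sims = util.cos_sim(model.encode(novel), model.encode(KNOWN_TERMS))
    out = []
    for phrase, row in zip(novel, sims):
        close_to_hate = float(row.max()) >= SIM_THRESHOLD
        overt = any(w in phrase.lower().split() for w in JEWISH_TOPIC_WORDS)
        if close_to_hate and not overt:
            out.append(phrase)
    return out
```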
Smartphones are integral to modern life, yet research highlights the cognitive drawbacks associated with their mere presence. Physically removing them from sight is one solution, but it is sometimes impractical and may increase anxiety due to fear of missing out. In response, we introduce a simple but effective use of augmented reality (AR) head-mounted displays, focusing not on augmenting reality with virtual objects, but on diminishing reality by selectively removing or occluding distracting objects from the user's field of view. We compared cognitive task performance across four conditions: the smartphone physically nearby, physically remote, visually removed, and visually occluded via AR. Our findings reveal that using AR to visually cancel out smartphones significantly mitigates the cognitive distraction caused by their presence. Specifically, the AR interventions had effects similar to physically removing the phone. These results suggest potential for novel AR applications designed to diminish reality, thereby enhancing cognitive performance.
Necessary and sufficient conditions for uniform consistency are explored. The hypothesis is simple. The nonparametric sets of alternatives are bounded convex sets in $\mathbb{L}_p$, $p > 1$, with "small" balls removed. These balls are centered at the hypothesis, and their radii tend to zero as the sample size increases. For the problem of hypothesis testing on a density, we show that uniformly consistent tests exist for such sets of alternatives, for some sequence of ball radii, if and only if the convex set is relatively compact. The results are established for the problems of hypothesis testing on a density, signal detection in Gaussian white noise, and linear ill-posed problems with random Gaussian noise, among others.
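In symbols, the central equivalence can be sketched as follows; the notation $U$, $f_0$, $\rho_n$ is chosen here for illustration and need not match the paper's.

```latex
% Schematic statement of the main equivalence, in illustrative notation.
Let $U \subset \mathbb{L}_p$, $p > 1$, be a bounded convex set and $f_0 \in U$
the density specified by the simple hypothesis. For radii $\rho_n \downarrow 0$
define the alternatives
\[
  \Psi_n \;=\; \bigl\{\, f \in U \;:\; \|f - f_0\|_p \ge \rho_n \,\bigr\}.
\]
Then there exists a sequence $\rho_n \to 0$ admitting uniformly consistent tests
of $H_0\colon f = f_0$ against $\Psi_n$ if and only if $U$ is relatively compact
in $\mathbb{L}_p$.
```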
This work explores the dimension reduction problem for Bayesian nonparametric regression and density estimation. More precisely, we are interested in estimating a functional parameter $f$ over the unit ball in $\mathbb{R}^d$ that depends only on a $d_0$-dimensional subspace of $\mathbb{R}^d$, with $d_0 < d$. It is well known that rescaled Gaussian process priors over the function space achieve smoothness adaptation and posterior contraction at near minimax-optimal rates. Moreover, hierarchical extensions of this approach, equipped with subspace projection, can also adapt to the intrinsic dimension $d_0$ (\cite{Tokdar2011DimensionAdapt}). When the ambient dimension $d$ does not vary with $n$, the minimax rate remains of the order $n^{-\beta/(2\beta + d_0)}$. However, this holds only up to multiplicative constants that can become prohibitively large when $d$ grows. The dependence of the contraction rate on the ambient dimension has not been fully explored yet, and this work provides a first insight: we let the dimension $d$ grow with $n$ and, by combining the arguments of \cite{Tokdar2011DimensionAdapt} and \cite{Jiang2021VariableSelection}, we derive a growth rate for $d$ that still leads to posterior consistency at the minimax rate. The optimality of this growth rate is then discussed. Additionally, we provide a set of assumptions under which consistent estimation of $f$ leads to a correct estimation of the subspace projection, assuming that $d_0$ is known.
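For concreteness, the dimension-reduction structure and the rate quoted above can be written as follows; the symbols $g$, $P$ and the Hölder smoothness $\beta$ are generic choices, not necessarily the paper's exact notation.

```latex
% Illustrative formalization of the subspace structure and the fixed-d rate.
Assume the functional parameter factors through a $d_0$-dimensional subspace,
\[
  f(x) \;=\; g\bigl(P x\bigr), \qquad x \in B_1(\mathbb{R}^d),
\]
where $P$ is the orthogonal projection onto the subspace and $g$ is
$\beta$-H\"older smooth. When $d$ is fixed, the minimax estimation rate is
\[
  \varepsilon_n \;\asymp\; n^{-\beta/(2\beta + d_0)},
\]
i.e. it is governed by the intrinsic dimension $d_0$ rather than the ambient
dimension $d$, up to constants that may grow with $d$.
```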
Direct reciprocity based on the repeated prisoner's dilemma has been intensively studied. Most theoretical investigations have concentrated on memory-$1$ strategies, a class of elementary strategies that react only to the previous-round outcome. Although the properties of "All-or-None" strategies ($AoN_K$) have been discovered, simulations have only confirmed the good performance of $AoN_K$ with very short memory lengths. It remains unclear how $AoN_K$ strategies fare when players have access to longer histories. We construct a theoretical model to investigate the performance of the class of $AoN_K$ strategies of varying memory length $K$. We rigorously derive the payoffs and show that $AoN_K$ strategies of intermediate memory length $K$ are most prevalent, while strategies with larger memory lengths are less competent. Larger memory lengths make it hard for $AoN_K$ strategies to coordinate, thus inhibiting their mutual reciprocity. We then propose the adaptive coordination strategy, which combines tolerance with the coordination rule of $AoN_K$. This strategy behaves like an $AoN_K$ strategy when coordination is not sufficient, and tolerates opponents' occasional deviations by continuing to cooperate when coordination is sufficient. We find that the adaptive coordination strategy wins over other classic memory-$1$ strategies in various typical competition environments and stabilizes the population at high levels of cooperation, suggesting the effectiveness of high-level adaptability in resolving social dilemmas. Our work may offer a theoretical framework for exploring complex strategies that use history information in ways different from traditional memory-$n$ strategies.
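A minimal sketch of the $AoN_K$ decision rule as described here: cooperate only when the two players' actions coincided in each of the last $K$ rounds (for $K = 1$ this reduces to Win-Stay-Lose-Shift). The payoff values and the opening move are illustrative assumptions.

```python
# Toy iterated prisoner's dilemma with an AoN_K rule; payoffs and the opening
# move are assumptions, not the paper's parameters.
C, D = "C", "D"
PAYOFF = {"CC": (3, 3), "CD": (0, 5), "DC": (5, 0), "DD": (1, 1)}  # standard PD payoffs (assumed)

def aon_move(history, K):
    """history: list of (my_action, opponent_action) pairs, most recent last."""
    if len(history) < K:
        return C                                        # assumed opening behaviour
    coordinated = all(mine == theirs for mine, theirs in history[-K:])
    return C if coordinated else D

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a), strategy_b(hist_b)
        pa, pb = PAYOFF[a + b]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append((a, b))
        hist_b.append((b, a))
    return score_a, score_b

# Two AoN_2 players coordinate on mutual cooperation and sustain it.
print(play(lambda h: aon_move(h, 2), lambda h: aon_move(h, 2)))
```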
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) a reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm that allows us to retain some length information; 4) new activation functions, SwooshR and SwooshL, that work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explicitly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on the LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at https://github.com/k2-fsa/icefall.
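A simplified numpy sketch of the scaling idea described for ScaledAdam: multiply an Adam-style update by the tensor's current root-mean-square value so that the relative parameter change stays roughly constant. This omits the explicit learning of parameter scales and other details (the released implementation lives in the icefall repository linked above), and the hyperparameter values are placeholders.

```python
# Illustration of the "scale the update by the tensor's current scale" idea only;
# not the released ScaledAdam implementation. Hyperparameters are placeholders.
import numpy as np

def scaled_adam_step(p, grad, state, lr=0.05, betas=(0.9, 0.98), eps=1e-8):
    """state = (first moment, second moment, step count),
    initialised as (np.zeros_like(p), np.zeros_like(p), 0)."""
    m, v, t = state
    t += 1
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    adam_dir = m_hat / (np.sqrt(v_hat) + eps)       # ordinary Adam direction
    param_rms = np.sqrt(np.mean(p ** 2)) + eps      # current scale of the tensor
    p = p - lr * param_rms * adam_dir               # relative change ~ lr, independent of scale
    return p, (m, v, t)
```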
One way to make decisions under uncertainty is to select an optimal option from a range of possible options, by maximizing the expected utilities derived from a probability model. However, under severe uncertainty, identifying precise probabilities is hard. For this reason, imprecise probability theory models uncertainty through convex sets of probabilities and considers decision rules that can return multiple options to reflect insufficient information. Many well-founded decision rules have been studied in the past, but none of these standard rules is able to control the number of returned alternatives. This can be a problem for large decision problems, due to the cognitive burden decision makers face when presented with a large number of alternatives. We propose regret-based ideas to construct new decision rules that return a bounded number of options, where the limit on the number of options is set in advance by the decision maker as an expression of their cognitive limitations. We also study the consistency and numerical behaviour of these rules.
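As a toy illustration of bounding the returned set, the sketch below ranks options by worst-case regret over a finite set of distributions approximating the credal set and keeps at most $k$ of them; this particular score is a generic choice, not the specific rules constructed in the paper.

```python
# Toy bounded-size, regret-based choice rule over a finite approximation of a
# credal set. The scoring rule (worst-case regret) is illustrative only.
import numpy as np

def bounded_regret_choice(utilities, credal_points, k):
    """utilities: (n_options, n_outcomes) array; credal_points: (m, n_outcomes)
    probability vectors spanning the credal set; k: maximum number of options."""
    exp_u = utilities @ credal_points.T                # (n_options, m) expected utilities
    regret = exp_u.max(axis=0, keepdims=True) - exp_u  # regret of each option under each p
    max_regret = regret.max(axis=1)                    # worst-case regret per option
    order = np.argsort(max_regret)                     # smallest worst-case regret first
    return order[:k].tolist()

# Example: 4 options, 3 outcomes, a credal set spanned by two extreme points.
U = np.array([[1.0, 0.0, 0.5],
              [0.6, 0.6, 0.6],
              [0.0, 1.0, 0.2],
              [0.3, 0.3, 0.9]])
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2]])
print(bounded_regret_choice(U, P, k=2))   # indices of the two retained options
```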
Despite the recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda is still challenging. In this work, we show that using self-supervised pre-training, following a simple curriculum schedule during fine-tuning, and using semi-supervised learning to leverage large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda. Our approach uses public-domain data only. A new studio-quality speech dataset is collected from a public website and used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, defining a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in five successive generations. Our final model achieves 3.2% word error rate (WER) on the new dataset and 15.6% WER on the Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that using syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda.
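A minimal sketch of the curriculum-ranking step, assuming the baseline model exposes a transcription function; the `jiwer` WER utility and the easiest-first ordering are illustrative choices, not necessarily the paper's exact ranking criterion.

```python
# Illustrative curriculum construction: score each noisy example with the clean
# baseline model's WER and order the corpus easiest-first for fine-tuning.
# `baseline_transcribe` is a placeholder for the baseline model's decoder.
from jiwer import wer   # pip install jiwer

def build_curriculum(noisy_corpus, baseline_transcribe):
    """noisy_corpus: iterable of (audio, reference_text) pairs.
    Returns the corpus sorted from lowest to highest baseline WER."""
    scored = []
    for audio, reference in noisy_corpus:
        hypothesis = baseline_transcribe(audio)
        scored.append((wer(reference, hypothesis), audio, reference))
    scored.sort(key=lambda item: item[0])
    return [(audio, reference) for _, audio, reference in scored]
```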
Simpson's paradox is an obstacle to establishing a probabilistic association between two events $a_1$ and $a_2$, given a third (lurking) random variable $B$. We focus on scenarios in which the random variables $A$ (which combines $a_1$, $a_2$, and their complements) and $B$ have a common cause $C$ that need not be observed. Alternatively, we can assume that $C$ screens off $A$ from $B$. For such cases, the correct association between $a_1$ and $a_2$ is to be defined via conditioning over $C$. This set-up generalizes the original Simpson's paradox: its two contradicting options now simply refer to two particular and different causes $C$. We show that if $B$ and $C$ are binary and $A$ is quaternary (the minimal and most widespread situation in which Simpson's paradox is valid), conditioning over any binary common cause $C$ establishes the same direction of association between $a_1$ and $a_2$ as conditioning over $B$ in the original formulation of the paradox. Thus, for the minimal common cause, one should choose the option of Simpson's paradox that conditions on $B$ rather than marginalizing over it. For ternary (unobserved) common causes $C$, all three options of Simpson's paradox become possible (i.e., marginalized, conditional, and neither of them), and prior information on $C$ is needed to choose the right option.
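To recall the reversal underlying the paradox, the toy computation below uses made-up counts (not data from the paper): the association between $a_2$ and $a_1$ is positive within each value of $B$ but flips after marginalizing over $B$. These are the two contradicting options referred to above; the result stated in the abstract says which reading to trust once a binary common cause $C$ is brought in.

```python
# Toy counts illustrating Simpson's reversal (invented numbers, not from the paper).
# counts[b][a2] = (number with a1, total), where a2 = "treated", a1 = "recovered".
counts = {0: {True: (9, 10),   False: (80, 100)},
          1: {True: (30, 100), False: (2, 10)}}

def p_a1(b=None, a2=True):
    strata = [b] if b is not None else list(counts)
    hits = sum(counts[s][a2][0] for s in strata)
    total = sum(counts[s][a2][1] for s in strata)
    return hits / total

for b in (0, 1):   # conditional on B: positive association in every stratum
    print(f"B={b}: P(a1|a2)={p_a1(b, True):.2f}  >  P(a1|not a2)={p_a1(b, False):.2f}")
# marginalized over B: the association reverses
print(f"overall: P(a1|a2)={p_a1(None, True):.2f}  <  P(a1|not a2)={p_a1(None, False):.2f}")
```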
Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments about groups such as African Americans biased in problematic ways. While prior research has focused on overt racism in language models, social scientists have argued that a more subtle form of racism has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models hold the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to those from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models, such as human feedback training, do not mitigate the dialect prejudice but can exacerbate the discrepancy between covert and overt stereotypes by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.
Variable selection has played a critical role in modern statistical learning and scientific discovery. Numerous regularization and Bayesian variable selection methods have been developed over the past two decades, but most of these methods consider selecting variables for only one response. As more data are collected, it has become common to analyze multiple related responses from the same study. Existing multivariate variable selection methods select variables for all responses without considering possible heterogeneity across responses, i.e., some features may predict only a subset of the responses. Motivated by the multi-trait fine-mapping problem in genetics, where the goal is to identify causal variants for multiple related traits, we developed a novel multivariate Bayesian variable selection method to select critical predictors from a large number of grouped predictors that target multiple correlated and possibly heterogeneous responses. Our new method features selection at multiple levels, incorporation of prior biological knowledge to guide selection, and identification of the best subset of responses that each predictor targets. We show the advantage of our method via extensive simulations and a real fine-mapping example identifying causal variants associated with different subsets of addictive behaviors.
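To make "selection at multiple levels" concrete, a generic two-level spike-and-slab formulation (not the authors' exact model) could pair one indicator for whether a predictor is active at all with one for which responses it affects, as sketched below.

```latex
% Generic two-level spike-and-slab prior; symbols and the Gaussian slab are
% illustrative, not the paper's specification. Here j indexes (groups of)
% predictors and k indexes responses.
\[
  \beta_{jk} \mid \gamma_j, \eta_{jk}
  \;\sim\; \bigl(1 - \gamma_j \eta_{jk}\bigr)\,\delta_0
  \;+\; \gamma_j \eta_{jk}\,\mathcal{N}\!\bigl(0, \tau^2\bigr),
  \qquad
  \gamma_j \sim \mathrm{Bernoulli}(\pi_j), \quad
  \eta_{jk} \sim \mathrm{Bernoulli}(\rho_{jk}),
\]
where $\gamma_j$ selects whether predictor (group) $j$ is relevant at all and
$\eta_{jk}$ selects the subset of responses it targets; prior biological
knowledge can enter through $\pi_j$ and $\rho_{jk}$.
```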