Multiple measures, such as WEAT or MAC, attempt to quantify the magnitude of bias present in word embeddings in terms of a single-number metric. However, such metrics and the related statistical significance calculations rely on treating pre-averaged data as individual data points and on bootstrapping techniques with low sample sizes. We show that similar results can easily be obtained with such methods even when the data are generated by a null model lacking the intended bias. Consequently, we argue that this approach generates false confidence. To address this issue, we propose an alternative: hierarchical Bayesian modeling, which enables a more uncertainty-sensitive inspection of bias in word embeddings at different levels of granularity. To showcase our method, we apply it to the Religion, Gender, and Race word lists from the original research, together with our own control lists of neutral words. We deploy the method on Google, GloVe, and Reddit embeddings. Further, we use our approach to evaluate a debiasing technique applied to the Reddit word embeddings. Our findings reveal a more complex landscape than suggested by the proponents of single-number metrics. The datasets and source code for the paper are publicly available.
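For reference, a minimal Python/NumPy sketch (names ours) of the standard WEAT effect size, one of the single-number metrics in question:

    import numpy as np

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    def weat_effect_size(X, Y, A, B):
        # X, Y: lists of target word vectors; A, B: lists of attribute word vectors.
        s = lambda w: np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])
        sX = [s(x) for x in X]
        sY = [s(y) for y in Y]
        # Effect size: difference of mean associations, normalized by the pooled std.
        return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)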
In this paper we make, to the best of our knowledge, the first attempt at studying parametric semilinear elliptic eigenvalue problems with a parametric coefficient and power-type nonlinearities. The parametric coefficient is assumed to depend affinely on countably many parameters, with coefficient functions drawn from an appropriate class of sequences. We obtain upper bounds on the mixed derivatives of the ground eigenpairs that have the same form as those obtained recently for the linear eigenvalue problem. The three essential ingredients for this estimate are the parametric analyticity of the ground eigenpairs, the uniform boundedness of the ground eigenpairs, and the uniformly positive differences between ground eigenvalues of linear operators. All three ingredients require new techniques and a careful investigation of the nonlinear eigenvalue problem, which we present in this paper. As an application, treating each parameter as a uniformly distributed random variable, we estimate the expectation of the eigenpairs using a randomly shifted quasi-Monte Carlo lattice rule and show a dimension-independent error bound.
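A typical affine parametrization of this kind (our notation, shown for illustration only) reads
$$ a(x, \boldsymbol{y}) = \bar{a}(x) + \sum_{j \ge 1} y_j \psi_j(x), \qquad y_j \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right], $$
where the sequence $(\psi_j)_{j \ge 1}$ belongs to an appropriate class ensuring summability of the norms $\|\psi_j\|$.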
The idea of the restricted mean has been used to establish a significantly improved version of Markov's inequality that does not require any new assumptions. The result immediately extends to Chebyshev's inequality and Chernoff's bound. The improved Markov inequality yields a bound that is hundreds or thousands of times more accurate than the original Markov bound for high quantiles in the most prevalent and diverse situations. The Markov inequality benefits from being model-independent, and the long-standing issue of its imprecision is resolved. Practically speaking, avoiding model risk is decisive when multiple competing models are present in a real-world situation.
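For a nonnegative random variable $X$ and a threshold $a > 0$, one standard way a restricted mean tightens Markov's bound (our illustration; the paper's exact statement may differ) is
$$ \Pr(X \ge a) \;\le\; \frac{\mathbb{E}\left[X\,\mathbf{1}\{X \ge a\}\right]}{a} \;\le\; \frac{\mathbb{E}[X]}{a}. $$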
Consider minimizing the entropy of a mixture of states by choosing each state subject to constraints. If the spectrum of each state is fixed, we expect that in order to reduce the entropy of the mixture, we should make the states less distinguishable in some sense. Here, we study a class of optimization problems that are inspired by this situation and shed light on the relevant notions of distinguishability. The motivation for our study is the recently introduced spin alignment conjecture. In the original version of the underlying problem, each state in the mixture is constrained to be a freely chosen state on a subset of $n$ qubits tensored with a fixed state $Q$ on each of the qubits in the complement. According to the conjecture, the entropy of the mixture is minimized by choosing the freely chosen state in each term to be a tensor product of projectors onto a fixed maximal eigenvector of $Q$, which maximally "aligns" the terms in the mixture. We generalize this problem in several ways. First, instead of minimizing entropy, we consider maximizing arbitrary unitarily invariant convex functions such as Fan norms and Schatten norms. To formalize and generalize the alignment required by the conjecture, we define alignment as a preorder on tuples of self-adjoint operators that is induced by majorization. We prove the generalized conjecture for Schatten norms of integer order, for the case where the freely chosen states are constrained to be classical, and for the case where only two states contribute to the mixture and $Q$ is proportional to a projector. The last case fits into a more general situation in which we give explicit conditions for maximal alignment. The spin alignment problem has a natural "dual" formulation, versions of which have further generalizations that we introduce.
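For concreteness, in the original problem the mixture takes the form (our notation, inferred from the description above)
$$ \rho \;=\; \sum_i p_i\, \sigma_i \otimes Q^{\otimes (n - |S_i|)}, $$
where $\sigma_i$ is freely chosen on the qubit subset $S_i$ and $Q$ is fixed on its complement; the conjectured minimizer takes $\sigma_i = (|q\rangle\langle q|)^{\otimes |S_i|}$ with $|q\rangle$ a fixed maximal eigenvector of $Q$.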
We present the Continuous Empirical Cubature Method (CECM), a novel algorithm for empirically devising efficient integration rules. The CECM aims to improve existing cubature methods by producing rules that are close to optimal, featuring far fewer points than the number of functions to integrate. The CECM consists of a two-stage strategy. First, a point selection strategy is applied to obtain an initial approximation to the cubature rule, featuring as many points as functions to integrate. The second stage consists of a sparsification strategy in which, alongside the indices and corresponding weights, the spatial coordinates of the points are also treated as design variables. The positions of the initially selected points are changed so as to drive their associated weights to zero, and in this way the minimum number of points is achieved. Although originally conceived within the framework of hyper-reduced order models (HROMs), we present the method's formulation in terms of generic vector-valued functions, thereby accentuating its versatility across various problem domains. To demonstrate the extensive applicability of the method, we conduct numerical validations using univariate and multivariate Lagrange polynomials. In these cases, we show the method's capacity to retrieve the optimal Gaussian rule. We also assess the method on an arbitrary exponential-sinusoidal function in a 3D domain, and finally consider an example application of the method to the hyperreduction of a multiscale finite element model, showcasing notable computational performance gains. A secondary contribution of the current paper is the Sequential Randomized SVD (SRSVD) approach for computing the Singular Value Decomposition (SVD) in a column-partitioned format. The SRSVD is particularly advantageous when matrix sizes approach memory limitations.
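As a toy illustration of the first-stage idea of empirically fitting nonnegative cubature weights (not the authors' CECM; the function names and the nonnegative-least-squares choice are our assumptions):

    import numpy as np
    from scipy.optimize import nnls

    def empirical_cubature(G, b):
        # G[i, j]: value of function i at candidate point j; b[i]: exact integral of function i.
        # Nonnegative least squares typically returns a sparse weight vector whose nonzero
        # entries select the cubature points.
        w, residual = nnls(G, b)
        return w, residual

    # Example: integrate the monomials 1, x, x^2, x^3 over [0, 1] from 21 candidate points.
    pts = np.linspace(0.0, 1.0, 21)
    G = np.vstack([pts**k for k in range(4)])
    b = np.array([1.0 / (k + 1) for k in range(4)])
    w, res = empirical_cubature(G, b)
    selected_points = pts[w > 1e-12]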
We study Lindström quantifiers that satisfy certain closure properties motivated by the study of polymorphisms in the context of constraint satisfaction problems (CSPs). When the algebra of polymorphisms of a finite structure B satisfies certain equations, this gives rise to a natural closure condition on the class of structures that map homomorphically to B. The collection of quantifiers satisfying closure conditions arising from a fixed set of equations is rather more general than those arising from CSPs. For any such condition P, we define a pebble game that delimits the distinguishing power of the infinitary logic with all P-closed quantifiers. We use the pebble game to show that the problem of deciding whether a system of linear equations is solvable in $\mathbb{Z}_2$ is not expressible in the infinitary logic with all quantifiers closed under a near-unanimity condition.
Deep Learning predictions with measurable confidence are increasingly desirable for real-world problems, especially in high-risk settings. The Conformal Prediction (CP) framework is a versatile solution that guarantees a maximum error rate under minimal constraints. In this paper, we propose a novel conformal loss function that approximates the traditionally two-step CP approach in a single step. By evaluating and penalising deviations from the stringent expected CP output distribution, a Deep Learning model may learn the direct relationship between the input data and the conformal p-values. We carry out a comprehensive empirical evaluation to show our novel loss function's competitiveness on seven binary and multi-class prediction tasks across five benchmark datasets. On the same datasets, our approach achieves significant training time reductions of up to 86% compared to Aggregated Conformal Prediction (ACP), while maintaining comparable approximate validity and predictive efficiency.
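For context, a minimal sketch of the standard two-step split-conformal p-value that the proposed loss approximates end-to-end (illustrative only; names are ours):

    import numpy as np

    def conformal_p_value(calib_scores, test_score):
        # Fraction of calibration nonconformity scores at least as large as the test score.
        calib_scores = np.asarray(calib_scores)
        return (1.0 + np.sum(calib_scores >= test_score)) / (len(calib_scores) + 1.0)

    # A label enters the prediction set when its p-value exceeds the significance level,
    # e.g. 0.1 for a 90% coverage target.
    p = conformal_p_value(calib_scores=[0.1, 0.4, 0.35, 0.8], test_score=0.3)
    include = p > 0.1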
V. Levenshtein first proposed the sequence reconstruction problem in 2001. This problem considers a model in which the same sequence from some set is transmitted over multiple channels, and the decoder receives all of the distinct outputs. Assume that the transmitted sequence is at distance $d$ from some code and that there are at most $r$ errors in every channel. The sequence reconstruction problem then asks for the minimum number of channels required to recover the transmitted sequence exactly; this number must exceed the maximum size of the intersection of two metric balls of radius $r$ whose centers are at distance at least $d$. In this paper, we study the sequence reconstruction problem for permutations under the Hamming distance. In this model, we define a Cayley graph over the symmetric group, study its properties, and find the exact value of the largest intersection of two of its metric balls for $d=2r$. Moreover, we give a lower bound on the largest intersection of two metric balls for $d=2r-1$.
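For small $n$, the quantity in question can be checked by brute force; a minimal illustrative sketch (ours, not from the paper):

    from itertools import permutations

    def hamming(p, q):
        return sum(a != b for a, b in zip(p, q))

    def max_ball_intersection(n, r, d):
        # Largest |B(c1, r) ∩ B(c2, r)| over the symmetric group S_n with hamming(c1, c2) >= d.
        perms = list(permutations(range(n)))
        best = 0
        for c1 in perms:
            for c2 in perms:
                if hamming(c1, c2) < d:
                    continue
                size = sum(1 for p in perms
                           if hamming(p, c1) <= r and hamming(p, c2) <= r)
                best = max(best, size)
        return best

    # e.g. max_ball_intersection(4, 2, 4) evaluates the quantity studied above for d = 2r.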
In this paper, we present a novel algorithm, the Hybrid Search algorithm, that integrates Zermelo's navigation initial value problem with the Ferraro-Mart\'in de Diego-Almagro algorithm to find the optimal route for a vessel to reach its destination. Our algorithm is designed to work in both Euclidean and spherical spaces and utilizes a heuristic that allows the vessel to move forward while remaining within a predetermined search cone centred on the destination. This approach not only improves efficiency but also incorporates obstacle avoidance, making it well-suited for real-world applications. We evaluate the Hybrid Search algorithm on synthetic vector fields and real ocean-current data, demonstrating its effectiveness and computational performance.
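A minimal sketch of a cone-pruning test of this kind, assuming a simple Euclidean set-up (illustrative; not the authors' exact criterion):

    import numpy as np

    def within_search_cone(position, candidate, destination, half_angle_deg):
        # Accept a candidate next position only if it lies inside a cone of the given
        # half-angle, centred on the bearing from the current position to the destination.
        to_dest = np.asarray(destination, float) - np.asarray(position, float)
        to_cand = np.asarray(candidate, float) - np.asarray(position, float)
        cosang = to_dest @ to_cand / (np.linalg.norm(to_dest) * np.linalg.norm(to_cand))
        return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))) <= half_angle_deg

    # Prune way-points that deviate more than 30 degrees from the bearing to the goal.
    ok = within_search_cone((0.0, 0.0), (1.0, 0.3), (5.0, 0.0), 30.0)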
Richardson extrapolation is applied to a simple first-order upwind difference scheme for the approximation of solutions of singularly perturbed convection-diffusion problems in one dimension. Robust a posteriori error bounds are derived for the proposed method on arbitrary meshes. It is shown that the resulting error estimator can be used to steer an adaptive mesh algorithm that generates meshes resolving layers and singularities. Numerical results are presented that illustrate the theoretical findings.
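A minimal sketch of the underlying construction, Richardson extrapolation of a first-order upwind scheme on uniform meshes for a model problem (illustrative only; the paper's a posteriori estimator and adaptive algorithm are not reproduced here):

    import numpy as np

    def upwind_solve(eps, b, f, N):
        # First-order upwind FD solution of -eps*u'' + b(x)*u' = f(x), u(0) = u(1) = 0,
        # on a uniform mesh with N intervals (assumes b(x) > 0).
        h = 1.0 / N
        x = np.linspace(0.0, 1.0, N + 1)
        A = np.zeros((N - 1, N - 1))
        for i in range(N - 1):
            bi = b(x[i + 1])
            A[i, i] = 2 * eps / h**2 + bi / h
            if i > 0:
                A[i, i - 1] = -eps / h**2 - bi / h
            if i < N - 2:
                A[i, i + 1] = -eps / h**2
        u = np.zeros(N + 1)
        u[1:-1] = np.linalg.solve(A, f(x[1:-1]))
        return x, u

    # Richardson extrapolation: combine the N- and 2N-interval solutions at the coarse nodes.
    eps, N = 1e-2, 64
    one = lambda x: np.ones_like(x)
    x_c, u_c = upwind_solve(eps, one, one, N)
    x_f, u_f = upwind_solve(eps, one, one, 2 * N)
    u_rich = 2.0 * u_f[::2] - u_c   # second-order accurate away from under-resolved layers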
We present our proposed solution to the BabyLM challenge [arXiv:2301.11796], whose goal was to improve the sample efficiency of language models. We trained an ensemble consisting of a GPT-2 and small LLaMA models on the developmentally-plausible, 10M-word BabyLM dataset, then distilled it into a small, 58M-parameter LLaMA model, which exceeds in performance both of its teachers as well as a similar model trained without distillation. This suggests that distillation can not only retain the full performance of the teacher model when the latter is trained on a sufficiently small dataset; it can exceed it, and lead to significantly better performance than direct training.
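A minimal sketch of a standard soft-target ensemble distillation objective of the kind described (illustrative; the authors' exact recipe and hyperparameters are not given here):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
        # Average the teachers' temperature-softened distributions to form the ensemble target.
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
        # Soft loss: KL divergence between the student and the averaged-teacher distribution.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        teacher_probs, reduction="batchmean") * (T * T)
        # Hard loss: usual cross-entropy against the ground-truth targets.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # For a language model the logits would be per token, e.g. shape (batch * seq_len, vocab).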