Study Objectives: Polysomnography (PSG) currently serves as the benchmark for evaluating sleep disorders. Its discomfort, impracticality for home use, and introduction of bias in sleep quality assessment necessitate the exploration of less invasive, cost-effective, and portable alternatives. One promising contender is the in-ear-EEG sensor, which offers advantages in terms of comfort, fixed electrode positions, resistance to electromagnetic interference, and user-friendliness. This study aims to establish a methodology to assess the similarity between the in-ear-EEG signal and standard PSG. Methods: We assess the agreement between the PSG and in-ear-EEG derived hypnograms. We extract features in the time and frequency domains from PSG and in-ear-EEG 30-second epochs. We only consider the epochs where the PSG-scorers and the in-ear-EEG-scorers were in agreement. We introduce a methodology to quantify the similarity between PSG derivations and the single-channel in-ear-EEG. The approach relies on a comparison of distributions of selected features -- extracted for each sleep stage and subject on both PSG and the in-ear-EEG signals -- via a Jensen-Shannon Divergence Feature-based Similarity Index (JSD-FSI). Results: We found a high intra-scorer variability, mainly due to the uncertainty the scorers had in evaluating the in-ear-EEG signals. We show that the similarity between PSG and in-ear-EEG signals is high (JSD-FSI: 0.61 +/- 0.06 in awake, 0.60 +/- 0.07 in NREM and 0.51 +/- 0.08 in REM), and in line with the similarity values computed independently on standard PSG-channel-combinations. Conclusions: In-ear-EEG is a valuable solution for home-based sleep monitoring; however, further studies with a larger and more heterogeneous dataset are needed.
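The abstract above compares feature distributions through a Jensen-Shannon divergence but does not state how the JSD-FSI itself is derived from it. As a minimal sketch, the following computes only the plain Jensen-Shannon divergence between two discrete distributions, for example normalized histograms of one feature on a PSG channel and on the in-ear-EEG channel. The function names and the choice of base-2 logarithms (which bounds the divergence in [0, 1]) are assumptions made for illustration, not the authors' definition.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0, so KL is well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalize(counts):
    """Turn histogram counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical histogram counts of one feature over the same bins.
psg_hist = normalize([12, 30, 40, 18])
ear_hist = normalize([10, 28, 44, 18])
divergence = jensen_shannon_divergence(psg_hist, ear_hist)
```

With base-2 logarithms, identical distributions give 0 and distributions with disjoint support give 1, so a small value indicates that the two signals yield similar feature distributions.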
Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity using an individual's geolocation and surname. Here we demonstrate that statistical dependence of surname and geolocation within racial/ethnic categories in the United States results in biases for minority subpopulations, and we introduce a raking-based improvement. Our method augments the data used by BISG--distributions of race by geolocation and race by surname--with the distribution of surname by geolocation obtained from state voter files. We validate our algorithm on state voter registration lists that contain self-identified race/ethnicity.
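For context, the standard BISG update combines the race-by-surname and race-by-geolocation distributions under the assumption that surname and geolocation are independent within each racial/ethnic category, which is the assumption the abstract above identifies as a source of bias. The sketch below shows only that baseline update, not the raking-based improvement. The function name, category labels, and probabilities are hypothetical.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Baseline BISG: P(race | surname, geo) is proportional to
    P(race | surname) * P(geo | race), assuming surname and geolocation
    are conditionally independent given race."""
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for one individual (illustrative numbers only).
p_race_given_surname = {"A": 0.7, "B": 0.2, "C": 0.1}
p_geo_given_race = {"A": 0.01, "B": 0.03, "C": 0.02}
posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
```

In this toy case the geolocation term shifts probability from category A toward category B relative to the surname-only distribution.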
In 2001, D. Erwin \cite{Erw01} introduced in his Ph.D. dissertation the notion of broadcast independence in unoriented graphs. Since then, only a few results have been published on this notion, including work on the broadcast independence number of unoriented circulant graphs \cite{LBS23}. In this paper, we focus on the same parameter for the class of oriented circulant graphs. An independent broadcast on an oriented graph $\overrightarrow{G}$ is a function $f: V\longrightarrow \{0,\ldots,\diam(\overrightarrow{G})\}$ such that $(i)$ $f(v)\leq e(v)$ for every vertex $v\in V(\overrightarrow{G})$, where $\diam(\overrightarrow{G})$ denotes the diameter of $\overrightarrow{G}$ and $e(v)$ the eccentricity of vertex $v$, and $(ii)$ $d_{\overrightarrow{G}}(u,v) > f(u)$ for every pair of distinct vertices $u$, $v$ with $f(u)$, $f(v)>0$, where $d_{\overrightarrow{G}}(u,v)$ denotes the length of a shortest oriented path from $u$ to $v$. The broadcast independence number $\beta_b(\overrightarrow{G})$ of $\overrightarrow{G}$ is then the maximum value of $\sum_{v \in V} f(v)$, taken over all independent broadcasts on $\overrightarrow{G}$. The goal of this paper is to study the properties of independent broadcasts on oriented circulant graphs $\overrightarrow{C}(n;1,a)$, for any integers $n$ and $a$ with $n>|a|\geq 1$ and $a \notin \{1,n-1\}$. We then give some bounds and some exact values for $\beta_b(\overrightarrow{C}(n;1,a))$.
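The definition above can be checked by brute force on very small instances. The following sketch assumes that $\overrightarrow{C}(n;1,a)$ has arcs $i \to i+1$ and $i \to i+a$ (mod $n$) and that the eccentricity $e(v)$ is the largest oriented distance from $v$ to any other vertex. Both are assumptions, since the abstract does not spell them out. The search enumerates all functions $f$, so it is only practical for small $n$.

```python
from collections import deque
from itertools import product


def distances(n, a):
    """All-pairs shortest oriented-path lengths in C(n; 1, a),
    assuming arcs i -> i+1 and i -> i+a (mod n)."""
    dist = []
    for source in range(n):
        d = [None] * n
        d[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search along outgoing arcs
            u = queue.popleft()
            for step in (1, a):
                v = (u + step) % n
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def is_independent_broadcast(f, dist):
    """Check conditions (i) f(v) <= e(v) and (ii) d(u, v) > f(u)
    for all distinct u, v with f(u), f(v) > 0."""
    n = len(f)
    eccentricity = [max(row) for row in dist]
    if any(f[v] > eccentricity[v] for v in range(n)):
        return False
    support = [v for v in range(n) if f[v] > 0]
    return all(dist[u][v] > f[u] for u in support for v in support if u != v)


def broadcast_independence_number(n, a):
    """Maximum of sum(f) over all independent broadcasts (exhaustive search)."""
    dist = distances(n, a)
    diameter = max(max(row) for row in dist)
    return max(
        sum(f)
        for f in product(range(diameter + 1), repeat=n)
        if is_independent_broadcast(f, dist)
    )
```

For example, in $\overrightarrow{C}(5;1,2)$ every pair of distinct vertices is joined by an arc in one direction, so at most one vertex can broadcast and `broadcast_independence_number(5, 2)` returns the diameter, 2.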
The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to domains and categories that are strictly unseen during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not be sufficient to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose \textbf{Pro}mpting-to-\textbf{S}imulate (ProS), the first method to apply prompt tuning for UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP), which can guide models to produce generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without introducing excessive parameters. Our method is publicly available at //github.com/fangkaipeng/ProS.
The approach to analysing compositional data has been dominated by the use of logratio transformations, to ensure exact subcompositional coherence and, in some situations, exact isometry as well. A problem with this approach is that data zeros, found in most applications, have to be replaced to allow the logarithmic transformation. An alternative new approach, called the `chiPower' transformation, which allows data zeros, is to combine the standardization inherent in the chi-square distance in correspondence analysis, with the essential elements of the Box-Cox power transformation. The chiPower transformation is justified because it defines between-sample distances that tend to logratio distances for strictly positive data as the power parameter tends to zero, and are then equivalent to transforming to logratios. For data with zeros, a value of the power can be identified that brings the chiPower transformation as close as possible to a logratio transformation, without having to substitute the zeros. Especially in the area of high-dimensional data, this alternative approach can present such a high level of coherence and isometry as to be a valid approach to the analysis of compositional data. Furthermore, in a supervised learning context, if the compositional variables serve as predictors of a response in a modelling framework, for example generalized linear models, then the power can be used as a tuning parameter in optimizing the accuracy of prediction through cross-validation. The chiPower-transformed variables have a straightforward interpretation, since they are each identified with single compositional parts, not ratios.
With the increase in the availability of Building Information Models (BIM) and (semi-) automatic tools to generate BIM from point clouds, we propose a world model architecture and algorithms to allow the use of the semantic and geometric knowledge encoded within these models to generate maps for robot localization and navigation. When heterogeneous robots are deployed within an environment, maps obtained from classical SLAM approaches might not be shareable between all agents within a team of robots, e.g. due to a mismatch in sensor type, or a difference in physical robot dimensions. Our approach extracts the 3D geometry and semantic description of building elements (e.g. material, element type, color) from BIM, and represents this knowledge in a graph. Based on queries on the graph and knowledge of the skills of the robot, we can generate skill-specific maps that can be used during the execution of localization or navigation tasks. The approach is validated with data from complex built environments and integrated into existing navigation frameworks.
The categorical Gini covariance is a dependence measure between a numerical variable and a categorical variable. The Gini covariance measures dependence by quantifying the difference between the conditional and unconditional distribution functions. A value of zero for the categorical Gini covariance implies independence of the numerical variable and the categorical variable. We propose a non-parametric test for independence between a numerical and a categorical variable using the categorical Gini covariance. We use the theory of U-statistics to derive the test statistic and study its properties. The test statistic has an asymptotic normal distribution. As the implementation of a normal-based test is difficult, we develop a jackknife empirical likelihood (JEL) ratio test for testing independence. Extensive Monte Carlo simulation studies are carried out to validate the performance of the proposed JEL-based test. We illustrate the test procedure using a real data set.
Evaluating the Expected Information Gain (EIG) is a critical task in many areas of computational science and statistics, necessitating the approximation of nested integrals. Available techniques for this problem based on Quasi-Monte Carlo (QMC) methods have primarily focused on enhancing the efficiency of the inner integral approximation. In this work, we introduce a novel approach that extends the scope of these efforts to address inner and outer expectations simultaneously. Leveraging the principles of Owen's scrambling, we develop a randomized quasi-Monte Carlo (RQMC) method that improves the approximation of nested integrals. We also indicate how to combine this methodology with Importance Sampling to address a measure concentration arising in the inner integral. Our RQMC method capitalizes on the unique structure of nested expectations to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we handle integrands exhibiting infinite variation in the Hardy-Krause (HK) sense, paving the way for theoretically sound error estimates. We derive asymptotic error bounds for the bias and variance of our estimator. In addition, we provide nearly optimal sample sizes for the inner and outer RQMC approximations, which are helpful for the actual numerical implementations. We verify the quality of our estimator through numerical experiments in the context of Bayesian optimal experimental design. Specifically, we compare the computational efficiency of our RQMC method against standard nested Monte Carlo integration across two case studies: one in thermo-mechanics and the other in pharmacokinetics. These examples highlight our approach's computational savings and enhanced applicability, showcasing the advantages of estimating the Expected Information Gain with greater efficiency and reduced computational cost.
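To make the nested structure concrete, the sketch below implements the standard nested (double-loop) Monte Carlo estimator that the abstract above uses as its comparison baseline. It does not implement the RQMC method or Owen's scrambling. The toy model is an assumption chosen because its EIG is known in closed form: a parameter $\theta \sim N(0,1)$ observed as $y = \theta + \varepsilon$ with $\varepsilon \sim N(0,\sigma^2)$, for which the EIG equals $\tfrac{1}{2}\log(1 + 1/\sigma^2)$. All function names are hypothetical.

```python
import math
import random


def log_likelihood(y, theta, sigma):
    """Log density of y given theta for y ~ N(theta, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((y - theta) / sigma) ** 2


def nested_mc_eig(n_outer, n_inner, sigma, seed=0):
    """Nested Monte Carlo estimate of EIG = E[log p(y|theta) - log p(y)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        # Outer sample: draw (theta, y) from the joint distribution.
        theta = rng.gauss(0.0, 1.0)
        y = theta + rng.gauss(0.0, sigma)
        # Inner sample: approximate the evidence p(y) = E_theta'[p(y | theta')].
        evidence = sum(
            math.exp(log_likelihood(y, rng.gauss(0.0, 1.0), sigma))
            for _ in range(n_inner)
        ) / n_inner
        total += log_likelihood(y, theta, sigma) - math.log(evidence)
    return total / n_outer


def exact_eig(sigma):
    """Closed-form EIG for the linear-Gaussian toy model."""
    return 0.5 * math.log(1.0 + 1.0 / sigma**2)


estimate = nested_mc_eig(n_outer=2000, n_inner=200, sigma=1.0, seed=1)
```

The inner average sits inside a logarithm, so the estimator is biased for any finite inner sample size. This is why the choice of inner and outer sample sizes matters for nested estimators of this kind.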
Trapped human detection in search and rescue (SAR) scenarios poses a significant challenge in pervasive computing. This study addresses this issue by leveraging machine learning techniques, given their high accuracy. However, accurate identification of trapped individuals is hindered by the curse of dimensionality and noisy data. Particularly in non-line-of-sight (NLOS) situations during catastrophic events, the curse of dimensionality may lead to blind spots due to noise and uncorrelated values in detections. This research focuses on harmonizing information through wireless communication and identifying individuals in NLOS scenarios using ultra-wideband (UWB) radar signals. Employing independent component analysis (ICA) for feature extraction, the study evaluates classification performance using ensemble algorithms on both static and dynamic datasets. The experimental results demonstrate classification accuracies of 88.37% for static data and 87.20% for dynamic data, highlighting the effectiveness of the proposed approach. Finally, this work can help scientists and engineers make instant decisions during SAR operations.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
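Shepard's universal law of generalization, referenced in the abstract above, states that generalization between two stimuli decays exponentially with their distance in a psychological similarity space. The sketch below illustrates only that exponential form. Treating saliency maps as flat vectors, the Euclidean distance, and the `scale` parameter are assumptions made for illustration, since the abstract does not describe the similarity space actually used.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shepard_similarity(a, b, scale=1.0):
    """Exponential-decay similarity exp(-scale * d(a, b)), following the
    form of Shepard's law; `scale` is a hypothetical sensitivity parameter."""
    return math.exp(-scale * euclidean_distance(a, b))


# Hypothetical flattened saliency maps: the AI's and the one a person would give.
ai_map = [0.9, 0.1, 0.0, 0.0]
own_map = [0.7, 0.2, 0.1, 0.0]
similarity = shepard_similarity(ai_map, own_map)
```

Identical maps give a similarity of 1, and the value falls toward 0 as the maps diverge.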
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
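The two alternating sublayers described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the released implementation: it omits biases and any normalization or scaling layers, uses GELU as an assumed activation, and represents matrices as plain Python lists. The input `x` holds one row per patch and one column per channel, and the weight names and shapes are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def gelu(v):
    """Exact GELU activation (assumed; the abstract does not name one)."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def resmlp_block(x, w_patch, w1, w2):
    """One residual block on x of shape (patches, channels).

    w_patch: (patches, patches), w1: (channels, hidden), w2: (hidden, channels).
    """
    # (i) Cross-patch linear layer: the same weights mix patches in every channel.
    x = add(x, matmul(w_patch, x))
    # (ii) Two-layer feed-forward network applied to each patch independently.
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return add(x, matmul(hidden, w2))
```

Because both sublayers are residual, setting all weights to zero returns the input unchanged, which is a convenient sanity check on the shapes.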