We present a method for obtaining unbiased signal estimates in the presence of a significant background, eliminating the need for a parametric model of the background itself. Our approach rests on a minimal set of conditions on the observation and background estimators, conditions that are typically satisfied in practice. To showcase the effectiveness of our method, we apply it to simulated data from the planned dielectric axion haloscope MADMAX.
In online conferencing applications, estimating the perceived quality of an audio signal is crucial to ensuring a high quality of experience for the end user. The most reliable way to assess the quality of a speech signal is through human judgments in the form of the mean opinion score (MOS). However, such an approach is labor intensive and not feasible for large-scale applications. The focus has therefore shifted towards automated speech quality assessment through end-to-end training of deep neural networks. Recently, it was shown that leveraging pre-trained wav2vec-based XLS-R embeddings leads to state-of-the-art performance for speech quality prediction. In this paper, we perform an in-depth analysis of the pre-trained model. First, we analyze the performance of embeddings extracted from each layer of XLS-R and for each model size (300M, 1B, and 2B parameters). Surprisingly, we find two optimal regions for feature extraction: one among the low-level features and one among the high-level features. Next, we investigate the reason for the two distinct optima. We hypothesize that the low-level features capture characteristics of noise and room acoustics, whereas the high-level features focus on speech content and intelligibility. To test this, we analyze the sensitivity of the MOS predictions to different levels of corruption in each category. We then fuse the two optimal feature depths to determine whether they contain complementary information for MOS prediction. Finally, we compare the performance of the proposed models and assess their generalizability on unseen datasets.
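To make the layer-wise feature extraction described above concrete, the sketch below pulls the hidden state of a chosen XLS-R transformer layer and feeds a low-layer/high-layer pair into a small regression head. It is a minimal illustration assuming the HuggingFace transformers API; the model identifier, mean pooling, and the \texttt{MOSHead} architecture are illustrative choices, not the configuration used in the paper.

\begin{verbatim}
# Minimal sketch: extract the hidden state of one XLS-R layer and fuse a
# low-layer / high-layer pair for MOS regression. Assumes the HuggingFace
# transformers API; model id, pooling, and head are illustrative only.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m").eval()

def layer_embedding(waveform, sample_rate, layer):
    """Mean-pooled hidden state of the chosen transformer layer."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs, output_hidden_states=True)
    # hidden_states[0] is the CNN feature projection; index i > 0 is block i
    return out.hidden_states[layer].mean(dim=1)   # shape (1, hidden_dim)

class MOSHead(nn.Module):
    """Concatenate a low-level and a high-level embedding and regress MOS."""
    def __init__(self, hidden_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * hidden_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, low_emb, high_emb):
        return self.mlp(torch.cat([low_emb, high_emb], dim=-1))
\end{verbatim}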
This paper presents a comprehensive survey of ChatGPT-related research (GPT-3.5 and GPT-4), the state-of-the-art large language models (LLMs) of the GPT series, and their prospective applications across diverse domains. Key innovations such as large-scale pre-training on web-scale corpora, instruction fine-tuning, and reinforcement learning from human feedback (RLHF) have played significant roles in enhancing LLMs' adaptability and performance. We performed an in-depth analysis of 194 relevant papers on arXiv, encompassing trend analysis, word cloud representation, and distribution analysis across application domains. The findings reveal significant and increasing interest in ChatGPT-related research, predominantly centered on direct natural language processing applications, while also demonstrating considerable potential in areas ranging from education and history to mathematics, medicine, and physics. This study endeavors to furnish insights into ChatGPT's capabilities, potential implications, and ethical concerns, and to offer direction for future advancements in the field.
Analyzing a network's vulnerability is one of the most significant problems in network design. Graph theory serves as a valuable tool for solving complex network problems, and numerous graph-theoretic parameters exist for analyzing a system's stability. Among these parameters, closeness stands out as one of the most commonly used vulnerability metrics. Its definition has evolved over time to ease formulation and to extend to disconnected structures. Building on closeness, Dangalchev introduced residual closeness, a more sensitive vulnerability index than other existing parameters. In this study, we examine closeness and residual closeness for Harary graphs, well-known constructions distinguished by being $k$-connected graphs on $n$ vertices with the least possible number of edges.
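For reference, Dangalchev's closeness and the vertex residual closeness built on it are usually defined as follows (recalled here from the literature; the notation is ours):
\[
C(G) \;=\; \sum_{i}\sum_{j \neq i} 2^{-d(i,j)}, \qquad
R(G) \;=\; \min_{k \in V(G)} C(G - k),
\]
where $d(i,j)$ is the distance between vertices $i$ and $j$ (with $2^{-d(i,j)} = 0$ when no path exists, so the sum remains well defined for disconnected graphs) and $G - k$ denotes the graph obtained by deleting vertex $k$ together with its incident edges.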
This paper presents a rigorous Bayesian analysis of the information in the signal (consisting of both the line-of-sight (LOS) path and reflections from multiple reconfigurable intelligent surfaces (RISs)) that originates from a single base station (BS) and is received by a user equipment (UE). For a comprehensive Bayesian analysis, both the near-field and far-field regimes are considered. The Bayesian analysis treats both the locations of the RISs and previous information about the UE as {\em a priori} information for UE localization. With outdated {\em a priori} information, the position and orientation offsets of the RISs become parameters that must be estimated and fed back to the BS for correction. We first show that when the RIS elements have half-wavelength spacing, the RIS orientation offset is a factor in the pathloss of the RIS paths. Subsequently, we show through the Bayesian equivalent Fisher information matrix (EFIM) for the channel parameters that the RIS orientation offset cannot be corrected when there is an unknown phase offset in the received signal in the far-field regime. However, the corresponding EFIM for the channel parameters observed in the near field shows that this unknown phase offset does not hinder the estimation of the RIS orientation offset when the UE has more than one receive antenna. Furthermore, we use the EFIM for the UE location parameters to present bounds for UE localization in the presence of RIS uncertainty.
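For readers unfamiliar with the EFIM, it is obtained from the full Fisher information matrix by a Schur complement that absorbs the nuisance parameters; the standard form, written here with generic blocks rather than the paper's specific parameterization, is
\[
\mathbf{J} =
\begin{bmatrix}
\mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\eta}} & \mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\xi}} \\
\mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\xi}}^{\mathsf{T}} & \mathbf{J}_{\boldsymbol{\xi}\boldsymbol{\xi}}
\end{bmatrix},
\qquad
\mathbf{J}^{\mathrm{e}}_{\boldsymbol{\eta}} \;=\;
\mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\eta}}
- \mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\xi}}\,
\mathbf{J}_{\boldsymbol{\xi}\boldsymbol{\xi}}^{-1}\,
\mathbf{J}_{\boldsymbol{\eta}\boldsymbol{\xi}}^{\mathsf{T}},
\]
where $\boldsymbol{\eta}$ collects the parameters of interest (e.g., the channel or UE location parameters) and $\boldsymbol{\xi}$ the nuisance parameters (e.g., the unknown phase offset); a parameter is identifiable from the signal only if the corresponding block of $\mathbf{J}^{\mathrm{e}}_{\boldsymbol{\eta}}$ is non-singular.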
Generative AI models can produce extremely realistic-looking content, posing growing challenges to the authenticity of information. To address these challenges, watermarking has been leveraged to detect AI-generated content: a watermark is embedded into AI-generated content before it is released, and the content is detected as AI-generated if a similar watermark can be decoded from it. In this work, we perform a systematic study of the robustness of such watermark-based AI-generated content detection, focusing on AI-generated images. Our work shows that an attacker can post-process a watermarked image by adding a small, human-imperceptible perturbation, such that the post-processed image evades detection while maintaining its visual quality. We show the effectiveness of our attack both theoretically and empirically. Moreover, to evade detection, our adversarial post-processing method adds much smaller perturbations to AI-generated images, and thus better maintains their visual quality, than existing popular post-processing methods such as JPEG compression, Gaussian blur, and brightness/contrast adjustment. Our work shows the insufficiency of existing watermark-based detection of AI-generated content, highlighting the urgent need for new methods. Our code is publicly available: \url{//github.com/zhengyuan-jiang/WEvade}.
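To make the threat model concrete, the following sketch illustrates the general idea of gradient-based post-processing against a differentiable watermark decoder: increase the bit errors of the decoded watermark while keeping the perturbation small. This is an illustration of the attack idea only, not the WEvade algorithm; the \texttt{decoder} interface, the \texttt{target\_bits} tensor, and the detection rule are assumed.

\begin{verbatim}
# Generic sketch of gradient-based watermark evasion (not the WEvade
# algorithm itself). Assumes a differentiable decoder returning per-bit
# logits and a detector that matches decoded bits against `target_bits`.
import torch
import torch.nn.functional as F

def evade(image, decoder, target_bits, steps=200, lr=1e-2, eps=2/255):
    """image: (1,3,H,W) in [0,1]; returns a perturbed image of the same shape."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = decoder(torch.clamp(image + delta, 0.0, 1.0))
        # Maximize disagreement with the watermark the detector looks for.
        loss = -F.binary_cross_entropy_with_logits(logits, target_bits.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation imperceptible
    return torch.clamp(image + delta, 0.0, 1.0).detach()
\end{verbatim}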
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations. However, we also identify many cases in which LM predictions capture relational knowledge accurately, but this knowledge is not linearly encoded in their representations. Our results thus reveal a simple, interpretable, but heterogeneously deployed knowledge representation strategy in transformer LMs.
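Concretely, for a relation $r$ the linear relation representation described above approximates the LM's mapping $F_r$ from a subject representation $\mathbf{s}$ to the predicted object as an affine map, obtained as a first-order (Jacobian) expansion around the subject representation $\mathbf{s}_0$ of a single prompt (sketched here from the description above; estimation details are in the paper):
\[
F_r(\mathbf{s}) \;\approx\; W_r\,\mathbf{s} + \mathbf{b}_r,
\qquad
W_r = \left.\frac{\partial F_r}{\partial \mathbf{s}}\right|_{\mathbf{s}_0},
\quad
\mathbf{b}_r = F_r(\mathbf{s}_0) - W_r\,\mathbf{s}_0.
\]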
Long-term outcomes of experimental evaluations are necessarily observed after long delays. We develop semiparametric methods for combining the short-term outcomes of experiments with observational measurements of short-term and long-term outcomes, in order to estimate long-term treatment effects. We characterize semiparametric efficiency bounds for various instances of this problem. These calculations facilitate the construction of several estimators. We analyze the finite-sample performance of these estimators with a simulation calibrated to data from an evaluation of the long-term effects of a poverty alleviation program.
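A common identification strategy in this setting, sketched here for orientation (it is standard background on surrogate indices rather than the paper's specific estimator), builds a surrogate index from the observational sample and averages it over the experimental arms:
\[
h(s, x) \;=\; \mathbb{E}\!\left[\,Y \mid S = s,\, X = x,\ \text{obs}\,\right],
\qquad
\hat{\tau} \;=\; \mathbb{E}\!\left[\,h(S, X) \mid W = 1,\ \text{exp}\,\right]
- \mathbb{E}\!\left[\,h(S, X) \mid W = 0,\ \text{exp}\,\right],
\]
where $Y$ is the long-term outcome, $S$ the short-term outcomes, $X$ covariates, and $W$ the experimental treatment indicator; validity rests on surrogacy and comparability assumptions linking the two samples.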
This report surveys advances in deep learning-based modeling techniques for the analysis and synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, the indoor scene datasets available for research in these areas, and notable works employing machine learning models for scene modeling tasks based on these representations. On the analysis side, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction, and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, while also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that needs to be developed to adapt to the data representation and the task setting in general. For each task, we provide a comprehensive summary of state-of-the-art works across different axes, such as the choice of data representation, backbone, evaluation metric, input, and output, providing an organized review of the literature. Towards the end, we discuss research directions with the potential to directly affect how users interact and engage with these virtual scene models, making them an integral part of the metaverse.
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks are typically represented in Euclidean domains. Nevertheless, an increasing number of power system applications collect data from non-Euclidean domains and represent them as graph-structured data with high-dimensional features and interdependency among nodes. The complexity of graph-structured data poses significant challenges to existing deep neural networks defined in Euclidean domains. Recently, many studies on extending deep neural networks to graph-structured data in power systems have emerged. In this paper, we present a comprehensive overview of graph neural networks (GNNs) in power systems. Specifically, several classical GNN paradigms (e.g., graph convolutional networks, graph recurrent neural networks, graph attention networks, graph generative networks, spatial-temporal graph convolutional networks, and hybrid forms of GNNs) are summarized, and key applications in power systems, such as fault diagnosis, power prediction, power flow calculation, and data generation, are reviewed in detail. Furthermore, the main issues and research trends in applying GNNs to power systems are discussed.
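As a reminder of the first paradigm listed above, a graph convolutional network layer propagates node features through the normalized adjacency structure (the standard formulation, recalled here for context):
\[
H^{(l+1)} \;=\; \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I,\quad \tilde{D}_{ii} = \sum_{j}\tilde{A}_{ij},
\]
where $A$ is the adjacency matrix of, e.g., the grid topology, $H^{(l)}$ contains the node (bus) features at layer $l$, $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a nonlinearity.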
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
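The learned perceptual distance in this family of metrics typically compares unit-normalized deep features channel-wise across layers and spatial positions, weighted per channel (notation follows the common presentation of such metrics):
\[
d(x, x_0) \;=\; \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
\left\lVert\, w_l \odot \left(\hat{y}^{\,l}_{hw} - \hat{y}^{\,l}_{0hw}\right) \right\rVert_2^2,
\]
where $\hat{y}^{\,l}$ and $\hat{y}^{\,l}_{0}$ are the unit-normalized activations of layer $l$ (with spatial size $H_l \times W_l$) for the reference and distorted patches, and $w_l$ are per-channel weights that can be learned from the human judgment data.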