Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., wrt. tempo, mood or genre). Previous works have proposed disentangled embedding spaces where subspaces representing specific, yet possibly correlated, attributes can be weighted to emphasize those attributes in downstream tasks. However, no research has been conducted into the independence of these subspaces, nor their manipulation, in order to retrieve tracks that are similar but different in a specific way. Here, we explore the manipulation of tempo in embedding spaces as a case-study towards this goal. We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. As this translation is specific to tempo it enables retrieval of tracks that are similar but have specifically different tempi. We show that such a function can be used as an efficient data augmentation strategy for both training of downstream tempo predictors, and improved nearest neighbor retrieval of properties largely independent of tempo.
In the realm of distributed systems tasked with managing and processing large-scale graph-structured data, optimizing graph partitioning stands as a pivotal challenge. The primary goal is to minimize communication overhead and runtime cost. However, alongside the computational complexity associated with optimal graph partitioning, a critical factor to consider is memory overhead. Real-world graphs often reach colossal sizes, making it impractical and economically unviable to load the entire graph into memory for partitioning. This is also a fundamental premise in distributed graph processing, where accommodating a graph with non-distributed systems is unattainable. Currently, existing streaming partitioning algorithms exhibit a skew-oblivious nature, yielding satisfactory partitioning results exclusively for specific graph types. In this paper, we propose a novel streaming partitioning algorithm, the Skewness-aware Vertex-cut Partitioner S5P, designed to leverage the skewness characteristics of real graphs for achieving high-quality partitioning. S5P offers high partitioning quality by segregating the graph's edge set into two subsets, head and tail sets. Following processing by a skewness-aware clustering algorithm, these two subsets subsequently undergo a Stackelberg graph game. Our extensive evaluations conducted on substantial real-world and synthetic graphs demonstrate that, in all instances, the partitioning quality of S5P surpasses that of existing streaming partitioning algorithms, operating within the same load balance constraints. For example, S5P can bring up to a 51% improvement in partitioning quality compared to the top partitioner among the baselines. Lastly, we showcase that the implementation of S5P results in up to an 81% reduction in communication cost and a 130% increase in runtime efficiency for distributed graph processing tasks on PowerGraph.
Recently, retrieval augmentation and tool augmentation have demonstrated a remarkable capability to expand the internal memory boundaries of language models (LMs) by providing external context. However, internal memory and external context inevitably clash, leading to knowledge conflicts within LMs. In this paper, we aim to interpret the mechanism of knowledge conflicts through the lens of information flow, and then mitigate conflicts by precise interventions at the pivotal point. We find there are some attention heads with opposite effects in the later layers, where memory heads can recall knowledge from internal memory, and context heads can retrieve knowledge from external context. Moreover, we reveal that the pivotal point at which knowledge conflicts emerge in LMs is the integration of inconsistent information flows by memory heads and context heads. Inspired by the insights, we propose a novel method called Pruning Head via PatH PatcHing (PH3), which can efficiently mitigate knowledge conflicts by pruning conflicting attention heads without updating model parameters. PH3 can flexibly control eight LMs to use internal memory ($\uparrow$ 44.0%) or external context ($\uparrow$ 38.5%). Moreover, PH3 can also improve the performance of LMs on open-domain QA tasks. We also conduct extensive experiments to demonstrate the cross-model, cross-relation, and cross-format generalization of our method.
Generative AI models are increasingly powering software applications, offering the capability to produce expressive content across varied contexts. However, unlike previous iterations of human-AI design, the emerging design process for generative capabilities primarily hinges on prompt engineering strategies. Given this fundamental shift in approach, our work aims to understand how collaborative software teams set up and apply design guidelines and values, iteratively prototype prompts, and evaluate prompts to achieve desired outcomes. We conducted design studies with 39 industry professionals, including designers, software engineers, and product managers. Our findings reveal a content-centric prototyping approach in which teams begin with the content they want to generate, then identify specific attributes, constraints, and values, and explore methods to give users the ability to influence and interact with those attributes. Based on associated challenges, such as the lack of model interpretability and overfitting the design to examples, we outline considerations for generative AI prototyping.
Image enhancement algorithms are very useful for real world computer vision tasks where image resolution is often physically limited by the sensor size. While state-of-the-art deep neural networks show impressive results for image enhancement, they often struggle to enhance real-world images. In this work, we tackle a real-world setting: inpainting of images from Dunhuang caves. The Dunhuang dataset consists of murals, half of which suffer from corrosion and aging. These murals feature a range of rich content, such as Buddha statues, bodhisattvas, sponsors, architecture, dance, music, and decorative patterns designed by different artists spanning ten centuries, which makes manual restoration challenging. We modify two different existing methods (CAR, HINet) that are based upon state-of-the-art (SOTA) super resolution and deblurring networks. We show that those can successfully inpaint and enhance these deteriorated cave paintings. We further show that a novel combination of CAR and HINet, resulting in our proposed inpainting network (ARIN), is very robust to external noise, especially Gaussian noise. To this end, we present a quantitative and qualitative comparison of our proposed approach with existing SOTA networks and winners of the Dunhuang challenge. One of the proposed methods HINet) represents the new state of the art and outperforms the 1st place of the Dunhuang Challenge, while our combination ARIN, which is robust to noise, is comparable to the 1st place. We also present and discuss qualitative results showing the impact of our method for inpainting on Dunhuang cave images.
This research presents a proof-of-concept prototype of an all-in-one mixed reality application platform, developed to investigate the needs and expectations of users from mixed reality systems. The study involved an extensive user study with 1,052 participants, including the collection of diaries from 6 users and conducting interviews with 51 participants to gain deeper insights into their experiences. The findings from the interviews revealed that directly porting current user flows into 3D environments was not well-received by the target users. Instead, users expressed a clear preference for alternative 3D interactions along with the continued use of 2D interfaces. This study provides insights for understanding user preferences and interactions in mixed reality systems, and design recommendations to facilitate the mass adoption of MR systems.
Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluating on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leading to LLMs being more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD necessitates only the sampled texts to detect data contamination, by identifying the peakedness of LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on the correction of LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves the average relative improvements of 21.8\%-30.2\% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect contamination caused by the variants of test data. TED significantly mitigates performance improvements up to 66.9\% attributed to data contamination across 24 settings and 21 contamination degrees. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on HumanEval benchmark.
Photo-realistic avatar is a modern term referring to the digital asset that represents a human in computer graphic advance systems such as video games and simulation tools. These avatars utilize the advances in graphic technologies on both software and hardware aspects. While photorealistic avatars are increasingly used in industrial simulations, representing human factors such as human workers internal states, remains a challenge. This article addresses this issue by introducing the concept of MetaStates which are the digitization and representation of the psychophysiological states of a human worker in the digital world. The MetaStates influence the physical representation and performance of a digital human worker while performing a task. To demonstrate this concept the study presents a development of a photorealistic avatar which is integrated into a simulated environment and enhanced with a multi-level graphical representation of different psychophysiological states. This approach represents a major step forward in the use of digital humans for industrial simulations, allowing companies to better leverage the benefits of the Industrial Metaverse in their daily operations and simulations while keeping human workers at the center of the system.
We present a novel dataset for the controlled composition of counterarguments designed for further applications in argument refining, mining, and evaluation. Our dataset constitutes enriched counter-arguments to posts in the Reddit ChangeMyView dataset that are integrated with evidence retrieved from high-quality sources and generated based on user preferences, adjusting the critical attributes of evidence and argument style. The resultant Counterfire corpus comprises arguments generated from GPT-3.5 turbo, Koala, and PaLM 2 models and two of their finetuned variants (N = 32,000). Model evaluation indicates strong paraphrasing abilities with evidence, albeit limited word overlap, while demonstrating high style integration (0.9682 for 'reciprocity'), showing the ability of LLM to assimilate diverse styles. Of all models, GPT-3.5 turbo showed the highest scores in argument quality evaluation, showing consistent accuracy (score >0.8). In further analyses, reciprocity-style counterarguments display higher counts in most categories, possibly indicating a more creatively persuasive use of evidence. In contrast, human-written counterarguments exhibited greater argumentative richness and diversity across categories. Despite human-written arguments being favored as the most persuasive in human evaluation, the 'No Style' generated text surprisingly exhibited the highest score, prompting further exploration and investigation on the trade-offs in generation for facts and style.
Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications. Many models have been proposed for KGC. They can be categorized into two main classes: triple-based and text-based approaches. Triple-based methods struggle with long-tail entities due to limited structural information and imbalanced entity distributions. Text-based methods alleviate this issue but require costly training for language models and specific finetuning for knowledge graphs, which limits their efficiency. To alleviate these limitations, in this paper, we propose KICGPT, a framework that integrates a large language model (LLM) and a triple-based KGC retriever. It alleviates the long-tail problem without incurring additional training overhead. KICGPT uses an in-context learning strategy called Knowledge Prompt, which encodes structural knowledge into demonstrations to guide the LLM. Empirical results on benchmark datasets demonstrate the effectiveness of KICGPT with smaller training overhead and no finetuning.
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.