A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV) to self-improve people diversity of LLMs by tapping into its diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.
Quantum Relative Entropy (QRE) programming is a recently popular and challenging class of convex optimization problems with significant applications in quantum computing and quantum information theory. We are interested in modern interior point (IP) methods based on optimal self-concordant barriers for the QRE cone. A range of theoretical and numerical challenges associated with such barrier functions and the QRE cones have hindered the scalability of IP methods. To address these challenges, we propose a series of numerical and linear algebraic techniques and heuristics aimed at enhancing the efficiency of gradient and Hessian computations for the self-concordant barrier function, solving linear systems, and performing matrix-vector products. We also introduce and deliberate about some interesting concepts related to QRE such as symmetric quantum relative entropy (SQRE). We also introduce a two-phase method for performing facial reduction that can significantly improve the performance of QRE programming. Our new techniques have been implemented in the latest version (DDS 2.2) of the software package DDS. In addition to handling QRE constraints, DDS accepts any combination of several other conic and non-conic convex constraints. Our comprehensive numerical experiments encompass several parts including 1) a comparison of DDS 2.2 with Hypatia for the nearest correlation matrix problem, 2) using DDS for combining QRE constraints with various other constraint types, and 3) calculating the key rate for quantum key distribution (QKD) channels and presenting results for several QKD protocols.
Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, the effect of explanation faithfulness and alignment on user trust and whether this effect differs among laypeople and domain experts is unclear. To investigate these questions, we conduct a user study with computer science students and doctors in three domain areas, controlling the laypeople and domain expert groups in each setting. The results indicate that laypeople base their trust in explanations on explanation faithfulness while domain experts base theirs on explanation alignment. To our knowledge, this work is the first to show that (1) different factors affect laypeople and domain experts' trust in post hoc explanations and (2) domain experts are subject to specific biases due to their expertise when interpreting post hoc explanations. By uncovering this phenomenon and exposing this cognitive bias, this work motivates the need to educate end users about how to properly interpret explanations and overcome their own cognitive biases, and motivates the development of simple and interpretable faithfulness metrics for end users. This research is particularly important and timely as post hoc explanations are increasingly being used in high-stakes, real-world settings such as medicine.
Federated learning (FL) emphasizes decentralized training by storing data locally and sending only model updates, underlining user privacy. Recently, a line of works on privacy attacks impairs user privacy by extracting sensitive training text from language models in the context of FL. Yet, these attack techniques face distinct hurdles: some work chiefly with limited batch sizes (e.g., batch size of 1), and others are easily detectable. This paper introduces an innovative approach that is challenging to detect, significantly enhancing the recovery rate of text in various batch-size settings. Building on fundamental gradient matching and domain prior knowledge, we enhance the attack by recovering the input of the Pooler layer of language models, which enables us to provide additional supervised signals at the feature level. Unlike gradient data, these signals do not average across sentences and tokens, thereby offering more nuanced and effective insights. We benchmark our method using text classification tasks on datasets such as CoLA, SST-2, and Rotten Tomatoes. Across different batch sizes and models, our approach consistently outperforms previous state-of-the-art results.
Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, the effect of explanation faithfulness and alignment on user trust and whether this effect differs among laypeople and domain experts is unclear. To investigate these questions, we conduct a user study with computer science students and doctors in three domain areas, controlling the laypeople and domain expert groups in each setting. The results indicate that laypeople base their trust in explanations on explanation faithfulness while domain experts base theirs on explanation alignment. To our knowledge, this work is the first to show that (1) different factors affect laypeople and domain experts' trust in post hoc explanations and (2) domain experts are subject to specific biases due to their expertise when interpreting post hoc explanations. By uncovering this phenomenon and exposing this cognitive bias, this work motivates the need to educate end users about how to properly interpret explanations and overcome their own cognitive biases, and motivates the development of simple and interpretable faithfulness metrics for end users. This research is particularly important and timely as post hoc explanations are increasingly being used in high-stakes, real-world settings such as medicine.
Diffusion models and large language models have emerged as leading-edge generative models, revolutionizing various aspects of human life. However, the practical implementations of these models have also exposed inherent risks, bringing to the forefront their evil sides and sparking concerns regarding their trustworthiness. Despite the wealth of literature on this subject, a comprehensive survey specifically delving into the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: 1) privacy, 2) security, 3) fairness, and 4) responsibility. Based on the investigation results, we develop an extensive map outlining the trustworthiness of large generative models. After that, we provide practical recommendations and potential research directions for future secure applications equipped with large generative models, ultimately promoting the trustworthiness of the models and benefiting the society as a whole.
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) achieves so by leveraging less training data needed by finetuning or distillation. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with much fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
Pre-trained Language Models (PLMs) which are trained on large text corpus via self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.
Existing recommender systems extract the user preference based on learning the correlation in data, such as behavioral correlation in collaborative filtering, feature-feature, or feature-behavior correlation in click-through rate prediction. However, regretfully, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, the recommender systems can recommend a battery charger to a user after buying a phone, in which the latter can serve as the cause of the former, and such a causal relation cannot be reversed. Recently, to address it, researchers in recommender systems have begun to utilize causal inference to extract causality, enhancing the recommender system. In this survey, we comprehensively review the literature on causal inference-based recommendation. At first, we present the fundamental concepts of both recommendation and causal inference as the basis of later content. We raise the typical issues that the non-causality recommendation is faced. Afterward, we comprehensively review the existing work of causal inference-based recommendation, based on a taxonomy of what kind of problem causal inference addresses. Last, we discuss the open problems in this important research area, along with interesting future works.
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.