Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, KGD systems built for real applications inevitably face various kinds of real-world noise. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs are typically incomplete and might contain erroneous and outdated facts. Such real-world noise poses a challenge to the robustness of KGD systems and hinders their application in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples, which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework ensures the KGD model is aware of these two types of perturbations, so that it generates informative responses from the potentially noisy inputs found in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potential. Furthermore, we show that our method can generate better responses than comparison models in both the noisy and the few-shot settings.
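As a rough illustration of the contrastive objective described in the abstract above, the following minimal sketch computes an InfoNCE-style loss for one anchor sample with one positive and one negative sample. The abstract does not state the loss form, so the InfoNCE formulation, the cosine similarity, the temperature value, and the toy embedding vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): low when the anchor is closer
    to the positive than to every negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical embeddings: the positive stands for a sample with a
# semantic-irrelevant perturbation (meaning preserved), the negative for a
# sample with a semantic-relevant perturbation (meaning changed).
anchor = [1.0, 0.0, 0.2]
positive = [0.9, 0.1, 0.2]
negative = [0.0, 1.0, 0.1]

loss_good = info_nce(anchor, positive, [negative])  # roles as intended
loss_bad = info_nce(anchor, negative, [positive])   # roles swapped
print(loss_good < loss_bad)
```

Minimizing such a loss pulls the anchor toward its positive and pushes it away from its negatives, which is one common way to make a model sensitive to the two perturbation types.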
Influence functions have recently emerged as a tool for explaining deep neural models by quantifying how perturbing individual training instances might affect a test prediction. Our objectives in this paper are twofold. First, we incorporate influence functions as feedback into the model to improve its performance. Second, in a dataset extension exercise, we use influence functions to automatically identify data points that were initially `silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve model performance. To meet these objectives, we introduce InfFeed, which uses influence functions to compute the influential instances for a target instance. Toward the first objective, we adjust the label of the target instance based on the label(s) of its influencer(s). In doing so, InfFeed outperforms the state-of-the-art baselines (including LLMs) by a maximum macro F1-score margin of almost 4% for hate speech classification, 3.5% for stance classification, 3% for irony detection, and 2% for sarcasm detection. Toward the second objective, we show that manually re-annotating only those silver annotated data points in the extension set that have a negative influence can substantially improve model performance, bringing it very close to the scenario where all the data points in the extension set have gold labels. This greatly reduces the number of data points that need to be manually annotated: out of the silver annotated extension dataset, the influence function scheme picks out only ~1/1000 of the points as needing manual correction.
Despite the possibility to quickly compute reachable sets of large-scale linear systems, current methods are not yet widely applied by practitioners. The main reason for this is probably that current approaches are not push-button-capable and still require users to manually set crucial parameters, such as time step sizes and the accuracy of the used set representation -- these settings require expert knowledge. We present a generic framework to automatically find near-optimal parameters for reachability analysis of linear systems given a user-defined accuracy. To limit the computational overhead as much as possible, our methods tune all relevant parameters during runtime. We evaluate our approach on benchmarks from the ARCH competition as well as on random examples. Our results show that our new framework verifies the selected benchmarks faster than runs with manually tuned parameters and is an order of magnitude faster than genetic algorithms.
Machine Unlearning (MU) algorithms have become increasingly critical due to the need to comply with data privacy regulations. The primary objective of MU is to erase the influence of specific data samples on a given model without the need to retrain it from scratch. Accordingly, existing methods focus on maximizing user privacy protection. However, real-world web-based applications are subject to different degrees of privacy regulation. Exploring the full spectrum of trade-offs between privacy, model utility, and runtime efficiency is critical for practical unlearning scenarios. Furthermore, designing an MU algorithm with simple control of the aforementioned trade-offs is desirable but challenging due to their inherently complex interactions. To address these challenges, we present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU. The ConMU framework contains three integral modules: an important data selection module that reconciles runtime efficiency and model generalization, a progressive Gaussian mechanism module that balances privacy and model generalization, and an unlearning proxy that controls the trade-offs between privacy and runtime efficiency. Comprehensive experiments on various benchmark datasets demonstrate the robust adaptability of our control mechanism and its superiority over established unlearning methods. ConMU explores the full spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners to account for different real-world regulations. Source code available at: //github.com/guangyaodou/ConMU.
Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. Experimentally, we test our Score Entropy Discrete Diffusion models (SEDD) on standard language modeling tasks. For comparable model sizes, SEDD beats existing language diffusion paradigms (reducing perplexity by $25$-$75$\%) and is competitive with autoregressive models, in particular outperforming GPT-2. Furthermore, compared to autoregressive models, SEDD generates faithful text without requiring distribution annealing techniques like temperature scaling (around $6$-$8\times$ better generative perplexity than un-annealed GPT-2), can trade compute and quality (similar quality with $32\times$ fewer network evaluations), and enables controllable infilling (matching nucleus sampling quality while enabling other strategies besides left-to-right prompting).
Binary spatter code (BSC)-based hyperdimensional computing (HDC) is a highly error-resilient approximate computational paradigm suited for error-prone, emerging hardware platforms. In BSC HDC, the basic datatype is a hypervector, a typically large binary vector, where the size of the hypervector has a significant impact on the fidelity and resource usage of the computation. Typically, the hypervector size is dynamically tuned to deliver the desired accuracy; this process is time-consuming and often produces hypervector sizes that lack accuracy guarantees and produce poor results when reused for very similar workloads. We present Heim, a hardware-aware static analysis and optimization framework for BSC HD computations. Heim analytically derives the minimum hypervector size that minimizes resource usage and meets the target accuracy requirement. Heim guarantees the optimized computation converges to the user-provided accuracy target on expectation, even in the presence of hardware error. Heim deploys a novel static analysis procedure that unifies theoretical results from the neuroscience community to systematically optimize HD computations. We evaluate Heim against dynamic tuning-based optimization on 25 benchmark data structures. Given a 99% accuracy requirement, Heim-optimized computations achieve a 99.2%-100.0% median accuracy, up to 49.5% higher than dynamic tuning-based optimization, while achieving 1.15x-7.14x reductions in hypervector size compared to HD computations that achieve comparable query accuracy and finding parametrizations 30.0x-100167.4x faster than dynamic tuning-based approaches. We also use Heim to systematically evaluate the performance benefits of using analog CAMs and multiple-bit-per-cell ReRAM over conventional hardware, while maintaining iso-accuracy -- for both emerging technologies, we find usages where the emerging hardware imparts significant benefits.
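To make the role of the hypervector size concrete, here is a minimal sketch of the standard binary spatter code operations (XOR binding, majority-vote bundling, normalized Hamming distance). It is not Heim's static analysis; the function names, the seed, and the two sizes compared are assumptions chosen for illustration.

```python
import random

def random_hv(dim, rng):
    """Draw a dense random binary hypervector of length dim."""
    return [rng.randint(0, 1) for _ in range(dim)]

def bind(a, b):
    """Bind two hypervectors with element-wise XOR (self-inverse)."""
    return [x ^ y for x, y in zip(a, b)]

def bundle(hvs):
    """Bundle an odd number of hypervectors with a bit-wise majority vote."""
    half = len(hvs) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*hvs)]

def distance(a, b):
    """Normalized Hamming distance: 0.0 for identical vectors."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def member_vs_outsider(dim, seed=0):
    """Distance from a 3-way bundle to one of its members and to an
    unrelated hypervector. In expectation these are 0.25 and 0.5."""
    rng = random.Random(seed)
    a, b, c, outsider = (random_hv(dim, rng) for _ in range(4))
    s = bundle([a, b, c])
    return distance(a, s), distance(outsider, s)

# Small hypervectors give noisy distances; large ones concentrate
# around the expected values, at the cost of more storage and compute.
for dim in (64, 8192):
    d_member, d_outsider = member_vs_outsider(dim)
    print(dim, round(d_member, 3), round(d_outsider, 3))
```

The gap between the member and outsider distances is what a query relies on. Because the spread of each distance shrinks as the size grows, choosing the size amounts to trading resource usage against the chance of a wrong query result.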
We give a simple characterization of which functions can be computed deterministically by anonymous processes in dynamic networks, depending on the number of leaders in the network. In addition, we provide efficient distributed algorithms for computing all such functions assuming minimal or no knowledge about the network. Each of our algorithms comes in two versions: one that terminates with the correct output and a faster one that stabilizes on the correct output without explicit termination. Notably, these are the first deterministic algorithms whose running times scale linearly with both the number of processes and a parameter of the network which we call "dynamic disconnectivity" (meaning that our dynamic networks do not necessarily have to be connected at all times). We also provide matching lower bounds, showing that all our algorithms are asymptotically optimal for any fixed number of leaders. While most of the existing literature on anonymous dynamic networks relies on classical mass-distribution techniques, our work makes use of a recently introduced combinatorial structure called "history tree", also developing its theory in new directions. Among other contributions, our results make definitive progress on two popular fundamental problems for anonymous dynamic networks: leaderless Average Consensus (i.e., computing the mean value of input numbers distributed among the processes) and multi-leader Counting (i.e., determining the exact number of processes in the network). In fact, our approach unifies and improves upon several independent lines of research on anonymous networks, including Nedic et al., IEEE Trans. Automat. Contr. 2009; Olshevsky, SIAM J. Control Optim. 2017; Kowalski-Mosteiro, ICALP 2019, SPAA 2021; Di Luna-Viglietta, FOCS 2022.
The problem of minimizing the maximum of $N$ convex, Lipschitz functions plays a significant role in optimization and machine learning. A series of results address this problem, with the most recent one requiring $O(N\epsilon^{-2/3} + \epsilon^{-8/3})$ queries to a first-order oracle to compute an $\epsilon$-suboptimal point. Meanwhile, quantum algorithms for optimization are rapidly advancing, with speedups shown on many important optimization problems. In this paper, we conduct a systematic study of quantum algorithms and lower bounds for minimizing the maximum of $N$ convex, Lipschitz functions. On one hand, we develop quantum algorithms with an improved complexity bound of $\tilde{O}(\sqrt{N}\epsilon^{-5/3} + \epsilon^{-8/3})$. On the other hand, we prove that quantum algorithms must take $\tilde{\Omega}(\sqrt{N}\epsilon^{-2/3})$ queries to a first-order quantum oracle, showing that our dependence on $N$ is optimal up to poly-logarithmic factors.
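A short calculation, using only the two upper bounds quoted in the abstract above and ignoring the poly-logarithmic factors hidden by the tilde notation, indicates when the quantum bound is the smaller one. Both bounds share the term $\epsilon^{-8/3}$, so it suffices to compare the $N$-dependent terms:

\[
\frac{N\epsilon^{-2/3}}{\sqrt{N}\,\epsilon^{-5/3}} = \sqrt{N}\,\epsilon \ge 1
\iff N \ge \epsilon^{-2},
\qquad
\sqrt{N}\,\epsilon^{-5/3} \ge \epsilon^{-8/3}
\iff N \ge \epsilon^{-2},
\qquad
N\epsilon^{-2/3} \ge \epsilon^{-8/3}
\iff N \ge \epsilon^{-2}.
\]

Under this reading, for $N \le \epsilon^{-2}$ both bounds are dominated by the shared $\epsilon^{-8/3}$ term, while for $N \ge \epsilon^{-2}$ the quantum bound is smaller by a factor of about $\sqrt{N}\,\epsilon$. This compares upper bounds only and says nothing beyond what the stated complexities imply.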
Coding theory revolves around the incorporation of redundancy into transmitted symbols, computation tasks, and stored data to guard against adversarial manipulation. However, error correction in coding theory is contingent upon a strict trust assumption. In the context of computation and storage, it is required that honest nodes outnumber adversarial ones by a certain margin. Yet in several emerging real-world cases, particularly in decentralized blockchain-oriented applications, such assumptions are often unrealistic. Consequently, despite the important role of coding in addressing significant challenges within decentralized systems, its applicability remains constrained. Still, in decentralized platforms, a distinctive characteristic emerges, offering new avenues for secure coding beyond the constraints of conventional methods. In these scenarios, the adversary benefits when the legitimate decoder recovers the data, preferably with a high estimation error. This incentive motivates the adversary to act rationally, trying to maximize its gains. In this paper, we propose a game-theoretic formulation for coding, called the game of coding, that captures this unique dynamic, in which the adversary and the data collector (decoder) each have a utility function to optimize. The utility functions reflect the fact that both the data collector and the adversary are interested in increasing the chance of the data being recoverable by the data collector. Moreover, the utility functions express the interest of the data collector in estimating the input with lower estimation error, and the opposite interest of the adversary. As a first, still highly non-trivial step, we characterize the equilibrium of the game for the repetition code with a repetition factor of 2, for a wide class of utility functions with minimal assumptions.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal differs from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.