We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates improvements in our new models over baseline responses; however, we find that some proxy signals can lead to more generations with undesirable properties as well. For example, optimizing for conversation length can lead to more controversial or unfriendly generations compared to the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.
This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained by using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time and frequency domain information. The model exhibited outstanding accuracy on training samples and demonstrated excellent generalization ability during validation, indicating its proficiency of generalization capability. On the test samples, the model achieved remarkable classification performance, with an overall accuracy exceeding 99.5%, and a false positive rate of less than 1% for normal status. The findings of this study provide essential support for the diagnosis and maintenance of bearing faults in wind turbine generators, with the potential to enhance the reliability and efficiency of wind power generation.
Large language models (LLMs) and generative artificial intelligence (GenAI) constitute paradigm shifts in cybersecurity that present hitherto unseen challenges as well as opportunities. In examining the state-of-the-art application of GenAI in cybersecurity, this work highlights how models like Google's Gemini and ChatGPT-4 potentially enhance security protocols, vulnerability assessment, and threat identification. Our research highlights the significance of a novel approach that employs LLMs to identify and eliminate sophisticated cyber threats. This paper presents a thorough assessment of LLMs' ability to produce important security insights, hence broadening the potential applications of AI-driven cybersecurity solutions. Our findings demonstrate the significance of GenAI in improving digital security. It offers recommendations for further investigations into the intricate relationship between cybersecurity requirements and artificial intelligence's potential.
A growing body of work in economics and computation focuses on the trade-off between implementability and simplicity in mechanism design. The goal is to develop a theory that not only allows to design an incentive structure easy to grasp for imperfectly rational agents, but also understand the ensuing limitations on the class of mechanisms that enforce it. In this context, the concept of OSP mechanisms has assumed a prominent role since they provably account for the absence of contingent reasoning skills, a specific cognitive limitation. For single-dimensional agents, it is known that OSP mechanisms need to use certain greedy algorithms. In this work, we introduce a notion that interpolates between OSP and SOSP, a more stringent notion where agents only plan a subset of their own future moves. We provide an algorithmic characterization of this novel class of mechanisms for single-dimensional domains and binary allocation problems, that precisely measures the interplay between simplicity and implementability. We build on this to show how mechanisms based on reverse greedy algorithms (a.k.a., deferred acceptance auctions) are algorithmically more robust to imperfectly rationality than those adopting greedy algorithms.
In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust against malicious client-side behaviors.
The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy to compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization. Generalization in deep learning is an extremely broad phenomenon, and therefore, it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue that the proposed approach is the most viable one on this basis.
In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).