Event Temporal Relation Extraction (ETRE) aims to identify the temporal relationship between two events, which plays an important role in natural language understanding. Most previous works follow a single-label classification style, classifying an event pair into either a specific temporal relation (e.g., \textit{Before}, \textit{After}), or a special label \textit{Vague} when there may be multiple possible temporal relations between the pair. In our work, instead of directly making predictions on \textit{Vague}, we propose a multi-label classification solution for ETRE (METRE) to infer the possibility of each temporal relation independently, where we treat \textit{Vague} as the cases when there is more than one possible relation between two events. We design a speculation mechanism to explore the possible relations hidden behind \textit{Vague}, which enables the latent information to be used efficiently. Experiments on TB-Dense, MATRES and UDS-T show that our method can effectively utilize the \textit{Vague} instances to improve the recognition for specific temporal relations and outperforms most state-of-the-art methods.
Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, excelling in tasks related to these domains. However, their ability to handle direct speech-to-speech conversations remains notably constrained. These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. Our approach reduces latency and improves the model's native understanding of speech, paving the way for more efficient and natural real-time audio interactions. We also release a large-scale synthetic conversational dataset to facilitate further research.
Power words are terms that evoke strong emotional responses and significantly influence readers' behavior, playing a crucial role in fields like marketing, politics, and motivational writing. This study proposes a methodology for the automated detection and analysis of power words in persuasive text using a custom lexicon and the TextBlob library in Python. By identifying the presence and frequency of power words within a given text, we aim to classify and analyze their impact on sentiment and reader engagement. This research examines diverse datasets across various domains to provide insights into the effectiveness of power words, offering practical applications for content creators, advertisers, and policymakers.
Despite being widely applied due to their exceptional capabilities, Large Language Models (LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce targeted vulnerabilities into LLMs by poisoning training samples and full-parameter fine-tuning. However, this kind of backdoor attack is limited since they require significant computational resources, especially as the size of LLMs increases. Besides, parameter-efficient fine-tuning (PEFT) offers an alternative but the restricted parameter updating may impede the alignment of triggers with target labels. In this study, we first verify that backdoor attacks with PEFT may encounter challenges in achieving feasible performance. To address these issues and improve the effectiveness of backdoor attacks with PEFT, we propose a novel backdoor attack algorithm from weak to strong based on contrastive knowledge distillation (W2SAttack). Specifically, we poison small-scale language models through full-parameter fine-tuning to serve as the teacher model. The teacher model then covertly transfers the backdoor to the large-scale student model through contrastive knowledge distillation, which employs PEFT. Theoretical analysis reveals that W2SAttack has the potential to augment the effectiveness of backdoor attacks. We demonstrate the superior performance of W2SAttack on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models. Experimental results indicate success rates close to 100% for backdoor attacks targeting PEFT.
Earth Observation (EO) systems play a crucial role in achieving Sustainable Development Goals by collecting and analyzing vital global data through satellite networks. These systems are essential for tasks like mapping, disaster monitoring, and resource management, but they face challenges in processing and transmitting large volumes of EO data, especially in specialized fields such as agriculture and real-time disaster response. Domain-adapted Large Language Models (LLMs) provide a promising solution by facilitating data fusion between extensive EO data and semantic EO data. By improving integration and interpretation of diverse datasets, LLMs address the challenges of processing specialized information in agriculture and disaster response applications. This fusion enhances the accuracy and relevance of transmitted data. This paper presents a framework for semantic communication in EO satellite networks, aimed at improving data transmission efficiency and overall system performance through cognitive processing techniques. The proposed system employs Discrete-Task-Oriented Source-Channel Coding (DT-JSCC) and Semantic Data Augmentation (SA) to focus on relevant information while minimizing communication overhead. By integrating cognitive semantic processing and inter-satellite links, the framework enhances the analysis and transmission of multispectral satellite imagery, improving object detection, pattern recognition, and real-time decision-making. The introduction of Cognitive Semantic Augmentation (CSA) allows satellites to process and transmit semantic information, boosting adaptability to changing environments and application needs. This end-to-end architecture is tailored for next-generation satellite networks, such as those supporting 6G, and demonstrates significant improvements in efficiency and accuracy.
Tactile sensing plays a vital role in enabling robots to perform fine-grained, contact-rich tasks. However, the high dimensionality of tactile data, due to the large coverage on dexterous hands, poses significant challenges for effective tactile feature learning, especially for 3D tactile data, as there are no large standardized datasets and no strong pretrained backbones. To address these challenges, we propose a novel canonical representation that reduces the difficulty of 3D tactile feature learning and further introduces a force-based self-supervised pretraining task to capture both local and net force features, which are crucial for dexterous manipulation. Our method achieves an average success rate of 78% across four fine-grained, contact-rich dexterous manipulation tasks in real-world experiments, demonstrating effectiveness and robustness compared to other methods. Further analysis shows that our method fully utilizes both spatial and force information from 3D tactile data to accomplish the tasks. The videos can be viewed at //3dtacdex.github.io.
While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. We propose an alternative to this bottom-up approach inspired by design thinking: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance. We illustrate this with a relatively young subfield of XAI, Training Data Attribution (TDA). With the surge in TDA research and growing competition, the field risks repeating the same patterns of solutionism. We conducted a needfinding study with a diverse group of AI practitioners to identify potential user needs related to TDA. Through interviews (N=10) and a systematic survey (N=31), we uncovered new TDA tasks that are currently largely overlooked. We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.
Timeline Analysis (TA) plays a crucial role in Timeline Forensics (TF) within the field of Digital Forensics (DF). It focuses on examining and analyzing time-based digital artefacts, such as timestamps derived from event logs, file metadata, and other relevant data, to correlate events linked to cyber incidents and reconstruct their chronological sequence. Traditional tools often struggle to efficiently handle the large volume and variety of data generated during DF investigations and Incident Response (IR) processes. This paper introduces a novel framework, GenDFIR, which combines Rule-Based Artificial Intelligence (R-BAI) algorithms with Large Language Models (LLMs) to enhance and automate the TA process. The proposed approach consists of two key stages: (1) R-BAI is used to identify and select anomalous digital artefacts based on predefined rules. (2) The selected artefacts are then transformed into embeddings for processing by an LLM with the assistance of a Retrieval-Augmented Generation (RAG) agent. The LLM uses its capabilities to perform automated TA on the artefacts and predict potential incident outcomes. To validate the framework, we evaluated its performance, efficiency, and reliability. Several metrics were applied to simulated cyber incident scenarios, which were presented as forensic case documents. Our findings demonstrate the significant potential of integrating R-BAI and LLMs for TA. This innovative approach underscores the power of Generative AI (GenAI), particularly LLMs, and opens up new possibilities for advanced threat detection and incident reconstruction, marking a significant advancement in the field.
Knowledge Graph Embedding (KGE) aims to learn representations for entities and relations. Most KGE models have gained great success, especially on extrapolation scenarios. Specifically, given an unseen triple (h, r, t), a trained model can still correctly predict t from (h, r, ?), or h from (?, r, t), such extrapolation ability is impressive. However, most existing KGE works focus on the design of delicate triple modeling function, which mainly tells us how to measure the plausibility of observed triples, but offers limited explanation of why the methods can extrapolate to unseen data, and what are the important factors to help KGE extrapolate. Therefore in this work, we attempt to study the KGE extrapolation of two problems: 1. How does KGE extrapolate to unseen data? 2. How to design the KGE model with better extrapolation ability? For the problem 1, we first discuss the impact factors for extrapolation and from relation, entity and triple level respectively, propose three Semantic Evidences (SEs), which can be observed from train set and provide important semantic information for extrapolation. Then we verify the effectiveness of SEs through extensive experiments on several typical KGE methods. For the problem 2, to make better use of the three levels of SE, we propose a novel GNN-based KGE model, called Semantic Evidence aware Graph Neural Network (SE-GNN). In SE-GNN, each level of SE is modeled explicitly by the corresponding neighbor pattern, and merged sufficiently by the multi-layer aggregation, which contributes to obtaining more extrapolative knowledge representation. Finally, through extensive experiments on FB15k-237 and WN18RR datasets, we show that SE-GNN achieves state-of-the-art performance on Knowledge Graph Completion task and performs a better extrapolation ability.
Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.