Automatic pavement crack detection is an important task to ensure the functional performances of pavements during their service life. Inspired by deep learning (DL), the encoder-decoder framework is a powerful tool for crack detection. However, these models are usually open-loop (OL) systems that tend to treat thin cracks as the background. Meanwhile, these models can not automatically correct errors in the prediction, nor can it adapt to the changes of the environment to automatically extract and detect thin cracks. To tackle this problem, we embed closed-loop feedback (CLF) into the neural network so that the model could learn to correct errors on its own, based on generative adversarial networks (GAN). The resulting model is called CrackCLF and includes the front and back ends, i.e. segmentation and adversarial network. The front end with U-shape framework is employed to generate crack maps, and the back end with a multi-scale loss function is used to correct higher-order inconsistencies between labels and crack maps (generated by the front end) to address open-loop system issues. Empirical results show that the proposed CrackCLF outperforms others methods on three public datasets. Moreover, the proposed CLF can be defined as a plug and play module, which can be embedded into different neural network models to improve their performances.
Auditory spatial attention detection (ASAD) is used to determine the direction of a listener's attention to a speaker by analyzing her/his electroencephalographic (EEG) signals. This study aimed to further improve the performance of ASAD with a short decision window (i.e., <1 s) rather than with long decision windows in previous studies. An end-to-end temporal attention network (i.e., TAnet) was introduced in this work. TAnet employs a multi-head attention (MHA) mechanism, which can more effectively capture the interactions among time steps in collected EEG signals and efficiently assign corresponding weights to those EEG time steps. Experiments demonstrated that, compared with the CNN-based method and recent ASAD methods, TAnet provided improved decoding performance in the KUL dataset, with decoding accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1% (0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s) with short decision windows (i.e., <1 s). As a new ASAD model with a short decision window, TAnet can potentially facilitate the design of EEG-controlled intelligent hearing aids and sound recognition systems.
We propose a new training algorithm, named DualFL (Dualized Federated Learning), for solving distributed optimization problems in federated learning. DualFL achieves communication acceleration for very general convex cost functions, thereby providing a solution to an open theoretical problem in federated learning concerning cost functions that may not be smooth nor strongly convex. We provide a detailed analysis for the local iteration complexity of DualFL to ensure the overall computational efficiency of DualFL. Furthermore, we introduce a completely new approach for the convergence analysis of federated learning based on a dual formulation. This new technique enables concise and elegant analysis, which contrasts the complex calculations used in existing literature on convergence of federated learning algorithms.
With the increasing demand for computing capability given limited resource and power budgets, it is crucial to deploy applications to customized accelerators like FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, they are limited in scope and capability to support sufficient FPGA-oriented optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an optimizing framework built on multi-level intermediate representation (MLIR). POM has several features which demonstrate its scope and capability of performance optimization. First, most HLS tools depend exclusively on a single-level IR to perform all the optimizations, introducing excessive information into the IR and making debugging an arduous task. In contrast, POM introduces three layers of IR to perform operations at suitable abstraction levels, streamlining the implementation and debugging process and exhibiting better flexibility, extensibility, and systematicness. Second, POM integrates the polyhedral model into MLIR, enabling advanced dependence analysis and various FPGA-oriented loop transformations. By representing nested loops with integer sets and maps, loop transformations can be conducted conveniently through manipulations on polyhedral semantics. Finally, to further relieve design effort, POM has a user-friendly programming interface (DSL) that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine is provided to search for high-performance optimization schemes efficiently and generate optimized accelerators automatically. Experimental results show that POM achieves a $6.46\times$ average speedup on typical benchmark suites and a $6.06\times$ average speedup on real-world applications compared to the state-of-the-art.
Learning precise representations of users and items to fit observed interaction data is the fundamental task of collaborative filtering. Existing studies usually infer entangled representations to fit such interaction data, neglecting to model the diverse matching relationships between users and items behind their interactions, leading to limited performance and weak interpretability. To address this problem, we propose a Dual Disentangled Variational AutoEncoder (DualVAE) for collaborative recommendation, which combines disentangled representation learning with variational inference to facilitate the generation of implicit interaction data. Specifically, we first implement the disentangling concept by unifying an attention-aware dual disentanglement and disentangled variational autoencoder to infer the disentangled latent representations of users and items. Further, to encourage the correspondence and independence of disentangled representations of users and items, we design a neighborhood-enhanced representation constraint with a customized contrastive mechanism to improve the representation quality. Extensive experiments on three real-world benchmarks show that our proposed model significantly outperforms several recent state-of-the-art baselines. Further empirical experimental results also illustrate the interpretability of the disentangled representations learned by DualVAE.
Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies like entire missing components. Addressing this, we present a novel framework that enhances the capability of diffusion models, by extending the previous introduced implicit conditioning approach Meng et al. (2022) in three significant ways. First, we incorporate a dynamic step size computation that allows for variable noising steps in the forward process guided by an initial anomaly prediction. Second, we demonstrate that denoising an only scaled input, without any added noise, outperforms conventional denoising process. Third, we project images in a latent space to abstract away from fine details that interfere with reconstruction of large missing components. Additionally, we propose a fine-tuning mechanism that facilitates the model to effectively grasp the nuances of the target domain. Our method undergoes rigorous evaluation on two prominent anomaly detection datasets VISA and BTAD, yielding state-of-the-art performance. Importantly, our framework effectively localizes anomalies regardless of their scale, marking a pivotal advancement in diffusion-based anomaly detection.
Interactive user interfaces have increasingly explored AI's role in enhancing communication efficiency and productivity in collaborative tasks. AI tools such as chatbots and smart replies aim to enhance conversation quality and improve team performance. Early AI assistants, were limited by predefined knowledge bases and decision trees. However, the advent of Large Language Models (LLMs) such as ChatGPT has revolutionized AI assistants, employing advanced deep learning architecture to generate context-aware, coherent, and personalized responses. Consequently, ChatGPT-based AI assistants provide a more natural and efficient user experience across various tasks and domains. In this paper, we study how LLM models such as ChatGPT can be used to improve work efficiency in collaborative workplaces. Specifically, we present an LLM-based Smart Reply (LSR) system utilizing the ChatGPT to generate personalized responses in daily collaborative scenarios, while adapting to context and communication style based on prior responses. Our two-step process involves generating a preliminary response type (e.g., Agree, Disagree) to provide a generalized direction for message generation, thus reducing response drafting time. We conducted an experiment in which participants completed simulated work tasks, involving a Dual N-back test and subtask scheduling through Google Calendar while interacting with researchers posing as co-workers. Our findings indicate that the proposed LSR reduces overall workload, as measured by the NASA TLX, and improves work performance and productivity in the N-back task. We also provide qualitative feedback on participants' experiences as well as design recommendations so as to provide future directions for the design of these technologies.
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
Generative commonsense reasoning which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph augmented pre-trained language generation model KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage the graph attention to aggregate the rich concept semantics that enhances the model generalization on unseen concept sets. Experiments on benchmark CommonGen dataset verify the effectiveness of our proposed approach by comparing with several strong pre-trained language generation models, particularly KG-BART outperforms BART by 5.80, 4.60, in terms of BLEU-3, 4. Moreover, we also show that the generated context by our model can work as background scenarios to benefit downstream commonsense QA tasks.
To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded. Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations --- which connect two items with one or multiple linked attributes --- are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly modeling them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism.
The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively.