Modern applications, such as autonomous vehicles, require deploying deep learning algorithms on resource-constrained edge devices for real-time image and video processing. However, there is limited understanding of the efficiency and performance of various object detection models on these devices. In this paper, we evaluate state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet). We deployed these models on popular edge devices like the Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and Jetson Orin Nano, collecting key performance metrics such as energy consumption, inference time, and Mean Average Precision (mAP). Our findings highlight that lower mAP models such as SSD MobileNet V1 are more energy-efficient and faster in inference, whereas higher mAP models like YOLOv8 Medium generally consume more energy and have slower inference, though with exceptions when accelerators like TPUs are used. Among the edge devices, Jetson Orin Nano stands out as the fastest and most energy-efficient option for request handling, despite having the highest idle energy consumption. These results emphasize the need to balance accuracy, speed, and energy efficiency when deploying deep learning models on edge devices, offering valuable guidance for practitioners and researchers selecting models and devices for their applications.
Achieving a provable exponential quantum speedup for an important machine learning task has been a central research goal since the seminal HHL quantum algorithm for solving linear systems and the subsequent quantum recommender systems algorithm by Kerenidis and Prakash. These algorithms were initially believed to be strong candidates for exponential speedups, but a lower bound ruling out similar classical improvements remained absent. In breakthrough work by Tang, it was demonstrated that this lack of progress in classical lower bounds was for good reasons. Concretely, she gave a classical counterpart of the quantum recommender systems algorithm, reducing the quantum advantage to a mere polynomial. Her approach is quite general and was named quantum-inspired classical algorithms. Since then, almost all the initially exponential quantum machine learning speedups have been reduced to polynomial via new quantum-inspired classical algorithms. From the current state-of-affairs, it is unclear whether we can hope for exponential quantum speedups for any natural machine learning task. In this work, we present the first such provable exponential separation between quantum and quantum-inspired classical algorithms. We prove the separation for the basic problem of solving a linear system when the input matrix is well-conditioned and has sparse rows and columns.
Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper we propose to use model pairs on open-set classification tasks for detecting backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures, having been trained independently and on different datasets. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
Rational design of next-generation functional materials relied on quantitative predictions of their electronic structures beyond single building blocks. First-principles quantum mechanical (QM) modeling became infeasible as the size of a material grew beyond hundreds of atoms. In this study, we developed a new computational tool integrating fragment-based graph neural networks (FBGNN) into the fragment-based many-body expansion (MBE) theory, referred to as FBGNN-MBE, and demonstrated its capacity to reproduce full-dimensional potential energy surfaces (FD-PES) for hierarchic chemical systems with manageable accuracy, complexity, and interpretability. In particular, we divided the entire system into basic building blocks (fragments), evaluated their single-fragment energies using a first-principles QM model and attacked many-fragment interactions using the structure-property relationships trained by FBGNNs. Our development of FBGNN-MBE demonstrated the potential of a new framework integrating deep learning models into fragment-based QM methods, and marked a significant step towards computationally aided design of large functional materials.
The emergence of distinct local mark behaviours is becoming increasingly common in the applications of spatial marked point processes. This dynamic highlights the limitations of existing global mark correlation functions in accurately identifying the true patterns of mark associations/variations among points as distinct mark behaviours might dominate one another, giving rise to an incomplete understanding of mark associations. In this paper, we introduce a family of local indicators of mark association (LIMA) functions for spatial marked point processes. These functions are defined on general state spaces and can include marks that are either real-valued or function-valued. Unlike global mark correlation functions, which are often distorted by the existence of distinct mark behaviours, LIMA functions reliably identify all types of mark associations and variations among points. Additionally, they accurately determine the interpoint distances where individual points show significant mark associations. Through simulation studies, featuring various scenarios, and four real applications in forestry, criminology, and urban mobility, we study spatial marked point processes in $\R^2$ and on linear networks with either real-valued or function-valued marks, demonstrating that LIMA functions significantly outperform the existing global mark correlation functions.
Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to an underwater data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities
The challenge of data scarcity hinders the application of deep learning in industrial surface defect classification (SDC), as it's difficult to collect and centralize sufficient training data from various entities in Industrial Internet of Things (IIoT) due to privacy concerns. Federated learning (FL) provides a solution by enabling collaborative global model training across clients while maintaining privacy. However, performance may suffer due to data heterogeneity-discrepancies in data distributions among clients. In this paper, we propose a novel personalized FL (PFL) approach, named Adversarial Federated Consensus Learning (AFedCL), for the challenge of data heterogeneity across different clients in SDC. First, we develop a dynamic consensus construction strategy to mitigate the performance degradation caused by data heterogeneity. Through adversarial training, local models from different clients utilize the global model as a bridge to achieve distribution alignment, alleviating the problem of global knowledge forgetting. Complementing this strategy, we propose a consensus-aware aggregation mechanism. It assigns aggregation weights to different clients based on their efficacy in global knowledge learning, thereby enhancing the global model's generalization capabilities. Finally, we design an adaptive feature fusion module to further enhance global knowledge utilization efficiency. Personalized fusion weights are gradually adjusted for each client to optimally balance global and local features. Compared with state-of-the-art FL methods like FedALA, the proposed AFedCL method achieves an accuracy increase of up to 5.67% on three SDC datasets.
Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input space margin is the exact score to detect non-robust samples and is intractable for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to use a model's logit margin as a score for identifying non-robust samples. Next, through comprehensive empirical analysis of various robustly trained models on CIFAR10 and CIFAR100 datasets, we show that they indicate high margin consistency with a strong correlation between their input space margins and the logit margins. Then, we show that we can effectively and confidently use the logit margin to detect brittle decisions with such models. Finally, we address cases where the model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to assess adversarial vulnerability in deployment scenarios efficiently.
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).