Monocular Re-Localization (MRL) is a critical component in numerous autonomous applications; it estimates the 6-degree-of-freedom pose of a single monocular image with respect to a scene map. In recent decades, significant progress has been made in the development of MRL techniques, and numerous landmark algorithms have achieved extraordinary success in terms of localization accuracy and robustness against visual interference. In MRL research, scene maps are represented in various forms, and they determine how MRL methods work and even how well they perform. However, to the best of our knowledge, existing surveys do not provide systematic reviews of MRL from the perspective of the map. This survey fills the gap by comprehensively reviewing MRL methods that employ monocular cameras as the main sensors, promoting further research. 1) We commence by delving into the problem definition of MRL and exploring current challenges, while also comparing our survey with previously published ones. 2) MRL methods are then categorized into five classes according to the representation form of the utilized map, i.e., geo-tagged frames, visual landmarks, point clouds, vectorized semantic maps, and learnt implicit maps, and we review the milestone MRL works of each category. 3) To quantitatively and fairly compare MRL methods with various maps, we also review some public datasets and report the performance of some typical MRL methods. The strengths and weaknesses of the different types of MRL methods are analyzed. 4) We finally introduce some topics of interest in this field and give our personal opinions. This survey can serve as valuable reference material for newcomers and researchers interested in MRL, and a continuously updated summary of this survey, including reviewed papers and datasets, is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc.
Emulating chip functionality before silicon production is crucial, especially with the increasing prevalence of RISC-V-based designs. FPGAs are promising candidates for this purpose due to their high speed and reconfigurable architecture. In this paper, we introduce Makinote, an FPGA-based cluster platform hosted at the Barcelona Supercomputing Center (BSC-CNS), which is composed of a large number of FPGAs (96 AMD/Xilinx Alveo U55C in total) to emulate massive RTL designs (up to 750M ASIC cells). In addition, we introduce our FPGA shell, a powerful tool that facilitates the utilization of such a large FPGA cluster with minimal effort from the designers. The proposed FPGA shell provides an easy-to-use interface that lets RTL developers rapidly port their designs onto several FPGAs by automatically connecting them to the necessary interfaces, e.g., PCIe Gen4, DRAM (DDR4 and HBM), and 10G/100G Ethernet. Moreover, specific drivers for exploiting RISC-V-based architectures are provided within the set of tools associated with the FPGA shell. We release the tool online for further extensions. We validate the efficiency of our hardware platform (i.e., the FPGA cluster) and software tool (i.e., the FPGA shell) by emulating a RISC-V processor and running the HPC Challenge benchmark across 32 FPGAs. Our results demonstrate that performance improves by 8x over the single-FPGA case.
The Traveling Salesman Problem (TSP) is one of the most extensively researched and widely applied combinatorial optimization problems. It is NP-hard even in the symmetric and metric case. Building upon elaborate research, state-of-the-art exact solvers such as CONCORDE can solve TSP instances with several tens of thousands of vertices. A key ingredient of these integer programming approaches is fast heuristics for finding a good initial solution, in particular the Lin-Kernighan-Helsgaun (LKH) heuristic. For instances with a few hundred vertices, heuristics like LKH often find an optimal solution. In this work we develop variations of LKH that perform significantly better on large instances. LKH repeatedly improves an initially random tour by exchanging edges along alternating circles. In doing so, it respects several criteria designed to quickly find alternating circles that yield a feasible improvement of the tour. Among those criteria, the positive gain criterion has remained mostly untouched in previous research. It requires that, while an alternating circle is being constructed, the total gain must be positive after each pair of edges. We carefully relax this criterion, leading to improvement steps hitherto undiscovered by LKH. We confirm this improvement experimentally via extensive simulations on various TSP benchmark libraries. Our computational study shows that for large instances our method is on average 13% faster than the latest version of LKH.
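To make the positive gain criterion concrete, the following Python sketch shows how a Lin-Kernighan-style search typically prunes its search tree; the function and the `dist` callback are illustrative assumptions, not taken from LKH's actual code.

```python
def extend_circle(dist, partial_gain, candidate_pairs):
    """Illustrative pruning with the positive gain criterion.

    partial_gain is the running sum g_1 + ... + g_i, where each
    g_j = dist(removed edge) - dist(added edge).  Classic LKH only
    extends the alternating circle while this sum stays strictly
    positive; the relaxation studied in the paper admits some
    non-positive intermediate sums.
    """
    extensions = []
    for removed, added in candidate_pairs:
        gain = dist(*removed) - dist(*added)
        if partial_gain + gain > 0:  # the positive gain criterion
            extensions.append((removed, added, partial_gain + gain))
    return extensions
```

Relaxing the `> 0` test (e.g., allowing a bounded number of non-positive prefixes) enlarges the search space, which is how the hitherto undiscovered improvement steps become reachable.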
Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving partial differential equations (PDEs) in various scientific and engineering domains. However, traditional PINN architectures typically rely on fully connected multilayer perceptrons (MLPs), lacking the sparsity and modularity inherent in many traditional numerical solvers. This study investigates a novel approach that merges established PINN methodologies with brain-inspired neural network techniques to address this architectural limitation. We adopt Brain-Inspired Modular Training (BIMT), which draws on concepts such as locality, sparsity, and modularity inspired by the organization of the brain. Through BIMT, we demonstrate the evolution of PINN architectures from fully connected structures to highly sparse and modular forms, resulting in reduced computational complexity and memory requirements. We showcase the efficacy of this approach by solving differential equations with varying spectral components, revealing insights into the spectral bias phenomenon and its impact on neural network architecture. Moreover, we derive basic PINN building blocks through BIMT training on simple problems, akin to convolutional and attention modules in deep neural networks, enabling the construction of modular PINN architectures. Our experiments show that these modular architectures offer improved accuracy compared to traditional fully connected MLP PINNs, demonstrating their potential for enhancing PINN performance while reducing computational overhead. Overall, this study contributes to advancing the understanding and development of efficient and effective neural network architectures for solving PDEs, bridging the gap between PINNs and traditional numerical methods.
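As a rough illustration of the idea (not the paper's exact procedure; BIMT additionally permutes neurons to shorten connections, a step omitted here), a locality-weighted L1 penalty can be added to an ordinary PINN loss so that long-range weights are penalized more and the network is driven toward sparse, modular structure. All names and the toy problem $u''(x) = -\sin(x)$ are our own choices:

```python
import torch
import torch.nn as nn

class TinyPINN(nn.Module):
    """Small MLP for u(x); layer sizes are illustrative."""
    def __init__(self, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x)

def bimt_penalty(model, lam=1e-3):
    """Locality-weighted L1 penalty in the spirit of BIMT: weights
    connecting 'distant' neurons (by normalized index) cost more."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            out_f, in_f = m.weight.shape
            i = torch.arange(out_f).unsqueeze(1).float() / max(out_f - 1, 1)
            j = torch.arange(in_f).unsqueeze(0).float() / max(in_f - 1, 1)
            dist = (i - j).abs() + 1.0  # simple 1-D layout distance
            penalty = penalty + (m.weight.abs() * dist).sum()
    return lam * penalty

def pinn_loss(model, x):
    """PDE residual of u''(x) = -sin(x) via autograd, plus the penalty."""
    x = x.requires_grad_(True)
    u = model(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return ((d2u + torch.sin(x)) ** 2).mean() + bimt_penalty(model)
```

Training minimizes this loss (plus boundary terms); as long-range weights shrink, pruning them yields the sparse, modular PINN architectures described above.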
Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. LLMs are mainly utilized in a prompt-based zero/few-shot paradigm that guides the model in accomplishing the task. GPT-based models are among the most popular ones studied for tasks such as code comment generation or test generation, which are `generative' tasks. However, there is limited research on the use of LLMs for `non-generative' tasks such as classification in the prompt-based paradigm. In this preliminary exploratory study, we investigate the applicability of LLMs to Code Clone Detection (CCD), a non-generative task. After building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigate two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting, and then conduct an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD, attaining an F1-score of 0.877, and achieves performance comparable to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Moreover, the prompt and the difficulty level of the problems have an impact on the performance of ChatGPT. Finally, we provide insights and future directions based on our initial analysis.
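For illustration only, a zero-shot CCD prompt of the kind the study describes might be assembled as follows; the exact wording used in the paper may differ:

```python
def build_ccd_prompt(code_a: str, code_b: str) -> str:
    """Assemble an illustrative zero-shot prompt for clone detection.

    Type-4 clones are functionally equivalent snippets that may share
    no syntactic similarity, so the prompt asks about functionality.
    """
    return (
        "Do the following two code snippets implement the same "
        "functionality (i.e., are they Type-4 code clones)? "
        "Answer yes or no.\n\n"
        f"Snippet 1:\n{code_a}\n\n"
        f"Snippet 2:\n{code_b}\n"
    )
```

The model's yes/no answer is then compared against the dataset labels to compute the F1-scores reported above.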
We present a numerical discretisation of the coupled moment systems, previously introduced in Dahm and Helzel, which approximate the kinetic multi-scale model by Helzel and Tzavaras for sedimentation in suspensions of rod-like particles, for a two-dimensional flow problem and a shear flow problem. We use a splitting ansatz which, during each time step, separately computes the update of the macroscopic flow equation and of the moment system. The proof of the hyperbolicity of the moment systems in \cite{Dahm} suggests solving the moment systems with standard numerical methods for hyperbolic problems, such as LeVeque's Wave Propagation Algorithm \cite{LeV}. The number of moment equations used in the hyperbolic moment system can be adapted to locally varying flow features. We propose an error analysis which compares the approximation with $2N+1$ moment equations to an approximation with $2N+3$ moment equations. This analysis suggests an error indicator that can be computed from the numerical approximation of the moment system with $2N+1$ moment equations. In order to use moment approximations with a different number of moment equations in different parts of the computational domain, we consider an interface coupling of moment systems with different resolutions. Finally, we derive a conservative high-resolution Wave Propagation Algorithm for solving moment systems with different numbers of moment equations.
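Schematically, and purely as an illustration of the idea rather than the paper's exact indicator, comparing the two resolutions suggests an estimate of the form
\[
\eta_{2N+1} \;\approx\; \bigl\| u^{(2N+3)} - u^{(2N+1)} \bigr\|,
\]
and, because the moment expansions are hierarchical, the dominant contribution to this difference can be estimated from the highest-order moments already present in the $2N+1$ approximation, so that an indicator of this kind is computable without ever solving the larger system.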
Hyperspectral Imaging (HSI) is used in a wide range of applications such as remote sensing. However, transmitting HS images over communication data links is challenging due to the large number of spectral bands the images contain, together with the limited data bandwidth available in real applications. Compressive Sensing reduces the images by randomly subsampling the spectral bands of each spatial pixel and then reconstructs all the bands using recovery algorithms that impose sparsity in a certain transform domain. Since the image pixels are not strictly sparse, this work studies a data sparsification pre-processing stage, applied prior to compression, to ensure the sparsity of the pixels. The sparsified images are compressed $2.5\times$ and then recovered using the Generalized Orthogonal Matching Pursuit (gOMP) algorithm, which is characterized by high accuracy, low computational requirements, and fast convergence. The experiments are performed on five conventional hyperspectral images, studying the effect of different sparsification levels on the quality of both the uncompressed and the recovered images. We conclude that the gOMP algorithm reconstructs the hyperspectral images with higher accuracy and faster convergence when the pixels are highly sparsified, albeit at the expense of reduced quality of the recovered images with respect to the originals.
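A minimal NumPy sketch of the gOMP recovery loop follows; the selection count, stopping tolerance, and function names are illustrative assumptions, not the implementation used in the experiments. gOMP generalizes OMP by selecting several atoms per iteration, which is what gives it its fast convergence:

```python
import numpy as np

def gomp(Phi, y, sparsity, n_select=2, max_iter=None):
    """Recover a sparse x from y = Phi @ x (illustrative gOMP)."""
    m, n = Phi.shape
    residual = y.copy()
    support = []
    max_iter = max_iter or sparsity
    for _ in range(max_iter):
        # Select the n_select atoms most correlated with the residual.
        corr = np.abs(Phi.T @ residual)
        new = np.argsort(corr)[-n_select:]
        support = sorted(set(support) | set(new.tolist()))
        # Least-squares fit on the current support, then update residual.
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ x_s
        if len(support) >= sparsity or np.linalg.norm(residual) < 1e-9:
            break
    x = np.zeros(n)
    x[support] = x_s
    return x
```

In the compressive-sensing setting above, `Phi` plays the role of the random band-subsampling operator composed with the sparsifying transform, and the recovered coefficients are transformed back to reconstruct each pixel's spectrum.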
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in areas such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where relying solely on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over the past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing the background, formally defining various 3D multi-modal tasks and summarizing their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.
Keeping pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Recently, Mutual Information (MI) has attracted attention for bounding the generalization error of Deep Neural Networks (DNNs). However, it is intractable to accurately estimate the MI in DNNs, so most previous works have to relax the MI bound, which in turn weakens the information-theoretic explanation for generalization. To address this limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information-theoretic explanation for generalization and derive a tighter generalization bound than state-of-the-art relaxations.
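For context, the classical input-output MI bound that this line of work builds on (Xu and Raginsky, 2017) states that, for a loss that is $\sigma$-sub-Gaussian, the expected generalization gap of a learning algorithm with weights $W$ trained on a sample $S$ of size $n$ satisfies
\[
\bigl|\mathbb{E}[\mathrm{gen}(S, W)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)},
\]
so a more accurate estimate of $I(S;W)$ directly yields a tighter bound than the relaxations mentioned above.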
Graph Neural Networks (GNNs) have been studied through the lenses of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
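As a generic illustration of the kind of guarantee involved (the paper's precise assumptions and rates differ), gradient flow on the squared loss of a linearized model $f(\theta) \approx f(\theta_0) + \nabla f(\theta_0)(\theta - \theta_0)$ contracts at a linear rate,
\[
\mathcal{L}(t) \;\le\; e^{-2\lambda_{\min}(G)\, t}\, \mathcal{L}(0), \qquad G = \nabla f(\theta_0)\, \nabla f(\theta_0)^{\top},
\]
so convergence speed is governed by the smallest eigenvalue of the Gram matrix $G$, which for GNNs is shaped by the graph structure, depth, and skip connections studied here.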