Within a perception framework for autonomous mobile and robotic systems, semantic analysis of 3D point clouds, typically generated by LiDARs, is key to numerous applications such as object detection and recognition and scene reconstruction. Scene semantic segmentation can be achieved by directly integrating 3D spatial data with specialized deep neural networks. Although this type of data provides rich geometric information about the surrounding environment, it also presents numerous challenges: its unstructured and sparse nature, its unpredictable size, and its demanding computational requirements. These characteristics hinder real-time semantic analysis, particularly on the resource-constrained hardware architectures that constitute the main computational components of numerous robotic applications. In this paper, we therefore investigate various 3D semantic segmentation methodologies and analyze their performance and capabilities for resource-constrained inference on embedded NVIDIA Jetson platforms. For a fair comparison, we evaluate all methods under a standardized training protocol and identical data augmentations, and provide benchmark results on the Jetson AGX Orin and AGX Xavier series for two large-scale outdoor datasets: SemanticKITTI and nuScenes.
Reconfigurable intelligent surfaces (RISs) have emerged as one of the most studied topics in recent years, hailed as a transformative technology with the potential to revolutionize future wireless systems. While RISs are recognized for their ability to enhance spectral efficiency, coverage, and the reliability of wireless channels, several challenges remain; notably, convincing and profitable use cases must be developed before widespread commercial deployment can be realized. The first sixth-generation (6G) networks will most likely utilize the upper mid-band frequencies (i.e., 7-24\,GHz). This range is regarded as the \textit{golden band} since it combines good coverage, a large amount of new spectrum, and the ability to fit many antennas in compact form factors. There has been much prior work on channel modeling, coexistence, and possible implementation scenarios for these bands, but significant frequency-specific challenges related to RIS deployment, use cases, the number of required elements, channel estimation, and control remain unaddressed for the upper mid-band. In this paper, we aim to bridge this gap by exploring various use cases, including RIS-assisted fixed wireless access (FWA), enhanced capacity in mobile communications, and increased reliability at the cell edge. We identify the conditions under which RISs can provide major benefits and the optimal strategies for deploying them to enhance the performance of 6G upper mid-band communication systems.
Building LiDAR generative models holds promise as powerful data priors for restoration, scene manipulation, and scalable simulation in autonomous mobile robots. In recent years, approaches based on diffusion models have emerged, significantly improving training stability and generation quality. Despite this success, generating high-quality samples with diffusion models requires numerous iterations of neural network evaluation, and the resulting computational cost can pose a barrier to robotics applications. To address this challenge, this paper presents R2Flow, a fast and high-fidelity generative model for LiDAR data. Our method is based on rectified flows, which learn straight trajectories and thus simulate data generation in far fewer sampling steps than diffusion models. We also propose an efficient Transformer-based model architecture for processing the image representation of LiDAR range and reflectance measurements. Our experiments on unconditional generation with the KITTI-360 dataset demonstrate the effectiveness of our approach in terms of both efficiency and quality.
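To make the speed argument concrete, below is a minimal sketch of few-step sampling with a rectified flow, assuming a trained velocity network (velocity_net is a placeholder, not the actual R2Flow implementation): because the learned trajectories are close to straight, a handful of Euler steps along dx/dt = v(x, t) can replace the long iterative denoising chain of a diffusion model.

    # Minimal sketch of few-step rectified-flow sampling; velocity_net is an
    # assumed, already-trained model (illustrative, not the R2Flow code).
    import torch

    @torch.no_grad()
    def sample_rectified_flow(velocity_net, shape, num_steps=4, device="cpu"):
        """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
        x = torch.randn(shape, device=device)        # start from Gaussian noise
        dt = 1.0 / num_steps
        for i in range(num_steps):
            t = torch.full((shape[0],), i * dt, device=device)
            v = velocity_net(x, t)                   # predicted velocity field
            x = x + v * dt                           # straight-line Euler update
        return x                                     # e.g. a range/reflectance image

A diffusion sampler typically needs tens to hundreds of such network evaluations; the straight trajectories are what allow num_steps to stay small here.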
Artificial Neural Networks (ANNs) require significant amounts of data and computational resources to achieve high effectiveness in the tasks for which they are trained. To reduce resource demands, various techniques, such as Neuron Pruning, are applied. Due to the complex structure of ANNs, interpreting the behavior of hidden layers and the features they recognize in the data is challenging. A lack of comprehensive understanding of which information is utilized during inference can lead to inefficient use of available data, thereby lowering the overall performance of the models. In this paper, we introduce a method for integrating Topological Data Analysis (TDA) with Convolutional Neural Networks (CNNs) in the context of image recognition. This method significantly enhances the performance of neural networks by leveraging a broader range of information present in the data, enabling the model to make more informed and accurate predictions. Our approach, referred to as Vector Stitching, combines raw image data with additional topological information derived through TDA methods. This enables the neural network to train on an enriched dataset, incorporating topological features that might otherwise remain unexploited or go uncaptured by the network's inherent mechanisms. The results of our experiments highlight the potential of incorporating the results of additional data analysis into the network's inference process, yielding enhanced performance in pattern recognition tasks on digital images, particularly when using limited datasets. This work contributes to the development of methods for integrating TDA with deep learning and explores how concepts from Information Theory can explain the performance of such hybrid methods in practical implementation environments.
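As an illustration of the stitching idea, the sketch below concatenates raw pixel values with a crude topological summary (connected-component counts across sublevel-set thresholds); this simple descriptor stands in for the richer persistence-based features used in the paper, and all names and sizes are illustrative.

    # Sketch of "vector stitching": raw pixels concatenated with a topological
    # summary. The summary here is a coarse sublevel-set descriptor, a stand-in
    # for full persistence-based TDA features.
    import numpy as np
    from scipy import ndimage

    def topo_summary(img, thresholds=np.linspace(0.1, 0.9, 8)):
        """Count connected components of {pixels <= t} for each threshold t."""
        img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalise to [0, 1]
        return np.array([ndimage.label(img <= t)[1] for t in thresholds], float)

    def stitch(img):
        """Stitch raw pixel values and topological features into one input vector."""
        return np.concatenate([img.ravel(), topo_summary(img)])

    x = stitch(np.random.rand(28, 28))   # e.g. an MNIST-sized image
    print(x.shape)                        # (28*28 + 8,) -> fed to the model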
In the space sector, due to environmental conditions and restricted accessibility, robust fault detection methods are imperative for ensuring mission success and safeguarding valuable assets. This work proposes a novel approach leveraging Physics-Informed Real NVP neural networks, renowned for their ability to model complex and high-dimensional distributions, augmented with a self-supervised task based on the permutation of sensor data. The focus is on enhancing fault detection within satellite multivariate time series. The experiments involve various configurations, including pre-training with self-supervision, multi-task learning, and standalone self-supervised training. Results indicate significant performance improvements across all settings. In particular, employing only the self-supervised loss yields the best overall results, suggesting its efficacy in guiding the network to extract features relevant for fault detection. This study presents a promising direction for improving fault detection in space systems and warrants further exploration on other datasets and applications.
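For reference, a generic Real NVP affine coupling block is sketched below; this is the standard building block of such flows (not the paper's physics-informed variant or its self-supervised permutation head), shown only to indicate how a tractable log-likelihood, and hence a fault score, is obtained.

    # Generic Real NVP affine coupling block (standard construction, assumed here
    # for illustration): half the dimensions are rescaled and shifted using a
    # network conditioned on the other half, keeping the Jacobian triangular.
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * (dim - self.half)))

        def forward(self, x):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            s, t = self.net(x1).chunk(2, dim=-1)
            s = torch.tanh(s)                           # stabilise the log-scale
            z2 = x2 * torch.exp(s) + t
            log_det = s.sum(dim=-1)                     # log|det J| of the block
            return torch.cat([x1, z2], dim=-1), log_det

Stacking such blocks gives an exact log-likelihood for each time window; unusually low likelihood can then be flagged as a fault.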
Recent advancements in long-term time series forecasting (LTSF) have primarily focused on capturing cross-time and cross-variate (channel) dependencies within historical data. However, a critical aspect often overlooked by many existing methods is the explicit incorporation of \textbf{time-related features} (e.g., season, month, day of the week, hour, minute), which are essential components of time series data. The absence of this explicit time-related encoding limits the ability of current models to capture cyclical or seasonal trends and long-term dependencies, especially with limited historical input. To address this gap, we introduce a simple yet highly efficient module designed to encode time-related features, Time Stamp Forecaster (TimeSter), thereby enhancing the backbone's forecasting performance. By integrating TimeSter with a linear backbone, our model, TimeLinear, significantly improves the performance of a single linear projector, reducing MSE by an average of 23\% on benchmark datasets such as Electricity and Traffic. Notably, TimeLinear achieves these gains while maintaining exceptional computational efficiency, delivering results that are on par with or exceed state-of-the-art models, despite using a fraction of the parameters.
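A minimal sketch of the idea follows, assuming the official TimeLinear code differs in detail: embed the calendar indices of each future time step and add the resulting time-related term to the output of a per-variable linear projector of the history.

    # Illustrative sketch (not the official TimeLinear code): a small module that
    # embeds (month, day-of-week, hour, minute) indices of the forecast horizon
    # and adds them to a linear projection of the history.
    import torch
    import torch.nn as nn

    class TimeStampEncoder(nn.Module):
        def __init__(self, d_out, sizes=(12, 7, 24, 60)):
            super().__init__()
            self.embeds = nn.ModuleList(nn.Embedding(n, d_out) for n in sizes)

        def forward(self, stamps):                     # stamps: [B, horizon, 4] int64
            return sum(emb(stamps[..., i]) for i, emb in enumerate(self.embeds))

    class TimeLinearSketch(nn.Module):
        def __init__(self, seq_len, horizon, n_vars):
            super().__init__()
            self.proj = nn.Linear(seq_len, horizon)    # per-variable linear backbone
            self.time = TimeStampEncoder(d_out=n_vars)

        def forward(self, x, future_stamps):           # x: [B, seq_len, n_vars]
            base = self.proj(x.transpose(1, 2)).transpose(1, 2)   # [B, horizon, n_vars]
            return base + self.time(future_stamps)                # add time-related term

The time-stamp branch adds only a few embedding tables, which is consistent with the claim that the gains come at a negligible parameter and compute cost.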
Traditional neural networks employ fixed weights during inference, limiting their ability to adapt to changing input conditions, unlike biological neurons that adjust signal strength dynamically based on stimuli. This discrepancy between artificial and biological neurons constrains neural network flexibility and adaptability. To bridge this gap, we propose a novel framework for adaptive neural networks, where neuron weights are modeled as functions of the input signal, allowing the network to adjust dynamically in real-time. Importantly, we achieve this within the same traditional architecture of an Artificial Neural Network, maintaining structural familiarity while introducing dynamic adaptability. In our research, we apply Chebyshev polynomials as one of the many possible decomposition methods to achieve this adaptive weighting mechanism, with polynomial coefficients learned during training. Out of the 145 datasets tested, our adaptive Chebyshev neural network demonstrated a marked improvement over an equivalent MLP in approximately 8\% of cases, performing strictly better on 121 datasets. In the remaining 24 datasets, the performance of our algorithm matched that of the MLP, highlighting its ability to generalize standard neural network behavior while offering enhanced adaptability. As a generalized form of the MLP, this model seamlessly retains MLP performance where needed while extending its capabilities to achieve superior accuracy across a wide range of complex tasks. These results underscore the potential of adaptive neurons to enhance generalization, flexibility, and robustness in neural networks, particularly in applications with dynamic or non-linear data dependencies.
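One plausible realization of such input-dependent weights, assuming a Chebyshev expansion per input feature (the coefficient layout and the tanh squashing below are illustrative choices, not necessarily those of the paper):

    # Sketch of an adaptive linear layer whose weight on each input feature is a
    # Chebyshev polynomial of that feature: w_ij(x_j) = sum_k c_ijk * T_k(x_j).
    import torch
    import torch.nn as nn

    class ChebyshevAdaptiveLinear(nn.Module):
        def __init__(self, in_dim, out_dim, degree=4):
            super().__init__()
            self.coef = nn.Parameter(0.01 * torch.randn(out_dim, in_dim, degree + 1))
            self.bias = nn.Parameter(torch.zeros(out_dim))
            self.degree = degree

        def forward(self, x):                       # x: [B, in_dim]
            x = torch.tanh(x)                       # squash into the Chebyshev domain [-1, 1]
            T = [torch.ones_like(x), x]
            for _ in range(2, self.degree + 1):     # recurrence T_k = 2x*T_{k-1} - T_{k-2}
                T.append(2 * x * T[-1] - T[-2])
            T = torch.stack(T, dim=-1)              # [B, in_dim, degree+1]
            w = torch.einsum('oik,bik->boi', self.coef, T)   # input-dependent weights
            return torch.einsum('boi,bi->bo', w, x) + self.bias

With degree 0 the layer collapses to a standard linear map, which matches the claim that the model generalizes the MLP and can fall back to its behavior when adaptivity is not needed.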
Recurrent stochastic configuration networks (RSCNs) have shown great potential in modelling nonlinear dynamic systems with uncertainties. This paper presents an RSCN with hybrid regularization to enhance both the learning capacity and generalization performance of the network. Given a set of temporal data, the well-known least absolute shrinkage and selection operator (LASSO) is employed to identify the significant order variables. Subsequently, an improved RSCN with L2 regularization is introduced to approximate the residuals between the output of the target plant and the LASSO model. The output weights are updated in real-time through a projection algorithm, facilitating a rapid response to dynamic changes within the system. A theoretical analysis of the universal approximation property is provided, contributing to the understanding of the network's effectiveness in representing various complex nonlinear functions. Experimental results from a nonlinear system identification problem and two industrial predictive tasks demonstrate that the proposed method outperforms other models across all testing datasets.
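A simplified sketch of the two-stage hybrid-regularization idea is given below, with a plain ridge regression standing in for the L2-regularized RSCN that models the LASSO residuals (the data and hyperparameters are illustrative only).

    # Stage 1: LASSO selects significant lagged (order) variables.
    # Stage 2: an L2-regularised model fits the residuals of the LASSO model
    # (a ridge regression here, standing in for the RSCN used in the paper).
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    def make_lags(y, u, max_lag=5):
        """Build a design matrix of lagged outputs/inputs for a SISO series."""
        rows = [np.r_[y[t - max_lag:t][::-1], u[t - max_lag:t][::-1]]
                for t in range(max_lag, len(y))]
        return np.asarray(rows), y[max_lag:]

    y = np.sin(np.linspace(0, 20, 500)); u = np.cos(np.linspace(0, 20, 500))
    X, target = make_lags(y, u)

    lasso = Lasso(alpha=1e-3).fit(X, target)              # stage 1: sparse order selection
    selected = (np.flatnonzero(lasso.coef_)
                if np.any(lasso.coef_) else np.arange(X.shape[1]))
    residual = target - lasso.predict(X)                  # what the sparse model misses

    ridge = Ridge(alpha=1e-2).fit(X[:, selected], residual)   # stage 2: L2 model on residuals
    y_hat = lasso.predict(X) + ridge.predict(X[:, selected])  # combined prediction

In the paper, the second stage is additionally updated online via a projection algorithm so that the output weights track dynamic changes in the plant.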
With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is now readily available, offering researchers an opportunity to tackle current geoscience applications in a fresh way. Through the joint utilization of EO data, research on multimodal RS data fusion has made tremendous progress in recent years, yet traditional algorithms inevitably hit a performance bottleneck because they lack the ability to comprehensively analyse and interpret such strongly heterogeneous data. This non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing capabilities. Deep learning (DL), as a cutting-edge technology, has achieved remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvements over traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of the field. Several prevalent sub-fields of multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize valuable resources to support the development of multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage local geometry and update iteratively. Even though optimizing non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue: optimizers are widely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically used objectives are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.
Compared with cheap addition operations, multiplication operations have much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets), which trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between the filters and the input feature as the output response. The influence of this new similarity measure on the optimization of the neural network is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets that investigates the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 74.9\% Top-1 accuracy and 91.7\% Top-5 accuracy with ResNet-50 on the ImageNet dataset without any multiplications in the convolution layers.
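The output response stated above can be written down directly; the sketch below is a naive (unoptimized) reference implementation using unfolded patches, in which each output value is the negative $\ell_1$ distance between a filter and the corresponding input patch, so the forward pass relies on subtractions and additions rather than multiplications.

    # Naive reference implementation of the AdderNet response via unfolded patches
    # (illustrative only; the paper's optimized kernels and backward pass differ).
    import torch
    import torch.nn.functional as F

    def adder_conv2d(x, weight, stride=1, padding=1):
        """x: [B, C, H, W], weight: [O, C, k, k] -> [B, O, H', W'] of -l1 distances."""
        B, C, H, W = x.shape
        O, _, k, _ = weight.shape
        patches = F.unfold(x, k, stride=stride, padding=padding)   # [B, C*k*k, L]
        w = weight.view(O, -1)                                     # [O, C*k*k]
        # -sum |patch - filter| over the C*k*k dimension
        dist = -(patches.unsqueeze(1) - w.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)
        H_out = (H + 2 * padding - k) // stride + 1
        W_out = (W + 2 * padding - k) // stride + 1
        return dist.view(B, O, H_out, W_out)

    out = adder_conv2d(torch.randn(2, 3, 32, 32), torch.randn(16, 3, 3, 3))
    print(out.shape)   # torch.Size([2, 16, 32, 32])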