Current state-of-the-art crowd navigation approaches are mainly deep reinforcement learning (DRL)-based. However, DRL-based methods suffer from the issues of generalization and scalability. To overcome these challenges, we propose a method that includes a Collision Probability (CP) in the observation space to give the robot a sense of the level of danger of the moving crowd to help the robot navigate safely through crowds with unseen behaviors. We studied the effects of changing the number of moving obstacles to pay attention during navigation. During training, we generated local waypoints to increase the reward density and improve the learning efficiency of the system. Our approach was developed using deep reinforcement learning (DRL) and trained using the Gazebo simulator in a non-cooperative crowd environment with obstacles moving at randomized speeds and directions. We then evaluated our model on four different crowd-behavior scenarios. The results show that our method achieved a 100% success rate in all test settings. We compared our approach with a current state-of-the-art DRL-based approach, and our approach has performed significantly better, especially in terms of social safety. Importantly, our method can navigate in different crowd behaviors and requires no fine-tuning after being trained once. We further demonstrated the crowd navigation capability of our model in real-world tests.
The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that involve replacing detail coefficients decomposed by discrete wavelet transform for reconstructing to generate new samples and expand the training set. Different generation methods are used to generate replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods.
Deep learning (DL)-based channel state information (CSI) feedback has shown promising potential to improve spectrum efficiency in massive MIMO systems. However, practical DL approaches require a sizeable CSI dataset for each scenario, and require large storage or updating bandwidth for multiple learned models. To overcome this costly barrier, we develop a solution for efficient training and deployment enhancement of DL-based CSI feedback by exploiting a lightweight translation model to cope with new CSI environments and by proposing novel dataset augmentation based on domain knowledge. Specifically, we first develop a deep unfolding CSI feedback network, SPTM2-ISTANet+, which employs spherical normalization to address the challenge of path loss variation. We also introduce an integration of a trainable measurement matrix and residual CSI recovery blocks within SPTM2-ISTANet+ to improve efficiency and accuracy. Using SPTM2-ISTANet+ as the anchor feedback model, we propose an efficient scenario-adaptive CSI feedback architecture. This new CSI-TransNet exploits a plug-in module for CSI translation consisting of a sparsity aligning function and lightweight DL module to reuse pretrained models in unseen environments. To work with small datasets, we propose a lightweight and general augmentation strategy based on domain knowledge. Test results demonstrate the efficacy and efficiency of the proposed solution for accurate CSI feedback given limited measurements for unseen CSI environments.
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. Using our framework, we obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization. To demonstrate the ability of our framework to accommodate general parameterization schemes, we provide its sample complexity when using shallow neural networks, show that it represents an improvement upon the previous best results, and empirically validate the effectiveness of our theoretical claims on classic control tasks.
Facial expression recognition (FER) is a crucial part of human-computer interaction. Existing FER methods achieve high accuracy and generalization based on different open-source deep models and training approaches. However, the performance of these methods is not always good when encountering practical settings, which are seldom explored. In this paper, we collected a new in-the-wild facial expression dataset for cross-domain validation. Twenty-three commonly used network architectures were implemented and evaluated following a uniform protocol. Moreover, various setups, in terms of input resolutions, class balance management, and pre-trained strategies, were verified to show the corresponding performance contribution. Based on extensive experiments on three large-scale FER datasets and our practical cross-validation, we ranked network architectures and summarized a set of recommendations on deploying deep FER methods in real scenarios. In addition, potential ethical rules, privacy issues, and regulations were discussed in practical FER applications such as marketing, education, and entertainment business.
Fish tracking is a key technology for obtaining movement trajectories and identifying abnormal behavior. However, it faces considerable challenges, including occlusion, multi-scale tracking, and fish deformation. Notably, extant reviews have focused more on behavioral analysis rather than providing a comprehensive overview of computer vision-based fish tracking approaches. This paper presents a comprehensive review of the advancements of fish tracking technologies over the past seven years (2017-2023). It explores diverse fish tracking techniques with an emphasis on fundamental localization and tracking methods. Auxiliary plugins commonly integrated into fish tracking systems, such as underwater image enhancement and re-identification, are also examined. Additionally, this paper summarizes open-source datasets, evaluation metrics, challenges, and applications in fish tracking research. Finally, a comprehensive discussion offers insights and future directions for vision-based fish tracking techniques. We hope that our work could provide a partial reference in the development of fish tracking algorithms.
Deep learning-empowered semantic communication is regarded as a promising candidate for future 6G networks. Although existing semantic communication systems have achieved superior performance compared to traditional methods, the end-to-end architecture adopted by most semantic communication systems is regarded as a black box, leading to the lack of explainability. To tackle this issue, in this paper, a novel semantic communication system with a shared knowledge base is proposed for text transmissions. Specifically, a textual knowledge base constructed by inherently readable sentences is introduced into our system. With the aid of the shared knowledge base, the proposed system integrates the message and corresponding knowledge from the shared knowledge base to obtain the residual information, which enables the system to transmit fewer symbols without semantic performance degradation. In order to make the proposed system more reliable, the semantic self-information and the source entropy are mathematically defined based on the knowledge base. Furthermore, the knowledge base construction algorithm is developed based on a similarity-comparison method, in which a pre-configured threshold can be leveraged to control the size of the knowledge base. Moreover, the simulation results have demonstrated that the proposed approach outperforms existing baseline methods in terms of transmitted data size and sentence similarity.
Modern data analytics workloads combine relational data processing with machine learning (ML). Most DBMS handle these workloads by offloading these ML operations to external specialized ML systems. While both DBMS and ML systems go to great lengths to optimize performance for their specific workloads, significant performance is lost when used in combination, due to data movement across system boundaries, conversions between incompatible internal data formats, and the lack of cross system optimizations. A key idea to remove these bottlenecks is to integrate existing data manipulation systems with ML systems by building a common intermediate layer (IR). Although this idea has been explored before (Weld, Delite), previous such attempts require significant re-engineering of prior systems and still fall short in achieving best-of-breed performance for individual tasks (e.g., SQL, Deep Learning). Specifically, they rely on re-implementing existing systems using a generic set of operators and fail to match best-of-breed individual performance due to the inability to recover high-level optimizations from this generic IR through compiler analysis. We present Flern, the first intermediate-layer integration between DB and ML systems that are best-of-breed individually, competitive with the best compiled query engines such as HyPer on comprehensive relational benchmarks (TPC-H) and competitive with TensorFlow and PyTorch in state-of-the-art ML models (e.g., DeepSpeech, SqueezeNet, Transformers) and also represents a new state-of-the-art for integration. A key realization is to architect intermediate layers based on generative programming capabilities, which preserves high-level contextual information for cross optimizations and enables the construction of a variety of complex structures and cross system optimizations with minimal effort.
Successfully training Physics Informed Neural Networks (PINNs) for highly nonlinear PDEs on complex 3D domains remains a challenging task. In this paper, PINNs are employed to solve the 3D incompressible Navier-Stokes (NS) equations at moderate to high Reynolds numbers for complex geometries. The presented method utilizes very sparsely distributed solution data in the domain. A detailed investigation on the effect of the amount of supplied data and the PDE-based regularizers is presented. Additionally, a hybrid data-PINNs approach is used to generate a surrogate model of a realistic flow-thermal electronics design problem. This surrogate model provides near real-time sampling and was found to outperform standard data-driven neural networks when tested on unseen query points. The findings of the paper show how PINNs can be effective when used in conjunction with sparse data for solving 3D nonlinear PDEs or for surrogate modeling of design spaces governed by them.
Security challenges for Cloud or Fog-based machine learning services pose several concerns. Securing the underlying Cloud or Fog services is essential, as successful attacks against these services, on which machine learning applications rely, can lead to significant impairments of these applications. Because the requirements for AI applications can also be different, we differentiate according to whether they are used in the Cloud or in a Fog Computing network. This then also results in different threats or attack possibilities. For Cloud platforms, the responsibility for security can be divided between different parties. Security deficiencies at a lower level can have a direct impact on the higher level where user data is stored. While responsibilities are simpler for Fog Computing networks, by moving services to the edge of the network, we have to secure them against physical access to the devices. We conclude by outlining specific information security requirements for AI applications.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.