In this paper, we develop upon the topic of loss function learning, an emergent meta-learning paradigm that aims to learn loss functions that significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to find a set of symbolic loss functions. Second, the set of learned loss functions is subsequently parameterized and optimized via unrolled differentiation. The versatility and performance of the proposed framework are empirically validated on a diverse set of supervised learning tasks. Results show that the learned loss functions bring improved convergence, sample efficiency, and inference performance on tabulated, computer vision, and natural language processing problems, using a variety of task-specific neural network architectures.
Continuing our analysis of quantum machine learning applied to our use-case of malware detection, we investigate the potential of quantum convolutional neural networks. More precisely, we propose a new architecture where data is uploaded all along the quantum circuit. This allows us to use more features from the data, hence giving to the algorithm more information, without having to increase the number of qubits that we use for the quantum circuit. This approach is motivated by the fact that we do not always have great amounts of data, and that quantum computers are currently restricted in their number of logical qubits.
In this paper, we explore the application of cognitive intelligence in legal knowledge, focusing on the development of judicial artificial intelligence. Utilizing natural language processing (NLP) as the core technology, we propose a method for the automatic construction of case knowledge graphs for judicial cases. Our approach centers on two fundamental NLP tasks: entity recognition and relationship extraction. We compare two pre-trained models for entity recognition to establish their efficacy. Additionally, we introduce a multi-task semantic relationship extraction model that incorporates translational embedding, leading to a nuanced contextualized case knowledge representation. Specifically, in a case study involving a "Motor Vehicle Traffic Accident Liability Dispute," our approach significantly outperforms the baseline model. The entity recognition F1 score improved by 0.36, while the relationship extraction F1 score increased by 2.37. Building on these results, we detail the automatic construction process of case knowledge graphs for judicial cases, enabling the assembly of knowledge graphs for hundreds of thousands of judgments. This framework provides robust semantic support for applications of judicial AI, including the precise categorization and recommendation of related cases.
In this paper, we formalize an optimization framework for analog beamforming in the context of monostatic integrated sensing and communication (ISAC), where we also address the problem of self-interference in the analog domain. As a result, we derive semidefinite programs to approach detection-optimal transmit and receive beamformers, and we devise a superiorized iterative projection algorithm to approximate them. Our simulations show that this approach outperforms the detection performance of well-known design techniques for ISAC beamforming, while it achieves satisfactory self-interference suppression.
While deep reinforcement learning (DRL) has attracted a rapidly growing interest in solving the problem of navigation without global maps, DRL typically leads to a mediocre navigation performance in practice due to the gap between the training scene and the actual test scene. To quantify the transferability of a DRL agent between the training and test scenes, this paper proposes a new transferability metric -- the scene similarity calculated using an improved image template matching algorithm. Specifically, two transferability performance indicators are designed including the global scene similarity that evaluates the overall robustness of a DRL algorithm and the local scene similarity that serves as a safety measure when a DRL agent is deployed without a global map. In addition, this paper proposes the use of a local map that fuses 2D LiDAR data with spatial information of both the agent and the destination as the DRL observation, aiming to improve the transferability of DRL navigation algorithms. With a wheeled robot as the case study platform, both simulation and real-world experiments are conducted in a total of 26 different scenes. The experimental results affirm the robustness of the local map observation design and demonstrate the strong correlation between the scene similarity metric and the success rate of DRL navigation algorithms.
In this paper, we investigate Bayesian model compression in federated learning (FL) to construct sparse models that can achieve both communication and computation efficiencies. We propose a decentralized Turbo variational Bayesian inference (D-Turbo-VBI) FL framework where we firstly propose a hierarchical sparse prior to promote a clustered sparse structure in the weight matrix. Then, by carefully integrating message passing and VBI with a decentralized turbo framework, we propose the D-Turbo-VBI algorithm which can (i) reduce both upstream and downstream communication overhead during federated training, and (ii) reduce the computational complexity during local inference. Additionally, we establish the convergence property for thr proposed D-Turbo-VBI algorithm. Simulation results show the significant gain of our proposed algorithm over the baselines in reducing communication overhead during federated training and computational complexity of final model.
In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks, which admit natural functional bilevel structures.
Data augmentations are known to improve robustness in speech-processing tasks. In this study, we summarize and compare different data augmentation strategies using S3PRL toolkit. We explore how HuBERT and wav2vec perform using different augmentation techniques (SpecAugment, Gaussian Noise, Speed Perturbation) for Phoneme Recognition (PR) and Automatic Speech Recognition (ASR) tasks. We evaluate model performance in terms of phoneme error rate (PER) and word error rate (WER). From the experiments, we observed that SpecAugment slightly improves the performance of HuBERT and wav2vec on the original dataset. Also, we show that models trained using the Gaussian Noise and Speed Perturbation dataset are more robust when tested with augmented test sets.
In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on the temporal difference learning. We introduce a set of Levy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches O(10^{-3}) in 100-dimensional experiments and O(10^{-4}) in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes Graph Convolutional Network (GCN) to generate node embeddings that predicts the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks.