This paper presents a data-driven, task-specific paradigm for experimental design, to shorten acquisition time, reduce costs, and accelerate the deployment of imaging devices. Current approaches in experimental design focus on model-parameter estimation and require specification of a particular model, whereas in imaging, other tasks may drive the design. Furthermore, such approaches often lead to intractable optimization problems in real-world imaging applications. Here we present a new paradigm for experimental design that simultaneously optimizes the design (set of image channels) and trains a machine-learning model to execute a user-specified image-analysis task. The approach obtains data densely-sampled over the measurement space (many image channels) for a small number of acquisitions, then identifies a subset of channels of prespecified size that best supports the task. We propose a method: TADRED for TAsk-DRiven Experimental Design in imaging, to identify the most informative channel-subset whilst simultaneously training a network to execute the task given the subset. Experiments demonstrate the potential of TADRED in diverse imaging applications: several clinically-relevant tasks in magnetic resonance imaging; and remote sensing and physiological applications of hyperspectral imaging. Results show substantial improvement over classical experimental design, two recent application-specific methods within the new paradigm, and state-of-the-art approaches in supervised feature selection. We anticipate further applications of our approach. Code is available: //github.com/sbb-gh/experimental-design-multichannel
In recommender systems, multi-behavior methods have demonstrated their effectiveness in mitigating issues like data sparsity, a common challenge in traditional single-behavior recommendation approaches. These methods typically infer user preferences from various auxiliary behaviors and apply them to the target behavior for recommendations. However, this direct transfer can introduce noise to the target behavior in recommendation, due to variations in user attention across different behaviors. To address this issue, this paper introduces a novel approach, Behavior-Contextualized Item Preference Modeling (BCIPM), for multi-behavior recommendation. Our proposed Behavior-Contextualized Item Preference Network discerns and learns users' specific item preferences within each behavior. It then considers only those preferences relevant to the target behavior for final recommendations, significantly reducing noise from auxiliary behaviors. These auxiliary behaviors are utilized solely for training the network parameters, thereby refining the learning process without compromising the accuracy of the target behavior recommendations. To further enhance the effectiveness of BCIPM, we adopt a strategy of pre-training the initial embeddings. This step is crucial for enriching the item-aware preferences, particularly in scenarios where data related to the target behavior is sparse. Comprehensive experiments conducted on four real-world datasets demonstrate BCIPM's superior performance compared to several leading state-of-the-art models, validating the robustness and efficiency of our proposed approach.
As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.
This paper presents an innovative enhancement to the Sphere as Prior Generative Adversarial Network (SP-GAN) model, a state-of-the-art GAN designed for point cloud generation. A novel method is introduced for point cloud generation that elevates the structural integrity and overall quality of the generated point clouds by incorporating topological priors into the training process of the generator. Specifically, this work utilizes the K-means algorithm to segment a point cloud from the repository into clusters and extract centroids, which are then used as priors in the generation process of the SP-GAN. Furthermore, the discriminator component of the SP-GAN utilizes the identical point cloud that contributed the centroids, ensuring a coherent and consistent learning environment. This strategic use of centroids as intuitive guides not only boosts the efficiency of global feature learning but also substantially improves the structural coherence and fidelity of the generated point clouds. By applying the K-means algorithm to generate centroids as the prior, the work intuitively and experimentally demonstrates that such a prior enhances the quality of generated point clouds.
This work initiates the study of a beyond-diagonal reconfigurable intelligent surface (BD-RIS)-aided transmitter architecture for integrated sensing and communication (ISAC) in the millimeter-wave (mmWave) frequency band. Deploying BD-RIS at the transmitter side not only alleviates the need for extensive fully digital radio frequency (RF) chains but also enhances both communication and sensing performance. These benefits are facilitated by the additional design flexibility introduced by the fully-connected scattering matrix of BD-RIS. To achieve the aforementioned benefits, in this work, we propose an efficient two-stage algorithm to design the digital beamforming of the transmitter and the scattering matrix of the BD-RIS with the aim of jointly maximizing the sum rate for multiple communication users and minimizing the largest eigenvalue of the Cramer-Rao bound (CRB) matrix for multiple sensing targets. Numerical results show that the transmitter-side BD-RIS-aided mmWave ISAC outperforms the conventional diagonal-RIS-aided ones in both communication and sensing performance.
This paper explores the importance of text sentiment analysis and classification in the field of natural language processing, and proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model. The study firstly analyses the word cloud model of the text with six sentiment labels, and then carries out data preprocessing, including the steps of removing special symbols, punctuation marks, numbers, stop words and non-alphabetic parts. Subsequently, the data set is divided into training set and test set, and through model training and testing, it is found that the accuracy of the validation set is increased from 85% to 93% with training, which is an increase of 8%; at the same time, the loss value of the validation set decreases from 0.7 to 0.1 and tends to be stable, and the model is gradually close to the actual value, which can effectively classify the text emotions. The confusion matrix shows that the accuracy of the model on the test set reaches 94.8%, the precision is 95.9%, the recall is 99.1%, and the F1 score is 97.4%, which proves that the model has good generalisation ability and classification effect. Overall, the study demonstrated an effective method for text sentiment analysis and classification with satisfactory results.
Current state-of-the-art two-stage models on instance segmentation task suffer from several types of imbalances. In this paper, we address the Intersection over the Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, brings brand new loop mechanisms of bounding box and mask refinements. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, originated by a non-uniform integration between low- and high-level features from the backbone layers. In addition, the redesign of the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and obtains more clues to the connection between the task to solve and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements if adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 as backbone, evaluated on COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, with 12 epochs and without extra tricks. The code is available at //github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn
This paper presents a question-answering approach to extract document-level event-argument structures. We automatically ask and answer questions for each argument type an event may have. Questions are generated using manually defined templates and generative transformers. Template-based questions are generated using predefined role-specific wh-words and event triggers from the context document. Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer. Additionally, we develop novel data augmentation strategies specialized in inter-sentential event-argument relations. We use a simple span-swapping technique, coreference resolution, and large language models to augment the training instances. Our approach enables transfer learning without any corpora-specific modifications and yields competitive results with the RAMS dataset. It outperforms previous work, and it is especially beneficial to extract arguments that appear in different sentences than the event trigger. We also present detailed quantitative and qualitative analyses shedding light on the most common errors made by our best model.
This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Such extension is challenging; the hierarchy of concepts that describe an outdoor environment is more complex than for indoors, and manually defining such hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time thus reducing the need for labelled data, providing better predictions, and even allowing predicting concepts unseen at training time. We test our approach in a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of the 3D scene graph generation with sparsely annotated data.
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research.
Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.