Text Simplification is an ongoing problem in Natural Language Processing, solution to which has varied implications. In conjunction with the TSAR-2022 Workshop @EMNLP2022 Lexical Simplification is the process of reducing the lexical complexity of a text by replacing difficult words with easier to read (or understand) expressions while preserving the original information and meaning. This paper explains the work done by our team "teamPN" for English sub task. We created a modular pipeline which combines modern day transformers based models with traditional NLP methods like paraphrasing and verb sense disambiguation. We created a multi level and modular pipeline where the target text is treated according to its semantics(Part of Speech Tag). Pipeline is multi level as we utilize multiple source models to find potential candidates for replacement, It is modular as we can switch the source models and their weight-age in the final re-ranking.
Driver process models play a central role in the testing, verification, and development of automated and autonomous vehicle technologies. Prior models developed from control theory and physics-based rules are limited in automated vehicle applications due to their restricted behavioral repertoire. Data-driven machine learning models are more capable than rule-based models but are limited by the need for large training datasets and their lack of interpretability, i.e., an understandable link between input data and output behaviors. We propose a novel car following modeling approach using active inference, which has comparable behavioral flexibility to data-driven models while maintaining interpretability. We assessed the proposed model, the Active Inference Driving Agent (AIDA), through a benchmark analysis against the rule-based Intelligent Driver Model, and two neural network Behavior Cloning models. The models were trained and tested on a real-world driving dataset using a consistent process. The testing results showed that the AIDA predicted driving controls significantly better than the rule-based Intelligent Driver Model and had similar accuracy to the data-driven neural network models in three out of four evaluations. Subsequent interpretability analyses illustrated that the AIDA's learned distributions were consistent with driver behavior theory and that visualizations of the distributions could be used to directly comprehend the model's decision making process and correct model errors attributable to limited training data. The results indicate that the AIDA is a promising alternative to black-box data-driven models and suggest a need for further research focused on modeling driving style and model training with more diverse datasets.
Being able to efficiently retrieve the required building information is critical for construction project stakeholders to carry out their engineering and management activities. Natural language interface (NLI) systems are emerging as a time and cost-effective way to query Building Information Models (BIMs). However, the existing methods cannot logically combine different constraints to perform fine-grained queries, dampening the usability of natural language (NL)-based BIM queries. This paper presents a novel ontology-aided semantic parser to automatically map natural language queries (NLQs) that contain different attribute and relational constraints into computer-readable codes for querying complex BIM models. First, a modular ontology was developed to represent NL expressions of Industry Foundation Classes (IFC) concepts and relationships, and was then populated with entities from target BIM models to assimilate project-specific information. Hereafter, the ontology-aided semantic parser progressively extracts concepts, relationships, and value restrictions from NLQs to fully identify constraint conditions, resulting in standard SPARQL queries with reasoning rules to successfully retrieve IFC-based BIM models. The approach was evaluated based on 225 NLQs collected from BIM users, with a 91% accuracy rate. Finally, a case study about the design-checking of a real-world residential building demonstrates the practical value of the proposed approach in the construction industry.
Deep learning-based human activity recognition (HAR) methods have shown great promise in the applications of smart healthcare systems and wireless body sensor network (BSN). Despite their demonstrated performance in laboratory settings, the real-world implementation of such methods is still hindered by the cross-subject issue when adapting to new users. To solve this issue, we propose ActiveSelfHAR, a framework that combines active learning's benefit of sparsely acquiring data with actual labels and self- training's benefit of effectively utilizing unlabeled data to enable the deep model to adapt to the target domain, i.e., the new users. In this framework, the model trained in the last iteration or the source domain is first utilized to generate pseudo labels of the target-domain samples and construct a self-training set based on the confidence score. Second, we propose to use the spatio-temporal relationships among the samples in the non-self-training set to augment the core set selected by active learning. Finally, we combine the self-training set and the augmented core set to fine-tune the model. We demonstrate our method by comparing it with state-of-the-art methods on two IMU-based datasets and an EMG-based dataset. Our method presents similar HAR accuracies with the upper bound, i.e. fully supervised fine-tuning with less than 1\% labeled data of the target dataset and significantly improves data efficiency and time cost. Our work highlights the potential of implementing user-independent HAR methods into smart healthcare systems and BSN.
Efficiently representing source code is essential for various software engineering tasks such as code classification and code clone detection. Existing approaches for representing source code primarily use AST, and only a few works focus on semantic graphs such as CFG and PDG, which contain essential information about source code that AST does not have. Even though some works tried to utilize multiple representations, they do not provide any insights about the costs and benefits of using multiple representations against a single appropriate representation for the task. Moreover, they use hand-crafted program features to solve a specific task and have limited use cases. The primary goal of this paper is to discuss the implications of utilizing multiple code representations, specifically AST, CFG, and PDG, and how each of them affects the performance of a task. In this process, we use an approach that can use program features from multiple code graphs while not specifically coupling this approach to a specific task or a language. Our approach stems from the idea of modeling AST as a set of paths and using a learning model to capture program properties. We modify an existing AST path-based approach to accept multiple code representations as input. We do this since it allows us to measure the performance boost provided by additional representations over AST. We evaluate our approach on three tasks: Method Naming, Program Classification, and Code Clone Detection. Our approach increases the performance on these three tasks by 11% (F1), 15.7% (Accuracy), and 9.3% (F1), respectively, over the baseline. We discuss the impact of semantic features from the CFG and PDG paths on performance and the additional overheads incurred through our approach. We envision this work providing researchers with a lens to evaluate combinations of source code representations for various tasks.
Sentence simplification tends to focus on the generic simplification of sentences by making them more readable and easier to understand. This paper provides a dataset aimed at training models that perform subject aware sentence simplifications rather than simplifying sentences as a whole. We also test models on that dataset which are inspired by model architecture used in abstractive summarization. We hand generated portions of the data and augment the dataset by further manipulating those hand written simplifications. Our results show that data-augmentation, data-masking, and model architecture choices used in summarization provide a solid baseline for comparison on subject aware simplification.
Code example is a crucial part of good documentation. It helps the developers to understand the documentation easily and use the corresponding code unit (e.g., method) properly. However, many official documentation still lacks (good) code example and it is one of the common documentation issues as found by several studies. Hence in this paper, we consider automatic code example generation for documentation, a direction less explored by the existing research. We employ Codex, a GPT-3 based model, pre-trained on both natural and programming languages to generate code examples from source code and documentation given as input. Our preliminary investigation on 40 scikit-learn methods reveals that this approach is able to generate good code examples where 72.5% code examples were executed without error (passability) and 82.5% properly dealt with the target method and documentation (relevance). We also find that incorporation of error logs (produced by the compiler while executing a failed code example) in the input further improves the passability from 72.5% to 87.5%. Thus, our investigation sets the base of documentation-specific code example generation and warrants in-depth future studies.
New retrieval tasks have always been emerging, thus urging the development of new retrieval models. However, instantiating a retrieval model for each new retrieval task is resource-intensive and time-consuming, especially for a retrieval model that employs a large-scale pre-trained language model. To address this issue, we shift to a novel retrieval paradigm called modular retrieval, which aims to solve new retrieval tasks by instead composing multiple existing retrieval modules. Built upon the paradigm, we propose a retrieval model with modular prompt tuning named REMOP. It constructs retrieval modules subject to task attributes with deep prompt tuning, and yields retrieval models subject to tasks with module composition. We validate that, REMOP inherently with modularity not only has appealing generalizability and interpretability in preliminary explorations, but also achieves comparable performance to state-of-the-art retrieval models on a zero-shot retrieval benchmark.\footnote{Our code is available at \url{//github.com/FreedomIntelligence/REMOP}}
CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabulary segmentation, which can locate any segment given an arbitrary text query. In our research, we ask whether it is possible to discover semantic segments without any user guidance in the form of text queries or predefined classes, and label them using natural language automatically? We propose a novel problem zero-guidance segmentation and the first baseline that leverages two pre-trained generalist models, DINO and CLIP, to solve this problem without any fine-tuning or segmentation dataset. The general idea is to first segment an image into small over-segments, encode them into CLIP's visual-language space, translate them into text labels, and merge semantically similar segments together. The key challenge, however, is how to encode a visual segment into a segment-specific embedding that balances global and local context information, both useful for recognition. Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. We also introduce several metrics for the evaluation of this new task. With CLIP's innate knowledge, our method can precisely locate the Mona Lisa painting among a museum crowd. Project page: //zero-guide-seg.github.io/.
With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.