Precise thigh muscle volumes are crucial to monitor the motor functionality of patients with diseases that may result in various degrees of thigh muscle loss. T1-weighted MRI is the default surrogate to obtain thigh muscle masks due to its contrast between muscle and fat signals. Deep learning approaches have recently been widely used to obtain these masks through segmentation. However, due to the insufficient amount of precise annotations, thigh muscle masks generated by deep learning approaches tend to misclassify intra-muscular fat (IMF) as muscle impacting the analysis of muscle volumetrics. As IMF is infiltrated inside the muscle, human annotations require expertise and time. Thus, precise muscle masks where IMF is excluded are limited in practice. To alleviate this, we propose a few-shot segmentation framework to generate thigh muscle masks excluding IMF. In our framework, we design a novel pseudo-label correction and evaluation scheme, together with a new noise robust loss for exploiting high certainty areas. The proposed framework only takes $1\%$ of the fine-annotated training dataset, and achieves comparable performance with fully supervised methods according to the experimental results.
Research in 3D semantic segmentation has been increasing performance metrics, like the IoU, by scaling model complexity and computational resources, leaving behind researchers and practitioners that (1) cannot access the necessary resources and (2) do need transparency on the model decision mechanisms. In this paper, we propose SCENE-Net, a low-resource white-box model for 3D point cloud semantic segmentation. SCENE-Net identifies signature shapes on the point cloud via group equivariant non-expansive operators (GENEOs), providing intrinsic geometric interpretability. Our training time on a laptop is 85~min, and our inference time is 20~ms. SCENE-Net has 11 trainable geometrical parameters and requires fewer data than black-box models. SCENE--Net offers robustness to noisy labeling and data imbalance and has comparable IoU to state-of-the-art methods. With this paper, we release a 40~000 Km labeled dataset of rural terrain point clouds and our code implementation.
Training learnable metrics using modern language models has recently emerged as a promising method for the automatic evaluation of machine translation. However, existing human evaluation datasets for text simplification have limited annotations that are based on unitary or outdated models, making them unsuitable for this approach. To address these issues, we introduce the SimpEval corpus that contains: SimpEval_past, comprising 12K human ratings on 2.4K simplifications of 24 past systems, and SimpEval_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360 simplifications including GPT-3.5 generated text. Training on SimpEval, we present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive empirical results show that LENS correlates much better with human judgment than existing metrics, paving the way for future progress in the evaluation of text simplification. We also introduce Rank and Rate, a human evaluation framework that rates simplifications from several models in a list-wise manner using an interactive interface, which ensures both consistency and accuracy in the evaluation process and is used to create the SimpEval datasets.
Training a classifier exploiting a huge amount of supervised data is expensive or even prohibited in a situation, where the labeling cost is high. The remarkable progress in working with weaker forms of supervision is binary classification from multiple unlabeled datasets which requires the knowledge of exact class priors for all unlabeled datasets. However, the availability of class priors is restrictive in many real-world scenarios. To address this issue, we propose to solve a new problem setting, i.e., binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors (MU-OPPO), which knows the relative order (which unlabeled dataset has a higher proportion of positive examples) of two class-prior probabilities for two datasets among multiple unlabeled datasets. In MU-OPPO, we do not need the class priors for all unlabeled datasets, but we only require that there exists a pair of unlabeled datasets for which we know which unlabeled dataset has a larger class prior. Clearly, this form of supervision is easier to be obtained, which can make labeling costs almost free. We propose a novel framework to handle the MU-OPPO problem, which consists of four sequential modules: (i) pseudo label assignment; (ii) confident example collection; (iii) class prior estimation; (iv) classifier training with estimated class priors. Theoretically, we analyze the gap between estimated class priors and true class priors under the proposed framework. Empirically, we confirm the superiority of our framework with comprehensive experiments. Experimental results demonstrate that our framework brings smaller estimation errors of class priors and better performance of binary classification.
Recent voice assistants are usually based on the cascade spoken language understanding (SLU) solution, which consists of an automatic speech recognition (ASR) engine and a natural language understanding (NLU) system. Because such approach relies on the ASR output, it often suffers from the so-called ASR error propagation. In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa. Moreover, a multimodal language understanding (MLU) module is proposed to mitigate SLU performance degradation caused by errors present in the ASR transcript. The MLU benefits from self-supervised features learned from both audio and text modalities, specifically Wav2Vec for speech and Bert/RoBERTa for language. Our MLU combines an encoder network to embed the audio signal and a text encoder to process text transcripts followed by a late fusion layer to fuse audio and text logits. We found that the proposed MLU showed to be robust towards poor quality ASR transcripts, while the performance of BERT and RoBERTa are severely compromised. Our model is evaluated on five tasks from three SLU datasets and robustness is tested using ASR transcripts from three ASR engines. Results show that the proposed approach effectively mitigates the ASR error propagation problem, surpassing the PLM models' performance across all datasets for the academic ASR engine.
In the realm of EEG decoding, enhancing the performance of artificial neural networks (ANNs) carries significant potential. This study introduces a novel approach, termed "weight freezing", that is anchored on the principles of ANN regularization and neuroscience prior knowledge. The concept of weight freezing revolves around the idea of reducing certain neurons' influence on the decision-making process for a specific EEG task by freezing specific weights in the fully connected layer during the backpropagation process. This is actualized through the use of a mask matrix and a threshold to determine the proportion of weights to be frozen during backpropagation. Moreover, by setting the masked weights to zero, weight freezing can not only realize sparse connections in networks with a fully connected layer as the classifier but also function as an efficacious regularization method for fully connected layers. Through experiments involving three distinct ANN architectures and three widely recognized EEG datasets, we validate the potency of weight freezing. Our method significantly surpasses previous peak performances in classification accuracy across all examined datasets. Supplementary control experiments offer insights into performance differences pre and post weight freezing implementation and scrutinize the influence of the threshold in the weight freezing process. Our study underscores the superior efficacy of weight freezing compared to traditional fully connected networks for EEG feature classification tasks. With its proven effectiveness, this innovative approach holds substantial promise for contributing to future strides in EEG decoding research.
This paper presents a user interface designed to enable computer cursor control through hand detection and gesture classification. A comprehensive hand dataset comprising 6720 image samples was collected, encompassing four distinct classes: fist, palm, pointing to the left, and pointing to the right. The images were captured from 15 individuals in various settings, including simple backgrounds with different perspectives and lighting conditions. A convolutional neural network (CNN) was trained on this dataset to accurately predict labels for each captured image and measure their similarity. The system incorporates defined commands for cursor movement, left-click, and right-click actions. Experimental results indicate that the proposed algorithm achieves a remarkable accuracy of 91.88% and demonstrates its potential applicability across diverse backgrounds.
We present for the first time a complete solution to the problem of proving the correctness of a concurrency control algorithm for collaborative text editors against the standard consistency model. The success of our approach stems from the use of comprehensive stringwise operational transformations, which appear to have escaped a formal treatment until now. Because these transformations sometimes lead to an increase in the number of operations as they are transformed, we cannot use inductive methods and adopt the novel idea of decreasing diagrams instead. We also base our algorithm on a client-server model rather than a peer-to-peer one, which leads to the correct application of operational transformations to both newly generated and pending operations. And lastly we solve the problem of latency, so that our algorithm works perfectly in practice. The result of these innovations is the first ever formally correct concurrency control algorithm for collaborative text editors together with a fast, fault tolerant and highly scalable implementation.
Constraint Programming (CP) is a declarative programming paradigm that allows for modeling and solving combinatorial optimization problems, such as the Job-Shop Scheduling Problem (JSSP). While CP solvers manage to find optimal or near-optimal solutions for small instances, they do not scale well to large ones, i.e., they require long computation times or yield low-quality solutions. Therefore, real-world scheduling applications often resort to fast, handcrafted, priority-based dispatching heuristics to find a good initial solution and then refine it using optimization methods. This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL). In contrast to previous RL methods, tailored for a given problem by including procedural simulation algorithms, complex feature engineering, or handcrafted reward functions, our neural-network architecture and training algorithm merely require a generic CP encoding of some scheduling problem along with a set of small instances. Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets. We evaluate our method on seven JSSP datasets from the literature, showing its ability to find higher-quality solutions for very large instances than obtained by static PDRs and by a CP solver within the same time limit.
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges.
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.