The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to the inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in downstream imaging tasks such as segmentation, classification and interpretation. This leads to the need for quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we propose a neural network-based estimation framework that first learns how to predict the underlying parameters of $G_I^0$ samples and can then be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields lower estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real-time pixel-wise roughness estimation for high resolution real SAR imagery.
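As a rough illustration of the kind of framework described above (not the paper's actual architecture or features), the sketch below samples synthetic $G_I^0$ data using the standard product formulation, computes simple log-moment features, and fits a small MLP to regress the roughness parameter; the choice of features, network size and parameter ranges are all assumptions.

```python
# Minimal sketch: G_I^0 intensity samples are generated as the product of a
# Gamma(L, L) speckle term and an inverse-gamma backscatter term, and a small
# MLP regresses the roughness alpha from log-moment features of each sample set.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def sample_gi0(alpha, gamma, L, n):
    """Draw n G_I^0(alpha, gamma, L) intensity samples (alpha < -1)."""
    speckle = rng.gamma(shape=L, scale=1.0 / L, size=n)                       # Gamma(L, L)
    backscatter = 1.0 / rng.gamma(shape=-alpha, scale=1.0 / gamma, size=n)    # inverse gamma
    return speckle * backscatter

def features(z):
    """Log-moment features of one sample set (an illustrative choice)."""
    logz = np.log(z)
    return [logz.mean(), logz.var(), np.log(z.mean()), np.log(z.var())]

# Build a synthetic training set: each example summarizes one set of 128 samples.
alphas = rng.uniform(-10.0, -1.5, size=5000)
X = np.array([features(sample_gi0(a, gamma=-a - 1.0, L=3, n=128)) for a in alphas])

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X, alphas)

# Estimate the roughness of an unseen sample set.
z_test = sample_gi0(-4.0, gamma=3.0, L=3, n=128)
print("estimated alpha:", model.predict([features(z_test)])[0])
```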
We propose a framework for the automatic one-shot segmentation of synthetic images generated by a StyleGAN. Our framework is based on the observation that the multi-scale hidden features of the GAN generator hold useful semantic information that can be utilized for automatic, on-the-fly segmentation of the generated images. Using these features, our framework learns to segment synthetic images with a self-supervised contrastive clustering algorithm that projects the hidden features into a compact space for per-pixel classification. This contrastive learner relies on a novel data augmentation strategy and a pixel-wise swapped prediction loss that leads to faster learning of the feature vectors for one-shot segmentation. We have tested our implementation on five standard benchmarks, yielding a segmentation performance that not only outperforms the semi-supervised baselines by an average wIoU margin of 1.02% but also improves inference speed by a factor of 4.5. Finally, we also show the results of using the proposed one-shot learner to implement BagGAN, a framework for producing annotated synthetic baggage X-ray scans for threat detection. This framework was trained and tested on the PIDRay baggage benchmark and yields a performance comparable to its baseline segmenter based on manual annotations.
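The pixel-wise swapped prediction idea can be sketched as follows; this is a simplified, SwAV-style stand-in (plain softmax soft assignments rather than any particular assignment scheme), not the paper's exact loss, and all tensor shapes and names are illustrative.

```python
# Illustrative pixel-wise swapped prediction loss: projected GAN features from two
# augmented views are scored against a set of prototypes; soft assignments from one
# view serve as targets for the other view's predictions, and vice versa.
import torch
import torch.nn.functional as F

def pixel_swapped_loss(feat_a, feat_b, prototypes, temperature=0.1):
    """feat_a, feat_b: (N_pixels, D) projected features of two views of the same pixels."""
    za = F.normalize(feat_a, dim=1)
    zb = F.normalize(feat_b, dim=1)
    c = F.normalize(prototypes, dim=1)                # (K, D) cluster prototypes
    scores_a, scores_b = za @ c.t(), zb @ c.t()       # (N_pixels, K)
    # Soft targets from one view (detached), predictions from the other.
    qa = F.softmax(scores_a / temperature, dim=1).detach()
    qb = F.softmax(scores_b / temperature, dim=1).detach()
    loss_a = -(qb * F.log_softmax(scores_a / temperature, dim=1)).sum(1).mean()
    loss_b = -(qa * F.log_softmax(scores_b / temperature, dim=1)).sum(1).mean()
    return 0.5 * (loss_a + loss_b)

# Toy usage: 256 pixels, 64-d projections, 8 prototype clusters.
proj_a, proj_b = torch.randn(256, 64), torch.randn(256, 64)
protos = torch.nn.Parameter(torch.randn(8, 64))
print(pixel_swapped_loss(proj_a, proj_b, protos).item())
```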
Thanks to their ease of implementation, multilayer perceptrons (MLPs) have become ubiquitous in deep learning applications. The graph underlying an MLP is multipartite, i.e. each layer of neurons connects only to neurons belonging to the adjacent layers. In contrast, in vivo brain connectomes at the level of individual synapses suggest that biological neuronal networks are characterized by scale-free degree distributions or exponentially truncated power law strength distributions, hinting at potentially novel avenues for the exploitation of evolution-derived neuronal networks. In this paper, we present ``4Ward'', a method and Python library capable of generating flexible and efficient neural networks (NNs) from arbitrarily complex directed acyclic graphs. 4Ward is inspired by layering algorithms drawn from the graph drawing discipline to implement efficient forward passes, and provides significant time gains in computational experiments with various Erd\H{o}s-R\'enyi graphs. 4Ward not only overcomes the sequential nature of the learning matrix method by parallelizing the computation of activations, but also addresses the scalability issues encountered in the current state of the art and gives the designer the freedom to customize weight initialization and activation functions. Our algorithm can aid any investigator seeking to exploit complex topologies in a NN design framework at the microscale.
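A minimal sketch of the underlying idea, not 4Ward's actual API: nodes of a DAG are grouped into layers by longest-path layering so that all activations within a layer can be computed in parallel, and the forward pass then proceeds layer by layer; the graph, weights and activation used below are illustrative.

```python
# Layering a DAG for a parallelizable forward pass (illustrative names and details).
import networkx as nx
import numpy as np

def longest_path_layers(g):
    """Group nodes by their longest distance from any source node."""
    depth = {}
    for node in nx.topological_sort(g):
        preds = list(g.predecessors(node))
        depth[node] = 0 if not preds else 1 + max(depth[p] for p in preds)
    layers = {}
    for node, d in depth.items():
        layers.setdefault(d, []).append(node)
    return [layers[d] for d in sorted(layers)]

def forward(g, weights, inputs, act=np.tanh):
    """Evaluate the DAG as a neural network; source nodes take external inputs."""
    values = dict(inputs)
    for layer in longest_path_layers(g):
        for node in layer:                       # nodes in a layer are independent
            preds = list(g.predecessors(node))
            if preds:                            # non-source: weighted sum of parents
                values[node] = act(sum(weights[(p, node)] * values[p] for p in preds))
    return values

# Toy DAG derived from a random Erdos-Renyi graph, kept acyclic by edge orientation.
g = nx.gnp_random_graph(8, 0.4, seed=1, directed=True)
g = nx.DiGraph((u, v) for u, v in g.edges() if u < v)
w = {e: np.random.randn() for e in g.edges()}
sources = [n for n in g.nodes() if g.in_degree(n) == 0]
out = forward(g, w, {s: 1.0 for s in sources})
print({n: round(float(v), 3) for n, v in out.items()})
```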
Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing. SSL models are normally pre-trained on a great variety of unlabelled data, and a large model size is preferred to increase the modeling capacity. However, this might limit their potential applications due to the expensive computation and memory costs introduced by the oversized model. Miniaturization of SSL models has therefore become an important research direction of practical value. To this end, we explore the effective distillation of HuBERT-based SSL models for automatic speech recognition (ASR). First, in order to establish a strong baseline, a comprehensive study on different student model structures is conducted. On top of this, as a supplement to the regression loss widely adopted in previous works, a discriminative loss is introduced for HuBERT to enhance the distillation performance, especially in low-resource scenarios. In addition, we design a simple and effective algorithm to distill the front-end input from waveform to Fbank features, resulting in a 17% parameter reduction and doubled inference speed, with marginal performance degradation.
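A hedged sketch of how such a combined objective might look, assuming an L1-plus-cosine regression term (as commonly used in prior HuBERT distillation work) and a frame-level cross-entropy term over the teacher's discrete cluster labels; the exact layers, weighting and label source in the paper may differ.

```python
# Combined distillation objective: regression on hidden states + discriminative
# cross-entropy over per-frame cluster labels (all shapes and weights illustrative).
import torch
import torch.nn.functional as F

def distill_loss(student_hidden, teacher_hidden, student_logits, cluster_labels,
                 lambda_disc=1.0):
    # Regression loss: L1 distance plus (1 - cosine similarity) between hidden states.
    l1 = F.l1_loss(student_hidden, teacher_hidden)
    cos = 1.0 - F.cosine_similarity(student_hidden, teacher_hidden, dim=-1).mean()
    reg = l1 + cos
    # Discriminative loss: predict the teacher's cluster label for every frame.
    disc = F.cross_entropy(student_logits.flatten(0, 1), cluster_labels.flatten())
    return reg + lambda_disc * disc

# Toy shapes: batch of 2, 50 frames, 256-d hidden states, 100 clusters.
sh, th = torch.randn(2, 50, 256), torch.randn(2, 50, 256)
logits = torch.randn(2, 50, 100)
labels = torch.randint(0, 100, (2, 50))
print(distill_loss(sh, th, logits, labels).item())
```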
Among the wide variety of evolutionary computing models, Finite State Machines (FSMs) have several attractions for fundamental research. They are easy to understand in concept and can be visualised clearly in simple cases. They have a ready fitness criterion through their relationship with Regular Languages. They have also been shown to be tractably evolvable, even up to exhibiting evidence of open-ended evolution in specific scenarios. In addition to theoretical attraction, they also have industrial applications, as a paradigm of both automated and user-initiated control. Improving the understanding of the factors affecting FSM evolution has relevance to both computer science and practical optimisation of control. We investigate an evolutionary scenario of FSMs adapting to recognise one of a family of Regular Languages by categorising positive and negative samples, while also being under a counteracting selection pressure that favours fewer states. The results appear to indicate that longer strings provided as samples reduce the speed of fitness gain, when fitness is measured against a fixed number of sample strings. We draw the inference that additional information from longer strings is not sufficient to compensate for sparser coverage of the combinatorial space of positive and negative sample strings.
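A toy sketch of this evolutionary setup, with illustrative choices throughout: an FSM is a transition table over the alphabet {0, 1} with a set of accepting states, fitness is classification accuracy on the sample strings, and a small per-state penalty supplies the counteracting pressure towards fewer states.

```python
# FSM fitness evaluation against positive/negative samples of a Regular Language.
import random

def accepts(transitions, accepting, string):
    """transitions[state][symbol] -> next state; the start state is 0."""
    state = 0
    for symbol in string:
        state = transitions[state][symbol]
    return state in accepting

def fitness(transitions, accepting, samples, state_penalty=0.01):
    """samples: list of (string, label) pairs; label is True if the string is in the language."""
    correct = sum(accepts(transitions, accepting, s) == label for s, label in samples)
    return correct / len(samples) - state_penalty * len(transitions)

# Toy target language: strings over {0, 1} with an even number of 1s.
random.seed(0)
samples = []
for _ in range(200):
    s = [random.randint(0, 1) for _ in range(random.randint(1, 12))]
    samples.append((s, s.count(1) % 2 == 0))

# A two-state FSM that recognises the language exactly.
fsm = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(fitness(fsm, accepting={0}, samples=samples))
```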
Physics-Informed Neural Networks (PINNs) represent the intersection of physics-based modeling and deep learning, but successfully training PINNs in 3D for highly nonlinear PDEs on complex domains remains a challenging task. In this paper, PINNs are used to solve the 3D incompressible Navier-Stokes (NS) equations at high Reynolds numbers for complex geometries, using very sparsely distributed solution data in the domain. The effects of the amount of data provided and of the PDE-based regularizers are investigated. Additionally, hybrid data-PINNs are used to create surrogate models to solve a realistic flow-thermal electronics design problem in near real time, and it is found that the hybrid data-PINNs consistently outperform standard data-driven neural networks when tested on unseen query points. The findings of the paper show how PINNs can be effective when used in conjunction with sparse data for solving 3D nonlinear PDEs or for surrogate modeling of design spaces governed by them.
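The hybrid data-PINN loss structure can be sketched as a sparse-data fitting term plus a PDE residual term evaluated at collocation points; for brevity the sketch below shows only the incompressibility residual of the NS system (the momentum residuals are built the same way with further autograd calls), and the network size, sampling and weighting are assumptions.

```python
# Hybrid data-PINN loss: sparse solution data + PDE residual as a physics regularizer.
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 4))   # outputs (u, v, w, p)

def divergence_residual(xyz):
    """Mean-squared continuity residual u_x + v_y + w_z at collocation points."""
    xyz = xyz.requires_grad_(True)
    uvwp = net(xyz)
    div = 0.0
    for i in range(3):   # du/dx + dv/dy + dw/dz
        grads = torch.autograd.grad(uvwp[:, i].sum(), xyz, create_graph=True)[0]
        div = div + grads[:, i]
    return (div ** 2).mean()

def hybrid_loss(data_xyz, data_uvw, colloc_xyz, lambda_pde=1.0):
    data_term = ((net(data_xyz)[:, :3] - data_uvw) ** 2).mean()   # sparse solution data
    pde_term = divergence_residual(colloc_xyz)                    # physics regularizer
    return data_term + lambda_pde * pde_term

# Toy call: 20 sparse data points and 500 collocation points in the unit cube.
loss = hybrid_loss(torch.rand(20, 3), torch.rand(20, 3), torch.rand(500, 3))
loss.backward()
print(loss.item())
```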
We address the task of automatically scoring the competency of candidates based on textual features from automatic speech recognition (ASR) transcriptions in asynchronous video job interviews (AVIs). The key challenge is how to construct the dependency relation between questions and answers, and how to conduct semantic-level interaction for each question-answer (QA) pair. However, most recent studies on AVIs focus on how to better represent questions and answers, and ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences within and between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal question-answer pairs for the final prediction. Experiments conducted on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching-based benchmark models. Ablation studies and experiments with 10 random seeds also show the effectiveness and stability of our models.
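A toy sketch of the overall pipeline shape (not the actual HRGNN layers): sentence embeddings of each QA pair are mixed by a single graph-attention step over a dependency adjacency matrix, pooled into a session vector, and a GRU runs over the sequence of QA sessions to produce the final score; all dimensions and components are illustrative.

```python
# Toy hierarchical scorer: per-pair graph attention + GRU over QA sessions.
import torch
import torch.nn.functional as F

class ToyQAScorer(torch.nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.attn = torch.nn.Linear(2 * dim, 1)            # edge scoring for attention
        self.gru = torch.nn.GRU(dim, dim, batch_first=True)
        self.out = torch.nn.Linear(dim, 1)

    def graph_attention(self, x, adj):
        # x: (n_sentences, dim), adj: (n, n) 0/1 dependency mask within one QA pair.
        n = x.size(0)
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, -1),
                           x.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = self.attn(pairs).squeeze(-1).masked_fill(adj == 0, float('-inf'))
        return F.softmax(scores, dim=-1) @ x

    def forward(self, sessions):
        # sessions: list of (sentence_embeddings, adjacency), one entry per QA pair in order.
        pooled = torch.stack([self.graph_attention(x, a).mean(0) for x, a in sessions])
        _, h = self.gru(pooled.unsqueeze(0))                # temporal encoding of sessions
        return self.out(h[-1]).squeeze()

model = ToyQAScorer()
sess = [(torch.randn(5, 128), torch.ones(5, 5)) for _ in range(3)]
print(model(sess).item())
```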
Neural machine translation (NMT) is a deep learning-based approach to machine translation, which yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is therefore very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem caused by the soft attention in these models. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by 6.6%.
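The soft-attention problem can be seen with a tiny numeric example: because attention weights are normalized to sum to one, the attended feature looks the same whether one or several identical objects are present, so the count cannot be recovered from it (the feature values below are purely illustrative).

```python
# Why normalized soft attention loses count information.
import torch

feature = torch.tensor([1.0, 0.0])               # feature of a "cat" proposal

one_cat  = torch.stack([feature])                 # 1 matching proposal
two_cats = torch.stack([feature, feature])        # 2 identical matching proposals

def attended(feats):
    weights = torch.softmax(torch.ones(len(feats)), dim=0)   # uniform attention, sums to 1
    return weights @ feats

print(attended(one_cat))    # tensor([1., 0.])
print(attended(two_cats))   # tensor([1., 0.])  -- identical: the count is not recoverable
```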
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent across all the segmentation hypotheses. We compare our algorithm's performance with that of state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
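A minimal sketch of the first two ingredients only: hypotheses are generated by sweeping a segmentation parameter, and each segment is scored for stability by its best overlap with segments from the other hypotheses. Felzenszwalb segmentation stands in here for the (interchangeable) segmentation kernel, and the cost-minimization selection step is omitted.

```python
# Parameter sweep to generate segmentation hypotheses, plus a simple stability score.
import numpy as np
from skimage import data, segmentation

image = data.astronaut()
scales = [50, 100, 200, 400, 800]   # from highly over-segmented to under-segmented
hypotheses = [segmentation.felzenszwalb(image, scale=s, sigma=0.8, min_size=30)
              for s in scales]

def segment_stability(seg_id, labels, other_hypotheses):
    """Best IoU of one segment against each other hypothesis, averaged."""
    mask = labels == seg_id
    ious = []
    for other in other_hypotheses:
        best = 0.0
        for other_id in np.unique(other[mask]):          # only overlapping segments
            other_mask = other == other_id
            inter = np.logical_and(mask, other_mask).sum()
            union = np.logical_or(mask, other_mask).sum()
            best = max(best, inter / union)
        ious.append(best)
    return float(np.mean(ious))

labels = hypotheses[2]
for seg_id in np.unique(labels)[:5]:
    print(seg_id, segment_stability(seg_id, labels, hypotheses[:2] + hypotheses[3:]))
```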
Image segmentation is considered one of the critical tasks in hyperspectral remote sensing image processing. Recently, convolutional neural networks (CNNs) have established themselves as powerful models for segmentation and classification by demonstrating excellent performance. The use of a graphical model such as a conditional random field (CRF) further contributes to capturing contextual information and thus improving segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of a CNN and a CRF. We use multiple spectral cubes to learn deep features using the CNN, and then formulate a deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and evaluate our proposed method on it, along with several widely adopted benchmark datasets, to assess the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.
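A hedged sketch of the unary-potential piece only: small spectral cubes (spatial patch by spectral bands) are cut out around each pixel and a compact 3D CNN maps each cube to per-class scores that could serve as CRF unary potentials. The pairwise potentials, piecewise CRF training and deconvolution refinement are omitted, and all shapes, band counts and class counts below are illustrative.

```python
# Per-pixel spectral cubes and a small 3D CNN producing unary class scores.
import torch

def extract_cubes(hsi, patch=5):
    """hsi: (H, W, B) hyperspectral image -> (H*W, 1, B, patch, patch) cubes, one per pixel."""
    pad = patch // 2
    padded = torch.nn.functional.pad(hsi.permute(2, 0, 1), (pad, pad, pad, pad))  # (B, H+2p, W+2p)
    cubes = padded.unfold(1, patch, 1).unfold(2, patch, 1)                        # (B, H, W, p, p)
    return cubes.permute(1, 2, 0, 3, 4).reshape(-1, 1, hsi.shape[2], patch, patch)

unary_net = torch.nn.Sequential(
    torch.nn.Conv3d(1, 16, kernel_size=(7, 3, 3), padding=(3, 1, 1)), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1), torch.nn.Flatten(),
    torch.nn.Linear(16, 9),   # 9 classes, e.g. a Pavia University-style labelling
)

hsi = torch.rand(16, 16, 103)                 # toy 16x16 scene with 103 spectral bands
unaries = unary_net(extract_cubes(hsi))       # (256, 9) per-pixel class scores
print(unaries.shape)
```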