Causal disentanglement seeks a representation of data involving latent variables that relate to one another via a causal model. A representation is identifiable if both the latent model and the transformation from latent to observed variables are unique. In this paper, we study observed variables that are a linear transformation of a linear latent causal model. Data from interventions are necessary for identifiability: if one latent variable is missing an intervention, we show that there exist distinct models that cannot be distinguished. Conversely, we show that a single intervention on each latent variable is sufficient for identifiability. Our proof uses a generalization of the RQ decomposition of a matrix that replaces the usual orthogonal and upper triangular conditions with analogues depending on a partial order on the rows of the matrix, with partial order determined by a latent causal model. We corroborate our theoretical results with a method for causal disentanglement that accurately recovers a latent causal model.
In past decade, previous balanced datasets have been used to advance algorithms for classification, object detection, semantic segmentation, and anomaly detection in industrial applications. Specifically, for condition-based maintenance, automating visual inspection is crucial to ensure high quality. Deterioration prognostic attempts to optimize the fine decision process for predictive maintenance and proactive repair. In civil infrastructure and living environment, damage data mining cannot avoid the imbalanced data issue because of rare unseen events and high quality status by improved operations. For visual inspection, deteriorated class acquired from the surface of concrete and steel components are occasionally imbalanced. From numerous related surveys, we summarize that imbalanced data problems can be categorized into four types; 1) missing range of target and label valuables, 2) majority-minority class imbalance, 3) foreground-background of spatial imbalance, 4) long-tailed class of pixel-wise imbalance. Since 2015, there has been many imbalanced studies using deep learning approaches that includes regression, image classification, object detection, semantic segmentation. However, anomaly detection for imbalanced data is not yet well known. In the study, we highlight one-class anomaly detection application whether anomalous class or not, and demonstrate clear examples on imbalanced vision datasets: blood smear, lung infection, wooden, concrete deterioration, and disaster damage. We provide key results on damage vision mining advantage, hypothesizing that the more effective range of positive ratio, the higher accuracy gain of anomaly detection application. Finally, the applicability of the damage learning methods, limitations, and future works are mentioned.
Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.
Humans possess a remarkable ability to react to sudden and unpredictable perturbations through immediate mechanical responses, which harness the visco-elastic properties of muscles to perform auto-corrective movements to maintain balance. In this paper, we propose a novel design of a robotic leg inspired by this mechanism. We develop multi-material fibre jammed tendons, and demonstrate their use as passive compliant mechanisms to achieve variable joint stiffness and improve stability. Through numerical simulations and extensive experimentation, we demonstrate the ability for our system to achieve a wide range of potentially beneficial compliance regimes. We show the role and contribution of each tendon quantitatively by evaluating their individual force contribution in resisting rotational perturbations. We also perform walking experiments with programmed bioinspired gaits that varying the stiffness of the tendons throughout the gait cycle, demonstrating a stable and consistent behaviour. We show the potential of such systems when integrated into legged robots, where compliance and shock absorption can be provided entirely through the morphological properties of the leg.
Predictive variability due to data ambiguities has typically been addressed via construction of dedicated models with built-in probabilistic capabilities that are trained to predict uncertainty estimates as variables of interest. These approaches require distinct architectural components and training mechanisms, may include restrictive assumptions and exhibit overconfidence, i.e., high confidence in imprecise predictions. In this work, we propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity. The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions. It is architecture agnostic and can be applied to any feed-forward deterministic network without changes to the architecture or training procedure. Experiments on regression tasks on imaging and non-imaging input data show the method's ability to generate diverse and multi-modal predictive distributions, and a desirable correlation of the estimated uncertainty with the prediction error.
The principle of boosting in supervised learning involves combining multiple weak classifiers to obtain a stronger classifier. AdaBoost has the reputation to be a perfect example of this approach. We have previously shown that AdaBoost is not truly an optimization algorithm. This paper shows that AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table. This study is carried out by considering a problem with two classes and is illustrated by the particular case of three binary classifiers and presents results in comparison with those from the implementation of AdaBoost algorithm of the Python library scikit-learn.
2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries to leverage any exposed information. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a novel connection between GNN and SCM while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs.
It is always well believed that modeling relationships between objects would be helpful for representing and eventually describing an image. Nevertheless, there has not been evidence in support of the idea on image description generation. In this paper, we introduce a new design to explore the connections between objects for image captioning under the umbrella of attention-based encoder-decoder framework. Specifically, we present Graph Convolutional Networks plus Long Short-Term Memory (dubbed as GCN-LSTM) architecture that novelly integrates both semantic and spatial object relationships into image encoder. Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections. The representations of each region proposed on objects are then refined by leveraging graph structure through GCN. With the learnt region-level features, our GCN-LSTM capitalizes on LSTM-based captioning framework with attention mechanism for sentence generation. Extensive experiments are conducted on COCO image captioning dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, GCN-LSTM increases CIDEr-D performance from 120.1% to 128.7% on COCO testing set.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.