In this paper, we propose a multi-task representation learning framework to jointly estimate the identity, gender and age of individuals from their hand images for the purpose of criminal investigations since the hand images are often the only available information in cases of serious crime such as sexual abuse. We investigate different up-to-date deep learning architectures and compare their performance for joint estimation of identity, gender and age from hand images of perpetrators of serious crime. To simplify the age prediction, we create age groups for the age estimation. We make extensive evaluations and comparisons of both convolution-based and transformer-based deep learning architectures on a publicly available 11k hands dataset. Our experimental analysis shows that it is possible to efficiently estimate not only identity but also other attributes such as gender and age of suspects jointly from hand images for criminal investigations, which is crucial in assisting international police forces in the court to identify and convict abusers.
In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this ``curse of memory'' as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classifications.
In this paper, we consider the design of data-driven predictive controllers for nonlinear systems from input-output data via linear-in-control input Koopman lifted models. Instead of identifying and simulating a Koopman model to predict future outputs, we design a subspace predictive controller in the Koopman space. This allows us to learn the observables minimizing the multi-step output prediction error of the Koopman subspace predictor, preventing the propagation of prediction errors. To avoid losing feasibility of our predictive control scheme due to prediction errors, we compute a terminal cost and terminal set in the Koopman space and we obtain recursive feasibility guarantees through an interpolated initial state. As a third contribution, we introduce a novel regularization cost yielding input-to-state stability guarantees with respect to the prediction error for the resulting closed-loop system. The performance of the developed Koopman data-driven predictive control methodology is illustrated on a nonlinear benchmark example from the literature.
In this paper, we present a novel approach that combines deep metric learning and synthetic data generation using diffusion models for out-of-distribution (OOD) detection. One popular approach for OOD detection is outlier exposure, where models are trained using a mixture of in-distribution (ID) samples and ``seen" OOD samples. For the OOD samples, the model is trained to minimize the KL divergence between the output probability and the uniform distribution while correctly classifying the in-distribution (ID) data. In this paper, we propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs). Additionally, we explore recent advancements in metric learning to train our models. In the experiments, we found that metric learning-based loss functions perform better than the softmax. Furthermore, the baseline models (including softmax, and metric learning) show a significant improvement when trained with the generated OOD data. Our approach outperforms strong baselines in conventional OOD detection metrics.
In this paper, we present a linear and reversible programming language with inductives types and recursion. The semantics of the languages is based on pattern-matching; we show how ensuring syntactical exhaustivity and non-overlapping of clauses is enough to ensure reversibility. The language allows to represent any Primitive Recursive Function. We then give a Curry-Howard correspondence with the logic $\mu$MALL: linear logic extended with least fixed points allowing inductive statements. The critical part of our work is to show how primitive recursion yields circular proofs that satisfy $\mu$MALL validity criterion and how the language simulates the cut-elimination procedure of $\mu$MALL.
In this paper, we develop an analytical approach for estimating brain connectivity networks that accounts for subject heterogeneity. More specifically, we consider a novel extension of a multi-subject Bayesian vector autoregressive model that estimates group-specific directed brain connectivity networks and accounts for the effects of covariates on the network edges. We adopt a flexible approach, allowing for (possibly) non-linear effects of the covariates on edge strength via a novel Bayesian nonparametric prior that employs a weighted mixture of Gaussian processes. For posterior inference, we achieve computational scalability by implementing a variational Bayes scheme. Our approach enables simultaneous estimation of group-specific networks and selection of relevant covariate effects. We show improved performance over competing two-stage approaches on simulated data. We apply our method on resting-state fMRI data from children with a history of traumatic brain injury and healthy controls to estimate the effects of age and sex on the group-level connectivities. Our results highlight differences in the distribution of parent nodes. They also suggest alteration in the relation of age, with peak edge strength in children with traumatic brain injury (TBI), and differences in effective connectivity strength between males and females.
In this paper, we study the performance of few-shot learning, specifically meta learning empowered few-shot relation networks, over supervised deep learning and conventional machine learning approaches in the problem of Sound Source Distance Estimation (SSDE). In previous research on deep supervised SSDE, low accuracies have often resulted from the mismatch between the training data (from known environments) and the test data (from unknown environments). By performing comparative experiments on a sufficient amount of data, we show that the few-shot relation network outperforms other competitors including eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and MultiLayer Perceptron (MLP). Hence it is possible to calibrate a microphone-equipped system, with a few labeled samples of audio recorded in a particular unknown environment to adjust and generalize our classifier to the possible input data and gain higher accuracies.
In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes Graph Convolutional Network (GCN) to generate node embeddings that predicts the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks.
Machine learning techniques have deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic with both industrial and academic interest. In this paper, we provide an up to date survey on AutoML. First, we introduce and define the AutoML problem, with inspiration from both realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but also can guide the design for new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underneath their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.