The physical and textural attributes of objects have been widely studied for recognition, detection and segmentation tasks in computer vision.~A number of datasets, such as large scale ImageNet, have been proposed for feature learning using data hungry deep neural networks and for hand-crafted feature extraction. To intelligently interact with objects, robots and intelligent machines need the ability to infer beyond the traditional physical/textural attributes, and understand/learn visual cues, called visual affordances, for affordance recognition, detection and segmentation. To date there is no publicly available large dataset for visual affordance understanding and learning. In this paper, we introduce a large scale multi-view RGBD visual affordance learning dataset, a benchmark of 47210 RGBD images from 37 object categories, annotated with 15 visual affordance categories. To the best of our knowledge, this is the first ever and the largest multi-view RGBD visual affordance learning dataset. We benchmark the proposed dataset for affordance segmentation and recognition tasks using popular Vision Transformer and Convolutional Neural Networks. Several state-of-the-art deep learning networks are evaluated each for affordance recognition and segmentation tasks. Our experimental results showcase the challenging nature of the dataset and present definite prospects for new and robust affordance learning algorithms. The dataset is publicly available at //sites.google.com/view/afaqshah/dataset.
In an era where scientific experiments can be very costly, multi-fidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and thus wishes to (i) maximize predictive power of the multi-fidelity emulator via a careful design of experiments, and (ii) ensure this model achieves a desired error tolerance with some notion of confidence. Existing design methods, however, do not jointly tackle objectives (i) and (ii). We propose a novel stacking design approach that addresses both goals. A multi-level reproducing kernel Hilbert space (RKHS) interpolator is first introduced to build the emulator, under which our stacking design provides a sequential approach for designing multi-fidelity runs such that a desired prediction error of $\epsilon > 0$ is met under regularity assumptions. We then prove a novel cost complexity theorem that, under this multi-level interpolator, establishes a bound on the computation cost (for training data simulation) needed to achieve a prediction bound of $\epsilon$. This result provides novel insights on conditions under which the proposed multi-fidelity approach improves upon a conventional RKHS interpolator which relies on a single fidelity level. Finally, we demonstrate the effectiveness of stacking designs in a suite of simulation experiments and an application to finite element analysis.
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. {While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and establish a concentration inequality for their empirical distribution function.} The results hold for arbitrary exchangeable scores, including {\it adaptive} ones that can use the covariates of the test+calibration samples at training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
This proposed model introduces novel deep learning methodologies. The objective here is to create a reliable intrusion detection mechanism to help identify malicious attacks. Deep learning based solution framework is developed consisting of three approaches. The first approach is Long-Short Term Memory Recurrent Neural Network (LSTM-RNN) with seven optimizer functions such as adamax, SGD, adagrad, adam, RMSprop, nadam and adadelta. The model is evaluated on NSL-KDD dataset and classified multi attack classification. The model has outperformed with adamax optimizer in terms of accuracy, detection rate and low false alarm rate. The results of LSTM-RNN with adamax optimizer is compared with existing shallow machine and deep learning models in terms of accuracy, detection rate and low false alarm rate. The multi model methodology consisting of Recurrent Neural Network (RNN), Long-Short Term Memory Recurrent Neural Network (LSTM-RNN), and Deep Neural Network (DNN). The multi models are evaluated on bench mark datasets such as KDD99, NSL-KDD, and UNSWNB15 datasets. The models self-learnt the features and classifies the attack classes as multi-attack classification. The models RNN, and LSTM-RNN provide considerable performance compared to other existing methods on KDD99 and NSL-KDD dataset
This note introduces an unsupervised learning algorithm to debug errors in finite element (FE) simulation models and details how it was productionised. The algorithm clusters degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix. The algorithm has been deployed as a tool called `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA (www.oasys-software.com/gsa). It has been used successfully by end-users for debugging real world FE models and we present examples of the tool in action.
Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to General Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state-of-the-art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure, and each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.
In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. Such selection rules can be complex in real-world data analyses, and failing to incorporate such restrictions could not only compromise the interpretability of the model but also lead to decreased prediction accuracy. However, no general framework has been proposed to formalize selection rules and their applications, which poses a significant challenge for practitioners seeking to integrate these rules into their analyses. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We demonstrate that all selection rules can be expressed as combinations of operations on constructs, facilitating the identification of the corresponding selection dictionary. Once this selection dictionary is derived, practitioners can apply their own user-defined criteria to select the optimal model. Additionally, our framework enhances existing penalized regression methods for variable selection by providing guidance on how to appropriately group variables to achieve the desired selection rule. Furthermore, our innovative framework opens the door to establishing new l0 norm-based penalized regression techniques that can be tailored to respect arbitrary selection rules, thereby expanding the possibilities for more robust and tailored model development.
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.