Despite the significant advances achieved in Artificial Neural Networks (ANNs), their design process remains notoriously tedious, relying primarily on intuition, experience, and trial and error. This human-dependent process is often time-consuming and error-prone. Furthermore, models are generally bound to their training contexts, with no consideration of their surrounding environments. Continual adaptiveness and automation of neural networks are of paramount importance in domains where model accessibility is limited after deployment (e.g., IoT devices and self-driving vehicles). Additionally, even accessible models require frequent post-deployment maintenance to overcome issues such as concept and data drift, which can be cumbersome and restrictive. By leveraging and combining approaches from Neural Architecture Search (NAS) and Continual Learning (CL), more robust and adaptive agents can be developed. This study presents the first extensive review of the intersection between NAS and CL, formalizing the prospective paradigm of Continually-Adaptive Neural Networks (CANNs) and outlining research directions for lifelong autonomous ANNs.
The Skolem problem is a long-standing open problem in linear dynamical systems: can a linear recurrence sequence (LRS) ever reach 0 from a given initial configuration? Similarly, the positivity problem asks whether the LRS stays positive from a given initial configuration. Deciding Skolem (or positivity) has been open for half a century: the best known decidability results are for LRS with special properties (e.g., low-order recurrences). However, these problems become easier for ``uninitialized'' variants, where the initial configuration is not fixed but can vary arbitrarily: checking whether there is an initial configuration from which the LRS stays positive can be decided in polynomial time (Tiwari in 2004, Braverman in 2006). In this paper, we consider problems that lie between the initialized and uninitialized variants. More precisely, we ask whether 0 (resp. negative numbers) can be avoided from every initial configuration in a neighbourhood of a given initial configuration. This can be considered a robust variant of the Skolem (resp. positivity) problem. We show that these problems lie at the frontier of decidability: if the neighbourhood is given as part of the input, then robust Skolem and robust positivity are Diophantine hard, i.e., solving either would entail a major breakthrough in Diophantine approximation, as is the case for (non-robust) positivity. However, if one asks whether such a neighbourhood exists, then the problems turn out to be decidable with PSPACE complexity. Our techniques also allow us to tackle robustness for ultimate positivity, which asks whether there is a bound on the number of steps after which the LRS remains positive. There are two variants, depending on whether we ask for a ``uniform'' bound on this number of steps. For the non-uniform variant, when the neighbourhood is open, the problem turns out to be tractable, even when the neighbourhood is given as input.
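To make these objects concrete, the following Python sketch (an illustration only, not the paper's decision procedure) generates LRS terms exactly with `fractions.Fraction` and searches for a zero up to a finite horizon. The helper names `lrs_terms` and `first_zero` are hypothetical. A bounded search can only exhibit a zero; it can never prove that none exists, which is precisely why Skolem is hard. The example recurrence $u_n = u_{n-1} - u_{n-2}$ is periodic with period 6: it hits 0 from $(1, 1)$, yet a tiny perturbation of the second initial value avoids 0 forever, the kind of fragility that the robust variants are meant to capture.

```python
from fractions import Fraction
from typing import List, Optional, Sequence


def lrs_terms(coeffs: Sequence[int], init: Sequence, n_terms: int) -> List[Fraction]:
    """First n_terms of u_n = a_1 u_{n-1} + ... + a_k u_{n-k}, computed exactly.

    coeffs = [a_1, ..., a_k]; init = [u_0, ..., u_{k-1}].
    """
    if len(coeffs) != len(init):
        raise ValueError("need exactly k initial values for an order-k recurrence")
    terms = [Fraction(x) for x in init]
    while len(terms) < n_terms:
        # terms[-1] is u_{n-1}, terms[-2] is u_{n-2}, and so on.
        terms.append(sum(a * terms[-1 - i] for i, a in enumerate(coeffs)))
    return terms[:n_terms]


def first_zero(coeffs: Sequence[int], init: Sequence, horizon: int) -> Optional[int]:
    """Index of the first zero among the first `horizon` terms, or None.

    A None result is NOT a proof that the sequence never reaches 0.
    """
    for n, u in enumerate(lrs_terms(coeffs, init, horizon)):
        if u == 0:
            return n
    return None


if __name__ == "__main__":
    # u_n = u_{n-1} - u_{n-2} is periodic with period 6, so a finite horizon is conclusive here.
    print(lrs_terms([1, -1], [1, 1], 8))                       # reaches 0 at n = 2
    print(first_zero([1, -1], [1, 1], 60))                     # 2
    print(first_zero([1, -1], [1, Fraction(1001, 1000)], 60))  # None
```

Exact rational arithmetic is used so that "equals 0" is a genuine test rather than a floating-point accident.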
Transport engineers employ various interventions to enhance traffic-network performance. Quantifying the impacts of one such intervention, Cycle Superhighways, is complicated by its non-random assignment over the transport network. Treatment effects on asymmetric and heavy-tailed distributions are better reflected at the extreme tails than at the median. We propose a novel method to estimate the treatment effect at the extreme tails that incorporates heavy-tailed features of the outcome distribution. Applying the proposed method to London transport data indicates that extreme traffic flow increased substantially after Cycle Superhighways came into operation.
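The abstract does not specify the estimator. As a hedged illustration of what "incorporating heavy-tailed features" can mean, the Python sketch below uses two standard extreme-value tools: the Hill estimator of the tail index and Weissman extrapolation of an extreme quantile. It then takes a naive treated-minus-control difference. All function names are hypothetical. This sketch does not adjust for the non-random assignment that the paper addresses, so it is not the authors' method.

```python
import math
from typing import Sequence


def hill_index(sample: Sequence[float], k: int) -> float:
    """Hill estimator of the extreme-value index gamma (= 1/alpha) from the top k order statistics."""
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs) or xs[k] <= 0:
        raise ValueError("need 0 < k < n and a positive (k+1)-th largest value")
    return sum(math.log(x) for x in xs[:k]) / k - math.log(xs[k])


def weissman_quantile(sample: Sequence[float], k: int, p: float) -> float:
    """Weissman extrapolation of the quantile exceeded with probability p (p may be below 1/n)."""
    xs = sorted(sample, reverse=True)
    gamma = hill_index(xs, k)
    return xs[k] * (k / (len(xs) * p)) ** gamma


def naive_extreme_qte(treated: Sequence[float], control: Sequence[float], k: int, p: float) -> float:
    """Difference of extrapolated extreme quantiles; NO adjustment for confounding."""
    return weissman_quantile(treated, k, p) - weissman_quantile(control, k, p)


if __name__ == "__main__":
    # Deterministic Pareto-like sample with tail index alpha = 2 (gamma = 0.5).
    control = [(1000 / i) ** 0.5 for i in range(1, 1001)]
    treated = [2.0 * x for x in control]              # hypothetical: upper tail scaled by 2
    print(hill_index(control, 100))                   # close to 0.5
    print(weissman_quantile(control, 100, 1e-3))      # exact Pareto quantile is 1000 ** 0.5
    print(naive_extreme_qte(treated, control, 100, 1e-3))
```

The choice of `k` (how many top observations count as "the tail") trades bias against variance and is a modelling decision in its own right.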
The recently proposed data augmentation method TransMix employs attention labels to help vision transformers (ViTs) achieve better robustness and performance. However, TransMix has two shortcomings: 1) its image cropping method may not be suitable for ViTs; 2) at the early stage of training, the model produces unreliable attention maps, and TransMix uses these unreliable maps to compute mixed attention labels, which can harm the model. To address these issues, we propose MaskMix and Progressive Attention Labeling (PAL) in image space and label space, respectively. In image space, we design MaskMix, which mixes two images based on a patch-like grid mask. In particular, the size of each mask patch is adjustable and is a multiple of the image patch size, which ensures that each image patch comes from only one image and contains more global content. In label space, we design PAL, which uses a progressive factor to dynamically re-weight the attention weights of the mixed attention label. Finally, we combine MaskMix and PAL into a new data augmentation method named MixPro. Experimental results show that our method improves various ViT-based models at different scales on ImageNet classification (73.8\% top-1 accuracy with DeiT-T trained for 300 epochs). After pre-training with MixPro on ImageNet, ViT-based models also show better transferability to semantic segmentation, object detection, and instance segmentation. Furthermore, compared with TransMix, MixPro shows stronger robustness on several benchmarks. The code is available at //github.com/fistyee/MixPro.
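The MaskMix idea can be sketched in plain Python as follows. The square image is tiled into cells whose side is an integer multiple (`scale`) of the ViT patch size. A fixed fraction of those cells is drawn at random from image B, and the rest are taken from image A. The function names, the exact-count sampling of cells, and the area-based label weight are assumptions made for illustration. PAL's progressive attention re-weighting is not modelled, and the reference implementation at the repository above may differ. The `patches_are_pure` check verifies the property the abstract highlights: no ViT patch straddles the two source images.

```python
import random
from typing import List, Optional

Mask = List[List[int]]


def grid_mask(img_size: int, patch_size: int, scale: int, ratio: float,
              seed: Optional[int] = None) -> Mask:
    """Binary mask (1 = pixel from image B) built from square cells of side scale * patch_size."""
    cell = patch_size * scale
    if img_size % cell:
        raise ValueError("img_size must be divisible by patch_size * scale")
    n = img_size // cell                                      # cells per side
    rng = random.Random(seed)
    chosen = rng.sample(range(n * n), round(ratio * n * n))  # cells taken from image B
    mask = [[0] * img_size for _ in range(img_size)]
    for idx in chosen:
        row, col = divmod(idx, n)
        for y in range(row * cell, (row + 1) * cell):
            for x in range(col * cell, (col + 1) * cell):
                mask[y][x] = 1
    return mask


def mix_images(img_a: List[List[float]], img_b: List[List[float]], mask: Mask) -> List[List[float]]:
    """Take img_b where mask is 1 and img_a elsewhere (single channel for brevity)."""
    return [[b if m else a for a, b, m in zip(ra, rb, rm)]
            for ra, rb, rm in zip(img_a, img_b, mask)]


def area_lambda(mask: Mask) -> float:
    """Fraction of pixels from image B: a plain area-based label weight (PAL not modelled)."""
    return sum(map(sum, mask)) / sum(len(r) for r in mask)


def patches_are_pure(mask: Mask, patch_size: int) -> bool:
    """True if every ViT patch lies entirely inside one source image."""
    size = len(mask)
    for y0 in range(0, size, patch_size):
        for x0 in range(0, size, patch_size):
            vals = {mask[y][x] for y in range(y0, y0 + patch_size)
                    for x in range(x0, x0 + patch_size)}
            if len(vals) > 1:
                return False
    return True


if __name__ == "__main__":
    # 224x224 image, 16-pixel ViT patches, 32-pixel mask cells (a 7x7 grid).
    mask = grid_mask(224, 16, 2, 0.5, seed=0)
    print(area_lambda(mask), patches_are_pure(mask, 16))  # purity check prints True
```

Because every cell boundary falls on a multiple of the patch size, the purity property holds by construction for any `scale >= 1`.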
The combination of Visual Guidance and Extended Reality (XR) technology has the potential to greatly improve the performance of human workforces in numerous areas, particularly industrial environments. Focusing on virtual assembly tasks and using different forms of supportive visualisations, this study investigates the potential of XR Visual Guidance. Conducted in a web-based immersive environment, the study draws on a heterogeneous pool of 199 participants. It was designed to differ substantially from previous exploratory studies, which yielded conflicting results on user performance and associated human factors. Our results clearly show the advantages of XR Visual Guidance, with reductions of over 50\% in both task completion time and the number of mistakes made; these benefits may be further enhanced and refined using specific frameworks and other forms of visualisation or Visual Guidance. By discussing the role of other factors, such as cognitive load, motivation, and usability, this paper also seeks to provide concrete avenues for future research and practical takeaways for practitioners.
Deep Attractor Network (DANet) is the state-of-the-art technique in the speech separation field; it uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed that uses a Bidirectional Gated Recurrent Unit (BGRU) network instead of BLSTM. A Gaussian Mixture Model (GMM) is used instead of k-means as the clustering algorithm in DANet to reduce complexity and increase learning speed and accuracy. The evaluation metrics are Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR), Signal-to-Artifact Ratio (SAR), and the Perceptual Evaluation of Speech Quality (PESQ) score. Two-speaker mixture datasets prepared from the TIMIT corpus were used to evaluate the proposed model, and the system achieved an SDR of 12.3 dB and a PESQ score of 2.94, both better than those of the original DANet model. The proposed model also reduced the number of parameters by 20.7% and the training time by 17.9%. When applied to mixed Arabic speech signals, the model achieved better results than on English.
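Part of the complexity saving comes from gate counts. An LSTM layer has four gated transforms (input, forget, cell candidate, output), while a GRU layer has three (update, reset, candidate). The Python sketch below counts the weights of one recurrent layer. The layer sizes are hypothetical, and the bias convention is exposed as a parameter, since frameworks differ: some use one bias vector per gate, others two.

```python
def lstm_params(input_dim: int, hidden: int, biases_per_gate: int = 1) -> int:
    """Weights of one unidirectional LSTM layer: 4 gates, each with input, recurrent and bias terms."""
    return 4 * (hidden * input_dim + hidden * hidden + biases_per_gate * hidden)


def gru_params(input_dim: int, hidden: int, biases_per_gate: int = 1) -> int:
    """Weights of one unidirectional GRU layer: 3 gates (update, reset, candidate)."""
    return 3 * (hidden * input_dim + hidden * hidden + biases_per_gate * hidden)


def bidirectional(n_params: int) -> int:
    """A bidirectional layer holds two independent copies (forward and backward)."""
    return 2 * n_params


if __name__ == "__main__":
    x, h = 129, 300  # hypothetical sizes: 129 spectrogram bins, 300 hidden units
    blstm = bidirectional(lstm_params(x, h))
    bgru = bidirectional(gru_params(x, h))
    print(blstm, bgru, 1 - bgru / blstm)  # 1032000 774000 0.25
```

Under either bias convention the per-layer ratio is exactly 3/4. The saving for a complete model also depends on its non-recurrent layers, so it need not equal 25%.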
This thesis studies Deep Neural Networks (DNNs) through the Lottery Ticket Hypothesis (LTH). The LTH posits that large DNNs contain smaller, trainable subnetworks, termed "winning tickets", that can achieve performance comparable to the full model. A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally removes the weights with the smallest magnitudes, emulating stepwise learning in DNNs. Once we identify these winning tickets, we further investigate their "universality": we check whether a winning ticket that works well for one specific problem also works well for other, similar problems. We also bridge the gap between IMP and Renormalisation Group (RG) theory in physics, promoting a more rigorous understanding of IMP.
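The IMP loop can be sketched in plain Python, assuming the common lottery-ticket variant: train, prune the smallest-magnitude fraction of the surviving weights, rewind the survivors to their initial values, and repeat. The thesis may use a different rewinding point or pruning schedule. Training is abstracted as a caller-supplied `train` callable, and the no-op trainer in the demo is a hypothetical stand-in.

```python
from typing import Callable, List, Tuple

Weights = List[float]
Mask = List[int]


def prune_smallest(weights: Weights, mask: Mask, frac: float) -> Mask:
    """Clear the mask for the `frac` fraction of surviving weights with the smallest |w|."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = round(frac * len(alive))
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init: Weights,
                                train: Callable[[Weights, Mask], Weights],
                                rounds: int, frac: float) -> Tuple[Mask, Weights]:
    """Train, prune, rewind surviving weights to their initial values, repeat."""
    mask = [1] * len(init)
    for _ in range(rounds):
        trained = train([w * m for w, m in zip(init, mask)], mask)
        mask = prune_smallest(trained, mask, frac)
    ticket = [w * m for w, m in zip(init, mask)]  # candidate "winning ticket" at initialization
    return mask, ticket


if __name__ == "__main__":
    def no_op_train(w: Weights, mask: Mask) -> Weights:
        # Hypothetical stand-in for real training: returns the weights unchanged.
        return w

    init = [0.1, -0.5, 0.3, 2.0, -0.05, 0.8, 0.01, -1.2]
    mask, ticket = iterative_magnitude_pruning(init, no_op_train, rounds=2, frac=0.5)
    print(mask)  # [0, 0, 0, 1, 0, 0, 0, 1]
```

Pruning a fixed fraction of the *surviving* weights per round gives a geometric sparsity schedule: after `r` rounds, a fraction of `(1 - frac) ** r` of the weights remains.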
Numerical methods for Inverse Kinematics (IK) iteratively apply linear approximations of the IK until the end-effector is brought from its initial pose to the desired final pose. These methods require computing the Jacobian of the Forward Kinematics (FK) and its inverse in the linear approximation of the IK. Despite the many successful implementations reported in the literature, Jacobian-based IK methods can still fail to preserve certain useful properties if an improper matrix inverse, e.g., the Moore-Penrose (MP) inverse, is employed for incommensurate robotic systems. In this paper, we propose a systematic, robust, and accurate numerical solution for the IK problem using the Mixed (MX) Generalized Inverse (GI), applied to any type of Jacobian (e.g., analytical, numerical, or geometric) derived for any commensurate or incommensurate robot. This approach is robust to whether the system is over-determined (fewer than 6 Degrees of Freedom (DoF)) or under-determined (more than 6 DoF). We investigate six robotic manipulators with various DoF to demonstrate that commonly used GIs fail to guarantee the same system behavior when the units are varied for incommensurate robotic manipulators. In addition, we evaluate the proposed methodology as a global IK solver and compare it against well-known IK methods for redundant manipulators. Based on the experimental results, we conclude that the right choice of GI is crucial for preserving certain properties of the system (i.e., unit consistency).
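The failure mode that motivates the MX inverse can be shown in a few lines of Python. For a single task-space coordinate, the Jacobian is a 1 x n row, and the MP (minimum-norm) solution is $\dot q = J^T (J J^T)^{-1} v$. The robot here is hypothetical: one prismatic joint and one revolute joint with a 0.5 m lever arm. Re-expressing the prismatic joint in millimetres instead of metres changes the MP solution itself, not just its units. Both solutions still produce the requested end-effector velocity. The MX generalized inverse is not implemented here.

```python
from typing import List, Sequence


def mp_joint_velocity(jac_row: Sequence[float], v: float) -> List[float]:
    """Minimum-norm solution of J qdot = v for a 1 x n Jacobian: J^T (J J^T)^{-1} v."""
    jjt = sum(j * j for j in jac_row)
    return [j * v / jjt for j in jac_row]


def task_velocity(jac_row: Sequence[float], qdot: Sequence[float]) -> float:
    """End-effector velocity produced by joint velocities qdot."""
    return sum(j * q for j, q in zip(jac_row, qdot))


if __name__ == "__main__":
    v = 1.0  # desired end-effector speed along one axis, in m/s
    # Hypothetical 2-joint robot: joint 1 prismatic, joint 2 revolute with a 0.5 m lever arm.
    J_metres = [1.0, 0.5]          # columns in m/m and m/rad
    J_millimetres = [1.0e-3, 0.5]  # same robot, prismatic joint measured in mm

    qdot_a = mp_joint_velocity(J_metres, v)        # [m/s, rad/s]
    qdot_mm = mp_joint_velocity(J_millimetres, v)  # [mm/s, rad/s]
    qdot_b = [qdot_mm[0] / 1000.0, qdot_mm[1]]     # converted back to [m/s, rad/s]

    print(qdot_a)  # approximately [0.8, 0.4]
    print(qdot_b)  # approximately [4e-06, 2.0]: a different motion for the same robot
    print(task_velocity(J_metres, qdot_a), task_velocity(J_metres, qdot_b))  # both about 1.0
```

A unit-consistent inverse would return the same physical motion in both cases. The MP inverse instead favours whichever joint happens to have the numerically larger Jacobian column.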
Wireless short-packet communications pose challenges to the security and reliability of transmission. A proactive warder, which both detects and interferes with potential transmissions, further compounds these challenges. Compared with a passive warder, a proactive warder introduces an extra jamming channel, so the analytical methods and results of existing works no longer apply. Thus, effective system design schemes are required for short-packet communications against a proactive warder. To address this issue, we analyze and design covert and reliable transmissions for such systems. Specifically, to investigate the reliability and covertness of the system, we derive the detection error probability at the warder and the decoding error probability at the receiver, both of which are affected by the transmit power and the jamming power. Furthermore, to maximize the effective throughput, we propose an optimization framework under reliability and covertness constraints. Numerical results verify the accuracy of the analytical results and the feasibility of the optimization framework. They show that the proactive warder changes the tradeoff between transmission reliability and covertness compared with a passive one. Moreover, a longer blocklength always improves the throughput for systems with optimized transmission rates. However, when transmission rates are fixed, the blocklength should be carefully designed, since the maximum blocklength is not optimal in this case.
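The paper's exact expressions are not given here. As a hedged sketch of how the decoding error probability depends on transmit power, jamming power and blocklength, the Python code below combines several assumptions. The received SINR treats jamming as additional noise (a hypothetical model without fading). The error probability is the widely used normal approximation for short packets over an AWGN channel, in its simplest form without the second-order log term. "Effective throughput" is taken as rate times packet success probability, one common definition. The warder's detection error probability and the covertness constraint are not modelled.

```python
import math

LOG2E = math.log2(math.e)


def q_func(x: float) -> float:
    """Gaussian tail function Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sinr(p_tx: float, gain: float, p_jam: float, jam_gain: float, noise: float) -> float:
    """Received SINR when the warder's jamming simply adds to the noise (hypothetical model)."""
    return p_tx * gain / (p_jam * jam_gain + noise)


def decoding_error(snr: float, rate: float, n: int) -> float:
    """Normal approximation of block error probability; rate in bits per channel use."""
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - (1.0 + snr) ** -2) * LOG2E ** 2
    return q_func((capacity - rate) * math.sqrt(n / dispersion))


def effective_throughput(snr: float, rate: float, n: int) -> float:
    """Bits per channel use, counting only correctly decoded packets."""
    return rate * (1.0 - decoding_error(snr, rate, n))


if __name__ == "__main__":
    gamma = sinr(p_tx=1.0, gain=1.0, p_jam=0.5, jam_gain=1.0, noise=0.5)  # 1.0, i.e. 0 dB
    for n in (50, 100, 200):
        print(n, decoding_error(gamma, 0.5, n), effective_throughput(gamma, 0.5, n))
```

For a fixed rate below capacity, the error probability in this model falls monotonically with `n`. The paper's finding that the maximum blocklength is not optimal at fixed rates therefore has to come from effects this sketch omits, such as the covertness constraint.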
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs on low-power devices with limited compute resources. Recent research improves DNN models by reducing memory requirements, energy consumption, and the number of operations without significantly decreasing accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. For each category, we analyze the accuracy, advantages, and disadvantages of the techniques, as well as potential solutions to their problems. We also discuss new evaluation metrics as a guideline for future research.
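As a concrete instance of category (1), the Python sketch below implements symmetric per-tensor uniform quantization: weights are mapped to signed 8-bit integers with a single scale factor. This is one common scheme among those a survey like this covers, not a specific method from the paper. Production deployments would typically rely on a framework's quantization tooling, which often adds per-channel scales, zero points and calibration.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(weights: Sequence[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed n_bits integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (n_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]


if __name__ == "__main__":
    w = [0.42, -1.27, 0.003, 0.9]
    q, s = quantize_symmetric(w)
    print(q, s, dequantize(q, s))
```

Storing 8-bit integers instead of 32-bit floats cuts weight memory by 4x. Rounding to the nearest level bounds the per-weight reconstruction error by half the scale.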
Automatic knowledge base (KB) completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared with the much-studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes than conventional KBs (ATOMIC has 18x more nodes than Freebase's FB15K-237). Importantly, this implies significantly sparser graph structures, a major challenge for existing KB completion methods that assume densely connected graphs over a relatively small set of nodes. In this paper, we present novel KB completion models that address these challenges by exploiting the structural and semantic context of nodes. Specifically, we investigate two key ideas: (1) learning from local graph structure, using graph convolutional networks and automatic graph densification, and (2) transfer learning from pre-trained language models to knowledge graphs for enhanced contextual representation of knowledge. We describe our method for incorporating information from both sources in a joint model and provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis of model predictions sheds light on the types of commonsense knowledge that language models capture well.
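Mean reciprocal rank (MRR), the ranking metric cited above, averages $1/\text{rank}$ of the correct answer over all test queries. The Python sketch below computes it from candidate scores. The helper names and the ATOMIC-style candidate tails and scores are hypothetical. Ties are resolved optimistically, counting only strictly higher scores. In the common "filtered" setting, other known true answers would be removed from the candidate set before ranking.

```python
from typing import Dict, Hashable, Sequence


def rank_of(true_item: Hashable, scores: Dict[Hashable, float]) -> int:
    """1-based rank of the true item: 1 + number of candidates scored strictly higher."""
    target = scores[true_item]
    return 1 + sum(1 for s in scores.values() if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all queries (1.0 means the answer is always ranked first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical query (PersonX goes to the store, xIntent, ?) with made-up model scores.
    scores = {"to buy food": 0.7, "to sleep": 0.9, "to eat": 0.2}
    r = rank_of("to buy food", scores)            # 2
    print(r, mean_reciprocal_rank([r, 1, 4]))
```

Papers often report MRR multiplied by 100, in which case "+1.5 points" corresponds to an increase of 0.015 on the raw scale.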