The currently dominant wearable body-motion sensor is the IMU. This work presents an alternative wearable motion-sensing approach: human body capacitance (HBC, also commonly described as the body-area electric field). While less robust for tracking posture and trajectory, HBC has two properties that make it attractive. First, the HBC approach does not require the sensing node to be placed on the body part being tracked. Second, HBC is sensitive to the body's interaction with its surroundings, including both touching and being in the immediate proximity of people and objects. We first describe the sensing principle of HBC, the sensor architecture and implementation, and the evaluation methods. We then present two case studies demonstrating the usefulness of HBC as a complement or alternative to IMUs. First, we explored exercise recognition and repetition counting for seven machine-free, leg-only exercises and eleven general gym workouts using HBC and IMU signals. HBC sensing shows significant advantages over IMU signals for the leg-only exercises in both classification (0.89 vs. 0.78 F-score) and counting (0.982 vs. 0.938 accuracy). For the general gym workouts, HBC improves recognition only for certain workouts, such as the adductor exercise, where the legs alone complete the movement; however, it also outperforms the IMU for workout counting (0.800 vs. 0.756 accuracy with wrist-worn sensors). In the second case study, we recognized actions related to manipulating objects and physical collaboration between users with a wrist-worn HBC sensing unit. We detected collaboration between users with an F-score of 0.69 when receiving data from a single user and 0.78 when receiving data from both users. The capacitive sensor improves the F-score for recognizing collaborative activities by 16\% over a single-wrist-accelerometer approach.
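The abstract does not specify how repetitions are counted; the sketch below shows one generic way to count repetitions from a quasi-periodic 1-D capacitance-like signal via detrending, smoothing, and peak detection. The sampling rate, smoothing windows, and peak thresholds are illustrative assumptions, not the paper's pipeline.

```python
# Hypothetical repetition counter for a 1-D body-capacitance signal.
# fs, window sizes, and thresholds are illustrative assumptions.
import numpy as np
from scipy.signal import find_peaks

def count_repetitions(hbc_signal, fs=100, min_rep_period_s=1.0):
    """Count repetitions as prominent peaks in the smoothed, detrended signal."""
    x = np.asarray(hbc_signal, dtype=float)
    x = x - np.convolve(x, np.ones(fs) / fs, mode="same")            # remove slow drift
    x = np.convolve(x, np.ones(fs // 10) / (fs // 10), mode="same")  # light smoothing
    peaks, _ = find_peaks(
        x,
        distance=int(min_rep_period_s * fs),  # at most one rep per assumed period
        prominence=0.5 * np.std(x),           # ignore small ripples
    )
    return len(peaks)

# Toy usage: a synthetic 20-repetition, squat-like oscillation plus noise.
t = np.arange(0, 40, 1 / 100)
toy = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.randn(t.size)
print(count_repetitions(toy))  # ~20
```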
Vision transformers (ViTs) are quickly becoming the de facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
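For context, a bare-bones activation-maximization probe of a single ViT unit might look like the sketch below; the paper's contribution is precisely in handling the ViT-specific obstacles this naive version ignores. The model name, block index, neuron index, and optimization settings are illustrative assumptions (requires torch and timm; downloads pretrained weights).

```python
# Generic activation maximization for one channel of one ViT block (a sketch,
# not the paper's method). Assumed: timm's vit_base_patch16_224, block 6, channel 42.
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

features = {}
def hook(_m, _inp, out):          # cache the chosen block's token activations (B, N, C)
    features["act"] = out
layer_idx, neuron_idx = 6, 42     # arbitrary block / channel to probe
model.blocks[layer_idx].register_forward_hook(hook)

img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    model(img)
    # maximize the mean activation of one channel across all patch tokens
    loss = -features["act"][0, :, neuron_idx].mean()
    loss.backward()
    opt.step()
# `img` now roughly shows an input pattern that excites the chosen unit.
```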
Currently, mobile robots are developing rapidly and finding numerous applications in industry. However, several problems remain that hinder their practical use, such as the need for expensive hardware and high power consumption. In this study, we build a low-cost indoor mobile robot platform that does not include a LiDAR or a GPU. We then design an autonomous navigation architecture that guarantees real-time performance on our platform with an RGB-D camera and a low-end, off-the-shelf single-board computer. The overall system includes SLAM, global path planning, ground segmentation, and motion planning. The proposed ground segmentation approach extracts a traversability map from raw depth images for the safe driving of low-body mobile robots. We apply both rule-based and learning-based navigation policies using the traversability map. While simultaneously running sensor data processing and the other autonomous driving components, our navigation policies generate control commands at a refresh rate of 18 Hz, whereas other systems have slower refresh rates. As shown in 3D simulation tests, our methods outperform current state-of-the-art navigation approaches under limited computational resources. In addition, we demonstrate the applicability of our mobile robot system through successful autonomous driving in an indoor environment.
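As a rough illustration of turning a raw depth image into a traversability grid, the sketch below back-projects pixels with pinhole intrinsics and marks cells whose points lie near the floor plane as traversable. The paper's ground-segmentation method is more involved; the intrinsics, camera height, thresholds, and grid layout here are assumptions for illustration only.

```python
# Minimal traversability-grid sketch from one depth image (not the paper's algorithm).
import numpy as np

def traversability_map(depth, fx, fy, cx, cy, cam_height=0.2,
                       max_obstacle=0.05, cell=0.05, grid_size=100):
    """depth: HxW in metres, camera facing forward; returns 0=unknown, 1=free, 2=blocked."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth                              # forward distance
    x = (u - cx) * z / fx                  # right
    y = (v - cy) * z / fy                  # down (image convention)
    height = cam_height - y                # point height above the floor
    valid = z > 0
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    gx = (x[valid] / cell + grid_size / 2).astype(int)
    gz = (z[valid] / cell).astype(int)
    inside = (gx >= 0) & (gx < grid_size) & (gz >= 0) & (gz < grid_size)
    traversable = np.abs(height[valid]) < max_obstacle       # points close to the floor
    grid[gz[inside], gx[inside]] = np.where(traversable[inside], 1, 2)
    return grid
```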
A critical problem in post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, SHAP, Occlusion, Vanilla Gradients, Gradients x Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods which demonstrates that no single method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model. We empirically validate these theoretical results using various real-world datasets, model classes, and prediction tasks. By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
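To make the shared objective concrete, the sketch below fits a weighted-least-squares linear surrogate to a black-box function over a sampled neighbourhood of the input, which is the basic "local function approximation" recipe the paper shows all eight methods instantiate. The Gaussian neighbourhood and squared loss chosen here are roughly C-LIME-style and are illustrative, not the paper's exact definitions.

```python
# Local linear surrogate of a black-box f around a point x (illustrative sketch).
import numpy as np

def local_linear_explanation(f, x, sigma=0.1, n_samples=2000, seed=0):
    """Return per-feature weights of a linear surrogate g(z) ~ f(z) near x."""
    rng = np.random.default_rng(seed)
    Z = x + sigma * rng.standard_normal((n_samples, x.size))   # sampled neighbourhood
    y = np.array([f(z) for z in Z])                            # black-box queries
    A = np.hstack([Z - x, np.ones((n_samples, 1))])            # centred features + bias
    w, *_ = np.linalg.lstsq(A, y, rcond=None)                  # squared-loss fit
    return w[:-1]                                              # attributions

# Toy usage: explain a nonlinear scalar function around a point.
f = lambda z: np.sin(z[0]) + z[1] ** 2
print(local_linear_explanation(f, np.array([0.5, 1.0])))       # ~[cos(0.5), 2.0]
```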
In this essay, I present the advantages and, I dare say, the beauty of programming in a language with set-theoretic types, that is, types that include union, intersection, and negation type connectives. I show through several examples how set-theoretic types are necessary to type some common programming patterns, but also how they play a key role in typing several language constructs, from branching and pattern matching to function overloading and type-cases, very precisely. I start by presenting the theory of types known as semantic subtyping and extend it to include polymorphic types. Next, I discuss the design of languages that use these types. I begin by defining a theoretical framework that covers all the examples given in the first part of the presentation. Since the system of this framework cannot be effectively implemented, I then describe three effective restrictions of the system: (i) a polymorphic language with explicitly-typed functions, (ii) an implicitly-typed polymorphic language à la Hindley-Milner, and (iii) a monomorphic language that, by implementing classic union-elimination, precisely reconstructs intersection types for functions and implements a very general form of occurrence typing. I conclude the presentation with a short overview of other aspects of these languages, such as pattern matching, gradual typing, and denotational semantics.
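Python's union annotations and isinstance-based narrowing give only a small taste of two of the ideas above, union connectives and occurrence typing (a type checker refines the type of `v` in each branch). Full set-theoretic types also include intersection and negation, which Python's type system cannot express, so this is merely an approximation for flavour (Python 3.10+).

```python
# Union types plus isinstance narrowing: a rough analogue of occurrence typing.
def pretty(v: int | str | list[int]) -> str:
    if isinstance(v, int):         # here the checker sees v : int
        return f"{v:+d}"
    if isinstance(v, str):         # here v : str
        return v.upper()
    return ", ".join(map(str, v))  # remaining case: v : list[int]

print(pretty(3), pretty("ok"), pretty([1, 2, 3]))
```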
Citation counts are widely used as indicators of research quality, both to support or replace human peer review and in lists of top-cited papers, researchers, and institutions. Nevertheless, the extent to which citation counts reflect research quality is not well understood. We report the largest-scale evaluation of the relationship between research quality and citation counts, correlating them for 87,739 journal articles in 34 field-based Units of Assessment (UoAs) from the UK. We show that the two correlate positively in all academic fields examined, ranging from very weak (0.1) to strong (0.5). The highest correlations are in the health, life, and physical sciences, and the lowest are in the arts and humanities. The patterns are similar for the field classification schemes of Scopus and Dimensions.ai. We also show that there is no citation threshold in any field beyond which all articles are of excellent quality, so lists of top-cited articles are not definitive collections of excellence. Moreover, log-transformed citation counts have a close-to-linear relationship with UK research quality ranked scores that is shallow in some fields but steep in others. In conclusion, whilst appropriately field-normalised citations associate positively with research quality in all fields, they never perfectly reflect it, even at very high values.
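Purely as an illustration of the two quantities reported above, the snippet below computes a rank correlation between citation counts and quality scores and a linear fit of quality against log-transformed citations for one field. The data are synthetic stand-ins, not the UK assessment data used in the paper.

```python
# Illustrative only: rank correlation and log-linear fit on synthetic data.
import numpy as np
from scipy.stats import spearmanr

quality = np.random.randint(1, 5, size=500)            # toy 1-4 quality scores
citations = np.random.poisson(lam=3.0 * quality)       # toy positive association

rho, p = spearmanr(citations, quality)
slope, intercept = np.polyfit(np.log1p(citations), quality, deg=1)
print(f"Spearman rho={rho:.2f} (p={p:.3g}), log-linear slope={slope:.2f}")
```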
As an important contributor to GDP growth, the construction industry is suffering from a labor shortage due to population ageing, the COVID-19 pandemic, and harsh working environments. Considering the complexity and dynamics of the construction environment, developing fully automated robots remains challenging. For the foreseeable future, workers and robots will coexist and collaborate to build or maintain facilities efficiently. As an emerging field, human-robot collaboration (HRC) still faces various open problems. To this end, this pioneering research introduces an agent-based modeling approach to investigate the coupling effect and scale effect of HRC in the bricklaying process. Through multiple simulation-based experiments, the dynamic and complex nature of HRC is illustrated in two respects: 1) agents in HRC are interdependent due to the human factors of workers, the features of robots, and their collaboration behaviors; and 2) different parameters of HRC are correlated and have significant impacts on construction productivity (CP). Unexpectedly and interestingly, we discover that HRC has a scale effect on CP: increasing the number of collaborating human-robot teams leads to higher CP even if the human-robot ratio remains unchanged. Overall, we argue that more investigations of HRC are needed for efficient construction, occupational safety, and related concerns, and that this research can serve as a stepping stone for developing and evaluating new robots, optimizing HRC processes, and even training future industrial workers in the construction industry.
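The toy skeleton below sketches what an agent-based bricklaying simulation of this kind might look like: each human-robot team alternates robot bricklaying with waits for human assistance, and productivity is measured in bricks per time step. All rates and delays are made-up parameters, and the sketch does not attempt to reproduce the paper's human-factor modeling or its scale-effect finding.

```python
# Toy agent-based HRC skeleton (illustrative parameters only).
import random

def simulate(num_teams, steps=1000, robot_rate=0.8, human_delay_p=0.1, seed=0):
    random.seed(seed)
    bricks = 0
    waiting = [0] * num_teams            # steps each robot still waits for its worker
    for _ in range(steps):
        for t in range(num_teams):
            if waiting[t] > 0:           # robot idle while the worker assists
                waiting[t] -= 1
            elif random.random() < human_delay_p:
                waiting[t] = random.randint(1, 5)   # e.g. mortar supply, inspection
            elif random.random() < robot_rate:
                bricks += 1              # robot lays a brick this step
    return bricks / steps                # productivity: bricks per time step

for teams in (1, 2, 4, 8):
    print(teams, round(simulate(teams), 2))
```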
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing the strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e., between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index, a graph-theoretical characteristic that we define as the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and markedly outperforms alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
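One concrete reading of the walk-index quantity is sketched below: count length-L walks that start at partition-boundary vertices (vertices incident to an edge crossing the partition) via powers of the adjacency matrix. The paper's formal definition and the WIS algorithm involve details this sketch omits; the walk length and example graph are arbitrary choices.

```python
# Count length-L walks originating from the boundary of a vertex partition
# (one possible reading of the "walk index"; requires networkx and numpy).
import numpy as np
import networkx as nx

def walk_index(G, side, L=3):
    nodes = list(G.nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    A = nx.to_numpy_array(G, nodelist=nodes)
    side = set(side)
    crossing = [(u, v) for u, v in G.edges if (u in side) != (v in side)]
    boundary = {u for u, v in crossing} | {v for u, v in crossing}
    walks_from = np.linalg.matrix_power(A, L).sum(axis=1)  # length-L walks per start vertex
    return sum(walks_from[idx[b]] for b in boundary)

G = nx.erdos_renyi_graph(10, 0.3, seed=1)
print(walk_index(G, side=range(5)))
```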
We propose a monitoring strategy for the efficient and robust estimation of disease prevalence and case numbers within closed and enumerated populations such as schools, workplaces, or retirement communities. The proposed design relies largely on voluntary testing, which is notoriously biased (e.g., in the case of COVID-19) due to non-representative sampling. The approach yields unbiased and comparatively precise estimates with no assumptions about the factors underlying the selection of individuals for voluntary testing, by building on the strength of what can be a small random sampling component. This component unlocks a previously proposed "anchor stream" estimator, a well-calibrated alternative to classical capture-recapture (CRC) estimators based on two data streams. We show here that this estimator is equivalent to a direct standardization based on "capture", i.e., selection (or not) by the voluntary testing program, made possible by a key parameter identified by design. This equivalence simultaneously allows for novel two-stream CRC-like estimation of general means (e.g., of continuous variables such as antibody levels or biomarkers). For inference, we propose adaptations of a Bayesian credible interval when estimating case counts and of bootstrapping when estimating means of continuous variables. We use simulations to demonstrate significant precision benefits relative to random sampling alone.
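The sketch below is one hedged reading of the "direct standardization based on capture" idea: stratify the enumerated population by whether each member was selected by the voluntary program, take the exactly observed prevalence in the captured stratum, estimate the uncaptured stratum from the random sampling component, and weight by the known stratum sizes. It is an illustrative reconstruction, not the paper's anchor stream estimator or its inference procedure.

```python
# Hedged sketch of prevalence estimation via standardization over capture status.
import numpy as np

def standardized_prevalence(volunteered, tested_random, positive):
    """Boolean arrays over the enumerated population; `positive` is observed
    only where `volunteered` or `tested_random` is True."""
    p_captured = positive[volunteered].mean()              # observed exactly
    uncaptured_sampled = tested_random & ~volunteered
    p_uncaptured = positive[uncaptured_sampled].mean()     # from the random component
    w = volunteered.mean()                                 # known capture proportion
    return w * p_captured + (1 - w) * p_uncaptured

# Toy population where volunteering is biased toward positives.
rng = np.random.default_rng(0)
N = 10_000
positive = rng.random(N) < 0.05
volunteered = rng.random(N) < np.where(positive, 0.6, 0.2)   # biased self-selection
tested_random = rng.random(N) < 0.03                         # small random component
print(standardized_prevalence(volunteered, tested_random, positive))  # ~0.05
```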
In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation, and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly to alleviate the difficulty of effectively training a holistic network. We design a self-supervised image-warping correspondence loss for both feature detection and matching, a weakly supervised epipolar-constraint loss for relative camera pose learning, and a directional matching scheme that detects keypoint features in a source image and performs a coarse-to-fine correspondence search in the target image. We leverage this framework to enforce cycle consistency in our matching module. In addition, we propose a new loss to robustly handle both definite inlier/outlier matches and less certain matches. The integration of these learning mechanisms enables end-to-end training of a single network that performs all three localization components. Benchmarking our approach on public datasets shows that such an end-to-end framework yields more accurate localization, outperforming both traditional methods and state-of-the-art weakly supervised methods.
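As a generic illustration of the cycle-consistency idea in feature matching, the sketch below matches source descriptors to the target, matches back, and keeps matches that return close to their starting keypoint. It is a hard nearest-neighbour stand-in, not the paper's directional coarse-to-fine scheme or its losses (a differentiable soft-matching variant would be needed for training).

```python
# Forward-backward (cycle) consistency check for descriptor matching (requires torch).
import torch

def cycle_consistent_matches(desc_src, desc_tgt, kpts_src, max_cycle_px=2.0):
    """desc_*: (N, D) L2-normalized descriptors; kpts_src: (N, 2) pixel coords."""
    sim = desc_src @ desc_tgt.T                  # cosine similarity matrix
    fwd = sim.argmax(dim=1)                      # source -> target matches
    bwd = sim.argmax(dim=0)                      # target -> source matches
    returned = kpts_src[bwd[fwd]]                # where each source point comes back to
    cycle_err = (returned - kpts_src).norm(dim=1)
    keep = cycle_err < max_cycle_px              # cycle-consistent matches
    return fwd, keep, cycle_err.mean()           # mean error as a consistency score

# Toy usage with random descriptors and keypoints.
d_src = torch.nn.functional.normalize(torch.randn(100, 128), dim=1)
d_tgt = torch.nn.functional.normalize(torch.randn(120, 128), dim=1)
k_src = torch.rand(100, 2) * 640
print(cycle_consistent_matches(d_src, d_tgt, k_src)[2])
```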
Deep neural network architectures have traditionally been designed and explored with human expertise in a long-lasting trial-and-error process. This process requires a huge amount of time, expertise, and resources. To address this tedious problem, we propose a novel algorithm to automatically find optimal hyperparameters of a deep network architecture. We specifically focus on designing neural architectures for the medical image segmentation task. Our proposed method is based on policy gradient reinforcement learning, in which the reward function is a segmentation evaluation metric (i.e., the Dice index). We show the efficacy of the proposed method and its low computational cost in comparison with state-of-the-art medical image segmentation networks. We also present a new architecture design, a densely connected encoder-decoder CNN, as a strong baseline architecture on which to apply the proposed hyperparameter search algorithm. We apply the proposed algorithm to each layer of the baseline architecture. As an application, we train the proposed system on cine cardiac MR images from the Automated Cardiac Diagnosis Challenge (ACDC) at MICCAI 2017. Starting from the baseline segmentation architecture, the resulting network achieves state-of-the-art accuracy without any trial-and-error architecture design or close supervision of hyperparameter changes.
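A minimal REINFORCE-style skeleton of this kind of search is sketched below: a categorical policy samples one hyperparameter choice per layer, the sampled architecture is trained and scored, and the Dice index serves as the reward. The layer count, choice set, and `train_and_evaluate_dice` surrogate are hypothetical placeholders, not the paper's controller, architecture, or training pipeline.

```python
# REINFORCE over per-layer hyperparameter choices with a Dice-style reward (sketch).
import torch

NUM_LAYERS, CHOICES = 8, [16, 32, 64, 128]          # e.g. filters per layer (assumed)
logits = torch.zeros(NUM_LAYERS, len(CHOICES), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)
baseline = 0.0                                      # running reward baseline

def train_and_evaluate_dice(filters_per_layer):
    # Stand-in for: build the encoder-decoder, train on ACDC, return validation Dice.
    # A toy surrogate so the sketch runs end to end.
    return 0.5 + 0.4 * (sum(filters_per_layer) / (NUM_LAYERS * max(CHOICES)))

for step in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                         # one choice index per layer
    arch = [CHOICES[a] for a in actions.tolist()]
    reward = train_and_evaluate_dice(arch)          # "Dice" in [0, 1]
    baseline = 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * dist.log_prob(actions).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```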