This article argues that the Situation Theory of Barwise and the Channel Theory of Barwise and Seligman can serve as a general framework for Imperfect Information Management. Imperfections such as uncertainty, imprecision, vagueness, incompleteness, inconsistency, and context-dependency are handled remarkably well by the human brain, whereas basic approaches such as probability theory and standard logic are intrinsically limited in modeling fallible minds. Generalized probability and nonstandard logic theories have epistemological motivations to provide better models of information integration in cognitive agents; among them, possibility theory and probabilistic logic are the strongest approaches. Based on a review of different approaches to Imperfect Information Management, I argue that Situation Theory and Channel Theory provide a good framework for it. These theories rely on a powerful, epistemologically grounded notion of information that accommodates partiality. They offer a proper approach to context modeling for handling common knowledge and incomplete information, and they clearly distinguish belief from knowledge in order to model the non-monotonic and dynamic nature of knowledge. They also separate the logic of the world from the flow of information in the mind. The objectification process in these theories reveals the nature of default or probabilistic rules in perception, and the concept of a channel can represent reasoning mechanisms that move from one model or logic to another. The imprecision of our perceptions causes fuzziness in reasoning and vagueness in communication, which can be represented by suitable classifications connected by channels. Like a network, this framework is scalable and open, able to cover different models based on a relativistic notion of truth.
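To make the classification-and-channel machinery concrete, here is a minimal Python sketch (the encoding and all names are mine, not the article's): a Barwise-Seligman classification pairs tokens with the types they support, and an infomorphism is a contravariant pair of maps between classifications that preserves the support relation; channels are then built from such infomorphisms.

```python
# Minimal sketch of Barwise-Seligman classifications and infomorphisms
# (illustrative only; encoding and names are my own, not the article's).

class Classification:
    """A classification: tokens classified by the types they support."""
    def __init__(self, supports):
        self.supports = supports                 # dict: token -> set of types

    def holds(self, token, typ):                 # token |= typ
        return typ in self.supports.get(token, set())

def is_infomorphism(A, B, f_type, f_tok):
    """Check the defining infomorphism condition between A and B:
    f_type maps A's types to B's types, f_tok maps B's tokens back to A's,
    and for all b, t:  f_tok(b) |=_A t  iff  b |=_B f_type(t)."""
    types_A = set().union(*A.supports.values())
    return all(A.holds(f_tok(b), t) == B.holds(b, f_type(t))
               for b in B.supports for t in types_A)

# Two toy classifications linked by a channel-like mapping:
A = Classification({'pavement': {'wet'}})
B = Classification({'sky': {'raining'}})
print(is_infomorphism(A, B, {'wet': 'raining'}.get, {'sky': 'pavement'}.get))  # True
```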
In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, in terms of both accuracy and computational efficiency. Our study also demonstrates that the choice of kernel has a substantial impact on performance, and that kernel-learning variants are competitive alternatives to fixed-kernel Transformers on both long- and short-sequence tasks.
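The quadratic-to-linear reduction can be illustrated with a short sketch: once the attention kernel is approximated by a dot product of feature maps, the n-by-n attention matrix never needs to be formed. The NumPy code below uses a generic random-Fourier-feature construction (an illustrative stand-in, not the paper's exact parameterization); learning the spectral samples W end-to-end is the essence of learning the kernel via its spectral distribution.

```python
import numpy as np

def linear_attention(Q, K, V, phi, eps=1e-6):
    """Approximate softmax(Q K^T) V as phi(Q) (phi(K)^T V), normalized.
    Cost is O(n) in sequence length: no n x n matrix is ever formed."""
    Qf, Kf = phi(Q), phi(K)                      # (n, m) feature maps
    KV = Kf.T @ V                                # (m, d)
    Z = Qf @ Kf.sum(axis=0) + eps                # (n,) normalizer
    return (Qf @ KV) / Z[:, None]

def random_fourier_features(W):
    """Feature map from spectral samples W ~ p(w); learning p(w) is the
    'learn the kernel through its spectral distribution' idea."""
    m = W.shape[0]
    def phi(X):
        proj = X @ W.T                           # (n, m) random projections
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(m)
    return phi

rng = np.random.default_rng(0)
n, d, m = 128, 16, 64
Q, K, V = rng.normal(size=(3, n, d))
phi = random_fourier_features(rng.normal(size=(m, d)))  # Gaussian spectral dist.
out = linear_attention(Q, K, V, phi)             # (n, d), linear-time attention
```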
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. Because our method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood and a Bayesian approach based on Markov chain Monte Carlo. Through three case studies, we provide a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of the model parameters, using independent observed data. Our approach has a computational cost comparable to that of analyses that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid understanding of the sources of heterogeneity in biological data.
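To make the moment-matching idea concrete, here is a minimal sketch (the model, the Monte Carlo moment estimator, and the Gaussian moment approximation are illustrative choices, not the paper's exact construction): a parameter varies across individuals according to a distribution, the implied mean and variance of the solution are estimated, and the sample moments of the data enter an approximate Gaussian likelihood.

```python
import numpy as np
from scipy.integrate import odeint

def logistic(x, t, r):
    # dimensionless logistic growth; the rate r varies between individuals
    return r * x * (1 - x)

def model_moments(mu, sigma, ts, n_draws=500, seed=0):
    """Monte Carlo estimate of the model-implied mean/variance of x(t)
    when r ~ Normal(mu, sigma^2) across individuals."""
    rng = np.random.default_rng(seed)
    rs = rng.normal(mu, sigma, n_draws)
    sols = np.stack([odeint(logistic, 0.05, ts, args=(r,))[:, 0] for r in rs])
    return sols.mean(axis=0), sols.var(axis=0)

def approx_loglik(theta, ts, sample_mean, sample_var, n):
    """Moment-matching likelihood: sample mean ~ N(m, v/n) and sample
    variance ~ N(v, 2 v^2 / (n - 1)), a standard Gaussian approximation."""
    mu, sigma = theta
    m, v = model_moments(mu, sigma, ts)
    v = np.maximum(v, 1e-9)                  # guard against degenerate variance
    ll = -0.5 * np.sum((sample_mean - m) ** 2 / (v / n) + np.log(v / n))
    ll += -0.5 * np.sum((sample_var - v) ** 2 / (2 * v**2 / (n - 1)))
    return ll
```

Profile likelihood then fixes one hyperparameter and maximizes this approximate log-likelihood over the rest; the same function can serve as the target of an MCMC sampler in the Bayesian workflow.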
Providing security guarantees for embedded devices with limited interface capabilities is an increasingly crucial task. Although these devices do not have traditional interfaces, they still emit unintentional electromagnetic signals that correlate with the instructions being executed. By collecting these traces using our methodology and training a machine learning model with a random forest algorithm, we built an EM side-channel-based instruction-level disassembler. The disassembler was tested on an Arduino UNO board, yielding an instruction-recognition accuracy of 88.69% for traces from twelve instructions captured at a single location on the device; this improves on the 75.6% (for twenty instructions) reported in similar previous work.
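The classification step itself is standard supervised learning; a hedged sketch of that step (file names and trace shapes are placeholders, and the paper's trace-collection methodology and preprocessing are the substantive contributions not reproduced here) might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: one fixed-length EM trace per executed instruction,
# labeled with its opcode (12 instruction classes, as in the abstract).
X = np.load('em_traces.npy')    # shape (n_traces, n_samples_per_trace)
y = np.load('labels.npy')       # shape (n_traces,)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_tr, y_tr)
print('accuracy:', accuracy_score(y_te, clf.predict(X_te)))
```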
Several types of dependencies have been proposed for the static analysis of existential rule ontologies, promising insights about computational properties and possible practical uses of a given set of rules, e.g., in ontology-based query answering. Unfortunately, these dependencies are rarely implemented, so their potential is hardly realised in practice. We focus on two kinds of rule dependencies -- positive reliances and restraints -- and design and implement optimised algorithms for their efficient computation. Experiments on real-world ontologies with more than 100,000 rules show the scalability of our approach, which lets us realise several previously proposed applications as practical case studies. In particular, we can analyse the extent to which bottom-up rule-based reasoning approaches can be guaranteed to yield redundancy-free "lean" knowledge graphs (so-called cores) on practical ontologies.
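As a rough intuition for positive reliance (the actual definition and the paper's optimised algorithms are considerably more subtle, since the triggering rule application must be genuinely new), a coarse pre-filter can be sketched as follows: rule r2 may positively rely on r1 only if some head atom of r1 unifies with some body atom of r2.

```python
# Coarse over-approximation of positive reliance between existential rules.
# Illustrative pre-filter only; not the paper's algorithm.

def unify(a1, a2):
    """Unify two atoms (pred, args); variables are strings starting with '?'.
    Simplified one-level substitution, sufficient for flat atoms."""
    subst = {}
    (pred1, args1), (pred2, args2) = a1, a2
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    for t1, t2 in zip(args1, args2):
        t1, t2 = subst.get(t1, t1), subst.get(t2, t2)
        if t1 == t2:
            continue
        if t1.startswith('?'):
            subst[t1] = t2
        elif t2.startswith('?'):
            subst[t2] = t1
        else:
            return None                        # clash of two constants
    return subst

def may_positively_rely(r1, r2):
    """r1, r2 are (body, head) pairs of atom lists: r2 may positively rely
    on r1 only if some head atom of r1 unifies with some body atom of r2."""
    _, head1 = r1
    body2, _ = r2
    return any(unify(h, b) is not None for h in head1 for b in body2)

# Person(x) -> ∃y. hasParent(x, y)    and    hasParent(u, v) -> Person(v)
r1 = ([('Person', ('?x',))], [('hasParent', ('?x', '?y'))])
r2 = ([('hasParent', ('?u', '?v'))], [('Person', ('?v',))])
print(may_positively_rely(r1, r2))   # True: r2 may fire on r1's output
```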
In popular media, a connection is often drawn between the advent of awareness in artificial agents and those same agents simultaneously achieving human- or superhuman-level intelligence. In this work, we explore the validity and potential application of this seemingly intuitive link between consciousness and intelligence. We do so by examining the cognitive abilities associated with three contemporary theories of conscious function: Global Workspace Theory (GWT), Information Generation Theory (IGT), and Attention Schema Theory (AST). We find that all three theories specifically relate conscious function to some aspect of domain-general intelligence in humans. With this insight, we turn to the field of Artificial Intelligence (AI) and find that, while still far from demonstrating general intelligence, many state-of-the-art deep learning methods have begun to incorporate key aspects of each of the three functional theories. Having identified this trend, we use the motivating example of mental time travel in humans to propose ways in which insights from each of the three theories may be combined into a single unified and implementable model. Because mental time travel is made possible by cognitive abilities underlying each of the three functional theories, artificial agents capable of it would not only possess greater general intelligence than current approaches but would also be more consistent with our current understanding of the functional role of consciousness in humans, making it a promising near-term goal for AI research.
To address the problem that the information volume defined on the basis of information entropy is not well suited to the development and practical application of information systems, this paper constructs simple models to explain the definition of objective information based on objective information theory. It analyzes and discusses three important components of the sextuple model of information, namely the subject, the existence time, and the content of information, and proves that the information-volume formula expressed by information entropy is only a special case of the definition of information volume based on objective information theory. According to the combination and decomposition properties of information, the information volume under various combination conditions is studied. The volume additivity of atomic information combinations is proposed and proved, which provides an important basis for studying the volume of quantum information. In addition, the information volume of a single quantum carrier is derived from the Margolus-Levitin theorem, and the relationship between information volume and matter, energy, and time is established using Einstein's mass-energy relation. The information volume that the universe may have carried to date is estimated and verified, showing that the definition of information volume based on objective information theory can reveal the relationships among the three elements of the objective world more accurately and has greater application significance.
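The physical bounds invoked here are easy to sanity-check numerically. Below is a back-of-the-envelope sketch combining the Margolus-Levitin limit with the mass-energy relation (standard physical constants only; the paper's own volume formulas are not reproduced):

```python
import math

# Margolus-Levitin: a system of average energy E performs at most
# 2E / (pi * hbar) elementary state transitions per second; with E = m c^2,
# this ties information processing directly to mass.
hbar = 1.054571817e-34   # reduced Planck constant, J*s
c = 2.99792458e8         # speed of light, m/s

def max_ops_per_second(mass_kg):
    E = mass_kg * c**2                 # Einstein mass-energy relation
    return 2 * E / (math.pi * hbar)    # Margolus-Levitin limit

print(f"{max_ops_per_second(1.0):.2e}")   # ~5.4e50 ops/s for 1 kg of matter
```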
We examine federated learning (FL) with over-the-air (OTA) aggregation, where mobile users (MUs) aim to reach a consensus on a global model with the help of a parameter server (PS) that aggregates the local gradients. In OTA FL, MUs train their models on local data at every training round and transmit their gradients simultaneously over the same frequency band in an uncoded fashion. Based on the received signal of the superposed gradients, the PS performs a global model update. While OTA FL significantly decreases communication cost, it is susceptible to adverse channel effects and noise. Employing multiple antennas at the receiver can reduce these effects, yet path loss remains a limiting factor for users located far from the PS. To ameliorate this issue, we propose a wireless hierarchical FL scheme that uses intermediate servers (ISs) to form clusters in areas where the MUs are densely located. Our scheme uses OTA cluster aggregations for communication between the MUs and their corresponding IS, and OTA global aggregations from the ISs to the PS. We present a convergence analysis for the proposed algorithm and show, through numerical evaluation of the derived analytical expressions and through experiments, that utilizing ISs results in faster convergence and better performance than OTA FL alone while using less transmit power. We also validate the performance results using different numbers of cluster iterations with different datasets and data distributions, and conclude that the best choice of cluster aggregations depends on the data distribution among the MUs and the clusters.
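A toy numerical sketch of the two-hop aggregation (shapes, the additive-noise model, and the unit-gain channel are illustrative simplifications, not the paper's fading and path-loss model) shows the mechanics: each cluster's MUs superpose their gradients over the air at their IS, and the ISs then superpose the cluster averages at the PS.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clusters, mus_per_cluster = 10, 3, 5
grads = rng.normal(size=(n_clusters, mus_per_cluster, d))  # local gradients

def ota_sum(signals, noise_std):
    """Over-the-air aggregation: analog superposition plus receiver noise."""
    return signals.sum(axis=0) + rng.normal(0, noise_std, signals.shape[1:])

# hop 1: MUs -> IS, one noisy superposition per cluster
cluster_avgs = np.stack([ota_sum(grads[c], 0.01) / mus_per_cluster
                         for c in range(n_clusters)])
# hop 2: ISs -> PS, superposition of the cluster averages
global_update = ota_sum(cluster_avgs, 0.01) / n_clusters
```

With noiseless channels this reduces to exact two-level averaging; the noise terms mark where the adverse channel effects discussed above enter.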
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
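The criticality claim can be checked numerically in a few lines. For a deep ReLU MLP, the layer-to-layer recursion predicts that the typical squared pre-activation norm is preserved when the weight variance is tuned to C_W = 2 (He initialization) and grows or decays exponentially with depth otherwise; the sketch below (a minimal illustration, not the book's code) demonstrates exactly that:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 50

def propagate(C_W):
    """Push a random input through a deep ReLU MLP with zero biases and
    weight variance C_W / width; return the final mean squared activation."""
    x = rng.normal(size=width)
    for _ in range(depth):
        W = rng.normal(0, np.sqrt(C_W / width), size=(width, width))
        x = np.maximum(W @ x, 0.0)           # ReLU layer
    return np.linalg.norm(x)**2 / width

for C_W in (1.8, 2.0, 2.2):
    # below criticality: vanishes; at C_W = 2: stable; above: explodes
    print(C_W, propagate(C_W))
```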
Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there is already a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis of the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. While understandable given the application-driven nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance to effective data preprocessing or novel technical design, especially considering the many possible ways to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets from different sources, with various properties regarding scale, structure, and attribute/label availability, towards handy and fair evaluation of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations of 13 popular HNE algorithms, create friendly interfaces for them, and provide all-around comparisons among them over multiple tasks and experimental settings.
Transfer learning aims to improve the performance of target learners on target domains by transferring knowledge contained in different but related source domains. In this way, the dependence on a large amount of target-domain data for constructing target learners can be reduced. Owing to its wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, they introduce approaches in a relatively isolated way and lack the most recent advances. Given the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize existing transfer learning research, and to summarize and interpret its mechanisms and strategies in a comprehensive way, which may help readers better understand the current research status and ideas. Unlike previous surveys, this paper reviews more than forty representative transfer learning approaches from the perspectives of data and model. The applications of transfer learning are also briefly introduced. To compare the performance of different transfer learning models, twenty representative models are evaluated on three datasets: Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.
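As one concrete instance of the model-based (parameter-transfer) strategy such surveys cover, a common baseline is to reuse a pretrained source network as a frozen feature extractor and train only a new target head. The sketch below uses an ImageNet-pretrained ResNet-18 and a 31-class head (matching Office-31's label count); it is an illustrative setup, not the survey's exact experimental protocol.

```python
import torch
import torch.nn as nn
from torchvision import models

# Parameter transfer: reuse source-domain (ImageNet) features and learn
# only a new target-domain classifier (Office-31 has 31 classes).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze transferred features
model.fc = nn.Linear(model.fc.in_features, 31)   # fresh, trainable target head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 31, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```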