Whenever humans use tools human performance is enhanced. Cognitive systems are a new kind of tool continually increasing in cognitive capability and are now performing high level cognitive tasks previously thought to be explicitly human. Usage of such tools, known as cogs, are expected to result in ever increasing levels of human cognitive augmentation. In a human cog ensemble, a cooperative, peer to peer, and collaborative dialog between a human and a cognitive system, human cognitive capability is augmented as a result of the interaction. The human cog ensemble is therefore able to achieve more than just the human or the cog working alone. This article presents results from two studies designed to measure the effect information supplied by a cog has on cognitive accuracy, the ability to produce the correct result, and cognitive precision, the propensity to produce only the correct result. Both cognitive accuracy and cognitive precision are shown to be increased by information of different types (policies and rules, examples, and suggestions) and with different kinds of problems (inventive problem solving and puzzles). Similar effects shown in other studies are compared.
Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled multiple brand-new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. Currently, the exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs are capable of boosting the performance of FL, and FL could also leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high requirements that FMs have for computing resources, storage, and communication overhead would pose critical challenges to FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL. We also consolidate several broad research directions associated with these paradigms.
Objective: For lower arm amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG and visual evidence individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
Vision-language models (VLMs) are impactful in part because they can be applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning. We study $\textit{generative VLMs}$ that are trained for next-word generation given an image. We explore their zero-shot performance on the illustrative task of image-text retrieval across 8 popular vision-language benchmarks. Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image. We call this probabilistic score the $\textit{Visual Generative Pre-Training Score}$ (VisualGPTScore). While the VisualGPTScore produces near-perfect accuracy on some retrieval benchmarks, it yields poor accuracy on others. We analyze this behavior through a probabilistic lens, pointing out that some benchmarks inadvertently capture unnatural language distributions by creating adversarial but unlikely text captions. In fact, we demonstrate that even a "blind" language model that ignores any image evidence can sometimes outperform all prior art, reminiscent of similar challenges faced by the visual-question answering (VQA) community many years ago. We derive a probabilistic post-processing scheme that controls for the amount of linguistic bias in generative VLMs at test time without having to retrain or fine-tune the model. We show that the VisualGPTScore, when appropriately debiased, is a strong zero-shot baseline for vision-language understanding, oftentimes producing state-of-the-art accuracy.
Cloud computing platforms are progressively adopting Field Programmable Gate Arrays to deploy specialized hardware accelerators for specific computational tasks. However, the security of FPGA-based bitstream for Intellectual Property, IP cores from unauthorized interception in cloud environments remains a prominent concern. Existing methodologies for protection of such bitstreams possess several limitations, such as requiring a large number of keys, tying bitstreams to specific FPGAs, and relying on trusted third parties. This paper proposes Aggregate Encryption and Individual Decryption, a cryptosystem based on key aggregation to enhance the security of FPGA-based bitstream for IP cores and to address the pitfalls of previous related works. In our proposed scheme, IP providers can encrypt their bitstreams with a single key for a set S of FPGA boards, with which the bitstreams can directly be decrypted on any of the FPGA boards in S. Aggregate encryption of the key is performed in a way which ensures that the key can solely be obtained onboard through individual decryption employing the board's private key, thus facilitating secure key provisioning. The proposed cryptosystem is evaluated mainly on Zynq FPGAs. The outcomes demonstrate that our cryptosystem not only outperforms existing techniques with respect to resource, time and energy significantly but also upholds robust security assurances.
Tensors are ubiquitous in science and engineering and tensor factorization approaches have become important tools for the characterization of higher order structure. Factorizations includes the outer-product rank Canonical Polyadic Decomposition (CPD) as well as the multi-linear rank Tucker decomposition in which the Block-Term Decomposition (BTD) is a structured intermediate interpolating between these two representations. Whereas CPD, Tucker, and BTD have traditionally relied on maximum-likelihood estimation, Bayesian inference has been use to form probabilistic CPD and Tucker. We propose, an efficient variational Bayesian probabilistic BTD, which uses the von-Mises Fisher matrix distribution to impose orthogonality in the multi-linear Tucker parts forming the BTD. On synthetic and two real datasets, we highlight the Bayesian inference procedure and demonstrate using the proposed pBTD on noisy data and for model order quantification. We find that the probabilistic BTD can quantify suitable multi-linear structures providing a means for robust inference of patterns in multi-linear data.
Many software engineers develop, fine-tune, and deploy deep learning (DL) models. They use DL models in a variety of development frameworks and deploy to a range of runtime environments. In this diverse ecosystem, engineers use DL model converters to move models from frameworks to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, failure modes and patterns of DL model converters are unknown. This knowledge gap adds engineering risk in DL interoperability technologies. In this paper, we conduct the first failure analysis on DL model converters. Specifically, we characterize failures in model converters associated with ONNX (Open Neural Network eXchange). We analyze failures in the ONNX converters for two major DL frameworks, PyTorch and TensorFlow. The symptoms, causes, and locations of failures are reported for N=200 issues. We also evaluate why models fail by converting 5,149 models, both real-world and synthetically generated instances. Through the course of our testing, we find 11 defects (5 new) across torch.onnx, tf2onnx, and the ONNXRuntime. We evaluated two hypotheses about the relationship between model operators and converter failures, falsifying one and with equivocal results on the other. We describe and note weaknesses in the current testing strategies for model converters. Our results motivate future research on making DL software simpler to maintain, extend, and validate.
System correctness is one of the most crucial and challenging objectives in software and hardware systems. With the increasing evolution of connected and distributed systems, ensuring their correctness requires the use of formal verification for multi-agent systems. In this paper, we present a summary of certain results on model checking for multi-agent systems that derive from the selection of strategies and information for agents. Additionally, we discuss some open directions for future research.
Knowledge graph embedding (KGE) is a increasingly popular technique that aims to represent entities and relations of knowledge graphs into low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. Particularly, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Dialogue systems have attracted more and more attention. Recent advances on dialogue systems are overwhelmingly contributed by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview to these recent advances on dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms and finally discuss some appealing research directions that can bring the dialogue system research into a new frontier.