Sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency. This has motivated a number of works to learn, induce, or search for high performing sparse networks. While reports of quality or efficiency gains are impressive, standard baselines are lacking, therefore hindering having reliable comparability and reproducibility across methods. In this work, we provide an evaluation approach and a naive Random Search baseline method for finding good sparse configurations. We apply Random Search on the node space of an overparameterized network with the goal of finding better initialized sparse sub-networks that are positioned more advantageously in the loss landscape. We record sparse network post-training performances at various levels of sparsity and compare against both their fully connected parent networks and random sparse configurations at the same sparsity levels. We observe that for this architecture search task, initialized sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts. Thus we conclude that Random Search may be viewed as a suitable neutral baseline for sparsity search methods.
Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.
In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) in different algorithms and different environments.
Recent advancements in artificial intelligence have propelled the capabilities of Large Language Models, yet their ability to mimic nuanced human reasoning remains limited. This paper introduces a novel conceptual enhancement to LLMs, termed the Artificial Neuron, designed to significantly bolster cognitive processing by integrating external memory systems. This enhancement mimics neurobiological processes, facilitating advanced reasoning and learning through a dynamic feedback loop mechanism. We propose a unique framework wherein each LLM interaction specifically in solving complex math word problems and common sense reasoning tasks is recorded and analyzed. Incorrect responses are refined using a higher capacity LLM or human in the loop corrections, and both the query and the enhanced response are stored in a vector database, structured much like neuronal synaptic connections. This Artificial Neuron thus serves as an external memory aid, allowing the LLM to reference past interactions and apply learned reasoning strategies to new problems. Our experimental setup involves training with the GSM8K dataset for initial model response generation, followed by systematic refinements through feedback loops. Subsequent testing demonstrated a significant improvement in accuracy and efficiency, underscoring the potential of external memory systems to advance LLMs beyond current limitations. This approach not only enhances the LLM's problem solving precision but also reduces computational redundancy, paving the way for more sophisticated applications of artificial intelligence in cognitive tasks. This paper details the methodology, implementation, and implications of the Artificial Neuron model, offering a transformative perspective on enhancing machine intelligence.
Robotic assistance has significantly improved the outcomes of open microsurgery and rigid endoscopic surgery, however is yet to make an impact in flexible endoscopic neurosurgery. Some of the most common intracranial procedures for treatment of hydrocephalus and tumors stand to benefit from increased dexterity and reduced invasiveness offered by robotic systems that can navigate in the deep ventricular system of the brain. We review a spectrum of flexible robotic devices, from the traditional highly actuated approach, to more novel and bio-inspired mechanisms for safe navigation. For each technology, we identify the operating principle and are able to evaluate the potential for minimally invasive surgical applications. Overall, rigid-type continuum robots have seen the most development, however, approaches combining rigid and soft robotic principles into innovative devices, are ideally situated to address safety and complexity limitations after future design evolution. We also observe a number of related challenges in the field, from surgeon-robot interfaces to robot evaluation procedures. Fundamentally, the challenges revolve around a guarantee of safety in robotic devices with the prerequisites to assist and improve upon surgical tasks. With innovative designs, materials and evaluation techniques emerging, we see potential impacts in the next 5--10 years.
Doubly robust estimators have gained widespread popularity in various fields due to their ability to provide unbiased estimates under model misspecification. However, the asymptotic theory for doubly robust estimators with continuous-time nuisance parameters remains largely unexplored. In this short communication, we address this gap by developing a general asymptotic theory for a class of doubly robust estimating equations involving stochastic processes and Riemann-Stieltjes integrals. We introduce generic assumptions on the nuisance parameter estimators that ensure the consistency and asymptotic normality of the resulting doubly robust estimator. Our results cover both the model doubly robust estimator, which relies on parametric or semiparametric models, and the rate doubly robust estimator, which allows for flexible machine learning methods. We discuss the implications of our findings and highlight the key differences between the continuous-time setting and the classical theory for doubly robust estimators. Our work provides a solid theoretical foundation for the use of doubly robust estimators in complex settings with continuous-time nuisance parameters, paving the way for future research and applications.
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.
Hyperproperties are commonly used in computer security to define information-flow policies and other requirements that reason about the relationship between multiple computations. In this paper, we study a novel class of hyperproperties where the individual computation paths are chosen by the strategic choices of a coalition of agents in a multi-agent system. We introduce HyperATL*, an extension of computation tree logic with path variables and strategy quantifiers. Our logic can express strategic hyperproperties, such as that the scheduler in a concurrent system has a strategy to avoid information leakage. HyperATL* is particularly useful to specify asynchronous hyperproperties, i.e., hyperproperties where the speed of the execution on the different computation paths depends on the choices of the scheduler. Unlike other recent logics for the specification of asynchronous hyperproperties, our logic is the first to admit decidable model checking for the full logic. We present a model checking algorithm for HyperATL* based on alternating automata, and show that our algorithm is asymptotically optimal by providing a matching lower bound. We have implemented a prototype model checker for a fragment of HyperATL*, able to check various security properties on small programs.
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, and Wide ResNet 28-10 architectures, our methodology improves upon both deep and batch ensembles.
Graph neural networks (GNNs) have emerged as a powerful paradigm for embedding-based entity alignment due to their capability of identifying isomorphic subgraphs. However, in real knowledge graphs (KGs), the counterpart entities usually have non-isomorphic neighborhood structures, which easily causes GNNs to yield different representations for them. To tackle this problem, we propose a new KG alignment network, namely AliNet, aiming at mitigating the non-isomorphism of neighborhood structures in an end-to-end manner. As the direct neighbors of counterpart entities are usually dissimilar due to the schema heterogeneity, AliNet introduces distant neighbors to expand the overlap between their neighborhood structures. It employs an attention mechanism to highlight helpful distant neighbors and reduce noises. Then, it controls the aggregation of both direct and distant neighborhood information using a gating mechanism. We further propose a relation loss to refine entity representations. We perform thorough experiments with detailed ablation studies and analyses on five entity alignment datasets, demonstrating the effectiveness of AliNet.