Recent progress in artificial intelligence (AI) has been driven by insights from neuroscience, particularly with the development of artificial neural networks (ANNs). This has significantly enhanced the replication of complex cognitive tasks such as vision and natural language processing. Despite these advances, ANNs struggle with continual learning, adaptable knowledge transfer, robustness, and resource efficiency, capabilities that biological systems handle seamlessly. In particular, ANNs often overlook the functional and morphological diversity of the brain, which limits their computational capabilities. Furthermore, incorporating cell-type-specific neuromodulatory effects into ANNs with neuronal heterogeneity could enable learning at two spatial scales: spiking behavior at the neuronal level and synaptic plasticity at the circuit level, thereby potentially enhancing their learning abilities. In this article, we summarize recent bio-inspired models, learning rules, and architectures, and propose a biologically informed framework for enhancing ANNs. Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors and of dendritic compartments for simulating the morphological and functional diversity of neuronal computations. Finally, we outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, balances bio-inspiration and complexity, and provides scalable solutions to pressing AI challenges such as continual learning, adaptability, robustness, and resource efficiency.
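As a concrete, deliberately simplified illustration of the neuronal-level ingredients mentioned above (spiking dynamics plus a dendritic compartment), the following is a minimal two-compartment leaky integrate-and-fire sketch. All parameter values and the coupling scheme are illustrative assumptions, not the framework proposed in the article.

```python
import numpy as np

# Minimal sketch of a two-compartment leaky integrate-and-fire (LIF) neuron:
# a passive dendritic compartment integrates input and couples to a somatic
# compartment that emits spikes. All parameter values are illustrative.
def simulate_two_compartment_lif(inputs, dt=1e-3, tau_d=20e-3, tau_s=10e-3,
                                 g_c=0.5, v_thresh=1.0, v_reset=0.0):
    v_dend, v_soma = 0.0, 0.0
    spikes = np.zeros_like(inputs)
    for t, i_ext in enumerate(inputs):
        # Dendrite: leaky integration of the external input.
        v_dend += dt / tau_d * (-v_dend + i_ext)
        # Soma: leak plus coupling current from the dendrite.
        v_soma += dt / tau_s * (-v_soma + g_c * (v_dend - v_soma))
        if v_soma >= v_thresh:          # threshold crossing -> spike
            spikes[t] = 1.0
            v_soma = v_reset            # reset after spike
    return spikes

spike_train = simulate_two_compartment_lif(np.full(500, 4.0))
print(int(spike_train.sum()), "spikes")
```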
Although deep learning techniques have achieved significant success, they frequently depend on extensive amounts of hand-labeled data and tend to perform inadequately in few-shot scenarios. The objective of this study is to devise a strategy that improves a model's ability to recognize biomedical entities under few-shot learning. By redefining biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method for few-shot BioNER, which involves constructing appropriate task demonstrations. To assess the proposed method, we compared it with existing advanced methods on six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA. We examined the models' efficacy by reporting F1 scores from both 25-shot and 50-shot learning experiments. In 25-shot learning, we observed a 1.1% improvement in the average F1 score over the baseline method, reaching 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on the six datasets, respectively. In 50-shot learning, we further improved the average F1 score by 1.0% over the baseline method, reaching 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively. We found that in few-shot BioNER, MRC-based language models are considerably more proficient at recognizing biomedical entities than the sequence-labeling approach. Furthermore, our MRC-based language models can compete successfully with fully supervised learning methodologies that rely heavily on abundant annotated data. These results highlight possible pathways for future advancements in few-shot BioNER methodologies.
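To make the demonstration-based MRC framing concrete, here is an illustrative sketch of how such an input could be assembled. The function name, the separator token, and the exact template are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's exact template): casting BioNER as
# machine reading comprehension pairs a query about the entity type with the
# passage, and a labeled demonstration is prepended to guide the model.
def build_mrc_input(query, passage, demonstration=None, sep="[SEP]"):
    parts = []
    if demonstration is not None:
        # Demonstration drawn from the few-shot support set.
        parts.append(demonstration)
    parts.extend([query, passage])
    return f" {sep} ".join(parts)

example = build_mrc_input(
    query="Find all chemical entities mentioned in the text.",
    passage="Treatment with cisplatin reduced tumor growth.",
    demonstration="Example: 'aspirin' is a chemical entity.",
)
print(example)
# In an MRC setup, this string would typically be tokenized and fed to an
# encoder that predicts start/end positions of answer spans (the entities).
```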
Ridge functions are used to describe and study lower bounds on the approximation achieved by neural networks, which can be written as linear combinations of activation functions. If the activation functions are themselves ridge functions, these networks are called explainable neural networks. In this brief paper, we first show, using matrix notation, that quantum neural networks based on variational quantum circuits can be written as linear combinations of ridge functions. Consequently, the interpretability and explainability of such quantum neural networks can be studied directly as an approximation by a linear combination of ridge functions.
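For readers unfamiliar with the terminology, the following is the standard notation in generic form; the specific expansion the abstract claims for variational-circuit quantum neural networks is not reproduced here.

```latex
% A ridge function depends on x only through a single projection direction a:
\[
  r(x) = g(a \cdot x), \qquad g:\mathbb{R}\to\mathbb{R},\; a, x \in \mathbb{R}^d .
\]
% A one-hidden-layer network with activation \sigma is then a linear
% combination of ridge functions:
\[
  f(x) = \sum_{i=1}^{m} c_i \, \sigma(w_i \cdot x + b_i).
\]
```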
Individual mobility trajectories are difficult to measure and often contain long periods of missingness. Aggregating this mobility data without accounting for the missingness leads to erroneous results that underestimate travel behavior. This paper proposes Dynamic Time Warping-Based Multiple Imputation (DTWBMI) as a method for filling long gaps in human mobility trajectories so that the available data can be used to the fullest extent. The method reduces spatiotemporal trajectories to time series of particular travel behaviors, selects candidates for multiple imputation on the basis of the dynamic time warping distance between each potential donor series and the series preceding and following the gap in the recipient series, and finally imputes values multiple times. A simulation study designed to establish optimal parameters for DTWBMI yields two versions of the method. These two versions are applied to a real-world dataset of individual mobility trajectories with simulated missingness and compared against other methods of handling missingness. Linear interpolation outperforms DTWBMI and other methods when gaps are short and data are limited, while DTWBMI outperforms other methods when gaps become longer and more data are available.
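A minimal sketch of the donor-selection step follows, assuming the trajectories have already been reduced to one-dimensional travel-behavior series. The function names and the simple additive scoring rule are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def select_donors(before, after, candidates, k=5):
    """Rank candidate donor series by their DTW distance to the segments
    preceding and following the gap in the recipient series."""
    scored = []
    for donor_before, donor_gap, donor_after in candidates:
        score = dtw_distance(before, donor_before) + dtw_distance(after, donor_after)
        scored.append((score, donor_gap))
    scored.sort(key=lambda s: s[0])
    # The k best-matching donors supply the values used across the multiple imputations.
    return [gap_values for _, gap_values in scored[:k]]

before, after = [1, 2, 3], [3, 2, 1]
candidates = [([1, 2, 3], [9, 9], [3, 2, 1]), ([5, 5, 5], [0, 0], [5, 5, 5])]
print(select_donors(before, after, candidates, k=1))   # -> [[9, 9]]
```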
Recent advances in memory technologies, devices and materials have shown great potential for integration into neuromorphic electronic systems. However, a significant gap remains between the development of these materials and the realization of large-scale, fully functional systems. One key challenge is determining which devices and materials are best suited for specific functions and how they can be paired with CMOS circuitry. To address this, we introduce TEXEL, a mixed-signal neuromorphic architecture designed to explore the integration of on-chip learning circuits and novel two- and three-terminal devices. TEXEL serves as an accessible platform to bridge the gap between CMOS-based neuromorphic computation and the latest advancements in emerging devices. In this paper, we demonstrate the readiness of TEXEL for device integration through comprehensive chip measurements and simulations. TEXEL provides a practical system for testing bio-inspired learning algorithms alongside emerging devices, establishing a tangible link between brain-inspired computation and cutting-edge device research.
Osteoarthritis is a degenerative condition affecting bones and cartilage, often leading to osteophyte formation, bone density loss, and joint space narrowing. Treatment options to restore normal joint function vary depending on the severity of the condition. This work introduces an innovative deep-learning framework for processing shoulder CT scans. It features semantic segmentation of the proximal humerus and scapula, 3D reconstruction of the bone surfaces, identification of the glenohumeral (GH) joint region, and staging of three common osteoarthritis-related pathologies: osteophyte formation (OS), GH space reduction (JS), and humeroscapular alignment (HSA). The pipeline comprises two cascaded CNN architectures: 3D CEL-UNet for segmentation and 3D Arthro-Net for threefold classification. A retrospective dataset of 571 CT scans from patients with various degrees of GH osteoarthritis-related pathologies was used to train, validate, and test the pipeline. Median root mean squared error and Hausdorff distance for the 3D reconstruction were 0.22 mm and 1.48 mm for the humerus and 0.24 mm and 1.48 mm for the scapula, outperforming state-of-the-art architectures and making the framework potentially suitable for preoperative planning of patient-specific instrumentation (PSI)-based shoulder arthroplasty. The classification accuracy for OS, JS, and HSA consistently reached around 90% across all three categories. The computational time of the inference pipeline was less than 15 s, showcasing the framework's efficiency and compatibility with orthopedic radiology practice. These outcomes represent a promising advancement toward the medical translation of artificial intelligence tools, aiming to streamline the preoperative planning pipeline by delivering high-quality bone surfaces and supporting surgeons in selecting the most suitable surgical approach for each patient's joint condition.
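To make the cascaded pipeline's data hand-off concrete, here is a minimal runnable sketch of the control flow only; the two dummy threshold functions are illustrative stand-ins, not the paper's 3D CEL-UNet or 3D Arthro-Net, and the patch-cropping heuristic is an assumption.

```python
import numpy as np

def dummy_segmenter(ct_volume):
    # Stand-in for stage 1: returns a label map (0 = background, 1 = bone).
    labels = np.zeros_like(ct_volume, dtype=np.uint8)
    labels[ct_volume > 300] = 1        # crude intensity threshold as a placeholder
    return labels

def dummy_stager(joint_patch):
    # Stand-in for stage 2: one stage per pathology, here a trivial rule.
    severity = int(joint_patch.mean() > 150)
    return {"OS": severity, "JS": severity, "HSA": severity}

def analyze_shoulder_ct(ct_volume, segmenter, stager, patch=32):
    labels = segmenter(ct_volume)                 # stage 1: segmentation
    zyx = np.argwhere(labels > 0)
    center = zyx.mean(axis=0).astype(int) if len(zyx) else np.array(ct_volume.shape) // 2
    lo = np.maximum(center - patch // 2, 0)       # crop a joint-region patch
    hi = lo + patch
    joint_patch = ct_volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return stager(joint_patch)                    # stage 2: threefold staging

print(analyze_shoulder_ct(np.random.randint(0, 600, (64, 64, 64)),
                          dummy_segmenter, dummy_stager))
```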
Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are unstructured or randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems -- one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that both metalearning neural networks and large language models are capable of "in-context learning" (ICL) -- the ability to flexibly grasp the structure of a new task from a few examples given at inference time. Here, we show that networks capable of ICL can reproduce human-like learning and compositional behavior on rule-governed tasks, while at the same time replicating human behavioral phenomena in tasks lacking rule-like structure via their usual in-weight learning (IWL). Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them, and that these can coexist with the properties of their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.
Image monitoring and guidance during medical examinations can aid both diagnosis and treatment. However, the sampling frequency is often too low, which creates a need to estimate the missing images. We present a probabilistic motion model for sequential medical images that can both estimate motion between acquired images and forecast the motion ahead of time. Its core is a low-dimensional temporal process based on a linear Gaussian state-space model with analytically tractable solutions for forecasting, simulation, and imputation of missing samples. The results from two experiments on publicly available cardiac datasets show reliable motion estimates and improved forecasting performance when patient-specific adaptation is performed by online learning.
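The following is a minimal sketch of the kind of linear Gaussian state-space machinery referred to above, where filtering, forecasting, and handling of missing samples are all closed form. The constant-velocity matrices are generic illustrative choices, not the paper's learned, image-derived parameters.

```python
import numpy as np

# Linear Gaussian state-space model: x_t = A x_{t-1} + w_t, y_t = C x_t + v_t.
def kalman_filter_forecast(ys, A, C, Q, R, x0, P0, horizon=5):
    x, P = x0, P0
    for y in ys:
        # Predict one step ahead.
        x, P = A @ x, A @ P @ A.T + Q
        if y is not None:                       # missing observations are simply skipped
            S = C @ P @ C.T + R                 # innovation covariance
            K = P @ C.T @ np.linalg.inv(S)      # Kalman gain
            x = x + K @ (y - C @ x)             # measurement update
            P = (np.eye(len(x)) - K @ C) @ P
    forecasts = []
    for _ in range(horizon):                    # propagate the mean forward in time
        x = A @ x
        forecasts.append(C @ x)
    return forecasts

A = np.array([[1.0, 1.0], [0.0, 1.0]])          # constant-velocity latent dynamics
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(1)
obs = [np.array([0.1]), None, np.array([0.5]), np.array([0.9])]   # one sample missing
print(kalman_filter_forecast(obs, A, C, Q, R, np.zeros(2), np.eye(2)))
```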
Bayesian models of cognition have gained considerable traction in computational neuroscience and psychiatry. Their scope is now expected to expand rapidly to artificial intelligence, providing general inference frameworks to support embodied, adaptable, and energy-efficient autonomous agents. A central theory in this domain is predictive coding, which posits that learning and behaviour are driven by hierarchical probabilistic inferences about the causes of sensory inputs. Biological realism constrains these networks to rely on simple local computations in the form of precision-weighted predictions and prediction errors. This can make the framework highly efficient, but its implementation comes with unique challenges on the software development side. Embedding such models in standard neural network libraries often becomes limiting, as these libraries' compilation and differentiation backends can force a conceptual separation between optimization algorithms and the systems being optimized. This critically departs from other biological principles such as self-monitoring, self-organisation, cellular growth, and functional plasticity. In this paper, we introduce \texttt{pyhgf}: a Python package backed by JAX and Rust for creating, manipulating, and sampling dynamic networks for predictive coding. We improve over other frameworks by representing the network components as transparent, modular, and malleable variables in the message-passing steps. The resulting graphs can implement arbitrary computational complexity through belief propagation. Moreover, the transparency of the core variables allows inference processes to leverage self-organisation principles and to express structure learning, meta-learning, or causal discovery as the consequence of structural adaptation of the network to surprising inputs. The code, tutorials, and documentation are hosted at: //github.com/ilabcode/pyhgf.
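The precision-weighted local computations mentioned above can be illustrated with a single Gaussian belief update. This is a generic sketch and deliberately does not use the pyhgf API; see the package documentation for the actual interface.

```python
# Generic illustration (NOT the pyhgf API) of the local computation a node in a
# predictive-coding network performs: its belief is nudged by a prediction
# error weighted by the relative precision of the incoming signal.
def precision_weighted_update(mu_prior, pi_prior, observation, pi_obs):
    prediction_error = observation - mu_prior             # bottom-up error signal
    learning_rate = pi_obs / (pi_prior + pi_obs)           # precision weighting
    mu_posterior = mu_prior + learning_rate * prediction_error
    pi_posterior = pi_prior + pi_obs                        # precisions add for Gaussians
    return mu_posterior, pi_posterior

# A belief (mean 0, precision 1) updated by a precise observation moves far;
# the same observation with low precision barely moves it.
print(precision_weighted_update(0.0, 1.0, 2.0, 10.0))   # -> (~1.82, 11.0)
print(precision_weighted_update(0.0, 1.0, 2.0, 0.1))    # -> (~0.18, 1.1)
```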
Meta-learning, or learning to learn, has gained renewed interest in recent years within the artificial intelligence community. However, meta-learning is also highly prevalent in nature, has deep roots in cognitive science and psychology, and is currently studied in various forms within neuroscience. The aim of this review is to recast previous lines of research in the study of biological intelligence through the lens of meta-learning, placing these works into a common framework. More recent points of interaction between AI and neuroscience will be discussed, as well as interesting new directions that arise under this perspective.
Recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions for volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results while avoiding the need for handcrafted features or class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to roughly 10% and allows it to focus on a more detailed segmentation of the organs and vessels. We use training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital, comprising 150 CT scans and targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance on small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download at: //github.com/holgerroth/3Dunet_abdomen_cascade.
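A minimal runnable sketch of the coarse-to-fine control flow follows. The two threshold functions are trivial stand-ins for the trained 3D FCNs, and the actual voxel reduction depends on the candidate region produced by the first stage.

```python
import numpy as np

# Coarse-to-fine cascade: a coarse pass proposes a candidate region, the volume
# is cropped to that region's bounding box, and a second pass segments only the
# cropped voxels before the result is pasted back into the full volume.
def coarse_model(volume):
    return volume > volume.mean()                 # rough candidate mask (stand-in)

def fine_model(volume):
    return volume > np.percentile(volume, 75)     # more detailed mask on the crop (stand-in)

def cascade_segment(volume, margin=2):
    candidate = coarse_model(volume)              # stage 1: coarse candidate region
    zyx = np.argwhere(candidate)
    lo = np.maximum(zyx.min(axis=0) - margin, 0)  # bounding box with a safety margin
    hi = np.minimum(zyx.max(axis=0) + margin + 1, volume.shape)
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine = fine_model(crop)                       # stage 2: classify only the cropped voxels
    full = np.zeros(volume.shape, dtype=bool)     # paste the refined mask back
    full[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine
    return full

volume = np.random.rand(32, 32, 32)
mask = cascade_segment(volume)
print(mask.sum(), "voxels labeled by the fine pass")
```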