Insights: - The human-centered AI (HCAI) approach and the sociotechnical systems (STS) theory share the same goal: ensuring that new technologies such as AI best serve humans in a sociotechnical environment. - HCAI practice needs to fully embrace sociotechnical systems thinking, while traditional STS needs to evolve to address the emerging characteristics of AI technology. - We propose a conceptual framework for intelligent sociotechnical systems (iSTS) to enhance traditional STS theory in the AI era. - Based on iSTS, we further propose a sociotechnical-based hierarchical HCAI approach as a paradigmatic extension to existing HCAI practice, further advancing HCAI practice.
We present a unification and generalization of sequentially and hierarchically semi-separable (SSS and HSS) matrices called tree semi-separable (TSS) matrices. Our main result is to show that any dense matrix can be expressed in a TSS format. Here, the dimensions of the generators are specified by the ranks of the Hankel blocks of the matrix. TSS matrices satisfy a graph-induced rank structure (GIRS) property. It is shown that TSS matrices generalize the algebraic properties of SSS and HSS matrices under addition, products, and inversion. Subsequently, TSS matrices admit linear time matrix-vector multiply, matrix-matrix multiply, matrix-matrix addition, inversion, and solvers.
We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (with $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $\sigma : {\mathbb R} \to {\mathbb R}^+$ denotes an activation function and for a vector $z \in {\mathbb R}^n$, $\sigma(z) \in {\mathbb R}^n$ has to be interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $D A$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $\chi\subseteq \mathbb [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \chi$. We propose a two-level nested methodology: an inner level where, for a given $\chi$, we compute an optimizer $D^\star(\chi)$ by a gradient system approach, and an outer level where we tune $\chi$ so that the value $c$ is reached by $\mu(D^\star(\chi)A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we propose several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
The typical phases of Bayesian network (BN) structured development include specification of purpose and scope, structure development, parameterisation and validation. Structure development is typically focused on qualitative issues and parameterisation quantitative issues, however there are qualitative and quantitative issues that arise in both phases. A common step that occurs after the initial structure has been developed is to perform a rough parameterisation that only captures and illustrates the intended qualitative behaviour of the model. This is done prior to a more rigorous parameterisation, ensuring that the structure is fit for purpose, as well as supporting later development and validation. In our collective experience and in discussions with other modellers, this step is an important part of the development process, but is under-reported in the literature. Since the practice focuses on qualitative issues, despite being quantitative in nature, we call this step qualitative parameterisation and provide an outline of its role in the BN development process.
Large Language Models are becoming the go-to solution for many natural language processing tasks, including in specialized domains where their few-shot capacities are expected to yield high performance in low-resource settings. Herein, we aim to assess the performance of Large Language Models for few shot clinical entity recognition in multiple languages. We evaluate named entity recognition in English, French and Spanish using 8 in-domain (clinical) and 6 out-domain gold standard corpora. We assess the performance of 10 auto-regressive language models using prompting and 16 masked language models used for text encoding in a biLSTM-CRF supervised tagger. We create a few-shot set-up by limiting the amount of annotated data available to 100 sentences. Our experiments show that although larger prompt-based models tend to achieve competitive F-measure for named entity recognition outside the clinical domain, this level of performance does not carry over to the clinical domain where lighter supervised taggers relying on masked language models perform better, even with the performance drop incurred from the few-shot set-up. In all experiments, the CO2 impact of masked language models is inferior to that of auto-regressive models. Results are consistent over the three languages and suggest that few-shot learning using Large language models is not production ready for named entity recognition in the clinical domain. Instead, models could be used for speeding-up the production of gold standard annotated data.
Threat actor attribution is a crucial defense strategy for combating advanced persistent threats (APTs). Cyber threat intelligence (CTI), which involves analyzing multisource heterogeneous data from APTs, plays an important role in APT actor attribution. The current attribution methods extract features from different CTI perspectives and employ machine learning models to classify CTI reports according to their threat actors. However, these methods usually extract only one kind of feature and ignore heterogeneous information, especially the attributes and relations of indicators of compromise (IOCs), which form the core of CTI. To address these problems, we propose an APT actor attribution method based on multimodal and multilevel feature fusion (APT-MMF). First, we leverage a heterogeneous attributed graph to characterize APT reports and their IOC information. Then, we extract and fuse multimodal features, including attribute type features, natural language text features and topological relationship features, to construct comprehensive node representations. Furthermore, we design multilevel heterogeneous graph attention networks to learn the deep hidden features of APT report nodes; these networks integrate IOC type-level, metapath-based neighbor node-level, and metapath semantic-level attention. Utilizing multisource threat intelligence, we construct a heterogeneous attributed graph dataset for verification purposes. The experimental results show that our method not only outperforms the existing methods but also demonstrates its good interpretability for attribution analysis tasks.
The macro-element variant of the hybridized discontinuous Galerkin (HDG) method combines advantages of continuous and discontinuous finite element discretization. In this paper, we investigate the performance of the macro-element HDG method for the analysis of compressible flow problems at moderate Reynolds numbers. To efficiently handle the corresponding large systems of equations, we explore several strategies at the solver level. On the one hand, we devise a second-layer static condensation approach that reduces the size of the local system matrix in each macro-element and hence the factorization time of the local solver. On the other hand, we employ a multi-level preconditioner based on the FGMRES solver for the global system that integrates well within a matrix-free implementation. In addition, we integrate a standard diagonally implicit Runge-Kutta scheme for time integration. We test the matrix-free macro-element HDG method for compressible flow benchmarks, including Couette flow, flow past a sphere, and the Taylor-Green vortex. Our results show that unlike standard HDG, the macro-element HDG method can operate efficiently for moderate polynomial degrees, as the local computational load can be flexibly increased via mesh refinement within a macro-element. Our results also show that due to the balance of local and global operations, the reduction in degrees of freedom, and the reduction of the global problem size and the number of iterations for its solution, the macro-element HDG method can be a competitive option for the analysis of compressible flow problems.
Continual learning algorithms strive to acquire new knowledge while preserving prior information. Often, these algorithms emphasise stability and restrict network updates upon learning new tasks. In many cases, such restrictions come at a cost to the model's plasticity, i.e. the model's ability to adapt to the requirements of a new task. But is all change detrimental? Here, we approach this question by proposing that activation spaces in neural networks can be decomposed into two subspaces: a readout range in which change affects prior tasks and a null space in which change does not alter prior performance. Based on experiments with this novel technique, we show that, indeed, not all activation change is associated with forgetting. Instead, the only change in the subspace visible to the readout of a task can lead to decreased stability, while restricting change outside of this subspace is associated only with a loss of plasticity. Analysing various commonly used algorithms, we show that regularisation-based techniques do not fully disentangle the two spaces and, as a result, restrict plasticity more than need be. We expand our results by investigating a linear model in which we can manipulate learning in the two subspaces directly and thus causally link activation changes to stability and plasticity. For hierarchical, nonlinear cases, we present an approximation that enables us to estimate functionally relevant subspaces at every layer of a deep nonlinear network, corroborating our previous insights. Together, this work provides novel means to derive insights into the mechanisms behind stability and plasticity in continual learning and may serve as a diagnostic tool to guide developments of future continual learning algorithms that stabilise inference while allowing maximal space for learning.
The Bellman-Ford algorithm for single-source shortest paths repeatedly updates tentative distances in an operation called relaxing an edge. In several important applications a non-adaptive (oblivious) implementation is preferred, which means fixing the entire sequence of relaxations upfront, independent of the edge-weights. In a dense graph on $n$ vertices, the algorithm in its standard form performs $(1 + o(1))n^3$ relaxations. An improvement by Yen from 1970 reduces the number of relaxations by a factor of two. We show that no further constant-factor improvements are possible, and every non-adaptive deterministic algorithm based on relaxations must perform $(\frac{1}{2} - o(1))n^3$ steps. This improves an earlier lower bound of Eppstein of $(\frac{1}{6} - o(1))n^3$. Given that a non-adaptive randomized variant of Bellman-Ford with at most $(\frac{1}{3} + o(1))n^3$ relaxations (with high probability) is known, our result implies a strict separation between deterministic and randomized strategies, answering an open question of Eppstein.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.