Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, in overcoming such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern combined with careful parametrizations in mitigating this effect. This feature is present in SSMs, as well as in other architectures such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and why some architectures perform better than others.
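To make the sensitivity phenomenon concrete, here is a minimal numerical sketch (our illustration, not the paper's code; the scalar linear recurrence and input length are assumptions): the derivative of the output with respect to the recurrence parameter grows sharply as that parameter approaches one, i.e., as memory lengthens, even though the gradients through time themselves never explode.

```python
import numpy as np

# A scalar linear recurrence h_t = lam * h_{t-1} + x_t: as lam -> 1
# (longer memory), the output becomes increasingly sensitive to lam.
def output_and_sensitivity(lam, x):
    h, dh = 0.0, 0.0           # hidden state and its derivative w.r.t. lam
    for x_t in x:
        dh = h + lam * dh      # d/dlam of (lam * h + x_t)
        h = lam * h + x_t
    return h, dh

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
for lam in [0.5, 0.9, 0.99, 0.999]:    # increasing memory
    y, dy = output_and_sensitivity(lam, x)
    print(f"lam={lam}: |dy/dlam| = {abs(dy):.1f}")
```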
It is a notorious open question whether integer programs (IPs), with an integer coefficient matrix $M$ whose subdeterminants are all bounded by a constant $\Delta$ in absolute value, can be solved in polynomial time. We answer this question in the affirmative if we further require that, by removing a constant number of rows and columns from $M$, one obtains a submatrix $A$ that is the transpose of a network matrix. Our approach focuses on the case where $A$ arises from $M$ after removing $k$ rows only, where $k$ is a constant. We achieve our result in two main steps, the first related to the theory of IPs and the second related to graph minor theory. First, we derive a strong proximity result for the case where $A$ is a general totally unimodular matrix: given an optimal solution of the linear programming relaxation, an optimal solution to the IP can be obtained by finding a constant number of augmentations by circuits of $[A\; I]$. Second, for the case where $A$ is the transpose of a network matrix, we reformulate the problem as a maximum constrained integer potential problem on a graph $G$. We observe that if $G$ is $2$-connected, then it has no rooted $K_{2,t}$-minor for $t = \Omega(k \Delta)$. We leverage this to obtain a tree-decomposition of $G$ into highly structured graphs for which we can solve the problem locally. This allows us to solve the global problem via dynamic programming.
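For orientation, the problem class can be restated schematically (a shorthand for the reader; the precise constants and conditions are those stated above):
\[
\min\left\{\, c^{\top}x \;:\; Mx \le b,\; x \in \mathbb{Z}^n \,\right\},
\qquad
M = \begin{pmatrix} B \\ A \end{pmatrix},
\]
where every square submatrix of $M$ has determinant at most $\Delta$ in absolute value, $B$ consists of the $k$ removed rows, and $A$ is the transpose of a network matrix.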
Causal networks are useful in a wide variety of applications, from medical diagnosis to root-cause analysis in manufacturing. In practice, however, causal networks are often incomplete, with missing causal relations. This paper presents a novel approach, called CausalLP, that formulates the issue of incomplete causal networks as a knowledge graph completion problem. More specifically, the task of finding new causal relations in an incomplete causal network is mapped to the task of knowledge graph link prediction. The use of knowledge graphs to represent causal relations enables the integration of external domain knowledge; as an added complexity, the causal relations carry weights representing the strength of the causal association between entities in the knowledge graph. Two primary tasks are supported by CausalLP: causal explanation and causal prediction. An evaluation of this approach uses a benchmark dataset of simulated videos for causal reasoning, CLEVRER-Humans, and compares the performance of multiple knowledge graph embedding algorithms. Two distinct dataset splitting approaches are used for evaluation: (1) random-based split, which is the method typically employed to evaluate link prediction algorithms, and (2) Markov-based split, a novel data split technique that utilizes the Markovian property of causal relations. Results show that using weighted causal relations improves causal link prediction over the baseline without weighted relations.
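One natural way to use the causal weights in link prediction is to scale each triple's training loss by its causal strength. The sketch below illustrates this with a TransE-style score (an assumption for illustration; CausalLP's exact embedding model and weighting scheme may differ):

```python
import numpy as np

# Causal edges (head, "causes", tail) scored by embedding translation;
# the per-triple margin loss is scaled by causal strength w in [0, 1].
def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)          # higher = more plausible

def weighted_margin_loss(h, r, t, t_neg, w, margin=1.0):
    pos, neg = transe_score(h, r, t), transe_score(h, r, t_neg)
    return w * max(0.0, margin - pos + neg)    # strong causes weigh more

rng = np.random.default_rng(0)
h, r, t, t_neg = (rng.standard_normal(32) for _ in range(4))
print(weighted_margin_loss(h, r, t, t_neg, w=0.8))
```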
Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram - a continuous-time particle system designed to approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.
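A schematic discretisation in the spirit of such a particle system is sketched below (our toy illustration, not the paper's algorithm: the quadratic loss, Gibbs posterior over attacks, and step sizes are all assumptions chosen for readability):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([2.0])                 # model parameter
particles = rng.standard_normal(100)    # adversarial perturbation particles
eta, tau, eps = 0.05, 0.1, 0.5          # step size, temperature, attack radius

def grad_loss(theta, delta):            # toy loss: (theta - 1 + delta)^2
    return 2 * (theta - 1 + delta)      # same gradient in theta and delta

for _ in range(500):
    # Langevin step: particles sample the Bayesian (Gibbs) adversary
    particles += eta * grad_loss(theta, particles) \
                 + np.sqrt(2 * eta * tau) * rng.standard_normal(particles.shape)
    particles = np.clip(particles, -eps, eps)
    # parameter step: descend the loss averaged over the particle cloud
    theta -= eta * grad_loss(theta, particles).mean()

print(theta)   # drifts toward the robust minimiser of the toy problem
```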
Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems, particularly in scenarios where both the sample size $n$ and the dimension of covariates $p$ are large. This study delves into the application of DNNs across a wide spectrum of intricate causal inference tasks, where direct estimation falls short and necessitates multi-stage learning. Examples include estimating the conditional average treatment effect and the dynamic treatment effect. In this framework, DNNs are constructed sequentially, with subsequent stages building upon preceding ones. To mitigate the impact of estimation errors from early stages on subsequent ones, we integrate DNNs in a doubly robust manner. In contrast to previous research, our study offers theoretical assurances regarding the effectiveness of DNNs in settings where the dimensionality $p$ grows with the sample size. These findings are of independent interest and extend to degenerate single-stage learning problems.
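The doubly robust construction can be illustrated with the standard AIPW pseudo-outcome (a generic sketch, not the paper's estimator; the placeholder functions `mu1`, `mu0`, `e` stand in for first-stage DNN fits):

```python
import numpy as np

def aipw_pseudo_outcome(y, a, x, mu1, mu0, e):
    """Pseudo-outcome whose regression on x targets the CATE; first-stage
    errors enter only through products, which damps their impact."""
    m1, m0, p = mu1(x), mu0(x), e(x)
    return (m1 - m0
            + a * (y - m1) / p
            - (1 - a) * (y - m0) / (1 - p))

# Toy usage: true treatment effect is 2, outcome models misspecified.
rng = np.random.default_rng(0)
n = 10_000
x = rng.standard_normal(n)
a = rng.binomial(1, 0.5, n)
y = x + 2 * a + rng.standard_normal(n)
phi = aipw_pseudo_outcome(y, a, x,
                          mu1=lambda x: x + 2.1,   # slightly off
                          mu0=lambda x: x - 0.1,
                          e=lambda x: np.full_like(x, 0.5))
print(phi.mean())   # close to 2: the correct propensity cancels the bias
```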
A common method for estimating the Hessian operator from random samples on a low-dimensional manifold involves locally fitting a quadratic polynomial. Although this estimator is widely used, it has been unclear whether it introduces bias, especially on complex manifolds with boundaries and under nonuniform sampling; rigorous theoretical guarantees on its asymptotic behavior have been lacking. We show that, under mild conditions, this estimator asymptotically converges to the Hessian operator, with nonuniform sampling and curvature effects proving negligible, even near boundaries. Our analysis framework simplifies the intensive computations required for direct analysis.
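The estimator itself is simple; the following minimal sketch shows the local quadratic least-squares fit in 2-D tangent coordinates (an assumption for illustration: the coordinates `u` would in practice come from, e.g., local PCA of the neighbourhood):

```python
import numpy as np

def hessian_quadratic_fit(u, f):
    """Least-squares fit f ~ c + g.u + 0.5 u^T H u; returns H (2x2)."""
    u1, u2 = u[:, 0], u[:, 1]
    # design matrix: [1, u1, u2, u1^2/2, u1*u2, u2^2/2]
    X = np.column_stack([np.ones_like(u1), u1, u2,
                         0.5 * u1**2, u1 * u2, 0.5 * u2**2])
    coef, *_ = np.linalg.lstsq(X, f, rcond=None)
    h11, h12, h22 = coef[3], coef[4], coef[5]
    return np.array([[h11, h12], [h12, h22]])

rng = np.random.default_rng(0)
u = 0.1 * rng.standard_normal((200, 2))           # neighbours in a patch
f = u[:, 0]**2 - u[:, 0] * u[:, 1] + 3 * u[:, 1]  # test function
print(hessian_quadratic_fit(u, f))                # ~ [[2, -1], [-1, 0]]
```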
Graph combinatorial optimization problems are widely applicable and notoriously difficult to compute; consider, for example, the traveling salesman or facility location problems. In this paper, we explore the feasibility of using convolutional neural networks (CNNs) on graph images to predict the cardinality of combinatorial properties of random graphs and networks. Specifically, we use image representations of modified adjacency matrices of random graphs as training samples for a CNN model to predict the stability number of random graphs, where the stability number is the cardinality of a maximum set of pairwise non-adjacent vertices (i.e., a maximum independent set). Our approach demonstrates the potential for applying deep learning to combinatorial optimization problems.
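A minimal end-to-end sketch of such a pipeline follows (our illustration only: the graph size, edge probability, and architecture are assumptions, and plain adjacency matrices stand in for the paper's modified image representations):

```python
import itertools
import numpy as np
import torch
import torch.nn as nn

N = 12  # vertices per graph (small enough for brute-force labels)

def random_graph(p, rng):
    A = (rng.random((N, N)) < p).astype(np.float32)
    A = np.triu(A, 1)
    return A + A.T

def stability_number(A):
    # brute force: largest r admitting an edge-free vertex subset
    for r in range(N, 0, -1):
        for S in itertools.combinations(range(N), r):
            if not A[np.ix_(S, S)].any():
                return r
    return 0

rng = np.random.default_rng(0)
graphs = [random_graph(0.3, rng) for _ in range(256)]
X = torch.tensor(np.stack(graphs)).unsqueeze(1)   # (B, 1, N, N) "images"
y = torch.tensor([stability_number(A) for A in graphs], dtype=torch.float32)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * N * N, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X).squeeze(1), y)
    loss.backward()
    opt.step()
print(loss.item())
```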
Artificial Intelligence (AI) is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecological findings. This study investigates the reproducibility of deep learning (DL) methods within the biodiversity domain. We design a methodology for evaluating the reproducibility of biodiversity-related publications that employ DL techniques across three stages. We define ten variables essential for method reproducibility, divided into four categories: resource requirements, methodological information, uncontrolled randomness, and statistical considerations. These categories subsequently serve as the basis for defining different levels of reproducibility. We manually extract the availability of these variables from a curated dataset comprising 61 publications identified using keywords provided by biodiversity experts. Our study shows that the dataset is shared in 47% of the publications; however, a significant number of the publications lack comprehensive information on the deep learning methods, including details regarding randomness.
In this paper, we introduce a finite difference weighted essentially non-oscillatory (WENO) scheme based on a neural network for hyperbolic conservation laws. We employ supervised learning and design two loss functions, one with the mean squared error and the other with the mean squared logarithmic error, where the WENO3-JS weights are computed as the labels. Each loss function consists of two components: the first compares the weights produced by the neural network with the WENO3-JS weights, while the second matches the output weights of the neural network with the linear weights. The first component forces the neural network to satisfy the WENO properties, so no post-processing layer is needed, while the second leads to better performance around discontinuities. As the network architecture, we choose a shallow neural network (SNN) for computational efficiency, with a Delta layer consisting of the normalized undivided differences. The resulting WENO3-SNN schemes outperform WENO3-JS and WENO3-Z in one-dimensional examples and show improved behavior in two-dimensional examples.
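To illustrate how the labels and losses fit together, here is a minimal sketch (our illustration: the smoothness indicators and linear weights follow the standard WENO3-JS convention, but the combination weight `lam` between the two loss components is an assumption, not the paper's value):

```python
import numpy as np

D0, D1, EPS = 2.0 / 3.0, 1.0 / 3.0, 1e-6   # linear weights, regularisation

def weno3_js_weights(fm1, f0, fp1):
    """WENO3-JS nonlinear weights at x_{i+1/2}, used as training labels."""
    b0, b1 = (fp1 - f0) ** 2, (f0 - fm1) ** 2    # smoothness indicators
    a0, a1 = D0 / (EPS + b0) ** 2, D1 / (EPS + b1) ** 2
    return np.stack([a0, a1]) / (a0 + a1)

def loss_mse(w_nn, w_js, lam=0.1):
    """First term: match WENO3-JS labels; second: pull the network's
    output toward the linear weights (better behaviour near shocks)."""
    lin = np.array([[D0], [D1]])
    return np.mean((w_nn - w_js) ** 2) + lam * np.mean((w_nn - lin) ** 2)

def loss_msle(w_nn, w_js, lam=0.1):
    lin = np.array([[D0], [D1]])
    return np.mean((np.log1p(w_nn) - np.log1p(w_js)) ** 2) \
         + lam * np.mean((np.log1p(w_nn) - np.log1p(lin)) ** 2)

f = np.sin(np.linspace(0, 2 * np.pi, 64))
w_js = weno3_js_weights(f[:-2], f[1:-1], f[2:])
# first term vanishes when the network reproduces the labels
print(loss_mse(w_js, w_js), loss_msle(w_js, w_js))
```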
Conventional intelligent systems based on deep neural network (DNN) models encounter challenges in achieving human-like continual learning due to catastrophic forgetting. Here, we propose a metaplasticity model inspired by human working memory, enabling DNNs to perform catastrophic forgetting-free continual learning without any pre- or post-processing. A key aspect of our approach involves implementing distinct types of synapses, ranging from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility. This strategy allowed the network to successfully learn a continuous stream of information, even under unexpected changes in input length. The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications, dynamically allocating memory resources to retain both old and new information. Furthermore, the model demonstrated robustness against data poisoning attacks by selectively filtering out erroneous memories, leveraging the Hebb repetition effect to reinforce the retention of significant data.
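The intermixing idea can be sketched as a fixed per-synapse plasticity factor that scales each weight's update (a toy illustration; the two plasticity levels and the 50/50 mix are assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (128, 64)
W = 0.1 * rng.standard_normal(shape)
# randomly intermix stable (0.01) and flexible (1.0) synapses
plasticity = rng.choice([0.01, 1.0], size=shape)

def update(W, grad, lr=0.1):
    # element-wise metaplastic step: stable synapses barely move,
    # flexible ones learn the new information
    return W - lr * plasticity * grad

grad = rng.standard_normal(shape)
W = update(W, grad)
```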
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital, comprising 150 CT scans and targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download at https://github.com/holgerroth/3Dunet_abdomen_cascade.
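The cascade can be summarised in a short inference sketch (our simplification: `coarse_fcn` and `fine_fcn` stand for the two trained 3D FCNs, and the candidate region is reduced to a fixed-margin bounding box around the coarse foreground):

```python
import torch

def cascade_segment(volume, coarse_fcn, fine_fcn, margin=8):
    """volume: (1, 1, D, H, W) CT tensor; returns the fine prediction."""
    # Stage 1: coarse 3D FCN proposes a candidate region
    with torch.no_grad():
        coarse = coarse_fcn(volume)              # (1, C, D, H, W) logits
    fg = coarse.argmax(dim=1)[0] > 0             # foreground voxel mask
    idx = fg.nonzero()                           # (K, 3) voxel coordinates
    lo = (idx.min(dim=0).values - margin).clamp(min=0)
    hi = idx.max(dim=0).values + margin
    crop = volume[..., lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    # Stage 2: fine 3D FCN classifies only the ~10% of voxels in the crop
    with torch.no_grad():
        return fine_fcn(crop), (lo, hi)
```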