Phosphorylation networks, representing the mechanisms by which proteins are phosphorylated at one or multiple sites, are ubiquitous in cell signalling and display rich dynamics such as unlimited multistability. Dual-site phosphorylation networks are known to exhibit oscillations in the form of periodic trajectories, when phosphorylation and dephosphorylation occurs as a mixed mechanism: phosphorylation of the two sites requires one encounter of the kinase, while dephosphorylation of the two sites requires two encounters with the phosphatase. A still open question is whether a mechanism requiring two encounters for both phosphorylation and dephosphorylation also admits oscillations. In this work we provide evidence in favor of the absence of oscillations of this network by precluding Hopf bifurcations in any reduced network comprising three out of its four intermediate protein complexes. Our argument relies on a novel network reduction step that preserves the absence of Hopf bifurcations, and on a detailed analysis of the semi-algebraic conditions precluding Hopf bifurcations obtained from Hurwitz determinants of the characteristic polynomial of the Jacobian of the system. We conjecture that the removal of certain reverse reactions appearing in Michaelis-Menten-type mechanisms does not have an impact on the presence or absence of Hopf bifurcations. We prove an implication of the conjecture under certain favorable scenarios and support the conjecture with additional example-based evidence.
The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have then attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score.
Convolutional Neural Networks (CNNs) are known for their ability to learn hierarchical structures, naturally developing detectors for objects, and semantic concepts within their deeper layers. Activation maps (AMs) reveal these saliency regions, which are crucial for many Explainable AI (XAI) methods. However, the direct exploitation of raw AMs in CNNs for feature attribution remains underexplored in literature. This work revises Class Activation Map (CAM) methods by introducing the Label-free Activation Map (LaFAM), a streamlined approach utilizing raw AMs for feature attribution without reliance on labels. LaFAM presents an efficient alternative to conventional CAM methods, demonstrating particular effectiveness in saliency map generation for self-supervised learning while maintaining applicability in supervised learning scenarios.
We present seven experiments exploring gender biases in GPT. Initially, GPT was asked to generate demographics of a potential writer of twenty phrases containing feminine stereotypes and twenty with masculine stereotypes. Results show a strong asymmetry, with stereotypically masculine sentences attributed to a female more often than vice versa. For example, the sentence "I love playing fotbal! Im practicing with my cosin Michael" was constantly assigned by ChatGPT to a female writer. This phenomenon likely reflects that while initiatives to integrate women in traditionally masculine roles have gained momentum, the reverse movement remains relatively underdeveloped. Subsequent experiments investigate the same issue in high-stakes moral dilemmas. GPT-4 finds it more appropriate to abuse a man to prevent a nuclear apocalypse than to abuse a woman. This bias extends to other forms of violence central to the gender parity debate (abuse), but not to those less central (torture). Moreover, this bias increases in cases of mixed-sex violence for the greater good: GPT-4 agrees with a woman using violence against a man to prevent a nuclear apocalypse but disagrees with a man using violence against a woman for the same purpose. Finally, these biases are implicit, as they do not emerge when GPT-4 is directly asked to rank moral violations. These results highlight the necessity of carefully managing inclusivity efforts to prevent unintended discrimination.
Bayesian geoacoustic inversion problems are conventionally solved by Markov chain Monte Carlo methods or its variants, which are computationally expensive. This paper extends the classic Bayesian geoacoustic inversion framework by deriving important geoacoustic statistics of Bayesian geoacoustic inversion from the multidimensional posterior probability density (PPD) using the mixture density network (MDN) theory. These statistics make it convenient to train the network directly on the whole parameter space and get the multidimensional PPD of model parameters. The present approach provides a much more efficient way to solve geoacoustic inversion problems in Bayesian inference framework. The network is trained on a simulated dataset of surface-wave dispersion curves with shear-wave velocities as labels and tested on both synthetic and real data cases. The results show that the network gives reliable predictions and has good generalization performance on unseen data. Once trained, the network can rapidly (within seconds) give a fully probabilistic solution which is comparable to Monte Carlo methods. It provides an promising approach for real-time inversion.
We investigate a Tikhonov regularization scheme specifically tailored for shallow neural networks within the context of solving a classic inverse problem: approximating an unknown function and its derivatives within a unit cubic domain based on noisy measurements. The proposed Tikhonov regularization scheme incorporates a penalty term that takes three distinct yet intricately related network (semi)norms: the extended Barron norm, the variation norm, and the Radon-BV seminorm. These choices of the penalty term are contingent upon the specific architecture of the neural network being utilized. We establish the connection between various network norms and particularly trace the dependence of the dimensionality index, aiming to deepen our understanding of how these norms interplay with each other. We revisit the universality of function approximation through various norms, establish rigorous error-bound analysis for the Tikhonov regularization scheme, and explicitly elucidate the dependency of the dimensionality index, providing a clearer understanding of how the dimensionality affects the approximation performance and how one designs a neural network with diverse approximating tasks.
Bayesian networks are one of the most widely used classes of probabilistic models for risk management and decision support because of their interpretability and flexibility in including heterogeneous pieces of information. In any applied modelling, it is critical to assess how robust the inferences on certain target variables are to changes in the model. In Bayesian networks, these analyses fall under the umbrella of sensitivity analysis, which is most commonly carried out by quantifying dissimilarities using Kullback-Leibler information measures. In this paper, we argue that robustness methods based instead on the familiar total variation distance provide simple and more valuable bounds on robustness to misspecification, which are both formally justifiable and transparent. We introduce a novel measure of dependence in conditional probability tables called the diameter to derive such bounds. This measure quantifies the strength of dependence between a variable and its parents. We demonstrate how such formal robustness considerations can be embedded in building a Bayesian network.
Network time series are becoming increasingly relevant in the study of dynamic processes characterised by a known or inferred underlying network structure. Generalised Network Autoregressive (GNAR) models provide a parsimonious framework for exploiting the underlying network, even in the high-dimensional setting. We extend the GNAR framework by presenting the $\textit{community}$-$\alpha$ GNAR model that exploits prior knowledge and/or exogenous variables for identifying and modelling dynamic interactions across communities in the network. We further analyse the dynamics of $\textit{ Red, Blue}$ and $\textit{Swing}$ states throughout presidential elections in the USA. Our analysis suggests interesting global and communal effects.
This work aims to extend the well-known high-order WENO finite-difference methods for systems of conservation laws to nonconservative hyperbolic systems. The main difficulty of these systems both from the theoretical and the numerical points of view comes from the fact that the definition of weak solution is not unique: according to the theory developed by Dal Maso, LeFloch, and Murat in 1995, it depends on the choice of a family of paths. A general strategy is proposed here in which WENO operators are not only used to reconstruct fluxes but also the nonconservative products of the system. Moreover, if a Roe linearization is available, the nonconservative products can be computed through matrix-vector operations instead of path-integrals. The methods are extended to problems with source terms and two different strategies are introduced to obtain well-balanced schemes. These numerical schemes will be then applied to the two-layer shallow water equations in one- and two- dimensions to obtain high-order methods that preserve water-at-rest steady states.
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Lojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
Likelihood-based inference in stochastic non-linear dynamical systems, such as those found in chemical reaction networks and biological clock systems, is inherently complex and has largely been limited to small and unrealistically simple systems. Recent advances in analytically tractable approximations to the underlying conditional probability distributions enable long-term dynamics to be accurately modelled, and make the large number of model evaluations required for exact Bayesian inference much more feasible. We propose a new methodology for inference in stochastic non-linear dynamical systems exhibiting oscillatory behaviour and show the parameters in these models can be realistically estimated from simulated data. Preliminary analyses based on the Fisher Information Matrix of the model can guide the implementation of Bayesian inference. We show that this parameter sensitivity analysis can predict which parameters are practically identifiable. Several Markov chain Monte Carlo algorithms are compared, with our results suggesting a parallel tempering algorithm consistently gives the best approach for these systems, which are shown to frequently exhibit multi-modal posterior distributions.