In this paper, we study recurrent neural networks in the presence of pairwise learning rules. We are specifically interested in how the attractor landscapes of such networks are altered as a function of the strength and nature (Hebbian vs. anti-Hebbian) of learning, which may have a bearing on the ability of such rules to mediate large-scale optimization problems. Through formal analysis, we show that a transition from Hebbian to anti-Hebbian learning brings about a pitchfork bifurcation that destroys convexity in the network attractor landscape. In larger-scale settings, this implies that anti-Hebbian plasticity will bring about multiple stable equilibria, and such effects may be outsized at interconnection or 'choke' points. Furthermore, attractor landscapes are more sensitive to slower learning rates than to faster ones. These results provide insight into the types of objective functions that can be encoded via different pairwise plasticity rules.
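As a rough numerical illustration of the kind of pairwise rule in question (not the paper's formal analysis), the sketch below simulates a tiny rate network whose weights are updated with a Hebbian (eta > 0) or anti-Hebbian (eta < 0) outer-product rule; the network form, noise level, and parameters are illustrative assumptions.

```python
import numpy as np

def simulate(eta, T=2000, n=2, dt=0.05, seed=0):
    """Tiny rate network with a pairwise plasticity rule.

    eta > 0 gives a Hebbian update, eta < 0 an anti-Hebbian one.
    All dynamics and parameter choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.standard_normal(n)          # unit activities
    W = 0.1 * rng.standard_normal((n, n))     # recurrent weights
    np.fill_diagonal(W, 0.0)
    for _ in range(T):
        x += dt * (-x + np.tanh(W @ x) + 0.01 * rng.standard_normal(n))
        W += dt * eta * np.outer(x, x)        # pairwise (anti-)Hebbian rule
        np.fill_diagonal(W, 0.0)
    return x

# Compare where the activity settles under the two signs of plasticity.
print("Hebbian      :", simulate(eta=+0.1))
print("anti-Hebbian :", simulate(eta=-0.1))
```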
When benchmarking optimization heuristics, care must be taken to prevent an algorithm from exploiting biases in the construction of the problems used. One way to do this is to provide several transformed versions of each problem, so that algorithms must be equipped with mechanisms for successfully tackling a range of problem variants rather than a single fixed instance. In this paper, we investigate several of these problem transformations and show how they influence the low-level landscape features of a set of five problems from the CEC2022 benchmark suite. Our results highlight that even relatively small transformations can significantly alter the measured landscape features. This poses a wider question of which properties we want to preserve when creating problem transformations, and how to measure them fairly.
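For concreteness, the sketch below shows the generic kind of transformation in question (shift, rotation, value scaling) applied to a simple stand-in objective; the CEC2022 problems, the exact transformations studied, and the landscape-feature computation are not reproduced here.

```python
import numpy as np

def sphere(x):
    """Stand-in objective; the actual CEC2022 problems are not reproduced here."""
    return float(np.sum(np.asarray(x) ** 2))

def make_transformed(f, dim, seed=0):
    """Return a shifted, rotated and value-scaled version of f.

    These are generic examples of the kind of transformations discussed;
    the transformations used in the benchmark suite may differ.
    """
    rng = np.random.default_rng(seed)
    shift = rng.uniform(-1.0, 1.0, size=dim)
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))   # random rotation
    scale = rng.uniform(0.5, 2.0)

    def g(x):
        return scale * f(Q @ (np.asarray(x) - shift))

    return g

g = make_transformed(sphere, dim=5)
x = np.zeros(5)
print(sphere(x), g(x))   # the transformed problem no longer has its optimum at 0
```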
The use of machine-learning techniques has grown in numerous research areas. It is now also widely used in statistics, including official statistics, both for data collection (e.g. satellite imagery, web scraping and text mining, data cleaning, integration and imputation) and for data analysis. However, the usage of these methods in survey sampling, including small area estimation, is still very limited. We therefore propose a predictor supported by these algorithms which can be used to predict any population or subpopulation characteristic based on cross-sectional and longitudinal data. Machine learning methods have already been shown to be very powerful in identifying and modelling complex and nonlinear relationships between variables, which means that they have very good properties in the case of strong departures from the classic assumptions. We therefore analyse the performance of our proposal under a different set-up which, in our opinion, is of greater importance in real-life surveys: we study only small departures from the assumed model, to show that our proposal is a good alternative in this case as well, even in comparison with methods that are optimal under the model. Moreover, we propose a method for estimating the accuracy of machine-learning predictors, making it possible to compare their accuracy with that of classic methods, where accuracy is measured as in survey sampling practice. Solving this problem is indicated in the literature as one of the key issues in the integration of these approaches. The simulation studies are based on a real longitudinal dataset, freely available from the Polish Local Data Bank, in which we consider the problem of predicting subpopulation characteristics in the last period, "borrowing strength" from other subpopulations and time periods.
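As a purely generic illustration (not the predictor or the accuracy estimator proposed here), the sketch below fits a random forest on hypothetical subpopulation-by-period data, predicts the last period from earlier ones, and attaches a simple bootstrap-type RMSE as an accuracy measure; all variables, sizes, and the data-generating model are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical longitudinal data: one row per (subpopulation, period) observation.
n = 400
X = np.column_stack([
    rng.integers(0, 20, n),          # subpopulation id
    rng.integers(0, 10, n),          # time period
    rng.normal(size=n),              # auxiliary variable
])
y = 2.0 + 0.5 * X[:, 2] + 0.1 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Predict the last period from all earlier periods ("borrowing strength").
train, test = X[:, 1] < 9, X[:, 1] == 9
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], y[train])
pred = model.predict(X[test])

# A simple bootstrap-type accuracy measure (RMSE over resampled training sets);
# this is only a generic stand-in for the accuracy estimation method of the paper.
errs = []
for b in range(50):
    idx = rng.choice(np.flatnonzero(train), size=int(train.sum()), replace=True)
    mb = RandomForestRegressor(n_estimators=100, random_state=b).fit(X[idx], y[idx])
    errs.append(np.sqrt(np.mean((mb.predict(X[test]) - y[test]) ** 2)))

print("point predictions:", pred[:5])
print("bootstrap RMSE   :", np.mean(errs))
```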
In this article, we show that the completion problem, i.e. the decision problem of whether a partial structure can be completed to a full structure, is NP-complete for many combinatorial structures. While the gadgets for most reductions in the literature are found by hand, we present an algorithm to construct gadgets in a fully automated way. Using our SAT-based framework, we present the first thorough study of the completion problem on sign mappings with forbidden substructures, classifying thousands of structures for which the completion problem is NP-complete. Our list in particular includes interior triple systems, which were introduced by Knuth towards an axiomatization of planar point configurations. Last but not least, we give an infinite family of structures, generalizing interior triple systems to higher dimensions, for which the completion problem is NP-complete.
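To make the decision problem concrete, here is a toy brute-force completion checker for sign mappings on triples with forbidden substructures; it only illustrates the problem statement, the forbidden patterns used are hypothetical, and it is unrelated to the paper's SAT-based gadget-construction framework.

```python
from itertools import combinations, product

def completable(points, partial, forbidden):
    """Brute-force check whether a partial sign mapping on triples can be completed.

    `partial` maps some sorted 3-element tuples to +1/-1; `forbidden` is a set of
    sign patterns (tuples of +/-1 over the four triples of a 4-element subset)
    that must not appear.  This is only a toy checker illustrating the decision
    problem, not the paper's method.
    """
    triples = list(combinations(points, 3))
    free = [t for t in triples if t not in partial]
    for assignment in product((+1, -1), repeat=len(free)):
        chi = dict(partial)
        chi.update(zip(free, assignment))
        ok = True
        for quad in combinations(points, 4):
            pattern = tuple(chi[t] for t in combinations(quad, 3))
            if pattern in forbidden:
                ok = False
                break
        if ok:
            return True
    return False

# Tiny example with a hypothetical forbidden pattern.
pts = (0, 1, 2, 3)
print(completable(pts, {(0, 1, 2): +1}, forbidden={(+1, +1, +1, +1)}))
```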
In this paper we formally define the hierarchical clustering network problem (HCNP) as the problem of finding a good hierarchical partition of a network. This new problem focuses on the dynamic process of the clustering rather than on the final picture of the clustering process. To address it, we introduce a new hierarchical clustering algorithm for networks, based on a new shortest path betweenness measure. To calculate it, the communication between each pair of nodes is weighted by the importance of the nodes that establish this communication. The weights, or importance, associated with each pair of nodes are calculated as the Shapley value of a game, called the linear modularity game. This new measure (the node-game shortest path betweenness measure) is used to obtain a hierarchical partition of the network by successively eliminating the link with the highest value. To evaluate the performance of our algorithm, we introduce several criteria that allow us to compare different dendrograms of a network from two points of view: modularity and homogeneity. Finally, we propose a faster algorithm based on a simplification of the node-game shortest path betweenness measure, whose complexity is quadratic on sparse networks. This fast version is competitive from a computational point of view with other fast hierarchical algorithms and, in general, it provides better results.
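The divisive scheme can be sketched as below, using standard edge betweenness from networkx as a stand-in for the node-game shortest path betweenness measure (which requires the Shapley values of the linear modularity game and is not reproduced here).

```python
import networkx as nx

def hierarchical_partition(G):
    """Divisive hierarchical clustering by repeatedly removing the edge with the
    highest betweenness.  Standard edge betweenness is used as a stand-in for the
    node-game shortest path betweenness measure of the paper.  Returns the
    sequence of partitions produced as edges are removed.
    """
    H = G.copy()
    dendrogram = [list(nx.connected_components(H))]
    while H.number_of_edges() > 0:
        eb = nx.edge_betweenness_centrality(H)
        u, v = max(eb, key=eb.get)
        H.remove_edge(u, v)
        parts = list(nx.connected_components(H))
        if len(parts) > len(dendrogram[-1]):
            dendrogram.append(parts)
    return dendrogram

G = nx.karate_club_graph()
for level in hierarchical_partition(G)[:3]:
    print([sorted(c) for c in level])
```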
In this paper, we provide an analysis of a recently proposed multicontinuum homogenization technique. The analysis differs from those used in classical homogenization methods for several reasons. First, the cell problems in multicontinuum homogenization are constrained problems and cannot be directly substituted into the differential operator. Second, the problem contains high contrast that remains in the homogenized problem: the homogenized problem averages the microstructure while still containing the small parameter. In this analysis, we first build on our previous technique, CEM-GMsFEM, to define a CEM downscaling operator that maps the multicontinuum quantities to an approximate microscopic solution. Under a regularity assumption on the multicontinuum quantities, we construct a downscaling operator and the homogenized multicontinuum equations using a linear approximation of the multicontinuum quantities. The error analysis is given by a residual estimate for the homogenized equations together with a well-posedness assumption on the homogenized equations.
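Schematically, the multicontinuum expansion behind the downscaling construction has the form below; the symbols are only indicative, and the precise constrained cell problems and constants follow the paper.

```latex
% Schematic multicontinuum ansatz (symbols indicative only):
%   U_i              -- macroscopic multicontinuum quantities,
%   \psi_i, \psi_i^m -- solutions of constrained local (cell) problems.
u(x) \;\approx\; \sum_i \psi_i(x)\, U_i(x) \;+\; \sum_{i,m} \psi_i^{m}(x)\, \partial_{x_m} U_i(x)
```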
In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
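As a loose numerical analogy (not the stochastic BNN model analysed in the paper), the sketch below shows that averaging many small, purely evaluation-based local updates recovers the direction of a gradient step on a toy loss; the loss and the form of the local updates are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy loss: squared error of a linear readout on fixed data.
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
grad = lambda w: X.T @ (X @ w - y) / len(y)

w = rng.standard_normal(5)
eps, K = 1e-3, 5000

# Average of many small random "local" updates (each uses only loss evaluations).
est = np.zeros(5)
for _ in range(K):
    d = rng.standard_normal(5)
    est += (loss(w + eps * d) - loss(w - eps * d)) / (2 * eps) * d
est /= K

cos = est @ grad(w) / (np.linalg.norm(est) * np.linalg.norm(grad(w)))
print("cosine similarity with the true gradient:", round(cos, 3))
```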
In this chapter we illustrate the use of some Machine Learning techniques in the context of omics data. More precisely, we review and evaluate the use of Random Forest and Penalized Multinomial Logistic Regression for the integrative analysis of genomics and immunomics in pancreatic cancer. Furthermore, we propose the use of association rules for predictive purposes to overcome the low predictive power of the previously mentioned models. Finally, we apply the reviewed methods to a real data set from TCGA consisting of 107 pancreatic tumour samples and 117,486 germline SNPs, showing the good performance of the proposed methods in predicting the immunological infiltration in pancreatic cancer.
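A minimal sketch of the two baseline models follows, using synthetic stand-in data since the TCGA genotypes are not reproduced here; the feature coding, class labels, and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the genomic data: 107 samples, a few hundred SNP-like
# features coded 0/1/2, and a 3-class label playing the role of the
# immune-infiltration category.
X = rng.integers(0, 3, size=(107, 500))
y = rng.integers(0, 3, size=107)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
# L1-penalized multinomial logistic regression (saga handles the multinomial case).
pmlr = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)

for name, model in [("random forest", rf), ("penalized multinomial logit", pmlr)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:30s} CV accuracy: {acc:.2f}")
```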
In this paper, we demonstrate for the first time how the Integrated Finite Element Neural Network (I-FENN) framework, previously proposed by the authors, can efficiently simulate the entire loading history of non-local gradient damage propagation. To achieve this goal, we first adopt a Temporal Convolutional Network (TCN) as the neural network of choice to capture the history-dependent evolution of the non-local strain in a coarsely meshed domain. The quality of the network predictions governs the computational performance of I-FENN, and therefore we perform an extended investigation aimed at enhancing them. We explore data-driven versus physics-informed TCN setups to arrive at an optimal network training, evaluating the network based on a coherent set of relevant performance metrics. We address the crucial issue of training a physics-informed network with input data that span vastly different length scales by proposing a systematic way of performing input normalization and output un-normalization. We then integrate the trained TCN within the nonlinear iterative FEM solver and apply I-FENN to simulate the damage propagation analysis. I-FENN is always applied to mesh idealizations different from the one used for the TCN training, showcasing the framework's ability to be used at progressively refined mesh resolutions. We illustrate several cases in which I-FENN completes the simulation using either a modified or a full Newton-Raphson scheme, and we showcase its computational savings compared to both the classical monolithic and staggered FEM solvers. We underline that we satisfy very strict convergence criteria for every increment across the entire simulation, providing clear evidence of the robustness and accuracy of I-FENN. All the code and data used in this work will be made publicly available upon publication of the article.
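A minimal sketch of two of the ingredients named above, under assumptions of our own: a tiny dilated causal convolutional network standing in for the TCN, and a placeholder min-max input normalization with the corresponding output un-normalization. The paper's actual architecture, scaling scheme, physics-informed loss, and coupling to the FEM solver are not reproduced.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """One dilated causal convolution block; a minimal stand-in for the TCN layers."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation                    # left padding keeps causality
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation)

    def forward(self, x):
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

class TinyTCN(nn.Module):
    def __init__(self, channels=16, levels=4):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, kernel_size=1)
        self.blocks = nn.Sequential(*[CausalConv1d(channels, 2 ** i) for i in range(levels)])
        self.out = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x):
        return self.out(self.blocks(self.inp(x)))

# Inputs spanning very different scales are normalized before the network and the
# output is un-normalized afterwards; the min/max statistics below are placeholders
# for whatever normalization scheme is adopted.
local_strain = torch.rand(8, 1, 50) * 1e-3               # histories of length 50
lo, hi = local_strain.min(), local_strain.max()
x = (local_strain - lo) / (hi - lo)
model = TinyTCN()
nonlocal_strain = model(x) * (hi - lo) + lo
print(nonlocal_strain.shape)                              # torch.Size([8, 1, 50])
```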
Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the Gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of simple statistics of the inputs and labels. Finally, we introduce a simple intervention to increase NFA correlation at any given layer, which dramatically improves the quality of features learned.
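As a concrete check of the statement, the sketch below trains a small two-layer network and compares the first-layer Gram matrix W1^T W1 with the average gradient outer product of the trained model with respect to its inputs; the architecture, target, and training setup are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d, h, n = 10, 64, 256
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1]).unsqueeze(1)            # a simple nonlinear target

W1 = (torch.randn(h, d) / d ** 0.5).requires_grad_()
W2 = (torch.randn(1, h) / h ** 0.5).requires_grad_()
f = lambda x: torch.relu(x @ W1.T) @ W2.T

opt = torch.optim.Adam([W1, W2], lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    ((f(X) - y) ** 2).mean().backward()
    opt.step()

# First-layer NFA: compare W1^T W1 with the average gradient outer product (AGOP)
# of the trained model with respect to its inputs.
Xg = X.clone().requires_grad_(True)
(g,) = torch.autograd.grad(f(Xg).sum(), Xg)
agop = g.T @ g / n
nfm = W1.T @ W1
corr = torch.corrcoef(torch.stack([nfm.flatten(), agop.flatten()]))[0, 1]
print("NFA correlation:", corr.item())
```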
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with the existing financial theories and conditions related to the implied volatility surface.
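A rough sketch of the two ideas, using a generic smile-shaped activation and a single soft no-calendar-arbitrage penalty as placeholders; the paper's actual activation function, full set of financial conditions, and training data are not reproduced here.

```python
import torch
import torch.nn as nn

class SmileActivation(nn.Module):
    """A convex, smile-shaped activation for moneyness inputs.
    This is a generic illustration, not the activation proposed in the paper."""
    def forward(self, x):
        return torch.sqrt(1.0 + x ** 2)

class IVSNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.moneyness = nn.Sequential(nn.Linear(1, hidden), SmileActivation())
        self.maturity = nn.Sequential(nn.Linear(1, hidden), nn.Softplus())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, k, t):
        return self.head(torch.cat([self.moneyness(k), self.maturity(t)], dim=-1))

def loss_fn(model, k, t, iv_obs, lam=1.0):
    fit = ((model(k, t) - iv_obs) ** 2).mean()
    # Soft no-calendar-arbitrage penalty: total variance w = iv^2 * t should be
    # non-decreasing in maturity.  The paper's set of embedded conditions is richer.
    t = t.clone().requires_grad_(True)
    w = model(k, t) ** 2 * t
    (dw_dt,) = torch.autograd.grad(w.sum(), t, create_graph=True)
    return fit + lam * torch.relu(-dw_dt).mean()

# Hypothetical quotes: log-moneyness k, maturity t, observed implied volatility.
k = torch.randn(128, 1) * 0.2
t = torch.rand(128, 1) + 0.05
iv_obs = 0.2 + 0.1 * k ** 2
model = IVSNet()
print(loss_fn(model, k, t, iv_obs).item())
```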