Bilevel learning is a powerful optimization technique that has extensively been employed in recent years to bridge the world of model-driven variational approaches with data-driven methods. Upon suitable parametrization of the desired quantities of interest (e.g., regularization terms or discretization filters), such approach computes optimal parameter values by solving a nested optimization problem where the variational model acts as a constraint. In this work, we consider two different use cases of bilevel learning for the problem of image restoration. First, we focus on learning scalar weights and convolutional filters defining a Field of Experts regularizer to restore natural images degraded by blur and noise. For improving the practical performance, the lower-level problem is solved by means of a gradient descent scheme combined with a line-search strategy based on the Barzilai-Borwein rule. As a second application, the bilevel setup is employed for learning a discretization of the popular total variation regularizer for solving image restoration problems (in particular, deblurring and super-resolution). Numerical results show the effectiveness of the approach and their generalization to multiple tasks.
The recent success of neural networks in natural language processing has drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks. While there exists a rich literature that studies classification and regression tasks using solvable models of neural networks, seq2seq tasks have not yet been studied from this perspective. Here, we propose a simple model for a seq2seq task that has the advantage of providing explicit control over the degree of memory, or non-Markovianity, in the sequences -- the stochastic switching-Ornstein-Uhlenbeck (SSOU) model. We introduce a measure of non-Markovianity to quantify the amount of memory in the sequences. For a minimal auto-regressive (AR) learning model trained on this task, we identify two learning regimes corresponding to distinct phases in the stationary state of the SSOU process. These phases emerge from the interplay between two different time scales that govern the sequence statistics. Moreover, we observe that while increasing the integration window of the AR model always improves performance, albeit with diminishing returns, increasing the non-Markovianity of the input sequences can improve or degrade its performance. Finally, we perform experiments with recurrent and convolutional neural networks that show that our observations carry over to more complicated neural network architectures.
We present a novel deep learning method for estimating time-dependent parameters in Markov processes through discrete sampling. Departing from conventional machine learning, our approach reframes parameter approximation as an optimization problem using the maximum likelihood approach. Experimental validation focuses on parameter estimation in multivariate regression and stochastic differential equations (SDEs). Theoretical results show that the real solution is close to SDE with parameters approximated using our neural network-derived under specific conditions. Our work contributes to SDE-based model parameter estimation, offering a versatile tool for diverse fields.
Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that this sort of non-response is particularly likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy--the Upper Confidence Bound of the Expected Utility (UCB-EU)--that can, plausibly, be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for particular sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from e-commerce platform Taobao. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves to both better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy to implement correction that helps mitigate model degradation.
Integrated computational materials engineering (ICME) has significantly enhanced the systemic analysis of the relationship between microstructure and material properties, paving the way for the development of high-performance materials. However, analyzing microstructure-sensitive material behavior remains challenging due to the scarcity of three-dimensional (3D) microstructure datasets. Moreover, this challenge is amplified if the microstructure is anisotropic, as this results in anisotropic material properties as well. In this paper, we present a framework for reconstruction of anisotropic microstructures solely based on two-dimensional (2D) micrographs using conditional diffusion-based generative models (DGMs). The proposed framework involves spatial connection of multiple 2D conditional DGMs, each trained to generate 2D microstructure samples for three different orthogonal planes. The connected multiple reverse diffusion processes then enable effective modeling of a Markov chain for transforming noise into a 3D microstructure sample. Furthermore, a modified harmonized sampling is employed to enhance the sample quality while preserving the spatial connection between the slices of anisotropic microstructure samples in 3D space. To validate the proposed framework, the 2D-to-3D reconstructed anisotropic microstructure samples are evaluated in terms of both the spatial correlation function and the physical material behavior. The results demonstrate that the framework is capable of reproducing not only the statistical distribution of material phases but also the material properties in 3D space. This highlights the potential application of the proposed 2D-to-3D reconstruction framework in establishing microstructure-property linkages, which could aid high-throughput material design for future studies
Rule-based reasoning is an essential part of human intelligence prominently formalized in artificial intelligence research via logic programs. Describing complex objects as the composition of elementary ones is a common strategy in computer science and science in general. The author has recently introduced the sequential composition of logic programs in the context of logic-based analogical reasoning and learning in logic programming. Motivated by these applications, in this paper we construct a qualitative and algebraic notion of syntactic logic program similarity from sequential decompositions of programs. We then show how similarity can be used to answer queries across different domains via a one-step reduction. In a broader sense, this paper is a further step towards an algebraic theory of logic programming.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.
Meta-learning, or learning to learn, has gained renewed interest in recent years within the artificial intelligence community. However, meta-learning is incredibly prevalent within nature, has deep roots in cognitive science and psychology, and is currently studied in various forms within neuroscience. The aim of this review is to recast previous lines of research in the study of biological intelligence within the lens of meta-learning, placing these works into a common framework. More recent points of interaction between AI and neuroscience will be discussed, as well as interesting new directions that arise under this perspective.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.