Despite the successful implementations of physics-informed neural networks in different scientific domains, it has been shown that for complex nonlinear systems, achieving an accurate model requires extensive hyperparameter tuning, network architecture design, and costly and exhaustive training processes. To avoid such obstacles and make the training of physics-informed models less precarious, in this paper, a data-driven multi-fidelity physics-informed framework is proposed based on transfer learning principles. The framework incorporates the knowledge from low-fidelity auxiliary systems and limited labeled data from target actual system to significantly improve the performance of conventional physics-informed models. While minimizing the efforts of designing a complex task-specific network for the problem at hand, the proposed settings guide the physics-informed model towards a fast and efficient convergence to a global optimum. An adaptive weighting method is utilized to further enhance the optimization of the model composite loss function during the training process. A data-driven strategy is also introduced for maintaining high performance in subdomains with significant divergence between low- and high-fidelity behaviours. The heat transfer of composite materials undergoing a cure cycle is investigated as a case study to demonstrate the proposed framework's performance compared to conventional physics-informed models.
We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
The neural network (NN) becomes one of the most heated type of models in various signal processing applications. However, NNs are extremely vulnerable to adversarial examples (AEs). To defend AEs, adversarial training (AT) is believed to be the most effective method while due to the intensive computation, AT is limited to be applied in most applications. In this paper, to resolve the problem, we design a generic and efficient AT improvement scheme, namely case-aware adversarial training (CAT). Specifically, the intuition stems from the fact that a very limited part of informative samples can contribute to most of model performance. Alternatively, if only the most informative AEs are used in AT, we can lower the computation complexity of AT significantly as maintaining the defense effect. To achieve this, CAT achieves two breakthroughs. First, a method to estimate the information degree of adversarial examples is proposed for AE filtering. Second, to further enrich the information that the NN can obtain from AEs, CAT involves a weight estimation and class-level balancing based sampling strategy to increase the diversity of AT at each iteration. Extensive experiments show that CAT is faster than vanilla AT by up to 3x while achieving competitive defense effect.
Operator learning for complex nonlinear operators is increasingly common in modeling physical systems. However, training machine learning methods to learn such operators requires a large amount of expensive, high-fidelity data. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity, to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets.
Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence, such as grouped, clustered, or multilevel data. However, selection among the covariates--while accounting for this structured dependence--remains a challenge. We introduce a Bayesian decision analysis for subset selection with LMMs. Using a Mahalanobis loss function that incorporates the structured dependence, we derive optimal linear coefficients for (i) any given subset of variables and (ii) all subsets of variables that satisfy a cardinality constraint. Crucially, these estimates inherit shrinkage or regularization and uncertainty quantification from the underlying Bayesian model, and apply for any well-specified Bayesian LMM. More broadly, our decision analysis strategy deemphasizes the role of a single "best" subset, which is often unstable and limited in its information content, and instead favors a collection of near-optimal subsets. This collection is summarized by key member subsets and variable-specific importance metrics. Customized subset search and out-of-sample approximation algorithms are provided for more scalable computing. These tools are applied to simulated data and a longitudinal physical activity dataset, and demonstrate excellent prediction, estimation, and selection ability.
Reinforcement Learning (RL) approaches are lately deployed for orchestrating wireless communications empowered by Reconfigurable Intelligent Surfaces (RISs), leveraging their online optimization capabilities. Most commonly, in RL-based formulations for realistic RISs with low resolution phase-tunable elements, each configuration is modeled as a distinct reflection action, resulting to inefficient exploration due to the exponential nature of the search space. In this paper, we consider RISs with 1-bit phase resolution elements, and model the action of each of them as a binary vector including the feasible reflection coefficients. We then introduce two variations of the well-established Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) agents, aiming for effective exploration of the binary action spaces. For the case of DQN, we make use of an efficient approximation of the Q-function, whereas a discretization post-processing step is applied to the output of DDPG. Our simulation results showcase that the proposed techniques greatly outperform the baseline in terms of the rate maximization objective, when large-scale RISs are considered. In addition, when dealing with moderate scale RIS sizes, where the conventional DQN based on configuration-based action spaces is feasible, the performance of the latter technique is similar to the proposed learning approach.
Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation costs. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.
Earthquake-induced secondary ground failure hazards, such as liquefaction and landslides, result in catastrophic building and infrastructure damage as well as human fatalities. To facilitate emergency responses and mitigate losses, the U.S. Geological Survey provides a rapid hazard estimation system for earthquake-triggered landslides and liquefaction using geospatial susceptibility proxies and ShakeMap ground motion estimates. In this study, we develop a generalized causal graph-based Bayesian network that models the physical interdependencies between geospatial features, seismic ground failures, and building damage, as well as DPMs. Geospatial features provide physical insights for estimating ground failure occurrence while DPMs contain event-specific surface change observations. This physics-informed causal graph incorporates these variables with complex physical relationships in one holistic Bayesian updating scheme to effectively fuse information from both geospatial models and remote sensing data. This framework is scalable and flexible enough to deal with highly complex multi-hazard combinations. We then develop a stochastic variational inference algorithm to jointly update the intractable posterior probabilities of unobserved landslides, liquefaction, and building damage at different locations efficiently. In addition, a local graphical model pruning algorithm is presented to reduce the computational cost of large-scale seismic ground failure estimation. We apply this framework to the September 2018 Hokkaido Iburi-Tobu, Japan (M6.6) earthquake and January 2020 Southwest Puerto Rico (M6.4) earthquake to evaluate the performance of our algorithm.
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to the iterative parameter updates. However, comparison of individual methods is nevertheless treated in isolation from real world application and typically judged by monitoring accumulated test set performance. The closed world assumption remains predominant. It is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge as neural networks are well known to provide overconfident false predictions on unknown instances and break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but highlights the natural synergies in a common framework. We empirically demonstrate improvements when alleviating catastrophic forgetting, querying data in active learning, selecting task orders, while exhibiting robust open world application where previously proposed methods fail.