We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic bandits. We provide a framework which modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on "optimism in the face of uncertainty", by complementing them with principles from "action elimination". Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms which (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels corruption, enjoying regret guarantees which degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) as well as linear-function-approximation settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee which accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning.
We propose, analyze, and test new iterative solvers for large-scale systems of linear algebraic equations arising from the finite element discretization of reduced optimality systems defining the finite element approximations to the solution of elliptic tracking-type distributed optimal control problems with both the standard $L_2$ and the more general energy regularizations. If we aim at an approximation of the given desired state $y_d$ by the computed finite element state $y_h$ that asymptotically differs from $y_d$ in the order of the best $L_2$ approximation under acceptable costs for the control, then the optimal choice of the regularization parameter $\varrho$ is linked to the mesh-size $h$ by the relations $\varrho=h^4$ and $\varrho=h^2$ for the $L_2$ and the energy regularization, respectively. For this setting, we can construct efficient parallel iterative solvers for the reduced finite element optimality systems. These results can be generalized to variable regularization parameters adapted to the local behavior of the mesh-size that can heavily change in case of adaptive mesh refinement. Similar results can be obtained for the space-time finite element discretization of the corresponding parabolic and hyperbolic optimal control problems.
A primary challenge of physics-informed machine learning (PIML) is its generalization beyond the training domain, especially when dealing with complex physical problems represented by partial differential equations (PDEs). This paper aims to enhance the generalization capabilities of PIML, facilitating practical, real-world applications where accurate predictions in unexplored regions are crucial. We leverage the inherent causality and temporal sequential characteristics of PDE solutions to fuse PIML models with recurrent neural architectures based on systems of ordinary differential equations, referred to as neural oscillators. Through effectively capturing long-time dependencies and mitigating the exploding and vanishing gradient problem, neural oscillators foster improved generalization in PIML tasks. Extensive experimentation involving time-dependent nonlinear PDEs and biharmonic beam equations demonstrates the efficacy of the proposed approach. Incorporating neural oscillators outperforms existing state-of-the-art methods on benchmark problems across various metrics. Consequently, the proposed method improves the generalization capabilities of PIML, providing accurate solutions for extrapolation and prediction beyond the training data.
Deep learning methods have gained considerable interest in the numerical solution of various partial differential equations (PDEs). One particular focus is physics-informed neural networks (PINN), which integrate physical principles into neural networks. This transforms the process of solving PDEs into optimization problems for neural networks. To address a collection of advection-diffusion equations (ADE) in a range of difficult circumstances, this paper proposes a novel network structure. This architecture integrates the solver, a multi-scale deep neural networks (MscaleDNN) utilized in the PINN method, with a hard constraint technique known as HCPINN. This method introduces a revised formulation of the desired solution for ADE by utilizing a loss function that incorporates the residuals of the governing equation and penalizes any deviations from the specified boundary and initial constraints. By surpassing the boundary constraints automatically, this method improves the accuracy and efficiency of the PINN technique. To address the ``spectral bias'' phenomenon in neural networks, a subnetwork structure of MscaleDNN and a Fourier-induced activation function are incorporated into the HCPINN, resulting in a hybrid approach called SFHCPINN. The effectiveness of SFHCPINN is demonstrated through various numerical experiments involving ADE in different dimensions. The numerical results indicate that SFHCPINN outperforms both standard PINN and its subnetwork version with Fourier feature embedding. It achieves remarkable accuracy and efficiency while effectively handling complex boundary conditions and high-frequency scenarios in ADE.
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
We describe and analyze a quasi-Trefftz DG method for solving boundary value problems for the homogeneous diffusion-advection-reaction equation with piecewise-smooth coefficients. Trefftz schemes are high-order Galerkin methods whose discrete functions are elementwise exact solutions of the underlying PDE. Trefftz basis functions can be computed for many PDEs that are linear, homogeneous and with piecewise-constant coefficients. However, if the equation has varying coefficients, in general, exact solutions are unavailable, hence the construction of discrete Trefftz spaces is impossible. Quasi-Trefftz methods have been introduced to overcome this limitation, relying on discrete spaces of functions that are elementwise "approximate solutions" of the PDE. A space-time quasi-Trefftz DG method for the acoustic wave equation with smoothly varying coefficients has recently been studied; since it has shown excellent results, we propose a related method that can be applied to second-order elliptic equations. The DG weak formulation is derived using an interior penalty parameter and the upwind numerical fluxes. We choose polynomial quasi-Trefftz basis functions, whose coefficients can be computed with a simple algorithm based on the Taylor expansion of the PDE's coefficients. The main advantage of Trefftz and quasi-Trefftz schemes over more classical ones is the higher accuracy for comparable numbers of degrees of freedom. We prove that the dimension of the quasi-Trefftz space is smaller than the dimension of the full polynomial space of the same degree and that yields the same optimal convergence rates. The quasi-Trefftz DG method is well-posed, consistent and stable and we prove its high-order convergence. We present some numerical experiments in two dimensions that show excellent properties in terms of approximation and convergence rate.
The identification of the Purkinje conduction system in the heart is a challenging task, yet essential for a correct definition of cardiac digital twins for precision cardiology. Here, we propose a probabilistic approach for identifying the Purkinje network from non-invasive clinical data such as the standard electrocardiogram (ECG). We use cardiac imaging to build an anatomically accurate model of the ventricles; we algorithmically generate a rule-based Purkinje network tailored to the anatomy; we simulate physiological electrocardiograms with a fast model; we identify the geometrical and electrical parameters of the Purkinje-ECG model with Bayesian optimization and approximate Bayesian computation. The proposed approach is inherently probabilistic and generates a population of plausible Purkinje networks, all fitting the ECG within a given tolerance. In this way, we can estimate the uncertainty of the parameters, thus providing reliable predictions. We test our methodology in physiological and pathological scenarios, showing that we are able to accurately recover the ECG with our model. We propagate the uncertainty in the Purkinje network parameters in a simulation of conduction system pacing therapy. Our methodology is a step forward in creation of digital twins from non-invasive data in precision medicine. An open source implementation can be found at //github.com/fsahli/purkinje-learning
With the increasing demand of intelligent systems capable of operating in different contexts (e.g. users on the move) the correct interpretation of the user-need by such systems has become crucial to give consistent answers to the user questions. The most effective applications addressing such task are in the fields of natural language processing and semantic expansion of terms. These techniques are aimed at estimating the goal of an input query reformulating it as an intent, commonly relying on textual resources built exploiting different semantic relations like \emph{synonymy}, \emph{antonymy} and many others. The aim of this paper is to generate such resources using the labels of a given taxonomy as source of information. The obtained resources are integrated into a plain classifier for reformulating a set of input queries as intents and tracking the effect of each relation, in order to quantify the impact of each semantic relation on the classification. As an extension to this, the best tradeoff between improvement and noise introduction when combining such relations is evaluated. The assessment is made generating the resources and their combinations and using them for tuning the classifier which is used to reformulate the user questions as labels. The evaluation employs a wide and varied taxonomy as a use-case, exploiting its labels as basis for the semantic expansion and producing several corpora with the purpose of enhancing the pseudo-queries estimation.
We use Stein characterisations to derive new moment-type estimators for the parameters of several multivariate distributions in the i.i.d. case; we also derive the asymptotic properties of these estimators. Our examples include the multivariate truncated normal distribution and several spherical distributions. The estimators are explicit and therefore provide an interesting alternative to the maximum-likelihood estimator. The quality of these estimators is assessed through competitive simulation studies in which we compare their behaviour to the performance of other estimators available in the literature.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.