We present a new framework for computing fine-scale solutions of multiscale Partial Differential Equations (PDEs) using operator learning tools. Obtaining fine-scale solutions of multiscale PDEs can be challenging, but there are many inexpensive computational methods for obtaining coarse-scale solutions. Additionally, in many real-world applications, fine-scale solutions can only be observed at a limited number of locations. In order to obtain approximations or predictions of fine-scale solutions over general regions of interest, we propose to learn the operator mapping from coarse-scale solutions to fine-scale solutions using a limited number (and possibly noisy) observations of the fine-scale solutions. The approach is to train multi-fidelity homogenization maps using mathematically motivated neural operators. The operator learning framework can efficiently obtain the solution of multiscale PDEs at any arbitrary point, making our proposed framework a mesh-free solver. We verify our results on multiple numerical examples showing that our approach is an efficient mesh-free solver for multiscale PDEs.
We formulate and solve data-driven aerodynamic shape design problems with distributionally robust optimization (DRO) approaches. Building on the findings of the work \cite{gotoh2018robust}, we study the connections between a class of DRO and the Taguchi method in the context of robust design optimization. Our preliminary computational experiments on aerodynamic shape optimization in transonic turbulent flow show promising design results.
This paper presents a new approach for training two-stage object detection ensemble models, more specifically, Faster R-CNN models to estimate uncertainty. We propose training one Region Proposal Network(RPN)~\cite{//doi.org/10.48550/arxiv.1506.01497} and multiple Fast R-CNN prediction heads is all you need to build a robust deep ensemble network for estimating uncertainty in object detection. We present this approach and provide experiments to show that this approach is much faster than the naive method of fully training all $n$ models in an ensemble. We also estimate the uncertainty by measuring this ensemble model's Expected Calibration Error (ECE). We then further compare the performance of this model with that of Gaussian YOLOv3, a variant of YOLOv3 that models uncertainty using predicted bounding box coordinates. The source code is released at \url{//github.com/Akola-Mbey-Denis/EfficientEnsemble}
A comprehensive mathematical model of the multiphysics flow of blood and Cerebrospinal Fluid (CSF) in the brain can be expressed as the coupling of a poromechanics system and Stokes' equations: the first describes fluids filtration through the cerebral tissue and the tissue's elastic response, while the latter models the flow of the CSF in the brain ventricles. This model describes the functioning of the brain's waste clearance mechanism, which has been recently discovered to play an essential role in the progress of neurodegenerative diseases. To model the interactions between different scales in the porous medium, we propose a physically consistent coupling between Multi-compartment Poroelasticity (MPE) equations and Stokes' equations. In this work, we introduce a numerical scheme for the discretization of such coupled MPE-Stokes system, employing a high-order discontinuous Galerkin method on polytopal grids to efficiently account for the geometric complexity of the domain. We analyze the stability and convergence of the space semidiscretized formulation, we prove a-priori error estimates, and we present a temporal discretization based on a combination of Newmark's $\beta$-method for the elastic wave equation and the $\theta$-method for the other equations of the model. Numerical simulations carried out on test cases with manufactured solutions validate the theoretical error estimates. We also present numerical results on a two-dimensional slice of a patient-specific brain geometry reconstructed from diagnostic images, to test in practice the advantages of the proposed approach.
Prognostic Health Management aims to predict the Remaining Useful Life (RUL) of degrading components/systems utilizing monitoring data. These RUL predictions form the basis for optimizing maintenance planning in a Predictive Maintenance (PdM) paradigm. We here propose a metric for assessing data-driven prognostic algorithms based on their impact on downstream PdM decisions. The metric is defined in association with a decision setting and a corresponding PdM policy. We consider two typical PdM decision settings, namely component ordering and/or replacement planning, for which we investigate and improve PdM policies that are commonly utilized in the literature. All policies are evaluated via the data-based estimation of the long-run expected maintenance cost per unit time, using monitored run-to-failure experiments. The policy evaluation enables the estimation of the proposed metric. We employ the metric as an objective function for optimizing heuristic PdM policies and algorithms' hyperparameters. The effect of different PdM policies on the metric is initially investigated through a theoretical numerical example. Subsequently, we employ four data-driven prognostic algorithms on a simulated turbofan engine degradation problem, and investigate the joint effect of prognostic algorithm and PdM policy on the metric, resulting in a decision-oriented performance assessment of these algorithms.
Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also model code-switching as a sequence of latent binary sequences that can be used to guide the flow of information from each language adapter at the frame level. The proposed approaches are evaluated on three code-switched datasets encompassing Arabic, Mandarin, and Hindi languages paired with English, showing consistent improvements in code-switching performance with at least 10\% absolute reduction in CER across all test sets.
In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weights profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models were evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we found that our multi-timestep models outperform or match standard one-step models. This was especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
This paper presents a new weak Galerkin (WG) method for elliptic interface problems on general curved polygonal partitions. The method's key innovation lies in its ability to transform the complex interface jump condition into a more manageable Dirichlet boundary condition, simplifying the theoretical analysis significantly. The numerical scheme is designed by using locally constructed weak gradient on the curved polygonal partitions. We establish error estimates of optimal order for the numerical approximation in both discrete $H^1$ and $L^2$ norms. Additionally, we present various numerical results that serve to illustrate the robust numerical performance of the proposed WG interface method.
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.