When introducing physics-constrained deep learning solutions to the volumetric super-resolution of scientific data, the training is challenging to converge and always time-consuming. We propose a new hierarchical sampling method based on octree to solve these difficulties. In our approach, scientific data is preprocessed before training, and a hierarchical octree-based data structure is built to guide sampling on the latent context grid. Each leaf node in the octree corresponds to an indivisible subblock of the volumetric data. The dimensions of the subblocks are different, making the number of sample points in each randomly cropped training data block to be adaptive. We reconstruct the octree at intervals according to loss distribution to perform the multi-stage training. With the Rayleigh-B\'enard convection problem, we deploy our method to state-of-the-art models. We constructed adequate experiments to evaluate the training performance and model accuracy of our method. Experiments indicate that our sampling optimization improves the convergence performance of physics-constrained deep learning super-resolution solutions. Furthermore, the sample points and training time are significantly reduced with no drop in model accuracy. We also test our method in training tasks of other deep neural networks, and the results show our sampling optimization has extensive effectiveness and applicability. The code is publicly available at //github.com/xinjiewang/octree-based_sampling.
Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In particular, we show that in relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure from $m$ function values based on $\ell^1$-minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of $m^{-1/2}$ (up to log factors) compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.
Heatmap-based methods have become the mainstream method for pose estimation due to their superior performance. However, heatmap-based approaches suffer from significant quantization errors with downscale heatmaps, which result in limited performance and the detrimental effects of intermediate supervision. Previous heatmap-based methods relied heavily on additional post-processing to mitigate quantization errors. Some heatmap-based approaches improve the resolution of feature maps by using multiple costly upsampling layers to improve localization precision. To solve the above issues, we creatively view the backbone network as a degradation process and thus reformulate the heatmap prediction as a Super-Resolution (SR) task. We first propose the SR head, which predicts heatmaps with a spatial resolution higher than the input feature maps (or even consistent with the input image) by super-resolution, to effectively reduce the quantization error and the dependence on further post-processing. Besides, we propose SRPose to gradually recover the HR heatmaps from LR heatmaps and degraded features in a coarse-to-fine manner. To reduce the training difficulty of HR heatmaps, SRPose applies SR heads to supervise the intermediate features in each stage. In addition, the SR head is a lightweight and generic head that applies to top-down and bottom-up methods. Extensive experiments on the COCO, MPII, and CrowdPose datasets show that SRPose outperforms the corresponding heatmap-based approaches. The code and models are available at //github.com/haonanwang0522/SRPose.
Computed Tomography (CT) is a prominent example of Imaging Inverse Problem (IIP), highlighting the unrivalled performances of data-driven methods in degraded measurements setups like sparse X-ray projections. Although a significant proportion of deep learning approaches benefit from large supervised datasets to directly map experimental measurements to medical scans, they cannot generalize to unknown acquisition setups. In contrast, fully unsupervised techniques, most notably using score-based generative models, have recently demonstrated similar or better performances compared to supervised approaches to solve IIPs while being flexible at test time regarding the imaging setup. However, their use cases are limited by two factors: (a) they need considerable amounts of training data to have good generalization properties and (b) they require a backward operator, like Filtered-Back-Projection in the case of CT, to condition the learned prior distribution of medical scans to experimental measurements. To overcome these issues, we propose an unsupervised conditional approach to the Generative Latent Optimization framework (cGLO), in which the parameters of a decoder network are initialized on an unsupervised dataset. The decoder is then used for reconstruction purposes, by performing Generative Latent Optimization with a loss function directly comparing simulated measurements from proposed reconstructions to experimental measurements. The resulting approach, tested on sparse-view CT using multiple training dataset sizes, demonstrates better reconstruction quality compared to state-of-the-art score-based strategies in most data regimes and shows an increasing performance advantage for smaller training datasets and reduced projection angles. Furthermore, cGLO does not require any backward operator and could expand use cases even to non-linear IIPs.
This paper proposes a non-centered parameterization based infinite-dimensional mean-field variational inference (NCP-iMFVI) approach for solving the hierarchical Bayesian inverse problems. This method can generate available estimates from the approximated posterior distribution efficiently. To avoid the mutually singular obstacle that occurred in the infinite-dimensional hierarchical approach, we propose a rigorous theory of the non-centered variational Bayesian approach. Since the non-centered parameterization weakens the connection between the parameter and the hyper-parameter, we can introduce the hyper-parameter to all terms of the eigendecomposition of the prior covariance operator. We also show the relationships between the NCP-iMFVI and infinite-dimensional hierarchical approaches with centered parameterization. The proposed algorithm is applied to three inverse problems governed by the simple smooth equation, the Helmholtz equation, and the steady-state Darcy flow equation. Numerical results confirm our theoretical findings, illustrate the efficiency of solving the iMFVI problem formulated by large-scale linear and nonlinear statistical inverse problems, and verify the mesh-independent property.
Recently developed reduced-order modeling techniques aim to approximate nonlinear dynamical systems on low-dimensional manifolds learned from data. This is an effective approach for modeling dynamics in a post-transient regime where the effects of initial conditions and other disturbances have decayed. However, modeling transient dynamics near an underlying manifold, as needed for real-time control and forecasting applications, is complicated by the effects of fast dynamics and nonnormal sensitivity mechanisms. To begin to address these issues, we introduce a parametric class of nonlinear projections described by constrained autoencoder neural networks in which both the manifold and the projection fibers are learned from data. Our architecture uses invertible activation functions and biorthogonal weight matrices to ensure that the encoder is a left inverse of the decoder. We also introduce new dynamics-aware cost functions that promote learning of oblique projection fibers that account for fast dynamics and nonnormality. To demonstrate these methods and the specific challenges they address, we provide a detailed case study of a three-state model of vortex shedding in the wake of a bluff body immersed in a fluid, which has a two-dimensional slow manifold that can be computed analytically. In anticipation of future applications to high-dimensional systems, we also propose several techniques for constructing computationally efficient reduced-order models using our proposed nonlinear projection framework. This includes a novel sparsity-promoting penalty for the encoder that avoids detrimental weight matrix shrinkage via computation on the Grassmann manifold.
Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs). In this work, we focus on multiscale PDEs that have important applications such as reservoir modeling and turbulence prediction. We demonstrate that for such PDEs, the spectral bias towards low-frequency components presents a significant challenge for existing neural operators. To address this challenge, we propose a hierarchical attention neural operator (HANO) inspired by the hierarchical matrix approach. HANO features a scale-adaptive interaction range and self-attentions over a hierarchy of levels, enabling nested feature computation with controllable linear cost and encoding/decoding of multiscale solution space. We also incorporate an empirical $H^1$ loss function to enhance the learning of high-frequency components. Our numerical experiments demonstrate that HANO outperforms state-of-the-art (SOTA) methods for representative multiscale problems.
Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.
Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. (See Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.
Graph convolutional networks (GCNs) have been successfully applied in node classification tasks of network mining. However, most of these models based on neighborhood aggregation are usually shallow and lack the "graph pooling" mechanism, which prevents the model from obtaining adequate global information. In order to increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes to hyper-nodes and then refines the coarsened graph to the original to restore the representation for each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field for each node, hence more global information can be learned. Comprehensive experiments conducted on public datasets demonstrate the effectiveness of the proposed method over the state-of-art methods. Notably, our model gains substantial improvements when only a few labeled samples are provided.
Medical image segmentation requires consensus ground truth segmentations to be derived from multiple expert annotations. A novel approach is proposed that obtains consensus segmentations from experts using graph cuts (GC) and semi supervised learning (SSL). Popular approaches use iterative Expectation Maximization (EM) to estimate the final annotation and quantify annotator's performance. Such techniques pose the risk of getting trapped in local minima. We propose a self consistency (SC) score to quantify annotator consistency using low level image features. SSL is used to predict missing annotations by considering global features and local image consistency. The SC score also serves as the penalty cost in a second order Markov random field (MRF) cost function optimized using graph cuts to derive the final consensus label. Graph cut obtains a global maximum without an iterative procedure. Experimental results on synthetic images, real data of Crohn's disease patients and retinal images show our final segmentation to be accurate and more consistent than competing methods.