Deep learning based deformable medical image registration methods have emerged as a strong alternative for classical iterative registration methods. Since image registration is in general an ill-defined problem, the usefulness of inductive biases of symmetricity, inverse consistency and topology preservation has been widely accepted by the research community. However, while many deep learning registration methods enforce these properties via loss functions, no prior deep learning registration method fulfills all of these properties by construct. Here, we propose a novel multi-resolution registration architecture which is by construct symmetric, inverse consistent, and topology preserving. We also develop an implicit layer for memory efficient inversion of the deformation fields. The proposed method achieves state-of-the-art registration accuracy on two datasets.
We study parameterisation-independent closed planar curve matching as a Bayesian inverse problem. The motion of the curve is modelled via a curve on the diffeomorphism group acting on the ambient space, leading to a large deformation diffeomorphic metric mapping (LDDMM) functional penalising the kinetic energy of the deformation. We solve Hamilton's equations for the curve matching problem using the Wu-Xu element [S. Wu, J. Xu, Nonconforming finite element spaces for $2m^\text{th}$ order partial differential equations on $\mathbb{R}^n$ simplicial grids when $m=n+1$, Mathematics of Computation 88 (316) (2019) 531-551] which provides mesh-independent Lipschitz constants for the forward motion of the curve, and solve the inverse problem for the momentum using Bayesian inversion. Since this element is not affine-equivalent we provide a pullback theory which expedites the implementation and efficiency of the forward map. We adopt ensemble Kalman inversion using a negative Sobolev norm mismatch penalty to measure the discrepancy between the target and the ensemble mean shape. We provide several numerical examples to validate the approach.
The spectral decomposition of a symmetric, second-order tensor is widely adopted in many fields of Computational Mechanics. As an example, in elasto-plasticity under large strain and rotations, given the Cauchy deformation tensor, it is a fundamental step to compute the logarithmic strain tensor. Recently, this approach has been also adopted in small-strain isotropic plasticity to reconstruct the stress tensor as a function of its eigenvalues, allowing the formulation of predictor-corrector return algorithms in the invariants space. These algorithms not only reduce the number of unknowns at the constitutive level, but also allow the correct handling of stress states in which the plastic normals are undefined, thus ensuring a better convergence with respect to the standard approach. While the eigenvalues of a symmetric, second-order tensor can be simply computed as a function of the tensor invariants, the computation of its eigenbasis can be more difficult, especially when two or more eigenvalues are coincident. Moreover, when a Newton-Rhapson algorithm is adopted to solve nonlinear problems in Computational Mechanics, also the tensorial derivatives of the eigenbasis, whose computation is still more complicate, are required to assemble the tangent matrix. A simple and comprehensive method is presented, which can be adopted to compute a closed form representation of a second-order tensor, as well as their derivatives with respect to the tensor itself, allowing a simpler implementation of spectral decomposition of a tensor in Computational Mechanics applications.
Scene understanding has made tremendous progress over the past few years, as data acquisition systems are now providing an increasing amount of data of various modalities (point cloud, depth, RGB...). However, this improvement comes at a large cost on computation resources and data annotation requirements. To analyze geometric information and images jointly, many approaches rely on both a 2D loss and 3D loss, requiring not only 2D per pixel-labels but also 3D per-point labels. However, obtaining a 3D groundtruth is challenging, time-consuming and error-prone. In this paper, we show that image segmentation can benefit from 3D geometric information without requiring a 3D groundtruth, by training the geometric feature extraction and the 2D segmentation network jointly, in an end-to-end fashion, using only the 2D segmentation loss. Our method starts by extracting a map of 3D features directly from a provided point cloud by using a lightweight 3D neural network. The 3D feature map, merged with the RGB image, is then used as an input to a classical image segmentation network. Our method can be applied to many 2D segmentation networks, improving significantly their performance with only a marginal network weight increase and light input dataset requirements, since no 3D groundtruth is required.
We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performances in the sense of the $L2$-norm. In contrast, it is difficult for the neural estimator with least-squares to achieve the sup-norm convergence, due to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.
This paper introduces a smooth method for (structured) sparsity in $\ell_q$ and $\ell_{p,q}$ regularized optimization problems. Optimization of these non-smooth and possibly non-convex problems typically relies on specialized procedures. In contrast, our general framework is compatible with prevalent first-order optimization methods like Stochastic Gradient Descent and accelerated variants without any required modifications. This is accomplished through a smooth optimization transfer, comprising an overparametrization of selected model parameters using Hadamard products and a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization of the surrogate parameters induces non-smooth and non-convex $\ell_q$ or $\ell_{p,q}$ regularization in the original parametrization. We show that our approach yields not only matching global minima but also equivalent local minima. This is particularly useful in non-convex sparse regularization, where finding global minima is NP-hard and local minima are known to generalize well. We provide a comprehensive overview consolidating various literature strands on sparsity-inducing parametrizations and propose meaningful extensions to existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate that its performance is on par with or surpasses commonly used implementations of convex and non-convex regularization methods.
We present a novel geometric deep learning layer that leverages the varifold gradient (VariGrad) to compute feature vector representations of 3D geometric data. These feature vectors can be used in a variety of downstream learning tasks such as classification, registration, and shape reconstruction. Our model's use of parameterization independent varifold representations of geometric data allows our model to be both trained and tested on data independent of the given sampling or parameterization. We demonstrate the efficiency, generalizability, and robustness to resampling demonstrated by the proposed VariGrad layer.
Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named \textbf{Weighted Optimization Trajectories (WOT)} that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\infty}$ attack by 1.53\% $\sim$ 6.11\% and meanwhile increases the clean accuracy by 0.55\%$\sim$5.47\% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.
Hyperspectral images (HSI) with abundant spectral information reflected materials property usually perform low spatial resolution due to the hardware limits. Meanwhile, multispectral images (MSI), e.g., RGB images, have a high spatial resolution but deficient spectral signatures. Hyperspectral and multispectral image fusion can be cost-effective and efficient for acquiring both high spatial resolution and high spectral resolution images. Many of the conventional HSI and MSI fusion algorithms rely on known spatial degradation parameters, i.e., point spread function, spectral degradation parameters, spectral response function, or both of them. Another class of deep learning-based models relies on the ground truth of high spatial resolution HSI and needs large amounts of paired training images when working in a supervised manner. Both of these models are limited in practical fusion scenarios. In this paper, we propose an unsupervised HSI and MSI fusion model based on the cycle consistency, called CycFusion. The CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI), and the desired high spatial resolution HSI (HrHSI) are considered to be intermediate feature maps in the transformation networks. The CycFusion can be trained with the objective functions of marginal matching in single transform and cycle consistency in double transforms. Moreover, the estimated PSF and SRF are embedded in the model as the pre-training weights, which further enhances the practicality of our proposed model. Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods. The codes of this paper will be available at this address: https: //github.com/shuaikaishi/CycFusion for reproducibility.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $94k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we further introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially at no additional memory cost, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data will be released.