When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this variance trivially. However, for problems that require thousands of optimisation iterations, the computational cost of this approach rises quickly. We derive a theoretical framework for interleaving sampling and optimisation. We update and reuse past samples with low-variance finite-difference estimators that describe the change in the estimated gradients between each iteration. By combining proportional and finite-difference samples, we continuously reduce the variance of our novel gradient meta-estimators throughout the optimisation process. We investigate how our estimator interlinks with Adam and derive a stable combination. We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.
Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled image. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care.
We consider problems related to initial meshing and adaptive mesh refinement for the electromagnetic simulation of various structures. The quality of the initial mesh and the performance of the adaptive refinement are of great importance for the finite element solution of the Maxwell equations, since they directly affect the accuracy and the computational time. In this paper, we describe the complete meshing workflow, which allows the simulation of arbitrary structures. Test simulations confirm that the presented approach allows to reach the quality of the industrial simulation software.
We define a new class of predicates called equilevel predicates on a distributive lattice which eases the analysis of parallel algorithms. Many combinatorial problems such as the vertex cover problem, the bipartite matching problem, and the minimum spanning tree problem can be modeled as detecting an equilevel predicate. The problem of detecting an equilevel problem is NP-complete, but equilevel predicates with the helpful property can be detected in polynomial time in an online manner. An equilevel predicate has the helpful property with a polynomial time algorithm if the algorithm can return a nonempty set of indices such that advancing on any of them can be used to detect the predicate. Furthermore, the refined independently helpful property allows online parallel detection of such predicates in NC. When the independently helpful property holds, advancing on all the specified indices in parallel can be used to detect the predicate in polylogarithmic time. We also define a special class of equilevel predicates called solitary predicates. Unless NP = RP, this class of predicate also does not admit efficient algorithms. Earlier work has shown that solitary predicates with the efficient advancement can be detected in polynomial time. We introduce two properties called the antimonotone advancement and the efficient rejection which yield the detection of solitary predicates in NC. Finally, we identify the minimum spanning tree, the shortest path, and the conjunctive predicate detection as problems satisfying such properties, giving alternative certifications of their NC memberships as a result.
Registering clothes from 4D scans with vertex-accurate correspondence is challenging, yet important for dynamic appearance modeling and physics parameter estimation from real-world data. However, previous methods either rely on texture information, which is not always reliable, or achieve only coarse-level alignment. In this work, we present a novel approach to enabling accurate surface registration of texture-less clothes with large deformation. Our key idea is to effectively leverage a shape prior learned from pre-captured clothing using diffusion models. We also propose a multi-stage guidance scheme based on learned functional maps, which stabilizes registration for large-scale deformation even when they vary significantly from training data. Using high-fidelity real captured clothes, our experiments show that the proposed approach based on diffusion models generalizes better than surface registration with VAE or PCA-based priors, outperforming both optimization-based and learning-based non-rigid registration methods for both interpolation and extrapolation tests.
The variational quantum eigensolver (VQE) is a hybrid algorithm that has the potential to provide a quantum advantage in practical chemistry problems that are currently intractable on classical computers. VQE trains parameterized quantum circuits using a classical optimizer to approximate the eigenvalues and eigenstates of a given Hamiltonian. However, VQE faces challenges in task-specific design and machine-specific architecture, particularly when running on noisy quantum devices. This can have a negative impact on its trainability, accuracy, and efficiency, resulting in noisy quantum data. We propose variational denoising, an unsupervised learning method that employs a parameterized quantum neural network to improve the solution of VQE by learning from noisy VQE outputs. Our approach can significantly decrease energy estimation errors and increase fidelities with ground states compared to noisy input data for the $\text{H}_2$, LiH, and $\text{BeH}_2$ molecular Hamiltonians, and the transverse field Ising model. Surprisingly, it only requires noisy data for training. Variational denoising can be integrated into quantum hardware, increasing its versatility as an end-to-end quantum processing for quantum data.
We initiate the study of sub-linear sketching and streaming techniques for estimating the output size of common dictionary compressors such as Lempel-Ziv '77, the run-length Burrows-Wheeler transform, and grammar compression. To this end, we focus on a measure that has recently gained much attention in the information-theoretic community and which approximates up to a polylogarithmic multiplicative factor the output sizes of those compressors: the normalized substring complexity function $\delta$. We present a data sketch of $O(\epsilon^{-3}\log n + \epsilon^{-1}\log^2 n)$ words that allows computing a multiplicative $(1\pm \epsilon)$-approximation of $\delta$ with high probability, where $n$ is the string length. The sketches of two strings $S_1,S_2$ can be merged in $O(\epsilon^{-1}\log^2 n)$ time to yield the sketch of $\{S_1,S_2\}$, speeding up by orders of magnitude tasks such as the computation of all-pairs \emph{Normalized Compression Distances} (NCD). If random access is available on the input, our sketch can be updated in $O(\epsilon^{-1}\log^2 n)$ time for each character right-extension of the string. This yields a polylogarithmic-space algorithm for approximating $\delta$, improving exponentially over the working space of the state-of-the-art algorithms running in nearly-linear time. Motivated by the fact that random access is not always available on the input data, we then present a streaming algorithm computing our sketch in $O(\sqrt n \cdot \log n)$ working space and $O(\epsilon^{-1}\log^2 n)$ worst-case delay per character. We show that an implementation of our streaming algorithm can estimate {\delta} on a dataset of 189GB with a throughput of 203MB per minute while using only 5MB of RAM, and that our sketch speeds up the computation of all-pairs NCD distances by one order of magnitude, with applications to phylogenetic tree reconstruction.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Sequential recommendation as an emerging topic has attracted increasing attention due to its important practical significance. Models based on deep learning and attention mechanism have achieved good performance in sequential recommendation. Recently, the generative models based on Variational Autoencoder (VAE) have shown the unique advantage in collaborative filtering. In particular, the sequential VAE model as a recurrent version of VAE can effectively capture temporal dependencies among items in user sequence and perform sequential recommendation. However, VAE-based models suffer from a common limitation that the representational ability of the obtained approximate posterior distribution is limited, resulting in lower quality of generated samples. This is especially true for generating sequences. To solve the above problem, in this work, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ the contrastive loss. The latent variables will be able to learn more personalized and salient characteristics by minimizing the contrastive loss. Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
Intent classification and slot filling are two essential tasks for natural language understanding. They often suffer from small-scale human-labeled training data, resulting in poor generalization capability, especially for rare words. Recently a new language representation model, BERT (Bidirectional Encoder Representations from Transformers), facilitates pre-training deep bidirectional representations on large-scale unlabeled corpora, and has created state-of-the-art models for a wide variety of natural language processing tasks after simple fine-tuning. However, there has not been much effort on exploring BERT for natural language understanding. In this work, we propose a joint intent classification and slot filling model based on BERT. Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on several public benchmark datasets, compared to the attention-based recurrent neural network models and slot-gated models.