The use of neural networks to approximate partial differential equations (PDEs) has gained significant attention in recent years. However, the approximation of PDEs with localised phenomena, e.g., sharp gradients and singularities, remains a challenge, due to ill-defined cost functions in terms of pointwise residual sampling or poor numerical integration. In this work, we introduce $h$-adaptive finite element interpolated neural networks. The method relies on the interpolation of a neural network onto a finite element space that is gradually adapted to the solution during the training process to equidistribute a posteriori error indicator. The use of adaptive interpolation is essential in preserving the non-linear approximation capabilities of the neural networks to effectively tackle problems with localised features. The training relies on a gradient-based optimisation of a loss function based on the (dual) norm of the finite element residual of the interpolated neural network. Automatic mesh adaptation (i.e., refinement and coarsening) is performed based on a posteriori error indicators till a certain level of accuracy is reached. The proposed methodology can be applied to indefinite and nonsymmetric problems. We carry out a detailed numerical analysis of the scheme and prove several a priori error estimates, depending on the expressiveness of the neural network compared to the interpolation mesh. Our numerical experiments confirm the effectiveness of the method in capturing sharp gradients and singularities for forward PDE problems, both in 2D and 3D scenarios. We also show that the proposed preconditioning strategy (i.e., using a dual residual norm of the residual as a cost function) enhances training robustness and accelerates convergence.
Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inference that we call locally simultaneous inference, which only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the "winning" treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then our approach will return an interval that is nearly the same as the uncorrected, standard interval. Locally simultaneous inference is implemented by refining any method for simultaneous inference of interest. Under mild conditions satisfied by common confidence intervals, locally simultaneous inference strictly dominates its underlying simultaneous inference method, meaning it can never yield less statistical power but only more. Compared to conditional selective inference, which demands stronger guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable.
In recent years, several experimental groups have claimed demonstrations of ``quantum supremacy'' or computational quantum advantage. A notable first claim by Google Quantum AI revolves around a metric called the Linear Cross Entropy Benchmarking (Linear XEB), which has been used in multiple quantum supremacy experiments since. The complexity-theoretic hardness of spoofing Linear XEB has nevertheless been doubtful due to its dependence on the Cross-Entropy Quantum Threshold (XQUATH) conjecture put forth by Aaronson and Gunn, which has been disproven for sublinear depth circuits. In efforts on demonstrating quantum supremacy by quantum Hamiltonian simulation, a similar benchmarking metric called the System Linear Cross Entropy Score (sXES) holds firm in light of the aforementioned negative result due to its fundamental distinction with Linear XEB. Moreover, the hardness of spoofing sXES complexity-theoretically rests on the System Linear Cross-Entropy Quantum Threshold Assumption (sXQUATH), the formal relationship of which to XQUATH is unclear. Despite the promises that sXES offers for future demonstration of quantum supremacy, in this work we show that it is an unsound benchmarking metric. Particularly, we prove that sXQUATH does not hold for sublinear depth circuits and present a classical algorithm that spoofs sXES for experiments corrupted with noise larger than certain threshold.
One of the most important processing steps in any analysis pipeline is handling missing data. Traditional approaches simply delete any sample or feature with missing elements. Recent imputation methods replace missing data based on assumed relationships between observed data and the missing elements. However, there is a largely under-explored alternative amid these extremes. Partial deletion approaches remove excessive amounts of missing data, as defined by the user. They can be used in place of traditional deletion or as a precursor to imputation. In this manuscript, we expand upon the Mr. Clean suite of algorithms, focusing on the scenario where all missing data is removed. We show that the RowCol Integer Program can be recast as a Linear Program, thereby reducing runtime. Additionally, the Element Integer Program can be reformulated to reduce the number of variables and allow for high levels of parallelization. Using real-world data sets from genetic, gene expression, and single cell RNA-seq experiments we demonstrate that our algorithms outperform existing deletion techniques over several missingness values, balancing runtime and data retention. Our combined greedy algorithm retains the maximum number of valid elements in 126 of 150 scenarios and stays within 1\% of maximum in 23 of the remaining experiments. The reformulated Element IP complements the greedy algorithm when removing all missing data, boasting a reduced runtime and increase in valid elements in larger data sets, over its generic counterpart. These two programs greatly increase the amount of valid data retained over traditional deletion techniques and further improve on existing partial deletion algorithms.
Deep learning-based methods have achieved prestigious performance for magnetic resonance imaging (MRI) reconstruction, enabling fast imaging for many clinical applications. Previous methods employ convolutional networks to learn the image prior as the regularization term. In quantitative MRI, the physical model of nuclear magnetic resonance relaxometry is known, providing additional prior knowledge for image reconstruction. However, traditional reconstruction networks are limited to learning the spatial domain prior knowledge, ignoring the relaxometry prior. Therefore, we propose a relaxometry-guided quantitative MRI reconstruction framework to learn the spatial prior from data and the relaxometry prior from MRI physics. Additionally, we also evaluated the performance of two popular reconstruction backbones, namely, recurrent variational networks (RVN) and variational networks (VN) with U- Net. Experiments demonstrate that the proposed method achieves highly promising results in quantitative MRI reconstruction.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logics inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.
Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.