Numerical modeling of localization is challenging because the solution evolves non-smoothly and the localization paths are not known a priori. Despite decades of effort, there is a need for innovative discretization-independent computational methods to predict the evolution of localizations. In this work, an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. In the proposed method, a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization is enriched by a neural network (NN) approximation under a Partition of Unity framework. In the NN approximation, the deep neural network automatically locates and inserts regularized discontinuities in the function space. The NN-based enrichment functions are then patched together with the RK approximation functions, with RK serving as the Partition of Unity patching function. The optimal NN parameters defining the location, orientation, and displacement distribution across the localization, together with the RK approximation coefficients, are obtained by minimizing an energy-based loss function. To regularize the NN-RK approximation, a constraint on the spatial gradient of the parametric coordinates is imposed in the loss function. A convergence analysis shows that the solution of the proposed method is guaranteed to converge. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.
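As a rough illustration of the energy-minimization idea described above, the following toy sketch fits a regularized discontinuity in a 1D bar by minimizing the potential energy. The tanh ansatz, the piecewise stiffness, and all parameter names are illustrative assumptions, not the paper's NN-RK approximation.

```python
# Toy 1D sketch (assumed setup, NOT the paper's NN-RKPM): represent the
# displacement as a smooth part plus a regularized discontinuity,
# u(x) = a*x + b*tanh((x - x0)/w), and find the location x0, width w,
# and jump magnitude b by minimizing the potential energy.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 401)                      # quadrature points on the bar
stiffness = np.where(x < 0.5, 1.0, 0.05)            # weak right half invites localization

def potential_energy(p):
    a, b, x0, w = p
    w = abs(w) + 1e-6                                # keep the transition width positive
    du = a + (b / w) / np.cosh((x - x0) / w) ** 2    # u'(x)
    strain = np.trapz(0.5 * stiffness * du**2, x)    # internal strain energy
    work = 1.0 * (a + b * np.tanh((1.0 - x0) / w))   # unit end traction times u(1)
    return strain - work

res = minimize(potential_energy, [0.5, 0.1, 0.4, 0.1], method="Nelder-Mead")
print("a, b, x0, w:", res.x)                         # x0 drifts into the weak region
```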
Ensembling neural networks is a widely recognized approach to enhancing model performance, estimating uncertainty, and improving robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the effectiveness of a deep ensemble depends on the diversity among its members, which is difficult to achieve for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen comparably widespread adoption in self-supervised or unsupervised representation learning, where it remains a challenging endeavor. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also improves calibration, surpassing baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data.
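To make the diversity idea concrete, here is a minimal sketch of one possible diversity penalty over sub-network embeddings: the mean pairwise cosine similarity between members' representations of the same batch. The form of the penalty is an assumption for illustration; the paper's actual loss is not reproduced here.

```python
# Hypothetical diversity term for an ensemble of sub-network embeddings:
# penalize the mean pairwise cosine similarity between members'
# representations of the same batch (lower similarity = more diversity).
import numpy as np

def diversity_penalty(embeddings):
    """embeddings: list of (batch, dim) arrays, one per sub-network."""
    penalty, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            a = embeddings[i] / np.linalg.norm(embeddings[i], axis=1, keepdims=True)
            b = embeddings[j] / np.linalg.norm(embeddings[j], axis=1, keepdims=True)
            penalty += np.mean(np.sum(a * b, axis=1))  # mean cosine similarity
            pairs += 1
    return penalty / pairs  # added (with a weight) to the self-supervised loss

rng = np.random.default_rng(0)
embs = [rng.standard_normal((8, 16)) for _ in range(3)]
print(diversity_penalty(embs))
```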
Sparse variational approximations are popular methods for scaling up inference and learning in Gaussian processes to larger datasets. For $N$ training points, exact inference has $O(N^3)$ cost; with $M \ll N$ features, state-of-the-art sparse variational methods have $O(NM^2)$ cost. Recently, methods using more sophisticated features have been proposed; these promise $O(M^3)$ cost, with good performance in low-dimensional tasks such as spatial modelling, but they only work with a very limited class of kernels, excluding some of the most commonly used ones. In this work, we propose integrated Fourier features, which extend these performance benefits to a very broad class of stationary covariance functions. We motivate the method and the choice of parameters through a convergence analysis and empirical exploration, and demonstrate practical speedups in synthetic and real-world spatial regression tasks.
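For intuition about where the cost savings come from, the sketch below performs generic Fourier-feature regression (random features in the style of Rahimi and Recht, not the paper's integrated Fourier features): building the features costs $O(NM)$, forming the normal equations $O(NM^2)$, and the solve $O(M^3)$, in place of the $O(N^3)$ exact GP solve.

```python
# Generic Fourier-feature GP regression sketch (random features, NOT the
# paper's integrated features): with M basis functions the dominant
# linear algebra is on an (M x M) system.
import numpy as np

rng = np.random.default_rng(1)
N, M, noise = 500, 50, 0.1
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + noise * rng.standard_normal(N)

W = rng.standard_normal((1, M // 2))       # RBF-kernel spectral frequencies
def phi(x):
    z = x @ W
    return np.hstack([np.cos(z), np.sin(z)]) / np.sqrt(M // 2)

Phi = phi(X)                               # (N, M): O(NM) to build
A = Phi.T @ Phi + noise**2 * np.eye(M)     # (M, M): O(NM^2) to form
w = np.linalg.solve(A, Phi.T @ y)          # O(M^3) solve
Xs = np.linspace(-3, 3, 5).reshape(-1, 1)
print(phi(Xs) @ w)                         # approximate posterior-mean prediction
```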
It is shown that psychometric test reliability, based on any true-score model with randomly sampled items and conditionally independent errors, converges to 1 as the test length goes to infinity, assuming some fairly general regularity conditions. The asymptotic rate of convergence is given by the Spearman-Brown formula, and this does not require the items to be parallel, latently unidimensional, or even finite-dimensional. Simulations with the 2-parameter logistic item response theory model reveal that there can be a positive bias in the reliability of short multidimensional tests, meaning that applying the Spearman-Brown formula in these cases would overpredict the reliability that results from lengthening the tests. For short unidimensional tests under the 2-parameter logistic model, the reliabilities are almost unbiased, so the Spearman-Brown formula yields approximately unbiased predictions in these cases.
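For reference, the Spearman-Brown formula mentioned above predicts the reliability $\rho_k$ of a test lengthened by a factor $k$ from the reliability $\rho_1$ of the original test:

$$\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}$$

For instance, doubling a test with $\rho_1 = 0.6$ predicts $\rho_2 = 1.2/1.6 = 0.75$.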
Getting standard multigrid to work efficiently for the high-frequency Helmholtz equation has been an open problem in applied mathematics for years. Much effort has been dedicated to finding solution methods which can use multigrid components to obtain solvers with linear time complexity. In this work we present one of the first stand-alone multigrid solvers for the 2D Helmholtz equation, covering model problems with both constant and non-constant wavenumbers. We use standard smoothing techniques and do not impose any restrictions on the number of grid points per wavelength on the coarse grid. As a result we are able to obtain full V- and W-cycle algorithms. The key features of the algorithm are the use of higher-order inter-grid transfer operators combined with a complex constant in the coarsening process. Using weighted-Jacobi smoothing, we obtain a solver which is $h$-independent and scales linearly with the wavenumber $k$. Numerical results using 1 to 5 GMRES(3) smoothing steps approach $k$- and $h$-independent convergence when combined with the higher-order inter-grid transfer operators and a small or even zero complex shift. The proposed algorithm is an important step in the ongoing search for scalable solvers for challenging wave propagation problems.
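As a minimal illustration of one ingredient, the sketch below applies weighted-Jacobi smoothing to a 1D finite-difference Helmholtz operator. The 1D setup, grid size, and parameter values are assumptions for illustration only; the full V/W-cycle, the higher-order transfer operators, and the complex shift are not shown.

```python
# Weighted-Jacobi smoothing for -u'' - k^2 u = f on a 1D grid with zero
# Dirichlet boundaries (2nd-order stencil). On its own Jacobi is only a
# smoother for Helmholtz problems; it damps oscillatory error and is
# combined with coarse-grid correction inside a full multigrid cycle.
import numpy as np

n, k, omega = 64, 10.0, 2.0 / 3.0          # grid points, wavenumber, Jacobi weight
h = 1.0 / (n + 1)
diag = 2.0 / h**2 - k**2                   # diagonal of the Helmholtz operator

def apply_A(u):
    """-u'' - k^2 u with zero Dirichlet BCs."""
    up = np.concatenate(([0.0], u[:-1]))   # u_{i-1}
    um = np.concatenate((u[1:], [0.0]))    # u_{i+1}
    return (2 * u - up - um) / h**2 - k**2 * u

f = np.ones(n)
u = np.zeros(n)
for sweep in range(10):                    # weighted-Jacobi smoothing sweeps
    u = u + omega * (f - apply_A(u)) / diag
print(np.linalg.norm(f - apply_A(u)))      # residual norm after smoothing
```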
The performance of three routing selection algorithms is compared in terms of bandwidth blocking probability, quality of transmission, and run time in elastic optical networks (EONs) over the C+L band. The min-max frequency algorithm shows the best performance on all three metrics.
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training, which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training are still only preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating the various optimization techniques it employs. First, distributed GNN training is classified into several categories according to workflow, and the computational patterns and communication patterns of each category, as well as the optimization techniques proposed by recent work, are introduced. Second, the software frameworks and hardware platforms for distributed GNN training are introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches and, eventually, to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.
Incompleteness is a common problem for existing knowledge graphs (KGs), and KG completion, which aims to predict missing links between entities, is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration, but they pay little attention to the order of relations in a path, which is important for reasoning. In addition, these path-based models tend to ignore nonlinear contributions of path features to link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed OPTransE model outperforms state-of-the-art methods.
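For illustration, the sketch below shows a translation-based score with separate head and tail projection spaces, in the spirit of the order-preserving idea above. The specific projection matrices and score form are assumptions; the paper's exact formulation and pooling strategy are not reproduced here.

```python
# Sketch of a translational KG score with separate relation-specific
# projection spaces for head and tail entities (illustrative only, not
# OPTransE's exact formulation): lower score = more plausible triple.
import numpy as np

rng = np.random.default_rng(2)
dim = 8
h, t = rng.standard_normal(dim), rng.standard_normal(dim)  # entity embeddings
r = rng.standard_normal(dim)                               # relation embedding
Wh = rng.standard_normal((dim, dim))                       # head projection (per relation)
Wt = rng.standard_normal((dim, dim))                       # tail projection (per relation)

def score(h, r, t):
    # project head and tail into different spaces, then measure
    # ||W_h h + r - W_t t||, a translation-style plausibility score
    return np.linalg.norm(Wh @ h + r - Wt @ t)

print(score(h, r, t))
```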
Neural machine translation (NMT) is a deep learning based approach to machine translation which yields state-of-the-art translation performance when large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and vanilla NMT therefore performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is thus very important for domain-specific translation. In this paper, we give a comprehensive survey of state-of-the-art domain adaptation techniques for NMT.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch can lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style and illumination, and 2) the instance-level shift, such as object appearance and size. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, one at the image level and one at the instance level, to reduce the domain discrepancy. The two components are based on H-divergence theory and are implemented by learning domain classifiers in an adversarial training manner. The domain classifiers at the two levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate its effectiveness for robust object detection in various domain shift scenarios.
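To make the consistency regularization concrete, here is a minimal sketch of the idea: the image-level domain classifier's prediction should agree with the instance-level domain predictions for the region proposals of the same image. The function and variable names, and the squared-error form, are assumptions for illustration.

```python
# Sketch of a consistency regularizer between image-level and
# instance-level domain classifiers (illustrative form, not necessarily
# the paper's exact norm): penalize disagreement between the image-level
# domain probability and each region proposal's domain probability.
import numpy as np

def consistency_loss(p_img, p_inst):
    """p_img: scalar domain probability for the whole image;
    p_inst: (num_regions,) domain probabilities for its proposals."""
    return np.mean((p_img - p_inst) ** 2)

print(consistency_loss(0.7, np.array([0.6, 0.8, 0.65])))
```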