There are plenty of applications and analyses of time-independent elliptic partial differential equations in the literature hinting at the benefits of overtesting, that is, using more collocation conditions than basis functions. Overtesting not only reduces the problem size, but is also known to be necessary for the stability and convergence of the widely used unsymmetric Kansa-type strong-form collocation methods. We consider kernel-based meshfree methods, namely a method of lines with spatial collocation and overtesting, for solving parabolic partial differential equations on surfaces without parametrization. In this paper, we extend the time-independent convergence theories for overtesting techniques to parabolic equations on smooth, closed surfaces.
This paper studies structured node classification on graphs, where the predictions should consider dependencies between the node labels. In particular, we focus on solving the problem for partially labeled graphs, where it is essential to incorporate the information in the known labels for predicting the unknown labels. To address this issue, we propose a novel framework leveraging the diffusion probabilistic model for structured node classification (DPM-SNC). At the heart of our framework is the extraordinary capability of DPM-SNC to (a) learn a joint distribution over the labels with an expressive reverse diffusion process and (b) make predictions conditioned on the known labels utilizing manifold-constrained sampling. Since DPMs lack training algorithms for partially labeled data, we design a novel training algorithm that maximizes a new variational lower bound. We also theoretically analyze how DPMs benefit node classification by enhancing the expressive power of GNNs, proposing AGG-WL, which is strictly more powerful than the classic 1-WL test. We extensively verify the superiority of our DPM-SNC in diverse scenarios, which include not only the transductive setting on partially labeled graphs but also the inductive setting and unlabeled graphs.
Despite the growing interest in parallel-in-time methods as an approach to accelerate numerical simulations in atmospheric modelling, improving their stability and convergence remains a substantial challenge for their application to operational models. In this work, we study the temporal parallelization of the shallow water equations on the rotating sphere combined with time-stepping schemes commonly used in atmospheric modelling due to their stability properties, namely an Eulerian implicit-explicit (IMEX) method and a semi-Lagrangian semi-implicit method (SL-SI-SETTLS). The main goal is to investigate the performance of parallel-in-time methods, namely Parareal and Multigrid Reduction in Time (MGRIT), when these well-established schemes are used on the coarse discretization levels, and to provide insights into how they can be improved for better performance. We begin by performing an analytical stability study of Parareal and MGRIT applied to a linearized ordinary differential equation depending on the choice of coarse scheme. Next, we perform numerical simulations of two standard tests to evaluate the stability, convergence and speedup provided by the parallel-in-time methods compared to a fine reference solution computed serially. We also conduct a detailed investigation of the influence of artificial viscosity and hyperviscosity approaches, applied on the coarse discretization levels, on the performance of the temporal parallelization. Both the analytical stability study and the numerical simulations indicate a poorer stability behaviour when SL-SI-SETTLS is used on the coarse levels, compared to the IMEX scheme. With the IMEX scheme, a better trade-off between convergence, stability and speedup compared to serial simulations can be obtained under proper parameter and artificial viscosity choices, opening the perspective of their potential competitiveness for realistic models.
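The Parareal iteration used in such stability studies can be sketched on the Dahlquist test equation, the kind of linearized ODE mentioned above. Here the fine propagator is the exact exponential and the coarse one is a single implicit Euler step; a classical property is that Parareal reproduces the serial fine solution exactly after as many iterations as there are time subintervals. All parameter values are illustrative.

```python
import numpy as np

# Parareal for y' = lam * y on [0, T] split into P subintervals.
lam, T, P = -1.0 + 2.0j, 2.0, 8
dt = T / P
y0 = 1.0 + 0.0j

fine = lambda y: y * np.exp(lam * dt)        # exact propagator over dt
coarse = lambda y: y / (1.0 - lam * dt)      # one implicit Euler step

# serial fine reference solution at the subinterval boundaries
ref = np.empty(P + 1, dtype=complex)
ref[0] = y0
for n in range(P):
    ref[n + 1] = fine(ref[n])

# initial coarse sweep
Y = np.empty(P + 1, dtype=complex)
Y[0] = y0
for n in range(P):
    Y[n + 1] = coarse(Y[n])

# Parareal correction: Y_{n+1}^{k+1} = C(Y_n^{k+1}) + F(Y_n^k) - C(Y_n^k)
for k in range(P):
    F_old = fine(Y[:-1])
    C_old = coarse(Y[:-1])
    Ynew = np.empty_like(Y)
    Ynew[0] = y0
    for n in range(P):
        Ynew[n + 1] = coarse(Ynew[n]) + F_old[n] - C_old[n]
    Y = Ynew

err = np.max(np.abs(Y - ref))
print(f"deviation from serial fine solution after {P} iterations: {err:.2e}")
```

The expensive fine propagations (`F_old`) are independent across subintervals, which is where the parallel speedup comes from; the coarse sweep remains serial.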
Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. In this work, we develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time, assuming access to reliable estimates of the (Stein) score functions. For a popular deterministic sampler (based on the probability flow ODE), we establish a convergence rate proportional to $1/T$ (with $T$ the total number of steps), improving upon past results; for another mainstream stochastic sampler (i.e., a type of denoising diffusion probabilistic model (DDPM)), we derive a convergence rate proportional to $1/\sqrt{T}$, matching the state-of-the-art theory. Our theory imposes only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), and is developed based on an elementary yet versatile non-asymptotic approach without resorting to toolboxes for SDEs and ODEs. Further, we design two accelerated variants, improving the convergence to $1/T^2$ for the ODE-based sampler and $1/T$ for the DDPM-type sampler, which might be of independent theoretical and empirical interest.
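The deterministic sampler analyzed above can be sketched in one dimension for a Gaussian target, where the score of every intermediate marginal is known in closed form, mimicking the assumption of reliable score estimates. The constant-coefficient variance-preserving SDE and all numerical values below are illustrative choices, not the paper's setup.

```python
import numpy as np

# Probability flow ODE sampler for the 1-D target N(mu, 1).
# Forward process: dx = -0.5*beta*x dt + sqrt(beta) dW with constant beta.
rng = np.random.default_rng(0)
mu, beta, T = 3.0, 10.0, 1000
dt = 1.0 / T

def score(x, t):
    # marginal at time t is N(exp(-beta*t/2) * mu, 1) for unit-variance data
    return -(x - np.exp(-beta * t / 2) * mu)

# deterministic sampler: integrate dx/dt = -0.5*beta*(x + score(x, t))
# backwards from t = 1 (approximately the N(0, 1) prior) to t = 0 by Euler
x = rng.standard_normal(20000)
for i in range(T):
    t = 1.0 - i * dt
    x = x - dt * (-0.5 * beta * (x + score(x, t)))

print(f"sample mean {x.mean():.3f} (target {mu}), sample std {x.std():.3f}")
```

Unlike a DDPM-type sampler, no noise is injected during generation; each prior draw is transported deterministically to a sample of the target.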
Since optimization on Riemannian manifolds relies on the chosen metric, it is appealing to know how the performance of a Riemannian optimization method varies with different metrics and how to carefully construct a metric such that a method can be accelerated. To this end, we propose a general framework for optimization problems on product manifolds where the search space is endowed with a preconditioned metric, and we develop the Riemannian gradient descent and Riemannian conjugate gradient methods under this metric. Specifically, the metric is constructed by an operator that aims to approximate the diagonal blocks of the Riemannian Hessian of the cost function, which has a preconditioning effect. We explain the relationship between the proposed methods and the variable metric methods, and show that various existing methods, e.g., the Riemannian Gauss--Newton method, can be interpreted by the proposed framework with specific metrics. In addition, we tailor new preconditioned metrics and adapt the proposed Riemannian methods to the canonical correlation analysis and the truncated singular value decomposition problems, and we propose the Gauss--Newton method to solve the tensor ring completion problem. Numerical results across these applications verify that a delicately constructed metric does accelerate the Riemannian optimization methods.
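The preconditioning effect of the metric can be seen already in the Euclidean special case: endowing $\mathbb{R}^n$ with the inner product $\langle u, v\rangle_P = u^\top P v$ turns the gradient into $P^{-1}\nabla f$, so choosing $P$ close to the (diagonal of the) Hessian rescales the descent direction. The quadratic and step sizes below are illustrative, not taken from the paper.

```python
import numpy as np

# Gradient descent on an ill-conditioned quadratic under two metrics:
# the Euclidean one (P = I) and a preconditioned one (P = diagonal Hessian).
d = np.array([1.0, 10.0, 100.0, 1000.0])     # diagonal Hessian, cond. 1000
f = lambda x: 0.5 * np.sum(d * x**2)
grad = lambda x: d * x
x0 = np.ones(4)

def run(precond, step, max_steps=500, tol=1e-8):
    x = x0.copy()
    for k in range(max_steps):
        if f(x) < tol:
            return k                          # iterations needed
        x = x - step * grad(x) / precond      # metric-induced gradient
    return max_steps

plain = run(precond=1.0, step=1.0 / d.max())  # classic GD with step 1/L
pre = run(precond=d, step=1.0)                # metric = exact diagonal Hessian
print(f"iterations to reach tol: plain GD = {plain}, preconditioned = {pre}")
```

With the exact diagonal Hessian as metric the method coincides with Newton's method on this separable problem and converges in one step, while plain gradient descent is throttled by the condition number.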
The paper proves convergence of one-level and multilevel unsymmetric collocation for second-order elliptic boundary value problems on bounded domains. By using Schaback's linear discretization theory, $L^2$ errors are obtained based on the kernel-based trial spaces generated by the compactly supported radial basis functions. For the one-level unsymmetric collocation case, we obtain convergence when the testing discretization is finer than the trial discretization. The convergence rates depend on the regularity of the solution, the smoothness of the computing domain, and the approximation of scaled kernel-based spaces. The multilevel process is implemented by employing successively refined scattered data sets and scaled compactly supported radial basis functions with varying support radii. Convergence of multilevel collocation is further proved based on the theoretical results of one-level unsymmetric collocation. In addition to having the same dependencies as the one-level collocation, the convergence rates of multilevel unsymmetric collocation depend especially on the refinement rules of the scattered data and the selection of scaling parameters.
Classic learning theory suggests that proper regularization is the key to good generalization and robustness. In classification, current training schemes only target the complexity of the classifier itself, which can be misleading and ineffective. Instead, we advocate directly measuring the complexity of the decision boundary. Existing literature is limited in this area, with few well-established definitions of boundary complexity. As a proof of concept, we start by analyzing ReLU neural networks, whose boundary complexity can be conveniently characterized by the number of affine pieces. With the help of tropical geometry, we develop a novel method that can explicitly count the exact number of boundary pieces, and as a by-product, the exact number of total affine pieces. Numerical experiments are conducted and distinctive properties of our boundary complexity are uncovered. First, the boundary piece count appears largely independent of other measures, e.g., the total piece count and the $l_2$ norm of the weights, during the training process. Second, the boundary piece count is negatively correlated with robustness, where popular robust training techniques, e.g., adversarial training or random noise injection, are found to reduce the number of boundary pieces.
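The distinction between total affine pieces and boundary pieces is easiest to see in one dimension. The paper's exact counts come from tropical geometry; the toy below merely reads the affine pieces of a one-hidden-layer ReLU network off its breakpoints $-b_i/w_i$ and counts the sign changes of the output as a stand-in for boundary pieces. The weights are illustrative.

```python
import numpy as np

# f(x) = sum_i v_i * relu(w_i * x + b_i) + c : a piecewise-linear "hat"
# that is negative outside (0.5, 1.5) and positive inside it.
w = np.array([1.0, 1.0, 1.0])
b = np.array([0.0, -1.0, -2.0])
v = np.array([1.0, -2.0, 1.0])
c = -0.5                                      # output bias

f = lambda x: v @ np.maximum(np.outer(w, x) + b[:, None], 0.0) + c

# affine pieces: one more than the number of distinct ReLU breakpoints
breakpoints = np.unique(-b / w)               # kinks at x = 0, 1, 2
n_affine_pieces = len(breakpoints) + 1

# boundary "pieces" in 1-D: zero crossings of f, found on a fine grid
xs = np.linspace(-1.03, 4.07, 100003)
ys = f(xs)
n_boundary = int(np.sum(ys[:-1] * ys[1:] < 0))
print(f"affine pieces: {n_affine_pieces}, boundary points: {n_boundary}")
```

Here the network has four affine pieces but only two boundary points, illustrating why the two counts can move independently during training.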
In this paper, we propose and analyze the least squares finite element methods for the linear elasticity interface problem in the stress-displacement system on unfitted meshes. We consider the cases where the interface is $C^2$ or polygonal, and the exact solution $(\sigma,u)$ belongs to $H^s(\mathrm{div}; \Omega_0 \cup \Omega_1) \times H^{1+s}(\Omega_0 \cup \Omega_1)$ with $s > 1/2$. Two types of least squares functionals are defined to seek the numerical solution. The first is defined by simply applying the $L^2$ norm least squares principle, and requires the condition $s \geq 1$. The second is defined with a discrete minus norm, which is related to the inner product in $H^{-1/2}(\Gamma)$. The use of this discrete minus norm results in a method of optimal convergence rates and allows the exact solution to have the regularity of any $s > 1/2$. The stability near the interface for both methods is guaranteed by the ghost penalty bilinear forms, and we derive robust condition number estimates. The convergence rates under the $L^2$ norm and the energy norm are derived for both methods. We illustrate the accuracy and the robustness of the proposed methods by a series of numerical experiments for test problems in two and three dimensions.
We consider blind ptychography, an imaging technique which aims to reconstruct an object of interest from a set of its diffraction patterns, each obtained by a local illumination. As the distribution of the light within the illuminated region, called the window, is unknown, it has to be estimated as well. For the recovery, we consider gradient and stochastic gradient descent methods for the minimization of the amplitude-based squared loss. In particular, this includes the extended Ptychographic Iterative Engine as a special case of stochastic gradient descent. We show that all methods converge to a critical point at a sublinear rate for a proper choice of step sizes. We also discuss possibilities for larger step sizes.
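A minimal (non-blind, real-valued) analogue of minimizing the amplitude-based squared loss: recover $x$ from magnitudes $b_j = |\langle a_j, x\rangle|$ by gradient descent started near the truth. The paper's blind setting also estimates the window; here everything except $x$ is known, and the sizes, step rule, and initialization are illustrative.

```python
import numpy as np

# Amplitude-based squared loss: f(x) = 0.5 * sum_j (|<a_j, x>| - b_j)^2
rng = np.random.default_rng(1)
m, n = 200, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = np.abs(A @ x_true)                        # exact magnitude measurements

def gradient(x):
    z = A @ x
    return A.T @ ((np.abs(z) - b) * np.sign(z))

step = 1.0 / np.linalg.norm(A, 2) ** 2        # conservative step size
x = x_true + 0.1 * rng.standard_normal(n)     # initialization near the truth
for _ in range(3000):
    x = x - step * gradient(x)

rel_err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)) \
          / np.linalg.norm(x_true)
print(f"relative error (up to global sign): {rel_err:.2e}")
```

The global sign in the error measure reflects the inherent ambiguity of magnitude-only measurements; in ptychography the analogous ambiguity is a global phase factor.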
Collections of probability distributions arise in a variety of applications ranging from user activity pattern analysis to brain connectomics. In practice, these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, allowing us to reduce the distribution collection comparison problem to a more familiar mean testing problem in a Hilbert space. We provide two testing procedures: one based on resampling and another on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and the p-values are well-calibrated.
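The slicing idea can be sketched in the plain Euclidean case: project two samples onto random directions, compare each pair of projected 1-D distributions through sorted values (the 1-D Wasserstein-2 distance via quantiles), and average over slices. The paper's contribution is an *intrinsic* version of this on manifolds and graphs; the code below is only the familiar Euclidean special case with illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

def sliced_w2(X, Y, n_slices=200, rng=rng):
    # sliced Wasserstein-2 distance between equal-size samples X, Y
    d = X.shape[1]
    total = 0.0
    for _ in range(n_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)        # random direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)      # 1-D W2^2 via matched quantiles
    return np.sqrt(total / n_slices)

X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + np.array([2.0, 0.0, 0.0])
print(f"SW2(X, X) = {sliced_w2(X, X):.3f}, SW2(X, Y) = {sliced_w2(X, Y):.3f}")
```

Because each slice reduces to sorting, the distance is cheap to compute, and the vector of per-slice quantile functions is exactly the kind of Hilbert embedding that turns collection comparison into mean testing.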
We study the Electrical Impedance Tomography Bayesian inverse problem for recovering the conductivity given noisy measurements of the voltage on some boundary surface electrodes. The uncertain conductivity depends linearly on a countable number of uniformly distributed random parameters in a compact interval, with the coefficient functions in the linear expansion decaying at an algebraic rate. We analyze the surrogate Markov Chain Monte Carlo (MCMC) approach for sampling the posterior probability measure, where the multivariate sparse adaptive interpolation, with interpolating points chosen according to a lower index set, is used for approximating the forward map. The forward equation is approximated once before running the MCMC for all the realizations, using interpolation on the finite element (FE) approximation at the parametric interpolating points. When evaluation of the solution is needed for a realization, we only need to compute a polynomial, thus drastically cutting the computation time. We contribute a rigorous error estimate for the MCMC convergence. In particular, we show that there is a nested sequence of interpolating lower index sets for which we can derive an interpolation error estimate in terms of the cardinality of these sets, uniformly for all the parameter realizations. An explicit convergence rate for the MCMC sampling of the posterior expectation of the conductivity is rigorously derived, in terms of the interpolating point number, the accuracy of the FE approximation of the forward equation, and the MCMC sample number. We perform numerical experiments using an adaptive greedy approach to construct the sets of interpolation points. We show the benefits of this approach over the simple MCMC, where the forward equation is repeatedly solved for all the samples, and the non-adaptive surrogate MCMC with an isotropic index set that treats all the random parameters equally.
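The surrogate idea can be sketched with a single parameter: build a polynomial interpolant of an "expensive" forward map once, then run Metropolis-Hastings entirely on the surrogate. The paper uses sparse multivariate interpolation of an FE-discretized EIT forward map; here the forward map, prior, and all tuning constants are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
G = np.sin                                    # stand-in "expensive" forward map
sigma, y_obs = 0.1, G(0.4) + 0.05             # noisy observation

# build the surrogate once: degree-12 fit at Chebyshev nodes on [-1, 1]
nodes = np.cos(np.pi * (np.arange(13) + 0.5) / 13)
G_surr = np.polynomial.Polynomial.fit(nodes, G(nodes), 12)

def mh(forward, n_steps=20000):
    # random-walk Metropolis with uniform prior on [-1, 1]
    log_like = lambda t: -0.5 * ((y_obs - forward(t)) / sigma) ** 2
    chain_rng = np.random.default_rng(7)      # same seed for both chains
    theta, lp, samples = 0.0, log_like(0.0), []
    for _ in range(n_steps):
        prop = theta + 0.3 * chain_rng.standard_normal()
        if abs(prop) <= 1.0:                  # proposals outside prior: reject
            lp_prop = log_like(prop)
            if np.log(chain_rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
        samples.append(theta)
    return np.mean(samples)

print(f"posterior mean: direct = {mh(G):.4f}, surrogate = {mh(G_surr):.4f}")
```

Every MCMC step on the surrogate costs one polynomial evaluation instead of a forward solve, and with an accurate interpolant the two chains produce essentially the same posterior summaries.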