We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs to solving quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Amp\`ere equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.
State-space models (SSMs) are a powerful statistical tool for modelling time-varying systems via a latent state. In these models, the latent state is never directly observed. Instead, a sequence of data points related to the state are obtained. The linear-Gaussian state-space model is widely used, since it allows for exact inference when all model parameters are known, however this is rarely the case. The estimation of these parameters is a very challenging but essential task to perform inference and prediction. In the linear-Gaussian model, the state dynamics are described via a state transition matrix. This model parameter is known to behard to estimate, since it encodes the relationships between the state elements, which are never observed. In many applications, this transition matrix is sparse since not all state components directly affect all other state components. However, most parameter estimation methods do not exploit this feature. In this work we propose SpaRJ, a fully probabilistic Bayesian approach that obtains sparse samples from the posterior distribution of the transition matrix. Our method explores sparsity by traversing a set of models that exhibit differing sparsity patterns in the transition matrix. Moreover, we also design new effective rules to explore transition matrices within the same level of sparsity. This novel methodology has strong theoretical guarantees, and unveils the latent structure of the data generating process, thereby enhancing interpretability. The performance of SpaRJ is showcased in example with dimension 144 in the parameter space, and in a numerical example with real data.
In this paper, we present a kernel-based, multi-task Gaussian Process (GP) model for approximating the underlying function of an individual's mobility state using a time-inhomogeneous Markov Process with two states: moves and pauses. Our approach accounts for the correlations between the transition probabilities by creating a covariance matrix over the tasks. We also introduce time-variability by assuming that an individual's transition probabilities vary over time in response to exogenous variables. We enforce the stochasticity and non-negativity constraints of probabilities in a Markov process through the incorporation of a set of constraint points in the GP. We also discuss opportunities to speed up GP estimation and inference in this context by exploiting Toeplitz and Kronecker product structures. Our numerical experiments demonstrate the ability of our formulation to enforce the desired constraints while learning the functional form of transition probabilities.
The Plackett--Luce model is a popular approach for ranking data analysis, where a utility vector is employed to determine the probability of each outcome based on Luce's choice axiom. In this paper, we investigate the asymptotic theory of utility vector estimation by maximizing different types of likelihood, such as the full-, marginal-, and quasi-likelihood. We provide a rank-matching interpretation for the estimating equations of these estimators and analyze their asymptotic behavior as the number of items being compared tends to infinity. In particular, we establish the uniform consistency of these estimators under conditions characterized by the topology of the underlying comparison graph sequence and demonstrate that the proposed conditions are sharp for common sampling scenarios such as the nonuniform random hypergraph model and the hypergraph stochastic block model; we also obtain the asymptotic normality of these estimators and discuss the trade-off between statistical efficiency and computational complexity for practical uncertainty quantification. Both results allow for nonuniform and inhomogeneous comparison graphs with varying edge sizes and different asymptotic orders of edge probabilities. We verify our theoretical findings by conducting detailed numerical experiments.
We solve acoustic scattering problems by means of the isogeometric boundary integral equation method. In order to avoid spurious modes, we apply the combined field integral equations for either sound-hard scatterers or sound-soft scatterers. These integral equations are discretized by Galerkin's method, which especially enables the mathematically correct regularization of the hypersingular integral operator. In order to circumvent densely populated system matrices, we employ the isogeometric fast multipole method. The result is an algorithm that scales essentially linear in the number of boundary elements. Numerical experiments are performed which show the feasibility and the performance of the approach.
Point process models are widely used to analyze asynchronous events occurring within a graph that reflect how different types of events influence one another. Predicting future events' times and types is a crucial task, and the size and topology of the graph add to the challenge of the problem. Recent neural point process models unveil the possibility of capturing intricate inter-event-category dependencies. However, such methods utilize an unfiltered history of events, including all event categories in the intensity computation for each target event type. In this work, we propose a graph point process method where event interactions occur based on a latent graph topology. The corresponding undirected graph has nodes representing event categories and edges indicating potential contribution relationships. We then develop a novel deep graph kernel to characterize the triggering and inhibiting effects between events. The intrinsic influence structures are incorporated via the graph neural network (GNN) model used to represent the learnable kernel. The computational efficiency of the GNN approach allows our model to scale to large graphs. Comprehensive experiments on synthetic and real-world data show the superior performance of our approach against the state-of-the-art methods in predicting future events and uncovering the relational structure among data.
In this paper, we are concerned with efficiently solving the sequences of regularized linear least squares problems associated with employing Tikhonov-type regularization with regularization operators designed to enforce edge recovery. An optimal regularization parameter, which balances the fidelity to the data with the edge-enforcing constraint term, is typically not known a priori. This adds to the total number of regularized linear least squares problems that must be solved before the final image can be recovered. Therefore, in this paper, we determine effective multigrid preconditioners for these sequences of systems. We focus our approach on the sequences that arise as a result of the edge-preserving method introduced in [6], where we can exploit an interpretation of the regularization term as a diffusion operator; however, our methods are also applicable in other edge-preserving settings, such as iteratively reweighted least squares problems. Particular attention is paid to the selection of components of the multigrid preconditioner in order to achieve robustness for different ranges of the regularization parameter value. In addition, we present a parameter culling approach that, when used with the L-curve heuristic, reduces the total number of solves required. We demonstrate our preconditioning and parameter culling routines on examples in computed tomography and image deblurring.
This paper explores variants of the subspace iteration algorithm for computing approximate invariant subspaces. The standard subspace iteration approach is revisited and new variants that exploit gradient-type techniques combined with a Grassmann manifold viewpoint are developed. A gradient method as well as a conjugate gradient technique are described. Convergence of the gradient-based algorithm is analyzed and a few numerical experiments are reported, indicating that the proposed algorithms are sometimes superior to a standard Chebyshev-based subspace iteration when compared in terms of number of matrix vector products, but do not require estimating optimal parameters. An important contribution of this paper to achieve this good performance is the accurate and efficient implementation of an exact line search. In addition, new convergence proofs are presented for the non-accelerated gradient method that includes a locally exponential convergence if started in a $\mathcal{O(\sqrt{\delta})}$ neighbourhood of the dominant subspace with spectral gap $\delta$.
This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input constraints. In this paper, we provide a non-asymptotic error bound for least square estimation when the data trajectory is generated by any nonlinear and/or time-varying policies as long as the generated state and action trajectories are bounded. This significantly generalizes the existing non-asymptotic guarantees for linear system identification, which usually consider i.i.d. random inputs or linear policies. Interestingly, our error bound is consistent with that for linear policies with respect to the dependence on the trajectory length, system dimensions, and excitation levels. Lastly, we demonstrate the applications of our results by safe learning with robust model predictive control and provide numerical analysis.
In this study, we examine numerical approximations for 2nd-order linear-nonlinear differential equations with diverse boundary conditions, followed by the residual corrections of the first approximations. We first obtain numerical results using the Galerkin weighted residual approach with Bernstein polynomials. The generation of residuals is brought on by the fact that our first approximation is computed using numerical methods. To minimize these residuals, we use the compact finite difference scheme of 4th-order convergence to solve the error differential equations in accordance with the error boundary conditions. We also introduce the formulation of the compact finite difference method of fourth-order convergence for the nonlinear BVPs. The improved approximations are produced by adding the error values derived from the approximations of the error differential equation to the weighted residual values. Numerical results are compared to the exact solutions and to the solutions available in the published literature to validate the proposed scheme, and high accuracy is achieved in all cases
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.