We study Bayesian methods for large-scale linear inverse problems, focusing on the challenging task of hyperparameter estimation. Typical hierarchical Bayesian formulations that follow a Markov chain Monte Carlo approach are possible for small problems with very few hyperparameters but are not computationally feasible for problems with a very large number of unknown parameters. In this work, we describe an empirical Bayesian (EB) method to estimate hyperparameters that maximize the marginal posterior, i.e., the probability density of the hyperparameters conditioned on the data, and then we use the estimated values to compute the posterior of the inverse parameters. For problems where the computation of the square root and inverse of prior covariance matrices is not feasible, we describe an approach based on the generalized Golub-Kahan bidiagonalization to approximate the marginal posterior and seek hyperparameters that maximize the approximate marginal posterior. Numerical results from seismic and atmospheric tomography demonstrate the accuracy, robustness, and potential benefits of the proposed approach.
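To make the marginal posterior concrete, the following is a minimal dense sketch for a small problem, assuming a linear Gaussian model d = A s + noise with noise covariance R(theta) and prior s ~ N(mu, Q(theta)). Under these assumptions the data given the hyperparameters are Gaussian with covariance Psi = A Q A^T + R, so maximizing the marginal posterior is equivalent to minimizing its negative logarithm. The names `neg_log_marginal_posterior`, `Q_of`, `R_of`, and `log_prior` are hypothetical, and the sketch forms and factors Psi explicitly, which is exactly what a Krylov-based approximation is meant to avoid at large scale.

```python
import numpy as np

def neg_log_marginal_posterior(theta, A, d, mu, Q_of, R_of, log_prior):
    """Negative log of p(theta | d), up to an additive constant.

    Assumes d = A s + e with e ~ N(0, R(theta)) and s ~ N(mu, Q(theta)).
    Q_of, R_of and log_prior are hypothetical user-supplied callables.
    """
    Q = Q_of(theta)                      # prior covariance of the unknowns
    R = R_of(theta)                      # noise covariance
    Psi = A @ Q @ A.T + R                # covariance of d given theta
    r = d - A @ mu                       # data residual about the prior mean
    L = np.linalg.cholesky(Psi)          # Psi = L L^T (dense, small problems only)
    z = np.linalg.solve(L, r)            # z^T z = r^T Psi^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * logdet + 0.5 * (z @ z) - log_prior(theta)
```

A hyperparameter estimate can then be obtained by passing this function to any scalar or multivariate minimizer; the choice of optimizer is left open here.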
The most popular method for computing the matrix logarithm combines the inverse scaling and squaring method with a Pad\'e approximation, sometimes accompanied by the Schur decomposition. The main computational effort lies in matrix-matrix multiplications and left matrix divisions. In this work we illustrate that the number of such operations can be substantially reduced by using a graph-based representation of an efficient polynomial evaluation scheme. A technique to analyze the rounding error is proposed, and backward error analysis is adapted. We provide extensive simulations illustrating competitiveness in terms of both computation time and rounding errors.
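The inverse scaling and squaring idea can be sketched as follows: take repeated square roots until the matrix is close to the identity, approximate the logarithm there, and scale back by a power of two. This is only an illustration under simplifying assumptions. It uses a truncated Taylor series of log(I + E) in place of a Pad\'e approximant, and a Denman-Beavers iteration for the square roots, with no Schur decomposition. The function names and the default parameters are hypothetical.

```python
import numpy as np

def sqrtm_db(A, iters=50, tol=1e-14):
    """Matrix square root by the Denman-Beavers iteration.

    Assumes A has no eigenvalues on the closed negative real axis.
    """
    Y = np.array(A, dtype=float)
    Z = np.eye(Y.shape[0])
    for _ in range(iters):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))   # uses the old Y and old Z
        Z = 0.5 * (Z + np.linalg.inv(Y))        # Y is still the old iterate here
        done = np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1)
        Y = Y_next
        if done:
            break
    return Y

def logm_iss(A, order=20, threshold=0.25, max_roots=60):
    """Matrix logarithm via inverse scaling and squaring (illustrative sketch)."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.array(A, dtype=float)
    k = 0
    # Scaling phase: log(A) = 2^k log(A^(1/2^k)).
    while np.linalg.norm(X - I, 1) > threshold and k < max_roots:
        X = sqrtm_db(X)
        k += 1
    E = X - I
    # Truncated series log(I + E) = E - E^2/2 + E^3/3 - ...
    term = I.copy()
    S = np.zeros((n, n))
    for j in range(1, order + 1):
        term = term @ E                          # one matrix product per term
        S += ((-1.0) ** (j + 1) / j) * term
    return (2.0 ** k) * S
```

The series loop above spends one matrix-matrix product per term; this naive evaluation is the kind of cost that a more efficient polynomial evaluation scheme aims to reduce.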
In this paper we develop a linear expectile hidden Markov model for the analysis of cryptocurrency time series in a risk management framework. The proposed methodology allows us to focus on extreme returns and describe their temporal evolution by introducing time-dependent coefficients that evolve according to a latent discrete homogeneous Markov chain. As is common in the expectile literature, estimation of the model parameters is based on the asymmetric normal distribution. Maximum likelihood estimates are obtained via an Expectation-Maximization algorithm using efficient M-step update formulas for all parameters. We evaluate the introduced method on both artificial data under several experimental settings and real data, investigating the relationship between daily Bitcoin returns and major world market indices.
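For readers unfamiliar with expectiles, the following sketch computes the sample expectile at level tau as the minimizer of an asymmetrically weighted squared loss, which is the location estimate that a likelihood based on the asymmetric normal distribution targets. It covers only the simplest case of a single sample with no regression coefficients and no hidden Markov chain; the function name `expectile` and its defaults are hypothetical.

```python
import numpy as np

def expectile(y, tau, iters=100, tol=1e-12):
    """Sample tau-expectile by iteratively reweighted averaging.

    Minimizes sum_i |tau - 1(y_i <= m)| * (y_i - m)^2 over m.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()                                  # tau = 0.5 gives the mean
    for _ in range(iters):
        # Weight tau above the current estimate, 1 - tau at or below it.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)         # weighted least-squares update
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

Levels of tau close to 0 or 1 push the estimate toward the tails of the sample, which is why expectiles are used to describe extreme returns.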
This paper studies the convergence of a spatial semidiscretization of a three-dimensional stochastic Allen-Cahn equation with multiplicative noise. For non-smooth initial values, the regularity of the mild solution is investigated, and an error estimate is derived in the spatial $L^2$-norm. For smooth initial values, two error estimates in the general spatial $L^q$-norms are established.
We present a novel discontinuous Galerkin finite element method for numerical simulations of the rotating thermal shallow water equations in complex geometries using curvilinear meshes, with arbitrary accuracy. We derive an entropy functional which is convex, and which must be preserved in order to preserve model stability at the discrete level. The numerical method is provably entropy stable and conserves mass, buoyancy, vorticity, and energy. This is achieved by using novel entropy stable numerical fluxes, the summation-by-parts principle, and a splitting of the pressure and convection operators so that we can circumvent the use of the chain rule at the discrete level. Numerical simulations on a cubed sphere mesh are presented to verify the theoretical results. The numerical experiments demonstrate the robustness of the method for a regime of well-developed turbulence, where it can be run stably without any dissipation. The entropy stable fluxes are sufficient to control the grid scale noise generated by geostrophic turbulence, eliminating the need for artificial stabilisation.
This work focuses on the numerical approximations of random periodic solutions of stochastic differential equations (SDEs). Under non-globally Lipschitz conditions, we prove the existence and uniqueness of random periodic solutions for the considered equations and their numerical approximations generated by the stochastic theta (ST) methods with $\theta \in (1/2, 1]$. It is shown that the random periodic solution of each ST method converges strongly in the mean square sense to that of the SDEs for all step sizes. More precisely, the mean square convergence order is 1/2 for SDEs with multiplicative noise and 1 for SDEs with additive noise. Numerical results are finally reported to confirm these theoretical findings.
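As an illustration of the stochastic theta scheme, the sketch below applies it to a deliberately simple test equation, dX = (-a X + sin(2 pi t)) dt + sigma dW, chosen here only because the implicit step can be solved in closed form. This equation has a globally Lipschitz drift and additive noise, so it does not exercise the non-globally Lipschitz setting of the abstract; the function name `theta_path` and all parameters are hypothetical.

```python
import numpy as np

def theta_path(x0, a, sigma, theta, h, n_steps, rng):
    """Stochastic theta method for dX = (-a X + sin(2 pi t)) dt + sigma dW.

    Scheme: X_{n+1} = X_n + h [theta f(t_{n+1}, X_{n+1})
                               + (1 - theta) f(t_n, X_n)] + sigma dW_n.
    The drift is linear in X, so the implicit equation is solved exactly.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    t = 0.0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h))                    # Brownian increment
        explicit = (1.0 - theta) * (-a * x[n] + np.sin(2.0 * np.pi * t))
        forcing_next = theta * np.sin(2.0 * np.pi * (t + h))
        rhs = x[n] + h * (explicit + forcing_next) + sigma * dW
        x[n + 1] = rhs / (1.0 + theta * h * a)              # implicit part
        t += h
    return x
```

Running two paths from different initial values with the same noise realization shows them merging after a transient, which is the numerical counterpart of a unique attracting random periodic solution in this toy setting.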
We develop a novel and efficient discontinuous Galerkin spectral element method (DG-SEM) for the spherical rotating shallow water equations in vector invariant form. We prove that the DG-SEM is energy stable, and discretely conserves mass, vorticity, and linear geostrophic balance on general curvilinear meshes. These theoretical results are possible due to our novel entropy stable numerical DG fluxes for the shallow water equations in vector invariant form. We experimentally verify these results on a cubed sphere mesh. Additionally, we show that our method is robust, that is, it can be run stably without any dissipation. The entropy stable fluxes are sufficient to control the grid scale noise generated by geostrophic turbulence without the need for artificial stabilisation.
Due to the complex behavior arising from non-uniqueness, symmetry, and bifurcations in the solution space, solving inverse problems of nonlinear differential equations (DEs) with multiple solutions is a challenging task. To address this, we propose homotopy physics-informed neural networks (HomPINNs), a novel framework that leverages homotopy continuation and neural networks (NNs) to solve inverse problems. The proposed framework begins with the use of NNs to simultaneously approximate unlabeled observations across diverse solutions while adhering to DE constraints. Through homotopy continuation, the proposed method solves the inverse problem by tracing the observations and identifying multiple solutions. The experiments involve testing the performance of the proposed method on one-dimensional DEs and applying it to solve a two-dimensional Gray-Scott simulation. Our findings demonstrate that the proposed method is scalable and adaptable, providing an effective solution for solving DEs with multiple solutions and unknown parameters. Moreover, it has significant potential for various applications in scientific computing, such as modeling complex systems and solving inverse problems in physics, chemistry, biology, etc.
We study the problem of distribution shift generally arising in machine-learning augmented hybrid simulation, where parts of simulation algorithms are replaced by data-driven surrogates. We first establish a mathematical framework to understand the structure of machine-learning augmented hybrid simulation problems, and the cause and effect of the associated distribution shift. We show correlations between distribution shift and simulation error both numerically and theoretically. Then, we propose a simple methodology based on a tangent-space regularized estimator to control the distribution shift, thereby improving the long-term accuracy of the simulation results. In the linear dynamics case, we provide a thorough theoretical analysis to quantify the effectiveness of the proposed method. Moreover, we conduct several numerical experiments, including simulating a partially known reaction-diffusion equation and solving Navier-Stokes equations using the projection method with a data-driven pressure solver. In all cases, we observe marked improvements in simulation accuracy under the proposed method, especially for systems with high degrees of distribution shift, such as those with relatively strong non-linear reaction mechanisms, or flows at large Reynolds numbers.
We develop a numerical method for computing with orthogonal polynomials that are orthogonal on multiple, disjoint intervals for which analytical formulae are currently unknown. Our approach exploits the Fokas--Its--Kitaev Riemann--Hilbert representation of the orthogonal polynomials to produce an $\mathrm{O}(N)$ method to compute the first $N$ recurrence coefficients. The method can also be used for pointwise evaluation of the polynomials and their Cauchy transforms throughout the complex plane. The method encodes the singularity behavior of weight functions using weighted Cauchy integrals of Chebyshev polynomials. This greatly improves the efficiency of the method, outperforming other available techniques. We demonstrate the fast convergence of our method and present applications to integrable systems and approximation theory.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
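The accuracy gain described above can be written down in a few lines. The sketch below is one plausible reading, in which the gain from adding a modality is normalized by the accuracy of the model that sees both modalities; that normalization, the function names, and the two-modality restriction are assumptions made for illustration and are not taken from the abstract.

```python
def conditional_utilization_rate(acc_both, acc_other_only):
    """Relative accuracy gain from adding one modality to the other.

    acc_both:       accuracy with both modalities available.
    acc_other_only: accuracy with only the other modality available.
    Dividing by acc_both is an assumed normalization.
    """
    return (acc_both - acc_other_only) / acc_both

def utilization_imbalance(acc_both, acc_m1_only, acc_m2_only):
    """Difference between the two conditional utilization rates."""
    u_m1_given_m2 = conditional_utilization_rate(acc_both, acc_m2_only)
    u_m2_given_m1 = conditional_utilization_rate(acc_both, acc_m1_only)
    return u_m1_given_m2 - u_m2_given_m1
```

Under this reading, an imbalance far from zero indicates that the model draws most of its accuracy from a single modality.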