In [3] it was shown that four seemingly different algorithms for computing low-rank approximations $X_j$ to the solution $X$ of large-scale continuous-time algebraic Riccati equations (CAREs) $0 = \mathcal{R}(X) := A^HX+XA+C^HC-XBB^HX$ generate the same sequence $X_j$ when used with the same parameters. The Hermitian low-rank approximations $X_j$ are of the form $X_j = Z_jY_jZ_j^H,$ where $Z_j$ is a matrix with only a few columns and $Y_j$ is a small square Hermitian matrix. Each $X_j$ generates a low-rank Riccati residual $\mathcal{R}(X_j)$ whose norm can be evaluated cheaply, allowing for an efficient termination criterion. Here a new family of methods for generating such low-rank approximate solutions $X_j$ of CAREs is proposed. Each member of this family generates the same sequence of $X_j$ as the four previously known algorithms. The approach is based on a block rational Arnoldi decomposition and an associated block rational Krylov subspace spanned by $A^H$ and $C^H.$ Two specific versions of the general algorithm are considered; one turns out to be a rediscovery of the RADI algorithm, while the other allows for a slightly more efficient implementation than the RADI algorithm (in case the Sherman-Morrison-Woodbury formula and a direct solver are used to solve the linear systems that occur). Moreover, our approach allows for adding more than one shift at a time.
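As a concrete illustration of the residual that drives the termination criterion, the following minimal numpy/scipy sketch checks $\mathcal{R}(X)$ on a small dense toy problem; it is not the low-rank algorithm of the paper, and the test matrices are made up for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Small dense toy CARE; in the large-scale setting A is sparse and
# X is never formed explicitly.
rng = np.random.default_rng(0)
n, m, p = 50, 2, 2
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stable test matrix
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

def care_residual(X):
    """Riccati residual R(X) = A^H X + X A + C^H C - X B B^H X."""
    return A.conj().T @ X + X @ A + C.conj().T @ C - X @ B @ B.conj().T @ X

# scipy solves A^H X + X A - X B R^{-1} B^H X + Q = 0; take Q = C^H C, R = I.
X = solve_continuous_are(A, B, C.conj().T @ C, np.eye(m))
print(np.linalg.norm(care_residual(X)))  # ~ 0 at the exact solution
```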
Sylvester matrix equations are ubiquitous in scientific computing. However, few solution techniques exist for their generalized multiterm version, even though such equations now arise in an increasingly large number of applications. In this work, we consider algebraic parameter-free preconditioning techniques for the iterative solution of generalized multiterm Sylvester equations. They consist of constructing low Kronecker rank approximations of either the operator itself or its inverse. While the former requires solving standard Sylvester equations in each iteration, the latter only requires matrix-matrix multiplications, which are highly optimized on modern computer architectures. Moreover, low Kronecker rank approximate inverses can be easily combined with sparse approximate inverse techniques, thereby enhancing their performance with little or no damage to their effectiveness.
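To make the distinction concrete, here is a minimal numpy sketch of a multiterm Sylvester operator and a hypothetical Kronecker-rank-one approximate inverse applied purely via matrix-matrix products; real constructions would minimize an approximation error in Kronecker rank $r$ and use the result inside a Krylov method.

```python
import numpy as np

# Multiterm Sylvester operator L(X) = sum_i A_i X B_i^T with a dominant
# first term (toy setup, chosen so the simple preconditioner below works).
rng = np.random.default_rng(1)
n, m = 40, 30
As = [np.eye(n) + 0.05 * rng.standard_normal((n, n))] \
     + [0.01 * rng.standard_normal((n, n)) for _ in range(2)]
Bs = [np.eye(m) + 0.05 * rng.standard_normal((m, m))] \
     + [0.01 * rng.standard_normal((m, m)) for _ in range(2)]

def L(X):
    return sum(Ai @ X @ Bi.T for Ai, Bi in zip(As, Bs))

# Kronecker-rank-1 approximate inverse M(R) = P R Q^T, applied by GEMMs only.
P, Q = np.linalg.inv(As[0]), np.linalg.inv(Bs[0])
def M(R):
    return P @ R @ Q.T

F, X = rng.standard_normal((n, m)), np.zeros((n, m))
for _ in range(100):                     # preconditioned Richardson iteration
    X += M(F - L(X))
print(np.linalg.norm(F - L(X)))          # small, since the first term dominates
```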
We study the problem of adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under certain sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These component functions are unknown to us, and only a few of them are nonzero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider the cases when $k$ is fixed and when $k=k_\varepsilon \to \infty$, $k=o(d)$ as $\varepsilon \to 0$. In this work, we introduce an adaptive selection procedure that, under some model assumptions, identifies exactly all nonzero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the nonzero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.
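For concreteness, one standard formulation of this setting (our notation, not necessarily the paper's) observes $dY(t) = f(t)\,dt + \varepsilon\,dW(t)$ for $t \in [0,1]^d$, where $f(t) = \sum_{u \subseteq \{1,\dots,d\},\, |u|=k} f_u(t_u)$ and only a few of the $k$-variate components $f_u$ are nonzero; the selection problem is then to recover the set $\{u : f_u \neq 0\}$ exactly.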
Low-rank quaternion matrix approximation has been successfully applied in many areas of signal processing and color image processing. However, the cost of quaternion models for generating low-rank quaternion matrix approximations is sometimes considerable, because they compute the quaternion singular value decomposition (QSVD); this limits their application to large-scale real-world data. To address this deficiency, an efficient quaternion matrix CUR (QMCUR) method for low-rank approximation is proposed, which provides significant acceleration in color image processing. We first explore the QMCUR approximation method, which uses actual columns and rows of the given quaternion matrix instead of the costly QSVD. Additionally, two different strategies are used to sample the selected columns and rows. Then, a perturbation analysis is performed for the QMCUR approximation of noisy versions of low-rank quaternion matrices. Extensive experiments on both synthetic and real data further reveal the superiority of the proposed algorithm over comparable algorithms for computing low-rank approximations, in terms of both efficiency and accuracy.
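The flavor of CUR is easy to convey in the real case; the following numpy sketch (real arithmetic as a stand-in for quaternions, uniform sampling as one of many possible strategies) approximates a low-rank matrix from sampled columns and rows without any SVD of the full matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 200, 150, 10
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # exactly rank r

# Uniformly sample column/row index sets (oversampling by a factor of 2).
cols = rng.choice(m, size=2 * r, replace=False)
rows = rng.choice(n, size=2 * r, replace=False)

C, R = A[:, cols], A[rows, :]
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)   # stabilized middle factor
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))  # ~ machine precision
```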
We consider a class of linear Vlasov partial differential equations driven by Wiener noise. Different types of stochastic perturbations are treated: additive noise, multiplicative It\^o and Stratonovich noise, and transport noise. We propose to employ splitting integrators for the temporal discretization of these stochastic partial differential equations. These integrators are designed to preserve qualitative properties of the exact solutions, depending on the stochastic perturbation, such as preservation of norms or positivity of the solutions. We provide numerical experiments to illustrate the properties of the proposed integrators and to investigate their mean-square rates of convergence.
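As a toy illustration of the structure-preservation idea (not one of the paper's Vlasov examples), the following numpy sketch applies Lie splitting to a 1D transport equation with multiplicative Stratonovich noise; each sub-flow is solved exactly, so positivity of the solution is preserved by construction.

```python
import numpy as np

# Lie splitting for the toy SPDE  du = -c u_x dt + lam * u dW  (Stratonovich,
# scalar noise). Sub-flow 1 (transport) is a shift, solved exactly in Fourier
# space; sub-flow 2 (noise) has exact solution u -> u * exp(lam * dW) > 0.
rng = np.random.default_rng(3)
N, c, lam, dt, steps = 128, 1.0, 0.5, 1e-3, 1000
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=2 * np.pi / N)   # integer wavenumbers

u = 1.0 + 0.5 * np.sin(x)                            # positive initial datum
for _ in range(steps):
    u = np.fft.ifft(np.exp(-1j * k * c * dt) * np.fft.fft(u)).real  # transport
    u *= np.exp(lam * np.sqrt(dt) * rng.standard_normal())          # noise flow
assert u.min() > 0                                   # positivity is preserved
```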
We study Langevin-type algorithms for sampling from Gibbs distributions whose potentials are dissipative and whose weak gradients have finite moduli of continuity, not necessarily convergent to zero. Our main result is a non-asymptotic upper bound on the 2-Wasserstein distance between a Gibbs distribution and the law of a general Langevin-type algorithm, based on the Liptser--Shiryaev theory and Poincar\'{e} inequalities. We apply this bound to show that the Langevin Monte Carlo algorithm can approximate Gibbs distributions to arbitrary accuracy if the potentials are dissipative and their gradients are uniformly continuous. We also propose Langevin-type algorithms with spherical smoothing for distributions whose potentials are neither convex nor continuously differentiable.
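A minimal numpy sketch of the two ingredients, assuming a made-up non-smooth dissipative potential: the unadjusted Langevin iteration, with the gradient replaced by a spherical-smoothing estimate (the Monte Carlo sphere identity below is a standard zeroth-order construction, not necessarily the paper's exact smoothing).

```python
import numpy as np

rng = np.random.default_rng(4)
d, step, delta = 2, 1e-2, 0.1

def U(x):                              # dissipative, non-smooth toy potential
    return 0.5 * x @ x + np.abs(x).sum()

def smoothed_grad(x, n_mc=32):
    # Gradient of the ball-average of U via the sphere identity:
    # grad E_ball[U(x + delta*u)] = (d / delta) * E_sphere[U(x + delta*s) * s]
    s = rng.standard_normal((n_mc, d))
    s /= np.linalg.norm(s, axis=1, keepdims=True)
    vals = np.array([U(x + delta * si) for si in s])
    return (d / delta) * (vals[:, None] * s).mean(axis=0)

x = np.zeros(d)
samples = np.empty((10_000, d))
for i in range(len(samples)):          # unadjusted Langevin iteration
    x = x - step * smoothed_grad(x) + np.sqrt(2 * step) * rng.standard_normal(d)
    samples[i] = x
```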
We present a new method for computing the solution of a nonlinear tensor differential equation via dynamical low-rank approximation. The idea of dynamical low-rank approximation is to project the differential equation onto the tangent space of a low-rank tensor manifold at each time. Traditionally, an orthogonal projection onto the tangent space is employed, which is challenging to compute for nonlinear differential equations. We introduce a novel interpolatory projection onto the tangent space that is easily computed for many nonlinear differential equations and satisfies the differential equation at a set of carefully selected indices. To select these indices, we devise a new algorithm based on the discrete empirical interpolation method (DEIM) that parameterizes any tensor train and its tangent space with tensor cross interpolants. We demonstrate the proposed method on tensor differential equations arising from the discretization of partial differential equations.
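The index-selection ingredient can be illustrated with the classic greedy DEIM procedure on a matrix (a minimal sketch; the paper's algorithm extends such selections to tensor trains and their tangent spaces).

```python
import numpy as np

def deim_indices(Ub):
    """Greedy DEIM selection of interpolation indices from an
    orthonormal basis Ub of shape (n, r)."""
    n, r = Ub.shape
    idx = [int(np.argmax(np.abs(Ub[:, 0])))]
    for j in range(1, r):
        # Interpolate the next basis vector at the chosen indices and
        # add the index where the interpolation error is largest.
        c = np.linalg.solve(Ub[idx, :j], Ub[idx, j])
        residual = Ub[:, j] - Ub[:, :j] @ c
        idx.append(int(np.argmax(np.abs(residual))))
    return np.array(idx)

# Usage: indices for a basis of the column space of some snapshot matrix.
Ub = np.linalg.qr(np.random.default_rng(5).standard_normal((100, 40)))[0][:, :5]
print(deim_indices(Ub))
```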
This paper introduces an efficient high-order numerical method for solving the 1D stationary Schr\"odinger equation in the highly oscillatory regime. Building upon the ideas from [2], we first analytically transform the given equation into a smoother (i.e., less oscillatory) equation. By developing sufficiently accurate quadratures for several (iterated) oscillatory integrals occurring in the Picard approximation of the solution, we obtain a one-step method that is of third order with respect to the step size. The accuracy and efficiency of the method are illustrated through several numerical examples.
In this paper, we derive improved uniform error bounds for the long-time dynamics of the $d$-dimensional $(d=2,3)$ nonlinear space fractional sine-Gordon equation (NSFSGE). The nonlinearity strength of the NSFSGE is characterized by $\varepsilon^2$, where $0<\varepsilon \le 1$ is a dimensionless parameter. The second-order time-splitting method is applied for the temporal discretization, and the Fourier pseudo-spectral method is used for the spatial discretization. To obtain an explicit relation between the numerical errors and the parameter $\varepsilon$, we introduce the regularity compensation oscillation technique into the convergence analysis of fractional models. We then establish improved uniform error bounds of $O(\varepsilon^2 \tau^2)$ for the semi-discretization scheme and $O(h^m+\varepsilon^2 \tau^2)$ for the full-discretization scheme up to long times of order $O(1/\varepsilon^2)$. Furthermore, we extend the time-splitting Fourier pseudo-spectral method to the complex NSFSGE as well as to the oscillatory complex NSFSGE, and improved uniform error bounds for them are also given. Finally, extensive numerical examples in two and three dimensions are provided to support the theoretical analysis. The differences in dynamical behavior between the fractional sine-Gordon equation and the classical sine-Gordon equation are also discussed.
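The method's building blocks can be sketched in 1D (illustrative parameters; the paper works in 2D/3D and proves the $\varepsilon$-dependent bounds): Strang splitting between the linear fractional wave flow, solved exactly in Fourier space via the symbol $|k|^{\alpha}$ of $(-\Delta)^{\alpha/2}$, and the nonlinear flow, which leaves $u$ unchanged.

```python
import numpy as np

# Strang time-splitting + Fourier pseudo-spectral sketch for a 1D toy model
#   u_tt + (-Lap)^(alpha/2) u + eps^2 sin(u) = 0,  written as a (u, v) system.
N, L, alpha, eps, dt, steps = 256, 2 * np.pi, 1.5, 0.1, 1e-2, 500
x = np.linspace(0, L, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
w = np.abs(k) ** (alpha / 2)                # linear-flow frequencies sqrt(|k|^alpha)
w[0] = 1e-300                               # so sin(w*t)/w -> t for the zero mode

u = 0.5 * np.exp(-10 * (x - np.pi) ** 2)    # initial displacement
v = np.zeros_like(x)                        # initial velocity u_t

for _ in range(steps):
    v -= 0.5 * dt * eps**2 * np.sin(u)      # half step of the nonlinear flow
    uh, vh = np.fft.fft(u), np.fft.fft(v)   # exact linear (fractional wave) flow
    uh, vh = (np.cos(w * dt) * uh + np.sin(w * dt) / w * vh,
              -w * np.sin(w * dt) * uh + np.cos(w * dt) * vh)
    u, v = np.fft.ifft(uh).real, np.fft.ifft(vh).real
    v -= 0.5 * dt * eps**2 * np.sin(u)      # half step of the nonlinear flow
```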
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
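The formal core is compact: under Shepard's law, the probability that a response generalizes from one stimulus to another decays exponentially with their distance in similarity space. A minimal sketch (the distance metric and scale $c$ are illustrative choices, not the study's fitted model):

```python
import numpy as np

def shepard_generalization(x, y, c=1.0):
    """P(response to x generalizes to y) = exp(-c * d(x, y)) under
    Shepard's universal law; d is an L1 distance here for illustration."""
    return np.exp(-c * np.abs(np.asarray(x) - np.asarray(y)).sum())

print(shepard_generalization([0.2, 0.4], [0.3, 0.1]))
```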
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and address two key challenges: (a) they give meaningful results for deterministic algorithms, and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.