Computing a low-rank approximation of a parameter-dependent matrix $A(t)$ is an important task in the computational sciences, appearing for example in dynamical systems and in the compression of a series of images. In this work, we introduce AdaCUR, an efficient algorithm for computing a low-rank approximation of parameter-dependent matrices via the CUR decomposition. The key idea behind the algorithm is that, for nearby parameter values, the column and row indices of the CUR decomposition can often be reused. AdaCUR is rank-adaptive, certifiable, and has complexity that compares favourably against existing methods. We also present a faster variant, FastAdaCUR, which prioritizes speed over accuracy; it is rank-adaptive and has complexity at most linear in the number of rows or columns, but it comes without certification.
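As a rough illustration of the index-reuse idea (not the AdaCUR algorithm itself), the following NumPy/SciPy sketch selects column and row indices once via pivoted QR and reuses them to form CUR approximations at nearby parameter values; the pivoted-QR selection and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import qr

def select_indices(A, r):
    # Index selection via pivoted QR (one of several standard heuristics;
    # AdaCUR's actual selection and rank-adaptation strategy is more involved).
    _, _, col_piv = qr(A, pivoting=True, mode='economic')
    _, _, row_piv = qr(A.T, pivoting=True, mode='economic')
    return row_piv[:r], col_piv[:r]

def cur(A, rows, cols):
    # CUR approximation A ~ C U R, with U the pseudoinverse of the intersection block.
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])
    return C @ U @ R

# For nearby parameter values t, the same (rows, cols) can often be reused:
# approximate A(t) by cur(A(t), rows, cols), re-selecting indices only when
# the approximation error grows too large.
```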
Roman domination in a graph $G$ is a variant of classical domination, defined by means of a so-called Roman dominating function $f\colon V(G)\to \{0,1,2\}$ such that if $f(v)=0$ then the vertex $v$ is adjacent to at least one vertex $w$ with $f(w)=2$. The weight $f(G)$ of a Roman dominating function of $G$ is the sum of the weights of all vertices of $G$, that is, $f(G)=\sum_{u\in V(G)}f(u)$. The Roman domination number $\gamma_R(G)$ is the minimum weight of a Roman dominating function of $G$. In this paper we propose algorithms to compute this parameter that involve the $(\min,+)$ powers of large matrices; these operations have high computational requirements, and the GPU (Graphics Processing Unit) allows us to accelerate them. Specific routines have been developed to efficiently compute the $(\min,+)$ product on the GPU architecture, taking advantage of its computational power. These algorithms allow us to compute the Roman domination number of cylindrical graphs $P_m\Box C_n$, i.e., the Cartesian product of a path and a cycle, in the cases $m=7,8,9$, $n\geq 3$ and $m\geq 10$, $n\equiv 0\pmod 5$. Moreover, we provide a lower bound for the remaining cases $m\geq 10$, $n\not\equiv 0\pmod 5$.
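The core kernel accelerated on the GPU is the $(\min,+)$ matrix product. A minimal CPU sketch in NumPy, for illustration only (the GPU routines of the paper are more elaborate):

```python
import numpy as np

def min_plus_product(A, B):
    # (min,+) matrix product: C[i, j] = min_k (A[i, k] + B[k, j]).
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def min_plus_power(A, k):
    # k-th (min,+) power of a square matrix A, computed by repeated products.
    P = A
    for _ in range(k - 1):
        P = min_plus_product(P, A)
    return P
```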
We study the largest eigenvalue of a Gaussian random symmetric matrix $X_n$, with zero-mean, unit variance entries satisfying the condition $\sup_{(i, j) \ne (i', j')}|\mathbb{E}[X_{ij} X_{i'j'}]| = O(n^{-(1 + \varepsilon)})$, where $\varepsilon > 0$. It follows from Catalano et al. (2024) that the empirical spectral distribution of $n^{-1/2} X_n$ converges weakly almost surely to the standard semi-circle law. Using a F\"{u}redi-Koml\'{o}s-type high moment analysis, we show that the largest eigenvalue $\lambda_1(n^{-1/2} X_n)$ of $n^{-1/2} X_n$ converges almost surely to $2$. This result is essentially optimal in the sense that one cannot take $\varepsilon = 0$ and still obtain an almost sure limit of $2$. We also derive Gaussian fluctuation results for the largest eigenvalue in the case where the entries have a common non-zero mean. Let $Y_n = X_n + \frac{\lambda}{\sqrt{n}}\mathbf{1} \mathbf{1}^\top$. When $\varepsilon \ge 1$ and $\lambda \gg n^{1/4}$, we show that \[ n^{1/2}\bigg(\lambda_1(n^{-1/2} Y_n) - \lambda - \frac{1}{\lambda}\bigg) \xrightarrow{d} \sqrt{2} Z, \] where $Z$ is a standard Gaussian. On the other hand, when $0 < \varepsilon < 1$, we have $\mathrm{Var}(\frac{1}{n}\sum_{i, j}X_{ij}) = O(n^{1 - \varepsilon})$. Assuming that $\mathrm{Var}(\frac{1}{n}\sum_{i, j} X_{ij}) = \sigma^2 n^{1 - \varepsilon} (1 + o(1))$, if $\lambda \gg n^{\varepsilon/4}$, then we have \[ n^{\varepsilon/2}\bigg(\lambda_1(n^{-1/2} Y_n) - \lambda - \frac{1}{\lambda}\bigg) \xrightarrow{d} \sigma Z. \] While the ranges of $\lambda$ in these fluctuation results are certainly not optimal, a striking aspect is that different scalings are required in the two regimes $0 < \varepsilon < 1$ and $\varepsilon \ge 1$.
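As a purely numerical illustration of the first result (not taken from the paper), one can sample a symmetric Gaussian matrix with independent entries, a special case that trivially satisfies the decorrelation condition, and observe that the largest eigenvalue of $n^{-1/2}X_n$ is close to $2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# i.i.d. N(0,1) entries, symmetrized so that all entries have unit variance
G = rng.standard_normal((n, n))
X = np.triu(G) + np.triu(G, 1).T
lam1 = np.linalg.eigvalsh(X / np.sqrt(n))[-1]
print(lam1)  # close to 2 for large n
```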
The problem of approximate joint diagonalization of a collection of matrices arises in a number of diverse engineering and signal processing problems. This problem is usually cast as an optimization problem, and it is the main goal of this publication to provide a theoretical study of the corresponding cost functional. As our main result, we prove that this functional tends to infinity in the vicinity of rank-deficient matrices with probability one, thereby proving that the optimization problem is well posed. Secondly, we provide unified expressions for its higher-order derivatives in multilinear form, and explicit expressions for the gradient and the Hessian of the functional in standard form, thereby opening the way for new and improved numerical schemes for the solution of the joint diagonalization problem. A special section is devoted to the important case of self-adjoint matrices.
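For concreteness, one commonly studied cost functional in approximate joint diagonalization is the sum of squared off-diagonal entries of the congruence-transformed matrices; the functional analysed in the paper may differ in its exact form, so the sketch below is only indicative:

```python
import numpy as np

def off_diag_cost(B, mats):
    # Sum over k of the squared Frobenius norm of the off-diagonal part of B A_k B^T.
    total = 0.0
    for A in mats:
        M = B @ A @ B.T
        total += np.sum(M ** 2) - np.sum(np.diag(M) ** 2)
    return total
```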
Many estimators of the variance of the well-known unbiased and uniformly most powerful estimator $\hat{\theta}$ of the Mann-Whitney effect, $\theta = P(X < Y) + \tfrac{1}{2} P(X=Y)$, are considered in the literature. Some of these estimators are only valid in the case of no ties or are biased for small sample sizes, where the amount of the bias is not discussed. Here we derive an unbiased estimator that is based on different rankings, the so-called 'placements' (Orban and Wolfe, 1980), and is therefore easy to compute. This estimator does not require the assumption of continuous distribution functions and is also valid in the case of ties. Moreover, it is shown that this estimator is non-negative and has a sharp upper bound, which may be considered an empirical version of the well-known Birnbaum-Klose inequality. The derivation of this estimator provides an option to compute the biases of some commonly used estimators in the literature. Simulations demonstrate that, for small sample sizes, the biases of these estimators depend on the underlying distribution functions and thus are not under control. This means that, in the case of a biased variance estimator, simulation results for the type-I error of a test or the coverage probability of a confidence interval do not only depend on the quality of the approximation of $\hat{\theta}$ by a normal distribution but also on an additional unknown bias caused by the variance estimator. Finally, it is shown that this estimator is $L_2$-consistent.
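A minimal sketch of the point estimator and of the placements on which the proposed variance estimator is built (the variance formula itself is not reproduced here):

```python
import numpy as np

def mw_effect(x, y):
    # Point estimate of theta = P(X < Y) + 0.5 * P(X = Y)
    x, y = np.asarray(x, float), np.asarray(y, float)
    cmp = (x[:, None] < y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    return cmp.mean()

def placements(x, y):
    # Placement of each Y_j within the X-sample (Orban and Wolfe, 1980):
    # number of X_i below Y_j, counting ties as 1/2.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x[:, None] < y[None, :]) + 0.5 * (x[:, None] == y[None, :])).sum(axis=0)

# mw_effect(x, y) equals placements(x, y).mean() / len(x);
# the unbiased variance estimator in the paper is built from these placements.
```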
Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this paper, we review recent progress on optimal sampling for (weighted) least-squares approximation in arbitrary linear spaces. We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples, then show how it can be used to construct sampling strategies that possess near-optimal sample complexity: namely, the number of samples scales log-linearly in $n$, the dimension of the approximation space. We discuss a series of variations, extensions and further topics, and throughout highlight connections to approximation theory, machine learning, information-based complexity and numerical linear algebra. Finally, motivated by various contemporary applications, we consider a generalization of the classical setting where the samples need not be pointwise samples of a scalar-valued function, and the approximation space need not be linear. We show that even in this significantly more general setting suitable generalizations of the Christoffel function still determine the sample complexity. This provides a unified procedure for designing improved sampling strategies for general recovery problems. This article is largely self-contained, and intended to be accessible to nonspecialists.
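As an illustration of the near-optimal sampling strategy discussed above, the following sketch draws samples from the density proportional to the (inverse) Christoffel function of a Legendre polynomial space on $[-1,1]$ and solves the corresponding weighted least-squares problem; the rejection-sampling envelope and the specific basis are illustrative choices, not taken from the article.

```python
import numpy as np

def legendre_basis(x, n):
    # Orthonormal Legendre polynomials on [-1, 1] w.r.t. the uniform measure dx/2.
    V = np.polynomial.legendre.legvander(x, n - 1)
    return V * np.sqrt(2 * np.arange(n) + 1)

def christoffel_sample(n, m, rng):
    # Rejection sampling from the density k_n(x) / (2n) on [-1, 1], where
    # k_n(x) = sum_j |phi_j(x)|^2; here k_n(x) <= n^2, giving the envelope below.
    samples = []
    while len(samples) < m:
        x = rng.uniform(-1, 1)
        k = (legendre_basis(np.array([x]), n) ** 2).sum()
        if rng.uniform() < k / (n * n):
            samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(0)
n, m = 10, 50                      # approximation space dimension, sample size
x = christoffel_sample(n, m, rng)
Phi = legendre_basis(x, n)
w = n / (Phi ** 2).sum(axis=1)     # near-optimal weights w(x) = n / k_n(x)
f = np.cos(np.pi * x)              # samples of a target function
c, *_ = np.linalg.lstsq(Phi * np.sqrt(w)[:, None], f * np.sqrt(w), rcond=None)
```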
Recently, the description logic LE-$\mathcal{ALC}$ was introduced for reasoning in the semantic environment of enriched formal contexts, and a tableaux algorithm was developed for checking the consistency of ABoxes in this logic \cite{van2023non}. In this paper, we study ontology-mediated query answering in LE-$\mathcal{ALC}$. In particular, we show that several different types of queries can be answered efficiently for LE-$\mathcal{ALC}$ knowledge bases with acyclic TBoxes, either by using our tableaux algorithm directly or by extending it with additional rules.
Invariant coordinate selection (ICS) is an unsupervised multivariate data transformation useful in many contexts such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem (EVP), obtained by diagonalizing one scatter relative to the other. In the case of collinearity, at least one of the scatter matrices is singular and the problem cannot be solved. To address this limitation, three approaches are proposed, based on: a Moore-Penrose pseudo-inverse (GINV), a dimension reduction (DR), and a generalized singular value decomposition (GSVD). Their properties are investigated theoretically and in different empirical applications. Overall, the extension based on the GSVD seems the most promising, even though it restricts the choice to scatter matrices that can be expressed as cross-products. In practice, some of the approaches also look suitable in the context of high-dimension low-sample-size (HDLSS) data.
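A minimal sketch of the classical ICS implementation based on two scatter matrices (the covariance matrix and a standard fourth-moment scatter), via a generalized symmetric-definite eigenproblem; it assumes nonsingular scatters and therefore fails precisely in the collinear setting that the GINV, DR and GSVD extensions are designed to handle.

```python
import numpy as np
from scipy.linalg import eigh

def cov4(X):
    # Fourth-moment based scatter matrix (up to the usual normalization).
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)  # squared Mahalanobis distances
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))

def ics(X):
    # Classical ICS: simultaneously diagonalize S1 = cov and S2 = cov4.
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # Generalized eigenproblem S2 b = rho S1 b; requires S1 positive definite.
    rho, B = eigh(S2, S1)
    Z = (X - X.mean(axis=0)) @ B   # invariant coordinates
    return rho[::-1], Z[:, ::-1]   # reorder with decreasing generalized kurtosis
```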
This paper introduces a formulation of the variable density incompressible Navier-Stokes equations by modifying the nonlinear terms in a consistent way. For Galerkin discretizations, the formulation leads to full discrete conservation of mass, squared density, momentum, angular momentum and kinetic energy without the divergence-free constraint being strongly enforced. In addition to favorable conservation properties, the formulation is shown to make the density field invariant to global shifts. The effect of viscous regularizations on conservation properties is also investigated. Numerical tests validate the theory developed in this work. The new formulation shows superior performance compared to other formulations from the literature, both in terms of accuracy for smooth problems and in terms of robustness.
A finite element (FE) discretization for the steady, incompressible, fully inhomogeneous, generalized Navier-Stokes equations is proposed. By the method of divergence reconstruction operators, the formulation is valid for all shear stress exponents $p > \tfrac{2d}{d+2}$. The Dirichlet boundary condition is imposed strongly, using any discretization of the boundary data which converges at a sufficient rate. $\textit{A priori}$ error estimates for velocity vector field and kinematic pressure are derived and numerical experiments are conducted. These confirm the quasi-optimality of the $\textit{a priori}$ error estimate for the velocity vector field. The $\textit{a priori}$ error estimates for the kinematic pressure are quasi-optimal if $p \leq 2$.
Density deconvolution addresses the estimation of the unknown (probability) density function $f$ of a random signal from data that are observed with an independent additive random noise. This is a classical problem in statistics, for which frequentist and Bayesian nonparametric approaches are available to deal with static or batch data. In this paper, we consider the problem of density deconvolution in a streaming or online setting, where noisy data arrive progressively with no predetermined sample size, and we develop a sequential nonparametric approach to estimate $f$. By relying on a quasi-Bayesian sequential approach, often referred to as Newton's algorithm, we obtain estimates of $f$ that are easy to evaluate, computationally efficient, and whose computational cost remains constant as the amount of data increases, which is critical in the streaming setting. Large sample asymptotic properties of the proposed estimates are studied, yielding provable guarantees with respect to the estimation of $f$ at a point (local) and on an interval (uniform). In particular, we establish local and uniform central limit theorems, providing corresponding asymptotic credible intervals and bands. We validate our methods empirically on synthetic and real data, considering the common setting of Laplace and Gaussian noise distributions, and compare them with the kernel-based approach and with a Bayesian nonparametric approach based on a Dirichlet process mixture prior.
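A grid-based sketch of the quasi-Bayesian (Newton's) recursion described above, assuming a known noise density (for example Gaussian); the learning-rate sequence, the initialization and the discretization are illustrative choices.

```python
import numpy as np

def newton_deconvolution(y_stream, grid, noise_pdf, f0=None, c=1.0):
    # Recursive quasi-Bayesian estimate of the signal density f on a grid,
    # for data Y_i = X_i + noise with known noise density.
    f = np.full(grid.size, 1.0 / (grid[-1] - grid[0])) if f0 is None else f0.copy()
    dx = grid[1] - grid[0]
    for i, y in enumerate(y_stream, start=1):
        gamma = c / (i + 1)                 # learning-rate sequence, gamma_i -> 0
        like = noise_pdf(y - grid)          # likelihood of y given X = grid point
        post = like * f
        post /= post.sum() * dx             # normalized "posterior" on the grid
        f = (1 - gamma) * f + gamma * post  # Newton's recursive update
    return f

# Example usage with Gaussian noise:
# gauss = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
# f_hat = newton_deconvolution(y_data, np.linspace(-5, 5, 400), gauss)
```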