This work considers the low-rank approximation of a matrix $A(t)$ depending on a parameter $t$ in a compact set $D \subset \mathbb{R}^d$. Application areas that give rise to such problems include computational statistics and dynamical systems. Randomized algorithms are an increasingly popular approach for performing low-rank approximation, and they usually proceed by multiplying the matrix with random dimension reduction matrices (DRMs). Applying such algorithms directly to $A(t)$ would involve different, independent DRMs for every $t$, which is not only expensive but also leads to inherently non-smooth approximations. In this work, we propose to use constant DRMs, that is, $A(t)$ is multiplied with the same DRM for every $t$. The resulting parameter-dependent extensions of two popular randomized algorithms, the randomized singular value decomposition and the generalized Nystr\"{o}m method, are computationally attractive, especially when $A(t)$ admits an affine linear decomposition with respect to $t$. We perform a probabilistic analysis for both algorithms, deriving bounds on the expected value as well as failure probabilities for the approximation error when using Gaussian random DRMs. Both the theoretical results and the numerical experiments show that the use of constant DRMs does not impair their effectiveness; our methods reliably return quasi-best low-rank approximations.
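The two constant-DRM approximations can be written out as follows. This is a sketch under assumed notation that the abstract does not fix: $A(t) \in \mathbb{R}^{m \times n}$, the constant DRMs are called $\Omega$ and $\Psi$, $\operatorname{orth}(\cdot)$ denotes an orthonormal basis of the column space, $^{\dagger}$ the pseudoinverse, and $\varphi_i$, $A_i$ the (hypothetically named) terms of the affine linear decomposition.

```latex
% Randomized SVD with one constant DRM \Omega (same \Omega for every t):
A(t) \approx Q(t)\, Q(t)^{T} A(t), \qquad Q(t) = \operatorname{orth}\bigl(A(t)\,\Omega\bigr).

% Generalized Nystr\"om with two constant DRMs \Omega and \Psi:
A(t) \approx A(t)\,\Omega \,\bigl(\Psi^{T} A(t)\,\Omega\bigr)^{\dagger}\, \Psi^{T} A(t).

% With an affine linear decomposition, the sketches are assembled from
% products A_i \Omega that are computed once and reused for every t:
A(t) = \sum_{i=1}^{k} \varphi_i(t)\, A_i
\quad\Longrightarrow\quad
A(t)\,\Omega = \sum_{i=1}^{k} \varphi_i(t)\, (A_i \Omega).
```

The last line is the reason constant DRMs are cheap in the affine case: only the scalar weights $\varphi_i(t)$ change with $t$.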
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired samples drawn from arbitrary source and target distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schr\"odinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, in particular when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number which behaves like a univariate $p$-value. To clarify discussion of these functions, a telescoping series of alternative hypotheses is introduced that communicates the strength and prevalence of non-null evidence in the $p$-values before general pooling formulae are discussed. A pattern noticed in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two for pooled $p$-values. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.
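As a concrete instance of pooling through a $\chi^2_{\kappa}$ quantile transformation, the sketch below handles only the special case $\kappa = 2$, where the transform is $-2\ln p$ and the pooled $p$-value coincides with Fisher's method. It assumes independent $p$-values and assumes the pooled value is the upper tail of $\chi^2_{2M}$ at the summed statistic; the function name `fisher_pooled_p` is hypothetical, and general $\kappa$ would need a numerical $\chi^2$ quantile routine that is not shown here.

```python
import math


def fisher_pooled_p(p_values):
    """Pool independent p-values with the chi-square quantile transform, kappa = 2.

    For kappa = 2 the chi-square quantile of 1 - p equals -2 * ln(p), and the
    upper tail of chi-square with 2M degrees of freedom has the closed form
    exp(-x/2) * sum_{k < M} (x/2)^k / k!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    m = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # x / 2
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0  # running sum of the series
    for k in range(1, m):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


# Two moderately small p-values pool to a smaller one.
print(fisher_pooled_p([0.05, 0.05]))
```

A single $p$-value is returned unchanged, which is a quick sanity check that the pooled value behaves like a univariate $p$-value.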
Whittle-Mat\'ern fields are a recently introduced class of Gaussian processes on metric graphs, which are specified as solutions to a fractional-order stochastic differential equation. Unlike earlier covariance-based approaches for specifying Gaussian fields on metric graphs, the Whittle-Mat\'ern fields are well-defined for any compact metric graph and can provide Gaussian processes with differentiable sample paths. We derive the main statistical properties of the model class, particularly the consistency and asymptotic normality of maximum likelihood estimators of model parameters and the necessary and sufficient conditions for asymptotic optimality properties of linear prediction based on the model with misspecified parameters. The covariance function of the Whittle-Mat\'ern fields is generally unavailable in closed form, and they have therefore been challenging to use for statistical inference. However, we show that for specific values of the fractional exponent, when the fields have Markov properties, likelihood-based inference and spatial prediction can be performed exactly and computationally efficiently. This facilitates using the Whittle-Mat\'ern fields in statistical applications involving big datasets without the need for any approximations. The methods are illustrated via an application to modeling of traffic data, where allowing for differentiable processes dramatically improves the results.
We consider the fast in-place computation of the Euclidean polynomial modular remainder $R(X) \equiv A(X) \mod B(X)$ with $A$ and $B$ of respective degrees $n$ and $m \le n$. If the multiplication of two polynomials of degree $k$ can be performed with $M(k)$ operations and $O(k)$ extra space, then standard algorithms for the remainder require $O(\frac{n}{m} M(m))$ arithmetic operations and, apart from the space for $A$ and $B$, at least $O(n-m)$ extra memory. This extra space is typically used to store the whole quotient $Q(X)$ such that $A = BQ + R$ with $\deg R < \deg B$. We avoid storing the whole quotient and propose an algorithm that still uses $O(\frac{n}{m} M(m))$ arithmetic operations but only $O(m)$ extra space. When the divisor $B$ is sparse with a constant number of non-zero terms, the arithmetic complexity bound reduces to $O(n)$. When the input space of $A$ or $B$ may be used for intermediate computations, provided that $A$ and $B$ are restored to their initial states once the remainder has been computed, we further propose an in-place algorithm (that is, with its extra required space reduced to $O(1)$) using at most $O(\frac{n}{m} M(m) \log(m))$ arithmetic operations if $M(m)$ is quasi-linear and $O(\frac{n}{m} M(m))$ otherwise. We also propose variants that compute, still in-place and with the same complexity bounds, the over-place remainder $A(X) \equiv A(X) \mod B(X)$ and the accumulated remainder $R(X) \mathrel{+}= A(X) \mod B(X)$. To achieve this, we develop techniques for Toeplitz matrix operations whose output is also part of the input. In-place accumulating versions are obtained for the latter and for polynomial remaindering. This is realized via further reductions to accumulated polynomial multiplication, for which in-place fast algorithms have recently been developed.
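To illustrate what an over-place remainder means, here is a minimal sketch that overwrites the coefficients of $A$ with those of $A \bmod B$ without ever storing the quotient. It is plain schoolbook elimination with $O((n-m+1)\,m)$ operations, not the fast algorithm of the abstract; the function name `remainder_over_place`, the lowest-degree-first coefficient layout, and the use of exact rationals are all assumptions made for the example.

```python
from fractions import Fraction


def remainder_over_place(a, b):
    """Overwrite a (coefficients, lowest degree first) with a mod b.

    Each quotient coefficient is used as soon as it is computed and is never
    stored, so only O(1) coefficients are kept besides the inputs.
    """
    m = len(b) - 1  # degree of the divisor
    if m < 0 or b[m] == 0:
        raise ValueError("leading coefficient of b must be non-zero")
    for i in range(len(a) - 1, m - 1, -1):
        q = Fraction(a[i]) / b[m]        # current quotient coefficient
        for j in range(m + 1):
            a[i - m + j] -= q * b[j]     # cancel the term of degree i
    return a


# A = X^3 + 2X + 5, B = X^2 + 1, so A mod B = X + 5.
a = [5, 2, 0, 1]
remainder_over_place(a, [1, 0, 1])
print([int(c) for c in a])  # [5, 1, 0, 0]
```

After the call, the entries of degree $m$ and above are zero and the low $m$ entries hold the remainder.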
Let $G$ be a graph and $S\subseteq V(G)$ with $|S|\geq 2$. Then the trees $T_1, T_2, \cdots, T_\ell$ in $G$ are \emph{internally disjoint Steiner trees} connecting $S$ (or $S$-Steiner trees) if $E(T_i) \cap E(T_j )=\emptyset$ and $V(T_i)\cap V(T_j)=S$ for every pair of distinct integers $i,j$, $1 \leq i, j \leq \ell$. Similarly, if we only have the condition $E(T_i) \cap E(T_j )=\emptyset$ but without the condition $V(T_i)\cap V(T_j)=S$, then they are \emph{edge-disjoint Steiner trees}. The \emph{generalized $k$-connectivity}, denoted by $\kappa_k(G)$, of a graph $G$, is defined as $\kappa_k(G)=\min\{\kappa_G(S)\,|\,S \subseteq V(G) \ \textrm{and} \ |S|=k \}$, where $\kappa_G(S)$ is the maximum number of internally disjoint $S$-Steiner trees. The \emph{generalized local edge-connectivity} $\lambda_{G}(S)$ is the maximum number of edge-disjoint Steiner trees connecting $S$ in $G$. The \emph{generalized $k$-edge-connectivity} $\lambda_k(G)$ of $G$ is defined as $\lambda_k(G)=\min\{\lambda_{G}(S)\,|\,S\subseteq V(G) \ \textrm{and} \ |S|=k\}$. These measures are generalizations of the concepts of connectivity and edge-connectivity, and they can be used as measures of vulnerability of networks. It is, in general, difficult to compute these generalized connectivities. However, there are precise results for some special classes of graphs. In this paper, we obtain the exact value of $\lambda_{k}(S(n,\ell))$ for $3\leq k\leq \ell^n$, and the exact value of $\kappa_{k}(S(n,\ell))$ for $3\leq k\leq \ell$, where $S(n, \ell)$ is the Sierpi\'{n}ski graph of order $\ell^n$. As a direct consequence, these graphs provide additional interesting examples where $\lambda_{k}(S(n,\ell))=\kappa_{k}(S(n,\ell))$. We also study some network properties of Sierpi\'{n}ski graphs.
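For readers who want to experiment with $S(n,\ell)$, the following sketch builds the graph and checks that its order is $\ell^n$. The abstract does not define $S(n,\ell)$, so the code assumes the usual word-based definition: vertices are words of length $n$ over $\{0,\dots,\ell-1\}$, and $w\,i\,j\cdots j$ is adjacent to $w\,j\,i\cdots i$ for $i \neq j$. The function name `sierpinski_graph` is hypothetical, and the code does not compute $\kappa_k$ or $\lambda_k$.

```python
from itertools import product


def sierpinski_graph(n, ell):
    """Return adjacency sets of S(n, ell) on words of length n over range(ell)."""
    vertices = list(product(range(ell), repeat=n))
    adj = {v: set() for v in vertices}
    for u in vertices:
        for h in range(n):
            i, tail = u[h], u[h + 1:]
            if tail:
                # u = w i j...j needs a constant tail j != i to have a neighbour here
                constant = len(set(tail)) == 1 and tail[0] != i
                candidates = [tail[0]] if constant else []
            else:
                # last position: the words w i and w j are adjacent for all j != i
                candidates = [j for j in range(ell) if j != i]
            for j in candidates:
                v = u[:h] + (j,) + (i,) * len(tail)
                adj[u].add(v)
                adj[v].add(u)
    return adj


g = sierpinski_graph(2, 3)
print(len(g))                                  # order 3**2 = 9
print(sum(len(nb) for nb in g.values()) // 2)  # 12 edges
```

Under this definition $S(1,\ell)$ is the complete graph $K_\ell$ and $S(n,2)$ is a path on $2^n$ vertices, which gives two easy sanity checks.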
Brown and Walker (1997) showed that GMRES determines a least squares solution of $ A x = b $ where $ A \in {\bf R}^{n \times n} $ without breakdown for arbitrary $ b, x_0 \in {\bf R}^n $ if and only if $A$ is range-symmetric, i.e. $ {\cal R} (A^{\rm T}) = {\cal R} (A) $, where $ A $ may be singular and $ b $ may not be in the range space ${\cal R}(A)$ of $A$. In this paper, we propose applying GMRES to $ A C A^{\rm T} z = b $, where $ C \in {\bf R}^{n \times n} $ is symmetric positive definite. This determines a least squares solution $ x = CA^{\rm T} z $ of $ A x = b $ without breakdown for arbitrary (singular) matrix $A \in {\bf R}^{n \times n}$ and $ b \in {\bf R}^n $. To make the method numerically stable, we propose using the pseudoinverse with an appropriate threshold parameter to suppress the influence of tiny singular values when solving the severely ill-conditioned Hessenberg systems which arise in the Arnoldi process of GMRES when solving inconsistent range-symmetric systems. Numerical experiments show that the method taking $C$ to be the identity matrix gives the least squares solution even when $A$ is not range-symmetric, including the case when $ {\rm index}(A) >1$.
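The reason the transformed system always satisfies the Brown and Walker condition can be sketched in three lines. This is a short worked argument rather than the paper's own proof; the symbol $M$ and the factor $C^{1/2}$ (the symmetric positive definite square root of $C$) are introduced here only for the sketch.

```latex
% (1) The coefficient matrix is symmetric, hence range-symmetric:
M := A C A^{\rm T} = (A C^{1/2})(A C^{1/2})^{\rm T}
\;\Longrightarrow\; M^{\rm T} = M, \quad {\cal R}(M^{\rm T}) = {\cal R}(M).

% (2) Its range equals the range of A, because C^{1/2} is nonsingular:
{\cal R}(M) = {\cal R}(A C^{1/2}) = {\cal R}(A).

% (3) Hence a least squares solution z of M z = b yields one of A x = b:
\min_{z} \| b - M z \|_2 = \min_{x} \| b - A x \|_2,
\qquad A \,(C A^{\rm T} z) = M z .
```

Step (1) is what lets GMRES run without breakdown on $M z = b$, and steps (2) and (3) show that $x = CA^{\rm T} z$ attains the least squares residual of the original system.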
We describe Bayes factor functions based on $z$, $t$, $\chi^2$, and $F$ statistics and the prior distributions used to define alternative hypotheses. The non-local alternative prior distributions are centered on standardized effects, which index the Bayes factor function. The prior densities include a dispersion parameter that models the variation of effect sizes across replicated experiments. We examine the convergence rates of Bayes factor functions under true null and true alternative hypotheses. Several examples illustrate the application of the Bayes factor functions to replicated experimental designs and compare the conclusions from these analyses to other default Bayes factor methods.
The multispecies Landau collision operator describes the two-particle, small scattering angle (grazing) collisions in a plasma made up of different species of particles such as electrons and ions. Recently, a structure-preserving deterministic particle method (arXiv:1910.03080) has been developed for the single-species spatially homogeneous Landau equation. This method relies on a regularization of the Landau collision operator so that an approximate solution, which is a linear combination of Dirac delta distributions, is well-defined. Based on a weak form of the regularized Landau equation, the time-dependent locations of the Dirac delta functions satisfy a system of ordinary differential equations. In this work, we extend this particle method to the multispecies case and examine its conservation of mass, momentum, and energy, as well as its entropy decay. We show that the equilibrium distribution of the regularized multispecies Landau equation is a Maxwellian distribution, and state a critical condition on the regularization parameters that guarantees a species-independent equilibrium temperature. A convergence study comparing an exact multispecies BKW solution to the particle solution shows approximately second-order accuracy. Important physical properties of the particle method, such as conservation, decay of entropy, and the equilibrium distribution, are demonstrated with several numerical examples.
Let ${\mathcal P}$ be a family of probability measures on a measurable space $(S,{\mathcal A}).$ Given a Banach space $E,$ a functional $f:E\mapsto {\mathbb R}$ and a mapping $\theta: {\mathcal P}\mapsto E,$ our goal is to estimate $f(\theta(P))$ based on i.i.d. observations $X_1,\dots, X_n\sim P, P\in {\mathcal P}.$ In particular, if ${\mathcal P}=\{P_{\theta}: \theta\in \Theta\}$ is an identifiable statistical model with parameter set $\Theta\subset E,$ one can consider the mapping $\theta(P)=\theta$ for $P\in {\mathcal P}, P=P_{\theta},$ resulting in a problem of estimation of $f(\theta)$ based on i.i.d. observations $X_1,\dots, X_n\sim P_{\theta}, \theta\in \Theta.$ Given a smooth functional $f$ and estimators $\hat \theta_n(X_1,\dots, X_n), n\geq 1$ of $\theta(P),$ we use these estimators, the sample split and the Taylor expansion of $f(\theta(P))$ of a proper order to construct estimators $T_f(X_1,\dots, X_n)$ of $f(\theta(P)).$ For these estimators and for a functional $f$ of smoothness $s\geq 1,$ we prove upper bounds on the $L_p$-errors of estimator $T_f(X_1,\dots, X_n)$ under certain moment assumptions on the base estimators $\hat \theta_n.$ We study the performance of estimators $T_f(X_1,\dots, X_n)$ in several concrete problems, showing their minimax optimality and asymptotic efficiency. In particular, this includes functional estimation in high-dimensional models with many low dimensional components, functional estimation in high-dimensional exponential families and estimation of functionals of covariance operators in infinite-dimensional subgaussian models.
The problems of determining the minimum-sized \emph{identifying}, \emph{locating-dominating} and \emph{open locating-dominating codes} of an input graph are special search problems that are challenging from both theoretical and computational viewpoints. In these problems, one selects a dominating set $C$ of a graph $G$ such that the vertices of a chosen subset of $V(G)$ (i.e. either $V(G)\setminus C$ or $V(G)$ itself) are uniquely determined by their neighborhoods in $C$. A typical line of attack for these problems is to determine tight bounds for the minimum codes in various graph classes. In this work, we present tight lower and upper bounds for all three types of codes for \emph{block graphs} (i.e. diamond-free chordal graphs). Our bounds are in terms of the number of maximal cliques (or \emph{blocks}) of a block graph and the order of the graph. Two of our upper bounds verify conjectures from the literature, one of which is proven for block graphs in this article. As for the lower bounds, we prove them to be linear in terms of both the number of blocks and the order of the block graph. We provide examples of families of block graphs whose minimum codes attain these bounds, thus showing each bound to be tight.
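The three code types can be made concrete with a small checker. The sketch below assumes the standard definitions: an identifying code separates all vertices by closed neighborhoods, a locating-dominating code separates the vertices outside $C$ by open neighborhoods, and an open locating-dominating code separates all vertices by open neighborhoods, with every trace in $C$ required to be non-empty. The function names and the adjacency-dictionary input format are hypothetical, and the code only verifies a given set; it does not search for a minimum one.

```python
def _traces_ok(adj, code, targets, closed):
    """Check that every target vertex has a non-empty, unique trace in the code."""
    seen = set()
    for v in targets:
        nbrs = set(adj[v]) | ({v} if closed else set())
        trace = frozenset(nbrs & code)
        if not trace or trace in seen:
            return False
        seen.add(trace)
    return True


def is_identifying_code(adj, code):
    """All vertices are separated by closed neighborhoods intersected with the code."""
    return _traces_ok(adj, set(code), list(adj), closed=True)


def is_locating_dominating(adj, code):
    """Vertices outside the code are separated by open neighborhoods in the code."""
    code = set(code)
    return _traces_ok(adj, code, [v for v in adj if v not in code], closed=False)


def is_open_locating_dominating(adj, code):
    """All vertices are separated by open neighborhoods intersected with the code."""
    return _traces_ok(adj, set(code), list(adj), closed=False)


# Path on four vertices 0 - 1 - 2 - 3, a block graph whose blocks are its edges.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_identifying_code(path, {0, 1, 2}))       # True
print(is_locating_dominating(path, {1, 2}))       # True
print(is_open_locating_dominating(path, {1, 2}))  # False
```

On this path the set $\{1, 2\}$ is locating-dominating but neither identifying nor open locating-dominating, which shows how the choice of separated vertices and of open versus closed neighborhoods changes the problem.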