In this study, a novel preconditioner based on the absolute-value block $\alpha$-circulant matrix approximation is developed, specifically designed for nonsymmetric dense block lower triangular Toeplitz (BLTT) systems that emerge from the numerical discretization of evolutionary equations. Our preconditioner is constructed by taking an absolute-value of a block $\alpha$-circulant matrix approximation to the BLTT matrix. To apply our preconditioner, the original BLTT linear system is converted into a symmetric form by applying a time-reversing permutation transformation. Then, with our preconditioner, the preconditioned minimal residual method (MINRES) solver is employed to solve the symmetrized linear system. With properly chosen $\alpha$, the eigenvalues of the preconditioned matrix are proven to be clustered around $\pm1$ without any significant outliers. With the clustered spectrum, we show that the preconditioned MINRES solver for the preconditioned system has a convergence rate independent of system size. To the best of our knowledge, this is the first preconditioned MINRES method with size-independent convergence rate for the dense BLTT system. The efficacy of the proposed preconditioner is corroborated by our numerical experiments, which reveal that it attains optimal convergence.
We propose a novel neural network architecture based on conformer transducer that adds contextual information flow to the ASR systems. Our method improves the accuracy of recognizing uncommon words while not harming the word error rate of regular words. We explore the uncommon words accuracy improvement when we use the new model and/or shallow fusion with context language model. We found that combination of both provides cumulative gain in uncommon words recognition accuracy.
We provide a new theoretical framework for the variable-step deferred correction (DC) methods based on the well-known BDF2 formula. By using the discrete orthogonal convolution kernels, some high-order BDF2-DC methods are proven to be stable on arbitrary time grids according to the recent definition of stability (SINUM, 60: 2253-2272). It significantly relaxes the existing step-ratio restrictions for the BDF2-DC methods (BIT, 62: 1789-1822). The associated sharp error estimates are established by taking the numerical effects of the starting approximations into account, and they suggest that the BDF2-DC methods have no aftereffect, that is, the lower-order starting scheme for the BDF2 scheme will not cause a loss in the accuracy of the high-order BDF2-DC methods. Extensive tests on the graded and random time meshes are presented to support the new theory.
Covariance matrix estimation is an important task in the analysis of multivariate data in disparate scientific fields, including neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete due to factors beyond the control of researchers, and data missingness may prohibit the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling the missing entries of an incomplete sample covariance matrix by assuming a low-rank structure. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. An example of auxiliary variable is the distance between neurons, which is usually inversely related to the strength of neuronal correlation. Our method extracts auxiliary information via regression, and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches via simulations, and apply it to the analysis of large-scale neuroscience data.
This paper addresses the multiple two-sample test problem in a graph-structured setting, which is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in fixed graph deals with a two-sample testing problem between two node-specific probability density functions (pdfs), $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that apply at each node independently, hence disregard the geometry of the problem.
We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial "deflation" step to the standard generic chaining argument. The resulting tail bound is the sum of the complexity of the "deflated function class" in terms of a generalization of Talagrand's $\gamma$ functional, and the deviation of the function instance, both of which are formulated based on the natural seminorm induced by the corresponding Cram\'{e}r functions. We also provide certain approximations for the mentioned seminorm when the function class lies in a given (exponential type) Orlicz space, that can be used to make the complexity term and the deviation term more explicit.
We introduce $\varepsilon$-approximate versions of the notion of Euclidean vector bundle for $\varepsilon \geq 0$, which recover the classical notion of Euclidean vector bundle when $\varepsilon = 0$. In particular, we study \v{C}ech cochains with coefficients in the orthogonal group that satisfy an approximate cocycle condition. We show that $\varepsilon$-approximate vector bundles can be used to represent classical vector bundles when $\varepsilon > 0$ is sufficiently small. We also introduce distances between approximate vector bundles and use them to prove that sufficiently similar approximate vector bundles represent the same classical vector bundle. This gives a way of specifying vector bundles over finite simplicial complexes using a finite amount of data, and also allows for some tolerance to noise when working with vector bundles in an applied setting. As an example, we prove a reconstruction theorem for vector bundles from finite samples. We give algorithms for the effective computation of low-dimensional characteristic classes of vector bundles directly from discrete and approximate representations and illustrate the usage of these algorithms with computational examples.
The Spatial AutoRegressive model (SAR) is commonly used in studies involving spatial and network data to estimate the spatial or network peer influence and the effects of covariates on the response, taking into account the spatial or network dependence. While the model can be efficiently estimated with a Quasi maximum likelihood approach (QMLE), the detrimental effect of covariate measurement error on the QMLE and how to remedy it is currently unknown. If covariates are measured with error, then the QMLE may not have the $\sqrt{n}$ convergence and may even be inconsistent even when a node is influenced by only a limited number of other nodes or spatial units. We develop a measurement error-corrected ML estimator (ME-QMLE) for the parameters of the SAR model when covariates are measured with error. The ME-QMLE possesses statistical consistency and asymptotic normality properties. We consider two types of applications. The first is when the true covariate cannot be measured directly, and a proxy is observed instead. The second one involves including latent homophily factors estimated with error from the network for estimating peer influence. Our numerical results verify the bias correction property of the estimator and the accuracy of the standard error estimates in finite samples. We illustrate the method on a real dataset related to county-level death rates from the COVID-19 pandemic.
We present a graph-based discretization method for solving hyperbolic systems of conservation laws using discontinuous finite elements. The method is based on the convex limiting technique technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211--A3239, 2018). As such, these methods are mathematically guaranteed to be invariant-set preserving and to satisfy discrete pointwise entropy inequalities. In this paper we extend the theory for the specific case of discontinuous finite elements, incorporating the effect of boundary conditions into the formulation. From a practical point of view, the implementation of these methods is algebraic, meaning, that they operate directly on the stencil of the spatial discretization. This first paper in a sequence of two papers introduces and verifies essential building blocks for the convex limiting procedure using discontinuous Galerkin discretizations. In particular, we discuss a minimally stabilized high-order discontinuous Galerkin method that exhibits optimal convergence rates comparable to linear stabilization techniques for cell-based methods. In addition, we discuss a proper choice of local bounds for the convex limiting procedure. A follow-up contribution will focus on the high-performance implementation, benchmarking and verification of the method. We verify convergence rates on a sequence of one- and two-dimensional tests with differing regularity. In particular, we obtain optimal convergence rates for single rarefaction waves. We also propose a simple test in order to verify the implementation of boundary conditions and their convergence rates.
We consider nonparametric Bayesian inference in a multidimensional diffusion model with reflecting boundary conditions based on discrete high-frequency observations. We prove a general posterior contraction rate theorem in $L^2$-loss, which is applied to Gaussian priors. The resulting posteriors, as well as their posterior means, are shown to converge to the ground truth at the minimax optimal rate over H\"older smoothness classes in any dimension. Of independent interest and as part of our proofs, we show that certain frequentist penalized least squares estimators are also minimax optimal.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.