This work is concerned with approximating matrix functions for banded matrices, hierarchically semiseparable matrices, and related structures. We develop a new divide-and-conquer method based on (rational) Krylov subspace methods for performing low-rank updates of matrix functions. Our convergence analysis of the newly proposed method proceeds by establishing relations to best polynomial and rational approximation. When only the trace or the diagonal of the matrix function is of interest, we demonstrate -- in practice and in theory -- that convergence can be faster. For the special case of a banded matrix, we show that the divide-and-conquer method reduces to a much simpler algorithm, which proceeds by computing matrix functions of small submatrices. Numerical experiments confirm the effectiveness of the newly developed algorithms for computing large-scale matrix functions arising in a wide variety of applications.
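As a minimal illustration of the structure such methods exploit (a sketch, not the paper's algorithm), the following snippet decouples a tridiagonal matrix into two halves and checks that the resulting correction to the matrix exponential is numerically low-rank; the size, the function exp, and the tolerance are arbitrary choices.

```python
import numpy as np
from scipy.linalg import expm

# 1D Laplacian: tridiagonal, so zeroing one off-diagonal pair decouples it
n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A0 = A.copy()
A0[n // 2 - 1, n // 2] = A0[n // 2, n // 2 - 1] = 0.0  # block-diagonal part

# f(A) differs from f(A0) = blkdiag(f(A1), f(A2)) by a correction that is
# numerically low-rank, which Krylov subspace methods can approximate cheaply
E = expm(-A) - expm(-A0)
sv = np.linalg.svd(E, compute_uv=False)
print(np.sum(sv > 1e-10 * sv[0]))  # small numerical rank
```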
Support constrained generator matrices for linear codes have found applications in multiple access networks and weakly secure document exchange. Necessary and sufficient conditions for the existence of Reed-Solomon codes with support constrained generator matrices were conjectured by Dau, Song, Yuen and Hassibi. This conjecture, known as the GM-MDS conjecture, was recently proved in independent works of Lovett and of Yildiz and Hassibi. As a consequence, support constrained generator matrices for MDS codes exist over fields whose size is linear in the code length. In this paper we propose a natural generalization of this conjecture to support constrained matrices based on the generalized Hamming weights (the SCGM-GHW conjecture). The GM-MDS conjecture can be thought of as a very special case of our SCGM-GHW conjecture for linear $1$-MDS codes. We investigate support constrained generator matrices for several families of linear codes, such as $2$-MDS codes, first order Reed-Muller codes, and algebraic-geometric codes from elliptic curves, from the viewpoint of our SCGM-GHW conjecture. In particular, the direct generalization of the GM-MDS conjecture from $1$-MDS codes to $2$-MDS codes is false: for linear $2$-MDS codes, cardinality-based constraints on the subset system alone are not sufficient to guarantee that these subsets lie in the zero coordinate position sets of the rows of a generator matrix.
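For concreteness, one common formulation of the GM-MDS condition in the $1$-MDS case is purely combinatorial: given zero sets $Z_1,\dots,Z_k\subseteq\{1,\dots,n\}$ for the rows, a suitable MDS code exists if and only if $|\bigcap_{i\in\Omega}Z_i|+|\Omega|\le k$ for every nonempty $\Omega\subseteq\{1,\dots,k\}$. A brute-force checker of this condition (illustration only; exponential in $k$):

```python
from itertools import combinations

def gm_mds_condition(zero_sets, k):
    """Check |intersection of Z_i over Omega| + |Omega| <= k for every
    nonempty Omega (brute force; illustration only)."""
    Z = [set(z) for z in zero_sets]
    for r in range(1, k + 1):
        for omega in combinations(range(k), r):
            common = set.intersection(*(Z[i] for i in omega))
            if len(common) + r > k:
                return False
    return True

# e.g. k = 3, n = 5: rows forced to vanish on these coordinate sets
print(gm_mds_condition([{0, 1}, {1, 2}, {3, 4}], k=3))  # True
```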
This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation of increasing size to represent multiple polynomial segments, as is common in the literature, we take a new perspective based on a state-space representation, so that the linear equation reduces to a finite-horizon control system with a fixed state dimension and the required continuity conditions between consecutive polynomials are satisfied automatically. Consequently, the constrained trajectory generation problem (both with and without time optimization) can be converted into a discrete-time finite-horizon optimal control problem with inequality constraints, which can be approached by the recently developed interior-point DDP (IPDDP) algorithm. Furthermore, for unconstrained trajectory generation with preallocated time, we show that this problem is in fact a linear-quadratic tracking (LQT) problem, which DDP solves in exactly one iteration. All of these algorithms enjoy linear complexity with respect to the number of segments. Both numerical comparisons with state-of-the-art methods and physical experiments are presented to verify and validate the effectiveness of our theoretical findings. The implementation code will be open-sourced.
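A minimal sketch of the state-space viewpoint (dimensions and inputs are assumptions, not the paper's IPDDP implementation): a discrete-time triple integrator whose piecewise-constant jerk input generates cubic polynomial segments, with continuity of position, velocity, and acceleration enforced automatically by the dynamics rather than by explicit constraints.

```python
import numpy as np

# state x = (position, velocity, acceleration); input u = jerk on a segment.
# Integrating constant jerk over a segment of duration T yields a cubic
# polynomial, and consecutive segments are C^2-continuous by construction.
def step(x, u, T):
    Phi = np.array([[1.0, T, T**2 / 2],
                    [0.0, 1.0, T],
                    [0.0, 0.0, 1.0]])
    Gamma = np.array([T**3 / 6, T**2 / 2, T])
    return Phi @ x + Gamma * u

x = np.zeros(3)
for u, T in [(1.0, 0.5), (-0.5, 1.0), (0.2, 0.8)]:  # three segments
    x = step(x, u, T)
print(x)  # endpoint state after three C^2-continuous cubic segments
```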
The tube method, or volume-of-tube method, approximates the tail probability of the maximum of a smooth Gaussian random field with zero mean and unit variance. It evaluates the volume of a spherical tube about the index set and then transforms it into the tail probability. In this study, we generalize the tube method to the case in which the variance is not constant. We provide the volume formula for a spherical tube with non-constant radius in terms of curvature tensors, the tail probability formula for the maximum of a Gaussian random field with inhomogeneous variance, and its Laplace approximation. In particular, the critical radius of the tube is generalized to evaluate the asymptotic approximation error. As an example, we discuss the approximation of the largest eigenvalue distribution of a Wishart matrix with a non-identity matrix parameter. The Bonferroni method coincides with the tube method when the index set is finite. We provide a formula for the asymptotic approximation error of the Bonferroni method when the variance is not constant.
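For the finite index set case mentioned at the end, the Bonferroni approximation with non-constant variance is simply a sum of Gaussian tail probabilities. A small numerical check (variances are arbitrary; the Monte Carlo comparison uses independent coordinates only for simplicity):

```python
import numpy as np
from scipy.stats import norm

# P(max_i X_i > c) for X_i ~ N(0, sigma_i^2): Bonferroni sum of marginal
# tails versus an empirical estimate (independent coordinates assumed here)
sigma = np.array([1.0, 0.8, 0.5])
c = 2.5
bonf = np.sum(norm.sf(c / sigma))            # sum of marginal tail probabilities
X = np.random.randn(200_000, 3) * sigma
print(bonf, np.mean(X.max(axis=1) > c))      # bound vs. empirical value
```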
Deep metric learning (DML) is a cornerstone of many computer vision applications. It aims to learn a mapping from the input domain to an embedding space in which semantically similar objects are located nearby and dissimilar objects far from one another. The target similarity on the training data is defined by the user in the form of ground-truth class labels. However, while the embedding space learns to mimic the user-provided similarity on the training data, it should also generalize to novel categories not seen during training. Besides the user-provided ground-truth labels, many additional visual factors (such as viewpoint changes or shape peculiarities) exist and imply different notions of similarity between objects, affecting generalization to images unseen during training. Existing approaches usually learn a single embedding space directly on all available training data, struggle to encode all these different types of relationships, and do not generalize well. We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts. We successively focus on smaller subsets of the training data, reducing its variance and learning a different embedding subspace for each subset. Moreover, the subspaces are learned jointly to cover not only the intricacies but also the breadth of the data. Only then, in the conquering stage, do we build the final embedding from the subspaces. The proposed algorithm acts as a transparent wrapper that can be placed around arbitrary existing DML methods. Our approach significantly improves upon the state of the art on image retrieval, clustering, and re-identification tasks, as evaluated on the CUB200-2011, CARS196, Stanford Online Products, In-shop Clothes, and PKU VehicleID datasets.
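A schematic sketch of the divide step (names, dimensions, and the particular loss are placeholder assumptions, not the paper's implementation): the embedding dimensions are partitioned into K subspaces, and each subspace is trained with a contrastive-style loss on its own subset of the data before the slices are recombined.

```python
import torch
import torch.nn.functional as F

D, K = 128, 4            # embedding size, number of subspaces (assumed)
d = D // K

def subspace_loss(emb, labels, k):
    """Contrastive-style loss on the k-th slice of the embedding,
    intended to be fed only samples from the k-th data subset."""
    sub = F.normalize(emb[:, k * d:(k + 1) * d], dim=1)
    sim = sub @ sub.T
    pos = (labels[:, None] == labels[None, :]).float()
    return ((1 - sim) * pos + F.relu(sim - 0.5) * (1 - pos)).mean()

def final_embedding(emb):
    """Conquering stage stand-in: the full embedding, re-normalized,
    so all subspaces contribute to retrieval."""
    return F.normalize(emb, dim=1)
```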
A widely used approach to compute the action $f(A)v$ of a matrix function $f(A)$ on a vector $v$ is to use a rational approximation $r$ of $f$ and compute $r(A)v$ instead. If $r$ is not computed adaptively, as in rational Krylov methods, this is usually done using the partial fraction expansion of $r$ and solving linear systems with the matrices $A - \tau I$ for the various poles $\tau$ of $r$. Here we investigate an alternative approach for the case that a continued fraction representation of the rational function is known rather than a partial fraction expansion. This is typically the case, for example, for Pad\'e approximations. From the continued fraction, we first construct a matrix pencil, from which we then obtain what we call the CF-matrix (continued fraction matrix), a block tridiagonal matrix whose blocks consist of polynomials of $A$, with degree bounded by 1 for many continued fractions. We show that one can evaluate $r(A)v$ by solving a single linear system with the CF-matrix, and we present first theoretical results as a basis for the analysis of future, specific solution methods for the large linear system. While the CF-matrix approach is of interest in its own right as a new way to compute $f(A)v$, it can be particularly beneficial when a partial fraction expansion is not known beforehand and computing its parameters is ill-conditioned. We report numerical experiments which show that, with standard preconditioners, we can achieve fast convergence in the iterative solution of the large linear system.
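For contrast with the single-system CF-matrix solve, here is the naive bottom-up evaluation of $r(A)v$ from a continued fraction $r(z) = b_0(z) + a_1(z)/(b_1(z) + a_2(z)/(\cdots))$, which requires a nested sequence of matrix inversions (the coefficient functions are placeholders supplied by the caller):

```python
import numpy as np

def cf_apply(A, v, a_funcs, b_funcs):
    """Naive bottom-up evaluation of r(A) v for
    r(z) = b0(z) + a1(z)/(b1(z) + a2(z)/(...)); each a_k, b_k maps A to a
    matrix (degree <= 1 polynomials of A for many continued fractions).
    Forms explicit inverses at every level, unlike a single CF-matrix solve."""
    D = b_funcs[-1](A)                                # innermost denominator
    for ak, bk in zip(a_funcs[:0:-1], b_funcs[-2:0:-1]):
        D = bk(A) + ak(A) @ np.linalg.inv(D)          # work outward
    return b_funcs[0](A) @ v + a_funcs[0](A) @ np.linalg.solve(D, v)
```

The nested inverses are exactly what makes this route unattractive for large sparse $A$, which motivates trading it for one (larger, block tridiagonal) linear system.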
We consider the use of Gaussian process (GP) priors for solving inverse problems in a Bayesian framework. As is well known, the computational complexity of GPs scales cubically in the number of data points. We here show that in the context of inverse problems involving integral operators, one faces additional difficulties that hinder inversion on large grids. Furthermore, in that context, covariance matrices can become too large to be stored. By leveraging results about sequential disintegrations of Gaussian measures, we are able to introduce an implicit representation of posterior covariance matrices that reduces the memory footprint by only storing low-rank intermediate matrices, while allowing individual elements to be accessed on the fly without needing to build full posterior covariance matrices. Moreover, it allows for fast sequential inclusion of new observations. These features are crucial when considering sequential experimental design tasks. We demonstrate our approach by computing sequential data collection plans for excursion set recovery for a gravimetric inverse problem, where the goal is to provide fine resolution estimates of high density regions inside the Stromboli volcano, Italy. Sequential data collection plans are computed by extending the weighted integrated variance reduction (wIVR) criterion to inverse problems. Our results show that this criterion is able to significantly reduce the uncertainty on the excursion volume, reaching close to minimal levels of residual uncertainty. Overall, our techniques allow the advantages of probabilistic models to be brought to bear on large-scale inverse problems arising in the natural sciences.
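A minimal sketch of the implicit-representation idea (a plain Gaussian conditioning update, not the paper's disintegration machinery): each batch of linear observations contributes one low-rank factor, the full posterior covariance matrix is never assembled, and single entries are computed on the fly.

```python
import numpy as np

class ImplicitPosterior:
    """Posterior covariance stored implicitly as the prior minus one
    low-rank term per assimilated batch of observations (illustration)."""
    def __init__(self, K):
        self.K = K           # prior covariance (could itself be given entrywise)
        self.factors = []    # low-rank factors R with C = K - sum_R R @ R.T

    def update(self, G, noise_var):
        """Assimilate observations y = G f + eps, eps ~ N(0, noise_var * I)."""
        KG = self.K @ G.T
        for R in self.factors:            # cross-covariance of the *current*
            KG -= R @ (R.T @ G.T)         # posterior with the new data
        S = G @ KG + noise_var * np.eye(G.shape[0])
        L = np.linalg.cholesky(S)
        self.factors.append(np.linalg.solve(L, KG.T).T)   # KG @ L^{-T}

    def element(self, i, j):
        """Single posterior covariance entry, assembled on the fly."""
        return self.K[i, j] - sum(R[i] @ R[j] for R in self.factors)
```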
We introduce a new method for analyzing the cumulative sum (CUSUM) procedure in sequential change-point detection. When observations are phase-type distributed and the post-change distribution is given by exponential tilting of the pre-change distribution, the first passage analysis of the CUSUM statistic reduces to that of a certain Markov additive process. Using the theory of the so-called scale matrix, and further developing it, we derive exact expressions for the average run length, average detection delay, and false alarm probability under the CUSUM procedure. The proposed method is robust and applicable in a general setting with non-i.i.d. observations. Numerical results are also given.
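As a concrete instance of the setting (a sketch, with the phase-type structure suppressed): under exponential tilting the post-change density is $f_\theta(x) = e^{\theta x - \psi(\theta)} f(x)$, so the log-likelihood ratio of a single observation is $\theta x - \psi(\theta)$, and the CUSUM statistic follows the usual reflected recursion.

```python
import numpy as np

def cusum_alarm(xs, theta, psi_theta, threshold):
    """CUSUM recursion s_t = max(0, s_{t-1} + theta*x_t - psi(theta));
    returns the first time the statistic crosses the threshold."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + theta * x - psi_theta)
        if s >= threshold:
            return t
    return None

# pre-change Exp(1), tilt theta = 0.5 (so psi(theta) = -log(1 - theta));
# the tilted (post-change) distribution is Exp(1 - theta), i.e. mean 2
rng = np.random.default_rng(0)
xs = np.concatenate([rng.exponential(1.0, 300),      # pre-change samples
                     rng.exponential(2.0, 200)])     # post-change samples
print(cusum_alarm(xs, theta=0.5, psi_theta=-np.log(0.5), threshold=8.0))
```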
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete, or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available at all. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
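A schematic of the idea (a differentiable relaxation sketch, not the paper's bilevel algorithm, which samples graphs from the learned distribution): per-edge Bernoulli probabilities are trainable parameters optimized jointly with the GCN weights.

```python
import torch
import torch.nn as nn

class LearnedGraphGCN(nn.Module):
    """One GCN layer whose (dense) adjacency is parameterized by per-edge
    Bernoulli probabilities; the expected adjacency serves as a relaxation."""
    def __init__(self, n_nodes, d_in, d_out):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, X):
        A = torch.sigmoid(self.edge_logits)       # expected adjacency
        A = (A + A.T) / 2                         # keep the graph undirected
        deg = A.sum(dim=1).clamp(min=1e-6)
        A_hat = A / deg.sqrt()[:, None] / deg.sqrt()[None, :]
        return torch.relu(self.lin(A_hat @ X))

# both the edge distribution and the GCN weights receive gradients
model = LearnedGraphGCN(n_nodes=50, d_in=16, d_out=32)
out = model(torch.randn(50, 16))
```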
In recent years, sequence-to-sequence neural networks with attention mechanisms have achieved great progress. However, challenges remain, especially for Neural Machine Translation (NMT), such as lower translation quality on long sentences. In this paper, we present a hierarchical deep neural network architecture to improve the translation quality of long sentences. The proposed network embeds sequence-to-sequence neural networks into a two-level category hierarchy following the coarse-to-fine paradigm. Long sentences are split into shorter sequences, which can be handled well by the coarse category network, since the long-distance dependencies within short sentences are manageable for a sequence-to-sequence network. The partial translations are then concatenated and corrected by the fine category network. The experiments show that our method achieves superior results, with higher BLEU (Bilingual Evaluation Understudy) scores, lower perplexity, and better performance in imitating expression style and word usage than traditional networks.
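The pipeline described above can be summarized in a few lines (coarse_model and fine_model are placeholders for the two trained sequence-to-sequence networks; the splitting rule is an assumption):

```python
import re

def split_into_segments(sentence, max_words=20):
    """Split at clause punctuation, then fall back to fixed-size windows."""
    chunks = []
    for part in re.split(r'(?<=[,;:])\s+', sentence):
        words = part.split()
        chunks += [' '.join(words[i:i + max_words])
                   for i in range(0, len(words), max_words)]
    return chunks

def translate_long(sentence, coarse_model, fine_model):
    # the coarse network translates the short segments ...
    drafts = [coarse_model.translate(c) for c in split_into_segments(sentence)]
    # ... and the fine network corrects the concatenated draft
    return fine_model.correct(' '.join(drafts))
```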
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
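A generic single-pass instance of this idea (a Halko/Tropp-style reconstruction from a two-sided random sketch; the parameter choices below are common heuristics, not the paper's tuned recommendations):

```python
import numpy as np

def low_rank_from_sketch(A, r, p=10):
    """Rank-r approximation of A built only from the two sketches
    Y = A @ Omega and W = Psi @ A (a single pass over A)."""
    m, n = A.shape
    k, l = r + p, 2 * (r + p) + 1            # sketch sizes (heuristic)
    Omega = np.random.randn(n, k)
    Psi = np.random.randn(l, m)
    Y, W = A @ Omega, Psi @ A                # the only accesses to A
    Q, _ = np.linalg.qr(Y)                   # orthonormal range estimate
    X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]   # A ~= Q @ X
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Q @ U[:, :r], s[:r], Vt[:r]       # rank-r factors
```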