We approximate the d complex zeros of a univariate polynomial p(x) of degree d, or those zeros that lie in a fixed region of interest in the complex plane, such as a disc or a square. Our divide-and-conquer algorithm of STOC 1995 solves this problem in optimal Boolean time (up to a poly-logarithmic factor), that is, it runs nearly as fast as one can access the coefficients of p with the precision necessary to support the required accuracy of the output. That record complexity has not been matched by any other algorithm yet, but our root-finder of 1995 is quite involved and has never been implemented. We present alternative nearly optimal root-finders based on our novel variants of the classical subdivision iterations. Unlike our predecessor of 1995, we require randomization of Las Vegas type, which allows us to detect any output error at a dominated computational cost, but our new root-finders are much simpler to implement than their predecessor of 1995. According to the results of extensive tests with standard test polynomials of their preliminary version, which incorporates only a part of our novel techniques, the new root-finders compete with, and for a large class of inputs significantly supersede, the package of root-finding subroutines MPSolve, which for decades has been the package of user's choice. Unlike our predecessor of 1995 and all known fast algorithms for the cited tasks of polynomial root-finding, our new algorithms can also be applied to a polynomial given by a black-box oracle for its evaluation rather than by its coefficients. This makes our root-finders particularly efficient for polynomials p(x) that can be evaluated fast, such as the Mandelbrot polynomials or those given by the sum of a small number of shifted monomials. Our algorithms can readily be extended to fast approximation of the eigenvalues of a matrix or a matrix polynomial.
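To make the subdivision paradigm concrete, here is a heavily simplified sketch in Python (an illustration, not the authors' certified algorithm): a quadtree search over a square in which a subsquare is kept only if a numerical winding-number test on its covering disc suggests it contains a root. The test only evaluates p, in the black-box spirit of the abstract; the sample count and the tolerance `tol` are illustrative choices.

```python
import numpy as np

def winding_number(p, center, radius, samples=512):
    """Approximate root count of p in the disc |z - center| < radius via the
    argument principle; p is a black-box evaluation routine."""
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    vals = p(center + radius * np.exp(1j * t))
    turns = np.angle(np.roll(vals, -1) / vals)  # phase increments along the circle
    return int(round(turns.sum() / (2.0 * np.pi)))

def subdivide(p, center, half_width, tol=1e-8):
    """Approximate the roots of p inside the axis-aligned square with the given
    center and half-width by quadtree subdivision with an exclusion test."""
    if winding_number(p, center, half_width * np.sqrt(2.0)) == 0:
        return []          # exclusion: the disc covering the square has no roots
    if half_width < tol:
        return [center]    # square is small enough: report its center
    h = half_width / 2.0
    return [r for dx in (-h, h) for dy in (-h, h)
            for r in subdivide(p, center + complex(dx, dy), h, tol)]

# Roots of p(x) = x^3 - 1 inside [-2, 2]^2; near-duplicates can occur where
# covering discs of neighboring squares overlap.
p = lambda z: z**3 - 1
print(subdivide(p, 0j, 2.0))
```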
We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where no structural assumption is made on the system matrix $A^* \in \mathbb{R}^{n \times n}$, and have consequently analyzed the ordinary least squares (OLS) estimator in detail. In contrast, we assume that prior structural information on $A^*$ is available, which can be captured in the form of a convex set $\mathcal{K}$ containing $A^*$. For the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $\mathcal{K}$ at $A^*$. To illustrate the usefulness of these results, we instantiate them for three examples, namely when (i) $A^*$ is sparse and $\mathcal{K}$ is a suitably scaled $\ell_1$ ball; (ii) $\mathcal{K}$ is a subspace; (iii) $\mathcal{K}$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression). In all these situations, we show that $A^*$ can be reliably estimated for values of $T$ much smaller than what is needed in the unconstrained setting.
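As a minimal illustration (not the paper's code) of the constrained least squares estimator, the sketch below instantiates example (i): projected gradient descent for the trajectory least squares loss over a scaled $\ell_1$ ball. The radius `s`, step size, and iteration count are illustrative choices.

```python
import numpy as np

def project_l1(v, s):
    """Euclidean projection of the matrix v onto the l1 ball of radius s."""
    u = np.abs(v).ravel()
    if u.sum() <= s:
        return v
    w = np.sort(u)[::-1]
    css = np.cumsum(w)
    k = np.nonzero(w * np.arange(1, len(w) + 1) > css - s)[0][-1]
    theta = (css[k] - s) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def constrained_ls(X, s, iters=500):
    """X stacks x_0, ..., x_T as rows; projected gradient descent for the
    loss sum_t ||x_{t+1} - A x_t||^2 over the set {A : ||A||_1 <= s}."""
    Xp, Xc = X[1:], X[:-1]
    A = np.zeros((X.shape[1], X.shape[1]))
    step = 1.0 / np.linalg.norm(Xc, 2) ** 2
    for _ in range(iters):
        grad = (A @ Xc.T - Xp.T) @ Xc     # gradient of the loss, up to a factor 2
        A = project_l1(A - step * grad, s)
    return A

# toy usage: sparse stable ground truth, noise-driven trajectory, oracle radius
rng = np.random.default_rng(0)
n, T = 20, 200
A_true = np.zeros((n, n))
A_true[rng.integers(0, n, 30), rng.integers(0, n, 30)] = 1.0
A_true *= 0.5 / np.linalg.norm(A_true, 2)   # rescale so the dynamics are stable
X = np.zeros((T + 1, n))
for t in range(T):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(n)
A_hat = constrained_ls(X, s=np.abs(A_true).sum())
print(np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```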
The matrix factor model has drawn growing attention for its advantage in achieving two-directional dimension reduction simultaneously for matrix-structured observations. In this paper, we propose a simple iterative least squares algorithm for matrix factor models, in contrast to the Principal Component Analysis (PCA)-based methods in the literature. In detail, we first propose to estimate the latent factor matrices by projecting the observations with two deterministic weight matrices, which are chosen to diversify away the idiosyncratic components. We show that the inferences on the factors remain asymptotically valid even if we overestimate both the row and column factor numbers. We then estimate the row/column loading matrices by minimizing the squared loss function under certain identifiability conditions. The resulting estimators of the loading matrices are then treated as the new weight/projection matrices, so that the above update procedure can be performed iteratively until convergence. Theoretically, given the true dimensions of the factor matrices, we derive the convergence rates of the estimators for the loading matrices and common components at any iteration step $s$. Additionally, we propose an eigenvalue-ratio method to estimate the pair of factor numbers consistently. Thorough numerical simulations are conducted to investigate the finite-sample performance of the proposed methods, and two real datasets associated with financial portfolios and multinational macroeconomic indices are used to illustrate our algorithm's practical usefulness.
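A hedged sketch of such an iterative least squares scheme for $X_t = R F_t C' + E_t$ (with the factor dimensions $k, r$ assumed known) is given below; the deterministic initial weight matrices of the paper are replaced by a simple PCA-style initialization purely for illustration.

```python
import numpy as np

def iterative_ls(X, k, r, iters=50):
    """X has shape (T, p, q); alternate between projecting out the factors and
    least squares updates of the loading matrices R (p x k) and C (q x r)."""
    # crude initial weights: top eigenvectors of the row/column covariances
    R = np.linalg.eigh(sum(x @ x.T for x in X))[1][:, -k:]
    C = np.linalg.eigh(sum(x.T @ x for x in X))[1][:, -r:]
    for _ in range(iters):
        # estimate the latent factor matrices by projecting the observations
        F = np.array([np.linalg.pinv(R) @ x @ np.linalg.pinv(C).T for x in X])
        # least squares update of the row loading matrix ...
        R = sum(x @ C @ f.T for x, f in zip(X, F)) @ np.linalg.pinv(
            sum(f @ C.T @ C @ f.T for f in F))
        # ... and of the column loading matrix
        C = sum(x.T @ R @ f for x, f in zip(X, F)) @ np.linalg.pinv(
            sum(f.T @ R.T @ R @ f for f in F))
    return R, C, F

# toy usage: T = 100 observations of size 20 x 15 with k = 3, r = 2 factors
rng = np.random.default_rng(0)
R0, C0 = rng.standard_normal((20, 3)), rng.standard_normal((15, 2))
X = np.array([R0 @ rng.standard_normal((3, 2)) @ C0.T
              + 0.1 * rng.standard_normal((20, 15)) for _ in range(100)])
R, C, F = iterative_ls(X, 3, 2)
print(np.linalg.norm(X[0] - R @ F[0] @ C.T) / np.linalg.norm(X[0]))
```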
Using techniques developed recently in the field of compressed sensing, we prove new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In particular, we show that in relevant cases, such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure from $m$ function values based on $\ell^1$-minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting, we see an additional speed-up of $m^{-1/2}$ (up to log factors) compared to linear methods in the case of weighted Wiener spaces. For their quasi-Banach counterparts, even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows us to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.
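To illustrate the recovery procedure, the sketch below solves a LASSO surrogate of basis pursuit denoising by iterative soft thresholding (ISTA) to recover sparse trigonometric coefficients from m random function samples; the frequency range, sample count, and penalty `lam` are illustrative choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.arange(-16, 17)                   # candidate trigonometric frequencies
m = 40                                       # number of function samples
x = rng.uniform(0.0, 1.0, m)                 # random sampling nodes in [0, 1)
A = np.exp(2j * np.pi * x[:, None] * freqs)  # matrix of sampled exponentials

f = lambda t: np.exp(2j*np.pi*3*t) - 0.5*np.exp(-2j*np.pi*7*t)  # sparse target
y = f(x)

def ista(A, y, lam=1e-3, iters=2000):
    """Minimize ||A c - y||^2 / 2 + lam ||c||_1 by proximal gradient steps."""
    c = np.zeros(A.shape[1], dtype=complex)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        g = c - step * A.conj().T @ (A @ c - y)
        c = np.maximum(np.abs(g) - step * lam, 0.0) * np.exp(1j * np.angle(g))
    return c

c = ista(A, y)
print(freqs[np.abs(c) > 0.1])   # should report the active frequencies -7 and 3
```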
Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that are present in a dataset with a frequency higher than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, allowing for valuable insights and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new subgroup discovery (SD) algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) in a multi-criteria approach to pattern selection. The algorithm tackles some limitations of state-of-the-art SD algorithms, such as the need to fine-tune key parameters for each dataset, the use of a single hand-set pattern search criterion, the use of non-overlapping data structures for subgroup space exploration, and the inability to search for patterns by fixing some relevant dataset variables. We compare the performance of IGSD with that of two state-of-the-art SD algorithms, FSSD and SSD++, on eleven datasets. For the performance evaluation, we also propose to complement standard SD measures with IG, OR, and the p-value. The results show that the FSSD and SSD++ algorithms provide less reliable patterns and smaller sets of patterns than IGSD for all datasets considered. Additionally, IGSD provides better OR values than FSSD and SSD++, indicating a stronger dependence between patterns and targets. Moreover, the patterns obtained for one of the datasets have been validated by a group of domain experts, and the patterns provided by IGSD show better agreement with the experts than those obtained by the FSSD and SSD++ algorithms. These results demonstrate the suitability of IGSD as a method for pattern discovery and suggest that including non-standard SD metrics allows discovered patterns to be evaluated more adequately.
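For concreteness, here is a minimal sketch of the two selection criteria that IGSD combines, computed for a candidate pattern encoded as a boolean membership mask over the samples; this encoding and the zero-cell correction `eps` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def entropy(y):
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def information_gain(mask, y):
    """IG of splitting the binary target y by pattern membership."""
    w = mask.mean()
    return entropy(y) - w * entropy(y[mask]) - (1 - w) * entropy(y[~mask])

def odds_ratio(mask, y, eps=0.5):
    """OR of the 2x2 table (pattern holds) x (target positive), with a
    Haldane-style eps correction against zero cells."""
    a = np.sum(mask & (y == 1)) + eps   # pattern holds, target positive
    b = np.sum(mask & (y == 0)) + eps   # pattern holds, target negative
    c = np.sum(~mask & (y == 1)) + eps  # pattern fails, target positive
    d = np.sum(~mask & (y == 0)) + eps  # pattern fails, target negative
    return (a * d) / (b * c)

y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
mask = np.array([True, True, True, False, False, False, True, True])
print(information_gain(mask, y), odds_ratio(mask, y))
```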
In many applications, it is desirable to obtain extreme eigenvalues and eigenvectors of large Hermitian matrices by efficient and compact algorithms. In particular, orthogonalization-free methods are preferred for large-scale problems, as they find eigenspaces of extreme eigenvalues without explicitly computing orthogonal vectors in each iteration. For the top $p$ eigenvalues, the simplest orthogonalization-free method is to find the best rank-$p$ approximation to a positive semi-definite Hermitian matrix by algorithms solving the unconstrained Burer-Monteiro formulation. We show that the nonlinear conjugate gradient method for the unconstrained Burer-Monteiro formulation is equivalent to a Riemannian conjugate gradient method on a quotient manifold with the Bures-Wasserstein metric, and thus its global convergence to a stationary point can be proven. Numerical tests suggest that it is efficient for computing the largest $p$ eigenvalues of large-scale matrices when those eigenvalues are distributed nearly uniformly.
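The sketch below illustrates the orthogonalization-free idea on a real symmetric positive semi-definite matrix: nonlinear conjugate gradient (a Polak-Ribiere variant with backtracking line search, our illustrative choices) applied to the unconstrained Burer-Monteiro objective $f(Y) = \|A - YY^T\|_F^2$; no orthogonalization of the iterates is performed.

```python
import numpy as np

def bm_cg(A, p, iters=200):
    """Nonlinear CG on f(Y) = ||A - Y Y^T||_F^2 for the top-p eigenspace."""
    n = A.shape[0]
    Y = np.random.default_rng(1).standard_normal((n, p)) / np.sqrt(n)
    f = lambda Y: np.linalg.norm(A - Y @ Y.T, 'fro') ** 2
    grad = lambda Y: 4.0 * (Y @ (Y.T @ Y) - A @ Y)
    g = grad(Y)
    d = -g
    for _ in range(iters):
        t = 1.0
        while f(Y + t * d) > f(Y) + 1e-4 * t * np.sum(g * d):  # Armijo backtracking
            t *= 0.5
        Y = Y + t * d
        g_new = grad(Y)
        beta = max(0.0, np.sum(g_new * (g_new - g)) / np.sum(g * g))  # PR+
        d = -g_new + beta * d
        if np.sum(g_new * d) >= 0:    # restart if not a descent direction
            d = -g_new
        g = g_new
    return Y

M = np.random.default_rng(2).standard_normal((50, 50))
A = M @ M.T                                         # PSD test matrix
Y = bm_cg(A, 3)
print(np.sort(np.linalg.eigvalsh(Y.T @ Y))[::-1])   # ~ top-3 eigenvalues printed below
print(np.linalg.eigvalsh(A)[-3:][::-1])
```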
Distribution data refers to data sets in which each sample is represented as a probability distribution, a subject area receiving burgeoning interest in the field of statistics. Although several studies have developed distribution-to-distribution regression models for univariate variables, the multivariate scenario remains under-explored due to technical complexities. In this study, we introduce models for regression from one Gaussian distribution to another, utilizing the Wasserstein metric. These models are constructed using the geometry of the Wasserstein space, which enables the transformation of Gaussian distributions into components of a linear matrix space. Owing to their linear regression frameworks, our models are intuitively understandable, and their implementation is simplified by the analytical solution of the optimal transport problem between Gaussian distributions. We also explore a generalization of our models to encompass non-Gaussian scenarios. We establish the convergence rates of in-sample prediction errors for empirical risk minimization in our models. In comparative simulation experiments, our models demonstrate superior performance over a simpler alternative method that transforms Gaussian distributions into matrices. We present an application of our methodology using weather data for illustration purposes.
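The analytic formulas that make such implementations simple can be stated compactly: the sketch below computes the 2-Wasserstein distance between Gaussians and the tangent-space (log-map) coordinates $T - I$ of one Gaussian at another, where $T$ is the optimal transport map; these coordinates live in a linear matrix space, which is the transformation the models exploit. The covariance matrices are toy values.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m0, S0, m1, S1):
    """2-Wasserstein distance between N(m0, S0) and N(m1, S1)."""
    cross = sqrtm(sqrtm(S0) @ S1 @ sqrtm(S0)).real
    return np.sqrt(np.sum((m0 - m1) ** 2) + np.trace(S0 + S1 - 2 * cross))

def log_map(S0, S1):
    """Linear coordinates of N(0, S1) at the base N(0, S0): T - I, where
    T = S0^{-1/2} (S0^{1/2} S1 S0^{1/2})^{1/2} S0^{-1/2} is the OT map."""
    r0 = sqrtm(S0).real
    r0_inv = np.linalg.inv(r0)
    T = r0_inv @ sqrtm(r0 @ S1 @ r0).real @ r0_inv
    return T - np.eye(S0.shape[0])

S0 = np.array([[2.0, 0.3], [0.3, 1.0]])
S1 = np.array([[1.0, -0.2], [-0.2, 1.5]])
print(w2_gaussian(np.zeros(2), S0, np.zeros(2), S1))
print(log_map(S0, S1))   # matrix coordinates usable in a linear regression model
```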
We study a portioning setting in which a public resource such as time or money is to be divided among a given set of candidates, and each agent proposes a division of the resource. We consider two families of aggregation rules for this setting, those based on coordinate-wise aggregation and those that optimize some notion of welfare, as well as the recently proposed Independent Markets mechanism. We provide a detailed analysis of these rules from an axiomatic perspective, both for classic axioms, such as strategyproofness and Pareto optimality, and for novel axioms, which aim to capture proportionality in this setting. Our results indicate that a simple rule that computes the average of all proposals satisfies many of our axioms, including some that are violated by more sophisticated rules.
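As a toy illustration of the simple rule highlighted at the end of the abstract, the snippet below averages three (made-up) proposed divisions coordinate-wise; since each proposal sums to 1, so does the average, so the rule always outputs a valid division.

```python
import numpy as np

proposals = np.array([
    [0.5, 0.3, 0.2],   # agent 1's proposed split over three candidates
    [0.0, 0.6, 0.4],   # agent 2
    [1.0, 0.0, 0.0],   # agent 3
])
average_rule = proposals.mean(axis=0)
print(average_rule, average_rule.sum())   # the output still sums to 1
```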
We are interested in creating statistical methods to provide informative summaries of random fields through the geometry of their excursion sets. To this end, we introduce an estimator for the length of the perimeter of excursion sets of random fields on $\mathbb{R}^2$ observed over regular square tilings. The proposed estimator acts on the empirically accessible binary digital images of the excursion regions and computes the length of a piecewise linear approximation of the excursion boundary. The estimator is shown to be consistent as the pixel size decreases, without the need for any normalization constant and with no assumption of Gaussianity or isotropy imposed on the underlying random field. In this general framework, even when the domain grows to cover $\mathbb{R}^2$, the estimation error is shown to be of smaller order than the side length of the domain. For affine, strongly mixing random fields, this translates to a multivariate Central Limit Theorem for our estimator when multiple levels are considered simultaneously. Finally, we conduct several numerical studies to investigate the statistical properties of the proposed estimator in the finite-sample setting.
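In the spirit of the proposed estimator (though not its exact construction), the sketch below computes the length of a piecewise linear boundary approximation from a binary digital image by summing marching-squares segment lengths over all 2x2 pixel blocks; the segment conventions are illustrative, and simple digital estimators of this form carry a small directional bias that the paper's estimator is designed to avoid.

```python
import numpy as np

def perimeter_length(binary, pixel_size=1.0):
    """Perimeter of an excursion set from its binary image (True = above level)."""
    b = binary.astype(int)
    corner = np.sqrt(2.0) / 2.0   # one (or three) corner(s) differ: one diagonal cut
    total = 0.0
    for i in range(b.shape[0] - 1):
        for j in range(b.shape[1] - 1):
            s = b[i, j] + b[i, j + 1] + b[i + 1, j] + b[i + 1, j + 1]
            if s in (1, 3):
                total += corner
            elif s == 2:
                # equal diagonal corners: two diagonal cuts; else one straight cut
                total += 2 * corner if b[i, j] == b[i + 1, j + 1] else 1.0
    return total * pixel_size

# sanity check on a disc of radius r: the estimate should be close to 2*pi*r
n, r = 400, 0.3
x = np.linspace(0, 1, n)
xx, yy = np.meshgrid(x, x)
disc = (xx - 0.5) ** 2 + (yy - 0.5) ** 2 < r ** 2
print(perimeter_length(disc, pixel_size=1 / (n - 1)), 2 * np.pi * r)
```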
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to the concepts and mathematical tools of numerical linear algebra and matrix analysis, in order to seamlessly introduce matrix decomposition techniques and their applications in the subsequent sections. However, we cannot hope to cover all the useful and interesting results concerning matrix decomposition within this limited scope; for example, we omit a separate analysis of the Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to the literature on linear algebra for a more detailed introduction to these related fields.
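For readers who have not seen it, here is a small worked example of the LU decomposition mentioned above: Doolittle elimination factors A into a unit lower triangular L and an upper triangular U; pivoting is omitted purely for brevity, and the test matrix is arbitrary.

```python
import numpy as np

def lu(A):
    """Doolittle LU decomposition without pivoting: A = L @ U."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]     # multiplier that eliminates U[i, k]
            U[i, k:] -= L[i, k] * U[k, k:]  # row update keeps U upper triangular
    return L, U

A = np.array([[4.0, 3.0, 2.0], [8.0, 8.0, 5.0], [4.0, 7.0, 9.0]])
L, U = lu(A)
print(np.allclose(L @ U, A))   # True: A = LU
```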
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
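A compact instance of the two-stage recipe, on the canonical matrix sensing problem: spectral initialization from the measurements, followed by vanilla gradient descent on the nonconvex factored objective. The dimensions, sample size, and step size below are illustrative choices, not prescriptions from the article.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 30, 2, 900
Xstar = rng.standard_normal((n, r))
M = Xstar @ Xstar.T                           # ground-truth rank-r PSD matrix
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2            # symmetric Gaussian sensing matrices
y = np.einsum('kij,ij->k', A, M)              # measurements y_k = <A_k, M>

# Stage 1: spectral initialization from the top-r eigenpairs of (1/m) sum y_k A_k
S = np.einsum('k,kij->ij', y, A) / m
vals, vecs = np.linalg.eigh(S)
X = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Stage 2: gradient descent on f(X) = (1/4m) sum_k (<A_k, X X^T> - y_k)^2
step = 0.2 / np.linalg.norm(S, 2)             # ||S|| as a surrogate for ||M||
for _ in range(300):
    res = np.einsum('kij,ij->k', A, X @ X.T) - y
    X = X - step * np.einsum('k,kij->ij', res, A) @ X / m

print(np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))  # small relative error
```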