In this paper, we derive a Toeplitz-structured closed form of the unique positive semi-definite stabilizing solution of the discrete-time algebraic Riccati equation, in particular for the case where the state matrix is not stable. Based on this closed form and the fast Fourier transform, we propose a new algorithm for solving both discrete-time and continuous-time large-scale algebraic Riccati equations with low-rank structure. It works without unnecessary assumptions, complicated shift-selection strategies, or matrix computations of cubic order in the problem size. Numerical examples are given to illustrate its features. Moreover, we show that it is theoretically equivalent to several existing algorithms in the literature, in the sense that they all produce the same sequence under the same parameter setting.
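For concreteness, here is a minimal sketch of the DARE being solved and its unique positive semi-definite stabilizing solution on a small low-rank instance, computed with SciPy's dense solver as a reference point. The instance sizes and names are illustrative, and the FFT-based Toeplitz algorithm itself is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Small DARE instance: A'XA - X - A'XB (R + B'XB)^{-1} B'XA + Q = 0,
# with low-rank Q = C'C mirroring the low-rank structure in the abstract.
n, m, p = 8, 2, 2
rng = np.random.default_rng(0)
A = 1.1 * rng.standard_normal((n, n)) / np.sqrt(n)  # typically not stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Q = C.T @ C
R = np.eye(m)

X = solve_discrete_are(A, B, Q, R)  # unique PSD stabilizing solution

# Check the residual and that the closed-loop matrix A - B K is stable.
K = np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)
residual = A.T @ X @ A - X - (A.T @ X @ B) @ K + Q
print(np.linalg.norm(residual))                # ~ machine precision
print(max(abs(np.linalg.eigvals(A - B @ K))))  # spectral radius < 1
```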
In many applications, linear systems arise where the coefficient matrix takes the special form ${\bf I} + {\bf K} + {\bf E}$, where ${\bf I}$ is the identity matrix of dimension $n$, ${\rm rank}({\bf K}) = p \ll n$, and $\|{\bf E}\| \leq \epsilon < 1$. GMRES convergence rates for linear systems with coefficient matrices of the forms ${\bf I} + {\bf K}$ and ${\bf I} + {\bf E}$ are guaranteed by well-known theory, but only relatively weak convergence bounds specific to matrices of the form ${\bf I} + {\bf K} + {\bf E}$ currently exist. In this paper, we explore the GMRES convergence properties for linear systems with such coefficient matrices by considering the pseudospectrum of ${\bf I} + {\bf K}$. We derive a bound on the GMRES residual in terms of $\epsilon$ when approximately solving the linear system $({\bf I} + {\bf K} + {\bf E}){\bf x} = {\bf b}$, and we identify the eigenvalues of ${\bf I} + {\bf K}$ that are sensitive to perturbation. In particular, while a clustered spectrum away from the origin is often a good indicator of fast GMRES convergence, convergence may be slow when some of those eigenvalues are ill-conditioned. We show that there can be at most $2p$ eigenvalues of ${\bf I} + {\bf K}$ that are sensitive to small perturbations. We present numerical results when using GMRES to solve a sequence of linear systems of the form $({\bf I} + {\bf K}_j + {\bf E}_j){\bf x}_j = {\bf b}_j$ that arise from the application of Broyden's method to a nonlinear partial differential equation.
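A small hedged numerical sketch of the setting (not from the paper): build a matrix of the form ${\bf I} + {\bf K} + {\bf E}$ and inspect the GMRES residual history. The sizes, the construction of ${\bf K}$ and ${\bf E}$, and the use of SciPy's `gmres` are all illustrative.

```python
import numpy as np
from scipy.sparse.linalg import gmres

# Construct a matrix of the form I + K + E with rank(K) = p << n and
# ||E||_2 = eps, then watch the GMRES residual history.
n, p, eps = 500, 5, 1e-3
rng = np.random.default_rng(1)
K = rng.standard_normal((n, p)) @ rng.standard_normal((p, n)) / n  # rank p
E = rng.standard_normal((n, n))
E *= eps / np.linalg.norm(E, 2)
A = np.eye(n) + K + E
b = rng.standard_normal(n)

residuals = []
x, info = gmres(A, b, rtol=1e-12, restart=n,
                callback=lambda rnrm: residuals.append(rnrm),
                callback_type="pr_norm")
# For I + K alone, GMRES terminates in at most p + 1 steps; with the
# perturbation E, the residual typically stalls near eps at that point
# when the relevant eigenvalues are well conditioned.
print(residuals[:p + 3])
```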
We present a novel sequential Monte Carlo approach to online smoothing of additive functionals in a very general class of path-space models. Hitherto, the solutions proposed in the literature suffer from either long-term numerical instability due to particle-path degeneracy or, in the case that degeneracy is remedied by particle approximation of the so-called backward kernel, high computational demands. In order to optimally balance computational speed against numerical stability, we propose to furnish a (fast) naive particle smoother, which propagates recursively a sample of particles and associated smoothing statistics, with an adaptive backward-sampling-based updating rule that keeps the number of (costly) backward samples to a minimum. This yields a new, function-specific additive smoothing algorithm, AdaSmooth, which is computationally fast, numerically stable, and easy to implement. We provide rigorous theoretical results guaranteeing its consistency, asymptotic normality, and long-term stability, as well as numerical results demonstrating empirically the clear superiority of AdaSmooth over existing algorithms.
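To make the baseline concrete, below is a hedged sketch of the (fast but degeneracy-prone) naive forward-only particle smoother for an additive functional, on a toy AR(1) model with Gaussian observations. The model, parameter values, and variable names are illustrative, and AdaSmooth's adaptive backward-sampling update is deliberately omitted.

```python
import numpy as np

# Toy AR(1) model x_t = phi x_{t-1} + u_t with Gaussian observations
# y_t = x_t + v_t; the additive functional is S_T = sum_t x_t, i.e.
# h(x_{t-1}, x_t) = x_t. Bootstrap filter plus recursively propagated
# smoothing statistics: fast, but subject to particle-path degeneracy.
rng = np.random.default_rng(2)
T_len, N, phi, sy = 50, 1000, 0.9, 1.0
x_true = np.zeros(T_len)
x_true[0] = rng.standard_normal()
for t in range(1, T_len):
    x_true[t] = phi * x_true[t - 1] + rng.standard_normal()
y = x_true + sy * rng.standard_normal(T_len)

x = rng.standard_normal(N)                       # particles at t = 0
stat = x.copy()                                  # S_0 = x_0 per particle
for t in range(1, T_len):
    logw = -0.5 * ((y[t - 1] - x) / sy) ** 2     # weight with y_{t-1}
    w = np.exp(logw - logw.max()); w /= w.sum()
    anc = rng.choice(N, size=N, p=w)             # multinomial resampling
    x = phi * x[anc] + rng.standard_normal(N)    # propagate to time t
    stat = stat[anc] + x                         # S_t = S_{t-1} + x_t
logw = -0.5 * ((y[-1] - x) / sy) ** 2            # final reweighting
w = np.exp(logw - logw.max()); w /= w.sum()
print("naive smoother estimate of E[S_T | y]:", w @ stat)
```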
This work provides a theoretical analysis for optimally solving the pose estimation problem using total least squares for vector observations from landmark features, which is central to applications involving simultaneous localization and mapping. First, the optimization process is formulated with observation vectors extracted from point-cloud features. Then, error-covariance expressions are derived. The attitude and position estimates obtained via the derived optimization process are proven to reach the bounds defined by the Cram\'er-Rao lower bound under the small-angle approximation of attitude errors. A fully populated observation noise-covariance matrix is assumed as the weight in the cost function to cover the most general case of sensor uncertainty. This accommodates more general error correlations than previous approaches, which rely on an isotropic noise assumption. The proposed solution is verified using Monte Carlo simulations, and the error-covariance analysis is validated in an experiment with an actual LIDAR.
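As a point of reference for the general formulation above, here is a minimal sketch of the classical isotropic-noise special case: weighted least-squares pose recovery from matched point observations via the Kabsch/Procrustes solution. The function name and test setup are illustrative, and the paper's total-least-squares treatment with a fully populated noise covariance generalizes this baseline.

```python
import numpy as np

# Weighted least-squares pose (R, t) from matched points b_i ~ R a_i + t,
# via the Kabsch/Procrustes solution. Isotropic, uncorrelated noise only.
def kabsch_pose(A, B, w):
    """A, B: (N, 3) matched point clouds; w: (N,) positive weights."""
    w = w / w.sum()
    a_bar, b_bar = w @ A, w @ B                   # weighted centroids
    H = (A - a_bar).T @ (np.diag(w) @ (B - b_bar))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # enforce det(R) = +1
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, b_bar - R @ a_bar

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R_true *= np.sign(np.linalg.det(R_true))          # proper rotation
t_true = np.array([1.0, -2.0, 0.5])
B = A @ R_true.T + t_true + 0.01 * rng.standard_normal((100, 3))
R, t = kabsch_pose(A, B, np.ones(100))
print(np.linalg.norm(R - R_true), np.linalg.norm(t - t_true))
```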
Matching and pricing are two critical levers in two-sided marketplaces to connect demand and supply. The platform can produce more efficient matching and pricing decisions by batching the demand requests. We initiate the study of the two-stage stochastic matching problem, with or without pricing, to enable the platform to make improved decisions in a batch with an eye toward the imminent future demand requests. This problem is motivated in part by applications in online marketplaces such as ride-hailing platforms. We design online competitive algorithms for vertex-weighted (or unweighted) two-stage stochastic matching for maximizing supply efficiency, and for two-stage joint matching and pricing for maximizing market efficiency. In the former problem, using a randomized primal-dual algorithm applied to a family of ``balancing'' convex programs, we obtain the optimal $3/4$ competitive ratio against the optimum offline benchmark. Using a factor-revealing program and connections to submodular optimization, we improve this ratio against the optimum online benchmark to $(1-1/e+1/e^2)\approx 0.767$ for the unweighted case and $0.761$ for the weighted case. In the latter problem, we design an optimal $1/2$-competitive joint pricing and matching algorithm by borrowing ideas from the ex-ante prophet inequality literature. We also show an improved $(1-1/e)$-competitive algorithm for the special case of the demand efficiency objective using the correlation gap of submodular functions. Finally, we complement our theoretical study by using DiDi's ride-sharing dataset for Chengdu city to numerically evaluate the performance of our proposed algorithms on practical instances of this problem.
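To illustrate the prophet-inequality idea invoked for the $1/2$-competitive result, here is a hedged simulation of the classical single-item case: posting the fixed threshold $\tau = \mathbb{E}[\max_i V_i]/2$ and accepting the first value above it earns at least half the prophet's expected maximum. The value distributions and code are illustrative; this is not the paper's joint pricing and matching algorithm.

```python
import numpy as np

# Single-item prophet inequality with the fixed threshold tau = E[max]/2:
# accept the first value exceeding tau. The achieved expected reward is
# at least half of the prophet's E[max], here checked by simulation on
# heterogeneous exponential values.
rng = np.random.default_rng(7)
n, trials = 10, 200_000
V = rng.exponential(1.0, size=(trials, n)) * np.arange(1, n + 1)
tau = V.max(axis=1).mean() / 2                  # threshold price
accept = V > tau
first = accept.argmax(axis=1)                   # first index above tau
gain = np.where(accept.any(axis=1), V[np.arange(trials), first], 0.0)
print(gain.mean() / V.max(axis=1).mean())       # >= 0.5 up to noise
```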
In this paper, we investigate the Gaussian graphical model inference problem in a novel setting that we call erose measurements, referring to irregularly measured or observed data. For graphs, this results in different node pairs having vastly different sample sizes, a situation that frequently arises in data integration, genomics, neuroscience, and sensor networks. Existing works characterize graph selection performance using the minimum pairwise sample size, which provides little insight for erosely measured data, and no existing inference method is applicable. We aim to fill this gap by proposing the first inference method, named GI-JOE (Graph Inference when Joint Observations are Erose), that characterizes the different uncertainty levels across the graph caused by erose measurements. Specifically, we develop an edge-wise inference method and an affiliated FDR control procedure, where the variance of each edge depends on the sample sizes associated with the corresponding neighbors. We prove statistical validity under erose measurements, thanks to a careful localized edge-wise analysis that disentangles the dependencies across the graph. Finally, through simulation studies and a real neuroscience data example, we demonstrate the advantages of our inference methods for graph selection from erosely measured data.
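To make the setting concrete, the short sketch below simulates an erose observation pattern and tabulates the pairwise sample sizes $n_{jk}$ that drive the edge-wise variances. The zero-mean pairwise-complete covariance shown is only an illustration of the kind of input such inference must handle, not the GI-JOE estimator itself.

```python
import numpy as np

# Simulate an erose observation pattern: each variable is observed on its
# own random subset of samples, so pairwise sample sizes n_jk vary widely.
rng = np.random.default_rng(4)
n, p = 200, 6
X = rng.standard_normal((n, p))                       # zero-mean variables
obs = rng.random((n, p)) < rng.uniform(0.1, 0.9, p)   # per-variable rates
X = np.where(obs, X, np.nan)

M = (~np.isnan(X)).astype(float)
n_pair = M.T @ M                          # pairwise sample sizes n_jk
print("min/max pairwise sample size:", n_pair.min(), n_pair.max())

# Pairwise-complete covariance (zero-fill works because the variables are
# zero-mean here): entry (j, k) averages only jointly observed samples.
Xz = np.nan_to_num(X)
S = (Xz.T @ Xz) / np.maximum(n_pair, 1)
```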
This paper considers the estimation and inference of the low-rank components in high-dimensional matrix-variate factor models, where each dimension of the matrix-variates ($p \times q$) is comparable to or greater than the number of observations ($T$). We propose an estimation method called $\alpha$-PCA that preserves the matrix structure and aggregates the mean and contemporary covariance through a hyper-parameter $\alpha$. We develop an inferential theory, establishing consistency, the rate of convergence, and the limiting distributions, under general conditions that allow for correlations across time, rows, or columns of the noise. We provide both theoretical and empirical methods for choosing the best $\alpha$, depending on the use-case criteria. Simulation results demonstrate the adequacy of the asymptotic results in approximating the finite-sample properties. The $\alpha$-PCA compares favorably with existing methods. Finally, we illustrate its applications with a real numeric data set and two real image data sets. In all applications, the proposed estimation procedure outperforms previous methods in the power of variance explanation under out-of-sample 10-fold cross-validation.
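Below is a hedged sketch of an $\alpha$-PCA-style estimator for the row loadings. It assumes the aggregation $\hat{M} = (1+\alpha)\,\bar{X}\bar{X}^\top + \frac{1}{T}\sum_t (X_t-\bar{X})(X_t-\bar{X})^\top$ (mean plus contemporary covariance weighted through $\alpha$), which is one plausible reading of the abstract; the exact normalization and estimator details should be taken from the paper.

```python
import numpy as np

# Hedged sketch of an alpha-PCA-style estimator for the row loadings of a
# matrix factor model X_t = R F_t C' + E_t, under the ASSUMED aggregation
# M_hat = (1+alpha) Xbar Xbar' + (1/T) sum_t (X_t - Xbar)(X_t - Xbar)'.
def alpha_pca_row_loadings(X, k, alpha):
    """X: (T, p, q) array of matrix observations; k: number of row factors."""
    T = X.shape[0]
    Xbar = X.mean(axis=0)
    M = (1 + alpha) * (Xbar @ Xbar.T)
    M += sum((Xt - Xbar) @ (Xt - Xbar).T for Xt in X) / T
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :k]               # top-k eigenvectors

rng = np.random.default_rng(5)
T, p, q, k = 200, 20, 15, 3
R = rng.standard_normal((p, k)); C = rng.standard_normal((q, k))
F = rng.standard_normal((T, k, k))
X = np.einsum('pk,tkl,ql->tpq', R, F, C) + rng.standard_normal((T, p, q))
Rhat = alpha_pca_row_loadings(X, k, alpha=1.0)
# Column-space recovery check: projection residual should be small.
P = R @ np.linalg.pinv(R)                     # projector onto col(R)
print(np.linalg.norm(P @ Rhat - Rhat) / np.linalg.norm(Rhat))
```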
In this work, a fully adaptive finite element algorithm for symmetric second-order elliptic diffusion problems with an inexact solver is developed. The discrete systems are treated by a local higher-order geometric multigrid method extending the approach of [Mira\c{c}i, Pape\v{z}, Vohral\'{i}k, SIAM J. Sci. Comput. (2021)]. We show that the iterative solver contracts the algebraic error robustly with respect to the polynomial degree $p \ge 1$ and the (local) mesh size $h$. We further prove that the built-in algebraic error estimator is $h$- and $p$-robustly equivalent to the algebraic error. The proofs rely on suitably chosen robust stable decompositions and a strengthened Cauchy-Schwarz inequality on bisection-generated meshes. Together, these results imply that the proposed adaptive algorithm has optimal computational cost. Numerical experiments confirm the theoretical findings.
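For intuition about $h$-robust contraction of the algebraic error, here is a generic two-level multigrid sketch on the 1D Poisson model problem with lowest-order elements on uniform meshes; it is purely illustrative and is not the paper's local higher-order method on bisection-generated meshes.

```python
import numpy as np

# Two-level multigrid on the 1D Poisson model problem (P1 elements,
# uniform mesh): the algebraic-error contraction rate of one V-cycle is
# essentially independent of the mesh size h.
def poisson_matrix(n):
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h

def vcycle(A, b, x, n_smooth=3):
    n = len(b)
    for _ in range(n_smooth):                       # damped Jacobi smoothing
        x = x + 0.5 * (b - A @ x) / np.diag(A)
    nc = (n - 1) // 2                               # coarse grid (n = 2*nc+1)
    P = np.zeros((n, nc))                           # linear interpolation
    for j in range(nc):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    r = b - A @ x
    Ac = P.T @ A @ P                                # Galerkin coarse matrix
    x = x + P @ np.linalg.solve(Ac, P.T @ r)        # coarse correction
    for _ in range(n_smooth):
        x = x + 0.5 * (b - A @ x) / np.diag(A)
    return x

for n in (31, 63, 127):
    A = poisson_matrix(n); b = np.ones(n)
    x_exact = np.linalg.solve(A, b)
    x1 = vcycle(A, b, np.zeros(n))
    rate = np.linalg.norm(x_exact - x1) / np.linalg.norm(x_exact)
    print(n, round(rate, 3))                        # roughly h-independent
```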
A popular method for variance reduction in observational causal inference is propensity-based trimming, the practice of removing units with extreme propensities from the sample. This practice has theoretical grounding when the data are homoscedastic and the propensity model is parametric (Yang and Ding, 2018; Crump et al., 2009), but in modern settings where heteroscedastic data are analyzed with non-parametric models, existing theory fails to support current practice. In this work, we address this challenge by developing new methods and theory for sample trimming. Our contributions are three-fold: first, we describe novel procedures for selecting which units to trim. Our procedures differ from previous work in that we trim not only units with small propensities but also units with extreme conditional variances. Second, we give new theoretical guarantees for inference after trimming. In particular, we show how to perform inference on the trimmed subpopulation without requiring that our regressions converge at parametric rates. Instead, we make only fourth-root rate assumptions like those in the double machine learning literature. This result applies to conventional propensity-based trimming as well and thus may be of independent interest. Finally, we propose a bootstrap-based method for constructing simultaneously valid confidence intervals for multiple trimmed subpopulations, which are valuable for navigating the trade-off between sample size and variance reduction inherent in trimming. We validate our methods in simulation, on the 2007-2008 National Health and Nutrition Examination Survey, and on a semi-synthetic Medicare dataset, and we find promising results in all settings.
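As a baseline for the procedures described above, here is a minimal sketch of conventional propensity-based trimming, dropping units whose estimated propensity lies outside $[\gamma, 1-\gamma]$. The threshold $\gamma = 0.1$ and the simulated propensities are illustrative, and the paper's additional variance-based trimming rule is not reproduced.

```python
import numpy as np

# Conventional propensity trimming: keep units with gamma <= e(x) <= 1-gamma.
def trim_by_propensity(e_hat, gamma=0.1):
    """e_hat: (n,) estimated propensities; returns a boolean keep-mask."""
    return (e_hat >= gamma) & (e_hat <= 1 - gamma)

rng = np.random.default_rng(6)
n = 5000
X = rng.standard_normal(n)
e_hat = 1 / (1 + np.exp(-3 * X))            # illustrative propensities
keep = trim_by_propensity(e_hat, gamma=0.1)
print(f"kept {keep.mean():.1%} of units")
# After trimming, inverse-propensity weights are bounded by 1/gamma:
print("max 1/e(x) on the trimmed sample:", (1 / e_hat[keep]).max())
```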
With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making clusters more explainable by investigating cluster descriptors. We are given a set of objects $S$, a clustering $\pi$ of these objects, and a set of tags $T$ that did not participate in the clustering algorithm; each object in $S$ is associated with a subset of $T$. The goal is to find a representative set of tags for each cluster, referred to as the cluster descriptors, subject to the constraint that the descriptors are pairwise disjoint and that the total size of all descriptors is minimized. In general, this problem is NP-hard. We propose a novel explainability model that strengthens previous models by ensuring that tags which do not contribute to explainability and do not sufficiently distinguish between clusters are not added to the optimal descriptors. The proposed model is formulated as a quadratic unconstrained binary optimization (QUBO) problem, which makes it suitable for solving on modern optimization hardware accelerators. We experimentally demonstrate how the proposed explainability model can be solved on specialized hardware for accelerating combinatorial optimization, the Fujitsu Digital Annealer, using real-life Twitter and PubMed datasets as use cases.
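To illustrate the flavor of such a formulation, below is a hedged toy sketch of a descriptor objective over binary variables $x_{t,c}$ (tag $t$ in cluster $c$'s descriptor): total size plus a quadratic disjointness penalty plus a coverage term, minimized by brute force in place of the Digital Annealer. The tags, objects, penalty weights, and the coverage term (written directly rather than quadratized) are all illustrative and may differ from the paper's exact QUBO.

```python
import itertools
import numpy as np

# Binary variables x[t, c] = 1 iff tag t is in cluster c's descriptor.
# Objective: total descriptor size + quadratic disjointness penalty
# + a coverage term (written directly here; a true QUBO would quadratize
# it). Brute force stands in for the Digital Annealer at this toy size.
tags = ["sports", "music", "news", "tech"]            # hypothetical tags
objects = [({"sports", "news"}, 0), ({"music"}, 1),   # (tag set, cluster)
           ({"tech", "news"}, 0), ({"music", "tech"}, 1)]
n_t, n_c = len(tags), 2
A_pen, B_pen = 10.0, 10.0                             # illustrative weights

def cost(x):                                          # x: (n_t, n_c) 0/1
    size = x.sum()
    disjoint = A_pen * sum(x[t, c1] * x[t, c2] for t in range(n_t)
                           for c1 in range(n_c) for c2 in range(c1 + 1, n_c))
    uncovered = sum(B_pen for tag_set, c in objects
                    if not any(x[tags.index(t), c] for t in tag_set))
    return size + disjoint + uncovered

best = min(itertools.product([0, 1], repeat=n_t * n_c),
           key=lambda bits: cost(np.array(bits).reshape(n_t, n_c)))
x = np.array(best).reshape(n_t, n_c)
for c in range(n_c):
    print(f"cluster {c} descriptor:", [tags[t] for t in range(n_t) if x[t, c]])
```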
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
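The weighting scheme follows directly from the formula in the abstract; a minimal sketch, where the normalization of the weights to sum to the number of classes is a common convention rather than something stated above:

```python
import numpy as np

# Effective number of samples E_n = (1 - beta^n) / (1 - beta) and the
# resulting class-balanced weights, normalized to sum to the number of
# classes (a common convention, not stated in the abstract).
def class_balanced_weights(samples_per_class, beta=0.999):
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w * len(n) / w.sum()

counts = [5000, 2000, 500, 100, 10]       # long-tailed class counts
print(class_balanced_weights(counts, beta=0.999))
# beta -> 0 recovers uniform weighting; beta -> 1 approaches inverse
# class-frequency weighting (1/n).
```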