A precision matrix is the inverse of a covariance matrix. In this paper, we study the problem of estimating the precision matrix with a known graphical structure in high-dimensional settings. We propose a simple estimator of the precision matrix based on the connection between the known graphical structure and the precision matrix. We obtain rates of convergence for the proposed estimator and derive its asymptotic normality in the high-dimensional setting where the data dimension grows with the sample size. Numerical simulations demonstrate the performance of the proposed method. We also show that the proposed method outperforms some existing methods that do not utilize the graphical structure information.
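The abstract does not spell out the estimator, so the following is only a generic sketch of how a known graphical structure (the zero pattern of the precision matrix) can be exploited, via node-wise regressions restricted to known neighbors; the function name and the final symmetrization step are illustrative assumptions, not the paper's method.

```python
import numpy as np

def precision_from_known_graph(X, neighbors):
    """Node-wise regression sketch: for each variable j, regress X_j on the
    variables adjacent to it in the known graph.  The fitted coefficients and
    residual variance fill in column j of the precision matrix; entries with
    no edge in the graph stay exactly zero."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # center the data
    Omega = np.zeros((p, p))
    for j in range(p):
        nb = list(neighbors[j])             # indices adjacent to j in the known graph
        if nb:
            beta, *_ = np.linalg.lstsq(Xc[:, nb], Xc[:, j], rcond=None)
            resid = Xc[:, j] - Xc[:, nb] @ beta
        else:
            beta, resid = np.array([]), Xc[:, j]
        sigma2 = resid @ resid / n          # residual variance
        Omega[j, j] = 1.0 / sigma2
        for k, b in zip(nb, beta):
            Omega[k, j] = -b / sigma2
    return (Omega + Omega.T) / 2            # symmetrize the two regression-based estimates
```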
Most physical processes possess structural properties such as constant energies, volumes, and other invariants over time. When learning models of such dynamical systems, it is critical to respect these invariants to ensure accurate predictions and physically meaningful behavior. Strikingly, state-of-the-art methods in Gaussian process (GP) dynamics model learning do not address this issue. On the other hand, classical numerical integrators are specifically designed to preserve these crucial properties through time. We propose to combine the advantages of GPs as function approximators with structure-preserving numerical integrators for dynamical systems, such as Runge-Kutta methods. These integrators assume access to the ground truth dynamics and require evaluations of intermediate and future time steps that are unknown in a learning-based scenario. This makes direct inference of the GP dynamics, with an embedded numerical scheme, intractable. Our key technical contribution is the evaluation of the implicitly defined Runge-Kutta transition probability. In a nutshell, we introduce an implicit layer for GP regression, which is embedded into a variational inference-based model learning scheme.
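As background for how a numerical integrator consumes a learned dynamics function, here is a minimal sketch of one explicit Runge-Kutta 4 step; in the paper's setting `f` would be a GP-based model of the unknown dynamics (the harmonic-oscillator `f` below is purely illustrative), and the implicit schemes the paper actually targets are not shown.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical (explicit) Runge-Kutta 4 step.  The intermediate stages
    k1..k4 are exactly the evaluations at intermediate points that are unknown
    when the dynamics themselves must be learned from data."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# toy usage: a harmonic oscillator standing in for a learned mean function
f = lambda x: np.array([x[1], -x[0]])
x = np.array([1.0, 0.0])
for _ in range(100):
    x = rk4_step(f, x, dt=0.1)
```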
We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited to capturing the complex correlation structures that often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Assuming knowledge of the design matrix spectrum, our rotationally invariant AMP has complexity of the same order as the existing AMP for Gaussian matrices; it also recovers the existing AMP as a special case. Numerical results show performance close to that of Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.
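For orientation, a minimal sketch of the classical AMP iteration for i.i.d. Gaussian designs, which the abstract says the rotationally invariant AMP recovers as a special case; the soft-thresholding denoiser and the fixed threshold `theta` are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def soft(u, t):
    """Soft-thresholding denoiser eta(u; t)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def amp_gaussian(A, y, iters=30, theta=1.0):
    """Textbook AMP for an n x p i.i.d. Gaussian design.  The extra term in
    the residual update is the Onsager correction that makes the effective
    observation A.T @ z + x behave like signal plus Gaussian noise."""
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        pseudo = A.T @ z + x                            # effective observation
        x_new = soft(pseudo, theta)
        onsager = (p / n) * np.mean(np.abs(x_new) > 0) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x
```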
Finding a minimum vertex cover in a network is a fundamental NP-complete graph problem. One way to deal with its computational hardness is to trade the qualitative performance of an algorithm (allowing non-optimal outputs) for an improved running time. For the vertex cover problem, there is a gap between theory and practice when it comes to understanding this tradeoff. On the one hand, it is known to be NP-hard to approximate a minimum vertex cover within a factor of $\sqrt{2}$. On the other hand, a simple greedy algorithm yields close to optimal approximations in practice. A promising approach towards understanding this discrepancy is to recognize the differences between theoretical worst-case instances and real-world networks. Following this direction, we close the gap between theory and practice by providing an algorithm that efficiently computes nearly optimal vertex cover approximations on hyperbolic random graphs, a network model that closely resembles real-world networks in terms of degree distribution, clustering, and the small-world property. More precisely, our algorithm computes a $(1 + o(1))$-approximation, asymptotically almost surely, and has a running time of $\mathcal{O}(m \log(n))$. The proposed algorithm is an adaptation of the successful greedy approach, enhanced with a procedure that improves on parts of the graph where greedy is not optimal. This makes it possible to introduce a parameter that can be used to tune the tradeoff between approximation performance and running time. Our empirical evaluation on real-world networks shows that this allows for improving over the near-optimal results of the greedy approach.
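For concreteness, a minimal sketch of the max-degree greedy heuristic referred to above; the improvement procedure and the tuning parameter from the paper are not shown, and the plain `max` scan (rather than a priority queue) is just for readability.

```python
def greedy_vertex_cover(adj):
    """Max-degree greedy: repeatedly pick the vertex that covers the most
    still-uncovered edges, until no uncovered edges remain."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    cover = set()
    while any(adj.values()):                          # uncovered edges remain
        v = max(adj, key=lambda u: len(adj[u]))       # highest residual degree
        cover.add(v)
        for u in adj[v]:
            adj[u].discard(v)                         # edges incident to v are now covered
        adj[v] = set()
    return cover

# toy usage on the path a-b-c-d
print(greedy_vertex_cover({'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}))
```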
In this article we focus on large-dimensional matrix factor models and propose estimators of the factor loading matrices and the factor score matrix from the perspective of minimizing a least squares objective function. The resulting estimators turn out to be equivalent to the corresponding projected estimators in Yu et al. (2021), which enjoy the nice property of reducing the magnitudes of the idiosyncratic error components and thereby increasing the signal-to-noise ratio. We derive the convergence rates of the theoretical minimizers under sub-Gaussian tails, instead of the one-step iteration estimators of Yu et al. (2021). Motivated by the least squares formulation, we further consider a robust method for estimating large-dimensional matrix factor models by utilizing the Huber loss function. Theoretically, we derive the convergence rates of the robust estimators of the factor loading matrices under finite fourth moment conditions. We also propose an iterative procedure to estimate the pair of row and column factor numbers robustly. We conduct extensive numerical studies to investigate the empirical performance of the proposed robust methods relative to state-of-the-art ones. The results show that the proposed methods perform much better than existing ones when the data are heavy-tailed, and comparably to the projected estimators when the data are light-tailed, so they can be used as a safe replacement for the existing methods. An application to a Fama-French financial portfolios dataset illustrates their empirical usefulness.
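As a small illustration of the robustification step, the Huber loss that replaces the squared residual in the least squares objective is sketched below; the threshold `tau` and the function name are illustrative assumptions rather than the paper's tuning scheme.

```python
import numpy as np

def huber_loss(r, tau):
    """Elementwise Huber loss H_tau(r): quadratic for |r| <= tau, linear
    beyond, so large (heavy-tailed) residuals are down-weighted relative to
    plain least squares."""
    absr = np.abs(r)
    quad = 0.5 * r ** 2
    lin = tau * absr - 0.5 * tau ** 2
    return np.where(absr <= tau, quad, lin)
```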
The gaussoid axioms are conditional independence inference rules which characterize regular Gaussian CI structures over a three-element ground set. It is known that no finite set of inference rules completely describes regular Gaussian CI as the ground set grows. In this article we show that the gaussoid axioms logically imply every inference rule of at most two antecedents which is valid for regular Gaussians over any ground set. The proof is accomplished by exhibiting for each inclusion-minimal gaussoid extension of at most two CI statements a regular Gaussian realization. Moreover we prove that all those gaussoids have rational positive-definite realizations inside every $\varepsilon$-ball around the identity matrix. For the proof we introduce the concept of algebraic Gaussians over arbitrary fields and of positive Gaussians over ordered fields and obtain the same two-antecedental completeness of the gaussoid axioms for algebraic and positive Gaussians over all fields of characteristic zero as a byproduct.
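As standard background (not part of this paper's contribution), regular Gaussian CI statements are characterized by vanishing almost-principal minors of a positive-definite covariance matrix:

```latex
% CI statement (ij|K) for a regular Gaussian with covariance \Sigma \succ 0:
(ij \mid K) \ \text{holds}
\quad\Longleftrightarrow\quad
\det \Sigma_{\{i\} \cup K,\ \{j\} \cup K} = 0 .
```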
Analyzing massive spatial datasets using Gaussian process models poses computational challenges. This problem is prevalent in applications such as environmental modeling, ecology, forestry and environmental health. We present a novel approximate inference methodology that uses profile likelihood and Krylov subspace methods to estimate the spatial covariance parameters and make spatial predictions with uncertainty quantification. The proposed method, Kryging, applies to both observations on a regular grid and irregularly spaced observations, and to any Gaussian process with a stationary covariance function, including the popular Matérn covariance family. We make use of the block Toeplitz structure with Toeplitz blocks of the covariance matrix and use fast Fourier transform methods to alleviate the computational and memory bottlenecks. We perform extensive simulation studies to show the effectiveness of our model by varying sample sizes, spatial parameter values and sampling designs. A real data application is also performed on a dataset consisting of land surface temperature readings taken by the MODIS satellite. Compared to existing methods, the proposed method performs satisfactorily with much less computation time and better scalability.
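A minimal sketch of the FFT trick underlying this kind of approach, shown in one dimension for a symmetric Toeplitz matrix; the paper works with block Toeplitz matrices with Toeplitz blocks, and the circulant embedding below is the standard device rather than the paper's full algorithm.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec(c, v):
    """Multiply a symmetric Toeplitz matrix (first column c) by a vector v in
    O(n log n): embed the Toeplitz matrix in a circulant of size 2n-1 and use
    the FFT, which diagonalizes circulant matrices."""
    n = len(c)
    circ = np.concatenate([c, c[-1:0:-1]])          # first column of the embedding circulant
    v_pad = np.concatenate([v, np.zeros(n - 1)])
    prod = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(v_pad))
    return prod[:n].real

# sanity check against a dense build
c = np.array([4.0, 2.0, 1.0, 0.5]); v = np.random.randn(4)
assert np.allclose(toeplitz_matvec(c, v), toeplitz(c) @ v)
```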
The Gaussian covariance graph model is a popular model for revealing underlying dependency structures among random variables. A Bayesian approach to the estimation of covariance structures uses priors that force zeros on some off-diagonal entries of covariance matrices and impose a positive-definiteness constraint on the matrices. In this paper, we consider a spike and slab prior on off-diagonal entries, which uses a mixture of a point mass and a normal distribution. The point mass naturally introduces sparsity to covariance structures so that the resulting posterior from this prior enables covariance structure learning. Under this prior, we calculate posterior model probabilities of covariance structures using a Laplace approximation. We show that the error due to the Laplace approximation becomes asymptotically negligible at a rate depending on the posterior convergence rate of the covariance matrix under the Frobenius norm. With the approximated posterior model probabilities, we propose a new framework for estimating a covariance structure. Since the Laplace approximation is done around the mode of the conditional posterior of the covariance matrix, which cannot be obtained in closed form, we propose a block coordinate descent algorithm to find the mode and show that the covariance matrix can be estimated using this algorithm once the structure is chosen. Through a simulation study based on five numerical models, we show that the proposed method outperforms graphical lasso and the sample covariance matrix in terms of root mean squared error, max norm, spectral norm, specificity, and sensitivity. The advantage of the proposed method in terms of accuracy is also demonstrated against these competitors when it is applied to linear discriminant analysis (LDA) classification on a breast cancer diagnostic dataset.
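A toy sketch of what a spike-and-slab prior on off-diagonal entries looks like as a sampler; the inclusion probability `pi`, the slab standard deviation, and the crude diagonal inflation used to enforce positive definiteness are all illustrative assumptions, not the paper's prior specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_slab_covariance(p, pi=0.2, slab_sd=0.5, jitter=1.0):
    """Draw one covariance matrix from a simplified spike-and-slab prior:
    each off-diagonal entry is exactly zero with probability 1 - pi (the
    point-mass 'spike') and N(0, slab_sd^2) otherwise (the 'slab')."""
    S = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)
    keep = rng.random(len(iu[0])) < pi                  # slab indicators
    vals = rng.normal(0.0, slab_sd, len(iu[0])) * keep
    S[iu] = vals
    S = S + S.T
    np.fill_diagonal(S, jitter)
    while np.linalg.eigvalsh(S).min() <= 0:             # enforce positive definiteness
        np.fill_diagonal(S, S.diagonal() + 0.1)
    return S
```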
Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, with continuous state and action spaces and a Gaussian policy -- common in computer animation and robotics -- PPO is prone to getting stuck in local optima. In this paper, we observe a tendency of PPO to prematurely shrink the exploration variance, which naturally leads to slow progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for the policy mean and variance and training the mean and variance in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks.
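A minimal PyTorch sketch of the stated design choice of separate mean and variance networks; the architecture sizes, optimizers, and dimensions are illustrative, and the CMA-ES-style update rules themselves are not shown.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Two separate networks, one for the action mean and one for the log
    variance, so that the two can be trained in separate passes."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        self.logvar_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))

    def forward(self, obs):
        mean = self.mean_net(obs)
        std = torch.exp(0.5 * self.logvar_net(obs))
        return torch.distributions.Normal(mean, std)

# separate passes: each head gets its own optimizer and update step
policy = GaussianPolicy(obs_dim=17, act_dim=6)
var_optim = torch.optim.Adam(policy.logvar_net.parameters(), lr=3e-4)
mean_optim = torch.optim.Adam(policy.mean_net.parameters(), lr=3e-4)
```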
We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts with the computational and theoretical disadvantage of not knowing the correct number of topics K, while the competing methods are provided with the correct value in our simulations.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is strongly convex and smooth, either strongly convex or smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.
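For reference, a minimal sketch of centralized Nesterov accelerated gradient descent, the building block that the abstract applies to the dual problem; the distributed execution over the network and the affine constraints are not shown, and the toy quadratic objective is purely illustrative.

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=500):
    """Nesterov accelerated gradient for an L-smooth convex objective:
    a gradient step from an extrapolated point, followed by a momentum
    extrapolation with the classical t_k schedule."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L                          # gradient step at the extrapolated point
        t_next = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_next + ((t - 1) / t_next) * (x_next - x)    # momentum extrapolation
        x, t = x_next, t_next
    return x

# toy usage: minimize 0.5 * ||A x - b||^2
A = np.random.randn(20, 5); b = np.random.randn(20)
L = np.linalg.norm(A, 2) ** 2                             # smoothness constant
x_star = nesterov_agd(lambda x: A.T @ (A @ x - b), np.zeros(5), L)
```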