We establish new, and surprisingly tight, connections between propositional proof complexity and finite model theory. Specifically, we show that the power of several propositional proof systems, such as Horn resolution, bounded-width resolution, and the monomial calculus of bounded degree, can be characterised in a precise sense by variants of fixed-point logics that are of fundamental importance in descriptive complexity theory. Our main results are that Horn resolution has the same expressive power as least fixed-point logic, that bounded-width resolution captures existential least fixed-point logic, and that the polynomial calculus with bounded degree over the rationals solves precisely the problems definable in fixed-point logic with counting. We also study the bounded-degree polynomial calculus. Over the rationals, it captures fixed-point logic with counting if we restrict the bit-complexity of the coefficients. For unrestricted coefficients, we can only say that the bounded-degree polynomial calculus is at most as powerful as bounded variable infinitary counting logic, but a precise logical characterisation of its power remains an open problem. These connections between logics and proof systems allow us to establish finite-model-theoretic tools for proving lower bounds for the polynomial calculus over the rationals and also over finite fields. This is a corrected version of the paper (arXiv:1802.09377) published originally on January 23, 2019.
Johnson-Lindenstrauss lemma states random projections can be used as a topology preserving embedding technique for fixed vectors. In this paper, we try to understand how random projections affect probabilistic properties of random vectors. In particular we prove the distribution of inner product of two independent random vectors $X, Z \in {R}^n$ is preserved by random projection $S:{R}^n \to {R}^m$. More precisely, \[ \sup_t \left| \text{P}(\frac{1}{C_{m,n}} X^TS^TSZ <t) - \text{P}(\frac{1}{\sqrt{n}} X^TZ<t) \right| \le O\left(\frac{1}{\sqrt{n}}+ \frac{1}{\sqrt{m}} \right) \] This is achieved by proving a general central limit theorem (product-CLT) for $\sum_{k=1}^{n} X_k Y_k$, where $\{X_k\}$ is a martingale difference sequence, and $\{Y_k\}$ has dependency within the sequence. We also obtain the rate of convergence in the spirit of Berry-Esseen theorem.
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with m data points, each an independent n-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise, and thus fairly reflects the value or lack of same of the model. This paper is the first to solve for the training and test size for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial Theorem 1 derives which depends only on m and n; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were realizing that the problems herein were discussed in the context of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. Mathematical results are supported with thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.
We propose application-layer coding schemes to recover lost data in delay-sensitive uplink (sensor-to-gateway) communications in the Internet of Things. Built on an approach that combines retransmissions and forward erasure correction, the proposed schemes' salient features include low computational complexity and the ability to exploit sporadic receiver feedback for efficient data recovery. Reduced complexity is achieved by keeping the number of coded transmissions as low as possible and by devising a mechanism to compute the optimal degree of a coded packet in O(1). Our major contributions are: (a) An enhancement to an existing scheme called windowed coding, whose complexity is greatly reduced and data recovery performance is improved by our proposed approach. (b) A technique that combines elements of windowed coding with a new feedback structure to further reduce the coding complexity and improve data recovery. (c) A coded forwarding scheme in which a relay node provides further resilience against packet loss by overhearing source-to-destination communications and making forwarding decisions based on overheard information.
Available in the literature are properties which characterize the gamma distribution via independence of two appropriately chosen statistics. Well-known is the classical result when one of the statistics is the sample mean and the other one the sample coefficient of variation. In this paper, we elaborate on a version of Anosov's theorem which allows to establish a general result, Theorem 1, and a series of seven corollaries providing new characterization results for gamma distributions. We keep the sample mean as one of involved statistics, while now the second one can be taken from a quite large class of homogeneous feasible definite statistics. It is relevant to mention that there is an interesting parallel between the new characterization results for gamma distributions and recent characterization results for the normal distribution.
Learning controllers from data for stabilizing dynamical systems typically follows a two step process of first identifying a model and then constructing a controller based on the identified model. However, learning models means identifying generic descriptions of the dynamics of systems, which can require large amounts of data and extracting information that are unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number of states required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model.
We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde {O}\left (\frac{\log^3 n}{\epsilon^3}\right ) \times \tilde O\left (\frac{\log^3 n}{\epsilon^3}\right )$ principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\pm \epsilon \sqrt{\text{nnz}(\mathbf{A})}$ and $\pm \epsilon \|\mathbf A\|_F$ when the rows of $\mathbf A$ can be sampled with probabilities proportional to their sparsities or their squared $\ell_2$ norms respectively. Here $\text{nnz}(\mathbf{A})$ is the number of non-zero entries in $\mathbf{A}$ and $\|\mathbf A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $\mathbf A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.
The ergodic decomposition theorem is a cornerstone result of dynamical systems and ergodic theory. It states that every invariant measure on a dynamical system is a mixture of ergodic ones. Here we formulate and prove the theorem in terms of string diagrams, using the formalism of Markov categories. We recover the usual measure-theoretic statement by instantiating our result in the category of stochastic kernels. Along the way we give a conceptual treatment of several concepts in the theory of deterministic and stochastic dynamical systems. In particular, - ergodic measures appear very naturally as particular cones of deterministic morphisms (in the sense of Markov categories); - the invariant $\sigma$-algebra of a dynamical system can be seen as a colimit in the category of Markov kernels. In line with other uses of category theory, once the necessary structures are in place, our proof of the main theorem is much simpler than traditional approaches. In particular, it does not use any quantitative limiting arguments, and it does not rely on the cardinality of the group or monoid indexing the dynamics. We hope that this result paves the way for further applications of category theory to dynamical systems, ergodic theory, and information theory.
Developing new ways to estimate probabilities can be valuable for science, statistics, and engineering. By considering the information content of different output patterns, recent work invoking algorithmic information theory has shown that a priori probability predictions based on pattern complexities can be made in a broad class of input-output maps. These algorithmic probability predictions do not depend on a detailed knowledge of how output patterns were produced, or historical statistical data. Although quantitatively fairly accurate, a main weakness of these predictions is that they are given as an upper bound on the probability of a pattern, but many low complexity, low probability patterns occur, for which the upper bound has little predictive value. Here we study this low complexity, low probability phenomenon by looking at example maps, namely a finite state transducer, natural time series data, RNA molecule structures, and polynomial curves. Some mechanisms causing low complexity, low probability behaviour are identified, and we argue this behaviour should be assumed as a default in the real world algorithmic probability studies. Additionally, we examine some applications of algorithmic probability and discuss some implications of low complexity, low probability patterns for several research areas including simplicity in physics and biology, a priori probability predictions, Solomonoff induction and Occam's razor, machine learning, and password guessing.
With the explosive growth of information technology, multi-view graph data have become increasingly prevalent and valuable. Most existing multi-view clustering techniques either focus on the scenario of multiple graphs or multi-view attributes. In this paper, we propose a generic framework to cluster multi-view attributed graph data. Specifically, inspired by the success of contrastive learning, we propose multi-view contrastive graph clustering (MCGC) method to learn a consensus graph since the original graph could be noisy or incomplete and is not directly applicable. Our method composes of two key steps: we first filter out the undesirable high-frequency noise while preserving the graph geometric features via graph filtering and obtain a smooth representation of nodes; we then learn a consensus graph regularized by graph contrastive loss. Results on several benchmark datasets show the superiority of our method with respect to state-of-the-art approaches. In particular, our simple approach outperforms existing deep learning-based methods.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.