Our main result is a new proof of the correctness of Euclid's algorithm. The proof is carried out in Th3, the algorithmic theory of natural numbers. We construct a formula H that expresses the halting property of the algorithm and then present a proof of H. The proof uses the inference rules of the calculus of programs. The only formulas accepted without proof are axioms of the program calculus or axioms of the theory Th3. We complete the result by showing that the theorem on the correctness of Euclid's algorithm cannot be proved in any elementary theory of natural numbers.
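For orientation, the algorithm whose halting property is at stake can be sketched as follows. This is only an illustrative Python version of the remainder-based variant, not the formula H or the formal proof in Th3; the function name `euclid` and the step counter are assumptions introduced here. The inline assertion records the informal termination argument: the second argument strictly decreases and stays non-negative.

```python
from math import gcd as reference_gcd


def euclid(a: int, b: int) -> tuple[int, int]:
    """Return (gcd, number of loop iterations) for natural numbers a and b."""
    if a < 0 or b < 0:
        raise ValueError("inputs must be natural numbers")
    steps = 0
    while b != 0:
        previous_b = b
        a, b = b, a % b
        # Informal variant: 0 <= a % b < b, so b strictly decreases each pass.
        assert 0 <= b < previous_b
        steps += 1
    return a, steps


result, steps = euclid(1071, 462)
print(result, steps)  # prints "21 3"

# Cross-check against the standard library on a small range of inputs.
assert all(
    euclid(a, b)[0] == reference_gcd(a, b)
    for a in range(30)
    for b in range(30)
)
```

The loop terminates on every run because a strictly decreasing sequence of natural numbers is finite; the abstract's point is that this fact needs an algorithmic, non-elementary theory to be stated and proved.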
Stabbing Planes (also known as Branch and Cut) is a proof system introduced very recently which, informally speaking, extends the DPLL method by branching on integer linear inequalities instead of single variables. The techniques known so far to prove size and depth lower bounds for Stabbing Planes are generalizations of those used for the Cutting Planes proof system. For size, lower bounds are established by monotone circuit arguments, while for depth they are obtained via communication complexity and protection. As such, these bounds apply to lifted versions of combinatorial statements. Rank lower bounds for Cutting Planes are also obtained by geometric arguments called protection lemmas. In this work we introduce two new geometric approaches to prove size/depth lower bounds in Stabbing Planes that work for any formula: (1) the antichain method, relying on Sperner's Theorem, and (2) the covering method, which uses results on essential coverings of the Boolean cube by linear polynomials, which in turn rely on Alon's combinatorial Nullstellensatz. We demonstrate their use on classes of combinatorial principles such as the Pigeonhole principle, the Tseitin contradictions and the Linear Ordering Principle. By the first method we prove almost linear size lower bounds and optimal logarithmic depth lower bounds for the Pigeonhole principle, and analogous lower bounds for the Tseitin contradictions over the complete graph and for the Linear Ordering Principle. By the covering method we obtain a superlinear size lower bound and a logarithmic depth lower bound for Stabbing Planes proofs of Tseitin contradictions over a grid graph.
Rational best approximations (in a Chebyshev sense) to real functions are characterized by an equioscillating approximation error. Similar results do not hold true for rational best approximations to complex functions in general. In the present work, we consider unitary rational approximations to the exponential function on the imaginary axis, which map the imaginary axis to the unit circle. In the class of unitary rational functions, best approximations are shown to exist, to be uniquely characterized by equioscillation of a phase error, and to possess a super-linear convergence rate. Furthermore, the best approximations have full degree (i.e., are non-degenerate), achieve their maximum approximation error at points of equioscillation, and interpolate at intermediate points. Asymptotic properties of poles, interpolation nodes, and equioscillation points of these approximants are studied. Three algorithms, which are found to be very effective for computing unitary rational approximations including candidates for best approximations, are sketched briefly. Some consequences for numerical time integration are discussed. In particular, time propagators based on unitary best approximants are unitary, symmetric and A-stable.
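To make the notions of unitarity and phase error concrete, the sketch below uses the (1,1) Padé approximant of the exponential (the Cayley transform), a standard unitary rational function. It is not a best approximant in the sense of the abstract, and the names `cayley` and `phase_error` are assumptions introduced here. On the imaginary axis it has modulus one, and its phase is 2 arctan(x/2), so the phase error is 2 arctan(x/2) - x.

```python
import cmath
import math


def cayley(x: float) -> complex:
    """(1,1) Pade approximant of exp(z), evaluated at z = i*x."""
    z = 1j * x
    return (1 + z / 2) / (1 - z / 2)


def phase_error(x: float) -> float:
    """Phase of cayley(x) minus the exact phase x of exp(i*x)."""
    return 2 * math.atan(x / 2) - x


for x in (0.1, 0.5, 1.0):
    r = cayley(x)
    exact = cmath.exp(1j * x)
    # For two points on the unit circle, the distance is 2*|sin(phase gap / 2)|.
    predicted = 2 * abs(math.sin(phase_error(x) / 2))
    print(
        f"x={x}: |r|={abs(r):.12f}, "
        f"phase error={phase_error(x):+.3e}, "
        f"|r - exp(ix)|={abs(r - exact):.3e}, "
        f"from phase error={predicted:.3e}"
    )
```

Because the approximant stays on the unit circle, the approximation error is determined by the phase error alone, which is why equioscillation can be stated in terms of the phase.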
Positive semidefinite (PSD) matrices are indispensable in many fields of science. A similarity measurement for such matrices is usually an essential ingredient in the mathematical modelling of a scientific problem. This paper proposes a unified framework to construct similarity measurements for PSD matrices. The framework is obtained by exploring the fiber bundle structure of the cone of PSD matrices and generalizing the idea of the point-set distance previously developed for linear subspaces and positive definite (PD) matrices. The framework demonstrates both theoretical advantages and computational convenience: (1) We prove that the similarity measurement constructed by the framework can be recognized either as the cost of a parallel transport or as the length of a quasi-geodesic curve. (2) We extend commonly used divergences for equidimensional PD matrices to the non-equidimensional case. Examples include Kullback-Leibler divergence, Bhattacharyya divergence and Rényi divergence. We prove that these extensions enjoy the same consistency property as their counterpart for geodesic distance. (3) We apply our geometric framework to further extend those in (2) to similarity measurements for arbitrary PSD matrices. We also provide simple formulae to compute these similarity measurements in most situations.
A surprising 'converse to the polynomial method' of Aaronson et al. (CCC'16) shows that any bounded quadratic polynomial can be computed exactly in expectation by a 1-query algorithm up to a universal multiplicative factor related to the famous Grothendieck constant. A natural question posed there asks if bounded quartic polynomials can be approximated by 2-query quantum algorithms. Arunachalam, Palazuelos and the first author showed that there is no direct analogue of the result of Aaronson et al. in this case. We improve on this result in the following ways: First, we point out and fix a small error in the construction that has to do with a translation from cubic to quartic polynomials. Second, we give a completely explicit example based on techniques from additive combinatorics. Third, we show that the result still holds when we allow for a small additive error. For this, we apply an SDP characterization of Gribling and Laurent (QIP'19) for the completely-bounded approximate degree.
The Kullback-Leibler (KL) divergence is frequently used in data science. For discrete distributions on large state spaces, approximations of probability vectors may result in a few small negative entries, rendering the KL divergence undefined. We address this problem by introducing a parameterized family of substitute divergence measures, the shifted KL (sKL) divergence measures. Our approach is generic and does not increase the computational overhead. We show that the sKL divergence shares important theoretical properties with the KL divergence and discuss how its shift parameters should be chosen. If Gaussian noise is added to a probability vector, we prove that the average sKL divergence converges to the KL divergence for small enough noise. We also show that our method solves the problem of negative entries in an application from computational oncology, the optimization of Mutual Hazard Networks for cancer progression using tensor-train approximations.
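The failure mode and the idea of a shift can be illustrated with a short sketch. The abstract does not give the definition of the sKL divergence, so `shifted_kl` below is a hypothetical stand-in: it applies the generalized KL divergence for non-negative vectors to the shifted vectors p + theta and q + theta. The function names, the shift vector `theta`, and the example numbers are all assumptions made for illustration.

```python
import math


def kl(p, q):
    """Plain KL divergence; raises ValueError if any entry is negative."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi < 0 or qi < 0:
            raise ValueError("KL divergence is undefined for negative entries")
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def shifted_kl(p, q, theta):
    """Hypothetical shifted divergence: generalized KL of p + theta and q + theta.

    Assumption: this form is chosen for illustration only and is not taken
    from the source.
    """
    total = 0.0
    for pi, qi, ti in zip(p, q, theta):
        a, b = pi + ti, qi + ti
        if a <= 0 or b <= 0:
            raise ValueError("shift too small: shifted entries must be positive")
        # Each term a*log(a/b) - a + b is non-negative and vanishes iff a == b.
        total += a * math.log(a / b) - a + b
    return total


p = [0.7, 0.3, 0.0]
q_approx = [0.72, 0.2801, -0.0001]  # approximation with one small negative entry
theta = [1e-3, 1e-3, 1e-3]

try:
    kl(p, q_approx)
except ValueError as err:
    print("plain KL failed:", err)

print("shifted value:", shifted_kl(p, q_approx, theta))
```

With a zero shift and strictly positive probability vectors, this stand-in reduces to the ordinary KL divergence, which mirrors the kind of consistency the abstract claims for the sKL family.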
Infinitary and cyclic proof systems are proof systems for logical formulas with fixed-point operators or inductive definitions. A cyclic proof system is a restriction of the corresponding infinitary proof system. Hence, the two systems are not the same in general: the cyclic system may be weaker than the infinitary one. For several logics, the infinitary proof systems have been shown to be cut-free complete. However, many problems concerning the (cut-free) completeness and the cut-elimination property of cyclic proof systems remain open. In this study, we show that, for some propositional logics with fixed-point operators or inductive definitions, the infinitary and cyclic proof systems have the same provability and the cyclic proof systems are cut-free complete.
This paper studies the infinite-time stability of a numerical scheme for stochastic McKean-Vlasov equations (SMVEs) based on the stochastic particle method. Long-time propagation of chaos in the mean-square sense is obtained, from which almost sure propagation of chaos over an infinite horizon is proved using the Chebyshev inequality and the Borel-Cantelli lemma. Then the mean-square and almost sure exponential stability of the Euler-Maruyama scheme for the corresponding interacting particle system is shown through a careful manipulation of the empirical measure. Combining these results shows that the numerical solutions reproduce the stability of the original SMVEs. Examples are given to illustrate the importance of this study.
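The kind of scheme under study can be sketched on a toy example. The equation below, dX = (-alpha X - beta (X - E[X])) dt + sigma X dW, is a hypothetical mean-field SDE chosen for illustration and is not taken from the paper; the function name, parameter values, and particle count are likewise assumptions. The law-dependent term E[X] is replaced by the empirical mean of the particles, and each particle is advanced by one Euler-Maruyama step.

```python
import math
import random


def simulate_particles(n_particles=200, horizon=5.0, step=0.01,
                       alpha=2.0, beta=1.0, sigma=0.5, seed=0):
    """Euler-Maruyama for an interacting particle system (toy mean-field SDE).

    Returns the empirical mean square of the particles at every time step.
    """
    rng = random.Random(seed)
    x = [rng.gauss(1.0, 1.0) for _ in range(n_particles)]
    mean_squares = [sum(v * v for v in x) / n_particles]
    n_steps = int(round(horizon / step))
    sqrt_step = math.sqrt(step)
    for _ in range(n_steps):
        # The empirical measure enters only through its mean in this toy model.
        empirical_mean = sum(x) / n_particles
        x = [
            v
            + (-alpha * v - beta * (v - empirical_mean)) * step
            + sigma * v * sqrt_step * rng.gauss(0.0, 1.0)
            for v in x
        ]
        mean_squares.append(sum(v * v for v in x) / n_particles)
    return mean_squares


mean_squares = simulate_particles()
print("initial mean square:", mean_squares[0])
print("final mean square:  ", mean_squares[-1])
```

For these parameter values the empirical mean square decays rapidly along the simulation, which is the qualitative behaviour that mean-square exponential stability of the numerical scheme describes.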
Coordinate exchange (CEXCH) is a popular algorithm for generating exact optimal experimental designs. The authors of CEXCH advocated for a highly greedy implementation, one that exchanges and optimizes single-element coordinates of the design matrix. We revisit the effect of greediness on CEXCH's efficacy for generating highly efficient designs. We implement the single-element CEXCH (most greedy), a design-row (medium greedy) optimization exchange, and particle swarm optimization (PSO; least greedy) on 21 exact response surface design scenarios under the $D$- and $I$-criteria; these scenarios have well-known optimal designs that have been reproduced by several researchers. We found essentially no difference in performance between the most greedy CEXCH and the medium greedy CEXCH. PSO did exhibit better efficacy than CEXCH for generating $D$-optimal designs and most $I$-optimal designs, but not to a strong degree under our parametrization. This work suggests that further investigation of the greediness dimension and its effect on CEXCH efficacy on a wider suite of models and criteria is warranted.
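A minimal sketch of the single-element (most greedy) exchange is given below. It makes several simplifying assumptions that are not from the source: a first-order model with intercept, a discrete candidate grid {-1, 0, 1} instead of a continuous one-dimensional optimization, det(X'X) as the $D$-criterion, and a single random start. The function names are hypothetical.

```python
import random


def information_determinant(design):
    """det(X'X) for the first-order model with intercept, X = [1, x1, ..., xk]."""
    rows = [[1.0] + list(point) for point in design]
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    det = 1.0
    for col in range(p):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p):
                m[r][c] -= factor * m[col][c]
    return det


def coordinate_exchange(n_runs, n_factors, levels=(-1.0, 0.0, 1.0),
                        seed=0, max_passes=50):
    """Greedy single-coordinate exchange from one random starting design."""
    rng = random.Random(seed)
    design = [[rng.choice(levels) for _ in range(n_factors)]
              for _ in range(n_runs)]
    best = information_determinant(design)
    history = [best]
    for _ in range(max_passes):
        improved = False
        for i in range(n_runs):
            for j in range(n_factors):
                for level in levels:
                    old = design[i][j]
                    design[i][j] = level
                    value = information_determinant(design)
                    if value > best + 1e-12:
                        best, improved = value, True  # keep the exchange
                    else:
                        design[i][j] = old  # revert the exchange
                history.append(best)
        if not improved:
            break  # a full pass without improvement: local optimum reached
    return design, best, history


design, best, history = coordinate_exchange(n_runs=4, n_factors=2)
print("design:", design)
print("det(X'X):", best)
```

A single greedy run can stop at a local optimum, so practical implementations typically repeat the search from many random starts; a row-wise or swarm-based search differs only in how much of the design is changed per move.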
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
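Shepard's law states that generalization decays exponentially with distance in a psychological similarity space. The sketch below applies that form to two saliency maps. The Euclidean distance on flattened maps, the `scale` parameter, and the toy maps are assumptions made for illustration and do not reproduce the similarity space or fitted parameters of the study.

```python
import math


def shepard_similarity(map_a, map_b, scale=1.0):
    """Similarity exp(-scale * distance) between two flattened saliency maps."""
    if len(map_a) != len(map_b):
        raise ValueError("saliency maps must have the same number of pixels")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(map_a, map_b)))
    return math.exp(-scale * distance)


# Toy 2x2 maps, flattened row by row (hypothetical values).
ai_map = [0.9, 0.1, 0.0, 0.0]
own_map_close = [0.8, 0.2, 0.0, 0.0]  # explanation the person would give
own_map_far = [0.0, 0.0, 0.1, 0.9]    # a very different explanation

print("close:", shepard_similarity(ai_map, own_map_close))
print("far:  ", shepard_similarity(ai_map, own_map_far))
```

Under this reading, an AI explanation that resembles the explanation a person would give yields a similarity near one, and a dissimilar explanation yields a similarity near zero.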
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.