The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the algorithm of choice for practitioners optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps, with candidate centers drawn from the $k$-means++ sampling distribution, to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, which allows multiple centers to be swapped at the same time. Our algorithm achieves a $(9 + \varepsilon)$-approximation ratio, which is the best possible for local search. Importantly, our approach yields substantial practical improvements: we show significant quality gains over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
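As a rough illustration of the ingredients involved, the sketch below implements $k$-means++ $D^2$-sampling and a single sampled-swap local search step in Python; the function names and the single-swap acceptance rule are illustrative simplifications of this style of algorithm, not the multi-swap procedure proposed in the paper.

```python
import numpy as np

def d2_sample_centers(X, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def cost(X, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return np.min(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1).sum()

def one_local_search_step(X, centers, rng):
    """Sample a candidate by D^2, try swapping it with each center, keep the best swap."""
    d2 = np.min(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    cand = X[rng.choice(len(X), p=d2 / d2.sum())]
    best, best_cost = centers, cost(X, centers)
    for i in range(len(centers)):
        trial = centers.copy()
        trial[i] = cand
        c = cost(X, trial)
        if c < best_cost:
            best, best_cost = trial, c
    return best

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
C = d2_sample_centers(X, 5, rng)
for _ in range(20):
    C = one_local_search_step(X, C, rng)
```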
We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour when covariates are independent normal random variables with mean zero and variance $1/n$, and the vector of regression coefficients has length $\gamma \sqrt{n}$, asymptotically. From this foundation, we devise adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results with arbitrary covariate covariance. In the process, we fill gaps in the previous literature by formulating a Lipschitz-smooth approximate message passing recursion, which formally transfers the asymptotic results from approximate message passing to logistic regression. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only always exists but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties, compare it to logistic ridge regression, and demonstrate our theoretical findings with simulations.
This paper presents a randomized algorithm for the problem of single-source shortest paths on directed graphs with real (both positive and negative) edge weights. Given an input graph with $n$ vertices and $m$ edges, the algorithm completes in $\tilde{O}(mn^{8/9})$ time with high probability. For real-weighted graphs, this result constitutes the first asymptotic improvement over the classic $O(mn)$-time algorithm variously attributed to Shimbel, Bellman, Ford, and Moore.
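For context, here is a minimal Python sketch of the classic $O(mn)$ Bellman-Ford baseline referenced above, including negative-cycle detection; it is the reference point the new algorithm improves upon, not the new algorithm itself.

```python
def bellman_ford(n, edges, source):
    """Classic O(mn) single-source shortest paths with real (possibly negative) weights.

    edges: list of (u, v, w) triples with 0 <= u, v < n and real weight w.
    Returns the distance list, or None if a negative cycle is reachable from source.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0.0
    for _ in range(n - 1):            # n-1 relaxation rounds suffice without negative cycles
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    for u, v, w in edges:             # one extra pass detects reachable negative cycles
        if dist[u] + w < dist[v]:
            return None
    return dist
```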
We study the asymptotic eigenvalue distribution of the Slepian spatiospectral concentration problem within subdomains of the $d$-dimensional unit ball $\mathbb{B}^d$. The clustering of the eigenvalues near zero and one is a well-known phenomenon. Here, we provide an analytical investigation of this phenomenon for two different notions of bandlimit: (a) multivariate polynomials, with the maximal polynomial degree determining the bandlimit, (b) basis functions that separate into radial and spherical contributions (expressed in terms of Jacobi polynomials and spherical harmonics, respectively), with separate maximal degrees for the radial and spherical contributions determining the bandlimit. In particular, we investigate the number of relevant non-zero eigenvalues (the so-called Shannon number) and obtain distinct asymptotic results for both notions of bandlimit, characterized by Jacobi weights $W_0$ and a modification $\widetilde{W_0}$, respectively. The analytic results are illustrated by numerical examples on the 3-d ball.
In this paper we present a general theory of $\Pi_{2}$-rules for systems of intuitionistic and modal logic. We introduce the notions of a $\Pi_{2}$-rule system and of an inductive class, and provide model-theoretic and algebraic completeness theorems, which serve as our basic tools. As an illustration of the general theory, we analyse the structure of inductive classes of G\"{o}del algebras from a structure-theoretic and logical point of view. We show that, unlike in other well-studied settings (such as logics, or single-conclusion rule systems), there are continuum many $\Pi_{2}$-rule systems extending $\mathsf{LC}=\mathsf{IPC}+(p\rightarrow q)\vee (q\rightarrow p)$, and show how our methods allow easy proofs of the admissibility of the well-known Takeuti-Titani rule. Our final results concern general questions of admissibility in $\mathsf{LC}$: (1) we present a full classification of those inductive classes which are inductively complete, i.e., where all admissible $\Pi_{2}$-rules are derivable, and (2) we show that the problem of admissibility of $\Pi_{2}$-rules over $\mathsf{LC}$ is decidable.
Dedukti is a Logical Framework based on the $\lambda\Pi$-Calculus Modulo Theory. We show that many theories can be expressed in Dedukti: constructive and classical predicate logic, Simple type theory, programming languages, Pure type systems, the Calculus of inductive constructions with universes, etc., and that this allows Dedukti to be used to check large libraries of proofs developed in other proof systems: Zenon, iProver, FoCaLiZe, HOL Light, and Matita.
The separation of performance metrics from gradient-based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes. The goal is to guide model performance and interpretation by assuming statistical distributions on this performance metric for dynamic weighting. The focus is on van Rijsbergen's $F_{\beta}$ metric -- a popular choice for gauging classification performance. Through distributional assumptions on $F_{\beta}$, an intermediary link can be established to the standard binary cross-entropy via dynamic penalty weights. First, the $F_{\beta}$ metric is reformulated to facilitate assuming statistical distributions, with accompanying proofs for the cumulative distribution function. These probabilities are used within a knee-curve algorithm to find an optimal $\beta$, denoted $\beta_{opt}$. This $\beta_{opt}$ is used as a weight, or penalty, in the proposed weighted binary cross-entropy. Experimentation on publicly available data, along with benchmark analysis, mostly yields better and more interpretable results than the baseline for both imbalanced and balanced classes. For example, for the IMDB text data with known labeling errors, a 14% boost in $F_1$ score is shown. The results also reveal commonalities between the penalty model families derived in this paper and the suitability of recall-centric or precision-centric parameters used in the optimization. The flexibility of this methodology can enhance interpretation.
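To make the quantities concrete, the sketch below computes $F_{\beta}$ from precision and recall and uses a chosen $\beta_{opt}$ as the positive-class penalty weight in a weighted binary cross-entropy. The distributional fitting and knee-curve selection of $\beta_{opt}$ described above are not reproduced here, so the weighting rule shown is only an illustrative stand-in under that assumption.

```python
import numpy as np

def f_beta(y_true, y_pred, beta):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) from hard predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

def weighted_bce(y_true, p_hat, beta_opt, eps=1e-12):
    """Binary cross-entropy with the positive class up-weighted by beta_opt."""
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -np.mean(beta_opt * y_true * np.log(p_hat)
                    + (1 - y_true) * np.log(1 - p_hat))
```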
We extend a certain type of identities on sums of $I$-Bessel functions on lattices, previously given by G. Chinta, J. Jorgenson, A Karlsson and M. Neuhauser. Moreover we prove that, with continuum limit, the transformation formulas of theta functions such as the Dedekind eta function can be given by $I$-Bessel lattice sum identities with characters. We consider analogues of theta functions of lattices coming from linear codes and show that sums of $I$-Bessel functions defined by linear codes can be expressed by complete weight enumerators. We also prove that $I$-Bessel lattice sums appear as solutions of heat equations on general lattices. As a further application, we obtain an explicit solution of the heat equation on $\mathbb{Z}^n$ whose initial condition is given by a linear code.
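As a concrete instance of the heat-equation statement, the heat kernel on the one-dimensional lattice $\mathbb{Z}$ with a point-mass initial condition is commonly written as $e^{-2t} I_x(2t)$. The short numerical check below is offered as an illustration under that standard formula (not the code-based solutions of the paper): it verifies that the kernel satisfies the lattice heat equation $\partial_t u(x,t) = u(x+1,t) + u(x-1,t) - 2u(x,t)$ and conserves total mass.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_nu

def heat_kernel_Z(x, t):
    """Candidate heat kernel on the integer lattice Z: e^{-2t} I_x(2t)."""
    return np.exp(-2.0 * t) * iv(x, 2.0 * t)

t, h = 1.5, 1e-5
xs = np.arange(-30, 31)
u = heat_kernel_Z(xs, t)

# du/dt via central difference versus the lattice Laplacian u(x+1) + u(x-1) - 2u(x)
dudt = (heat_kernel_Z(xs, t + h) - heat_kernel_Z(xs, t - h)) / (2 * h)
lap = heat_kernel_Z(xs + 1, t) + heat_kernel_Z(xs - 1, t) - 2 * u
print(np.max(np.abs(dudt - lap)))   # close to zero: the kernel solves the lattice heat equation
print(u.sum())                      # close to one: total mass is conserved
```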
This paper proposes a new notion of Markov $\alpha$-potential games to study Markov games. Two practically significant classes of Markov games, Markov congestion games and perturbed Markov team games, are analyzed in the framework of Markov $\alpha$-potential games, with an explicit characterization of the upper bound for $\alpha$ and its relation to the game parameters. Moreover, any maximizer of the $\alpha$-potential function is shown to be an $\alpha$-stationary Nash equilibrium of the game. Furthermore, two algorithms, the projected gradient-ascent algorithm and the sequential maximum improvement algorithm, are presented together with their Nash-regret analysis and corroborated by numerical experiments.
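For readers unfamiliar with the primitive behind the first algorithm, the sketch below shows one projected gradient-ascent update for a policy on a probability simplex; the Euclidean simplex projection is a standard sort-based routine, and the gradient used here is a hypothetical placeholder rather than the potential-function construction analyzed in the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def projected_gradient_ascent_step(policy, grad, step_size):
    """One ascent step, followed by projection back onto the simplex."""
    return project_to_simplex(policy + step_size * grad)

# toy usage with a hypothetical gradient vector
policy = np.full(4, 0.25)
grad = np.array([0.3, -0.1, 0.2, 0.0])
policy = projected_gradient_ascent_step(policy, grad, step_size=0.5)
print(policy, policy.sum())   # remains a probability vector
```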
Two Latin squares of order $n$ are $r$-orthogonal if, when superimposed, there are exactly $r$ distinct ordered pairs. The spectrum of all values of $r$ for Latin squares of order $n$ is known. A Latin square $A$ of order $n$ is $r$-self-orthogonal if $A$ and its transpose are $r$-orthogonal. The spectrum of all values of $r$ is known for all orders $n\ne 14$. We develop randomized algorithms for computing pairs of $r$-orthogonal Latin squares of order $n$ and algorithms for computing $r$-self-orthogonal Latin squares of order $n$.
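A hedged Python sketch of the defining check follows: it computes $r$ for a superimposed pair of Latin squares and the $r$-self-orthogonality value of a single square via its transpose. The randomized construction algorithms themselves are not sketched here.

```python
def r_of_pair(A, B):
    """Number of distinct ordered pairs when Latin squares A and B are superimposed."""
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)})

def is_latin(A):
    """Check that every row and every column contains each symbol exactly once."""
    n = len(A)
    syms = set(range(n))
    return (all(set(row) == syms for row in A)
            and all({A[i][j] for i in range(n)} == syms for j in range(n)))

def r_self_orthogonality(A):
    """The r such that A is r-self-orthogonal, i.e. r-orthogonal to its transpose."""
    At = [list(col) for col in zip(*A)]
    return r_of_pair(A, At)

# toy example: the cyclic Latin square of order 3
A = [[(i + j) % 3 for j in range(3)] for i in range(3)]
print(is_latin(A), r_self_orthogonality(A))
```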
We present a method for finding large fixed-size primes of the form $X^2+c$. We study the density of primes in the sets $E_c = \{N(X,c)=X^2+c,\ X \in (2\mathbb{Z}+(c-1))\}$, $c \in \mathbb{N}^*$. We describe an algorithm for generating values of $c$ such that a given prime $p$ is the minimum of the union of the prime divisors of all elements of $E_c$. We also present quadratic forms generating divisors of $E_c$ and study the prime divisors of its elements. This paper uses Dirichlet's theorem on arithmetic progressions [1] and the results of [6] to restate a conjecture of Shanks [2] on the density of primes in $E_c$. Finally, based on these results, we discuss heuristics for the occurrence of large primes in the search set of our algorithm.
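To make the set $E_c$ concrete, the sketch below enumerates elements $X^2 + c$ with $X \in 2\mathbb{Z} + (c-1)$ (so that $X^2 + c$ is odd) and tests them for primality using sympy.isprime, a library choice assumed here. It illustrates the search set only, not the paper's algorithm for choosing $c$ or its heuristics.

```python
from sympy import isprime

def E_c_elements(c, x_max):
    """Elements X^2 + c with X in 2Z + (c - 1), i.e. X and c of opposite parity."""
    start = 0 if c % 2 == 1 else 1          # X even when c is odd, X odd when c is even
    return [x * x + c for x in range(start, x_max, 2)]

def primes_in_E_c(c, x_max):
    return [n for n in E_c_elements(c, x_max) if isprime(n)]

# toy usage: how many primes of the form X^2 + 1 occur for even X below 1000
print(len(primes_in_E_c(1, 1000)), "primes among", len(E_c_elements(1, 1000)), "elements")
```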