Let $(M,g)$ be a Riemannian manifold. If $\mu$ is a probability measure on $M$ given by a continuous density function, one would expect the Fr\'{e}chet means of data samples $Q=(q_1,q_2,\dots, q_N)\in M^N$, with respect to $\mu$, to behave ``generically''; e.g., the probability that the Fr\'{e}chet mean set $\mbox{FM}(Q)$ has any elements lying in a given positive-codimension submanifold should be zero for any $N\geq 1$. Even this simplest instance of genericity does not seem to have been proven in the literature, except in special cases. The main result of this paper is a general, and stronger, genericity property: given i.i.d. absolutely continuous $M$-valued random variables $X_1,\dots, X_N$, and a subset $A\subset M$ of volume-measure zero, $\mbox{Pr}\left\{\mbox{FM}(\{X_1,\dots,X_N\})\subset M\backslash A\right\}=1.$ We also establish a companion theorem for equivariant Fr\'{e}chet means, defined when $(M,g)$ arises as the quotient of a Riemannian manifold $(\widetilde{M},\tilde{g})$ by a free, isometric action of a finite group. The equivariant Fr\'{e}chet means lie in $\widetilde{M}$ but, as we show, project down to the ordinary Fr\'{e}chet sample means and enjoy a similar genericity property. Both theorems are proven as consequences of a purely geometric (and quite general) result that constitutes the core mathematics in this paper: if $A\subset M$ has volume zero in $M$, then the set $\{Q\in M^N : \mbox{FM}(Q) \cap A\neq\emptyset\}$ has volume zero in $M^N$. We conclude the paper with an application to partial scaling-rotation means, a type of mean for symmetric positive-definite matrices.
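For concreteness, the Fr\'{e}chet mean set $\mbox{FM}(Q)$ consists of the minimizers of $m \mapsto \sum_i d(m,q_i)^2$. Below is a minimal numerical sketch on the unit sphere, using the standard Riemannian gradient descent (exp/log map) iteration; the function names and iteration counts are illustrative, not prescribed by the paper.

```python
import numpy as np

def log_map(m, q):
    """Tangent vector at m pointing along the geodesic from m to q (unit sphere)."""
    c = np.clip(m @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(m)
    u = q - c * m
    return theta * u / np.linalg.norm(u)

def exp_map(m, v):
    """Geodesic step from m in tangent direction v (unit sphere)."""
    t = np.linalg.norm(v)
    return m if t < 1e-12 else np.cos(t) * m + np.sin(t) * v / t

def frechet_mean(Q, iters=200):
    """Riemannian gradient descent on F(m) = sum_i d(m, q_i)^2.
    Returns a local minimizer; FM(Q) is the set of global ones."""
    m = Q[0]
    for _ in range(iters):
        m = exp_map(m, np.mean([log_map(m, q) for q in Q], axis=0))
    return m

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 3))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)   # 5 random points on S^2
print(frechet_mean(Q))
```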
A graph $G$ contains a graph $H$ as a pivot-minor if $H$ can be obtained from $G$ by applying a sequence of vertex deletions and edge pivots. Pivot-minors play an important role in the study of rank-width and have so far mainly been studied from a structural perspective. In this paper we perform the first systematic computational complexity study of pivot-minors. We first prove that the Pivot-Minor problem, which asks if a given graph $G$ contains a pivot-minor isomorphic to a given graph $H$, is NP-complete. If $H$ is not part of the input, we denote the problem by $H$-Pivot-Minor. We give a certifying polynomial-time algorithm for $H$-Pivot-Minor when (1) $H$ is an induced subgraph of $P_3+tP_1$ for some integer $t\geq 0$, (2) $H=K_{1,t}$ for some integer $t\geq 1$, or (3) $|V(H)|\leq 4$ except when $H \in \{K_4,C_3+ P_1\}$. Let ${\cal F}_H$ be the set of induced-subgraph-minimal graphs that contain a pivot-minor isomorphic to $H$. To prove the above results, we either show that there is an integer $c_H$ such that all graphs in ${\cal F}_H$ have at most $c_H$ vertices, or we determine ${\cal F}_H$ precisely, for each of the above cases.
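Since everything here revolves around the edge-pivot operation, a small executable sketch may help fix the definition. It assumes the standard description of $G \wedge uv$: complement all edges between the three neighbourhood classes of $u$ and $v$, then swap the two endpoints; the adjacency-set representation is our own illustrative choice.

```python
def pivot(adj, u, v):
    """Edge pivot G ^ uv on a graph given as {vertex: set of neighbours}.

    Partition the other vertices by adjacency to u and v:
    V1 = N(u) & N(v), V2 = N(u) - N(v) - {v}, V3 = N(v) - N(u) - {u};
    complement all edges between distinct classes, then swap u and v.
    """
    assert v in adj[u], "uv must be an edge"
    Nu, Nv = adj[u] - {v}, adj[v] - {u}
    V1, V2, V3 = Nu & Nv, Nu - Nv, Nv - Nu
    new = {x: set(s) for x, s in adj.items()}
    for A, B in ((V1, V2), (V1, V3), (V2, V3)):
        for a in A:
            for b in B:
                if b in new[a]:
                    new[a].remove(b); new[b].remove(a)
                else:
                    new[a].add(b); new[b].add(a)
    swap = lambda x: v if x == u else u if x == v else x
    return {swap(x): {swap(y) for y in s} for x, s in new.items()}

# pivoting an end edge of the path 0-1-2 gives another path (1-0-2):
print(pivot({0: {1}, 1: {0, 2}, 2: {1}}, 0, 1))
```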
A universal partial cycle (or upcycle) for $\mathcal{A}^n$ is a cyclic sequence that covers each word of length $n$ over the alphabet $\mathcal{A}$ exactly once -- like a De Bruijn cycle, except that we also allow a wildcard symbol $\mathord{\diamond}$ that can represent any letter of $\mathcal{A}$. Chen et al. in 2017 and Goeckner et al. in 2018 showed that the existence and structure of upcycles are highly constrained, unlike those of De Bruijn cycles, which exist for every alphabet size and word length. Prior to this work, it was not known whether any upcycles existed for $n \ge 5$. We present several examples of upcycles over both binary and non-binary alphabets for $n = 8$. We generalize two graph-theoretic representations of De Bruijn cycles to upcycles. We then introduce novel approaches to constructing new upcycles from old ones. Notably, given any upcycle for an alphabet of size $a$, we show how to construct an upcycle for an alphabet of size $ak$ for any $k \in \mathbb{N}$, so each example generates an infinite family of upcycles. We also define folds and lifts of upcycles, which relate upcycles with differing densities of $\mathord{\diamond}$ characters. In particular, we show that every upcycle lifts to a De Bruijn cycle. Our constructions rely on a different generalization of De Bruijn cycles known as perfect necklaces, and we introduce several new examples of perfect necklaces. We extend the definitions of certain pseudorandomness properties to partial words and determine which are satisfied by all upcycles, then draw a conclusion about linear feedback shift registers. Finally, we prove new nonexistence results based on the word length $n$, alphabet size, and $\mathord{\diamond}$ density.
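For readers who want to experiment, here is a small checker for the exactly-once covering condition; the function name is ours, and since the paper's $n = 8$ examples are too long to reproduce here, we sanity-check it on a De Bruijn cycle (an upcycle with no wildcards) and on the trivial one-symbol upcycle for $n = 1$.

```python
from itertools import product

def covers_exactly_once(cycle, n, alphabet, wild='*'):
    """Check that a cyclic partial word is an upcycle for alphabet^n:
    every length-n word is covered exactly once, where the wildcard
    symbol matches any letter of the alphabet."""
    L = len(cycle)
    counts = {w: 0 for w in product(alphabet, repeat=n)}
    for i in range(L):
        window = [cycle[(i + j) % L] for j in range(n)]
        choices = [alphabet if c == wild else (c,) for c in window]
        for w in product(*choices):
            counts[w] += 1
    return all(c == 1 for c in counts.values())

print(covers_exactly_once("00010111", 3, "01"))  # a binary De Bruijn cycle: True
print(covers_exactly_once("*", 1, "01"))         # the trivial upcycle for n = 1: True
```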
A $k$-uniform hypergraph $H = (V, E)$ is $k$-partite if $V$ can be partitioned into $k$ sets $V_1, \ldots, V_k$ such that every edge in $E$ contains precisely one vertex from each $V_i$. We call such a hypergraph $n$-balanced if $|V_i| = n$ for each $i$. An independent set $I$ in $H$ is balanced if $|I\cap V_i| = |I|/k$ for each $i$, and a coloring is balanced if each color class induces a balanced independent set in $H$. In this paper, we provide a lower bound on the balanced independence number $\alpha_b(H)$ in terms of the average degree $D = |E|/n$, and an upper bound on the balanced chromatic number $\chi_b(H)$ in terms of the maximum degree $\Delta$. Our results match those of recent work of Chakraborti for $k = 2$.
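The definitions are easy to machine-check; the following small sketch (our own illustration, not from the paper) verifies both the independence and the balancedness conditions on a toy 3-partite 3-uniform hypergraph.

```python
def is_balanced_independent(parts, edges, I):
    """Check the two defining conditions for a set I in a k-partite,
    k-uniform hypergraph with vertex classes `parts` and edge set `edges`:
    no edge lies inside I, and I meets every class in exactly |I|/k vertices."""
    k, I = len(parts), set(I)
    independent = not any(set(e) <= I for e in edges)
    balanced = len(I) % k == 0 and all(len(I & set(P)) == len(I) // k for P in parts)
    return independent and balanced

# a 3-partite 3-uniform example with classes {0,1}, {2,3}, {4,5}:
parts = [{0, 1}, {2, 3}, {4, 5}]
edges = [(0, 2, 4), (1, 3, 5)]
print(is_balanced_independent(parts, edges, {0, 2, 5}))  # True
```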
We study $\mu_5(n)$, the minimum number of convex pentagons induced by $n$ points in the plane in general position. Despite a significant body of research devoted to understanding $\mu_4(n)$, the variant concerning convex quadrilaterals, not much is known about $\mu_5(n)$. We present two explicit constructions, inspired by point placements obtained through a combination of Stochastic Local Search and a program for realizability of point sets, that show $\mu_5(n) \leq \binom{\lfloor n/2 \rfloor}{5} + \binom{\lceil n/2 \rceil}{5}$. Furthermore, we conjecture this bound to be optimal, and provide partial evidence by leveraging a MaxSAT encoding that allows us to verify our conjecture for $n \leq 16$.
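The form of the bound is consistent with a two-cluster construction in which every convex pentagon lies within one half of the points; the conjectured values are straightforward to tabulate:

```python
from math import comb

def mu5_upper_bound(n):
    """Conjectured optimum: binom(floor(n/2), 5) + binom(ceil(n/2), 5)."""
    return comb(n // 2, 5) + comb((n + 1) // 2, 5)

# values for n = 10, ..., 16 (verified optimal up to n = 16 in the paper):
print([mu5_upper_bound(n) for n in range(10, 17)])
```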
Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.
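To make the screening idea concrete, here is a minimal scikit-learn sketch (not the Super Learner implementation used in the paper) in which each candidate learner is preceded by a different screen, so that no single screening algorithm can sink the whole ensemble; dataset and hyperparameters are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=300, n_features=50, n_informative=5, random_state=0)

# each candidate = (screen -> learner); the screens deliberately differ
candidates = [
    ("lasso_screen_ols", make_pipeline(SelectFromModel(LassoCV()), LinearRegression())),
    ("kbest_screen_rf", make_pipeline(SelectKBest(f_regression, k=10),
                                      RandomForestRegressor(n_estimators=100, random_state=0))),
]
ensemble = StackingRegressor(estimators=candidates, final_estimator=LinearRegression())
print(ensemble.fit(X, y).score(X, y))
```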
We consider the problem of maintaining a $(1+\epsilon)\Delta$-edge coloring in a dynamic graph $G$ with $n$ nodes and maximum degree at most $\Delta$. The state-of-the-art update time is $O_\epsilon(\text{polylog}(n))$, by Duan, He and Zhang [SODA'19] and by Christiansen [STOC'23], and more precisely $O(\log^7 n/\epsilon^2)$, provided that $\Delta = \Omega(\log^2 n / \epsilon^2)$. The following natural question arises: What is the best possible update time of an algorithm for this task? More specifically, \textbf{can we bring it all the way down to some constant} (for constant $\epsilon$)? This question coincides with the \emph{static} time barrier for the problem: Even for $(2\Delta-1)$-coloring, there is only a naive $O(m \log \Delta)$-time algorithm. We answer this fundamental question in the affirmative, by presenting a dynamic $(1+\epsilon)\Delta$-edge coloring algorithm with $O(\log^4 (1/\epsilon)/\epsilon^9)$ update time, provided $\Delta = \Omega_\epsilon(\text{polylog}(n))$. As a corollary, we also get the first linear time (for constant $\epsilon$) \emph{static} algorithm for $(1+\epsilon)\Delta$-edge coloring; in particular, we achieve a running time of $O(m \log (1/\epsilon)/\epsilon^2)$. We obtain our results by carefully combining a variant of the \textsc{Nibble} algorithm from Bhattacharya, Grandoni and Wajc [SODA'21] with the subsampling technique of Kulkarni, Liu, Sah, Sawhney and Tarnawski [STOC'22].
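For context, the naive static baseline mentioned above is simply greedy coloring: each edge is adjacent to at most $2(\Delta-1)$ already-colored edges, so some color in $\{1,\dots,2\Delta-1\}$ is always free. A sketch of this baseline (without the $O(\log\Delta)$-time color-lookup structures that give the stated running time):

```python
def greedy_edge_coloring(edges):
    """Naive greedy (2*Delta - 1)-edge coloring: assign each edge the
    smallest color not already used on either endpoint."""
    used = {}          # vertex -> set of colors on incident edges
    color = {}
    for u, v in edges:
        busy = used.setdefault(u, set()) | used.setdefault(v, set())
        c = next(c for c in range(len(busy) + 1) if c not in busy)
        color[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return color

# triangle plus a pendant edge; adjacent edges get distinct colors
print(greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (0, 3)]))
```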
We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence of the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
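A schematic sketch of the damped preconditioned update, assuming the factored form $X \leftarrow X - \eta\,\nabla f(X)\,(X^\top X + \lambda I)^{-1}$ on a symmetric low-rank matrix sensing instance; the step size, damping, and problem sizes below are illustrative, not the tuned choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_true, r_over, m = 30, 2, 5, 600

# ground truth M* = X* X*^T and Gaussian measurements y_i = <A_i, M*>
Xstar = rng.normal(size=(n, r_true))
Mstar = Xstar @ Xstar.T
A = rng.normal(size=(m, n, n))
y = np.einsum('mij,ij->m', A, Mstar)

def grad(X):
    """Gradient of (1/2m) sum_i (<A_i, X X^T> - y_i)^2 with respect to X."""
    R = np.einsum('mij,ij->m', A, X @ X.T) - y        # residuals
    G = np.einsum('m,mij->ij', R, A) / m              # gradient w.r.t. X X^T
    return (G + G.T) @ X

# small random initialization with overparameterized rank r_over > r_true
X = 1e-3 * rng.normal(size=(n, r_over))
eta, lam = 0.25, 1e-2
for _ in range(400):
    P = np.linalg.inv(X.T @ X + lam * np.eye(r_over))  # damped preconditioner
    X = X - eta * grad(X) @ P

# relative recovery error (should be small after convergence)
print(np.linalg.norm(X @ X.T - Mstar) / np.linalg.norm(Mstar))
```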
A reproducing kernel Hilbert $C^*$-module (RKHM) is a generalization of a reproducing kernel Hilbert space (RKHS) by means of $C^*$-algebras, and the Perron-Frobenius operator is a linear operator related to the composition of functions. Combining these two concepts, we present deep RKHM, a deep learning framework for kernel methods. We derive a new Rademacher generalization bound in this setting and provide a theoretical interpretation of benign overfitting by means of Perron-Frobenius operators. By virtue of $C^*$-algebras, the dependency of the bound on the output dimension is milder than existing bounds. We show that $C^*$-algebra is a suitable tool for deep learning with kernels, enabling us to take advantage of the product structure of operators and to provide a clear connection with convolutional neural networks. Our theoretical analysis provides a new lens through which one can design and analyze deep kernel methods.
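One concrete instance of the $C^*$-algebraic setting is a kernel taking values in the algebra of $d \times d$ matrices. The toy kernel below is our own illustration of such an operator-valued Gram computation (a sum of a scalar kernel times the identity and a rank-one feature-map kernel, both positive definite), not the paper's deep construction.

```python
import numpy as np

d = 2  # the C*-algebra here is the 2 x 2 real matrices

def k(x, y):
    """Toy matrix-valued positive-definite kernel on the real line."""
    s = np.exp(-abs(x - y) ** 2)                          # Gaussian scalar part
    return s * np.eye(d) + 0.1 * np.outer([x, 1], [y, 1])  # plus v(x) v(y)^T

X = [0.0, 0.5, 1.0]
G = np.block([[k(x, y) for y in X] for x in X])  # Gram "matrix" of matrices
print(G.shape)  # (6, 6): a 3 x 3 array of 2 x 2 blocks
```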
Given a $d$-dimensional continuous (resp. discrete) probability distribution $\mu$ and a discrete distribution $\nu$, the semi-discrete (resp. discrete) Optimal Transport (OT) problem asks for computing a minimum-cost plan to transport mass from $\mu$ to $\nu$; we assume $n$ to be the size of the support of the discrete distributions, and we assume we have access to an oracle outputting the mass of $\mu$ inside a constant-complexity region in $O(1)$ time. In this paper, we present three approximation algorithms for the OT problem. (i) Semi-discrete additive approximation: For any $\epsilon>0$, we present an algorithm that computes a semi-discrete transport plan with $\epsilon$-additive error in $n^{O(d)}\log\frac{C_{\max}}{\epsilon}$ time; here, $C_{\max}$ is the diameter of the supports of $\mu$ and $\nu$. (ii) Semi-discrete relative approximation: For any $\epsilon>0$, we present an algorithm that computes a $(1+\epsilon)$-approximate semi-discrete transport plan in $n\epsilon^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume the ground distance is any $L_p$ norm. (iii) Discrete relative approximation: For any $\epsilon>0$, we present a Monte Carlo $(1+\epsilon)$-approximation algorithm that computes a transport plan under any $L_p$ norm in $n\epsilon^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume that the spread of the supports of $\mu$ and $\nu$ is polynomially bounded.
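As a reference point for the discrete case, the exact problem is a small linear program (far slower than the near-linear approximation algorithms above); a sketch with SciPy, where the uniform marginals and point counts are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d = 6, 2
supp_mu, supp_nu = rng.random((n, d)), rng.random((n, d))
mu = np.full(n, 1 / n)
nu = np.full(n, 1 / n)

# cost matrix under the L_2 ground distance
C = np.linalg.norm(supp_mu[:, None, :] - supp_nu[None, :, :], ord=2, axis=2)

# variables tau[i, j] >= 0 with row sums mu and column sums nu
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1   # row-sum constraint for supply i
    A_eq[n + i, i::n] = 1            # column-sum constraint for demand i
b_eq = np.concatenate([mu, nu])
res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("exact OT cost:", res.fun)
```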
Quantifying the difference between two probability density functions, $p$ and $q$, using available data, is a fundamental problem in Statistics and Machine Learning. A usual approach for addressing this problem is the likelihood-ratio estimation (LRE) between $p$ and $q$, which -- to the best of our knowledge -- has been investigated mainly for the offline case. This paper contributes by introducing a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time. The non-parametric nature of our approach has the advantage of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the recent advances in Kernel Methods and functional minimization to develop an estimator that can be efficiently updated online. We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
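To illustrate the online setting, here is a toy stochastic-gradient estimator for $p/q$ using random Fourier features and a least-squares (uLSIF-style) objective; this is a simplified stand-in, not the paper's OLRE estimator, and all constants are illustrative.

```python
import numpy as np

# Online density-ratio estimation: minimize E_q[f^2]/2 - E_p[f] + (lam/2)||w||^2,
# whose population minimizer is f = p/q, by SGD on streaming pairs (x_t, x'_t).
rng = np.random.default_rng(0)
D, lam, step = 200, 1e-3, 0.05
W = rng.normal(size=D)                   # random frequencies (1-d data)
b = rng.uniform(0, 2 * np.pi, size=D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(W * x + b)   # random Fourier features

w = np.zeros(D)
for t in range(1, 20001):
    x_p = rng.normal(0.5, 1.0)   # x_t  ~ p = N(0.5, 1)
    x_q = rng.normal(0.0, 1.0)   # x'_t ~ q = N(0, 1)
    g = (w @ phi(x_q)) * phi(x_q) - phi(x_p) + lam * w  # stochastic gradient
    w -= (step / np.sqrt(t)) * g

# true ratio r(x) = exp((x - 0.25)/2), so r(0) = exp(-1/8) ~ 0.88
print("estimated r(0):", w @ phi(0.0))
```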