Natural data observed in $\mathbb{R}^n$ is often constrained to an $m$-dimensional manifold $\mathcal{M}$, where $m < n$. This work focuses on the task of building theoretically principled generative models for such data. Current generative models learn $\mathcal{M}$ by mapping an $m$-dimensional latent variable through a neural network $f_\theta: \mathbb{R}^m \to \mathbb{R}^n$. These procedures, which we call pushforward models, share a straightforward limitation: manifolds cannot in general be represented with a single parameterization, so attempting to do so leads either to computational instability or to an inability to learn probability densities within the manifold. To remedy this problem, we propose to model $\mathcal{M}$ as a neural implicit manifold: the set of zeros of a neural network. We then learn the probability density within $\mathcal{M}$ with a constrained energy-based model, which employs a constrained variant of Langevin dynamics for both training and sampling on the learned manifold. In experiments on synthetic and natural data, we show that our model can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
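To make the constrained sampling step concrete, here is a minimal toy sketch (ours) of Langevin dynamics restricted to an implicit manifold $\{x : F(x) = 0\}$. The circle constraint, the arbitrary energy, and the naive re-projection routine are all illustrative stand-ins for the paper's learned networks and its actual constrained sampler.

```python
import numpy as np

# Toy sketch: Langevin dynamics constrained to M = {x : F(x) = 0}.
# Here F cuts out the unit circle in R^2 and E(x) = x_0 is an arbitrary
# energy; in the paper both F and the energy are neural networks.

def F(x):                                   # implicit constraint; zero on M
    return np.array([x @ x - 1.0])

def jac_F(x):                               # Jacobian of F, shape (1, 2)
    return 2.0 * x[None, :]

def grad_energy(x):                         # gradient of the energy E(x) = x_0
    return np.array([1.0, 0.0])

def project(x, steps=25, lr=0.2):
    """Pull x back onto {F = 0} by gradient descent on ||F(x)||^2 / 2."""
    for _ in range(steps):
        x = x - lr * (jac_F(x).T @ F(x))
    return x

def constrained_langevin_step(x, rng, step=1e-2):
    noise = np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    x = x - step * grad_energy(x) + noise   # ambient Langevin proposal
    return project(x)                       # re-project onto the manifold

rng = np.random.default_rng(0)
x = project(np.array([1.0, 1.0]))
samples = []
for _ in range(2000):
    x = constrained_langevin_step(x, rng)
    samples.append(x)                       # samples concentrate where E is low
```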
Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from sample complexity exponential in $k$, require solving computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form "does the label belong to a subset $T \subseteq [k]$?", e.g., "is this an image of an animal?". It ensures that the probability the predictor assigns to such an event, obtained by summing the probabilities assigned to labels in $T$, is close to that of some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are hard to achieve: they run into information-theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
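To convey the shape of the downstream guarantee, here is a schematic rendering; the notation, in particular the induced binary predictor $p_T$ and the $L_1$ closeness measure, is our own shorthand and not necessarily the paper's exact quantitative definition.

```latex
% Schematic only (our notation). For a predictor p : X -> Delta([k]) and
% a subset T of [k], the induced binary predictor for the event "y in T"
% sums the probabilities assigned to the labels in T. Projected smooth
% calibration (roughly) requires each such p_T to be close to some
% perfectly calibrated binary predictor q_T for that event:
\[
  \forall\, T \subseteq [k] \;\;
  \exists\, q_T \text{ perfectly calibrated for } \mathbf{1}[y \in T]:
  \quad \mathbb{E}\bigl[\lvert p_T(x) - q_T(x) \rvert\bigr] \le \varepsilon,
  \qquad \text{where } p_T(x) = \sum_{i \in T} p_i(x).
\]
```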
A 2-packing set for an undirected graph $G=(V,E)$ is a subset $\mathcal{S} \subset V$ such that any two vertices $v_1,v_2 \in \mathcal{S}$ have no common neighbors. Finding a 2-packing set of maximum cardinality is an NP-hard problem. We develop a new approach to solve this problem on arbitrary graphs, exploiting its close relation to the independent set problem. To this end, our algorithm red2pack uses new data reduction rules specific to the 2-packing set problem as well as a graph transformation. Our experiments show that we outperform the state-of-the-art for arbitrary graphs with respect to solution quality and can also compute solutions multiple orders of magnitude faster than previously possible. For example, we solve 63% of the graphs in the tested data set to optimality in less than a second, while the competitor for arbitrary graphs can only solve 5% of these graphs to optimality even with a 10-hour time limit. Moreover, our approach can solve a wide range of large instances that had previously been unsolved.
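As an illustration of the close relation to independent sets that the approach exploits, here is a minimal sketch (ours) of the underlying graph transformation, using the distance formulation of 2-packing (pairwise distance greater than 2); the paper's red2pack additionally applies problem-specific data reduction rules, which are not shown. The `square_graph` and `is_2_packing` helpers are ours.

```python
import itertools
import networkx as nx

# A 2-packing set of G (pairwise distance > 2) is exactly an independent
# set in the square graph G^2, where u ~ v iff dist_G(u, v) <= 2.

def square_graph(G: nx.Graph) -> nx.Graph:
    """Build G^2: connect u and v whenever their distance in G is at most 2."""
    G2 = nx.Graph()
    G2.add_nodes_from(G)
    for u in G:
        reach = set(G[u])                 # distance-1 vertices
        for w in G[u]:
            reach.update(G[w])            # distance-2 vertices
        reach.discard(u)
        G2.add_edges_from((u, v) for v in reach)
    return G2

def is_2_packing(G: nx.Graph, S) -> bool:
    """No two chosen vertices are adjacent or share a common neighbor."""
    return all(not G.has_edge(u, v) and not (set(G[u]) & set(G[v]))
               for u, v in itertools.combinations(S, 2))

# Usage: any independent set of G^2 is a 2-packing set of G.
G = nx.path_graph(7)                      # 0-1-2-3-4-5-6
S = nx.maximal_independent_set(square_graph(G), seed=0)
assert is_2_packing(G, S)
```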
We study the existence of finite characterisations for modal formulas. A finite characterisation of a modal formula $\varphi$ is a finite collection of positive and negative examples that distinguishes $\varphi$ from every other, non-equivalent modal formula, where an example is a finite pointed Kripke structure. This definition can be restricted to specific frame classes and to fragments of the modal language: a modal fragment $L$ admits finite characterisations with respect to a frame class $F$ if every formula $\varphi\in L$ has a finite characterisation with respect to $L$ consisting of examples that are based on frames in $F$. Finite characterisations are useful for illustration, interactive specification, and debugging of formal specifications, and their existence is a precondition for exact learnability with membership queries. We show that the full modal language admits finite characterisations with respect to a frame class $F$ only when the modal logic of $F$ is locally tabular. We then study which modal fragments, freely generated by some set of connectives, admit finite characterisations. Our main result is that the positive modal language without the truth constants $\top$ and $\bot$ admits finite characterisations w.r.t. the class of all frames. This result is essentially optimal: finite characterisability fails when the language is extended with the truth constant $\top$ or $\bot$, or with all but very limited forms of negation.
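To illustrate the example format, the pointed structures below (ours) show what positive and negative examples look like for one concrete formula; they do not by themselves form a finite characterisation, which must rule out every non-equivalent formula of the fragment.

```latex
% Illustration only (ours), for the formula \varphi = \Diamond p:
%   a positive example: the pointed structure (M_1, w_1) with
%     w_1 \to v,  v \models p          (so  M_1, w_1 \models \Diamond p)
%   a negative example: the pointed structure (M_2, w_2) where
%     w_2 has no successors            (so  M_2, w_2 \not\models \Diamond p)
% A finite characterisation of \varphi must contain enough such examples
% to distinguish it from every non-equivalent formula of the fragment.
```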
Given two $n$-element structures, $\mathcal{A}$ and $\mathcal{B}$, which can be distinguished by a sentence of $k$-variable first-order logic ($\mathcal{L}^k$), what is the minimum $f(n)$ such that there is guaranteed to be a sentence $\phi \in \mathcal{L}^k$ with at most $f(n)$ quantifiers such that $\mathcal{A} \models \phi$ but $\mathcal{B} \not\models \phi$? We present various results related to this question, obtained using the recently introduced QVT games. In particular, we show that when the number of variables is limited, there can be an exponential gap between the quantifier depth and the quantifier number needed to separate two structures. Through the lens of this question, we highlight some difficulties that arise in analysing the QVT game and some techniques that can help to overcome them. As a consequence, we show that $\mathcal{L}^{k+1}$ is exponentially more succinct than $\mathcal{L}^{k}$. We also show, in the setting of the existential-positive fragment, how to lift quantifier depth lower bounds to quantifier number lower bounds. This leads to almost tight bounds.
(I) We revisit the algorithmic problem of finding all triangles in a graph $G=(V,E)$ with $n$ vertices and $m$ edges. According to a result of Chiba and Nishizeki (1985), this task can be achieved by a combinatorial algorithm running in $O(m \alpha) = O(m^{3/2})$ time, where $\alpha= \alpha(G)$ is the graph arboricity. We provide a new, very simple combinatorial algorithm for finding all triangles in a graph and show that it is amenable to the same running time analysis. We derive these worst-case bounds from first principles and with very simple proofs that do not rely on classic results due to Nash-Williams from the 1960s. (II) We extend our arguments to the problem of finding all small complete subgraphs of a given fixed size. We show that the dependency on $m$ and $\alpha$ in the running time $O(\alpha^{\ell-2} \cdot m)$ of the algorithm of Chiba and Nishizeki for listing all copies of $K_\ell$, where $\ell \geq 3$, is asymptotically tight. (III) We give improved arboricity-sensitive running times for counting and/or detection of copies of $K_\ell$, for small $\ell \geq 4$. A key ingredient in our algorithms is, once again, the algorithm of Chiba and Nishizeki. Our new algorithms are faster than all previous algorithms in certain high-range arboricity intervals for every $\ell \geq 7$.
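For illustration, here is a standard degree-ordering triangle-listing routine in the same $O(\alpha \cdot m)$ spirit; it is a sketch of the general technique, not necessarily the paper's new algorithm.

```python
# Degree-ordering triangle listing: orient each edge from lower to
# higher rank (by degree), then scan common out-neighbors, so each
# triangle is reported exactly once.

def list_triangles(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:     # common higher-ranked neighbors
                triangles.append((u, v, w))
    return triangles

# Usage: the complete graph K_4 contains exactly four triangles.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
assert len(list_triangles(K4)) == 4
```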
We show that the principal types of the closed terms of the affine fragment of the $\lambda$-calculus, with respect to a simple type discipline, are structurally isomorphic to their interpretations, as partial involutions, in a natural Geometry of Interaction model \`a la Abramsky. This makes it possible to explain in elementary terms the somewhat awkward notion of linear application arising in Geometry of Interaction, simply as resolution between principal types using an alternate unification algorithm. As a consequence, we provide an answer, for the purely affine fragment, to the open problem raised by Abramsky of characterising those partial involutions which are denotations of combinatory terms.
We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\mathrm{X}^\star \in \mathbb{R}^{d\times d}$ of rank $r$, parameterized as $\mathrm{U}\mathrm{U}^{\top}$, from only a subset of its observed entries. We show that vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\mathrm{X}^\star$ without requiring any explicit regularization. This convergence result holds even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. Existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point satisfying $\|\mathrm{U}_0\| \leq \epsilon$ converges near-linearly to an $\epsilon$-neighborhood of $\mathrm{X}^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$; both are governed only by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further strengthen this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|\mathrm{U}_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.
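A toy numerical sketch of the setting may help; the dimensions, sampling rate, step size, and iteration count below are our arbitrary choices, and the code illustrates only the vanilla GD procedure the abstract analyzes, not its proof.

```python
import numpy as np

# Vanilla GD on f(U) = ||P_Omega(U U^T - X*)||_F^2 / 2 with a small
# random initialization and an over-estimated search rank r' > r.

rng = np.random.default_rng(0)
d, r, r_prime = 50, 2, 5

V = rng.standard_normal((d, r))
X_star = V @ V.T                           # ground-truth PSD matrix of rank r
mask = rng.random((d, d)) < 0.5
Omega = mask | mask.T                      # symmetric set of observed entries

U = 1e-6 * rng.standard_normal((d, r_prime))   # small initialization
eta = 2e-3                                     # step size
for _ in range(20000):
    R = Omega * (U @ U.T - X_star)         # residual on observed entries
    U -= eta * (R + R.T) @ U               # gradient step, no regularization

print(np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```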
A nearest neighbor representation of a Boolean function $f$ is a set of vectors (anchors) labeled by $0$ or $1$ such that $f(\vec{x}) = 1$ if and only if the closest anchor to $\vec{x}$ is labeled by $1$. This model was introduced by Hajnal, Liu, and Tur\'an (2022), who studied bounds on the number of anchors required to represent Boolean functions under different choices of anchors (real vs. Boolean vectors) as well as the more expressive model of $k$-nearest neighbors. We initiate the study of the representational power of nearest and $k$-nearest neighbors through Boolean circuit complexity. To this end, we establish a connection between Boolean functions with polynomial nearest neighbor complexity and those that can be efficiently represented by classes based on linear inequalities -- min-plus polynomial threshold functions -- previously studied in relation to threshold circuits. This extends an observation of Hajnal et al. (2022). We obtain exponential lower bounds on the $k$-nearest neighbors complexity of explicit $n$-variate functions, assuming $k \leq n^{1-\epsilon}$. Previously, no superlinear lower bound was known for any $k>1$. Next, we further extend the connection between nearest neighbor representations and circuits to the $k$-nearest neighbors case. As a result, we show that proving superpolynomial lower bounds for the $k$-nearest neighbors complexity of an explicit function for arbitrary $k$ would require a breakthrough in circuit complexity. In addition, we prove an exponential separation between the nearest neighbor and $k$-nearest neighbors complexity (for unrestricted $k$) of an explicit function. These results address questions raised by Hajnal et al. (2022) of proving strong lower bounds for $k$-nearest neighbors and understanding the role of the parameter $k$. Finally, we devise new bounds on the nearest neighbor complexity for several explicit functions.
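For concreteness, here is a small sketch (ours) of the $k$-nearest neighbors representation model, together with the classic two-anchor representation of MAJORITY.

```python
import numpy as np
from itertools import product

# Anchors are labeled vectors, and f(x) is the majority label among the
# k anchors closest to x; k = 1 is the plain nearest-neighbor model.

def knn_represented_function(anchors, labels, k=1):
    anchors = np.asarray(anchors, dtype=float)
    labels = np.asarray(labels)
    def f(x):
        dists = np.linalg.norm(anchors - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]            # k closest anchors
        return int(2 * labels[nearest].sum() > k)  # majority of their labels
    return f

# Classic example: MAJORITY on an odd number of bits needs only two
# anchors, the all-ones point (label 1) and the all-zeros point (label 0).
n = 5
maj = knn_represented_function([np.ones(n), np.zeros(n)], [1, 0], k=1)
assert all(maj(x) == int(sum(x) > n // 2) for x in product([0, 1], repeat=n))
```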
We prove the completeness of a first-order analogue of the Fischer Servi logic $\mathsf{FS}$ with respect to its expected birelational semantics. To this end we introduce the notion of the \emph{trace model} and, much like in a canonical model argument, prove a truth lemma. We conclude by examining a number of other first-order Fischer Servi logics, including the first-order analogue of $\mathsf{FSS4}$, whose completeness can be proved similarly.
Cai and Hemachandra used iterative constant-setting to prove that Few $\subseteq$ $\oplus$P (and thus that FewP $\subseteq$ $\oplus$P). In this paper, we note that there is a tension between the nondeterministic ambiguity of the class one is seeking to capture, and the density (or, to be more precise, the needed "nongappy"-ness) of the easy-to-find "targets" used in iterative constant-setting. In particular, we show that even less restrictive gap-size upper bounds regarding the targets allow one to capture ambiguity-limited classes. Through a flexible, metatheorem-based approach, we do so for a wide range of classes including the logarithmic-ambiguity version of Valiant's unambiguous nondeterminism class UP. Our work lowers the bar for what advances regarding the existence of infinite, P-printable sets of primes would suffice to show that restricted counting classes based on the primes have the power to accept superconstant-ambiguity analogues of UP. As an application of our work, we prove that the Lenstra-Pomerance-Wagstaff Conjecture implies that all $(O(1) + \log\log n)$-ambiguity NP sets are in the restricted counting class $\mathrm{RC}_{\mathrm{PRIMES}}$.