Awodey, later with Newstead, showed how polynomial pseudomonads $(u,1,\Sigma)$ with extra structure (termed "natural models" by Awodey) hold within them the categorical semantics for dependent type theory. Their work presented these ideas clearly but ultimately led them outside of the category of polynomial functors in order to explain all of the structure possessed by such models of type theory. This paper builds off that work -- explicating the categorical semantics of dependent type theory by axiomatizing them \emph{entirely} in the language of polynomial functors. In order to handle the higher-categorical coherences required for such an explanation, we work with polynomial functors internally in the language of Homotopy Type Theory, which allows for higher-dimensional structures such as pseudomonads, etc. to be expressed purely in terms of the structure of a suitably-chosen $\infty$-category of polynomial functors. The move from set theory to Homotopy Type Theory thus has a twofold effect of enabling a simpler exposition of natural models, which is at the same time amenable to formalization in a proof assistant, such as Agda. Moreover, the choice to remain firmly within the setting of polynomial functors reveals many additional structures of natural models that were otherwise left implicit or not considered by Awodey \& Newstead. Chief among these, we highlight the fact that every polynomial pseudomonad $(u,1,\Sigma)$ as above that is also equipped with structure to interpret dependent product types gives rise to a self-distributive law $u \triangleleft u\to u \triangleleft u$, which witnesses the usual distributive law of dependent products over dependent sums.
We propose a scaling law hypothesis for multimodal models processing text, audio, images, and video within a shared token and embedding space. Our framework predicts model performance based on modality-specific compression and tokenization efficiency, extending established scaling laws from text-based decoder models to mixed-modality systems. We explore whether leveraging more training data in multiple modalities can reduce the size of the multimodal model, enabling efficient deployment on resource-constrained devices.
We consider metrical task systems on general metric spaces with $n$ points, and show that any fully randomized algorithm can be turned into a randomized algorithm that uses only $2\log n$ random bits, and achieves the same competitive ratio up to a factor $2$. This provides the first order-optimal barely random algorithms for metrical task systems, i.e., which use a number of random bits that does not depend on the number of requests addressed to the system. We discuss implications on various aspects of online decision-making such as: distributed systems, advice complexity, and transaction costs, suggesting broad applicability. We put forward an equivalent view that we call collective metrical task systems where $k$ agents in a metrical task system team up, and suffer the average cost paid by each agent. Our results imply that such a team can be $O(\log^2 n)$-competitive as soon as $k\geq n^2$. In comparison, a single agent is always $\Omega(n)$-competitive.
Multivariate random effects with unstructured variance-covariance matrices of large dimensions, $q$, can be a major challenge to estimate. In this paper, we introduce a new implementation of a reduced-rank approach to fit large dimensional multivariate random effects by writing them as a linear combination of $d < q$ latent variables. By adding reduced-rank functionality to the package glmmTMB, we enhance the mixed models available to include random effects of dimensions that were previously not possible. We apply the reduced-rank random effect to two examples, estimating a generalized latent variable model for multivariate abundance data and a random-slopes model.
We treat three cubic recurrences, two of which generalize the famous iterated map $x \mapsto x (1-x)$ from discrete chaos theory. A feature of each asymptotic series developed here is a constant, dependent on the initial condition but otherwise intrinsic to the function at hand.
We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has its second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an independent set of size $(1/2-\epsilon)n$. Our second result above can be refined to require only a weaker vertex expansion property with an efficient certificate. In a surprising contrast to our algorithmic result, we observe that the analogous task of finding a linear-sized independent set in almost $4$-colorable one-sided expanders (even when the second eigenvalue is $o_n(1)$) is NP-hard, assuming the Unique Games Conjecture. All prior algorithms that beat the worst-case guarantees for this problem rely on bottom eigenspace enumeration techniques (following the classical spectral methods of Alon and Kahale) and require two-sided expansion, meaning a bounded number of negative eigenvalues of magnitude $\Omega(1)$. Such techniques naturally extend to almost $k$-colorable graphs for any constant $k$, in contrast to analogous guarantees on one-sided expanders, which are Unique Games-hard to achieve for $k \geq 4$. Our rounding builds on the method of simulating multiple samples from a pseudo-distribution introduced by Bafna et. al. for rounding Unique Games instances. The key to our analysis is a new clustering property of large independent sets in expanding graphs - every large independent set has a larger-than-expected intersection with some member of a small list - and its formalization in the low-degree sum-of-squares proof system.
The seminal work of Linial, Mansour, and Nisan gave a quasipolynomial-time algorithm for learning constant-depth circuits ($\mathsf{AC}^0$) with respect to the uniform distribution on the hypercube. Extending their algorithm to the setting of malicious noise, where both covariates and labels can be adversarially corrupted, has remained open. Here we achieve such a result, inspired by recent work on learning with distribution shift. Our running time essentially matches their algorithm, which is known to be optimal assuming various cryptographic primitives. Our proof uses a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits. We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model (i.e., contamination or so-called "nasty noise").
Game-theoretic characterizations of process equivalences traditionally form a central topic in concurrency; for example, most equivalences on the classical linear-time / branching-time spectrum come with such characterizations. Recent work on so-called graded semantics has led to a generic behavioural equivalence game that covers the mentioned games on the linear-time~/ branching-time spectrum and moreover applies in coalgebraic generality, and thus instantiates also to equivalence games on systems with non-relational branching type (probabilistic, weighted, game-based etc.). In the present work, we generalize this approach to cover other types of process comparison beyond equivalence, such as behavioural preorders or pseudometrics. At the most general level, we abstract such notions of behavoiural conformance in terms of topological categories, and later specialize to conformances presented as relational structures to obtain a concrete syntax. We obtain a sound and complete generic game for behavioural conformances in this sense. We present a number of instantiations, obtaining game characterizations of, e.g., trace inclusion, probabilistic trace distance, bisimulation topologies, and simulation distances on metric labelled transition systems.
Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether achieving the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound on sliding windows is possible has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of \dsfd across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.
Geuvers and Jacobs (LMCS 2021) formulated the notion of apartness relation on state-based systems modelled as coalgebras. In this context apartness is formally dual to bisimilarity, and gives an explicit proof system for showing that certain states are not bisimilar. In the current paper, we relate apartness to another classical element of the theory of behavioural equivalences: that of turn-based two-player games. Studying both strong and branching bisimilarity, we show that winning configurations for the Spoiler player correspond to apartness proofs, for transition systems that are image-finite (in the case of strong bisimilarity) and finite (in the case of branching bisimilarity).
Let $\mathcal{D}$ be a set family that is the solution domain of some combinatorial problem. The \emph{max-min diversification problem on $\mathcal{D}$} is the problem to select $k$ sets from $\mathcal{D}$ such that the Hamming distance between any two selected sets is at least $d$. FPT algorithms parameterized by $k,l:=\max_{D\in \mathcal{D}}|D|$ and $k,d$ have been actively studied recently for several specific domains. This paper provides unified algorithmic frameworks to solve this problem. Specifically, for each parameterization $k,l$ and $k,d$, we provide an FPT oracle algorithm for the max-min diversification problem using oracles related to $\mathcal{D}$. We then demonstrate that our frameworks generalize most of the existing domain-specific tractability results and provide the first FPT algorithms for several domains. Our main technical breakthrough is introducing the notion of \emph{max-distance sparsifier} of $\mathcal{D}$, a domain on which the max-min diversification problem is equivalent to the same problem on the original domain $\mathcal{D}$. The core of our framework is to design FPT oracle algorithms that construct a constant-size max-distance sparsifier of $\mathcal{D}$. Using max-distance sparsifiers, we provide FPT algorithms for the max-min and max-sum diversification problems on $\mathcal{D}$, as well as $k$-center and $k$-sum-of-radii clustering problems on $\mathcal{D}$, which are also natural problems in the context of diversification and have their own interests.