This paper introduces AL$\ell_0$CORE, a new form of probabilistic non-negative tensor decomposition. AL$\ell_0$CORE is a Tucker decomposition where the number of non-zero elements (i.e., the $\ell_0$-norm) of the core tensor is constrained to a preset value $Q$ much smaller than the size of the core. While the user dictates the total budget $Q$, the locations and values of the non-zero elements are latent variables and allocated across the core tensor during inference. AL$\ell_0$CORE -- i.e., $allo$cated $\ell_0$-$co$nstrained $core$-- thus enjoys both the computational tractability of CP decomposition and the qualitatively appealing latent structure of Tucker. In a suite of real-data experiments, we demonstrate that AL$\ell_0$CORE typically requires only tiny fractions (e.g.,~1%) of the full core to achieve the same results as full Tucker decomposition at only a correspondingly tiny fraction of the cost.
We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and l2-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator
MAG$\pi$ is a Multiparty, Asynchronous and Generalised $\pi$-calculus that introduces timeouts into session types as a means of reasoning about failure-prone communication. Its type system guarantees that all possible message-loss is handled by timeout branches. In this work, we argue that the previous is unnecessarily strict. We present MAG$\pi$!, an extension serving as the first introduction of replication into Multiparty Session Types (MPST). Replication is a standard $\pi$-calculus construct used to model infinitely available servers. We lift this construct to type-level, and show that it simplifies specification of distributed client-server interactions. We prove properties relevant to generalised MPST: subject reduction, session fidelity and process property verification.
We give a poly-time algorithm for the $k$-edge-connected spanning subgraph ($k$-ECSS) problem that returns a solution of cost no greater than the cheapest $(k+10)$-ECSS on the same graph. Our approach enhances the iterative relaxation framework with a new ingredient, which we call ghost values, that allows for high sparsity in intermediate problems. Our guarantees improve upon the best-known approximation factor of $2$ for $k$-ECSS whenever the optimal value of $(k+10)$-ECSS is close to that of $k$-ECSS. This is a property that holds for the closely related problem $k$-edge-connected spanning multi-subgraph ($k$-ECSM), which is identical to $k$-ECSS except edges can be selected multiple times at the same cost. As a consequence, we obtain a $\left(1+O\left(\frac{1}{k}\right)\right)$-approximation algorithm for $k$-ECSM, which resolves a conjecture of Pritchard and improves upon a recent $\left(1+O\left(\frac{1}{\sqrt{k}}\right)\right)$-approximation algorithm of Karlin, Klein, Oveis Gharan, and Zhang. Moreover, we present a matching lower bound for $k$-ECSM, showing that our approximation ratio is tight up to the constant factor in $O\left(\frac{1}{k}\right)$, unless $P=NP$.
In biological tasks, data is rarely plentiful as it is generated from hard-to-gather measurements. Therefore, pre-training foundation models on large quantities of available data and then transfer to low-data downstream tasks is a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on models with large parameter capacities. In this work, we propose $\texttt{MiniMol}$, a foundational model for molecular learning with 10 million parameters. $\texttt{MiniMol}$ is pre-trained on a mix of roughly 3300 sparsely defined graph- and node-level tasks of both quantum and biological nature. The pre-training dataset includes approximately 6 million molecules and 500 million labels. To demonstrate the generalizability of $\texttt{MiniMol}$ across tasks, we evaluate it on downstream tasks from the Therapeutic Data Commons (TDC) ADMET group showing significant improvements over the prior state-of-the-art foundation model across 17 tasks. $\texttt{MiniMol}$ will be a public and open-sourced model for future research.
A graph is called $\alpha_i$-metric ($i \in {\cal N}$) if it satisfies the following $\alpha_i$-metric property for every vertices $u, w, v$ and $x$: if a shortest path between $u$ and $w$ and a shortest path between $x$ and $v$ share a terminal edge $vw$, then $d(u,x) \ge d(u,v) + d(v,x) - i$. The latter is a discrete relaxation of the property that in Euclidean spaces the union of two geodesics sharing a terminal segment must be also a geodesic. Recently in (Dragan & Ducoffe, WG'23) we initiated the study of the algorithmic applications of $\alpha_i$-metric graphs. Our results in this prior work were very similar to those established in (Chepoi et al., SoCG'08) and (Chepoi et al., COCOA'18) for graphs with bounded hyperbolicity. The latter is a heavily studied metric tree-likeness parameter first introduced by Gromov. In this paper, we clarify the relationship between hyperbolicity and the $\alpha_i$-metric property, proving that $\alpha_i$-metric graphs are $f(i)$-hyperbolic for some function $f$ linear in $i$. We give different proofs of this result, using various equivalent definitions to graph hyperbolicity. By contrast, we give simple constructions of $1$-hyperbolic graphs that are not $\alpha_i$-metric for any constant $i$. Finally, in the special case of $i=1$, we prove that $\alpha_1$-metric graphs are $1$-hyperbolic, and the bound is sharp. By doing so, we can answer some questions left open in (Dragan & Ducoffe, WG'23).
An $\mathsf{F}_{d}$ upper bound for the reachability problem in vector addition systems with states (VASS) in fixed dimension is given, where $\mathsf{F}_d$ is the $d$-th level of the Grzegorczyk hierarchy of complexity classes. The new algorithm combines the idea of the linear path scheme characterization of the reachability in the $2$-dimension VASSes with the general decomposition algorithm by Mayr, Kosaraju and Lambert. The result improves the $\mathsf{F}_{d + 4}$ upper bound due to Leroux and Schmitz (LICS 2019).
No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.
For a graph $G$, a $D$-diameter-reducing exact hopset is a small set of additional edges $H$ that, when added to $G$, maintains its graph metric but guarantees that all node pairs have a shortest path in $G \cup H$ using at most $D$ edges. A shortcut set is the analogous concept for reachability. These objects have been studied since the early '90s due to applications in parallel, distributed, dynamic, and streaming graph algorithms. For most of their history, the state-of-the-art construction for either object was a simple folklore algorithm, based on randomly sampling nodes to hit long paths in the graph. However, recent breakthroughs of Kogan and Parter [SODA '22] and Bernstein and Wein [SODA '23] have finally improved over the folklore diameter bound of $\widetilde{O}(n^{1/2})$ for shortcut sets and for $(1+\epsilon)$-approximate hopsets. For both objects it is now known that one can use $O(n)$ hop-edges to reduce diameter to $\widetilde{O}(n^{1/3})$. The only setting where folklore sampling remains unimproved is for exact hopsets. Can these improvements be continued? We settle this question negatively by constructing graphs on which any exact hopset of $O(n)$ edges has diameter $\widetilde{\Omega}(n^{1/2})$. This improves on the previous lower bound of $\widetilde{\Omega}(n^{1/3})$ by Kogan and Parter [FOCS '22]. Using similar ideas, we also polynomially improve the current lower bounds for shortcut sets, constructing graphs on which any shortcut set of $O(n)$ edges reduces diameter to $\widetilde{\Omega}(n^{1/4})$. This improves on the previous lower bound of $\Omega(n^{1/6})$ by Huang and Pettie [SIAM J. Disc. Math. '18]. We also extend our constructions to provide lower bounds against $O(p)$-size exact hopsets and shortcut sets for other values of $p$; in particular, we show that folklore sampling is near-optimal for exact hopsets in the entire range of $p \in [1, n^2]$.
We determine all functional closure properties of finite $\mathbb{N}$-weighted automata, even all multivariate ones, and in particular all multivariate polynomials. We also determine all univariate closure properties in the promise setting, and all multivariate closure properties under certain assumptions on the promise, in particular we determine all multivariate closure properties where the output vector lies on a monotone algebraic graph variety.
We present a simple semi-streaming algorithm for $(1-\epsilon)$-approximation of bipartite matching in $O(\log{\!(n)}/\epsilon)$ passes. This matches the performance of state-of-the-art "$\epsilon$-efficient" algorithms -- the ones with much better dependence on $\epsilon$ albeit with some mild dependence on $n$ -- while being considerably simpler. The algorithm relies on a direct application of the multiplicative weight update method with a self-contained primal-dual analysis that can be of independent interest. To show case this, we use the same ideas, alongside standard tools from matching theory, to present an equally simple semi-streaming algorithm for $(1-\epsilon)$-approximation of weighted matchings in general (not necessarily bipartite) graphs, again in $O(\log{\!(n)}/\epsilon)$ passes.