An Eulerian circuit in a directed graph is one of the most fundamental notions of graph theory. Deciding whether a graph $G$ has a unique Eulerian circuit can be done in polynomial time via the BEST theorem of de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte (1941-1951), which involves counting arborescences, or via a tailored characterization of Pevzner (1989), which involves computing the intersection graph of the simple cycles of $G$; both therefore rely on overly complex notions for the simpler uniqueness problem. In this paper we give a new linear-time-checkable characterization of directed graphs with a unique Eulerian circuit. It is based on a simple condition stating when two edges must appear consecutively in all Eulerian circuits, in terms of cut nodes of the underlying undirected graph of $G$. As a by-product, we can also compute in linear time all maximal $\textit{safe}$ walks appearing in all Eulerian circuits, for which Nagarajan and Pop proposed in 2009 a polynomial-time algorithm based on Pevzner's characterization.
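For context, the BEST theorem expresses the number $\operatorname{ec}(G)$ of Eulerian circuits of a connected Eulerian digraph $G$ in closed form, which is why the uniqueness test it yields requires counting arborescences:
\[
\operatorname{ec}(G) \;=\; t_w(G)\cdot \prod_{v\in V(G)}\bigl(\deg^{+}(v)-1\bigr)!\,,
\]
where $t_w(G)$ is the number of arborescences oriented towards an arbitrary fixed node $w$ (the same for every choice of $w$). Uniqueness is exactly the condition $\operatorname{ec}(G)=1$, i.e. $t_w(G)=1$ and $\deg^{+}(v)\le 2$ for all $v$.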
We prove new lower bounds on the modularity of graphs. Specifically, the modularity of a graph $G$ with average degree $\bar d$ is $\Omega(\bar{d}^{-1/2})$, under some mild assumptions on the degree sequence of $G$. The lower bound $\Omega(\bar{d}^{-1/2})$ applies, for instance, to graphs with a power-law degree sequence or a near-regular degree sequence. It has been suggested that the relatively high modularity of the Erd\H{o}s-R\'enyi random graph $G_{n,p}$ stems from the random fluctuations in its edge distribution; however, our results imply high modularity for any graph with a degree sequence matching that typically found in $G_{n,p}$. The proof of the new lower bound relies on certain weight-balanced bisections with few cross-edges, which build on ideas of Alon [Combinatorics, Probability and Computing (1997)] and may be of independent interest.
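For reference, the modularity in question is the standard Newman--Girvan quantity: for a graph $G$ with $m$ edges and a vertex partition $\mathcal{A}$,
\[
q_{\mathcal{A}}(G)=\sum_{A\in\mathcal{A}}\left(\frac{e(A)}{m}-\frac{\operatorname{vol}(A)^2}{4m^2}\right),\qquad q(G)=\max_{\mathcal{A}} q_{\mathcal{A}}(G),
\]
where $e(A)$ is the number of edges inside $A$ and $\operatorname{vol}(A)=\sum_{v\in A}\deg(v)$; the lower bound above concerns $q(G)$.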
The last decade has seen many attempts to generalise the definition of modes, or MAP estimators, of a probability distribution $\mu$ on a space $X$ to the case that $\mu$ has no continuous Lebesgue density, and in particular to infinite-dimensional Banach and Hilbert spaces $X$. This paper examines the properties of and connections among these definitions. We construct a systematic taxonomy -- or `periodic table' -- of modes that includes the established notions as well as large hitherto-unexplored classes. We establish implications between these definitions and provide counterexamples to distinguish them. We also distinguish those definitions that are merely `grammatically correct' from those that are `meaningful' in the sense of satisfying certain `common-sense' axioms for a mode, among them the correct handling of discrete measures and those with continuous Lebesgue densities. However, despite there being 17 such `meaningful' definitions of mode, we show that none of them satisfy the `merging property', under which the modes of $\mu|_{A}$, $\mu|_{B}$ and $\mu|_{A \cup B}$ enjoy a straightforward relationship for well-separated positive-mass events $A,B \subseteq X$.
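As one example of the established notions in this taxonomy, the strong mode of Dashti, Law, Stuart and Voss (2013) is defined by a small-ball quotient: $u^\star \in X$ is a strong mode of $\mu$ if
\[
\lim_{r\to 0}\frac{\mu(B_r(u^\star))}{\sup_{u\in X}\mu(B_r(u))}=1,
\]
where $B_r(u)$ denotes the open ball of radius $r$ centred at $u$; broadly speaking, the other definitions in the taxonomy vary the comparison class and the limiting procedure.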
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces, i.e., expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have lower variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided a technique that can derive the action-space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
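As a schematic illustration only (not the paper's exact estimators, whose precise form we do not reproduce here), the basic contrast is between one importance ratio over the joint action and a product of per-sub-action ratios; all names below are hypothetical placeholders, and both policies are assumed to factorise across sub-action dimensions:

```python
import numpy as np

def is_weight(pi_e, pi_b, a):
    """Standard IS weight: one ratio over the joint action a = (a_1, ..., a_K),
    with pi_e and pi_b given as K-dimensional probability tables."""
    return pi_e[a] / pi_b[a]

def decomposed_is_weight(pi_e_factors, pi_b_factors, a):
    """Decomposed IS weight: a product of per-sub-action ratios, which is
    well-defined when each policy factorises over the sub-action spaces."""
    return np.prod([pe[ai] / pb[ai]
                    for pe, pb, ai in zip(pi_e_factors, pi_b_factors, a)])

# Two binary sub-actions: when both policies factorise, the weights agree.
pi_b = np.full((2, 2), 0.25)                    # uniform behavior policy
pi_e = np.outer([0.2, 0.8], [0.2, 0.8])         # factorised evaluation policy
a = (1, 0)
assert np.isclose(is_weight(pi_e, pi_b, a),
                  decomposed_is_weight([np.array([0.2, 0.8])] * 2,
                                       [np.array([0.5, 0.5])] * 2, a))
```

When both policies factorise the two weights coincide pointwise; the variance separation proved in the paper arises under the structural assumptions it states.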
We define a $q$-linear path in a hypergraph $H$ as a sequence $(e_1,\ldots,e_L)$ of edges of $H$ such that $|e_i \cap e_{i+1}| \in [\![1,q]\!]$ and $e_i \cap e_j=\varnothing$ if $|i-j|>1$. In this paper, we study the connected components associated with these paths when $q=k-2$, where $k$ is the rank of $H$. If $k=3$ then $q=1$, which coincides with the well-known notion of a linear path or loose path. We describe the structure of the connected components, using an algorithmic proof which shows that the connected components can be computed in polynomial time. We then mention two consequences of our algorithmic result. The first is that deciding the winner of the Maker-Breaker game on a hypergraph of rank 3 can be done in polynomial time. The second is that tractable cases for the NP-complete problem of "Paths Avoiding Forbidden Pairs" in a graph can be deduced from the recognition of a special type of line graph of a hypergraph.
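The defining condition translates directly into a check; a minimal sketch, following the definition above verbatim:

```python
def is_q_linear_path(edges, q):
    """Check the q-linear path condition: consecutive edges share between
    1 and q vertices, and non-consecutive edges are disjoint."""
    es = [set(e) for e in edges]
    for i in range(len(es)):
        for j in range(i + 1, len(es)):
            inter = es[i] & es[j]
            if j == i + 1:
                if not (1 <= len(inter) <= q):
                    return False
            elif inter:
                return False
    return True

# A 1-linear (loose) path in a rank-3 hypergraph:
assert is_q_linear_path([{1, 2, 3}, {3, 4, 5}, {5, 6, 7}], q=1)
```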
Given a set of squares and a strip of bounded width and infinite height, we consider a square strip packing problem, which we call the square independent packing problem (SIPP): minimize the strip height so that all the squares are packed into independent cells separated by horizontal and vertical partitions. For the SIPP, we first investigate efficient solution representations and propose a compact representation that reduces the search space from $\Omega(n!)$ to $O(2^n)$, where $n$ is the number of given squares, while guaranteeing that the reduced space still contains a representation of an optimal solution. Based on this solution representation, we show that the problem is NP-hard, and then propose a fully polynomial-time approximation scheme (FPTAS) for it. We also propose three mathematical programming formulations based on different solution representations and assess the performance of these approaches through computational experiments. Finally, we discuss several extensions that are relevant to practical applications.
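To put the reduction of the search space in perspective: already for $n=20$ squares, $n!\approx 2.4\times 10^{18}$ while $2^{n}\approx 1.0\times 10^{6}$.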
Online polarization research currently focuses on studying single-issue opinion distributions or on computing distance metrics of interaction network structures. Limited data availability often restricts studies to positive interaction data, which can misrepresent the reality of a discussion. We introduce a novel framework that combines these three aspects, content and interactions as well as their nature (positive or negative), while challenging the prevailing notion of polarization as an umbrella term for all forms of online conflict or opposing opinions. In our approach, built on the concepts of cleavage structures and structural balance of signed social networks, we factorize polarization into two distinct metrics: Antagonism and Alignment. Antagonism quantifies hostility in online discussions, based on the reactions of users to content. Alignment uses signed structural information encoded in long-term user-user relations on the platform to describe how well user interactions fit the global and/or traditional sides of the discussion. We can analyse the change of these metrics through time, localizing both long-term trends and sudden changes that can be mapped to specific contexts or events. We apply our methods to two distinct platforms: Birdwatch, a US crowd-based fact-checking extension of Twitter, and DerStandard, an Austrian online newspaper with discussion forums. In these two use cases, we find that our framework is capable of describing the global status of the groups of users (identification of cleavages) while also providing relevant findings on specific issues or in specific time frames. Furthermore, we show that our metrics capture distinct phenomena, emphasizing the need to consider them separately when unpacking the complexities of polarization.
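For readers unfamiliar with structural balance, on which the Alignment metric builds: a cycle $C$ in a signed network is balanced iff the product of its edge signs is positive,
\[
\operatorname{sign}(C)=\prod_{(u,v)\in C}\sigma(u,v)=+1,
\]
and, by Harary's theorem, a network all of whose cycles are balanced splits into (at most) two factions with positive ties inside and negative ties between them; this is what allows signed user-user relations to define the "sides" of a discussion. For instance, a triangle with two friends sharing a common enemy ($+,-,-$) is balanced, whereas one with a single negative tie ($+,+,-$) is not.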
The main result of this paper is the discretization of Hamiltonian systems of the form $\ddot x = -K \nabla W(x)$, where $K$ is a constant symmetric matrix and $W\colon\mathbb{R}^n\to \mathbb{R}$ is a polynomial of degree $d\le 4$ in any number of variables $n$. The discretization uses the method of polarization and preserves both the energy and the invariant measure of the differential equation, as well as the dimension of the phase space. This generalises earlier work on discretizations of first-order systems with $d=3$, and of second-order systems with $d=4$ and $n=1$.
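To illustrate the polarization principle in its simplest earlier instance (the first-order case with $d=3$, corresponding to Kahan's method for quadratic vector fields): a quadratic system $\dot x=f(x)$ is discretized by replacing $f$ with its symmetric bilinear polarization $\bar f$, satisfying $\bar f(x,x)=f(x)$, evaluated on consecutive iterates,
\[
\frac{x_{n+1}-x_n}{h}=\bar f(x_n,x_{n+1}),\qquad\text{e.g.}\quad x_i x_j\;\longmapsto\;\tfrac12\bigl(x_i^{(n)}x_j^{(n+1)}+x_i^{(n+1)}x_j^{(n)}\bigr);
\]
the present paper carries the same idea to second-order systems $\ddot x=-K\nabla W(x)$ with $\deg W\le 4$ in any dimension.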
We consider the problem of query-efficient global max-cut on a weighted undirected graph in the value oracle model examined by [RSW18]. This model arises as a natural special case of submodular function maximization: on query $S \subseteq V$, the oracle returns the total weight of the cut between $S$ and $V \setminus S$. For most constants $c \in (0,1]$, we nail down the query complexity of achieving a $c$-approximation, for both deterministic and randomized algorithms (up to logarithmic factors). Analogously to general submodular function maximization in the same model, we observe a phase transition at $c = 1/2$: we design a deterministic algorithm for global $c$-approximate max-cut in $O(\log n)$ queries for any $c < 1/2$, and show that any randomized algorithm requires $\tilde{\Omega}(n)$ queries to find a $c$-approximate max-cut for any $c > 1/2$. Additionally, we show that any deterministic algorithm requires $\Omega(n^2)$ queries to find an exact max-cut (enough to learn the entire graph), and develop a $\tilde{O}(n)$-query randomized $c$-approximation for any $c < 1$. Our approach provides two technical contributions that may be of independent interest. One is a query-efficient sparsifier for undirected weighted graphs (prior work of [RSW18] holds only for unweighted graphs). Another is an extension of the cut dimension to rule out approximation (prior work of [GPRW20] introducing the cut dimension only rules out exact solutions).
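To make the model concrete, here is a minimal sketch (not the paper's algorithm) of the cut-value oracle together with the folklore randomized baseline that already hints at why $c=1/2$ is the natural threshold: a uniformly random query cuts each edge with probability $1/2$, so its expected value is at least half the maximum cut.

```python
import random

def cut_oracle(weights, S):
    """Value oracle of [RSW18]: on query S, return the total weight of
    edges crossing the bipartition (S, V \\ S)."""
    return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))

def random_cut_baseline(weights, V, queries=32):
    """Each uniformly random S has expected cut value >= (max cut) / 2,
    so the best of a few random queries is a 1/2-approximation in
    expectation -- the value of c around which the phase transition occurs."""
    best = 0.0
    for _ in range(queries):
        S = {v for v in V if random.random() < 0.5}
        best = max(best, cut_oracle(weights, S))
    return best
```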
Classical statistical theory has been developed under the assumption that the data belong to a linear space. However, in many applications the intrinsic geometry of the data is more intricate. Neglecting this frequently yields suboptimal or outright unusable results; e.g., taking the pixel-wise average of images typically results in noise. Incorporating the intrinsic geometry of a dataset into statistical analysis is a highly non-trivial task. In fact, different underlying geometries necessitate different approaches and allow for results of varying strength. Perhaps the most common non-linear geometries appearing in statistical applications are metric spaces of non-positive curvature, such as the manifold of symmetric, positive (semi-)definite matrices. In this paper we introduce a (strong) law of large numbers for independent, but not necessarily identically distributed, random variables taking values in complete spaces of non-positive curvature. Using this law of large numbers, we justify a stochastic approximation scheme for the limit of Fr\'echet means on such spaces. Apart from rendering the problem of computing Fr\'echet means computationally more tractable, the structure of this scheme suggests that averaging operations on Hadamard spaces are more stable than previous results indicated.
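A plausible instance of such an approximation scheme, in the spirit of Sturm's inductive means (the paper's exact scheme and assumptions may differ), on the Hadamard space of symmetric positive definite matrices with the affine-invariant metric:

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def geodesic(A, B, t):
    """Point A #_t B on the geodesic from A to B in the affine-invariant
    geometry on symmetric positive definite matrices."""
    Ah = np.real(sqrtm(A))
    Ahi = np.linalg.inv(Ah)
    return Ah @ np.real(fractional_matrix_power(Ahi @ B @ Ahi, t)) @ Ah

def inductive_mean(samples):
    """Inductive (Sturm-type) approximation of the Frechet mean: step
    from the current estimate a fraction 1/k of the way along the
    geodesic toward the k-th sample."""
    s = samples[0]
    for k, x in enumerate(samples[1:], start=2):
        s = geodesic(s, x, 1.0 / k)
    return s
```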
We consider the classic 1-center problem: given a set $P$ of $n$ points in a metric space, find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve the 1-center problem in any of the $\ell_p$-metrics, or in the edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming the Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in the Ulam metric with running time $\tilde{O_{\varepsilon}}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms that list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where, given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
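For concreteness, the baseline that the conditional lower bounds address is the trivial exact algorithm, which computes all pairwise distances in $O(n^2 d)$ time; a minimal sketch for the $\ell_p$ case:

```python
import numpy as np

def one_center(P, p=2.0):
    """Exact 1-center of the point set P (an n x d array) in the l_p
    metric, by brute force over all pairwise distances: O(n^2 d) time.
    The HSC-based lower bound says that for d = omega(log n) this
    quadratic dependence on n is essentially unavoidable."""
    P = np.asarray(P, dtype=float)
    diffs = np.abs(P[:, None, :] - P[None, :, :])   # pairwise coordinate gaps
    dist = (diffs ** p).sum(axis=2) ** (1.0 / p)    # n x n distance matrix
    return int(dist.max(axis=1).argmin())           # index of the center in P
```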