We study the parameterized problem of satisfying ``almost all'' constraints of a given formula $F$ over a fixed, finite Boolean constraint language $\Gamma$, with or without weights. More precisely, for each finite Boolean constraint language $\Gamma$, we consider the following two problems. In Min SAT$(\Gamma)$, the input is a formula $F$ over $\Gamma$ and an integer $k$, and the task is to find an assignment $\alpha \colon V(F) \to \{0,1\}$ that satisfies all but at most $k$ constraints of $F$, or to determine that no such assignment exists. In Weighted Min SAT$(\Gamma)$, the input additionally contains a weight function $w \colon F \to \mathbb{Z}_+$ and an integer $W$, and the task is to find an assignment $\alpha$ such that (1) $\alpha$ satisfies all but at most $k$ constraints of $F$, and (2) the total weight of the violated constraints is at most $W$. We give a complete dichotomy for the fixed-parameter tractability of these problems: we show that for every Boolean constraint language $\Gamma$, either Weighted Min SAT$(\Gamma)$ is FPT; or Weighted Min SAT$(\Gamma)$ is W[1]-hard but Min SAT$(\Gamma)$ is FPT; or Min SAT$(\Gamma)$ is W[1]-hard. This generalizes recent work of Kim et al. (SODA 2021), which did not consider weighted problems and considered only languages $\Gamma$ that cannot express implications $(u \to v)$ (as used, e.g., to model digraph cut problems). Our result generalizes and subsumes multiple previous results, including the FPT algorithms for Weighted Almost 2-SAT, weighted and unweighted $\ell$-Chain SAT, and Coupled Min-Cut, as well as weighted and directed versions of the latter. The main tool used in our algorithms is the recently developed method of directed flow-augmentation (Kim et al., STOC 2022).
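To make the problem definition concrete, here is a minimal brute-force sketch in Python; the encoding of $\Gamma$ and the example formula are our own illustrations, and the exhaustive search is exponential — it only pins down the definition, not the paper's flow-augmentation-based FPT algorithms.

```python
from itertools import product

# A constraint language Gamma: relation name -> set of satisfying tuples.
# This encoding is a hypothetical illustration, not the paper's formalism.
GAMMA = {
    "IMPL": {(0, 0), (0, 1), (1, 1)},       # u -> v
    "OR2":  {(0, 1), (1, 0), (1, 1)},       # u or v
}

def min_sat(formula, variables, k):
    """Return an assignment violating at most k constraints, or None.

    formula: list of (relation_name, tuple_of_variables) pairs.
    """
    for bits in product((0, 1), repeat=len(variables)):
        alpha = dict(zip(variables, bits))
        violated = sum(
            1 for rel, args in formula
            if tuple(alpha[x] for x in args) not in GAMMA[rel]
        )
        if violated <= k:
            return alpha
    return None

# Example: three constraints over {a, b, c}; ask for an assignment
# violating at most one of them.
F = [("IMPL", ("a", "b")), ("IMPL", ("b", "c")), ("OR2", ("a", "c"))]
print(min_sat(F, ["a", "b", "c"], k=1))
```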
In formal languages and automata theory, the magic number problem can be formulated as follows: for a given integer n, is it possible to find a number d in the range [n,2^n] such that there is no minimal deterministic finite automaton with d states that can be simulated by an optimal nondeterministic finite automaton with exactly n states? If such a number d exists, it is called magic. In this paper, we consider the magic number problem in the framework of deterministic automata with output, which are known to characterize automatic sequences. More precisely, we investigate magic numbers for periodic sequences viewed as either automatic, regular, or constant-recursive.
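As a concrete illustration of one direction of this view (our own toy example, not taken from the paper): every periodic sequence is automatic. For instance, the period-2 sequence $0,1,0,1,\dots$ is computed by a two-state DFAO that reads the base-2 digits of $n$ and outputs $n \bmod 2$.

```python
# A two-state DFAO (deterministic finite automaton with output) computing
# a_n = n mod 2: it reads the base-2 digits of n, most significant first,
# and remembers the last digit seen. Our own toy example of why every
# periodic sequence is automatic.

def dfao_output(n: int) -> int:
    state = 0                       # initial state
    for digit in bin(n)[2:]:        # base-2 digits of n, MSB first
        state = int(digit)          # transition: remember the last digit
    return state                    # the output function is the identity

print([dfao_output(n) for n in range(8)])   # [0, 1, 0, 1, 0, 1, 0, 1]
```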
Given a convex function $f$ on $\mathbb{R}^n$ with an integer minimizer, we show how to find an exact minimizer of $f$ using $O(n^2 \log n)$ calls to a separation oracle and $O(n^4 \log n)$ time. The previous best polynomial-time algorithm for this problem, given in [Jiang, SODA 2021, JACM 2022], achieves $\widetilde{O}(n^2)$ oracle complexity. However, the overall runtime of Jiang's algorithm is at least $\widetilde{\Omega}(n^8)$, due to expensive sub-routines such as the Lenstra-Lenstra-Lov\'asz (LLL) algorithm [Lenstra, Lenstra, Lov\'asz, Math. Ann. 1982] and the random-walk-based cutting plane method [Bertsimas, Vempala, JACM 2004]. Our significant speedup is obtained by a nontrivial combination of a faster version of the LLL algorithm, due to [Neumaier, Stehl\'e, ISSAC 2016], that gives similar guarantees, and the volumetric center cutting plane method (CPM) of [Vaidya, FOCS 1989], together with its fast implementation given in [Jiang, Lee, Song, Wong, STOC 2020]. For the special case of submodular function minimization (SFM), our result implies a strongly polynomial time algorithm for this problem using $O(n^3 \log n)$ calls to an evaluation oracle and $O(n^4 \log n)$ additional arithmetic operations. Both the oracle complexity and the number of arithmetic operations of our more general algorithm are better than those of the previous best-known algorithms for this specific problem, given in [Lee, Sidford, Wong, FOCS 2015] and [Dadush, V\'egh, Zambelli, SODA 2018, MOR 2021].
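For readers unfamiliar with the evaluation-oracle model for SFM, the following sketch (our own illustration) shows it in its simplest form: an exponential brute force making $2^n$ oracle calls, in contrast to the $O(n^3 \log n)$ calls achieved in the paper. The toy oracle is the cut function of a 4-vertex graph, a standard example of a submodular function.

```python
from itertools import combinations

# Demo edge set for the toy oracle; any graph's cut function is submodular.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]

def cut_value(S):                    # evaluation oracle: f(S) = #edges crossing S
    return sum((u in S) != (v in S) for u, v in EDGES)

def brute_force_sfm(n, f):
    """Minimize f over all subsets of {0, ..., n-1} with 2^n oracle calls."""
    best = min((frozenset(S) for r in range(n + 1)
                for S in combinations(range(n), r)), key=f)
    return best, f(best)

print(brute_force_sfm(4, cut_value))   # (frozenset(), 0): the empty cut is minimal
```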
Given an undirected graph $G=(V,E)$ and an integer $\ell$, the Eccentricity Shortest Path (ESP) problem asks to find a shortest path $P$ such that for every vertex $v\in V(G)$ there is a vertex $w\in P$ with $d_G(v,w)\leq \ell$, where $d_G(v,w)$ denotes the distance between $v$ and $w$ in $G$. Dragan and Leitert [Theor. Comput. Sci. 2017] showed that the optimization version of this problem, which asks for the minimum $\ell$ admitting such a path, is NP-hard even on planar bipartite graphs with maximum degree 3. They also showed that ESP is W[2]-hard when parameterized by $\ell$. On the positive side, Ku\v{c}era and Such\'y [IWOCA 2021] showed that the problem is fixed-parameter tractable (FPT) when parameterized by modular width, cluster vertex deletion set, maximum leaf number, or the combined parameters disjoint paths deletion set and $\ell$. The latter paper asked, as an open question, whether ESP is FPT parameterized by the disjoint paths deletion set or the feedback vertex set alone. We answer these questions partially and obtain the following results:
- ESP is FPT when parameterized by the disjoint paths deletion set, the split vertex deletion set, or the combined parameters feedback vertex set and eccentricity of the graph.
- We design a $(1+\epsilon)$-factor FPT approximation algorithm parameterized by the feedback vertex set number.
- ESP is W[2]-hard when parameterized by the chordal vertex deletion set.
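For concreteness, a small Python sketch of the ESP decision condition follows; it assumes the networkx library and brute-forces over all shortest paths, which can be exponential in the worst case and is meant only to pin down the definition, not to reflect the paper's algorithms.

```python
import networkx as nx  # assumed available; plain BFS would also suffice here

# Brute-force check of the ESP condition (our illustration): is there some
# shortest path P that leaves every vertex within distance ell of P?

def has_esp(G, ell):
    dist = dict(nx.all_pairs_shortest_path_length(G))
    for s in G:
        for t in G:
            for P in nx.all_shortest_paths(G, s, t):
                if all(min(dist[v][w] for w in P) <= ell for v in G):
                    return True
    return False

print(has_esp(nx.path_graph(5), 0))   # True: the path 0-1-2-3-4 covers itself
print(has_esp(nx.star_graph(3), 1))   # True: the center alone covers everything
```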
Control of the ordering of transactions in modern blockchains can be extremely profitable. Rather than allow one central actor to control this revenue source, recent research has studied mechanisms for decentralizing the process of computing an ordering among multiple, distributed replicas. This problem is akin to the classic problem from social choice theory of aggregating ordinal votes, applied to a streaming setting. Prior work proposes a ``$\gamma$-batch-order-fairness'' requirement on the aggregate ordering: the ordering should be divisible into contiguous batches, and if a $\gamma$ fraction of replicas receive $tx$ before $tx^\prime$, then $tx^\prime$ cannot be in an earlier batch than $tx$. We extend this definition to formalize the notion that these batches should have minimal size, thereby giving the first notion of order fairness that cannot be vacuously satisfied (by arbitrarily large batches) and that can be satisfied in the presence of faulty replicas. We then show that the Ranked Pairs aggregation method produces an ordering that satisfies our fairness definition for every choice of parameter $\gamma$ simultaneously and for any number of faulty replicas (where the fairness guarantees degrade linearly as the fraction of faulty replicas increases). We then instantiate our protocol in the streaming setting. Careful analysis of the interactions between ordering dependencies enables our protocol to simulate Ranked Pairs voting in this setting, and adjustments to the ordering algorithm give a protocol that (under synchronous network assumptions) always appends a transaction to the output ordering after a bounded amount of time.
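A compact sketch of classical Ranked Pairs aggregation on complete ballots is below (our own rendition of the standard voting rule); the paper's protocol adapts this idea to a streaming, fault-tolerant setting, which the snippet does not model.

```python
from itertools import combinations

def ranked_pairs(ballots):
    """Classical Ranked Pairs: lock in pairwise preferences by decreasing
    support, skipping any pair that would close a cycle."""
    cands = list(ballots[0])
    margin = {(a, b): 0 for a in cands for b in cands if a != b}
    for ballot in ballots:
        for x, y in combinations(ballot, 2):   # x precedes y on this ballot
            margin[(x, y)] += 1
    graph = {c: set() for c in cands}          # locked-in edges a -> b

    def reaches(u, v):                         # DFS reachability for cycle test
        stack, seen = [u], set()
        while stack:
            w = stack.pop()
            if w == v:
                return True
            if w not in seen:
                seen.add(w)
                stack.extend(graph[w])
        return False

    for a, b in sorted(margin, key=margin.get, reverse=True):
        if not reaches(b, a):                  # locking a -> b keeps acyclicity
            graph[a].add(b)

    order = []                                 # topological order of the result
    remaining = set(cands)
    while remaining:
        head = next(c for c in remaining
                    if all(c not in graph[d] for d in remaining))
        order.append(head)
        remaining.remove(head)
    return order

print(ranked_pairs([["a", "b", "c"], ["a", "b", "c"], ["b", "c", "a"]]))
# ['a', 'b', 'c']
```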
It is known that, for every $k\geq 2$, $C_{2k}$-freeness can be decided by a generic Monte-Carlo algorithm running in $n^{1-1/\Theta(k^2)}$ rounds in the CONGEST model. For $2\leq k\leq 5$, faster Monte-Carlo algorithms do exist, running in $O(n^{1-1/k})$ rounds, based on upper-bounding the number of messages to be forwarded and aborting search sub-routines for which this number exceeds certain thresholds. We investigate the possible extension of these threshold-based algorithms to the detection of larger cycles. We first show that, for every $k\geq 6$, there exists an infinite family of graphs containing a $2k$-cycle for which any threshold-based algorithm fails to detect that cycle. In particular, neither $C_{12}$-freeness nor $C_{14}$-freeness can be decided by threshold-based algorithms. Nevertheless, we show that $\{C_{12},C_{14}\}$-freeness can still be decided by a threshold-based algorithm, running in $O(n^{1-1/7})= O(n^{0.857\dots})$ rounds, which is faster than the generic algorithm, which would run in $O(n^{1-1/22})\simeq O(n^{0.954\dots})$ rounds. Moreover, we exhibit an infinite collection of families of cycles $\mathcal{F}$ such that threshold-based algorithms can decide $\mathcal{F}$-freeness for every $\mathcal{F}$ in this collection.
We consider the classic 1-center problem: given a set $P$ of $n$ points in a metric space, find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows.
$\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve the 1-center problem in any of the $\ell_p$-metrics, or in the edit or Ulam metrics.
$\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in the Ulam metric with running time $\tilde{O}_{\epsilon}(nd+n^2\sqrt{d})$.
We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms that list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where, given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
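To fix notation, the exact quadratic baseline that the conditional lower bounds target is trivial to state; the $\ell_1$ metric and the sample points below are our own demo choices.

```python
# Exact brute-force 1-center: O(n^2) distance evaluations, each costing O(d).
# This is the quadratic baseline that the hardness results show is essentially
# optimal in high dimension.

def one_center(points, dist):
    """Point of `points` minimizing its maximum distance to the other points."""
    return min(points, key=lambda p: max(dist(p, q) for q in points))

def l1(p, q):                      # the l_p metric with p = 1
    return sum(abs(a - b) for a, b in zip(p, q))

pts = [(0, 0), (5, 0), (1, 1), (2, 1)]
print(one_center(pts, l1))         # (2, 1): max l1-distance 4; the others give 5
```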
Probabilities of Causation play a fundamental role in decision making in law, health care, and public policy. Nevertheless, their point identification is challenging, requiring strong assumptions such as monotonicity. In the absence of such assumptions, existing work requires multiple observations of datasets that contain the same treatment and outcome variables in order to establish bounds on these probabilities. However, in many clinical trials and public policy evaluation cases, there exist independent datasets that each examine the effect of a different treatment on the same outcome variable. Here, we outline how to significantly tighten existing bounds on the probabilities of causation by imposing counterfactual consistency between structural causal models (SCMs) constructed from such independent datasets (the 'causal marginal problem'). Next, we describe a new information-theoretic approach to falsifying counterfactual probabilities, using conditional mutual information to quantify counterfactual influence. The latter generalizes to arbitrary discrete variables and numbers of treatments, and renders the causal marginal problem more interpretable. Since the question of whether the bounds are 'tight enough' is left to the user, we provide an additional method of inference for when the bounds are unsatisfactory: a maximum-entropy-based method that defines a metric for the space of plausible SCMs and proposes the entropy-maximizing SCM for inferring counterfactuals in the absence of more information.
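As a small, self-contained illustration of the information-theoretic ingredient: conditional mutual information $I(X;Y\mid Z)$ for discrete variables can be computed directly from a joint pmf, as sketched below. The $2{\times}2{\times}2$ table is arbitrary demo data, not from the paper.

```python
import numpy as np

def conditional_mutual_information(p):
    """I(X;Y|Z) in nats for a joint pmf p[x, y, z] over discrete variables."""
    p = p / p.sum()
    pz = p.sum(axis=(0, 1))                      # p(z)
    pxz = p.sum(axis=1)                          # p(x, z)
    pyz = p.sum(axis=0)                          # p(y, z)
    cmi = 0.0
    for x, y, z in np.ndindex(*p.shape):
        if p[x, y, z] > 0:
            cmi += p[x, y, z] * np.log(
                p[x, y, z] * pz[z] / (pxz[x, z] * pyz[y, z]))
    return cmi

# Arbitrary demo table: X and Y are dependent given Z, so the value is > 0.
p = np.array([[[0.10, 0.15], [0.05, 0.20]],
              [[0.20, 0.05], [0.15, 0.10]]])
print(conditional_mutual_information(p))
```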
We consider a generalized poset sorting problem (GPS), in which we are given a query graph $G = (V, E)$ and an unknown poset $\mathcal{P}(V, \prec)$ defined on the same vertex set $V$, and the goal is to make as few queries as possible to edges in $G$ in order to fully recover $\mathcal{P}$, where each query $(u, v)$ returns the relation between $u$ and $v$, i.e., $u \prec v$, $v \prec u$, or $u \not\sim v$. This generalizes both the poset sorting problem [Faigle et al., SICOMP 88] and the generalized sorting problem [Huang et al., FOCS 11]. We give algorithms with $\tilde{O}(n\cdot \mathrm{poly}(k))$ query complexity when $G$ is a complete bipartite graph or $G$ is stochastic under the Erd\H{o}s-R\'enyi model, where $k$ is the \emph{width} of the poset; these generalize [Daskalakis et al., SICOMP 11], which only studies the case of a complete graph $G$. Both results are based on a unified framework that reduces poset sorting to partitioning the vertices with respect to a given pivot element, which may be of independent interest. Our study of GPS also leads to a new $\tilde{O}(n^{1 - 1 / (2W)})$ competitive ratio for the so-called weighted generalized sorting problem, where $W$ is the number of distinct weights in the query graph. This problem was posed as an open question in [Charikar et al., JCSS 02], and our result makes important progress as it yields the first nontrivial ratio for general weighted query graphs (and a better ratio if $W$ is bounded). We obtain this via an algorithm with $\tilde{O}(nk + n^{1.5})$ query complexity for the case where every edge in $G$ is guaranteed to be comparable in the poset, which generalizes the state-of-the-art $\tilde{O}(n^{1.5})$ bound for generalized sorting [Huang et al., FOCS 11].
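A toy rendition of the pivot step that the framework reduces to is sketched below; the hidden poset, the hard-coded oracle, and the freedom to query every pair are our own simplifications (the actual algorithms may only query edges of $G$).

```python
# Toy pivot-partition step: classify every vertex against a pivot p using one
# comparability query each. The oracle hard-codes a tiny hidden poset (only
# the relations involving p matter here); real instances are restricted to
# edges of G and handle incomparability adaptively.

LESS = {("a", "p"), ("b", "p"), ("p", "c")}   # pairs (u, v) with u < v

def query(u, v):
    """Return '<', '>', or 'incomparable' for the hidden poset."""
    if (u, v) in LESS:
        return "<"
    if (v, u) in LESS:
        return ">"
    return "incomparable"

def partition(vertices, pivot):
    buckets = {"<": [], ">": [], "incomparable": []}
    for v in vertices:
        if v != pivot:
            buckets[query(v, pivot)].append(v)
    return buckets

print(partition(["a", "b", "c", "d", "p"], "p"))
# {'<': ['a', 'b'], '>': ['c'], 'incomparable': ['d']}
```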
Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks, for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that may themselves be unbounded. This work identifies a set of assumptions describing a regime that seems to govern concentration in many estimation problems, in which the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems, and entropic-Wasserstein barycenters.
We present a parameterized dichotomy for the \textsc{$k$-Sparsest Cut} problem in both its weighted and unweighted versions. In particular, we show that the weighted \textsc{$k$-Sparsest Cut} problem is NP-hard for every $k\geq 3$, even on graphs with bounded vertex cover number. Also, the unweighted \textsc{$k$-Sparsest Cut} problem is W[1]-hard when parameterized by the three combined parameters tree-depth, feedback vertex set number, and $k$. On the positive side, we show that the unweighted \textsc{$k$-Sparsest Cut} problem is FPT when parameterized by the vertex cover number and $k$, and, for every fixed $k$, it is FPT with respect to treewidth. Moreover, we show that the generalized \textsc{$k$-Small-Set Expansion} problem is FPT when parameterized by $k$ and the maximum degree of the graph combined, though it is W[1]-hard for each of these parameters separately.