We present a quasilinear time algorithm to decide the word problem on a natural algebraic structures we call orthocomplemented bisemilattices, a subtheory of boolean algebra. We use as a base a variation of Hopcroft, Ullman and Aho algorithm for tree isomorphism which we combine with a term rewriting system to decide equivalence of two terms. We prove that the rewriting system is terminating and confluent and hence the existence of a normal form, and that our algorithm is computing it. We also discuss applications and present an effective implementation in Scala.
A graph $G$ is called self-ordered (a.k.a asymmetric) if the identity permutation is its only automorphism. Equivalently, there is a unique isomorphism from $G$ to any graph that is isomorphic to $G$. We say that $G=(V,E)$ is robustly self-ordered if the size of the symmetric difference between $E$ and the edge-set of the graph obtained by permuting $V$ using any permutation $\pi:V\to V$ is proportional to the number of non-fixed-points of $\pi$. In this work, we initiate the study of the structure, construction and utility of robustly self-ordered graphs. We show that robustly self-ordered bounded-degree graphs exist (in abundance), and that they can be constructed efficiently, in a strong sense. Specifically, given the index of a vertex in such a graph, it is possible to find all its neighbors in polynomial-time (i.e., in time that is poly-logarithmic in the size of the graph). We also consider graphs of unbounded degree, seeking correspondingly unbounded robustness parameters. We again demonstrate that such graphs (of linear degree) exist (in abundance), and that they can be constructed efficiently, in a strong sense. This turns out to require very different tools. Specifically, we show that the construction of such graphs reduces to the construction of non-malleable two-source extractors (with very weak parameters but with some additional natural features). We demonstrate that robustly self-ordered bounded-degree graphs are useful towards obtaining lower bounds on the query complexity of testing graph properties both in the bounded-degree and the dense graph models. One of the results that we obtain, via such a reduction, is a subexponential separation between the query complexities of testing and tolerant testing of graph properties in the bounded-degree graph model.
Many real-world voting systems consist of voters that occur in just two different types. Indeed, each voting system with a {\lq\lq}House{\rq\rq} and a {\lq\lq}Senat{\rq\rq} is of that type. Here we present structural characterizations and explicit enumeration formulas for these so-called bipartite simple games.
We consider a fundamental online scheduling problem in which jobs with processing times and deadlines arrive online over time at their release dates. The task is to determine a feasible preemptive schedule on a single or multiple possibly unrelated machines that maximizes the number of jobs that complete before their deadline. Due to strong impossibility results for competitive analysis on a single machine, we require that jobs contain some slack $\varepsilon>0$, which means that the feasible time window for scheduling a job is at least $1+\varepsilon$ times its processing time on each eligible machine. Our contribution is two-fold: (i) We give the first non-trivial online algorithms for throughput maximization on unrelated machines, and (ii), this is the main focus of our paper, we answer the question on how to handle commitment requirements which enforce that a scheduler has to guarantee at a certain point in time the completion of admitted jobs. This is very relevant, e.g., in providing cloud-computing services, and disallows last-minute rejections of critical tasks. We present an algorithm for unrelated machines that is $\Theta\big(\frac{1}\varepsilon\big )$-competitive when the scheduler must commit upon starting a job. Somewhat surprisingly, this is the same optimal performance bound (up to constants) as for scheduling without commitment on a single machine. If commitment decisions must be made before a job's slack becomes less than a $\delta$-fraction of its size, we prove a competitive ratio of $\mathcal{O}\big(\frac{1}{\varepsilon - \delta}\big)$ for $0 < \delta < \varepsilon$. This result nicely interpolates between commitment upon starting a job and commitment upon arrival. For the latter commitment model, it is known that no (randomized) online algorithm admits any bounded competitive ratio.
In 1961, Gomory and Hu showed that the max-flow values of all $n\choose 2$ pairs of vertices in an undirected graph can be computed using only $n-1$ calls to any max-flow algorithm, and succinctly represented them in a tree (called the Gomory-Hu tree later). Even assuming a linear-time max-flow algorithm, this yields a running time of $O(mn)$ for this problem; with current max-flow algorithms, the running time is $\tilde{O}(mn + n^{5/2})$. We break this 60-year old barrier by giving an $\tilde{O}(n^{2})$-time algorithm for the Gomory-Hu tree problem, already with current max-flow algorithms. For unweighted graphs, our techniques show equivalence (up to poly-logarithmic factors in running time) between Gomory-Hu tree (i.e., all-pairs max-flow values) and a single-pair max-flow.
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a new branch-and-bound algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model, and we use a cutting-plane procedure for strengthening the bound. For the upper bound, instead, by using integer programming tools, we propose an adaptation of the k-means algorithm to the constrained case. For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points with different combinations of must-link and cannot-link constraints and with a generic number of features. This problem size is about four times larger than the one of the instances solved by state-of-the-art exact algorithms.
Identification theory for causal effects in causal models associated with hidden variable directed acyclic graphs (DAGs) is well studied. However, the corresponding algorithms are underused due to the complexity of estimating the identifying functionals they output. In this work, we bridge the gap between identification and estimation of population-level causal effects involving a single treatment and a single outcome. We derive influence function based estimators that exhibit double robustness for the identified effects in a large class of hidden variable DAGs where the treatment satisfies a simple graphical criterion; this class includes models yielding the adjustment and front-door functionals as special cases. We also provide necessary and sufficient conditions under which the statistical model of a hidden variable DAG is nonparametrically saturated and implies no equality constraints on the observed data distribution. Further, we derive an important class of hidden variable DAGs that imply observed data distributions observationally equivalent (up to equality constraints) to fully observed DAGs. In these classes of DAGs, we derive estimators that achieve the semiparametric efficiency bounds for the target of interest where the treatment satisfies our graphical criterion. Finally, we provide a sound and complete identification algorithm that directly yields a weight based estimation strategy for any identifiable effect in hidden variable causal models.
Object-oriented programming (OOP) is one of the most popular paradigms used for building software systems. However, despite its industrial and academic popularity, OOP is still missing a formal apparatus similar to lambda-calculus, which functional programming is based on. There were a number of attempts to formalize OOP, but none of them managed to cover all the features available in modern OO programming languages, such as C++ or Java. We have made yet another attempt and created phi-calculus. We also created EOLANG (also called EO), an experimental programming language based on phi-calculus.
An interesting thread in the research of Boolean functions for cryptography and coding theory is the study of secondary constructions: given a known function with a good cryptographic profile, the aim is to extend it to a (usually larger) function possessing analogous properties. In this work, we continue the investigation of a secondary construction based on cellular automata, focusing on the classes of bent and semi-bent functions. We prove that our construction preserves the algebraic degree of the local rule, and we narrow our attention to the subclass of quadratic functions, performing several experiments based on exhaustive combinatorial search and heuristic optimization through Evolutionary Strategies (ES). Finally, we classify the obtained results up to permutation equivalence, remarking that the number of equivalence classes that our CA-XOR construction can successfully extend grows very quickly with respect to the CA diameter.
Given a set of changing entities, which ones are the most uptrending over some time T? Which entities are standing out as the biggest movers? To answer this question we define the concept of momentum. Two parameters - absolute gain and relative gain over time T play the key role in defining momentum. Neither alone is sufficient since they are each biased towards a subset of entities. Absolute gain favors large entities, while relative gain favors small ones. To accommodate both absolute and relative gain in an unbiased way, we define Pareto ordering between entities. For entity E to dominate another entity F in Pareto ordering, E's absolute and relative gains over time T must be higher than F's absolute and relative gains respectively. Momentum leaders are defined as maximal elements of this partial order - the Pareto frontier. We show how to compute momentum leaders and propose linear ordering among them to help rank entities with the most momentum on the top. Additionally, we show that when vectors follow power-law, the cardinality of the set of Momentum leaders (Pareto frontier) is of the order of square root of the logarithm of the number of entities, thus it is very small.
In two-phase image segmentation, convex relaxation has allowed global minimisers to be computed for a variety of data fitting terms. Many efficient approaches exist to compute a solution quickly. However, we consider whether the nature of the data fitting in this formulation allows for reasonable assumptions to be made about the solution that can improve the computational performance further. In particular, we employ a well known dual formulation of this problem and solve the corresponding equations in a restricted domain. We present experimental results that explore the dependence of the solution on this restriction and quantify imrovements in the computational performance. This approach can be extended to analogous methods simply and could provide an efficient alternative for problems of this type.