Given a complex high-dimensional distribution over $\{\pm 1\}^n$, what is the best way to increase the expected number of $+1$'s by controlling the values of only a small number of variables? Such a problem is known as influence maximization and has been widely studied in social networks, biology, and computer science. In this paper, we consider influence maximization on the Ising model which is a prototypical example of undirected graphical models and has wide applications in many real-world problems. We establish a sharp computational phase transition for influence maximization on sparse Ising models under a bounded budget: In the high-temperature regime, we give a linear-time algorithm for finding a small subset of variables and their values which achieve nearly optimal influence; In the low-temperature regime, we show that the influence maximization problem becomes $\mathsf{NP}$-hard under commonly-believed complexity assumption. The critical temperature coincides with the tree uniqueness/non-uniqueness threshold for Ising models which is also a critical point for other computational problems including approximate sampling and counting.
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d.\ Bernoulli $(p)$ entries $(m_{ij})_{ij=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
For some $\epsilon > 10^{-36}$ we give a randomized $3/2-\epsilon$ approximation algorithm for metric TSP.
Given a conjunctive query $Q$ and a database $\mathbf{D}$, a direct access to the answers of $Q$ over $\mathbf{D}$ is the operation of returning, given an index $j$, the $j^{\mathsf{th}}$ answer for some order on its answers. While this problem is $\#\mathsf{P}$-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined together allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries -- that is, queries having only negated atoms. In particular, we show that the class of $\beta$-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.
We revisit the moving least squares (MLS) approximation scheme on the sphere $\mathbb S^{d-1} \subset \mathbb R^d$, where $d>1$. It is well known that using the spherical harmonics up to degree $L \in \mathbb N$ as ansatz space yields for functions in $\mathcal C^{L+1}(\mathbb S^{d-1})$ the approximation order $\mathcal O \left( h^{L+1} \right)$, where $h$ denotes the fill distance of the sampling nodes. In this paper we show that the dimension of the ansatz space can be almost halved, by including only spherical harmonics of even or odd degree up to $L$, while preserving the same order of approximation. Numerical experiments indicate that using the reduced ansatz space is essential to ensure the numerical stability of the MLS approximation scheme as $h \to 0$. Finally, we compare our approach with an MLS approximation scheme that uses polynomials on the tangent space as ansatz space.
In the impartial selection problem, a subset of agents up to a fixed size $k$ among a group of $n$ is to be chosen based on votes cast by the agents themselves. A selection mechanism is impartial if no agent can influence its own chance of being selected by changing its vote. It is $\alpha$-optimal if, for every instance, the ratio between the votes received by the selected subset is at least a fraction of $\alpha$ of the votes received by the subset of size $k$ with the highest number of votes. We study deterministic impartial mechanisms in a more general setting with arbitrarily weighted votes and provide the first approximation guarantee, roughly $1/\lceil 2n/k\rceil$. When the number of agents to select is large enough compared to the total number of agents, this yields an improvement on the previously best known approximation ratio of $1/k$ for the unweighted setting. We further show that our mechanism can be adapted to the impartial assignment problem, in which multiple sets of up to $k$ agents are to be selected, with a loss in the approximation ratio of $1/2$.
Traditionally in the turnstile model of data streams, there is a state vector $x=(x_1,x_2,\ldots,x_n)$ which is updated through a stream of pairs $(i,k)$ where $i\in [n]$ and $k\in \Z$. Upon receiving $(i,k)$, $x_i\gets x_i + k$. A distinct count algorithm in the turnstile model takes one pass of the stream and then estimates $\norm{x}_0 = |\{i\in[n]\mid x_i\neq 0\}|$ (aka $L_0$, the Hamming norm). In this paper, we define a finite-field version of the turnstile model. Let $F$ be any finite field. Then in the $F$-turnstile model, for each $i\in [n]$, $x_i\in F$; for each update $(i,k)$, $k\in F$. The update $x_i\gets x_i+k$ is then computed in the field $F$. A distinct count algorithm in the $F$-turnstile model takes one pass of the stream and estimates $\norm{x}_{0;F} = |\{i\in[n]\mid x_i\neq 0_F\}|$. We present a simple distinct count algorithm, called $F$-\pcsa{}, in the $F$-turnstile model for any finite field $F$. The new $F$-\pcsa{} algorithm takes $m\log(n)\log (|F|)$ bits of memory and estimates $\norm{x}_{0;F}$ with $O(\frac{1}{\sqrt{m}})$ relative error where the hidden constant depends on the order of the field. $F$-\pcsa{} is straightforward to implement and has several applications in the real world with different choices of $F$. Most notably, it makes distinct count with deletions as simple as distinct count without deletions.
We study several polygonal curve problems under the Fr\'{e}chet distance via algebraic geometric methods. Let $\mathbb{X}_m^d$ and $\mathbb{X}_k^d$ be the spaces of all polygonal curves of $m$ and $k$ vertices in $\mathbb{R}^d$, respectively. We assume that $k \leq m$. Let $\mathcal{R}^d_{k,m}$ be the set of ranges in $\mathbb{X}_m^d$ for all possible metric balls of polygonal curves in $\mathbb{X}_k^d$ under the Fr\'{e}chet distance. We prove a nearly optimal bound of $O(dk\log (km))$ on the VC dimension of the range space $(\mathbb{X}_m^d,\mathcal{R}_{k,m}^d)$, improving on the previous $O(d^2k^2\log(dkm))$ upper bound and approaching the current $\Omega(dk\log k)$ lower bound. Our upper bound also holds for the weak Fr\'{e}chet distance. We also obtain exact solutions that are hitherto unknown for curve simplification, range searching, nearest neighbor search, and distance oracle.
Constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$ is the first step of many modern clustering algorithms. However, typical constructions of a similarity graph have high time complexity, and a quadratic space dependency with respect to $|X|$. We address this limitation and present a new algorithmic framework that constructs a sparse approximation of the fully connected similarity graph while preserving its cluster structure. Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions. We compare our designed algorithm with the well-known implementations from the scikit-learn library and the FAISS library, and find that our method significantly outperforms the implementation from both libraries on a variety of datasets.
In this paper, we consider the counting function $E_P(y) = |P_{y} \cap Z^{n_x}|$ for a parametric polyhedron $P_{y} = \{x \in R^{n_x} \colon A x \leq b + B y\}$, where $y \in R^{n_y}$. We give a new representation of $E_P(y)$, called a \emph{piece-wise step-polynomial with periodic coefficients}, which is a generalization of piece-wise step-polynomials and integer/rational Ehrhart's quasi-polynomials. In terms of the computational complexity, our result gives the fastest way to calculate $E_P(y)$ in certain scenarios. The most remarkable cases are the following: 1) Consider a parametric polyhedron $P_y$ defined by a standard-form system $A x = y,\, x \geq 0$ with a fixed number of equalities. We show that there exists an $poly\bigl(n, \|A\|_{\infty}\bigr)$ preprocessing-algorithm that returns a polynomial-time computable representation of $E_P(y)$. That is, $E_(y)$ can be computed by a polynomial-time algorithm for any given $y \in Q^k$; 2) Again, assuming that the co-dimension is fixed, we show that integer/rational Ehrhart's quasi-polynomials of a polytope can be computed by FPT-algorithms, parameterized by sub-determinants of $A$ or its elements; 3) Our representation of $E_P(y)$ is more efficient than other known approaches, if the matrix $A$ has bounded elements, especially if the matrix $A$ is sparse in addition; Additionally, we provide a discussion about possible applications in the area of compiler optimization. In some "natural" assumptions on a program code, our approach has the fastest complexity bounds.
A universal partial cycle (or upcycle) for $\mathcal{A}^n$ is a cyclic sequence that covers each word of length $n$ over the alphabet $\mathcal{A}$ exactly once -- like a De Bruijn cycle, except that we also allow a wildcard symbol $\mathord{\diamond}$ that can represent any letter of $\mathcal{A}$. Chen et al. in 2017 and Goeckner et al. in 2018 showed that the existence and structure of upcycles are highly constrained, unlike those of De Bruijn cycles, which exist for any alphabet size and word length. Moreover, it was not known whether any upcycles existed for $n \ge 5$. We present several examples of upcycles over both binary and non-binary alphabets for $n = 8$. We generalize two graph-theoretic representations of De Bruijn cycles to upcycles. We then introduce novel approaches to constructing new upcycles from old ones. Notably, given any upcycle for an alphabet of size $a$, we show how to construct an upcycle for an alphabet of size $ak$ for any $k \in \mathbb{N}$, so each example generates an infinite family of upcycles. We also define folds and lifts of upcycles, which relate upcycles with differing densities of $\mathord{\diamond}$ characters. In particular, we show that every upcycle lifts to a De Bruijn cycle. Our constructions rely on a different generalization of De Bruijn cycles known as perfect necklaces, and we introduce several new examples of perfect necklaces. We extend the definitions of certain pseudorandomness properties to partial words and determine which are satisfied by all upcycles, then draw a conclusion about linear feedback shift registers. Finally, we prove new nonexistence results based on the word length $n$, alphabet size, and $\mathord{\diamond}$ density.