Motivated by the $k$-center problem in location analysis, we consider the \emph{polygon burning} (PB) problem: Given a polygonal domain $P$ with $h$ holes and $n$ vertices, find a set $S$ of $k$ vertices of $P$ that minimizes the maximum geodesic distance from any point in $P$ to its nearest vertex in $S$. Alternatively, viewing each vertex in $S$ as a site to start a fire, the goal is to select $S$ such that fires burning simultaneously and uniformly from $S$, restricted to $P$, consume $P$ entirely as quickly as possible. We prove that PB is NP-hard when $k$ is arbitrary. We show that the discrete $k$-center of the vertices of $P$ under the geodesic metric on $P$ provides a $2$-approximation for PB, resulting in an $O(n^2 \log n + hkn \log n)$-time $3$-approximation algorithm for PB. Lastly, we define and characterize a new type of polygon, the sliceable polygon. A sliceable polygon is a convex polygon that contains no Voronoi vertex from the Voronoi diagram of its vertices. We give a dynamic programming algorithm to solve PB exactly on a sliceable polygon in $O(kn^2)$ time.
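The $2$-approximation via discrete $k$-center suggests a simple concrete picture. Below is a minimal sketch of the classic farthest-first (Gonzalez) greedy heuristic for discrete $k$-center, a $2$-approximation under any metric; plain Euclidean distance stands in for the geodesic metric on $P$, since computing true geodesic distances in a polygon with holes is the hard part and is not attempted here.

```python
# Farthest-first (Gonzalez) greedy heuristic for discrete k-center.
# Euclidean distance is an illustrative stand-in for the geodesic metric.
import math

def k_center_greedy(vertices, k):
    """Pick k centers; greedily add the vertex farthest from the chosen set."""
    centers = [vertices[0]]                      # arbitrary first center
    while len(centers) < k:
        farthest = max(
            vertices,
            key=lambda v: min(math.dist(v, c) for c in centers),
        )
        centers.append(farthest)
    return centers

pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1.5)]
print(k_center_greedy(pts, 2))                   # e.g. [(0, 0), (4, 3)]
```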
Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, the users also want to see the first $k$ results quickly (expressed by the LIMIT clause), but the value of $k$ is not predetermined, as user queries arrive in an online fashion. Recent work has made considerable progress in identifying optimal algorithms for ranked enumeration of join queries that do not contain any projections. In this paper, we initiate the study of the problem of enumerating results in ranked order for queries with projections. Our main result shows that for any acyclic query, it is possible to obtain a near-linear (in the size of the database) delay algorithm after only a linear-time preprocessing step for two important ranking functions: sum and lexicographic ordering. For a practical subset of acyclic queries known as star queries, we show an even stronger result: the user can smoothly trade more preprocessing time for faster answering-time guarantees. Our results also extend to queries containing cycles and unions. Finally, a comprehensive experimental evaluation demonstrates that our algorithms, which are simple to implement, improve running time by up to three orders of magnitude over state-of-the-art algorithms implemented within open-source RDBMSs and specialized graph databases.
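For intuition, here is a minimal sketch of the classic heap-based ranked-enumeration technique for a single binary join under a sum ranking, i.e., the projection-free setting this line of work builds on; achieving near-linear delay in the presence of projections, the paper's contribution, requires more machinery than shown. The relation layout and weight attributes are illustrative assumptions.

```python
# Ranked enumeration of R(x, y) |><| S(y, z) by weight(x) + weight(z),
# via lazy frontier expansion over sorted per-key buckets.
import heapq
from collections import defaultdict

def ranked_join(R, S):
    """R: list of (x, wx, y); S: list of (y, z, wz). Yield results by wx + wz."""
    left, right = defaultdict(list), defaultdict(list)
    for x, wx, y in R:
        left[y].append((wx, x))
    for y, z, wz in S:
        right[y].append((wz, z))
    heap, seen = [], set()
    for y in left.keys() & right.keys():
        left[y].sort(); right[y].sort()
        heapq.heappush(heap, (left[y][0][0] + right[y][0][0], y, 0, 0))
    while heap:
        w, y, i, j = heapq.heappop(heap)
        (wx, x), (wz, z) = left[y][i], right[y][j]
        yield (x, y, z, w)
        for ni, nj in ((i + 1, j), (i, j + 1)):      # expand the frontier lazily
            if ni < len(left[y]) and nj < len(right[y]) and (y, ni, nj) not in seen:
                seen.add((y, ni, nj))
                heapq.heappush(heap, (left[y][ni][0] + right[y][nj][0], y, ni, nj))

R = [("a", 1.0, 1), ("b", 5.0, 1), ("c", 2.0, 2)]
S = [(1, "u", 3.0), (1, "v", 1.0), (2, "w", 0.5)]
for row in ranked_join(R, S):
    print(row)           # rows emitted in increasing total weight
```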
We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities), and these proposals must be aggregated into a single division. Under $\ell_1$ preferences -- for which a voter's disutility is given by the $\ell_1$ distance between the aggregate division and the division he or she most prefers -- the social welfare-maximizing mechanism, which minimizes the average $\ell_1$ distance between the outcome and each voter's proposal, is incentive compatible (Goel et al. 2016). However, it fails to satisfy the natural fairness notion of proportionality, placing too much weight on majority preferences. Leveraging a connection between market prices and the generalized median rules of Moulin (1980), we introduce the independent markets mechanism, which is both incentive compatible and proportional. We unify the social welfare-maximizing mechanism and the independent markets mechanism by defining a broad class of moving phantom mechanisms that includes both. We show that every moving phantom mechanism is incentive compatible. Finally, we characterize the social welfare-maximizing mechanism as the unique Pareto-optimal mechanism in this class, suggesting an inherent tradeoff between Pareto optimality and proportionality.
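As a hedged illustration of the moving phantom idea: for each alternative, take the median of the $n$ voter reports together with $n+1$ phantom values $f_0(t),\dots,f_n(t)$, and move $t$ until the medians sum to $1$. The phantom family in this sketch is an illustrative choice, not the paper's independent-markets family.

```python
# A moving phantom mechanism sketch: per-alternative medians over voter
# reports plus n+1 phantoms, with t tuned so the medians sum to 1.
import statistics

def aggregate(reports, iters=60):
    """reports: list of n budget divisions, each a list summing to 1."""
    n, m = len(reports), len(reports[0])

    def phantom(k, t):                 # illustrative phantom family, k = 0..n
        return min(1.0, max(0.0, (n + 1) * t - k))

    def medians(t):
        return [
            statistics.median([phantom(k, t) for k in range(n + 1)]
                              + [r[j] for r in reports])
            for j in range(m)
        ]

    lo, hi = 0.0, 1.0                  # binary search: sum of medians is
    for _ in range(iters):             # nondecreasing in t, 0 at t=0, >=1 at t=1
        mid = (lo + hi) / 2
        if sum(medians(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return medians(hi)

print(aggregate([[0.8, 0.2, 0.0], [0.0, 0.2, 0.8], [0.0, 1.0, 0.0]]))
```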
Continuous determinantal point processes (DPPs) are a class of repulsive point processes on $\mathbb{R}^d$ with many statistical applications. Although an explicit expression of their density is known, it is too complicated to be used directly for maximum likelihood estimation. In the stationary case, an approximation using Fourier series has been suggested, but it is limited to rectangular observation windows and no theoretical results support it. In this contribution, we investigate a different way to approximate the likelihood by looking at its asymptotic behaviour when the observation window grows towards $\mathbb{R}^d$. This new approximation is not limited to rectangular windows, is faster to compute than the previous one, does not require any tuning parameter, and comes with some theoretical justification. Moreover, it provides an explicit formula for estimating the asymptotic variance of the associated estimator. Its performance is assessed in a simulation study on standard parametric models on $\mathbb{R}^d$ and compares favourably to common alternative estimation methods for continuous DPPs.
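The normalizing constant (a Fredholm determinant) in the continuous DPP density is what makes direct likelihood evaluation impractical. For intuition only, here is the finite (L-ensemble) analogue, where the normalizer $\det(L + I)$ is computable exactly; this is not the paper's approximation.

```python
# Exact log-likelihood of a finite L-ensemble DPP:
# log P(X = S) = log det(L_S) - log det(L + I).
import numpy as np

def finite_dpp_loglik(L, subset):
    L_S = L[np.ix_(subset, subset)]
    _, logdet_S = np.linalg.slogdet(L_S)
    _, logdet_norm = np.linalg.slogdet(L + np.eye(L.shape[0]))
    return logdet_S - logdet_norm

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
L = B @ B.T                                  # a PSD kernel matrix
print(finite_dpp_loglik(L, [0, 2]))
```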
The randomized singular value decomposition (SVD) is a popular and effective algorithm for computing a near-best rank $k$ approximation of a matrix $A$ using matrix-vector products with standard Gaussian vectors. Here, we generalize the randomized SVD to multivariate Gaussian vectors, allowing one to incorporate prior knowledge of $A$ into the algorithm. This enables us to explore the continuous analogue of the randomized SVD for Hilbert--Schmidt (HS) operators using operator-function products with functions drawn from a Gaussian process (GP). We then construct a new covariance kernel for GPs, based on weighted Jacobi polynomials, which allows us to rapidly sample the GP and control the smoothness of the randomly generated functions. Numerical examples on matrices and HS operators demonstrate the applicability of the algorithm.
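A minimal numpy sketch of this generalization: the only change from the standard randomized SVD is that the test vectors are drawn from $N(0, C)$ rather than $N(0, I)$, so a covariance $C$ can encode prior knowledge of $A$. The smoothness-inducing kernel below is an assumed example; the paper's weighted Jacobi-polynomial kernels are not reproduced here.

```python
# Randomized SVD with correlated (multivariate Gaussian) test vectors.
import numpy as np

def randomized_svd(A, k, C, oversample=5, rng=None):
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    Omega = rng.multivariate_normal(np.zeros(n), C, size=k + oversample).T
    Q, _ = np.linalg.qr(A @ Omega)               # randomized range finder
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U_small[:, :k], s[:k], Vt[:k]

n = 100
idx = np.arange(n)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 10)   # smooth prior covariance
rng = np.random.default_rng(1)
A = np.outer(np.sin(idx / 5), np.cos(idx / 7)) + 0.01 * rng.normal(size=(n, n))
U, s, Vt = randomized_svd(A, 3, C, rng=rng)
print("rank-3 residual:", np.linalg.norm(A - U @ np.diag(s) @ Vt))
```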
The decoding performance of product codes (PCs) and staircase codes (SCCs) based on iterative bounded-distance decoding (iBDD) can be improved with the aid of a moderate amount of soft information, while maintaining a low decoding complexity. One promising approach is error-and-erasure (EaE) decoding, whose performance can be reliably estimated with density evolution (DE). However, the extrinsic message passing (EMP) decoder required by the DE analysis entails a much higher complexity than the simple intrinsic message passing (IMP) decoder. In this paper, we simplify the EMP decoding algorithm on the EaE channel for two commonly used EaE decoders by deriving the EMP decoding result from the IMP decoder output and a few additional logical operations, exploiting the algebraic structure of the component codes and the EaE decoding rule. Simulation results show that the number of BDD steps is reduced to a level comparable with IMP decoding. Furthermore, we propose a heuristic modification of the EMP decoder that reduces the complexity further. In numerical simulations, the modified decoder yields up to 0.25 dB improvement over standard EMP decoding.
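Not the paper's EMP construction, but for reference, the textbook error-and-erasure bounded-distance decoding rule that underlies it: a component code with minimum distance $d_{\min}$ is guaranteed to correct $t$ errors and $e$ erasures whenever $2t + e \le d_{\min} - 1$. A tiny sketch:

```python
# Textbook EaE bounded-distance decoding guarantee: 2t + e <= d_min - 1.
def eae_decodable(t_errors: int, e_erasures: int, d_min: int) -> bool:
    """True if bounded-distance EaE decoding is guaranteed to succeed."""
    return 2 * t_errors + e_erasures <= d_min - 1

d_min = 6                        # e.g. an illustrative component code with d_min = 6
for e in range(d_min):
    t_max = (d_min - 1 - e) // 2
    print(f"{e} erasures -> up to {t_max} correctable errors:",
          eae_decodable(t_max, e, d_min))
```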
We study the propagation of singularities in solutions of linear convection equations with spatially heterogeneous nonlocal interactions. The model adopts a spatially varying nonlocal horizon parameter, which measures the range of nonlocal interactions; via heterogeneous localization, this allows a seamless coupling of the local and nonlocal models. We are interested in how singularity propagation is affected by the heterogeneity of the nonlocal horizon and by the transition between the local and nonlocal regimes. We first analytically derive equations that characterize the propagation of different types of singularities for various forms of the nonlocal horizon parameter in the nonlocal regime. We then discretize the equations using asymptotically compatible schemes and carry out numerical simulations to illustrate the propagation patterns in different scenarios.
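As a toy illustration (not the paper's scheme), consider 1D nonlocal upwind convection $u_t + D_\delta u = 0$, where $D_\delta$ averages differences over a heterogeneous horizon $\delta(x) = r(x)\,\Delta x$ and is consistent with the local derivative $u_x$ as $\delta \to 0$:

```python
# Toy 1D nonlocal convection with a spatially varying horizon r(x)*dx.
import numpy as np

nx = 200
dx = 1.0 / nx
dt = 0.5 * dx                                # CFL-safe step for unit speed
x = np.arange(nx) * dx
u = np.exp(-200 * (x - 0.3) ** 2)            # smooth bump initial condition
r = np.where(x < 0.5, 1, 8)                  # horizon delta = r*dx jumps at x = 0.5

def nonlocal_derivative(u, r):
    Du = np.zeros_like(u)
    for i in range(len(u)):
        ri = int(r[i])
        # uniform weights c = 2/(dx*ri*(ri+1)) satisfy sum_m c*m*dx = 1,
        # so Du is consistent with the local upwind derivative u_x
        c = 2.0 / (dx * ri * (ri + 1))
        Du[i] = c * sum(u[i] - u[(i - m) % len(u)] for m in range(1, ri + 1))
    return Du

for _ in range(100):                         # forward Euler in time, T = 0.25
    u = u - dt * nonlocal_derivative(u, r)
print("bump peak is now near x =", x[np.argmax(u)])  # transported across x = 0.5
```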
An informative measurement is the most efficient way to gain information about an unknown state. We give a first-principles derivation of a general-purpose dynamic programming algorithm that returns an optimal sequence of informative measurements by sequentially maximizing the entropy of possible measurement outcomes. This algorithm can be used by an autonomous agent or robot to decide where best to measure next, planning a path corresponding to an optimal sequence of informative measurements. The algorithm is applicable to states and controls that are continuous or discrete, and to agent dynamics that are either stochastic or deterministic, including Markov decision processes and Gaussian processes. Recent results from approximate dynamic programming and reinforcement learning, including online approximations such as rollout and Monte Carlo tree search, allow the measurement task to be solved in real time. The resulting solutions include non-myopic paths and measurement sequences that can generally outperform, sometimes substantially, commonly used greedy approaches. This is demonstrated for a global search problem, where online planning with an extended local search is found to reduce the number of measurements in the search by approximately half. A variant of the algorithm is derived for Gaussian processes for active sensing.
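A minimal sketch of the greedy (myopic) special case: at each step, choose the measurement whose outcome entropy is maximal, then update the belief. The measurements here are assumed noiseless threshold queries on a discrete search space; the full dynamic programming additionally handles non-myopic sequences. Binary search emerges automatically:

```python
# Greedy entropy-maximizing measurements for locating a target among n cells.
import numpy as np

def outcome_entropy(p):                      # entropy of a Bernoulli(p) outcome
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

n, target = 64, 37
belief = np.full(n, 1 / n)
steps = 0
while belief.max() < 0.999:
    masses = np.cumsum(belief)               # P(target <= i) for each threshold i
    i = int(np.argmax([outcome_entropy(m) for m in masses[:-1]]))
    inside = target <= i                     # noiseless measurement outcome
    mask = np.arange(n) <= i
    belief = np.where(mask == inside, belief, 0.0)   # Bayesian update
    belief /= belief.sum()
    steps += 1
print("localized cell", int(np.argmax(belief)), "in", steps, "measurements")
```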
We study the problem of {\sl certification}: given queries to a function $f : \{0,1\}^n \to \{0,1\}$ with certificate complexity $\le k$ and an input $x^\star$, output a size-$k$ certificate for $f$'s value on $x^\star$. This abstractly models a central problem in explainable machine learning, where we think of $f$ as a blackbox model whose predictions we seek to explain. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with $n$ queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with $O(k^8 \log n)$ queries, which comes close to matching the information-theoretic lower bound of $\Omega(k \log n)$. The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponential-in-$k$ lower bounds when $f$ is non-monotone, and when $f$ is monotone but the algorithm is only given random examples of $f$. These lower bounds show that assumptions on the structure of $f$ and query access to it are both necessary for the polynomial dependence on $k$ that we achieve.
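A minimal sketch of the Angluin-style local search certifier for the case $f(x^\star) = 1$: greedily drop 1-bits while $f$ stays 1; by monotonicity, the surviving 1-coordinates certify $f$'s value on $x^\star$. This uses $n$ queries (the local-search optimum mentioned above); the $O(k^8 \log n)$ algorithm is different and not reproduced here.

```python
# Local search certification of a monotone f at an input with f(x*) = 1.
def certify_one(f, x_star):
    y = list(x_star)
    assert f(y) == 1
    for i in range(len(y)):
        if y[i] == 1:
            y[i] = 0
            if f(y) == 0:                    # bit i is needed; restore it
                y[i] = 1
    return [i for i, b in enumerate(y) if b == 1]   # certificate coordinates

# Example: a monotone 2-DNF, so certificate complexity is 2.
f = lambda x: int((x[0] and x[2]) or (x[1] and x[3]))
print(certify_one(f, [1, 1, 1, 1]))          # e.g. [1, 3]
```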
In the theory of discrete-time linear switching systems, as in other areas of mathematics, one is led to study the growth rate of the norms of matrix products $A_{\sigma_{n}}\cdots A_{\sigma_{0}}$ with factors drawn from a set of matrices $\mathscr{A}$. So far, the sequences of matrices guaranteeing the maximum growth rate of the corresponding norms have been described accurately only for a relatively small number of classes of sets $\mathscr{A}$. Moreover, in almost all theoretically studied cases, the index sequences $\{\sigma_{n}\}$ of the norm-maximizing matrix products have turned out to be periodic or so-called Sturmian, which entails a whole range of "good" properties of the sequences $\{A_{\sigma_{n}}\}$, in particular the existence of a limiting frequency of occurrence of each matrix factor $A_{i}\in\mathscr{A}$. In this paper it is shown that this is not always the case: we define a set of two $2\times 2$ matrices, each similar to a rotation of the plane, for which the sequence $\{A_{\sigma_{n}}\}$ maximizing the growth rate of the norms $\|A_{\sigma_{n}}\cdots A_{\sigma_{0}}\|$ is not Sturmian. All considerations rest on numerical modeling and cannot be regarded as mathematically rigorous; rather, they should be read as a collection of questions for further comprehensive theoretical analysis.
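The flavor of such numerical experiments can be reproduced with a brute-force search: enumerate all index sequences of a fixed length, compute the spectral norm of each product, and inspect the maximizer. The two rotation-like matrices below are illustrative, not the specific pair constructed in the paper.

```python
# Exhaustive search for the index sequence maximizing ||A_{s_n} ... A_{s_0}||.
import itertools
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

S = np.diag([2.0, 0.5])                       # change of basis: "similar to rotations"
A = [S @ rot(0.6) @ np.linalg.inv(S), S @ rot(2.0) @ np.linalg.inv(S)]

L = 14
best_norm, best_seq = -1.0, None
for seq in itertools.product((0, 1), repeat=L):
    P = np.eye(2)
    for i in seq:
        P = A[i] @ P
    nrm = np.linalg.norm(P, 2)                # spectral norm of the product
    if nrm > best_norm:
        best_norm, best_seq = nrm, seq
print("max norm:", best_norm)
print("maximizing index sequence:", best_seq)  # is it periodic? Sturmian?
```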
We introduce a new neural architecture to learn the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existing approaches such as sequence-to-sequence models and Neural Turing Machines, because the number of target classes at each step of the output depends on the length of the input, which is variable. Problems such as sorting variable-sized sequences, as well as various combinatorial optimization problems, belong to this class. Our model solves the problem of variable-size output dictionaries using a recently proposed mechanism of neural attention. It differs from previous attention-based approaches in that, instead of using attention to blend hidden units of an encoder into a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence models with input attention, but also allow us to generalize to variable-size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
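A minimal numpy sketch of the pointing mechanism itself: the attention scores over the encoder states are used directly as the output distribution over input positions, rather than being used to form a context vector. Shapes and random weights are illustrative.

```python
# Pointer attention: u_j = v^T tanh(W1 e_j + W2 d_t), softmax over inputs.
import numpy as np

rng = np.random.default_rng(0)
n_in, d = 6, 8                                # input length, hidden size
E = rng.normal(size=(n_in, d))                # encoder hidden states e_1..e_n
d_t = rng.normal(size=d)                      # current decoder state
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

u = np.tanh(E @ W1.T + d_t @ W2.T) @ v        # one score per input position
p = np.exp(u - u.max()); p /= p.sum()         # softmax over *input positions*
print("pointer distribution:", np.round(p, 3))
print("pointed-to input index:", int(p.argmax()))
```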