Let $G$ be an $n$-vertex graph, and $s,t$ vertices of $G$. We present an efficient algorithm which enumerates the set of minimal $st$-separators of $G$ in ascending order of cardinality, with a delay of $O(n^{3.5})$ per separator. In particular, we present an algorithm that lists, in ascending order of cardinality, all minimal separators with at most $k$ vertices. In that case, we show that the delay of the enumeration algorithm is $O(kn^{2.5})$ per separator. Our process is based on a new method that can decide, in polynomial time, whether the set of minimal separators under certain inclusion, exclusion, and cardinality constraints is empty.
Schelling's model considers $k$ types of agents each of whom needs to select a vertex on an undirected graph, where every agent prefers to neighbor agents of the same type. We are motivated by a recent line of work that studies solutions that are optimal with respect to notions related to the welfare of the agents. We explore the parameterized complexity of computing such solutions. We focus on the well-studied notions of social welfare (WO) and Pareto optimality (PO), alongside the recently proposed notions of group-welfare optimality (GWO) and utility-vector optimality (UVO), both of which lie between WO and PO. Firstly, we focus on the fundamental case where $k=2$ and there are $r$ red agents and $b$ blue agents. We show that all solution-notions we consider are $\textsf{NP}$-hard to compute even when $b=1$ and that they are $\textsf{W}[1]$-hard when parameterized by $r$ and $b$. In addition, we show that WO and GWO are $\textsf{NP}$-hard even on cubic graphs. We complement these negative results by an $\textsf{FPT}$ algorithm parameterized by $r, b$ and the maximum degree of the graph. For the general case with $k$ types of agents, we prove that for any of the notions we consider the problem is $\textsf{W}[1]$-hard when parameterized by $k$ for a large family of graphs that includes trees. We accompany these negative results with an $\textsf{XP}$ algorithm parameterized by $k$ and the treewidth of the graph.
Linear minimum mean square error (LMMSE) estimation is often ill-conditioned, suggesting that unconstrained minimization of the mean square error is an inadequate principle for filter design. To address this, we first develop a unifying framework for studying constrained LMMSE estimation problems. Using this framework, we expose an important structural property of constrained LMMSE filters: They generally involve an inherent preconditioning step. This parameterizes all such filters only by their preconditioners. Moreover, each filters is invariant to invertible linear transformations of its preconditioner. We then clarify that merely constraining the rank of the filter does not suitably address the problem of ill-conditioning. Instead, we adopt a constraint that explicitly requires solutions to be well-conditioned in a certain specific sense. We introduce two well-conditioned filters and show that they converge to the unconstrained LMMSE filter as their truncated-power loss goes to zero, at the same rate as the low-rank Wiener filter. We also show extensions to the case of weighted trace and determinant of the error covariance as objective functions. Finally, we show quantitative results with historical VIX data to demonstrate that our two well-conditioned filters have stable performance while the standard LMMSE filter deteriorates with increasing condition number.
Given an $n$-point metric space $(\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i\in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor $2$-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a $6$-approximation that is guaranteed to satisfy the fairness constraints up to a factor $1-\epsilon$ for any constant $\epsilon$. We also present a linear time algorithm returning an $m+1$ approximation with exact fairness. The best previous result was a $3m-1$ approximation. 2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant $\epsilon>0$, we present a $1+\epsilon$ approximation algorithm that runs in $O(nk) + 2^{O(k)}$ time where $k=k_1+\ldots+k_m$. We can improve the running time to $O(nk)+ poly(k)$ at the expense of only picking $(1-\epsilon) k_i$ points from category $i\in [m]$. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.
The non-parametric estimation of covariance lies at the heart of functional data analysis, whether for curve or surface-valued data. The case of a two-dimensional domain poses both statistical and computational challenges, which are typically alleviated by assuming separability. However, separability is often questionable, sometimes even demonstrably inadequate. We propose a framework for the analysis of covariance operators of random surfaces that generalises separability, while retaining its major advantages. Our approach is based on the expansion of the covariance into a series of separable terms. The expansion is valid for any covariance over a two-dimensional domain. Leveraging the key notion of the partial inner product, we extend the power iteration method to general Hilbert spaces and show how the aforementioned expansion can be efficiently constructed in practice. Truncation of the expansion and retention of the leading terms automatically induces a non-parametric estimator of the covariance, whose parsimony is dictated by the truncation level. The resulting estimator can be calculated, stored and manipulated with little computational overhead relative to separability. Consistency and rates of convergence are derived under mild regularity assumptions, illustrating the trade-off between bias and variance regulated by the truncation level. The merits and practical performance of the proposed methodology are demonstrated in a comprehensive simulation study and on classification of EEG signals.
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.
The commonly quoted error rates for QMC integration with an infinite low discrepancy sequence is $O(n^{-1}\log(n)^r)$ with $r=d$ for extensible sequences and $r=d-1$ otherwise. Such rates hold uniformly over all $d$ dimensional integrands of Hardy-Krause variation one when using $n$ evaluation points. Implicit in those bounds is that for any sequence of QMC points, the integrand can be chosen to depend on $n$. In this paper we show that rates with any $r<(d-1)/2$ can hold when $f$ is held fixed as $n\to\infty$. This is accomplished following a suggestion of Erich Novak to use some unpublished results of Trojan from the 1980s as given in the information based complexity monograph of Traub, Wasilkowski and Wo\'zniakowski. The proof is made by applying a technique of Roth with the theorem of Trojan. The proof is non constructive and we do not know of any integrand of bounded variation in the sense of Hardy and Krause for which the QMC error exceeds $(\log n)^{1+\epsilon}/n$ for infinitely many $n$ when using a digital sequence such as one of Sobol's. An empirical search when $d=2$ for integrands designed to exploit known weaknesses in certain point sets showed no evidence that $r>1$ is needed. An example with $d=3$ and $n$ up to $2^{100}$ might possibly require $r>1$.
Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, the users also want to see the first $k$ results quickly (expressed by the LIMIT clause), but the value of $k$ is not predetermined as user queries are arriving in an online fashion. Recent work has made considerable progress in identifying optimal algorithms for ranked enumeration of join queries that do not contain any projections. In this paper, we initiate the study of the problem of enumerating results in ranked order for queries with projections. Our main result shows that for any acyclic query, it is possible to obtain a near-linear (in the size of the database) delay algorithm after only a linear time preprocessing step for two important ranking functions: sum and lexicographic ordering. For a practical subset of acyclic queries known as star queries, we show an even stronger result that allows a user to obtain a smooth tradeoff between faster answering time guarantees using more preprocessing time. Our results are also extensible to queries containing cycles and unions. We also perform a comprehensive experimental evaluation to demonstrate that our algorithms, which are simple to implement, improve up to three orders of magnitude in the running time over state-of-the-art algorithms implemented within open-source RDBMS and specialized graph databases.
This paper deals with robust inference for parametric copula models. Estimation using Canonical Maximum Likelihood might be unstable, especially in the presence of outliers. We propose to use a procedure based on the Maximum Mean Discrepancy (MMD) principle. We derive non-asymptotic oracle inequalities, consistency and asymptotic normality of this new estimator. In particular, the oracle inequality holds without any assumption on the copula family, and can be applied in the presence of outliers or under misspecification. Moreover, in our MMD framework, the statistical inference of copula models for which there exists no density with respect to the Lebesgue measure on $[0,1]^d$, as the Marshall-Olkin copula, becomes feasible. A simulation study shows the robustness of our new procedures, especially compared to pseudo-maximum likelihood estimation. An R package implementing the MMD estimator for copula models is available.
The Gromov-Hausdorff distance $(d_{GH})$ proves to be a useful distance measure between shapes. In order to approximate $d_{GH}$ for compact subsets $X,Y\subset\mathbb{R}^d$, we look into its relationship with $d_{H,iso}$, the infimum Hausdorff distance under Euclidean isometries. As already known for dimension $d\geq 2$, the $d_{H,iso}$ cannot be bounded above by a constant factor times $d_{GH}$. For $d=1$, however, we prove that $d_{H,iso}\leq\frac{5}{4}d_{GH}$. We also show that the bound is tight. In effect, this gives rise to an $O(n\log{n})$-time algorithm to approximate $d_{GH}$ with an approximation factor of $\left(1+\frac{1}{4}\right)$.
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.