We propose and study a new multilevel method for the numerical approximation of a Gibbs distribution $\pi$ on R d , based on (over-damped) Langevin diffusions. This method both inspired by [PP18] and [GMS + 20] relies on a multilevel occupation measure, i.e. on an appropriate combination of R occupation measures of (constant-step) discretized schemes of the Langevin diffusion with respective steps $\gamma$r = $\gamma$02 --r , r = 0,. .. , R. For a given diffusion, we first state a result under general assumptions which guarantees an $\epsilon$-approximation (in a L 2-sense) with a cost proportional to $\epsilon$ --2 (i.e. proportional to a Monte-Carlo method without bias) or $\epsilon$ --2 | log $\epsilon$| 3 under less contractive assumptions. This general result is then applied to over-damped Langevin diffusions in a strongly convex setting, with a study of the dependence in the dimension d and in the spectrum of the Hessian matrix D 2 U of the potential U : R d $\rightarrow$ R involved in the Gibbs distribution. This leads to strategies with cost in O(d$\epsilon$ --2 log 3 (d$\epsilon$ --2)) and in O(d$\epsilon$ --2) under an additional condition on the third derivatives of U. In particular, in our last main result, we show that, up to universal constants, an appropriate choice of the diffusion coefficient and of the parameters of the procedure leads to a cost controlled by ($\lambda$ U $\lor$1) 2 $\lambda$ 3 U d$\epsilon$ --2 (where$\lambda$U and $\lambda$ U respectively denote the supremum and the infimum of the largest and lowest eigenvalue of D 2 U). In our numerical illustrations, we show that our theoretical bounds are confirmed in practice and finally propose an opening to some theoretical or numerical strategies in order to increase the robustness of the procedure when the largest and smallest eigenvalues of D 2 U are respectively too large or too small.
A $(1+\epsilon)$-approximate distance oracle of an edge-weighted graph is a data structure that returns an approximate shortest path distance between any two query vertices up to a $(1+\epsilon)$ factor. Thorup (FOCS 2001, JACM 2004) and Klein (SODA 2002) independently constructed a $(1+\epsilon)$-approximate distance oracle with $O(n\log n)$ space, measured in number of words, and $O(1)$ query time when $G$ is an undirected planar graph with $n$ vertices and $\epsilon$ is a fixed constant. Many follow-up works gave $(1+\epsilon)$-approximate distance oracles with various trade-offs between space and query time. However, improving $O(n\log n)$ space bound without sacrificing query time remains an open problem for almost two decades. In this work, we resolve this problem affirmatively by constructing a $(1+\epsilon)$-approximate distance oracle with optimal $O(n)$ space and $O(1)$ query time for undirected planar graphs and fixed $\epsilon$. We also make substantial progress for planar digraphs with non-negative edge weights. For fixed $\epsilon > 0$, we give a $(1+\epsilon)$-approximate distance oracle with space $o(n\log(Nn))$ and $O(\log\log(Nn)$ query time; here $N$ is the ratio between the largest and smallest positive edge weight. This improves Thorup's (FOCS 2001, JACM 2004) $O(n\log(Nn)\log n)$ space bound by more than a logarithmic factor while matching the query time of his structure. This is the first improvement for planar digraphs in two decades, both in the weighted and unweighted setting.
Branchwidth determines how graphs, and more generally, arbitrary connectivity (basically symmetric and submodular) functions could be decomposed into a tree-like structure by specific cuts. We develop a general framework for designing fixed-parameter tractable (FPT) 2-approximation algorithms for branchwidth of connectivity functions. The first ingredient of our framework is combinatorial. We prove a structural theorem establishing that either a sequence of particular refinement operations could decrease the width of a branch decomposition or that the width of the decomposition is already within a factor of 2 from the optimum. The second ingredient is an efficient implementation of the refinement operations for branch decompositions that support efficient dynamic programming. We present two concrete applications of our general framework. $\bullet$ An algorithm that for a given $n$-vertex graph $G$ and integer $k$ in time $2^{2^{O(k)}} n^2$ either constructs a rank decomposition of $G$ of width at most $2k$ or concludes that the rankwidth of $G$ is more than $k$. It also yields a $(2^{2k+1}-1)$-approximation algorithm for cliquewidth within the same time complexity, which in turn, improves to $f(k)n^2$ the running times of various algorithms on graphs of cliquewidth $k$. Breaking the "cubic barrier" for rankwidth and cliquewidth was an open problem in the area. $\bullet$ An algorithm that for a given $n$-vertex graph $G$ and integer $k$ in time $2^{O(k)} n$ either constructs a branch decomposition of $G$ of width at most $2k$ or concludes that the branchwidth of $G$ is more than $k$. This improves over the 3-approximation that follows from the recent treewidth 2-approximation of Korhonen [FOCS 2021].
In this paper, we develop deterministic fully dynamic algorithms for computing approximate distances in a graph with worst-case update time guarantees. In particular we obtain improved dynamic algorithms that, given an unweighted and undirected graph $G=(V,E)$ undergoing edge insertions and deletions, and a parameter $0 < \epsilon \leq 1$, maintain $(1+\epsilon)$-approximations of the $st$ distance of a single pair of nodes, the distances from a single source to all nodes ("SSSP"), the distances from multiple sources to all nodes ("MSSP''), or the distances between all nodes ("APSP"). Our main result is a deterministic algorithm for maintaining $(1+\epsilon)$-approximate single-source distances with worst-case update time $O(n^{1.529})$ (for the current best known bound on the matrix multiplication coefficient $\omega$). This matches a conditional lower bound by [BNS, FOCS 2019]. We further show that we can go beyond this SSSP bound for the problem of maintaining approximate $st$ distances by providing a deterministic algorithm with worst-case update time $O(n^{1.447})$. This even improves upon the fastest known randomized algorithm for this problem. At the core, our approach is to combine algebraic distance maintenance data structures with near-additive emulator constructions. This also leads to novel dynamic algorithms for maintaining $(1+\epsilon, \beta)$-emulators that improve upon the state of the art, which might be of independent interest. Our techniques also lead to improvements for randomized approximate diameter maintenance.
We study dynamic algorithms for the problem of maximizing a monotone submodular function over a stream of $n$ insertions and deletions. We show that any algorithm that maintains a $(0.5+\epsilon)$-approximate solution under a cardinality constraint, for any constant $\epsilon>0$, must have an amortized query complexity that is $\mathit{polynomial}$ in $n$. Moreover, a linear amortized query complexity is needed in order to maintain a $0.584$-approximate solution. This is in sharp contrast with recent dynamic algorithms of [LMNF+20, Mon20] that achieve $(0.5-\epsilon)$-approximation with a $\mathsf{poly}\log(n)$ amortized query complexity. On the positive side, when the stream is insertion-only, we present efficient algorithms for the problem under a cardinality constraint and under a matroid constraint with approximation guarantee $1-1/e-\epsilon$ and amortized query complexities $\smash{O(\log (k/\epsilon)/\epsilon^2)}$ and $\smash{k^{\tilde{O}(1/\epsilon^2)}\log n}$, respectively, where $k$ denotes the cardinality parameter or the rank of the matroid.
The use of pessimism, when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning. Despite the robustness it adds to the algorithm, overly pessimistic reasoning can be equally damaging in precluding the discovery of good policies, which is an issue for the popular bonus-based pessimism. In this paper, we introduce the notion of Bellman-consistent pessimism for general function approximation: instead of calculating a point-wise lower bound for the value function, we implement pessimism at the initial state over the set of functions consistent with the Bellman equations. Our theoretical guarantees only require Bellman closedness as standard in the exploratory setting, in which case bonus-based pessimism fails to provide guarantees. Even in the special case of linear function approximation where stronger expressivity assumptions hold, our result improves upon a recent bonus-based approach by $\mathcal{O}(d)$ in its sample complexity when the action space is finite. Remarkably, our algorithms automatically adapt to the best bias-variance tradeoff in the hindsight, whereas most prior approaches require tuning extra hyperparameters a priori.
We study the least square estimator, in the framework of simple linear regression, when the deviance term $\varepsilon$ with respect to the linear model is modeled by a uniform distribution. In particular, we give the law of this estimator, and prove some convergence properties.
When processing data with uncertainty, it is desirable that the output of the algorithm is stable against small perturbations in the input. Varma and Yoshida [SODA'21] recently formalized this idea and proposed the notion of average sensitivity of algorithms, which is roughly speaking, the average Hamming distance between solutions for the original input and that obtained by deleting one element from the input, where the average is taken over the deleted element. In this work, we consider average sensitivity of algorithms for problems that can be solved by dynamic programming. We first present a $(1-\delta)$-approximation algorithm for finding a maximum weight chain (MWC) in a transitive directed acyclic graph with average sensitivity $O(\delta^{-1}\log^3 n)$, where $n$ is the number of vertices in the graph. We then show algorithms with small average sensitivity for various dynamic programming problems by reducing them to the MWC problem while preserving average sensitivity, including the longest increasing subsequence problem, the interval scheduling problem, the longest common subsequence problem, the longest palindromic subsequence problem, the knapsack problem with integral weight, and the RNA folding problem. For the RNA folding problem, our reduction is highly nontrivial because a naive reduction generates an exponentially large graph, which only provides a trivial average sensitivity bound.
We study the $c$-approximate near neighbor problem under the continuous Fr\'echet distance: Given a set of $n$ polygonal curves with $m$ vertices, a radius $\delta > 0$, and a parameter $k \leq m$, we want to preprocess the curves into a data structure that, given a query curve $q$ with $k$ vertices, either returns an input curve with Fr\'echet distance at most $c\cdot \delta$ to $q$, or returns that there exists no input curve with Fr\'echet distance at most $\delta$ to $q$. We focus on the case where the input and the queries are one-dimensional polygonal curves -- also called time series -- and we give a comprehensive analysis for this case. We obtain new upper bounds that provide different tradeoffs between approximation factor, preprocessing time, and query time. Our data structures improve upon the state of the art in several ways. We show that for any $0 < \varepsilon \leq 1$ an approximation factor of $(1+\varepsilon)$ can be achieved within the same asymptotic time bounds as the previously best result for $(2+\varepsilon)$. Moreover, we show that an approximation factor of $(2+\varepsilon)$ can be obtained by using preprocessing time and space $O(nm)$, which is linear in the input size, and query time in $O(\frac{1}{\varepsilon})^{k+2}$, where the previously best result used preprocessing time in $n \cdot O(\frac{m}{\varepsilon k})^k$ and query time in $O(1)^k$. We complement our upper bounds with matching conditional lower bounds based on the Orthogonal Vectors Hypothesis. Interestingly, some of our lower bounds already hold for any super-constant value of $k$. This is achieved by proving hardness of a one-sided sparse version of the Orthogonal Vectors problem as an intermediate problem, which we believe to be of independent interest.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.