Releasing the result size of conjunctive queries and graph pattern queries under differential privacy (DP) has received considerable attention in the literature, but existing solutions do not offer any optimality guarantees. We provide the first DP mechanism for this problem with a fairly strong notion of optimality, which can be considered as a natural relaxation of instance-optimality to a constant.
The problem of scheduling unrelated machines has been studied since the inception of algorithmic mechanism design \cite{NR99}. It is a resource allocation problem that entails assigning $m$ tasks to $n$ machines for execution. Machines are regarded as strategic agents who may lie about their execution costs so as to minimize their allocated workload. To address the situation when monetary payment is not an option to compensate the machines' costs, \citeauthor{DBLP:journals/mst/Koutsoupias14} [2014] devised two \textit{truthful} mechanisms, K and P respectively, that achieve an approximation ratio of $\frac{n+1}{2}$ and $n$, for social cost minimization. In addition, no truthful mechanism can achieve an approximation ratio better than $\frac{n+1}{2}$. Hence, mechanism K is optimal. While approximation ratio provides a strong worst-case guarantee, it also limits us to a comprehensive understanding of mechanism performance on various inputs. This paper investigates these two scheduling mechanisms beyond the worst case. We first show that mechanism K achieves a smaller social cost than mechanism P on every input. That is, mechanism K is pointwise better than mechanism P. Next, for each task $j$, when machines' execution costs $t_i^j$ are independent and identically drawn from a task-specific distribution $F^j(t)$, we show that the average-case approximation ratio of mechanism K converges to a constant. This bound is tight for mechanism K. For a better understanding of this distribution dependent constant, on the one hand, we estimate its value by plugging in a few common distributions; on the other, we show that this converging bound improves a known bound \cite{DBLP:conf/aaai/Zhang18} which only captures the single-task setting. Last, we find that the average-case approximation ratio of mechanism P converges to the same constant.
The phase retrieval problem is concerned with recovering an unknown signal $\bf{x} \in \mathbb{R}^n$ from a set of magnitude-only measurements $y_j=|\langle \bf{a}_j,\bf{x} \rangle|, \; j=1,\ldots,m$. A natural least squares formulation can be used to solve this problem efficiently even with random initialization, despite its non-convexity of the loss function. One way to explain this surprising phenomenon is the benign geometric landscape: (1) all local minimizers are global; and (2) the objective function has a negative curvature around each saddle point and local maximizer. In this paper, we show that $m=O(n \log n)$ Gaussian random measurements are sufficient to guarantee the loss function of a commonly used estimator has such benign geometric landscape with high probability. This is a step toward answering the open problem given by Sun-Qu-Wright, in which the authors suggest that $O(n \log n)$ or even $O(n)$ is enough to guarantee the favorable geometric property.
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL based on the principle "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally efficient. Specifically, we show that Nash-UCRL can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game and $T$ is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
We consider the question of adaptive data analysis within the framework of convex optimization. We ask how many samples are needed in order to compute $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients queried by gradient descent, and we provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent) $\Omega(1/\epsilon^3)$ samples are required. This rules out the possibility of a foolproof mechanism. Our construction builds upon a new lower bound (that may be of interest of its own right) for an analyst that may ask several non adaptive questions in a batch of fixed and known $T$ rounds of adaptivity and requires a fraction of true discoveries. We show that for such an analyst $\Omega (\sqrt{T}/\epsilon^2)$ samples are necessary. Second, we show that, under certain assumptions on the oracle, in an interaction with gradient descent $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary. Our assumptions are that the oracle has only \emph{first order access} and is \emph{post-hoc generalizing}. First order access means that it can only compute the gradients of the sampled function at points queried by the algorithm. Our assumption of \emph{post-hoc generalization} follows from existing lower bounds for statistical queries. More generally then, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. These results are in contrast with classical bounds that show that with $O(1/\epsilon^2)$ samples one can optimize the population risk to accuracy of $O(\epsilon)$ but, as it turns out, with spurious gradients.
In this paper we generalize Dillon's switching method to characterize the exact $c$-differential uniformity of functions constructed via this method. More precisely, we modify some PcN/APcN and other functions with known $c$-differential uniformity in a controllable number of coordinates to render more such functions. We present several applications of the method in constructing PcN and APcN functions with respect to all $c\neq 1$. As a byproduct, we generalize some result of [Y. Wu, N. Li, X. Zeng, {\em New PcN and APcN functions over finite fields}, Designs Codes Crypt. 89 (2021), 2637--2651]. Computational results rendering functions with low differential uniformity, as well as, other good cryptographic properties are sprinkled throughout the paper.
In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal in the computer draw of the fixture that remains hidden from the public.
Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide spectrum of fields. Since graphs in many applications are changing frequently over time, the problem of maintaining a MaxIS over dynamic graphs has attracted increasing attention over the past few years. Due to the intractability of maintaining an exact MaxIS, this paper aims to develop efficient algorithms that can maintain an approximate MaxIS with an accuracy guarantee theoretically. In particular, we propose a framework that maintains a $(\frac{\Delta}{2} + 1)$-approximate MaxIS over dynamic graphs and prove that it achieves a constant approximation ratio in many real-world networks. To the best of our knowledge, this is the first non-trivial approximability result for the dynamic MaxIS problem. Following the framework, we implement an efficient linear-time dynamic algorithm and a more effective dynamic algorithm with near-linear expected time complexity. Our thorough experiments over real and synthetic graphs demonstrate the effectiveness and efficiency of the proposed algorithms, especially when the graph is highly dynamic.
The similarity between a pair of time series, i.e., sequences of indexed values in time order, is often estimated by the dynamic time warping (DTW) distance, instead of any in the well-studied family of measures including the longest common subsequence (LCS) length and the edit distance. Although it may seem as if the DTW and the LCS(-like) measures are essentially different, we reveal that the DTW distance can be represented by the longest increasing subsequence (LIS) length of a sequence of integers, which is the LCS length between the integer sequence and itself sorted. For a given pair of time series of length $n$ such that the dissimilarity between any elements is an integer between zero and $c$, we propose an integer sequence that represents any substring-substring DTW distance as its band-substring LIS length. The length of the produced integer sequence is $O(c n^2)$, which can be translated to $O(n^2)$ for constant dissimilarity functions. To demonstrate that techniques developed under the LCS(-like) measures are directly applicable to analysis of time series via our reduction of DTW to LIS, we present time-efficient algorithms for DTW-related problems utilizing the semi-local sequence comparison technique developed for LCS-related problems.
We present an approach to quantify and compare the privacy-accuracy trade-off for differentially private Variational Autoencoders. Our work complements previous work in two aspects. First, we evaluate the the strong reconstruction MI attack against Variational Autoencoders under differential privacy. Second, we address the data scientist's challenge of setting privacy parameter epsilon, which steers the differential privacy strength and thus also the privacy-accuracy trade-off. In our experimental study we consider image and time series data, and three local and central differential privacy mechanisms. We find that the privacy-accuracy trade-offs strongly depend on the dataset and model architecture. We do rarely observe favorable privacy-accuracy trade-off for Variational Autoencoders, and identify a case where LDP outperforms CDP.
A palindromic substring $T[i.. j]$ of a string $T$ is said to be a shortest unique palindromic substring (SUPS) in $T$ for an interval $[p, q]$ if $T[i.. j]$ is a shortest one such that $T[i.. j]$ occurs only once in $T$, and $[i, j]$ contains $[p, q]$. The SUPS problem is, given a string $T$ of length $n$, to construct a data structure that can compute all the SUPSs for any given query interval. It is known that any SUPS query can be answered in $O(\alpha)$ time after $O(n)$-time preprocessing, where $\alpha$ is the number of SUPSs to output [Inoue et al., 2018]. In this paper, we first show that $\alpha$ is at most $4$, and the upper bound is tight. Also, we present an algorithm to solve the SUPS problem for a sliding window that can answer any query in $O(\log\log W)$ time and update data structures in amortized $O(\log\sigma)$ time, where $W$ is the size of the window, and $\sigma$ is the alphabet size. Furthermore, we consider the SUPS problem in the after-edit model and present an efficient algorithm. Namely, we present an algorithm that uses $O(n)$ time for preprocessing and answers any $k$ SUPS queries in $O(\log n\log\log n + k\log\log n)$ time after single character substitution. As a by-product, we propose a fully-dynamic data structure for range minimum queries (RmQs) with a constraint where the width of each query range is limited to polylogarithmic. The constrained RmQ data structure can answer such a query in constant time and support a single-element edit operation in amortized constant time.