We study monotonicity testing of functions $f \colon \{0,1\}^d \to \{0,1\}$ using sample-based algorithms, which are only allowed to observe the value of $f$ on points drawn independently from the uniform distribution. A classic result of Bshouty-Tamon (J. ACM 1996) proved that monotone functions can be learned with $\exp(O(\min\{\frac{1}{\varepsilon}\sqrt{d},d\}))$ samples, and it is not hard to show that this bound extends to testing. Prior to our work, the only lower bound for this problem was $\Omega(\sqrt{\exp(d)/\varepsilon})$ in the small-$\varepsilon$ parameter regime, when $\varepsilon = O(d^{-3/2})$, due to Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky (Combinatorica 2000). Thus, the sample complexity of monotonicity testing was wide open for $\varepsilon \gg d^{-3/2}$. We resolve this question, obtaining a tight lower bound of $\exp(\Omega(\min\{\frac{1}{\varepsilon}\sqrt{d},d\}))$ for all $\varepsilon$ at most a sufficiently small constant. In fact, we prove a much more general result, showing that the sample complexity of $k$-monotonicity testing and learning for functions $f \colon \{0,1\}^d \to [r]$ is $\exp(\Theta(\min\{\frac{rk}{\varepsilon}\sqrt{d},d\}))$. For testing with one-sided error we show that the sample complexity is $\exp(\Theta(d))$. Beyond the hypercube, we prove nearly tight bounds (up to polylog factors of $d,k,r,1/\varepsilon$ in the exponent) of $\exp(\widetilde{\Theta}(\min\{\frac{rk}{\varepsilon}\sqrt{d},d\}))$ on the sample complexity of testing and learning measurable $k$-monotone functions $f \colon \mathbb{R}^d \to [r]$ under product distributions. Our upper bound improves upon the previous bound of $\exp(\widetilde{O}(\min\{\frac{k}{\varepsilon^2}\sqrt{d},d\}))$ by Harms-Yoshida (ICALP 2022) for Boolean functions ($r=2$).
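To see how the general theorem subsumes the monotone Boolean case, one can specialize $r=2$ and $k=1$ (a worked check, with the constant factor $rk=2$ absorbed into the $\Theta$):
\[
\exp\Big(\Theta\Big(\min\Big\{\tfrac{rk}{\varepsilon}\sqrt{d},\,d\Big\}\Big)\Big)\Big|_{r=2,\,k=1}
\;=\;
\exp\Big(\Theta\Big(\min\Big\{\tfrac{1}{\varepsilon}\sqrt{d},\,d\Big\}\Big)\Big),
\]
matching both the Bshouty-Tamon upper bound and the lower bound stated above.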
Given a natural number $k\ge 2$, we consider the $k$-submodular cover problem ($k$-SC). The objective is to find a minimum-cost subset of a ground set $\mathcal{X}$ subject to the value of a $k$-submodular utility function $g$ being at least a predetermined threshold $\tau$. For this problem, we design a bicriteria algorithm whose cost is at most $O(1/\epsilon)$ times the optimal value, while the utility is at least $(1-\epsilon)\tau/r$, where $r$ depends on the monotonicity of $g$.
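As a rough illustration of the bicriteria guarantee (cost compared against the optimum, utility only required to reach the relaxed target $(1-\epsilon)\tau/r$), here is a minimal greedy sketch. The function names, the oracles, and the greedy rule itself are illustrative assumptions, not the paper's algorithm; in particular, a true $k$-submodular routine must also choose one of $k$ positions for each selected element, which this sketch omits.
\begin{verbatim}
# Hypothetical sketch of a bicriteria cover-style routine: grow a set
# greedily by marginal-utility-per-cost until the relaxed target
# (1 - eps) * tau / r is met. `gain(S, x)` and `cost(x)` are assumed
# oracles; `ground_set` is a Python set.
def greedy_bicriteria(ground_set, cost, gain, tau, r, eps):
    chosen, value = set(), 0.0
    target = (1.0 - eps) * tau / r
    while value < target:
        best, best_gain, best_ratio = None, 0.0, 0.0
        for x in ground_set - chosen:
            g = gain(chosen, x)          # marginal utility of adding x
            if g > 0 and g / cost(x) > best_ratio:
                best, best_gain, best_ratio = x, g, g / cost(x)
        if best is None:                 # no element improves the utility
            break
        chosen.add(best)
        value += best_gain
    return chosen
\end{verbatim}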
Suppose that $K\subset\C$ is compact and that $z_0\in\C\backslash K$ is an external point. An optimal prediction measure for regression by polynomials of degree at most $n$ is one for which the variance of the prediction at $z_0$ is as small as possible. Hoel and Levine (\cite{HL}) considered the case of $K=[-1,1]$ and $z_0=x_0\in \R\backslash [-1,1]$, where they show that the support of the optimal measure is the $n+1$ extreme points of the Chebyshev polynomial $T_n(x)$, and they characterize the optimal weights in terms of absolute values of fundamental Lagrange interpolating polynomials. More recently, \cite{BLO} established the equivalence of the optimal prediction problem with that of finding polynomials of extremal growth. They also study in detail the case of $K=[-1,1]$ and $z_0=ia\in i\R$, purely imaginary. In this work we generalize the Hoel-Levine formula to the general case where the support of the optimal measure is a finite set, and we give a formula for the optimal weights in terms of an $\ell_1$ minimization problem.
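Schematically, the Hoel-Levine characterization mentioned above takes the following form (our rendering, assuming the measure is normalized to a probability measure): if the optimal measure is supported on the $n+1$ extreme points of $T_n$, with fundamental Lagrange interpolating polynomials $\ell_0,\dots,\ell_n$ at those points, then the optimal weights are
\[
w_j \;=\; \frac{|\ell_j(x_0)|}{\sum_{i=0}^{n} |\ell_i(x_0)|}, \qquad j=0,\dots,n.
\]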
Angluin's L$^*$ algorithm learns the minimal deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probably approximately correct (PAC) version replaces each equivalence query with numerous random membership queries, so as to gain high confidence in the answer. Thus it can be applied to any kind of device and may be viewed as an algorithm for synthesizing an automaton abstracting the behavior of the device based on observations. Here we are interested in how Angluin's PAC learning algorithm behaves for devices which are obtained from a DFA by introducing some noise. More precisely, we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of words w.r.t. the DFA with a small probability, (2) the noisy device, with a small probability, modifies the letters of the word before asking for its classification w.r.t. the DFA, (3) the noisy device combines the classification of a word w.r.t. the DFA with its classification w.r.t. a counter automaton, and (4) the noisy DFA is obtained by a random process from two DFA such that the language of the first one is included in that of the second one; when a word is accepted (resp. rejected) by the first (resp. second) one, it keeps that classification, and in the remaining cases it is accepted with probability 0.5. Our main experimental contributions consist in showing that: (1) Angluin's algorithm behaves well whenever the noisy device is produced by a random process, (2) but poorly with structured noise, and (3) it is able to eliminate pathological behaviours specified in a regular way. Theoretically, we show that randomness almost surely yields systems with non-recursively enumerable languages.
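Noise models (1) and (2) are simple enough to state directly in code; the following is a minimal sketch (the function names and the noise rate $p$ are illustrative assumptions), where \texttt{dfa\_accepts} stands for membership in the underlying DFA's language.
\begin{verbatim}
import random

def noisy_output(dfa_accepts, word, p=0.01):
    """Model (1): invert the DFA's classification with probability p."""
    truth = dfa_accepts(word)
    return (not truth) if random.random() < p else truth

def noisy_input(dfa_accepts, word, alphabet, p=0.01):
    """Model (2): each letter is replaced by a uniformly random letter
    with probability p before the word is classified."""
    noisy = "".join(random.choice(alphabet) if random.random() < p else c
                    for c in word)
    return dfa_accepts(noisy)
\end{verbatim}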
Let $\mathcal{A}$ be an arrangement of straight lines in the plane (or planes in $\mathbb{R}^3$). The $k$-crossing visibility of a point $p$ on $\mathcal{A}$ is the set of points $q$ in elements of $\mathcal{A}$ such that the segment $pq$ intersects at most $k$ elements of $\mathcal{A}$. In this paper, we obtain algorithms for computing the $k$-crossing visibility. In particular, we obtain $O(n\log n + kn)$- and $O(n\log n + k^2n)$-time algorithms for arrangements of lines in the plane and of planes in $\mathbb{R}^3$, respectively, which are optimal for $k=\Omega(\log n)$ and $k=\Omega(\sqrt{\log n})$, respectively. We also introduce another algorithm for computing $k$-crossing visibilities on polygons, which attains the same asymptotic running time as the one presented by Bahoo et al. The ideas introduced in this paper can easily be adapted to obtain $k$-crossing visibilities on other arrangements whose $(\leq k)$-level is known.
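For contrast with the near-optimal algorithms above, the definition itself admits a naive per-query test, sketched below for lines in the plane (the representation $ax+by+c=0$, the point format, and the convention that touching a line at an endpoint does not count as a crossing are our assumptions).
\begin{verbatim}
def crossings(p, q, lines):
    """Count lines a*x + b*y + c = 0 strictly crossed by segment pq:
    the segment crosses a line iff its endpoints lie on opposite sides."""
    def side(pt, ln):
        a, b, c = ln
        return a * pt[0] + b * pt[1] + c
    return sum(1 for ln in lines if side(p, ln) * side(q, ln) < 0)

def in_k_crossing_visibility(p, q, lines, k):
    """q (a point on some element of the arrangement) is in the
    k-crossing visibility of p iff segment pq meets at most k lines."""
    return crossings(p, q, lines) <= k
\end{verbatim}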
Given an arbitrary set of high dimensional points in $\ell_1$, there are known negative results that preclude the possibility of mapping them to a low dimensional $\ell_1$ space while preserving distances with small multiplicative distortion. This is in stark contrast with dimension reduction in Euclidean space ($\ell_2$) where such mappings are always possible. While the first non-trivial lower bounds for $\ell_1$ dimension reduction were established almost 20 years ago, there has been minimal progress in understanding what sets of points in $\ell_1$ are conducive to a low-dimensional mapping. In this work, we shift the focus from the worst-case setting and initiate the study of a characterization of $\ell_1$ metrics that are conducive to dimension reduction in $\ell_1$. Our characterization focuses on metrics that are defined by the disagreement of binary variables over a probability distribution -- any $\ell_1$ metric can be represented in this form. We show that, for configurations of $n$ points in $\ell_1$ obtained from tree Ising models, we can reduce dimension to $\mathrm{polylog}(n)$ with constant distortion. In doing so, we develop technical tools for embedding capped metrics (also known as truncated metrics) which have been studied because of their applications in computer vision, and are objects of independent interest in metric geometry.
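Concretely (in our notation), the metrics in question arise as follows: given binary variables $X_1,\dots,X_n$ with joint distribution $\mathcal{D}$, set
\[
d(i,j) \;=\; \Pr_{X\sim\mathcal{D}}\left[\,X_i \neq X_j\,\right],
\]
and, as noted above, any $\ell_1$ metric can be represented in this form (after scaling); tree Ising models correspond to distributions $\mathcal{D}$ that are Markov with respect to a tree.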
We study a subclass of $n$-player stochastic games, namely stochastic games with independent chains and unknown transition matrices. In this class of games, each player controls their own internal Markov chain, whose transitions do not depend on the states/actions of other players. However, players' decisions are coupled through their payoff functions. We assume that players can observe only realizations of their payoffs, cannot observe the states and actions of other players, and do not know the transition probability matrices of their own Markov chains. Relying on a compact dual formulation of the game based on occupancy measures, and on confidence sets to maintain high-probability estimates of the unknown transition matrices, we propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE for this class of games. The proposed algorithm has the desired properties of independence, scalability, and convergence. Specifically, under no assumptions on the reward functions, we show that the proposed algorithm converges in polynomial time, in a weaker distance (namely, the averaged Nikaido-Isoda gap), to the set of $\epsilon$-NE policies with arbitrarily high probability. Moreover, assuming the existence of a variationally stable Nash equilibrium policy, we show that the proposed algorithm converges asymptotically to the stable $\epsilon$-NE policy with arbitrarily high probability. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that, under some mild assumptions, admit polynomial-time learning algorithms for finding their stationary $\epsilon$-NE policies.
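For reference, one standard form of the averaged Nikaido-Isoda gap of a joint policy $\pi=(\pi_1,\dots,\pi_n)$ with payoff functions $u_i$ is (the paper's exact normalization may differ):
\[
\mathrm{NI}(\pi) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big(\max_{\pi_i'}\, u_i(\pi_i',\pi_{-i}) - u_i(\pi_i,\pi_{-i})\Big),
\]
so $\mathrm{NI}(\pi)\le\epsilon$ relaxes the $\epsilon$-NE condition by penalizing unilateral deviations on average rather than for every player simultaneously.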
We study whether a discrete quantum walk can get arbitrarily close to a state whose entries have the same absolute value over all the arcs, given that the walk starts with a uniform superposition of the outgoing arcs of some vertex. We characterize this phenomenon on non-bipartite graphs using the adjacency spectrum of the graph; in particular, if this happens in some association scheme and the state we get arbitrarily close to ``respects the neighborhood", then it happens regardless of the initial vertex, and the adjacency algebra of the graph contains a real (regular) Hadamard matrix. We then find infinite families of primitive strongly regular graphs that admit this phenomenon. We also derive some results on a strengthening of this phenomenon called simultaneous $\epsilon$-uniform mixing, which enables local $\epsilon$-uniform mixing at every vertex.
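Schematically (our notation, not necessarily the paper's): with $U$ the transition matrix of the walk, $\psi_u$ the uniform superposition over the arcs leaving a vertex $u$, and $A$ the arc set, the phenomenon asks that for every $\epsilon>0$ there exist a time $t$ with
\[
\Big|\, \big|(U^t\psi_u)(a)\big| - \tfrac{1}{\sqrt{|A|}} \,\Big| \;<\; \epsilon \qquad \text{for every arc } a\in A,
\]
since a unit state whose entries share a common absolute value must have each entry of modulus $1/\sqrt{|A|}$.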
The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem, e.g., max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha=O(1)$ (or even larger) using target dimension $\log n / \mathrm{poly}(\alpha)$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(\alpha)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and it easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $\mathrm{poly}(kdn^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
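A toy numerical illustration of the moderate regime for $k=1$ (the diameter problem) appears below; the point set, the choice of $\alpha$, and the scaled-Gaussian JL map are our assumptions for the demo, not the paper's construction.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 300, 1000, 1.5
m = max(1, int(np.log(n) / alpha**2))     # target dimension log n / poly(alpha)

X = rng.normal(size=(n, d))               # toy input points
G = rng.normal(size=(d, m)) / np.sqrt(m)  # scaled Gaussian JL map
Y = X @ G

def diameter(P):
    sq = (P ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * P @ P.T
    return float(np.sqrt(max(D2.max(), 0.0)))

# The two values should agree up to a moderate factor governed by alpha,
# rather than up to 1 + eps as in the classical JL regime.
print(diameter(X), diameter(Y))
\end{verbatim}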
We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they are highly sensitive to the choice of the number of nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD), which converts the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite-sample bound on the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity of distance-based algorithms to hyperparameter choice. Moreover, when dealing with large-scale datasets, efficiency issues are addressed by the bagging technique incorporated in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates for the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity. On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate that our algorithm is far less sensitive to parameter selection than other state-of-the-art distance-based methods. Moreover, applying the bagging technique in our algorithm yields promising improvements on real-world datasets.
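The toy sketch below illustrates plain bagged $k$-distances with uniform weights (names and parameters are ours); BRDAD's key point, by contrast, is that the weights over the first $k$ distances are learned by minimizing the surrogate risk, which this sketch deliberately does not do.
\begin{verbatim}
import numpy as np

def bagged_k_distance_scores(X, k=5, n_bags=10, seed=0):
    """Toy anomaly scores: average distance to the k-th nearest neighbor
    within each random subsample (bag). Larger score = more anomalous.
    (For in-bag points the zero self-distance occupies one slot; this
    off-by-one is acceptable for a toy illustration.)"""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    bag_size = max(k + 1, n // n_bags)
    scores = np.zeros(n)
    for _ in range(n_bags):
        idx = rng.choice(n, size=bag_size, replace=False)
        diff = X[:, None, :] - X[idx][None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))   # n x bag_size distances
        scores += np.sort(dist, axis=1)[:, k]      # k-th NN distance in the bag
    return scores / n_bags
\end{verbatim}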
In this paper we prove that the $\ell_0$ isoperimetric coefficient of any axis-aligned cube, $\psi_{\mathcal{C}}$, is $\Theta(n^{-1/2})$, and that the isoperimetric coefficient of any measurable body $K$, $\psi_K$, is of order $O(n^{-1/2})$. As a corollary we deduce that axis-aligned cubes essentially ``maximize'' the $\ell_0$ isoperimetric coefficient: there exists a positive constant $q > 0$ such that $\psi_K \leq q \cdot \psi_{\mathcal{C}}$ whenever $\mathcal{C}$ is an axis-aligned cube and $K$ is any measurable set. Lastly, we give immediate applications of our results to the mixing time of Coordinate-Hit-and-Run for sampling points uniformly from convex bodies.
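For context, one step of the Coordinate-Hit-and-Run sampler to which our mixing-time results apply can be sketched as follows (a schematic under the assumptions that the body is bounded, convex, contains the current point, and is given by a membership oracle; all names are ours).
\begin{verbatim}
import random

def char_step(x, in_body, tol=1e-9):
    """One Coordinate-Hit-and-Run step: pick a random coordinate, find the
    axis-aligned chord of the body through x by doubling + bisection, then
    resample that coordinate uniformly on the chord."""
    i = random.randrange(len(x))

    def reach(sign):
        lo, hi = 0.0, 1.0
        y = list(x)
        y[i] = x[i] + sign * hi
        while in_body(y):                 # double until we exit the body
            lo, hi = hi, 2.0 * hi
            y[i] = x[i] + sign * hi
        while hi - lo > tol:              # bisect to the boundary crossing
            mid = 0.5 * (lo + hi)
            y[i] = x[i] + sign * mid
            lo, hi = (mid, hi) if in_body(y) else (lo, mid)
        return lo

    y = list(x)
    y[i] = x[i] + random.uniform(-reach(-1.0), reach(+1.0))
    return y
\end{verbatim}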