
A set of high-dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset \mathbb{R}^d$ in isotropic position is said to be $\delta$-anti-concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq \delta$ is at most $O(\delta)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures, yet they remain limited to rotationally invariant distributions. This work presents a new (and arguably the most natural) formulation for anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions, including anti-concentrated bounded product distributions and uniform distributions over $L_p$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics, e.g., list-decodable learning and clustering, to such distributions. Our approach constructs a canonical integer program for anti-concentration and analyzes a sum-of-squares relaxation of it, independent of the intended application. We rely on duality and analyze a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.
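
As a quick sanity check on the definition (not part of the paper's method), the following minimal Python sketch estimates the anti-concentration fraction for isotropic Gaussian samples along randomly drawn directions; all names are illustrative, and probing random directions is far weaker than the quantifier over all directions, which is exactly what the sum-of-squares certificates address.

```python
import numpy as np

def anti_concentration_fraction(X, v, delta):
    """Fraction of points x in X with |<x, v>| <= delta, for a unit direction v."""
    v = v / np.linalg.norm(v)
    return np.mean(np.abs(X @ v) <= delta)

rng = np.random.default_rng(0)
n, d, delta = 20000, 10, 0.1
X = rng.standard_normal((n, d))  # isotropic Gaussian samples

# For Gaussian X, <x, v> ~ N(0, 1) in every unit direction, so the fraction
# should concentrate near P(|N(0,1)| <= delta) ~ 2*delta/sqrt(2*pi) = O(delta).
fractions = [anti_concentration_fraction(X, rng.standard_normal(d), delta)
             for _ in range(100)]
print(max(fractions))  # roughly 0.08 for delta = 0.1
```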

Related Content

The {\em discrepancy} of a matrix $M \in \mathbb{R}^{d \times n}$ is given by $\mathrm{DISC}(M) := \min_{\boldsymbol{x} \in \{-1,1\}^n} \|M\boldsymbol{x}\|_\infty$. An outstanding conjecture, attributed to Koml\'os, stipulates that $\mathrm{DISC}(M) = O(1)$ whenever $M$ is a Koml\'os matrix, that is, whenever every column of $M$ lies within the unit sphere. Our main result asserts that $\mathrm{DISC}(M + R/\sqrt{d}) = O(d^{-1/2})$ holds asymptotically almost surely whenever $M \in \mathbb{R}^{d \times n}$ is Koml\'os, $R \in \mathbb{R}^{d \times n}$ is a Rademacher random matrix, $d = \omega(1)$, and $n = \omega(d \log d)$. The factor $d^{-1/2}$ normalising $R$ is essentially best possible, and the dependency between $n$ and $d$ is asymptotically best possible. Our main source of inspiration is a result by Bansal, Jiang, Meka, Singla, and Sinha (ICALP 2022), who obtained a similar assertion in the case that the smoothing matrix is Gaussian. They asked whether their result can be attained with the optimal dependency $n = \omega(d \log d)$ in the case of Bernoulli random noise or other types of discretely distributed noise, the latter being more conducive to smoothed analysis in other discrepancy settings such as the Beck-Fiala problem. For Bernoulli noise, their method works if $n = \omega(d^2)$. In the case of Rademacher noise, we answer their question in the affirmative. Our proof builds upon their approach in a strong way and provides a discrete version of it. Breaking the $n = \omega(d^2)$ barrier and reaching the optimal dependency $n = \omega(d \log d)$ for Rademacher noise requires additional ideas, expressed through a rather meticulous counting argument, necessitated by the need to maintain a high level of precision throughout the discretisation process.
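
To make the quantities concrete, here is a small hedged illustration (not the paper's proof technique): a brute-force evaluation of $\mathrm{DISC}$ on a tiny Koml\'os matrix, with and without scaled Rademacher smoothing. The brute force is exponential in $n$ and only feasible at toy scale; the theorem concerns $d = \omega(1)$ and $n = \omega(d \log d)$.

```python
import itertools
import numpy as np

def disc(M):
    """Brute-force DISC(M) = min over x in {-1,1}^n of ||Mx||_inf (exponential in n)."""
    best = np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=M.shape[1]):
        best = min(best, np.abs(M @ np.array(signs)).max())
    return best

rng = np.random.default_rng(1)
d, n = 4, 12
M = rng.standard_normal((d, n))
M /= np.maximum(np.linalg.norm(M, axis=0), 1.0)  # scale columns into the unit ball (Komlos)
R = rng.choice([-1.0, 1.0], size=(d, n))         # Rademacher noise matrix

print(disc(M), disc(M + R / np.sqrt(d)))         # discrepancy before and after smoothing
```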

Let $V$ be a set of $n$ points in the plane. The unit-disk graph $G = (V, E)$ has vertex set $V$ and an edge $e_{uv} \in E$ between vertices $u, v \in V$ if the Euclidean distance between $u$ and $v$ is at most 1. The weight of each edge $e_{uv}$ is the Euclidean distance between $u$ and $v$. Given $V$ and a source point $s\in V$, we consider the problem of computing shortest paths in $G$ from $s$ to all other vertices. The previously best algorithm for this problem runs in $O(n \log^2 n)$ time [Wang and Xue, SoCG'19]. The problem has an $\Omega(n\log n)$ lower bound under the algebraic decision tree model. In this paper, we present an improved algorithm of $O(n \log^2 n / \log \log n)$ time (under the standard real RAM model). Furthermore, we show that the problem can be solved using $O(n\log n)$ comparisons under the algebraic decision tree model, matching the $\Omega(n\log n)$ lower bound.
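
For reference, the trivial baseline makes the problem statement concrete: build the implicit unit-disk graph and run textbook Dijkstra, which costs $O(n^2 \log n)$ since every pair of points is a potential edge. The sketch below is only this naive baseline, not the paper's algorithm.

```python
import heapq
import math

def unit_disk_sssp(points, s):
    """Naive O(n^2 log n) single-source shortest paths in the unit-disk graph."""
    n = len(points)
    dist = [math.inf] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        ux, uy = points[u]
        for v in range(n):  # implicit edges: all points within Euclidean distance 1
            w = math.hypot(points[v][0] - ux, points[v][1] - uy)
            if w <= 1.0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

print(unit_disk_sssp([(0, 0), (0.8, 0), (1.6, 0), (5, 5)], 0))
# [0.0, 0.8, 1.6, inf] -- (5, 5) is isolated
```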

This note demonstrates that we can stably recover a rank-$r$ Toeplitz matrix $\pmb{X}\in\mathbb{R}^{n\times n}$ from a number of rank-one subgaussian measurements on the order of $r\log^{2} n$, with an exponentially decreasing failure probability, by employing a nuclear norm minimization program. Our approach utilizes descent cone analysis through Mendelson's small ball method with the Toeplitz constraint. The key ingredient is to determine the spectral norm of the random matrix with the Toeplitz structure, which may be of independent interest. This improves upon earlier analyses and resolves the conjecture in Chen et al. (IEEE Transactions on Information Theory, 2015).
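
The recovery program itself is easy to state in code. Below is a minimal sketch using cvxpy (an assumption of this illustration, not a tool used by the note), restricted to the symmetric case for simplicity: minimize the nuclear norm subject to the Toeplitz structure and the rank-one measurements $y_i = a_i^\top X a_i$. The ground truth uses cosine atoms, since $\cos(\omega(i-j))$ yields a rank-2 Toeplitz matrix per frequency; all problem sizes are purely illustrative.

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(2)
n, m = 16, 60
t = np.arange(n)
c = np.cos(0.9 * t) + 0.5 * np.cos(2.3 * t)   # two frequencies -> rank <= 4
X_true = toeplitz(c)                          # symmetric low-rank Toeplitz ground truth

A = rng.standard_normal((m, n))               # subgaussian sketching vectors a_i
y = np.array([a @ X_true @ a for a in A])     # rank-one measurements <a a^T, X>

X = cp.Variable((n, n), symmetric=True)
constraints = [X[1:, 1:] == X[:-1, :-1]]      # Toeplitz: constant along diagonals
constraints += [A[i] @ X @ A[i] == y[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()                                  # needs an SDP-capable solver, e.g. SCS
print(np.linalg.norm(X.value - X_true) / np.linalg.norm(X_true))
```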

In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $w \in \mathbb{R}^n$. These inputs implicitly define the $n \times n$ kernel matrix $K$ given by $K[i,j] = f(x^{(i)}, y^{(j)})$. The goal is to compute a vector $v$ which approximates $Kw$, i.e., $\|Kw - v\|_\infty < \varepsilon \|w\|_1$. A recent line of work has proved fine-grained lower bounds for KDE conditioned on SETH. Backurs et al. first showed the hardness of KDE for Gaussian-like kernels with high dimension $m = \Omega(\log n)$ and large scale $B = \Omega(\log n)$. Alman et al. later developed new reductions in roughly this same parameter regime, leading to lower bounds for more general kernels, but only for very small error $\varepsilon < 2^{- \log^{\Omega(1)} (n)}$. In this paper, we refine the approach of Alman et al. to show new lower bounds in all parameter regimes, closing gaps between the known algorithms and lower bounds. In the setting where $m = C\log n$ and $B = o(\log n)$, we prove that Gaussian KDE requires $n^{2-o(1)}$ time to achieve additive error $\varepsilon < \Omega(m/B)^{-m}$, matching the performance of the polynomial method up to low-order terms. In the low-dimensional setting $m = o(\log n)$, we show that Gaussian KDE requires $n^{2-o(1)}$ time to achieve $\varepsilon$ such that $\log \log (\varepsilon^{-1}) > \tilde \Omega ((\log n)/m)$, matching the error bound achievable by the Fast Multipole Method (FMM) up to low-order terms. To our knowledge, no nontrivial lower bound was previously known in this regime. Our new lower bounds make use of an intricate analysis of a special case of the kernel matrix -- the `counting matrix'. As a key technical lemma, we give a novel approach to bounding the entries of its inverse by using Schur polynomials from algebraic combinatorics.
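
For orientation, the baseline these lower bounds target is the exact quadratic-time computation of $Kw$. Here is a minimal sketch for the Gaussian kernel; the convention $K[i,j] = \exp(-B\|x^{(i)} - y^{(j)}\|^2)$ is an assumption of this illustration.

```python
import numpy as np

def gaussian_kde_matvec(X, Y, w, B):
    """Exact K @ w in O(n^2 m) time, where K[i, j] = exp(-B * ||x_i - y_j||^2)."""
    # Pairwise squared distances via ||x||^2 + ||y||^2 - 2<x, y>.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-B * sq) @ w

rng = np.random.default_rng(3)
n, m, B = 1000, 8, 1.0
X, Y = rng.standard_normal((n, m)), rng.standard_normal((n, m))
w = rng.standard_normal(n)
v = gaussian_kde_matvec(X, Y, w, B)  # the n^2-time baseline the lower bounds address
```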

The rise of powerful AI models, more formally $\textit{General-Purpose AI Systems}$ (GPAIS), has led to impressive leaps in performance across a wide range of tasks. At the same time, researchers and practitioners alike have raised a number of privacy concerns, resulting in a wealth of literature covering various privacy risks and vulnerabilities of AI models. Works surveying such risks provide differing focuses, leading to disparate sets of privacy risks with no clear unifying taxonomy. We conduct a systematic review of these survey papers to provide a concise and usable overview of privacy risks in GPAIS, as well as proposed mitigation strategies. The developed privacy framework strives to unify the identified privacy risks and mitigations at a technical level that is accessible to non-experts. This serves as the basis for a practitioner-focused interview study to assess technical stakeholder perceptions of privacy risks and mitigations in GPAIS.

An \emph{$\alpha$-approximate vertex fault-tolerant distance sensitivity oracle} (\emph{$\alpha$-VSDO}) for a weighted input graph $G=(V, E, w)$ and a source vertex $s \in V$ is a data structure answering an $\alpha$-approximate distance from $s$ to $t$ in $G-x$ for any given query $(x, t) \in V \times V$. It is a data structure version of the so-called single-source replacement path problem (SSRP). In this paper, we present a new \emph{nearly linear-time} algorithm for constructing a $(1 + \epsilon)$-VSDO for any directed input graph with polynomially bounded integer edge weights. More precisely, the presented oracle attains $\tilde{O}(m \log (nW)/ \epsilon + n \log^2 (nW)/\epsilon^2)$ construction time, $\tilde{O}(n \log (nW) / \epsilon)$ size, and $\tilde{O}(1/\epsilon)$ query time, where $n$ is the number of vertices, $m$ is the number of edges, and $W$ is the maximum edge weight. These bounds are all optimal up to polylogarithmic factors. To the best of our knowledge, this is the first non-trivial algorithm for SSRP/VSDO beating $\tilde{O}(mn)$ computation time for directed graphs with general edge weight functions, and also the first nearly linear-time construction breaking approximation factor 3. Such a construction has been unknown even for undirected and unweighted graphs. In addition, our result implies that the known conditional lower bounds for exact SSRP computation do not apply to the approximate setting.
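
The object being built is easy to picture via the trivial exact construction: run one Dijkstra per failed vertex, giving $\tilde{O}(mn)$ build time and $O(1)$ queries. The sketch below is this naive baseline (the paper's contribution is precisely to beat it), with illustrative names throughout.

```python
import heapq
from collections import defaultdict

def dijkstra(adj, s, banned=None):
    """Dijkstra from s on a weighted digraph, ignoring one banned vertex."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if v != banned and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build_exact_vsdo(adj, s, vertices):
    """Naive exact oracle: one Dijkstra per vertex failure."""
    return {x: dijkstra(adj, s, banned=x) for x in vertices if x != s}

adj = defaultdict(list)
for u, v, w in [("s", "a", 1), ("a", "t", 1), ("s", "b", 5), ("b", "t", 5)]:
    adj[u].append((v, w))
oracle = build_exact_vsdo(adj, "s", ["s", "a", "b", "t"])
print(oracle["a"].get("t"))  # 10: with a removed, the replacement path is s->b->t
```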

We prove a lower bound on the communication complexity of computing the $n$-fold xor of an arbitrary function $f$, in terms of the communication complexity and rank of $f$. We prove that $D(f^{\oplus n}) \geq n \cdot \Big(\frac{\Omega(D(f))}{\log \mathsf{rk}(f)} -\log \mathsf{rk}(f)\Big )$, where $D(f)$ and $D(f^{\oplus n})$ denote the deterministic communication complexity, and $\mathsf{rk}(f)$ is the rank of $f$. Our methods involve a new way to use information theory to reason about deterministic communication complexity.
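
A hedged reading of the bound, using only the stated inequality: whenever $D(f) \ge c \log^2 \mathsf{rk}(f)$ for a sufficiently large constant $c$ (so that the subtracted term is at most half of the first), the inequality simplifies to $$D(f^{\oplus n}) \geq n \cdot \Omega\!\left(\frac{D(f)}{\log \mathsf{rk}(f)}\right),$$ i.e., a lower bound growing linearly in $n$ with only a $\log \mathsf{rk}(f)$ loss per copy.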

The $N$-sum box protocol specifies a class of $\mathbb{F}_d$ linear functions $f(W_1,\cdots,W_K)=V_1W_1+V_2W_2+\cdots+V_KW_K\in\mathbb{F}_d^{m\times 1}$ that can be computed at information theoretically optimal communication cost (minimum number of qudits $\Delta_1,\cdots,\Delta_K$ sent by the transmitters Alice$_1$, Alice$_2$,$\cdots$, Alice$_K$, respectively, to the receiver, Bob, per computation instance) over a noise-free quantum multiple access channel (QMAC), when the input data streams $W_k\in\mathbb{F}_d^{m_k\times 1}, k\in[K]$, originate at the distributed transmitters, who share quantum entanglement in advance but are not otherwise allowed to communicate with each other. In prior work this set of optimally computable functions is identified in terms of a strong self-orthogonality (SSO) condition on the transfer function of the $N$-sum box. In this work we consider an `inverted' scenario, where instead of a feasible $N$-sum box transfer function, we are given an arbitrary $\mathbb{F}_d$ linear function, i.e., arbitrary matrices $V_k\in\mathbb{F}_d^{m\times m_k}$ are specified, and the goal is to characterize the set of all feasible communication cost tuples $(\Delta_1,\cdots,\Delta_K)$, not just based on $N$-sum box protocols, but across all possible quantum coding schemes. As our main result, we fully solve this problem for $K=3$ transmitters ($K\geq 4$ settings remain open). Coding schemes based on the $N$-sum box protocol (along with elementary ideas such as treating qudits as classical dits, time-sharing and batch-processing) are shown to be information theoretically optimal in all cases. As an example, in the symmetric case where rk$(V_1)$=rk$(V_2)$=rk$(V_3) \triangleq r_1$, rk$([V_1, V_2])$=rk$([V_2, V_3])$=rk$([V_3, V_1])\triangleq r_2$, and rk$([V_1, V_2, V_3])\triangleq r_3$ (rk = rank), the minimum total-download cost is $\max \{1.5r_1 + 0.75(r_3 - r_2), r_3\}$.
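
As a worked instantiation of the symmetric-case formula with hypothetical ranks: taking $r_1 = 2$, $r_2 = 3$, $r_3 = 4$ gives $$\max\{1.5 \cdot 2 + 0.75\,(4-3),\; 4\} = \max\{3.75,\; 4\} = 4,$$ while in the fully overlapping case $r_1 = r_2 = r_3 = r$ the formula reduces to $\max\{1.5r,\, r\} = 1.5r$.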

Given a set $P$ of $n$ points and a set $S$ of $m$ disks in the plane, the disk hitting set problem asks for a smallest subset of $P$ such that every disk of $S$ contains at least one point in the subset. The problem is NP-hard. In this paper, we consider a line-constrained version in which all disks have their centers on a line. We present an $O(m\log^2n+(n+m)\log(n+m))$ time algorithm for the problem. This improves the previously best result of $O(m^2\log m+(n+m)\log(n+m))$ time for the weighted case of the problem, where every point of $P$ has a weight and the objective is to minimize the total weight of the hitting set. Our algorithm actually solves a more general line-separable problem with a single-intersection property: the points of $P$ and the disk centers are separated by a line $\ell$, and the boundaries of every two disks intersect at most once on the side of $\ell$ containing $P$.
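
To pin down the objective (not the paper's algorithm), here is a brute-force sketch that enumerates subsets of $P$ in increasing size and returns a smallest hitting set; it is exponential and only meant to make the definition executable at toy scale. The instance below has disk centers on the line $y = 1$ and points below it, matching the line-separable setting.

```python
import itertools

def min_hitting_set(points, disks):
    """Smallest subset of points hitting every disk; disks are (cx, cy, r) triples."""
    def hits(p, disk):
        cx, cy, r = disk
        return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r ** 2
    for k in range(len(points) + 1):
        for subset in itertools.combinations(points, k):
            if all(any(hits(p, d) for p in subset) for d in disks):
                return list(subset)
    return None  # some disk contains no point of P at all

points = [(0, 0), (2, 0), (4, 0)]
disks = [(1, 1, 1.5), (3, 1, 1.5)]  # centers on the line y = 1
print(min_hitting_set(points, disks))  # [(2, 0)] hits both disks
```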

Given a set $P$ of $n$ points and a set $S$ of $n$ weighted disks in the plane, the disk coverage problem is to compute a subset of disks of smallest total weight such that the union of the disks in the subset covers all points of $P$. The problem is NP-hard. In this paper, we consider a line-separable unit-disk version of the problem where all disks have the same radius and their centers are separated from the points of $P$ by a line $\ell$. We present an $O(n^{3/2}\log^2 n)$ time algorithm for the problem. This improves the previous best running time of $O(n^2\log n)$. Our result leads to an algorithm of $O(n^{{7}/{2}}\log^2 n)$ time for the halfplane coverage problem (i.e., using $n$ weighted halfplanes to cover $n$ points), an improvement over the previous $O(n^4\log n)$ time solution. If all halfplanes are lower ones, our algorithm runs in $O(n^{{3}/{2}}\log^2 n)$ time, while the previous best algorithm takes $O(n^2\log n)$ time. Using duality, the hitting set problems under the same settings can be solved with similar time complexities.
