
from arxiv, 31 pages (26 of content), 4 figures (the last one with 3 subfigures). Minor corrections since publication: notably our analysis of the Brazil et al. k-Steiner tree algorithm in Section 4.1 (results unchanged), and a correction to the space usage stated in some corollaries. Also added a short discussion of how the algorithm still works when the MST of the input point set is not unique.

Given a set $P$ of $n$ points in $\mathbb{R}^2$ and an input line $\gamma$ in $\mathbb{R}^2$, we present an algorithm that runs in optimal $\Theta(n\log n)$ time and $\Theta(n)$ space to solve a restricted version of the $1$-Steiner tree problem. Our algorithm returns a minimum-weight tree interconnecting $P$ using at most one Steiner point $s \in \gamma$, where edges are weighted by the Euclidean distance between their endpoints. We then extend the result to $j$ input lines. Following this, we show how the algorithm of Brazil et al. ("Generalised k-Steiner Tree Problems in Normed Planes", arXiv:1111.1464), which solves the $k$-Steiner tree problem in $\mathbb{R}^2$ in $O(n^{2k})$ time, can be adapted to our setting. For $k>1$, when the (at most) $k$ Steiner points are restricted to lie on an input line, the runtime becomes $O(n^{k})$. Next we show how the results of Brazil et al. allow us to maintain the same time and space bounds while extending to some non-Euclidean norms and different tree cost functions. Lastly, we extend the result to $j$ input curves.
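To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the $\Theta(n\log n)$ algorithm from the abstract: it simply samples candidate Steiner points on the line and recomputes a minimum spanning tree for each, so it only approximates the optimum on a grid. The function names, the parametrisation of the line as `a + t*d`, and the sampling range are assumptions made for illustration.

```python
import math

def mst_weight(points):
    """Weight of a Euclidean minimum spanning tree, via O(n^2) Prim."""
    n = len(points)
    if n <= 1:
        return 0.0
    in_tree = [False] * n
    dist = [math.inf] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist[v]:
                    dist[v] = d
    return total

def one_steiner_on_line(points, a, d, t_min, t_max, steps=200):
    """Grid search for one Steiner point s = a + t*d with t in [t_min, t_max].

    Returns (best_weight, best_point); best_point is None when no sampled
    point improves on the plain MST of the input (zero Steiner points).
    """
    best_w, best_s = mst_weight(points), None
    for i in range(steps + 1):
        t = t_min + (t_max - t_min) * i / steps
        s = (a[0] + t * d[0], a[1] + t * d[1])
        w = mst_weight(list(points) + [s])
        if w < best_w:
            best_w, best_s = w, s
    return best_w, best_s
```

For an equilateral triangle with side 2 and a horizontal line through its centre, the sampled optimum should be close to $2\sqrt{3}$, compared with an MST weight of 4.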

Related content


Tolerance · MoDELS · Samples · Design · Algorithms and Data Structures

August 8, 2023

Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning

Gunjan Kumar, Kuldeep S. Meel, Yash Pote

We study the tolerant testing problem for high-dimensional samplers. Given as input two samplers $\mathcal{P}$ and $\mathcal{Q}$ over the $n$-dimensional space $\{0,1\}^n$, and two parameters $\varepsilon_2 > \varepsilon_1$, the goal of tolerant testing is to test whether the distributions generated by $\mathcal{P}$ and $\mathcal{Q}$ are $\varepsilon_1$-close or $\varepsilon_2$-far. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on models where one can draw \textit{conditional} samples. Among these models, \textit{subcube conditioning} ($\mathsf{SUBCOND}$), which allows conditioning on arbitrary subcubes of the domain, holds the promise of widespread adoption in practice owing to its ability to capture the natural behavior of samplers in constrained domains. To translate the promise into practice, we need to overcome two crucial roadblocks for tests based on $\mathsf{SUBCOND}$: the prohibitively large number of queries ($\tilde{\mathcal{O}}(n^5/\varepsilon_2^5)$) and limitation to non-tolerant testing (i.e., $\varepsilon_1 = 0$). The primary contribution of this work is to overcome the above challenges: we design a new tolerant testing methodology (i.e., $\varepsilon_1 \geq 0$) that allows us to significantly improve the upper bound to $\tilde{\mathcal{O}}(n^3/(\varepsilon_2-\varepsilon_1)^5)$.
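As a toy illustration of subcube conditioning, the Python sketch below works with explicitly tabulated distributions over $\{0,1\}^n$. A real $\mathsf{SUBCOND}$ oracle returns samples rather than the full conditional distribution, and the abstract does not name the distance used, so both the exact-distribution interface and the use of total variation distance are assumptions here. The function names are hypothetical.

```python
def subcube_condition(dist, fixed):
    """Condition a distribution on a subcube of {0,1}^n.

    dist  : dict mapping bit tuples to probabilities
    fixed : dict mapping coordinate index -> required bit (the subcube)
    """
    mass = {x: p for x, p in dist.items()
            if all(x[i] == b for i, b in fixed.items())}
    total = sum(mass.values())
    if total == 0:
        raise ValueError("subcube has zero probability mass")
    return {x: p / total for x, p in mass.items()}

def tv_distance(p, q):
    """Total variation distance between two tabulated distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

For example, `subcube_condition(dist, {0: 0})` restricts `dist` to strings whose first bit is 0 and renormalises.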

Minimax · Optimizers · Reproducing Kernel Hilbert Space · Functionals · Kernelization

August 8, 2023

On the Optimality of Misspecified Spectral Algorithms

Haobo Zhang, Yicheng Li, Qian Lin

from arxiv, 48 pages, 2 figures

In the misspecified spectral algorithms problem, researchers usually assume the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ for some $s\in (0,1)$. The existing minimax optimal results require $\|f_{\rho}^{*}\|_{L^{\infty}}<\infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all $s\in (0,1)$ has been an outstanding open problem for years. In this paper, we show that spectral algorithms are minimax optimal for any $\alpha_{0}-\frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding index satisfies $\alpha_0 = \frac{1}{\beta}$. Thus, the spectral algorithms are minimax optimal for all $s\in (0,1)$ on these RKHSs.

Clusters · Graphs · Partitioning · SCC · state-of-the-art

August 7, 2023

TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs

Laxman Dhulipala, Jason Lee, Jakub Łącki, Vahab Mirrokni

from arxiv, To appear at SIGMOD 2024

We introduce TeraHAC, a $(1+\epsilon)$-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing $(1+\epsilon)$-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of $(1+\epsilon)$-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.

Approximation · Weight · Partitioning · Undirected Graphs · Graphs

August 7, 2023

An Improved Approximation Algorithm for the Max-$3$-Section Problem

Dor Katzelnick, Aditya Pillai, Roy Schwartz, Mohit Singh

We consider the Max-$3$-Section problem, where we are given an undirected graph $G=(V,E)$ equipped with non-negative edge weights $w :E\rightarrow \mathbb{R}_+$ and the goal is to find a partition of $V$ into three equisized parts while maximizing the total weight of edges crossing between different parts. Max-$3$-Section is closely related to other well-studied graph partitioning problems, e.g., Max-$k$-Cut, Max-$3$-Cut, and Max-Bisection. We present a polynomial time algorithm achieving an approximation of $0.795$, which improves upon the previous best known approximation of $0.673$. The requirement of multiple parts that have equal sizes renders Max-$3$-Section much harder to cope with compared to, e.g., Max-Bisection. We show a new algorithm that combines the existing approach of the Lasserre hierarchy with a random cut strategy that suffices to give our result.
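The objective can be made concrete with a small exhaustive search in Python. This is only a sketch of the problem definition for tiny graphs, not the approximation algorithm from the abstract; the function names and the edge-list format `(u, v, weight)` are assumptions.

```python
from itertools import product

def section_weight(edges, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for (u, v, w) in edges if part[u] != part[v])

def max_3_section_bruteforce(n, edges):
    """Try every assignment of n vertices to 3 parts of equal size.

    Exponential in n, so only usable on very small instances.
    """
    if n % 3 != 0:
        raise ValueError("n must be divisible by 3")
    best_w, best_part = -1.0, None
    for part in product(range(3), repeat=n):
        if all(part.count(c) == n // 3 for c in range(3)):
            w = section_weight(edges, part)
            if w > best_w:
                best_w, best_part = w, part
    return best_w, best_part
```

On the unit-weight complete graph with 6 vertices, every balanced partition leaves exactly 3 edges uncut, so the optimum is 12 of the 15 edges.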

Graphs · Complete Graphs · Labeling · Regular · Linear

August 7, 2023

Tyshkevich's Graph Decomposition and the Distinguishing Numbers of Unigraphs

Christine T. Cheng

from arxiv, 22 pages plus an appendix with 8 pages

A $c$-labeling $\phi: V(G) \rightarrow \{1, 2, \hdots, c \}$ of graph $G$ is distinguishing if, for every non-trivial automorphism $\pi$ of $G$, there is some vertex $v$ so that $\phi(v) \neq \phi(\pi(v))$. The distinguishing number of $G$, $D(G)$, is the smallest $c$ such that $G$ has a distinguishing $c$-labeling. We consider a compact version of Tyshkevich's graph decomposition theorem where trivial components are maximally combined to form a complete graph or a graph of isolated vertices. Suppose the compact canonical decomposition of $G$ is $G_{k} \circ G_{k-1} \circ \cdots \circ G_1 \circ G_0$. We prove that $\phi$ is a distinguishing labeling of $G$ if and only if $\phi$ is a distinguishing labeling of $G_i$ when restricted to $V(G_i)$ for $i = 0, \hdots, k$. Thus, $D(G) = \max \{D(G_i), i = 0, \hdots, k \}$. We then present an algorithm that computes the distinguishing number of a unigraph in linear time.
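The definition of the distinguishing number can be checked directly on small graphs. The Python sketch below enumerates all automorphisms and all labelings, so it is exponential and unrelated to the linear-time unigraph algorithm mentioned above; it assumes a simple graph on vertices `0..n-1` given as an edge list without duplicates, and the function names are hypothetical.

```python
from itertools import permutations, product

def automorphisms(n, edges):
    """All vertex permutations that map every edge to an edge."""
    edge_set = {frozenset(e) for e in edges}
    return [perm for perm in permutations(range(n))
            if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges)]

def distinguishing_number(n, edges):
    """Smallest c such that some c-labeling is broken by every
    non-trivial automorphism (brute force over all labelings)."""
    identity = tuple(range(n))
    nontrivial = [p for p in automorphisms(n, edges) if p != identity]
    for c in range(1, n + 1):
        for phi in product(range(c), repeat=n):
            # phi is distinguishing if each non-trivial automorphism
            # moves some vertex to a vertex with a different label.
            if all(any(phi[v] != phi[p[v]] for v in range(n)) for p in nontrivial):
                return c
    return n
```

For instance, a path on three vertices needs two labels (to tell its endpoints apart), while a triangle needs three.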

Samples · Approximation · Linear · Uniform Sampling · Linear Regression

August 6, 2023

Gradient Coding through Iterative Block Leverage Score Sampling

Neophytos Charalambides, Mert Pilanci, Alfred Hero

from arxiv, 26 pages, 6 figures, 1 table

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.}, stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique into coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.

Functionals · Learning to Hash · FAST · Scaling · Extensibility

August 6, 2023

Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

Giulio Ermanno Pibiri, Roberto Trani

from arxiv, Accepted by IEEE TKDE

A function $f : U \to \{0,\ldots,n-1\}$ is a minimal perfect hash function for a set $S \subseteq U$ of size $n$ if $f$ bijectively maps $S$ into the first $n$ natural numbers. These functions are important for many practical applications in computing, such as search engines, computer networks, and databases. Several algorithms have been proposed to build minimal perfect hash functions that scale well to large sets, retain fast evaluation time, and take very little space, e.g., 2-3 bits/key. PTHash is one such algorithm, achieving very fast evaluation in compressed space, typically several times faster than other techniques. In this work, we propose a new construction algorithm for PTHash enabling: (1) multi-threading, to either build functions more quickly or more space-efficiently, and (2) external-memory processing to scale to inputs much larger than the available internal memory. Only a few other algorithms in the literature share these features, despite their big practical impact. We conduct an extensive experimental assessment on large real-world string collections and show that, with respect to other techniques, PTHash is competitive in construction time and space consumption, but retains 2-6$\times$ better lookup time.
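The definition of a minimal perfect hash function can be illustrated with a naive seed search in Python. This is not PTHash or its construction algorithm: it just retries seeds of a generic hash until the keys map bijectively onto $\{0,\ldots,n-1\}$, which is only practical for very small sets. The helper names and the use of SHA-256 as the underlying hash are assumptions, and the keys are assumed to be distinct.

```python
import hashlib

def seeded_hash(key, seed, n):
    """Map a string key to {0, ..., n-1} using a seed-dependent hash."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

def is_minimal_perfect(f, keys):
    """True if f maps the keys bijectively onto {0, ..., len(keys)-1}."""
    return sorted(f(k) for k in keys) == list(range(len(keys)))

def build_by_seed_search(keys, max_seed=1_000_000):
    """Return the first seed for which seeded_hash is minimal perfect."""
    n = len(keys)
    for seed in range(max_seed):
        if is_minimal_perfect(lambda k: seeded_hash(k, seed, n), keys):
            return seed
    raise RuntimeError("no suitable seed found within max_seed tries")
```

A random function on $n$ keys is a bijection with probability $n!/n^n$, so the expected number of tries grows exponentially with $n$; practical constructions avoid this by splitting the keys into small buckets.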

Scenarios · Learning to Hash · Total Return · Operations · state-of-the-art

August 4, 2023

Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries

Tianxiao Li, Jingxun Liang, Huacheng Yu, Renfei Zhou

from arxiv, 35 pages; in FOCS 2023

A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns whether $x$ is in the set. Some variants also store values associated with the keys such that given a query $x$, the value associated with $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space with constant time per operation in expectation. There has been a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of \[ \log\binom{U}{n}+O(n\log^{(k)} n) \] bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: For $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operational time $\Omega(k)$, in the cell-probe model with word-size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. It is worth noting that this is the first cell-probe lower bound on the trade-off between space and update time for general data structures.
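The quantities in the space bound are easy to evaluate numerically. The short Python sketch below computes the information-theoretic optimum $\log\binom{U}{n}$, the iterated logarithm $\log^{(k)} n$, and $\log^* n$. Base-2 logarithms and the function names are assumptions made for illustration, and the constants hidden in the $O(\cdot)$ terms are not modelled.

```python
import math

def optimum_bits(U, n):
    """Information-theoretic minimum: log2 of the number of n-subsets of [U]."""
    return math.log2(math.comb(U, n))

def iterated_log2(n, k):
    """log^(k) n: apply log2 to n exactly k times (requires k <= log* n)."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x

def log_star2(n):
    """log* n: number of times log2 must be applied until the value is <= 1."""
    count, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For $n = 65536$, the wasted bits per key shrink from 16 at $k=1$ to 4, 2, and 1 at $k=2,3,4$, up to constant factors, which is the time-space trade-off that the lower bound shows to be tight.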

Extensibility · Learners · MoDELS · INFORMS · Generalization Theory

June 2, 2019

Sequential Scenario-Specific Meta Learner for Online Recommendation

Zhengxiao Du, Xiaowei Wang, Hongxia Yang, Jingren Zhou, Jie Tang

from arxiv, Accepted to KDD 2019

Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle in recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely Scenario-specific Sequential Meta learner (or $s^2$ Meta). By doing so, our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state of the art for cold-start problems in online recommendation. The model is deployed in the Guess You Like session on the front page of Mobile Taobao.

Outliers · Anomaly Detection · CIFAR-10 · Extensibility · Performance

December 21, 2018

Deep Anomaly Detection with Outlier Exposure

Dan Hendrycks, Mantas Mazeika, Thomas G. Dietterich

from arxiv, ICLR 2019; PyTorch code available at https://github.com/hendrycks/outlier-exposure

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.