
Our aim is to develop dynamic data structures that support $k$-nearest neighbors ($k$-NN) queries for a set of $n$ point sites in $O(f(n) + k)$ time, where $f(n)$ is some polylogarithmic function of $n$. The key component is a general query algorithm that allows us to find the $k$-NN spread over $t$ substructures simultaneously, thus reducing an $O(tk)$ term in the query time to $O(k)$. Combining this technique with the logarithmic method allows us to turn any static $k$-NN data structure into a data structure supporting both efficient insertions and queries. For the fully dynamic case, this technique allows us to recover the deterministic, worst-case, $O(\log^2 n/\log\log n + k)$ query time for the Euclidean distance claimed in earlier work, while preserving the polylogarithmic update times. We adapt this data structure to also support fully dynamic \emph{geodesic} $k$-NN queries among a set of sites in a simple polygon. For this purpose, we design a shallow-cutting-based, deletion-only $k$-NN data structure. More generally, we obtain a dynamic $k$-NN data structure for any type of distance function for which we can build vertical shallow cuttings. We apply all of our methods in the plane for the Euclidean distance, the geodesic distance, and general, constant-complexity, algebraic distance functions.
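
The logarithmic method mentioned above is a classical static-to-dynamic transformation. The sketch below illustrates it in Python, with a brute-force static structure standing in for the paper's shallow-cutting-based structures; the naive query merges the answers of the $t = O(\log n)$ substructures in $O(tk)$ time, which is exactly the term the paper's simultaneous query technique reduces to $O(k)$.

```python
from math import dist

class StaticKNN:
    """Placeholder for a static k-NN structure; the paper's structures
    are built from shallow cuttings, this stand-in is brute force."""
    def __init__(self, sites):
        self.sites = list(sites)

    def knn(self, q, k):
        return sorted(self.sites, key=lambda s: dist(s, q))[:k]

class LogMethodKNN:
    """Insertion-only k-NN via the logarithmic method: sites live in
    O(log n) static structures of doubling sizes, and an insertion
    merges equal-size structures like a binary-counter increment."""
    def __init__(self):
        self.buckets = []  # buckets[i] holds a structure of 2^i sites, or None

    def insert(self, site):
        carry, i = [site], 0
        while True:
            if i == len(self.buckets):
                self.buckets.append(None)
            if self.buckets[i] is None:
                self.buckets[i] = StaticKNN(carry)
                return
            carry += self.buckets[i].sites  # merge and carry to the next level
            self.buckets[i] = None
            i += 1

    def knn(self, q, k):
        # Naive merge over the t substructures costs O(tk); the paper's
        # simultaneous query technique brings this down to O(k).
        candidates = []
        for b in self.buckets:
            if b is not None:
                candidates += b.knn(q, k)
        return sorted(candidates, key=lambda s: dist(s, q))[:k]
```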

Related Content

Multi-criteria decision-making often requires finding a small representative subset of a database. A recently proposed method is the regret minimization set (RMS) query. RMS returns a fixed-size subset S of dataset D that minimizes the regret ratio of S (the difference between the score of the top-1 tuple in S and the score of the top-1 tuple in D, over any possible utility function). Existing work showed that the regret ratio is not able to accurately quantify the regret level of a user, and that, in contrast to the regret ratio, users readily understand the notion of rank. Consequently, that work considered the problem of finding a minimal set S with at most k rank-regret (the minimal rank of the tuples of S in the sorted list of D). Corresponding to RMS, we focus on the dual version of the above problem, defined as the rank-regret minimization (RRM) problem, which seeks a fixed-size set S that minimizes the maximum rank-regret over all possible utility functions. Further, we generalize RRM and propose the restricted rank-regret minimization (RRRM) problem, which minimizes the rank-regret of S for functions in a restricted space. The solution for RRRM usually has a lower regret level and can better serve the specific preferences of some users. In 2D space, we design a dynamic programming algorithm, 2DRRM, that finds the optimal solution for RRM. In HD space, we propose an algorithm, HDRRM, for RRM that bounds the output size and introduces a double approximation guarantee for rank-regret. Both 2DRRM and HDRRM can be generalized to the RRRM problem. Extensive experiments on synthetic and real datasets verify the efficiency and effectiveness of our algorithms.
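
As a concrete illustration of the rank-regret objective, the following sketch estimates (from below) the rank-regret of a candidate subset in 2D by sampling utility directions; the sampling approach and names are ours, whereas 2DRRM computes the exact optimum via dynamic programming.

```python
import numpy as np

def rank_regret_2d(D, S_idx, num_dirs=1000):
    """Monte-Carlo lower bound on the rank-regret of the subset D[S_idx]
    over linear utilities u(p) = w . p with w >= 0 (2D case)."""
    D = np.asarray(D, dtype=float)
    worst = 1
    for theta in np.linspace(0.0, np.pi / 2, num_dirs):
        w = np.array([np.cos(theta), np.sin(theta)])
        scores = D @ w
        best_in_S = scores[S_idx].max()
        # rank of the subset's best tuple in the full sorted list of D
        rank = 1 + int((scores > best_in_S).sum())
        worst = max(worst, rank)
    return worst

# Example: 100 random tuples; how far down the list can 5 chosen ones fall?
rng = np.random.default_rng(0)
data = rng.random((100, 2))
print(rank_regret_2d(data, [0, 1, 2, 3, 4]))
```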

Many functions of interest live in a high-dimensional space but exhibit low-dimensional structure. This paper studies regression of an $s$-H\"{o}lder function $f$ in $\mathbb{R}^D$ which varies along a central subspace of dimension $d$, where $d\ll D$. A direct approximation of $f$ in $\mathbb{R}^D$ to an $\varepsilon$ accuracy requires a number of samples $n$ on the order of $\varepsilon^{-(2s+D)/s}$. In this paper, we analyze the Generalized Contour Regression (GCR) algorithm for the estimation of the central subspace and use piecewise polynomials for function approximation. GCR is among the best estimators of the central subspace, but its sample complexity has been an open question. We prove that GCR leads to a mean squared estimation error of $O(n^{-1})$ for the central subspace, provided a certain variance quantity is exactly known. The estimation error of this variance quantity is also given in this paper. The mean squared regression error of $f$ is proved to be on the order of $\left(n/\log n\right)^{-\frac{2s}{2s+d}}$, where the exponent depends on the dimension $d$ of the central subspace instead of the ambient dimension $D$. This result demonstrates that GCR is effective in learning the low-dimensional central subspace. We also propose a modified GCR with improved efficiency. The convergence rate is validated through several numerical experiments.
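
To make the two-step pipeline concrete, here is a minimal sketch of the second step, piecewise polynomial regression in the projected coordinate, for $d = 1$. The basis `A_hat` is assumed to come from a subspace estimator such as GCR; the quantile binning and polynomial degree are illustrative choices of ours, not the paper's.

```python
import numpy as np

def fit_projected(X, y, A_hat, n_pieces=10, degree=2):
    """Fit piecewise polynomials in the coordinate z = A_hat^T x (d = 1).
    A_hat is a (D, 1) basis of the estimated central subspace."""
    z = (X @ A_hat).ravel()
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_pieces + 1))
    pieces = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (z >= lo) & (z <= hi)
        pieces.append((lo, hi, np.polyfit(z[mask], y[mask], degree)))

    def predict(Xq):
        # Clip to the fitted range so every query falls in some piece.
        zq = np.clip((Xq @ A_hat).ravel(), edges[0], edges[-1])
        out = np.empty_like(zq)
        for lo, hi, coef in pieces:
            m = (zq >= lo) & (zq <= hi)
            out[m] = np.polyval(coef, zq[m])
        return out

    return predict
```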

We consider the problem of multi-class classification, where a stream of adversarially chosen queries arrives and each query must be assigned a label online. Unlike traditional bounds, which seek to minimize the misclassification rate, we minimize the total distance from each query to the region corresponding to its correct label. When the true labels are determined via a nearest neighbor partition -- i.e., the label of a point is given by which of $k$ centers it is closest to in Euclidean distance -- we show that one can achieve a loss that is independent of the total number of queries. We complement this result by showing that learning general convex sets requires an almost linear loss per query. Our results build on regret guarantees for the geometric problem of contextual search. In addition, we develop a novel reduction from multi-class classification to binary classification which may be of independent interest.
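
A small sketch of the loss in question: the distance from a query to the Voronoi cell of its correct center. Since the cell is convex and contains its own center, membership along the segment from the query to that center is monotone, so binary search locates the entry point and yields an upper bound on the distance (the exact value would require a quadratic program); the tolerance and iteration count are our own choices.

```python
import numpy as np

def region_distance(x, centers, true_label, iters=50):
    """Upper bound on dist(x, cell of centers[true_label]) in the
    nearest neighbor partition, via binary search along the segment
    from x to the correct center."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float)

    def in_cell(p):
        d = np.linalg.norm(c - p, axis=1)
        return d[true_label] <= d.min() + 1e-12

    if in_cell(x):
        return 0.0                    # correctly placed query: zero loss
    lo, hi = 0.0, 1.0                 # parametrize p(t) = x + t*(center - x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_cell(x + mid * (c[true_label] - x)):
            hi = mid                  # inside the cell: move toward x
        else:
            lo = mid                  # outside: move toward the center
    return hi * np.linalg.norm(c[true_label] - x)
```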

The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\mathbf{y}_n)_{n \in \mathbb{Z}}$ with independent components is studied under the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\mathbf{y}$ and the smoothing span of the estimator grow to infinity at the same rate, in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent identically distributed $\mathcal{N}_{\mathbb{C}}(0,\mathbf{I}_M)$ sequence, and that its empirical eigenvalue distribution converges towards the Marcenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is of the order of $\frac{1}{M} + \frac{\sqrt{M}}{N} + (\frac{M}{N})^{3}$. Numerical simulations support our results.
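
A numerical illustration of the first claim, with parameter choices of our own: for an i.i.d. complex Gaussian series, the eigenvalues of the estimated coherency matrix at a fixed frequency should fill the Marcenko-Pastur support with ratio $M/(2B+1)$, where $2B+1$ is the smoothing span.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, B = 100, 20000, 200     # dimension, sample size, one-sided smoothing span
Y = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

F = np.fft.fft(Y, axis=1) / np.sqrt(N)     # Fourier transform of each component
k0 = N // 4                                # frequency index under study
idx = (k0 + np.arange(-B, B + 1)) % N
S = F[:, idx] @ F[:, idx].conj().T / (2 * B + 1)   # smoothed periodogram
d = np.sqrt(np.real(np.diag(S)))
C = S / np.outer(d, d)                     # estimated spectral coherency matrix

eig = np.linalg.eigvalsh(C)                # C is Hermitian by construction
c = M / (2 * B + 1)                        # Marcenko-Pastur ratio parameter
print("MP support approx. [%.2f, %.2f]" % ((1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2))
print("observed eigenvalue range [%.2f, %.2f]" % (eig.min(), eig.max()))
```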

Given a graph whose nodes may be coloured red, the parity of the number of red nodes can easily be maintained with first-order update rules in the dynamic complexity framework DynFO of Patnaik and Immerman. Can this be generalised to other or even all queries that are definable in first-order logic extended by parity quantifiers? We consider the query that asks whether the number of nodes that have an edge to a red node is odd. Already this simple query of quantifier structure parity-exists is a major roadblock for dynamically capturing extensions of first-order logic. We show that this query cannot be maintained with quantifier-free first-order update rules, and that variants induce a hierarchy for such update rules with respect to the arity of the maintained auxiliary relations. Towards maintaining the query with full first-order update rules, it is shown that degree-restricted variants can be maintained.
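
For contrast, a toy sketch of the easy query: one auxiliary bit maintains the parity of the red nodes under colour flips, mirroring a quantifier-free update rule; the closing comment records why the same idea fails for the parity-exists query studied here.

```python
# Dynamic maintenance in the spirit of DynFO: a single auxiliary bit
# suffices for "is the number of red nodes odd?".
class RedParity:
    def __init__(self, nodes):
        self.red = {v: False for v in nodes}
        self.parity = 0                  # auxiliary relation (one bit)

    def flip(self, v):
        self.red[v] = not self.red[v]
        self.parity ^= 1                 # quantifier-free update rule

# The query studied in the paper -- "is the number of nodes with an edge
# to a red node odd?" -- admits no such quantifier-free rule: flipping one
# node's colour can change the status of all of its neighbours at once,
# and the paper proves richer auxiliary relations are required.
```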

A posteriori error control methods have become an important part of numerical analysis. Their purpose is to obtain computable error estimates in various norms, as well as error indicators that show the distributions of global and local errors of a particular numerical solution. In this paper, we focus on a particular class of domain decomposition methods (DDM), which are among the most efficient numerical methods for solving PDEs. We adapt functional-type a posteriori error estimates and construct a special form of error majorant which allows efficient error control of approximations computed via these DDM by performing only subdomain-wise computations. The presented guaranteed error bounds use an extended set of admissible fluxes which arise naturally in DDM.
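
For orientation, a standard example of a functional-type majorant (the classical bound for the model problem, not the paper's DDM-adapted variant) is the estimate for the Poisson problem $-\Delta u = f$ in $\Omega$ with $u = 0$ on $\partial\Omega$: for any conforming approximation $v \in H^1_0(\Omega)$ and any flux $\boldsymbol{y} \in H(\operatorname{div},\Omega)$,
\[
\|\nabla(u - v)\|_{\Omega} \;\le\; \|\nabla v - \boldsymbol{y}\|_{\Omega} \;+\; C_{\Omega}\,\|f + \operatorname{div}\boldsymbol{y}\|_{\Omega},
\]
where $C_{\Omega}$ is the Friedrichs constant of $\Omega$. The majorant constructed in the paper refines bounds of this type so that the flux $\boldsymbol{y}$ can be assembled subdomain-wise.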

A basic problem for constant dimension codes is to determine the maximum possible size $A_q(n,d;k)$ of a set of $k$-dimensional subspaces in $\mathbb{F}_q^n$, called codewords, such that the subspace distance satisfies $d_S(U,W):=2k-2\dim(U\cap W)\ge d$ for all pairs of different codewords $U$, $W$. Constant dimension codes have applications in e.g.\ random linear network coding, cryptography, and distributed storage. Bounds for $A_q(n,d;k)$ are the topic of many recent research papers. Providing a general framework, we survey many of the latest constructions and highlight the potential for further improvements. As examples, we give improved constructions for the cases $A_q(10,4;5)$, $A_q(11,4;4)$, $A_q(12,6;6)$, and $A_q(15,4;4)$. We also derive general upper bounds for subcodes arising in those constructions.
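
The subspace distance is straightforward to compute from generator matrices via $\dim(U\cap W)=\dim U+\dim W-\dim(U+W)$; a small sketch for $q=2$, with generator rows encoded as integer bitmasks (our encoding, chosen for illustration):

```python
def rank_gf2(rows):
    """Rank over GF(2) of vectors encoded as int bitmasks."""
    pivots = {}                      # highest set bit -> stored row
    for row in rows:
        while row:
            h = row.bit_length() - 1
            if h not in pivots:
                pivots[h] = row
                break
            row ^= pivots[h]         # eliminate the leading bit
    return len(pivots)

def subspace_distance(U, W):
    """d_S(U, W) = dim U + dim W - 2 dim(U ∩ W), computed via
    dim(U ∩ W) = dim U + dim W - dim(U + W)."""
    dU, dW = rank_gf2(U), rank_gf2(W)
    d_cap = dU + dW - rank_gf2(list(U) + list(W))
    return dU + dW - 2 * d_cap

# <e1, e2> and <e2, e3> in F_2^3 intersect in <e2>, so d_S = 2.
print(subspace_distance([0b100, 0b010], [0b010, 0b001]))
```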

Subset-Sum is an NP-complete problem where one must decide if a multiset of $n$ integers contains a subset whose elements sum to a target value $m$. The best known classical and quantum algorithms run in time $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$, respectively, based on the well-known meet-in-the-middle technique. Here we introduce a novel dynamic programming data structure with applications to Subset-Sum and a number of variants, including Equal-Sums (where one seeks two disjoint subsets with the same sum), 2-Subset-Sum (a relaxed version of Subset-Sum where each item in the input set can be used twice in the summation), and Shifted-Sums, a generalization of both of these variants, where one seeks two disjoint subsets whose sums differ by some specified value. Given any modulus $p$, our data structure can be constructed in time $O(np)$, after which queries can be made in time $O(n)$ to the lists of subsets summing to the same value modulo $p$. We use this data structure to give new $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$ classical and quantum algorithms for Subset-Sum, not based on the meet-in-the-middle method. We then use the data structure in combination with variable-time amplitude amplification and a quantum pair-finding algorithm, extending the quantum element distinctness and claw finding algorithms to the multiple-solutions case, to give an $O(2^{0.504n})$ quantum algorithm for Shifted-Sums, an improvement on the best known $O(2^{0.773n})$ classical running time. We also study Pigeonhole Equal-Sums and Pigeonhole Modular Equal-Sums, where the existence of a solution is guaranteed by the pigeonhole principle. For the former problem, we give classical and quantum algorithms with running times $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{2n/5})$, respectively. For the more general modular problem, we give a classical algorithm which also runs in time $\tilde{O}(2^{n/2})$.
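
A minimal sketch of such a residue table, under our own simplifications: after an $O(np)$ construction, backtracking recovers one subset attaining a queried residue in $O(n)$ time (the paper's structure additionally supports queries to the full lists of subsets).

```python
def build(items, p):
    """reach[i][r] = True iff some subset of items[:i] sums to r (mod p)."""
    n = len(items)
    reach = [[False] * p for _ in range(n + 1)]
    reach[0][0] = True
    for i, a in enumerate(items):
        ai = a % p
        for r in range(p):
            reach[i + 1][r] = reach[i][r] or reach[i][(r - ai) % p]
    return reach

def query(items, p, reach, target):
    """Return one subset of items whose sum is = target (mod p), or None."""
    if not reach[len(items)][target % p]:
        return None
    subset, r = [], target % p
    for i in range(len(items), 0, -1):
        if reach[i - 1][r]:            # residue reachable without item i-1
            continue
        subset.append(items[i - 1])    # otherwise item i-1 must be used
        r = (r - items[i - 1]) % p
    return subset

items, p = [3, 5, 8, 11], 7
reach = build(items, p)
print(query(items, p, reach, 2))       # e.g. [8, 5, 3], since 16 = 2 (mod 7)
```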

Non-parametric neural language models (NLMs) learn predictive distributions of text using an external datastore, which allows them to learn by explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest neighbors language model (Khandelwal et al., 2019) as an example, exploring methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and on domain-adaptation datasets show that our methods achieve up to a 6x inference speed-up while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
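
For reference, a minimal sketch of the $k$NN-LM prediction rule being accelerated, with an exhaustive nearest-neighbor search standing in for the approximate index used in practice; `lam`, `temp`, and the toy datastore layout are illustrative assumptions.

```python
import numpy as np

def knn_lm_prob(query_vec, p_lm, keys, values, vocab_size,
                k=4, lam=0.25, temp=1.0):
    """p(w|c) = lam * p_kNN(w|c) + (1 - lam) * p_LM(w|c).
    keys:   (n, dim) datastore of context vectors
    values: (n,) next-token id recorded for each key
    p_lm:   (vocab_size,) distribution from the parametric LM"""
    d = np.linalg.norm(keys - query_vec, axis=1)   # exhaustive search stand-in;
    nn = np.argsort(d)[:k]                         # real systems use an ANN index
    w = np.exp(-d[nn] / temp)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    for i, wi in zip(nn, w):
        p_knn[values[i]] += wi                     # aggregate neighbor votes
    return lam * p_knn + (1 - lam) * p_lm
```

Shrinking `keys`, lowering `k`, and skipping retrieval when the parametric LM is confident are the kinds of efficiency levers the paper explores.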

Structured queries expressed in languages such as SQL, SPARQL, or XQuery offer a convenient and explicit way for users to express their information needs for a number of tasks. In this work, we present an approach to answer such queries directly over text data, without storing results in a database. We specifically look at the case of knowledge bases where queries are over entities and the relations between them. Our approach combines distributed query answering (e.g., Triple Pattern Fragments) with models built for extractive question answering. Importantly, by applying distributed query answering we are able to simplify the model learning problem. We train models for a large portion (572) of the relations within Wikidata and achieve an average 0.70 F1 measure across all models. We also present a systematic method to construct the necessary training data for this task from knowledge graphs, and we describe a prototype implementation.
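
A sketch of the idea, with hypothetical templates and a placeholder QA model (the paper's actual templates and models are not reproduced here): a triple pattern with one unbound variable becomes a natural-language question answered extractively over text.

```python
# Hypothetical one-template-per-relation mapping for two Wikidata relations.
TEMPLATES = {
    "P19": "Where was {subject} born?",       # P19: place of birth
    "P50": "Who is the author of {subject}?", # P50: author
}

def answer_triple_pattern(subject, relation, text, qa_model):
    """Answer the pattern (subject, relation, ?x) by extracting ?x from text."""
    question = TEMPLATES[relation].format(subject=subject)
    return qa_model(question=question, context=text)["answer"]

# Usage with a Hugging Face extractive QA pipeline (assumed available):
#   from transformers import pipeline
#   qa = pipeline("question-answering")
#   answer_triple_pattern("Douglas Adams", "P19", some_wikipedia_text, qa)
```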
