
For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: if keys/values are $\Theta(\log n)$ bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only $O(\log\log n)$ bits of space per key when compared to the information-theoretic optimum. Even prior to this bound being achieved, the target of $O(\log\log n)$ wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that $O(\log\log n)$ wasted bits per key is not the end of the line for hashing. In fact, for any $k \in [\log^* n]$, it is possible to achieve $O(k)$-time insertions/deletions, $O(1)$-time queries, and $O(\log^{(k)} n)$ wasted bits per key (all with high probability in $n$). This means that each time we increase insertion/deletion time by an additive constant, we reduce the wasted bits per key exponentially. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct.
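To make the shape of this tradeoff concrete, the following minimal sketch evaluates the iterated logarithm log^(k) n and log* n for a given n. It only illustrates how quickly the wasted-bits term shrinks as k grows; it is not part of the hash table construction, and the function names `iterated_log2` and `log_star` are hypothetical helpers chosen for this example. Base-2 logarithms are an assumption.

```python
import math


def iterated_log2(n: float, k: int) -> float:
    """Apply log2 to n exactly k times, i.e. compute log^(k) n.

    The caller is expected to keep k small enough that every
    intermediate value stays positive (k <= log_star(n) + 1).
    """
    value = float(n)
    for _ in range(k):
        value = math.log2(value)
    return value


def log_star(n: float) -> int:
    """Count how many times log2 must be applied until the value is <= 1."""
    count = 0
    value = float(n)
    while value > 1.0:
        value = math.log2(value)
        count += 1
    return count


if __name__ == "__main__":
    n = 2 ** 64
    # Each extra unit of k (extra insertion/deletion time) takes one more log.
    for k in range(1, log_star(n) + 1):
        print(k, iterated_log2(n, k))
```

For n = 2^64, one application of log2 gives 64 and a second gives 6, so moving from k = 1 to k = 2 already reduces the bound's argument by an order of magnitude.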

Related content

Hash learning


SimPLe · FAST · Processing (programming language) · Decoding · Channel

January 6, 2022

A simple coding-decoding algorithm for the Hamming code

Omar Khadir

In this work, we present a new, simple way to encode and decode messages that are transmitted over a noisy channel and protected against errors by the Hamming method. We also propose a fast and efficient algorithm for the encoding and decoding process that uses neither the generator matrix nor the parity-check matrix of the Hamming code.
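As a point of reference, a Hamming code can be encoded and decoded without writing down either matrix by working with bit positions directly. The sketch below uses the textbook positional formulation (parity bits at positions that are powers of two, syndrome equal to the XOR of the positions of all set bits). It is an assumption that this resembles the paper's approach; it is not taken from the paper, and the function names are hypothetical.

```python
def hamming_encode(data_bits):
    """Encode data bits into a single-error-correcting Hamming codeword.

    Positions are 1-indexed; positions that are powers of two hold parity.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)  # index 0 is unused
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(bits)
    # XOR together the positions of all set data bits.
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    # Setting parity bit 2^i cancels bit i of that XOR.
    for i in range(r):
        if syndrome & (1 << i):
            code[1 << i] = 1
    return code[1:]


def hamming_decode(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error was detected.
    """
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    if 0 < syndrome < len(code):
        code[syndrome] ^= 1  # the syndrome is the position of the error
    data = [code[pos] for pos in range(1, len(code)) if pos & (pos - 1)]
    return data, syndrome


if __name__ == "__main__":
    word = hamming_encode([1, 0, 1, 1])
    word[4] ^= 1  # flip the bit at position 5
    print(hamming_decode(word))
```

The design relies on one fact: a valid codeword has the XOR of its set positions equal to zero, so a single flipped bit makes that XOR equal to the flipped position.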

INFORMS · Optimizer · Information retrieval · Closed form · CASES

January 6, 2022

Optimal Rate-Distortion-Leakage Tradeoff for Single-Server Information Retrieval

Yauhen Yakimenka, Hsuan-Yin Lin, Eirik Rosnes, Jörg Kliewer

from arxiv, 14 pages, 3 figures. Accepted for publication in IEEE Journal on Selected Areas in Communications, Special Issue on Private Information Retrieval, Private Coded Computing over Distributed Servers, and Privacy in Distributed Learning

Private information retrieval protocols guarantee that a user can privately and losslessly retrieve a single file from a database stored across multiple servers. In this work, we propose to simultaneously relax the conditions of perfect retrievability and privacy in order to obtain improved download rates when all files are stored uncoded on a single server. Information leakage is measured in terms of the average success probability for the server of correctly guessing the identity of the desired file. The main findings are: i) The derivation of the optimal tradeoff between download rate, distortion, and information leakage when the file size is infinite. Closed-form expressions of the optimal tradeoff for the special cases of "no-leakage" and "no-privacy" are also given. ii) A novel approach based on linear programming (LP) to construct schemes for a finite file size and an arbitrary number of files. The proposed LP approach can be leveraged to find provably optimal schemes with corresponding closed-form expressions for the rate-distortion-leakage tradeoff when the database contains at most four bits. Finally, for a database that contains 320 bits, we compare two construction methods based on the LP approach with a nonconstructive scheme downloading subsets of files using a finite-length lossy compressor based on random coding.

Optimizer · Learned · Performer · SimPLe · BASIC

January 5, 2022

Balsa: Learning a Query Optimizer Without Expert Demonstrations

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

from arxiv, Preprint, SIGMOD 2022

Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8$\times$ in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.

January 3, 2022

Check-based generation of one-time tables using qutrits

Li Yu, Xue-Tong Zhang, Fuqun Wang, Chui-Ping Yang

from arxiv, 20 pages, 1 figure. Replaced Prop. 6, and minor revisions in the proof of Theorem 7

One-time tables are a class of two-party correlations that can help achieve information-theoretically secure two-party (interactive) classical or quantum computation. In this work we propose a bipartite quantum protocol for generating a simple type of one-time tables (the correlation in the Popescu-Rohrlich nonlocal box) with partial security. We then show that by running many instances of the first protocol and performing checks on some of them, asymptotically information-theoretically secure generation of one-time tables can be achieved. The first protocol is adapted from a protocol for semi-honest quantum oblivious transfer, with some changes so that no entangled state needs to be prepared, and the communication involves only one qutrit in each direction. We show that some information tradeoffs in the first protocol are similar to those in the semi-honest oblivious transfer protocol. We also obtain two types of inequalities about guessing probabilities in some protocols for generating one-time tables, from a single type of inequality about guessing probabilities in semi-honest quantum oblivious transfer protocols.
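To illustrate what such a table provides once it exists, the following sketch consumes one Popescu-Rohrlich-type table (Alice holds x and a, Bob holds y and b, with a XOR b = x AND y) to compute the AND of two private bits as XOR shares. This is a classical illustration under stated assumptions: the table comes from a hypothetical trusted dealer, `deal_one_time_table`, which stands in for the quantum generation protocol described in the abstract and is not part of it.

```python
import secrets


def deal_one_time_table():
    """Hypothetical trusted dealer: sample (x, a) and (y, b) with a ^ b == x & y."""
    x = secrets.randbits(1)
    y = secrets.randbits(1)
    a = secrets.randbits(1)
    b = a ^ (x & y)
    return (x, a), (y, b)


def secure_and(u, v, alice_table, bob_table):
    """Return XOR shares of u & v using one table and one masked bit each way.

    u is Alice's private input bit and v is Bob's. Only the masked bits
    m_a and m_b would cross the channel in a real two-party run.
    """
    x, a = alice_table
    y, b = bob_table
    m_a = u ^ x  # sent from Alice to Bob
    m_b = v ^ y  # sent from Bob to Alice
    share_a = (x & m_b) ^ a                    # computed locally by Alice
    share_b = (m_a & m_b) ^ (m_a & y) ^ b      # computed locally by Bob
    return share_a, share_b


if __name__ == "__main__":
    alice_table, bob_table = deal_one_time_table()
    s_a, s_b = secure_and(1, 1, alice_table, bob_table)
    print(s_a ^ s_b)  # XOR of the two shares reconstructs 1 & 1
```

Each table is used once: reusing x or y as a mask for a second input would leak information about the first, which is why generating many tables securely matters.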

Performer · Optimizer · Vectorization · End-to-end · INFORMS

August 2, 2021

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

from arxiv, 10 pages, 11 figures

Recently, the Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval by encoding documents in a low-dimensional embedding space and querying them with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire candidates, which is prohibitive in practical Web search scenarios due to its tremendous memory usage and time cost. To overcome these problems, vector compression methods, a branch of Approximate Nearest Neighbor Search (ANNS), have been adopted in many practical embedding-based retrieval applications. One of the most popular methods is Product Quantization (PQ). However, although existing vector compression methods including PQ can help improve the efficiency of DR, they incur severely decayed retrieval performance due to the separation between encoding and compression. To tackle this problem, we present JPQ, which stands for Joint optimization of query encoding and Product Quantization. It trains the query encoder and PQ index jointly in an end-to-end manner based on three optimization strategies, namely ranking-oriented loss, PQ centroid optimization, and end-to-end negative sampling. We evaluate JPQ on two publicly available retrieval benchmarks. Experimental results show that JPQ significantly outperforms existing popular vector compression methods across different trade-off settings. Compared with previous DR models that use brute-force search, JPQ almost matches the best retrieval performance with 30x compression on index size. The compressed index further brings 10x speedup on CPU and 2x speedup on GPU in query latency.
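For readers unfamiliar with Product Quantization, the sketch below shows only the plain PQ building blocks that JPQ starts from: splitting a vector into sub-vectors, replacing each with the index of its nearest centroid, and scoring a query against the compressed codes through per-sub-space lookup tables (asymmetric distance computation). It does not implement JPQ's joint training. The codebooks here are small hand-written values, an assumption made for illustration; in practice they would be learned, typically with k-means.

```python
def split(vec, m):
    """Split vec into m equal-length sub-vectors (len(vec) must divide by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Map each sub-vector to the index of its nearest centroid."""
    parts = split(vec, len(codebooks))
    return [
        min(range(len(book)), key=lambda j: sq_dist(part, book[j]))
        for part, book in zip(parts, codebooks)
    ]


def adc_table(query, codebooks):
    """Precompute the distance from each query sub-vector to every centroid."""
    parts = split(query, len(codebooks))
    return [[sq_dist(part, c) for c in book] for part, book in zip(parts, codebooks)]


def adc_distance(table, codes):
    """Approximate squared distance to a compressed document via table lookups."""
    return sum(table[i][code] for i, code in enumerate(codes))


if __name__ == "__main__":
    # Hypothetical codebooks: 2 sub-spaces, 2 centroids each, 2 dims per sub-space.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
    doc_codes = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
    table = adc_table([1.0, 1.0, 0.0, 1.0], codebooks)
    print(doc_codes, adc_distance(table, doc_codes))
```

The document is stored as two small integers instead of four floats, and each query costs one table build plus one lookup per sub-space per document. The accuracy lost in that approximation is the gap the abstract attributes to training the encoder and the compression separately.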

Performer · Rank · Optimizer · Performance · Better

April 16, 2021

Optimizing Dense Retrieval Model Training with Hard Negatives

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

from arxiv, To be published in SIGIR2021

Ranking has always been one of the top concerns in information retrieval research. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but solely using this signal in retrieval may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers have turned to Dense Retrieval (DR) models for better ranking performance. Although several existing DR models have already obtained promising results, their performance improvement heavily relies on the sampling of training examples. Many effective sampling strategies are not efficient enough for practical usage, and for most of them, a theoretical analysis of how and why the performance improvement happens is still lacking. To shed light on these research questions, we theoretically investigate different training strategies for DR models and try to explain why hard negative sampling performs better than random sampling. Through the analysis, we also find that there are many potential risks in static hard negative sampling, which is employed by many existing training methods. Therefore, we propose two training strategies: a Stable Training Algorithm for dense Retrieval (STAR) and a query-side training Algorithm for Directly Optimizing Ranking pErformance (ADORE). STAR improves the stability of the DR training process by introducing random negatives. ADORE replaces the widely adopted static hard negative sampling method with a dynamic one to directly optimize the ranking performance. Experimental results on two publicly available retrieval benchmark datasets show that either strategy gains significant improvements over existing competitive baselines and that a combination of them leads to the best performance.

Graph convolutional neural network / graph convolutional network · Performer · Graph convolution · Networking · Graph

December 15, 2020

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

Xin Chen, Lingxi Xie, Jun Wu, Longhui Wei, Yuhui Xu, Qi Tian

from arxiv, Accepted to AAAI 2021

Neural architecture search has attracted wide attention in both academia and industry. To accelerate it, researchers proposed weight-sharing methods, which first train a super-network to reuse computation among different operators, from which exponentially many sub-networks can be sampled and efficiently evaluated. These methods enjoy great advantages in terms of computational costs, but the sampled sub-networks are not guaranteed to be estimated precisely unless an individual training process is taken. This paper attributes such inaccuracy to the inevitable mismatch between assembled network layers, so that there is a random error term added to each estimation. We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates, which consequently leads to better performance of the final architecture. In addition, our approach also enjoys the flexibility of being used under different hardware constraints, since the graph convolutional network provides an efficient lookup table of the performance of architectures in the entire search space.

Hash learning · Performer · Decomposed · state-of-the-art · Intel

March 16, 2020

Dash: Scalable Hashing on Persistent Memory

Baotong Lu, Xiangpeng Hao, Tianzheng Wang, Eric Lo

from arxiv, To appear at VLDB 2020 (PVLDB Vol. 13), 15 pages

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piece-wise and partial solutions that side-step many important properties, in particular good scalability, high load factor and instant recovery. We present Dash, a holistic approach to building dynamic and scalable hash tables on real PM hardware with all the aforementioned properties. Based on Dash, we adapted two popular dynamic hashing schemes (extendible hashing and linear hashing). On a 24-core machine with Intel Optane DCPMM, we show that, compared to the state of the art, Dash-enabled hash tables can achieve up to ~3.9X higher performance with a load factor of over 90% and an instant recovery time of 57ms regardless of data size.

Rank · MoDELS · Optimizer · Singular value decomposition · Column

October 18, 2018

Testing Matrix Rank, Optimally

Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang

from arxiv, 51 pages. To appear in SODA 2019

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.

MoDELS · SimPLe · CC · Model evaluation · Gaussian mixture (model)

February 24, 2018

The Search Problem in Mixture Models

Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai

We consider the task of learning the parameters of a single component of a mixture model, for the case when we are given side information about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.