
Minwise hashing (MinHash) is a classical method for efficiently estimating the Jaccard similarity in massive binary (0/1) data. To generate $K$ hash values for each data vector, the standard theory of MinHash requires $K$ independent permutations. Interestingly, the recent work on "circulant MinHash" (C-MinHash) has shown that merely two permutations are needed: the first permutation breaks the structure of the data, and the second permutation is re-used $K$ times in a circulant manner. Surprisingly, the estimation variance of C-MinHash is proved to be strictly smaller than that of the original MinHash. More recent work further demonstrates that, in practice, only one permutation is needed. Note that C-MinHash is different from the well-known work on "One Permutation Hashing (OPH)" published in NIPS'12. OPH and its variants using different "densification" schemes are popular alternatives to the standard MinHash; the densification step is necessary to deal with the empty bins that arise in One Permutation Hashing. In this paper, we propose to incorporate the essential ideas of C-MinHash to improve the accuracy of One Permutation Hashing. Specifically, we develop a new densification method for OPH, which achieves a smaller estimation variance than all existing densification schemes for OPH. Our proposed method is named C-OPH (Circulant OPH). After the initial permutation (which breaks the existing structure of the data), C-OPH only needs a "shorter" permutation of length $D/K$ (instead of $D$), where $D$ is the original data dimension and $K$ is the total number of bins in OPH. This short permutation is re-used across the $K$ bins in a circulant shifting manner. It can be shown that the estimation variance of the Jaccard similarity is strictly smaller than that of the existing (densified) OPH methods.
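To make the construction concrete, below is a minimal, illustrative Python sketch of the hashing step as described above: one long initial permutation of length $D$, plus one short permutation of length $D/K$ re-used across the $K$ bins with circulant shifts. The bin layout, the shift rule, and the empty-bin marker are assumptions for illustration; the paper's densification rule for filling empty bins is more involved and is omitted here.

```python
import numpy as np

def c_oph_sketch(x, K, seed=0):
    """Illustrative C-OPH-style hashing of a binary vector x (length D, D % K == 0).

    Uses one long permutation pi (breaks the data structure) and one short
    permutation sigma of length D/K, re-used across the K bins with a
    circulant shift. np.inf marks an empty bin, to be filled later by a
    densification step (not shown)."""
    D = x.shape[0]
    m = D // K
    rng = np.random.default_rng(seed)      # shared seed so two vectors align
    pi = rng.permutation(D)                # initial permutation of length D
    sigma = rng.permutation(m)             # short permutation of length D/K
    xp = x[pi]
    sketch = np.full(K, np.inf)
    for k in range(K):
        bin_k = xp[k * m:(k + 1) * m]
        order = np.roll(sigma, k)          # circulant shift for bin k (assumed rule)
        nz = np.flatnonzero(bin_k[order])
        if nz.size:
            sketch[k] = nz[0]              # position of the first 1 in shifted order
    return sketch

def jaccard_estimate(s1, s2):
    """Collision-based Jaccard estimate over jointly non-empty bins."""
    valid = np.isfinite(s1) & np.isfinite(s2)
    return float(np.mean(s1[valid] == s2[valid])) if valid.any() else 0.0
```

Note that both vectors must be hashed with the same seed so that the two permutations match, which is what makes the bin-wise collision probability track the Jaccard similarity.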

Related Content

Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry which satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations, where the hidden layer is held fixed throughout training. By building on and refining existing techniques for approximation lower bounds of neural networks with a single layer of non-linearities, we show that there are $d$-dimensional radial distributions on the data such that ball indicators cannot be learned efficiently by any algorithm to accuracy better than $\Omega(d^{-4})$, nor by a standard gradient descent implementation to accuracy better than a constant. These results establish, to the best of our knowledge, the first optimization-based separations where the approximation benefits of the stronger architecture provably manifest in practice. Our proof technique introduces new tools and ideas that may be of independent interest in the theoretical study of both the approximation and optimization of neural networks.
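For readers who want the architecture in concrete form, here is a minimal sketch of the depth-2 setup the abstract describes: two sigmoidal layers, with the hidden layer frozen while plain gradient descent trains only the outer layer. The initialization, the squared loss, and the learning rate are illustrative assumptions, not the paper's exact training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FixedHiddenNet:
    """Depth-2 network with two sigmoidal layers; the hidden layer (W, b) is
    held fixed and only the outer parameters (v, c) are trained, as in the
    setting the abstract describes. All details here are assumptions."""

    def __init__(self, d, width, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((width, d))   # fixed hidden weights
        self.b = rng.standard_normal(width)        # fixed hidden biases
        self.v = np.zeros(width)                   # trained outer weights
        self.c = 0.0                               # trained outer bias

    def forward(self, X):
        H = sigmoid(X @ self.W.T + self.b)         # first sigmoidal layer (fixed)
        return sigmoid(H @ self.v + self.c)        # second sigmoidal layer (trained)

    def gd_step(self, X, y, lr=0.1):
        """One gradient-descent step on squared loss; W and b stay fixed."""
        H = sigmoid(X @ self.W.T + self.b)
        p = sigmoid(H @ self.v + self.c)
        g = (p - y) * p * (1.0 - p)                # chain rule through outer sigmoid
        self.v -= lr * (H.T @ g) / len(y)
        self.c -= lr * g.mean()
```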

A novel recursive list decoding (RLD) algorithm of Reed-Muller (RM) codes based on successive permutations (SP) of the codeword is presented. An SP scheme that performs maximum likelihood decoding on a subset of the symmetry group of RM codes is first proposed to carefully select a good codeword permutation on the fly. Then, the proposed SP technique is applied to an improved RLD algorithm that initializes different decoding paths with random codeword permutations, which are sampled from the full symmetry group of RM codes. Finally, an efficient latency reduction scheme is introduced that virtually preserves the error-correction performance of the proposed decoder. Simulation results demonstrate that for the RM code of size $256$ with $163$ information bits, the proposed decoder reduces the computational complexity by $39\%$, the decoding latency by $36\%$, and the memory requirement by $74\%$ relative to the state-of-the-art RLD algorithm with list size $64$ that also uses permutations from the full symmetry group of RM codes, while incurring an error-correction performance degradation of only $0.1$ dB at the target frame error rate of $10^{-3}$.
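The full symmetry group mentioned above is the general affine group acting on the $2^m$ code positions of a length-$2^m$ RM code, indexed by binary $m$-tuples. A minimal sketch of sampling a random permutation from it (the ingredient used to initialize the decoding paths) might look as follows; the decoder itself and the SP selection rule are beyond this snippet.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over GF(2) via Gaussian elimination."""
    M = M.copy() % 2
    r = 0
    for c in range(M.shape[1]):
        piv = next((i for i in range(r, M.shape[0]) if M[i, c]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]
        for i in range(M.shape[0]):
            if i != r and M[i, c]:
                M[i] ^= M[r]
        r += 1
    return r

def random_affine_permutation(m, seed=0):
    """Sample z -> Az + b with A invertible over GF(2): a random element of
    the full (affine) symmetry group of length-2^m RM codes."""
    rng = np.random.default_rng(seed)
    while True:                                  # rejection-sample an invertible A
        A = rng.integers(0, 2, size=(m, m))
        if gf2_rank(A) == m:
            break
    b = rng.integers(0, 2, size=m)
    perm = np.empty(1 << m, dtype=np.int64)
    for z in range(1 << m):
        bits = (z >> np.arange(m)) & 1           # position index as a binary m-tuple
        w = (A @ bits + b) % 2
        perm[z] = int(w @ (1 << np.arange(m)))   # back to an integer position
    return perm                                  # apply to a codeword y as y[perm]
```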

In this work, two problems in a downlink multi-user system aided by an intelligent reflecting surface (IRS) are considered: weighted sum-rate maximization and weighted minimum-rate maximization. For the first problem, a novel DOuble Manifold ALternating Optimization (DOMALO) algorithm is proposed by exploiting matrix manifold theory, modeling the beamforming matrix and the reflection vector on the complex sphere manifold and the complex oblique manifold, respectively, which capture the inherent geometric structure and the required constraints. For the second problem, a smooth double manifold alternating optimization (S-DOMALO) algorithm is then developed based on a Dinkelbach-type algorithm and a smooth exponential penalty function. Finally, possible cooperative beamforming gain between IRSs and IRS phase shifts with limited resolution are studied, providing a reference for practical implementation. Numerical results show that our proposed algorithms significantly outperform the benchmark schemes.
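As a small illustration of the manifold machinery involved (not the DOMALO algorithm itself), the sketch below runs Riemannian gradient ascent on the complex oblique (unit-modulus) manifold for a toy single-user IRS objective $|\sum_i h_i \theta_i g_i|^2$; the channels, step size, and objective are assumptions for illustration only.

```python
import numpy as np

def irs_phase_ascent(h, g, iters=200, lr=0.5, seed=0):
    """Riemannian gradient ascent for the toy objective |a^T theta|^2 with
    a = h * g, keeping every entry of theta on the unit circle (the
    complex oblique manifold constraint on IRS phase shifts)."""
    a = h * g                                      # effective cascaded channel
    rng = np.random.default_rng(seed)
    theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, a.shape))
    for _ in range(iters):
        s = a @ theta                              # received scalar
        egrad = s * np.conj(a)                     # Wirtinger gradient w.r.t. conj(theta)
        rgrad = egrad - np.real(egrad * np.conj(theta)) * theta  # tangent projection
        theta = theta + lr * rgrad                 # ascent step
        theta = theta / np.abs(theta)              # retraction onto the manifold
    return theta
```

As a sanity check, this toy problem has the closed form $\theta_i = e^{-j\,\arg(h_i g_i)}$ up to a common phase, which the iteration should approach.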

We establish in this work approximation results of deep neural networks for smooth functions measured in Sobolev norms, motivated by recent developments in numerical solvers for partial differential equations based on deep neural networks. Our approximation results are nonasymptotic in the sense that the error bounds are explicitly characterized in terms of both the width and depth of the networks simultaneously, with all involved constants explicitly determined. Namely, for $f\in C^s([0,1]^d)$, we show that deep ReLU networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ can achieve a nonasymptotic approximation rate of $\mathcal{O}(N^{-2(s-1)/d}L^{-2(s-1)/d})$ with respect to the $\mathcal{W}^{1,p}([0,1]^d)$ norm for $p\in[1,\infty)$. If either the ReLU function or its square is used as the activation function to construct deep neural networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ to approximate $f\in C^s([0,1]^d)$, the approximation rate is $\mathcal{O}(N^{-2(s-n)/d}L^{-2(s-n)/d})$ with respect to the $\mathcal{W}^{n,p}([0,1]^d)$ norm for $p\in[1,\infty)$. An extension of similar approximation results is also provided for target functions in the H\"{o}lder space.

The tensor rank of some Gabidulin codes of small dimension is investigated. In particular, we determine the tensor rank of any rank-metric code equivalent to an $8$-dimensional $\mathbb{F}_q$-linear generalized Gabidulin code in $\mathbb{F}_{q}^{4\times4}$. This shows that such a code is never of minimum tensor rank. In this way, we detect the first infinite family of Gabidulin codes which are not of minimum tensor rank.

The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves which generalizes both the discrete Fr\'echet and Dynamic Time Warping distances; these are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time over existing methods. Query time complexity is comparable to, or significantly improved over, that of existing methods; our algorithms are especially efficient when the length of the curves is bounded.
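To give a flavor of the randomized-projection ingredient only (the paper's structures for Fr\'echet/DTW-type distances are substantially richer), the following sketch projects equal-length discretized curves with a Johnson-Lindenstrauss-style Gaussian map and answers queries by exact search in the projected space. Equal curve lengths and plain $\ell_2$ on flattened curves are simplifying assumptions.

```python
import numpy as np

def build_index(curves, k, seed=0):
    """Project each (m, d) curve, flattened to R^{m*d}, down to R^k with a
    Gaussian JL map; pairwise distances are approximately preserved w.h.p."""
    X = np.stack([c.ravel() for c in curves])      # assumes equal-length curves
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ G, G

def query(P, G, q):
    """Index of the nearest curve to q under l2 in the projected space."""
    qp = q.ravel() @ G
    return int(np.argmin(np.linalg.norm(P - qp, axis=1)))
```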

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piecewise and partial solutions that side-step many important properties, in particular good scalability, high load factor and instant recovery. We present Dash, a holistic approach to building dynamic and scalable hash tables on real PM hardware with all the aforementioned properties. Based on Dash, we adapted two popular dynamic hashing schemes (extendible hashing and linear hashing). On a 24-core machine with Intel Optane DCPMM, we show that compared to the state of the art, Dash-enabled hash tables can achieve up to ~3.9X higher performance with up to over 90% load factor and an instant recovery time of 57ms regardless of data size.
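As background for one of the two schemes Dash adapts, here is a minimal in-memory sketch of textbook extendible hashing (directory doubling plus bucket splits). Everything PM-specific in Dash, such as fingerprinting, logging, and the machinery behind instant recovery, is deliberately omitted, and the bucket capacity is an arbitrary choice.

```python
class Bucket:
    def __init__(self, depth, capacity=4):
        self.depth = depth                      # local depth of this bucket
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    """Extendible hashing: a directory of 2^global_depth slots, each
    pointing at a bucket with its own local depth."""

    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        b = self.directory[self._index(key)]
        if key in b.items or len(b.items) < b.capacity:
            b.items[key] = value
            return
        self._split(b)
        self.put(key, value)                    # retry after the split

    def _split(self, b):
        if b.depth == self.global_depth:        # no spare bit: double the directory
            self.directory += self.directory
            self.global_depth += 1
        b.depth += 1
        new = Bucket(b.depth, b.capacity)
        mask = 1 << (b.depth - 1)               # the newly distinguishing hash bit
        for i, slot in enumerate(self.directory):
            if slot is b and (i & mask):
                self.directory[i] = new         # redirect half the slots
        old, b.items = b.items, {}
        for k, v in old.items():                # redistribute the split bucket
            self.directory[self._index(k)].items[k] = v
```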

Learning compact binary codes for the image retrieval problem using deep neural networks has attracted increasing attention recently. However, training deep hashing networks is challenging due to the binary constraints on the hash codes, the similarity-preserving property, and the requirement for a vast number of labelled images. To the best of our knowledge, none of the existing methods has tackled all of these challenges completely in a unified framework. In this work, we propose a novel end-to-end deep hashing approach, which is trained to produce binary codes directly from image pixels without the need for manual annotation. In particular, we propose a novel pairwise binary constrained loss function, which simultaneously encodes the distances between pairs of hash codes and the binary quantization error. In order to train the network with the proposed loss function, we also propose an efficient parameter learning algorithm. In addition, to provide similar/dissimilar training images to train the network, we exploit 3D models reconstructed from unlabelled images for automatic generation of enormous similar/dissimilar pairs. Extensive experiments on three image retrieval benchmark datasets demonstrate the superior performance of the proposed method over state-of-the-art hashing methods on the image retrieval problem.
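A loss in the spirit of the one described, with one term acting on pairwise code distances and one penalizing non-binary outputs, could be sketched as below. The contrastive form, the margin, and the trade-off weight `alpha` are assumptions; the paper's exact formulation and its specialized optimizer differ.

```python
import numpy as np

def pairwise_binary_loss(h1, h2, s, margin=8.0, alpha=0.1):
    """h1, h2: (B, L) real-valued codes for a batch of image pairs;
    s: (B,) labels, 1 for similar pairs, 0 for dissimilar.

    First term: contrastive penalty on squared code distances.
    Second term: quantization error pushing entries toward +1/-1."""
    d2 = np.sum((h1 - h2) ** 2, axis=1)                        # squared distances
    contrastive = s * d2 + (1.0 - s) * np.maximum(0.0, margin - d2)
    quant = np.sum((np.abs(h1) - 1.0) ** 2, axis=1) \
          + np.sum((np.abs(h2) - 1.0) ** 2, axis=1)
    return float(np.mean(contrastive + alpha * quant))
```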

We propose a new method of estimation in topic models that is not a variation on existing simplex-finding algorithms and that estimates the number of topics $K$ from the observed data. We derive new finite-sample minimax lower bounds for the estimation of $A$, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents ($n$), individual document length ($N_i$), dictionary size ($p$) and number of topics ($K$); both $p$ and $K$ are allowed to increase with $n$, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts with the computational and theoretical disadvantage of not knowing the correct number of topics $K$, while the competing methods are provided with the correct value in our simulations.

Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning methods, where Hamming distance matching significantly speeds up the process of similarity search. Provided with training and test data restricted to a fixed set of pre-defined categories, the cutting-edge SBIR and cross-modal hashing works obtain acceptable retrieval performance. However, most of the existing methods fail when the categories of query sketches have never been seen during training. In this paper, the above problem is formulated as a novel but realistic zero-shot SBIR hashing task. We elaborate on the challenges of this special task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An end-to-end three-network architecture is built, two of which are treated as the binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by utilizing the Kronecker fusion layer and graph convolution, respectively. As an important part of ZSIH, we formulate a generative hashing scheme in reconstructing semantic knowledge representations for zero-shot retrieval. To the best of our knowledge, ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy and TU-Berlin, with a novel zero-shot train-test split. The proposed model remarkably outperforms related works.
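As one concrete piece, the Kronecker fusion idea can be sketched as follows: the two modality features are combined through all pairwise products before a learned linear reduction. The dimensions and the plain linear map are assumptions; the ZSIH layer's exact parameterization may differ.

```python
import numpy as np

def kronecker_fusion(f_sketch, f_image, W):
    """Fuse a sketch feature (dim m) and an image feature (dim n) via their
    Kronecker product in R^{m*n}, then reduce it with an assumed learned
    matrix W of shape (d, m*n)."""
    z = np.kron(f_sketch, f_image)     # all pairwise cross-modal products
    return W @ z

# Example shapes: m = n = 64 gives a 4096-dim fused vector before reduction.
```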
