一本色道综合久久欧美日韩精品,久热精品视频一区二区三区

In this article, we investigate the spectral behavior of random features kernel matrices of the type ${\bf K} = \mathbb{E}_{{\bf w}} \left[\sigma\left({\bf w}^{\sf T}{\bf x}_i\right)\sigma\left({\bf w}^{\sf T}{\bf x}_j\right)\right]_{i,j=1}^n$, with nonlinear function $\sigma(\cdot)$, data ${\bf x}_1, \ldots, {\bf x}_n \in \mathbb{R}^p$, and random projection vector ${\bf w} \in \mathbb{R}^p$ having i.i.d. entries. In a high-dimensional setting where the number of data $n$ and their dimension $p$ are both large and comparable, we show, under a Gaussian mixture model for the data, that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d.(zero-mean and unit-variance) entries of ${\bf w}$, and only depends on $\sigma(\cdot)$ via its (generalized) Gaussian moments $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma'(z)]$ and $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma''(z)]$. As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1,0,1\}$. The computation of the proposed random features requires no multiplication, and a factor of $b$ times less bits for storage compared to classical random features such as random Fourier features, with $b$ the number of bits to store full precision values. Besides, it appears in our experiments on real data that the substantial gains in computation and storage are accompanied with somewhat improved performances compared to state-of-the-art random features compression/quantization methods.

相關內容

Performer

關注 10

混合時間 · MoDELS · 圖 · 特化 · 混合 ·

2021 年 11 月 29 日

On Sparsity Awareness in Distributed Computations

Keren Censor-Hillel,Dean Leitersdorf,Volodymyr Polosukhin

from arxiv, To appear in SPAA 2021

We extract a core principle underlying seemingly different fundamental distributed settings, showing sparsity awareness may induce faster algorithms for problems in these settings. To leverage this, we establish a new framework by developing an intermediate auxiliary model weak enough to be simulated in the CONGEST model given low mixing time, as well as in the recently introduced HYBRID model. We prove that despite imposing harsh restrictions, this artificial model allows balancing massive data transfers with high bandwidth utilization. We exemplify the power of our methods, by deriving shortest-paths algorithms improving upon the state-of-the-art. Specifically, we show the following for graphs of $n$ nodes: A $(3+\epsilon)$ approximation for weighted APSP in $(n^{1/2}+n/\delta)\tau_{mix}\cdot 2^{O(\sqrt\log n)}$ rounds in the CONGEST model, where $\delta$ is the minimum degree of the graph and $\tau_{mix}$ is its mixing time. For graphs with $\delta=\tau_{mix}\cdot 2^{\omega(\sqrt\log n)}$, this takes $o(n)$ rounds, despite the $\Omega(n)$ lower bound for general graphs [Nanongkai, STOC'14]. An $(n^{7/6}/m^{1/2}+n^2/m)\cdot\tau_{mix}\cdot 2^{O(\sqrt\log n)}$-round exact SSSP algorithm in the CONGNEST model, for graphs with $m$ edges and a mixing time of $\tau_{mix}$. This improves upon the algorithm of [Chechik and Mukhtar, PODC'20] for significant ranges of values of $m$ and $\tau_{mix}$. A CONGESTED CLIQUE simulation in the CONGEST model improving upon the state-of-the-art simulation of [Ghaffari, Kuhn, and SU, PODC'17] by a factor proportional to the average degree in the graph. An $\tilde O(n^{5/17}/\epsilon^9)$-round algorithm for a $(1+\epsilon)$ approximation for SSSP in the HYBRID model. The only previous $o(n^{1/3})$ round algorithm for distance approximations in this model is for a much larger factor [Augustine, Hinnenthal, Kuhn, Scheideler, Schneider, SODA'20].

簇 · 優化器 · 高斯混合（模型） · 最大似然估計 · 樣本復雜度 ·

2021 年 11 月 29 日

Clustering a Mixture of Gaussians with Unknown Covariance

Damek Davis,Mateo Díaz,Kaizheng Wang

from arxiv, 89 pages

We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.

可辨認的 · 自回歸過程 · Processing（編程語言） · 統計量 · 置信度 ·

2021 年 11 月 29 日

Detecting frequency modulation in stochastic time series data

Adrian L. Hauber,Christian Sigloch,Jens Timmer

We propose a new test to identify non-stationary frequency-modulated stochastic processes from time series data. Our method uses the instantaneous phase as a discriminatory statistics with confidence bands derived from surrogate data. We simulated an oscillatory second-order autoregressive process to evaluate the size and power of the test. We found that the test we propose is able to correctly identify more than 99% of non-stationary data when the frequency of simulated data is doubled after the first half of the time series. Our method is easily interpretable, computationally cheap and does not require choosing hyperparameters that are dependent on the data.

可約的 · 評論員 · 多峰值 · 估計/估計量 · CASE ·

2021 年 11 月 29 日

Improved annealing for sampling from multimodal distributions via landscape modification

Michael C. H. Choi,Jing Zhang

from arxiv, 25 pages

Given a target distribution $\mu \propto e^{-\mathcal{H}}$ to sample from with Hamiltonian $\mathcal{H}$, in this paper we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution $\mu^f_{1,\alpha,c} \propto e^{-\mathcal{H}^{f}_{1,\alpha,c}}$, where $\mathcal{H}^{f}_{1,\alpha,c}$ is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics which targets $\pi^f_{1,\alpha,c}$ is that it enjoys reduced critical height described by the threshold parameter $c$, function $f$, and a penalty parameter $\alpha \geq 0$ that controls the state-dependent effect. First, we investigate the case of fixed $\alpha$ and propose a self-normalized estimator that corrects for the bias of sampling and prove asymptotic convergence results and Chernoff-type bound of the proposed estimator. Next, we consider the case of annealing the penalty parameter $\alpha$. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain subject to appropriate assumptions on the decay of $\alpha$. We illustrate the proposed algorithms by comparing their mixing times with the original Metropolis dynamics on statistical physics models including the ferromagnetic Ising model on the hypercube or the complete graph and the $q$-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size as the critical height grows at least linearly with the size, while the proposed annealing algorithm, with appropriate choice of $f$, $c$, and annealing schedule on $\alpha$, mixes rapidly with at most polynomial dependence on the size. The crux of the proof harnesses on the important observation that the reduced critical height can be bounded independently of the size that gives rise to rapid mixing.

線性的 · Extensibility · INFORMS · Performer · CASE ·

2021 年 11 月 28 日

Subexponential and Linear Subpacketization Coded Caching via Projective Geometry

Hari Hara Suthan Chittoor,Prasad Krishnan,K V Sushena Sree,Bhavana MVN

from arxiv, Published in the IEEE Transactions on Information Theory Sept 2021. Parts of this manuscript were presented at IEEE ITW 2018, IEEE ISIT 2019, IEEE ISCIT 2019 and IEEE GLOBECOM 2019

Large gains in the rate of cache-aided broadcast communication are obtained using coded caching, but to obtain this most existing centralized coded caching schemes require that the files at the server be divisible into a large number of parts (this number is called subpacketization). In fact, most schemes require the subpacketization to be growing asymptotically as exponential in $\sqrt[\leftroot{-1}\uproot{1}r]{K}$ for some positive integer $r$ and $K$ being the number of users. On the other extreme, few schemes having subpacketization linear in $K$ are known; however, they require large number of users to exist, or they offer only little gain in the rate. In this work, we propose two new centralized coded caching schemes with low subpacketization and moderate rate gains utilizing projective geometries over finite fields. Both the schemes achieve the same asymptotic subpacketization, which is exponential in $O((\log K)^2)$ (thus improving on the $\sqrt[\leftroot{-1}\uproot{1}r]{K}$ exponent). The first scheme has a larger cache requirement but has at most a constant rate (with increasing $K$), while the second has small cache requirement but has a larger rate. As a special case of our second scheme, we get a new linear subpacketization scheme, which has a more flexible range of parameters than the existing linear subpacketization schemes. Extending our techniques, we also obtain low subpacketization schemes for other multi-receiver settings such as distributed computing and the cache-aided interference channel. We validate the performance of all our schemes via extensive numerical comparisons. For a special class of symmetric caching schemes with a given subpacketization level, we propose two new information theoretic lower bounds on the optimal rate of coded caching.

成對型 · INFORMS · 互信息 · 度量學習 · 損失 ·

2021 年 11 月 26 日

A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses

Malik Boudiaf,Jér?me Rony,Imtiaz Masud Ziko,Eric Granger,Marco Pedersoli,Pablo Piantanida,Ismail Ben Ayed

from arxiv, ECCV 2020 (Spotlight) - Code available at: //github.com/jeromerony/dml_cross_entropy

Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for convoluted sample-mining heuristics. Our experiments over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.

矩 · 稀疏 · 圖 · 分解的 · 離散數學 ·

2021 年 11 月 25 日

Colouring locally sparse graphs with the first moment method

Fran?ois Pirot,Eoin Hurley

We give a short proof of a bound on the list chromatic number of graphs $G$ of maximum degree $\Delta$ where each neighbourhood has density at most $d$, namely $\chi_\ell(G) \le (1+o(1)) \frac{\Delta}{\ln \frac{\Delta}{d+1}}$ as $\frac{\Delta}{d+1} \to \infty$. This bound is tight up to an asymptotic factor $2$, which is the best possible barring a breakthrough in Ramsey theory, and strengthens results due to Vu, and more recently Davies, P., Kang, and Sereni. Our proof relies on the first moment method, and adapts a clever counting argument developed by Rosenfeld in the context of non-repetitive colourings. As a final touch, we show that our method provides an asymptotically tight lower bound on the number of colourings of locally sparse graphs.

方陣 · CASE · 跡 ·

2021 年 11 月 25 日

On the dimension and structure of the square of the dual of a Goppa code

Rocco Mora,Jean-Pierre Tillich

The Goppa Code Distinguishing (GD) problem asks to distinguish efficiently a generator matrix of a Goppa code from a randomly drawn one. We revisit a distinguisher for alternant and Goppa codes through a new approach, namely by studying the dimension of square codes. We provide here a rigorous upper bound for the dimension of the square of the dual of an alternant or Goppa code, while the previous approach only provided algebraic explanations based on heuristics. Moreover, for Goppa codes, our proof extends to the non-binary case as well, thus providing an algebraic explanation for the distinguisher which was missing up to now. All the upper bounds are tight and match experimental evidence. Our work also introduces new algebraic results about products of trace codes in general and of dual of alternant and Goppa codes in particular, clarifying their square code structure. This might be of interest for cryptanalysis purposes.

微分熵 · 估計/估計量 · Continuity · INFORMS · 概率密度函數 ·

2021 年 11 月 24 日

On the Estimation of Information Measures of Continuous Distributions

Georg Pichler,Pablo Piantanida,Günther Koliander

from arxiv, 20 pages

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.

秩 · MoDELS · 優化器 · 奇異值分解 · 列 ·

2018 年 10 月 18 日

Testing Matrix Rank, Optimally

Maria-Florina Balcan,Yi Li,David P. Woodruff,Hongyang Zhang

from arxiv, 51 pages. To appear in SODA 2019

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.