
In this article, we investigate the spectral behavior of random features kernel matrices of the type ${\bf K} = \mathbb{E}_{{\bf w}} \left[\sigma\left({\bf w}^{\sf T}{\bf x}_i\right)\sigma\left({\bf w}^{\sf T}{\bf x}_j\right)\right]_{i,j=1}^n$, with nonlinear function $\sigma(\cdot)$, data ${\bf x}_1, \ldots, {\bf x}_n \in \mathbb{R}^p$, and random projection vector ${\bf w} \in \mathbb{R}^p$ having i.i.d. entries. In a high-dimensional setting where the number of data points $n$ and their dimension $p$ are both large and comparable, we show, under a Gaussian mixture model for the data, that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. (zero-mean and unit-variance) entries of ${\bf w}$, and only depends on $\sigma(\cdot)$ via its (generalized) Gaussian moments $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma'(z)]$ and $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma''(z)]$. As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1,0,1\}$. The computation of the proposed random features requires no multiplication, and a factor of $b$ fewer bits for storage compared to classical random features such as random Fourier features, with $b$ the number of bits needed to store full-precision values. Moreover, our experiments on real data indicate that the substantial gains in computation and storage are accompanied by somewhat improved performance compared to state-of-the-art random features compression/quantization methods.
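The multiplication-free computation described above can be illustrated with a minimal sketch. It assumes a ternary activation defined by two thresholds and a ternary projection vector with a chosen probability of zeros; the function names, the thresholds `s_minus` and `s_plus`, and the parameter `zero_prob` are hypothetical, and the data-dependent tuning proposed in the article is not reproduced here.

```python
import random


def ternary_projection(w, x):
    """Compute w^T x for w with entries in {-1, 0, 1} using only additions and subtractions."""
    total = 0.0
    for w_k, x_k in zip(w, x):
        if w_k == 1:
            total += x_k
        elif w_k == -1:
            total -= x_k
        # entries equal to 0 contribute nothing
    return total


def ternary_activation(t, s_minus, s_plus):
    """Map a real value to {-1, 0, 1} using two thresholds (assumed, not tuned)."""
    if t < s_minus:
        return -1
    if t > s_plus:
        return 1
    return 0


def ternary_features(X, W, s_minus, s_plus):
    """One row of ternary features per data vector, one column per projection vector."""
    return [
        [ternary_activation(ternary_projection(w, x), s_minus, s_plus) for w in W]
        for x in X
    ]


def empirical_kernel(features):
    """Monte Carlo estimate of K_ij: average over projections of sigma(w^T x_i) * sigma(w^T x_j)."""
    m = len(features[0])
    return [
        [sum(a * b for a, b in zip(f_i, f_j)) / m for f_j in features]
        for f_i in features
    ]


def sample_ternary_vectors(m, p, zero_prob, seed=0):
    """Draw m vectors of length p with entries 0 (prob. zero_prob) or +/-1 (equally likely)."""
    rng = random.Random(seed)
    return [
        [0 if rng.random() < zero_prob else rng.choice((-1, 1)) for _ in range(p)]
        for _ in range(m)
    ]
```

Because each feature takes one of three values, it can be stored in two bits rather than a full-precision word, which is the source of the storage saving mentioned above.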

Related content

We extract a core principle underlying seemingly different fundamental distributed settings, showing sparsity awareness may induce faster algorithms for problems in these settings. To leverage this, we establish a new framework by developing an intermediate auxiliary model weak enough to be simulated in the CONGEST model given low mixing time, as well as in the recently introduced HYBRID model. We prove that despite imposing harsh restrictions, this artificial model allows balancing massive data transfers with high bandwidth utilization. We exemplify the power of our methods by deriving shortest-paths algorithms improving upon the state-of-the-art. Specifically, we show the following for graphs of $n$ nodes: A $(3+\epsilon)$ approximation for weighted APSP in $(n^{1/2}+n/\delta)\tau_{mix}\cdot 2^{O(\sqrt{\log n})}$ rounds in the CONGEST model, where $\delta$ is the minimum degree of the graph and $\tau_{mix}$ is its mixing time. For graphs with $\delta=\tau_{mix}\cdot 2^{\omega(\sqrt{\log n})}$, this takes $o(n)$ rounds, despite the $\Omega(n)$ lower bound for general graphs [Nanongkai, STOC'14]. An $(n^{7/6}/m^{1/2}+n^2/m)\cdot\tau_{mix}\cdot 2^{O(\sqrt{\log n})}$-round exact SSSP algorithm in the CONGEST model, for graphs with $m$ edges and a mixing time of $\tau_{mix}$. This improves upon the algorithm of [Chechik and Mukhtar, PODC'20] for significant ranges of values of $m$ and $\tau_{mix}$. A CONGESTED CLIQUE simulation in the CONGEST model improving upon the state-of-the-art simulation of [Ghaffari, Kuhn, and Su, PODC'17] by a factor proportional to the average degree in the graph. An $\tilde O(n^{5/17}/\epsilon^9)$-round algorithm for a $(1+\epsilon)$ approximation for SSSP in the HYBRID model. The only previous $o(n^{1/3})$ round algorithm for distance approximations in this model is for a much larger factor [Augustine, Hinnenthal, Kuhn, Scheideler, Schneider, SODA'20].

We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.

We propose a new test to identify non-stationary frequency-modulated stochastic processes from time series data. Our method uses the instantaneous phase as a discriminating statistic, with confidence bands derived from surrogate data. We simulated an oscillatory second-order autoregressive process to evaluate the size and power of the test. We found that the proposed test correctly identifies more than 99% of non-stationary data when the frequency of the simulated data is doubled after the first half of the time series. Our method is easily interpretable, computationally cheap, and does not require choosing hyperparameters that depend on the data.

Given a target distribution $\mu \propto e^{-\mathcal{H}}$ to sample from with Hamiltonian $\mathcal{H}$, in this paper we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution $\mu^f_{1,\alpha,c} \propto e^{-\mathcal{H}^{f}_{1,\alpha,c}}$, where $\mathcal{H}^{f}_{1,\alpha,c}$ is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics which targets $\mu^f_{1,\alpha,c}$ is that it enjoys a reduced critical height described by the threshold parameter $c$, the function $f$, and a penalty parameter $\alpha \geq 0$ that controls the state-dependent effect. First, we investigate the case of fixed $\alpha$, propose a self-normalized estimator that corrects for the bias of sampling, and prove asymptotic convergence results and a Chernoff-type bound for the proposed estimator. Next, we consider the case of annealing the penalty parameter $\alpha$. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain subject to appropriate assumptions on the decay of $\alpha$. We illustrate the proposed algorithms by comparing their mixing times with the original Metropolis dynamics on statistical physics models including the ferromagnetic Ising model on the hypercube or the complete graph and the $q$-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size as the critical height grows at least linearly with the size, while the proposed annealing algorithm, with appropriate choice of $f$, $c$, and annealing schedule on $\alpha$, mixes rapidly with at most polynomial dependence on the size. The crux of the proof relies on the important observation that the reduced critical height can be bounded independently of the size, which gives rise to rapid mixing.

Large gains in the rate of cache-aided broadcast communication are obtained using coded caching, but to obtain these gains, most existing centralized coded caching schemes require that the files at the server be divisible into a large number of parts (this number is called subpacketization). In fact, most schemes require the subpacketization to grow asymptotically as exponential in $\sqrt[\leftroot{-1}\uproot{1}r]{K}$ for some positive integer $r$, with $K$ being the number of users. On the other extreme, a few schemes having subpacketization linear in $K$ are known; however, they require a large number of users to exist, or they offer only little gain in the rate. In this work, we propose two new centralized coded caching schemes with low subpacketization and moderate rate gains utilizing projective geometries over finite fields. Both schemes achieve the same asymptotic subpacketization, which is exponential in $O((\log K)^2)$ (thus improving on the $\sqrt[\leftroot{-1}\uproot{1}r]{K}$ exponent). The first scheme has a larger cache requirement but has at most a constant rate (with increasing $K$), while the second has a small cache requirement but has a larger rate. As a special case of our second scheme, we get a new linear subpacketization scheme, which has a more flexible range of parameters than the existing linear subpacketization schemes. Extending our techniques, we also obtain low subpacketization schemes for other multi-receiver settings such as distributed computing and the cache-aided interference channel. We validate the performance of all our schemes via extensive numerical comparisons. For a special class of symmetric caching schemes with a given subpacketization level, we propose two new information theoretic lower bounds on the optimal rate of coded caching.

Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for convoluted sample-mining heuristics. Our experiments over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.

We give a short proof of a bound on the list chromatic number of graphs $G$ of maximum degree $\Delta$ where each neighbourhood has density at most $d$, namely $\chi_\ell(G) \le (1+o(1)) \frac{\Delta}{\ln \frac{\Delta}{d+1}}$ as $\frac{\Delta}{d+1} \to \infty$. This bound is tight up to an asymptotic factor $2$, which is the best possible barring a breakthrough in Ramsey theory, and strengthens results due to Vu, and more recently Davies, P., Kang, and Sereni. Our proof relies on the first moment method, and adapts a clever counting argument developed by Rosenfeld in the context of non-repetitive colourings. As a final touch, we show that our method provides an asymptotically tight lower bound on the number of colourings of locally sparse graphs.

The Goppa Code Distinguishing (GD) problem asks to distinguish efficiently a generator matrix of a Goppa code from a randomly drawn one. We revisit a distinguisher for alternant and Goppa codes through a new approach, namely by studying the dimension of square codes. We provide here a rigorous upper bound for the dimension of the square of the dual of an alternant or Goppa code, while the previous approach only provided algebraic explanations based on heuristics. Moreover, for Goppa codes, our proof extends to the non-binary case as well, thus providing an algebraic explanation for the distinguisher which was missing up to now. All the upper bounds are tight and match experimental evidence. Our work also introduces new algebraic results about products of trace codes in general and of dual of alternant and Goppa codes in particular, clarifying their square code structure. This might be of interest for cryptanalysis purposes.

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with a known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
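As an illustration of the histogram-based estimation mentioned above, here is a minimal one-dimensional sketch of a plug-in estimator on a known bounded support. The function name, the equal-width binning, and the choice of natural logarithm (entropy in nats) are assumptions made for the example; the confidence bounds derived in the paper are not reproduced.

```python
import math


def histogram_entropy(samples, lower, upper, num_bins):
    """Plug-in differential entropy estimate (in nats) from an equal-width histogram.

    The density is approximated as piecewise constant: p_i / width on bin i,
    where p_i is the fraction of samples in that bin.
    """
    width = (upper - lower) / num_bins
    counts = [0] * num_bins
    for x in samples:
        if not lower <= x <= upper:
            raise ValueError("sample outside the assumed support")
        # Clamp so that x == upper falls into the last bin.
        index = min(int((x - lower) / width), num_bins - 1)
        counts[index] += 1

    n = len(samples)
    entropy = 0.0
    for count in counts:
        if count > 0:  # empty bins contribute nothing (0 * log 0 := 0)
            p = count / n
            entropy -= p * math.log(p / width)
    return entropy
```

For evenly spread samples on $[0, 2]$ the estimate is close to $\ln 2$, the differential entropy of the uniform density on that interval.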

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
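The sensing-model query $\langle X_i, A\rangle := tr(X_i^\top A)$ can be made concrete with a short sketch. It computes the trace from the definition and compares it with the entrywise sum of products, which is the same quantity; the function names are hypothetical and matrices are represented as plain nested lists.

```python
def trace_inner_product(X, A):
    """Compute tr(X^T A) by forming the product X^T A and summing its diagonal."""
    rows, cols = len(A), len(A[0])
    # (X^T A)[j][k] = sum over i of X[i][j] * A[i][k]
    product = [
        [sum(X[i][j] * A[i][k] for i in range(rows)) for k in range(cols)]
        for j in range(cols)
    ]
    return sum(product[j][j] for j in range(cols))


def entrywise_inner_product(X, A):
    """Sum of X[i][j] * A[i][j] over all entries; equals tr(X^T A)."""
    return sum(x * a for row_x, row_a in zip(X, A) for x, a in zip(row_x, row_a))
```

Choosing $X_i$ with a single entry equal to $1$ and zeros elsewhere returns one entry of $A$, so an entry query is a special case of a sensing query.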
