Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise. Using the variational definition of the singular vectors and values of [Lim, 2005], we show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric \textit{block-wise} random matrix constructed from \textit{contractions} of the studied tensor with the singular vectors associated with its best rank-1 approximation. Our approach allows the exact characterization of the almost sure asymptotic singular value and of the alignments of the corresponding singular vectors with the true spike components, when $\frac{n_i}{\sum_{j=1}^d n_j}\to c_i\in [0, 1]$, where the $n_i$ are the tensor dimensions. In contrast to other works, which rely mostly on tools from statistical physics to study random tensors, our results rely solely on classical RMT tools such as Stein's lemma. Finally, classical RMT results concerning spiked random matrices are recovered as a particular case.
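
As a quick illustration of the objects involved, the following is a minimal numerical sketch (not the paper's analysis): it forms an order-3 spiked tensor and approximates its dominant singular triple by alternating contractions, whose fixed points are singular vectors in the variational sense of [Lim, 2005]. All sizes and the signal strength `beta` are arbitrary toy choices.

```python
# Toy sketch: rank-1 spike plus Gaussian noise, recovered by alternating
# contractions (tensor power iteration). Not the paper's RMT analysis.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 60, 80, 100
beta = 3.0  # signal strength (assumed well above the detection threshold)

x = rng.standard_normal(n1); x /= np.linalg.norm(x)
y = rng.standard_normal(n2); y /= np.linalg.norm(y)
z = rng.standard_normal(n3); z /= np.linalg.norm(z)
W = rng.standard_normal((n1, n2, n3))
T = beta * np.einsum("i,j,k->ijk", x, y, z) + W / np.sqrt(n1 + n2 + n3)

# Repeatedly contract T against two of the three current vectors to update
# the third; at convergence (u, v, w) is a singular triple of T.
u = rng.standard_normal(n1); u /= np.linalg.norm(u)
v = rng.standard_normal(n2); v /= np.linalg.norm(v)
w = rng.standard_normal(n3); w /= np.linalg.norm(w)
for _ in range(200):
    u = np.einsum("ijk,j,k->i", T, v, w); u /= np.linalg.norm(u)
    v = np.einsum("ijk,i,k->j", T, u, w); v /= np.linalg.norm(v)
    w = np.einsum("ijk,i,j->k", T, u, v); w /= np.linalg.norm(w)

sigma = np.einsum("ijk,i,j,k->", T, u, v, w)   # singular value estimate
print(sigma, abs(u @ x), abs(v @ y), abs(w @ z))  # alignments with the spike
```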

Related Content

We study the enumerative and analytic properties of some sequences constructed using tensor invariant theory. The octant sequences are constructed from the exceptional Lie group $G_2$ and the quadrant sequences from the special linear group $SL(3)$. In each case we show that the corresponding sequences are related by binomial transforms. The first three octant sequences and the first four quadrant sequences are listed in the On-Line Encyclopedia of Integer Sequences (OEIS). These sequences all have interpretations as enumerating two-dimensional lattice walks but for the octant sequences the boundary conditions are unconventional. These sequences are all P-recursive and we give the corresponding recurrence relations. In all cases the associated differential operators are of third order and have the remarkable property that they can be solved to give closed formulae for the ordinary generating functions in terms of classical Gaussian hypergeometric functions. Moreover, we show that the octant sequences and the quadrant sequences are related by the branching rules for the inclusion of $SL(3)$ in $G_2$.
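
For concreteness, the binomial transform sends a sequence $(a_n)$ to $b_n=\sum_{k=0}^{n}\binom{n}{k}a_k$; the short sketch below computes it for a placeholder sequence (the actual octant and quadrant sequences arise from the invariant theory above and are not reproduced here).

```python
# Binomial transform: b[n] = sum_{k<=n} C(n, k) * a[k].
from math import comb

def binomial_transform(a):
    """Return the binomial transform of the sequence a."""
    return [sum(comb(n, k) * a[k] for k in range(n + 1)) for n in range(len(a))]

a = [1, 1, 2, 5, 14, 42]  # e.g. Catalan numbers, purely for illustration
print(binomial_transform(a))
```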

Graph Convolutional Networks (GCNs) are among the most popular architectures used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}[\mathrm{deg}]}$, where $\mathbb{E}[\mathrm{deg}]$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that performance is similar across all such placements. We present extensive experiments on both synthetic and real-world data that illustrate our results.
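
To make the mechanism concrete, here is a hedged toy sketch (not the paper's multi-layer analysis): one graph convolution $D^{-1}AX$ on a two-block stochastic block model averages features over neighbourhoods, shrinking the within-class noise of the Gaussian mixture. All sizes and the parameters `p`, `q`, `mu` are arbitrary.

```python
# One graph convolution on a GMM + SBM toy example.
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 2
labels = rng.integers(0, 2, size=n)
mu = 0.3
X = rng.standard_normal((n, d)) + np.where(labels[:, None] == 0, -mu, mu)

p, q = 0.05, 0.01  # intra-/inter-block edge probabilities (SBM)
prob = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1); A = A + A.T + np.eye(n)  # symmetrize, add self-loops

deg = A.sum(axis=1)
X_conv = (A @ X) / deg[:, None]             # one graph convolution: D^{-1} A X

# Within-class noise shrinks after convolution (roughly by 1/sqrt(E[deg])),
# which makes the two means easier to separate.
print(X[labels == 0].std(axis=0).mean(), X_conv[labels == 0].std(axis=0).mean())
```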

This paper explores formalizing Geometric (or Clifford) algebras into the Lean 3 theorem prover, building upon the substantial body of work that is the Lean mathematics library, mathlib. As we use Lean source code to demonstrate many of our ideas, we include a brief introduction to the Lean language targeted at a reader with no prior experience with Lean or theorem provers in general. We formalize the multivectors as the quotient of the tensor algebra by a suitable relation, which provides the ring structure automatically, then go on to establish the universal property of the Clifford algebra. We show that this is quite different from the approach taken by existing formalizations of Geometric algebra in other theorem provers; most notably, our approach does not require a choice of basis. We go on to show how operations and structure such as involutions, versors, and the $\mathbb{Z}_2$-grading can be defined using the universal property alone, and how to recover an induction principle from the universal property suitable for proving statements about these definitions. We outline the steps needed to formalize the wedge product and $\mathbb{N}$-grading, and some of the gaps in mathlib that currently make this challenging.
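
As a flavour of the approach, here is a minimal Lean 3 sketch against mathlib's `clifford_algebra` API as the abstract describes it; the import path and exact names (`ι_sq_scalar`, `lift`) may differ between mathlib versions, so treat this as schematic rather than verbatim source.

```lean
-- Schematic sketch of the basis-free universal property; names follow
-- mathlib 3's `clifford_algebra`, but may vary across mathlib versions.
import linear_algebra.clifford_algebra.basic

variables {R M A : Type*} [comm_ring R] [add_comm_group M] [module R M]
variables [ring A] [algebra R A] (Q : quadratic_form R M)

-- `ι Q` embeds the module `M` into the Clifford algebra, and its square is
-- the scalar given by the quadratic form:
example (m : M) :
  clifford_algebra.ι Q m * clifford_algebra.ι Q m
    = algebra_map R (clifford_algebra Q) (Q m) :=
clifford_algebra.ι_sq_scalar Q m

-- Universal property: a linear map `f : M →ₗ[R] A` squaring to `Q` lifts to
-- an algebra morphism out of the Clifford algebra, with no choice of basis.
example (f : M →ₗ[R] A) (hf : ∀ m, f m * f m = algebra_map R A (Q m)) :
  clifford_algebra Q →ₐ[R] A :=
clifford_algebra.lift Q ⟨f, hf⟩
```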

In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are particularly useful in clustering tasks where the number of clusters is unknown. Despite their potential and mathematical elegance, however, DPMMs have yet to become a mainstream tool widely adopted by practitioners. This is arguably due to a misconception that these models scale poorly, as well as the lack of high-performance (and user-friendly) software tools that can handle large datasets efficiently. In this paper we bridge this practical gap by proposing a new, easy-to-use, statistical software package for scalable DPMM inference. More concretely, we provide efficient and easily modifiable implementations for high-performance distributed sampling-based inference in DPMMs, where the user is free to choose between a multiple-machine, multiple-core CPU implementation (written in Julia) and a multiple-stream GPU implementation (written in CUDA/C++). Both the CPU and GPU implementations come with a common (and optional) Python wrapper, providing the user with a single point of entry with the same interface. On the algorithmic side, our implementations leverage a leading DPMM sampler from (Chang and Fisher III, 2013). While Chang and Fisher III's implementation (written in MATLAB/C++) used only CPUs and was designed for a single multi-core machine, the packages we propose here distribute the computations efficiently across either multiple multi-core machines or multiple GPU streams. This leads to speedups, alleviates memory and storage limitations, and lets us fit DPMMs to significantly larger and higher-dimensional datasets than was previously possible, either by (Chang and Fisher III, 2013) or by other DPMM methods.
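
The package's own API is not reproduced here; as background, the sketch below samples from the Chinese restaurant process, the clustering prior underlying a DPMM, which is what lets the number of clusters adapt to the data (`alpha` is the usual concentration parameter).

```python
# Chinese restaurant process: the DPMM prior over cluster assignments.
import numpy as np

def crp_assignments(n, alpha, rng):
    """Sample cluster labels for n points from a CRP(alpha) prior."""
    counts = []   # counts[k] = current size of cluster k
    labels = []
    for i in range(n):
        # Join cluster k with prob counts[k]/(i+alpha); new one with alpha/(i+alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return np.array(labels)

rng = np.random.default_rng(0)
labels = crp_assignments(1000, alpha=1.0, rng=rng)
print("number of clusters drawn from the prior:", labels.max() + 1)
```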

Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.
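
Since leverage scores are central to the techniques, a short reminder sketch: the $i$-th leverage score of an $n\times d$ matrix is the squared norm of the $i$-th row of any orthonormal basis for its column span, so geometrically extreme rows score near 1 (the example matrix is arbitrary).

```python
# Leverage scores via a thin QR factorization.
import numpy as np

def leverage_scores(A):
    """Leverage scores of the rows of A (A assumed full column rank)."""
    Q, _ = np.linalg.qr(A)        # Q: n x d with orthonormal columns
    return (Q ** 2).sum(axis=1)   # tau_i = ||e_i^T Q||^2, each in [0, 1]

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
A[0] *= 100.0                     # an extreme row gets leverage close to 1
tau = leverage_scores(A)
print(tau.sum(), tau[0])          # scores sum to d; the extreme row dominates
```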

SVD (singular value decomposition) is one of the basic tools of machine learning, allowing one to optimize a basis for a given matrix. However, sometimes we instead have a set of matrices $\{A_k\}_k$ and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$ such that the set of matrices $\{U^T A_k V\}$ is somehow simpler. For example, DCT-II is an orthonormal basis of functions commonly used in image/video compression; as discussed here, this kind of basis can be quickly and automatically optimized for a given dataset. While gradient-descent optimization, also discussed, can be computationally costly, we propose CSVD (common SVD), a fast general approach based on SVD. Specifically, we choose $U$ as built of eigenvectors of $\sum_k (w_k)^q (A_k A_k^T)^p$ and $V$ of eigenvectors of $\sum_k (w_k)^q (A_k^T A_k)^p$, where the $w_k$ are their weights and $p,q>0$ are some chosen powers, e.g. $1/2$, optionally with normalization, e.g. $A \to A - rc^T$ where $r_i=\sum_j A_{ij}$, $c_j =\sum_i A_{ij}$.
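
The stated construction translates directly into code; the sketch below follows the abstract's formulas with uniform weights and $p=1/2$, $q=1$, and omits the optional $A \to A - rc^T$ normalization.

```python
# CSVD sketch: common orthogonal bases from weighted sums of matrix powers.
import numpy as np

def psd_power(S, p):
    """Fractional power of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0.0, None)
    return (vecs * vals ** p) @ vecs.T

def csvd_basis(As, ws, p=0.5, q=1.0):
    """U from eigenvectors of sum_k w_k^q (A_k A_k^T)^p; V analogously."""
    SU = sum(w ** q * psd_power(A @ A.T, p) for A, w in zip(As, ws))
    SV = sum(w ** q * psd_power(A.T @ A, p) for A, w in zip(As, ws))
    _, U = np.linalg.eigh(SU)   # columns are orthonormal eigenvectors
    _, V = np.linalg.eigh(SV)
    return U, V

rng = np.random.default_rng(0)
As = [rng.standard_normal((8, 6)) for _ in range(4)]
U, V = csvd_basis(As, ws=[1.0] * 4)
print([(U.T @ A @ V).shape for A in As])  # the jointly transformed matrices
```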

Let $X^{(n)}$ be an observation sampled from a distribution $P_{\theta}^{(n)}$ with an unknown parameter $\theta,$ $\theta$ being a vector in a Banach space $E$ (most often, a high-dimensional space of dimension $d$). We study the problem of estimation of $f(\theta)$ for a functional $f:E\mapsto {\mathbb R}$ of some smoothness $s>0$ based on an observation $X^{(n)}\sim P_{\theta}^{(n)}.$ Assuming that there exists an estimator $\hat \theta_n=\hat \theta_n(X^{(n)})$ of parameter $\theta$ such that $\sqrt{n}(\hat \theta_n-\theta)$ is sufficiently close in distribution to a mean zero Gaussian random vector in $E,$ we construct a functional $g:E\mapsto {\mathbb R}$ such that $g(\hat \theta_n)$ is an asymptotically normal estimator of $f(\theta)$ with $\sqrt{n}$ rate provided that $s>\frac{1}{1-\alpha}$ and $d\leq n^{\alpha}$ for some $\alpha\in (0,1).$ We also derive general upper bounds on Orlicz norm error rates for estimator $g(\hat \theta)$ depending on smoothness $s,$ dimension $d,$ sample size $n$ and the accuracy of normal approximation of $\sqrt{n}(\hat \theta_n-\theta).$ In particular, this approach yields asymptotically efficient estimators in some high-dimensional exponential models.
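
The paper's correction $g$ is built analytically from the smoothness assumptions and is not reproduced here; purely to convey why plug-in estimation fails in high dimensions, the toy sketch below debiases $f(\theta)=\|\theta\|^2$ in a Gaussian shift model with a one-step parametric-bootstrap-style correction (all sizes are arbitrary).

```python
# Plug-in bias in high dimensions, and a one-step Monte Carlo correction.
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 2000
theta = np.ones(d) / np.sqrt(d)
f = lambda t: float(t @ t)              # a smooth functional, f(theta) = 1

X = theta + rng.standard_normal((n, d))         # X_i ~ N(theta, I_d)
theta_hat = X.mean(axis=0)                      # sqrt(n)-consistent estimator

# The plug-in estimator is biased upward by E||theta_hat - theta||^2 = d/n.
plug_in = f(theta_hat)
# One-step correction: estimate the bias by resampling around theta_hat.
boot = np.array([f(theta_hat + rng.standard_normal(d) / np.sqrt(n))
                 for _ in range(500)])
corrected = plug_in - (boot.mean() - plug_in)

print(plug_in, corrected, f(theta))     # corrected is closer to f(theta) = 1
```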

A high-dimensional and sparse (HiDS) matrix is frequently encountered in big-data applications such as e-commerce systems and social network services. Performing highly accurate representation learning on it is of great significance, owing to the desire to extract latent knowledge and patterns from it. Latent factor analysis (LFA), which represents an HiDS matrix by learning low-rank embeddings based on its observed entries only, is one of the most effective and efficient approaches to this issue. However, most existing LFA-based models perform such embeddings on an HiDS matrix directly, without exploiting its hidden graph structures, thereby resulting in accuracy loss. To address this issue, this paper proposes a graph-incorporated latent factor analysis (GLFA) model. It adopts two-fold ideas: 1) a graph is constructed for identifying the hidden high-order interaction (HOI) among nodes described by an HiDS matrix, and 2) a recurrent LFA structure is carefully designed with the incorporation of HOI, thereby improving the representation learning ability of the resultant model. Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix, which evidently supports its strong representation learning ability on HiDS data.
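
For readers unfamiliar with LFA, the sketch below is plain stochastic-gradient LFA on observed entries only, without the paper's graph/HOI component; the density, rank, and step sizes are toy choices.

```python
# Plain LFA: factorize a partially observed matrix using observed entries only.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 120, 5
P_true = rng.standard_normal((m, r))
Q_true = rng.standard_normal((n, r))
R = P_true @ Q_true.T
mask = rng.random((m, n)) < 0.3           # fraction of observed entries (toy)
obs = np.argwhere(mask)

P = 0.1 * rng.standard_normal((m, r))
Q = 0.1 * rng.standard_normal((n, r))
lr, lam = 0.01, 0.02                      # step size, L2 regularization
for epoch in range(50):
    rng.shuffle(obs)                      # SGD sweep over observed entries
    for i, j in obs:
        e = R[i, j] - P[i] @ Q[j]         # residual on one observed entry
        P[i], Q[j] = (P[i] + lr * (e * Q[j] - lam * P[i]),
                      Q[j] + lr * (e * P[i] - lam * Q[j]))

rmse = np.sqrt(np.mean((R[~mask] - (P @ Q.T)[~mask]) ** 2))
print("RMSE on unobserved entries:", rmse)
```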

The lossless compression of a single source $X^n$ was recently shown to be achievable with a notion of strong locality: any $X_i$ can be decoded from a \emph{constant} number of compressed bits, with a probability of error vanishing in $n$. In contrast with the single-source setup, we show that for two separately encoded sources $(X^n,Y^n)$, lossless compression with strong locality is generally not possible. More precisely, we show that for the class of "confusable" sources, strong locality cannot be achieved whenever one of the sources is compressed below its entropy. In this case, irrespective of $n$, the probability of error of decoding any $(X_i,Y_i)$ is lower bounded by $2^{-O(d_{\mathrm{loc}})}$, where $d_{\mathrm{loc}}$ denotes the number of compressed bits accessed by the local decoder. Conversely, if the sources are not confusable, strong locality is possible even if one of the sources is compressed below its entropy. The results extend to any number of sources.

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
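
A hedged sketch of a NetMF-style pipeline for a small dense graph, following the closed-form DeepWalk matrix the abstract alludes to: build $\log\max\big(\frac{\mathrm{vol}(G)}{bT}\big(\sum_{r=1}^{T}P^{r}\big)D^{-1},\,1\big)$ with $P=D^{-1}A$, then take a rank-$d$ SVD; the window size `T` and negative-sample count `b` are illustrative.

```python
# NetMF-style embedding: truncated-log DeepWalk matrix + rank-d SVD.
import numpy as np

def netmf_embed(A, dim=2, T=10, b=1.0):
    """Dense NetMF-style embedding for a small graph with adjacency A."""
    vol = A.sum()
    d = A.sum(axis=1)
    P = A / d[:, None]                       # random-walk transitions D^{-1} A
    S = np.zeros_like(P)
    Pr = np.eye(len(A))
    for _ in range(T):                       # sum_{r=1}^{T} P^r
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * S / d[None, :]     # ... times D^{-1}
    M = np.log(np.maximum(M, 1.0))           # elementwise truncated log
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])     # embedding U_d sqrt(Sigma_d)

rng = np.random.default_rng(0)
A = (rng.random((30, 30)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T               # random undirected graph
A += np.eye(30) * (A.sum(axis=1) == 0)       # self-loop for isolated nodes
print(netmf_embed(A).shape)
```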
