
The correlation measure of order $k$ is an important measure of randomness in binary sequences; it looks for dependence among several shifted versions of a sequence. We study the relation between the correlation measure of order $k$ and two other pseudorandomness measures: the $N$th linear complexity and the $N$th maximum order complexity. We simplify and improve several state-of-the-art lower bounds for these two measures using the Hamming bound, as well as weaker bounds derived from it.
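A minimal brute-force sketch of the correlation measure of order $k$ for a $\pm 1$ sequence, in the usual Mauduit--Sárközy sense $C_k(E_N) = \max_{M,D} \big| \sum_{n=1}^{M} e_{n+d_1} \cdots e_{n+d_k} \big|$. The function name and the example sequence are illustrative, and the search is exponential, so this is only for tiny instances:

```python
import itertools

def correlation_measure(seq, k):
    """Brute-force correlation measure of order k for a +/-1 sequence.
    Enumerates all lag tuples d_1 < ... < d_k and all window lengths M;
    exponential cost, intended only for tiny examples."""
    n = len(seq)
    best = 0
    for lags in itertools.combinations(range(n), k):
        total = 0
        for m in range(n - lags[-1]):       # window lengths M = 1..n-d_k
            prod = 1
            for d in lags:
                prod *= seq[m + d]
            total += prod
            best = max(best, abs(total))    # max over all M and all D
    return best

# Example: a short +/-1 sequence (e.g., from coin flips)
seq = [1, -1, -1, 1, -1, 1, 1, -1, 1, -1]
print(correlation_measure(seq, 2))
```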

Related Content

Conventional information-theoretic quantities assume access to probability distributions, and estimating such distributions is not trivial. Here, we consider function-based formulations of cross-entropy that sidestep this a priori estimation requirement. We propose three measures of R\'enyi's $\alpha$-cross-entropies in the setting of reproducing-kernel Hilbert spaces, each with its own appeal. We prove that we can estimate these measures in an unbiased, non-parametric, and minimax-optimal way via sample-constructed Gram matrices, yielding matrix-based estimators of R\'enyi's $\alpha$-cross-entropies. These estimators satisfy all of the axioms that R\'enyi established for divergences. Our cross-entropies can thus be used for assessing distributional differences. They are also appropriate for handling high-dimensional distributions, since the convergence rate of our estimators is independent of the sample dimensionality. Python code implementing these measures can be found at //github.com/isledge/MBRCE
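The Gram-matrix machinery the abstract relies on can be sketched briefly. The snippet below computes the standard matrix-based R\'enyi $\alpha$-entropy of a trace-normalized Gram matrix, the building block for such estimators; it is not the paper's exact $\alpha$-cross-entropy definitions, and the Gaussian kernel bandwidth is an arbitrary choice:

```python
import numpy as np

def gram(X, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix of the sample X (n x d)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-d2 / (2 * sigma**2))
    return K / np.trace(K)

def matrix_renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi alpha-entropy: (1/(1-alpha)) * log2 tr(A^alpha),
    computed from the eigenvalues of the normalized Gram matrix A."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

X = np.random.default_rng(0).normal(size=(100, 5))
print(matrix_renyi_entropy(gram(X), alpha=2.0))
```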

The prospect of achieving quantum advantage with Quantum Neural Networks (QNNs) is exciting. Understanding how QNN properties (e.g., the number of parameters $M$) affect the loss landscape is crucial to the design of scalable QNN architectures. Here, we rigorously analyze the overparametrization phenomenon in QNNs with periodic structure. We define overparametrization as the regime where the QNN has more than a critical number of parameters $M_c$ that allows it to explore all relevant directions in state space. Our main results show that the dimension of the Lie algebra obtained from the generators of the QNN is an upper bound for $M_c$, and for the maximal rank that the quantum Fisher information and Hessian matrices can reach. Underparametrized QNNs have spurious local minima in the loss landscape that start disappearing when $M\geq M_c$. Thus, the overparametrization onset corresponds to a computational phase transition where the QNN trainability is greatly improved by a more favorable landscape. We then connect the notion of overparametrization to the QNN capacity, so that when a QNN is overparametrized, its capacity achieves its maximum possible value. We run numerical simulations for eigensolver, compilation, and autoencoding applications to showcase the overparametrization computational phase transition. We note that our results also apply to variational quantum algorithms and quantum optimal control.
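As a rough illustration of the quantity that bounds $M_c$, the sketch below estimates the dimension of the Lie algebra generated by a set of skew-Hermitian generators by repeatedly closing their span under commutators. Dimension is counted over the complex span (adequate for Pauli-type generators); the function names and tolerances are assumptions of this sketch:

```python
import itertools
import numpy as np

def span_basis(mats, tol=1e-10):
    """Orthonormal basis (via SVD) of the complex span of the matrices."""
    V = np.array([m.flatten() for m in mats])
    _, s, vh = np.linalg.svd(V, full_matrices=False)
    shape = mats[0].shape
    return [vh[i].reshape(shape) for i in range(int(np.sum(s > tol)))]

def lie_algebra_dimension(generators, tol=1e-10, max_rounds=20):
    """Close the span of the generators under commutators until stable."""
    basis = span_basis(generators, tol)
    for _ in range(max_rounds):
        brackets = [a @ b - b @ a for a, b in itertools.combinations(basis, 2)]
        enlarged = span_basis(basis + brackets, tol)
        if len(enlarged) == len(basis):
            break
        basis = enlarged
    return len(basis)

# Example: iX and iZ on one qubit generate su(2), so the dimension is 3.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
print(lie_algebra_dimension([1j * X, 1j * Z]))  # -> 3
```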

Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the direction of steepest ascent in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Use of natural gradients in optimization can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. The contribution of this article is twofold. First, we derive the natural gradient updates of a Gaussian variational approximation in terms of the mean and the Cholesky factor of the covariance matrix, and show that these updates depend only on the first derivative of the variational objective function. Second, we derive complete natural gradient updates for structured variational approximations with a minimal conditional exponential family representation, which include highly flexible mixtures of exponential family distributions that can fit skewed or multimodal posteriors. These updates, albeit more complex than those presented previously, account fully for the dependence between the mixing distribution and the distributions of the components. Further experiments will be carried out to evaluate the performance of the proposed methods.
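To illustrate the mean-block case with a standard identity (this is not the paper's full joint update for the mean and Cholesky factor): the Fisher information of $N(\mu, \Sigma)$ with respect to $\mu$ is $\Sigma^{-1}$, so the natural gradient of the mean is simply $\Sigma$ times the Euclidean gradient, and no inversion is needed when $\Sigma = LL^\top$ is stored through its Cholesky factor:

```python
import numpy as np

def natural_grad_mean_step(mu, L, euclid_grad_mu, rho=0.05):
    """One natural-gradient ascent step for the mean of a Gaussian
    variational approximation q = N(mu, Sigma), with Sigma = L @ L.T.
    The natural gradient for the mean block is Sigma @ grad, computed
    in O(d^2) from the Cholesky factor without any matrix inversion."""
    nat_grad = L @ (L.T @ euclid_grad_mu)
    return mu + rho * nat_grad

# Toy usage with an assumed Euclidean gradient of the variational objective
mu = np.zeros(3)
L = np.eye(3)
g = np.array([1.0, -0.5, 0.2])
print(natural_grad_mean_step(mu, L, g))
```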

Order effects occur when judgments about a hypothesis's probability given a sequence of information do not equal the probability of the same hypothesis when the order of the information is reversed. Several experiments in the literature provide evidence of order effects. We propose a Bayesian update model for order effects in which each question can be thought of as a mini-experiment in which the respondents reflect on their beliefs. We show that order effects appear and that they have a simple cognitive explanation: the respondent's prior belief that two questions are correlated. The proposed Bayesian model allows us to make several predictions: (1) we find certain conditions on the priors that limit the existence of order effects; (2) we show that, for our model, the QQ equality is not necessarily satisfied (due to symmetry assumptions); and (3) the proposed Bayesian model has the advantage of possessing fewer parameters than its quantum counterpart.
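A hypothetical toy, not the paper's exact model, shows how a correlated prior can generate order effects when answering acts like a measurement: here the respondent reports each answer through a noisy channel and then conditions on the reported answer as if it were a perfect observation of the latent attitude. The prior, the noise level, and the update rule are all assumptions made for illustration:

```python
import numpy as np

# Joint prior over two binary attitudes (A, B); the off-diagonal mass
# encodes the respondent's belief that the two questions are correlated.
prior = np.array([[0.4, 0.2],    # P(A=0,B=0), P(A=0,B=1)
                  [0.1, 0.3]])   # P(A=1,B=0), P(A=1,B=1)
NOISE = 0.2                      # probability of mis-reporting an answer

def p_yes(belief):
    """Probability of reporting 'yes' given belief P(attitude = 1)."""
    return (1 - NOISE) * belief + NOISE * (1 - belief)

def joint_yes_yes(joint, first_axis):
    """P(report 'yes' to both questions) for a given question order.
    After the first 'yes' the respondent over-conditions, treating the
    noisy report as a perfect observation of the latent attitude."""
    m_first = joint.sum(axis=1 - first_axis)[1]   # P(first attitude = 1)
    cond = joint.take(1, axis=first_axis)         # slice with first = 1
    m_second = cond[1] / cond.sum()               # P(second = 1 | first = 1)
    return p_yes(m_first) * p_yes(m_second)

print("A then B:", joint_yes_yes(prior, first_axis=0))  # ~0.286
print("B then A:", joint_yes_yes(prior, first_axis=1))  # ~0.280
```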

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes, and shapes. Yet one weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown, and there might be several potential sub-optimal options that a user may wish to choose from depending on the application. We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the centroid distance of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted in order to compare the proposed validity indices with several well-known ones.
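The core construction can be sketched as follows: score a clustering by the Pearson correlation between each pair's point-to-point distance and the distance between the centroids of the clusters containing the two points. This is a simplified stand-in for the proposed indices; the function name and the k-means pipeline are illustrative:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

def centroid_correlation(X, labels):
    """Correlation between pairwise point distances and the distances
    between the centroids of the clusters each pair falls into."""
    centroids = np.array([X[labels == c].mean(axis=0)
                          for c in np.unique(labels)])
    i, j = np.triu_indices(len(X), k=1)          # all unordered pairs
    point_d = np.linalg.norm(X[i] - X[j], axis=1)
    cent_d = np.linalg.norm(centroids[labels[i]] - centroids[labels[j]],
                            axis=1)
    return pearsonr(point_d, cent_d)[0]

X = load_iris().data
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(centroid_correlation(X, labels), 3))
```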

We consider the power of local algorithms for approximately solving Max $k$XOR, a generalization of two constraint satisfaction problems previously studied with classical and quantum algorithms (MaxCut and Max E3LIN2). On instances with either random signs or no overlapping clauses and $D+1$ clauses per variable, we calculate the average satisfying fraction of the depth-1 QAOA and compare with a generalization of the local threshold algorithm. Notably, the quantum algorithm outperforms the threshold algorithm for $k > 4$. On the other hand, we highlight potential difficulties for the QAOA to achieve computational quantum advantage on this problem. We first compute a tight upper bound on the maximum satisfying fraction of nearly all large random regular Max $k$XOR instances by numerically calculating the ground-state energy density $P(k)$ of a mean-field $k$-spin glass. The upper bound grows with $k$ much faster than the performance of both one-local algorithms. We also identify a new obstruction result for low-depth quantum circuits (including the QAOA) when $k=3$, generalizing a result of Bravyi et al. [arXiv:1910.08980] for $k=2$. We conjecture that a similar obstruction exists for all $k$.
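For orientation, a Max $k$XOR instance is simply a set of parity constraints, each fixing the XOR of $k$ variables to a target bit. The toy sketch below brute-forces the maximum satisfying fraction of a small random $k=3$ instance; it is illustrative only and far from the large, regular regime that the bounds above concern:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 10, 30, 3
# Each clause: k distinct variables and a target parity bit
# (the random target plays the role of a random sign).
clauses = [(rng.choice(n, size=k, replace=False), rng.integers(2))
           for _ in range(m)]

best = 0
for bits in itertools.product([0, 1], repeat=n):   # all 2^n assignments
    sat = sum((sum(bits[v] for v in vars_) % 2) == target
              for vars_, target in clauses)
    best = max(best, sat)
print("max satisfying fraction:", best / m)
```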

We study the theoretical properties of a variational Bayes method in the Gaussian Process regression model. We consider the inducing variables method introduced by Titsias (2009a) and derive sufficient conditions for obtaining contraction rates for the corresponding variational Bayes (VB) posterior. As examples we show that for three particular covariance kernels (Mat\'ern, squared exponential, random series prior) the VB approach can achieve optimal, minimax contraction rates for a sufficiently large number of appropriately chosen inducing variables. The theoretical findings are demonstrated by numerical experiments.
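A minimal numpy sketch of the Titsias-style variational predictive mean, $\mu_* = \sigma^{-2} K_{*m}\,(K_{mm} + \sigma^{-2} K_{mn} K_{nm})^{-1} K_{mn} y$, with a squared-exponential kernel, zero mean, and no jitter handling; the kernel, data, and inducing locations are illustrative choices:

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def titsias_posterior_mean(X, y, Z, Xstar, noise=0.1):
    """Predictive mean of the Titsias (2009) variational sparse GP,
    using m inducing inputs Z; cost is O(n m^2) instead of O(n^3)."""
    Kmm = rbf(Z, Z)
    Kmn = rbf(Z, X)
    Ksm = rbf(Xstar, Z)
    Sigma = Kmm + Kmn @ Kmn.T / noise**2
    return Ksm @ np.linalg.solve(Sigma, Kmn @ y) / noise**2

# Toy 1-D regression with m = 5 inducing points
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.default_rng(0).standard_normal(50)
Z = np.linspace(0, 1, 5)[:, None]
print(titsias_posterior_mean(X, y, Z, Z)[:3])
```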

Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers; it consists of exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding, a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. Our main theoretical contribution is a connection between positional encoding and the cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
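The flavour of the construction can be sketched with random Fourier features: draw stochastic positional features whose expected inner product depends only on the lag $i - j$, so attention scores built from them behave like a stationary (i.e., relative) kernel. This is a simplified illustration of the Gaussian-process connection, not the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256                                   # number of random features
T = 128                                   # sequence length
omega = rng.normal(scale=0.3, size=d)     # spectral (frequency) samples
phase = rng.uniform(0, 2 * np.pi, size=d)

pos = np.arange(T)[:, None]
# E[Q[i] . Q[j]] = E[cos(omega * (i - j))] = exp(-0.5 * (0.3 * (i - j))^2),
# a stationary kernel of the lag only.
Q = np.sqrt(2 / d) * np.cos(pos * omega + phase)   # (T, d) features

approx = Q @ Q.T                          # approximates K(i - j)
print(approx[0, :5])                      # decays with lag, not position
```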

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
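The key property exploited here is that the CIR process $\mathrm{d}x = a(b - x)\,\mathrm{d}t + \sigma\sqrt{x}\,\mathrm{d}W$ has a transition law known in closed form, a scaled noncentral chi-squared, so it can be simulated exactly with no discretisation error. A minimal sketch with arbitrarily chosen parameters (not necessarily the paper's parameterisation):

```python
import numpy as np

def cir_exact_step(x, dt, a=1.0, b=1.0, sigma=1.0, rng=None):
    """Exact one-step transition of the CIR process: the next state is
    c * noncentral-chi-squared(df, nc), so no Euler discretisation is
    needed and the state stays strictly positive."""
    if rng is None:
        rng = np.random.default_rng()
    c = sigma**2 * (1 - np.exp(-a * dt)) / (4 * a)
    df = 4 * a * b / sigma**2                 # degrees of freedom
    nc = x * np.exp(-a * dt) / c              # non-centrality parameter
    return c * rng.noncentral_chisquare(df, nc)

rng = np.random.default_rng(0)
x = 0.01                      # start near the boundary (sparse component)
for _ in range(1000):
    x = cir_exact_step(x, dt=0.1, rng=rng)
print(x)                      # remains positive; no discretisation bias
```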

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
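The DeepWalk closed form can be sketched directly: with window size $T$ and $b$ negative samples, DeepWalk implicitly factorizes $\log\big(\frac{\mathrm{vol}(G)}{bT}\sum_{r=1}^{T}(D^{-1}A)^{r}D^{-1}\big)$ (entrywise, truncated at zero), and embeddings come from a truncated SVD of this matrix. The dense implementation below is a sketch that is only practical for small graphs:

```python
import numpy as np

def netmf_embed(A, dim=2, window=10, neg=1):
    """NetMF-style embedding from the closed-form DeepWalk matrix
    (dense computation; for small adjacency matrices only)."""
    vol = A.sum()
    d_inv = 1.0 / A.sum(axis=1)
    P = d_inv[:, None] * A                    # random-walk matrix D^-1 A
    S, Pr = np.zeros_like(A), np.eye(len(A))
    for _ in range(window):                   # S = sum_{r=1..T} P^r
        Pr = Pr @ P
        S += Pr
    M = (vol / (neg * window)) * S * d_inv[None, :]   # ... times D^-1
    M = np.log(np.maximum(M, 1.0))            # entrywise truncated log
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])      # rank-d factorization

# Tiny example: adjacency matrix of a 4-cycle
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(netmf_embed(A))
```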
