
Available in the literature are properties which characterize the gamma distribution via the independence of two appropriately chosen statistics. Well known is the classical result in which one of the statistics is the sample mean and the other the sample coefficient of variation. In this paper, we elaborate on a version of Anosov's theorem which allows us to establish a general result, Theorem 1, and a series of seven corollaries providing new characterization results for gamma distributions. We keep the sample mean as one of the involved statistics, while the second one can now be taken from a quite large class of homogeneous feasible definite statistics. It is relevant to mention that there is an interesting parallel between the new characterization results for gamma distributions and recent characterization results for the normal distribution.
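
As a quick numerical illustration of the classical characterization mentioned above (not of the paper's construction), one can simulate repeated gamma samples and observe that the sample mean and the sample coefficient of variation are empirically uncorrelated; the shape, scale, and sample sizes below are arbitrary illustrative choices.

```python
# Minimal sketch: empirical (near-)independence of the sample mean and the
# sample coefficient of variation for gamma data. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
shape, scale, n, reps = 2.0, 3.0, 50, 20_000

samples = rng.gamma(shape, scale, size=(reps, n))
means = samples.mean(axis=1)
cvs = samples.std(axis=1, ddof=1) / means  # sample coefficient of variation

# For gamma data the two statistics are independent, so their empirical
# correlation should be close to zero (up to Monte Carlo error).
print("corr(mean, CV) =", np.corrcoef(means, cvs)[0, 1])
```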

Related content

In this paper we obtain quantitative Bernstein-von Mises type bounds on the normal approximation of the posterior distribution in exponential family models when centering either around the posterior mode or around the maximum likelihood estimator. Our bounds, obtained through a version of Stein's method, are non-asymptotic and data-dependent; they are of the correct order in both the total variation and Wasserstein distances, as well as for approximations of expectations of smooth functions of the posterior. All our results are valid for univariate and multivariate posteriors alike, and do not require a conjugate prior setting. We illustrate our findings on a variety of exponential family distributions, including the Poisson, the multinomial, and the normal distribution with unknown mean and variance. The resulting bounds have an explicit dependence on the prior distribution and on the sufficient statistics of the sample, and thus provide insight into how these factors may affect the quality of the normal approximation. The performance of the bounds is also assessed with simulations.
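
The following sketch is only an illustration of the phenomenon being quantified, not of the paper's bounds: it numerically compares a Poisson-rate posterior (under an arbitrary non-conjugate half-normal prior) with the normal approximation centred at the MLE, and reports their total variation distance on a grid.

```python
# Minimal sketch (not the paper's bounds): numerically compare a Poisson-rate
# posterior with its normal approximation centred at the MLE. The prior and
# data below are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_rate, n = 4.0, 200
x = rng.poisson(true_rate, size=n)

lam = np.linspace(1e-3, 10, 4000)
log_prior = stats.halfnorm(scale=10).logpdf(lam)        # non-conjugate prior
log_lik = x.sum() * np.log(lam) - n * lam               # Poisson log-likelihood (up to a constant)
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, lam)                             # normalize on the grid

mle = x.mean()
normal_approx = stats.norm(loc=mle, scale=np.sqrt(mle / n)).pdf(lam)  # variance = 1/observed info

tv = 0.5 * np.trapz(np.abs(post - normal_approx), lam)  # total variation distance on the grid
print("TV(posterior, normal approx) ~", round(tv, 4))
```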

Using information-theoretic principles, we consider the generalization error (gen-error) of iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount of unlabelled data to progressively refine the model parameters. In contrast to most previous works that {\em bound} the gen-error, we provide an {\em exact} expression for the gen-error and particularize it to the binary Gaussian mixture model. Our theoretical results suggest that when the class conditional variances are not too large, the gen-error decreases with the number of iterations, but quickly saturates. On the flip side, if the class conditional variances (and hence the amount of overlap between the classes) are large, the gen-error increases with the number of iterations. To mitigate this undesirable effect, we show that regularization can reduce the gen-error. The theoretical results are corroborated by extensive experiments on the MNIST and CIFAR datasets, in which we observe that for easy-to-distinguish classes the gen-error improves after several pseudo-labelling iterations but saturates afterwards, and that for more difficult-to-distinguish classes regularization improves the generalization performance.
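
To make the iterative pseudo-labelling scheme concrete, here is a toy sketch on a one-dimensional two-class Gaussian mixture (illustrative only; it is not the paper's analysis or exact setting): a small labelled set fits initial class means, unlabelled points are pseudo-labelled by the nearer mean, and the means are refit over a few iterations.

```python
# Minimal sketch (illustrative, not the paper's analysis): iterative
# pseudo-labelling on a 1-D two-class Gaussian mixture. All constants are
# arbitrary choices; a larger sigma means more class overlap.
import numpy as np

rng = np.random.default_rng(2)
mu0, mu1, sigma = -1.0, 1.0, 1.5
n_lab, n_unlab = 20, 2000

y_lab = rng.integers(0, 2, n_lab)
x_lab = rng.normal(np.where(y_lab == 1, mu1, mu0), sigma)
y_unlab = rng.integers(0, 2, n_unlab)      # hidden labels, used only to report error
x_unlab = rng.normal(np.where(y_unlab == 1, mu1, mu0), sigma)

# Initial class means fitted on labelled data only.
m0, m1 = x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()
for it in range(5):
    # Pseudo-label each unlabelled point by the nearer class mean.
    pseudo = (np.abs(x_unlab - m1) < np.abs(x_unlab - m0)).astype(int)
    # Refit the class means on labelled + pseudo-labelled data.
    x_all = np.concatenate([x_lab, x_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    m0, m1 = x_all[y_all == 0].mean(), x_all[y_all == 1].mean()
    print(f"iteration {it}: unlabelled error = {np.mean(pseudo != y_unlab):.3f}")
```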

The asymptotic study of the partition function $p(n)$ began with the work of Hardy and Ramanujan. Later, Rademacher obtained a convergent series for $p(n)$, and an error bound was given by Lehmer. Despite this, a full asymptotic expansion for $p(n)$ with an explicit error bound is not known. Recently, O'Sullivan studied the asymptotic expansion of $p^{k}(n)$, the number of partitions into $k$th powers, a problem initiated by Wright, and consequently obtained an asymptotic expansion for $p(n)$ along with a concise description of the coefficients involved in the expansion, but without any estimation of the error term. Here we carry out a detailed and comprehensive analysis of the error term obtained by truncating the asymptotic expansion for $p(n)$ at any positive integer $n$. This gives rise to an infinite family of inequalities for $p(n)$ which finally answers a question proposed by Chen. Our error term estimation relies predominantly on applications of algorithmic methods from symbolic summation.
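
For context, the leading-order Hardy-Ramanujan asymptotic that the expansions above refine is
\[
p(n) \sim \frac{1}{4n\sqrt{3}}\, e^{\pi\sqrt{2n/3}}, \qquad n \to \infty .
\]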

A goodness-of-fit test for one-parameter count distributions with finite second moment is proposed. The test statistic is derived from the $L_1$-distance between a function of the probability generating function of the model under the null hypothesis and that of the random variable actually generating the data, when the latter belongs to a suitably wide class of alternatives. The test statistic has a rather simple form and is asymptotically normally distributed under the null hypothesis, allowing a straightforward implementation of the test. Moreover, the test is consistent for alternative distributions belonging to the class, as well as for all alternative distributions whose probability of zero differs from that under the null hypothesis. Thus, the use of the test is also proposed and investigated for alternatives not in the class. The finite-sample properties of the test are assessed by means of an extensive simulation study.
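
The sketch below is not the proposed statistic; it only illustrates the underlying idea of comparing an empirical probability generating function with the pgf implied by a fitted null model (Poisson here) through an $L_1$-type discrepancy on $[0,1]$. The data, grid, and null model are arbitrary illustrative choices.

```python
# Minimal sketch of a pgf-based discrepancy (illustrative; not the proposed
# test statistic): compare the empirical pgf with the pgf of a fitted Poisson
# model via an L1-type distance over t in [0, 1].
import numpy as np

rng = np.random.default_rng(3)
x = rng.poisson(2.5, size=300)          # observed counts (try another count model as an alternative)

t = np.linspace(0.0, 1.0, 201)
emp_pgf = np.mean(t[:, None] ** x[None, :], axis=1)   # (1/n) * sum_i t^{X_i}
lam_hat = x.mean()
null_pgf = np.exp(lam_hat * (t - 1.0))                # Poisson pgf with the estimated rate

l1_discrepancy = np.trapz(np.abs(emp_pgf - null_pgf), t)
print("L1 discrepancy between empirical and fitted Poisson pgf:", round(l1_discrepancy, 5))
```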

A consistent test based on the probability generating function is proposed for assessing Poissonity against a wide class of count distributions, which includes some of the most frequently adopted alternatives to the Poisson distribution. The statistic, in addition to having an intuitive and simple form, is asymptotically normally distributed, allowing a straightforward implementation of the test. The finite-sample properties of the test are investigated by means of an extensive simulation study. The test shows satisfactory behaviour compared to other tests with a known limit distribution.

We obtain new closed-form formulas for the moments and absolute moments of the variance-gamma distribution. We thus deduce new formulas for the moments and absolute moments of the product of two correlated zero mean normal random variables.
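
The closed-form expressions are in the paper; as a simple sanity check of the setting (not of the paper's formulas), one can verify two low-order moments of the product of correlated standard normals by Monte Carlo: by Isserlis' theorem, $E[XY]=\rho$ and $E[(XY)^2]=1+2\rho^2$.

```python
# Minimal sketch: Monte Carlo check of two low-order moments of the product of
# correlated standard normal variables (via Isserlis' theorem), for illustration only.
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.6, 2_000_000
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
prod = x * y

print("E[XY]     :", prod.mean(), "   (theory:", rho, ")")
print("E[(XY)^2] :", (prod**2).mean(), "(theory:", 1 + 2 * rho**2, ")")
```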

The ultimate purpose of the statistical analysis of ordinal patterns is to characterize the distribution of the features they induce. In particular, knowing the joint distribution of the pair Entropy-Statistical Complexity for a large class of time series models would allow statistical tests that are unavailable to date. Working in this direction, we characterize the asymptotic distribution of the empirical Shannon's Entropy for any model under which the true normalized Entropy is neither zero nor one. We obtain the asymptotic distribution from the Central Limit Theorem (assuming large time series), the Multivariate Delta Method, and a third-order correction of its mean value. We discuss the applicability of other results (exact, first-, and second-order corrections) regarding their accuracy and numerical stability. Within a general framework for building test statistics about Shannon's Entropy, we present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon's Entropy. We apply this bilateral test to the daily maximum temperature time series from three cities (Dublin, Edinburgh, and Miami) and obtain sensible results.
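
As background on the quantity being studied (not on the paper's asymptotic machinery), the empirical normalized Shannon entropy of ordinal (Bandt-Pompe) patterns can be computed as in the sketch below; the embedding dimension and the two illustrative series are arbitrary choices.

```python
# Minimal sketch: empirical normalized Shannon entropy of ordinal (Bandt-Pompe)
# patterns of a time series. Embedding dimension and data are illustrative.
import math
from collections import Counter
import numpy as np

def ordinal_pattern_entropy(series, d=4):
    """Normalized Shannon entropy of ordinal patterns of embedding dimension d."""
    patterns = [tuple(np.argsort(series[i:i + d])) for i in range(len(series) - d + 1)]
    counts = Counter(patterns)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(d))   # normalize by log(d!)

rng = np.random.default_rng(5)
white_noise = rng.normal(size=5000)           # entropy close to one
random_walk = np.cumsum(rng.normal(size=5000))  # strongly correlated, lower entropy

print("entropy, white noise:", round(ordinal_pattern_entropy(white_noise), 3))
print("entropy, random walk:", round(ordinal_pattern_entropy(random_walk), 3))
```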

Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
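
To make the kind of invariance concrete, here is an illustrative toy (not the paper's construction): in an over-parametrised linear model $f(x) = (w_1 + w_2)x$, the likelihood depends on the parameters only through $w_1 + w_2$, so it is invariant under the translation $(w_1, w_2) \mapsto (w_1 + c, w_2 - c)$; a factorized (mean-field) Gaussian over $(w_1, w_2)$ cannot capture the resulting ridge-shaped posterior.

```python
# Minimal sketch: a translation invariance in an over-parametrised linear model.
# The Gaussian likelihood depends on (w1, w2) only through w1 + w2, so shifting
# along (c, -c) leaves it unchanged. All values below are arbitrary illustrations.
import numpy as np

def log_likelihood(w1, w2, x, y, noise_std=0.5):
    residual = y - (w1 + w2) * x
    return -0.5 * np.sum((residual / noise_std) ** 2)

x = np.array([1.0, 2.0, -0.5])
y = np.array([2.1, 3.9, -1.0])

base = log_likelihood(1.5, 0.5, x, y)
for c in [0.0, 1.0, -3.0, 10.0]:
    shifted = log_likelihood(1.5 + c, 0.5 - c, x, y)
    print(f"c = {c:5.1f}: log-likelihood difference = {shifted - base:.2e}")
```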

We show that under minimal assumptions on a random vector $X\in\mathbb{R}^d$, and with high probability, given $m$ independent copies of $X$, the coordinate distribution of each vector $(\langle X_i,\theta \rangle)_{i=1}^m$ is dictated by the distribution of the true marginal $\langle X,\theta \rangle$. Formally, we show that with high probability, \[\sup_{\theta \in S^{d-1}} \left( \frac{1}{m}\sum_{i=1}^m \left|\langle X_i,\theta \rangle^\sharp - \lambda^\theta_i \right|^2 \right)^{1/2} \leq c \left( \frac{d}{m} \right)^{1/4},\] where $\lambda^{\theta}_i = m\int_{(\frac{i-1}{m}, \frac{i}{m}]} F_{ \langle X,\theta \rangle }^{-1}(u)^2 \,du$ and $a^\sharp$ denotes the monotone non-decreasing rearrangement of $a$. The proof follows from the optimal estimate on the worst Wasserstein distance between a marginal of $X$ and its empirical counterpart, $\frac{1}{m} \sum_{i=1}^m \delta_{\langle X_i, \theta \rangle}$. We then use the accurate information on the structures of the vectors $(\langle X_i,\theta \rangle)_{i=1}^m$ to construct the first non-gaussian ensemble that yields the optimal estimate in the Dvoretzky-Milman Theorem: the ensemble exhibits almost Euclidean sections in arbitrary normed spaces of the same dimension as the gaussian embedding -- despite being very far from gaussian (in fact, it happens to be heavy-tailed).
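
As a small numerical illustration of the object driving the proof (not of the theorem itself, which requires uniformity over all directions), one can measure the Wasserstein distance between the empirical distribution of a single marginal $(\langle X_i,\theta \rangle)_{i=1}^m$ and the true marginal; for a standard Gaussian $X$ the true marginal is standard normal, and the direction $\theta$ and dimensions below are arbitrary illustrative choices.

```python
# Minimal sketch: Wasserstein-1 distance between the empirical distribution of
# one marginal <X_i, theta> and the true marginal (standard normal here).
# X, theta, and the sample sizes are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
d, m = 50, 2000
X = rng.standard_normal((m, d))

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                 # one fixed direction on the sphere
marginal = X @ theta                           # <X_i, theta>, i = 1..m

ref = stats.norm.rvs(size=200_000, random_state=7)    # samples from the true marginal
w1 = stats.wasserstein_distance(marginal, ref)
print("W1(empirical marginal, true marginal) ~", round(w1, 4))
```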

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting during adaptation.
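
As one standard instance of distributionally robust optimization (illustrative only; this is the generic KL-ball dual with a fixed temperature, not the thesis's specific parametric reformulation), the robust risk replaces the average loss with a temperature-scaled log-mean-exp of per-example losses, which up-weights hard examples.

```python
# Minimal sketch (standard KL-DRO dual with a fixed temperature; not the
# thesis's specific reformulation): a temperature-scaled log-mean-exp of
# per-example losses up-weights hard examples relative to the plain average.
import numpy as np

def dro_risk(losses, temperature=1.0):
    """lambda * log E[exp(loss / lambda)]: KL-ball robust risk for a fixed lambda."""
    scaled = losses / temperature
    # log-mean-exp, computed stably
    return temperature * (np.log(np.mean(np.exp(scaled - scaled.max()))) + scaled.max())

losses = np.array([0.1, 0.2, 0.15, 3.0, 0.05])   # one hard example among easy ones
print("average loss :", losses.mean())
print("DRO risk     :", dro_risk(losses, temperature=0.5))
```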
