
We establish connections between invariant theory and maximum likelihood estimation for discrete statistical models. We show that norm minimization over a torus orbit is equivalent to maximum likelihood estimation in log-linear models. We use notions of stability under a torus action to characterize the existence of the maximum likelihood estimate, and discuss connections to scaling algorithms.
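As a minimal, self-contained sketch of maximum likelihood estimation in the simplest log-linear model (the two-way independence model; the setup is illustrative and not taken from the paper): the MLE of the cell probabilities is the outer product of the empirical marginals, which is also the fixed point that scaling algorithms converge to.

```python
import numpy as np

def independence_mle(counts):
    """MLE for the two-way independence model (a log-linear model):
    the fitted cell probabilities are the product of the empirical
    row and column marginals."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row = counts.sum(axis=1) / n
    col = counts.sum(axis=0) / n
    return np.outer(row, col)

table = np.array([[10, 20], [30, 40]])
p_hat = independence_mle(table)
```

The fitted table matches the observed marginals exactly, the defining property of the MLE in a log-linear model.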

Related Content

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and the method has therefore become a dominant means of statistical inference.
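The definition above can be made concrete with the textbook Bernoulli example (purely illustrative): maximize the log-likelihood of coin-flip data over a grid and compare with the closed-form MLE, which is the sample mean.

```python
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(p, x):
    """Bernoulli log-likelihood of the sample x at parameter p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximize over a grid; the maximizer should match the sample mean.
grid = np.linspace(0.01, 0.99, 99)
p_mle = grid[np.argmax([log_likelihood(p, data) for p in grid])]
```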

We consider the twin problems of estimating the effective rank and the Schatten norms $\|{\bf A}\|_{s}$ of a rectangular $p\times q$ matrix ${\bf A}$ from noisy observations. When $s$ is an even integer, we introduce a polynomial-time estimator of $\|{\bf A}\|_s$ that achieves the minimax rate $(pq)^{1/4}$. Interestingly, this optimal rate does not depend on the underlying rank of the matrix. When $s$ is not an even integer, the optimal rate is much slower. A simple thresholding estimator of the singular values achieves the rate $(q\wedge p)(pq)^{1/4}$, which turns out to be optimal up to a logarithmic multiplicative term. The tight minimax rate is achieved by a more involved polynomial approximation method. This allows us to build estimators for a class of effective rank indices. As a byproduct, we also characterize the minimax rate for estimating the sequence of singular values of a matrix.
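The special role of even $s$ can be illustrated directly (a sketch, not the paper's estimator): for even integers, $\|{\bf A}\|_s^s = \mathrm{tr}\big(({\bf A}^\top {\bf A})^{s/2}\big)$ is a polynomial in the entries of ${\bf A}$, computable without extracting singular values.

```python
import numpy as np

def schatten_even(A, s):
    """Schatten s-norm for even integer s via trace((A^T A)^{s/2}),
    avoiding an explicit singular value decomposition."""
    assert s % 2 == 0, "this identity holds only for even s"
    G = A.T @ A
    return np.trace(np.linalg.matrix_power(G, s // 2)) ** (1.0 / s)

A = np.array([[3.0, 0.0], [0.0, 4.0]])  # singular values 3 and 4
```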

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models, VAEs and normalising flows, from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.

Simulation-based inference with conditional neural density estimators is a powerful approach to solving inverse problems in science. However, these methods typically treat the underlying forward model as a black box, with no way to exploit geometric properties such as equivariances. Equivariances are common in scientific models, however integrating them directly into expressive inference networks (such as normalizing flows) is not straightforward. We here describe an alternative method to incorporate equivariances under joint transformations of parameters and data. Our method -- called group equivariant neural posterior estimation (GNPE) -- is based on self-consistently standardizing the "pose" of the data while estimating the posterior over parameters. It is architecture-independent, and applies both to exact and approximate equivariances. As a real-world application, we use GNPE for amortized inference of astrophysical binary black hole systems from gravitational-wave observations. We show that GNPE achieves state-of-the-art accuracy while reducing inference times by three orders of magnitude.

Inverse probability weighting (IPW) is widely used in many areas when data are subject to unrepresentativeness, missingness, or selection bias. An inevitable challenge with the use of IPW is that the IPW estimator can be remarkably unstable if some probabilities are very close to zero. To overcome this problem, at least three remedies have been developed in the literature: stabilizing, thresholding, and trimming. However, the final estimators are still IPW-type estimators, and inevitably inherit certain weaknesses of the naive IPW estimator: they may still be unstable or biased. We propose a biased-sample empirical likelihood weighting (ELW) method to serve the same general purpose as IPW, while completely overcoming the instability of IPW-type estimators by circumventing the use of inverse probabilities. The ELW weights are always well defined and easy to implement. We show theoretically that the ELW estimator is asymptotically normal and more efficient than the IPW estimator and its stabilized version for missing data problems and unequal probability sampling without replacement. Its asymptotic normality is also established under unequal probability sampling with replacement. Our simulation results and a real data analysis indicate that the ELW estimator is shift-equivariant, nearly unbiased, and usually outperforms the IPW-type estimators in terms of mean square error.
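For context, here is a minimal simulated sketch of the naive IPW (Horvitz-Thompson) estimator and its stabilized (Hajek) version, the baselines the abstract refers to; the data-generating process is illustrative and this is not the paper's ELW method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)        # outcome; true mean E[y] = 2
prob = 1.0 / (1.0 + np.exp(-x))         # known observation probabilities
observed = rng.random(n) < prob         # missing-at-random mask

w = 1.0 / prob[observed]                # inverse probability weights
ipw_naive = np.sum(w * y[observed]) / n           # Horvitz-Thompson
ipw_stable = np.sum(w * y[observed]) / np.sum(w)  # stabilized (Hajek)
```

When some `prob` values approach zero the weights `w` explode, which is exactly the instability that motivates stabilizing, thresholding, trimming, and the proposed ELW alternative.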

This paper studies distributed binary testing of statistical independence under communication (information bits) constraints. While testing independence is very relevant in various applications, distributed independence testing is particularly useful for event detection in sensor networks, where data correlation often occurs among observations of devices in the presence of a signal of interest. By focusing on the case of two devices because of their tractability, we begin by investigating conditions on Type I error probability restrictions under which the minimum Type II error admits an exponential behavior with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents become informative enough, in the sense of predicting well the actual error probability. We finally discuss an application of our results where the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II probability of error in the finite-length regime.
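As background for the error-exponent approximation discussed above: in the centralised, unconstrained setting, the Chernoff-Stein lemma gives the best Type II error exponent for testing a joint distribution against the product of its marginals as the mutual information $I(X;Y)$. A small sketch (the example distribution is illustrative):

```python
import numpy as np

def mutual_information(pxy):
    """Mutual information I(X;Y) in nats of a joint pmf given as a
    2-D array; equals D(P_XY || P_X x P_Y), the centralised Stein
    exponent for testing independence."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask]))

pxy = np.array([[0.4, 0.1], [0.1, 0.4]])
mi = mutual_information(pxy)
# exponential approximation of the Type II error: beta_n ~ exp(-n * mi)
```

The paper's finite-length bounds quantify how quickly `exp(-n * mi)`-style approximations become accurate as the sample size grows.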

The Principle of Insufficient Reason (PIR) assigns equal probabilities to each alternative of a random experiment whenever there is no reason to prefer one over the other. The Maximum Entropy Principle (MaxEnt) generalizes PIR to the case where statistical information such as expectations is given. It is known that both principles result in paradoxical probability updates for joint distributions of cause and effect. This is because constraints on the conditional P(effect|cause) result in changes of P(cause) that assign higher probability to those values of the cause that offer more options for the effect, suggesting "intentional behaviour". Earlier work therefore suggested sequentially maximizing (conditional) entropy according to the causal order, but without further justification apart from plausibility on toy examples. We justify causal modifications of PIR and MaxEnt by separating constraints into restrictions for the cause and restrictions for the mechanism that generates the effect from the cause. We further sketch why Causal PIR also entails "Information Geometric Causal Inference". We briefly discuss problems of generalizing the causal version of MaxEnt to arbitrary causal DAGs.
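The non-causal MaxEnt principle the abstract builds on can be illustrated with the classic loaded-die example (illustrative, not from the paper): among all distributions on {1,...,6} with a prescribed mean, the entropy maximizer has the Gibbs form $p_i \propto e^{\lambda i}$, and with no constraint it reduces to PIR's uniform distribution.

```python
import numpy as np

def maxent_die(target_mean):
    """Maximum-entropy distribution on {1,...,6} with a given mean.
    The solution is p_i ~ exp(lam * i); solve for lam by bisection,
    since the mean is monotone increasing in lam."""
    faces = np.arange(1, 7)

    def mean_for(lam):
        w = np.exp(lam * faces)
        return (faces * w).sum() / w.sum()

    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    w = np.exp(lo * faces)
    return w / w.sum()

p = maxent_die(4.5)
```

With `target_mean = 3.5` the constraint is vacuous and the result is uniform, recovering PIR as the special case of MaxEnt.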

Probabilistic linear discriminant analysis (PLDA) has been widely used in open-set verification tasks, such as speaker verification. A potential issue of this model is that the training set often contains a limited number of classes, which makes the estimation of the between-class variance unreliable. This unreliable estimation often leads to degraded generalization. In this paper, we present an MAP estimation for the between-class variance, by employing an Inverse-Wishart prior. A key problem is that with hierarchical models such as PLDA, the prior is placed on the variance of class means while the likelihood is based on class members, which makes the posterior inference intractable. We derive a simple MAP estimation for such a model, and test it in both PLDA scoring and length normalization. In both cases, the MAP-based estimation delivers interesting performance improvement.
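The flavour of MAP variance estimation under such a prior can be sketched in one dimension (the scalar Inverse-Gamma analogue of the Inverse-Wishart; this is a generic illustration, not the paper's PLDA derivation): the prior acts as pseudo-observations that regularize the variance estimate when data are scarce.

```python
import numpy as np

def map_variance(x, mu, alpha, beta):
    """MAP estimate of a Gaussian variance (mean mu known) under an
    Inverse-Gamma(alpha, beta) prior, the 1-D analogue of the
    Inverse-Wishart. The posterior is Inverse-Gamma(alpha + n/2,
    beta + ss/2), whose mode is (beta + ss/2) / (alpha + n/2 + 1)."""
    x = np.asarray(x, dtype=float)
    ss = np.sum((x - mu) ** 2)
    return (beta + 0.5 * ss) / (alpha + 0.5 * len(x) + 1.0)

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, size=100_000)
est = map_variance(sample, 0.0, alpha=2.0, beta=1.0)
```

With abundant data the estimate approaches the sample variance; with few observations it is pulled toward the prior, which is the behaviour exploited for the between-class variance in PLDA.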

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
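The bias-variance trading described above can be caricatured with a toy penalized objective (a sketch under simplifying assumptions, not the paper's convex surrogate): minimize the empirical mean of the loss plus a multiple of its standard error, so that parameters with erratic losses are penalized.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)

def penalized_risk(theta, x, lam=1.0):
    """Empirical risk plus a standard-error-style variance penalty,
    a crude stand-in for variance-regularized risk minimization."""
    loss = (x - theta) ** 2
    return loss.mean() + lam * np.sqrt(loss.var() / len(x))

# Minimize over a grid of candidate location parameters.
grid = np.linspace(-1.0, 1.0, 201)
theta_hat = grid[np.argmin([penalized_risk(t, x) for t in grid])]
```

Unlike this grid sketch, the paper's formulation keeps the penalized objective convex, which is what makes the approach computationally efficient at scale.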
