
This paper provides a unified view of the Kullback-Leibler (KL)-divergence and the integral probability metrics (IPMs) from the perspective of maximum likelihood density-ratio estimation (DRE). Both the KL-divergence and the IPMs are widely used in applications such as generative modeling. However, a unified understanding of these concepts remains unexplored. In this paper, we show that the KL-divergence and the IPMs can be represented as maximal likelihoods differing only by sampling schemes, and use this result to derive a unified form of the IPMs and a relaxed estimation method. To develop the estimation problem, we construct an unconstrained maximum likelihood estimator to perform DRE with a stratified sampling scheme. We further propose a novel class of probability divergences, called the Density Ratio Metrics (DRMs), that interpolates the KL-divergence and the IPMs. In addition to these findings, we also introduce some applications of the DRMs, such as DRE and generative adversarial networks. In experiments, we validate the effectiveness of our proposed methods.
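
For orientation, the two quantities being unified can be written in their standard textbook forms. The notation below ($P$, $Q$ with densities $p$, $q$, the density ratio $r^{*}$, and a function class $\mathcal{F}$) is assumed for illustration and is not taken from the paper itself:

$$
\mathrm{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx = \mathbb{E}_{x \sim P}\left[\log r^{*}(x)\right], \qquad r^{*}(x) = \frac{p(x)}{q(x)},
$$

$$
\mathrm{IPM}_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)] \right|.
$$

The first expression shows why density-ratio estimation is a natural entry point: the KL-divergence is the expected log density ratio under $P$.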

Related content

Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick to design novel algorithms that perform well in practice. In this paper, we systematically survey the types of theoretical analysis techniques that have been applied to edit distance and evaluate the extent to which each one has achieved these two goals. These techniques include traditional worst-case analysis, worst-case analysis parametrized by edit distance or entropy or compressibility, average-case analysis, semi-random models, and advice-based models. We find that the track record is mixed. On one hand, two algorithms widely used in practice have been born out of theoretical analysis and their empirical performance is captured well by theoretical predictions. On the other hand, all the algorithms developed using theoretical analysis as a yardstick since then have not had any practical relevance. We conclude by discussing the remaining open problems and how they can be tackled.
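
As a point of reference for the worst-case analysis mentioned above, the textbook quadratic-time dynamic program for unit-cost (Levenshtein) edit distance can be sketched as follows. This is a minimal illustration, not one of the specific algorithms the survey evaluates, and the function name `edit_distance` is chosen here for convenience:

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance via the classic dynamic program.

    Runs in O(len(a) * len(b)) time and O(len(b)) extra space.
    """
    # prev[j] is the distance between the current prefix of a and b[:j];
    # initially the prefix of a is empty, so the distance is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Transforming a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table is enough when just the distance is needed; recovering an actual alignment would require the full table or a divide-and-conquer variant.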

Truncated densities are probability density functions defined on truncated domains. They share the same parametric form with their non-truncated counterparts up to a normalizing constant. Since the computation of their normalizing constants is usually infeasible, Maximum Likelihood Estimation cannot be easily applied to estimate truncated density models. Score Matching (SM) is a powerful tool for fitting parameters using only unnormalized models. However, it cannot be directly applied here as boundary conditions used to derive a tractable SM objective are not satisfied by truncated densities. In this paper, we study parameter estimation for truncated probability densities using SM. The estimator minimizes a weighted Fisher divergence. The weight function is simply the shortest distance from a data point to the boundary of the domain. We show this choice of weight function naturally arises from minimizing the Stein discrepancy as well as upper-bounding the finite-sample estimation error. The usefulness of our method is demonstrated by numerical experiments and a study on the Chicago crime data set. We also show that the proposed density estimation can correct the outlier-trimming bias caused by aggressive outlier detection methods.
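
To make the weight function concrete, the sketch below computes the shortest Euclidean distance from a point to the boundary of a domain. It assumes, purely for illustration, that the truncation domain is an axis-aligned box; the abstract does not restrict the domain this way, and `boundary_distance` is a hypothetical helper name:

```python
from typing import Sequence


def boundary_distance(
    x: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Shortest distance from x to the boundary of the box [lower, upper].

    Assumes x lies inside the box. For an axis-aligned box the nearest
    boundary point is on the closest face, so the distance is the smallest
    per-coordinate gap to either bound.
    """
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    if not all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper)):
        raise ValueError("x must lie inside the box")
    return min(min(xi - lo, hi - xi) for xi, lo, hi in zip(x, lower, upper))
```

The weight vanishes at the boundary, which is the property that lets it stand in for the boundary conditions that ordinary score matching relies on.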

We consider statistical models arising from the common set of solutions to a sparse polynomial system with general coefficients. The maximum likelihood degree counts the number of critical points of the likelihood function restricted to the model. We prove the maximum likelihood degree of a sparse polynomial system is determined by its Newton polytopes and equals the mixed volume of a related Lagrange system of equations.

Covariance estimation for matrix-valued data has received increasing interest in applications. Unlike previous works that rely heavily on the matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and an S&P 500 stock data analysis.

Many recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms by encouraging iterative refinements toward a stable flow estimation. However, these RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation. They can converge poorly and thereby suffer from performance degradation. To combat these drawbacks, we propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer (using any black-box solver), and differentiates through this fixed point analytically (thus requiring $O(1)$ training memory). This implicit-depth approach is not predicated on any specific model, and thus can be applied to a wide range of SOTA flow estimation model designs. The use of these DEQ flow estimators allows us to compute the flow faster using, e.g., fixed-point reuse and inexact gradients, consumes $4\sim6\times$ less training memory than the recurrent counterpart, and achieves better results with the same computation budget. In addition, we propose a novel, sparse fixed-point correction scheme to stabilize our DEQ flow estimators, which addresses a longstanding challenge for DEQ models in general. We test our approach in various realistic settings and show that it improves SOTA methods on Sintel and KITTI datasets with substantially better computational and memory efficiency.

Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide spectrum of fields. Since graphs in many applications are changing frequently over time, the problem of maintaining a MaxIS over dynamic graphs has attracted increasing attention over the past few years. Due to the intractability of maintaining an exact MaxIS, this paper aims to develop efficient algorithms that can maintain an approximate MaxIS with a theoretical accuracy guarantee. In particular, we propose a framework that maintains a $(\frac{\Delta}{2} + 1)$-approximate MaxIS over dynamic graphs and prove that it achieves a constant approximation ratio in many real-world networks. To the best of our knowledge, this is the first non-trivial approximability result for the dynamic MaxIS problem. Following the framework, we implement an efficient linear-time dynamic algorithm and a more effective dynamic algorithm with near-linear expected time complexity. Our thorough experiments over real and synthetic graphs demonstrate the effectiveness and efficiency of the proposed algorithms, especially when the graph is highly dynamic.
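
For readers unfamiliar with the problem, the sketch below builds an independent set on a static graph with the well-known minimum-degree greedy heuristic. It is not the dynamic framework proposed in the paper and carries none of its guarantees; the adjacency-dictionary representation and the name `greedy_independent_set` are assumptions made for illustration:

```python
def greedy_independent_set(adj: dict[int, set[int]]) -> set[int]:
    """Build a maximal independent set by repeatedly taking a min-degree vertex.

    `adj` maps each vertex to the set of its neighbours and is assumed to be
    symmetric (undirected graph), with every neighbour also present as a key.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen: set[int] = set()
    while remaining:
        # Pick a vertex of minimum remaining degree; break ties by vertex id.
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.add(v)
        # Delete v together with its neighbours, which can no longer be chosen.
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= removed
    return chosen
```

The result is always maximal (no further vertex can be added), but it is not necessarily maximum, which is the gap that approximation guarantees such as the one above aim to quantify.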

Randomized Maximum Likelihood (RML) is an approximate posterior sampling methodology, widely used in Bayesian inverse problems with complex forward models, particularly in petroleum engineering applications. The procedure involves solving a multi-objective optimization problem, which can be challenging in high dimensions and when there are constraints on computational costs. We propose a new methodology for tackling the RML optimization problem based on the high-dimensional Bayesian optimization literature. By sharing data between the different objective functions, we are able to implement RML at a greatly reduced computational cost. We demonstrate the benefits of our methodology in comparison with the solutions obtained by alternative optimization methods on a variety of synthetic and real-world problems, including medical and fluid dynamics applications. Furthermore, we show that the samples produced by our method cover the high-posterior-density regions well in all of the experiments.

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramér-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.

One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM), can be used when the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.

Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.
