
We introduce a new metric, the $\nu$-based Wasserstein metric $W_\nu$, on the set of probability measures on $X \subseteq \mathbb{R}^m$, based on a slight refinement of the notion of generalized geodesics with respect to a base measure $\nu$; it is relevant in particular when $\nu$ is singular with respect to $m$-dimensional Lebesgue measure. $W_\nu$ is defined in terms of an iterated variational problem involving optimal transport to $\nu$; we also characterize it in terms of integrals of the classical Wasserstein distance between the conditional probabilities with respect to $\nu$, and through limits of certain multi-marginal optimal transport problems. We also introduce a class of metrics which are dual in a certain sense to $W_\nu$ on the set of measures which are absolutely continuous with respect to a second fixed base measure $\sigma$. As we vary the base measure $\nu$, $W_\nu$ interpolates between the usual quadratic Wasserstein distance and a metric associated with the uniquely defined generalized geodesics obtained when $\nu$ is sufficiently regular. When $\nu$ concentrates on a lower dimensional submanifold of $\mathbb{R}^m$, we prove that the variational problem in the definition of the $\nu$-based Wasserstein distance has a unique solution. We establish geodesic convexity of the usual class of functionals and of the set of source measures $\mu$ such that optimal transport between $\mu$ and $\nu$ satisfies a strengthening of the generalized nestedness condition introduced in \cite{McCannPass20}. We also present two applications of the ideas introduced here. First, our dual metric is used to prove convergence of an iterative scheme to solve a variational problem arising in game theory. Second, we use the multi-marginal formulation to characterize solutions to the multi-marginal problem by an ordinary differential equation, yielding a new numerical method for it.
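As a point of reference for the "usual quadratic Wasserstein distance" mentioned above, the following is a minimal sketch of the classical distance only, not of $W_\nu$. It assumes two empirical measures on the real line with the same number of equally weighted atoms, in which case the sorted (monotone) pairing is optimal for the quadratic cost. The function name `wasserstein2_1d` is illustrative and does not come from the paper.

```python
import math


def wasserstein2_1d(xs, ys):
    """Classical quadratic Wasserstein distance between two empirical
    measures on the line with equally many, equally weighted atoms."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    # In one dimension the monotone (sorted) pairing is optimal
    # for the quadratic cost.
    pairs = zip(sorted(xs), sorted(ys))
    mean_sq = sum((x - y) ** 2 for x, y in pairs) / len(xs)
    return math.sqrt(mean_sq)
```

For instance, shifting every atom by the same amount gives a distance equal to the size of the shift, and reordering the atoms of a sample does not change the result.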

Related content

Interacting agents receive public information at no cost and flexibly acquire private information at a cost proportional to entropy reduction. When a policymaker provides more public information, agents acquire less private information, thus lowering information costs. Does more public information raise or reduce uncertainty faced by agents? Is it beneficial or detrimental to welfare? To address these questions, we examine the impacts of public information on flexible information acquisition in a linear-quadratic-Gaussian game with arbitrary quadratic material welfare. More public information raises uncertainty if and only if the game exhibits strategic complementarity, which can be harmful to welfare. However, when agents acquire a large amount of information, more provision of public information increases welfare through a substantial reduction in the cost of information. We give a necessary and sufficient condition for welfare to increase with public information and identify optimal public information disclosure, which is either full or partial disclosure depending upon the welfare function and the slope of the best response.

We study approaches for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Under certain conditions, such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high probability lower bounds on the size of such a ball under various conditions. We complement this derivation of the lower bound by developing techniques that allow us to apply the compression approach to concrete inference problems such as kernel ridge regression. We conclude with a construction of an infinite dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite dimensional RKHSs.

We consider the question of adaptive data analysis within the framework of convex optimization. We ask how many samples are needed in order to compute $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients queried by gradient descent, and we provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent), $\Omega(1/\epsilon^3)$ samples are required. This rules out the possibility of a foolproof mechanism. Our construction builds upon a new lower bound (that may be of interest in its own right) for an analyst that may ask several non-adaptive questions in a batch of fixed and known $T$ rounds of adaptivity and requires a fraction of true discoveries. We show that for such an analyst $\Omega (\sqrt{T}/\epsilon^2)$ samples are necessary. Second, we show that, under certain assumptions on the oracle, in an interaction with gradient descent $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary. Our assumptions are that the oracle has only \emph{first order access} and is \emph{post-hoc generalizing}. First order access means that it can only compute the gradients of the sampled function at points queried by the algorithm. Our assumption of \emph{post-hoc generalization} follows from existing lower bounds for statistical queries. More generally then, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. These results are in contrast with classical bounds that show that with $O(1/\epsilon^2)$ samples one can optimize the population risk to accuracy of $O(\epsilon)$ but, as it turns out, with spurious gradients.

In this paper we obtain error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that, for a time discretization with a positive step size $h$, an error bound of size $h$ can be proved for the difference between the value function (viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, when a spatial discretization based on elements of size $k$ is also included, the error bound found in the literature for the difference between the value functions of the continuous problem and the fully discrete problem is of size $O(k/h)$. In this paper we revise the error bound of the fully discrete method and prove, under similar assumptions to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$, which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature, in which the $1/h$ behaviour suggested by the bound $O(k/h)$ has not been observed.
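To make the setting concrete, here is a minimal sketch of a fully discrete (semi-Lagrangian) value iteration on a toy problem. Everything specific is an assumption chosen for illustration rather than taken from the paper: dynamics $y' = u$ with $u \in \{-1, 0, 1\}$, running cost $x^2$, discount rate $\lambda = 1$, the domain $[-1, 1]$ with trajectories clamped to it, and piecewise-linear interpolation on a uniform grid of size $k$. For this toy problem the exact value function is $v(x) = |x|^2 - 2|x| + 2 - 2e^{-|x|}$, so the discrete solution can be compared with it.

```python
import math


def dynamics(x, u):
    return u  # toy dynamics y' = u (assumption for illustration)


def running_cost(x, u):
    return x * x  # toy running cost (assumption for illustration)


def exact_value(x):
    """Exact value function of the toy problem with discount rate 1."""
    a = abs(x)
    return a * a - 2.0 * a + 2.0 - 2.0 * math.exp(-a)


def interp(values, x, x_min, k):
    """Piecewise-linear interpolation of nodal values on a uniform grid."""
    n = len(values)
    s = (x - x_min) / k
    i = min(max(int(math.floor(s)), 0), n - 2)
    theta = min(max(s - i, 0.0), 1.0)
    return (1.0 - theta) * values[i] + theta * values[i + 1]


def solve_fully_discrete(h, k, lam=1.0, x_min=-1.0, x_max=1.0,
                         tol=1e-10, max_iter=100000):
    """Fixed-point iteration for
    v(x_i) = min_u { h f(x_i, u) + (1 - lam h) I_k[v](x_i + h g(x_i, u)) }."""
    n = int(round((x_max - x_min) / k)) + 1
    nodes = [x_min + i * k for i in range(n)]
    controls = (-1.0, 0.0, 1.0)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for x in nodes:
            candidates = []
            for u in controls:
                # One explicit Euler step, clamped to stay inside the domain.
                y = min(max(x + h * dynamics(x, u), x_min), x_max)
                candidates.append(h * running_cost(x, u)
                                  + (1.0 - lam * h) * interp(v, y, x_min, k))
            new_v.append(min(candidates))
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:  # the map is a contraction with factor 1 - lam h
            break
    return nodes, v
```

Calling `solve_fully_discrete(0.05, 0.05)` and comparing the nodal values with `exact_value` gives a rough check of how the error behaves as $h$ and $k$ are refined together.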

This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. Analytic properties of the RGM distance are derived; statistical rate of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.

SVD (singular value decomposition) is one of the basic tools of machine learning, allowing one to optimize a basis for a given matrix. However, sometimes we have a set of matrices $\{A_k\}_k$ instead, and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$ such that the matrices $\{U^T A_k V\}$ are somehow simpler. For example, DCT-II is an orthonormal basis of functions commonly used in image/video compression; as discussed here, this kind of basis can be quickly and automatically optimized for a given dataset. While the gradient descent optimization that is also discussed might be computationally costly, we propose CSVD (common SVD): a fast general approach based on SVD. Specifically, we choose $U$ as built of eigenvectors of $\sum_k (w_k)^q (A_k A_k^T)^p$ and $V$ of $\sum_k (w_k)^q (A_k^T A_k)^p$, where $w_k$ are their weights and $p,q>0$ are some chosen powers, e.g. 1/2, optionally with normalization, e.g. $A \to A - rc^T$ where $r_i=\sum_j A_{ij}, c_j =\sum_i A_{ij}$.
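The construction of $U$ and $V$ described above can be sketched with NumPy as follows. This is an illustrative reading of the formulas in the abstract, with the sums taken over $k$; the function names `sym_power` and `common_svd`, the default values of `p` and `q`, and the ordering of eigenvectors by decreasing eigenvalue are assumptions, and the optional normalization step is not included.

```python
import numpy as np


def sym_power(M, p):
    """Power of a symmetric positive semi-definite matrix,
    computed through its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * vals**p) @ vecs.T


def common_svd(mats, weights=None, p=0.5, q=1.0):
    """Return orthogonal U, V built from eigenvectors of
    sum_k w_k^q (A_k A_k^T)^p and sum_k w_k^q (A_k^T A_k)^p."""
    mats = [np.asarray(A, dtype=float) for A in mats]
    if weights is None:
        weights = np.ones(len(mats))
    weights = np.asarray(weights, dtype=float)
    left = sum(w**q * sym_power(A @ A.T, p) for w, A in zip(weights, mats))
    right = sum(w**q * sym_power(A.T @ A, p) for w, A in zip(weights, mats))
    # eigh returns ascending eigenvalues; reverse the columns so the
    # dominant directions come first (a convention chosen here).
    _, U = np.linalg.eigh(left)
    _, V = np.linalg.eigh(right)
    return U[:, ::-1], V[:, ::-1]
```

When all matrices share exact singular bases, $A_k = U_0\,\mathrm{diag}(s_k)\,V_0^T$, and the summed spectra have distinct entries, the returned `U` and `V` recover $U_0$ and $V_0$ up to column signs, so each `U.T @ A_k @ V` is diagonal up to round-off.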

We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems, which we call dynamical dimension reduction (DDR). In the DDR model, each point is evolved via a nonlinear flow towards a lower-dimensional subspace; the projection onto the subspace gives the low-dimensional embedding. Training the model involves identifying the nonlinear flow and the subspace. Following the equation discovery method, we represent the vector field that defines the flow using a linear combination of dictionary elements, where each element is a pre-specified linear/nonlinear candidate function. A regularization term for the average total kinetic energy is also introduced and motivated by optimal transport theory. We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method. We also show how the DDR method can be trained using a gradient-based optimization method, where the gradients are computed using the adjoint method from optimal control theory. The DDR method is implemented and compared on synthetic and example datasets to other dimension reduction methods, including PCA, t-SNE, and UMAP.

Multigrid is a powerful solver for large-scale linear systems arising from discretized partial differential equations. The convergence theory of multigrid methods for symmetric positive definite problems has been well developed over the past decades, while, for nonsymmetric problems, such theory is still not mature. As a foundation for multigrid analysis, two-grid convergence theory plays an important role in motivating multigrid algorithms. Regarding two-grid methods for nonsymmetric problems, most previous works focus on the spectral radius of the iteration matrix or rely on convergence measures that are typically difficult to compute in practice. Moreover, the existing results are confined to two-grid methods with exact solution of the coarse-grid system. In this paper, we analyze the convergence of a two-grid method for nonsymmetric positive definite problems (e.g., linear systems arising from the discretizations of convection-diffusion equations). In the case of an exact coarse solver, we establish an elegant identity for characterizing the two-grid convergence factor, which is measured by a smoother-induced norm. The identity can be conveniently used to derive a class of optimal restriction operators and analyze how the convergence factor is influenced by restriction. More generally, we present some convergence estimates for an inexact variant of the two-grid method, in which both linear and nonlinear coarse solvers are considered.
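For readers unfamiliar with the setting, the following is a minimal sketch of one two-grid cycle with an exact coarse solve on a nonsymmetric model problem. All concrete choices are assumptions made for illustration and are not the operators analyzed in the paper: a 1D convection-diffusion equation $-\varepsilon u'' + b u' = f$ with zero boundary values and upwind differences ($b > 0$), weighted Jacobi smoothing, full-weighting restriction, linear interpolation, and a coarse operator obtained by rediscretization.

```python
import math


def stencil(n, eps, b):
    """Constant tridiagonal stencil (lower, diagonal, upper) of the upwind
    discretization on n interior points of (0, 1), assuming b > 0."""
    h = 1.0 / (n + 1)
    return (-eps / h**2 - b / h, 2.0 * eps / h**2 + b / h, -eps / h**2)


def apply_tridiag(lo, di, up, x):
    n = len(x)
    return [di * x[i]
            + (lo * x[i - 1] if i > 0 else 0.0)
            + (up * x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def thomas_solve(lo, di, up, rhs):
    """Exact solve of the constant-coefficient tridiagonal system."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = up / di, rhs[0] / di
    for i in range(1, n):
        denom = di - lo * c[i - 1]
        c[i] = up / denom
        d[i] = (rhs[i] - lo * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual_norm(x, rhs, n, eps, b):
    ax = apply_tridiag(*stencil(n, eps, b), x)
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(rhs, ax)))


def two_grid_cycle(x, rhs, n, eps, b, nu=2, omega=2.0 / 3.0):
    """One cycle: nu pre-smoothing steps, exact coarse correction,
    nu post-smoothing steps. Requires n odd."""
    lo, di, up = stencil(n, eps, b)

    def smooth(z):
        for _ in range(nu):  # weighted Jacobi
            az = apply_tridiag(lo, di, up, z)
            z = [zi + omega * (ri - ai) / di for zi, ri, ai in zip(z, rhs, az)]
        return z

    x = smooth(x)
    r = [ri - ai for ri, ai in zip(rhs, apply_tridiag(lo, di, up, x))]
    nc = (n - 1) // 2
    # Full weighting: coarse node j coincides with fine index 2j + 1.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
          for j in range(nc)]
    ec = thomas_solve(*stencil(nc, eps, b), rc)  # exact coarse solver
    e = [0.0] * n
    for j in range(nc):  # linear interpolation of the coarse correction
        e[2 * j + 1] += ec[j]
        e[2 * j] += 0.5 * ec[j]
        e[2 * j + 2] += 0.5 * ec[j]
    x = [xi + ei for xi, ei in zip(x, e)]
    return smooth(x)
```

Repeating `two_grid_cycle` from a zero initial guess and monitoring `residual_norm` gives an empirical impression of the convergence factor for a given `eps` and `b`; replacing `thomas_solve` with an approximate solver would mimic the inexact variant mentioned above.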

We consider smooth optimization problems with a Hermitian positive semi-definite fixed-rank constraint, where a quotient geometry with three Riemannian metrics $g^i(\cdot, \cdot)$ $(i=1,2,3)$ is used to represent this constraint. By taking the nonlinear conjugate gradient method (CG) as an example, we show that CG on the quotient geometry with metric $g^1$ is equivalent to CG on the factor-based optimization framework, which is often called the Burer--Monteiro approach. We also show that CG on the quotient geometry with metric $g^3$ is equivalent to CG on the commonly-used embedded geometry. We call two CG methods equivalent if they produce an identical sequence of iterates $\{X_k\}$. In addition, we show that if the limit point of the sequence $\{X_k\}$ generated by an algorithm has lower rank, that is, each $X_k\in \mathbb C^{n\times n}, k = 1, 2, \ldots$ has rank $p$ and the limit point $X_*$ has rank $r < p$, then the condition number of the Riemannian Hessian with metric $g^1$ can be unbounded, but those of the other two metrics stay bounded. Numerical experiments show that the Burer--Monteiro CG method has a slower local convergence rate if the limit point has a reduced rank, compared to CG on the quotient geometry under the other two metrics. This slower convergence rate can thus be attributed to the large condition number of the Hessian near a minimizer.

We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
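The kind of quantity bounded in this abstract can be computed exactly in a simple special case. The sketch below assumes binary values and an urn with a fixed composition ($n$ balls, $m$ of them red, drawn without replacement), which is an example chosen here and not taken from the paper. The draws are exchangeable and the empirical measure of the full sequence is deterministic, so the mixture of product distributions reduces to i.i.d. Bernoulli($m/n$). Because both laws are exchangeable, the relative entropy between the sequences equals the relative entropy between the counts of red balls, that is, between a hypergeometric and a binomial distribution.

```python
import math


def kl_first_k_vs_iid(n, m, k):
    """Relative entropy (in nats) between the law of the first k draws
    without replacement from an urn with n balls, m of them red, and the
    law of k i.i.d. Bernoulli(m/n) draws."""
    if not (0 <= m <= n and 1 <= k <= n):
        raise ValueError("need 0 <= m <= n and 1 <= k <= n")
    p = m / n
    total = 0.0
    # j ranges over the attainable numbers of red balls among the k draws.
    for j in range(max(0, k - (n - m)), min(k, m) + 1):
        hyper = math.comb(m, j) * math.comb(n - m, k - j) / math.comb(n, k)
        binom = math.comb(k, j) * p**j * (1.0 - p) ** (k - j)
        total += hyper * math.log(hyper / binom)
    return total
```

Evaluating the function for a fixed `k` and growing `n` shows the relative entropy shrinking, which is the qualitative behaviour a finite de Finetti bound quantifies.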
