
The optimal transport problem has many applications in machine learning, physics, biology, economics, etc. Although its goal is clear and mathematically well-defined, finding its optimal solution can be challenging for large datasets in high-dimensional space. Here, we propose a homotopy algorithm that first transforms the problem into an easy form by changing the target distribution. It then transforms the problem back to the original form through a series of iterations, tracing a path of solutions until it finds the optimal solution for the original problem. We define the homotopy path as a subspace rotation based on the orthogonal Procrustes problem, and we discretize the homotopy path using the eigenvalue decomposition of the rotation matrix. Our goal is to provide an algorithm with complexity bound $\mathcal{O}(n^2 \log(n))$, faster than existing methods in the literature.
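
As a rough illustration of the idea (our own sketch, not the paper's algorithm: the names and step count are placeholders, the rotation is interpolated through a matrix logarithm rather than the paper's eigendecomposition, and scipy's assignment solver cannot warm-start from the previous plan as a true path-following method would):

    import numpy as np
    from scipy.linalg import expm, logm, orthogonal_procrustes
    from scipy.optimize import linear_sum_assignment

    def homotopy_ot(X, Y, n_steps=10):
        # R aligns the target with the source: Y @ R is the "easy" target.
        R, _ = orthogonal_procrustes(Y, X)
        S = np.real(logm(R))                    # generator of the rotation
        plan = None
        for t in np.linspace(1.0, 0.0, n_steps + 1):
            Y_t = Y @ np.real(expm(t * S))      # t=1: aligned, t=0: original target
            C = ((X[:, None, :] - Y_t[None, :, :]) ** 2).sum(-1)
            _, plan = linear_sum_assignment(C)  # exact OT for uniform weights
        return plan                             # permutation: X[i] -> Y[plan[i]]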

Related Content

Min-max optimization problems arise in several key machine learning setups, including adversarial learning and generative modeling. In their general form, in the absence of convexity/concavity assumptions, finding pure equilibria of the underlying two-player zero-sum game is computationally hard [Daskalakis et al., 2021]. In this work we focus instead on finding mixed equilibria, and consider the associated lifted problem in the space of probability measures. By adding entropic regularization, our main result establishes global convergence towards the global equilibrium using simultaneous gradient ascent-descent with respect to the Wasserstein metric -- a dynamics that admits efficient particle discretization in high dimensions, as opposed to entropic mirror descent. We complement this positive result with a related entropy-regularized loss which is not bilinear but still convex-concave in the Wasserstein geometry, and for which simultaneous dynamics do not converge, yet timescale separation does. Taken together, these results showcase the benign geometry of bilinear games in the space of measures, enabling particle dynamics with global qualitative convergence guarantees.
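
A minimal particle sketch of such simultaneous dynamics (the oracles grad_x and grad_y, the step size, and the Gaussian noise standing in for the entropic term are illustrative assumptions, not the paper's exact discretization):

    import numpy as np

    def particle_gda(grad_x, grad_y, n=256, d=2, lr=0.05, tau=0.01,
                     steps=2000, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.normal(size=(n, d))   # particles of the min player
        Y = rng.normal(size=(n, d))   # particles of the max player
        for _ in range(steps):
            gx, gy = grad_x(X, Y), grad_y(X, Y)  # payoff gradients, averaged over opponents
            noise = np.sqrt(2.0 * lr * tau)
            X = X - lr * gx + noise * rng.normal(size=X.shape)  # descent + entropy proxy
            Y = Y + lr * gy + noise * rng.normal(size=Y.shape)  # ascent + entropy proxy
        return X, Y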

Generalizing knowledge beyond source domains is a crucial prerequisite for many biomedical applications such as drug design and molecular property prediction. To meet this challenge, researchers have used optimal transport (OT) to perform representation alignment between the source and target domains. Yet existing OT algorithms are mainly designed for classification tasks. Accordingly, we consider regression tasks in the unsupervised and semi-supervised settings in this paper. To exploit continuous labels, we propose novel metrics to measure domain distances and introduce a posterior variance regularizer on the transport plan. Further, while computationally appealing, OT suffers from ambiguous decision boundaries and biased local data distributions brought by mini-batch training. To address those issues, we propose to couple OT with metric learning to yield more robust boundaries and reduce bias. Specifically, we present a dynamic hierarchical triplet loss to describe the global data distribution, where the cluster centroids are progressively adjusted across consecutive iterations. We evaluate our method on both unsupervised and semi-supervised learning tasks in biochemistry. Experiments show that the proposed method significantly outperforms state-of-the-art baselines across various benchmark datasets of small molecules and material crystals.
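
The hierarchical and dynamic details are not reproduced here; the PyTorch sketch below only illustrates the flavor of a centroid-based triplet loss with progressively (momentum-)updated centroids, with all names and the margin chosen by us for illustration:

    import torch
    import torch.nn.functional as F

    def centroid_triplet_loss(z, labels, centroids, margin=1.0):
        pos = centroids[labels]                          # each sample's own centroid
        d_pos = (z - pos).pow(2).sum(-1)
        d_all = torch.cdist(z, centroids).pow(2)         # distances to every centroid
        d_all = d_all.scatter(1, labels[:, None], float("inf"))  # mask own centroid
        d_neg = d_all.min(dim=1).values                  # nearest "wrong" centroid
        return F.relu(d_pos - d_neg + margin).mean()

    @torch.no_grad()
    def update_centroids(centroids, z, labels, momentum=0.9):
        for k in labels.unique():                        # progressive adjustment
            centroids[k] = momentum * centroids[k] + (1 - momentum) * z[labels == k].mean(0)
        return centroids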

Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal at the mini-batch level but partially wrong compared with the optimal transport plan between the original measures. Motivated by the misspecified-mappings issue, we propose a novel mini-batch method using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging insight from partial transportation, we explain the source of misspecified mappings in m-OT and motivate why limiting the amount of transported mass among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative models, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.
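
Assuming the POT library's partial-OT solver, one m-POT step between two mini-batches might look like the following sketch (the transported-mass fraction is an illustrative choice):

    import numpy as np
    import ot  # POT: Python Optimal Transport

    def minibatch_pot_plan(xs, xt, mass=0.8):
        a = np.full(len(xs), 1.0 / len(xs))   # uniform weights on the mini-batches
        b = np.full(len(xt), 1.0 / len(xt))
        M = ot.dist(xs, xt)                   # squared Euclidean cost matrix
        # transport only a fraction `mass` of the total weight; the rest stays put
        return ot.partial.partial_wasserstein(a, b, M, m=mass)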

We propose new tools for the geometric exploration of data objects taking values in a general separable metric space $(\Omega, d)$. Given a probability measure on $\Omega$, we introduce depth profiles, where the depth profile of an element $\omega\in\Omega$ refers to the distribution of the distances between $\omega$ and the other elements of $\Omega$. Depth profiles can be harnessed to define transport ranks, which capture the centrality of each element in $\Omega$ with respect to the entire data cloud, based on the optimal transport maps between the depth profiles. We study the properties of transport ranks and show that they provide an effective device for detecting and visualizing patterns in samples of random objects. Specifically, we study estimates of depth profiles and transport ranks based on samples of random objects and establish the convergence of the empirical estimates to the population targets using empirical process theory. We demonstrate the usefulness of depth profiles and the associated transport ranks and visualizations for distributional data through a sample of age-at-death distributions for various countries, for compositional data through energy usage of U.S. states, and for network data through New York taxi trips.
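
As a crude illustrative proxy (not the paper's definition of transport ranks), one can exploit the fact that depth profiles are one-dimensional distributions, for which optimal transport reduces to matching quantile functions:

    import numpy as np
    from scipy.spatial.distance import cdist

    def depth_profiles(X):
        # profile of each point: the sorted distances to all other points
        D = cdist(X, X)
        return np.sort(D, axis=1)[:, 1:]       # drop the zero self-distance

    def transport_centrality_ranks(X):
        P = depth_profiles(X)
        ref = P.min(axis=0)                    # an idealized maximally central profile
        # in 1-D, OT between distributions is a quantile-to-quantile map, so the
        # cost of moving each profile onto `ref` is a mean of quantile gaps
        cost = np.abs(P - ref).mean(axis=1)
        return cost.argsort().argsort()        # rank 0 = most central point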

Assume that we observe i.i.d. points lying close to some unknown $d$-dimensional $\mathcal{C}^k$ submanifold $M$ in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses ($L_p$, Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on $d$ and the regularity $s\leq k-1$ of the underlying density, but not on the ambient dimension. In particular, we show that the estimator is minimax and matches previous rates in the literature in the case where the manifold $M$ is a $d$-dimensional cube. The related problem of estimating the volume measure of $M$ under the Wasserstein loss is also considered, and a minimax estimator is exhibited.
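
For intuition only, here is a one-dimensional toy (arbitrary Beta density, ad-hoc bandwidth; the paper's estimator and rates concern densities supported on a $d$-dimensional submanifold):

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    sample = rng.beta(2, 5, size=2000)               # stand-in for the observed points
    kde = gaussian_kde(sample, bw_method=2000 ** (-1 / 3))  # h ~ n^(-1/(2s+d)), s = d = 1
    estimate = np.sort(kde.resample(2000, seed=1)[0])
    truth = np.sort(rng.beta(2, 5, size=2000))       # fresh draw from the true density
    # in one dimension, W1 is the L1 distance between quantile functions
    print("empirical W1:", np.abs(estimate - truth).mean())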

Successful quantitative investment usually relies on precise predictions of the future movement of the stock price. Recently, machine learning based solutions have shown their capacity to give more accurate stock predictions and have become indispensable components in modern quantitative investment systems. However, the i.i.d. assumption behind existing methods is inconsistent with the existence of diverse trading patterns in the stock market, which inevitably limits their ability to achieve better stock prediction performance. In this paper, we propose a novel architecture, Temporal Routing Adaptor (TRA), to empower existing stock prediction models with the ability to model multiple stock trading patterns. Essentially, TRA is a lightweight module that consists of a set of independent predictors for learning multiple patterns and a router that dispatches samples to different predictors. However, the lack of explicit pattern identifiers makes it quite challenging to train an effective TRA-based model. To tackle this challenge, we further design a learning algorithm based on Optimal Transport (OT) to obtain the optimal sample-to-predictor assignment and effectively optimize the router with this assignment through an auxiliary loss term. Experiments on a real-world stock ranking task show that, compared to state-of-the-art baselines such as Attention LSTM and Transformer, the proposed method improves the information coefficient (IC) from 0.053 to 0.059 and from 0.051 to 0.056, respectively. Our dataset and code are publicly available at https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TRA.
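
The OT assignment step can be sketched with POT's Sinkhorn solver (the regularization strength and the row normalization are our illustrative choices; the authors' full implementation lives in the linked repository):

    import numpy as np
    import ot  # POT: Python Optimal Transport

    def router_targets(losses, reg=0.05):
        # losses[i, k]: loss of sample i under predictor k
        N, K = losses.shape
        a = np.full(N, 1.0 / N)                  # uniform mass over samples
        b = np.full(K, 1.0 / K)                  # each predictor receives ~N/K samples
        P = ot.sinkhorn(a, b, losses, reg)       # entropic OT plan favoring low-loss pairs
        return P / P.sum(axis=1, keepdims=True)  # soft assignment targets for the router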

Optimal transport distances have found many applications in machine learning thanks to their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large-scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e., minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing that it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients, and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
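
With the POT library, swapping balanced for unbalanced OT inside the minibatch loop is a small change; a hedged sketch (the reg and reg_m values are illustrative):

    import numpy as np
    import ot  # POT: Python Optimal Transport

    def minibatch_uot_cost(xs, xt, reg=0.05, reg_m=1.0):
        a = np.full(len(xs), 1.0 / len(xs))
        b = np.full(len(xt), 1.0 / len(xt))
        M = ot.dist(xs, xt)                    # squared Euclidean cost
        # reg_m relaxes the marginal constraints, so outlier mass in a
        # mini-batch can be dropped instead of being transported
        return ot.unbalanced.sinkhorn_unbalanced2(a, b, M, reg, reg_m)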

We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of its entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first that does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query-complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
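
For contrast, here is a straightforward submatrix tester of the kind this result improves upon (our sketch with illustrative constants): it reads a $k \times k$ submatrix and therefore pays the $\Theta(d^2/\epsilon^2)$ query cost that the new non-adaptive row/column pattern bypasses:

    import numpy as np

    def submatrix_rank_test(A, d, eps, seed=0):
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        k = min(n, int(np.ceil(4 * d / eps)))   # illustrative constant
        rows = rng.choice(n, size=k, replace=False)
        cols = rng.choice(n, size=k, replace=False)
        # reading the k x k submatrix costs ~d^2/eps^2 queries, the regime
        # the paper's non-adaptive query pattern improves on
        return np.linalg.matrix_rank(A[np.ix_(rows, cols)]) <= d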

In this paper, we propose to tackle the problem of reducing discrepancies between multiple domains, referred to as multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but the label proportions differ across domains. We design a method based on optimal transport, a theory that is gaining momentum for tackling adaptation problems in machine learning due to its efficiency in aligning probability distributions. Our method performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling that aligns two (or more) probability distributions. Experiments on both synthetic and real-world data related to a satellite image segmentation task show the superiority of the proposed method over the state of the art.
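
To our knowledge, the POT library ships an implementation of this approach as ot.da.JCPOTTransport; a toy usage sketch on synthetic data:

    import numpy as np
    import ot  # POT: Python Optimal Transport

    rng = np.random.default_rng(0)
    Xs = [rng.normal(k, 1.0, (50, 2)) for k in range(2)]  # two source domains
    ys = [rng.integers(0, 2, 50) for _ in range(2)]       # their class labels
    Xt = rng.normal(0.5, 1.0, (60, 2))                    # unlabeled target domain

    jcpot = ot.da.JCPOTTransport(reg_e=0.1, max_iter=1000, metric="sqeuclidean")
    jcpot.fit(Xs=Xs, ys=ys, Xt=Xt)                        # couplings + proportions jointly
    print("estimated target class proportions:", jcpot.proportions_)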

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improved condition numbers.
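
As a toy sketch of the flavor of such methods (this one runs momentum in the primal interleaved with a gossip step; the paper's optimal method applies Nesterov's acceleration to the dual problem, which we do not reproduce):

    import numpy as np

    def decentralized_nesterov(grads, W, x0, lr=0.1, steps=500):
        # grads[i]: gradient oracle of f_i; W: doubly stochastic gossip matrix
        x, y = x0.copy(), x0.copy()            # one row of x per node
        for t in range(steps):
            g = np.stack([grads[i](y[i]) for i in range(len(grads))])
            x_new = W @ y - lr * g             # gossip averaging + local gradient step
            y = x_new + (t / (t + 3.0)) * (x_new - x)   # Nesterov momentum
            x = x_new
        return x.mean(axis=0)                  # consensus estimate of the minimizer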
