We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., having zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Our software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.
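The objective described above can be made concrete with a small sketch. It assumes one simple choice of distortion function, a weighted squared distance with positive weights for similar pairs and negative weights for dissimilar pairs. The helper names (`total_distortion`, `center`) are hypothetical and are not part of PyMDE's API.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def total_distortion(X, pairs, weights):
    """Sum of w_ij * d_ij^2 over the listed pairs (assumed quadratic distortion)."""
    return sum(w * euclidean(X[i], X[j]) ** 2 for (i, j), w in zip(pairs, weights))

def center(X):
    """Shift the embedding to zero mean (one part of the standardization constraint)."""
    n, m = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(m)]
    return [[row[k] - means[k] for k in range(m)] for row in X]

# Three items embedded in the plane (made-up coordinates).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
pairs = [(0, 1), (0, 2), (1, 2)]
weights = [1.0, 1.0, -0.5]  # last pair is treated as dissimilar

print(total_distortion(X, pairs, weights))
```

Centering leaves all pairwise distances unchanged, so the total distortion is the same before and after `center(X)`. An MDE solver would search over `X` to make this quantity small while satisfying the chosen constraint.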

Related content

Generalization theory · Identifiable · Scenario · Episode · Invariant

October 15, 2021

Nonlinear Invariant Risk Minimization: A Causal Approach

Chaochao Lu,Yuhuai Wu,José Miguel Hernández-Lobato,Bernhard Schölkopf

Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this, either explicitly or implicitly, attempted to find a data representation that has an invariant relationship with the target. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features and build an invariant predictor. However, these methods have generalization guarantees only when both data representation and classifiers come from a linear model class. We propose invariant Causal Representation Learning (iCaRL), an approach that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation (i.e., a set of latent variables encoding the data) given the target and the environment belongs to general exponential family distributions. Based on this, we show that it is possible to identify the data representation up to simple transformations. We also prove that all direct causes of the target can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach outperforms a variety of baseline methods. Finally, in the discussion, we further explore the aforementioned assumption and propose a more general hypothesis, called the Agnostic Hypothesis: there exists a set of hidden causal factors affecting both inputs and outcomes. The Agnostic Hypothesis can provide a unifying view of machine learning. More importantly, it can inspire a new direction to explore a general theory for identifying hidden causal factors, which is key to enabling the OOD generalization guarantees.

Rank · Linear · Generalization theory · CASE · Mathematics

October 15, 2021

Linear maximum rank distance codes of exceptional type

Daniele Bartoli,Giovanni Zini,Ferdinando Zullo

Scattered polynomials of a given index over finite fields are intriguing rare objects with many connections within mathematics. Of particular interest are the exceptional ones, as defined in 2018 by the first author and Zhou, for which partial classification results are known. In this paper we propose a unified algebraic description of $\mathbb{F}_{q^n}$-linear maximum rank distance codes, introducing the notion of exceptional linear maximum rank distance codes of a given index. Such a connection naturally extends the notion of exceptionality for a scattered polynomial in the rank metric framework and provides a generalization of Moore sets in the monomial MRD context. We move towards the classification of exceptional linear MRD codes, by showing that the ones of index zero are generalized Gabidulin codes and proving that in the positive index case the code contains an exceptional scattered polynomial of the same index.

INFORMS · Graph · Structured learning · Pooling · Extensibility

October 15, 2021

Distribution Knowledge Embedding for Graph Pooling

Kaixuan Chen,Jie Song,Shunyu Liu,Na Yu,Zunlei Feng,Gengshi Han,Mingli Song

from arxiv, 8 pages, 4 figures, 4 tables

Graph-level representation learning is the pivotal step for downstream tasks that operate on the whole graph. The most common approach to this problem to date is graph pooling, where node features are typically averaged or summed to obtain the graph representations. However, pooling operations like averaging or summing inevitably cause massive information loss, which may severely degrade the final performance. In this paper, we argue that what is crucial to graph-level downstream tasks includes not only the topological structure but also the distribution from which nodes are sampled. Therefore, powered by existing Graph Neural Networks (GNNs), we propose a new plug-and-play pooling module, termed Distribution Knowledge Embedding (DKEPool), where graphs are rephrased as distributions on top of GNNs and the pooling goal is to summarize the entire distribution information instead of retaining a certain feature vector by simple predefined pooling operations. A DKEPool network in effect decomposes representation learning into two stages, structure learning and distribution learning. Structure learning follows a recursive neighborhood aggregation scheme to update node features where structure information is obtained. Distribution learning, on the other hand, omits node interconnections and focuses more on the distribution depicted by all the nodes. Extensive experiments demonstrate that the proposed DKEPool significantly and consistently outperforms the state-of-the-art methods.

Optimizer · Linear · Class · Performer · binary

October 15, 2021

Optimal Decision Trees for Nonlinear Metrics

Emir Demirović,Peter J. Stuckey

Nonlinear metrics, such as the F1-score, Matthews correlation coefficient, and Fowlkes-Mallows index, are often used to evaluate the performance of machine learning models, in particular when facing imbalanced datasets that contain more samples of one class than the other. Recent optimal decision tree algorithms have shown remarkable progress in producing trees that are optimal with respect to linear criteria, such as accuracy, but unfortunately nonlinear metrics remain a challenge. To address this gap, we propose a novel algorithm based on bi-objective optimisation, which treats misclassifications of each binary class as a separate objective. We show that, for a large class of metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain the optimal tree by using our method to generate the set of all nondominated trees. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics: the resulting trees may be more desirable according to the given nonlinear metric at the expense of higher runtimes. Nevertheless, the experiments illustrate that runtimes are reasonable for the majority of the tested datasets.
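The bi-objective idea can be illustrated with a toy sketch. It assumes each candidate tree is summarized only by its (false positives, false negatives) pair on a dataset with `P` positives and `N` negatives. The candidate list and the function names are made up for illustration, and this is not the authors' tree search.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion counts; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def pareto_front(points):
    """Keep the (fp, fn) pairs that no other pair matches or beats on both counts."""
    return [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]

P, N = 10, 90  # imbalanced toy dataset: 10 positives, 90 negatives

# Hypothetical (fp, fn) outcomes of five candidate trees.
candidates = [(0, 6), (2, 3), (5, 1), (4, 4), (9, 1)]
front = pareto_front(candidates)

# Pick the best candidate on the front under each criterion.
best_f1 = max(front, key=lambda p: f1_score(P - p[1], p[0], p[1]))
best_acc = min(front, key=lambda p: p[0] + p[1])  # fewest total errors

print(front, best_f1, best_acc)
```

In this toy data the accuracy-optimal candidate and the F1-optimal candidate differ, yet both lie on the nondominated set, which is why enumerating that set is enough to pick the best tree for metrics of this kind.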

INFORMS · Functional · SMI · Partition function · Minimizer

September 7, 2021

Using the Semantic Information G Measure to Explain and Extend Rate-Distortion Functions and Maximum Entropy Distributions

Chenguang Lu

from arxiv, 22 pages, 5 figures

In the rate-distortion function and the Maximum Entropy (ME) method, Minimum Mutual Information (MMI) distributions and ME distributions are expressed by Bayes-like formulas, including Negative Exponential Functions (NEFs) and partition functions. Why do these non-probability functions exist in Bayes-like formulas? On the other hand, the rate-distortion function has three disadvantages: (1) the distortion function is subjectively defined; (2) the definition of the distortion function between instances and labels is often difficult; (3) it cannot be used for data compression according to the labels' semantic meanings. The author has previously proposed using the semantic information G measure with both statistical probability and logical probability. We can now explain NEFs as truth functions, partition functions as logical probabilities, Bayes-like formulas as semantic Bayes' formulas, MMI as Semantic Mutual Information (SMI), and ME as extreme ME minus SMI. In overcoming the above disadvantages, this paper sets up the relationship between truth functions and distortion functions, obtains truth functions from samples by machine learning, and constructs constraint conditions with truth functions to extend rate-distortion functions. Two examples are used to help readers understand the MMI iteration and to support the theoretical results. Using truth functions and the semantic information G measure, we can combine machine learning and data compression, including semantic compression. We need further studies to explore general data compression and recovery, according to the semantic meaning.

Estimation/Estimator · Smoothing · Mutually independent · Sample · Statistic

June 24, 2021

On the asymptotic distribution of the maximum sample spectral coherence of Gaussian time series in the high dimensional regime

Alexis Rosuel,Philippe Loubaton,Pascal Vallet

We investigate the asymptotic distribution of the maximum of a frequency smoothed estimate of the spectral coherence of an M-variate complex Gaussian time series with mutually independent components when the dimension M and the number of samples N both converge to infinity. If B denotes the smoothing span of the underlying smoothed periodogram estimator, a type I extreme value limiting distribution is obtained under the rate assumptions $\frac{M}{N} \rightarrow 0$ and $\frac{M}{B} \rightarrow c \in (0, +\infty)$. This result is then exploited to build a statistic with controlled asymptotic level for testing independence between the M components of the observed time series. Numerical simulations support our results.

Greedy layer-wise pretraining · Approximation · Model evaluation · Statistic · Partition function

June 22, 2021

Large N limit of the knapsack problem

Mobolaji Williams

from arxiv, 18 pages, 6 figures, 1 table

In the simplest formulation of the knapsack problem, one seeks to maximize the total value of a collection of objects such that the total weight remains below a certain limit. In this work, we move from computer science to physics and formulate the knapsack problem as a statistical physics system and compute the corresponding partition function. We approximate the result in the large number limit and from this approximation develop a new algorithm for the problem. We compare the performance of this algorithm to that of other approximation algorithms, finding that the new algorithm is faster than most of these approaches while still retaining high accuracy. From its speed and accuracy relationship, we argue that the algorithm is a manifestation of a greedy algorithm. We conclude by discussing ways to extend the formalism to make its underlying heuristics more rigorous or to apply the approach to other combinatorial optimization problems. In all, this work exists at the intersection between computer science and statistical physics and represents a new analytical approach to solving the problems in the former using methods of the latter.
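For readers unfamiliar with the baseline the abstract alludes to, here is a minimal sketch of the classic greedy heuristic for the 0/1 knapsack problem, which picks items in order of value per unit weight. It is not the paper's statistical-physics algorithm; the function name and the example instance are assumptions made for illustration.

```python
def greedy_knapsack(values, weights, limit):
    """Pick items by decreasing value/weight ratio while they still fit."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] / weights[i],
        reverse=True,
    )
    chosen, total_weight, total_value = [], 0, 0
    for i in order:
        if total_weight + weights[i] <= limit:
            chosen.append(i)
            total_weight += weights[i]
            total_value += values[i]
    return sorted(chosen), total_value, total_weight

values = [60, 100, 120]
weights = [10, 20, 30]
result = greedy_knapsack(values, weights, limit=50)
print(result)
```

On this instance the greedy choice takes items 0 and 1 for a value of 160, while taking items 1 and 2 would fit within the limit and yield 220. This is the usual speed versus accuracy trade-off of greedy methods.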

Image classification · Dictionary learning · Sparse · Regularization term · Objective function

March 7, 2019

Label Embedded Dictionary Learning for Image Classification

Shuai Shao,Yan-Jiang Wang,Bao-Di Liu,Weifeng Liu

from arxiv, 9 pages, 13 figures

Recently, the label consistent K-SVD (LC-KSVD) algorithm has been successfully applied to image classification. The objective function of LC-KSVD consists of reconstruction error, classification error, and discriminative sparse codes error, with an $\ell_0$-norm sparse regularization term. The $\ell_0$-norm, however, leads to an NP-hard problem. Although some methods such as orthogonal matching pursuit can help solve this problem to some extent, it is quite difficult to find the optimal sparse solution. To overcome this limitation, we propose a label embedded dictionary learning (LEDL) method that uses the $\ell_1$-norm as the sparse regularization term, so that we can avoid the hard-to-optimize problem by solving a convex optimization problem. The alternating direction method of multipliers and a blockwise coordinate descent algorithm are then used to optimize the corresponding objective function. Extensive experimental results on six benchmark datasets illustrate that the proposed algorithm achieves superior performance compared to some conventional classification algorithms.
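One reason an $\ell_1$ penalty is easier to handle than an $\ell_0$ penalty is that its proximal operator has a closed form, soft-thresholding, which ADMM-style solvers commonly use as a building block. The sketch below shows only that generic operator. It is not the LEDL algorithm, and the function name is an assumption.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |.|_1, applied elementwise to a list of floats."""
    out = []
    for v in x:
        if v > t:
            out.append(v - t)   # shrink positive entries toward zero
        elif v < -t:
            out.append(v + t)   # shrink negative entries toward zero
        else:
            out.append(0.0)     # small entries become exactly zero (sparsity)
    return out

codes = soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0)
print(codes)
```

Entries whose magnitude is at most the threshold are set exactly to zero, which is how the $\ell_1$ term produces sparse codes without a combinatorial search.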

Piecewise · Image segmentation · Sparse · Cluster · Laplacian eigenmaps

May 20, 2018

Piecewise Flat Embedding for Image Segmentation

Chaowei Fang,Zicheng Liao,Yizhou Yu

We introduce a new multi-dimensional nonlinear embedding -- Piecewise Flat Embedding (PFE) -- for image segmentation. Based on the theory of sparse signal recovery, piecewise flat embedding with diverse channels attempts to recover a piecewise constant image representation with sparse region boundaries and sparse cluster value scattering. The resultant piecewise flat embedding exhibits interesting properties such as suppressing slowly varying signals, and offers an image representation with higher region identifiability which is desirable for image segmentation or high-level semantic analysis tasks. We formulate our embedding as a variant of the Laplacian Eigenmap embedding with an $L_{1,p} (0<p\leq1)$ regularization term to promote sparse solutions. First, we devise a two-stage numerical algorithm based on Bregman iterations to compute $L_{1,1}$-regularized piecewise flat embeddings. We further generalize this algorithm through iterative reweighting to solve the general $L_{1,p}$-regularized problem. To demonstrate its efficacy, we integrate PFE into two existing image segmentation frameworks, segmentation based on clustering and hierarchical segmentation based on contour detection. Experiments on four major benchmark datasets, BSDS500, MSRC, Stanford Background Dataset, and PASCAL Context, show that segmentation algorithms incorporating our embedding achieve significantly improved results.

Maximum mean discrepancy · Optimizer · Performer · CASES · tuning

January 30, 2018

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Ben Usman,Kate Saenko,Brian Kulis

from arxiv, ICLR 2018 Conference Invite to Workshop

Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
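As background for the MMD comparison mentioned above: with a linear kernel, the (biased) squared Maximum Mean Discrepancy between two samples reduces to the squared Euclidean distance between their sample means. The sketch below computes only that quantity on two made-up point clouds in the plane. It is not the dual adversarial objective studied in the paper, and the function names are assumptions.

```python
def sample_mean(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n, m = len(points), len(points[0])
    return [sum(p[k] for p in points) / n for k in range(m)]

def linear_mmd_squared(xs, ys):
    """Biased squared MMD with kernel k(x, y) = <x, y>: ||mean(xs) - mean(ys)||^2."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))

source = [[0.0, 0.0], [2.0, 0.0]]
target = [[1.0, 1.0], [1.0, 3.0]]
gap = linear_mmd_squared(source, target)
print(gap)
```

Because this quantity only sees the difference in means, two clouds with equal means but different shapes get a value of zero, which is one reason richer discriminators or kernels are used in practice.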