We construct a 2-categorical extension of the relative entropy functor of Baez and Fritz, and show that our construction is functorial with respect to vertical morphisms. Moreover, we show that such a `2-relative entropy' satisfies natural 2-categorical analogues of convex linearity, vanishing under optimal hypotheses, and lower semicontinuity. While relative entropy is a relative measure of information between probability distributions, we view our construction as a relative measure of information between channels.
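For orientation, the classical quantity being lifted here is the relative entropy of finite probability distributions $p$ and $q$,
\[
D(p\,\|\,q) \;=\; \sum_{i} p_i \log\frac{p_i}{q_i},
\]
which Baez and Fritz characterize functorially; the 2-categorical construction plays the analogous role one dimension higher, with channels in place of distributions.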
The optimality and sensitivity of the empirical risk minimization problem with relative entropy regularization (ERM-RER) are investigated for the case in which the reference measure is sigma-finite rather than a probability measure. This generalization allows greater flexibility in the incorporation of prior knowledge over the set of models. In this setting, the interplay among the regularization parameter, the reference measure, the risk function, and the empirical risk induced by the solution of the ERM-RER problem is characterized. This characterization yields necessary and sufficient conditions for the existence of a regularization parameter that achieves an arbitrarily small empirical risk with arbitrarily high probability. The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is then studied and used to provide upper and lower bounds on the expected empirical risk. Moreover, it is shown that the expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
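Schematically, with illustrative notation ($\mathsf{L}$ for the empirical risk induced by the dataset, $Q$ for the sigma-finite reference measure, and $\lambda > 0$ for the regularization parameter), the ERM-RER problem reads
\[
\min_{P \ll Q} \;\int \mathsf{L}(\theta)\,\mathrm{d}P(\theta) \;+\; \lambda\, D(P\,\|\,Q),
\]
and, when the normalizing constant is finite, its solution is the Gibbs measure $\tfrac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(\theta) \propto \exp\!\big(-\mathsf{L}(\theta)/\lambda\big)$.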
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. Leveraging the entropy regularization, our theoretical analysis shows that the Lagrangian dual function is smooth and that the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs and prove that it achieves a global convergence rate of $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation. A discussion of a linear convergence rate for CMDPs with a single constraint is also provided.
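In schematic form (symbols illustrative), the problem and its Lagrangian dual are
\[
\max_{\pi}\; V_r(\pi) + \tau\,\mathcal{H}(\pi) \;\;\text{s.t.}\;\; V_u(\pi) \ge b,
\qquad
D(\lambda) \;=\; \max_{\pi}\; V_r(\pi) + \tau\,\mathcal{H}(\pi) + \lambda\big(V_u(\pi) - b\big),
\]
where $V_r$ and $V_u$ denote the reward and utility value functions, $\tau > 0$ is the entropy-regularization temperature, and $\lambda \ge 0$ is the dual variable; dual descent then performs (accelerated) gradient steps on $D(\lambda)$.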
Let $\mathbf{X}$ be a random variable uniformly distributed on the discrete cube $\{ -1,1\} ^{n}$, and let $T_{\rho}$ be the noise operator acting on Boolean functions $f:\{ -1,1\} ^{n}\to\{ 0,1\} $, where $\rho\in[0,1]$ is the noise parameter, representing the correlation coefficient between each coordinate of $\mathbf{X}$ and its noise-corrupted version. Given a convex function $\Phi$ and the mean $\mathbb{E}f(\mathbf{X})=a\in[0,1]$, which Boolean function $f$ maximizes the $\Phi$-stability $\mathbb{E}[\Phi(T_{\rho}f(\mathbf{X}))]$ of $f$? Special cases of this problem include the (symmetric and asymmetric) $\alpha$-stability problems and the "Most Informative Boolean Function" problem. In this paper, we provide several upper bounds for the maximal $\Phi$-stability. Specializing $\Phi$ to particular forms and applying these upper bounds, we partially resolve Mossel and O'Donnell's conjecture on $\alpha$-stability with $\alpha>2$, Li and M\'edard's conjecture on $\alpha$-stability with $1<\alpha<2$, and Courtade and Kumar's conjecture on the "Most Informative Boolean Function", which corresponds to a conjecture on $\alpha$-stability with $\alpha=1$. Our proofs are based on discrete Fourier analysis, optimization theory, and improvements of the Friedgut--Kalai--Naor (FKN) theorem. Our improvements of the FKN theorem are sharp or asymptotically sharp in certain cases.
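Concretely, $T_{\rho}$ is the standard noise operator
\[
T_{\rho} f(x) \;=\; \mathbb{E}\big[f(\mathbf{Y}) \,\big|\, \mathbf{X} = x\big],
\]
where each coordinate $Y_i$ of $\mathbf{Y}$ independently equals $x_i$ with probability $\tfrac{1+\rho}{2}$ and $-x_i$ with probability $\tfrac{1-\rho}{2}$, so that $\mathbb{E}[X_i Y_i] = \rho$.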
This work considers a super-resolution framework for overcomplete tensor decomposition. Specifically, we view tensor decomposition as a super-resolution problem of recovering a sum of Dirac measures on the sphere, and solve it by minimizing a continuous analog of the $\ell_1$ norm on the space of measures. The optimal value of this optimization defines the tensor nuclear norm. In analogy with the separation condition in super-resolution, we develop incoherence conditions on the tensor factors, via an explicit construction of a dual certificate, under which the factors form the unique optimal solution of the continuous analog of $\ell_1$ norm minimization. Remarkably, the derived incoherence conditions are satisfied with high probability by random tensor factors uniformly distributed on the sphere, implying global identifiability of random tensor factors.
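Schematically, for a symmetric order-3 tensor $\mathcal{T}$ (the order is fixed here purely for illustration, and the notation is ours), the program is total-variation minimization over signed measures on the sphere,
\[
\min_{\mu}\; \|\mu\|_{\mathrm{TV}} \quad \text{s.t.} \quad \int_{\mathbb{S}^{d-1}} x^{\otimes 3}\, \mathrm{d}\mu(x) \;=\; \mathcal{T},
\]
whose optimal value gives the tensor nuclear norm; the sought decomposition corresponds to an atomic minimizer $\mu^{\star} = \sum_i \lambda_i\, \delta_{x_i}$.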
The channel output entropy of a transmitted word is the entropy of the possible channel outputs, and similarly the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the $k$-deletion and $k$-insertion channels, in which exactly $k$ symbols are deleted from, or inserted into, the transmitted word, respectively. If all possible words are transmitted with the same probability, then studying the input and output entropies is equivalent. For both the 1-insertion and 1-deletion channels, it is proved that among all words with a fixed number of runs, the input entropy is minimized for words with a skewed distribution of run lengths and maximized for words with a balanced distribution of run lengths. Among our results, we prove a conjecture of Atashpendar et al. stating that for the binary 1-deletion channel, the input entropy is maximized by the alternating words.
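As a small illustration of the output-entropy side, consider the binary 1-deletion channel under the assumption that the deleted position is uniform over the $n$ positions (a minimal sketch under that model). Deleting any symbol within a run yields the same output, and distinct runs yield distinct outputs, so the output entropy depends only on the run-length distribution:

```python
from math import log2
from itertools import groupby

def one_deletion_output_entropy(word: str) -> float:
    """Entropy (bits) of the 1-deletion channel output when the deleted
    position is uniform over the n positions of `word`. Deleting any
    symbol inside the same run yields the same output, so each run of
    length l contributes an output word of probability l/n."""
    n = len(word)
    run_lengths = [len(list(g)) for _, g in groupby(word)]
    return -sum((l / n) * log2(l / n) for l in run_lengths)

# Alternating words maximize the number of runs, hence the entropy:
print(one_deletion_output_entropy("10101010"))  # 3.0 bits (8 runs of length 1)
print(one_deletion_output_entropy("11110000"))  # 1.0 bit  (2 runs of length 4)
```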
In this paper we study the number $r_{bwt}$ of equal-letter runs produced by the Burrows-Wheeler transform ($BWT$) when it is applied to purely morphic finite words, which are words generated by iterating prolongable morphisms. The parameter $r_{bwt}$ is significant because it measures the performance of the $BWT$ in terms of both compressibility and indexing. In particular, we prove that, when the $BWT$ is applied to any purely morphic finite word on a binary alphabet, $r_{bwt}$ is $\mathcal{O}(\log n)$, where $n$ is the length of the word. Moreover, we prove that $r_{bwt}$ is $\Theta(\log n)$ for the binary words generated by a large class of prolongable binary morphisms. These bounds are proved by establishing new structural properties of the \emph{bispecial circular factors} of such words.
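A quick way to experiment with $r_{bwt}$ on small inputs is the following naive quadratic-time sketch; the Thue-Morse prefix below (generated by the prolongable morphism $0 \mapsto 01$, $1 \mapsto 10$) is our own illustrative example, not taken from the paper.

```python
def bwt(s: str) -> str:
    """Naive Burrows-Wheeler transform via sorted rotations,
    using '$' as a terminator that sorts below '0' and '1'."""
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def r_bwt(s: str) -> int:
    """Number of equal-letter runs in the BWT of s."""
    b = bwt(s)
    return 1 + sum(b[i] != b[i - 1] for i in range(1, len(b)))

tm = "0110100110010110"  # 16-symbol prefix of the Thue-Morse word
print(r_bwt(tm))
```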
In this paper, we report our method for the Information Extraction task of the 2019 Language and Intelligence Challenge. We incorporate BERT into the multi-head selection framework for joint entity-relation extraction. This model extends existing approaches in three ways. First, BERT is adopted as a feature extraction layer at the bottom of the multi-head selection framework, and we further optimize BERT by introducing a semantic-enhanced task during BERT pre-training. Second, we introduce a large-scale Baidu Baike corpus for entity recognition pre-training; since this corpus carries no gold named-entity labels, this amounts to weakly supervised learning. Third, soft label embedding is proposed to effectively transmit information between entity recognition and relation extraction. Combining these three contributions, we enhance the information extraction ability of the multi-head selection model and achieve an F1-score of 0.876 on testset-1 with a single model. By ensembling four variants of our model, we finally achieve an F1-score of 0.892 (1st place) on testset-1 and of 0.8924 (2nd place) on testset-2.
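The soft-label idea can be illustrated as follows. This is a minimal sketch of the concept (all names and dimensions are hypothetical), not the paper's exact implementation: rather than embedding the argmax NER tag, the full tag distribution is embedded, keeping the path from relation extraction back into entity recognition differentiable.

```python
import torch
import torch.nn as nn

class SoftLabelEmbedding(nn.Module):
    """Sketch: embed the NER tag *distribution* instead of the hard
    argmax tag, so relation-extraction gradients flow into the tagger."""
    def __init__(self, num_labels: int, dim: int):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, dim)

    def forward(self, ner_logits: torch.Tensor) -> torch.Tensor:
        # ner_logits: (batch, seq_len, num_labels)
        probs = ner_logits.softmax(dim=-1)
        # Probability-weighted mixture of label embeddings:
        return probs @ self.label_emb.weight  # (batch, seq_len, dim)

soft = SoftLabelEmbedding(num_labels=9, dim=64)
out = soft(torch.randn(2, 10, 9))  # -> shape (2, 10, 64)
```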
Our goal in this work is to train an image captioning model that generates denser and more informative captions. We introduce "relational captioning," a novel image captioning task that aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both the diversity and the amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e., subject-object-predicate categories) tags can be assigned to every English word, and we leverage POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet), which consists of three recurrent units, one per POS category, and jointly performs POS prediction and captioning. We demonstrate the more diverse and richer representations generated by the proposed model against several baselines and competing methods.
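One way such a triple-stream module could be wired is sketched below; all dimensions and names are hypothetical, and this is not claimed to reproduce MTTSNet.

```python
import torch
import torch.nn as nn

class TripleStreamDecoder(nn.Module):
    """Sketch of a multi-task triple-stream decoder: one recurrent unit
    per POS role (subject / predicate / object), whose concatenated
    states feed a joint POS classifier and a word classifier."""
    def __init__(self, vocab: int, emb: int = 128, hid: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.streams = nn.ModuleList(
            [nn.LSTM(emb, hid, batch_first=True) for _ in range(3)])
        self.pos_head = nn.Linear(3 * hid, 3)       # POS category logits
        self.word_head = nn.Linear(3 * hid, vocab)  # next-word logits

    def forward(self, tokens: torch.Tensor):
        x = self.embed(tokens)                      # (B, T, emb)
        hs = [lstm(x)[0] for lstm in self.streams]  # 3 x (B, T, hid)
        h = torch.cat(hs, dim=-1)                   # (B, T, 3*hid)
        return self.pos_head(h), self.word_head(h)

dec = TripleStreamDecoder(vocab=1000)
pos_logits, word_logits = dec(torch.randint(0, 1000, (2, 7)))
# Training would sum POS and word cross-entropy losses (multi-task).
```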
We propose a novel regularizer to improve the training of Generative Adversarial Networks (GANs). The motivation is that when the discriminator D spreads out its model capacity in the right way, the learning signals given to the generator G are more informative and diverse. These, in turn, help G explore and discover the real data manifold while avoiding the large unstable jumps caused by erroneous extrapolations of D. Our regularizer guides the rectifier discriminator D to better allocate its model capacity by encouraging the binary activation patterns on selected internal layers of D to have high joint entropy. Experimental results on both synthetic data and real datasets demonstrate improvements in the stability and convergence speed of GAN training, as well as higher sample quality. The approach also leads to higher classification accuracy in semi-supervised learning.
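Since the exact joint entropy of high-dimensional binary patterns is intractable, any practical regularizer must rely on surrogates. The sketch below is a proxy of our own wording, not the paper's exact formulation: it encourages each unit to fire on about half the batch and decorrelates pairs of units, both of which push the pattern distribution toward high joint entropy.

```python
import torch

def activation_entropy_regularizer(h: torch.Tensor) -> torch.Tensor:
    """Proxy regularizer (sketch) pushing the binary activation patterns
    of a rectifier layer toward high joint entropy.
    h: (batch, units) pre-activations of a selected layer of D.
    Two differentiable surrogates: (i) each unit fires on ~half the
    batch (high marginal entropy); (ii) units are pairwise decorrelated."""
    s = torch.tanh(h)                       # soft sign in (-1, 1)
    marginal = s.mean(dim=0).pow(2).mean()  # (i) mean sign near 0 per unit
    c = (s.t() @ s) / s.shape[0]            # (ii) unit-unit correlations
    off_diag = c - torch.diag(torch.diagonal(c))
    pairwise = off_diag.pow(2).mean()
    return marginal + pairwise

# Added to the discriminator loss with a small weight, e.g.:
# d_loss = gan_loss + 0.01 * activation_entropy_regularizer(hidden)
```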
Person re-identification (\textit{re-id}) refers to matching pedestrians across disjoint camera views. The most effective way to match pedestrians undergoing significant visual variations is to seek reliably invariant features that describe the person of interest faithfully. Most existing methods are supervised: they produce discriminative features by relying on labeled image pairs in correspondence. However, annotating pair-wise images is prohibitively labor-expensive, and thus impractical for large-scale camera networks. Moreover, seeking comparable representations across camera views demands a flexible model that can address the complex distributions of images. In this work, we study the co-occurrence statistical patterns between pairs of images and propose the crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution over cross-image representations in an unsupervised manner. Given a pair of person images, the proposed model consists of a variational auto-encoder that encodes the pair into respective latent variables, a cross-view alignment module that reduces the view disparity, and an adversarial layer that seeks the joint distribution of the latent representations. The learned latent representations are well aligned and reflect the co-occurrence patterns of paired images. We empirically evaluate the proposed model on challenging datasets; our results show the importance of joint invariant features in improving the matching rates of person re-id in comparison to semi-supervised and unsupervised state-of-the-art methods.
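The three ingredients can be sketched as follows; dimensions, names, and loss choices are hypothetical placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class PairEncoder(nn.Module):
    """Sketch of the three named components: a VAE encoder applied to
    each image of the pair, a cross-view alignment penalty on the
    latents, and a discriminator ("adversarial layer") scoring the
    joint latent code."""
    def __init__(self, feat: int = 512, z: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * z))  # mean and log-var
        self.disc = nn.Sequential(nn.Linear(2 * z, 128), nn.ReLU(),
                                  nn.Linear(128, 1))     # adversarial layer

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.

    def forward(self, xa: torch.Tensor, xb: torch.Tensor):
        za, zb = self.encode(xa), self.encode(xb)
        align = (za - zb).pow(2).mean()  # cross-view alignment penalty
        joint_score = self.disc(torch.cat([za, zb], dim=-1))
        return za, zb, align, joint_score

model = PairEncoder()
za, zb, align, score = model(torch.randn(4, 512), torch.randn(4, 512))
```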