2020久久精品亚洲热综合_日本成年黄色一区二区三区_亚洲AV日韩AV天堂影片人人网_中文字幕在线人成视频欧美_18分钟破外女在线高清在线_色爱AV综合网国产精品_色欲AV无码一区二区蜜桃臀

Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains (sometimes, even worse) on resource-rich NMT on par with its Random-Initialization (RI) counterpart. We take the first step to investigate the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves NOT the accuracy, but the generalization by achieving flatter loss landscapes than that of RI; 2) PT improves NOT the confidence of lexical choice, but the negative diversity by assigning smoother lexical probability distributions than that of RI. Based on these insights, we propose to combine their complementarities with a model fusion algorithm that utilizes optimal transport to align neurons between PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI could be nicely complementary to each other, achieving substantial improvements considering both translation accuracy, generalization, and negative diversity. Probing tools and code are released at: //github.com/zanchangtong/PTvsRI.

相關內容

Machine Translation

關注 209

機器翻譯（Machine Translation）涵蓋計算語言學和語言工程的所有分支，包含多語言方面。特色論文涵蓋理論，描述或計算方面的任何下列主題:雙語和多語語料庫的編寫和使用，計算機輔助語言教學，非羅馬字符集的計算含義，連接主義翻譯方法，對比語言學等。官網地址：

語言模型化 · MoDELS · 詞元分析器 · 輸出 · state-of-the-art ·

2022 年 11 月 28 日

Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

Mirelle Bueno,Carlos Gemmell,Jeffrey Dalton,Roberto Lotufo,Rodrigo Nogueira

The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure. Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation. First, we induce a language model to produce step-by-step rationales before outputting the answer to effectively communicate the task to the model. However, as sequences become longer, we find that current models struggle to keep track of token positions. To address this issue, we interleave output tokens with markup tokens that act as explicit positional and counting symbols. Our findings show how these two complementary approaches enable remarkable sequence extrapolation and highlight a limitation of current architectures to effectively generalize without explicit surface form guidance. Code available at //github.com/MirelleB/induced-rationales-markup-tokens

Guidance · INFORMS · Integration · Performer · Branch ·

2022 年 11 月 25 日

Mutual Guidance and Residual Integration for Image Enhancement

Kun Zhou,KenKun Liu,Wenbo Li,Xiaoguang Han,Jiangbo Lu

from arxiv, 17 pages, 15 figures

Previous studies show the necessity of global and local adjustment for image enhancement. However, existing convolutional neural networks (CNNs) and transformer-based models face great challenges in balancing the computational efficiency and effectiveness of global-local information usage. Especially, existing methods typically adopt the global-to-local fusion mode, ignoring the importance of bidirectional interactions. To address those issues, we propose a novel mutual guidance network (MGN) to perform effective bidirectional global-local information exchange while keeping a compact architecture. In our design, we adopt a two-branch framework where one branch focuses more on modeling global relations while the other is committed to processing local information. Then, we develop an efficient attention-based mutual guidance approach throughout our framework for bidirectional global-local interactions. As a result, both the global and local branches can enjoy the merits of mutual information aggregation. Besides, to further refine the results produced by our MGN, we propose a novel residual integration scheme following the divide-and-conquer philosophy. The extensive experiments demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance on several public image enhancement benchmarks.

Learning · Performer · 可理解性 · 分解的 · Prompt ·

2022 年 11 月 25 日

Complementary Explanations for Effective In-Context Learning

Xi Ye,Srinivasan Iyer,Asli Celikyilmaz,Ves Stoyanov,Greg Durrett,Ramakanth Pasunuru

Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts. Yet, there has been limited understanding of what makes explanations effective for in-context learning. This work aims to better understand the mechanisms by which explanations are used for in-context learning. We first study the impact of two different factors on prompting performance when using explanations: the computation trace (the way the solution is decomposed) and the natural language of the prompt. By perturbing explanations on three controlled tasks, we show that both factors contribute to the effectiveness of explanations, indicating that LLMs do faithfully follow the explanations to some extent. We further study how to form maximally effective sets of explanations for solving a given test query. We find that LLMs can benefit from the complementarity of the explanation set as they are able to fuse different reasoning specified by individual exemplars in prompts. Additionally, having relevant exemplars also contributes to more effective prompts. Therefore, we propose a maximal-marginal-relevance-based exemplar selection approach for constructing exemplar sets that are both relevant as well as complementary, which successfully improves the in-context learning performance across three real-world tasks on multiple LLMs.

結點 · 圖 · Performer · 圖形處理器 · MoDELS ·

2022 年 11 月 25 日

Let Graph be the Go Board: Gradient-free Node Injection Attack for Graph Neural Networks via Reinforcement Learning

Mingxuan Ju,Yujie Fan,Chuxu Zhang,Yanfang Ye

from arxiv, AAAI 2023. v2: update acknowledgement section. arXiv admin note: substantial text overlap with arXiv:2202.09389

Graph Neural Networks (GNNs) have drawn significant attentions over the years and been broadly applied to essential applications requiring solid robustness or vigorous security standards, such as product recommendation and user behavior modeling. Under these scenarios, exploiting GNN's vulnerabilities and further downgrading its performance become extremely incentive for adversaries. Previous attackers mainly focus on structural perturbations or node injections to the existing graphs, guided by gradients from the surrogate models. Although they deliver promising results, several limitations still exist. For the structural perturbation attack, to launch a proposed attack, adversaries need to manipulate the existing graph topology, which is impractical in most circumstances. Whereas for the node injection attack, though being more practical, current approaches require training surrogate models to simulate a white-box setting, which results in significant performance downgrade when the surrogate architecture diverges from the actual victim model. To bridge these gaps, in this paper, we study the problem of black-box node injection attack, without training a potentially misleading surrogate model. Specifically, we model the node injection attack as a Markov decision process and propose Gradient-free Graph Advantage Actor Critic, namely G2A2C, a reinforcement learning framework in the fashion of advantage actor critic. By directly querying the victim model, G2A2C learns to inject highly malicious nodes with extremely limited attacking budgets, while maintaining a similar node feature distribution. Through our comprehensive experiments over eight acknowledged benchmark datasets with different characteristics, we demonstrate the superior performance of our proposed G2A2C over the existing state-of-the-art attackers. Source code is publicly available at: //github.com/jumxglhf/G2A2C}.

知識 (knowledge) · Machine Learning · MoDELS · 學成 · Conformer ·

2022 年 5 月 10 日

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Julian W?rmann,Daniel Bogdoll,Etienne Bührle,Han Chen,Evaristus Fuh Chuo,Kostadin Cvejoski,Ludger van Elst,Tobias Glei?ner,Philip Gottschall,Stefan Griesche,Christian Hellert,Christian Hesels,Sebastian Houben,Tim Joseph,Niklas Keil,Johann Kelsch,Hendrik K?nigshof,Erwin Kraft,Leonie Kreuser,Kevin Krone,Tobias Latka,Denny Mattern,Stefan Matthes,Mohsin Munir,Moritz Nekolla,Adrian Paschke,Maximilian Alexander Pintz,Tianming Qiu,Faraz Qureishi,Syed Tahseen Raza Rizvi,J?rg Reichardt,Laura von Rueden,Stefan Rudolph,Alexander Sagel,Gerhard Schunk,Hao Shen,Hendrik Stapelbroek,Vera Stehr,Gurucharan Srinivas,Anh Tuan Tran,Abhishek Vivekanandan,Ya Wang,Florian Wasserrab,Tino Werner,Christian Wirth,Stefan Zwicklbauer

from arxiv, 93 pages

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Automator · 圖 · Extensibility · Machine Learning · 有向 ·

2022 年 1 月 4 日

Automated Graph Machine Learning: Approaches, Libraries and Directions

Xin Wang,Ziwei Zhang,Wenwu Zhu

from arxiv, 13 pages, 1 figures. arXiv admin note: substantial text overlap with arXiv:2103.00742

Graph machine learning has been extensively studied in both academic and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To tackle the challenge, automated graph machine learning, which aims at discovering the best hyper-parameter and neural architecture configuration for different graph tasks/data without manual design, is gaining an increasing number of attentions from the research community. In this paper, we extensively discuss automated graph machine approaches, covering hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We briefly overview existing libraries designed for either graph machine learning or automated machine learning respectively, and further in depth introduce AutoGL, our dedicated and the world's first open-source library for automated graph machine learning. Last but not least, we share our insights on future research directions for automated graph machine learning. This paper is the first systematic and comprehensive discussion of approaches, libraries as well as directions for automated graph machine learning.

主動學習 · 自由能 · Extensibility · 學成 · TAP ·

2021 年 12 月 2 日

Active Learning for Domain Adaptation: An Energy-based Approach

Binhui Xie,Longhui Yuan,Shuang Li,Chi Harold Liu,Xinjing Cheng,Guoren Wang

from arxiv, Accepted by AAAI 2022. Code is available at //github.com/BIT-DA/EADA

Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped to reach the fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from an observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically reveal that a simple yet efficient energy-based sampling strategy sheds light on selecting the most valuable target samples than existing approaches requiring particular architectures or computation of the distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of targe data that incorporate both domain characteristic and instance uncertainty into every selection round. Meanwhile, by aligning the free energy of target data compact around the source domain via a regularization term, domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.

LayoutLM · INFORMS · 可理解性 · SCAN · MoDELS ·

2020 年 2 月 19 日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Yiheng Xu,Minghao Li,Lei Cui,Shaohan Huang,Furu Wei,Ming Zhou

from arxiv, Work in progress

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread of pre-training models for NLP applications, they almost focused on text-level manipulation, while neglecting the layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model the interaction between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage the image features to incorporate the visual information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at //github.com/microsoft/unilm/tree/master/layoutlm.

entity · 命名實體識別 · CASE · Extensibility · MoDELS ·

2019 年 11 月 14 日

Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources

Qianhui Wu,Zijia Lin,Guoxin Wang,Hui Chen,B?rje F. Karlsson,Biqing Huang,Chin-Yew Lin

from arxiv, This paper is accepted by AAAI2020

For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER). While all existing methods directly transfer from source-learned model to a target language, in this paper, we propose to fine-tune the learned model with a few similar examples given a test case, which could benefit the prediction by leveraging the structural and semantic information conveyed in such similar examples. To this end, we present a meta-learning algorithm to find a good model parameter initialization that could fast adapt to the given test case and propose to construct multiple pseudo-NER tasks for meta-training by computing sentence similarities. To further improve the model's generalization ability across different languages, we introduce a masking scheme and augment the loss function with an additional maximum term during meta-training. We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.

Transformer · 機器翻譯 ·

2019 年 10 月 17 日

[付費5元查看完整內容]Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

專知會員服務

專知，提供專業可信的知識分發服務，讓認知協作更快更好！

《Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation》K Murray, J Kinnison, T Q. Nguyen, W Scheirer, D Chiang [University of Notre Dame] (2019)

付費5元查看完整內容