2020久久精品亚洲热综合_无码精品视频一区二区免费_一区二区三区大少妇99_欧美一区二区AA视频在线观看_国产午夜不卡AV高清_羞羞网站在线观看_中文无码精品一区二区三区四季

from arxiv, Upcoming Publication, Gendered Technology in Translation and Interpreting Centering Rights in the Development of Language Technology, Routledge 2024

This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. We demonstrate through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors leading to linguistic erasure and representational harms, and conclude by discussing potential solutions towards uplifting languages by providing them more agency in MT conversations.

相關內容

Machine Translation

關注 209

機(ji)器翻譯（Machine Translation）涵蓋計(ji)算(suan)語(yu)言(yan)學和語(yu)言(yan)工程的(de)所有分支，包(bao)含(han)(han)多語(yu)言(yan)方面(mian)。特(te)色論文涵蓋理論，描述或計(ji)算(suan)方面(mian)的(de)任何下列主題:雙(shuang)語(yu)和多語(yu)語(yu)料庫的(de)編寫和使用，計(ji)算(suan)機(ji)輔(fu)助語(yu)言(yan)教(jiao)學，非(fei)羅馬(ma)字(zi)符集的(de)計(ji)算(suan)含(han)(han)義，連接(jie)主義翻譯方法，對比(bi)語(yu)言(yan)學等。官網(wang)地址：

語言模型化 · Prompt · MoDELS · CoT · 多樣性 ·

2024 年 4 月 23 日

Pattern-Aware Chain-of-Thought Prompting in Large Language Models

Yufeng Zhang,Xuepeng Wang,Lingxiang Wu,Jinqiao Wang

Chain-of-thought (CoT) prompting can guide language models to engage in complex multi-step reasoning. The quality of provided demonstrations significantly impacts the success of downstream inference tasks. While existing automated methods prioritize accuracy and semantics in these demonstrations, we show that the underlying reasoning patterns play a more crucial role in such tasks. In this paper, we propose Pattern-Aware CoT, a prompting method that considers the diversity of demonstration patterns. By incorporating patterns such as step length and reasoning process within intermediate steps, PA-CoT effectively mitigates the issue of bias induced by demonstrations and enables better generalization to diverse scenarios. We conduct experiments on nine reasoning benchmark tasks using two open-source LLMs. The results show that our method substantially enhances reasoning performance and exhibits robustness to errors. The code will be made publicly available.

詞向量表示 · Weight · MoDELS · INFORMS · 連續詞袋模型 ·

2024 年 4 月 23 日

Learning Word Embedding with Better Distance Weighting and Window Size Scheduling

Chaohao Yang

Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on large datasets. However, Word2Vec lacks consideration for distances between center and context words. We propose two novel methods, Learnable Formulated Weights (LFW) and Epoch-based Dynamic Window Size (EDWS), to incorporate distance information into two variants of Word2Vec, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model. For CBOW, LFW uses a formula with learnable parameters that best reflects the relationship of influence and distance between words to calculate distance-related weights for average pooling, providing insights for future NLP text modeling research. For Skip-gram, we improve its dynamic window size strategy to introduce distance information in a more balanced way. Experiments prove the effectiveness of LFW and EDWS in enhancing Word2Vec's performance, surpassing previous state-of-the-art methods.

Principle · MoDELS · 互信息 · 標注 · 任務對話系統 ·

2024 年 4 月 22 日

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Jan-Philipp Fr?nken,Eric Zelikman,Rafael Rafailov,Kanishk Gandhi,Tobias Gerstenberg,Noah D. Goodman

When prompting a language model (LM), users frequently expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles into a model can be resource-intensive and technically challenging, generally requiring human preference labels or examples. We introduce SAMI, a method for teaching a pretrained LM to follow behavioral principles that does not require any preference labels or demonstrations. SAMI is an iterative algorithm that finetunes a pretrained LM to increase the conditional mutual information between constitutions and self-generated responses given queries from a datasest. On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%. Strikingly, it also surpasses an instruction-finetuned baseline (mistral-7b-instruct) with win rates between 55% and 57% on single-turn dialogue. SAMI requires a "principle writer" model; to avoid dependence on stronger models, we further evaluate aligning a strong pretrained model (mixtral-8x7b) using constitutions written by a weak instruction-finetuned model (mistral-7b-instruct). The SAMI-trained mixtral-8x7b outperforms both the initial model and the instruction-finetuned model, achieving a 65% win rate on summarization. Our results indicate that a pretrained LM can learn to follow constitutions without using preference labels, demonstrations, or human oversight.

MoDELS · Continuity · 表示 · Performer · state-of-the-art ·

2024 年 4 月 22 日

Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations

Etienne Le Naour,Louis Serrano,Léon Migus,Yuan Yin,Ghislain Agoua,Nicolas Baskiotis,Patrick Gallinari,Vincent Guigue

We introduce a novel modeling approach for time series imputation and forecasting, tailored to address the challenges often encountered in real-world data, such as irregular samples, missing data, or unaligned measurements from multiple sensors. Our method relies on a continuous-time-dependent model of the series' evolution dynamics. It leverages adaptations of conditional, implicit neural representations for sequential data. A modulation mechanism, driven by a meta-learning algorithm, allows adaptation to unseen samples and extrapolation beyond observed time-windows for long-term predictions. The model provides a highly flexible and unified framework for imputation and forecasting tasks across a wide range of challenging scenarios. It achieves state-of-the-art performance on classical benchmarks and outperforms alternative time-continuous models.

Bug · Automator · MoDELS · 語言模型化 · Performer ·

2024 年 4 月 20 日

A Deep Dive into Large Language Models for Automated Bug Localization and Repair

Soneya Binta Hossain,Nan Jiang,Qiang Zhou,Xiaopeng Li,Wen-Hao Chiang,Yingjun Lyu,Hoan Nguyen,Omer Tripp

Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our approach uniquely employs LLMs to predict bug location at the token level and subsequently utilizes them for bug fixing. This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information and improved incorporation of inductive biases. We introduce Toggle: Token-Granulated Bug Localization and Repair, a comprehensive program repair framework that integrates a bug localization model, an adjustment unit, and a bug-fixing model. Toggle takes a buggy function as input and generates a complete corrected function. We investigate various styles of prompting to the bug fixing model to identify the most effective prompts that better utilize the inductive bias and significantly outperform others. Toggle achieves the new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark, and exhibits better and comparable performance on several other widely-used APR datasets, including Defects4J.

線性的 · binary · MoDELS · 量子計算 · 可約的 ·

2024 年 4 月 19 日

Conversion of Boolean and Integer FlatZinc Builtins to Quadratic or Linear Integer Problems

Armin Wolf

Constraint satisfaction or optimisation models -- even if they are formulated in high-level modelling languages -- need to be reduced into an equivalent format before they can be solved by the use of Quantum Computing. In this paper we show how Boolean and integer FlatZinc builtins over finite-domain integer variables can be equivalently reformulated as linear equations, linear inequalities or binary products of those variables, i.e. as finite-domain quadratic integer programs. Those quadratic integer programs can be further transformed into equivalent Quadratic Unconstrained Binary Optimisation problem models, i.e. a general format for optimisation problems to be solved on Quantum Computers especially on Quantum Annealers.

泛化理論 · 位置編碼 · 變換 · Attention · 可辨認的 ·

2024 年 4 月 18 日

Length Generalization of Causal Transformers without Position Encoding

Jie Wang,Tao Ji,Yuanbin Wu,Hang Yan,Tao Gui,Qi Zhang,Xuanjing Huang,Xiaoling Wang

Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms manipulating explicit position features, the success of Transformers without position encodings (NoPE) provides a new way to overcome the challenge. In this paper, we study the length generalization property of NoPE. We find that although NoPE can extend to longer sequences than the commonly used explicit position encodings, it still has a limited context length. We identify a connection between the failure of NoPE's generalization and the distraction of attention distributions. We propose a parameter-efficient tuning for searching attention heads' best temperature hyper-parameters, which substantially expands NoPE's context size. Experiments on long sequence language modeling, the synthetic passkey retrieval task and real-world long context tasks show that NoPE can achieve competitive performances with state-of-the-art length generalization algorithms. The source code is publicly accessible

自動問答 · 語言模型化 · 圖 · Performer · MoDELS ·

2021 年 5 月 27 日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Michihiro Yasunaga,Hongyu Ren,Antoine Bosselut,Percy Liang,Jure Leskovec

from arxiv, NAACL 2021. Code & data available at //github.com/michiyasunaga/qagnn

The problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) presents two challenges: given a QA context (question and answer choice), methods need to (i) identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA context and KG. In this work, we propose a new model, QA-GNN, which addresses the above challenges through two key innovations: (i) relevance scoring, where we use LMs to estimate the importance of KG nodes relative to the given QA context, and (ii) joint reasoning, where we connect the QA context and KG to form a joint graph, and mutually update their representations through graph neural networks. We evaluate QA-GNN on the CommonsenseQA and OpenBookQA datasets, and show its improvement over existing LM and LM+KG models, as well as its capability to perform interpretable and structured reasoning, e.g., correctly handling negation in questions.

小樣本學習 · MoDELS · 門控 · 圖 · 模型復雜度 ·

2021 年 4 月 27 日

Relational Learning with Gated and Attentive Neighbor Aggregator for Few-Shot Knowledge Graph Completion

Guanglin Niu,Yang Li,Chengguang Tang,Ruiying Geng,Jian Dai,Qiao Liu,Hao Wang,Jian Sun,Fei Huang,Luo Si

from arxiv, The full version of a paper accepted to SIGIR 2021

Aiming at expanding few-shot relations' coverage in knowledge graphs (KGs), few-shot knowledge graph completion (FKGC) has recently gained more research interests. Some existing models employ a few-shot relation's multi-hop neighbor information to enhance its semantic representation. However, noise neighbor information might be amplified when the neighborhood is excessively sparse and no neighbor is available to represent the few-shot relation. Moreover, modeling and inferring complex relations of one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N) by previous knowledge graph completion approaches requires high model complexity and a large amount of training instances. Thus, inferring complex relations in the few-shot scenario is difficult for FKGC models due to limited training instances. In this paper, we propose a few-shot relational learning with global-local framework to address the above issues. At the global stage, a novel gated and attentive neighbor aggregator is built for accurately integrating the semantics of a few-shot relation's neighborhood, which helps filtering the noise neighbors even if a KG contains extremely sparse neighborhoods. For the local stage, a meta-learning based TransH (MTransH) method is designed to model complex relations and train our model in a few-shot learning fashion. Extensive experiments show that our model outperforms the state-of-the-art FKGC approaches on the frequently-used benchmark datasets NELL-One and Wiki-One. Compared with the strong baseline model MetaR, our model achieves 5-shot FKGC performance improvements of 8.0% on NELL-One and 2.8% on Wiki-One by the metric Hits@10.

損失函數（機器學習） · 學習的學習 · 學成 · entity · 泛函 ·

2019 年 9 月 9 日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Jiawei Wu,Wenhan Xiong,William Yang Wang

from arxiv, 11pages, 5 figures, accepted to EMNLP 2019

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.