Improving neural machine translation (NMT) systems with prompting has achieved significant progress in recent years. In this work, we focus on how to integrate multiple types of knowledge into NMT models to enhance performance with prompting. We propose a unified framework that can effectively integrate multiple types of knowledge, including sentences, terminologies/phrases, and translation templates, into NMT models. We use these types of knowledge as prefix-prompts of the input for the encoder and decoder of NMT models to guide the translation process. The approach requires no changes to the model architecture and effectively adapts to domain-specific translation without retraining. Experiments on English-Chinese and English-German translation demonstrate that our approach significantly outperforms strong baselines, achieving high translation quality and terminology match accuracy.
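The prefix-prompt idea described in this abstract can be sketched as plain string assembly. This is a minimal illustration, not the authors' implementation: the separator tokens (`<sent>`, `<term>`, `<tmpl>`, `<src>`), the `Knowledge` container, and the choice to put source-side items in the encoder prefix and target-side items in the decoder prefix are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical separator tokens; the abstract does not name any.
SENT_TAG, TERM_TAG, TMPL_TAG, SRC_TAG = "<sent>", "<term>", "<tmpl>", "<src>"


@dataclass
class Knowledge:
    """Illustrative container for the three knowledge types in the abstract."""
    sentences: List[Tuple[str, str]] = field(default_factory=list)  # (source, target) pairs
    terms: List[Tuple[str, str]] = field(default_factory=list)      # (source, target) pairs
    template: Optional[str] = None                                  # translation template


def build_encoder_input(source: str, k: Knowledge) -> str:
    """Prepend source-side knowledge to the sentence to be translated."""
    parts: List[str] = []
    for src, _ in k.sentences:
        parts += [SENT_TAG, src]
    for src, _ in k.terms:
        parts += [TERM_TAG, src]
    if k.template:
        parts += [TMPL_TAG, k.template]
    parts += [SRC_TAG, source]
    return " ".join(parts)


def build_decoder_prefix(k: Knowledge) -> str:
    """Target-side knowledge that the decoder is forced to start from."""
    parts: List[str] = []
    for _, tgt in k.sentences:
        parts += [SENT_TAG, tgt]
    for _, tgt in k.terms:
        parts += [TERM_TAG, tgt]
    return " ".join(parts)
```

Because the knowledge is carried entirely by the input strings, a sketch like this leaves the model architecture untouched, which matches the claim in the abstract that no architectural changes are needed.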

Related content

Integration

Integration: Integration, the VLSI Journal. Publisher: Elsevier. SIT:

Estimation/Estimator · APT · Normalized · Unbiased · Likelihood

January 30, 2024

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Xiliang Yang,Yifei Xiong,Zhijian He

from arxiv, 28 pages, 4 figures

Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. They are devoted to learning the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs notably well and scales to high-dimensional data. However, the APT method requires computing an expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic APT was proposed to solve this by discretizing the normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we propose a nested APT method to estimate the involved nested expectation instead. This facilitates establishing the convergence analysis. Since the nested estimators for the loss function and its gradient are biased, we make use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To further reduce the excessive variance of the unbiased estimators, this paper also develops some truncated MLMC estimators by taking account of the trade-off between the bias and the average cost. Numerical experiments for approximating complex multimodal posteriors in moderate dimensions are provided.
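The debiasing step mentioned in this abstract can be illustrated on the simplest nested quantity, log E[g]. The sketch below uses one common construction, a single-term randomized MLMC estimator with antithetic level differences. It is not claimed to be the estimator from the paper; the function name, the level distribution (geometric with ratio `r`), and the toy integrand in the usage note are assumptions.

```python
import math
import random


def unbiased_log_mean(sample_g, rng, r=2.0 ** -1.5):
    """One draw of a randomized MLMC estimator of log E[g], for g > 0.

    sample_g(rng) must return one positive sample of g.
    The level is random with P(level = l) = (1 - r) * r**l.
    """
    level = 0
    while rng.random() < r:
        level += 1
    prob = (1.0 - r) * r ** level

    g = [sample_g(rng) for _ in range(2 ** level)]
    if level == 0:
        delta = math.log(g[0])
    else:
        half = len(g) // 2
        fine = math.log(sum(g) / len(g))
        # Antithetic coarse term: average of the two half-sample estimates.
        coarse = 0.5 * (math.log(sum(g[:half]) / half)
                        + math.log(sum(g[half:]) / half))
        delta = fine - coarse
    # Dividing by the level probability makes the telescoping sum unbiased.
    return delta / prob
```

As a usage example, averaging many draws with `sample_g = lambda rng: 1.0 + rng.random()` should approach log(1.5), whereas the plain average of `log(g)` over single samples is biased downward. Choosing `r` between 2^-2 and 2^-1 is the usual way to keep both the variance and the expected cost finite for differences of this kind.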

Learning · MoDELS · Semantic similarity · MNIST (dataset) · Interpretability

January 29, 2024

Autoencoder-Based Domain Learning for Semantic Communication with Conceptual Spaces

Dylan Wheeler,Balasubramaniam Natarajan

from arxiv, 6 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Communication with the goal of accurately conveying meaning, rather than accurately transmitting symbols, has become an area of growing interest. This paradigm, termed semantic communication, typically leverages modern developments in artificial intelligence and machine learning to improve the efficiency and robustness of communication systems. However, a standard model for capturing and quantifying the details of "meaning" is lacking, with many leading approaches to semantic communication adopting a black-box framework with little understanding of what exactly the model is learning. One solution is to utilize the conceptual spaces framework, which models meaning explicitly in a geometric manner. Though prior work studying semantic communication with conceptual spaces has shown promising results, these previous attempts involve hand-crafting a conceptual space model, severely limiting the scalability and practicality of the approach. In this work, we develop a framework for learning a domain of a conceptual space model using only the raw data with high-level property labels. In experiments using the MNIST and CelebA datasets, we show that the domains learned using the framework maintain semantic similarity relations and possess interpretable dimensions.

Performer · Language modeling · Knowledge · Large language models · MoDELS

January 29, 2024

Knowledge-Aware Code Generation with Large Language Models

Tao Huang,Zhihong Sun,Zhi Jin,Ge Li,Chen Lyu

from arxiv, 12 pages, 7 figures

Large Language Models (LLMs) perform well on basic programming problems. However, they encounter challenges when dealing with complex tasks involving the use of diverse algorithmic and data structure skills, particularly programming competition-level problems. Notably, ChatGPT exhibits proficient performance on problems it has encountered during its pre-training phase, but this performance deteriorates when faced with novel problems. Consequently, enhancing the ability of LLMs to address unfamiliar problems has emerged as a pivotal research focus. The problem-solving process of LLMs mirrors human programmers' approach to a certain extent. When confronted with new programming tasks, human programmers engage in task planning and code writing with the previously acquired knowledge about algorithms and data structures. Despite having learned such knowledge, LLMs struggle to effectively apply it when faced with specific new problems. To address this issue, we constructed a novel dataset, CodeF, which contains a portion of programming problems that ChatGPT has not previously encountered. Furthermore, we developed a Knowledge Library tailored for Python programming contest problems and introduced the concept of Knowledge-Aware Code Generation (KareCoder). KareCoder bolsters the models' understanding and problem-solving capabilities by integrating prompt and knowledge from the library into the LLMs' code generation reasoning process, especially on Pass@1 metrics. Upon testing on the CodeF and APPS datasets, KareCoder demonstrated outstanding performance in handling novel problems previously unencountered by LLMs. In contrast with the code directly generated by ChatGPT, KareCoder achieved a relative improvement of 23.3% on the Pass@1 metric on the CodeF post2021-9 dataset. Additionally, it performs well compared to other methods when dealing with problems that LLMs have previously encountered.
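The abstract reports results as Pass@1 and as a relative improvement. It does not say how Pass@1 was computed, so the sketch below assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass all tests. The function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that pass, k: budget being evaluated.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, e.g. 0.2 means 20% better."""
    return (new - base) / base
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass. A relative improvement is measured against the baseline score, so it is not the same as a difference in percentage points.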

state-of-the-art · Analysis · Automator · Identifiable · Reducible

January 26, 2024

Accelerating Patch Validation for Program Repair with Interception-Based Execution Scheduling

Yuan-An Xiao,Chenyang Yang,Bo Wang,Yingfei Xiong

Long patch validation time is a limiting factor for automated program repair (APR). Though the duality between patch validation and mutation testing is recognized, so far there exists no study of systematically adapting mutation testing techniques to general-purpose patch validation. To address this gap, we investigate existing mutation testing techniques and identify five classes of acceleration techniques that are suitable for general-purpose patch validation. Among them, mutant schemata and mutant deduplication have not been adapted to general-purpose patch validation due to the arbitrary changes that third-party APR approaches may introduce. This presents two problems for adaptation: 1) the difficulty of implementing the static equivalence analysis required by the state-of-the-art mutant deduplication approach; 2) the difficulty of capturing the changes of patches to the system state at runtime. To overcome these problems, we propose two novel approaches: 1) execution scheduling, which detects the equivalence between patches online, avoiding the static equivalence analysis and its imprecision; 2) interception-based instrumentation, which intercepts the changes of patches to the system state, avoiding a full interpreter and its overhead. Based on the contributions above, we implement ExpressAPR, a general-purpose patch validator for Java that integrates all recognized classes of techniques suitable for patch validation. Our large-scale evaluation with four APR approaches shows that ExpressAPR accelerates patch validation by 137.1x over plain validation or 8.8x over the state-of-the-art approach, making patch validation no longer the time bottleneck of APR. Patch validation time for a single bug can be reduced to within a few minutes on mainstream CPUs.

MoDELS · Performer · CASE · Shaping · Learning

January 25, 2024

The Case for Co-Designing Model Architectures with Hardware

Quentin Anthony,Jacob Hatef,Deepak Narayanan,Stella Biderman,Stas Bekman,Junqi Yin,Aamir Shafi,Hari Subramoni,Dhabaleswar Panda

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set of guidelines for users to maximize the runtime performance of their transformer models. These guidelines have been created by carefully considering the impact of various model hyperparameters controlling model shape on the efficiency of the underlying computation kernels executed on the GPU. We find the throughput of models with efficient model shapes is up to 39% higher while preserving accuracy compared to models with a similar number of parameters but with unoptimized shapes.

Med-PaLM 2 · Performer · Language modeling · MoDELS · Question answering

May 16, 2023

Towards Expert-Level Medical Question Answering with Large Language Models

Karan Singhal,Tao Tu,Juraj Gottweis,Rory Sayres,Ellery Wulczyn,Le Hou,Kevin Clark,Stephen Pfohl,Heather Cole-Lewis,Darlene Neal,Mike Schaekermann,Amy Wang,Mohamed Amin,Sami Lachgar,Philip Mansfield,Sushant Prakash,Bradley Green,Ewa Dominowska,Blaise Aguera y Arcas,Nenad Tomasev,Yun Liu,Renee Wong,Christopher Semturs,S. Sara Mahdavi,Joelle Barral,Dale Webster,Greg S. Corrado,Yossi Matias,Shekoofeh Azizi,Alan Karthikesalingam,Vivek Natarajan

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

INFORMS · Graph · Structured learning · Extensibility · Learned

December 16, 2021

Graph Structure Learning with Variational Information Bottleneck

Qingyun Sun,Jianxin Li,Hao Peng,Jia Wu,Xingcheng Fu,Cheng Ji,Philip S. Yu

from arxiv, Accepted by AAAI 2022, Preprint version with Appendix

Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which could even degrade the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, from the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate the superior effectiveness and robustness of VIB-GSL.

INFORMS · Identifiable · Networking · Neural Networks · Black box

October 4, 2021

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

Yang Zhang,Ashkan Khakzar,Yawei Li,Azade Farshad,Seong Tae Kim,Nassir Navab

from arxiv, Accepted in NeurIPS 2021 (Neural Information Processing Systems)

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features was recently proposed as a proxy for the measure of their importance. So far, the predictive information has only been identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

Graph · Link prediction · Orthogonal · Knowledge graph · Better

April 15, 2020

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Yun Tang,Jing Huang,Guangtao Wang,Xiaodong He,Bowen Zhou

from arxiv, Accepted by ACL 2020

Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method is two-fold. First, we extend RotatE from the 2D complex domain to a high-dimensional space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for the knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on the data set (FB15k-237) with many high in-degree nodes.
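The first part of the method, replacing a 2D rotation with an orthogonal transform on higher-dimensional blocks, can be sketched as follows. This is an illustrative reading of the abstract only: the block layout, the use of Gram-Schmidt to obtain an orthogonal matrix from free parameters, and the Euclidean distance score are assumptions, and the graph-context part is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def gram_schmidt(rows: Matrix) -> Matrix:
    """Orthonormalize linearly independent row vectors (modified Gram-Schmidt)."""
    basis: Matrix = []
    for v in rows:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def distance(head_blocks: List[Vector],
             relation_blocks: List[Matrix],
             tail_blocks: List[Vector]) -> float:
    """Sum over blocks of ||Q_k h_k - t_k||, with Q_k orthogonal.

    A smaller distance means a more plausible (head, relation, tail) triple.
    """
    total = 0.0
    for h, m, t in zip(head_blocks, relation_blocks, tail_blocks):
        q = gram_schmidt(m)  # orthogonal matrix built from free parameters
        diff = [a - b for a, b in zip(matvec(q, h), t)]
        total += math.sqrt(sum(x * x for x in diff))
    return total
```

With 2x2 blocks restricted to rotations, a score of this shape has the same form as RotatE's rotation in the complex plane. Larger blocks give each relation more degrees of freedom while still preserving vector norms.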

entity · Graph · Knowledge graph · Extensibility · MoDELS

November 12, 2018

Explainable Reasoning over Knowledge Graphs for Recommendation

Xiang Wang,Dingxian Wang,Canran Xu,Xiangnan He,Yixin Cao,Tat-Seng Chua

from arxiv, 8 pages, 5 figures, AAAI-2019

Incorporating knowledge graphs into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user's interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graphs for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets on movies and music, demonstrating significant improvements over the state-of-the-art solutions, Collaborative Knowledge Base Embedding and Neural Factorization Machine.
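The weighted pooling over paths mentioned in this abstract can be illustrated with a temperature-scaled log-sum-exp over per-path scores. The abstract does not give the formula, so this particular form, the `gamma` parameter, and the function names are assumptions made for the sketch.

```python
import math
from typing import List


def weighted_pool(scores: List[float], gamma: float = 1.0) -> float:
    """Aggregate per-path scores as log(sum(exp(s / gamma))).

    Small gamma approaches max pooling; large gamma treats paths more evenly.
    """
    scaled = [s / gamma for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in scaled))


def path_weights(scores: List[float], gamma: float = 1.0) -> List[float]:
    """Softmax weights showing how much each path contributes to the pool."""
    scaled = [s / gamma for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

The weights returned by `path_weights` are the gradient of the pooled score with respect to each scaled path score. This is one way such a pooling can expose which paths matter most for a given user-item pair.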