Through additional training, we explore embedding specialized scientific knowledge into the Llama 2 Large Language Model (LLM). Key findings reveal that effective knowledge integration requires reading texts from multiple perspectives, especially in instructional formats. To tackle the scarcity of specialized texts, we use text augmentation, including style conversions and translations. Hyperparameter optimization proves crucial; with suitable settings, models of different sizes (7B, 13B, and 70B) can reasonably undergo additional training. To validate our methods, we construct a dataset of 65,000 scientific papers. Although we succeed in partially embedding knowledge, the study highlights the complexities and limitations of incorporating specialized information into LLMs and suggests areas for further improvement.

Related content

Large Language Models

A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a variety of natural language tasks such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion at first to one trillion today. The larger parameter count lets a model capture the subtleties of human language more finely and understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, it will keep expanding its range of applications, providing more intelligent and personalized services and further improving how people live and work.

Learning · Transfer Learning · MoDELS · Taxonomy · FM

February 6, 2024

Grounding Foundation Models through Federated Transfer Learning: A General Framework

Yan Kang,Tao Fan,Hanlin Gu,Xiaojin Zhang,Lixin Fan,Qiang Yang

from arxiv, in progress. Fixed some typos and errors and revised the text slightly.

Foundation Models (FMs) such as GPT-4, encoded with vast knowledge and powerful emergent abilities, have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit their full potential. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to these challenges. In recent years, the need to ground FMs by leveraging FTL, coined FTL-FM, has grown strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates the problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on this framework to categorize state-of-the-art FTL-FM works, and comprehensively review those works according to the proposed taxonomy. We also establish correspondences between FTL-FM and the conventional phases of adapting FMs so that FM practitioners can align their research with FTL-FM. In addition, we review advanced efficiency-improving and privacy-preserving techniques, because efficiency and privacy are critical concerns in FTL-FM. Finally, we discuss opportunities and future research directions for FTL-FM.

Integration · Code · INFORMS · TOOLS · Identifiable

February 6, 2024

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

Yichen Li,Yun Peng,Yintong Huo,Michael R. Lyu

Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective at completing code within single source files. However, it is challenging for them to perform repository-level code completion for large software projects that require cross-file information. Existing research on LLM-based repository-level code completion identifies and integrates cross-file contexts, but it suffers from low accuracy and the limited context length of LLMs. In this paper, we argue that Integrated Development Environments (IDEs) can provide direct, accurate, and real-time cross-file information for repository-level code completion. We propose IDECoder, a practical framework that leverages IDE-native static contexts for cross-context construction and diagnosis results for self-refinement. IDECoder uses the rich cross-context information available in IDEs to enhance the capabilities of LLMs for repository-level code completion. We conducted preliminary experiments to validate the performance of IDECoder and observed that this synergy represents a promising trend for future exploration.

entity · Supervision · Entity Alignment · MoDELS · Comprehensibility

February 5, 2024

Understanding and Guiding Weakly Supervised Entity Alignment with Potential Isomorphism Propagation

Yuanyi Wang,Wei Tang,Haifeng Sun,Zirui Zhuang,Xiaoyuan Fu,Jingyu Wang,Qi Qi,Jianxin Liao

Weakly Supervised Entity Alignment (EA) is the task of identifying equivalent entities across diverse knowledge graphs (KGs) using only a limited number of seed alignments. Despite substantial advances in aggregation-based weakly supervised EA, the underlying mechanisms in this setting remain unexplored. In this paper, we present a propagation perspective to analyze weakly supervised EA and explain the existing aggregation-based EA models. Our theoretical analysis reveals that these models essentially seek propagation operators for pairwise entity similarities. We further prove that, despite the structural heterogeneity of different KGs, the potentially aligned entities within aggregation-based EA models have isomorphic subgraphs, which is the core premise of EA but has not been investigated. Leveraging this insight, we introduce a potential isomorphism propagation operator to enhance the propagation of neighborhood information across KGs. We develop a general EA framework, PipEA, incorporating this operator to improve the accuracy of every type of aggregation-based model without altering the learning process. Extensive experiments substantiate our theoretical findings and demonstrate PipEA's significant performance gains over state-of-the-art weakly supervised EA methods. Our work not only advances the field but also enhances our comprehension of aggregation-based weakly supervised EA.

Prompt · Optimizer · Large Language Models · GROUP · Language Modeling

February 5, 2024

Robust Prompt Optimization for Large Language Models Against Distribution Shifts

Moxin Li,Wenjie Wang,Fuli Feng,Yixin Cao,Jizhi Zhang,Tat-Seng Chua

from arxiv, EMNLP 2023 Main

Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, which has led to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer review analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires that a prompt optimized over the labeled source group simultaneously generalize to an unlabeled target group. To solve this problem, we propose the Generalized Prompt Optimization framework, which incorporates unlabeled data from the target group into prompt optimization. Extensive experimental results demonstrate the effectiveness of the proposed framework, with significant performance improvement on the target group and comparable performance on the source group.

Estimation/Estimator · Learning · Statistical Efficiency · Statistic · Design

February 5, 2024

Statistically Efficient Bayesian Sequential Experiment Design via Reinforcement Learning with Cross-Entropy Estimators

Tom Blau,Iadine Chades,Amir Dezfouli,Daniel Steinberg,Edwin V. Bonilla

Reinforcement learning can learn amortised design policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that require a number of samples exponential in the magnitude of the EIG to achieve an unbiased estimation. We propose the use of an alternative estimator based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method overcomes the exponential sample complexity of previous approaches and provides more accurate estimates of high EIG values. More importantly, it allows learning of superior design policies, and is compatible with continuous and discrete design spaces, non-differentiable likelihoods, and even implicit probabilistic models.
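
The cross-entropy idea in this abstract can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' estimator: it uses a hypothetical one-dimensional linear-Gaussian model (theta ~ N(0, 1), y = d * theta + noise), for which the EIG has a closed form, and estimates the EIG as the prior entropy plus the average log-density of a proposal q(theta | y) over samples from the joint model. All function names are invented for illustration.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def eig_cross_entropy(d, noise_var, proposal, n=20000, seed=0):
    """Monte Carlo estimate of H[p(theta)] + E_{p(theta, y | d)}[log q(theta | y)].

    Assumed toy model: theta ~ N(0, 1), y = d * theta + N(0, noise_var).
    `proposal(y)` returns the (mean, variance) of a Gaussian q(theta | y).
    """
    rng = random.Random(seed)
    prior_entropy = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        y = d * theta + rng.gauss(0.0, math.sqrt(noise_var))
        mean, var = proposal(y)
        total += log_normal_pdf(theta, mean, var)
    return prior_entropy + total / n


def exact_posterior(d, noise_var):
    """Exact Gaussian posterior of the toy model, used as the ideal proposal."""
    def proposal(y):
        var = noise_var / (d * d + noise_var)
        mean = d * y / (d * d + noise_var)
        return mean, var
    return proposal


def exact_eig(d, noise_var):
    """Closed-form EIG of the toy model."""
    return 0.5 * math.log(1.0 + d * d / noise_var)


if __name__ == "__main__":
    good = eig_cross_entropy(2.0, 1.0, exact_posterior(2.0, 1.0))
    crude = eig_cross_entropy(2.0, 1.0, lambda y: (0.0, 1.0))  # ignores y
    print(good, crude, exact_eig(2.0, 1.0))
```

With the exact posterior as the proposal, the estimate matches the closed form 0.5 * log(1 + d^2 / noise_var) up to Monte Carlo error. A cruder proposal can only lower the expected value, so the quantity is a lower bound that tightens as the proposal improves.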

Networking · Neural Networks · Learning · Performer · Processing (programming language)

February 1, 2024

Benchmarking Spiking Neural Network Learning Methods with Varying Locality

Jiaqi Lin,Sen Lu,Malyaban Bal,Abhronil Sengupta

Spiking Neural Networks (SNNs), which provide more realistic neuronal dynamics, have been shown to achieve performance comparable to Artificial Neural Networks (ANNs) in several machine learning tasks. Information is processed as spikes within SNNs in an event-based mechanism that significantly reduces energy consumption. However, training SNNs is challenging due to the non-differentiable nature of the spiking mechanism. Traditional approaches, such as Backpropagation Through Time (BPTT), have shown effectiveness but come with additional computational and memory costs and are biologically implausible. In contrast, recent works propose alternative learning methods with varying degrees of locality, demonstrating success in classification tasks. In this work, we show that these methods share similarities during the training process, while they present a trade-off between biological plausibility and performance. Further, this research examines the implicitly recurrent nature of SNNs and investigates the influence of adding explicit recurrence to SNNs. We demonstrate experimentally that the addition of explicit recurrent weights enhances the robustness of SNNs. We also investigate the performance of local learning methods under gradient-based and non-gradient-based adversarial attacks.
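
To make the non-differentiability concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron together with a surrogate gradient. It is a generic textbook-style illustration and is not taken from the paper: the decay factor, threshold, soft reset, and the fast-sigmoid surrogate shape are all assumptions chosen for the example.

```python
def lif_run(inputs, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron over a sequence of input currents.

    The membrane potential decays by `beta` each step, integrates the input,
    and emits a spike (1) when it reaches `threshold`, followed by a soft reset.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x
        spike = 1 if v >= threshold else 0
        spikes.append(spike)
        if spike:
            v -= threshold  # soft reset: subtract the threshold
    return spikes


def heaviside_grad(v, threshold=1.0):
    """True derivative of the spike function: zero almost everywhere
    (and undefined exactly at the threshold), so it carries no learning signal."""
    return 0.0


def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Fast-sigmoid surrogate used in place of the true derivative
    during the backward pass; it peaks at the threshold and decays smoothly."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


if __name__ == "__main__":
    print(lif_run([0.6, 0.6, 0.0, 0.6]))
    print(heaviside_grad(0.9), surrogate_grad(0.9))
```

The forward pass still uses the hard threshold; only the backward pass swaps in the smooth surrogate, which is one common way gradient-based methods such as BPTT are applied to spiking networks.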

INFORMS · Graph · Structured Learning · Extensibility · Learned

December 16, 2021

Graph Structure Learning with Variational Information Bottleneck

Qingyun Sun,Jianxin Li,Hao Peng,Jia Wu,Xingcheng Fu,Cheng Ji,Philip S. Yu

from arxiv, Accepted by AAAI 2022, Preprint version with Appendix

Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which can degrade the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, from the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate the superior effectiveness and robustness of VIB-GSL.
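
For readers unfamiliar with the IB principle mentioned here, the generic objective and the standard variational bounds that make it tractable can be written as follows. The notation is an assumption for illustration (X for the input, which in this setting would be the observed graph; Y for the task label; Z for the learned representation; q and r for variational distributions) and is not necessarily the exact objective used by VIB-GSL.

```latex
% Generic Information Bottleneck objective (notation assumed, not taken from the paper)
\max_{Z}\; I(Z; Y) \;-\; \beta\, I(Z; X)

% Variational lower bound on the prediction term, with decoder q(y | z)
I(Z; Y) \;\ge\; \mathbb{E}_{p(x, y)\, p(z \mid x)}\big[\log q(y \mid z)\big] + H(Y)

% Variational upper bound on the compression term, with prior r(z)
I(Z; X) \;\le\; \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(z \mid x)\,\|\,r(z)\big)\big]
```

Substituting both bounds gives a lower bound on the objective that can be optimized with sampled representations; the coefficient \beta trades prediction against compression.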

Structured Learning · Graph · Sparse · GPU · Neural Networks

December 13, 2021

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Yinhua Piao,Sangseon Lee,Dohoon Lee,Sun Kim

from arxiv, Accepted by AAAI 2022

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art methods and reveal the necessity of learning sparse structures for each document.

Estimation/Estimator · contrastive · INFORMS · Mutual Information · Representation Learning

June 25, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Alessandro Sordoni,Nouha Dziri,Hannes Schulz,Geoff Gordon,Phil Bachman,Remi Tachet

from arxiv, ICML 2021

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
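
The underestimation bias mentioned in this abstract can be seen directly in the widely used InfoNCE bound, whose value can never exceed log K for K candidates per positive pair. The sketch below is a generic pure-Python illustration rather than the DEMI implementation; the function name and the score-matrix layout (positive pairs on the diagonal) are assumptions made for the example.

```python
import math


def info_nce(scores):
    """InfoNCE estimate from a K x K matrix of critic scores.

    scores[i][j] is the critic value f(x_i, y_j); the diagonal entries are the
    positive pairs and the other entries in each row act as negatives.
    Returns mean_i[ f(x_i, y_i) - logsumexp_j f(x_i, y_j) ] + log K.
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_sum
    return total / k + math.log(k)


if __name__ == "__main__":
    k = 8
    # A critic that separates positives from negatives perfectly
    perfect = [[1000.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    print(info_nce(perfect), math.log(k))  # the estimate saturates at log K
```

Because each term f(x_i, y_i) minus the log-sum-exp of its row is at most zero, the estimate is capped at log K however much information the views actually share. Splitting the problem with the chain rule, I(x; y) = I(x; y') + I(x; y | y') for a subview y' of y, lets each term stay within the range a contrastive bound can capture.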

Graph · Learned · Knowledge Graph · FreeBASIC · Reinforcement Learning

January 8, 2018

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Wenhan Xiong,Thien Hoang,William Yang Wang

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.
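
One plausible way to combine the three criteria named in this abstract into a single scalar reward is sketched below. The abstract does not give the formulas, so every term here is an assumption made for illustration: a +1/-1 accuracy term for reaching the target entity, an efficiency term that decays with path length, a diversity term that penalizes cosine similarity to previously found paths, and hypothetical mixing weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def path_reward(reached_target, path_length, path_vec, found_path_vecs,
                weights=(1.0, 1.0, 1.0)):
    """Hypothetical combined reward for one sampled relation path.

    reached_target  : whether the path ends at the correct entity
    path_length     : number of relations in the path (must be >= 1)
    path_vec        : embedding of the path (e.g. sum of relation embeddings)
    found_path_vecs : embeddings of paths discovered earlier
    weights         : illustrative mixing weights (accuracy, efficiency, diversity)
    """
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    accuracy = 1.0 if reached_target else -1.0
    efficiency = 1.0 / path_length  # shorter paths score higher
    if found_path_vecs:
        # Penalize paths that resemble ones already found
        diversity = -sum(cosine(path_vec, p) for p in found_path_vecs) / len(found_path_vecs)
    else:
        diversity = 0.0
    w_acc, w_eff, w_div = weights
    return w_acc * accuracy + w_eff * efficiency + w_div * diversity


if __name__ == "__main__":
    print(path_reward(True, 2, [1.0, 0.0], []))
    print(path_reward(True, 2, [1.0, 0.0], [[1.0, 0.0]]))
```

In a policy-gradient setup, a scalar like this would weight the log-probability of the sampled relation choices; how the terms are balanced is a design decision the abstract leaves open.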