国产乱伦对白刺激视频,欧美亚洲一区电影,天天爽夜夜爽人人爽一区二

Peter West,Ronan Le Bras,Taylor Sorensen,Bill Yuchen Lin,Liwei Jiang,Ximing Lu,Khyathi Chandu,Jack Hessel,Ashutosh Baheti,Chandra Bhagavatula,Yejin Choi

We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs. The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.

相關內容

知識 (knowledge)

關注 12

通過學習、實踐或探索所獲得的認識、判斷或技能。

MoDELS · Perplexity · HTTPS · 語言模型化 · 相關系數 ·

2024 年 1 月 31 日

APPLS: Evaluating Evaluation Metrics for Plain Language Summarization

Yue Guo,Tal August,Gondy Leroy,Trevor Cohen,Lucy Lu Wang

While there has been significant development of models for Plain Language Summarization (PLS), evaluation remains a challenge. PLS lacks a dedicated assessment metric, and the suitability of text generation evaluation metrics is unclear due to the unique transformations involved (e.g., adding background explanations, removing specialized terminology). To address these concerns, our study presents a granular meta-evaluation testbed, APPLS, designed to evaluate metrics for PLS. We define a set of perturbations along four criteria inspired by previous work that a PLS metric should capture: informativeness, simplification, coherence, and faithfulness. An analysis of metrics using our testbed reveals that current metrics fail to capture simplification consistently. In response, we introduce POMME, a new metric designed to assess text simplification in PLS; the metric is calculated as the normalized perplexity difference between an in-domain and out-of-domain language model. We demonstrate POMME's correlation with fine-grained variations in simplification and validate its sensitivity across 4 text simplification datasets. This work contributes the first meta-evaluation testbed for PLS and a comprehensive evaluation of existing metrics. The APPLS testbed and POMME is available at //github.com/LinguisticAnomalies/APPLS.

可辨認的 · 分類數據 · state-of-the-art · 離散化 · TOOLS ·

2024 年 1 月 30 日

Decapodes: A Diagrammatic Tool for Representing, Composing, and Computing Spatialized Partial Differential Equations

Luke Morris,Andrew Baas,Jesus Arias,Maia Gatlin,Evan Patterson,James P. Fairbanks

We present Decapodes, a diagrammatic tool for representing, composing, and solving partial differential equations. Decapodes provides an intuitive diagrammatic representation of the relationships between variables in a system of equations, a method for composing systems of partial differential equations using an operad of wiring diagrams, and an algorithm for deriving solvers using hypergraphs and string diagrams. The string diagrams are in turn compiled into executable programs using the techniques of categorical data migration, graph traversal, and the discrete exterior calculus. The generated solvers produce numerical solutions consistent with state-of-the-art open source tools as demonstrated by benchmark comparisons with SU2. These numerical experiments demonstrate the feasibility of this approach to multiphysics simulation and identify areas requiring further development.

代碼 · Learning · 深度強化學習 · MoDELS · 強化學習 ·

2024 年 1 月 30 日

IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion

Bolun Li,Zhihong Sun,Tao Huang,Hongyu Zhang,Yao Wan,Ge Li,Zhi Jin,Chen Lyu

from arxiv, Accepted for the 32nd ACM Symposium on the Foundations of Software Engineering (FSE 2024)

Code completion aims to enhance programming productivity by predicting potential code based on the current programming context. Recently, pretrained language models (LMs) have become prominent in this field. Various approaches have been proposed to fine-tune LMs using supervised fine-tuning (SFT) techniques for code completion. However, the inherent exposure bias of these models can cause errors to accumulate early in the sequence completion, leading to even more errors in subsequent completions. To address this problem, deep reinforcement learning (DRL) is an alternative technique for fine-tuning LMs for code completion, which can improve the generalization capabilities and overall performance. Nevertheless, integrating DRL-based strategies into code completion faces two major challenges: 1) The dynamic nature of the code context requires the completion model to quickly adapt to changes, which poses difficulties for conventional DRL strategies that focus on delayed rewarding of the final code state. 2) It is difficult to evaluate the correctness of partial code, thus the reward redistribution-based strategies cannot be adapted to code completion. To tackle these challenges, we propose IRCoCo, a code completion-specific DRL-based fine-tuning framework. This framework is designed to provide immediate rewards as feedback for detecting dynamic context changes arising from continuous edits during code completion. With the aid of immediate feedback, the fine-tuned LM can gain a more precise understanding of the current context, thereby enabling effective adjustment of the LM and optimizing code completion in a more refined manner. Experimental results demonstrate that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task, outperforming both SFT-based and other DRL-based baselines.

Continuity · Learning · MoDELS · 知識 (knowledge) · PTM ·

2024 年 1 月 29 日

Continual Learning with Pre-Trained Models: A Survey

Da-Wei Zhou,Hai-Long Sun,Jingyi Ning,Han-Jia Ye,De-Chuan Zhan

from arxiv, Code is available at: //github.com/sun-hailong/LAMDA-PILOT

Nowadays, real-world applications often face streaming data, which requires the learning system to absorb new knowledge as data evolves. Continual Learning (CL) aims to achieve this goal and meanwhile overcome the catastrophic forgetting of former knowledge when learning new ones. Typical CL methods build the model from scratch to grow with incoming data. However, the advent of the pre-trained model (PTM) era has sparked immense research interest, particularly in leveraging PTMs' robust representational capabilities. This paper presents a comprehensive survey of the latest advancements in PTM-based CL. We categorize existing methodologies into three distinct groups, providing a comparative analysis of their similarities, differences, and respective advantages and disadvantages. Additionally, we offer an empirical study contrasting various state-of-the-art methods to highlight concerns regarding fairness in comparisons. The source code to reproduce these evaluations is available at: //github.com/sun-hailong/LAMDA-PILOT

Agent · WEB · 多峰值 · 端到端 · MoDELS ·

2024 年 1 月 28 日

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Hongliang He,Wenlin Yao,Kaixin Ma,Wenhao Yu,Yong Dai,Hongming Zhang,Zhenzhong Lan,Dong Yu

The advancement of large language models (LLMs) leads to a new era marked by the development of autonomous applications in the real world, which drives innovation in the creation of advanced web-based agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. Moreover, we propose a new evaluation protocol for web agents to address the challenges of automatic evaluation of open-ended web agent tasks, leveraging the robust multimodal comprehension capabilities of GPT-4V. We create a new benchmark by gathering real-world tasks from 15 widely used websites to evaluate our agents. We show that WebVoyager achieves a 55.7% task success rate, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups, underscoring the exceptional capability of WebVoyager in practical applications. We found that our proposed automatic evaluation achieves 85.3% agreement with human judgment, paving the way for further development of web agents in a real-world setting.

MoDELS · 相關系數 · 得分 · API · 規范化的 ·

2024 年 1 月 26 日

F-Eval: Asssessing Fundamental Abilities with Refined Evaluation Methods

Yu Sun,Keyu Chen,Shujie Wang,Qipeng Guo,Hang Yan,Xipeng Qiu,Xuanjing Huang,Dahua Lin

from arxiv, 21 pages, 7 figures

Large language models (LLMs) garner significant attention for their unprecedented performance, leading to an increasing number of researches evaluating LLMs. However, these evaluation benchmarks are limited to assessing the instruction-following capabilities, overlooking the fundamental abilities that emerge during the pre-training stage. Previous subjective evaluation methods mainly reply on scoring by API models. However, in the absence of references, large models have shown limited ability to discern subtle differences. To bridge the gap, we propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic. The tasks in F-Eval include multi-choice objective tasks, open-ended objective tasks, reference-based subjective tasks and reference-free subjective tasks. For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models. We conduct evaluations on 13 advanced LLMs. Results show that our evaluation methods show higher correlation coefficients and larger distinction than other evaluators. Additionally, we discuss the influence of different model sizes, dimensions, and normalization methods. We anticipate that F-Eval will facilitate the study of LLMs' fundamental abilities.

3D · 控制器 · 多樣性 · 目標檢測 · Nuance ·

2024 年 1 月 26 日

MagicDrive: Street View Generation with Diverse 3D Geometry Control

Ruiyuan Gao,Kai Chen,Enze Xie,Lanqing Hong,Zhenguo Li,Dit-Yan Yeung,Qiang Xu

from arxiv, Project Page: //flymin.github.io/magicdrive

Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework offering diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.

估計/估計量 · Learning · SimPLe · 生成模型 · MoDELS ·

2024 年 1 月 26 日

Generative Modeling with Flow-Guided Density Ratio Learning

Alvin Heng,Abdul Fatir Ansari,Harold Soh

We present Flow-Guided Density Ratio Learning (FDRL), a simple and scalable approach to generative modeling which builds on the stale (time-independent) approximation of the gradient flow of entropy-regularized f-divergences introduced in DGflow. In DGflow, the intractable time-dependent density ratio is approximated by a stale estimator given by a GAN discriminator. This is sufficient in the case of sample refinement, where the source and target distributions of the flow are close to each other. However, this assumption is invalid for generation and a naive application of the stale estimator fails due to the large chasm between the two distributions. FDRL proposes to train a density ratio estimator such that it learns from progressively improving samples during the training process. We show that this simple method alleviates the density chasm problem, allowing FDRL to generate images of dimensions as high as $128\times128$, as well as outperform existing gradient flow baselines on quantitative benchmarks. We also show the flexibility of FDRL with two use cases. First, unconditional FDRL can be easily composed with external classifiers to perform class-conditional generation. Second, FDRL can be directly applied to unpaired image-to-image translation with no modifications needed to the framework. Code is publicly available at //github.com/ajrheng/FDRL.

entity · Performer · 圖 · 知識圖譜 · 自動問答 ·

2018 年 1 月 16 日

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs

Mohnish Dubey,Debayan Banerjee,Debanjan Chaudhuri,Jens Lehmann

In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking has been performed either as dependent sequential tasks or independent parallel tasks. In this paper, we propose a framework called "EARL", which performs entity linking and relation linking as a joint single task. EARL uses a graph connection based solution to the problem. We model the linking task as an instance of the Generalised Travelling Salesman Problem (GTSP) and use GTSP approximate algorithm solutions. We later develop EARL which uses a pair-wise graph-distance based solution to the problem.The system determines the best semantic connection between all keywords of the question by referring to a knowledge graph. This is achieved by exploiting the "connection density" between entity candidates and relation candidates. The "connection density" based solution performs at par with the approximate GTSP solution.We have empirically evaluated the framework on a dataset with 5000 questions. Our system surpasses state-of-the-art scores for entity linking task by reporting an accuracy of 0.65 to 0.40 from the next best entity linker.

圖 · 學成 · 知識圖譜 · FreeBASIC · 強化學習 ·

2018 年 1 月 8 日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Wenhan Xiong,Thien Hoang,William Yang Wang

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.