Despite the strong performance of large language models (LLMs) in tasks like mathematical reasoning, their practical use is limited by high computational demands and proprietary restrictions. Chain-of-thought (CoT) and program-of-thought (PoT) fine-tuning are common methods to transfer LLM knowledge to small language models (SLMs). However, CoT often leads to calculation errors in SLMs, while PoT has shown more promise. Most PoT-based approaches focus on direct problem-to-code conversion, or on extracting only the key information from a question and then providing a code solution for it. This work instead emphasizes filling the gaps in the question to clearly illustrate the solution path, which can be challenging for an SLM to understand when such information is not explicitly provided. Therefore, this paper introduces Gap-Filling Prompting (GFP), a novel two-step prompting strategy designed to enhance the problem-solving process for SLMs. The first step identifies these gaps and provides hints for filling them, while the second step adds the hints to the question to generate a final code solution. Experimental results on two benchmark datasets demonstrate that GFP significantly improves the mathematical reasoning abilities of SLMs.
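
The two-step flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function name `gap_filling_prompt`, and the `fake_model` stand-in are all assumptions made so the example runs without a real model.

```python
from typing import Callable


def gap_filling_prompt(question: str, generate: Callable[[str], str]) -> str:
    """Two-step prompting sketch: ask for hints first, then ask for code.

    `generate` is any text-in, text-out model call (hypothetical here).
    """
    # Step 1: ask the model to spot implicit information and give hints.
    hint_prompt = (
        "Identify information that the question leaves implicit and "
        "give hints that fill those gaps.\n"
        f"Question: {question}\nHints:"
    )
    hints = generate(hint_prompt).strip()

    # Step 2: add the hints to the question and ask for a code solution.
    code_prompt = (
        f"Question: {question}\nHints: {hints}\n"
        "Write a Python program that prints the answer.\nProgram:"
    )
    return generate(code_prompt)


def fake_model(prompt: str) -> str:
    """Stand-in for a real SLM; returns canned text so the sketch runs."""
    if prompt.endswith("Hints:"):
        return "A dozen means 12."
    return "print(3 * 12)"


program = gap_filling_prompt("How many eggs are in 3 dozen?", fake_model)
```

In practice `fake_model` would be replaced by a call to a fine-tuned SLM, and the returned program would be executed to obtain the numeric answer.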

Related content

Learning · Optimizer · Value function · Functional · MoDELS

December 20, 2024

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Huaijie Wang,Shibo Hao,Hanze Dong,Shenao Zhang,Yilin Bao,Ziran Yang,Yi Wu

Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily available for multi-step reasoning tasks, and (2) it treats all tokens uniformly, making it ineffective for credit assignment in multi-step reasoning tasks, which often come with sparse rewards. In this work, we propose OREO (Offline Reasoning Optimization), an offline RL method for enhancing LLM multi-step reasoning. Building on insights from previous work on maximum entropy reinforcement learning, it jointly learns a policy model and value function by optimizing the soft Bellman Equation. We show in principle that it reduces the need to collect pairwise data and enables better credit assignment. Empirically, OREO surpasses existing offline learning methods on multi-step reasoning benchmarks, including mathematical reasoning tasks (GSM8K, MATH) and embodied agent control (ALFWorld). The approach can be extended to a multi-iteration framework when additional resources are available. Furthermore, the learned value function can be leveraged to guide the tree search for free, which can further boost performance during test time.
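
For orientation, one standard form of the soft Bellman relation from the KL-regularized maximum entropy RL literature is given below. It assumes deterministic transitions, no discounting, a reference policy \pi_{\mathrm{ref}}, and a temperature \beta; these symbols are assumptions for illustration, and the exact objective optimized by OREO may differ.

```latex
% Optimal policy under KL regularization, with Q^*(s_t, a_t) = r(s_t, a_t) + V^*(s_{t+1}):
\pi^*(a_t \mid s_t) = \pi_{\mathrm{ref}}(a_t \mid s_t)\,
  \exp\!\left( \frac{r(s_t, a_t) + V^*(s_{t+1}) - V^*(s_t)}{\beta} \right)

% Rearranged into a per-step consistency condition:
V^*(s_t) - V^*(s_{t+1}) = r(s_t, a_t)
  - \beta \log \frac{\pi^*(a_t \mid s_t)}{\pi_{\mathrm{ref}}(a_t \mid s_t)}
```

Because the condition holds at every step of a single trajectory, it can be enforced on unpaired data, and the value differences V^*(s_t) - V^*(s_{t+1}) spread a sparse final reward across individual steps.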

Sample · Diversity · Dataset · Paper · Large language model

December 20, 2024

Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation

Xiaoqiang Kang,Zimu Wang,Xiaobo Jin,Wei Wang,Kaizhu Huang,Qiufeng Wang

from arxiv, Accepted at AAAI 2025, extended version with appendix

Solving tabular math word problems (TMWPs) has come to play a critical role in evaluating the mathematical reasoning ability of large language models (LLMs), where large-scale TMWP samples are commonly required for LLM fine-tuning. Since the collection of high-quality TMWP datasets is costly and time-consuming, recent research has concentrated on automatic TMWP generation. However, currently generated samples usually suffer from issues of either correctness or diversity. In this paper, we propose a Template-driven LLM-paraphrased (TeLL) framework for generating high-quality TMWP samples with diverse backgrounds and accurate tables, questions, answers, and solutions. To this end, we first extract templates from existing real samples to generate initial problems, ensuring correctness. Then, we adopt an LLM to extend templates and paraphrase problems, obtaining diverse TMWP samples. Furthermore, we find that reasoning annotation is important for solving TMWPs. Therefore, we propose to enrich each solution with illustrative reasoning steps. Through the proposed framework, we construct a high-quality dataset, TabMWP-TeLL, by adhering to the question types in the TabMWP dataset, and we conduct extensive experiments on a variety of LLMs to demonstrate the effectiveness of TabMWP-TeLL in improving TMWP solving performance. The code and data of this paper are available at: //github.com/Jason8Kang/TELL.

Kernelization · IB · Isotropy · Functional · Model evaluation

December 19, 2024

Local Divergence-Free Immersed Finite Element-Difference Method Using Composite B-Splines

Lianxia Li,Cole Gruninger,Jae H. Lee,Boyce E. Griffith

In the class of immersed boundary (IB) methods, the choice of the delta function plays a crucial role in transferring information between fluid and solid domains. Most prior work has used isotropic kernels that do not preserve the divergence-free condition of the velocity field, leading to loss of incompressibility of the solid when interpolating velocity to Lagrangian markers. To address this issue, in simulations involving large deformations of incompressible hyperelastic structures immersed in fluid, researchers often use stabilization approaches such as adding a volumetric energy term. Composite B-spline (CBS) kernels offer an alternative by maintaining the discrete divergence-free property. This work evaluates CBS kernels in terms of volume conservation and accuracy, comparing them with isotropic kernel functions using a construction introduced by Peskin (IB kernels) and B-spline (BS) kernels. Benchmark tests include pressure-loaded and shear-dominated flows, such as an elastic band under pressure loads, a pressurized membrane, a compressed block, Cook's membrane, and a slanted channel flow. Additionally, we validate our methodology using a complex fluid-structure interaction model of bioprosthetic heart valve dynamics. Results demonstrate that CBS kernels achieve superior volume conservation compared to isotropic kernels, eliminating the need for stabilization techniques. Further, CBS kernels converge on coarser fluid grids, while IB and BS kernels need finer grids for comparable accuracy. Unlike IB and BS kernels, which perform better with larger mesh ratios, CBS kernels improve with smaller mesh ratios. Wider kernels provide more accurate results across all methods, but CBS kernels are less sensitive to grid spacing variations than isotropic kernels.

Constraint · Graph · Performer · TOOLS · INTERACT

December 19, 2024

Logic Induced High-Order Reasoning Network for Event-Event Relation Extraction

Peixin Huang,Xiang Zhao,Minghao Hu,Zhen Tan,Weidong Xiao

To understand a document with multiple events, event-event relation extraction (ERE) emerges as a crucial task, aiming to discern how natural events temporally or structurally associate with each other. To achieve this goal, our work addresses the problems of temporal event relation extraction (TRE) and subevent relation extraction (SRE). The latest methods for such problems have commonly built document-level event graphs for global reasoning across sentences. However, the edges between events are usually derived heuristically from external tools, which are not always reliable and may introduce noise. Moreover, these methods are not capable of preserving logical constraints among event relations, e.g., the coreference constraint, symmetry constraint and conjunction constraint. These constraints guarantee coherence between different relation types, enabling the generation of a unified event evolution graph. In this work, we propose a novel method named LogicERE, which performs high-order event relation reasoning through modeling logic constraints. Specifically, different from conventional event graphs, we design a logic constraint induced graph (LCG) without any external tools. LCG involves event nodes, where the interactions among them can model the coreference constraint, and event pair nodes, where the interactions among them can retain the symmetry constraint and conjunction constraint. Then we perform high-order reasoning on LCG with a relational graph transformer to obtain enhanced event and event pair embeddings. Finally, we further incorporate logic constraint information via a joint logic learning module. Extensive experiments demonstrate the effectiveness of the proposed method with state-of-the-art performance on benchmark datasets.

MoDELS · Language modeling · Scalar · Boosting (a model training acceleration method) · Model evaluation

December 19, 2024

Self-Generated Critiques Boost Reward Modeling for Language Models

Yue Yu,Zhengxing Chen,Aston Zhang,Liang Tan,Chenguang Zhu,Richard Yuanzhe Pang,Yundi Qian,Xuewei Wang,Suchin Gururangan,Chao Zhang,Melanie Kambadur,Dhruv Mahajan,Rui Hou

from arxiv, 20 pages

Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF). However, current reward models mainly produce scalar scores and struggle to incorporate critiques in a natural language format. We hypothesize that predicting both critiques and the scalar reward would improve reward modeling ability. Motivated by this, we propose Critic-RM, a framework that improves reward models using self-generated critiques without extra supervision. Critic-RM employs a two-stage process: generating and filtering high-quality critiques, followed by joint fine-tuning on reward prediction and critique generation. Experiments across benchmarks show that Critic-RM improves reward modeling accuracy by 3.7%-7.3% compared to standard reward models and LLM judges, demonstrating strong performance and data efficiency. Additional studies further validate the effectiveness of generated critiques in rectifying flawed reasoning steps with 2.5%-3.2% gains in improving reasoning accuracy.

Next · Transformation · Tokenizer · Long-term planning · Transformer

December 18, 2024

Transformers Can Navigate Mazes With Multi-Step Prediction

Niklas Nolte,Ouail Kitouni,Adina Williams,Mike Rabbat,Mark Ibrahim

from arxiv, 20 pages, 15 figures

Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks requiring foresight to plan multiple steps ahead such as maze navigation. The standard next single token prediction objective, however, offers no explicit mechanism to predict multiple steps ahead - or revisit the path taken so far. Consequently, in this work we study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers' maze navigation. We train parameter-matched transformers from scratch, under identical settings, to navigate mazes of varying types and sizes with standard next token prediction and MLM-U, an objective explicitly predicting multiple steps ahead and backwards. We find that MLM-U considerably improves transformers' ability to navigate mazes compared to standard next token prediction across maze types and complexities. We also find MLM-U training is 4x more sample efficient and converges 2x faster in terms of GPU training hours relative to next token training. Finally, for more complex mazes we find MLM-U benefits from scaling to larger transformers. Remarkably, we find transformers trained with MLM-U outperform larger transformers trained with next token prediction using additional supervision from A* search traces. We hope these findings underscore the promise of learning objectives to advance transformers' capacity for long-term planning. The code can be found at //github.com/facebookresearch/maze_navigation_MLMU
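
As a rough illustration of an objective that predicts tokens both ahead and backwards, the sketch below masks positions anywhere in a path and uses the hidden tokens as targets. The abstract does not spell out how MLM-U chooses what to mask; drawing a fresh masking rate uniformly at random for each sequence is an assumption here, as are the names `uniform_rate_mask` and `MASK` and the toy maze path.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token


def uniform_rate_mask(
    tokens: List[str], rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask each token independently at a rate drawn uniformly from [0, 1).

    Returns (inputs, targets): targets hold the original token at masked
    positions and None elsewhere, so a loss applies only to masked positions.
    """
    rate = rng.random()  # assumed: one masking rate per sequence
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


# Toy maze path written as a sequence of cell tokens.
path = ["(0,0)", "(0,1)", "(1,1)", "(1,2)", "(2,2)"]
inputs, targets = uniform_rate_mask(path, random.Random(0))
```

Since a masked cell can sit before or after visible cells, the model must infer earlier steps from later ones as well as the reverse, which next-token prediction never asks of it.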

Design · Performance · Reducible · Compiler · Processing (programming language)

December 18, 2024

AI-Powered Algorithm-Centric Quantum Processor Topology Design

Tian Li,Xiao-Yue Xu,Chen Ding,Tian-Ci Tian,Wei-You Liao,Shuo Zhang,He-Liang Huang

from arxiv, Accepted by AAAI 2025

Quantum computing promises to revolutionize various fields, yet the execution of quantum programs necessitates an effective compilation process. This involves strategically mapping quantum circuits onto the physical qubits of a quantum processor. The qubits' arrangement, or topology, is pivotal to the circuit's performance, a factor that often defies traditional heuristic or manual optimization methods due to its complexity. In this study, we introduce a novel approach leveraging reinforcement learning to dynamically tailor qubit topologies to the unique specifications of individual quantum circuits, guiding algorithm-driven quantum processor topology design to reduce the depth of the mapped circuit, which is particularly critical for output accuracy on noisy quantum processors. Our method marks a significant departure from previous methods that have been constrained to mapping circuits onto a fixed processor topology. Experiments demonstrate that we have achieved notable enhancements in circuit performance, with a minimum of 20% reduction in circuit depth in 60% of the cases examined, and a maximum enhancement of up to 46%. Furthermore, the pronounced benefits of our approach in reducing circuit depth become increasingly evident as the scale of the quantum circuits increases, exhibiting the scalability of our method in terms of problem size. This work advances the co-design of quantum processor architecture and algorithm mapping, offering a promising avenue for future research and development in the field.

Representation · Category · Learning · Representation learning · Critic

December 18, 2024

Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary

Yanhua Li,Xiaocao Ouyang,Chaofan Pan,Jie Zhang,Sen Zhao,Shuyin Xia,Xin Yang,Guoyin Wang,Tianrui Li

from arxiv, Accepted at AAAI 2025

Open intent classification is critical for the development of dialogue systems, aiming to accurately classify known intents into their corresponding classes while identifying unknown intents. Prior boundary-based methods assumed known intents fit within compact spherical regions, focusing on coarse-grained representation and precise spherical decision boundaries. However, these assumptions are often violated in practical scenarios, making it difficult to distinguish known intent classes from unknowns using a single spherical boundary. To tackle these issues, we propose a Multi-granularity Open intent classification method via adaptive Granular-Ball decision boundary (MOGB). Our MOGB method consists of two modules: representation learning and decision boundary acquisition. To effectively represent the intent distribution, we design a hierarchical representation learning method. This involves iteratively alternating between adaptive granular-ball clustering and nearest sub-centroid classification to capture fine-grained semantic structures within known intent classes. Furthermore, multi-granularity decision boundaries are constructed for open intent classification by employing granular-balls with varying centroids and radii. Extensive experiments conducted on three public datasets demonstrate the effectiveness of our proposed method.
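
The idea of a decision boundary built from several balls with different centroids and radii can be sketched as below. This is only an illustration of the boundary test: the ball list, the Euclidean distance, the tie-breaking by nearest centroid, and the function name `classify_open_intent` are assumptions, and the clustering that would produce the balls is not shown.

```python
import math
from typing import List, Optional, Tuple

# A granular ball: (intent label, centroid, radius). Values are illustrative.
Ball = Tuple[str, List[float], float]


def classify_open_intent(x: List[float], balls: List[Ball]) -> Optional[str]:
    """Return the label of the closest ball that contains x.

    Returns None when x falls outside every ball, i.e. an unknown intent.
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, centroid, radius in balls:
        dist = math.dist(x, centroid)  # Euclidean distance to the centroid
        if dist <= radius and dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# One known intent may be covered by several balls of different sizes.
balls: List[Ball] = [
    ("book_flight", [0.0, 0.0], 1.0),
    ("book_flight", [3.0, 0.0], 0.5),
    ("weather", [0.0, 5.0], 1.0),
]
known = classify_open_intent([2.8, 0.0], balls)      # inside the second ball
unknown = classify_open_intent([10.0, 10.0], balls)  # outside every ball
```

Covering a class with several small balls lets the boundary follow a non-spherical region, which a single sphere per class cannot do.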

entity · Link prediction · Graph · Knowledge graph · MoDELS

December 25, 2019

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

Zhanqiu Zhang,Jianyu Cai,Yongdong Zhang,Jie Wang

from arxiv, Accepted to AAAI 2020

Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
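
The polar-coordinate idea can be made concrete with a small scoring sketch: each entity has a modulus part (radius, for the hierarchy level) and a phase part (angle, for telling apart entities on the same level). The specific distance forms used here, an L2 distance on scaled moduli and an L1 norm of sin of half the phase difference, along with the weight `lam` and the function name `hake_style_score`, are assumptions for illustration and are not taken from the abstract.

```python
import math
from typing import Sequence


def hake_style_score(
    h_mod: Sequence[float], r_mod: Sequence[float], t_mod: Sequence[float],
    h_phase: Sequence[float], r_phase: Sequence[float], t_phase: Sequence[float],
    lam: float = 0.5,
) -> float:
    """Score a (head, relation, tail) triple; higher means more plausible."""
    # Modulus part: the relation rescales the head radius toward the tail radius.
    d_mod = math.sqrt(
        sum((h * r - t) ** 2 for h, r, t in zip(h_mod, r_mod, t_mod))
    )
    # Phase part: the relation rotates the head angle toward the tail angle;
    # sin(x / 2) makes the distance periodic in 2*pi.
    d_phase = sum(
        abs(math.sin((h + r - t) / 2))
        for h, r, t in zip(h_phase, r_phase, t_phase)
    )
    return -(d_mod + lam * d_phase)


# Radii and angles line up exactly, so the score is the maximum, zero.
perfect = hake_style_score([1.0, 2.0], [2.0, 0.5], [2.0, 1.0], [0.5], [0.25], [0.75])
# Same radius but opposite angle: only the phase term penalizes the triple.
same_level = hake_style_score([1.0], [1.0], [1.0], [math.pi], [0.0], [0.0])
```

Two entities with equal radii and different angles are separated only by the phase term, which matches the abstract's description of entities at the same hierarchy level.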

Question answering · MoDELS · Networking · Processing (programming language) · state-of-the-art

June 1, 2018

An Interpretable Reasoning Network for Multi-Relation Question Answering

Mantong Zhou,Minlie Huang,Xiaoyan Zhu

from arxiv, COLING 2018, 13 pages

Multi-relation Question Answering is a challenging task because it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.