This paper presents SibylSat, a novel SAT-based method designed to efficiently solve totally-ordered HTN problems (TOHTN). In contrast to prevailing SAT-based HTN planners that employ a breadth-first search strategy, SibylSat adopts a greedy search approach, enabling it to identify promising decompositions for expansion. The selection process is facilitated by a heuristic derived from solving a relaxed problem, which is also expressed as a SAT problem. Our experimental evaluations demonstrate that SibylSat outperforms existing SAT-based TOHTN approaches in terms of both runtime and plan quality on most of the IPC benchmarks, while also solving a larger number of problems.

Related content

SAT

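SAT is the premier annual conference for researchers focusing on the theory and applications of the propositional satisfiability problem. Besides plain propositional satisfiability, it covers Boolean optimization (such as MaxSAT and pseudo-Boolean (PB) constraints), quantified Boolean formulas (QBF), satisfiability modulo theories (SMT), and constraint programming (CP) for problems with clear connections to Boolean-level reasoning.

MoDELS · Reducible · Language modeling · Vision

December 16, 2024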

RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models

Sangmin Woo,Jaehyuk Jang,Donguk Kim,Yubin Choi,Changick Kim

from arxiv, Project: //sangminwoo.github.io/RITUAL/

Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs, yet they often produce "hallucinatory" outputs that misinterpret visual information, posing challenges in reliability and trustworthiness. We propose RITUAL, a simple decoding method that reduces hallucinations by leveraging randomly transformed images as complementary inputs during decoding, adjusting the output probability distribution without additional training or external models. Our key insight is that random transformations expose the model to diverse visual perspectives, enabling it to correct misinterpretations that lead to hallucinations. Specifically, when a model hallucinates based on the original image, the transformed images -- altered in aspects such as orientation, scale, or color -- provide alternative viewpoints that help recalibrate the model's predictions. By integrating the probability distributions from both the original and transformed images, RITUAL effectively reduces hallucinations. To further improve reliability and address potential instability from arbitrary transformations, we introduce RITUAL+, an extension that selects image transformations based on self-feedback from the LVLM. Instead of applying transformations randomly, RITUAL+ uses the LVLM to evaluate and choose transformations that are most beneficial for reducing hallucinations in a given context. This self-adaptive approach mitigates the potential negative impact of certain transformations on specific tasks, ensuring more consistent performance across different scenarios. Experiments demonstrate that RITUAL and RITUAL+ significantly reduce hallucinations across several object hallucination benchmarks.
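The idea of integrating the two output distributions can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes the combination is a weighted sum of next-token logits followed by a softmax, and the function names and the weight `alpha` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(
    logits_original: List[float],
    logits_transformed: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Blend next-token logits from the original and the transformed image.

    ASSUMPTION: a weighted sum of logits; `alpha` is a hypothetical weight
    on the transformed-image branch, not a value taken from the paper.
    """
    mixed = [a + alpha * b for a, b in zip(logits_original, logits_transformed)]
    return softmax(mixed)


# Toy vocabulary of three tokens. The original image favours token 0,
# the transformed image favours token 1.
orig = [2.0, 1.0, 0.0]
trans = [0.0, 2.0, 0.0]
probs = combine_logits(orig, trans, alpha=1.0)
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 1
```

With `alpha=0.0` the transformed branch is ignored and the result reduces to the ordinary softmax over the original logits.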

Tokenizer · Optimizer · MoDELS · Language modeling · Performer

December 16, 2024

When Every Token Counts: Optimal Segmentation for Low-Resource Language Models

Bharath Raj S,Garvit Suri,Vikrant Dewangan,Raghav Sonavane

from arxiv, LoResLM @ COLING 2025

Traditional greedy tokenization methods have been a critical step in Natural Language Processing (NLP), influencing how text is converted into tokens and directly impacting model performance. While subword tokenizers like Byte-Pair Encoding (BPE) are widely used, questions remain about their optimality across model scales and languages. In this work, we demonstrate through extensive experiments that an optimal BPE configuration significantly reduces token count compared to greedy segmentation, yielding improvements in token-saving percentages and performance benefits, particularly for smaller models. We evaluate tokenization performance across various intrinsic and extrinsic tasks, including generation and classification. Our findings suggest that compression-optimized tokenization strategies could provide substantial advantages for multilingual and low-resource language applications, highlighting a promising direction for further research and inclusive NLP.
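The gap between greedy and optimal segmentation can be seen on a toy vocabulary. The sketch below is an illustration under assumptions, not the paper's method: it uses longest-match as the greedy baseline and a dynamic program that minimizes token count over a fixed vocabulary; the vocabulary and function names are hypothetical.

```python
from typing import List, Optional, Set


def greedy_segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Left-to-right longest-match segmentation; None if no token fits."""
    max_len = max(len(t) for t in vocab)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            return None  # no vocabulary entry starts at position i
    return tokens


def optimal_segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Dynamic program returning a segmentation with the fewest tokens."""
    n = len(text)
    max_len = max(len(t) for t in vocab)
    best: List[Optional[int]] = [None] * (n + 1)  # best[i]: min tokens for text[:i]
    back = [0] * (n + 1)                          # back[i]: start of the last token
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in vocab:
                if best[i] is None or best[j] + 1 < best[i]:
                    best[i] = best[j] + 1
                    back[i] = j
    if best[n] is None:
        return None
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# Hypothetical toy vocabulary chosen so that the greedy choice is suboptimal.
VOCAB = {"a", "b", "c", "d", "e", "ab", "abc", "cde"}
print(greedy_segment("abcde", VOCAB))   # ['abc', 'd', 'e']
print(optimal_segment("abcde", VOCAB))  # ['ab', 'cde']
```

Here the greedy pass commits to `abc` and is then forced into single characters, while the dynamic program finds the two-token split.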

Estimation/Estimator · Approximation · 3D · Projection · Discretization

December 14, 2024

Error Estimates for Discontinuous Galerkin Approximations to the Vlasov-Unsteady Stokes System

Harsha Hutridurga,Krishan Kumar,Amiya K. Pani

In the first part of this paper, uniqueness of the strong solution is established for the Vlasov-unsteady Stokes problem in 3D. The second part deals with a semi-discrete scheme for the 2D problem, based on coupling discontinuous Galerkin approximations of the Vlasov and the Stokes equations. The proposed method is both mass and momentum conservative. Based on a special projection and the Stokes projection, optimal error estimates are derived for smooth, compactly supported initial data. Moreover, the generalization of the error estimates to the 3D problem is indicated. Finally, numerical experiments based on a time-splitting algorithm are conducted, and their results confirm our theoretical findings.

Biased · MoDELS · Robustness · Model evaluation · Learning

December 13, 2024

Err on the Side of Texture: Texture Bias on Real Data

Blaine Hoak,Ryan Sheatsley,Patrick McDaniel

from arxiv, Accepted to IEEE Secure and Trustworthy Machine Learning (SaTML)

Bias significantly undermines both the accuracy and trustworthiness of machine learning models. To date, one of the strongest biases observed in image classification models is texture bias, where models overly rely on texture information rather than shape information. Yet, existing approaches for measuring and mitigating texture bias have not been able to capture how textures impact model robustness in real-world settings. In this work, we introduce the Texture Association Value (TAV), a novel metric that quantifies how strongly models rely on the presence of specific textures when classifying objects. Leveraging TAV, we demonstrate that model accuracy and robustness are heavily influenced by texture. Our results show that texture bias explains the existence of natural adversarial examples, where over 90% of these samples contain textures that are misaligned with the learned texture of their true label, resulting in confident mispredictions.

Output · Score · Correlation coefficient · Question answering · Dataset

December 12, 2024

LCFO: Long Context and Long Form Output Dataset and Benchmarking

Marta R. Costa-jussà,Pierre Andrews,Mariano Coria Meglioli,Joy Chen,Joe Chuang,David Dale,Christophe Ropers,Alexandre Mourachko,Eduardo Sánchez,Holger Schwenk,Tuan Tran,Arina Turkatenko,Carleigh Wood

This paper presents the Long Context and Form Output (LCFO) benchmark, a novel evaluation framework for assessing gradual summarization and summary expansion capabilities across diverse domains. LCFO consists of long input documents (5k words average length), each of which comes with three summaries of different lengths (20%, 10%, and 5% of the input text), as well as approximately 15 questions and answers (QA) related to the input content. Notably, LCFO also provides alignments between specific QA pairs and corresponding summaries in 7 domains. The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i.e. summary expansion. To establish an evaluation metric framework for summarization and summary expansion, we provide human evaluation scores for human-generated outputs, as well as results from various state-of-the-art large language models (LLMs). GPT-4o-mini achieves the best human scores among automatic systems in both summarization and summary expansion tasks (~ +10% and +20%, respectively). It even surpasses human output quality in the case of short summaries (~ +7%). Overall, automatic metrics achieve low correlations with human evaluation scores (~ 0.4) but moderate correlation on specific evaluation aspects such as fluency and attribution (~ 0.6). The LCFO benchmark offers a standardized platform for evaluating summarization and summary expansion performance, as well as corresponding automatic metrics, thereby providing an important evaluation framework to advance generative AI.

Identifiable · Performer · Mathematics · Language modeling · Notability

December 12, 2024

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

Ruiwen Zhou,Wenyue Hua,Liangming Pan,Sitao Cheng,Xiaobao Wu,En Yu,William Yang Wang

from arxiv, Data and Codes are available at //github.com/skyriver-2000/RuleArena

This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains -- airline baggage fees, NBA transactions, and tax regulations -- RuleArena assesses LLMs' proficiency in handling intricate natural language instructions that demand long-context understanding, logical reasoning, and accurate mathematical computation. Two key attributes distinguish RuleArena from traditional rule-based reasoning benchmarks: (1) it extends beyond standard first-order logic representations, and (2) it is grounded in authentic, practical scenarios, providing insights into the suitability and reliability of LLMs for real-world applications. Our findings reveal several notable limitations in LLMs: (1) they struggle to identify and apply the appropriate rules, frequently becoming confused by similar but distinct regulations, (2) they cannot consistently perform accurate mathematical computations, even when they correctly identify the relevant rules, and (3) in general, they perform poorly in the benchmark. These results highlight significant challenges in advancing LLMs' rule-guided reasoning capabilities in real-life applications.

Multimodal · Language modeling · MoDELS · Comprehensibility · Modality

November 10, 2023

How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model

Shezheng Song,Xiaopeng Li,Shasha Li

This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities like generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing potential risks to society. Choosing the appropriate modality alignment method is crucial, as improper methods might require more parameters with limited performance improvement. This paper aims to explore modality alignment methods for LLMs and their existing capabilities. Implementing modality alignment allows LLMs to address environmental issues and enhance accessibility. The study surveys existing modal alignment methods in MLLMs into four groups: (1) Multimodal Converters that change data into something LLMs can understand; (2) Multimodal Perceivers to improve how LLMs perceive different types of data; (3) Tools Assistance for changing data into one common format, usually text; and (4) Data-Driven methods that teach LLMs to understand specific types of data in a dataset. This field is still in a phase of exploration and experimentation, and we will organize and update various existing research methods for multimodal information alignment.

Knowledge · Graph · Knowledge graph · Dataset · Vine

May 22, 2023

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Yuqi Zhu,Xiaohan Wang,Jing Chen,Shuofei Qiao,Yixin Ou,Yunzhi Yao,Shumin Deng,Huajun Chen,Ningyu Zhang

from arxiv, Work in progress

This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We employ eight distinct datasets that encompass aspects including entity, relation and event extraction, link prediction, and question answering. Empirically, our findings suggest that GPT-4 outperforms ChatGPT in the majority of tasks and even surpasses fine-tuned models in certain reasoning and question-answering datasets. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, which culminates in the presentation of the Virtual Knowledge Extraction task and the development of the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs for KG construction and reasoning, which aims to chart the future of this field and offer exciting opportunities for advancement. We anticipate that our research can provide invaluable insights for future undertakings of KG. Code and datasets will be available in //github.com/zjunlp/AutoKG.

Text classification · Graph · Neural Networks · GPU · Networking

April 27, 2023

Graph Neural Networks for Text Classification: A Survey

Kunze Wang,Yihao Ding,Soyeon Caren Han

from arxiv, 28 pages

Text Classification is the most essential and fundamental problem in Natural Language Processing. While numerous recent text classification models applied the sequential deep learning technique, graph neural network-based models can directly deal with complex structured text data and exploit global information. Many real text classification applications can be naturally cast into a graph, which captures words, documents, and corpus global features. In this survey, we bring the coverage of methods up to 2023, including corpus-level and document-level graph neural networks. We discuss each of these methods in detail, dealing with the graph construction mechanisms and the graph-based learning process. As well as the technological survey, we look at issues behind and future directions addressed in text classification using graph neural networks. We also cover datasets, evaluation metrics, and experiment design and present a summary of published performance on the publicly available benchmarks. Note that we present a comprehensive comparison between different techniques and identify the pros and cons of various evaluation metrics in this survey.

Loss function (machine learning) · Learning to learn · Learned · entity · Functional

September 9, 2019

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Jiawei Wu,Wenhan Xiong,William Yang Wang

from arxiv, 11 pages, 5 figures, accepted to EMNLP 2019

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
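The difference between a fixed prediction policy and label-specific policies can be sketched as follows. This is only an illustration of the decision rule: the thresholds below are hypothetical hand-picked values, not ones produced by the meta-learner described in the abstract, and the function name is an assumption.

```python
from typing import Dict, List


def predict_labels(
    probs: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Return the labels whose probability reaches their own threshold.

    Labels missing from `thresholds` fall back to `default`, so an empty
    dict reproduces the fixed 0.5 policy.
    """
    return sorted(
        label for label, p in probs.items() if p >= thresholds.get(label, default)
    )


# Hypothetical classifier outputs for one entity mention.
probs = {"person": 0.90, "athlete": 0.40, "politician": 0.55}

# Fixed policy: every label uses 0.5.
print(predict_labels(probs, {}))  # ['person', 'politician']

# Label-specific policy (hypothetical values): lenient on "athlete",
# strict on "politician".
per_label = {"athlete": 0.3, "politician": 0.7}
print(predict_labels(probs, per_label))  # ['athlete', 'person']
```

The same probabilities yield different label sets, which is why the choice of prediction policy matters independently of the classifier itself.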