苍井空无码免费换线,国产日本亚洲一区二区三区,中文字幕一区婷婷在线

Text-to-SQL parsing and end-to-end question answering (E2E TQA) are two main approaches for Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. In this paper, we identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets: Text-to-SQL demonstrates superiority in handling questions involving arithmetic operations and long tables; E2E TQA excels in addressing ambiguous questions, non-standard table schema, and complex table contents. To combine both strengths, we propose a Synergistic Table-based Question Answering approach that integrate different models via answer selection, which is agnostic to any model types. Further experiments validate that ensembling models by either feature-based or LLM-based answer selector significantly improves the performance over individual models.

相關內容

E2E

關注 0

3D · 多樣性 · MoDELS · 數據集 · Extensibility ·

2024 年 10 月 31 日

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Weicai Ye,Chenhao Ji,Zheng Chen,Junyao Gao,Xiaoshui Huang,Song-Hai Zhang,Wanli Ouyang,Tong He,Cairong Zhao,Guofeng Zhang

from arxiv, NeurIPS2024, Project: //github.com/zju3dv/DiffPano; Code: //github.com/zju3dv/DiffPano

Diffusion-based methods have achieved remarkable achievements in 2D image or 3D object generation, however, the generation of 3D scenes and even $360^{\circ}$ images remains constrained, due to the limited number of scene datasets, the complexity of 3D scenes themselves, and the difficulty of generating consistent multi-view images. To address these issues, we first establish a large-scale panoramic video-text dataset containing millions of consecutive panoramic keyframes with corresponding panoramic depths, camera poses, and text descriptions. Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation. Specifically, benefiting from the powerful generative capabilities of stable diffusion, we fine-tune a single-view text-to-panorama diffusion model with LoRA on the established panoramic video-text dataset. We further design a spherical epipolar-aware multi-view diffusion model to ensure the multi-view consistency of the generated panoramic images. Extensive experiments demonstrate that DiffPano can generate scalable, consistent, and diverse panoramic images with given unseen text descriptions and camera poses.

知識 (knowledge) · 圖 · 語言模型化 · MoDELS · 大語言模型 ·

2024 年 10 月 31 日

Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs

Liyi Chen,Panrong Tong,Zhongming Jin,Ying Sun,Jieping Ye,Hui Xiong

Large Language Models (LLMs) have shown remarkable reasoning capabilities on complex tasks, but they still suffer from out-of-date knowledge, hallucinations, and opaque decision-making. In contrast, Knowledge Graphs (KGs) can provide explicit and editable knowledge for LLMs to alleviate these issues. Existing paradigm of KG-augmented LLM manually predefines the breadth of exploration space and requires flawless navigation in KGs. However, this paradigm cannot adaptively explore reasoning paths in KGs based on the question semantics and self-correct erroneous reasoning paths, resulting in a bottleneck in efficiency and effect. To address these limitations, we propose a novel self-correcting adaptive planning paradigm for KG-augmented LLM named Plan-on-Graph (PoG), which first decomposes the question into several sub-objectives and then repeats the process of adaptively exploring reasoning paths, updating memory, and reflecting on the need to self-correct erroneous reasoning paths until arriving at the answer. Specifically, three important mechanisms of Guidance, Memory, and Reflection are designed to work together, to guarantee the adaptive breadth of self-correcting planning for graph reasoning. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness and efficiency of PoG.

圖 · 數據集 · Integration · INFORMS · 語言模型化 ·

2024 年 10 月 31 日

GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning

Yanbin Wei,Shuai Fu,Weisen Jiang,Zejian Zhang,Zhixiong Zeng,Qi Wu,James T. Kwok,Yu Zhang

from arxiv, NeurIPS 2024; Project Page: v-graph.github.io; Code: //github.com/WEIYanbin1999/GITA/

Large Language Models (LLMs) are increasingly used for various tasks with graph structures. Though LLMs can process graph information in a textual format, they overlook the rich vision modality, which is an intuitive way for humans to comprehend structural information and conduct general graph reasoning. The potential benefits and capabilities of representing graph structures as visual images (i.e., $\textit{visual graph}$) are still unexplored. To fill the gap, we innovatively propose an end-to-end framework, called $\textbf{G}$raph to v$\textbf{I}$sual and $\textbf{T}$extual Integr$\textbf{A}$tion (GITA), which firstly incorporates visual graphs into general graph reasoning. Besides, we establish $\textbf{G}$raph-based $\textbf{V}$ision-$\textbf{L}$anguage $\textbf{Q}$uestion $\textbf{A}$nswering (GVLQA) dataset from existing graph data, which is the first vision-language dataset for general graph reasoning purposes. Extensive experiments on the GVLQA dataset and five real-world datasets show that GITA outperforms mainstream LLMs in terms of general graph reasoning capabilities. Moreover, We highlight the effectiveness of the layout augmentation on visual graphs and pretraining on the GVLQA dataset.

MoDELS · 優化器 · 知識 (knowledge) · 估計/估計量 · 語言模型化 ·

2024 年 10 月 31 日

OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models

Junda Wu,Xintong Li,Ruoyu Wang,Yu Xia,Yuxin Xiong,Jianing Wang,Tong Yu,Xiang Chen,Branislav Kveton,Lina Yao,Jingbo Shang,Julian McAuley

from arxiv, 10 pages

Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and reasoning paths, we use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chain of thoughts. Due to the heterogeneity between LLM reasoning and KG structures, direct interaction and feedback from KGs on LLM behavior are challenging, as they require accurate entity linking and grounding of LLM-generated chains of thought in the KG. To address the above challenge, we propose an offline chain-of-thought evaluation framework, OCEAN, which models chain-of-thought reasoning in LLMs as an MDP and evaluate the policy's alignment with KG preference modeling. To overcome the reasoning heterogeneity and grounding problems, we leverage on-policy KG exploration and RL to model a KG policy that generates token-level likelihood distributions for LLM-generated chain-of-thought reasoning paths, simulating KG reasoning preference. Then we incorporate the knowledge-graph feedback on the validity and alignment of the generated reasoning paths into inverse propensity scores and propose KG-IPS estimator. Theoretically, we prove the unbiasedness of the proposed KG-IPS estimator and provide a lower bound on its variance. With the off-policy evaluated value function, we can directly enable off-policy optimization to further enhance chain-of-thought alignment. Our empirical study shows that OCEAN can be efficiently optimized for generating chain-of-thought reasoning paths with higher estimated values without affecting LLMs' general abilities in downstream tasks or their internal knowledge.

Vision · 變換 · Integration · 評論員 · Performer ·

2024 年 10 月 30 日

DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET

Yitong Li,Morteza Ghahremani,Youssef Wally,Christian Wachinger

from arxiv, Accepted by IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

Diagnosing dementia, particularly for Alzheimer's Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study. The code is available at //github.com/ai-med/DiaMond.

變換 · Vision · 奇異值分解 · 分解 · 酉矩陣 ·

2024 年 10 月 30 日

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

Wei Dong,Yuan Sun,Yiting Yang,Xing Zhang,Zhijun Lin,Qingsen Yan,Haokui Zhang,Peng Wang,Yang Yang,Hengtao Shen

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.

Performer · MoDELS · 優化器 · FAST · 穩健性 ·

2024 年 10 月 30 日

Coupling deal.II and FROSch: A Sustainable and Accessible (O)RAS Preconditioner

Alexander Heinlein,Sebastian Kinnewig,Thomas Wick

In this work, restricted additive Schwarz (RAS) and optimized restricted additive Schwarz (ORAS) preconditioners from the Trilinos package FROSch (Fast and Robust Overlapping Schwarz) are employed to solve model problems implemented using deal.II (differential equations analysis library). Therefore, a Tpetra-based interface for coupling deal.II and FROSch is implemented. While RAS preconditioners have been available before, ORAS preconditioners have been newly added to FROSch. The FROSch-deal.II interface works for both Lagrange-based and N\'ed\'elec finite elements. Here, as model problems, nonstationary, nonlinear, variational-monolithic fluid-structure interaction and the indefinite time-harmonic Maxwell's equations are considered. Several numerical experiments in two and three spatial dimensions confirm the performance of the preconditioners as well as the FROSch-deal.II interface. In conclusion, the overall software interface is straightforward and easy to use while giving satisfactory solver performances for challenging PDE systems.

Learning · MoDELS · 聯邦學習 · Machine Learning · Performer ·

2024 年 10 月 30 日

Byzantine-Robust Federated Learning: An Overview With Focus on Developing Sybil-based Attacks to Backdoor Augmented Secure Aggregation Protocols

Atharv Deshmukh

from arxiv, 16 pages, 4 figures, 1 appendix

Federated Learning (FL) paradigms enable large numbers of clients to collaboratively train Machine Learning models on private data. However, due to their multi-party nature, traditional FL schemes are left vulnerable to Byzantine attacks that attempt to hurt model performance by injecting malicious backdoors. A wide variety of prevention methods have been proposed to protect frameworks from such attacks. This paper provides a exhaustive and updated taxonomy of existing methods and frameworks, before zooming in and conducting an in-depth analysis of the strengths and weaknesses of the Robustness of Federated Learning (RoFL) protocol. From there, we propose two novel Sybil-based attacks that take advantage of vulnerabilities in RoFL. Finally, we conclude with comprehensive proposals for future testing, describe and detail implementation of the proposed attacks, and offer direction for improvements in the RoFL protocol as well as Byzantine-robust frameworks as a whole.

cache · 詞元分析器 · MoDELS · 特化 · 推斷 ·

2024 年 10 月 29 日

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Dezhan Tu,Danylo Vashchilenko,Yuzhe Lu,Panpan Xu

Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache that encodes long visual contexts, such as images or videos. While existing KV cache compression methods are effective for Large Language Models (LLMs), directly migrating them to VLMs yields suboptimal accuracy and speedup. To bridge the gap, we propose VL-Cache, a novel KV cache compression recipe tailored for accelerating VLM inference. In this paper, we first investigate the unique sparsity pattern of VLM attention by distinguishing visual and text tokens in prefill and decoding phases. Based on these observations, we introduce a layer-adaptive sparsity-aware cache budget allocation method that effectively distributes the limited cache budget across different layers, further reducing KV cache size without compromising accuracy. Additionally, we develop a modality-aware token scoring policy to better evaluate the token importance. Empirical results on multiple benchmark datasets demonstrate that retaining only 10% of KV cache achieves accuracy comparable to that with full cache. In a speed benchmark, our method accelerates end-to-end latency of generating 100 tokens by up to 2.33x and speeds up decoding by up to 7.08x, while reducing the memory footprint of KV cache in GPU by 90%.

剪枝 · Better · CAP · contrastive · MoDELS ·

2021 年 12 月 14 日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Runxin Xu,Fuli Luo,Chengyu Wang,Baobao Chang,Jun Huang,Songfang Huang,Fei Huang

from arxiv, Accepted to AAAI 2022

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervisions for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance in QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.