男女一边脱一边亲一边膜,亚洲视频华人在线播放,国精品拍偷在线视频分类,成人国产一级视频免费看

We introduce ``Generative Fusion Decoding'' (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). We derive the formulas necessary to enable GFD to operate across mismatched token spaces of different models by mapping text token space to byte token space, enabling seamless fusion during the decoding process. The framework is plug-and-play, compatible with various auto-regressive models, and does not require re-training for feature alignment, thus overcoming limitations of previous fusion techniques. We highlight three main advantages of GFD: First, by simplifying the complexity of aligning different model sample spaces, GFD allows LLMs to correct errors in tandem with the recognition model, reducing computation latencies. Second, the in-context learning ability of LLMs is fully capitalized by GFD, increasing robustness in long-form speech recognition and instruction aware speech recognition. Third, GFD enables fusing recognition models deficient in Chinese text recognition with LLMs extensively trained on Chinese. Our evaluation demonstrates that GFD significantly improves performance in ASR and OCR tasks, with ASR reaching state-of-the-art in the NTUML2021 benchmark. GFD provides a significant step forward in model integration, offering a unified solution that could be widely applicable to leveraging existing pre-trained models through step by step fusion.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · INTERACT · 點云 · Networking · 模態 ·

2024 年 7 月 4 日

Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

Hang Xu,Chen Long,Wenxiao Zhang,Yuan Liu,Zhen Cao,Zhen Dong,Bisheng Yang

from arxiv, ECCV 2024

In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two modalities by leveraging the geometric nature of the completion task. Specifically, we propose an explicitly guided information interaction strategy supported by modal alignment for point cloud completion. First, in contrast to previous methods which simply use 2D and 3D backbones to encode features respectively, we unified the encoding process to promote modal alignment. Second, we propose a novel explicitly guided information interaction strategy that could help the network identify critical information within images, thus achieving better guidance for completion. Extensive experiments demonstrate the effectiveness of our framework, and we achieved a new state-of-the-art (+16% CD over XMFnet) in benchmark datasets despite using fewer parameters than the previous methods. The pre-trained model and code and are available at //github.com/WHU-USI3DV/EGIInet.

核化 · MoDELS · Linux · Performer · 可理解性 ·

2024 年 7 月 4 日

KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution

Alex Mathai,Chenxi Huang,Petros Maniatis,Aleksandr Nogikh,Franjo Ivancic,Junfeng Yang,Baishakhi Ray

Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. Unlike application-level software, a systems codebase like Linux is multilingual (low-level C/Assembly/Bash/Rust); gigantic (>20 million lines); critical (impacting billions of devices worldwide), and highly concurrent (involving complex multi-threading). To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym (a platform) and kBench (a dataset). The kGym platform provides a SE environment for large-scale experiments on the Linux kernel, including compiling and running kernels in parallel across several virtual machines, detecting operations and crashes, inspecting logs, and querying and patching the code base. We use kGym to facilitate evaluation on kBench, a crash resolution benchmark drawn from real-world Linux kernel bugs. An example bug in kBench contains crashing stack traces, a bug-reproducer file, a developer-written fix, and other associated data. To understand current performance, we conduct baseline experiments by prompting LLMs to resolve Linux kernel crashes. Our initial evaluations reveal that the best performing LLM achieves 0.72% and 5.38% in the unassisted and assisted (i.e., buggy files disclosed to the model) settings, respectively. These results highlight the need for further research to enhance model performance in SE tasks. Improving performance on kBench requires models to master new learning skills, including understanding the cause of crashes and repairing faults, writing memory-safe and hardware-aware code, and understanding concurrency. As a result, this work opens up multiple avenues of research at the intersection of machine learning and systems software.

多峰值 · 可理解性 · HTTPS · 穩健性 · MoDELS ·

2024 年 7 月 3 日

OpenVNA: A Framework for Analyzing the Behavior of Multimodal Language Understanding System under Noisy Scenarios

Ziqi Yuan,Baozheng Zhang,Hua Xu,Zhiyun Liang,Kai Gao

from arxiv, 10 pages, 4 figures, to be published in ACL 2024 System Demonstration Track

We present OpenVNA, an open-source framework designed for analyzing the behavior of multimodal language understanding systems under noisy conditions. OpenVNA serves as an intuitive toolkit tailored for researchers, facilitating convenience batch-level robustness evaluation and on-the-fly instance-level demonstration. It primarily features a benchmark Python library for assessing global model robustness, offering high flexibility and extensibility, thereby enabling customization with user-defined noise types and models. Additionally, a GUI-based interface has been developed to intuitively analyze local model behavior. In this paper, we delineate the design principles and utilization of the created library and GUI-based web platform. Currently, OpenVNA is publicly accessible at \url{//github.com/thuiar/OpenVNA}, with a demonstration video available at \url{//youtu.be/0Z9cW7RGct4}.

Networking · Neural Networks · 值域 · Analysis · 輸出 ·

2024 年 7 月 2 日

Output Range Analysis for Deep Neural Networks based on Simulated Annealing Processes

Helder Rojas,Nilton Rojas,Espinoza J. B.,Luis Huamanchumo

This paper tackles the challenging problem of output range estimation for Deep Neural Networks (DNNs), introducing a novel algorithm based on Simulated Annealing (SA). Our approach addresses the lack of local geometric information and high non-linearity in DNNs, making it versatile across various architectures, especially Residual Neural Networks (ResNets). We present a straightforward, implementation-friendly algorithm that avoids restrictive assumptions about network architecture. Through theoretical analysis and experimental evaluations, including tests on the Ackley function, we demonstrate our algorithm's effectiveness in navigating complex, non-convex surfaces and accurately estimating DNN output ranges. Futhermore, the Python codes of this experimental evaluation that support our results are available in our GitHub repository (//github.com/Nicerova7/output-range-analysis-for-deep-neural-networks-with-simulated-annealing).

Networking · Performance · 表示學習 · Learning · 表示 ·

2024 年 7 月 2 日

SiamTST: A Novel Representation Learning Framework for Enhanced Multivariate Time Series Forecasting applied to Telco Networks

Simen Kristoffersen,Peter Skaar Nordby,Sara Malacarne,Massimiliano Ruocco,Pablo Ortiz

from arxiv, 14 pages, 3 figures, public codebase

We introduce SiamTST, a novel representation learning framework for multivariate time series. SiamTST integrates a Siamese network with attention, channel-independent patching, and normalization techniques to achieve superior performance. Evaluated on a real-world industrial telecommunication dataset, SiamTST demonstrates significant improvements in forecasting accuracy over existing methods. Notably, a simple linear network also shows competitive performance, achieving the second-best results, just behind SiamTST. The code is available at //github.com/simenkristoff/SiamTST.

MoDELS · 知識 (knowledge) · 圖 · 集成 · Performer ·

2024 年 7 月 2 日

DynaSemble: Dynamic Ensembling of Textual and Structure-Based Models for Knowledge Graph Completion

Ananjan Nandi,Navdeep Kaur,Parag Singla, Mausam

from arxiv, 12 pages, 2 figures, 15 tables Accepted to ACL 2024

We consider two popular approaches to Knowledge Graph Completion (KGC): textual models that rely on textual entity descriptions, and structure-based models that exploit the connectivity structure of the Knowledge Graph (KG). Preliminary experiments show that these approaches have complementary strengths: structure-based models perform exceptionally well when the gold answer is easily reachable from the query head in the KG, while textual models exploit descriptions to give good performance even when the gold answer is not easily reachable. In response, we propose DynaSemble, a novel method for learning query-dependent ensemble weights to combine these approaches by using the distributions of scores assigned by the models in the ensemble to all candidate entities. DynaSemble achieves state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR and 8.3 pt Hits@1 gains over the best baseline model for the WN18RR dataset.

MoDELS · 模型評估 · 卡爾曼濾波 · Projection · 線性的 ·

2024 年 7 月 2 日

PO-MSCKF: An Efficient Visual-Inertial Odometry by Reconstructing the Multi-State Constrained Kalman Filter with the Pose-only Theory

Du Xueyu,Zhang Lilian,Liu Ruochen,Wang Maosong,Wu Wenqi,Mao Jun

Efficient Visual-Inertial Odometry (VIO) is crucial for payload-constrained robots. Though modern optimization-based algorithms have achieved superior accuracy, the MSCKF-based VIO algorithms are still widely demanded for their efficient and consistent performance. As MSCKF is built upon the conventional multi-view geometry, the measured residuals are not only related to the state errors but also related to the feature position errors. To apply EKF fusion, a projection process is required to remove the feature position error from the observation model, which can lead to model and accuracy degradation. To obtain an efficient visual-inertial fusion model, while also preserving the model consistency, we propose to reconstruct the MSCKF VIO with the novel Pose-Only (PO) multi-view geometry description. In the newly constructed filter, we have modeled PO reprojection residuals, which are solely related to the motion states and thus overcome the requirements of space projection. Moreover, the new filter does not require any feature position information, which removes the computational cost and linearization errors brought in by the 3D reconstruction procedure. We have conducted comprehensive experiments on multiple datasets, where the proposed method has shown accuracy improvements and consistent performance in challenging sequences.

自動問答 · 維基百科 · entity · HotpotQA · 多跳 ·

2020 年 12 月 31 日

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Shaobo Li,Xiaoguang Li,Lifeng Shang,Xin Jiang,Qun Liu,Chengjie Sun,Zhenzhou Ji,Bingquan Liu

from arxiv, Accepted at AAAI 2021

Collecting supporting evidence from large corpora of text (e.g., Wikipedia) is of great challenge for open-domain Question Answering (QA). Especially, for multi-hop open-domain QA, scattered evidence pieces are required to be gathered together to support the answer extraction. In this paper, we propose a new retrieval target, hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering. Specifically, the hop in this paper is defined as the combination of a hyperlink and the corresponding outbound link document. The hyperlink is encoded as the mention embedding which models the structured knowledge of how the outbound link entity is mentioned in the textual context, and the corresponding outbound link document is encoded as the document embedding representing the unstructured knowledge within it. Accordingly, we build HopRetriever which retrieves hops over Wikipedia to answer complex questions. Experiments on the HotpotQA dataset demonstrate that HopRetriever outperforms previously published evidence retrieval methods by large margins. Moreover, our approach also yields quantifiable interpretations of the evidence collection process.

BERT · 語言表示 · state-of-the-art · 可理解性 · 自動問答 ·

2018 年 10 月 11 日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin,Ming-Wei Chang,Kenton Lee,Kristina Toutanova

from arxiv, 13 pages

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

判別器 · Performer · 降維 · 卷積神經網絡 · 多任務學習 ·

2018 年 1 月 25 日

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Yuan Gao,Qi She,Jiayi Ma,Mingbo Zhao,Wei Liu,Alan L. Yuille

from arxiv, 11 pages, 5 figures, 7 tables

State-of-the-art Convolutional Neural Network (CNN) benefits a lot from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely-used MTL CNN structure is based on an empirical or heuristic split on a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL, which enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks according to their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform ablation analysis in details for different configurations in training the network. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.