99视频在线播放喷射,国产高清一区二区在线影院,亚洲AV日韩AV无码A影音先锋,影音先锋欧美激情在线观看

In this work, we investigate the potential of a large language model (LLM) to directly comprehend visual signals without the necessity of fine-tuning on multi-modal datasets. The foundational concept of our method views an image as a linguistic entity, and translates it to a set of discrete words derived from the LLM's vocabulary. To achieve this, we present the Vision-to-Language Tokenizer, abbreviated as V2T Tokenizer, which transforms an image into a ``foreign language'' with the combined aid of an encoder-decoder, the LLM vocabulary, and a CLIP model. With this innovative image encoding, the LLM gains the ability not only for visual comprehension but also for image denoising and restoration in an auto-regressive fashion-crucially, without any fine-tuning. We undertake rigorous experiments to validate our method, encompassing understanding tasks like image recognition, image captioning, and visual question answering, as well as image denoising tasks like inpainting, outpainting, deblurring, and shift restoration. Code and models are available at //github.com/zh460045050/V2L-Tokenizer.

相關內容

大語言模型

關注 56

大語言模型是基于海量文本數據訓練的深度學習模型。它不僅能夠生成自然語言文本，還能夠深入理解文本含義，處理各種自然語言任務，如文本摘要、問答、翻譯等。2023年，大語言模型及其在人工智能領域的應用已成為全球科技研究的熱點，其在規模上的增長尤為引人注目，參數量已從最初的十幾億躍升到如今的一萬億。參數量的提升使得模型能夠更加精細地捕捉人類語言微妙之處，更加深入地理解人類語言的復雜性。在過去的一年里，大語言模型在吸納新知識、分解復雜任務以及圖文對齊等多方面都有顯著提升。隨著技術的不斷成熟，它將不斷拓展其應用范圍，為人類提供更加智能化和個性化的服務，進一步改善人們的生活和生產方式。

知識 (knowledge) · HTTPS · 基 · 知識庫 · 描述符 ·

2024 年 4 月 23 日

CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

Weiyan Shi,Ryan Li,Yutong Zhang,Caleb Ziems,Chunhua yu,Raya Horesh,Rogério Abreu de Paula,Diyi Yang

from arxiv, 32 pages, 7 figures, preprint

To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness, and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performances on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies. The project page is //culturebank.github.io . The code and model is at //github.com/SALT-NLP/CultureBank . The released CultureBank dataset is at //huggingface.co/datasets/SALT-NLP/CultureBank .

INTERACT · 回合 · Amazon · 得分 · 設計 ·

2024 年 4 月 22 日

Sign Language-Based versus Touch-Based Input for Deaf Users with Interactive Personal Assistants in Simulated Kitchen Environments

Paige DeVries,Nina Tran,Keith Delk,Melanie Miga,Richard Taulbee,Pranav Pidathala,Abraham Glasser,Raja Kushlanagar,Christian Vogler

from arxiv, To appear in Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA 2024, May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA, 9 pages. //doi.org/10.1145/3613905.3651075

In this study, we assess the usability of interactive personal assistants (IPAs), such as Amazon Alexa, in a simulated kitchen smart home environment, with deaf and hard of hearing users. Participants engage in activities in a way that causes their hands to get dirty. With these dirty hands, they are tasked with two different input methods for IPAs: American Sign Language (ASL) in a Wizard-of-Oz design, and smart home apps with a touchscreen. Usability ratings show that participants significantly preferred ASL over touch-based apps with dirty hands, although not to a larger extent than in comparable previous work with clean hands. Participants also expressed significant enthusiasm for ASL-based IPA interaction in Netpromoter scores and in questions about their overall preferences. Preliminary observations further suggest that having dirty hands may affect the way people sign, which may pose challenges for building IPAs that natively support sign language input.

可約的 · Learning · 回合 · 聯邦學習 · 多樣性 ·

2024 年 4 月 22 日

Apodotiko: Enabling Efficient Serverless Federated Learning in Heterogeneous Environments

Mohak Chadha,Alexander Jensen,Jianfeng Gu,Osama Abboud,Michael Gerndt

from arxiv, Accepted at IEEE/ACM CCGrid'24

Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, current serverless FL systems still suffer from the presence of stragglers, i.e., slow clients that impede the collaborative training process. While strategies aimed at mitigating stragglers in these systems have been proposed, they overlook the diverse hardware resource configurations among FL clients. To this end, we present Apodotiko, a novel asynchronous training strategy designed for serverless FL. Our strategy incorporates a scoring mechanism that evaluates each client's hardware capacity and dataset size to intelligently prioritize and select clients for each training round, thereby minimizing the effects of stragglers on system performance. We comprehensively evaluate Apodotiko across diverse datasets, considering a mix of CPU and GPU clients, and compare its performance against five other FL training strategies. Results from our experiments demonstrate that Apodotiko outperforms other FL training strategies, achieving an average speedup of 2.75x and a maximum speedup of 7.03x. Furthermore, our strategy significantly reduces cold starts by a factor of four on average, demonstrating suitability in serverless environments.

量子計算 · Performer · 逼真度 · Integration · 查準率/準確率 ·

2024 年 4 月 22 日

HamilToniQ: An Open-Source Benchmark Toolkit for Quantum Computers

Xiaotian Xu,Kuan-Cheng Chen,Robert Wille

from arxiv, 11 pages, 13 figures

In this paper, we introduce HamilToniQ, an open-source, and application-oriented benchmarking toolkit for the comprehensive evaluation of Quantum Processing Units (QPUs). Designed to navigate the complexities of quantum computations, HamilToniQ incorporates a methodological framework assessing QPU types, topologies, and multi-QPU systems. The toolkit facilitates the evaluation of QPUs' performance through multiple steps including quantum circuit compilation and quantum error mitigation (QEM), integrating strategies that are unique to each stage. HamilToniQ's standardized score, H-Score, quantifies the fidelity and reliability of QPUs, providing a multidimensional perspective of QPU performance. With a focus on the Quantum Approximate Optimization Algorithm (QAOA), the toolkit enables direct, comparable analysis of QPUs, enhancing transparency and equity in benchmarking. Demonstrated in this paper, HamilToniQ has been validated on various IBM QPUs, affirming its effectiveness and robustness. Overall, HamilToniQ significantly contributes to the advancement of the quantum computing field by offering precise and equitable benchmarking metrics.

MoDELS · 大學 · 語言模型化 · Processing（編程語言） · 訓練數據 ·

2024 年 4 月 22 日

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Javier Rando,Francesco Croce,Kry?tof Mitka,Stepan Shabalin,Maksym Andriushchenko,Nicolas Flammarion,Florian Tramèr

from arxiv, Competition Report

Large language models are aligned to be safe, preventing users from generating harmful content like misinformation or instructions for illegal activities. However, previous work has shown that the alignment process is vulnerable to poisoning attacks. Adversaries can manipulate the safety training data to inject backdoors that act like a universal sudo command: adding the backdoor string to any prompt enables harmful responses from models that, otherwise, behave safely. Our competition, co-located at IEEE SaTML 2024, challenged participants to find universal backdoors in several large language models. This report summarizes the key findings and promising ideas for future research.

Automator · Integration · 語言模型化 · 情景 · 大語言模型 ·

2024 年 4 月 20 日

Lemur: Integrating Large Language Models in Automated Program Verification

Haoze Wu,Clark Barrett,Nina Narodytska

from arxiv, Accepted at ICLR'24

The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.

INFORMS · MoDELS · Guidance · 特征提取 · 層 ·

2024 年 4 月 19 日

Modeling Multi-Granularity Context Information Flow for Pavement Crack Detection

Junbiao Pang,Baocheng Xiong,Jiaqi Wu

Crack detection has become an indispensable, interesting yet challenging task in the computer vision community. Specially, pavement cracks have a highly complex spatial structure, a low contrasting background and a weak spatial continuity, posing a significant challenge to an effective crack detection method. In this paper, we address these problems from a view that utilizes contexts of the cracks and propose an end-to-end deep learning method to model the context information flow. To precisely localize crack from an image, it is critical to effectively extract and aggregate multi-granularity context, including the fine-grained local context around the cracks (in spatial-level) and the coarse-grained semantics (in segment-level). Concretely, in Convolutional Neural Network (CNN), low-level features extracted by the shallow layers represent the local information, while the deep layers extract the semantic features. Additionally, a second main insight in this work is that the semantic context should be an guidance to local context feature. By the above insights, the proposed method we first apply the dilated convolution as the backbone feature extractor to model local context, then we build a context guidance module to leverage semantic context to guide local feature extraction at multiple stages. To handle label alignment between stages, we apply the Multiple Instance Learning (MIL) strategy to align the high-level feature to the low-level ones in the stage-wise context flow. In addition, compared with these public crack datasets, to our best knowledge, we release the largest, most complex and most challenging Bitumen Pavement Crack (BPC) dataset. The experimental results on the three crack datasets demonstrate that the proposed method performs well and outperforms the current state-of-the-art methods.

可辨認的 · Learning · 聯邦學習 · 服務器 · 同分布的 ·

2024 年 4 月 18 日

The Dog Walking Theory: Rethinking Convergence in Federated Learning

Kun Zhai,Yifeng Gao,Xingjun Ma,Difan Zou,Guangnan Ye,Yu-Gang Jiang

Federated learning (FL) is a collaborative learning paradigm that allows different clients to train one powerful global model without sharing their private data. Although FL has demonstrated promising results in various applications, it is known to suffer from convergence issues caused by the data distribution shift across different clients, especially on non-independent and identically distributed (non-IID) data. In this paper, we study the convergence of FL on non-IID data and propose a novel \emph{Dog Walking Theory} to formulate and identify the missing element in existing research. The Dog Walking Theory describes the process of a dog walker leash walking multiple dogs from one side of the park to the other. The goal of the dog walker is to arrive at the right destination while giving the dogs enough exercise (i.e., space exploration). In FL, the server is analogous to the dog walker while the clients are analogous to the dogs. This analogy allows us to identify one crucial yet missing element in existing FL algorithms: the leash that guides the exploration of the clients. To address this gap, we propose a novel FL algorithm \emph{FedWalk} that leverages an external easy-to-converge task at the server side as a \emph{leash task} to guide the local training of the clients. We theoretically analyze the convergence of FedWalk with respect to data heterogeneity (between server and clients) and task discrepancy (between the leash and the original tasks). Experiments on multiple benchmark datasets demonstrate the superiority of FedWalk over state-of-the-art FL methods under both IID and non-IID settings.

變換 · Taxonomy · BASIC · Vision · Processing（編程語言） ·

2022 年 2 月 24 日

Transformers in Medical Image Analysis: A Review

Kelei He,Chen Gan,Zhuoyuan Li,Islem Rekik,Zihao Yin,Wen Ji,Yang Gao,Qian Wang,Junfeng Zhang,Dinggang Shen

Transformers have dominated the field of natural language processing, and recently impacted the computer vision area. In the field of medical image analysis, Transformers have also been successfully applied to full-stack clinical applications, including image synthesis/reconstruction, registration, segmentation, detection, and diagnosis. Our paper presents both a position paper and a primer, promoting awareness and application of Transformers in the field of medical image analysis. Specifically, we first overview the core concepts of the attention mechanism built into Transformers and other basic components. Second, we give a new taxonomy of various Transformer architectures tailored for medical image applications and discuss their limitations. Within this review, we investigate key challenges revolving around the use of Transformers in different learning paradigms, improving the model efficiency, and their coupling with other techniques. We hope this review can give a comprehensive picture of Transformers to the readers in the field of medical image analysis.

Performer · 變換 · MoDELS · Taxonomy · INTERACT ·

2022 年 2 月 15 日

Transformers in Time Series: A Survey

Qingsong Wen,Tian Zhou,Chaoli Zhang,Weiqi Chen,Ziqing Ma,Junchi Yan,Liang Sun

from arxiv, 8 pages, 1 figure, 4 tables, 65 referred papers

Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also intrigues great interests in the time series community. Among multiple advantages of transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review transformer schemes for time series modeling by highlighting their strengths as well as limitations through a new taxonomy to summarize existing time series transformers in two perspectives. From the perspective of network modifications, we summarize the adaptations of module level and architecture level of the time series transformers. From the perspective of applications, we categorize time series transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance. To the best of our knowledge, this paper is the first work to comprehensively and systematically summarize the recent advances of Transformers for modeling time series data. We hope this survey will ignite further research interests in time series Transformers.