
INTERACT · Prompt · HTTPS · Integration · Comprehensibility ·

May 4, 2023

Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Seongmin Lee,Benjamin Hoover,Hendrik Strobelt,Zijie J. Wang,ShengYun Peng,Austin Wright,Kevin Li,Haekyu Park,Haoyang Yang,Duen Horng Chau

from arxiv, 5 pages, 5 figures

Diffusion-based generative models' impressive ability to create convincing images has captured global attention. However, their complex internal structures and operations often make them difficult for non-experts to understand. We present Diffusion Explainer, the first interactive visualization tool that explains how Stable Diffusion transforms text prompts into images. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion's complex components with detailed explanations of their underlying operations, enabling users to fluidly transition between multiple levels of abstraction through animations and interactive elements. By comparing the evolutions of image representations guided by two related text prompts over refinement timesteps, users can discover the impact of prompts on image generation. Diffusion Explainer runs locally in users' web browsers without the need for installation or specialized hardware, broadening the public's education access to modern AI techniques. Our open-sourced tool is available at: //poloclub.github.io/diffusion-explainer/.

Related content

INTERACT

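The IFIP TC13 Conference on Human-Computer Interaction is an important venue for researchers and practitioners in human-computer interaction to present their work. Over the years, these conferences have attracted researchers from several countries and cultures.

PAR · Performer · Optimization · Closed-form ·

June 20, 2023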

Multi-Concept Customization of Text-to-Image Diffusion

Nupur Kumari,Bingliang Zhang,Richard Zhang,Eli Shechtman,Jun-Yan Zhu

from arxiv, Updated v2 with results on the new CustomConcept101 dataset //www.cs.cmu.edu/~custom-diffusion/dataset.html Project webpage: //www.cs.cmu.edu/~custom-diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.

Decoding · MoDELS · INFORMS · Performer · HTTPS ·

June 20, 2023

Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs

Yu Takagi,Shinji Nishimoto

The integration of deep learning and neuroscience has been advancing rapidly, which has led to improvements in the analysis of brain activity and the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity is an area that has particularly benefited: the use of deep learning models trained on large amounts of natural images has greatly improved its quality, and approaches that combine the diverse information contained in visual experiences have proliferated rapidly in recent years. In this technical paper, by taking advantage of the simple and generic framework that we proposed (Takagi and Nishimoto, CVPR 2023), we examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combined our earlier work with the following three techniques: using decoded text from brain activity, nonlinear optimization for structural image reconstruction, and using decoded depth information from brain activity. We confirmed that these techniques contributed to improving accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction using deep generative models trained on large datasets. Please check our webpage at //sites.google.com/view/stablediffusion-with-brain/. Code is also available at //github.com/yu-takagi/StableDiffusionReconstruction.

AI · INFORMS · Online · Biased · Engineering ·

June 20, 2023

The Cultivated Practices of Text-to-Image Generation

Jonas Oppenlaender

from arxiv, In "Humane autonomous technology - Re-thinking experience with and in intelligent systems", Palgrave Macmillan, 2024

Humankind is entering a novel creative era in which anybody can synthesize digital information using generative artificial intelligence (AI). Text-to-image generation, in particular, has become vastly popular, and millions of practitioners produce AI-generated images and AI art online. This chapter first gives an overview of the key developments that enabled a healthy co-creative online ecosystem around text-to-image generation to rapidly emerge, followed by a high-level description of key elements in this ecosystem. A particular focus is placed on prompt engineering, a creative practice that has been embraced by the AI art community. It is then argued that the emerging co-creative ecosystem constitutes an intelligent system on its own - a system that both supports human creativity and potentially entraps future generations and limits future development efforts in AI. The chapter discusses the potential risks and dangers of cultivating this co-creative ecosystem, such as the bias inherent in today's training data, potential quality degradation in future image generation systems due to synthetic data becoming commonplace, and the potential long-term effects of text-to-image generation on people's imagination, ambitions, and development.

Nuance · Annotation · MoDELS · Identifiable · binary ·

June 16, 2023

Exploring the Viability of Synthetic Query Generation for Relevance Prediction

Aditi Chaudhary,Karthik Raman,Krishna Srinivasan,Kazuma Hashimoto,Mike Bendersky,Marc Najork

from arxiv, In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom 23)

Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address this paucity, recent methods leverage these powerful models to generate high-quality task and domain-specific synthetic data. Prior work has largely explored synthetic data generation or query generation (QGen) for Question-Answering (QA) and binary (yes/no) relevance prediction, where for instance, the QGen models are given a document, and trained to generate a query relevant to that document. However, in many problems, we have a more fine-grained notion of relevance than a simple yes/no label. Thus, in this work, we conduct a detailed study into how QGen approaches can be leveraged for nuanced relevance prediction. We demonstrate that -- contrary to claims from prior works -- current QGen approaches fall short of the more conventional cross-domain transfer-learning approaches. Via empirical studies spanning 3 public e-commerce benchmarks, we identify new shortcomings of existing QGen approaches -- including their inability to distinguish between different grades of relevance. To address this, we introduce label-conditioned QGen models that incorporate knowledge about the different relevance grades. While our experiments demonstrate that these modifications help improve the performance of QGen techniques, we also find that QGen approaches struggle to capture the full nuance of the relevance label space and as a result the generated queries are not faithful to the desired relevance label.

Target domain · MoDELS · Unsupervised · Extensibility · state-of-the-art ·

June 16, 2023

One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

Yasser Benigmim,Subhankar Roy,Slim Essid,Vicky Kalogeiton,Stéphane Lathuilière

from arxiv, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition - Workshop on Generative Models for Computer Vision (CVPR-W 2023)

Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target "texture" information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface in our method Data AugmenTation with diffUsion Models (DATUM) endows us with the possibility of guiding the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that our DATUM surpasses the state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at //github.com/yasserben/DATUM

XAI · Things · state-of-the-art · Directed · AI ·

November 2, 2022

Explainable AI over the Internet of Things: Overview, State-of-the-Art and Future Directions

Senthil Kumar Jagatheesaperumal,Quoc-Viet Pham,Rukhsana Ruby,Zhaohui Yang,Chunmei Xu,Zhaoyang Zhang

from arxiv, 29 pages, 7 figures, 2 tables. IEEE Open Journal of the Communications Society (2022)

Explainable Artificial Intelligence (XAI) is transforming the field of Artificial Intelligence (AI) by enhancing the trust of end-users in machines. As the number of connected devices keeps growing, the Internet of Things (IoT) market needs to be trustworthy for the end-users. However, existing literature still lacks a systematic and comprehensive survey on the use of XAI for IoT. To bridge this gap, in this paper, we address the XAI frameworks with a focus on their characteristics and support for IoT. We illustrate the widely-used XAI services for IoT applications, such as security enhancement, Internet of Medical Things (IoMT), Industrial IoT (IIoT), and Internet of City Things (IoCT). We also suggest the implementation choice of XAI models over IoT systems in these applications with appropriate examples and summarize the key inferences for future works. Moreover, we present the cutting-edge development in edge XAI structures and the support of sixth-generation (6G) communication services for IoT applications, along with key inferences. In a nutshell, this paper constitutes the first holistic compilation on the development of XAI-based frameworks tailored for the demands of future IoT use cases.

LayoutLM · INFORMS · Comprehensibility · SCAN · MoDELS ·

February 19, 2020

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Yiheng Xu,Minghao Li,Lei Cui,Shaohan Huang,Furu Wei,Ming Zhou

from arxiv, Work in progress

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they have focused almost exclusively on text-level manipulation, while neglecting the layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model the interaction between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage the image features to incorporate the visual information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at //github.com/microsoft/unilm/tree/master/layoutlm.

Attribute space · Diversity · Pair · MoDELS · Training data ·

August 2, 2018

Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee,Hung-Yu Tseng,Jia-Bin Huang,Maneesh Kumar Singh,Ming-Hsuan Yang

from arxiv, ECCV 2018 (Oral). Project page: //vllab.ucmerced.edu/hylee/DRIT/ Code: //github.com/HsinYingLee/DRIT/

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with a user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance when compared to the state-of-the-art on the MNIST-M and the LineMod datasets.
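
The cross-cycle idea in the abstract above can be pictured with a deliberately simplified toy. This is a sketch of one reading of the abstract, not the authors' implementation. It assumes an "image" is a flat list of numbers whose first `CONTENT_DIM` entries are content and whose remaining entries are the domain-specific attribute. The names `encode_content`, `encode_attribute`, `generate`, and `cross_cycle_loss` are hypothetical stand-ins for learned networks.

```python
# Toy sketch of cross-cycle consistency (illustrative only, not the DRIT code).
# Assumption: an "image" is a flat list; the first CONTENT_DIM numbers are the
# domain-invariant content, the rest are the domain-specific attribute.

CONTENT_DIM = 2


def encode_content(image):
    """Hypothetical content encoder: keep the content part of the image."""
    return image[:CONTENT_DIM]


def encode_attribute(image):
    """Hypothetical attribute encoder: keep the attribute part of the image."""
    return image[CONTENT_DIM:]


def generate(content, attribute):
    """Hypothetical generator: combine content and attribute into an image."""
    return content + attribute


def l1(a, b):
    """L1 distance between two equally long lists of numbers."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cross_cycle_loss(x, y):
    """Swap attributes between x and y, swap them back, and compare."""
    # First translation: each image keeps its own attribute but receives
    # the content of the other image.
    u = generate(encode_content(y), encode_attribute(x))
    v = generate(encode_content(x), encode_attribute(y))
    # Second translation: swapping again should recover the originals.
    x_rec = generate(encode_content(v), encode_attribute(u))
    y_rec = generate(encode_content(u), encode_attribute(v))
    return l1(x_rec, x) + l1(y_rec, y)


x = [0.1, 0.2, 1.0]   # toy image from domain X (attribute 1.0)
y = [0.7, 0.8, -1.0]  # toy image from domain Y (attribute -1.0)

# Translating x into domain Y: content of x, attribute taken from y.
translated = generate(encode_content(x), encode_attribute(y))
loss = cross_cycle_loss(x, y)
```

With these idealized encoders the two swaps undo each other exactly, so the loss is zero. With learned networks the same quantity would act as a training signal that needs no paired images, since it compares each input only with its own reconstruction.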

Visual question answering · Dataset · Performer · state-of-the-art · MoDELS ·

March 20, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li,Qingyi Tao,Shafiq Joty,Jianfei Cai,Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.

Extensibility · Image captioning · Scene · Better · MoDELS ·

December 21, 2017

Exploring Models and Data for Remote Sensing Image Caption Generation

Xiaoqiang Lu,Binqiang Wang,Xiangtao Zheng,Xuelong Li

from arxiv, 14 pages, 8 figures

Inspired by recent developments in artificial satellites, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at //github.com/2051/RSICD_optimal


