久久久久精品电影_全部免费毛片AV_在线播放高清资源国产成人精品_久久精品国产亚洲_国产一区二区三区无码免费播放_国产一区二区三区深田咏美_18禁毛片永久免费观看

Designers rely on visual search to explore and develop ideas in early design stages. However, designers can struggle to identify suitable text queries to initiate a search or to discover images for similarity-based search that can adequately express their intent. We propose GenQuery, a novel system that integrates generative models into the visual search process. GenQuery can automatically elaborate on users' queries and surface concrete search directions when users only have abstract ideas. To support precise expression of search intents, the system enables users to generatively modify images and use these in similarity-based search. In a comparative user study (N=16), designers felt that they could more accurately express their intents and find more satisfactory outcomes with GenQuery compared to a tool without generative features. Furthermore, the unpredictability of generations allowed participants to uncover more diverse outcomes. By supporting both convergence and divergence, GenQuery led to a more creative experience.

相關內容

生成模型

關注 131

在(zai)機(ji)器學習中，生成(cheng)模型可(ke)以用來直(zhi)接對(dui)數(shu)據(ju)建模（例(li)如根據(ju)某個變量的概(gai)率(lv)密度函數(shu)進行數(shu)據(ju)采樣），也可(ke)以用來建立變量間(jian)的條件概(gai)率(lv)分(fen)布。條件概(gai)率(lv)分(fen)布可(ke)以由生成(cheng)模型根據(ju)貝葉斯定(ding)理形(xing)成(cheng)。

控制器 · 代碼 · 可約的 · Readability · CASES ·

2024 年 4 月 16 日

AniFrame: A Programming Language for 2D Drawing and Frame-Based Animation

Mark Edward M. Gonzales,Hans Oswald A. Ibrahim,Elyssia Barrie H. Ong,Ryan Austin Fernandez

from arxiv, Accepted for paper presentation at the 24th Philippine Computing Science Congress (PCSC 2024), held in Laguna, Philippines

Creative coding is an experimentation-heavy activity that requires translating high-level visual ideas into code. However, most languages and libraries for creative coding may not be adequately intuitive for beginners. In this paper, we present AniFrame, a domain-specific language for drawing and animation. Designed for novice programmers, it (i) features animation-specific data types, operations, and built-in functions to simplify the creation and animation of composite objects, (ii) allows for fine-grained control over animation sequences through explicit specification of the target object and the start and end frames, (iii) reduces the learning curve through a Python-like syntax, type inferencing, and a minimal set of control structures and keywords that map closely to their semantic intent, and (iv) promotes computational expressivity through support for common mathematical operations, built-in trigonometric functions, and user-defined recursion. Our usability test demonstrates AniFrame's potential to enhance readability and writability for multiple creative coding use cases. AniFrame is open-source, and its implementation and reference are available at //github.com/memgonzales/aniframe-language.

知識 (knowledge) · MoDELS · 自動問答 · 基 · 知識庫 ·

2024 年 4 月 16 日

Find The Gap: Knowledge Base Reasoning For Visual Question Answering

Elham J. Barezi,Parisa Kordjamshidi

We analyze knowledge-based visual question answering, for which given a question, the models need to ground it into the visual modality and retrieve the relevant knowledge from a given large knowledge base (KB) to be able to answer. Our analysis has two folds, one based on designing neural architectures and training them from scratch, and another based on large pre-trained language models (LLMs). Our research questions are: 1) Can we effectively augment models by explicit supervised retrieval of the relevant KB information to solve the KB-VQA problem? 2) How do task-specific and LLM-based models perform in the integration of visual and external knowledge, and multi-hop reasoning over both sources of information? 3) Is the implicit knowledge of LLMs sufficient for KB-VQA and to what extent it can replace the explicit KB? Our results demonstrate the positive impact of empowering task-specific and LLM models with supervised external and visual knowledge retrieval models. Our findings show that though LLMs are stronger in 1-hop reasoning, they suffer in 2-hop reasoning in comparison with our fine-tuned NN model even if the relevant information from both modalities is available to the model. Moreover, we observed that LLM models outperform the NN model for KB-related questions which confirms the effectiveness of implicit knowledge in LLMs however, they do not alleviate the need for external KB.

語言模型化 · Continuity · 大語言模型 · Learning · MoDELS ·

2024 年 4 月 9 日

AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning

Senkang Hu,Zhengru Fang,Zihan Fang,Xianhao Chen,Yuguang Fang

Connected and autonomous driving is developing rapidly in recent years. However, current autonomous driving systems, which are primarily based on data-driven approaches, exhibit deficiencies in interpretability, generalization, and continuing learning capabilities. In addition, the single-vehicle autonomous driving systems lack of the ability of collaboration and negotiation with other vehicles, which is crucial for the safety and efficiency of autonomous driving systems. In order to address these issues, we leverage large language models (LLMs) to develop a novel framework, AgentsCoDriver, to enable multiple vehicles to conduct collaborative driving. AgentsCoDriver consists of five modules: observation module, reasoning engine, cognitive memory module, reinforcement reflection module, and communication module. It can accumulate knowledge, lessons, and experiences over time by continuously interacting with the environment, thereby making itself capable of lifelong learning. In addition, by leveraging the communication module, different agents can exchange information and realize negotiation and collaboration in complex traffic environments. Extensive experiments are conducted and show the superiority of AgentsCoDriver.

話題模型 · 話題 · MoDELS · Networking · 圖 ·

2024 年 4 月 2 日

GINopic: Topic Modeling with Graph Isomorphism Network

Suman Adhya,Debarshi Kumar Sanyal

from arxiv, Accepted as a long paper for NAACL 2024 main conference

Topic modeling is a widely used approach for analyzing and exploring large document collections. Recent research efforts have incorporated pre-trained contextualized language models, such as BERT embeddings, into topic modeling. However, they often neglect the intrinsic informational value conveyed by mutual dependencies between words. In this study, we introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words. By conducting intrinsic (quantitative as well as qualitative) and extrinsic evaluations on diverse benchmark datasets, we demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.

圖 · U-Net · Neural Networks · 卷積 · Networking ·

2024 年 4 月 2 日

MAgNET: A Graph U-Net Architecture for Mesh-Based Simulations

Saurabh Deshpande,Stéphane P. A. Bordas,Jakub Lengiewicz

In many cutting-edge applications, high-fidelity computational models prove to be too slow for practical use and are therefore replaced by much faster surrogate models. Recently, deep learning techniques have increasingly been utilized to accelerate such predictions. To enable learning on large-dimensional and complex data, specific neural network architectures have been developed, including convolutional and graph neural networks. In this work, we present a novel encoder-decoder geometric deep learning framework called MAgNET, which extends the well-known convolutional neural networks to accommodate arbitrary graph-structured data. MAgNET consists of innovative Multichannel Aggregation (MAg) layers and graph pooling/unpooling layers, forming a graph U-Net architecture that is analogous to convolutional U-Nets. We demonstrate the predictive capabilities of MAgNET in surrogate modeling for non-linear finite element simulations in the mechanics of solids.

單峰值 · 多峰值 · 標注 · 未標記 · 設計 ·

2024 年 4 月 2 日

MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

Lilin Xu,Chaojie Gu,Rui Tan,Shibo He,Jiming Chen

from arxiv, Accepted to the 21th ACM Conference on Embedded Networked Sensor Systems (SenSys 2023)

Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data available during the HAR model design phase for unimodal HAR enhancement during the deployment phase. From a study on the impact of supervised multimodal fusion on unimodal feature extraction, MESEN is designed to feature a multi-task mechanism during the multimodal-aided pre-training stage. With the proposed mechanism integrating cross-modal feature contrastive learning and multimodal pseudo-classification aligning, MESEN exploits unlabeled multimodal data to extract effective unimodal features for each modality. Subsequently, MESEN can adapt to downstream unimodal HAR with only a few labeled samples. Extensive experiments on eight public multimodal datasets demonstrate that MESEN achieves significant performance improvements over state-of-the-art baselines in enhancing unimodal HAR by exploiting multimodal data.

Automator · TOOLS · 情景 · 可理解性 · 覆蓋 ·

2024 年 4 月 1 日

AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

Safwat Ali Khan,Wenyu Wang,Yiran Ren,Bin Zhu,Jiangfan Shi,Alyssa McGowan,Wing Lam,Kevin Moran

from arxiv, Published at 17th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2024, 12 pages

Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected code coverage - particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is related to so-called tarpits, or complex screens that are difficult to navigate. In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns that exist in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that is able to recognize the most common types of UI screen designs. After recognizing a given screen, it then applies a set of flexible and generalizable heuristics to properly navigate the screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior approaches that avoid tarpits by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.

Pyramid · MoDELS · Extensibility · state-of-the-art · Performer ·

2022 年 12 月 1 日

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Wan-Cyuan Fan,Yen-Chun Chen,Dongdong Chen,Yu Cheng,Lu Yuan,Yu-Chiang Frank Wang

from arxiv, AAAI 2023

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can be also applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.

Pix2Seq · 語言模型化 · 目標檢測 · MoDELS · 離散化 ·

2021 年 9 月 22 日

Pix2seq: A Language Modeling Framework for Object Detection

Ting Chen,Saurabh Saxena,Lala Li,David J. Fleet,Geoffrey Hinton

This paper presents Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.

state-of-the-art · 可理解性 · BERT · 去噪自編碼器 · Performer ·

2019 年 6 月 19 日

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang,Zihang Dai,Yiming Yang,Jaime Carbonell,Ruslan Salakhutdinov,Quoc V. Le

from arxiv, Pretrained models and code are available at //github.com/zihangdai/xlnet

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.