
Image classification · Prompt · Model evaluation · Class · Tokenizer ·

October 9, 2023

TransHP: Image Classification with Hierarchical Prompting

Wenhao Wang,Yifan Sun,Wei Li,Yi Yang

From arXiv; accepted to NeurIPS 2023

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits the descendant-class discrimination. We think it imitates human visual recognition well, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism into a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) predicting the coarse class of the input image on the fly at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification in accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP exploits the hierarchical information well.
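The three steps listed in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean-pooled dot-product scorer, the choice to append the prompt as an extra token, and all function names are hypothetical, since the abstract does not specify the coarse classifier or the exact injection mechanism.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average the token features of one image into a single vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def predict_coarse_class(tokens: List[Vector], prompts: List[Vector]) -> int:
    """Step 2 (assumed scorer): pick the coarse class whose prompt token
    has the highest dot product with the pooled intermediate feature."""
    pooled = mean_pool(tokens)
    scores = [dot(pooled, prompt) for prompt in prompts]
    return max(range(len(scores)), key=scores.__getitem__)


def inject_prompt(
    tokens: List[Vector], prompts: List[Vector]
) -> Tuple[int, List[Vector]]:
    """Step 3 (assumed injection): append the predicted coarse-class prompt
    to the token sequence so later blocks can condition on it."""
    coarse = predict_coarse_class(tokens, prompts)
    return coarse, tokens + [prompts[coarse]]


# Step 1 stand-in: in training these prompt tokens would be learned;
# here they are fixed toy values, one per coarse class.
prompts = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.2, 0.9], [0.1, 0.7]]  # toy intermediate features
coarse, conditioned = inject_prompt(tokens, prompts)
print(coarse, len(conditioned))
```

The model weights are identical for every image in this sketch; only the appended prompt token differs, which is what lets the later blocks adapt to the predicted coarse class.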

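Related content

Image classification

Image classification, as the name suggests, is the problem of taking an image as input and outputting a description that classifies its content. It is a core task in computer vision with a wide range of practical applications.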

Performer · MoDELS · Adobe Flash · Controller · Extensibility ·

November 24, 2023

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

Eslam Mohamed Bakr,Liangbing Zhao,Vincent Tao Hu,Matthieu Cord,Patrick Perez,Mohamed Elhoseiny

Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image. This not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM). Extensive experiments on datasets like LSUN-Churches and COCO validate our approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating three times faster with a 3.76 times smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.

Estimation/Estimator · Robustness · FAST · MoDELS · CAD ·

November 23, 2023

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen,Thibault Groueix,Mathieu Salzmann,Vincent Lepetit

We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates, rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest neighbor search in feature space, resulting in a speedup factor of 38x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at //github.com/nv-nguyen/gigaPose
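The template-matching step described above (nearest neighbor search in feature space) can be illustrated with a brute-force sketch. Everything here is an assumption made for the example: features are plain Python lists, similarity is cosine similarity, and the function names are hypothetical. The abstract does not state which feature extractor, similarity measure, or search structure GigaPose uses.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_template(query: Vector, templates: List[Vector]) -> Tuple[int, float]:
    """Return the index and similarity of the template closest to the query.

    Brute-force scan over all templates; a real system would likely use an
    indexed search, which is outside the scope of this sketch.
    """
    similarities = [cosine_similarity(query, t) for t in templates]
    best = max(range(len(templates)), key=similarities.__getitem__)
    return best, similarities[best]


# Toy features: each template stands for one sampled out-of-plane viewpoint.
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
query = [0.5, 0.9, 0.0]
index, similarity = nearest_template(query, templates)
print(index, round(similarity, 3))
```

In this reading, the matched template supplies the out-of-plane rotation, and the remaining parameters would come from the patch correspondences mentioned in the abstract.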

Representation · University · Extensibility · AI · Taxonomy ·

November 23, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Ruibin Yuan,Yinghao Ma,Yizhi Li,Ge Zhang,Xingran Chen,Hanzhi Yin,Le Zhuo,Yiqi Liu,Jiawen Huang,Zeyue Tian,Binyue Deng,Ningzhi Wang,Chenghua Lin,Emmanouil Benetos,Anton Ragni,Norbert Gyenge,Roger Dannenberg,Wenhu Chen,Gus Xia,Wei Xue,Si Liu,Shi Wang,Ruibo Liu,Yike Guo,Jie Fu

From arXiv; camera-ready version for NeurIPS 2023

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published at //marble-bm.shef.ac.uk to promote future music AI research.

Learning · Wireless Networks · Federated learning · Round · FAST ·

November 22, 2023

FAVANO: Federated AVeraging with Asynchronous NOdes

Louis Leconte,Van Minh Nguyen,Eric Moulines

In this paper, we propose a novel centralized Asynchronous Federated Learning (FL) framework, FAVANO, for training Deep Neural Networks (DNNs) in resource-constrained environments. Despite its popularity, "classical" federated learning faces the increasingly difficult task of scaling synchronous communication over large wireless networks. Moreover, clients typically have different computing resources and therefore different computing speeds, which can lead to a significant bias (in favor of "fast" clients) when the updates are asynchronous. Therefore, practical deployment of FL requires handling users with strongly varying computing speeds in communication- and resource-constrained settings. We provide convergence guarantees for FAVANO in a smooth, non-convex environment and carefully compare the obtained convergence guarantees with existing bounds, when they are available. Experimental results show that the FAVANO algorithm outperforms current methods on standard benchmarks.

INTERACT · Image segmentation · MoDELS · Extensibility · HTTPS ·

November 22, 2023

SegVol: Universal and Interactive Volumetric Medical Image Segmentation

Yuxin Du,Fan Bai,Tiejun Huang,Bo Zhao

Precise image segmentation provides clinical study with meaningful and well-structured information. Despite the remarkable progress achieved in medical image segmentation, there is still no foundation segmentation model that can segment a wide range of anatomical categories with easy user interaction. In this paper, we propose a universal and interactive volumetric medical image segmentation model, named SegVol. By training on 90k unlabeled Computed Tomography (CT) volumes and 6k labeled CTs, this foundation model supports the segmentation of over 200 anatomical categories using semantic and spatial prompts. Extensive experiments verify that SegVol outperforms the state of the art by a large margin on multiple segmentation benchmarks. Notably, on three challenging lesion datasets, our method achieves around 20% higher Dice score than nnU-Net. The model and data are publicly available at: //github.com/BAAI-DCAI/SegVol.
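For readers unfamiliar with the metric, the Dice score quoted above measures the overlap between a predicted mask A and a reference mask B as 2|A∩B| / (|A| + |B|). The sketch below computes it on flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from SegVol.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

A volumetric mask can be passed by flattening it first; the score ranges from 0.0 (no overlap) to 1.0 (identical masks).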

Knowledge · Language modeling · MoDELS · Latent · INTERACT ·

November 21, 2023

Latent Lab: Large Language Models for Knowledge Exploration

Kevin Dunnell,Trudy Painter,Andrew Stoddard,Andy Lippman

This paper investigates the potential of AI models, particularly large language models (LLMs), to support knowledge exploration and augment human creativity during ideation. We present "Latent Lab," an interactive tool for discovering connections among MIT Media Lab research projects, emphasizing "exploration" over search. The work offers insights into collaborative AI systems by addressing the challenges of organizing, searching, and synthesizing content. In a user study, the tool's success was evaluated based on its ability to introduce users to an unfamiliar knowledge base, ultimately setting the groundwork for the ongoing advancement of human-AI knowledge exploration systems.

Question answering · Attention mechanism · Reducible · MoDELS · Pooling ·

May 10, 2021

Poolingformer: Long Document Modeling with Pooling Attention

Hang Zhang,Yeyun Gong,Yelong Shen,Weisheng Li,Jiancheng Lv,Nan Duan,Weizhu Chen

From arXiv; accepted by ICML 2021

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
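The two-level scheme described above can be sketched in plain Python: a first pass attends within a small sliding window, and a second pass attends over a larger window whose keys and values are pooled to reduce their count. This is an illustrative sketch only. Mean pooling, the reuse of level-one outputs as level-two inputs, the absence of learned projections, and all names and parameters are assumptions, not the Poolingformer implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query."""
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    )
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def window(seq: List[Vector], center: int, radius: int) -> List[Vector]:
    """Neighbors within `radius` positions of `center`, clipped at the edges."""
    return seq[max(0, center - radius): center + radius + 1]


def mean_pool(seq: List[Vector], size: int) -> List[Vector]:
    """Average consecutive groups of `size` vectors (last group may be shorter)."""
    pooled = []
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        pooled.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return pooled


def two_level_attention(
    x: List[Vector], small_radius: int, large_radius: int, pool_size: int
) -> List[Vector]:
    # Level 1: each position attends to its close neighbors.
    level1 = []
    for i in range(len(x)):
        neighbors = window(x, i, small_radius)
        level1.append(attend(x[i], neighbors, neighbors))
    # Level 2: a wider window, compressed by pooling before attention.
    output = []
    for i in range(len(level1)):
        pooled = mean_pool(window(level1, i, large_radius), pool_size)
        output.append(attend(level1[i], pooled, pooled))
    return output
```

With `large_radius=3` and `pool_size=2`, a position in the middle of the sequence sees a 7-token window but attends to only 4 pooled entries, which is where the cost and memory savings in the second level come from.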

Graph · Knowledge graph · Link prediction · Extensibility · entity ·

October 6, 2020

CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

Tara Safavi,Danai Koutra

From arXiv; EMNLP 2020

We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.

Discriminator · Semantic similarity · state-of-the-art · Similarity · MoDELS ·

September 15, 2019

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Wataru Hirota,Yoshihiko Suhara,Behzad Golshan,Wang-Chiew Tan

We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.

entity · Link prediction · Extensibility · Graph · Knowledge graph ·

March 13, 2019

MMKG: Multi-Modal Knowledge Graphs

Ye Liu,Hui Li,Alberto Garcia-Duran,Mathias Niepert,Daniel Onoro-Rubio,David S. Rosenblum

From arXiv; ESWC 2019

We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs. We validate the utility of MMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.
