
Intelligibility · ACM Multimedia · Attention · Pivotal (company) · MoDELS ·

September 27, 2023

Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks

Payal Mohapatra,Akash Pandey,Yueyuan Sui,Qi Zhu

from arxiv, Accepted to appear at ACM Multimedia 2023 Multimedia Grand Challenges Track

Human emotion understanding is pivotal in making conversational technology mainstream. We view speech emotion understanding as a perception task, which is a more realistic setting. Across varying contexts (languages, demographics, etc.), different shares of people perceive the same speech segment as conveying a given emotion, so the perceived emotion is not unanimous. As part of the ACM Multimedia 2023 Computational Paralinguistics ChallengE (ComParE) in the EMotion Share track, we leverage the challenge's rich dataset of multilingual speakers and its multi-label regression target of 'emotion share', i.e., the perception of each emotion. We demonstrate that the training scheme of different foundation models dictates their effectiveness for tasks beyond speech recognition, especially for non-semantic speech tasks like emotion understanding. The task is particularly complex because of multilingual speakers, variability in the target labels, and the inherent imbalance of the regression dataset. Our results show that HuBERT-Large with a self-attention-based light-weight sequence model provides a 4.6% improvement over the reported baseline.
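The abstract pairs frame-level speech embeddings with a light-weight attention model and a multi-label regression target. The sketch below illustrates that general shape under stated assumptions: it uses attention pooling (a simpler relative of self-attention, not the paper's exact architecture) to collapse frame embeddings into one utterance vector, then a hypothetical linear head with a sigmoid per emotion. The names `attention_pool`, `emotion_share_head`, the query vector and the sigmoid output are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[Vector], query: Vector) -> Vector:
    """Collapse T frame embeddings (each D-dim) into one D-dim utterance vector.

    Each frame is scored by a scaled dot product with a query vector (a
    hypothetical learned parameter), the scores are softmax-normalised over
    time, and the frames are averaged with those weights.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in frames]
    weights = softmax(scores)
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(query))]


def emotion_share_head(pooled: Vector, W: List[Vector], b: Vector) -> Vector:
    """One linear output per emotion, squashed into [0, 1] with a sigmoid
    (assumption: shares are treated as independent proportions)."""
    outputs = []
    for row, b_k in zip(W, b):
        z = sum(w_d * x_d for w_d, x_d in zip(row, pooled)) + b_k
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


# Toy usage: 3 frames of 2-dim "embeddings" (real HuBERT frames are far larger).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(frames, query=[0.0, 0.0])  # zero query gives uniform weights
shares = emotion_share_head(pooled, W=[[0.0, 0.0]] * 3, b=[0.0] * 3)
```

With a zero query every frame gets the same weight, so the pooled vector is simply the frame mean; a trained query would instead emphasise the frames most indicative of emotion.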

Related content

Intelligibility


GPT-4V · Knowledge · Visual Question Answering · Automatic Question Answering · MoDELS ·

November 13, 2023

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

Yunxin Li,Longyue Wang,Baotian Hu,Xinyu Chen,Wanqi Zhong,Chenyang Lyu,Min Zhang

from arxiv, 18 pages, 13 pages; work in progress

The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction with a vast repository of learned knowledge. To uncover such capabilities of MLMs, particularly the newly introduced GPT-4V, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect them to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images, showcasing its proficiency across various specialized fields; 3) Comprehensive Knowledge with Decision-making Rationales, which examines the model's capability to provide logical explanations for its inference, facilitating a deeper analysis from the interpretability perspective. Extensive experiments indicate that GPT-4V achieves SOTA performance on the above three tasks. Interestingly, we find that: a) GPT-4V demonstrates enhanced reasoning and explanation when using composite images as few-shot examples; b) GPT-4V produces severe hallucinations when dealing with world knowledge, highlighting the need for future advancements in this research direction.

Subspace · MoDELS · Representation · Learning · Part-of-Speech Tagging ·

November 12, 2023

Parts of Speech-Grounded Subspaces in Vision-Language Models

James Oldfield,Christos Tzelepis,Yannis Panagakis,Mihalis A. Nicolaou,Ioannis Patras

from arxiv, Accepted at NeurIPS 2023

Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks. However, their utility is limited by their entanglement with respect to different visual attributes. For instance, recent work has shown that CLIP image representations are often biased toward specific visual properties (such as objects or actions) in an unpredictable manner. In this paper, we propose to separate representations of the different visual modalities in CLIP's joint vision-language space by leveraging the association between parts of speech and specific visual modes of variation (e.g. nouns relate to objects, adjectives describe appearance). This is achieved by formulating an appropriate component analysis model that learns subspaces capturing variability corresponding to a specific part of speech, while jointly minimising variability to the rest. Such a subspace yields disentangled representations of the different visual properties of an image or text in closed form while respecting the underlying geometry of the manifold on which the representations lie. What's more, we show the proposed model additionally facilitates learning subspaces corresponding to specific visual appearances (e.g. artists' painting styles), which enables the selective removal of entire visual themes from CLIP-based text-to-image synthesis. We validate the model both qualitatively, by visualising the subspace projections with a text-to-image model and by preventing the imitation of artists' styles, and quantitatively, through class invariance metrics and improvements to baseline zero-shot classification.
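The abstract describes learning a subspace that captures variability for one part of speech while minimising variability for the rest, obtained in closed form. A minimal sketch of that idea, assuming a contrastive-PCA-style objective (leading eigenvector of `C_target - lam * C_rest`), is shown below. This is an illustrative stand-in: the paper's model additionally respects the geometry of the representation manifold, which this sketch ignores, and `top_direction` and `lam` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def covariance(X: Matrix) -> Matrix:
    """Population covariance of the rows of X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / n
    return C


def top_direction(target: Matrix, rest: Matrix, lam: float = 1.0,
                  iters: int = 500) -> List[float]:
    """Leading eigenvector of C_target - lam * C_rest (unit norm).

    The matrix may be indefinite, so it is shifted by a Gershgorin bound to
    make all eigenvalues non-negative. The shift leaves eigenvectors unchanged,
    so power iteration then finds the largest (not largest-magnitude) eigenvalue.
    """
    Ct, Cr = covariance(target), covariance(rest)
    d = len(Ct)
    M = [[Ct[i][j] - lam * Cr[i][j] for j in range(d)] for i in range(d)]
    shift = max(sum(abs(x) for x in row) for row in M)
    for i in range(d):
        M[i][i] += shift
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-degenerate start
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data: "noun" embeddings vary along axis 0, all other embeddings along axis 1.
nouns = [[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
others = [[0.0, -1.0], [0.0, 1.0]]
direction = top_direction(nouns, others)
```

Projecting an embedding onto such directions keeps the variation tied to the chosen part of speech and suppresses variation shared with the rest; `lam` trades off the two terms.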

QoE · FAST · Automator · Reducible · Pruning ·

November 12, 2023

VidPlat: A Tool for Fast Crowdsourcing of Quality-of-Experience Measurements

Xu Zhang,Hanchen Li,Paul Schmitt,Marshini Chetty,Nick Feamster,Junchen Jiang

For video or web services, it is crucial to measure user-perceived quality of experience (QoE) at scale under various video quality or page loading delays. However, fast QoE measurements remain challenging as they must elicit subjective assessment from human users. Previous work either (1) automates QoE measurements by letting crowdsourcing raters watch and rate QoE test videos or (2) dynamically prunes redundant QoE tests based on previously collected QoE measurements. Unfortunately, it is hard to combine both ideas because traditional crowdsourcing requires QoE test videos to be pre-determined before a crowdsourcing campaign begins. Thus, if researchers want to dynamically prune redundant test videos based on other test videos' QoE, they are forced to launch multiple crowdsourcing campaigns, causing extra overheads to re-calibrate or train raters every time. This paper presents VidPlat, the first open-source tool for fast and automated QoE measurements, by allowing dynamic pruning of QoE test videos within a single crowdsourcing task. VidPlat creates an indirect shim layer between researchers and the crowdsourcing platforms. It allows researchers to define a logic that dynamically determines which new test videos need more QoE ratings based on the latest QoE measurements, and it then redirects crowdsourcing raters to watch QoE test videos dynamically selected by this logic. Other than having fewer crowdsourcing campaigns, VidPlat also reduces the total number of QoE ratings by dynamically deciding when enough ratings are gathered for each test video. It is an open-source platform that future researchers can reuse and customize. We have used VidPlat in three projects (web loading, on-demand video, and online gaming). We show that VidPlat can reduce crowdsourcing cost by 31.8% - 46.0% and latency by 50.9% - 68.8%.
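The abstract says VidPlat lets researchers define logic that decides which test videos need more ratings and when enough ratings have been gathered. The sketch below shows one possible piece of such logic, under the assumption that a video is "done" once the approximate 95% confidence interval of its mean opinion score is narrow enough. This is not VidPlat's API; `enough_ratings`, `videos_needing_ratings` and their thresholds are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def enough_ratings(ratings: List[float], max_half_width: float = 0.25,
                   z: float = 1.96, min_ratings: int = 5) -> bool:
    """True once the approximate 95% confidence interval of the mean opinion
    score is narrower than +/- max_half_width (hypothetical stopping rule)."""
    if len(ratings) < min_ratings:
        return False
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return half_width <= max_half_width


def videos_needing_ratings(collected: Dict[str, List[float]]) -> List[str]:
    """Videos whose scores are still too uncertain; a redirect layer could
    send the next crowdsourcing rater to one of these."""
    return sorted(v for v, r in collected.items() if not enough_ratings(r))


collected = {
    "a": [4, 4, 4, 4, 4],   # consistent ratings: stop collecting
    "b": [1, 5, 1, 5, 1],   # high disagreement: keep collecting
    "c": [3, 3],            # too few ratings so far
}
pending = videos_needing_ratings(collected)
```

Because the decision is re-evaluated after every new rating, raters are spent only on videos whose scores remain uncertain, which is the kind of saving the abstract reports.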

Pairwise · Model Evaluation · Sample · Score · Reducible ·

November 10, 2023

Evaluation of Sampling Algorithms for a Pairwise Subjective Assessment Methodology

Shima Mohammadi,Joao Ascenso

from arxiv, 5 pages, 4 Figures

Subjective assessment tests are often employed to evaluate image processing systems, notably image and video compression and super-resolution, among others, and have been used as an indisputable way to provide evidence of the performance of an algorithm or system. While several methodologies can be used in a subjective quality assessment test, pairwise comparison tests are nowadays attracting a lot of attention due to their accuracy and simplicity. However, the number of comparisons in a pairwise comparison test increases quadratically with the number of stimuli and thus often leads to very long tests, which is impractical in many cases. Yet not all pairs contribute equally to the final score, so it is possible to reduce the number of comparisons without degrading the final accuracy. To do so, pairwise sampling methods are often used to select the pairs that provide more information about the quality of each stimulus. In this paper, a reliable and much-needed evaluation procedure is proposed and applied to methods already available in the literature, especially considering the case of subjective evaluation of image and video codecs. The results indicate that an appropriate selection of pairs makes it possible to achieve very reliable scores while requiring the comparison of a much lower number of pairs.
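With n stimuli, a full pairwise design needs n(n-1)/2 comparisons, e.g. 45 for n = 10 and 190 for n = 20. The sketch below contrasts that with one simple sampling idea: compare each stimulus only with its nearest neighbours in a current ranking, assuming that pairs of very different quality carry little information. This neighbour scheme is an illustration only and is not one of the specific methods evaluated in the paper; `neighbour_design` and `k` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def full_design(stimuli: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair: n * (n - 1) / 2 comparisons."""
    return list(combinations(stimuli, 2))


def neighbour_design(scores: Dict[str, float], k: int = 2) -> List[Tuple[str, str]]:
    """Compare each stimulus only with the k stimuli ranked just above it.

    Hypothetical sampler: pairs far apart in the current ranking are assumed
    to have a near-certain outcome, so they are skipped.
    """
    ranked = sorted(scores, key=scores.get)
    pairs = []
    for i, s in enumerate(ranked):
        for j in range(i + 1, min(i + 1 + k, len(ranked))):
            pairs.append((s, ranked[j]))
    return pairs


stimuli = list("abcdefghij")                        # 10 stimuli
estimates = {s: float(i) for i, s in enumerate(stimuli)}
n_full = len(full_design(stimuli))
n_sampled = len(neighbour_design(estimates, k=2))
```

Here the full design needs 45 comparisons while the neighbour design needs 17, and the gap widens quadratically as n grows.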

MINE · Learning to Hash · Descriptor · Image Retrieval · GROUP ·

November 10, 2023

Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval

Xin Lu,Shikun Chen,Yichao Cao,Xin Zhou,Xiaobo Lu

In recent years, hashing methods have been popular in large-scale media search owing to their low storage cost and strong representation capabilities. To describe objects with similar overall appearance but subtle differences, more and more studies focus on hashing-based fine-grained image retrieval. Existing hashing networks usually generate both local and global features through attention guidance on the same deep activation tensor, which limits the diversity of feature representations. To handle this limitation, we substitute convolutional descriptors for attention-guided features and propose an Attributes Grouping and Mining Hashing (AGMH), which groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation for efficient fine-grained image retrieval. Specifically, an Attention Dispersion Loss (ADL) is designed to force the descriptors to attend to various local regions and capture diverse subtle details. Moreover, we propose a Stepwise Interactive External Attention (SIEA) to mine critical attributes in each descriptor and construct correlations between fine-grained attributes and objects. The attention mechanism is dedicated to learning discrete attributes, which will not cost additional computation in hash code generation. Finally, the compact binary codes are learned by preserving pairwise similarities. Experimental results demonstrate that AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
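AGMH learns its binary codes end to end with its own losses; the sketch below shows only what compact binary codes buy at retrieval time, assuming sign binarisation of a real-valued embedding and ranking by Hamming distance. The feature vectors are made-up stand-ins for learned descriptors, and `to_code`, `hamming` and `retrieve` are hypothetical helpers.

```python
from typing import List, Tuple


def to_code(features: List[float]) -> int:
    """Binarise an embedding with sign(): bit i is 1 if feature i > 0."""
    code = 0
    for i, x in enumerate(features):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def retrieve(query: List[float], database: List[Tuple[str, List[float]]],
             top_k: int = 2) -> List[str]:
    """Rank database items by Hamming distance between binary codes."""
    q = to_code(query)
    ranked = sorted(database, key=lambda item: hamming(q, to_code(item[1])))
    return [name for name, _ in ranked[:top_k]]


database = [
    ("sparrow", [0.9, -0.2, 0.4, -0.7]),
    ("finch", [0.8, -0.1, 0.3, 0.6]),
    ("car", [-0.5, 0.9, -0.4, 0.2]),
]
results = retrieve([0.7, -0.3, 0.5, -0.6], database)
```

Each item is stored as a few bits, and a comparison is an XOR plus a popcount, which is why hashing scales to large media collections.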

Performer · CUDA · AMD · Intel · NVIDIA ·

November 10, 2023

Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs

Manuel Costanzo,Enzo Rucci,Carlos García Sánchez,Marcelo Naiouf,Manuel Prieto-Matías

from arxiv, This article was accepted for publication in 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the portability and performance of the SYCL and CUDA languages for one fundamental bioinformatics application (Smith-Waterman protein database search) across different GPU architectures, considering single and multi-GPU configurations from different vendors. The experimental work showed that, while both CUDA and SYCL versions achieve similar performance on NVIDIA devices, the latter demonstrated remarkable code portability to other GPU architectures, such as AMD and Intel. Furthermore, the architectural efficiency rates achieved on these devices were superior in 3 of the 4 cases tested. This brief study highlights the potential of SYCL as a viable solution for achieving both performance and portability in the heterogeneous computing ecosystem.
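For readers unfamiliar with the benchmarked application, here is a minimal CPU sketch of Smith-Waterman local alignment scoring and a database search that ranks sequences by score. It assumes a simple match/mismatch scheme and a linear gap penalty; real protein search typically uses substitution matrices (such as BLOSUM62) and affine gaps, and the CUDA and SYCL codes in the study are far more elaborate. The parameter values are illustrative.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local-alignment score with a linear gap penalty.

    Keeps only the previous row of the dynamic-programming matrix, so memory
    is O(len(b)); the max with 0 is what makes the alignment local.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Score every database sequence independently and rank by score.

    The alignments are independent of each other, which makes the search a
    natural fit for massively parallel GPUs.
    """
    scores = [(name, smith_waterman(query, seq)) for name, seq in database]
    return sorted(scores, key=lambda item: item[1], reverse=True)


hits = search("ACGT", [("p1", "TTTT"), ("p2", "ACGA"), ("p3", "ACGT")])
```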

Controller · MoDELS · SimPLe · SOFT · Robot ·

November 9, 2023

Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid Robots

Zach J. Patterson,Cosimo Della Santina,Daniela Rus

from arxiv, 7 pages, 8 figures

While much work has been done recently in the realm of model-based control of soft robots and soft-rigid hybrids, most works examine robots that have an inherently serial structure. While these systems have been prevalent in the literature, there is an increasing trend toward designing soft-rigid hybrids with intrinsically coupled elasticity between various degrees of freedom. In this work, we seek to address the issues of modeling and controlling such structures, particularly when underactuated. We introduce several simple models for elastic coupling, typical of those seen in these systems. We then propose a controller that compensates for the elasticity, and we prove its stability with Lyapunov methods without relying on the elastic dominance assumption. This controller is applicable to the general class of underactuated soft robots. After evaluating the controller in simulated cases, we then develop a simple hardware platform to evaluate both the models and the controller. Finally, using the hardware, we demonstrate a novel use case for underactuated, elastically coupled systems in "sensorless" force control.
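The core idea of a controller that compensates for elasticity can be seen on a toy model: a single, fully actuated elastic joint m*q'' = tau - k*q. This is a deliberate simplification, not the paper's underactuated, elastically coupled controller. With tau = k*q_des + kp*(q_des - q) - kd*q_dot, the closed loop is m*q'' = (k + kp)*(q_des - q) - kd*q_dot, and the Lyapunov function V = 0.5*m*q_dot^2 + 0.5*(k + kp)*(q - q_des)^2 has derivative -kd*q_dot^2 <= 0. The gains and `tracking_error` are hypothetical.

```python
def tracking_error(q_des: float, compensate: bool = True, m: float = 1.0,
                   k: float = 5.0, kp: float = 10.0, kd: float = 4.0,
                   dt: float = 1e-3, t_end: float = 10.0) -> float:
    """Simulate m*q'' = tau - k*q from rest and return |q - q_des| at t_end.

    tau = PD feedback, plus the elasticity feed-forward k*q_des if compensate.
    """
    q, q_dot = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        tau = kp * (q_des - q) - kd * q_dot
        if compensate:
            tau += k * q_des          # cancels the spring torque at the goal
        q_ddot = (tau - k * q) / m
        q_dot += q_ddot * dt          # semi-implicit Euler: velocity first,
        q += q_dot * dt               # then position with the new velocity
    return abs(q - q_des)


with_comp = tracking_error(1.0, compensate=True)
without_comp = tracking_error(1.0, compensate=False)
```

Without the feed-forward term the spring pulls the joint to q = kp/(k + kp) * q_des, a steady error of k/(k + kp) = 1/3 here; compensating for the elasticity removes it.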

Unity · Partition · Graph · Performer · Signal Processing ·

November 7, 2023

Node-Bound Communities for Partition of Unity Interpolation on Graphs

Roberto Cavoretto,Alessandra De Rossi,Sandro Lancellotti,Federico Romaniello

from arxiv, 13 pages, 4 figures. arXiv admin note: text overlap with arXiv:2311.04299

Graph signal processing benefits significantly from the direct and highly adaptable supplementary techniques offered by partition of unity methods (PUMs) on graphs. In our approach, we demonstrate the generation of a partition of unity solely based on the underlying graph structure, employing an algorithm that relies exclusively on centrality measures and modularity, without requiring the number of subdomains as input. Subsequently, we integrate PUMs with a local graph basis function (GBF) approximation method to develop cost-effective global interpolation schemes. We also discuss numerical experiments conducted on both synthetic and real datasets to assess the performance of the presented technique.
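A partition of unity method blends local approximants s_j into a global one, f(v) = sum_j w_j(v) * s_j(v), where the weights are non-negative and sum to 1 at every node. The sketch below assumes the subdomains are already given (the paper derives them from centrality and modularity), uses equal-share weights, and replaces the GBF fit with a hypothetical local approximant: the mean of known values inside each subdomain.

```python
from typing import Dict, List, Set


def pu_weights(node: int, subdomains: List[Set[int]]) -> List[float]:
    """Equal-share weights: a node covered by c subdomains gets 1/c in each
    of them and 0 elsewhere, so the weights sum to 1 (node must be covered)."""
    cover = [node in s for s in subdomains]
    c = sum(cover)
    return [1.0 / c if inside else 0.0 for inside in cover]


def pu_interpolate(nodes: List[int], subdomains: List[Set[int]],
                   known: Dict[int, float]) -> Dict[int, float]:
    """Blend local approximants into a global interpolant.

    Local approximant: mean of the known values in the subdomain (a stand-in
    for a GBF fit); every subdomain must contain at least one known node.
    """
    local = []
    for s in subdomains:
        vals = [known[v] for v in s if v in known]
        local.append(sum(vals) / len(vals))
    return {v: sum(w * s_j for w, s_j in zip(pu_weights(v, subdomains), local))
            for v in nodes}


subdomains = [{0, 1, 2}, {2, 3, 4}]            # overlapping at node 2
known = {0: 1.0, 1: 1.0, 3: 3.0, 4: 3.0}
estimate = pu_interpolate([0, 1, 2, 3, 4], subdomains, known)
```

Node 2 lies in both subdomains, so its value is the average of the two local approximants; nodes covered once take their subdomain's value directly.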

Vision · Model Evaluation · Reducible · Computer Vision · DNN ·

March 24, 2020

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Abhinav Goel,Caleb Tung,Yung-Hsiang Lu,George K. Thiruvathukal

from arxiv, Accepted for publication at 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA 2020

Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation- and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
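The first category above, parameter quantization and pruning, can be sketched in a few lines. The example assumes unstructured magnitude pruning and asymmetric (affine) 8-bit quantization of a flat weight list; these are common textbook variants chosen for illustration, not necessarily the specific methods the survey covers, and the helper names are hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Affine 8-bit quantization: w is approximated by scale * (q - zero_point)."""
    lo, hi = min(weights + [0.0]), max(weights + [0.0])  # keep 0 representable
    scale = (hi - lo) / 255 or 1.0                       # avoid 0 for all-zero input
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    return [scale * (x - zero_point) for x in q]


w = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
sparse = magnitude_prune(w, sparsity=0.5)
q, scale, zp = quantize_uint8(w)
restored = dequantize(q, scale, zp)
```

Pruning reduces the number of non-zero parameters, while quantization stores each remaining weight in one byte instead of four, at the cost of a rounding error of at most half a quantization step.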

Task-Oriented Dialogue System · Learned · Minimum · Deep Learning · Vision ·

January 11, 2018

A Survey on Dialogue Systems: Recent Advances and New Frontiers

Hongshen Chen,Xiaorui Liu,Dawei Yin,Jiliang Tang

from arxiv, 13 pages. arXiv admin note: text overlap with arXiv:1703.01008 by other authors

Dialogue systems have attracted more and more attention. Recent advances in dialogue systems have been driven overwhelmingly by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview of these recent advances in dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms, and finally discuss some appealing research directions that can bring dialogue system research into a new frontier.
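Task-oriented dialogue systems are commonly built as a pipeline of language understanding, dialogue state tracking, a dialogue policy, and language generation. The sketch below illustrates only the state-tracking and policy steps for a hypothetical restaurant-booking domain, with hand-written rules standing in for the learned models that deep learning approaches would use. `REQUIRED_SLOTS`, `update_state` and `policy` are illustrative assumptions, and the slot dictionaries play the role of the understanding component's output.

```python
from typing import Dict

REQUIRED_SLOTS = ("cuisine", "area", "time")   # hypothetical booking domain


def update_state(state: Dict[str, str], nlu_slots: Dict[str, str]) -> Dict[str, str]:
    """Dialogue state tracking: merge the slots recognised in the latest turn."""
    new_state = dict(state)
    new_state.update(nlu_slots)
    return new_state


def policy(state: Dict[str, str]) -> str:
    """Rule-based policy: request the first missing slot, otherwise book."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return f"request({slot})"
    return "book()"


state = update_state({}, {"cuisine": "thai"})
first_action = policy(state)
state = update_state(state, {"area": "north", "time": "19:00"})
second_action = policy(state)
```

A non-task-oriented (chit-chat) system has no such slots or goal state; it maps the conversation history directly to a response.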


北京阿比特科技有限公司

Registered address: 301-191, Floor 3, Building 2, No. 18 Yangfangdian Road, Haidian District, Beijing
