亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

<tfoot id='f8ry9'></tfoot>

<legend id='f8ry9'><style id='f8ry9'><dir id='f8ry9'><q id='f8ry9'></q></dir></style></legend>

<i id='f8ry9'><tr id='f8ry9'><dt id='f8ry9'><q id='f8ry9'><span id='f8ry9'><b id='f8ry9'><form id='f8ry9'><ins id='f8ry9'></ins><ul id='f8ry9'></ul><sub id='f8ry9'></sub></form><legend id='f8ry9'></legend><bdo id='f8ry9'><pre id='f8ry9'><center id='f8ry9'></center></pre></bdo></b><th id='f8ry9'></th></span></q></dt></tr></i><div id='f8ry9'><tfoot id='f8ry9'></tfoot><dl id='f8ry9'><fieldset id='f8ry9'></fieldset></dl></div>

<li id='f8ry9'><abbr id='f8ry9'></abbr></li>

·

GROUP · Networking · Learning · 推斷 · 路徑 ·

2024 年 5 月 28 日

Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

Muhammad Adi Nugroho,Sangmin Woo,Sumin Lee,Jinyoung Park,Yooseung Wang,Donguk Kim,Changick Kim

Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction among actors and their activity. Flaming-Net leverages an additional optical flow modality in the training stage to enhance its motion awareness when finding locally active actors. The first pathway of the relation module, the actor-centric path, initially captures the temporal dynamics of individual actors and then constructs inter-actor relationships. In parallel, the group-centric path starts by building spatial connections between actors within the same timeframe and then captures simultaneous spatio-temporal dynamics among them. We demonstrate that Flaming-Net achieves new state-of-the-art WSGAR results on two benchmarks, including a 2.8%p higher MPCA score on the NBA dataset. Importantly, we use the optical flow modality only for training and not for inference.

相關內容

GROUP

Group一直是研究計算機支持的合作工作、人機交互、計算機支持的協作學習和社會技術研究的主要場所。該會議將社會科學、計算機科學、工程、設計、價值觀以及其他與小組工作相關的多個不同主題的工作結合起來，并進行了廣泛的概念化。官網鏈接： · prototype · Networking · 語言模型化 · Learning ·

2024 年 7 月 7 日

Prototypical Reward Network for Data-Efficient RLHF

Jinghan Zhang,Xiting Wang,Yiqiao Jin,Changyu Chen,Xinhao Zhang,Kunpeng Liu

from arxiv, Accepted by ACL 2024

The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling stable and reliable structural learning from fewer samples, Proto-RM significantly enhances LLMs' adaptability and accuracy in interpreting human preferences. Extensive experiments on various datasets demonstrate that Proto-RM significantly improves the performance of reward models and LLMs in human feedback tasks, achieving comparable and usually better results than traditional methods, while requiring significantly less data. in data-limited scenarios. This research offers a promising direction for enhancing the efficiency of reward models and optimizing the fine-tuning of language models under restricted feedback conditions.

MoDELS · state-of-the-art · Attention · 模態 · 循環網絡 ·

2024 年 7 月 6 日

A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

R. Gnana Praveen,Wheidima Carneiro de Melo,Nasib Ullah,Haseeb Aslam,Osama Zeeshan,Théo Denorme,Marco Pedersoli,Alessandro Koerich,Simon Bacon,Patrick Cardinal,Eric Granger

from arxiv, arXiv admin note: text overlap with arXiv:2111.05222

Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos. Specifically, we propose a joint cross-attention model that relies on the complementary relationships to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. The proposed fusion model efficiently leverages the inter-modal relationships, while reducing the heterogeneity between the features. In particular, it computes the cross-attention weights based on correlation between the combined feature representation and individual modalities. By deploying the combined A-V feature representation into the cross-attention module, the performance of our fusion module improves significantly over the vanilla cross-attention module. Experimental results on validation-set videos from the AffWild2 dataset indicate that our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches. The code is available on GitHub: //github.com/praveena2j/JointCrossAttentional-AV-Fusion.

層 · MoDELS · 變換 · INFORMS · Performer ·

2024 年 7 月 6 日

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition

Shreya G. Upadhyay,Carlos Busso,Chi-Chun Lee

Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architecture of these models, each transformer layer encapsulates different levels of information. Leveraging this hierarchical structure, our study focuses on the information embedded across different layers. Through an examination of layer feature similarity across different languages, we propose a novel strategy called a layer-anchoring mechanism to facilitate emotion transfer in cross-lingual SER tasks. Our approach is evaluated using two distinct language affective corpora (MSP-Podcast and BIIC-Podcast), achieving a best UAR performance of 60.21% on the BIIC-podcast corpus. The analysis uncovers interesting insights into the behavior of popular pretrained models.

語言模型化 · MoDELS · 大語言模型 · CRAFT · 黑盒 ·

2024 年 7 月 5 日

Crafting Large Language Models for Enhanced Interpretability

Chung-En Sun,Tuomas Oikarinen,Tsui-Wei Weng

from arxiv, Present at ICML 2024 Mechanistic Interpretability (MI) Workshop

We introduce the Concept Bottleneck Large Language Model (CB-LLM), a pioneering approach to creating inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on post-hoc interpretation methods with limited neuron function insights, CB-LLM sets a new standard with its built-in interpretability, scalability, and ability to provide clear, accurate explanations. This innovation not only advances transparency in language models but also enhances their effectiveness. Our unique Automatic Concept Correction (ACC) strategy successfully narrows the performance gap with conventional black-box LLMs, positioning CB-LLM as a model that combines the high accuracy of traditional LLMs with the added benefit of clear interpretability -- a feature markedly absent in existing LLMs.

等變 · GROUP · 卷積 · 輸出 · MoDELS ·

2024 年 7 月 5 日

Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts

Hyunsu Kim,Yegon Kim,Hongseok Yang,Juho Lee

from arxiv, ICML2024

Group Equivariant CNNs (G-CNNs) have shown promising efficacy in various tasks, owing to their ability to capture hierarchical features in an equivariant manner. However, their equivariance is fixed to the symmetry of the whole group, limiting adaptability to diverse partial symmetries in real-world datasets, such as limited rotation symmetry of handwritten digit images and limited color-shift symmetry of flower images. Recent efforts address this limitation, one example being Partial G-CNN which restricts the output group space of convolution layers to break full equivariance. However, such an approach still fails to adjust equivariance levels across data. In this paper, we propose a novel approach, Variational Partial G-CNN (VP G-CNN), to capture varying levels of partial equivariance specific to each data instance. VP G-CNN redesigns the distribution of the output group elements to be conditioned on input data, leveraging variational inference to avoid overfitting. This enables the model to adjust its equivariance levels according to the needs of individual data points. Additionally, we address training instability inherent in discrete group equivariance models by redesigning the reparametrizable distribution. We demonstrate the effectiveness of VP G-CNN on both toy and real-world datasets, including MNIST67-180, CIFAR10, ColorMNIST, and Flowers102. Our results show robust performance, even in uncertainty metrics.

代價函數 · 泛化理論 · 泛函 · 代價 · Performer ·

2024 年 7 月 4 日

Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation

Hang Li,Qian Feng,Zhi Zheng,Jianxiang Feng,Alois Knoll

from arxiv, 11 pages, 8 figures

Learning from demonstrations faces challenges in generalizing beyond the training data and is fragile even to slight visual variations. To tackle this problem, we introduce Lan-o3dp, a language guided object centric diffusion policy that takes 3d representation of task relevant objects as conditional input and can be guided by cost function for safety constraints at inference time. Lan-o3dp enables strong generalization in various aspects, such as background changes, visual ambiguity and can avoid novel obstacles that are unseen during the demonstration process. Specifically, We first train a diffusion policy conditioned on point clouds of target objects and then harness a large language model to decompose the user instruction into task related units consisting of target objects and obstacles, which can be used as visual observation for the policy network or converted to a cost function, guiding the generation of trajectory towards collision free region at test time. Our proposed method shows training efficiency and higher success rates compared with the baselines in simulation experiments. In real world experiments, our method exhibits strong generalization performance towards unseen instances, cluttered scenes, scenes of multiple similar objects and demonstrates training free capability of obstacle avoidance.

模式崩潰 · 3D · Guidance · 峰值 · MoDELS ·

2024 年 7 月 4 日

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang,Xuanyu Yi,Zike Wu,Na Zhao,Long Chen,Hanwang Zhang

from arxiv, accepted to ECCV 2024

The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Further code and video results are released at //yuxuanw.me/vcedit/.

Performer · 學成 · Boosting（一種模型訓練加速方式） · MoDELS · 可辨認的 ·

2021 年 12 月 22 日

Hybrid Curriculum Learning for Emotion Recognition in Conversation

Lin Yang,Yi Shen,Yue Mao,Longjun Cai

from arxiv, Accepted by AAAI-2022

Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.

entity · 小樣本學習 · 注意力機制 · 圖 · Networking ·

2020 年 10 月 19 日

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Jiawei Sheng,Shu Guo,Zhenyu Chen,Juwei Yue,Lihong Wang,Tingwen Liu,Hongbo Xu

from arxiv, 11 pages, 3 figures

Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.

圖 · Networking · 學成 · Performer · 深度學習 ·

2020 年 10 月 9 日

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Emanuele Rossi,Ben Chamberlain,Fabrizio Frasca,Davide Eynard,Federico Monti,Michael Bronstein

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.

閱讀: 0 點贊: 0

小貼士

登錄享

相關主題

北京阿比特科技有限公司

注冊地址：北京市海淀區羊坊店路18號2幢3層301-191

<li id='f8ry9'></li>

_{^{<dd id='f8ry9'><tbody id='f8ry9'><td id='f8ry9'><optgroup id='f8ry9'><strong id='f8ry9'></strong></optgroup><address id='f8ry9'><ul id='f8ry9'></ul></address><big id='f8ry9'></big></td><table id='f8ry9'></table></tbody><pre id='f8ry9'></pre></dd><span id='f8ry9'><b id='f8ry9'></b></span>}}


<dfn id='f8ry9'><optgroup id='f8ry9'></optgroup></dfn><tfoot id='f8ry9'><bdo id='f8ry9'><div id='f8ry9'></div><i id='f8ry9'><dt id='f8ry9'></dt></i></bdo></tfoot>

_{<fieldset id='f8ry9'></fieldset>}