This study introduces LRDif, a novel diffusion-based framework designed specifically for facial expression recognition (FER) with under-display cameras (UDC). To address the image degradation inherent to UDC, such as reduced sharpness and increased noise, LRDif employs a two-stage training strategy that integrates a condensed preliminary extraction network (FPEN) and an agile transformer network (UDCformer) to identify emotion labels from UDC images. By harnessing the robust distribution mapping capabilities of Diffusion Models (DMs) and the spatial dependency modeling strength of transformers, LRDif effectively overcomes the noise and distortion inherent in UDC environments. In comprehensive experiments on standard FER datasets, including RAF-DB, KDEF, and FERPlus, LRDif achieves state-of-the-art performance, underscoring its potential for advancing FER applications. This work not only addresses a significant gap in the literature by tackling the UDC challenge in FER but also sets a new benchmark for future research in the field.
This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, which fuses the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
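The abstract does not give implementation details, but the cross-modality fusion it describes can be illustrated with a minimal, hypothetical PyTorch sketch: two multihead attention blocks let each modality attend to the other, and the pooled joint feature feeds one output branch per cognitive load label. All module names, dimensions, and the pooling choice are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative cross-modality multihead attention: video features attend
    to audio features and vice versa, then the two streams are concatenated
    and routed to one branch per task. Dimensions are assumptions."""

    def __init__(self, dim=256, heads=4, num_tasks=3):
        super().__init__()
        self.v_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_to_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        # One branch per cognitive-load label, as described in the abstract.
        self.task_heads = nn.ModuleList([nn.Linear(2 * dim, 1) for _ in range(num_tasks)])

    def forward(self, video_feats, audio_feats):
        # video_feats: (B, Tv, dim), audio_feats: (B, Ta, dim)
        v, _ = self.v_to_a(video_feats, audio_feats, audio_feats)  # video queries audio
        a, _ = self.a_to_v(audio_feats, video_feats, video_feats)  # audio queries video
        fused = torch.cat([v.mean(dim=1), a.mean(dim=1)], dim=-1)  # pooled joint feature
        return [head(fused) for head in self.task_heads]           # one output per task

# Example usage with random features
model = CrossModalFusion()
outs = model(torch.randn(2, 16, 256), torch.randn(2, 50, 256))
print([o.shape for o in outs])
```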
Sentence embeddings are crucial for measuring semantic similarity. Most recent studies have employed large language models (LLMs) to learn sentence embeddings, yet existing LLMs mainly adopt an autoregressive architecture without explicit backward dependency modeling. We therefore examine the effects of backward dependencies in LLMs for semantic similarity measurements. Concretely, we propose a novel model, the backward dependency enhanced large language model (BeLLM), which learns sentence embeddings by transforming specific attention layers from uni- to bi-directional. We experiment extensively across various semantic textual similarity (STS) tasks and downstream applications, and BeLLM achieves state-of-the-art performance in varying scenarios, showing that autoregressive LLMs benefit from backward dependencies for sentence embeddings.
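The central change BeLLM describes, converting selected autoregressive attention layers to bi-directional ones, amounts to dropping the causal mask for those layers. The toy sketch below illustrates this idea on a generic transformer stack; the actual layers BeLLM modifies and its backbone are not specified here, so everything beyond the mask toggle is an assumption.

```python
import torch
import torch.nn as nn

def attention_mask(seq_len, bidirectional):
    """Causal mask for autoregressive layers; no mask when bidirectional."""
    if bidirectional:
        return None
    # True marks positions that may NOT be attended to (the upper triangle).
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

class ToyEncoder(nn.Module):
    """Illustrative stack in which only the last layer runs bi-directionally,
    mimicking the idea of converting specific attention layers."""

    def __init__(self, dim=64, heads=4, layers=4, bidir_last_k=1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True) for _ in range(layers)]
        )
        self.bidir_last_k = bidir_last_k

    def forward(self, x):
        n = len(self.layers)
        for i, layer in enumerate(self.layers):
            bidir = i >= n - self.bidir_last_k
            x = layer(x, src_mask=attention_mask(x.size(1), bidir))
        # Mean-pool token states into a sentence embedding.
        return x.mean(dim=1)

emb = ToyEncoder()(torch.randn(2, 10, 64))
print(emb.shape)  # torch.Size([2, 64])
```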
This paper introduces 3DFIRES, a novel system for scene-level 3D reconstruction from posed images. Designed to work with as few as one view, 3DFIRES reconstructs the complete geometry of unseen scenes, including hidden surfaces. With multiple input views, our method produces a full reconstruction within all camera frustums. A key feature of our approach is the fusion of multi-view information at the feature level, enabling coherent and comprehensive 3D reconstructions. We train our system on non-watertight scans from a large-scale real-scene dataset. We show it matches the efficacy of single-view reconstruction methods with only one input and surpasses existing techniques in both quantitative and qualitative measures for sparse-view 3D reconstruction.
Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving. Existing pre-training methods either rely on expensive supervised learning with 3D annotations, limiting scalability, or focus on single-frame or monocular inputs, neglecting temporal information. We propose MIM4D, a novel pre-training paradigm based on dual masked image modeling (MIM). MIM4D leverages both spatial and temporal relations by training on masked multi-view video inputs. It constructs pseudo-3D features using continuous scene flow and projects them onto the 2D plane for supervision. To address the lack of dense 3D supervision, MIM4D reconstructs pixels through 3D volumetric differentiable rendering to learn geometric representations. We demonstrate that MIM4D achieves state-of-the-art performance on the nuScenes dataset for visual representation learning in autonomous driving. It significantly improves over existing methods on multiple downstream tasks, including BEV segmentation (8.7% IoU), 3D object detection (3.5% mAP), and HD map construction (1.4% mAP). Our work offers a new choice for learning representations at scale in autonomous driving. Code and models are released at //github.com/hustvl/MIM4D
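As a rough illustration of the masked-input side of this paradigm (the scene flow, 2D projection, and volumetric rendering components are omitted), the sketch below masks random patches of multi-view video frames and trains a tiny network to reconstruct the masked pixels. Shapes, the mask ratio, and the toy encoder are assumptions, not MIM4D's architecture.

```python
import torch
import torch.nn as nn

def random_patch_mask(images, patch=16, mask_ratio=0.5):
    """Randomly mask square patches of multi-view video frames.
    images: (B, T, V, C, H, W) = batch, time, view, channels, height, width."""
    b, t, v, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    keep = torch.rand(b, t, v, gh, gw) > mask_ratio          # True = keep patch
    mask = keep.repeat_interleave(patch, -2).repeat_interleave(patch, -1)
    return images * mask.unsqueeze(3), mask

class TinyMIM(nn.Module):
    """Illustrative encoder/decoder that reconstructs masked pixels."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Conv2d(3, 32, 3, padding=1)
        self.decode = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, frames):                 # frames: (N, 3, H, W)
        return self.decode(torch.relu(self.encode(frames)))

images = torch.rand(2, 2, 3, 3, 64, 64)        # 2 clips x 2 frames x 3 views
masked, mask = random_patch_mask(images)
recon = TinyMIM()(masked.flatten(0, 2))        # fold (B, T, V) into the batch dim
target = images.flatten(0, 2)
# Mean squared error on the masked pixels only.
hidden = ~mask.unsqueeze(3).expand_as(images).flatten(0, 2)
loss = ((recon - target) ** 2)[hidden].mean()
print(loss.item())
```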
Instruction tuning involves fine-tuning a language model on a collection of instruction-formatted datasets to enhance the model's generalizability to unseen tasks. Studies have shown the importance of balancing different task proportions during fine-tuning, but finding the right balance remains challenging, and there is currently no systematic method beyond manual tuning or practitioners' intuition. In this paper, we introduce SMART (Submodular data Mixture strAtegy for instRuction Tuning), a novel data mixture strategy that uses a submodular function to assign importance scores to tasks, which are then used to determine the mixture weights. Given a fine-tuning budget, SMART redistributes the budget among tasks and selects non-redundant samples from each task. Experimental results demonstrate that SMART significantly outperforms traditional methods such as examples-proportional mixing and equal mixing. Furthermore, SMART facilitates the creation of data mixtures based on only a few representative subsets of tasks, and our task-pruning analysis reveals that, in a limited-budget setting, allocating the budget among a subset of representative tasks yields superior performance compared to distributing it among all tasks. The code for reproducing our results is open-sourced at //github.com/kowndinya-renduchintala/SMART.
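To make the submodular scoring concrete, the hypothetical sketch below uses a facility-location function, one common submodular choice, maximized greedily over tasks; the resulting marginal gains act as importance scores that are normalized into mixture weights under a fixed budget. The similarity matrix, weights, and normalization are illustrative assumptions rather than SMART's exact formulation.

```python
import numpy as np

def facility_location_gain(selected, candidate, sim):
    """Marginal gain of adding `candidate` under the facility-location
    function f(S) = sum_i max_{j in S} sim[i, j] (a submodular function)."""
    if not selected:
        return sim[:, candidate].sum()
    best = sim[:, selected].max(axis=1)
    return np.maximum(best, sim[:, candidate]).sum() - best.sum()

def greedy_task_scores(sim, k):
    """Greedily pick k tasks; the marginal gains serve as importance scores."""
    selected, scores = [], {}
    for _ in range(k):
        gains = {t: facility_location_gain(selected, t, sim)
                 for t in range(sim.shape[0]) if t not in selected}
        best = max(gains, key=gains.get)
        selected.append(best)
        scores[best] = gains[best]
    return scores

# Example: 6 tasks, pairwise similarities from hypothetical task embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 32))
sim = emb @ emb.T
sim = (sim - sim.min()) / (sim.max() - sim.min())    # nonnegative similarities
scores = greedy_task_scores(sim, k=4)
budget = 10_000                                      # total fine-tuning samples
total = sum(scores.values())
mixture = {t: round(budget * s / total) for t, s in scores.items()}
print(mixture)                                       # samples drawn per selected task
```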
This paper presents a novel method for generating multigrid solvers optimized for octree-based software frameworks. Our approach focuses on accurately capturing local features within a domain while leveraging the efficiency inherent in multigrid techniques. We outline the essential steps involved in generating specialized kernels for local refinement and communication routines, integrating on-the-fly interpolations to seamlessly transfer information between refinement levels. To this end, we establish a software coupling that automatically fuses the generated multigrid solvers and communication kernels with manual implementations of the complex octree data structures and algorithms often found in established software frameworks. We demonstrate the effectiveness of our method through numerical experiments with different interpolation orders. Large-scale benchmarks conducted on the SuperMUC-NG CPU cluster underscore the advantages of our approach, offering a comparison against a reference implementation to highlight the benefits of our method and of code generation in general.
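The paper's generated, octree-specific kernels are beyond the scope of this abstract, but the underlying multigrid ingredients it mentions (smoothing, residual restriction, a coarse correction, and interpolation back to the fine level) can be illustrated with a plain two-grid cycle for a 1D Poisson problem. The discretization, smoother, and transfer operators below are generic textbook choices, not the paper's implementation.

```python
import numpy as np

def smooth(u, f, h, iters=3):
    """Weighted-Jacobi smoothing for the 1D Poisson problem -u'' = f."""
    for _ in range(iters):
        u = u + 0.6 * np.r_[0.0, 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1], 0.0]
    return u

def coarse_solve(rc, hc):
    """Direct solve of the coarse-grid Poisson system (zero boundary values)."""
    m = len(rc) - 2
    A = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / (hc * hc)
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])
    return ec

def two_grid_cycle(u, f, h):
    """Smooth, restrict the residual, solve on the coarse grid,
    interpolate the correction back, and smooth again."""
    u = smooth(u, f, h)
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / (h * h)   # residual
    ec = coarse_solve(r[::2], 2 * h)                               # restriction + solve
    e = np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)  # interpolation
    return smooth(u + e, f, h)

n, h = 65, 1.0 / 64
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(5):
    u = two_grid_cycle(u, f, h)
print(np.abs(u - np.sin(np.pi * x)).max())   # close to the O(h^2) discretisation error
```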
This paper introduces RobotCycle, a novel ongoing project that leverages Autonomous Vehicle (AV) research to investigate how cycling infrastructure influences cyclist behaviour and safety during real-world journeys. The project's requirements were defined in collaboration with key stakeholders (i.e. city planners, cyclists, and policymakers), informing the design of risk and safety metrics and the data collection criteria. We propose a data-driven approach relying on a novel, rich dataset of diverse traffic scenes captured through a custom-designed wearable sensing unit. We extract road-user trajectories and analyse deviations suggesting risk or potentially hazardous interactions in correlation with infrastructural elements in the environment. Driving profiles and trajectory patterns are associated with local road segments, driving conditions, and road-user interactions to predict traffic behaviour and identify critical scenarios. Moreover, leveraging advancements in AV research, the project extracts detailed 3D maps, traffic flow patterns, and trajectory models to provide an in-depth assessment and analysis of the behaviour of all traffic agents. This data can then inform the design of cyclist-friendly road infrastructure, improving road safety and cyclability, as it provides valuable insights for enhancing cyclist protection and promoting sustainable urban mobility.
We present DiffuScene, a method for indoor 3D scene synthesis based on a novel scene configuration denoising diffusion model. It generates 3D instance properties stored in an unordered object set and retrieves the most similar geometry for each object configuration, which is characterized as a concatenation of attributes including location, size, orientation, semantics, and geometry features. We introduce a diffusion network that synthesizes a collection of 3D indoor objects by denoising a set of unordered object attributes. This unordered parametrization simplifies the approximation of the joint distribution, and the shape feature diffusion facilitates natural object placements, including symmetries. Our method enables many downstream applications, including scene completion, scene arrangement, and text-conditioned scene synthesis. Experiments on the 3D-FRONT dataset show that our method can synthesize more physically plausible and diverse indoor scenes than state-of-the-art methods. Extensive ablation studies verify the effectiveness of our design choices in scene diffusion models.
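A minimal sketch of the core idea, denoising an unordered set of object attribute vectors with a permutation-equivariant network, might look like the following; the attribute dimensionality, noise schedule, and transformer denoiser are illustrative assumptions rather than DiffuScene's actual model.

```python
import torch
import torch.nn as nn

class SetDenoiser(nn.Module):
    """Permutation-equivariant denoiser over an unordered set of object
    attribute vectors (location, size, orientation, semantics, shape code).
    No positional encodings, so object order does not matter."""

    def __init__(self, attr_dim=32, dim=128):
        super().__init__()
        self.inp = nn.Linear(attr_dim + 1, dim)       # +1 for the timestep
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=2)
        self.out = nn.Linear(dim, attr_dim)

    def forward(self, x, t):
        # x: (B, N, attr_dim) noisy object set, t: (B,) diffusion step in [0, 1]
        t = t[:, None, None].expand(x.size(0), x.size(1), 1)
        return self.out(self.blocks(self.inp(torch.cat([x, t], dim=-1))))

# One DDPM-style training step (schedule and dimensions are illustrative).
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

model = SetDenoiser()
x0 = torch.randn(4, 12, 32)                      # 4 scenes, 12 objects each
t = torch.randint(0, 1000, (4,))
noise = torch.randn_like(x0)
a = alphas_bar[t][:, None, None]
xt = a.sqrt() * x0 + (1 - a).sqrt() * noise      # forward diffusion of the object set
pred = model(xt, t.float() / 1000)
loss = ((pred - noise) ** 2).mean()              # predict the added noise
loss.backward()
```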
We present the Thought Graph, a novel framework for supporting complex reasoning, and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% in cosine similarity to human annotations. Our analysis further provides insights into future directions for biological process naming, with implications for bioinformatics and precision medicine.
We study the problem of learning to reason in large-scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in the KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking-based algorithm and knowledge graph embedding methods on the Freebase and Never-Ending Language Learning datasets.
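A hedged sketch of such a reward, combining an accuracy term, an efficiency term that favors short paths, and a diversity term that penalizes similarity to already-found paths, is shown below. The specific weights and functional forms are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def path_reward(reached_target, path_embeddings, found_path_embeddings,
                w_acc=1.0, w_eff=0.1, w_div=0.1):
    """Combine accuracy, efficiency, and diversity terms for one rollout.

    reached_target: whether the relation path connects source to target.
    path_embeddings: relation embeddings along the current path.
    found_path_embeddings: mean embeddings of previously discovered paths.
    Weights and forms are illustrative, not the paper's exact values."""
    r_acc = 1.0 if reached_target else -1.0
    r_eff = 1.0 / max(len(path_embeddings), 1)           # shorter paths score higher
    p = np.mean(path_embeddings, axis=0)
    if found_path_embeddings:
        sims = [np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-8)
                for q in found_path_embeddings]
        r_div = -float(np.mean(sims))                    # penalize similar paths
    else:
        r_div = 0.0
    return w_acc * r_acc + w_eff * r_eff + w_div * r_div

# Example with random relation embeddings
rng = np.random.default_rng(1)
path = [rng.normal(size=16) for _ in range(3)]
previous = [rng.normal(size=16)]
print(path_reward(True, path, previous))
```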