
Holographic displays hold the promise of providing authentic depth cues, resulting in enhanced immersive visual experiences for near-eye applications. However, current holographic displays are hindered by speckle noise, which limits accurate reproduction of color and texture in displayed images. We present HoloChrome, a polychromatic holographic display framework designed to mitigate these limitations. HoloChrome utilizes an ultrafast, wavelength-adjustable laser and a dual-Spatial Light Modulator (SLM) architecture, enabling the multiplexing of a large set of discrete wavelengths across the visible spectrum. By leveraging spatial separation in our dual-SLM setup, we independently manipulate speckle patterns across multiple wavelengths. This novel approach effectively reduces speckle noise through incoherent averaging achieved by wavelength multiplexing. Our method is complementary to existing speckle reduction techniques, offering a new pathway to address this challenge. Furthermore, the use of polychromatic illumination broadens the achievable color gamut compared to traditional three-color primary holographic displays. Our simulations and tabletop experiments validate that HoloChrome significantly reduces speckle noise and expands the color gamut. These advancements enhance the performance of holographic near-eye displays, moving us closer to practical, immersive next-generation visual experiences.
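
The speckle-reduction mechanism above rests on a standard statistical fact: averaging N mutually incoherent speckle intensity patterns lowers speckle contrast by roughly 1/sqrt(N). As a minimal NumPy sketch (not the HoloChrome pipeline), the snippet below reproduces that scaling with synthetic fully developed speckle; the field size and the wavelength counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def speckle_intensity(shape, rng):
    """Fully developed speckle: intensity of a circular complex Gaussian field."""
    field = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return np.abs(field) ** 2

def speckle_contrast(intensity):
    """Speckle contrast C = std(I) / mean(I); C = 1 for a single pattern."""
    return intensity.std() / intensity.mean()

shape = (512, 512)  # illustrative sensor resolution (assumption)
for n_wavelengths in (1, 4, 16, 64):
    # Incoherent averaging: intensities (not fields) add across wavelengths.
    avg = np.mean([speckle_intensity(shape, rng) for _ in range(n_wavelengths)], axis=0)
    print(f"N = {n_wavelengths:3d}  contrast ~ {speckle_contrast(avg):.3f} "
          f"(expected ~{1 / np.sqrt(n_wavelengths):.3f})")
```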

Related content

We propose VLR-Bench, a visual question answering (VQA) benchmark for evaluating vision-language models (VLMs) in a retrieval-augmented generation (RAG) setting. Unlike existing evaluation datasets for external-knowledge-based VQA, VLR-Bench provides five input passages per query. This makes it possible to test whether a model can determine which passage is useful for answering a given query, a capability lacking in previous research. In this context, we constructed VLR-IF, a dataset of 32,000 automatically generated instruction-following examples designed to strengthen the RAG capabilities of VLMs by teaching them to generate appropriate answers from input passages. We evaluated the validity of the proposed benchmark and training data and verified their performance using the Llava-Llama-3 model, a state-of-the-art Llama3-based VLM. The VLR-Bench and VLR-IF datasets are publicly available online.
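
Because each benchmark item pairs a query with five candidate passages, a natural first metric is whether a model picks the annotated useful passage. The sketch below shows such a scoring loop; the field names (`query`, `passages`, `gold_passage_idx`) and the `select_passage` callable are hypothetical stand-ins for illustration, not the released dataset schema.

```python
from typing import Callable, Dict, List

def passage_selection_accuracy(
    examples: List[Dict],
    select_passage: Callable[[str, List[str]], int],
) -> float:
    """Fraction of examples where the model picks the annotated useful passage."""
    correct = 0
    for ex in examples:
        pred_idx = select_passage(ex["query"], ex["passages"])  # index in 0..4
        correct += int(pred_idx == ex["gold_passage_idx"])
    return correct / max(len(examples), 1)

# Toy usage with a trivial selector that always picks the first passage.
examples = [
    {"query": "Which river flows through the pictured city?",
     "passages": ["p0", "p1", "p2", "p3", "p4"],
     "gold_passage_idx": 2},
]
print(passage_selection_accuracy(examples, lambda q, ps: 0))  # 0.0
```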

Motivated by industrial computed tomography, we propose a memory-efficient strategy for estimating the regularization hyperparameter of a non-smooth variational model. The approach combines the FISTA and Condat-Vu algorithms, exploiting the convergence rate of the former and the low per-iteration complexity of the latter. The estimation is cast as a bilevel learning problem, and a first-order method is obtained via reduced-memory automatic differentiation to compute the derivatives. The method is validated on experimental industrial tomographic data, and the numerical implementation is available.
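
FISTA is the accelerated proximal-gradient building block referenced above. As a minimal, self-contained illustration (not the paper's bilevel or Condat-Vu machinery), the sketch below applies FISTA to an l1-regularized least-squares problem, where the proximal step is soft thresholding; the problem size and the value of `lam` are arbitrary choices for the toy example.

```python
import numpy as np

def fista_lasso(A, b, lam, n_iter=200):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        z = y - grad / L
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of (lam/L)*||.||_1
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + (t - 1) / t_new * (x_new - x)                  # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=50)
print(np.round(fista_lasso(A, b, lam=0.1)[:8], 2))  # approximately recovers the sparse truth
```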

Synthesizing novel views from in-the-wild monocular videos is challenging due to scene dynamics and the lack of multi-view cues. To address this, we propose SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality reconstruction and fast rendering from monocular videos. At its core is a novel Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian trajectories using cubic Hermite splines with a small number of control points. For MAS, we introduce a Motion-Adaptive Control points Pruning (MACP) method to model the deformation of each dynamic 3D Gaussian across varying motions, progressively pruning control points while maintaining dynamic modeling integrity. Additionally, we present a joint optimization strategy for camera parameter estimation and 3D Gaussian attributes, leveraging photometric and geometric consistency. This eliminates the need for Structure-from-Motion preprocessing and enhances SplineGS's robustness in real-world conditions. Experiments show that SplineGS significantly outperforms state-of-the-art methods in novel view synthesis quality for dynamic scenes from monocular videos, while achieving rendering speeds thousands of times faster.
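
The Motion-Adaptive Spline represents each Gaussian's trajectory with cubic Hermite segments over a small set of control points. The sketch below evaluates a standard cubic Hermite spline with finite-difference tangents; the control-point layout, the tangent rule, and the array names are illustrative assumptions rather than the paper's exact parameterization.

```python
import numpy as np

def hermite_trajectory(times, ctrl_t, ctrl_p):
    """Evaluate a cubic Hermite spline through control points (ctrl_t, ctrl_p).

    ctrl_t: (K,) increasing knot times; ctrl_p: (K, 3) 3D control positions.
    Tangents use central finite differences (a Catmull-Rom-like choice).
    """
    m = np.gradient(ctrl_p, ctrl_t, axis=0)                 # per-knot tangents dp/dt
    idx = np.clip(np.searchsorted(ctrl_t, times) - 1, 0, len(ctrl_t) - 2)
    t0, t1 = ctrl_t[idx], ctrl_t[idx + 1]
    h = (t1 - t0)[:, None]
    s = ((times - t0) / (t1 - t0))[:, None]                 # normalized segment time
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return (h00 * ctrl_p[idx] + h10 * h * m[idx]
            + h01 * ctrl_p[idx + 1] + h11 * h * m[idx + 1])

ctrl_t = np.linspace(0.0, 1.0, 5)                           # toy knot times
ctrl_p = np.random.default_rng(0).normal(size=(5, 3))       # toy control points
print(hermite_trajectory(np.array([0.0, 0.37, 1.0]), ctrl_t, ctrl_p).shape)  # (3, 3)
```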

The growing interest in autonomous driving calls for realistic simulation platforms capable of accurately simulating the cooperative perception process in realistic traffic scenarios. Existing studies of cooperative perception often do not account for the transmission latency and errors of real-world environments. To address this gap, we introduce EI-Drive, an edge-AI-based autonomous driving simulation platform that integrates advanced cooperative perception with more realistic communication models. Built on the CARLA framework, EI-Drive adds new modules for cooperative perception that take transmission latency and errors into account, providing a more realistic platform for evaluating cooperative perception algorithms. In particular, the platform enables vehicles to fuse data from multiple sources, improving situational awareness and safety in complex environments. With its modular design, EI-Drive allows for detailed exploration of sensing, perception, planning, and control in various cooperative driving scenarios. Experiments using EI-Drive demonstrate significant improvements in vehicle safety and performance, particularly in scenarios with complex traffic flow and network conditions. All code and documents are accessible on our GitHub page: \url{//ucd-dare.github.io/eidrive.github.io/}.
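
A key point above is that cooperative perception messages should suffer latency and transmission errors rather than arriving instantly and losslessly. The sketch below shows one minimal way to model that on a stream of detection messages; the `DetectionMsg`/`LossyChannel` names and the delay, drop, and noise parameters are assumptions for illustration, not EI-Drive's communication model.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DetectionMsg:
    timestamp: float          # seconds when the sender produced the detections
    boxes: List[List[float]]  # [x, y, w, h] boxes in the sender's frame

@dataclass
class LossyChannel:
    latency_s: float = 0.1    # fixed transmission delay
    drop_prob: float = 0.05   # probability a message is lost
    pos_noise: float = 0.2    # std of additive position error (meters)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def transmit(self, msg: DetectionMsg) -> Optional[DetectionMsg]:
        """Return the message as the receiver would see it, or None if dropped."""
        if self.rng.random() < self.drop_prob:
            return None
        noisy = [[x + self.rng.gauss(0, self.pos_noise),
                  y + self.rng.gauss(0, self.pos_noise), w, h]
                 for x, y, w, h in msg.boxes]
        return DetectionMsg(timestamp=msg.timestamp + self.latency_s, boxes=noisy)

channel = LossyChannel()
print(channel.transmit(DetectionMsg(0.0, [[10.0, 2.0, 4.0, 1.8]])))
```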

To address the challenges of robust data transmission over complex time-varying channels, this paper introduces a channel learning and enhanced adaptive reconstruction (CLEAR) strategy for semantic communications. CLEAR integrates deep joint source-channel coding (DeepJSCC) with an adaptive diffusion denoising model (ADDM) in a unified framework. It leverages a trainable encoder-decoder architecture to encode data into complex semantic codes, which are then transmitted and reconstructed with minimal distortion. By compensating for multipath effects, frequency-selective fading, phase noise, and Doppler shifts, CLEAR achieves high semantic fidelity and reliable transmission across diverse signal-to-noise ratios (SNRs) and channel conditions. Extensive experiments demonstrate that CLEAR achieves a 2.3 dB gain in peak signal-to-noise ratio (PSNR) over the existing state-of-the-art method, DeepJSCC-V. Furthermore, the results verify that CLEAR is robust to varying channel conditions, particularly in scenarios characterized by high Doppler shifts and strong phase noise.
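
The channel impairments listed above (Doppler shift, phase noise, AWGN) can be mimicked on complex baseband symbols in a few lines of NumPy, which is how DeepJSCC-style pipelines are commonly stress-tested in simulation. The sketch below is a generic impairment model under assumed parameter values (sample rate, Doppler, phase-noise standard deviation); it is not CLEAR's ADDM or its trained codec.

```python
import numpy as np

def impaired_channel(symbols, snr_db, doppler_hz, phase_noise_std, fs=1e6, rng=None):
    """Apply a Doppler rotation, random-walk phase noise, and AWGN to complex symbols."""
    rng = rng or np.random.default_rng(0)
    n = np.arange(len(symbols))
    doppler = np.exp(1j * 2 * np.pi * doppler_hz * n / fs)        # carrier frequency offset
    phase = np.exp(1j * np.cumsum(rng.normal(0, phase_noise_std, len(symbols))))
    signal_power = np.mean(np.abs(symbols) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (rng.normal(size=len(symbols))
                                        + 1j * rng.normal(size=len(symbols)))
    return symbols * doppler * phase + noise

# Toy usage: unit-power QPSK symbols through a 10 dB channel with Doppler and phase noise.
qpsk = (1 - 2 * np.random.default_rng(1).integers(0, 2, (1000, 2))) @ np.array([1, 1j]) / np.sqrt(2)
rx = impaired_channel(qpsk, snr_db=10, doppler_hz=200, phase_noise_std=0.01)
print(rx[:3])
```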

Recent advances in deep learning have led to remarkable achievements across many tasks in computer vision (CV) and natural language processing (NLP). At the intersection of CV and NLP lies image captioning, where the robustness of models against adversarial attacks has not been well studied. This paper presents a novel adversarial attack strategy, AICAttack (Attention-based Image Captioning Attack), designed to attack image captioning models through subtle perturbations on images. Operating in a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information. We introduce an attention-based candidate selection mechanism that identifies the optimal pixels to attack, followed by a customised differential evolution method to optimise the perturbations of the pixels' RGB values. We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets against multiple victim models. The results show that our method outperforms current leading-edge techniques by achieving consistently higher attack success rates.
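
The optimization layer described above is a differential evolution loop over a small set of pixel RGB perturbations, driven only by black-box fitness queries. The sketch below shows a generic DE/rand/1 loop of that shape; the fitness function, bounds, and population settings are placeholders (in the actual attack the fitness would query the captioning model), not the AICAttack configuration.

```python
import numpy as np

def differential_evolution(fitness, bounds, pop_size=20, n_gen=50, F=0.5, CR=0.9, seed=0):
    """Minimize `fitness` over a box; each candidate is a flat perturbation vector."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.shape[0]
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    scores = np.array([fitness(p) for p in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)              # DE/rand/1 mutation
            cross = rng.random(dim) < CR
            trial = np.where(cross, mutant, pop[i])                # binomial crossover
            s = fitness(trial)
            if s < scores[i]:                                      # greedy selection
                pop[i], scores[i] = trial, s
    best = int(np.argmin(scores))
    return pop[best], scores[best]

# Toy fitness: in the attack, this would be a black-box query to the captioning model.
dim = 5 * 3                                   # e.g. 5 attacked pixels x RGB offsets
lo, hi = -10 * np.ones(dim), 10 * np.ones(dim)
best, score = differential_evolution(lambda v: np.sum(v ** 2), (lo, hi))
print(score)
```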

As one of the most successful families of generative models, diffusion models have demonstrated remarkable efficacy in synthesizing high-quality images. These models learn the underlying high-dimensional data distribution in an unsupervised manner. Despite their success, diffusion models are highly data-driven and prone to inheriting the imbalances and biases present in real-world data. Some studies have attempted to address these issues by designing text prompts for known biases or by using bias labels to construct unbiased data. While these methods have shown improved results, real-world scenarios often contain various unknown biases, and obtaining bias labels is particularly challenging. In this paper, we emphasize the necessity of mitigating bias in pre-trained diffusion models without relying on auxiliary bias annotations. To tackle this problem, we propose InvDiff, a framework that aims to learn invariant semantic information for diffusion guidance. Specifically, we identify underlying biases in the training data and design a novel debiasing training objective. We then employ a lightweight trainable module that automatically preserves invariant semantic information and uses it to steer the diffusion model's sampling process toward unbiased outcomes. Notably, we only need to learn a small number of parameters in this lightweight module, without altering the pre-trained diffusion model. Furthermore, we provide a theoretical guarantee that InvDiff is equivalent to reducing an upper bound on the generalization error. Extensive experimental results on three publicly available benchmarks demonstrate that InvDiff effectively reduces bias while maintaining image generation quality. Our code is available at //github.com/Hundredl/InvDiff.
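
One common way to steer sampling without touching a pre-trained denoiser, as the paragraph above describes, is to add the output of a small trainable module to the frozen model's noise prediction at each step. The PyTorch sketch below shows that wiring in the abstract; the module architecture, the `guided_eps` helper, and the injection point are assumptions for illustration, not InvDiff's published design.

```python
import torch
import torch.nn as nn

class LightweightGuidance(nn.Module):
    """Small trainable head that nudges a frozen denoiser's noise prediction."""
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor) -> torch.Tensor:
        return self.net(x_t)

def guided_eps(frozen_denoiser, guidance, x_t, t, scale=1.0):
    """Frozen prediction plus a learned correction; only `guidance` carries gradients."""
    with torch.no_grad():
        eps = frozen_denoiser(x_t, t)
    return eps + scale * guidance(x_t)

# Toy usage with a stand-in "denoiser" that returns zeros of the right shape.
frozen = lambda x, t: torch.zeros_like(x)
guidance = LightweightGuidance(channels=3)
x_t = torch.randn(2, 3, 32, 32)
print(guided_eps(frozen, guidance, x_t, t=torch.tensor([10, 10])).shape)
```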

Abstract representations of 3D scenes play a crucial role in computer vision, enabling a wide range of applications such as mapping, localization, surface reconstruction, and advanced tasks like SLAM and rendering. Among these representations, line segments are widely used because they succinctly capture the structural features of a scene. However, existing 3D line reconstruction methods face significant challenges: methods relying on 2D projections suffer from instability caused by errors in multi-view matching and occlusions, while direct 3D approaches are hampered by noise and sparsity in 3D point cloud data. This paper introduces LineGS, a novel method that combines geometry-guided 3D line reconstruction with a 3D Gaussian splatting model to address these challenges and improve representation quality. The method leverages the high-density Gaussian point distributions along scene edges to refine and optimize initial line segments generated by traditional geometric approaches. By aligning these segments with the underlying geometric features of the scene, LineGS achieves a more precise and reliable representation of 3D structures. The results show significant improvements in both geometric accuracy and model compactness compared to baseline methods.
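
The refinement step described above amounts to pulling coarse line-segment endpoints toward nearby dense Gaussian centers along scene edges. The sketch below shows a brute-force nearest-neighbor version of that idea in NumPy; the search radius, the centroid snapping rule, and the array shapes are illustrative assumptions, not the paper's optimization.

```python
import numpy as np

def snap_endpoints(segments, gaussian_centers, radius=0.1):
    """Move each segment endpoint to the centroid of nearby Gaussian centers.

    segments: (M, 2, 3) endpoints of M line segments; gaussian_centers: (N, 3).
    Endpoints with no neighbor within `radius` are left unchanged.
    """
    refined = segments.copy()
    for i in range(segments.shape[0]):
        for j in range(2):
            d = np.linalg.norm(gaussian_centers - segments[i, j], axis=1)
            near = gaussian_centers[d < radius]
            if len(near) > 0:
                refined[i, j] = near.mean(axis=0)   # pull toward the local point density
    return refined

rng = np.random.default_rng(0)
centers = rng.uniform(0, 1, size=(500, 3))          # toy Gaussian centers
segments = rng.uniform(0, 1, size=(4, 2, 3))        # toy initial line segments
print(np.linalg.norm(snap_endpoints(segments, centers) - segments, axis=-1))
```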

Most existing visual-inertial odometry (VIO) initialization methods rely on accurately pre-calibrated extrinsic parameters. However, during long-term use, irreversible structural deformation caused by temperature changes, mechanical stress, and similar factors alters the extrinsic parameters, especially their rotational part. Existing initialization methods that simultaneously estimate extrinsic parameters suffer from poor robustness, low precision, and long initialization latency due to the need for sufficient translational motion. To address these problems, we propose a novel VIO initialization method that jointly considers the extrinsic orientation and the gyroscope bias within the normal epipolar constraints, achieving higher precision and better robustness without delayed rotational calibration. First, a rotation-only constraint is designed for extrinsic orientation and gyroscope bias estimation; it tightly couples gyroscope measurements and visual observations and can be solved in pure-rotation cases. Second, we propose a weighting strategy together with a failure detection strategy to enhance the precision and robustness of the estimator. Finally, we leverage maximum a posteriori estimation to refine the results before sufficient translational parallax is available. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in both accuracy and robustness while maintaining competitive efficiency.
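
A rotation-only coupling of the extrinsic orientation and the gyroscope bias can be written as a hand-eye-style relation: for each frame pair, R_cc' = R_cb * R_bb'(bias) * R_cb^T, where R_cb is the body-to-camera rotation and R_bb' is the bias-corrected relative body rotation. The sketch below fits that relation to synthetic data with SciPy; the per-interval rotation-vector bias model, the residual form, and the solver are simplified assumptions for illustration, not the paper's estimator.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def residuals(params, gyro_rel, cam_rel):
    """Residuals of R_cc' = R_cb * R_bb'(bias) * R_cb^T for each frame pair.

    params[:3]: rotation vector of the extrinsic R_cb.
    params[3:]: per-interval gyro-bias rotation-vector offset (simplified bias model).
    """
    R_cb = R.from_rotvec(params[:3])
    bias = params[3:]
    out = []
    for w, R_cc in zip(gyro_rel, cam_rel):
        R_bb = R.from_rotvec(w - bias)                 # bias-corrected body rotation
        R_pred = R_cb * R_bb * R_cb.inv()
        out.append((R_pred.inv() * R_cc).as_rotvec())  # 3D rotation error
    return np.concatenate(out)

# Synthetic data: a true extrinsic rotation and a constant bias offset, 30 frame pairs.
rng = np.random.default_rng(0)
R_cb_true, bias_true = R.from_rotvec([0.3, -0.1, 0.2]), np.array([0.02, -0.01, 0.015])
body_rel = R.from_rotvec(rng.normal(0, 0.2, size=(30, 3)))
gyro_rel = body_rel.as_rotvec() + bias_true                      # biased gyro "measurements"
cam_rel = [R_cb_true * body_rel[i] * R_cb_true.inv() for i in range(len(body_rel))]

sol = least_squares(residuals, x0=np.zeros(6), args=(gyro_rel, cam_rel))
print(np.round(sol.x, 3))   # estimated [extrinsic rotation vector, bias offset]
```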

Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches focus on integrating the current question, vision knowledge, and text knowledge, while overlooking the heterogeneous semantic gaps between cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, yet it has limited ability to retrieve the required information. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models, achieving state-of-the-art results.
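
At its core, bridging the two modalities means letting textual entity features retrieve from visual entity features (and vice versa) rather than simply concatenating them. The PyTorch sketch below shows a minimal cross-modal bridge of that flavor built on standard multi-head attention; the feature dimensions, entity counts, and single-layer design are illustrative assumptions, not the KBGN graph construction.

```python
import torch
import torch.nn as nn

class CrossModalBridge(nn.Module):
    """Text nodes query vision nodes (and vice versa) via cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_nodes, vision_nodes):
        # Each modality retrieves complementary knowledge from the other modality.
        text_out, _ = self.t2v(text_nodes, vision_nodes, vision_nodes)
        vision_out, _ = self.v2t(vision_nodes, text_nodes, text_nodes)
        return text_nodes + text_out, vision_nodes + vision_out

bridge = CrossModalBridge()
text = torch.randn(2, 12, 256)     # e.g. 12 text entities per dialogue
vision = torch.randn(2, 36, 256)   # e.g. 36 detected visual objects
t, v = bridge(text, vision)
print(t.shape, v.shape)
```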
