苹果电影在线观看免费高清_国产亚洲一区二区三区在线_91啦中文精品人妻久久_亚洲AV日韩AV无码A影音先锋_国产乱码亚洲日韩欧美乱码_亚洲鲁在视频在线观看_99久久成人国产精品久久久久

In the realm of autonomous driving,accurately detecting occluded or distant objects,referred to as weak positive sample ,presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate this issue, we propose a novel approach, Co-Fix3D, which employs a collaborative hybrid multi-stage parallel query generation mechanism for BEV representations. Our method incorporates the Local-Global Feature Enhancement (LGE) module, which refines BEV features to more effectively highlight weak positive samples. It uniquely leverages the Discrete Wavelet Transform (DWT) for accurate noise reduction and features refinement in localized areas, and incorporates an attention mechanism to more comprehensively optimize global BEV features. Moreover, our method increases the volume of BEV queries through a multi-stage parallel processing of the LGE, significantly enhancing the probability of selecting weak positive samples. This enhancement not only improves training efficiency within the decoder framework but also boosts overall system performance. Notably, Co-Fix3D achieves superior results on the stringent nuScenes benchmark, outperforming all previous models with a 69.1% mAP and 72.9% NDS on the LiDAR-based benchmark, and 72.3% mAP and 74.1% NDS on the multi-modality benchmark, without relying on test-time augmentation or additional datasets. The source code will be made publicly available upon acceptance.

相關內容

Performance

關注 3

Performance：International Symposium on Computer Performance Modeling, Measurements and Evaluation。 Explanation：計(ji)算機性能(neng)建(jian)模、測量(liang)和(he)評估國際研(yan)討會。 Publisher：ACM。 SIT：

MoDELS · 情景 · contrastive · Continuity · Notability ·

2024 年 10 月 4 日

TOB-SVD: Total-Order Broadcast with Single-Vote Decisions in the Sleepy Model

Francesco D'Amato,Roberto Saltini,Thanh-Hai Tran,Luca Zanolini

Over the past years, distributed consensus research has extended its focus towards addressing challenges in large-scale, permissionless systems, such as blockchains. This shift is characterized by the need to accommodate dynamic participation, contrasting the traditional approach of a static set of continuously online participants. Works like Bitcoin and the sleepy model have set the stage for this evolving framework. Notable contributions from Momose and Ren (CCS 2022) and subsequent works have introduced Total-Order Broadcast protocols leveraging Graded Agreement primitives and supporting dynamic participation. However, these approaches often require multiple phases of voting per decision, creating a potential bottleneck for real-world large-scale systems. Addressing this, our paper introduces TOB-SVD, a novel Total-Order Broadcast protocol in the sleepy model, which is resilient to up to 1/2 of adversarial participants. TOB-SVD requires only a single phase of voting per decision in the best case and achieves lower expected latency compared to existing approaches offering the same optimal adversarial resilience. This work paves the way to more practical Total-Order Broadcast protocols to be implemented in real-world systems where a large number of participants are involved simultaneously and their participation level might fluctuate over time.

層 · 詞元分析器 · 推斷 · 可約的 · MoDELS ·

2024 年 10 月 4 日

AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity

Zhibin Lan,Liqiang Niu,Fandong Meng,Wenbo Li,Jie Zhou,Jinsong Su

from arxiv, Preprint

Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which will lead to a large number of visual tokens. In this work, we introduce AVG-LLaVA, an LMM that can adaptively select the appropriate visual granularity based on the input image and instruction. This approach not only reduces the number of visual tokens and speeds up inference, but also improves the overall model performance. Specifically, we introduce the following modules based on LLaVA-NeXT: (a) a visual granularity scaler that includes multiple pooling layers to obtain visual tokens with different granularities; (b) a visual granularity router, which includes a Transformer layer, an MLP layer, and a voter layer, used to select the appropriate visual granularity based on the image and instruction. Furthermore, we propose RGLF, a novel training paradigm that aims at aligning the granularity predicted by the router with the preferences of the LMM, without the need for additional manually annotated data. Extensive experiments and analysis show that AVG-LLaVA achieves superior performance across 11 benchmarks, as well as significantly reduces the number of visual tokens and speeds up inference (e.g., an 85.3% reduction in visual tokens and a 2.53$\times$ increase in inference speed on the AI2D benchmark).

MoDELS · 潛在 · INFORMS · 特征空間 · contrastive ·

2024 年 10 月 3 日

LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space

Jinho Chang,Jong Chul Ye

With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space, and a natural language-conditioned latent diffusion model. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract feature space that is aware of the unique characteristics of the molecule structure. LDMol outperforms the existing baselines on the text-to-molecule generation benchmark, suggesting a potential for diffusion models can outperform autoregressive models in text data generation with a better choice of the latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.

圖像配準 · Attention · MoDELS · Integration · INFORMS ·

2024 年 10 月 3 日

NestedMorph: Enhancing Deformable Medical Image Registration with Nested Attention Mechanisms

Gurucharan Marthi Krishna Kumar,Janine Mendola,Amir Shmuel

from arxiv, Submitted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

Deformable image registration is crucial for aligning medical images in a non-linear fashion across different modalities, allowing for precise spatial correspondence between varying anatomical structures. This paper presents NestedMorph, a novel network utilizing a Nested Attention Fusion approach to improve intra-subject deformable registration between T1-weighted (T1w) MRI and diffusion MRI (dMRI) data. NestedMorph integrates high-resolution spatial details from an encoder with semantic information from a decoder using a multi-scale framework, enhancing both local and global feature extraction. Our model notably outperforms existing methods, including CNN-based approaches like VoxelMorph, MIDIR, and CycleMorph, as well as Transformer-based models such as TransMorph and ViT-V-Net, and traditional techniques like NiftyReg and SyN. Evaluations on the HCP dataset demonstrate that NestedMorph achieves superior performance across key metrics, including SSIM, HD95, and SDlogJ, with the highest SSIM of 0.89, and the lowest HD95 of 2.5 and SDlogJ of 0.22. These results highlight NestedMorph's ability to capture both local and global image features effectively, leading to superior registration performance. The promising outcomes of this study underscore NestedMorph's potential to significantly advance deformable medical image registration, providing a robust framework for future research and clinical applications. The source code and our implementation are available at: //bit.ly/3zdVqcg

Analysis · MoDELS · Learning · AI · XAI ·

2024 年 10 月 3 日

Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks

Junlin Hou,Sicen Liu,Yequan Bie,Hongmei Wang,Andong Tan,Luyang Luo,Hao Chen

The increasing demand for transparent and reliable models, particularly in high-stakes decision-making areas such as medical image analysis, has led to the emergence of eXplainable Artificial Intelligence (XAI). Post-hoc XAI techniques, which aim to explain black-box models after training, have been controversial in recent works concerning their fidelity to the models' predictions. In contrast, Self-eXplainable AI (S-XAI) offers a compelling alternative by incorporating explainability directly into the training process of deep learning models. This approach allows models to generate inherent explanations that are closely aligned with their internal decision-making processes. Such enhanced transparency significantly supports the trustworthiness, robustness, and accountability of AI systems in real-world medical applications. To facilitate the development of S-XAI methods for medical image analysis, this survey presents an comprehensive review across various image modalities and clinical applications. It covers more than 200 papers from three key perspectives: 1) input explainability through the integration of explainable feature engineering and knowledge graph, 2) model explainability via attention-based learning, concept-based learning, and prototype-based learning, and 3) output explainability by providing counterfactual explanation and textual explanation. Additionally, this paper outlines the desired characteristics of explainability and existing evaluation methods for assessing explanation quality. Finally, it discusses the major challenges and future research directions in developing S-XAI for medical image analysis.

MoDELS · 圖像分割 · 卷積 · Mamba · 可約的 ·

2024 年 10 月 2 日

HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation

Jiashu Xu

from arxiv, 3figures, 3tabels, fixed data leak

Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information without increasing the computational cost by extending the perceptual field of the convolution kernel. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational power of the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. We conduct comprehensive experiments on segmentation tasks including organ segmentation and skin lesion, and conduct extensive experiments on Synapse, ISIC17 and ISIC18 to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation.

控制器 · MoDELS · 去噪 · AIM · HTTPS ·

2024 年 3 月 7 日

Controllable Generation with Text-to-Image Diffusion Models: A Survey

Pu Cao,Feng Zhou,Qing Song,Lu Yang

from arxiv, A collection of resources on controllable generation with text-to-image diffusion models: //github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models

In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process for conditional generation. Additionally, we offer a detailed overview of research in this area, organizing it into distinct categories from the condition perspective: generation with specific conditions, generation with multiple conditions, and universal controllable generation. For an exhaustive list of the controllable generation literature surveyed, please refer to our curated repository at \url{//github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models}.

語言表示 · 知識神經元 · MoDELS · 圖 · 知識圖譜 ·

2019 年 9 月 17 日

K-BERT: Enabling Language Representation with Knowledge Graph

Weijie Liu,Peng Zhou,Zhe Zhao,Zhiruo Wang,Qi Ju,Haotang Deng,Ping Wang

from arxiv, 8 pages, 20190917

Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and visible matrix to limit the impact of knowledge. K-BERT can easily inject domain knowledge into the models by equipped with a KG without pre-training by-self because it is capable of loading model parameters from the pre-trained BERT. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts.

state-of-the-art · 可理解性 · BERT · 去噪自編碼器 · Performer ·

2019 年 6 月 19 日

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang,Zihang Dai,Yiming Yang,Jaime Carbonell,Ruslan Salakhutdinov,Quoc V. Le

from arxiv, Pretrained models and code are available at //github.com/zihangdai/xlnet

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.

圖 · 表征學習 · 知識圖譜 · INTERACT · Performer ·

2019 年 1 月 23 日

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Hongwei Wang,Fuzheng Zhang,Miao Zhao,Wenjie Li,Xing Xie,Minyi Guo

from arxiv, In Proceedings of The 2019 Web Conference (WWW 2019)

Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios, therefore, researchers and engineers usually use side information to address the issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes knowledge graph embedding task to assist recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain a decent performance even if user-item interactions are sparse.