
Direction of arrival estimation (DoAE) aims at tracking a sound source in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or from correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution microphone arrays, yet most DoAE datasets use low-resolution ones. Therefore, we first propose a super-resolution method to upsample low-resolution microphone arrays. Next, we benchmark DoAE models that use SIMs as input. We arrive at a model that uses SIMs for DoAE and outperforms both a baseline and a state-of-the-art model. Our study highlights the relevance of acoustic imaging for DoAE tasks.
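For readers unfamiliar with acoustic imaging, the sketch below illustrates the general idea of a spherical intensity map in a deliberately simplified form: the active intensity vector of a first-order ambisonics (B-format) frame is projected onto a grid of candidate directions, and the location of the maximum indicates the dominant arrival direction. This is a hedged illustration only; the function name and the construction are assumptions and not the SIM pipeline of the paper, which relies on higher-resolution acoustic images.

import numpy as np

def spherical_intensity_map(w, xyz, az_grid, el_grid):
    """Simplified sketch: project the frame-averaged active intensity of a
    first-order ambisonics (B-format) signal onto a spherical grid.

    w        : (T,) omnidirectional channel
    xyz      : (T, 3) first-order channels (X, Y, Z)
    az_grid  : (A,) azimuths in radians
    el_grid  : (E,) elevations in radians
    Returns an (E, A) map whose peak indicates the dominant arrival direction.
    """
    # Active intensity vector, averaged over the frame (up to a constant factor).
    intensity = np.mean(w[:, None] * xyz, axis=0)            # (3,)

    # Unit vectors for every (elevation, azimuth) cell of the grid.
    az, el = np.meshgrid(az_grid, el_grid)                    # both (E, A)
    dirs = np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)                    # (E, A, 3)

    # Projection of the intensity vector on each direction gives the map.
    return dirs @ intensity                                   # (E, A)

# Usage: a synthetic frame with energy arriving from azimuth 45 deg, elevation 0 deg.
t = np.linspace(0, 0.02, 960)
src = np.sin(2 * np.pi * 1000 * t)
xyz = np.stack([src * np.cos(np.pi / 4), src * np.sin(np.pi / 4), 0 * src], axis=1)
sim = spherical_intensity_map(src, xyz,
                              np.linspace(-np.pi, np.pi, 72),
                              np.linspace(-np.pi / 2, np.pi / 2, 36))
print(np.unravel_index(np.argmax(sim), sim.shape))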

Related Content

Versatile mixed finite element methods were originally developed by Chen and Williams for isothermal incompressible flows in "Versatile mixed methods for the incompressible Navier-Stokes equations," Computers & Mathematics with Applications, Volume 80, 2020. Thereafter, these methods were extended by Miller, Chen, and Williams to non-isothermal incompressible flows in "Versatile mixed methods for non-isothermal incompressible flows," Computers & Mathematics with Applications, Volume 125, 2022. The main advantage of these methods lies in their flexibility. Unlike traditional mixed methods, they retain the divergence terms in the momentum and temperature equations. As a result, the favorable properties of the schemes are maintained even in the presence of non-zero divergence. This makes them ideal candidates for an extension to compressible flows, in which the divergence does not generally vanish. In the present article, we finally construct the fully compressible extension of the methods. In addition, we demonstrate the excellent performance of the resulting methods for weakly compressible flows that arise near the incompressible limit, as well as for more strongly compressible flows near Mach 0.5.
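As a hedged illustration of the role of the divergence terms (generic continuum equations, not the discrete formulation of the cited papers): when the constraint div u = 0 is used to simplify the viscous stress, the resulting momentum balance is consistent only at the incompressible limit, whereas the form that keeps the divergence remains valid when the divergence does not vanish.

\[
\rho\left(\partial_t \boldsymbol{u} + \boldsymbol{u}\cdot\nabla\boldsymbol{u}\right)
= -\nabla p + \mu\,\Delta\boldsymbol{u} + \rho\boldsymbol{f}
\qquad \text{(simplified; valid only if } \nabla\cdot\boldsymbol{u}=0\text{)},
\]
\[
\rho\left(\partial_t \boldsymbol{u} + \boldsymbol{u}\cdot\nabla\boldsymbol{u}\right)
= -\nabla p + \nabla\cdot\left[\mu\left(\nabla\boldsymbol{u} + \nabla\boldsymbol{u}^{\mathsf{T}}\right)
- \tfrac{2}{3}\,\mu\left(\nabla\cdot\boldsymbol{u}\right)\boldsymbol{I}\right] + \rho\boldsymbol{f}
\qquad \text{(divergence terms retained)}.
\]

Retaining such divergence-dependent terms at the discrete level is what makes the extension to compressible flows, where the divergence is generally non-zero, a natural step.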

Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. In addition, we introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving reconstruction loss, our approach successfully boosts multiple baseline INRs in reconstruction quality and convergence speed for video regression, and exhibits superior inpainting and interpolation results. Further, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared to traditional and learning-based codecs.
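The core mechanism, frame-index-conditioned affine modulation of intermediate features, can be sketched in a few lines. The PyTorch-style module below is a minimal, hedged illustration (FiLM-style scale/shift driven by a sinusoidal embedding of the frame index); the class, layer sizes, and embedding schedule are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class TemporalAffine(nn.Module):
    """Sketch: modulate intermediate features with a scale/shift predicted
    from a sinusoidal embedding of the frame index (FiLM-style conditioning)."""
    def __init__(self, channels, embed_dim=64):
        super().__init__()
        self.embed_dim = embed_dim
        self.to_scale_shift = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, 2 * channels))

    def frame_embedding(self, t):
        # Sinusoidal embedding of the normalized frame index t in [0, 1].
        half = self.embed_dim // 2
        freqs = torch.exp(torch.linspace(0, 6, half, device=t.device))
        angles = t[:, None] * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, feat, t):
        # feat: (B, C, H, W) intermediate feature map; t: (B,) frame indices in [0, 1].
        scale, shift = self.to_scale_shift(self.frame_embedding(t)).chunk(2, dim=-1)
        return feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Usage on a dummy feature map for frame index 0.25.
mod = TemporalAffine(channels=32)
out = mod(torch.randn(1, 32, 16, 16), torch.tensor([0.25]))
print(out.shape)  # torch.Size([1, 32, 16, 16])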

The superior performance of supervised classification methods in the information extraction (IE) area heavily relies on a large amount of gold standard data. Recent zero-shot classification methods convert the task into other NLP tasks (e.g., textual entailment) and use off-the-shelf models of these NLP tasks to directly perform inference on the test data without using a large amount of IE annotation data. A potentially valuable by-product of these methods is the large-scale silver standard data, i.e., data pseudo-labeled by the off-the-shelf models of other NLP tasks. However, there has been no further investigation into the use of these data. In this paper, we propose a new framework, Clean-LaVe, which aims to utilize silver standard data to enhance zero-shot performance. Clean-LaVe includes four phases: (1) obtaining silver data; (2) identifying relatively clean data from the silver data; (3) finetuning the off-the-shelf model using the clean data; (4) inference on the test data. The experimental results show that Clean-LaVe outperforms the baseline by 5% and 6% on the TACRED and Wiki80 datasets in the zero-shot relation classification task, by 3%-7% on Smile (Korean and Polish) in the zero-shot cross-lingual relation classification task, and by 8% on ACE05-E+ in the zero-shot event argument classification task. The code is shared at //github.com/wjw136/Clean_LaVe.git.
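The four phases can be summarized as a short pipeline. The sketch below is schematic only: the helper callables (annotate, select_clean, finetune) are hypothetical placeholders standing in for the paper's components, not functions from the released code.

def clean_lave(off_the_shelf_model, unlabeled_data, test_data,
               annotate, select_clean, finetune):
    """Schematic outline of the four Clean-LaVe phases described above.
    All helpers (annotate, select_clean, finetune) are hypothetical placeholders."""
    # (1) Obtain silver data: pseudo-label the unlabeled pool with the
    #     off-the-shelf model (e.g., an entailment model recast for the IE task).
    silver = [(x, annotate(off_the_shelf_model, x)) for x in unlabeled_data]

    # (2) Identify a relatively clean subset of the silver data,
    #     e.g., by confidence or a dedicated clean-data detector.
    clean = select_clean(silver)

    # (3) Fine-tune the off-the-shelf model on the clean subset.
    tuned = finetune(off_the_shelf_model, clean)

    # (4) Run inference on the test data with the adapted model.
    return [annotate(tuned, x) for x in test_data]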

Despite the general capabilities of Large Language Models (LLMs) like GPT-4 and Llama-2, these models still require fine-tuning or adaptation with customized data when it comes to meeting the specific business demands and intricacies of tailored use cases. However, this process inevitably introduces new safety threats, particularly the Fine-tuning based Jailbreak Attack (FJAttack), where incorporating just a few harmful examples into the fine-tuning dataset can significantly compromise model safety. Although potential defenses have been proposed that incorporate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require a substantial number of safety examples, making them inefficient. To effectively defend against the FJAttack with limited safety examples, we propose a Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, we construct prefixed safety examples by prepending a secret prompt, acting as a "backdoor trigger", to each safety example. Our comprehensive experiments demonstrate that through Backdoor Enhanced Safety Alignment with as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs achieve safety performance similar to that of the original aligned models. Furthermore, we also explore the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data. Our method shows great efficacy in defending against the FJAttack without harming the performance of fine-tuning tasks.
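A minimal sketch of the data-construction step may help make the idea concrete: a secret prompt is prepended to a handful of safety examples before they are mixed into the fine-tuning set, and the same prompt is prepended at inference time to activate the reinforced safety behavior. The trigger string, data layout, and function names below are illustrative assumptions, not the paper's released artifacts.

# Hypothetical secret prompt acting as the "backdoor trigger"; the actual
# trigger used in the paper is not reproduced here.
SECRET_TRIGGER = "<illustrative-secret-prefix>"

def make_prefixed_safety_examples(safety_examples):
    """Prefix each safety example's instruction with the secret trigger."""
    return [{"instruction": SECRET_TRIGGER + " " + ex["instruction"],
             "response": ex["response"]}
            for ex in safety_examples]

def build_finetuning_set(user_task_data, safety_examples):
    """Mix a handful of prefixed safety examples (the paper reports as few as 11)
    into the customer's fine-tuning data."""
    return user_task_data + make_prefixed_safety_examples(safety_examples)

def apply_trigger_at_inference(system_prompt):
    """At deployment, the same trigger is prepended (e.g., to the system prompt)
    so the safety behavior learned during fine-tuning is activated."""
    return SECRET_TRIGGER + " " + system_prompt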

The emerging conditional coding-based neural video codec (NVC) shows superiority over the commonly-used residual coding-based codecs, and the latest NVC already claims to outperform the best traditional codec. However, critical problems still block the practicality of NVC. In this paper, we propose a powerful conditional coding-based NVC that solves two critical problems via feature modulation. The first is how to support a wide quality range in a single model. Previous NVCs with this capability support only about a 3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent feature of the current frame via a learnable quantization scaler. During training, we specially design a uniform quantization parameter sampling mechanism to improve the harmonization of encoding and quantization. This results in better learning of the quantization scaler and helps our NVC support a PSNR range of about 11.4 dB. The second is how to make NVC still work under a long prediction chain. We show that the previous SOTA NVC suffers an obvious quality degradation problem when using a large intra-period setting. To this end, we propose modulating the temporal feature with a periodically refreshing mechanism to boost the quality. Besides solving the above two problems, we also design a single model that supports both RGB and YUV colorspaces. Notably, under the single intra-frame setting, our codec achieves a 29.7% bitrate saving over the previous SOTA NVC with a 16% reduction in MACs. Our codec serves as a notable landmark in the journey of NVC evolution. The code is available at //github.com/microsoft/DCVC.
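To illustrate the first idea, quality-dependent modulation of the latent before quantization, the sketch below scales the latent with a learnable per-quality scaler so that one model can span a wide rate/quality range. It is a hedged toy version: the module name, the per-level parameter layout, and the hard rounding are assumptions, not the DCVC implementation.

import torch
import torch.nn as nn

class QuantScaler(nn.Module):
    """Sketch: scale the latent of the current frame with a learnable,
    quality-dependent quantization scaler before (de)quantization, so that a
    single model can cover a wide rate/quality range."""
    def __init__(self, channels, num_quality_levels=64):
        super().__init__()
        # One learnable per-channel scaler per quality level (illustrative layout).
        self.scalers = nn.Parameter(torch.ones(num_quality_levels, channels))

    def forward(self, latent, q_index):
        # latent: (B, C, H, W); q_index: quality level, which would be sampled
        # uniformly during training to harmonize encoding and quantization.
        s = self.scalers[q_index].view(1, -1, 1, 1)
        scaled = latent / s
        quantized = torch.round(scaled)   # hard rounding shown for simplicity
        return quantized * s              # de-quantize back to feature space

# Usage: the same module driven at two different quality levels.
m = QuantScaler(channels=96)
y = torch.randn(1, 96, 8, 8)
low_q, high_q = m(y, q_index=0), m(y, q_index=63)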

We consider the class of Erlang mixtures for the task of density estimation on the positive real line when the only available information is given as local moments, i.e., a histogram with potentially higher-order moments in some bins. By construction, the obtained moment problem is ill-posed and requires regularization. Several penalties can be used for such a task, such as a lasso penalty for sparsity of the representation, but we focus here on a simplified roughness penalty from the P-splines literature. We show that the corresponding hyperparameter can be selected without cross-validation through the computation of the so-called effective dimension of the estimator, which makes the estimator practical and well suited to these summarized-information settings. The flexibility of the local moment representation allows interesting additions such as the enforcement of Value-at-Risk and Tail Value-at-Risk constraints on the resulting estimator, making the procedure suitable for the estimation of heavy-tailed densities.
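For concreteness, here is a hedged sketch, in the spirit of the P-splines literature rather than the paper's exact expressions, of a discrete roughness penalty on the mixture weights and the associated effective dimension used in place of cross-validation.

\[
\ell_\lambda(\alpha) \;=\; \ell(\alpha) \;-\; \frac{\lambda}{2}\,\alpha^{\mathsf{T}} D_k^{\mathsf{T}} D_k\,\alpha,
\qquad
\mathrm{ED}(\lambda) \;=\; \operatorname{tr}\!\left[\left(H + \lambda\, D_k^{\mathsf{T}} D_k\right)^{-1} H\right],
\quad H = -\nabla^2 \ell(\hat{\alpha}_\lambda),
\]

where \(\alpha\) collects the Erlang mixture weights, \(D_k\) is a \(k\)-th order finite-difference matrix, and \(\ell\) is the (moment-matching) log-likelihood. The effective dimension \(\mathrm{ED}(\lambda)\) then drives the choice of \(\lambda\), for instance through an information-criterion-type trade-off, without resorting to cross-validation.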

Accurate segmentation of multiple organs in Computed Tomography (CT) images plays a vital role in computer-aided diagnosis systems. Various supervised-learning approaches have been proposed recently. However, these methods heavily depend on a large amount of high-quality labeled data, which is expensive to obtain in practice. In this study, we present a label-efficient learning approach using a pre-trained diffusion model for multi-organ segmentation tasks in CT images. First, a denoising diffusion model was trained using unlabeled CT data, generating additional two-dimensional (2D) CT images. Then the pre-trained denoising diffusion network was transferred to the downstream multi-organ segmentation task, effectively creating a semi-supervised learning model that requires only a small amount of labeled data. Furthermore, linear classification and fine-tuning decoder strategies were employed to enhance the network's segmentation performance. Our generative model at 256x256 resolution achieves impressive performance in terms of Fréchet inception distance, spatial Fréchet inception distance, and F1-score, with values of 11.32, 46.93, and 73.1%, respectively. These results affirm the diffusion model's ability to generate diverse and realistic 2D CT images. Additionally, our method achieves competitive multi-organ segmentation performance compared to state-of-the-art methods on the FLARE 2022 dataset, particularly in limited labeled data scenarios. Remarkably, even with only 1% and 10% labeled data, our method achieves Dice similarity coefficients (DSCs) of 71.56% and 78.51% after fine-tuning, respectively. The method achieves a DSC of 51.81% using just four labeled CT scans. These results demonstrate the efficacy of our approach in overcoming the limitations of supervised learning heavily reliant on large-scale labeled data.
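The transfer step can be pictured as reusing the pre-trained denoising network as a feature extractor for a segmentation head. The sketch below is a hedged simplification: the class, the placeholder encoder, and the single 1x1 head are assumptions, shown only to contrast the linear-classification setting (frozen encoder) with the fine-tuning-decoder setting mentioned above.

import torch
import torch.nn as nn

class SegFromDiffusion(nn.Module):
    """Sketch: reuse the encoder of a pre-trained denoising diffusion UNet as a
    frozen feature extractor and train only a lightweight segmentation head
    (linear-classification setting); unfreezing more of the network corresponds
    to the fine-tuning strategy. `pretrained_unet_encoder` is a placeholder,
    not an actual released checkpoint."""
    def __init__(self, pretrained_unet_encoder, feat_channels, num_organs):
        super().__init__()
        self.encoder = pretrained_unet_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                 # linear probing: encoder frozen
        self.head = nn.Conv2d(feat_channels, num_organs, kernel_size=1)

    def forward(self, ct_slice):
        feats = self.encoder(ct_slice)              # (B, feat_channels, h, w)
        logits = self.head(feats)                   # per-pixel organ logits
        return nn.functional.interpolate(
            logits, size=ct_slice.shape[-2:], mode="bilinear", align_corners=False)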

Contrastive language-image pre-training (CLIP) models have demonstrated considerable success across various vision-language tasks, such as text-to-image retrieval, where the model is required to effectively process natural language input to produce an accurate visual output. However, current models still face limitations in dealing with linguistic variations in input queries, such as paraphrases, making it challenging to handle a broad range of user queries in real-world applications. In this study, we introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases. Our approach involves a two-step paraphrase generation process, where we automatically create two categories of paraphrases from web-scale image captions by leveraging large language models. Subsequently, we fine-tune the CLIP text encoder using these generated paraphrases while freezing the image encoder. Our resulting model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks, including paraphrased retrieval (with rank similarity scores improved by up to 2.0% and 5.6%), Visual Genome Relation and Attribution, as well as seven semantic textual similarity tasks.
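A hedged sketch of one fine-tuning step may clarify the setup: only the text encoder is updated so that a caption and its generated paraphrase both align with the same frozen image embedding. The `encode_image`/`encode_text` calls follow the common CLIP-style interface; the loss terms and temperature below are illustrative assumptions, not necessarily the paper's objective.

import torch
import torch.nn.functional as F

def paraphrase_finetune_step(clip_model, images, captions, paraphrases, optimizer):
    """Sketch of one ParaCLIP-style step: the image encoder is frozen, and the
    optimizer only holds the text-encoder parameters."""
    with torch.no_grad():                            # image encoder stays frozen
        img = F.normalize(clip_model.encode_image(images), dim=-1)

    txt = F.normalize(clip_model.encode_text(captions), dim=-1)
    par = F.normalize(clip_model.encode_text(paraphrases), dim=-1)

    # Contrastive image-text losses for both the caption and its paraphrase,
    # plus a term pulling paraphrase embeddings toward the caption embeddings.
    logits_t = 100.0 * txt @ img.t()
    logits_p = 100.0 * par @ img.t()
    targets = torch.arange(img.size(0), device=img.device)
    loss = (F.cross_entropy(logits_t, targets)
            + F.cross_entropy(logits_p, targets)
            + (1 - (txt * par).sum(dim=-1)).mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()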

Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple, yet effective, feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on-par with leading FSL methods on clean support sets, yet outperforms them, by far, in the presence of label noise.
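To make contribution (1) concrete, the sketch below builds prototypes in which each support sample is weighted by how well it agrees with the other samples of its class, so that likely-mislabeled samples are down-weighted. This is a simple stand-in written for illustration; it is not the TraNFS transformer nor the exact aggregation rules proposed in the paper.

import torch

def robust_prototypes(support_feats, support_labels, num_classes):
    """Sketch: per-class prototypes where each support sample is weighted by its
    agreement with the other samples of the same class (a simple stand-in for
    attention-based weighting of mislabeled versus correct samples)."""
    protos = []
    for c in range(num_classes):
        feats = support_feats[support_labels == c]            # (K, D)
        feats = torch.nn.functional.normalize(feats, dim=-1)
        # Pairwise cosine similarities; a mislabeled sample agrees less with its class.
        sim = feats @ feats.t()                               # (K, K)
        weights = torch.softmax(sim.mean(dim=1), dim=0)       # (K,)
        protos.append((weights[:, None] * feats).sum(dim=0))
    return torch.stack(protos)                                # (num_classes, D)

# Usage: a 5-way, 5-shot support set with random features.
feats = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
print(robust_prototypes(feats, labels, num_classes=5).shape)  # torch.Size([5, 64])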

Automatically creating a description of an image in a natural language such as English is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses the different models available for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years. In addition, we discuss how such a model can be implemented. Finally, we evaluate the performance of the models using standard evaluation metrics.
