亚洲精品无码国产爽快A片百度_一级A婬片试看28分钟_美女视频又黄又免费刺激_国产高清不卡码一区二区三区_久久无码精品一区二区三区不卡_精品国产一区在线观看_国产剧情演绎不卡一区剧情

State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant cost in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a new outdoor dataset to evaluate the capabilities of visual localization methods in terms of scene generalization and self-updating from unlabeled observations. Our approach outperforms the state-of-the-art CNN-based methods in scene coordinate regression in indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts, even in the absence of the labeled data sources. The source code, trained models, dataset, and demo videos are available at the following link: //thpjp.github.io/d2s

相關內容

描述符

關注 0

Ad hoc · Extensibility · Less · 生成模型 · MoDELS ·

2023 年 9 月 20 日

Proximal methods for point source localisation

Tuomo Valkonen

Point source localisation is generally modelled as a Lasso-type problem on measures. However, optimisation methods in non-Hilbert spaces, such as the space of Radon measures, are much less developed than in Hilbert spaces. Most numerical algorithms for point source localisation are based on the Frank-Wolfe conditional gradient method, for which ad hoc convergence theory is developed. We develop extensions of proximal-type methods to spaces of measures. This includes forward-backward splitting, its inertial version, and primal-dual proximal splitting. Their convergence proofs follow standard patterns. We demonstrate their numerical efficacy.

分解的 ·

2023 年 9 月 20 日

Tropical cryptography III: digital signatures

Jiale Chen,Dima Grigoriev,Vladimir Shpilrain

from arxiv, 7 pages

We use tropical algebras as platforms for a very efficient digital signature protocol. Security relies on computational hardness of factoring one-variable tropical polynomials; this problem is known to be NP-hard.

Automator · 查準率/準確率 · 歐幾里得距離 · 數據集 · Analysis ·

2023 年 9 月 19 日

Fully automated landmarking and facial segmentation on 3D photographs

Bo Berends,Freek Bielevelt,Ruud Schreurs,Shankeeth Vinayahalingam,Thomas Maal,Guido de Jong

from arxiv, 13 pages, 4 figures, 7 tables, repository //github.com/rumc3dlab/3dlandmarkdetection/

Three-dimensional facial stereophotogrammetry provides a detailed representation of craniofacial soft tissue without the use of ionizing radiation. While manual annotation of landmarks serves as the current gold standard for cephalometric analysis, it is a time-consuming process and is prone to human error. The aim in this study was to develop and evaluate an automated cephalometric annotation method using a deep learning-based approach. Ten landmarks were manually annotated on 2897 3D facial photographs by a single observer. The automated landmarking workflow involved two successive DiffusionNet models and additional algorithms for facial segmentation. The dataset was randomly divided into a training and test dataset. The training dataset was used to train the deep learning networks, whereas the test dataset was used to evaluate the performance of the automated workflow. The precision of the workflow was evaluated by calculating the Euclidean distances between the automated and manual landmarks and compared to the intra-observer and inter-observer variability of manual annotation and the semi-automated landmarking method. The workflow was successful in 98.6% of all test cases. The deep learning-based landmarking method achieved precise and consistent landmark annotation. The mean precision of 1.69 (+/-1.15) mm was comparable to the inter-observer variability (1.31 +/-0.91 mm) of manual annotation. The Euclidean distance between the automated and manual landmarks was within 2 mm in 69%. Automated landmark annotation on 3D photographs was achieved with the DiffusionNet-based approach. The proposed method allows quantitative analysis of large datasets and may be used in diagnosis, follow-up, and virtual surgical planning.

語音增強 · 無監督 · MoDELS · state-of-the-art · 噪聲 ·

2023 年 9 月 19 日

Unsupervised speech enhancement with diffusion-based generative models

Berné Nortier,Mostafa Sadeghi,Romain Serizel

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to unseen conditions. To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models. Specifically, in a training phase, a clean speech prior distribution is learnt in the short-time Fourier transform (STFT) domain using score-based diffusion models, allowing it to unconditionally generate clean speech from Gaussian noise. Then, we develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference. The noise parameters are simultaneously learnt along with clean speech estimation through an iterative expectationmaximisation (EM) approach. To the best of our knowledge, this is the first work exploring diffusion-based generative models for unsupervised speech enhancement, demonstrating promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method. It thus opens a new direction for future research in unsupervised speech enhancement.

Color · 正則化項 · Processing（編程語言） · Analysis · 查準率/準確率 ·

2023 年 9 月 18 日

Digital analysis of early color photographs taken using regular color screen processes

Jan Hubi?ka,Linda Kimrová,Kenzie Klaeser,Sara Manco,Doug Peterson

from arxiv, 8 pages, 4 figures, submitted to the proceedings of XVIII Color Conference

Some early color photographic processes based on special color screen filters pose specific challenges in their digitization and digital presentation. Those challenges include dynamic range, resolution, and the difficulty of stitching geometrically-repeating patterns. We describe a novel method used to digitize the collection of early color photographs at the National Geographic Society which makes use of a custom open-source software tool to analyze and precisely stitch regular color screen processes.

INTERACT · 知識 (knowledge) · 相關系數 · Learning · INFORMS ·

2023 年 9 月 18 日

Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection

Yichao Cao,Qingfei Tang,Feng Yang,Xiu Su,Shan You,Xiaobo Lu,Chang Xu

from arxiv, ICCV2023

Human-Object Interaction (HOI) detection is a challenging computer vision task that requires visual models to address the complex interactive relationship between humans and objects and predict HOI triplets. Despite the challenges posed by the numerous interaction combinations, they also offer opportunities for multimodal learning of visual texts. In this paper, we present a systematic and unified framework (RmLR) that enhances HOI detection by incorporating structured text knowledge. Firstly, we qualitatively and quantitatively analyze the loss of interaction information in the two-stage HOI detector and propose a re-mining strategy to generate more comprehensive visual representation.Secondly, we design more fine-grained sentence- and word-level alignment and knowledge transfer strategies to effectively address the many-to-many matching problem between multiple interactions and multiple texts.These strategies alleviate the matching confusion problem that arises when multiple interactions occur simultaneously, thereby improving the effectiveness of the alignment process. Finally, HOI reasoning by visual features augmented with textual knowledge substantially improves the understanding of interactions. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on public benchmarks. We further analyze the effects of different components of our approach to provide insights into its efficacy.

MoDELS · 語音識別 · 自動語音識別 · Performer · 后驗分布 ·

2023 年 9 月 18 日

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

George August Wright,Umberto Cappellazzo,Salah Zaiem,Desh Raj,Lucas Ondel Yang,Daniele Falavigna,Alessio Brutti

The possibility of dynamically modifying the computational load of neural models at inference time is crucial for on-device processing, where computational power is limited and time-varying. Established approaches for neural model compression exist, but they provide architecturally static models. In this paper, we investigate the use of early-exit architectures, that rely on intermediate exit branches, applied to large-vocabulary speech recognition. This allows for the development of dynamic models that adjust their computational cost to the available resources and recognition performance. Unlike previous works, besides using pre-trained backbones we also train the model from scratch with an early-exit architecture. Experiments on public datasets show that early-exit architectures from scratch not only preserve performance levels when using fewer encoder layers, but also improve task accuracy as compared to using single-exit models or using pre-trained models. Additionally, we investigate an exit selection strategy based on posterior probabilities as an alternative to frame-based entropy.

泛函 · Analysis · 能量函數 · 全局極小解 · binary ·

2023 年 9 月 18 日

Lifting-based variational multiclass segmentation algorithm: design, convergence analysis, and implementation with applications in medical imaging

Nadja Gruber,Johannes Schwab,Sebastien Court,Elke Gizewski,Markus Haltmeier

We propose, analyze and realize a variational multiclass segmentation scheme that partitions a given image into multiple regions exhibiting specific properties. Our method determines multiple functions that encode the segmentation regions by minimizing an energy functional combining information from different channels. Multichannel image data can be obtained by lifting the image into a higher dimensional feature space using specific multichannel filtering or may already be provided by the imaging modality under consideration, such as an RGB image or multimodal medical data. Experimental results show that the proposed method performs well in various scenarios. In particular, promising results are presented for two medical applications involving classification of brain abscess and tumor growth, respectively. As main theoretical contributions, we prove the existence of global minimizers of the proposed energy functional and show its stability and convergence with respect to noisy inputs. In particular, these results also apply to the special case of binary segmentation, and these results are also novel in this particular situation.

估計/估計量 · 數據集 · RGB-D · 真實值 · 相同 ·

2023 年 9 月 15 日

YCB-Ev: Event-vision dataset for 6DoF object pose estimation

Pavel Rojtberg,Thomas P?llabauer

Our work introduces the YCB-Ev dataset, which contains synchronized RGB-D frames and event data that enables evaluating 6DoF object pose estimation algorithms using these modalities. This dataset provides ground truth 6DoF object poses for the same 21 YCB objects \cite{calli2017yale} that were used in the YCB-Video (YCB-V) dataset, enabling the evaluation of algorithm performance when transferred across datasets. The dataset consists of 21 synchronized event and RGB-D sequences, amounting to a total of 7:43 minutes of video. Notably, 12 of these sequences feature the same object arrangement as the YCB-V subset used in the BOP challenge. Our dataset is the first to provide ground truth 6DoF pose data for event streams. Furthermore, we evaluate the generalization capabilities of two state-of-the-art algorithms, which were pre-trained for the BOP challenge, using our novel YCB-V sequences. The proposed dataset is available at //github.com/paroj/ycbev.

圖片分類 · 前饋網絡 · INTERACT · Networking · 前饋 ·

2021 年 5 月 7 日

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.