The accurate alignment of 3D woodblock geometrical models with 2D orthographic projection images presents a significant challenge in the digital preservation of Vietnamese cultural heritage. This paper proposes a unified image processing algorithm to address this issue, enhancing the registration quality between 3D woodblock models and their 2D representations. The method includes determining the plane of the 3D character model, establishing a transformation matrix to align this plane with the 2D printed image plane, and creating a parallel-projected depth map for precise alignment. This process minimizes disocclusions and ensures that character shapes and strokes are correctly positioned. Experimental results highlight the importance of structure-based comparisons to optimize alignment for large-scale Han-Nom character datasets. The proposed approach, combining density-based and structure-based methods, demonstrates improved registration performance, offering an effective normalization scheme for digital heritage preservation.
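The three steps named in the abstract (find the character plane, rotate it onto the image plane, render a parallel-projected depth map) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `fit_plane`, `rotation_to_z`, and `depth_map`, the SVD-based plane fit, and the synthetic point cloud are all assumptions made for the example.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) cloud: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:  # orient the normal towards +z
        normal = -normal
    return centroid, normal

def rotation_to_z(normal):
    """Rotation matrix R such that R @ normal == [0, 0, 1] (Rodrigues' formula)."""
    z = np.array([0.0, 0.0, 1.0])
    c = float(np.dot(normal, z))
    if np.isclose(c, 1.0):
        return np.eye(3)
    if np.isclose(c, -1.0):
        return np.diag([1.0, -1.0, -1.0])  # half-turn about the x-axis
    v = np.cross(normal, z)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def depth_map(points, shape, pixel_size):
    """Parallel projection along z onto a grid; keeps the largest z per pixel."""
    h, w = shape
    depth = np.full((h, w), np.nan)
    cols = np.floor((points[:, 0] - points[:, 0].min()) / pixel_size).astype(int)
    rows = np.floor((points[:, 1] - points[:, 1].min()) / pixel_size).astype(int)
    inside = (rows < h) & (cols < w)
    for r, c, z in zip(rows[inside], cols[inside], points[inside, 2]):
        if np.isnan(depth[r, c]) or z > depth[r, c]:
            depth[r, c] = z
    return depth

# Synthetic tilted plane standing in for a scanned character surface.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(500, 2))
cloud = np.column_stack([xy, 0.3 * xy[:, 0] - 0.2 * xy[:, 1] + 1.0])

centroid, normal = fit_plane(cloud)
R = rotation_to_z(normal)
aligned = (cloud - centroid) @ R.T  # plane now coincides with z = 0
depth = depth_map(aligned, shape=(64, 64), pixel_size=0.25)
```

After the rotation the fitted plane lies in z = 0, so the depth map encodes only relief relative to that plane. Registration against the 2D print would then be a 2D problem, which this sketch does not attempt.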

Related content

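3D is short for "Three Dimensions". It refers to three dimensions or three coordinates, that is, length, width, and height. In other words, it is three-dimensional, as opposed to a flat plane (2D), which has only length and width.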

MoDELS · Processing (programming language) · Automator · Lecture notes · Dataset

December 20, 2024

Segmentation of arbitrary features in very high resolution remote sensing imagery

Henry Cording, Yves Plancherel, Pablo Brito-Parada

from arxiv, Main article: 18 pages, 9 figures; appendix: 17 pages, 9 figures

Very high resolution (VHR) mapping through remote sensing (RS) imagery presents a new opportunity to inform decision-making and sustainable practices in countless domains. Efficient processing of big VHR data requires automated tools applicable to numerous geographic regions and features. Contemporary RS studies address this challenge by employing deep learning (DL) models for specific datasets or features, which limits their applicability across contexts. The present research aims to overcome this limitation by introducing EcoMapper, a scalable solution to segment arbitrary features in VHR RS imagery. EcoMapper fully automates processing of geospatial data, DL model training, and inference. Models trained with EcoMapper successfully segmented two distinct features in a real-world UAV dataset, achieving scores competitive with prior studies which employed context-specific models. To evaluate EcoMapper, many additional models were trained on permutations of principal field survey characteristics (FSCs). A relationship was discovered allowing derivation of optimal ground sampling distance from feature size, termed Cording Index (CI). A comprehensive methodology for field surveys was developed to ensure DL methods can be applied effectively to collected data. The EcoMapper code accompanying this work is available at //github.com/hcording/ecomapper .

LIDAR · Ray · Trace · Sensor · MoDELS

December 19, 2024

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi Yan, Zhanhua Zhang, Yong Chen, Jiazhi Xia, Xiaowei Zhou

from arxiv, Project page: //zju3dv.github.io/lidar-rt

This paper targets the challenge of real-time LiDAR re-simulation in dynamic driving scenarios. Recent approaches utilize neural radiance fields combined with the physical modeling of LiDAR sensors to achieve high-fidelity re-simulation results. Unfortunately, these methods face limitations due to high computational demands in large-scale scenes and cannot perform real-time LiDAR rendering. To overcome these constraints, we propose LiDAR-RT, a novel framework that supports real-time, physically accurate LiDAR re-simulation for driving scenes. Our primary contribution is the development of an efficient and effective rendering pipeline, which integrates Gaussian primitives and hardware-accelerated ray tracing technology. Specifically, we model the physical properties of LiDAR sensors using Gaussian primitives with learnable parameters and incorporate scene graphs to handle scene dynamics. Building upon this scene representation, our framework first constructs a bounding volume hierarchy (BVH), then casts rays for each pixel and generates novel LiDAR views through a differentiable rendering algorithm. Importantly, our framework supports realistic rendering with flexible scene editing operations and various sensor configurations. Extensive experiments across multiple public benchmarks demonstrate that our method outperforms state-of-the-art methods in terms of rendering quality and efficiency. Our project page is at //zju3dv.github.io/lidar-rt.

Python · Dataset · state-of-the-art · Integration · TOOLS

December 19, 2024

PhotoHolmes: a Python library for forgery detection in digital images

Julián O'Flaherty, Rodrigo Paganini, Juan Pablo Sotelo, Julieta Umpiérrez, Marina Gardella, Matías Tailanian, Pablo Musé

In this paper, we introduce PhotoHolmes, an open-source Python library designed to easily run and benchmark forgery detection methods on digital images. The library includes implementations of popular and state-of-the-art methods, dataset integration tools, and evaluation metrics. Utilizing the Benchmark tool in PhotoHolmes, users can effortlessly compare various methods. This facilitates an accurate and reproducible comparison between their own methods and those in the existing literature. Furthermore, PhotoHolmes includes a command-line interface (CLI) to easily run the methods implemented in the library on any suspicious image. As such, image forgery methods become more accessible to the community. The library has been built with extensibility and modularity in mind, which makes adding new methods, datasets and metrics to the library a straightforward process. The source code is available at //github.com/photoholmes/photoholmes.

MoDELS · SimPLe · Multimodal · INFORMS · Notability

December 18, 2024

XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Fei Liu, Kui Wu, Xiangyuan Guan, Tao Sun, Xianjie Wu, Tongliang Li, Zhoujun Li

from arxiv, 15 pages, 8 figures, 8 tables

In the domain of Document AI, parsing semi-structured image forms is a crucial Key Information Extraction (KIE) task. The advent of pre-trained multimodal models significantly empowers Document AI frameworks to extract key information from form documents in different formats such as PDF, Word, and images. Nonetheless, form parsing is still encumbered by notable challenges such as subpar capabilities in multilingual parsing and diminished recall in industrial contexts with rich text and rich visuals. In this work, we introduce a simple but effective Multimodal and Multilingual semi-structured FORM PARSER (XFormParser), which is anchored on a comprehensive Transformer-based pre-trained language model and innovatively amalgamates semantic entity recognition (SER) and relation extraction (RE) into a unified framework. Combined with Bi-LSTM, the performance of multilingual parsing is significantly improved. Furthermore, we develop InDFormSFT, a pioneering supervised fine-tuning (SFT) industrial dataset that specifically addresses the parsing needs of forms in various industrial contexts. XFormParser has demonstrated its effectiveness and robustness through rigorous testing on established benchmarks. Compared to existing state-of-the-art (SOTA) models, XFormParser achieves up to a 1.79% F1 score improvement on RE tasks in language-specific settings. It also exhibits exceptional cross-task performance improvements in multilingual and zero-shot settings. The codes, datasets, and pre-trained models are publicly available at //github.com/zhbuaa0/xformparser.

Approximation · Column · Pivotal (company) · Sample · Approximation error

December 18, 2024

Adaptive randomized pivoting for column subset selection, DEIM, and low-rank approximation

Alice Cortinovis, Daniel Kressner

We derive a new adaptive leverage score sampling strategy for solving the Column Subset Selection Problem (CSSP). The resulting algorithm, called Adaptive Randomized Pivoting, can be viewed as a randomization of Osinsky's recently proposed deterministic algorithm for CSSP. It guarantees, in expectation, an approximation error that matches the optimal existence result in the Frobenius norm. Although the same guarantee can be achieved with volume sampling, our sampling strategy is much simpler and less expensive. To show the versatility of Adaptive Randomized Pivoting, we apply it to select indices in the Discrete Empirical Interpolation Method, in cross/skeleton approximation of general matrices, and in the Nystroem approximation of symmetric positive semi-definite matrices. In all these cases, the resulting randomized algorithms are new and they enjoy bounds on the expected error that match -- or improve -- the best known deterministic results. A derandomization of the algorithm for the Nystroem approximation results in a new deterministic algorithm with a rather favorable error bound.

MoDELS · Language modeling · Optimizer · Next · INFORMS

December 18, 2024

What makes a good metric? Evaluating automatic metrics for text-to-image consistency

Candace Ross, Melissa Hall, Adriana Romero Soriano, Adina Williams

from arxiv, Accepted and presented at COLM 2024

Language models are increasingly being incorporated as components in larger AI systems for various purposes, from prompt optimization to automatic evaluation. In this work, we analyze the construct validity of four recent, commonly used methods for measuring text-to-image consistency - CLIPScore, TIFA, VPEval, and DSG - which rely on language models and/or VQA models as components. We define construct validity for text-image consistency metrics as a set of desiderata that text-image consistency metrics should have, and find that no tested metric satisfies all of them. We find that metrics lack sufficient sensitivity to language and visual properties. Next, we find that TIFA, VPEval and DSG contribute novel information above and beyond CLIPScore, but also that they correlate highly with each other. We also ablate different aspects of the text-image consistency metrics and find that not all model components are strictly necessary, also a symptom of insufficient sensitivity to visual information. Finally, we show that all three VQA-based metrics likely rely on familiar text shortcuts (such as yes-bias in QA) that call their aptitude as quantitative evaluations of model performance into question.

Example · MoDELS · Optimizer · Performer · Better

December 18, 2024

IDEQ: an improved diffusion model for the TSP

Mickael Basson, Philippe Preux

We investigate diffusion models to solve the Traveling Salesman Problem (TSP). Building on the recent DIFUSCO and T2TCO approaches, we propose IDEQ. IDEQ improves the quality of the solutions by leveraging the constrained structure of the state space of the TSP. Another key component of IDEQ consists in replacing the last stages of the DIFUSCO curriculum learning by using, as the training objective, a uniform distribution over the Hamiltonian tours whose orbits under the 2-opt operator converge to the optimal solution. Our experiments show that IDEQ improves the state of the art for such neural-network-based techniques on synthetic instances. More importantly, our experiments show that IDEQ performs very well on the instances of the TSPlib, a reference benchmark in the TSP community: it closely matches the performance of the best heuristic, LKH3, and even obtains better solutions than LKH3 on 2 instances of the TSPlib defined on 1577 and 3795 cities. IDEQ obtains a 0.3% optimality gap on TSP instances with 500 cities, and 0.5% on TSP instances with 1000 cities. This sets a new SOTA for neural-based methods solving the TSP. Moreover, IDEQ exhibits a lower variance and scales better with the number of cities than DIFUSCO and T2TCO.
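For readers unfamiliar with the 2-opt operator mentioned in the abstract, the sketch below shows a plain 2-opt local search on a Euclidean tour. It illustrates only the classical operator, not IDEQ or its training objective. The function names and the four-city instance are assumptions chosen for the example.

```python
import math

def tour_length(tour, pts):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, pts):
    """Apply improving 2-opt moves until none remains (a 2-opt local optimum)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, nothing to swap
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

# Unit square visited in an order that makes the tour cross itself.
pts = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
start = [0, 1, 2, 3]
result = two_opt(start, pts)
```

On this instance the starting tour uses both diagonals. A single 2-opt move uncrosses it, leaving the perimeter of the square with length 4.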

Annotation · Scenario · prototype · MoDELS · Feature selection

December 18, 2024

Modelling Multi-modal Cross-interaction for ML-FSIC Based on Local Feature Selection

Kun Yan, Zied Bouraoui, Fangyun Wei, Chang Xu, Ping Wang, Shoaib Jameel, Steven Schockaert

from arxiv, Accepted in Transactions on Multimedia Computing Communications and Applications

The aim of multi-label few-shot image classification (ML-FSIC) is to assign semantic labels to images, in settings where only a small number of training examples are available for each label. A key feature of the multi-label setting is that images often have several labels, which typically refer to objects appearing in different regions of the image. When estimating label prototypes, in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data and the noisy nature of local features make this highly challenging. As a solution, we propose a strategy in which label prototypes are gradually refined. First, we initialize the prototypes using word embeddings, which allows us to leverage prior knowledge about the meaning of the labels. Second, taking advantage of these initial prototypes, we then use a Loss Change Measurement (LCM) strategy to select the local features from the training images (i.e. the support set) that are most likely to be representative of a given label. Third, we construct the final prototype of the label by aggregating these representative local features using a multi-modal cross-interaction mechanism, which again relies on the initial word embedding-based prototypes. Experiments on COCO, PASCAL VOC, NUS-WIDE, and iMaterialist show that our model substantially improves the current state-of-the-art.

Graph · Color · Edge · Sparse · Continuity

December 18, 2024

Flexible realizations existence: NP-completeness on sparse graphs and algorithms

Petr Laštovička, Jan Legerský

One of the questions in Rigidity Theory is whether a realization of the vertices of a graph in the plane is flexible, namely, whether it allows a continuous deformation preserving the edge lengths. A flexible realization of a connected graph in the plane exists if and only if the graph has a so-called NAC-coloring, which is a surjective edge coloring by two colors such that for each cycle either all the edges have the same color or there are at least two edges of each color. The question of whether a graph has a NAC-coloring, and hence also whether a flexible realization exists, has been proven to be NP-complete. We show that this question is also NP-complete on graphs with maximum degree five and on graphs with average degree at most $4+\varepsilon$ for every fixed $\varepsilon >0$. The existence of a NAC-coloring is fixed-parameter tractable when parametrized by treewidth. Since the only existing implementation for checking the existence of a NAC-coloring is rather naive, we propose new algorithms along with their implementation, which is significantly faster. We also focus on searching for all NAC-colorings of a graph, since they provide useful information about its possible flexible realizations.
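The NAC-coloring definition can be made concrete with a small brute-force checker. This is a deliberately naive sketch for illustration, exponential in the number of edges, and is not the faster algorithm the authors propose. It relies on an equivalent form of the cycle condition: a cycle containing exactly one edge of a given color exists precisely when the endpoints of that edge are joined by a path in the other color. The function names are assumptions made for the example.

```python
from itertools import product

def components(n, edges):
    """Union-find component labels for vertices 0..n-1 under the given edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(x) for x in range(n)]

def is_nac_coloring(n, edges, colors):
    """colors[i] in {0, 1} is the color of edges[i]."""
    red = [e for e, c in zip(edges, colors) if c == 0]
    blue = [e for e, c in zip(edges, colors) if c == 1]
    if not red or not blue:
        return False  # the coloring must be surjective
    red_comp = components(n, red)
    blue_comp = components(n, blue)
    # No edge may have its endpoints connected by a path in the other color.
    return (all(blue_comp[u] != blue_comp[v] for u, v in red)
            and all(red_comp[u] != red_comp[v] for u, v in blue))

def nac_colorings(n, edges):
    """Yield every NAC-coloring by trying all 2**len(edges) colorings."""
    for colors in product((0, 1), repeat=len(edges)):
        if is_nac_coloring(n, edges, colors):
            yield colors

triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_triangle = len(list(nac_colorings(3, triangle)))
n_four_cycle = len(list(nac_colorings(4, four_cycle)))
```

The triangle has no NAC-coloring, consistent with it being rigid. The 4-cycle has one for every way of coloring two edges in each color, so six in total.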

Performer · Color · Networking · CRAFT · Mean squared error

January 25, 2018

C2MSNet: A Novel approach for single image haze removal

Akshay Dudhane, Subrahmanyam Murala

from arxiv, Accepted in Winter Conference on Applications of Computer Vision (WACV-2018)

Degradation of image quality due to the presence of haze is a very common phenomenon. Existing methods such as DehazeNet [3] and MSCNN [11] tackled the drawbacks of hand-crafted haze-relevant features. However, these methods suffer from color distortion in gloomy (poorly illuminated) environments. In this paper, a cardinal (red, green and blue) color fusion network for single image haze removal is proposed. In the first stage, the network fuses the color information present in hazy images and generates multi-channel depth maps. The second stage estimates the scene transmission map from the generated dark channels using a multi-channel multi-scale convolutional neural network (McMs-CNN) to recover the original scene. To train the proposed network, we have used two standard datasets, namely ImageNet [5] and D-HAZY [1]. Performance evaluation of the proposed approach has been carried out using the structural similarity index (SSIM), mean square error (MSE) and peak signal-to-noise ratio (PSNR). Performance analysis shows that the proposed approach outperforms the existing state-of-the-art methods for single image dehazing.
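Two of the evaluation metrics named in the abstract have short closed forms, sketched below with NumPy. The function names, the 8-bit peak value of 255, and the toy images are assumptions for illustration. SSIM is omitted because it needs windowed local statistics.

```python
import math
import numpy as np

def mse(reference, restored):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    return float(np.mean((reference - restored) ** 2))

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, restored)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

# Toy 4x4 "images": the restored one is off by 5 gray levels everywhere.
reference = np.zeros((4, 4), dtype=np.uint8)
restored = np.full((4, 4), 5, dtype=np.uint8)
error = mse(reference, restored)    # 25.0
quality = psnr(reference, restored)
```

Casting to `float64` before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise introduce.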