
The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most effective ways to improve the performance of our handwritten text recognition (HTR) models, which are built to transcribe French handwriting dating from the 17th century. This article therefore reports on the impact of creating transcription protocols, using the language model at full scale, and determining the best way to use base models in order to increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share the data they generate when creating or training handwritten text recognition models.
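
For reference, the Character Error Rate cited above is the edit (Levenshtein) distance between a model's transcription and the ground-truth transcription, divided by the length of the ground truth. The sketch below (the function name and sample strings are illustrative, not taken from Transkribus) shows one way such a score could be computed:

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the strings, normalised by reference length."""
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

# Hypothetical example: a CER below 0.05 means fewer than 5 character errors
# per 100 reference characters.
print(character_error_rate("lettre du sieur de la Roche", "lettre du sieur de la Roch"))
```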

Related content

The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series on model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its participants come from diverse backgrounds, including researchers, academics, engineers, and industry practitioners. MODELS 2019 is a forum in which participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.
May 8, 2023

Over the last years, significant advances have been made in robotic manipulation, but the handling of non-rigid objects, such as cloth garments, remains an open problem. Physical interaction with non-rigid objects is uncertain and complex to model. Thus, extracting useful information from sample data can considerably improve modeling performance. However, training such models is challenging due to the high dimensionality of the state representation. In this paper, we propose the Controlled Gaussian Process Dynamical Model (CGPDM) for learning high-dimensional, nonlinear dynamics by embedding them in a low-dimensional manifold. A CGPDM consists of a low-dimensional latent space with associated dynamics on which external control variables can act, and a mapping to the observation space. The parameters of both maps are marginalized out by considering Gaussian Process (GP) priors. Hence, a CGPDM projects a high-dimensional state space into a lower-dimensional latent space, in which it is feasible to learn the system dynamics from training data. The modeling capacity of CGPDM has been tested in both a simulated and a real scenario, where it proved capable of generalizing over a wide range of movements and confidently predicting the cloth motions obtained under previously unseen sequences of control actions.
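
As a rough illustration of the idea (not the authors' implementation: a CGPDM marginalizes the map parameters under GP priors, whereas this sketch simply fits two independent GP regressors, and all data, dimensions, and kernels are made up), one could approximate a controlled latent dynamics model as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical data: Y are high-dimensional cloth states, U are control actions.
T, obs_dim, ctrl_dim, latent_dim = 200, 300, 6, 3
Y = rng.normal(size=(T, obs_dim))
U = rng.normal(size=(T, ctrl_dim))

# 1) Embed the observations in a low-dimensional latent space.
pca = PCA(n_components=latent_dim)
X = pca.fit_transform(Y)

# 2) Latent dynamics: predict x_{t+1} from (x_t, u_t).
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
dyn_gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
dyn_gp.fit(np.hstack([X[:-1], U[:-1]]), X[1:])

# 3) Observation map: predict y_t from x_t.
obs_gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
obs_gp.fit(X, Y)

# Roll the model forward under a previously unseen control sequence.
x = X[-1]
for u in rng.normal(size=(10, ctrl_dim)):
    x = dyn_gp.predict(np.hstack([x, u]).reshape(1, -1))[0]
    y_pred = obs_gp.predict(x.reshape(1, -1))[0]
print(y_pred.shape)  # reconstructed high-dimensional cloth state
```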

Evolutionary algorithms are known to be robust to noise in the evaluation of the fitness. In particular, larger offspring population sizes often lead to strong robustness. We analyze to what extent the $(1+(\lambda,\lambda))$ genetic algorithm is robust to noise. This algorithm also works with larger offspring population sizes, but an intermediate selection step and a non-standard use of crossover as a repair mechanism could render it less robust than, e.g., the simple $(1+\lambda)$ evolutionary algorithm. Our experimental analysis on several classic benchmark problems shows that this difficulty does not arise. Surprisingly, in many situations this algorithm is even more robust to noise than the $(1+\lambda)$ EA.
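
For readers unfamiliar with the algorithm, the following sketch runs the $(1+(\lambda,\lambda))$ GA on a OneMax benchmark with a simple illustrative prior-noise model; the parameter choices $p=\lambda/n$ and $c=1/\lambda$ follow the standard formulation, while the noise rate and budget are arbitrary:

```python
import random

def noisy_onemax(x, q=0.1):
    """OneMax with prior noise: with probability q one random bit is flipped
    before evaluation, so the returned fitness may be off by one."""
    y = list(x)
    if random.random() < q:
        y[random.randrange(len(y))] ^= 1
    return sum(y)

def one_plus_lambda_lambda_ga(n=100, lam=8, budget=20000):
    p, c = lam / n, 1 / lam
    x = [random.randint(0, 1) for _ in range(n)]
    evaluations = 0
    while evaluations < budget and sum(x) < n:
        # Mutation phase: flip the same number ell of bits in each of lam mutants.
        ell = sum(random.random() < p for _ in range(n))
        mutants = []
        for _ in range(lam):
            y = list(x)
            for i in random.sample(range(n), ell):
                y[i] ^= 1
            mutants.append(y)
        evaluations += lam
        x_prime = max(mutants, key=noisy_onemax)
        # Crossover phase: biased crossover between parent x and best mutant x'.
        offspring = []
        for _ in range(lam):
            offspring.append([b if random.random() < c else a
                              for a, b in zip(x, x_prime)])
        evaluations += lam
        y_best = max(offspring, key=noisy_onemax)
        # Elitist selection based on (noisy) fitness values.
        if noisy_onemax(y_best) >= noisy_onemax(x):
            x = y_best
    return sum(x), evaluations

print(one_plus_lambda_lambda_ga())
```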

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly. Existing datasets lack fine-grained annotation of toxic types and expressions, and ignore samples with indirect toxicity. In addition, introducing lexical knowledge to detect the toxicity of posts is crucial, and this has been a challenge for researchers. In this paper, we facilitate the fine-grained detection of Chinese toxic language. First, we built Monitor Toxic Frame, a hierarchical taxonomy for analyzing toxic types and expressions. Then, we present ToxiCN, a fine-grained dataset including both direct and indirect toxic samples. We also build an insult lexicon containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as a benchmark, incorporating the lexical feature to detect toxic language. In the experiments, we demonstrate the effectiveness of TKE and conclude with a systematic quantitative and qualitative analysis of the findings.
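
One simple way to picture how a lexical feature can be incorporated (this is not the authors' TKE benchmark, which injects lexical knowledge into a neural model; the lexicon and toy posts below are invented for illustration) is to append a lexicon-hit count to ordinary text features:

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical insult lexicon and toy posts (labels: 1 = toxic, 0 = non-toxic).
insult_lexicon = {"idiot", "trash", "loser"}
posts = ["you are such an idiot", "have a nice day",
         "what a loser take", "great work everyone"]
labels = [1, 0, 1, 0]

def lexicon_hits(text):
    """Count how many lexicon entries occur in the post."""
    return sum(tok in insult_lexicon for tok in text.lower().split())

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(posts)
X_lex = csr_matrix(np.array([[lexicon_hits(p)] for p in posts], dtype=float))
X = hstack([X_text, X_lex])  # lexical feature appended to the text features

clf = LogisticRegression().fit(X, labels)
test = ["what an idiot"]
X_test = hstack([vectorizer.transform(test),
                 csr_matrix([[float(lexicon_hits(test[0]))]])])
print(clf.predict(X_test))
```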

This research explores various methods for assessing user feedback in mixed-initiative conversational search (CS) systems. While CS systems have advanced rapidly across multiple aspects, recent research has largely failed to incorporate feedback from users. One of the main reasons for this is the lack of system-user conversational interaction data. To this end, we propose a user simulator-based framework for multi-turn interactions with a variety of mixed-initiative CS systems. Specifically, we develop a user simulator, dubbed ConvSim, that, once initialized with an information need description, is capable of providing feedback on a system's responses, as well as answering potential clarifying questions. Our experiments on a wide variety of state-of-the-art passage retrieval and neural re-ranking models show that effective utilization of user feedback can lead to a 16% increase in retrieval performance in terms of nDCG@3. Moreover, we observe consistent improvements as the number of feedback rounds increases (a 35% relative improvement in nDCG@3 after three rounds). This points to a research gap in the development of specific feedback processing modules and opens the potential for significant advancements in CS. To support further research on the topic, we release over 30,000 transcripts of system-simulator interactions based on well-established CS datasets.
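
For reference, nDCG@3, the metric reported above, compares the discounted cumulative gain of the top three retrieved results with that of an ideal ranking. A minimal sketch, with made-up relevance judgements:

```python
import math

def dcg_at_k(relevances, k=3):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=3):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevance judgements before and after one round of user feedback.
before = [0, 2, 1, 0, 3]
after = [3, 2, 1, 0, 0]
print(ndcg_at_k(before), ndcg_at_k(after))
```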

Object Storage Systems (OSS) inside a cloud promise scalability, durability, availability, and concurrency. However, open-source OSS offers no dedicated way for users and administrators to search based on the data contained inside the object storage without involving the entire cloud infrastructure. Therefore, in this paper, we propose Sherlock, a novel Content-Based Searching (CoBS) architecture that extracts additional information from images and documents. We store this additional information in an Elasticsearch-enabled database, which allows us to search for the desired data based on its contents. The approach works in two sequential stages. First, uploaded data is passed to a classifier that determines its type and routes it to the corresponding model: images are sent to our trained object detection model, while documents are sent for keyword extraction. Next, the extracted information is sent to Elasticsearch, which enables searching based on the contents. Because the precision of the models is fundamental to the correctness of the search, we train our models with comprehensive datasets (the Microsoft COCO Dataset for multimedia data and the SemEval2017 Dataset for document data). Furthermore, we put the designed architecture to the test with a real-world implementation of an open-source OSS called OpenStack Swift. We upload images to our implementation in various segments to assess the efficacy of the proposed model on real-life Swift object storage.
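
A minimal sketch of the indexing and search step, assuming the Elasticsearch 8.x Python client and a hypothetical index name and document schema (the extraction pipeline itself is omitted):

```python
from elasticsearch import Elasticsearch

# Hypothetical connection; the extraction pipeline (object detection for images,
# keyword extraction for documents) is assumed to have produced `metadata`.
es = Elasticsearch("http://localhost:9200")

metadata = {
    "object_name": "vacation/beach_01.jpg",    # key of the object in Swift
    "content_type": "image",
    "labels": ["person", "surfboard", "dog"],  # detector output (e.g. COCO classes)
}
es.index(index="swift-objects", document=metadata)
es.indices.refresh(index="swift-objects")

# Content-based search: find stored objects whose extracted labels mention "dog".
hits = es.search(index="swift-objects", query={"match": {"labels": "dog"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["object_name"])
```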

Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the difficulties of these tasks from the perspective of data complexity. In this paper, we conduct an empirical study to estimate the hardness of over 33,000 instances, employing a set of measures to characterize the inherent difficulty of instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes displays a right-skewed distribution, with the defective class exhibiting a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature, structural, instance, and multiresolution overlap; (3) no universal preprocessing technique is applicable to all datasets, and such techniques may not consistently reduce data complexity; fortunately, dataset complexity measures can help identify suitable techniques for specific datasets; (4) integrating data complexity information into the learning process can enhance an algorithm's learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks and provides a novel perspective for advancing research in defect prediction techniques.
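
As one concrete example of an instance hardness measure, the sketch below computes k-disagreeing neighbors (kDN), i.e. the fraction of an instance's k nearest neighbours that carry a different label; the dataset is a synthetic stand-in, not one of the defect datasets studied here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for a defect dataset: X are module metrics, y the defect labels.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

def k_disagreeing_neighbors(X, y, k=5):
    """kDN hardness: share of an instance's k nearest neighbours with a different label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)            # the first neighbour is the instance itself
    neighbour_labels = y[idx[:, 1:]]
    return (neighbour_labels != y[:, None]).mean(axis=1)

hardness = k_disagreeing_neighbors(X, y)
print("mean hardness (defective):", hardness[y == 1].mean())
print("mean hardness (clean):    ", hardness[y == 0].mean())
```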

Large language models show an emergent ability to learn a new task from a small number of input-output demonstrations. However, recent work shows that in-context learners largely rely on their pre-trained knowledge, such as the sentiment of the labels, instead of finding new associations in the input. Moreover, the commonly used few-shot evaluation settings with a random selection of in-context demonstrations cannot disentangle a model's ability to learn a new skill from demonstrations, as most of the randomly selected demonstrations do not present relations informative for prediction beyond exposing the new task distribution. To disentangle models' in-context learning ability from their memory, we introduce a Conceptual few-shot learning method that selects demonstrations sharing a possibly informative concept with the predicted sample. We extract a set of such concepts from annotated explanations and measure how much models can benefit from presenting these concepts in few-shot demonstrations. We find that smaller models are more sensitive to the presented concepts. While some of the models are able to benefit from concept-presenting demonstrations for each assessed concept, we find that none of the assessed in-context learners can benefit from all presented reasoning concepts consistently, leaving in-context concept learning an open challenge.
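
A minimal sketch of the demonstration-selection idea (the demonstration pool, concept annotations, and prompt format are invented for illustration and do not reproduce the authors' setup):

```python
import random

# Hypothetical pool of annotated demonstrations: each entry carries the reasoning
# concepts extracted from its explanation.
pool = [
    {"input": "5 apples, eats 2, how many left?", "output": "3", "concepts": {"subtraction"}},
    {"input": "3 boxes of 4 pens, how many pens?", "output": "12", "concepts": {"multiplication"}},
    {"input": "10 - 7 = ?", "output": "3", "concepts": {"subtraction"}},
]

def select_conceptual_demos(pool, target_concepts, k=2):
    """Prefer demonstrations sharing at least one concept with the predicted sample."""
    sharing = [d for d in pool if d["concepts"] & target_concepts]
    random.shuffle(sharing)
    return sharing[:k] if sharing else random.sample(pool, k)

def build_prompt(demos, query):
    lines = [f"Input: {d['input']}\nOutput: {d['output']}" for d in demos]
    return "\n\n".join(lines + [f"Input: {query}\nOutput:"])

demos = select_conceptual_demos(pool, {"subtraction"})
print(build_prompt(demos, "9 birds, 4 fly away, how many remain?"))
```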

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to achieve the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we consider both large real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
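
A rough sketch of such a fine-tuning setup, assuming a PyTorch CRNN-style recogniser with a frozen convolutional encoder and five (here synthetic) fine-tuning lines; the architecture and hyperparameters are illustrative only:

```python
import torch
import torch.nn as nn

# A minimal CRNN-style recogniser standing in for a pretrained HTR model.
class TinyHTR(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(input_size=64 * 8, hidden_size=128,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, n_classes)   # n_classes includes the CTC blank

    def forward(self, x):                        # x: (B, 1, 32, W)
        f = self.encoder(x)                      # (B, 64, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)     # (B, W/4, 64*8)
        out, _ = self.rnn(f)
        return self.head(out)                    # (B, W/4, n_classes)

n_classes = 80
model = TinyHTR(n_classes)  # pretrained weights would be loaded here

# Fine-tune only the recurrent layers and the output head on a handful of lines.
for p in model.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Five synthetic "fine-tuning lines": image tensors plus integer-encoded transcriptions.
images = torch.randn(5, 1, 32, 256)
targets = torch.randint(1, n_classes, (5, 20))
target_lengths = torch.full((5,), 20, dtype=torch.long)

for epoch in range(10):
    logits = model(images).log_softmax(-1)       # (B, T, C)
    input_lengths = torch.full((5,), logits.size(1), dtype=torch.long)
    loss = ctc(logits.permute(1, 0, 2), targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```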

Blockchain enables peer-to-peer transactions in cyberspace without a trusted third party. The rapid growth of Ethereum and smart contract blockchains in general calls for well-designed Transaction Fee Mechanisms (TFMs) to allocate limited storage and computation resources. However, existing research on TFMs has yet to consider the waiting time for transactions, which is essential for computer security and economic efficiency. Integrating data from the Ethereum blockchain and the memory pool (mempool), we explore how two types of events affect transaction latency. First, we apply regression discontinuity design (RDD) to study the causal effect of the Merge, the most recent significant upgrade of Ethereum. Our results show that the Merge significantly reduces long waiting times, network load, and market congestion. We verify the robustness of these results by inspecting other confounding factors, such as censorship and unobserved delays of transactions sent via private channels. Second, examining three major protocol changes during the Merge, we identify block interval shortening as the most plausible cause of our empirical results. Furthermore, in a mathematical model, we show that the block interval is a unique mechanism design choice for the EIP1559 TFM to achieve better security and efficiency, one that is generally applicable to market congestion caused by demand surges. Third, we apply time series analysis to study the interaction of Non-Fungible Token (NFT) drops and market congestion using Facebook Prophet, an open-source algorithm for building time-series models. Our study identifies NFT drops as a distinct source of market congestion -- a holiday effect -- beyond trend and seasonal effects. Finally, we envision three future research directions for TFMs.
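
A minimal sketch of the time-series step, modelling hypothetical NFT-drop dates as Prophet "holidays" on a made-up congestion series (column names follow Prophet's required schema; all data are placeholders):

```python
import pandas as pd
from prophet import Prophet

# Placeholder daily congestion measure (ds = timestamp, y = value to model).
congestion = pd.DataFrame({
    "ds": pd.date_range("2022-06-01", periods=120, freq="D"),
    "y": 40 + pd.Series(range(120)).mod(7) * 3.0,
})

# Hypothetical NFT-drop dates, passed to Prophet as holiday events.
nft_drops = pd.DataFrame({
    "holiday": "nft_drop",
    "ds": pd.to_datetime(["2022-06-20", "2022-07-15", "2022-08-30"]),
    "lower_window": 0,
    "upper_window": 1,   # allow the effect to persist one day after the drop
})

model = Prophet(holidays=nft_drops)   # NFT drops modelled as a "holiday" component
model.fit(congestion)
future = model.make_future_dataframe(periods=14)
forecast = model.predict(future)
# The fitted drop effect appears as a per-holiday component alongside trend/seasonality.
print(forecast[["ds", "yhat"]].tail())
```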

Language has a strong influence on our perceptions of time and rewards. This raises the question of whether large language models, when asked in different languages, show different preferences for rewards over time and if their choices are similar to those of humans. In this study, we analyze the responses of GPT-3.5 (hereafter referred to as GPT) to prompts in multiple languages, exploring preferences between smaller, sooner rewards and larger, later rewards. Our results show that GPT displays greater patience when prompted in languages with weak future tense references (FTR), such as German and Mandarin, compared to languages with strong FTR, like English and French. These findings are consistent with existing literature and suggest a correlation between GPT's choices and the preferences of speakers of these languages. However, further analysis reveals that the preference for earlier or later rewards does not systematically change with reward gaps, indicating a lexicographic preference for earlier payments. While GPT may capture intriguing variations across languages, our findings indicate that the choices made by these models do not correspond to those of human decision-makers.
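
A minimal sketch of how such cross-language prompts could be issued, assuming the openai Python client (v1 interface) and illustrative prompt wording that paraphrases, rather than reproduces, the study's prompts:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative intertemporal-choice prompts: a smaller-sooner versus a larger-later
# reward, posed in a strong-FTR and a weak-FTR language.
prompts = {
    "English (strong FTR)": "Would you prefer to receive $100 today or $120 in one month? Answer with one option.",
    "German (weak FTR)": "Möchten Sie lieber heute 100 Dollar erhalten oder in einem Monat 120 Dollar? Antworten Sie mit einer Option.",
}

for language, prompt in prompts.items():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(language, "->", response.choices[0].message.content)
```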
