Performer · Correlation coefficient · Analysis · Data analysis · Natural language processing ·

19 December 2023

Is post-editing really faster than human translation?

Silvia Terribile

From arXiv, 30 pages, 11 tables, 7 figures. This article has been published in Translation Spaces. This is the author accepted manuscript; the published version is available at https://doi.org/10.1075/ts.22044.ter

Time efficiency is paramount for the localisation industry, which demands ever-faster turnaround times. However, translation speed is largely under-researched, and there is a lack of clarity about how language service providers (LSPs) can evaluate the performance of their post-editing (PE) and human translation (HT) services. This study constitutes the first large-scale investigation of translation and revision speed in HT and in the PE of neural machine translation, based on real-world data from an LSP. It uses an exploratory data analysis approach to investigate data for 90 million words translated by 879 linguists across 11 language pairs, over 2.5 years. The results of this research indicate that (a) PE is usually but not always faster than HT; (b) average speed values may be misleading; (c) translation speed is highly variable; and (d) edit distance cannot be used as a proxy for post-editing productivity, because it does not correlate strongly with speed.
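
To make findings (b) and (d) concrete, the sketch below shows how an analyst might compute the quantities involved: a word-level edit distance between raw MT output and its post-edited version, a Pearson correlation between edit distance and speed, and the mean versus the median of a skewed set of per-job speeds. It is a minimal illustration under assumptions, not the study's pipeline: word-level Levenshtein distance is an assumed metric, and the function names and all numbers are hypothetical.

```python
from statistics import mean, median


def word_edit_distance(mt: str, post_edited: str) -> int:
    """Word-level Levenshtein distance (an assumed metric, not necessarily the study's)."""
    a, b = mt.split(), post_edited.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(mt: str, post_edited: str) -> float:
    """Edit distance divided by the longer segment length (0.0 means left untouched)."""
    longest = max(len(mt.split()), len(post_edited.split()), 1)
    return word_edit_distance(mt, post_edited) / longest


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-job speeds in words per hour: one very fast job skews the mean.
speeds = [400.0, 450.0, 500.0, 520.0, 3000.0]
print(mean(speeds), median(speeds))  # prints 974.0 500.0
```

On real job-level data, `pearson(distances, speeds)` close to zero would be consistent with finding (d), and the gap between mean and median above shows why a single average speed can misrepresent the typical job.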

Related content

Commentator · Dataset · Phoneme · Similarity · Prompt ·

8 February 2024

Phonetically rich corpus construction for a low-resourced language

Marcellus Amadeus,William Alberto Cruz Castañeda,Wilmer Lobato,Niasche Aquino

Speech technologies rely on capturing a speaker's voice variability while obtaining comprehensive language information. Textual prompts and sentence selection methods have been proposed in the literature to compile adequate phonetic data of this kind, referred to as a phonetically rich corpus. However, these methods are still insufficient for acoustic modeling, which is especially critical for languages with limited resources. Hence, this paper proposes a novel approach and outlines the methodological aspects required to create a corpus with broad phonetic coverage for a low-resourced language, Brazilian Portuguese. Our methodology spans from text dataset collection to a sentence selection algorithm based on triphone distribution. Furthermore, we propose a new phonemic classification according to acoustic-articulatory speech features, since the absolute number of distinct triphones, or of low-probability triphones, does not guarantee an adequate representation of every possible combination. Using our algorithm, we achieve a 55.8% higher percentage of distinct triphones, for samples of similar size, whereas the currently available phonetically rich corpora, CETUC and TTS-Portuguese, achieve 12.6% and 12.3%, respectively, in comparison to a non-phonetically rich dataset.
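
The abstract names a sentence selection algorithm based on triphone distribution but gives no details. As a rough illustration only, the sketch below uses a simple greedy strategy: repeatedly add the sentence that contributes the most triphones not yet covered. It assumes sentences are already transcribed into phoneme lists (single letters stand in for phonemes here) and ignores the paper's acoustic-articulatory classes; `triphones` and `greedy_select` are hypothetical names, not the authors' code.

```python
def triphones(phones: list[str]) -> set[tuple[str, str, str]]:
    """All consecutive phoneme triples in one transcribed sentence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_select(corpus: dict[str, list[str]], budget: int) -> list[str]:
    """Pick up to `budget` sentence IDs, each time taking the one adding the most new triphones."""
    covered: set[tuple[str, str, str]] = set()
    chosen: list[str] = []
    candidates = {sid: triphones(ph) for sid, ph in corpus.items()}
    while candidates and len(chosen) < budget:
        # Sentence contributing the largest number of triphones not yet covered.
        best = max(candidates, key=lambda sid: len(candidates[sid] - covered))
        gain = candidates[best] - covered
        if not gain:
            break  # no remaining sentence adds anything new
        covered |= gain
        chosen.append(best)
        del candidates[best]
    return chosen


# Hypothetical toy corpus: letters stand in for phoneme symbols.
corpus = {
    "s1": list("abcabc"),
    "s2": list("abcd"),
    "s3": list("xyzw"),
}
print(greedy_select(corpus, budget=2))  # prints ['s1', 's3']
```

Greedy coverage is a standard heuristic for set-cover-style selection; a closer match to the paper would weight triphones by their distribution or group them by the proposed phonemic classes rather than counting each one equally.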

AI · Principle · Design · Continuity · Conformer ·

8 February 2024

POLARIS: A framework to guide the development of Trustworthy AI systems

Maria Teresa Baldassarre,Domenico Gigante,Marcos Kalinowski,Azzurra Ragone

In the ever-expanding landscape of Artificial Intelligence (AI), where innovation thrives and new products and services are continuously being delivered, ensuring that AI systems are designed and developed responsibly throughout their entire lifecycle is crucial. To this end, several AI ethics principles and guidelines have been issued to which AI systems should conform. Nevertheless, relying solely on high-level AI ethics principles is far from sufficient to ensure the responsible engineering of AI systems. In this field, AI professionals often navigate by sight. Indeed, while recommendations promoting Trustworthy AI (TAI) exist, these are often high-level statements that are difficult to translate into concrete implementation strategies. There is a significant gap between high-level AI ethics principles and low-level concrete practices for AI professionals. To address this challenge, our work presents an experience report where we develop a novel holistic framework for Trustworthy AI, designed to bridge the gap between theory and practice, and report insights from its application in an industrial case study. The framework is built on the results of a systematic review of the state of the practice, a survey, and think-aloud interviews with 34 AI practitioners. The framework, unlike most of those already in the literature, is designed to provide actionable guidelines and tools to support different types of stakeholders throughout the entire Software Development Life Cycle (SDLC). Our goal is to empower AI professionals to confidently navigate the ethical dimensions of TAI through practical insights, ensuring that the vast potential of AI is exploited responsibly for the benefit of society as a whole.

INFORMS · Integration · Language modeling · Directed · Large language model ·

7 February 2024

Navigating the Knowledge Sea: Planet-scale answer retrieval using LLMs

Dipankar Sarkar

Information retrieval is a rapidly evolving field, characterized by the continuous refinement of techniques and technologies, from basic hyperlink-based navigation to sophisticated algorithm-driven search engines. This paper aims to provide a comprehensive overview of the evolution of information retrieval technology, with a particular focus on the role of Large Language Models (LLMs) in bridging the gap between traditional search methods and the emerging paradigm of answer retrieval. The integration of LLMs into response retrieval and indexing signifies a paradigm shift in how users interact with information systems. This shift is driven by LLMs such as GPT-4, which can understand and generate human-like text and can therefore provide more direct and contextually relevant answers to user queries. Through this exploration, we seek to illuminate the technological milestones that have shaped this journey and the potential future directions in this rapidly changing field.

QoS · Resource management · Example · Excel · Homogeneous ·

7 February 2024

Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding

Luis Costero,Francisco D. Igual,Katzalin Olcoz,Francisco Tirado

The coexistence of parallel applications in shared computing nodes, each with different Quality of Service (QoS) requirements, poses new challenges for improving resource occupation while keeping QoS at acceptable levels. As more application-specific and system-wide metrics are included as QoS dimensions, or when resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications automatically and optimally becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results show an excellent adaptation of resource and knob management to heterogeneous QoS requests, and an increase of up to 1.24x in the number of concurrently served users compared with alternative approaches that consider homogeneous QoS requests.
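
The paper builds and serves this knowledge with Reinforcement Learning. The sketch below is a deliberately simpler stand-in, an epsilon-greedy multi-armed bandit that learns which (cores, encoder preset) configuration best satisfies one transcoding request. The action set, the `simulated_reward` function, its frame-rate target and all constants are hypothetical assumptions, not the authors' model or their knowledge-as-a-service design.

```python
import random

# Hypothetical action space: (number of cores, encoder preset) for one transcoding request.
ACTIONS = [(cores, preset) for cores in (1, 2, 4) for preset in ("fast", "slow")]


def simulated_reward(action: tuple[int, str], rng: random.Random,
                     target_fps: float = 40.0) -> float:
    """Toy stand-in for a real measurement: reward meeting a QoS target (frames per
    second) and penalise the number of cores used. All constants are made up."""
    cores, preset = action
    fps = cores * (20.0 if preset == "fast" else 9.0) + rng.gauss(0.0, 1.0)
    qos_penalty = abs(fps - target_fps) / target_fps
    return 1.0 - qos_penalty - 0.05 * cores


def epsilon_greedy(steps: int = 2000, epsilon: float = 0.1,
                   seed: int = 0) -> tuple[int, str]:
    """Keep a running mean reward per action and return the best-estimated action."""
    rng = random.Random(seed)
    counts = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                    # explore
        else:
            action = max(ACTIONS, key=lambda a: values[a])  # exploit current estimates
        reward = simulated_reward(action, rng)
        counts[action] += 1
        # Incremental mean update: V <- V + (r - V) / n
        values[action] += (reward - values[action]) / counts[action]
    return max(ACTIONS, key=lambda a: values[a])


print(epsilon_greedy())
```

Under these made-up constants the expected reward is highest for `(2, "fast")`, which delivers about 40 fps on two cores, so the learner settles on it. A real system would replace `simulated_reward` with measured QoS and resource usage, and the paper's RL formulation handles concurrent applications rather than a single-state bandit.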

Google Scholar · Google · bulk · Example · Compiler ·

7 February 2024

Google Scholar is manipulatable

Hazem Ibrahim,Fengyuan Liu,Yasir Zaki,Talal Rahwan

Citations are widely considered in the evaluation of scientists. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation cartels, it remains unclear whether scientists can purchase citations. Here, we compile a dataset of ~1.6 million profiles on Google Scholar to examine instances of citation fraud on the platform. We survey faculty at highly ranked universities and confirm that Google Scholar is widely used when evaluating scientists. Intrigued by a citation-boosting service that we uncovered during our investigation, we contacted the service while undercover as a fictional author, and managed to purchase 50 citations. These findings provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

Language modeling · TOOLS · MoDELS · Statistic · AI ·

6 February 2024

AI language models as role-playing tools, not human participants

From arXiv, 6 pages, 1 table

Advances in AI invite misuse of language models as replacements for human participants. We argue that treating their responses as glimpses into an average human mind fundamentally mischaracterizes these statistical algorithms and that language models should be embraced as flexible simulation tools, able to mimic diverse behaviors without possessing human traits themselves.

AI · Learning · Agent · AGI · Principle ·

6 February 2024

A call for embodied AI

Giuseppe Paolo,Jonas Gonzalez-Billandon,Balázs Kégl

From arXiv, submitted to the ICML 2024 position paper track

We propose Embodied AI (EAI) as the next fundamental step in the pursuit of Artificial General Intelligence (AGI), juxtaposing it against current AI advancements, particularly Large Language Models. We trace the evolution of the embodiment concept across diverse fields (philosophy, psychology, neuroscience, and robotics) to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges persist, such as the formulation of a novel AI learning theory and the innovation of advanced hardware. Our discussion lays down a foundational guideline for future Embodied AI research. Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.

Taxonomy · Inference · Lecture notes ·

6 February 2024

Privacy risk in GeoData: A survey

Mahrokh Abdollahi Lorestani,Thilina Ranbaduge,Thierry Rakotoarivelo

With the ubiquitous use of location-based services, large-scale individual-level location data has been widely collected through location-aware devices. The exposure of location data poses a significant privacy risk to users, as it can lead to de-anonymisation, the inference of sensitive information, and even physical threats. Geoprivacy concerns centre on two issues: the de-anonymisation of user identities and the exposure of locations. In this survey, we analyse different geomasking techniques that have been proposed to protect the privacy of individuals in geodata. We present a taxonomy to characterise these techniques along different dimensions, and conduct a survey of geomasking techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.
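
To give a concrete sense of what a geomasking technique does, below is a minimal sketch of donut masking, a widely used approach in which each point is moved in a random direction by a distance drawn between a minimum and a maximum radius. It is one example of the family the survey covers, not a method the survey proposes; the radii, the example coordinates and the flat-earth (equirectangular) approximation are assumptions that only suit small displacements away from the poles.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def donut_mask(lat: float, lon: float, r_min_m: float, r_max_m: float,
               rng: random.Random) -> tuple[float, float]:
    """Move a point by a random distance in [r_min_m, r_max_m] metres in a random direction."""
    # Drawing the squared radius uniformly spreads points evenly over the annulus area.
    distance = math.sqrt(rng.uniform(r_min_m ** 2, r_max_m ** 2))
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    d_north = distance * math.cos(bearing)
    d_east = distance * math.sin(bearing)
    # Equirectangular approximation: adequate for offsets of a few kilometres.
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lon


def approx_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular distance in metres, used here only to check the displacement."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


rng = random.Random(7)
original = (51.5007, -0.1246)  # an arbitrary example location
masked = donut_mask(*original, r_min_m=100.0, r_max_m=500.0, rng=rng)
print(masked, round(approx_distance_m(*original, *masked)))
```

The inner radius guarantees that a masked point never lands too close to the true location, while the outer radius bounds the loss of spatial accuracy; choosing both is the privacy/utility trade-off that such taxonomies characterise.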

Attention · Support vector machine · Long short-term memory network · Support vector · SimPLe ·

5 February 2024

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility

Miguel Fernández-Díaz,Ascensión Gallardo-Antolín

Speech intelligibility can be degraded by multiple factors, such as noisy environments, technical difficulties or biological conditions. This work focuses on the development of an automatic non-intrusive system for predicting the speech intelligibility level in the latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the most relevant frames for this task. The proposed models are evaluated with the UA-Speech database, which contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with mean pooling.
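
The abstract contrasts a simple attention mechanism with mean pooling over the LSTM's frame-level outputs. Below is a minimal, framework-free sketch of both pooling operations, assuming a common formulation in which each frame's hidden state is scored with a learned vector `w`, the scores are normalised with a softmax over time, and the utterance embedding is the weighted sum. The paper's exact scoring function may differ, and the hidden states and `w` here are made-up values.

```python
import math


def mean_pool(frames: list[list[float]]) -> list[float]:
    """Baseline: average the per-frame hidden states over time."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def softmax(scores: list[float]) -> list[float]:
    """Normalise scores into weights that sum to 1 (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: list[list[float]],
                   w: list[float]) -> tuple[list[float], list[float]]:
    """Score each frame with the vector w, softmax over time, return (pooled, weights)."""
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w)) for h in frames]
    weights = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(len(frames[0]))]
    return pooled, weights


# Three frames of 2-D hidden states (made-up values); the middle frame scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.1]]
pooled, weights = attention_pool(frames, w=[1.0, 1.0])
print(weights)  # most of the weight falls on the middle frame
print(mean_pool(frames))
```

Setting `w` to all zeros gives every frame the same weight, so attention pooling reduces to mean pooling; learning `w` is what lets the model emphasise the frames most informative about intelligibility.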

Machine Learning · Speech recognition · Learning · Nearest neighbor · MATLAB ·

1 February 2024

Introduction to speech recognition

Gabriel Dauphin

From arXiv, in French

This document contains lectures and practical experiments using Matlab, implementing a system that correctly classifies three words (one, two and three) with the help of a very small database. To achieve this performance, it uses specific properties of speech modeling, powerful computer algorithms (dynamic time warping and Dijkstra's algorithm) and machine learning (nearest neighbor). The document also introduces some machine learning evaluation metrics.
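
As a language-neutral illustration of the approach described above, here is a minimal Python sketch of dynamic time warping combined with a 1-nearest-neighbor classifier. It assumes each utterance has already been reduced to a 1-D feature sequence (real systems use multi-dimensional frames such as MFCCs), and the templates are toy values, not recordings. It uses the standard dynamic-programming recurrence for DTW rather than the shortest-path (Dijkstra) formulation the document also mentions.

```python
import math


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # Row/column of infinities force the warping path to start at the first frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: diagonal match, step in a only, step in b only.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def classify(utterance: list[float], templates: list[tuple[str, list[float]]]) -> str:
    """1-nearest neighbor: return the label of the template with the smallest DTW distance."""
    label, _ = min(templates, key=lambda t: dtw_distance(utterance, t[1]))
    return label


# Hypothetical one-template-per-word "database".
templates = [
    ("one", [0.0, 1.0, 2.0, 1.0, 0.0]),
    ("two", [0.0, 2.0, 0.0, 2.0, 0.0]),
    ("three", [3.0, 3.0, 1.0, 1.0, 0.0]),
]
# A slower rendition of "one": the same contour stretched in time.
print(classify([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0], templates))  # prints one
```

Because DTW lets one frame align with several frames of the other sequence, the time-stretched utterance matches the "one" template at zero cost, which is why a nearest-neighbor rule over DTW distances copes with differences in speaking rate.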
