
With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to use Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, BigCloneBench was not originally designed for semantic clone detection and has several limitations that hinder its suitability as a comprehensive training dataset for this purpose. Furthermore, the CLCDSA dataset lacks reusable examples that align with real-world software systems, rendering it inadequate for cross-language clone detection approaches. In this work, we present GPTCloneBench, a comprehensive semantic clone and cross-language clone benchmark built by exploiting SemanticCloneBench and OpenAI's GPT-3 model. In particular, using code fragments from SemanticCloneBench as sample inputs, along with appropriate prompt engineering for the GPT-3 model, we generate semantic and cross-language clones for these fragments and then apply a combination of extensive manual analysis, tool-assisted filtering, functionality testing, and automated validation to build the benchmark. From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python). Our benchmark is 15-fold larger than SemanticCloneBench, offers more functional code examples and broader programming language support than CLCDSA, and overcomes BigCloneBench's limitations in quality, quantity, and language variety.
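
To make the clone-generation step more concrete, the following minimal sketch shows how a SemanticCloneBench-style fragment might be sent to a GPT-3 completion model to request a cross-language clone candidate. The prompt wording, the "text-davinci-003" model name, and the example fragment are illustrative assumptions and do not reproduce the exact GPTCloneBench pipeline; generated candidates would still need the manual analysis, tool-assisted filtering, functionality testing, and automated validation described above.

```python
# Minimal sketch (assumptions noted in comments): prompt a GPT-3 completion model
# with a SemanticCloneBench-style fragment and request a semantically equivalent
# implementation in another language. Uses the legacy (<1.0) openai client.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

JAVA_FRAGMENT = """\
public static int factorial(int n) {
    return (n <= 1) ? 1 : n * factorial(n - 1);
}
"""

def generate_clone(fragment: str, target_language: str) -> str:
    """Ask the model for a fragment in target_language with the same behaviour."""
    prompt = (
        f"Rewrite the following Java function in {target_language}, preserving its "
        "exact input/output behaviour. Return only code.\n\n" + fragment
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 completion model
        prompt=prompt,
        max_tokens=256,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

if __name__ == "__main__":
    candidate = generate_clone(JAVA_FRAGMENT, "Python")
    print(candidate)  # candidate clone; validation and filtering still required
```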

Related Content

Grounding has been argued to be a crucial component in the development of more complete and truly semantically competent artificial intelligence systems. The literature is divided into two camps: while some argue that grounding allows for qualitatively different generalizations, others believe it can be compensated for by mono-modal data quantity. Limited empirical evidence has emerged for or against either position, which we argue is due to the methodological challenges that come with studying grounding and its effects on NLP systems. In this paper, we establish a methodological framework for studying what the effects, if any, are of providing models with richer input sources than text alone. The crux of it lies in the construction of comparable samples of populations of models trained on different input modalities, so that we can tease apart the qualitative effects of different input sources from quantifiable model performance. Experiments using this framework reveal qualitative differences in model behavior between cross-modally grounded, cross-lingually grounded, and ungrounded models, which we measure both at a global dataset level and for specific word representations, depending on how concrete their semantics is.

Five thousand variations of the RoBERTa model, an artificially intelligent "transformer" that can process text, completed an English literacy exam with 29 multiple-choice questions. The response data were used to calculate the psychometric properties of the items, which showed some degree of agreement with those obtained from human examinee data.
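
As a rough illustration of the item-level psychometrics involved, the sketch below computes two classical test theory statistics, item difficulty (proportion correct) and point-biserial discrimination, from a binary response matrix. The simulated responses and the choice of statistics are assumptions for illustration only, not the study's actual analysis.

```python
# Minimal sketch: classical test theory item statistics from a binary response
# matrix (rows = model variants, columns = exam items). Simulated data only;
# the study's actual psychometric analysis may differ.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(5000, 29))  # 5000 "examinees", 29 items

total_scores = responses.sum(axis=1)

# Item difficulty: proportion of examinees answering each item correctly.
difficulty = responses.mean(axis=0)

def point_biserial(item: np.ndarray, totals: np.ndarray) -> float:
    """Correlation between one item and the rest-of-test score."""
    rest = totals - item  # exclude the item itself to avoid inflating the correlation
    return float(np.corrcoef(item, rest)[0, 1])

discrimination = np.array(
    [point_biserial(responses[:, j], total_scores) for j in range(responses.shape[1])]
)

print("item difficulty (first 5):", np.round(difficulty[:5], 3))
print("item discrimination (first 5):", np.round(discrimination[:5], 3))
```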

We provide a new characterization of both belief update and belief revision in terms of a Kripke-Lewis semantics. We consider frames consisting of a set of states, a Kripke belief relation and a Lewis selection function. Adding a valuation to a frame yields a model. Given a model and a state, we identify the initial belief set K with the set of formulas that are believed at that state and we identify either the updated belief set or the revised belief set, prompted by the input represented by formula A, as the set of formulas that are the consequent of conditionals that (1) are believed at that state and (2) have A as antecedent. We show that this class of models characterizes both the Katsuno-Mendelzon (KM) belief update functions and the AGM belief revision functions, in the following sense: (1) each model gives rise to a partial belief function that can be completed into a full KM/AGM update/revision function, and (2) for every KM/AGM update/revision function there is a model whose associated belief function coincides with it. The difference between update and revision can be reduced to two semantic properties that appear in a stronger form in revision relative to update, thus confirming the finding by Peppas et al. (1996) that "for a fixed theory K, revising K is much the same as updating K".
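
Under one natural reading of this construction (notation assumed here for illustration, not taken verbatim from the paper), with B the belief modality at the fixed state s of a model M, > the Lewis conditional given by the selection function, and the circle standing for either update or revision, the two belief sets can be written as:

```latex
% Notation assumed for illustration: B = belief at state s, > = Lewis conditional,
% \circ = update or revision, depending on the constraints on the selection function.
K         = \{\, \phi \mid (M,s) \models B\phi \,\}
K \circ A = \{\, \psi \mid (M,s) \models B(A > \psi) \,\}
```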

In indoor scenes, reverberation is a crucial factor in degrading the perceived quality and intelligibility of speech. In this work, we propose a generative dereverberation method. Our approach is based on a probabilistic model utilizing a recurrent variational auto-encoder (RVAE) network and the convolutive transfer function (CTF) approximation. Unlike most previous approaches, the output of our RVAE serves as the prior of the clean speech, and our target is the maximum a posteriori (MAP) estimate of the clean speech, obtained iteratively through the expectation-maximization (EM) algorithm. The proposed method integrates the capabilities of network-based speech prior modelling and CTF-based observation modelling. Experiments on single-channel speech dereverberation show that the proposed generative method noticeably outperforms advanced discriminative networks.

The supersingular Endomorphism Ring problem is the following: given a supersingular elliptic curve, compute all of its endomorphisms. The presumed hardness of this problem is foundational for isogeny-based cryptography. The One Endomorphism problem only asks to find a single non-scalar endomorphism. We prove that these two problems are equivalent, under probabilistic polynomial time reductions. We prove a number of consequences. First, assuming the hardness of the endomorphism ring problem, the Charles--Goren--Lauter hash function is collision resistant, and the SQIsign identification protocol is sound. Second, the endomorphism ring problem is equivalent to the problem of computing arbitrary isogenies between supersingular elliptic curves, a result previously known only for isogenies of smooth degree. Third, there exists an unconditional probabilistic algorithm to solve the endomorphism ring problem in time O~(sqrt(p)), a result that previously required assuming the generalized Riemann hypothesis. To prove our main result, we introduce a flexible framework for the study of isogeny graphs with additional information. We prove a general and easy-to-use rapid mixing theorem. The proof of this result goes via an augmented Deuring correspondence and the Jacquet-Langlands correspondence.

We consider a new splitting based on the Sherman-Morrison-Woodbury formula, which is particularly effective with iterative methods for the numerical solution of large linear systems. These systems involve matrices that are perturbations of circulant or block circulant matrices, which commonly arise in the discretization of differential equations using finite element or finite difference methods. We prove the convergence of the new iteration without making any assumptions regarding the symmetry or diagonal dominance of the matrix. To illustrate the efficacy of the new iteration, we present various applications, including extensions of the new iteration to block matrices that arise in certain saddle point problems as well as two-dimensional finite difference discretizations. The new method exhibits fast convergence in all of the test cases we used. It has minimal storage requirements, a straightforward implementation, and compatibility with nearly circulant matrices via the Fast Fourier Transform. For these reasons, it can be a valuable tool for the solution of various finite element and finite difference discretizations of differential equations.
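
For readers unfamiliar with the two ingredients, the sketch below solves a low-rank perturbation of a circulant system using FFT-based circulant solves and the Sherman-Morrison-Woodbury identity. It is a direct solve meant only to illustrate those ingredients; it is not the paper's splitting iteration, and the periodic-Laplacian test matrix is an assumption made for the demo.

```python
# Minimal sketch (not the paper's splitting iteration): solve (C + U V^T) x = b,
# where C is circulant, using FFT-based circulant solves and the
# Sherman-Morrison-Woodbury identity.
import numpy as np
from scipy.linalg import circulant  # only needed for the consistency check below

def circulant_solve(c, b):
    """Solve C x = b, with C the circulant matrix whose first column is c."""
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))

def smw_solve(c, U, V, b):
    """Solve (C + U V^T) x = b via Sherman-Morrison-Woodbury:
    x = C^{-1} b - C^{-1} U (I + V^T C^{-1} U)^{-1} V^T C^{-1} b."""
    Cinv_b = circulant_solve(c, b)
    Cinv_U = np.column_stack([circulant_solve(c, U[:, j]) for j in range(U.shape[1])])
    S = np.eye(U.shape[1]) + V.T @ Cinv_U  # small "capacitance" matrix
    return Cinv_b - Cinv_U @ np.linalg.solve(S, V.T @ Cinv_b)

if __name__ == "__main__":
    n, k = 64, 2
    c = np.zeros(n)
    c[0], c[1], c[-1] = 3.0, -1.0, -1.0  # periodic 1D Laplacian plus identity
    rng = np.random.default_rng(1)
    U = 0.1 * rng.normal(size=(n, k))
    V = 0.1 * rng.normal(size=(n, k))
    b = rng.normal(size=n)
    x = smw_solve(c, U, V, b)
    print(np.allclose((circulant(c) + U @ V.T) @ x, b))  # expect True
```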

The Naïve Mean Field (NMF) approximation is widely employed in modern Machine Learning due to the huge computational gains it bestows on the statistician. Despite its popularity in practice, theoretical guarantees for high-dimensional problems are only available under strong structural assumptions (e.g., sparsity). Moreover, existing theory often does not explain empirical observations noted in the existing literature. In this paper, we take a step towards addressing these problems by deriving sharp asymptotic characterizations for the NMF approximation in high-dimensional linear regression. Our results apply to a wide class of natural priors and allow for model mismatch (i.e., the underlying statistical model can be different from the fitted model). We work under an i.i.d. Gaussian design and the proportional asymptotic regime, where the number of features and the number of observations grow at a proportional rate. As a consequence of our asymptotic characterization, we establish two concrete corollaries: (a) we establish the inaccuracy of the NMF approximation for the log-normalizing constant in this regime, and (b) we provide theoretical results backing the empirical observation that the NMF approximation can be overconfident in terms of uncertainty quantification. Our results utilize recent advances in the theory of Gaussian comparison inequalities. To the best of our knowledge, this is the first application of these ideas to the analysis of Bayesian variational inference problems. Our theoretical results are corroborated by numerical experiments. Lastly, we believe our results can be generalized to non-Gaussian designs and provide empirical evidence to support this.
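
For background, and with notation assumed here rather than taken from the paper, the naive mean field approximation in Bayesian linear regression can be written as the best fully factorized approximation to the posterior in Kullback-Leibler divergence, together with the evidence lower bound it maximizes:

```latex
% Notation assumed for illustration: \beta \in \mathbb{R}^p is the regression vector,
% p(\beta \mid y, X) the posterior under the fitted model, Z = p(y \mid X), H(q) the entropy of q.
\hat{q}^{\,\mathrm{NMF}}
  = \operatorname*{arg\,min}_{\,q = \prod_{i=1}^{p} q_i}
    \mathrm{KL}\!\left( q \,\middle\|\, p(\beta \mid y, X) \right),
\qquad
\log Z \;\ge\; \mathbb{E}_{q}\!\left[ \log p(y, \beta \mid X) \right] + H(q).
```

Corollary (a) above concerns the accuracy of this variational bound as an estimate of the log-normalizing constant log Z.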

While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance.
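
As background on the evaluation setting mentioned in the first sentence, the sketch below implements the standard vector-offset test for word-embedding analogies ("a is to b as c is to ?"). The toy vectors are made-up values for illustration; the paper's contribution, learning analogical reasoning as a task in itself, is not implemented here.

```python
# Minimal sketch of the standard vector-offset analogy test for word embeddings.
# The tiny embedding table holds made-up values, not trained vectors.
import numpy as np

emb = {
    "man":   np.array([0.9, 0.1, 0.0]),
    "woman": np.array([0.9, 0.9, 0.0]),
    "king":  np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.1, 0.9, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def solve_analogy(a, b, c, vocab):
    """Return the word d maximizing cos(d, b - a + c), excluding a, b and c."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    candidates = [w for w in vocab if w not in {a, b, c}]
    return max(candidates, key=lambda w: cos(emb[w], target))

print(solve_analogy("man", "woman", "king", emb))  # expected: "queen"
```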

Natural language processing (NLP) has made significant progress for well-resourced languages such as English but has lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpus for PuoBERTa's training. Building upon previous efforts in creating monolingual resources for Setswana, we evaluated PuoBERTa across several NLP tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation. Additionally, we introduced a new Setswana news categorisation dataset and provided initial benchmarks using PuoBERTa. Our work demonstrates the efficacy of PuoBERTa in fostering NLP capabilities for understudied languages like Setswana and paves the way for future research directions.
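
A minimal sketch of how a masked language model such as PuoBERTa would typically be queried is shown below, assuming it is published on the Hugging Face Hub under an identifier like "dsfsi/PuoBERTa"; the exact identifier and the example sentence are assumptions, and this is not the paper's evaluation setup.

```python
# Minimal sketch, assuming a Hub identifier like "dsfsi/PuoBERTa" (assumption):
# query a masked language model for Setswana with the fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="dsfsi/PuoBERTa")

# Example Setswana sentence ("I like to read ..."); the mask token is taken from
# the tokenizer so the snippet works regardless of the mask-token convention.
masked = f"Ke rata go bala {fill_mask.tokenizer.mask_token}."

for prediction in fill_mask(masked):
    print(prediction["token_str"], round(prediction["score"], 3))
```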

Historical materials are abundant. Yet, piecing together how human knowledge has evolved and spread, both diachronically and synchronically, remains a challenge that can so far only be addressed very selectively. The vast volume of materials precludes comprehensive studies, given the restricted number of human specialists. However, as large amounts of historical materials are now available in digital form, there is a promising opportunity for AI-assisted historical analysis. In this work, we take a pivotal step towards analyzing vast historical corpora by employing innovative machine learning (ML) techniques, enabling in-depth historical insights on a grand scale. Our study centers on the evolution of knowledge within the `Sacrobosco Collection' -- a digitized collection of 359 early modern printed editions of textbooks on astronomy used at European universities between 1472 and 1650 -- roughly 76,000 pages, many of which contain astronomical computational tables. An ML-based analysis of these tables helps to unveil important facets of the spatio-temporal evolution of knowledge and innovation in mathematical astronomy in this period, as taught at European universities.
