精品亚洲中文一区二区三区,久草精品视频在线观看,欧美成人一区二区在线电影

Thanks to recent advances in generative AI, we are able to prompt large language models (LLMs) to produce texts which are fluent and grammatical. In addition, it has been shown that we can elicit attempts at grammatical error correction (GEC) from LLMs when prompted with ungrammatical input sentences. We evaluate how well LLMs can perform at GEC by measuring their performance on established benchmark datasets. We go beyond previous studies, which only examined GPT* models on a selection of English GEC datasets, by evaluating seven open-source and three commercial LLMs on four established GEC benchmarks. We investigate model performance and report results against individual error types. Our results indicate that LLMs do not always outperform supervised English GEC models except in specific contexts -- namely commercial LLMs on benchmarks annotated with fluency corrections as opposed to minimal edits. We find that several open-source models outperform commercial ones on minimal edit benchmarks, and that in some settings zero-shot prompting is just as competitive as few-shot prompting.

相關內容

Performer

關注 10

語言模型化 · MoDELS · 統計量 · Next · Automator ·

2024 年 2 月 27 日

Predict the Next Word:

Evgenia Ilia,Wilker Aziz

from arxiv, 22 pages, EACL 2024

Language models (LMs) are statistical models trained to assign probability to human-generated text. As such, it is reasonable to question whether they approximate linguistic variability exhibited by humans well. This form of statistical assessment is difficult to perform at the passage level, for it requires acceptability judgements (i.e., human evaluation) or a robust automated proxy (which is non-trivial). At the word level, however, given some context, samples from an LM can be assessed via exact matching against a prerecorded dataset of alternative single-word continuations of the available context. We exploit this fact and evaluate the LM's ability to reproduce variability that humans (in particular, a population of English speakers) exhibit in the 'next word prediction' task. This can be seen as assessing a form of calibration, which, in the context of text classification, Baan et al. (2022) termed calibration to human uncertainty. We assess GPT2, BLOOM and ChatGPT and find that they exhibit fairly low calibration to human uncertainty. We also verify the failure of expected calibration error (ECE) to reflect this, and as such, advise the community against relying on it in this setting.

近似 · Analysis · 正則化項 · 平滑 · 泛函 ·

2024 年 2 月 26 日

Isogeometric analysis of the Laplace eigenvalue problem on circular sectors: Regularity properties, graded meshes & variational crimes

Thomas Apel,Philipp Zilk

from arxiv, 35 pages, 19 figures

The Laplace eigenvalue problem on circular sectors has eigenfunctions with corner singularities. Standard methods may produce suboptimal approximation results. To address this issue, a novel numerical algorithm that enhances standard isogeometric analysis is proposed in this paper by using a single-patch graded mesh refinement scheme. Numerical tests demonstrate optimal convergence rates for both the eigenvalues and eigenfunctions. Furthermore, the results show that smooth splines possess a superior approximation constant compared to their $C^0$-continuous counterparts for the lower part of the Laplace spectrum. This is an extension of previous findings about excellent spectral approximation properties of smooth splines on rectangular domains to circular sectors. In addition, graded meshes prove to be particularly advantageous for an accurate approximation of a limited number of eigenvalues. The novel algorithm applied here has a drawback in the singularity of the isogeometric parameterization. It results in some basis functions not belonging to the solution space of the corresponding weak problem, which is considered a variational crime. However, the approach proves to be robust. Finally, a hierarchical mesh structure is presented to avoid anisotropic elements, omit redundant degrees of freedom and keep the number of basis functions contributing to the variational crime constant, independent of the mesh size. Numerical results validate the effectiveness of hierarchical mesh grading for the simulation of eigenfunctions with and without corner singularities.

Performer · Analysis · MoDELS · 可理解性 · 中央處理器 (CPU) ·

2024 年 2 月 24 日

Performance bottlenecks detection through microarchitectural sensitivity

Hugo Pompougnac,Alban Dutilleul,Christophe Guillon,Nicolas Derumigny,Fabrice Rastello

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of detecting them. Techniques that measure resources utilization can offer a good understanding of a program's execution, but, due to the constraints inherent to Performance Monitoring Units (PMU) of CPUs, do not provide the relevant metrics for each use case. Another approach is to rely on a performance model to simulate the CPU behavior. Such a model makes it possible to implement any new microarchitecture-related metric. Within this framework, we advocate for implementing modeled resources as parameters that can be varied at will to reveal performance bottlenecks. This allows a generalization of bottleneck analysis that we call sensitivity analysis. We present Gus, a novel performance analysis tool that combines the advantages of sensitivity analysis and dynamic binary instrumentation within a resource-centric CPU model. We evaluate the impact of sensitivity on bottleneck analysis over a set of high-performance computing kernels.

估計/估計量 · 近似 · 方差 · 輸出 · 留一法 ·

2024 年 2 月 23 日

Efficient error and variance estimation for randomized matrix computations

Ethan N. Epperly,Joel A. Tropp

from arxiv, 22 pages, 10 figures, 13 pages of supplementary material. v4: added additional supplementary material

Randomized matrix algorithms have become workhorse tools in scientific computing and machine learning. To use these algorithms safely in applications, they should be coupled with posterior error estimates to assess the quality of the output. To meet this need, this paper proposes two diagnostics: a leave-one-out error estimator for randomized low-rank approximations and a jackknife resampling method to estimate the variance of the output of a randomized matrix computation. Both of these diagnostics are rapid to compute for randomized low-rank approximation algorithms such as the randomized SVD and randomized Nystr\"om approximation, and they provide useful information that can be used to assess the quality of the computed output and guide algorithmic parameter choices.

Analysis · INFORMS · 可理解性 · 知識 (knowledge) · Extensibility ·

2024 年 2 月 23 日

Understanding metric-related pitfalls in image analysis validation

Annika Reinke,Minu D. Tizabi,Michael Baumgartner,Matthias Eisenmann,Doreen Heckmann-N?tzel,A. Emre Kavur,Tim R?dsch,Carole H. Sudre,Laura Acion,Michela Antonelli,Tal Arbel,Spyridon Bakas,Arriel Benis,Matthew Blaschko,Florian Buettner,M. Jorge Cardoso,Veronika Cheplygina,Jianxu Chen,Evangelia Christodoulou,Beth A. Cimini,Gary S. Collins,Keyvan Farahani,Luciana Ferrer,Adrian Galdran,Bram van Ginneken,Ben Glocker,Patrick Godau,Robert Haase,Daniel A. Hashimoto,Michael M. Hoffman,Merel Huisman,Fabian Isensee,Pierre Jannin,Charles E. Kahn,Dagmar Kainmueller,Bernhard Kainz,Alexandros Karargyris,Alan Karthikesalingam,Hannes Kenngott,Jens Kleesiek,Florian Kofler,Thijs Kooi,Annette Kopp-Schneider,Michal Kozubek,Anna Kreshuk,Tahsin Kurc,Bennett A. Landman,Geert Litjens,Amin Madani,Klaus Maier-Hein,Anne L. Martel,Peter Mattson,Erik Meijering,Bjoern Menze,Karel G. M. Moons,Henning Müller,Brennan Nichyporuk,Felix Nickel,Jens Petersen,Susanne M. Rafelski,Nasir Rajpoot,Mauricio Reyes,Michael A. Riegler,Nicola Rieke,Julio Saez-Rodriguez,Clara I. Sánchez,Shravya Shetty,Maarten van Smeden,Ronald M. Summers,Abdel A. Taha,Aleksei Tiulpin,Sotirios A. Tsaftaris,Ben Van Calster,Ga?l Varoquaux,Manuel Wiesenfarth,Ziv R. Yaniv,Paul F. J?ger,Lena Maier-Hein

from arxiv, Shared first authors: Annika Reinke and Minu D. Tizabi; shared senior authors: Lena Maier-Hein and Paul F. J\"ager. Published in Nature Methods. arXiv admin note: text overlap with arXiv:2206.01653

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

知識 (knowledge) · 論文 · Notability · Less · 秩 ·

2024 年 2 月 23 日

Countries pushing the boundaries of knowledge: the US dominance, China rise, and the EU stagnation

Alonso Rodriguez-Navarro

from arxiv, 18 pages, 1 figure, 6 tables

Knowing which countries contribute the most to pushing the boundaries of knowledge in science and technology has social and political importance. However, common citation metrics do not adequately measure this contribution. This measure requires more stringent metrics appropriate for the highly influential breakthrough papers that push the boundaries of knowledge, which are very highly cited but very rare. Here I used the recently described Rk index, specifically designed to address this issue. I applied this index to 25 countries and the EU across 10 key research topics, five technological and five biomedical, studying domestic and international collaborative papers independently. In technological topics, the Rk indices of domestic papers show that overall, the USA, China, and the EU are leaders; other countries are clearly behind. The USA is notably ahead of China, and the EU is far behind China. The same approach to biomedical topics shows an overwhelming dominance of the USA and that the EU is ahead of China. The analysis of internationally collaborative papers further demonstrates the US dominance. These results conflict with current country rankings based on less stringent indicators.

離散化 · Networking · 相同 · 操作 · Integration ·

2024 年 2 月 23 日

A note on the adjoint method for neural ordinary differential equation network

Pipi Hu

Perturbation and operator adjoint method are used to give the right adjoint form rigourously. From the derivation, we can have following results: 1) The loss gradient is not an ODE, it is an integral and we shows the reason; 2) The traditional adjoint form is not equivalent with the back propagation results. 3) The adjoint operator analysis shows that if and only if the discrete adjoint has the same scheme with the discrete neural ODE, the adjoint form would give the same results as BP does.

估計/估計量 · 圖 · GM · MoDELS · 得分 ·

2024 年 2 月 23 日

Estimation of partially known Gaussian graphical models with score-based structural priors

Martín Sevilla,Antonio García Marques,Santiago Segarra

from arxiv, 17 pages, 7 figures, AISTATS 2024

We propose a novel algorithm for the support estimation of partially known Gaussian graphical models that incorporates prior information about the underlying graph. In contrast to classical approaches that provide a point estimate based on a maximum likelihood or a maximum a posteriori criterion using (simple) priors on the precision matrix, we consider a prior on the graph and rely on annealed Langevin diffusion to generate samples from the posterior distribution. Since the Langevin sampler requires access to the score function of the underlying graph prior, we use graph neural networks to effectively estimate the score from a graph dataset (either available beforehand or generated from a known distribution). Numerical experiments demonstrate the benefits of our approach.

估計/估計量 · 分解的 · Weight · 有偏 · Extensibility ·

2024 年 2 月 23 日

A modified debiased inverse-variance weighted estimator in two-sample summary-data Mendelian randomization

Youpeng Su,Siqi Xu,Yilei Ma,Ping Yin,Wing Kam Fung,Peng Wang

from arxiv, 33 pages, 6 figures

Mendelian randomization uses genetic variants as instrumental variables to make causal inferences about the effects of modifiable risk factors on diseases from observational data. One of the major challenges in Mendelian randomization is that many genetic variants are only modestly or even weakly associated with the risk factor of interest, a setting known as many weak instruments. Many existing methods, such as the popular inverse-variance weighted (IVW) method, could be biased when the instrument strength is weak. To address this issue, the debiased IVW (dIVW) estimator, which is shown to be robust to many weak instruments, was recently proposed. However, this estimator still has non-ignorable bias when the effective sample size is small. In this paper, we propose a modified debiased IVW (mdIVW) estimator by multiplying a modification factor to the original dIVW estimator. After this simple correction, we show that the bias of the mdIVW estimator converges to zero at a faster rate than that of the dIVW estimator under some regularity conditions. Moreover, the mdIVW estimator has smaller variance than the dIVW estimator.We further extend the proposed method to account for the presence of instrumental variable selection and balanced horizontal pleiotropy. We demonstrate the improvement of the mdIVW estimator over the dIVW estimator through extensive simulation studies and real data analysis.

XAI · 查準率/準確率 · 相似度 · 顯著圖 · 泛化理論 ·

2022 年 5 月 17 日

A psychological theory of explainability

Scott Cheng-Hsin Yang,Tomas Folke,Patrick Shafto

from arxiv, 14 pages, 2 figures, ICML (accepted, pre camera-ready version)

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that absent explanation humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrate that our theory quantitatively matches participants' predictions of the AI.