Recent investigations show that large language models (LLMs), specifically GPT-4, not only possess remarkable capabilities in common natural language processing (NLP) tasks but also exhibit human-level performance on various professional and academic benchmarks. However, whether GPT-4 can be directly used in practical applications and replace traditional artificial intelligence (AI) tools in specialized domains requires further experimental validation. In this paper, we explore the potential of LLMs such as GPT-4 to outperform traditional AI tools in dementia diagnosis. We conduct comprehensive comparisons between GPT-4 and traditional AI tools to examine their diagnostic accuracy in a clinical setting. Experimental results on two real clinical datasets show that, although LLMs like GPT-4 demonstrate potential for future advancements in dementia diagnosis, they do not currently surpass the performance of traditional AI tools. The interpretability and faithfulness of GPT-4 are also evaluated through comparison with real doctors. We discuss the limitations of GPT-4 in its current state and propose future research directions to enhance its use in dementia diagnosis.
Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis. While current research focuses on new training paradigms and network architectures, little attention is given to the specific effect of prevalence shifts on an algorithm deployed in practice. Such discrepancies between the class frequencies in the data used for a method's development and validation and those in its deployment environment(s) are of great importance, for example in the context of artificial intelligence (AI) democratization, as disease prevalences may vary widely across time and location. Our contribution is twofold. First, we empirically demonstrate the potentially severe consequences of neglecting prevalence handling by analyzing (i) the extent of miscalibration, (ii) the deviation of the decision threshold from the optimum, and (iii) the ability of validation metrics to reflect neural network performance on the deployment population, each as a function of the discrepancy between development and deployment prevalence. Second, we propose a workflow for prevalence-aware image classification that uses estimated deployment prevalences to adjust a trained classifier to a new environment, without requiring additional annotated deployment data. Comprehensive experiments based on a diverse set of 30 medical classification tasks showcase the benefit of the proposed workflow in generating better classifier decisions and more reliable performance estimates compared to current practice.
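As a minimal sketch of the kind of prior-shift correction such a workflow could rely on, the snippet below re-weights a trained classifier's predicted probabilities by the ratio of estimated deployment to development prevalences; the function name, variables, and toy numbers are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def adjust_for_prevalence(probs, dev_prev, dep_prev):
    """Re-weight predicted class probabilities by the ratio of estimated
    deployment prevalence to development prevalence, then renormalize.

    probs:    (n_samples, n_classes) predicted probabilities from the trained model
    dev_prev: (n_classes,) class prevalences in the development data
    dep_prev: (n_classes,) estimated class prevalences in the deployment environment
    """
    ratio = np.asarray(dep_prev) / np.asarray(dev_prev)
    adjusted = probs * ratio                         # Bayes-rule re-weighting of posteriors
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Illustrative use: a disease that is rarer in deployment than in development.
probs = np.array([[0.30, 0.70], [0.80, 0.20]])       # columns: [healthy, disease]
adjusted = adjust_for_prevalence(probs, dev_prev=[0.5, 0.5], dep_prev=[0.9, 0.1])
decisions = adjusted.argmax(axis=1)                  # decisions now reflect deployment prevalence
```

After this adjustment, the decision threshold (here implicit in the argmax) is aligned with the deployment population rather than the development one, which is exactly the kind of mismatch the abstract identifies.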
To enhance the quality of generated stories, recent work on story generation has investigated the use of higher-level attributes such as plots or commonsense knowledge. Prompt-based learning with large language models (LLMs), exemplified by GPT-3, has exhibited remarkable performance on diverse natural language processing (NLP) tasks. This paper conducts a comprehensive investigation, using both automatic and human evaluation, to compare the story generation capacity of LLMs with that of recent models across three datasets varying in style, register, and story length. The results demonstrate that LLMs generate stories of significantly higher quality than other story generation models. Moreover, they perform at a level that competes with human authors, albeit with the preliminary observation that they tend to replicate real stories in situations involving world knowledge, resembling a form of plagiarism.
The potential of artificial intelligence (AI)-based large language models (LLMs) holds considerable promise for revolutionizing education, research, and practice. However, distinguishing between human-written and AI-generated text has become a significant challenge. This paper presents a comparative study, introducing a novel dataset of human-written and LLM-generated texts in different genres: essays, stories, poetry, and Python code. We employ several machine learning models to classify the texts. Results demonstrate the efficacy of these models in discerning between human- and AI-generated text, despite the dataset's limited sample size. However, the task becomes more challenging when classifying GPT-generated text, particularly in story writing. The results indicate that the models exhibit superior performance on binary classification tasks, such as distinguishing human-generated text from that of a specific LLM, compared to the more complex multiclass tasks that involve discerning among human-generated text and text from multiple LLMs. Our findings provide insightful implications for AI text detection, while our dataset paves the way for future research in this evolving area.
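As a rough illustration of the kind of classical machine learning pipeline such a comparison might use, the sketch below fits a TF-IDF plus linear classifier on placeholder data; the feature choice, classifier, and texts are assumptions for illustration, not necessarily those used in the study.

```python
# Minimal sketch: TF-IDF features with a linear classifier for
# human- vs. AI-generated text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder corpus; in practice, texts and labels come from the collected dataset.
texts  = ["a human-written essay ...", "another human-written story ...",
          "an LLM-generated essay ...", "an LLM-generated story ..."]
labels = ["human", "human", "ai", "ai"]   # binary case; a multiclass setup would name each LLM

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```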
Deep learning has been rapidly adopted across many applications, revolutionizing entire industries, but it is known to be vulnerable to adversarial attacks. Such attacks pose a serious threat to deep learning-based systems, compromising their integrity, reliability, and trustworthiness. Interpretable Deep Learning Systems (IDLSes) are designed to make such systems more transparent and explainable, but they too have been shown to be susceptible to attacks. In this work, we propose a novel microbial genetic algorithm-based black-box attack against IDLSes that requires no prior knowledge of the target model or its interpretation model. The proposed attack is a query-efficient approach that combines transfer-based and score-based methods, making it a powerful tool for unveiling IDLS vulnerabilities. Our experiments show high attack success rates, with adversarial examples whose attribution maps are highly similar to those of benign samples, making them difficult to detect even for human analysts. Our results highlight the need for improved IDLS security to ensure their practical reliability.
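A highly simplified sketch of a microbial genetic algorithm loop for black-box perturbation search is given below. It assumes a query function `attack_score(x)` that returns how well a candidate fools the target; the scoring function, hyperparameters, and pixel-range clipping are illustrative assumptions, and the actual attack additionally constrains interpretation similarity.

```python
import numpy as np

def microbial_ga_attack(x, attack_score, eps=0.05, pop_size=20, iters=500, seed=0):
    """Microbial GA: repeatedly pit two random individuals against each other;
    the loser copies part of the winner's perturbation and is then mutated."""
    rng = np.random.default_rng(seed)
    # Population of perturbations within an L_inf ball of radius eps.
    pop = rng.uniform(-eps, eps, size=(pop_size,) + x.shape)
    for _ in range(iters):
        i, j = rng.choice(pop_size, size=2, replace=False)
        si, sj = attack_score(x + pop[i]), attack_score(x + pop[j])
        win, lose = (i, j) if si >= sj else (j, i)
        # Crossover: the loser inherits roughly half of the winner's genes.
        mask = rng.random(x.shape) < 0.5
        pop[lose][mask] = pop[win][mask]
        # Mutation: small random changes, clipped back into the eps-ball.
        pop[lose] += rng.normal(0.0, eps / 10, size=x.shape)
        pop[lose] = np.clip(pop[lose], -eps, eps)
    best = max(range(pop_size), key=lambda k: attack_score(x + pop[k]))
    return np.clip(x + pop[best], 0.0, 1.0)   # assumes inputs normalized to [0, 1]
```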
Complex software can be hard to read, adapt, and maintain. Refactoring it can produce cleaner, self-explanatory code. Refactoring tools try to guide developers towards better, higher-quality code. However, most of them take too long to provide feedback, support, and guidance on how developers should improve their software. To mitigate this problem, we explored the concept of Live Refactoring, focusing on visually suggesting and applying refactorings in real time. With this in mind, we developed a Live Refactoring Environment that visually identifies, recommends, and applies Extract Method refactorings. To validate it, we conducted an empirical experiment. Early results showed that our approach improved several code quality metrics. Moreover, our results were significantly different from, and better than, those obtained by refactoring the code manually without further help.
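For context, an Extract Method refactoring of the kind the environment recommends looks roughly like this; the snippet is a generic Python illustration, not code from the study itself.

```python
# Before: one function mixing parsing, computation, and reporting.
def process_order(raw_order):
    items = [line.split(",") for line in raw_order.strip().splitlines()]
    total = sum(float(price) for _, price in items)
    return f"{len(items)} items, total {total:.2f}"

# After: the price computation is extracted into its own, self-explanatory method.
def compute_total(items):
    return sum(float(price) for _, price in items)

def process_order_refactored(raw_order):
    items = [line.split(",") for line in raw_order.strip().splitlines()]
    return f"{len(items)} items, total {compute_total(items):.2f}"
```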
Compiler error messages serve as an initial resource for programmers dealing with compilation errors. However, previous studies indicate that they often lack sufficient targeted information to resolve code issues. Consequently, programmers typically rely on their own research to fix errors. Historically, Stack Overflow has been the primary resource for such information, but recent advances in large language models offer alternatives. This study systematically examines 100 compiler error messages from three sources to determine the most effective approach for programmers encountering compiler errors. Factors considered include Stack Overflow search methods and the impact of model version and prompt phrasing when using large language models. The results reveal that GPT-4 outperforms Stack Overflow in explaining compiler error messages, the effectiveness of adding code snippets to Stack Overflow searches depends on the search method, and results for Stack Overflow differ significantly between Google and StackExchange API searches. Furthermore, GPT-4 surpasses GPT-3.5, with "How to fix" prompts yielding superior outcomes to "What does this error mean" prompts. These results offer valuable guidance for programmers seeking assistance with compiler error messages, underscoring the transformative potential of advanced large language models like GPT-4 in debugging and opening new avenues of exploration for researchers in AI-assisted programming.
This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to the speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus their efficacy on specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. These limitations motivate another research direction, namely optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset, building on WavLM, and takes emotional characteristics into account. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper applies hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and that the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.
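As a loose illustration of what an emotion-guided masking strategy could look like, the sketch below biases the mask towards high-energy frames; the energy criterion, mask ratio, and all names are assumptions made purely for illustration and are not Vesper's actual procedure.

```python
import numpy as np

def emotion_guided_mask(frame_features, mask_ratio=0.4, seed=0):
    """Select frames to mask, preferring frames with high short-time energy
    (assumed here as a rough proxy for emotionally salient regions)."""
    rng = np.random.default_rng(seed)
    energy = (frame_features ** 2).mean(axis=-1)     # (num_frames,) per-frame energy
    weights = energy / energy.sum()                  # sampling bias towards high-energy frames
    num_mask = int(mask_ratio * len(energy))
    masked = rng.choice(len(energy), size=num_mask, replace=False, p=weights)
    mask = np.zeros(len(energy), dtype=bool)
    mask[masked] = True
    return mask   # True = frame is masked and must be predicted during pretraining
```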
This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. First, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion of the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources for LLMs, regularly updated, can be found at \url{//github.com/Mooler0410/LLMsPracticalGuide}.
Convolutional neural networks (CNNs) are the dominant deep neural network (DNN) architecture for computer vision. Recently, Transformer- and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to set new trends, showing promising results on the ImageNet classification task. In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons. To ensure a fair comparison, we first develop a unified framework called SPACH that adopts separate modules for spatial and channel processing. Our experiments under the SPACH framework reveal that all structures can achieve competitive performance at a moderate scale. However, they demonstrate distinctive behaviors as the network size scales up. Based on our findings, we propose two hybrid models using convolution and Transformer modules. The resulting Hybrid-MS-S+ model achieves 83.9% top-1 accuracy with 63M parameters and 12.3G FLOPs, already on par with SOTA models featuring sophisticated designs. The code and models will be made publicly available.
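A minimal PyTorch sketch of the spatial/channel separation such a unified framework is built around is shown below, with one interchangeable spatial-mixing module and one channel-mixing module per block; the depthwise convolution and MLP chosen here are purely illustrative, and SPACH's actual module choices and details differ.

```python
import torch
import torch.nn as nn

class SpatialChannelBlock(nn.Module):
    """One block that processes spatial and channel dimensions with separate modules.
    Input/output shape: (batch, channels, height, width)."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(channels)
        # Spatial mixing: could be a depthwise conv, self-attention, or an MLP over positions.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self.norm2 = nn.BatchNorm2d(channels)
        # Channel mixing: pointwise MLP shared across spatial locations.
        self.channel = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, kernel_size=1),
        )

    def forward(self, x):
        x = x + self.spatial(self.norm1(x))   # residual spatial mixing
        x = x + self.channel(self.norm2(x))   # residual channel mixing
        return x

block = SpatialChannelBlock(channels=64)
out = block(torch.randn(2, 64, 56, 56))       # same shape in and out
```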
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proven to be important, and several works take fields into account in their models. In this paper, we propose a novel approach to model field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new interpretation of FM and FwFM within the FmFM framework and compare them with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize the embedding dimensions while preserving model performance. The FmFM model can be further optimized by caching the intermediate vectors, so that it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require far more FLOPs at runtime.
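A small numpy sketch of the field-matrixed interaction is given below for one sample with one active feature per field; the field-pair matrix maps between embedding spaces, which is what allows field-specific embedding dimensions. The field names, dimensions, and random values are illustrative, and the linear terms and bias are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example: three fields with field-specific embedding dimensions (soft pruning).
field_dims = {"user": 8, "item": 8, "context": 4}
fields = list(field_dims)

# One active feature per field -> one embedding vector per field for this sample.
embeddings = {f: rng.normal(size=d) for f, d in field_dims.items()}

# One matrix per field pair, mapping dim(f_i) -> dim(f_j).
pair_matrix = {(fi, fj): rng.normal(size=(field_dims[fi], field_dims[fj]))
               for a, fi in enumerate(fields) for fj in fields[a + 1:]}

# FmFM-style interaction: <v_i M_{F(i),F(j)}, v_j> summed over all field pairs.
interaction = sum(embeddings[fi] @ M @ embeddings[fj]
                  for (fi, fj), M in pair_matrix.items())

logit = interaction                      # linear terms and global bias omitted here
ctr = 1.0 / (1.0 + np.exp(-logit))       # predicted click-through probability
print(ctr)
```

Because the product of each embedding with its field-pair matrix depends only on the feature itself, such intermediate vectors can be precomputed and cached, which is consistent with the abstract's claim of only thousands of FLOPs per prediction.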