
The rapid growth and popularity of large language model (LLM) app stores have created new opportunities and challenges for researchers, developers, users, and app store managers. As the LLM app ecosystem continues to evolve, it is crucial to understand the current landscape and identify potential areas for future research and development. This paper presents a forward-looking analysis of LLM app stores, focusing on key aspects such as data mining, security risk identification, development assistance, etc. By examining these aspects, we aim to provide a vision for future research directions and highlight the importance of collaboration among stakeholders to address the challenges and opportunities within the LLM app ecosystem. The insights and recommendations provided in this paper serve as a foundation for driving innovation, ensuring responsible development, and creating a thriving, user-centric LLM app landscape.

Related content

Large language models


A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a variety of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from a little over one billion at the outset to one trillion today. More parameters allow a model to capture the subtleties of human language more finely and to understand its complexity more deeply. Over the past year, large language models have improved markedly at absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep expanding, offering people more intelligent and personalized services and further improving how they live and work.

Large language models · Language modeling · Output · MoDELS · Training data

May 31, 2024

Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models

Yihong Dong, Xue Jiang, Huanyu Liu, Zhi Jin, Bin Gu, Mengfei Yang, Ge Li

from arxiv, Accepted to ACL

Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluating on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leading to LLMs being more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD necessitates only the sampled texts to detect data contamination, by identifying the peakedness of LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on the correction of LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves the average relative improvements of 21.8%-30.2% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect implicit contamination. TED substantially mitigates performance improvements up to 66.9% attributed to data contamination across various contamination setups. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on the HumanEval benchmark.
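The idea of detecting contamination from the peakedness of sampled outputs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the edit-distance tolerance `alpha`, and the decision `threshold` are all assumptions chosen for illustration. The sketch treats one output as a reference (for example, a greedy decode) and measures what fraction of sampled outputs are near-identical to it; a high fraction suggests the model may be reciting memorized text.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def peakedness(reference, samples, alpha=0.05):
    """Fraction of samples within alpha * max_len token edits of the reference.

    `alpha` is a hypothetical tolerance, not a value taken from the paper.
    """
    if not samples:
        raise ValueError("at least one sampled output is required")
    ref_tokens = reference.split()
    sample_tokens = [s.split() for s in samples]
    max_len = max(len(ref_tokens), *(len(t) for t in sample_tokens))
    limit = alpha * max_len
    close = sum(1 for t in sample_tokens if edit_distance(ref_tokens, t) <= limit)
    return close / len(samples)


def looks_contaminated(reference, samples, alpha=0.05, threshold=0.5):
    """Flag a prompt when the sampled outputs cluster tightly around the reference."""
    return peakedness(reference, samples, alpha) > threshold
```

In use, `reference` and `samples` would come from querying the model under test several times on the same benchmark prompt; only the generated texts are needed, which matches the black-box setting described in the abstract.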

Multimodal · MoDELS · Knowledge · Processing (programming language) · Extensibility

May 31, 2024

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

Cheng Tan, Jingxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

from arxiv, Under review

Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of multimodal RAG is to cultivate the models' ability to reason in response to relevant queries. To this end, we introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning). The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs, which then serve as scaffolds for the multimodal reasoning process. This training-free approach not only encourages the model to engage deeply with the reasoning processes inherent in the retrieved content but also facilitates the generation of answers that are precise and richly interpretable. Surprisingly, utilizing solely the ScienceQA dataset, collected from elementary and high school science curricula, RMR significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets, including A-OKVQA, MMBench, and SEED. These outcomes highlight the substantial potential of our multimodal retrieval and reasoning mechanism to improve the reasoning capabilities of vision-language models.
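A training-free retrieve-then-prompt loop of the kind described here can be sketched as follows. The abstract does not say how the bi-modal retrieval module scores candidates, so the weighted sum of text and image cosine similarities, the `text_weight` parameter, the dictionary field names, and the prompt layout are all assumptions made for illustration; the embeddings are taken as precomputed vectors from some hypothetical encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, pool, k=2, text_weight=0.5):
    """Return the k pool items most similar to the query in both modalities.

    The weighted sum is an assumed scoring rule, not the one used by RMR.
    """
    def score(item):
        text_sim = cosine(query["text_vec"], item["text_vec"])
        image_sim = cosine(query["image_vec"], item["image_vec"])
        return text_weight * text_sim + (1 - text_weight) * image_sim

    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(question, examples):
    """Place retrieved question-answer pairs before the new question as scaffolds."""
    parts = [f"Question: {ex['question']}\nAnswer: {ex['answer']}" for ex in examples]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    pool = [
        {"question": "Which object is magnetic?", "answer": "The iron nail.",
         "text_vec": [1.0, 0.0], "image_vec": [1.0, 0.0]},
        {"question": "Which animal is a mammal?", "answer": "The whale.",
         "text_vec": [0.0, 1.0], "image_vec": [0.0, 1.0]},
    ]
    query = {"text_vec": [0.9, 0.1], "image_vec": [1.0, 0.0]}
    print(build_prompt("Which material sticks to a magnet?", retrieve(query, pool, k=1)))
```

Because the retrieved pairs are only prepended to the prompt, the vision-language model itself stays frozen, which is what makes the approach training-free.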

AI · MoDELS · Better · Examples · CASE

May 31, 2024

There and Back Again: The AI Alignment Paradox

Robert West, Roland Aydin

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This perspective article draws attention to a fundamental challenge inherent in all AI alignment endeavors, which we term the "AI alignment paradox": The better we align AI models with our values, the easier we make it for adversaries to misalign the models. We illustrate the paradox by sketching three concrete example incarnations for the case of language models, each corresponding to a distinct way in which adversaries can exploit the paradox. With AI's increasing real-world impact, it is imperative that a broad community of researchers be aware of the AI alignment paradox and work to find ways to break out of it, in order to ensure the beneficial use of AI for the good of humanity.

Engineering · Optimizer · INFORMS · Search engine · Black box

May 28, 2024

GEO: Generative Engine Optimization

Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik R Narasimhan, Ameet Deshpande

The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs. While this shift significantly improves user utility and generative search engine traffic, it poses a huge challenge for the third stakeholder: website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, we must ensure the creator economy is not disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), the first novel paradigm to aid content creators in improving their content visibility in GE responses through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries across multiple domains, along with relevant web sources to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to 40% in GE responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of GEs and content creators.

Multimodal · Extensibility · Agent · AI Agent · Diversity

February 23, 2024

Large Multimodal Agents: A Survey

Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

from arxiv, 15 pages, 4 figures

Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with decision-making and reasoning abilities akin to humans. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereby handling more intricate and nuanced tasks. In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short). First, we introduce the essential components involved in developing LMAs and categorize the current body of research into four distinct types. Subsequently, we review the collaborative frameworks integrating multiple LMAs, enhancing collective efficacy. One of the critical challenges in this field is the diverse evaluation methods used across existing studies, hindering effective comparison among different LMAs. Therefore, we compile these evaluation methodologies and establish a comprehensive framework to bridge the gaps. This framework aims to standardize evaluations, facilitating more meaningful comparisons. Concluding our review, we highlight the extensive applications of LMAs and propose possible future research directions. Our discussion aims to provide valuable insights and guidelines for future research in this rapidly evolving field. An up-to-date resource list is available at //github.com/jun0wanan/awesome-large-multimodal-agents.

Prompt · MoDELS · TOOLS · Continuity · INTERACT

November 21, 2023

Prompting Frameworks for Large Language Models: A Survey

Xiaoxia Liu, Jingyi Wang, Jun Sun, Xiaohan Yuan, Guoliang Dong, Peng Di, Wenhai Wang, Dongxia Wang

Since the launch of ChatGPT, a powerful AI chatbot developed by OpenAI, large language models (LLMs) have made significant advancements in both academia and industry, bringing about a fundamental engineering paradigm shift in many areas. While LLMs are powerful, it is also crucial to best use their power, where the "prompt" plays a core role. However, the booming LLMs themselves, including excellent APIs like ChatGPT, have several inherent limitations: 1) temporal lag of training data, and 2) the lack of physical capabilities to perform external actions. Recently, we have observed the trend of utilizing prompt-based tools to better utilize the power of LLMs for downstream tasks, but there is a lack of systematic literature and standardized terminology, partly due to the rapid evolution of this field. Therefore, in this work, we survey related prompting tools and promote the concept of the "Prompting Framework" (PF), i.e. the framework for managing, simplifying, and facilitating interaction with large language models. We define the lifecycle of the PF as a hierarchical structure, from bottom to top, namely: Data Level, Base Level, Execute Level, and Service Level. We also systematically depict the overall landscape of the emerging PF field and discuss potential future research and challenges. To continuously track the developments in this area, we maintain a repository at //github.com/lxx0628/Prompting-Framework-Survey, which can be a useful resource-sharing platform for both academia and industry in this field.

Language modeling · Performer · Agent · MoDELS · Learning

May 19, 2023

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ "Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

Transform · Vision · Identifiable · Taxonomy · Prompt

January 24, 2022

Transformers in Medical Imaging: A Survey

Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu

from arxiv, 41 pages, //github.com/fahadshamshad/awesome-transformers-in-medical-imaging

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers that can capture global context compared to CNNs with local receptive fields. Inspired by this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop a taxonomy, identify application-specific challenges as well as provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges, open problems, and promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at //github.com/fahadshamshad/awesome-transformers-in-medical-imaging.

Compiler · Optimizer · Learned · Performer · TVM

February 6, 2020

The Deep Learning Compiler: A Comprehensive Survey

Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian

The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia, such as Tensorflow XLA and TVM. Similarly, the DL compilers take the DL models described in different DL frameworks as input, and then generate optimized code for diverse DL hardware as output. However, no existing survey has analyzed the unique design of the DL compilers comprehensively. In this paper, we perform a comprehensive survey of existing DL compilers by dissecting the commonly adopted design in detail, with emphasis on the DL-oriented multi-level IRs and frontend/backend optimizations. Specifically, we provide a comprehensive comparison among existing DL compilers from various aspects. In addition, we present a detailed analysis of the multi-level IR design and compiler optimization techniques. Finally, several insights are highlighted as potential research directions for DL compilers. This is the first survey paper focusing on the unique design of DL compilers, which we hope can pave the road for future research on DL compilers.

Contextual representation · Model evaluation · Learned · Word vector representation · Layer

August 27, 2018

Dissecting Contextual Word Embeddings: Architecture and Representation

Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih

from arxiv, EMNLP 2018

Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological based at the word embedding layer through local syntax based in the lower contextual layers to longer range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.