苹果电影在线观看免费高清_国产亚洲一区二区三区在线_草草线在成年免费视频2_999精品无码A片在线1级_A级毛片免费真人_6080亚洲精品午夜福利_欧美日韩国产综合在线一区二区看

Large language models~(LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at //github.com/Lichang-Chen/InstructZero.

相關內容

黑盒

關注 1

在科學，計算和工程學中(zhong)(zhong)，黑(hei)(hei)盒(he)(he)是(shi)一(yi)種設備，系統或對(dui)象(xiang)，可以(yi)根據其輸入和輸出（或傳輸特(te)性）對(dui)其進行(xing)查(cha)看，而無需對(dui)其內部(bu)工作有(you)任何(he)了(le)(le)解。它的(de)(de)(de)實現是(shi)“不(bu)透明的(de)(de)(de)”（黑(hei)(hei)色）。幾乎任何(he)事物(wu)都可以(yi)被稱為黑(hei)(hei)盒(he)(he)：晶體管，引擎，算法(fa)(fa)，人(ren)腦，機構或政府。為了(le)(le)使(shi)用典(dian)型的(de)(de)(de)“黑(hei)(hei)匣子(zi)(zi)方法(fa)(fa)”來分析建模為開放系統的(de)(de)(de)事物(wu)，僅考慮刺激/響應的(de)(de)(de)行(xing)為，以(yi)推斷（未知(zhi)）盒(he)(he)子(zi)(zi)。該黑(hei)(hei)匣子(zi)(zi)系統的(de)(de)(de)通(tong)常(chang)表示形式是(shi)在該方框中(zhong)(zhong)居中(zhong)(zhong)的(de)(de)(de)數據流程圖(tu)。黑(hei)(hei)盒(he)(he)的(de)(de)(de)對(dui)立面是(shi)一(yi)個內部(bu)組(zu)件或邏輯可用于檢查(cha)的(de)(de)(de)系統，通(tong)常(chang)將其稱為白盒(he)(he)（有(you)時也稱為“透明盒(he)(he)”或“玻璃(li)盒(he)(he)”）。

語言模型化 · MoDELS · GPT-4 · GPT3.5 · INTERACT ·

2023 年 9 月 28 日

Neuro Symbolic Reasoning for Planning: Counterexample Guided Inductive Synthesis using Large Language Models and Satisfiability Solving

Sumit Kumar Jha,Susmit Jha,Patrick Lincoln,Nathaniel D. Bastian,Alvaro Velasquez,Rickard Ewetz,Sandeep Neema

from arxiv, 25 pages, 7 figures

Generative large language models (LLMs) with instruct training such as GPT-4 can follow human-provided instruction prompts and generate human-like responses to these prompts. Apart from natural language responses, they have also been found to be effective at generating formal artifacts such as code, plans, and logical specifications from natural language prompts. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucination. This limitation makes it difficult to use these models to synthesize formal artifacts that are used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of instruct-trained LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.

語言模型化 · 圖 · 機器人 · 回合 · MoDELS ·

2023 年 9 月 27 日

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

Krishan Rana,Jesse Haviland,Sourav Garg,Jad Abou-Chakra,Ian Reid,Niko Suenderhauf

from arxiv, Accepted for oral presentation at the Conference on Robot Learning (CoRL), 2023. Project page can be found here: //sayplan.github.io

Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a 'semantic search' for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an 'iterative replanning' pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute. We provide real robot video demonstrations on our project page //sayplan.github.io.

MoDELS · 多峰值 · BASIC · 值域 · Microsoft Surface ·

2023 年 9 月 27 日

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Yonatan Bitton,Hritik Bansal,Jack Hessel,Rulin Shao,Wanrong Zhu,Anas Awadalla,Josh Gardner,Rohan Taori,Ludwig Schimdt

We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of instruction-following vision-language models for real-world use. Our starting point is curating 70 'instruction families' that we envision instruction tuned vision-language models should be able to address. Extending beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to game playing and creative generation. Following curation, our dataset comprises 592 test queries, each with a human-authored instruction-conditioned caption. These descriptions surface instruction-specific factors, e.g., for an instruction asking about the accessibility of a storefront for wheelchair users, the instruction-conditioned caption describes ramps/potential obstacles. These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment. We quantify quality gaps between models and references using both human and automatic evaluations; e.g., the top-performing instruction-following model wins against the GPT-4 reference in just 27% of the comparison. VisIT-Bench is dynamic to participate, practitioners simply submit their model's response on the project website; Data, code and leaderboard is available at visit-bench.github.io.

Prompt · Learning · Extensibility · 重賦權 · MoDELS ·

2023 年 9 月 27 日

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

Kai Wang,Fei Yang,Shiqi Yang,Muhammad Atif Butt,Joost van de Weijer

from arxiv, Neurips 2023. The code page: //github.com/wangkai930418/DPL

Large-scale text-to-image generative models have been a ground-breaking development in generative AI, with diffusion models showing their astounding ability to synthesize convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are susceptible to unintended modifications of regions outside the targeted area, such as on the background or on distractor objects which have some semantic or visual relationship with the targeted object. According to our experimental findings, inaccurate cross-attention maps are at the root of this problem. Based on this observation, we propose Dynamic Prompt Learning (DPL) to force cross-attention maps to focus on correct noun words in the text prompt. By updating the dynamic tokens for nouns in the textual input with the proposed leakage repairment losses, we achieve fine-grained image editing over particular objects while preventing undesired changes to other image regions. Our method DPL, based on the publicly available Stable Diffusion, is extensively evaluated on a wide range of images, and consistently obtains superior results both quantitatively (CLIP score, Structure-Dist) and qualitatively (on user-evaluation). We show improved prompt editing results for Word-Swap, Prompt Refinement, and Attention Re-weighting, especially for complex multi-object scenes.

語言模型化 · MoDELS · Seven · Performer · 張成子空間 ·

2023 年 9 月 27 日

ChatCounselor: A Large Language Models for Mental Health Support

June M. Liu,Donghao Li,He Cao,Tianhe Ren,Zeyi Liao,Jiamin Wu

This paper presents ChatCounselor, a large language model (LLM) solution designed to provide mental health support. Unlike generic chatbots, ChatCounselor is distinguished by its foundation in real conversations between consulting clients and professional psychologists, enabling it to possess specialized knowledge and counseling skills in the field of psychology. The training dataset, Psych8k, was constructed from 260 in-depth interviews, each spanning an hour. To assess the quality of counseling responses, the counseling Bench was devised. Leveraging GPT-4 and meticulously crafted prompts based on seven metrics of psychological counseling assessment, the model underwent evaluation using a set of real-world counseling questions. Impressively, ChatCounselor surpasses existing open-source models in the counseling Bench and approaches the performance level of ChatGPT, showcasing the remarkable enhancement in model capability attained through high-quality domain-specific data.

MoDELS · Performer · 語言模型化 · 值域 · Performance ·

2023 年 9 月 27 日

CHORUS: Foundation Models for Unified Data Discovery and Exploration

Moe Kayali,Anton Lykov,Ilias Fountalis,Nikolaos Vasiloglou,Dan Olteanu,Dan Suciu

We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-class detection, column-type annotation and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and so the state of the art. Further, our approach often surpasses human-expert task performance. We investigate the fundamental characteristics of this approach including generalizability to several foundation models, impact of non-determinism on the outputs and syntactic/semantic signals. All in all, this suggests a future direction in which disparate data management tasks can be unified under foundation models.

語言模型化 · 設計 · MoDELS · Performer · SimPLe ·

2023 年 9 月 26 日

RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model

Yao Lu,Shang Liu,Qijun Zhang,Zhiyao Xie

Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers start to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, their target designs are all relatively simple and in a small scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works only focus on the design correctness, without evaluating the design qualities of generated design RTL. In this work, we propose an open-source benchmark named RTLLM, for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarized three progressive goals, named syntax goal, functionality goal, and design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 in our proposed benchmark.

MoDELS · Taxonomy · 語言模型化 · 可理解性 · Performance ·

2023 年 9 月 2 日

Explainability for Large Language Models: A Survey

Haiyan Zhao,Hanjie Chen,Fan Yang,Ninghao Liu,Huiqi Deng,Hengyi Cai,Shuaiqiang Wang,Dawei Yin,Mengnan Du

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.

語言模型化 · Performer · Agent · MoDELS · Learning ·

2023 年 5 月 19 日

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen,Lu Wang,Hang Dong,Yali Du,Jie Yan,Fangkai Yang,Shuang Li,Pu Zhao,Si Qin,Saravan Rajmohan,Qingwei Lin,Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ ``Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLM in in-contxt decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

Extensibility · GM · MoDELS · 類別 · 多代理人模型 ·

2021 年 2 月 9 日

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Lewis Hammond,James Fox,Tom Everitt,Alessandro Abate,Michael Wooldridge

from arxiv, Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.