
While Large Language Models (LLMs) dominate tasks like natural language processing and computer vision, harnessing their power for spatial-temporal forecasting remains challenging. The disparity between sequential text and complex spatial-temporal data hinders this application. To address this issue, this paper introduces STG-LLM, an innovative approach empowering LLMs for spatial-temporal forecasting. We tackle the data mismatch by proposing: 1) STG-Tokenizer: This spatial-temporal graph tokenizer transforms intricate graph data into concise tokens capturing both spatial and temporal relationships; 2) STG-Adapter: This minimalistic adapter, consisting of linear encoding and decoding layers, bridges the gap between tokenized data and LLM comprehension. By fine-tuning only a small set of parameters, it can effectively grasp the semantics of tokens generated by STG-Tokenizer, while preserving the original natural language understanding capabilities of LLMs. Extensive experiments on diverse spatial-temporal benchmark datasets show that STG-LLM successfully unlocks LLM potential for spatial-temporal forecasting. Remarkably, our approach achieves competitive performance on par with dedicated SOTA methods.
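The data flow described above (tokenize the graph, encode linearly, pass through a frozen model, decode linearly) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-token-per-node scheme, all sizes, the class name `LinearAdapter`, and the small attention-like `frozen_backbone` standing in for an LLM are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N graph nodes, T past steps, H future steps, D hidden width.
N, T, H, D = 5, 12, 3, 16

def tokenize(history):
    """Assumed scheme: each node's length-T history becomes one token."""
    return history  # shape (N, T), one row per node token

class LinearAdapter:
    """Linear encoder and decoder wrapped around a frozen backbone."""
    def __init__(self, t_in, d_model, t_out):
        # Only these two matrices would be trained in this sketch.
        self.W_enc = rng.normal(scale=0.1, size=(t_in, d_model))
        self.W_dec = rng.normal(scale=0.1, size=(d_model, t_out))

    def forward(self, tokens, backbone):
        hidden = tokens @ self.W_enc   # (N, D): tokens mapped to model width
        hidden = backbone(hidden)      # frozen: no parameters updated here
        return hidden @ self.W_dec     # (N, H): one forecast row per node

# Stand-in for a frozen LLM (an assumption): a fixed attention-like mixing
# step that lets every node token see every other node token.
W_frozen = rng.normal(scale=0.1, size=(D, D))

def frozen_backbone(x):
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.tanh(weights @ x @ W_frozen)

history = rng.normal(size=(N, T))   # synthetic data, for shape checking only
adapter = LinearAdapter(T, D, H)
forecast = adapter.forward(tokenize(history), frozen_backbone)
print(forecast.shape)  # (5, 3)
```

In this sketch the trainable part is just the two projection matrices, which is the sense in which only a small set of parameters would be tuned while the backbone stays untouched.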

Related content

Large language models


Large language models are deep learning models trained on massive text corpora. They can generate natural-language text, understand its meaning in depth, and handle a range of natural-language tasks such as summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking, with parameter counts rising from just over a billion at first to a trillion today. More parameters let a model capture the subtleties of human language more finely and understand its complexity more deeply. Over the past year, large language models improved markedly at absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, it will keep extending its range of applications, providing more intelligent and personalized services and further improving how people live and work.

Performer · RAVEN · Diversity · CLUES · Multi-hop

March 7, 2024

How Far Are We from Intelligent Visual Deductive Reasoning?

Yizhe Zhang,He Bai,Ruixiang Zhang,Jiatao Gu,Shuangfei Zhai,Josh Susskind,Navdeep Jaitly

from arxiv, ICLR 2024 AGI workshop

Vision-Language Models (VLMs) such as GPT-4V have recently made impressive strides on diverse vision-language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blind spots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs) to assess VLMs' abilities to perform multi-hop relational and deductive reasoning relying solely on visual clues. We perform comprehensive evaluations of several popular VLMs employing standard strategies such as in-context learning, self-consistency, and Chain-of-Thought (CoT) on three diverse datasets: the Mensa IQ test, IntelligenceTest, and RAVEN. The results reveal that despite the impressive capabilities of LLMs in text-based reasoning, we are still far from achieving comparable proficiency in visual deductive reasoning. We found that certain standard strategies that are effective when applied to LLMs do not seamlessly translate to the challenges presented by visual reasoning tasks. Moreover, a detailed analysis reveals that VLMs struggle to solve these tasks mainly because they are unable to perceive and comprehend multiple, confounding abstract patterns in RPM examples.
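Of the strategies named above, self-consistency is the easiest to show in a few lines: sample several answers to the same question and keep the most frequent one. The sketch below assumes a hypothetical `fake_vlm` function in place of a real model call; the answer letters are made up for illustration and are not results from the paper.

```python
from collections import Counter

def self_consistency(sample_answer, question, n_samples=5):
    """Sample several answers and return the majority answer with its vote share."""
    answers = [sample_answer(question, seed) for seed in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples

def fake_vlm(question, seed):
    # Hypothetical stand-in for a stochastic model call; the returned
    # option letters are invented for this example.
    return ["B", "B", "C", "B", "A"][seed % 5]

answer, agreement = self_consistency(fake_vlm, "Which option completes the matrix?")
print(answer, agreement)  # B 0.6
```

The vote share is a rough signal of how stable the answer is across samples. A low share suggests the samples disagree, which fits the abstract's observation that this strategy does not transfer cleanly to visual reasoning.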

Large language models · Comprehensibility · Language modeling · MoDELS · entity

March 7, 2024

Do Large Language Model Understand Multi-Intent Spoken Language ?

Shangjian Yin,Peijie Huang,Yuhong Xu,Haojing Huang,Jiatian Chen

This study marks a significant advancement by harnessing Large Language Models (LLMs) for multi-intent spoken language understanding (SLU), proposing a unique methodology that capitalizes on the generative power of LLMs within an SLU context. Our innovative technique reconfigures entity slots specifically for LLM application in multi-intent SLU environments and introduces the concept of Sub-Intent Instruction (SII), enhancing the dissection and interpretation of intricate, multi-intent communication within varied domains. The resultant datasets, dubbed LM-MixATIS and LM-MixSNIPS, are crafted from pre-existing benchmarks. Our research illustrates that LLMs can match and potentially excel beyond the capabilities of current state-of-the-art multi-intent SLU models. It further explores LLM efficacy across various intent configurations and dataset proportions. Moreover, we introduce two pioneering metrics, Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA), to provide an in-depth analysis of LLM proficiency in this complex field.

MoDELS · SLIM · Language modeling · Large language models · Distillation

March 7, 2024

Can Small Language Models be Good Reasoners for Sequential Recommendation?

Yuling Wang,Changxin Tian,Binbin Hu,Yanhua Yu,Ziqi Liu,Zhiqiang Zhang,Jun Zhou,Liang Pang,Xiao Wang

Large language models (LLMs) open up new horizons for sequential recommendations, owing to their remarkable language comprehension and generation capabilities. However, there are still numerous challenges that should be addressed to successfully implement sequential recommendations empowered by LLMs. Firstly, user behavior patterns are often complex, and relying solely on one-step reasoning from LLMs may lead to incorrect or task-irrelevant responses. Secondly, the resource requirements of LLMs (e.g., ChatGPT-175B) are prohibitively high and impractical for real sequential recommender systems. In this paper, we propose a novel Step-by-step knowLedge dIstillation fraMework for recommendation (SLIM), paving a promising path for sequential recommenders to enjoy the exceptional reasoning capabilities of LLMs in a "slim" (i.e., resource-efficient) manner. We introduce CoT prompting based on user behavior sequences for the larger teacher model. The rationales generated by the teacher model are then utilized as labels to distill the downstream smaller student model (e.g., LLaMA2-7B). In this way, the student model acquires the step-by-step reasoning capabilities in recommendation tasks. We encode the generated rationales from the student model into a dense vector, which empowers recommendation in both ID-based and ID-agnostic scenarios. Extensive experiments demonstrate the effectiveness of SLIM over state-of-the-art baselines, and further analysis showcases its ability to generate meaningful recommendation reasoning at affordable costs.

MoDELS · Language modeling · Large language models · Performer · GPT-3.5

March 6, 2024

Can Large Language Models do Analytical Reasoning?

Yebowen Hu,Kaiqiang Song,Sangwoo Cho,Xiaoyang Wang,Hassan Foroosh,Dong Yu,Fei Liu

This paper explores the analytical reasoning of cutting-edge large language models on sports data. The task asks a model to count how many points each team scores in a quarter of an NBA or NFL game. Our major findings are twofold. First, among all the models we employed, GPT-4 stands out in effectiveness, followed by Claude-2.1, with GPT-3.5, Gemini-Pro, and Llama-2-70b lagging behind. Specifically, we compare three different prompting techniques and a divide-and-conquer approach, and find that the latter is the most effective. Our divide-and-conquer approach breaks down play-by-play data into smaller, more manageable segments, solves each piece individually, and then aggregates the results. Besides the divide-and-conquer approach, we also explore the Chain of Thought (CoT) strategy, which markedly improves outcomes for certain models, notably GPT-4 and Claude-2.1, with their accuracy rates increasing significantly. However, the CoT strategy has negligible or even detrimental effects on the performance of other models like GPT-3.5 and Gemini-Pro. Second, to our surprise, we observe that most models, including GPT-4, struggle to accurately count the total scores for NBA quarters despite showing strong performance in counting NFL quarter scores. This leads us to further investigate the factors that impact the complexity of analytical reasoning tasks with extensive experiments, through which we conclude that task complexity depends on the length of context, the information density, and the presence of related information. Our research provides valuable insights into the complexity of analytical reasoning tasks and potential directions for developing future large language models.
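The divide-and-conquer idea described above (split the play-by-play log, score each segment, then aggregate) can be sketched as below. This is an illustration under assumptions, not the paper's pipeline: `score_segment` is a deterministic stand-in for what would be an LLM call per segment, and the team names, plays, and segment size are invented.

```python
def chunk(plays, size):
    """Split the play-by-play list into consecutive segments of at most `size` plays."""
    return [plays[i:i + size] for i in range(0, len(plays), size)]

def score_segment(segment):
    # Stand-in for a per-segment model call (assumption): here the points
    # are simply summed per team so the example is reproducible.
    totals = {}
    for team, points in segment:
        totals[team] = totals.get(team, 0) + points
    return totals

def merge(partials):
    """Aggregate per-segment totals into totals for the whole quarter."""
    combined = {}
    for partial in partials:
        for team, points in partial.items():
            combined[team] = combined.get(team, 0) + points
    return combined

def divide_and_conquer(plays, segment_size):
    return merge(score_segment(seg) for seg in chunk(plays, segment_size))

# Invented play-by-play data: (team, points scored on that play).
plays = [("LAL", 2), ("BOS", 3), ("LAL", 3), ("BOS", 2),
         ("LAL", 1), ("BOS", 2), ("LAL", 2)]
totals = divide_and_conquer(plays, segment_size=3)
print(totals)  # {'LAL': 8, 'BOS': 7}
```

Shorter segments keep each sub-problem's context small, which lines up with the abstract's conclusion that context length and information density drive task difficulty.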

Analysis · Language modeling · Code · Large language models · Comprehensibility

March 5, 2024

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

Chongzhou Fang,Ning Miao,Shaurya Srivastav,Jialin Liu,Ruoyu Zhang,Ruijie Fang, Asmita,Ryan Tsang,Najmeh Nazari,Han Wang,Houman Homayoun

from arxiv, Accepted by Usenix Security 2024

Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into harnessing LLMs for code analysis purposes. However, the existing body of literature falls short in delivering a systematic evaluation and assessment of LLMs' effectiveness in code analysis, particularly in the context of obfuscated code. This paper seeks to bridge this gap by offering a comprehensive evaluation of LLMs' capabilities in performing code analysis tasks. Additionally, it presents real-world case studies that employ LLMs for code analysis. Our findings indicate that LLMs can indeed serve as valuable tools for automating code analysis, albeit with certain limitations. Through meticulous exploration, this research contributes to a deeper understanding of the potential and constraints associated with utilizing LLMs in code analysis, paving the way for enhanced applications in this critical domain.

MoDELS · Benchmark · Performer · state-of-the-art · Integration

March 5, 2024

HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?

Sijie Ji,Xinzhe Zheng,Chenshu Wu

There is an ongoing debate regarding the potential of Large Language Models (LLMs) as foundational models seamlessly integrated with Cyber-Physical Systems (CPS) for interpreting the physical world. In this paper, we carry out a case study to answer the following question: Are LLMs capable of zero-shot human activity recognition (HAR)? Our study, HARGPT, presents an affirmative answer by demonstrating that LLMs can comprehend raw IMU data and perform HAR tasks in a zero-shot manner, with only appropriate prompts. HARGPT inputs raw IMU data into LLMs and utilizes the role-play and think step-by-step strategies for prompting. We benchmark HARGPT on GPT4 using two public datasets of different inter-class similarities and compare it against various baselines based on both traditional machine learning and state-of-the-art deep classification models. Remarkably, LLMs successfully recognize human activities from raw IMU data and consistently outperform all the baselines on both datasets. Our findings indicate that, with effective prompting, LLMs can interpret raw IMU data based on their knowledge base, showing promising potential to analyze raw sensor data of the physical world effectively.

Comprehensibility · Analysis · INTERACT · Code · Statistics

March 4, 2024

How Do Analysts Understand and Verify AI-Assisted Data Analyses?

Ken Gu,Ruoxi Shang,Tim Althoff,Chenglong Wang,Steven M. Drucker

from arxiv, Accepted to CHI 2024

Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences.

Large language models · MoDELS · Design · Scenario · Few-shot learning

March 4, 2024

Can LLMs Generate Architectural Design Decisions? -An Exploratory Empirical study

Rudra Dhar,Karthik Vaidhyanathan,Vasudeva Varma

from arxiv, This paper has been accepted to IEEE ICSA 2024 (Main Track - Research Track)

Architectural Knowledge Management (AKM) involves the organized handling of information related to architectural decisions and design within a project or organization. An essential artifact of AKM is the Architecture Decision Record (ADR), which documents key design decisions. ADRs capture the decision context, the decision made, and various aspects related to a design decision, thereby promoting transparency, collaboration, and understanding. Despite their benefits, ADR adoption in software development has been slow due to challenges like time constraints and inconsistent uptake. Recent advancements in Large Language Models (LLMs) may help bridge this adoption gap by facilitating ADR generation. However, the effectiveness of LLMs for ADR generation or understanding has not been explored. To this end, we perform an exploratory study that investigates the feasibility of using LLMs to generate ADRs given the decision context. In our exploratory study, we utilize GPT and T5-based models with 0-shot, few-shot, and fine-tuning approaches to generate the Decision of an ADR given its Context. Our results indicate that in a 0-shot setting, state-of-the-art models such as GPT-4 generate relevant and accurate Design Decisions, although they fall short of human-level performance. Additionally, we observe that more cost-effective models like GPT-3.5 can achieve similar outcomes in a few-shot setting, and smaller models such as Flan-T5 can yield comparable results after fine-tuning. To conclude, this exploratory study suggests that LLMs can generate Design Decisions, but further research is required to attain human-level generation and establish standardized widespread adoption.

Ground truth · Identifiable · Dataset · HTTPS · Computational learning theory

December 15, 2021

Do Feature Attribution Methods Correctly Attribute Features?

Yilun Zhou,Serena Booth,Marco Tulio Ribeiro,Julie Shah

from arxiv, AAAI 2022. Video summary at //www.youtube.com/watch?v=kAodFw6jvvo

Feature attribution methods are popular in interpretable machine learning. These methods compute the attribution of each input feature to represent its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation, complicated in particular by the lack of ground truth attribution. To address this, we propose a dataset modification procedure to induce such ground truth. Using this procedure, we evaluate three common methods: saliency maps, rationales, and attentions. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods applied on datasets in the wild. We further discuss possible avenues for remedy and recommend new attribution methods to be tested against ground truth before deployment. The code is available at //github.com/YilunZhou/feature-attribution-evaluation.

Global minimum · Optimizer · Minimum · Non-convex · Approximation

March 24, 2021

Why Do Local Methods Solve Nonconvex Problems?

Tengyu Ma

from arxiv, This is Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms"

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.
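The hypothesis that local minima are approximately global can be seen on a toy case. The sketch below is an illustration chosen for this listing, not an example taken from the chapter: it runs plain gradient descent on the non-convex function f(x) = (x^2 - 1)^2, whose two local minima at x = -1 and x = 1 are both global (f = 0). The step size, step count, and starting points are arbitrary assumptions.

```python
def f(x):
    """Non-convex objective with two minima, x = -1 and x = 1, both global."""
    return (x * x - 1.0) ** 2

def grad_f(x):
    """Derivative of f: d/dx (x^2 - 1)^2 = 4 x (x^2 - 1)."""
    return 4.0 * x * (x * x - 1.0)

def gradient_descent(x0, lr=0.05, steps=200):
    """Plain gradient descent using only local (first-order) information."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Arbitrary starting points on both sides of the local maximum at x = 0.
starts = [-2.0, -0.5, 0.1, 1.5]
finals = [gradient_descent(x0) for x0 in starts]

for x0, x in zip(starts, finals):
    print(f"start {x0:+.1f} -> x = {x:+.4f}, f(x) = {f(x):.2e}")
```

Every start ends at one of the two minima with an objective value near zero, so a purely local method finds a global minimum here. Starting exactly at x = 0 would stall, since the gradient vanishes at that local maximum.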