
Large language models · Language modeling · MoDELS · Prompt · Datasets ·

February 8, 2024

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Xiaoxuan Wang,Ziniu Hu,Pan Lu,Yanqiao Zhu,Jieyu Zhang,Satyen Subramaniam,Arjun R. Loomba,Shichang Zhang,Yizhou Sun,Wei Wang

from arxiv, Results updated; multimodal dataset added

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite SciBench for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from mathematics, chemistry, and physics domains. Based on the dataset, we conduct an in-depth benchmarking study of representative open-source and proprietary LLMs with various prompting strategies. The results reveal that the current LLMs fall short of delivering satisfactory performance, with the best overall score of merely 43.22%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms the others and some strategies that demonstrate improvements in certain problem-solving skills could result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.

Related content

Large language models

A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a range of natural-language tasks such as summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion at the outset to around one trillion today. More parameters let a model capture the subtleties of human language more precisely and understand its complexity more deeply. Over the past year, large language models have improved markedly at absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep widening, providing more intelligent and personalized services and further improving how people live and work.

MoDELS · Performer · INTERACT · Trace · Knowledge ·

March 22, 2024

KTbench: A Novel Data Leakage-Free Framework for Knowledge Tracing

Yahya Badran,Christine Preisach

from arxiv, preprint

Knowledge Tracing (KT) is concerned with predicting students' future performance on learning items in intelligent tutoring systems. Learning items are tagged with skill labels called knowledge concepts (KCs). Many KT models expand the sequence of item-student interactions into KC-student interactions by replacing learning items with their constituting KCs. This often results in a longer sequence length. This approach addresses the issue of sparse item-student interactions and minimises model parameters. However, two problems have been identified with such models. The first problem is the model's ability to learn correlations between KCs belonging to the same item, which can result in the leakage of ground truth labels and hinder performance. This problem can lead to a significant decrease in performance on datasets with a higher number of KCs per item. The second problem is that the available benchmark implementations ignore accounting for changes in sequence length when expanding KCs, leading to different models being tested with varying sequence lengths but still compared against the same benchmark. To address these problems, we introduce a general masking framework that mitigates the first problem and enhances the performance of such KT models while preserving the original model architecture without significant alterations. Additionally, we introduce KTbench, an open-source benchmark library designed to ensure the reproducibility of this work while mitigating the second problem.
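
The expansion step described in this abstract can be pictured with a minimal sketch. This is not KTbench's implementation: the data layout, the item-to-KC mapping, and the function name `expand_to_kcs` are all assumptions made for illustration. It shows how replacing items with their KCs lengthens the sequence, and why KCs that belong to the same item share one ground-truth label.

```python
from typing import Dict, List, Tuple

# Hypothetical item -> KC tagging; not taken from any real dataset.
ITEM_TO_KCS: Dict[str, List[str]] = {
    "q1": ["fractions"],
    "q2": ["fractions", "decimals", "ratios"],
}


def expand_to_kcs(
    interactions: List[Tuple[str, int]],
    item_to_kcs: Dict[str, List[str]],
) -> List[Tuple[str, int, int]]:
    """Replace each (item, correct) pair with one (kc, correct, item_index) triple per KC."""
    expanded = []
    for idx, (item, correct) in enumerate(interactions):
        for kc in item_to_kcs[item]:
            # Every KC of the same item carries the same label, so a model that
            # sees the first KC's label could copy it for the remaining KCs.
            expanded.append((kc, correct, idx))
    return expanded


item_seq = [("q1", 1), ("q2", 0)]
kc_seq = expand_to_kcs(item_seq, ITEM_TO_KCS)
print(len(item_seq), len(kc_seq))  # 2 4
```

Keeping the item index next to each KC is one simple way to let a later step know which expanded positions belong together, for example when deciding what to mask.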

Transformation · Processing (programming language) · Extensibility · Branch · Inference ·

March 22, 2024

Allspark: Workload Orchestration for Visual Transformers on Processing In-Memory Systems

Mengke Ge,Junpeng Wang,Binhan Chen,Yingjian Zhong,Haitao Du,Song Chen,Yi Kang

from arxiv, The article is currently under review by IEEE Transactions on Computers, and has been submitted to HPCA'2024 and ISCA'2024

The advent of Transformers has revolutionized computer vision, offering a powerful alternative to convolutional neural networks (CNNs), especially with the local attention mechanism, which excels at capturing local structures within the input and achieves state-of-the-art performance. Processing in-memory (PIM) architecture offers extensive parallelism, low data movement costs, and scalable memory bandwidth, making it a promising solution for accelerating Transformers with memory-intensive operations. However, the crucial challenge lies in efficiently deploying the entire model onto a resource-limited PIM system while parallelizing each transformer block with potentially many computational branches based on local attention mechanisms. We present Allspark, which focuses on workload orchestration for visual Transformers on PIM systems, aiming at minimizing inference latency. Firstly, to fully utilize the massive parallelism of PIM, Allspark employs a finer-grained partitioning scheme for computational branches and forms a systematic layout and interleaved dataflow with maximized data locality and reduced data movement. Secondly, Allspark formulates the scheduling of the complete model on a resource-limited distributed PIM system as an integer linear programming (ILP) problem. Thirdly, as local-global data interactions exhibit complex yet regular dependencies, Allspark provides a greedy-based mapping method to allocate computational branches onto the PIM system and minimize NoC communication costs. Extensive experiments on 3D-stacked DRAM-based PIM systems show that Allspark brings 1.2x-24.0x inference speedup for various visual Transformers over baselines, and that an Allspark-enriched PIM system yields an average speedup of 2.3x and energy savings of 20x-55x over an Nvidia V100 GPU.

Learner · Understandability · Performer · Reviewer · ENJOY ·

March 22, 2024

Learners Teaching Novices: An Uplifting Alternative Assessment

Ali Malik,Juliette Woodrow,Chris Piech

We propose and carry out a novel method of formative assessment called Assessment via Teaching (AVT), in which learners demonstrate their understanding of CS1 topics by tutoring more novice students. AVT has powerful benefits over traditional forms of assessment: it is centered around service to others and is highly rewarding for the learners who teach. Moreover, teaching greatly improves the learners' own understanding of the material and has a huge positive impact on novices, who receive free 1:1 tutoring. Lastly, this form of assessment is naturally difficult to cheat on -- a critical property for assessments in the era of large language models. We use AVT in a randomised controlled trial with learners in a CS1 course at an R1 university. The learners provide tutoring sessions to more novice students taking a lagged online version of the same course. We show that learners who did an AVT session before the course exam performed 20 to 30 percentage points better than the class average on several questions. Moreover, compared to students who did a practice exam, the AVT learners enjoyed their experience more and were twice as likely to study for their teaching session. We believe AVT is a scalable and uplifting method for formative assessment that could one day replace traditional exams.

Correlation coefficient · Large language models · Prompt · Identifiable · Precision/accuracy ·

March 21, 2024

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

Jiaxing Sun,Weiquan Huang,Jiang Wu,Chenya Gu,Wei Li,Songyang Zhang,Hang Yan,Conghui He

from arxiv, Equal contribution: Jiaxing Sun, Weiquan Huang, Jiang Wu; Corresponding author: Conghui He

We introduce CHARM, the first benchmark for comprehensive, in-depth evaluation of the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought. Our findings indicate that the LLM's language orientation and the task's domain influence the effectiveness of the prompt strategy, which enriches previous research findings. We built closely interconnected reasoning and memorization tasks, and found that some LLMs struggle with memorizing Chinese commonsense, affecting their reasoning ability, while others show differences in reasoning despite similar memorization performance. We also evaluated the LLMs' memorization-independent reasoning abilities and analyzed the typical errors. Our study precisely identified the LLMs' strengths and weaknesses, providing clear direction for optimization. It can also serve as a reference for studies in other fields. We will release CHARM at //github.com/opendatalab/CHARM .

MoDELS · Regularization term · Synopsys · Prompt · Attention ·

March 20, 2024

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li,William Beluch,Margret Keuper,Dan Zhang,Anna Khoreva

from arxiv, Project page: //yumengli007.github.io/VSTAR

Despite tremendous progress in the field of text-to-video (T2V) synthesis, open-sourced T2V diffusion models struggle to generate longer videos with dynamically varying and evolving content. They tend to synthesize quasi-static videos, ignoring the necessary visual change-over-time implied in the text prompt. At the same time, scaling these models to enable longer, more dynamic video synthesis often remains computationally intractable. To address this challenge, we introduce the concept of Generative Temporal Nursing (GTN), where we aim to alter the generative process on the fly during inference to improve control over the temporal dynamics and enable generation of longer videos. We propose a method for GTN, dubbed VSTAR, which consists of two key ingredients: 1) Video Synopsis Prompting (VSP) - automatic generation of a video synopsis based on the original single prompt leveraging LLMs, which gives accurate textual guidance to different visual states of longer videos, and 2) Temporal Attention Regularization (TAR) - a regularization technique to refine the temporal attention units of the pre-trained T2V diffusion models, which enables control over the video dynamics. We experimentally showcase the superiority of the proposed approach in generating longer, visually appealing videos over existing open-sourced T2V models. We additionally analyze the temporal attention maps realized with and without VSTAR, demonstrating the importance of applying our method to mitigate neglect of the desired visual change over time.

Integration · Multimodal · contrastive · Loss · MoDELS ·

March 20, 2024

MEDBind: Unifying Language and Multimodal Medical Data Embeddings

Yuan Gao,Sangwook Kim,David E Austin,Chris McIntosh

Medical vision-language pretraining models (VLPM) have achieved remarkable progress in fusing chest X-rays (CXR) with clinical texts, introducing image-text data binding approaches that enable zero-shot learning and downstream clinical tasks. However, the current landscape lacks holistic integration of additional medical modalities, such as electrocardiograms (ECG). We present MEDBind (Medical Electronic patient recorD), which learns joint embeddings across CXR, ECG, and medical text. Using text data as the central anchor, MEDBind features tri-modality binding, delivering competitive performance in top-K retrieval, zero-shot, and few-shot benchmarks against established VLPM, and the ability for CXR-to-ECG zero-shot classification and retrieval. This seamless integration is achieved through a combination of contrastive loss on modality-text pairs with our proposed contrastive loss function, Edge-Modality Contrastive Loss, fostering a cohesive embedding space for CXR, ECG, and text. Finally, we demonstrate that MEDBind can improve downstream tasks by directly integrating CXR and ECG embeddings into a large language model for multimodal prompt tuning.

Graph · Sparsification · Graphics processing unit · Neural Networks · Networking ·

March 20, 2024

Unifews: Unified Entry-Wise Sparsification for Efficient Graph Neural Network

Ningyi Liao,Zihao Yu,Siqiang Luo

Graph Neural Networks (GNNs) have shown promising performance in various graph learning tasks, but at the cost of resource-intensive computations. The primary overhead of a GNN update stems from graph propagation and weight transformation, both involving operations on graph-scale matrices. Previous studies attempt to reduce the computational budget by leveraging graph-level or network-level sparsification techniques, resulting in a downsized graph or weights. In this work, we propose Unifews, which unifies the two operations in an entry-wise manner considering individual matrix elements, and conducts joint edge-weight sparsification to enhance learning efficiency. The entry-wise design of Unifews enables adaptive compression across GNN layers with progressively increased sparsity, and is applicable to a variety of architectural designs with on-the-fly operation simplification. Theoretically, we establish a novel framework to characterize sparsified GNN learning in view of a graph optimization process, and prove that Unifews effectively approximates the learning objective with bounded error and reduced computational load. We conduct extensive experiments to evaluate the performance of our method in diverse settings. Unifews is advantageous in jointly removing more than 90% of edges and weight entries with comparable or better accuracy than baseline models. The sparsification offers remarkable efficiency improvements including 10-20x matrix operation reduction and up to 100x acceleration in graph propagation time for the largest graph at the billion-edge scale.
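
As a rough illustration of what "entry-wise" sparsification means, the sketch below zeroes individual matrix entries and applies the same rule to a graph matrix and a weight matrix. The magnitude-threshold criterion, the function names, and the toy values are assumptions for illustration; the abstract does not state the rule Unifews actually uses.

```python
from typing import List

Matrix = List[List[float]]


def sparsify_entries(matrix: Matrix, threshold: float) -> Matrix:
    """Zero every entry whose magnitude is below `threshold` (assumed criterion)."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in matrix]


def sparsity(matrix: Matrix) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0.0)
    return zeros / total


# Toy normalized adjacency and weight matrices (made-up values).
adjacency = [[0.5, 0.05, 0.0], [0.05, 0.6, 0.3], [0.0, 0.3, 0.02]]
weights = [[0.9, -0.01], [0.04, -0.7], [0.2, 0.03]]

# The same entry-wise rule is applied to both the graph and the weights.
sparse_adj = sparsify_entries(adjacency, threshold=0.1)
sparse_w = sparsify_entries(weights, threshold=0.1)
print(sparsity(sparse_adj), sparsity(sparse_w))
```

Because the rule looks at single entries rather than whole edges-sets or layers, the threshold could in principle differ per layer, which is one way to picture the progressively increased sparsity the abstract mentions.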

Reducible · Model evaluation · Large language models · MoDELS · Language modeling ·

March 11, 2024

SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees

Saehan Jo,Immanuel Trummer

The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face challenges in choosing the appropriate LLM for their tasks, one that balances result quality with cost. We introduce SMART, Scaling Models Adaptively for Reduced Token Fees, a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality. It enables users to specify an accuracy constraint in terms of the equivalence of outputs to those of the most powerful LLM. SMART then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SMART employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level. SMART optimizes the tradeoff between profiling overheads and the anticipated cost savings resulting from profiling. Moreover, our approach significantly reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on three real-world datasets show that, based on OpenAI models, SMART achieves significant cost savings, up to 25.6x in comparison to GPT-4.
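
The profiling idea in this abstract can be sketched as follows. This is a simplification under stated assumptions, not SMART's algorithm: the toy "models" are plain Python functions, the costs are invented, and a plain empirical agreement rate stands in for any statistical guarantee. The function name `pick_cheapest` is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# (name, assumed cost per call, predict function); all hypothetical.
Model = Tuple[str, float, Callable[[str], str]]


def reference(text: str) -> str:
    """Stand-in for the most powerful (and most expensive) model."""
    return "pos" if "good" in text else "neg"


MODELS: List[Model] = [
    ("medium", 1.0, lambda t: "pos" if t.startswith("good") else "neg"),
    ("small", 0.1, lambda t: "pos"),
]

SAMPLES = ["good film", "bad film", "good food", "bad food"]


def pick_cheapest(
    models: List[Model],
    ref: Callable[[str], str],
    samples: List[str],
    min_agreement: float,
) -> Optional[str]:
    """Return the cheapest model whose outputs match the reference often enough."""
    ref_outputs = [ref(s) for s in samples]
    for name, _cost, predict in sorted(models, key=lambda m: m[1]):
        agree = sum(predict(s) == r for s, r in zip(samples, ref_outputs))
        if agree / len(samples) >= min_agreement:
            return name
    return None  # no cheaper model qualifies; fall back to the reference


print(pick_cheapest(MODELS, reference, SAMPLES, min_agreement=0.9))  # medium
```

Profiling itself costs reference-model calls, which is why the abstract frames the number of profiling samples as a tradeoff against the expected savings.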

Extensibility · Point cloud · Random sampling · Sample · state-of-the-art ·

November 25, 2019

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Qingyong Hu,Bo Yang,Linhai Xie,Stefano Rosa,Yulan Guo,Zhihua Wang,Niki Trigoni,Andrew Markham

from arxiv, Code and data are available at: //github.com/QingyongHu/RandLA-Net

We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained and operated on small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200x faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
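
The sampling step at the heart of this abstract is simple enough to sketch directly. The code below is an illustration only, not the RandLA-Net implementation: the function name, the downsampling ratio of 4, and the synthetic point cloud are assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def random_downsample(points: List[Point], ratio: int, seed: int = 0) -> List[Point]:
    """Keep len(points) // ratio points chosen uniformly at random, without replacement."""
    rng = random.Random(seed)
    # The choice never inspects coordinates, which is what makes it cheap and
    # also why important points can be dropped by chance.
    return rng.sample(points, len(points) // ratio)


# Synthetic cloud of 1000 points in the unit cube (made-up data).
gen = random.Random(42)
cloud = [(gen.random(), gen.random(), gen.random()) for _ in range(1000)]

stage1 = random_downsample(cloud, ratio=4)
stage2 = random_downsample(stage1, ratio=4)
print(len(cloud), len(stage1), len(stage2))  # 1000 250 62
```

Since nothing here compensates for points lost by chance, a network built on such sampling needs a separate mechanism to retain local geometry, which is the role the abstract assigns to its local feature aggregation module.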

3D reconstruction · 3D · Networks · Networking · Neural Networks ·

December 10, 2018

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder,Michael Oechsle,Michael Niemeyer,Sebastian Nowozin,Andreas Geiger

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
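
To illustrate the idea of a surface defined as a classifier's decision boundary, the sketch below uses a hand-written occupancy function for a sphere in place of a trained network. The function names, the sphere, and the sigmoid sharpness of 40 are assumptions for illustration; only the idea of querying a continuous function at any resolution comes from the abstract.

```python
import math
from typing import List


def occupancy(x: float, y: float, z: float, radius: float = 0.5) -> float:
    """Stand-in for a trained classifier: probability that (x, y, z) lies inside a sphere."""
    signed_dist = math.sqrt(x * x + y * y + z * z) - radius
    # Sigmoid of the negated, scaled distance: near 1 inside, near 0 outside,
    # exactly 0.5 on the surface (the decision boundary).
    return 1.0 / (1.0 + math.exp(40.0 * signed_dist))


def voxelize(resolution: int, threshold: float = 0.5) -> List[List[List[bool]]]:
    """Sample the continuous function on a resolution^3 grid over [-1, 1]^3."""
    step = 2.0 / (resolution - 1)
    coords = [-1.0 + i * step for i in range(resolution)]
    return [
        [[occupancy(x, y, z) > threshold for z in coords] for y in coords]
        for x in coords
    ]


def count_occupied(grid: List[List[List[bool]]]) -> int:
    return sum(cell for plane in grid for row in plane for cell in row)


# The same function can be sampled at any grid size; memory is spent only
# on the grid chosen at query time, not on the representation itself.
print(count_occupied(voxelize(5)), count_occupied(voxelize(9)))
```

The grid is produced only when a discrete output is needed; the representation itself is the function, so its storage cost does not grow with the resolution requested.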


