
Datasets · MoDELS · Biased · Vision · Learning ·

May 21, 2024

Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

Ziqin Lin,Heng Li,Zinan Li,Huazhu Fu,Jiang Liu

from arxiv, 10 pages, 6 figures

Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (ViT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is LFM more robust to image quality? Is LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we discovered that overall fine-tuning is an effective adapter for LFM to mitigate the impact of dataset quality issues.

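Related content

Datasets

A data set (also written dataset) is a collection of data.
A data set is usually presented in tabular form. Each column represents a particular variable, and each row corresponds to one member of the data set in question. The table lists a value for each variable, such as the height and weight of an object, or the values of random numbers. Each value is known as a datum. A data set may contain data for one or more members, one per row.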

Large language models · Performer · Correlation coefficient · Comprehensibility · Range ·

July 2, 2024

How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?

Ehsan Doostmohammadi,Oskar Holmström,Marco Kuhlmann

Work on instruction-tuned Large Language Models (LLMs) has used automatic methods based on text overlap and LLM judgments as cost-effective alternatives to human evaluation. In this paper, we perform a meta-evaluation of such methods and assess their reliability across a broad range of tasks. We observe that while automatic evaluation methods can approximate human ratings under specific conditions, their validity is highly context-dependent. Specifically, the simple ROUGE-L metric correlates well with human ratings for short-answer English tasks but is unreliable in free-form generation tasks and cross-lingual transfer. The effectiveness of the more advanced method of using GPT-4 as a judge diminishes significantly if reference answers are not included in the prompt, which is the scenario where this method has the potential to provide the most value compared to other metrics. Our findings enhance the understanding of how automatic methods should be applied and interpreted when developing and evaluating instruction-tuned LLMs.
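As a rough illustration of the text-overlap metric mentioned in this abstract, the sketch below computes a ROUGE-L style F1 score from the longest common subsequence (LCS) of two token lists. Lowercasing, whitespace tokenization, the F1 weighting, and the function names `lcs_length` and `rouge_l_f1` are assumptions made for this example; the abstract does not say which implementation or settings the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L style F1 between two strings (hypothetical helper).

    Assumption: tokens are lowercased and split on whitespace.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("Paris is the capital", "the capital is Paris"))
```

Because the score depends only on in-order token overlap with a reference, it is plausible that it tracks human ratings better for short answers than for free-form generations, which is the pattern the abstract reports.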

Diversity · MoDELS · Reducible · Language modeling · GPT3 ·

July 1, 2024

Does Writing with Language Models Reduce Content Diversity?

Vishakh Padmakumar,He He

from arxiv, ICLR 2024

Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups -- using a base LLM (GPT3), a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not the GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.

Generalization theory · INFORMS · Robustness · MoDELS · Inference ·

July 1, 2024

Probabilistic Test-Time Generalization by Variational Neighbor-Labeling

Sameer Ambekar,Zehao Xiao,Jiayi Shen,Xiantong Zhen,Cees G. M. Snoek

from arxiv, Accepted by CoLLAs 2024

This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains. We follow the strict separation of source training and target testing, but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem, by modeling pseudo labels as distributions, to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on seven widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.

Vision · MoDELS · Tokenizer · Language modeling · Reducible ·

July 1, 2024

Long Context Transfer from Language to Vision

Peiyuan Zhang,Kaichen Zhang,Bo Li,Guangtao Zeng,Jingkang Yang,Yuanhan Zhang,Ziyue Wang,Haoran Tan,Chunyuan Li,Ziwei Liu

from arxiv, Code, demo, and models are available at //github.com/EvolvingLMMs-Lab/LongVA

Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon long context transfer and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME among 7B-scale models by densely sampling more input frames. Our work is open-sourced at //github.com/EvolvingLMMs-Lab/LongVA.

MoDELS · Sora · Performer · INFORMS · state-of-the-art ·

June 27, 2024

What Matters in Detecting AI-Generated Videos like Sora?

Chirui Chang,Zhengzhe Liu,Xiaoyang Lyu,Xiaojuan Qi

Recent advancements in diffusion-based video generation have showcased remarkable results, yet the gap between synthetic and real-world videos remains under-explored. In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion. To achieve this, we train three classifiers using 3D convolutional networks, each targeting distinct aspects: vision foundation model features for appearance, optical flow for motion, and monocular depth for geometry. Each classifier exhibits strong performance in fake video detection, both qualitatively and quantitatively. This indicates that AI-generated videos are still easily detectable, and a significant gap between real and fake videos persists. Furthermore, utilizing the Grad-CAM, we pinpoint systematic failures of AI-generated videos in appearance, motion, and geometry. Finally, we propose an Ensemble-of-Experts model that integrates appearance, optical flow, and depth information for fake video detection, resulting in enhanced robustness and generalization ability. Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training. This suggests that the gap between real and fake videos can be generalized across various video generative models. Project page: //justin-crchang.github.io/3DCNNDetection.github.io/

Code · Projection · Language modeling · GitHub · MoDELS ·

June 27, 2024

Where Are Large Language Models for Code Generation on GitHub?

Xiao Yu,Lei Liu,Xing Hu,Jacky Wai Keung,Jin Liu,Xin Xia

The increasing use of Large Language Models (LLMs) in software development has garnered significant attention from researchers assessing the quality of the code they generate. However, much of the research focuses on controlled datasets such as HumanEval, which fail to adequately represent how developers actually utilize LLMs' code generation capabilities or clarify the characteristics of LLM-generated code in real-world development scenarios. To bridge this gap, our study investigates the characteristics of LLM-generated code and its corresponding projects hosted on GitHub. Our findings reveal several key insights: (1) ChatGPT and Copilot are the most frequently utilized for generating code on GitHub. In contrast, there is very little code generated by other LLMs on GitHub. (2) Projects containing ChatGPT/Copilot-generated code are often small and less known, led by individuals or small teams. Despite this, most projects are continuously evolving and improving. (3) ChatGPT/Copilot is mainly utilized for generating Python, Java, and TypeScript scripts for data processing and transformation. C/C++ and JavaScript code generation focuses on algorithm and data structure implementation and user interface code. Most ChatGPT/Copilot-generated code snippets are relatively short and exhibit low complexity. (4) Compared to human-written code, ChatGPT/Copilot-generated code exists in a small proportion of projects and generally undergoes fewer modifications. Additionally, modifications due to bugs are even fewer, ranging from just 3% to 8% across different languages. (5) Most comments on ChatGPT/Copilot-generated code lack detailed information, often only stating the code's origin without mentioning prompts, human modifications, or testing status. Based on these findings, we discuss the implications for researchers and practitioners.

MoDELS · Language modeling · Agent · Annotation · wikidata ·

June 27, 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Peter Hase,Thomas Hofweber,Xiang Zhou,Elias Stengel-Eskin,Mohit Bansal

from arxiv, 23 pages, 4 figures

The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model editing nonetheless demands a solution, since we need to be able to control the knowledge within language models. With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Many of these challenges are extremely difficult to address, e.g. determining far-reaching consequences of edits, labeling probabilistic entailments between facts, and updating beliefs of agent simulators. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent. This enables us to say exactly how belief revision in language models falls short of a desirable epistemic standard. We encourage further research exploring settings where such a gold standard can be compared against. Our code is publicly available at: //github.com/peterbhase/LLM-belief-revision

Knowledge · Distillation · Language modeling · MoDELS · Automator ·

June 27, 2024

Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?

Nirjhor Rouf,Fin Amin,Paul D. Franzon

from arxiv, 4 pages, 2 figures, 2 tables, The First IEEE International Workshop on LLM-Aided Design (LAD'24)

In this work, we present empirical results regarding the feasibility of using offline large language models (LLMs) in the context of electronic design automation (EDA). The goal is to investigate and evaluate the ability of a contemporary language model (Llama-2-7B) to function as a microelectronic Q&A expert, as well as its reasoning and generation capabilities in solving microelectronic-related problems. Llama-2-7B was tested across a variety of adaptation methods, including a novel low-rank knowledge distillation (LoRA-KD) scheme. Our experiments produce both qualitative and quantitative results.

Controller · Language modeling · MoDELS · Large language models · Diversity ·

June 27, 2024

Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?

Marcio Fonseca,Shay B. Cohen

from arxiv, ACL 2024 camera ready

In this work, we investigate the controllability of large language models (LLMs) on scientific summarization tasks. We identify key stylistic and content coverage factors that characterize different types of summaries such as paper reviews, abstracts, and lay summaries. By controlling stylistic features, we find that non-fine-tuned LLMs outperform humans in the MuP review generation task, both in terms of similarity to reference summaries and human preferences. Also, we show that we can improve the controllability of LLMs with keyword-based classifier-free guidance (CFG) while achieving lexical overlap comparable to strong fine-tuned baselines on arXiv and PubMed. However, our results also indicate that LLMs cannot consistently generate long summaries with more than 8 sentences. Furthermore, these models exhibit limited capacity to produce highly abstractive lay summaries. Although LLMs demonstrate strong generic summarization competency, sophisticated content control without costly fine-tuning remains an open problem for domain-specific applications.

Graphs · Neural Networks · state-of-the-art · SimPLe · Vectorization ·

October 1, 2018

How Powerful are Graph Neural Networks?

Keyulu Xu,Weihua Hu,Jure Leskovec,Stefanie Jegelka

Graph Neural Networks (GNNs) for representation learning of graphs broadly follow a neighborhood aggregation framework, where the representation vector of a node is computed by recursively aggregating and transforming feature vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties and limitations. Here, we present a theoretical framework for analyzing the expressive power of GNNs in capturing different graph structures. Our results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. We then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theoretical findings on a number of graph classification benchmarks, and demonstrate that our model achieves state-of-the-art performance.
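The neighborhood aggregation step described in this abstract can be illustrated with a toy example. The sketch below runs one aggregation round on two star graphs with scalar node features and compares a mean aggregator with a sum aggregator. The graphs, the feature values, and the function name `aggregate` are hypothetical choices for this example and are not taken from the paper. It only shows why some aggregators cannot separate simple structures; it does not implement the authors' architecture.

```python
def aggregate(features, adjacency, node, how="sum"):
    """One round of neighborhood aggregation for a single node.

    features:  dict mapping node id -> scalar feature
    adjacency: dict mapping node id -> list of neighbor ids
    how:       "sum" or "mean" (the aggregator being compared)
    """
    neighbors = [features[n] for n in adjacency[node]]
    if not neighbors:
        return 0.0
    total = sum(neighbors)
    return total if how == "sum" else total / len(neighbors)


# Two star graphs: center node 0 with 2 leaves versus 4 leaves.
# Every node carries the same feature value (an assumption of this toy setup).
small_adj = {0: [1, 2], 1: [0], 2: [0]}
large_adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
small_feat = {n: 1.0 for n in small_adj}
large_feat = {n: 1.0 for n in large_adj}

if __name__ == "__main__":
    for how in ("mean", "sum"):
        a = aggregate(small_feat, small_adj, 0, how)
        b = aggregate(large_feat, large_adj, 0, how)
        print(how, a, b, "distinguishes" if a != b else "cannot distinguish")
```

With identical features, the mean over two neighbors equals the mean over four, so the two center nodes receive the same representation, whereas the sum keeps the neighborhood sizes apart.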


