This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), focusing on Baichuan-7B and Baichuan-33B, two models from an advanced bilingual (Chinese and English) LLM series. We define and explore the models' cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, similar to neural signal processing in human cognition. Expressive capability is defined as the model's ability to produce word-level output. Our findings reveal a sequential development pattern: cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, which bridge the gap between cognitive and expressive capabilities. This research reveals the potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of LLM training processes.
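As a rough illustration of why repeated sampling can narrow the gap between what a model represents internally and what it outputs, the sketch below computes the chance that at least one of k samples is correct. It assumes independent samples with a fixed per-sample success rate p, which is an idealization; the function name `success_at_k` is hypothetical, and the abstract does not describe the paper's actual sampling protocol.

```python
def success_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct.

    p is the (assumed) per-sample success probability; k is the sample count.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # All k samples fail with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(k, success_at_k(0.2, k))
```

Under this assumption, a model whose single-sample success rate is low can still surface a correct answer fairly often once several samples are drawn, provided the needed information is present in the model at all.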

Related content

Cognition

Cognition: International Journal of Cognitive Science. Publisher: Elsevier.

Round · MoDELS · React · Integration · Normalized

December 20, 2024

Simulation of Crowd Egress with Environmental Stressors

Peng Wang, Xiaoda Wang, Peter Luh, Christian Wilkie, Timo Korhonen, Neal Olderman

from arXiv, 27 pages, 19 figures

This article introduces a modeling framework to characterize evacuee response to environmental stimuli during emergency egress. The model is developed to be consistent with stress theory, which explains how an organism reacts to environmental stressors. We integrate the theory into the well-known social-force model and develop a framework to simulate crowd evacuation behavior in multi-compartment buildings. Our method serves as a theoretical basis for studying crowd movement at bottlenecks and for simulating evacuees' herding and way-finding behavior in normal and hazardous conditions. Pre-movement behavior is also briefly investigated using opinion dynamics with a social group model. The algorithms have been partly tested in FDS+EVAC as well as in our simulation platform crowdEgress.
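For readers unfamiliar with the social-force model that the abstract builds on, the following is a minimal sketch of one update step for a single pedestrian: a driving term that relaxes the velocity toward a desired speed, plus an exponential repulsion from neighbors. It does not include the paper's stress-theory extension, and every name and parameter value here (`social_force_step`, `v0`, `tau`, `A`, `B`, `radius`) is an illustrative assumption rather than something taken from the paper, FDS+EVAC, or crowdEgress.

```python
import math


def social_force_step(pos, vel, goal, others, dt=0.05,
                      v0=1.3, tau=0.5, A=2.0, B=0.3, radius=0.3):
    """Advance one pedestrian (unit mass) by one explicit Euler step.

    pos, vel, goal are (x, y) tuples; others is a list of neighbor positions.
    All parameter defaults are hypothetical placeholder values.
    """
    # Driving term: relax toward desired speed v0 in the goal direction.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy) or 1.0  # avoid division by zero at the goal
    ax = (v0 * gx / dist_goal - vel[0]) / tau
    ay = (v0 * gy / dist_goal - vel[1]) / tau

    # Repulsive term: exponential push away from each neighbor.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # direction undefined for coincident positions
        push = A * math.exp((2 * radius - d) / B)
        ax += push * dx / d
        ay += push * dy / d

    new_vel = (vel[0] + ax * dt, vel[1] + ay * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel


if __name__ == "__main__":
    free = social_force_step((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [])
    blocked = social_force_step((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [(0.5, 0.0)])
    print(free, blocked)
```

In this sketch a neighbor standing directly between the pedestrian and the goal reduces the forward acceleration, which is the basic mechanism behind slowdowns at bottlenecks.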

MoDELS · Design · Generative model · Discretization · Cosine

December 19, 2024

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lanmiao Liu, Martin Beneš, Albert Ali Salah, Itir Onal Ertugrul

from arXiv, 23 pages

This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to high-resolution generation without using the latent diffusion paradigm. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why "image diffusion can be seen as spectral autoregression", bridging the gap between diffusion and autoregressive models. The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space. The code is at //github.com/forever208/DCTdiff.
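To make the idea of a "DCT space" concrete, here is a minimal pure-Python sketch of the orthonormal one-dimensional DCT-II and its inverse. It is only a textbook illustration, not the DCTdiff pipeline: the function names `dct` and `idct` are chosen for this example, and for images the same transform is typically applied along rows and then columns.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * total)
    return out


def idct(c):
    """Inverse of dct(): orthonormal DCT-III."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


if __name__ == "__main__":
    signal = [3.0, 3.0, 3.0, 3.0]
    coeffs = dct(signal)   # a constant signal has energy only in coefficient 0
    print(coeffs)
    print(idct(coeffs))    # reconstructs the signal up to rounding error
```

Smooth signals concentrate most of their energy in the low-frequency coefficients, which is the general property that makes frequency-space representations compact.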

Sparse · Point cloud · Learning · Functional · Supervision

December 19, 2024

Point Cloud Semantic Segmentation with Sparse and Inhomogeneous Annotations

Zhiyi Pan, Nan Zhang, Wei Gao, Shan Liu, Ge Li

Utilizing uniformly distributed sparse annotations, weakly supervised learning alleviates the heavy reliance on fine-grained annotations in point cloud semantic segmentation tasks. However, few works discuss the inhomogeneity of sparse annotations, although it is common in real-world scenarios. Therefore, this work introduces the probability density function into the gradient sampling approximation method to qualitatively analyze the impact of annotation sparsity and inhomogeneity under weakly supervised learning. Based on our analysis, we propose an Adaptive Annotation Distribution Network (AADNet) capable of robust learning on arbitrarily distributed sparse annotations. Specifically, we propose a label-aware point cloud downsampling strategy to increase the proportion of annotations involved in the training stage. Furthermore, we design the multiplicative dynamic entropy as the gradient calibration function to mitigate the gradient bias caused by non-uniformly distributed sparse annotations and explicitly reduce the epistemic uncertainty. Without any prior restrictions or additional information, our proposed method achieves comprehensive performance improvements at multiple label rates and different annotation distributions.

Representation · MoDELS · Analysis · BERT · Networking

December 19, 2024

Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT

Hassane Kissane, Achim Schilling, Patrick Krauss

This study investigates the internal representations of verb-particle combinations within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic nuances at different neural network layers. Employing the BERT architecture, we analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'. Our methodology includes a detailed dataset preparation from the British National Corpus, followed by extensive model training and output analysis through techniques like multi-dimensional scaling (MDS) and generalized discrimination value (GDV) calculations. Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories. These findings challenge the conventional uniformity assumed in neural network processing of linguistic elements and suggest a complex interplay between network architecture and linguistic representation. Our research contributes to a better understanding of how deep learning models comprehend and process language, offering insights into the potential and limitations of current neural approaches to linguistic analysis. This study not only advances our knowledge in computational linguistics but also prompts further research into optimizing neural architectures for enhanced linguistic precision.

Controller · Learning · Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute · MoDELS · Optimizer

December 19, 2024

Learning to Generate Research Idea with Dynamic Control

Ruochen Li, Liqiang Jing, Chi Han, Jiawei Zhou, Xinya Du

The rapid advancements in large language models (LLMs) have demonstrated their potential to accelerate scientific discovery, particularly in automating the process of research ideation. LLM-based systems have shown promise in generating hypotheses and research ideas. However, current approaches predominantly rely on prompting pre-trained models, which limits their ability to optimize generated content effectively. They also lack the capability to handle the complex interdependence and inherent restrictions among novelty, feasibility, and effectiveness, which remains challenging because of the trade-offs among these dimensions, such as the innovation-feasibility conflict. To address these limitations, we propose, for the first time, fine-tuning LLMs to be better idea proposers and introduce a novel framework that employs a two-stage approach combining Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL). In the SFT stage, the model learns foundational patterns from pairs of research papers and follow-up ideas. In the RL stage, multi-dimensional reward modeling, guided by fine-grained feedback, evaluates and optimizes the generated ideas across key metrics. Dimensional controllers enable dynamic adjustment of generation, while a sentence-level decoder ensures context-aware emphasis during inference. Our framework provides a balanced approach to research ideation, achieving high-quality outcomes by dynamically navigating the trade-offs among novelty, feasibility, and effectiveness.

MoDELS · Vectorization · Joint distribution · Fidelity · state-of-the-art

December 18, 2024

Autoregressive Video Generation without Vector Quantization

Haoge Deng, Ting Pan, Haiwen Diao, Zhengxiong Luo, Yufeng Cui, Huchuan Lu, Shiguang Shan, Yonggang Qi, Xinlong Wang

from arXiv, 22 pages, 16 figures

This paper presents a novel approach that enables autoregressive video generation with high efficiency. We propose to reformulate the video generation problem as a non-quantized autoregressive modeling of temporal frame-by-frame prediction and spatial set-by-set prediction. Unlike raster-scan prediction in prior autoregressive models or joint distribution modeling of fixed-length tokens in diffusion models, our approach maintains the causal property of GPT-style models for flexible in-context capabilities, while leveraging bidirectional modeling within individual frames for efficiency. With the proposed approach, we train a novel video autoregressive model without vector quantization, termed NOVA. Our results demonstrate that NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity, i.e., 0.6B parameters. NOVA also outperforms state-of-the-art image diffusion models in text-to-image generation tasks, with a significantly lower training cost. Additionally, NOVA generalizes well across extended video durations and enables diverse zero-shot applications in one unified model. Code and models are publicly available at //github.com/baaivision/NOVA.

INFORMS · Guidance · Relevant features · Processing (programming language) · Correlation coefficient

December 18, 2024

Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Xingjian Wang, Li Chai

from arXiv, 14 pages, 5 figures

In-the-wild dynamic facial expression recognition (DFER) faces a significant challenge in recognizing emotion-related expressions, which are often temporally and spatially diluted by emotion-irrelevant expressions and global context. Most prior DFER methods directly utilize coupled spatiotemporal representations that may incorporate weakly relevant features with emotion-irrelevant context bias. Several DFER methods highlight dynamic information, but they follow explicit guidance that may be vulnerable to irrelevant motion. In this paper, we propose a novel Implicit Facial Dynamics Disentanglement framework (IFDD). By extending the wavelet lifting scheme into a fully learnable framework, IFDD disentangles emotion-related dynamic information from emotion-irrelevant global context in an implicit manner, i.e., without explicit operations or external guidance. The disentanglement process contains two stages. The first is the Inter-frame Static-dynamic Splitting Module (ISSM) for rough disentanglement estimation, which explores inter-frame correlation to generate content-aware splitting indexes on the fly. We utilize these indexes to split frame features into two groups, one with greater global similarity and the other with more unique dynamic features. The second stage is the Lifting-based Aggregation-Disentanglement Module (LADM) for further refinement. LADM first aggregates the two groups of features from ISSM to obtain fine-grained global context features with an updater, and then disentangles emotion-related facial dynamic features from the global context with a predictor. Extensive experiments on in-the-wild datasets demonstrate that IFDD outperforms prior supervised DFER methods with higher recognition accuracy and comparable efficiency. Code is available at //github.com/CyberPegasus/IFDD.
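The split, predict, and update vocabulary in this abstract comes from the classical wavelet lifting scheme. The sketch below shows the fixed (non-learnable) Haar variant on a list of numbers, purely as background: it uses the conventional predict-then-update order, whereas the abstract describes LADM as applying an updater before a predictor, and IFDD replaces these fixed operators with learned ones. The function names are chosen for this example.

```python
def lifting_forward(x):
    """One level of Haar lifting: split, predict, update."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    # Split: separate even-indexed and odd-indexed samples.
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbor; keep the residual.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: adjust the even samples so they carry the pairwise mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo lifting_forward by reversing each step in the opposite order."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


if __name__ == "__main__":
    approx, detail = lifting_forward([2, 4, 6, 6])
    print(approx, detail)
    print(lifting_inverse(approx, detail))
```

Because every step is undone exactly by its mirror step, the decomposition stays invertible whatever predictor and updater are used, which is the structural property that makes the scheme a natural candidate for learnable operators.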

Data visualization · Comprehensibility · Guidance · Same · Human-computer interaction

December 17, 2024

Towards Understanding the Impact of Guidance in Data Visualization Systems for Domain Experts

Sherry Qiu, Holly Rushmeier, Kim R. M. Blenman

from arXiv, Poster presented at IEEE VIS 2024: //ieeevis.org/year/2024/program/poster_v-vis-posters-1030.html, Video demonstration: //blenmaninnovationgroup.org/databases-tools/

Guided data visualization systems are highly useful for domain experts to highlight important trends in their large-scale and complex datasets. However, more work is needed to understand the impact of guidance on interpreting data visualizations as well as on the resulting use of visualizations when communicating insights. We conducted two user studies with domain experts and found that experts benefit from a guided coarse-to-fine structure when using data visualization systems, as this is the same structure in which they communicate findings.

MoDELS · Pivotal (company) · General intelligence · Language modeling · Multimodal

January 25, 2024

A Survey of Reasoning with Foundation Models

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

from arXiv, 20 figures, 160 pages, 750+ references, project page: //github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.

Multimodal · Sentiment analysis · MoDELS · AIM · Tumblr

May 25, 2018

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Anthony Hu, Seth Flaxman

from arXiv, Accepted as a conference paper at KDD 2018

We propose a novel approach to multimodal sentiment analysis using deep neural networks that combine visual analysis and natural language processing. Our goal differs from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images, both photographs and memes, on social networks.