
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open-ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word in a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs, all of which are thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.

Related Content

Cognition


Cognition: International Journal of Cognitive Science. Publisher: Elsevier.


July 3, 2024

CaLMQA: Exploring culturally specific long-form question answering across 23 languages

Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi

From arXiv, 39 pages, 17 figures. Code and data available at //github.com/2015aroras/CaLMQA. Revised argument in Section 4; results unchanged.

Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages. We define culturally specific questions as those uniquely or more likely to be asked by people from cultures associated with the question's language. We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-resourced, rarely-studied languages such as Fijian and Kirundi. Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers. We automatically evaluate a suite of open- and closed-source models on CaLMQA by detecting incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Lastly, we perform human evaluation on a subset of models and languages. Manual evaluation reveals that model performance is significantly worse for culturally specific questions than for culturally agnostic questions. Our findings highlight the need for further research in non-English LFQA and provide an evaluation framework.
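The abstract says that answers are screened automatically for token repetitions, but it does not give the criterion. The sketch below rests on stated assumptions. The hypothetical helper `has_token_repetition` flags an answer when any n-gram of whitespace-separated tokens occurs more than a threshold number of times. The default `n` and `max_repeats` values are illustrative only and are not CaLMQA's settings.

```python
from collections import Counter


def has_token_repetition(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Hypothetical heuristic: flag text in which some n-gram of
    whitespace-separated tokens occurs more than `max_repeats` times."""
    tokens = text.split()
    if len(tokens) < n:
        return False
    # Count every contiguous n-gram of tokens.
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return max(ngrams.values()) > max_repeats


print(has_token_repetition("the cat sat on the mat"))  # False
print(has_token_repetition("a b c d " * 10))           # True
```

Whitespace tokenization is a simplification. Languages written without spaces between words would need a subword or character-level tokenizer instead.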


July 2, 2024

Multi-topic belief formation through bifurcations over signed social networks

Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

From arXiv, 16 pages, 7 figures.

We propose and analyze a nonlinear dynamic model of continuous-time multi-dimensional belief formation over signed social networks. Our model accounts for the effects of a structured belief system, self-appraisal, internal biases, and various sources of cognitive dissonance posited by recent theories in social psychology. We prove that agents become opinionated as a consequence of a bifurcation. We analyze how the balance of social network effects in the model controls the nature of the bifurcation and, therefore, the belief-forming limit-set solutions. Our analysis provides constructive conditions on how multi-stable network belief equilibria and belief oscillations emerging at a belief-forming bifurcation depend on the communication network graph and belief system network graph. Our model and analysis provide new theoretical insights on the dynamics of social systems and a new principled framework for designing decentralized decision-making on engineered networks in the presence of structured relationships among alternatives.
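The claim that "agents become opinionated as a consequence of a bifurcation" can be made concrete with a deliberately simplified sketch. It differs from the paper's model in three ways: it uses one topic instead of many, it drops the belief-system network and bias terms, and it assumes dynamics of the form dz_i/dt = -d z_i + u tanh(alpha z_i + gamma sum_j a_ij z_j), where a_ij is a signed adjacency matrix.

Linearizing at the neutral state z = 0 gives eigenvalues -d + u(alpha + gamma lambda), where lambda ranges over the eigenvalues of the adjacency matrix. The neutral state therefore loses stability once u exceeds d / (alpha + gamma lambda_max). With the parameters below, that threshold is u = 1. The helper name `simulate` is hypothetical.

```python
import math


def simulate(A, u, d=1.0, alpha=0.5, gamma=0.5, z0=None, dt=0.01, steps=5000):
    """Forward-Euler integration of an assumed single-topic opinion model
        dz_i/dt = -d*z_i + u*tanh(alpha*z_i + gamma*sum_j A[i][j]*z_j),
    where A is a signed adjacency matrix and u is the bifurcation parameter."""
    n = len(A)
    # Small default perturbation of the neutral state z = 0.
    z = list(z0) if z0 is not None else [0.01] + [0.0] * (n - 1)
    for _ in range(steps):
        drive = [alpha * z[i] + gamma * sum(A[i][j] * z[j] for j in range(n))
                 for i in range(n)]
        z = [z[i] + dt * (-d * z[i] + u * math.tanh(drive[i])) for i in range(n)]
    return z


# Two agents joined by a single antagonistic (negative) tie.
A = [[0.0, -1.0], [-1.0, 0.0]]
below = simulate(A, u=0.5)  # u < 1: the perturbation decays, both agents stay neutral
above = simulate(A, u=2.0)  # u > 1: agents settle near +1.9 and -1.9
print(below, above)
```

Because the tie is negative, the unstable direction is (1, -1), so the two agents settle on opinions of opposite sign. With a positive tie, the unstable direction would be (1, 1) and they would agree instead.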


July 2, 2024

CausalMetaR: An R package for performing causally interpretable meta-analyses

Guanbo Wang, Sean McGrath, Yi Lian

Researchers would often like to leverage data from a collection of sources (e.g., primary studies in a meta-analysis) to estimate causal effects in a target population of interest. However, traditional meta-analytic methods do not produce causally interpretable estimates for a well-defined target population. In this paper, we present the CausalMetaR R package, which implements efficient and robust methods to estimate causal effects in a given internal or external target population using multi-source data. The package includes estimators of average and subgroup treatment effects for the entire target population. To produce efficient and robust estimates of causal effects, the package implements doubly robust and non-parametric efficient estimators and supports using flexible data-adaptive methods (e.g., machine learning techniques) and cross-fitting techniques to estimate the nuisance models (e.g., the treatment model and the outcome model). We describe the key features of the package and demonstrate its use through an example.
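CausalMetaR is an R package, and its interface is not reproduced here. To show what "doubly robust" means, the Python sketch below implements the standard augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect in a single population, taking fitted nuisance functions as inputs.

Several things here are assumptions rather than the package's method:
- The helper name `aipw_ate` is hypothetical.
- The simulated data are hypothetical.
- Cross-fitting is omitted.
- The package's multi-source estimators for external target populations are more involved than this.

```python
import random


def aipw_ate(data, propensity, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of E[Y(1) - Y(0)].
    data: list of (x, a, y) tuples with binary treatment a.
    propensity(x): estimated P(A=1 | X=x); mu1(x), mu0(x): estimated outcome means."""
    total = 0.0
    for x, a, y in data:
        e = propensity(x)
        m1, m0 = mu1(x), mu0(x)
        # Outcome-model contrast plus inverse-probability-weighted residual corrections.
        total += (m1 - m0
                  + a * (y - m1) / e
                  - (1 - a) * (y - m0) / (1 - e))
    return total / len(data)


# Hypothetical simulation: X ~ U(0,1), P(A=1|X) = 0.2 + 0.6X, Y = 2A + 3X + N(0,1).
# Treated units tend to have larger X (confounding); the true ATE is 2.
rng = random.Random(0)
data = []
for _ in range(200_000):
    x = rng.random()
    a = 1 if rng.random() < 0.2 + 0.6 * x else 0
    y = 2.0 * a + 3.0 * x + rng.gauss(0.0, 1.0)
    data.append((x, a, y))


def true_ps(x):
    return 0.2 + 0.6 * x


# Correct propensity model, deliberately wrong (zero) outcome model.
est_ps_only = aipw_ate(data, true_ps, lambda x: 0.0, lambda x: 0.0)
# Deliberately wrong (constant) propensity model, correct outcome model.
est_outcome_only = aipw_ate(data, lambda x: 0.5, lambda x: 2.0 + 3.0 * x, lambda x: 3.0 * x)
print(round(est_ps_only, 2), round(est_outcome_only, 2))  # both close to 2
```

Each estimate uses one deliberately wrong nuisance model and still lands near the true effect of 2. This is the double-robustness property: the estimator remains consistent if either the propensity model or the outcome model is correct.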


July 1, 2024

Relevance of the Basset history term for Lagrangian particle dynamics

Julio Urizarna-Carasa, Daniel Ruprecht, Alexandra von Kameke, Kathrin Padberg-Gehle

The movement of small but finite spherical particles in a fluid can be described by the Maxey-Riley equation (MRE) if they are too large to be considered passive tracers. The MRE contains an integral term, the Basset history term (BHT), which models wake effects and causes the force acting on a particle at a given time to depend on its full past trajectory. The history term complicates the numerical solution of the MRE and is therefore often neglected, despite both numerical and experimental evidence that its effects are generally not negligible. By numerically computing the trajectories of a large number of particles in different flow fields, with and without the history term, we investigate its impact on the large-scale Lagrangian dynamics of simulated particles. We show that for moderate to large Stokes numbers, ignoring the history term leads to significant differences in clustering patterns. Furthermore, we compute finite-time Lyapunov exponents and show that, even for small particles, ignoring the BHT can cause significant differences in the resulting scalar field, in particular if the underlying flow is turbulent.
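The form of the history term shows why it is costly. Up to a physical prefactor, omitted here, it is an integral over the particle's past of the form H(t) = integral from 0 to t of w'(s) / sqrt(t - s) ds, where w is the relative velocity between fluid and particle. The kernel is singular at s = t, and every new time step needs the whole past trajectory. A direct evaluation therefore costs O(N) per step and O(N^2) per trajectory.

The sketch below is a simple quadrature that assumes w' is constant on each step and integrates the kernel exactly on each subinterval. It is not the solver used in the paper, and the helper name `history_integral` is hypothetical.

```python
import math


def history_integral(dw_ds, t_grid):
    """Approximate H(t) = integral_0^t w'(s) / sqrt(t - s) ds at t = t_grid[-1].
    dw_ds[k] is the (assumed constant) derivative of w on [t_grid[k], t_grid[k+1]].
    The singular kernel is integrated exactly on each subinterval:
        integral_{s_k}^{s_{k+1}} ds / sqrt(t - s) = 2 * (sqrt(t - s_k) - sqrt(t - s_{k+1}))."""
    t = t_grid[-1]
    total = 0.0
    # One pass over the entire past: this is the O(N)-per-step cost.
    for k in range(len(t_grid) - 1):
        weight = 2.0 * (math.sqrt(t - t_grid[k]) - math.sqrt(t - t_grid[k + 1]))
        total += dw_ds[k] * weight
    return total


# Check case: w(s) = s, so w'(s) = 1 and the exact value is 2 * sqrt(t).
t_grid = [k * 0.01 for k in range(101)]
print(history_integral([1.0] * 100, t_grid))  # approximately 2.0
```

For a constant derivative, the per-interval weights telescope, so the scheme reproduces 2 * sqrt(t) up to rounding. For general w it is only an approximation.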


July 14, 2023

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji

From arXiv, work in progress.

Human intelligence thrives on the concept of cognitive synergy, where collaboration and information integration among different cognitive processes yield superior outcomes compared to individual cognitive processes in isolation. Although Large Language Models (LLMs) have demonstrated promising performance as general task-solving agents, they still struggle with tasks that require intensive domain knowledge and complex reasoning. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist refers to an intelligent agent that collaborates with multiple minds, combining their individual strengths and knowledge, to enhance problem-solving and overall performance in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. We have discovered that assigning multiple, fine-grained personas in LLMs elicits better problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, SPP effectively elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities. Code, data, and prompts can be found at: //github.com/MikeWangWZHL/Solo-Performance-Prompting.git.


February 10, 2022

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

Nan Wu, Stanisław Jastrzębski, Kyunghyun Cho, Krzysztof J. Geras

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
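The abstract defines the conditional utilization rate as the accuracy gained when the model has access to a modality in addition to another one. A minimal sketch of that arithmetic follows, taking the three accuracies as given. The function names and numbers are hypothetical. The sketch uses the plain difference in accuracy, whereas the paper may normalize it.

```python
def conditional_utilization(acc_both: float, acc_without: float) -> float:
    """Accuracy gained from modality m given the other modality:
    acc_both uses both modalities; acc_without uses only the other one."""
    return acc_both - acc_without


def utilization_imbalance(acc_both: float, acc_only_m1: float, acc_only_m2: float) -> float:
    """u(m1 | m2) - u(m2 | m1); a large magnitude means one modality dominates."""
    u_m1_given_m2 = conditional_utilization(acc_both, acc_only_m2)
    u_m2_given_m1 = conditional_utilization(acc_both, acc_only_m1)
    return u_m1_given_m2 - u_m2_given_m1


# Made-up accuracies: both modalities 0.90, m1 alone 0.88, m2 alone 0.60.
print(utilization_imbalance(0.90, 0.88, 0.60))  # about 0.28
```

In this made-up example, adding m1 to m2 gains 0.30 accuracy, while adding m2 to m1 gains only 0.02. The model leans on m1 and under-uses m2, which is the kind of imbalance the abstract describes.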


October 11, 2021

Advances in Multi-turn Dialogue Comprehension: A Survey

Zhuosheng Zhang, Hai Zhao

From arXiv, arXiv admin note: substantial text overlap with arXiv:2103.03125.

Training machines to understand natural language and interact with humans is an elusive and essential task of artificial intelligence. A diversity of dialogue systems has been designed with the rapid development of deep learning techniques, especially the recent pre-trained language models (PrLMs). Among these studies, a fundamental yet challenging task is dialogue comprehension, which aims to teach machines to read and comprehend the dialogue context before responding. In this paper, we review previous methods for the dialogue comprehension task from the technical perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling. In addition, we categorize the dialogue-related pre-training techniques that are employed to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances of recent years and point out the lessons from the empirical analysis and the prospects toward a new research frontier.


January 31, 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste

From arXiv, 90 pages, 26 figures.

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similar to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
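Among the removal criteria such a survey covers, one-shot magnitude pruning is about the simplest: drop the weights with the smallest absolute values. The sketch below applies it to a flat list of weights. The helper name `magnitude_prune` is hypothetical. Practical frameworks apply the same idea to tensors and often prune gradually during training rather than in a single shot.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value.
    Returns (pruned_weights, mask), where mask[i] is 1 if weight i is kept."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by magnitude, smallest first; the stable sort breaks ties by index.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


pruned, mask = magnitude_prune([0.5, -0.1, 0.05, -0.8, 0.3, 0.0], sparsity=0.5)
print(mask)  # [1, 0, 0, 1, 1, 0]
```

Returning the mask alongside the weights lets a training loop keep the pruned entries at zero during subsequent fine-tuning.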


November 6, 2019

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Ruochi Zhang, Yuesong Zou, Jian Ma

Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
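A key property named in the abstract is handling hyperedges of variable size. Self-attention over the set of node embeddings in a candidate hyperedge provides this directly: it accepts any number of nodes, and when followed by a permutation-invariant pooling step, it yields a score that does not depend on node order.

The sketch below illustrates only that property and is not Hyper-SAGNN's published architecture:
- The learned query, key, and value projections are omitted (identity projections are assumed).
- The scoring head (mean pooling, weights `w`, sigmoid) is hypothetical.
- The node embeddings are made up.

```python
import math


def self_attention(nodes):
    """Single-head dot-product self-attention with identity projections.
    nodes: list of equal-length embedding vectors (any count >= 1)."""
    d = len(nodes[0])
    out = []
    for q in nodes:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in nodes]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        out.append([sum(e / z * v[j] for e, v in zip(exps, nodes)) for j in range(d)])
    return out


def hyperedge_score(nodes, w):
    """Hypothetical head: mean-pool the attended embeddings, then sigmoid(w . pooled)."""
    attended = self_attention(nodes)
    pooled = [sum(vec[j] for vec in attended) / len(attended) for j in range(len(w))]
    return 1.0 / (1.0 + math.exp(-sum(wj * pj for wj, pj in zip(w, pooled))))


emb = {"v1": [1.0, 0.0, 0.0], "v2": [0.0, 1.0, 0.0],
       "v3": [0.0, 0.0, 1.0], "v4": [1.0, 1.0, 0.0]}
w = [0.5, -0.2, 0.1]  # hypothetical scoring weights
s_123 = hyperedge_score([emb["v1"], emb["v2"], emb["v3"]], w)
s_312 = hyperedge_score([emb["v3"], emb["v1"], emb["v2"]], w)  # same hyperedge, reordered
s_1234 = hyperedge_score([emb[k] for k in ("v1", "v2", "v3", "v4")], w)  # larger hyperedge
print(s_123, s_312, s_1234)
```

Self-attention is permutation-equivariant and mean pooling is permutation-invariant. Reordering the nodes therefore changes the score only by floating-point rounding, and hyperedges of size 2, 3, or 4 go through the same code path.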


February 16, 2018

Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing

K M Annervaz, Somnath Basu Roy Chowdhury, Ambedkar Dukkipati

Machine learning has been the quintessential solution for many AI problems, but learning still depends heavily on the specific training data. Some learning models can incorporate prior knowledge in a Bayesian setup, but these models cannot access organised world knowledge on demand. In this work, we propose to enhance learning models with world knowledge in the form of Knowledge Graph (KG) fact triples for Natural Language Processing (NLP) tasks. Our aim is to develop a deep learning model that uses an attention mechanism to extract relevant prior support facts from knowledge graphs depending on the task. We introduce a convolution-based model for learning representations of knowledge graph entity and relation clusters in order to reduce the attention space. We show that the proposed method scales well with the amount of prior information that has to be processed and can be applied to any generic NLP task. Using this method, we show significant performance improvements for text classification on the News20 and DBPedia datasets and for natural language inference on the Stanford Natural Language Inference (SNLI) dataset. We also demonstrate that a deep learning model can be trained well with substantially less labeled training data when it has access to organised world knowledge in the form of a knowledge graph.
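The abstract describes selecting relevant KG facts with an attention mechanism. The sketch below shows generic soft attention over fact embeddings given a text encoding, under stated assumptions:
- Each fact triple is already embedded as a vector.
- The query vector comes from some text encoder.
- The helper name `attend_to_facts`, the triples, and all vectors are hypothetical.

The paper's convolution-based cluster representations, which shrink the attention space, are not modeled here.

```python
import math


def attend_to_facts(query, fact_vectors):
    """Soft attention over KG fact embeddings.
    Returns (weights, context): weights = softmax(query . fact), and context is
    the attention-weighted sum of the fact vectors."""
    scores = [sum(q * f for q, f in zip(query, fact)) for fact in fact_vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    context = [sum(w * fact[j] for w, fact in zip(weights, fact_vectors))
               for j in range(dim)]
    return weights, context


facts = {
    "(Paris, capital_of, France)": [1.0, 0.0],
    "(Seine, flows_through, Paris)": [0.6, 0.4],
    "(Tokyo, capital_of, Japan)": [0.0, 1.0],
}
query = [2.0, 0.0]  # hypothetical encoding of a sentence about France
weights, context = attend_to_facts(query, list(facts.values()))
for name, wgt in zip(facts, weights):
    print(f"{wgt:.3f}  {name}")
```

The fact most aligned with the query receives the largest weight. The context vector, a convex combination of fact embeddings, is what would be fed back into the downstream NLP model.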