苍井空无码免费换线-日韩精品人妻中文字幕有码网址

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how will existing MLLMs perform when faced with counterfactual questions? To answer this question, we first curate a novel \textbf{C}ounter\textbf{F}actual \textbf{M}ulti\textbf{M}odal reasoning benchmark, abbreviated as \textbf{CFMM}, to systematically assess the counterfactual reasoning capabilities of MLLMs. Our CFMM comprises six challenging tasks, each including hundreds of carefully human-labeled and GPT-generated counterfactual questions, to evaluate MLLM's counterfactual reasoning capabilities across diverse aspects. Through experiments, interestingly, we find that existing MLLMs prefer to believe what they see, but ignore the counterfactual presuppositions presented in the question, thereby leading to inaccurate responses. Furthermore, we evaluate a wide range of prevalent MLLMs on our proposed CFMM. The significant gap between their performance on our CFMM and that on several VQA benchmarks indicates that there is still considerable room for improvement in existing MLLMs toward approaching human-level intelligence. On the other hand, through boosting MLLMs performances on our CFMM in the future, potential avenues toward developing MLLMs with advanced intelligence can be explored.

相關內容

語言模型化

關注 9

Learning · 回合 · Processing（編程語言） · 強化學習 · 情景 ·

2024 年 10 月 27 日

MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning

Florian Felten,Umut Ucak,Hicham Azmani,Gao Peng,Willem R?pke,Hendrik Baier,Patrick Mannion,Diederik M. Roijers,Jordan K. Terry,El-Ghazali Talbi,Grégoire Danoy,Ann Nowé,Roxana R?dulescu

Many challenging tasks such as managing traffic systems, electricity grids, or supply chains involve complex decision-making processes that must balance multiple conflicting objectives and coordinate the actions of various independent decision-makers (DMs). One perspective for formalising and addressing such tasks is multi-objective multi-agent reinforcement learning (MOMARL). MOMARL broadens reinforcement learning (RL) to problems with multiple agents each needing to consider multiple objectives in their learning process. In reinforcement learning research, benchmarks are crucial in facilitating progress, evaluation, and reproducibility. The significance of benchmarks is underscored by the existence of numerous benchmark frameworks developed for various RL paradigms, including single-agent RL (e.g., Gymnasium), multi-agent RL (e.g., PettingZoo), and single-agent multi-objective RL (e.g., MO-Gymnasium). To support the advancement of the MOMARL field, we introduce MOMAland, the first collection of standardised environments for multi-objective multi-agent reinforcement learning. MOMAland addresses the need for comprehensive benchmarking in this emerging field, offering over 10 diverse environments that vary in the number of agents, state representations, reward structures, and utility considerations. To provide strong baselines for future research, MOMAland also includes algorithms capable of learning policies in such settings.

MoDELS · TOOLS · 評論員 · 回合 · INTERACT ·

2024 年 10 月 26 日

Preparing for Super-Reactivity: Early Fault-Detection in the Development of Exceedingly Complex Reactive Systems

David Harel,Assaf Marron

We introduce the term Super-Reactive Systems to refer to reactive systems whose construction and behavior are complex, constantly changing and evolving, and heavily interwoven with other systems and the physical world. Finding hidden faults in such systems early in planning and development is critical for human safety, the environment, society and the economy. However, the complexity of the system and its interactions and the absence of adequate technical details pose a great obstacle. We propose an architecture for models and tools to overcome such barriers and enable simulation, systematic analysis, and fault detection and handling, early in the development of super-reactive systems. The approach is facilitated by the inference and abstraction capabilities and the power and knowledge afforded by large language models and associated AI tools. It is based on: (i) deferred, just-in-time interpretation of model elements that are stored in natural language form, and (ii) early capture of tacit interdependencies among seemingly orthogonal requirements.

MoDELS · 可理解性 · Extensibility · NLP · AIM ·

2024 年 10 月 25 日

README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP

Zonghai Yao,Nandyala Siddharth Kantu,Guanghao Wei,Hieu Tran,Zhangqi Duan,Sunjae Kwon,Zhichao Yang,README annotation team,Hong Yu

from arxiv, To appear in Findings of the Association for Computational Linguistics: EMNLP 2024. We sincerely appreciate the tremendous efforts of the entire README annotation team throughout the expert annotation process of the README dataset

The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical terms into patient-friendly lay language. We first created the README dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions, each offering context-aware lay definitions manually annotated by domain experts. We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality. We then used README as the training data for models and leveraged a Retrieval-Augmented Generation method to reduce hallucinations and improve the quality of model outputs. Our extensive automatic and human evaluations demonstrate that open-source mobile-friendly models, when fine-tuned with high-quality data, are capable of matching or even surpassing the performance of state-of-the-art closed-source large language models like ChatGPT. This research represents a significant stride in closing the knowledge gap in patient education and advancing patient-centric healthcare solutions.

優化器 · Attention · MoDELS · 通用動力公司 · 詞元分析器 ·

2024 年 10 月 18 日

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

Aaron Alvarado Kristanto Julistiono,Davoud Ataee Tarzanagh,Navid Azizan

Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus on relevant parts of the input data. While recent work has characterized the optimization dynamics of gradient descent (GD) in attention-based models and the structural properties of its preferred solutions, less is known about more general optimization algorithms such as mirror descent (MD). In this paper, we investigate the convergence properties and implicit biases of a family of MD algorithms tailored for softmax attention mechanisms, with the potential function chosen as the $p$-th power of the $\ell_p$-norm. Specifically, we show that these algorithms converge in direction to a generalized hard-margin SVM with an $\ell_p$-norm objective when applied to a classification problem using a softmax attention model. Notably, our theoretical results reveal that the convergence rate is comparable to that of traditional GD in simpler models, despite the highly nonlinear and nonconvex nature of the present problem. Additionally, we delve into the joint optimization dynamics of the key-query matrix and the decoder, establishing conditions under which this complex joint optimization converges to their respective hard-margin SVM solutions. Lastly, our numerical experiments on real data demonstrate that MD algorithms improve generalization over standard GD and excel in optimal token selection.

估計/估計量 · ERP · 值域 · Vision · 變換 ·

2024 年 10 月 17 日

360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers

Jack Hilliard,Adrian Hilton,Jean-Yves Guillemaut

from arxiv, Accepted at AIM Workshop 2024 at ECCV 2024, 18 pages, 6 figures

Recent illumination estimation methods have focused on enhancing the resolution and improving the quality and diversity of the generated textures. However, few have explored tailoring the neural network architecture to the Equirectangular Panorama (ERP) format utilised in image-based lighting. Consequently, high dynamic range images (HDRI) results usually exhibit a seam at the side borders and textures or objects that are warped at the poles. To address this shortcoming we propose a novel architecture, 360U-Former, based on a U-Net style Vision-Transformer which leverages the work of PanoSWIN, an adapted shifted window attention tailored to the ERP format. To the best of our knowledge, this is the first purely Vision-Transformer model used in the field of illumination estimation. We train 360U-Former as a GAN to generate HDRI from a limited field of view low dynamic range image (LDRI). We evaluate our method using current illumination estimation evaluation protocols and datasets, demonstrating that our approach outperforms existing and state-of-the-art methods without the artefacts typically associated with the use of the ERP format.

CASE · Integration · RE · INTERACT · INFORMS ·

2024 年 10 月 15 日

Improving Digital Mentorship: Insights and Recommendations from the Re:Coded Community Platform Case Study

Huda Najm Alabbas

from arxiv, Master's thesis

With the rapid growth of technology, emerging IT professionals increasingly require mentorship to secure positions in the field. Recognizing this need, ReCoded has enhanced the skill set of their tech Bootcamp graduates by introducing the community platform -- "ReCoded's Mentorship Platform". To improve the user experience of the volunteer mentors at ReCoded, this thesis investigates optimizing mentors' interactions with digital mentorship platforms and provides suggestions for enhancing these interactions. Multiple third-party collaborators have powered the mentorship platform at ReCoded. This thesis examines the platform powered by StellarUp as a case study. The insights obtained may inform the UX design of any subsequent mentorship platforms considered by ReCoded. This thesis adopted a user-centric approach to solving a UX question. The study identified challenges in the mentors' user journey and their needs by engaging with users via interviews, usability tests, and eye-tracking methods. Three principal issues emerged: platform navigation, the onboarding process, and the seamless integration of external tools. Solutions were derived from desk research for onboarding, card sorting techniques for navigation, and competitive analyses for tool integration. Throughout the research, feedback from 23 participants was gathered, ensuring a holistic understanding and actionable recommendations for developing a user-friendly and efficient mentorship platform.

CoT · 機器人 · 設計 · 代價 · 分解的 ·

2024 年 10 月 12 日

A Novel Multi-Gait Strategy for Stable and Efficient Quadruped Robot Locomotion

Daoxun Zhang,Xieyuanli Chen,Zhengyu Zhong,Ming Xu,Zhiqiang Zheng,Huimin Lu

Taking inspiration from the natural gait transition mechanism of quadrupeds, devising a good gait transition strategy is important for quadruped robots to achieve energy-efficient locomotion on various terrains and velocities. While previous studies have recognized that gait patterns linked to velocities impact two key factors, the Cost of Transport (CoT) and the stability of robot locomotion, only a limited number of studies have effectively combined these factors to design a mechanism that ensures both efficiency and stability in quadruped robot locomotion. In this paper, we propose a multi-gait selection and transition strategy to achieve stable and efficient locomotion across different terrains. Our strategy starts by establishing a gait mapping considering both CoT and locomotion stability to guide the gait selection process during locomotion. Then, we achieve gait switching in time by introducing affine transformations for gait parameters and a designed finite state machine to build the switching order. Comprehensive experiments have been conducted on using our strategy with changing terrains and velocities, and the results indicate that our proposed strategy outperforms baseline methods in achieving simultaneous efficiency in locomotion by considering CoT and stability.

Processing（編程語言） · Taxonomy · Cognition · Attention · 蒸餾 ·

2023 年 9 月 27 日

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Zheng Chu,Jingchang Chen,Qianglong Chen,Weijiang Yu,Tao He,Haotian Wang,Weihua Peng,Ming Liu,Bing Qin,Ting Liu

from arxiv, Resources are available at //github.com/zchuz/CoT-Reasoning-Survey

Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realm of artificial intelligence and natural language processing. However, there still remains a lack of a comprehensive survey for this arena. To this end, we take the first step and present a thorough survey of this research field carefully and widely. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to the taxonomies of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT with frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss some future directions, including faithfulness, multi-modal, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.

Cognition · Performer · Agent · 知識 (knowledge) · MoDELS ·

2023 年 7 月 14 日

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Zhenhailong Wang,Shaoguang Mao,Wenshan Wu,Tao Ge,Furu Wei,Heng Ji

from arxiv, work in progress

Human intelligence thrives on the concept of cognitive synergy, where collaboration and information integration among different cognitive processes yield superior outcomes compared to individual cognitive processes in isolation. Although Large Language Models (LLMs) have demonstrated promising performance as general task-solving agents, they still struggle with tasks that require intensive domain knowledge and complex reasoning. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist refers to an intelligent agent that collaborates with multiple minds, combining their individual strengths and knowledge, to enhance problem-solving and overall performance in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. We have discovered that assigning multiple, fine-grained personas in LLMs elicits better problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, SPP effectively elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities. Code, data, and prompts can be found at: //github.com/MikeWangWZHL/Solo-Performance-Prompting.git.

知識 (knowledge) · Processing（編程語言） · 圖 · NLP · 知識圖譜 ·

2022 年 9 月 30 日

A Decade of Knowledge Graphs in Natural Language Processing: A Survey

Phillip Schneider,Tim Schopf,Juraj Vladika,Mikhail Galkin,Elena Simperl,Florian Matthes

from arxiv, Accepted to AACL-IJCNLP 2022

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.