Large Language Models (LLMs) have demonstrated impressive planning abilities due to their vast "world knowledge". Yet, despite recent progress, obtaining plans that are both feasible (grounded in affordances) and cost-effective (in plan length) remains a challenge. This contrasts with heuristic planning methods that employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the strengths of LLMs and heuristic planning by leveraging the world knowledge of LLMs and the principles of heuristic search. Our approach, SayCanPay, employs LLMs to generate actions (Say), guided by learnable domain knowledge that evaluates each action's feasibility (Can) and long-term reward/payoff (Pay), and uses heuristic search to select the best sequence of actions. Our contributions are (1) a novel framing of the LLM planning problem in the context of heuristic planning, (2) integrating grounding and cost-effectiveness into the generated plans, and (3) using heuristic search over actions. Our extensive evaluations show that our model surpasses other LLM planning approaches.
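To make the combination concrete, here is a minimal Python sketch of beam search over LLM-proposed actions scored by Say, Can, and Pay components. The interfaces, function names, and the multiplicative score combination are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative SayCanPay-style planning loop (assumed interfaces, not the
# authors' implementation).

def score_action(action, context, say_model, can_model, pay_model):
    """Combine generation likelihood, feasibility, and payoff estimates."""
    p_say = say_model(action, context)   # LLM likelihood of proposing the action
    p_can = can_model(action, context)   # learned feasibility / affordance score
    p_pay = pay_model(action, context)   # learned estimate of long-term payoff
    return p_say * p_can * p_pay         # multiplicative combination (an assumption)

def beam_search_plan(initial_context, candidate_fn, models, beam_width=3, horizon=10):
    """Heuristic (beam) search over action sequences using the combined score.

    `initial_context` is a list of prior steps; `candidate_fn` returns
    LLM-proposed candidate actions for a given context.
    """
    say, can, pay = models
    beams = [([], initial_context, 1.0)]
    for _ in range(horizon):
        expanded = []
        for plan, ctx, score in beams:
            for action in candidate_fn(ctx):
                s = score * score_action(action, ctx, say, can, pay)
                expanded.append((plan + [action], ctx + [action], s))
        beams = sorted(expanded, key=lambda b: b[2], reverse=True)[:beam_width]
    return beams[0][0]  # highest-scoring plan
```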
Heuristics are crucial in SAT solvers, yet no single heuristic suits all problem instances, so solvers typically need to be tailored to specific classes of instances. In this context, we present AutoSAT, a novel framework for automatically optimizing heuristics in SAT solvers. AutoSAT is based on Large Language Models (LLMs) that can autonomously generate code, evaluate it, and then use the feedback to further optimize heuristics, thereby reducing human intervention and enhancing solver capabilities. AutoSAT operates on a plug-and-play basis, eliminating the need for extensive preliminary setup and model training, and fosters a Chain of Thought collaborative process with fault tolerance, ensuring robust heuristic optimization. Extensive experiments on a Conflict-Driven Clause Learning (CDCL) solver demonstrate the overall superior performance of AutoSAT, especially on specific classes of SAT problem instances.
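A plausible outline of such a generate-evaluate-feedback loop, sketched in Python. The solver and LLM interfaces, the prompt format, and the fault-tolerance strategy are all assumptions for illustration, not AutoSAT's actual API:

```python
# Hypothetical AutoSAT-style optimization loop (assumed interfaces).

def evaluate(solver, heuristic_code, instances):
    """Install a candidate heuristic and count how many instances the solver finishes."""
    solver.load_heuristic(heuristic_code)            # hypothetical solver interface
    return sum(solver.solve(inst, timeout=60) for inst in instances)

def autosat_loop(llm, solver, instances, seed_code, iterations=10):
    """Iteratively ask the LLM to improve the best-so-far heuristic."""
    best_code, best_score = seed_code, evaluate(solver, seed_code, instances)
    for _ in range(iterations):
        prompt = (f"Improve this CDCL branching heuristic (current score {best_score}):\n"
                  f"{best_code}")
        candidate = llm.generate(prompt)             # hypothetical LLM interface
        try:
            score = evaluate(solver, candidate, instances)
        except Exception:                            # fault tolerance: discard broken code
            continue
        if score > best_score:
            best_code, best_score = candidate, score
    return best_code
```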
Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable. Therefore, common approaches learn DTs using a greedy growth algorithm that minimizes the impurity locally at each internal node. Unfortunately, this greedy procedure can lead to inaccurate trees. In this paper, we present a novel approach for learning hard, axis-aligned DTs with gradient descent. The proposed method uses backpropagation with a straight-through operator on a dense DT representation to jointly optimize all tree parameters. Our approach outperforms existing methods on binary classification benchmarks and achieves competitive results for multi-class tasks. The method is available under: //github.com/s-marton/GradTree
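To illustrate the straight-through idea in isolation, here is a generic sketch of the technique (not the GradTree code itself): a hard split decision is used in the forward pass while gradients flow through its soft surrogate.

```python
# Generic straight-through hard split (illustrative, not the GradTree implementation).
import torch

def hard_split(logits: torch.Tensor) -> torch.Tensor:
    """Hard 0/1 routing decision with a straight-through gradient."""
    soft = torch.sigmoid(logits)          # differentiable surrogate
    hard = (soft > 0.5).float()           # hard, axis-aligned decision (non-differentiable)
    # Forward pass returns `hard`; backward pass uses the gradient of `soft`,
    # since `soft - soft.detach()` is zero in value but carries the gradient.
    return hard + soft - soft.detach()
```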
Named Entity Recognition (NER) serves as a fundamental task in natural language understanding, bearing direct implications for web content analysis, search engines, and information retrieval systems. Fine-tuned NER models exhibit satisfactory performance on standard NER benchmarks. However, due to limited fine-tuning data and a lack of external knowledge, they perform poorly on unseen entities. As a result, the usability and reliability of NER models in web-related applications are compromised. In contrast, Large Language Models (LLMs) like GPT-4 possess extensive external knowledge, but research indicates that they lack specialization for NER tasks. Furthermore, their large-scale and often non-public weights make tuning LLMs difficult. To address these challenges, we propose LinkNER, a framework that combines small fine-tuned models with LLMs via an uncertainty-based linking strategy called RDC, enabling fine-tuned models to complement black-box LLMs for better performance. We experiment with both standard NER test sets and noisy social media datasets. LinkNER enhances NER task performance, notably surpassing SOTA models in robustness tests. We also quantitatively analyze the influence of key components such as uncertainty estimation methods, LLMs, and in-context learning on diverse NER tasks, offering specific web-related recommendations.
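A minimal sketch of the deferral idea, assuming a simple uncertainty threshold; the threshold rule and the interfaces are illustrative, while RDC's actual linking criterion is defined in the paper:

```python
# Hypothetical uncertainty-based linking between a small NER model and an LLM.

def link_predict(span_text, context, local_model, llm, threshold=0.5):
    """Use the small fine-tuned model when confident; defer uncertain spans to the LLM."""
    label, uncertainty = local_model.classify(span_text, context)  # assumed interface
    if uncertainty <= threshold:
        return label                                  # local model is confident enough
    # Uncertain span: ask the black-box LLM, supplying context for disambiguation.
    return llm.classify_entity(span_text, context)    # assumed interface
```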
Sixth-generation (6G) wireless communication systems, as stated in the European 6G flagship project Hexa-X, are anticipated to feature the integration of intelligence, communication, sensing, positioning, and computation. An important aspect of this integration is integrated sensing and communication (ISAC), in which the same waveform is used for both sensing and communication to address the challenge of spectrum scarcity. Recently, the orthogonal time frequency space (OTFS) waveform has been proposed to address OFDM's limitations under the high Doppler spreads expected in some future wireless communication systems. In this paper, we review existing OTFS waveforms for ISAC systems and provide some insights into future research. First, we introduce the basic principles and a system model of OTFS, providing a foundational understanding of this technology's core concepts and architecture. Next, we present an overview of OTFS-based ISAC system frameworks and comprehensively review recent research developments and the state of the art in OTFS-assisted ISAC systems. Furthermore, we compare OTFS-enabled ISAC operations with traditional OFDM, highlighting the distinctive advantages of OTFS, especially in high Doppler spread scenarios. We then address the primary challenges facing OTFS-based ISAC systems, identifying potential limitations and drawbacks. Finally, we suggest future research directions, aiming to inspire further innovation in the 6G wireless communication landscape.
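For orientation, OTFS modulation is commonly written as an inverse symplectic finite Fourier transform (ISFFT) mapping delay-Doppler symbols onto an N-by-M time-frequency grid; this is the textbook formulation, included here as background rather than drawn from any single surveyed paper:

```latex
X[n,m] \;=\; \frac{1}{\sqrt{NM}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1}
x[k,l]\; e^{\,j 2\pi \left( \frac{nk}{N} - \frac{ml}{M} \right)},
```

where $x[k,l]$ are the information symbols placed in the delay-Doppler domain, $N$ and $M$ are the numbers of Doppler and delay bins, and $X[n,m]$ is the resulting time-frequency sample transmitted by a multicarrier modulator.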
Logical reasoning is critically needed in problem-solving and decision-making. Although Language Models (LMs) have demonstrated capabilities for handling multiple reasoning tasks (e.g., commonsense reasoning), their ability to reason about complex mathematical problems, specifically propositional logic, remains largely underexplored. This lack of exploration can be attributed to the limited availability of annotated corpora. Here, we present a well-labeled propositional logic corpus, LogicPrpBank, containing 7093 Propositional Logic Statements (PLSs) across six mathematical subjects, to study the new task of reasoning about logical implication and equivalence. We benchmark LogicPrpBank with widely used LMs to show that our corpus offers a useful resource for this challenging task and that there is ample room for model improvement.
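To illustrate the implication-versus-equivalence distinction at the heart of the task, consider a generic textbook example (not one drawn from the corpus):

```latex
(p \rightarrow q) \;\equiv\; (\neg p \lor q)
\quad\text{(equivalence)},
\qquad
(p \land q) \;\Rightarrow\; p
\;\;\text{but}\;\;
p \;\not\Rightarrow\; (p \land q)
\quad\text{(one-way implication)}.
```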
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential for real-world medical applications. To address this problem, we introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. The benchmark is collected from 75 different medical datasets, including 12 different modalities and covering more than 20 distinct anatomical regions. Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs. Through extensive experiments, we find that existing LVLMs struggle to address these medical VQA problems effectively. Moreover, and surprisingly, medical-specialized LVLMs even exhibit inferior performance to general-domain models, calling for more versatile and robust LVLMs in the biomedical field. The evaluation results not only reveal the current limitations of LVLMs in understanding real medical images but also highlight our dataset's significance. Our dataset will be made publicly available.
Code Large Language Models (Code LLMs) have demonstrated outstanding performance on code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce DolphCoder, a diverse instruction model with self-evaluation for code generation. It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, offering new insights for future code instruction tuning work. Our key findings are: (1) Augmenting more diverse responses with distinct reasoning paths increases the code capability of LLMs. (2) Improving a model's ability to evaluate the correctness of code solutions also enhances its ability to generate them.
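One plausible form of such a combined training objective, stated here as an assumption about the general setup (the weighting and exact loss terms are not specified in this abstract):

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{gen}} \;+\; \lambda\, \mathcal{L}_{\text{eval}},
```

where $\mathcal{L}_{\text{gen}}$ is the standard instruction-following generation loss, $\mathcal{L}_{\text{eval}}$ trains the model to judge the correctness of candidate code solutions, and $\lambda$ balances the two.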
In this study, we investigate the counterfactual reasoning capabilities of large language models (LLMs). Our primary objective is to cultivate counterfactual thought processes within LLMs and rigorously assess these processes for their validity. Specifically, we introduce a novel task, Counterfactual Logical Modification (CLOMO), together with a high-quality human-annotated benchmark. In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship. To effectively evaluate a generation model's counterfactual capabilities, we propose an innovative evaluation metric, the LogicAware Counterfactual Score, which directly evaluates the natural language output of LLMs rather than modeling the task as a multiple-choice problem. Analysis shows that the proposed automatic metric aligns well with human preference. Our experimental results show that while LLMs demonstrate a notable capacity for logical counterfactual thinking, there remains a discernible gap between their current abilities and human performance.
In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of face recognition systems, guaranteeing stable and reliable recognition performance in unconstrained scenarios. To this end, an FIQA method should consider both the intrinsic properties and the recognizability of the face image. Most previous works estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which considers only partial intra-class information. However, these methods ignore valuable inter-class information, which is important for estimating the recognizability of a face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates the Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class and inter-class similarity distributions. With these quality pseudo-labels, we can train a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses the state of the art by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.
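A minimal sketch of the pseudo-label computation, assuming cosine similarities over face embeddings (the use of cosine similarity and the exact pseudo-label normalization are assumptions; the paper defines the precise recipe):

```python
# Illustrative SDD-style quality pseudo-label via the Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

def quality_pseudo_label(embedding, intra_embeddings, inter_embeddings):
    """Quality score = WD between intra-class and inter-class similarity distributions."""
    def cos_sims(e, others):
        e = e / np.linalg.norm(e)
        others = others / np.linalg.norm(others, axis=1, keepdims=True)
        return others @ e
    intra = cos_sims(embedding, intra_embeddings)  # similarities to same-identity samples
    inter = cos_sims(embedding, inter_embeddings)  # similarities to other identities
    # Larger separation between the two distributions -> more recognizable -> higher quality.
    return wasserstein_distance(intra, inter)
```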
ASR (automatic speech recognition) systems like Siri, Alexa, Google Voice, and Cortana have become quite popular recently. One of the key techniques enabling the practical use of such systems in people's daily lives is deep learning. Although deep learning in computer vision is known to be vulnerable to adversarial perturbations, little is known about whether such perturbations remain effective against practical speech recognition systems. In this paper, we not only demonstrate that such attacks can happen in reality, but also show that they can be conducted systematically. To minimize users' attention, we embed the voice commands into a song, called a CommandSong. In this way, the song carrying the command can spread through radio, TV, or any media player installed on portable devices like smartphones, potentially impacting millions of users over long distances. In particular, we overcome two major challenges: minimizing the revision of a song in the process of embedding commands, and letting the CommandSong spread through the air without losing the voice "command". Our evaluation demonstrates that we can craft random songs to "carry" any commands and that the modification is extremely difficult to notice. Notably, the physical attack, in which we play the CommandSongs over the air and record them, succeeds 94% of the time.