Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. STARK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STARK presents significant challenges to current retrieval and LLM systems, indicating the demand for building more capable retrieval systems. The benchmark data and code are available at https://github.com/snap-stanford/stark.

Related content

Large language models

Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural language text but also understand the meaning of text in depth and handle a variety of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking, with parameter counts rising from just over a billion initially to one trillion today. Larger parameter counts allow models to capture the subtleties of human language more finely and to understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep expanding, offering people more intelligent and personalized services and further improving how they live and work.

Prompt · Performer · Design · Principle · EASE

June 29, 2024

LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language

Ming Wang, Yuanzhong Liu, Xiaoyu Liang, Songlian Li, Yijie Huang, Xiaoming Zhang, Sijia Shen, Chaofeng Guan, Daling Wang, Shi Feng, Huaiwen Zhang, Yifei Zhang, Minghui Zheng, Chi Zhang

LLMs have demonstrated commendable performance across diverse domains. Nevertheless, formulating high-quality prompts to instruct LLMs proficiently poses a challenge for non-AI experts. Existing research in prompt engineering suggests somewhat scattered optimization principles and designs empirically dependent prompt optimizers. Unfortunately, these endeavors lack a structured design template, incurring high learning costs and resulting in low reusability. In addition, it is not conducive to the iterative updating of prompts. Inspired by structured reusable programming languages, we propose LangGPT, a dual-layer prompt design framework as the programming language for LLMs. LangGPT has an easy-to-learn normative structure and provides an extended structure for migration and reuse. Experiments illustrate that LangGPT significantly enhances the performance of LLMs. Moreover, the case study shows that LangGPT leads LLMs to generate higher-quality responses. Furthermore, we analyzed the ease of use and reusability of LangGPT through a user survey in our online community.

Vision · Integration · Long short-term memory (LSTM) network · MoDELS · Extensibility

June 29, 2024

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

from arXiv, CVPR 2024 Precognition Workshop

Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge: CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. Recent Mamba-based architectures have been met with enthusiasm for their exceptional long-sequence modeling capabilities, surpassing established vision models in efficiency and accuracy, which motivates us to develop an innovative architecture tailored for spatiotemporal forecasting. In this paper, we propose the VMRNN cell, a new recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM. We construct a network centered on VMRNN cells to tackle spatiotemporal prediction tasks effectively. Our extensive evaluations show that our proposed approach secures competitive results on a variety of tasks while maintaining a smaller model size. Our code is available at https://github.com/yyyujintang/VMRNN-PyTorch.

Optimizer · Large language models · Scaling · MoDELS · Learning

June 28, 2024

ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particularly in the context of large language models (LLMs). This paper introduces the first scalable instantiation of this paradigm called ScaleBiO, focusing on bilevel optimization for large-scale LLM data reweighting. By combining with a recently proposed memory-efficient training technique called LISA, our novel algorithm allows the paradigm to scale to 34-billion-parameter LLMs on eight A40 GPUs, marking the first successful application of bilevel optimization under practical scenarios for large-sized LLMs. Empirically, extensive experiments on data reweighting verify the effectiveness of ScaleBiO for different-scaled models, including GPT-2, LLaMA-3-8B, GPT-NeoX-20B, and Yi-34B, where bilevel optimization succeeds in filtering irrelevant data samples and selecting informative samples. Theoretically, ScaleBiO ensures the optimality of the learned data weights, along with a convergence guarantee matching the conventional first-order bilevel optimization paradigm on smooth and strongly convex objectives.
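
As a rough illustration of the problem class the abstract refers to, a generic data-reweighting bilevel problem can be written as follows. The notation is an assumption introduced here and is not taken from the paper: $w_i$ is the weight of data source $i$, $L_i$ is the training loss on that source, $L_{\mathrm{val}}$ is a held-out validation loss, and $\mathcal{W}$ is the feasible set of weights.

$$
\min_{w \in \mathcal{W}} \; L_{\mathrm{val}}\bigl(\theta^{*}(w)\bigr)
\quad \text{s.t.} \quad
\theta^{*}(w) \in \arg\min_{\theta} \; g(w, \theta),
\qquad
g(w, \theta) = \sum_{i=1}^{m} w_i \, L_i(\theta)
$$

When $g$ is smooth and strongly convex in $\theta$, the implicit function theorem gives the standard hypergradient (ignoring the constraint on $w$):

$$
\frac{d L_{\mathrm{val}}}{d w}
= - \nabla^{2}_{w\theta}\, g\bigl(w, \theta^{*}\bigr)\,
\bigl[\nabla^{2}_{\theta\theta}\, g\bigl(w, \theta^{*}\bigr)\bigr]^{-1}
\nabla_{\theta} L_{\mathrm{val}}\bigl(\theta^{*}\bigr)
$$

The inverse Hessian in this expression is the second-order information that is costly at LLM scale. First-order bilevel methods avoid forming it; the exact reformulation used by ScaleBiO is not given in the abstract.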

Knowledge · Learning · Fuzzy logic · Examples · Neural Networks

June 28, 2024

ULLER: A Unified Language for Learning and Reasoning

Emile van Krieken, Samy Badreddine, Robin Manhaeve, Eleonora Giunchiglia

from arXiv, Accepted at NeSy 2024

The field of neuro-symbolic artificial intelligence (NeSy), which combines learning and reasoning, has recently experienced significant growth. There are now a wide variety of NeSy frameworks, each with its own specific language for expressing background knowledge and how to relate it to neural networks. This heterogeneity hinders accessibility for newcomers and makes comparing different NeSy frameworks challenging. We propose a language for NeSy, which we call ULLER, a Unified Language for LEarning and Reasoning. ULLER encompasses a wide variety of settings, while ensuring that knowledge described in it can be used in existing NeSy systems. ULLER has a first-order logic syntax specialised for NeSy for which we provide example semantics including classical FOL, fuzzy logic, and probabilistic logic. We believe ULLER is a first step towards making NeSy research more accessible and comparable, paving the way for libraries that streamline training and evaluation across a multitude of semantics, knowledge bases, and NeSy systems.

Language modeling · Large language models · MoDELS · Prompt · Datasets

June 28, 2024

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang

from arXiv, To appear at ICML 2024

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite SciBench for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from mathematics, chemistry, and physics domains. Based on the dataset, we conduct an in-depth benchmarking study of representative open-source and proprietary LLMs with various prompting strategies. The results reveal that the current LLMs fall short of delivering satisfactory performance, with the best overall score of merely 43.22%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms the others and some strategies that demonstrate improvements in certain problem-solving skills could result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.

Proportional · Representation · Rank · Facebook AI Research · Scenario

June 28, 2024

Committee Monotonic Proportional Representation: A New Voting Rule and Impossibility Results

Haris Aziz, Patrick Lederer, Angus Ritossa

We study committee voting rules under ranked preferences, which map the voters' preference relations to a subset of the alternatives of predefined size. In this setting, the compatibility between proportional representation and committee monotonicity is a fundamental open problem that has been mentioned in several works. We design a new multi-winner voting rule called the Solid Coalition Refinement (SCR) Rule that simultaneously satisfies committee monotonicity and Dummett's PSC as well as one of its variants called inclusion PSC. This is the first rule known to satisfy both of these properties. Moreover, we show that this is effectively the best that we can hope for as other fairness notions adapted from approval voting such as Rank-JR and Rank-PJR+ are incompatible with committee monotonicity.

Controller · Diversity · MoDELS · Extensibility · Comprehensibility

June 28, 2024

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

Yanan Sun, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen

The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advancements in diffusion models. Linguistic control enables effective content creation, but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and edge maps, into pre-trained T2I models through extra encoding. However, multi-control image synthesis still faces several challenges. Specifically, current approaches are limited in handling free combinations of diverse input control signals, overlook the complex relationships among multiple spatial conditions, and often fail to maintain semantic alignment with provided textual prompts. This can lead to suboptimal user experiences. To address these challenges, we propose AnyControl, a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals. AnyControl develops a novel Multi-Control Encoder that extracts a unified multi-modal embedding to guide the generation process. This approach enables a holistic understanding of user inputs, and produces high-quality, faithful results under versatile control signals, as demonstrated by extensive quantitative and qualitative evaluations. Our project page is available at https://any-control.github.io.

Identifiable · Performer · Similarity · Examples · Categories

June 27, 2024

Leakage-Resilient Hardness Equivalence to Logspace Derandomization

Yakov Shalunov

from arXiv, 19 pages

Efficient derandomization has long been a goal in complexity theory, and a major recent result by Yanyi Liu and Rafael Pass identifies a new class of hardness assumption under which it is possible to perform time-bounded derandomization efficiently: that of "leakage-resilient hardness." They identify a specific form of this assumption which is $\textit{equivalent}$ to $\mathsf{prP} = \mathsf{prBPP}$. In this paper, we pursue an equivalence to derandomization of $\mathsf{prBP{\cdot}L}$ (logspace promise problems with two-way randomness) through techniques analogous to Liu and Pass. We are able to obtain an equivalence between a similar "leakage-resilient hardness" assumption and a slightly stronger statement than derandomization of $\mathsf{prBP{\cdot}L}$, that of finding "non-no" instances of "promise search problems."

Machine Learning · Approximation · Learning · Extensibility · Sample mean

June 27, 2024

Credit Ratings: Heterogeneous Effect on Capital Structure

Helmut Wasserbacher, Martin Spindler

from arXiv, 288 pages, 13 figures

Why do companies choose particular capital structures? A compelling answer to this question remains elusive despite extensive research. In this article, we use double machine learning to examine the heterogeneous causal effect of credit ratings on leverage. Taking advantage of the flexibility of random forests within the double machine learning framework, we model the relationship between variables associated with leverage and credit ratings without imposing strong assumptions about their functional form. This approach also allows for data-driven variable selection from a large set of individual company characteristics, supporting valid causal inference. We report three findings: First, credit ratings causally affect the leverage ratio. Having a rating, as opposed to having none, increases leverage by approximately 7 to 9 percentage points, or 30% to 40% relative to the sample mean leverage. However, this result comes with an important caveat, captured in our second finding: the effect is highly heterogeneous and varies depending on the specific rating. For AAA and AA ratings, the effect is negative, reducing leverage by about 5 percentage points. For A and BBB ratings, the effect is approximately zero. From BB ratings onwards, the effect becomes positive, exceeding 10 percentage points. Third, contrary to what the second finding might imply at first glance, the change from no effect to a positive effect does not occur abruptly at the boundary between investment and speculative grade ratings. Rather, it is gradual, taking place across the granular rating notches ("+/-") within the BBB and BB categories.

Performer · ILP · Predictor/decision function · Baseline · Same

June 26, 2024

Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

Rahul Bera, Adithya Ranganathan, Joydeep Rakshit, Sujit Mahto, Anant V. Nori, Jayesh Gaur, Ataberk Olgun, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Sreenivas Subramoney, Onur Mutlu

from arXiv, To appear in the proceedings of 51st International Symposium on Computer Architecture (ISCA)

Load instructions often limit instruction-level parallelism (ILP) in modern processors due to the data and resource dependences they cause. Prior techniques like Load Value Prediction (LVP) and Memory Renaming (MRN) mitigate load data dependence by predicting the data value of a load instruction. However, they fail to mitigate load resource dependence as the predicted load instruction gets executed nonetheless. Our goal in this work is to improve ILP by mitigating both load data dependence and resource dependence. To this end, we propose a purely microarchitectural technique called Constable that safely eliminates the execution of load instructions. Constable dynamically identifies load instructions that have repeatedly fetched the same data from the same load address. We call such loads likely-stable. For every likely-stable load, Constable (1) tracks modifications to its source architectural registers and memory location via lightweight hardware structures, and (2) eliminates the execution of subsequent instances of the load instruction until there is a write to its source register or a store or snoop request to its load address. Our extensive evaluation using a wide variety of 90 workloads shows that Constable improves performance by 5.1% while reducing the core dynamic power consumption by 3.4% on average over a strong baseline system that implements MRN and other dynamic instruction optimizations (e.g., move and zero elimination, constant and branch folding). In the presence of 2-way simultaneous multithreading (SMT), Constable's performance improvement increases to 8.8% over the baseline system. When combined with a state-of-the-art load value predictor (EVES), Constable provides an additional 3.7% and 7.8% average performance benefit over the load value predictor alone, in the baseline system without and with 2-way SMT, respectively.
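
The two steps in the abstract (detect likely-stable loads, then skip them until an invalidating event) can be sketched as a toy software model. This is only an illustration under stated assumptions, not the authors' hardware design: the class `ToyConstable`, the `threshold` parameter, the per-PC table, and the policy of resetting the streak on invalidation are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LoadEntry:
    """Per-load-PC tracking state (hypothetical layout, for illustration)."""
    src_reg: str
    addr: int
    value: int
    streak: int = 1      # consecutive executions that saw the same (addr, value)
    armed: bool = False  # True: later instances may skip execution


class ToyConstable:
    """Toy model: skip loads that repeatedly fetch the same data from the same address."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold        # assumed stability threshold
        self.regs: Dict[str, int] = {}    # architectural registers
        self.mem: Dict[int, int] = {}     # memory, address -> value
        self.table: Dict[int, LoadEntry] = {}
        self.executed = 0                 # loads that accessed memory
        self.eliminated = 0               # loads whose execution was skipped

    def load(self, pc: int, src_reg: str) -> int:
        entry = self.table.get(pc)
        if entry is not None and entry.armed:
            # Likely-stable and no invalidating event since: reuse the value.
            self.eliminated += 1
            return entry.value
        # Otherwise execute the load normally.
        addr = self.regs[src_reg]
        value = self.mem.get(addr, 0)
        self.executed += 1
        same = entry is not None and (
            (entry.src_reg, entry.addr, entry.value) == (src_reg, addr, value)
        )
        if same:
            entry.streak += 1
        else:
            entry = LoadEntry(src_reg, addr, value)
            self.table[pc] = entry
        entry.armed = entry.streak >= self.threshold
        return value

    def _disarm(self, hit: Callable[[LoadEntry], bool]) -> None:
        # Assumed policy: an invalidating event also resets the streak.
        for entry in self.table.values():
            if hit(entry):
                entry.armed = False
                entry.streak = 0

    def write_reg(self, reg: str, value: int) -> None:
        """A write to a source register stops elimination for dependent loads."""
        self.regs[reg] = value
        self._disarm(lambda e: e.src_reg == reg)

    def store(self, addr: int, value: int) -> None:
        """A store to the load address stops elimination for that address."""
        self.mem[addr] = value
        self._disarm(lambda e: e.addr == addr)

    def snoop(self, addr: int) -> None:
        """A snoop request (another agent may have written) is treated like a store."""
        self._disarm(lambda e: e.addr == addr)


if __name__ == "__main__":
    cpu = ToyConstable(threshold=3)
    cpu.regs["r1"] = 0x100
    cpu.mem[0x100] = 42
    values = [cpu.load(pc=0x40, src_reg="r1") for _ in range(5)]
    print(values, cpu.executed, cpu.eliminated)
    cpu.store(0x100, 7)  # invalidates the tracked load
    print(cpu.load(pc=0x40, src_reg="r1"))
```

In this sketch a load is only skipped after it has been observed `threshold` times with identical address and data, and any register write, store, or snoop that could change its result forces the next instance to execute again, which is what keeps the elimination safe in the toy model.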