
The task of natural language inference (NLI) asks whether a given premise (expressed in NL) entails a given NL hypothesis. NLI benchmarks contain human ratings of entailment, but the meaning relationships driving these ratings are not formalized. Can the underlying sentence pair relationships be made more explicit in an interpretable yet robust fashion? We compare semantic structures to represent premise and hypothesis, including sets of contextualized embeddings and semantic graphs (Abstract Meaning Representations), and measure whether the hypothesis is a semantic substructure of the premise, utilizing interpretable metrics. Our evaluation on three English benchmarks finds value in both contextualized embeddings and semantic graphs; moreover, they provide complementary signals, and can be leveraged together in a hybrid model.
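
One simple way to score whether the hypothesis is a substructure of the premise over sets of contextualized embeddings is a soft-alignment coverage score. Each hypothesis token is matched to its most similar premise token, and the similarities are averaged, in the spirit of BERTScore recall. The sketch below assumes this particular metric and uses hand-made toy vectors in place of real model embeddings. It is illustrative only and not necessarily the metric used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hypothesis_coverage(premise_vecs, hypothesis_vecs):
    """Average, over hypothesis tokens, of the best cosine match in the premise.

    A score near 1.0 means every hypothesis token has a close counterpart in the
    premise, i.e. the hypothesis "fits inside" the premise (assumed metric).
    """
    if not hypothesis_vecs:
        return 1.0
    best = [max(cosine(h, p) for p in premise_vecs) for h in hypothesis_vecs]
    return sum(best) / len(best)


# Toy 3-d "contextualized embeddings" (illustrative values, not model output).
premise = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
entailed = [[1.0, 0.0, 0.0], [0.0, 0.9, 0.1]]
unrelated = [[-1.0, 0.0, 0.0]]

print(hypothesis_coverage(premise, entailed))
print(hypothesis_coverage(premise, unrelated))
```

Because the score is an average of per-token best matches, it stays interpretable: the token with the lowest best match points to the part of the hypothesis the premise does not cover.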

Related Content

Robustness

Approximation · SGD · Stochastic Gradient Descent · Subspace · Continuity

October 20, 2023

Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent

Ankush Agarwal, Andrea Amato, Goncalo dos Reis, Stefano Pagliarani

From arXiv. 25 pages, 6 figures.

We propose a novel approach to numerically approximate McKean-Vlasov stochastic differential equations (MV-SDEs) using stochastic gradient descent (SGD) while avoiding the use of interacting particle systems (IPS). SGD is deployed to solve a Euclidean minimization problem, which is obtained by first representing the MV-SDE as a minimization problem over the set of continuous functions of time, and then approximating the domain with a finite-dimensional subspace. Convergence is established by proving certain intermediate stability and moment estimates of the relevant stochastic processes (including the tangent ones). Numerical experiments illustrate the competitive performance of our SGD-based method compared to the IPS benchmarks. This work offers a theoretical foundation for using the SGD method in the context of numerical approximation of MV-SDEs, and provides analytical tools to study its stability and convergence.
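
To make the idea concrete (minimize over functions of time, restrict to a finite-dimensional subspace, then run SGD), here is a toy sketch. It uses a linear MV-SDE, chosen by us because its law can be sampled exactly: dX_t = A E[X_t] dt + SIGMA dW_t, whose mean is X0 exp(A t). The candidate mean path phi is a quadratic polynomial. We assume the minimization target is the loss L(theta) = int_0^T (phi(t) - E[X_t^phi])^2 dt. Each SGD step uses one random time and one Brownian sample. The equation, loss, basis, step size, and averaging are all our assumptions, not the paper's scheme, which treats general MV-SDEs.

```python
import math
import random

# Toy linear MV-SDE (our assumption): dX_t = A * E[X_t] dt + SIGMA dW_t, X_0 = X0.
# Its mean is X0 * exp(A * t), which gives us something to check against.
A, SIGMA, X0, T = -0.5, 0.05, 1.0, 1.0
DEGREE = 2  # candidate mean path phi(t) = sum_k theta[k] * t**k


def phi(theta, t):
    return sum(c * t ** k for k, c in enumerate(theta))


def integral_phi(theta, t):
    # Exact integral of the polynomial phi over [0, t].
    return sum(c * t ** (k + 1) / (k + 1) for k, c in enumerate(theta))


def sample_x(theta, t, rng):
    # With the mean path frozen at phi, X_t = X0 + A * int_0^t phi + SIGMA * W_t,
    # and W_t ~ N(0, t), so X_t can be sampled exactly (no time stepping).
    return X0 + A * integral_phi(theta, t) + SIGMA * rng.gauss(0.0, math.sqrt(t))


def loss(theta, n=1000):
    # L(theta) = int_0^T (phi(t) - E[X_t^phi])^2 dt by the midpoint rule; it is
    # zero exactly when phi solves the fixed point phi = E[X^phi].
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        mean = X0 + A * integral_phi(theta, t)
        total += (phi(theta, t) - mean) ** 2 * h
    return total


def sgd(steps=40000, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (DEGREE + 1)
    avg = [0.0] * (DEGREE + 1)
    n_avg = 0
    for step in range(steps):
        t = rng.uniform(0.0, T)
        residual = phi(theta, t) - sample_x(theta, t, rng)
        # d(phi(t) - X_t)/d theta_k = t^k - A * t^(k+1)/(k+1) is deterministic here,
        # so 2 * residual * derivative is an unbiased estimate of grad L (T = 1).
        grad = [2.0 * residual * (t ** k - A * t ** (k + 1) / (k + 1))
                for k in range(DEGREE + 1)]
        theta = [c - lr * g for c, g in zip(theta, grad)]
        if step >= steps // 2:  # average the second half of the iterates
            n_avg += 1
            avg = [a + (c - a) / n_avg for a, c in zip(avg, theta)]
    return avg


theta = sgd()
print("loss at start:", loss([0.0] * (DEGREE + 1)))
print("loss after SGD:", loss(theta))
print("phi(1) =", phi(theta, 1.0), "vs exact mean", X0 * math.exp(A * T))
```

Averaging the second half of the iterates (Polyak-Ruppert averaging) damps the SGD noise. That choice, like the exact sampling, only works this simply because the toy equation is linear.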

Comprehensibility · Language Modeling · Performer · MoDELS · Seven

October 20, 2023

GPT4Table: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

From arXiv. This paper has been accepted as a full paper at WSDM 2024.

Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. While tables can be used as inputs to LLMs through serialization, there is a lack of comprehensive studies examining whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark we create includes seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We run a series of evaluations on GPT-3.5 and GPT-4. We find that performance varies depending on a number of input choices, including table input format, content order, role prompting, and partition marks. Drawing on the insights gained from the benchmark evaluations, we then propose self-augmentation for effective structural prompting, e.g., critical value / range identification using LLMs' internal knowledge. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (↑2.31%), HybridQA (↑2.13%), SQA (↑2.72%), Feverous (↑0.84%), and ToTTo (↑5.68%). We believe that our benchmark and proposed prompting methods can serve as a simple yet generic selection for future research. The code and data are released at //anonymous.4open.science/r/StructuredLLM-76F3.
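
The finding that results depend on table input format and partition marks is easier to explore with a small serializer. The sketch below renders one toy table as Markdown, CSV, and JSON records, and can optionally wrap it in partition marks. The function names and the <table> / </table> markers are hypothetical choices for illustration, not the benchmark's actual formats.

```python
import csv
import io
import json


def to_markdown(header, rows):
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


def to_csv(header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # csv quotes cells containing commas
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json_records(header, rows):
    return json.dumps([dict(zip(header, row)) for row in rows])


def build_prompt(table_text, question, partition_marks=True):
    table_text = table_text.rstrip("\n")
    if partition_marks:
        # Hypothetical partition marks delimiting the table inside the prompt.
        table_text = "<table>\n" + table_text + "\n</table>"
    return table_text + "\nQuestion: " + question


header = ["name", "score"]
rows = [["A", 3], ["B", 5]]  # toy data
for fmt in (to_markdown, to_csv, to_json_records):
    print(build_prompt(fmt(header, rows), "Which row has the higher score?"))
    print()
```

Keeping each format behind its own function makes it easy to hold the question fixed and vary only the serialization, which is the kind of controlled comparison the abstract describes.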

Optimizer · UniFormer · CASES · Layer · Processing (programming language)

October 20, 2023

Uniform convergence of optimal order of a local discontinuous Galerkin method on a Shishkin mesh under a balanced norm

Xiaoqi Ma, Jin Zhang, Wenchao Zheng

This article investigates a local discontinuous Galerkin (LDG) method for one-dimensional and two-dimensional singularly perturbed reaction-diffusion problems on a Shishkin mesh. Because the energy norm cannot fully capture the behavior of the boundary layers that appear in the solutions, a balanced norm is introduced. By designing novel numerical fluxes and constructing special interpolations, optimal convergence under the balanced norm is achieved in both the 1D and 2D cases. Numerical experiments support the main theoretical conclusions.
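
For readers unfamiliar with Shishkin meshes, a common piecewise-uniform construction handles a reaction-diffusion problem with boundary layers at both ends of [0, 1]. It places N/4 cells in each layer region [0, tau] and [1 - tau, 1] and N/2 cells in between, with transition point tau = min(1/4, (sigma eps / beta) ln N). The sketch below assumes this textbook form: eps is the perturbation parameter in -eps^2 u'' + b u = f, b >= beta^2, and sigma > 0 is a user-chosen constant. The paper's exact parameters may differ.

```python
import math


def shishkin_mesh(n, eps, beta=1.0, sigma=2.0):
    """Piecewise-uniform Shishkin mesh on [0, 1] with n cells (n divisible by 4).

    Assumes layers at both ends, as for -eps^2 u'' + b u = f with b >= beta^2.
    """
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4")
    tau = min(0.25, sigma * eps / beta * math.log(n))  # transition point
    q = n // 4
    left = [tau * i / q for i in range(q)]                              # [0, tau)
    middle = [tau + (1 - 2 * tau) * i / (2 * q) for i in range(2 * q)]  # [tau, 1 - tau)
    right = [1 - tau + tau * i / q for i in range(q)] + [1.0]           # [1 - tau, 1]
    return left + middle + right


mesh = shishkin_mesh(16, eps=1e-4)
print(len(mesh), mesh[4], mesh[12])  # 17 nodes; mesh[4] = tau, mesh[12] = 1 - tau
```

When eps is large enough that tau hits its cap of 1/4, the construction degenerates to a uniform mesh, which is a convenient sanity check.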

Networking · Learning · Optimizer · Invariant · Identifiable

October 19, 2023

How a student becomes a teacher: learning and forgetting through Spectral methods

Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli

From arXiv. 10 pages + references + supplemental material. Poster presentation at NeurIPS 2023.

In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. This scheme proves particularly relevant when the student network is overparameterized compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student's ability to handle the given task could eventually be stored in a sub-portion of the whole network. This sub-portion should, according to suitable metrics, be to some extent reminiscent of the frozen teacher structure, while being approximately invariant across different architectures of the student candidate network. Unfortunately, conventional state-of-the-art learning techniques cannot help identify such an invariant subnetwork, owing to the inherent degree of non-convexity that characterizes the problem. In this work, we take a leap forward by proposing a radically different optimization scheme that builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors, with a negligible increase in computational load and complexity compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure that mirrors the true complexity of the teacher in terms of computing neurons, path distribution, and topological attributes. When unimportant nodes of the trained student are pruned following a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is observed above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.
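
The pruning step ranks hidden nodes by their optimized eigenvalues and removes the low-ranked ones. The minimal sketch below shows only that bookkeeping. It assumes, as our own simplification, one trainable eigenvalue per hidden node, with importance measured by its absolute value and dense weight matrices stored as lists of rows. It does not reproduce the paper's spectral parametrization or training.

```python
def prune_by_eigenvalue(eigenvalues, w_in, w_out, keep):
    """Keep the `keep` hidden nodes with the largest |eigenvalue| (assumed ranking).

    w_in[j] holds the incoming weights of hidden node j; w_out[k][j] is the weight
    from hidden node j to output k. Pruned nodes get all their weights zeroed.
    """
    ranking = sorted(range(len(eigenvalues)),
                     key=lambda j: abs(eigenvalues[j]), reverse=True)
    kept = set(ranking[:keep])
    new_in = [row[:] if j in kept else [0.0] * len(row) for j, row in enumerate(w_in)]
    new_out = [[w if j in kept else 0.0 for j, w in enumerate(row)] for row in w_out]
    return sorted(kept), new_in, new_out


eigenvalues = [0.1, -2.0, 0.5, 1.5]  # one per hidden node (toy values)
w_in = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
w_out = [[1.0, 1.0, 1.0, 1.0]]
kept, new_in, new_out = prune_by_eigenvalue(eigenvalues, w_in, w_out, keep=2)
print(kept)     # [1, 3]
print(new_out)  # [[0.0, 1.0, 0.0, 1.0]]
```

Sweeping keep downward and re-evaluating the pruned network would be one way to look for the size threshold the abstract describes.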

Monte Carlo · Tensor · Manifold · Monte Carlo Methods · INFORMS

October 19, 2023

Log-density gradient covariance and automatic metric tensors for Riemann manifold Monte Carlo methods

Tore Selland Kleppe

A metric tensor for Riemann manifold Monte Carlo particularly suited for non-linear Bayesian hierarchical models is proposed. The metric tensor is built from symmetric positive semidefinite log-density gradient covariance (LGC) matrices, which are also proposed and further explored here. The LGCs generalize the Fisher information matrix by measuring the joint information content and dependence structure of both a random variable and the parameters of said variable. Consequently, positive definite Fisher/LGC-based metric tensors may be constructed not only from the observation likelihoods as is current practice, but also from arbitrarily complicated non-linear prior/latent variable structures, provided the LGC may be derived for each conditional distribution used to construct said structures. The proposed methodology is highly automatic and allows for exploitation of any sparsity associated with the model in question. When implemented in conjunction with a Riemann manifold variant of the recently proposed numerical generalized randomized Hamiltonian Monte Carlo processes, the proposed methodology is highly competitive, in particular for the more challenging target distributions associated with Bayesian hierarchical models.
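
As a small worked case, take x ~ N(mu, sigma^2) with parameter mu. We assume the LGC is the covariance of the joint gradient of log p(x | mu) with respect to (x, mu). Then d/dx log p = -(x - mu)/sigma^2 and d/dmu log p = (x - mu)/sigma^2, so the LGC is (1/sigma^2) [[1, -1], [-1, 1]]. This matrix is symmetric positive semidefinite (here singular), with the usual Fisher information 1/sigma^2 in the mu-mu entry. The Monte Carlo sketch below checks this. The definition used is our reading of the abstract, so the paper should be consulted for the exact construction.

```python
import random


def lgc_gaussian_location(mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of Cov[grad_(x, mu) log N(x | mu, sigma^2)] (assumed LGC)."""
    rng = random.Random(seed)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        g_mu = (x - mu) / sigma ** 2  # d/dmu log p(x | mu)
        g_x = -g_mu                   # d/dx  log p(x | mu)
        g = (g_x, g_mu)
        for i in range(2):
            for j in range(2):
                acc[i][j] += g[i] * g[j]
    # Both gradient components have mean zero, so the second moment is the covariance.
    return [[v / n for v in row] for row in acc]


lgc = lgc_gaussian_location(mu=0.0, sigma=2.0)
print(lgc)  # close to [[0.25, -0.25], [-0.25, 0.25]]
```

The off-diagonal entries capture the dependence between x and mu that a Fisher matrix for mu alone would not show.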

Prompt · Few-Shot Learning · Language Modeling · Knowledge · Base

October 19, 2023

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation

Yuanyuan Liang, Jianing Wang, Hanlun Zhu, Lei Wang, Weining Qian, Yunshi Lan

From arXiv. Accepted by EMNLP 2023 main conference.

The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Because large-scale question annotation is expensive, KBQG methods for low-resource scenarios urgently need to be developed. However, current methods rely heavily on annotated data for fine-tuning, which is not well suited to few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem in which the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method, KQG-CoT, first retrieves supportive logical forms from the unlabeled data pool, taking into account the characteristics of the logical form. Then, we write a prompt that makes explicit the reasoning chain for generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments on three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method surpasses the existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
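
The prompt-construction step can be sketched as follows. Everything here is a hypothetical stand-in: the S-expression logical forms, the complexity proxy (counting parenthesized sub-expressions), the simplest-first ordering, and the prompt template are our assumptions. The retrieval of supportive logical forms is not shown.

```python
def complexity(logical_form):
    # Hypothetical complexity proxy: number of parenthesized sub-expressions.
    return logical_form.count("(")


def build_prompt(demos, target_lf, sort_by_complexity=True):
    """demos: list of (logical_form, sub_questions, final_question) triples."""
    if sort_by_complexity:  # KQG-CoT+-style ordering; simplest first is assumed
        demos = sorted(demos, key=lambda d: complexity(d[0]))
    lines = []
    for lf, sub_questions, question in demos:
        lines.append(f"Logical form: {lf}")
        for i, sub_q in enumerate(sub_questions, 1):
            lines.append(f"Step {i}: {sub_q}")  # explicit reasoning chain
        lines.append(f"Question: {question}")
        lines.append("")
    lines.append(f"Logical form: {target_lf}")
    lines.append("Step 1:")  # the LLM continues the chain from here
    return "\n".join(lines)


demos = [
    ("(JOIN spouse (JOIN director Film_X))",
     ["Who directed Film_X?", "Who is that person's spouse?"],
     "Who is the spouse of the director of Film_X?"),
    ("(JOIN director Film_Y)",
     ["Who directed Film_Y?"],
     "Who directed Film_Y?"),
]
print(build_prompt(demos, "(JOIN birthplace (JOIN director Film_Z))"))
```

Ending the prompt at "Step 1:" invites the model to produce the sub-questions before the final question, mirroring the chain-of-thought framing.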

Subspace · Language Modeling · MoDELS · Linear · Layer

October 18, 2023

Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing

Dmitry Nikolaev, Sebastian Padó

From arXiv. Accepted to BlackboxNLP 2023.

The question of what kinds of linguistic information are encoded in different layers of Transformer-based language models is of considerable interest for the NLP community. Existing work, however, has overwhelmingly focused on word-level representations and encoder-only language models with the masked-token training objective. In this paper, we present experiments with semantic structural probing, a method for studying sentence-level representations via finding a subspace of the embedding space that provides suitable task-specific pairwise distances between data-points. We apply our method to language models from different families (encoder-only, decoder-only, encoder-decoder) and of different sizes in the context of two tasks, semantic textual similarity and natural-language inference. We find that model families differ substantially in their performance and layer dynamics, but that the results are largely model-size invariant.
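
Semantic structural probing learns a linear map B so that the squared distance ||B(u - v)||^2 between two sentence embeddings approximates a task-specific target distance. The row space of B is the probed subspace. The sketch below fits such a probe by plain stochastic gradient descent on squared error. It uses toy 2-d "embeddings" whose target distance depends only on the first coordinate. The loss, optimizer, data, and hyperparameters are our assumptions, not the paper's setup.

```python
import random


def probe_distance(B, u, v):
    # Squared Euclidean distance after projecting the difference u - v with B.
    delta = [a - b for a, b in zip(u, v)]
    proj = [sum(b_rc * d_c for b_rc, d_c in zip(row, delta)) for row in B]
    return sum(p * p for p in proj)


def mean_loss(B, pairs):
    return sum((probe_distance(B, u, v) - t) ** 2 for u, v, t in pairs) / len(pairs)


def train_probe(pairs, rank, dim, lr=0.005, epochs=2000, seed=0):
    rng = random.Random(seed)
    B = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(rank)]
    for _ in range(epochs):
        for u, v, target in pairs:
            delta = [a - b for a, b in zip(u, v)]
            proj = [sum(b_rc * d_c for b_rc, d_c in zip(row, delta)) for row in B]
            err = sum(p * p for p in proj) - target
            # d/dB (||B delta||^2 - target)^2 = 4 * err * (B delta) delta^T
            for r in range(rank):
                for c in range(dim):
                    B[r][c] -= lr * 4 * err * proj[r] * delta[c]
    return B


points = [(0.0, 0.0), (1.0, 0.5), (0.5, -0.5), (-0.5, 1.0)]
# Toy target: only the first coordinate carries the "semantic" distance.
pairs = [(p, q, (p[0] - q[0]) ** 2)
         for i, p in enumerate(points) for q in points[i + 1:]]
B = train_probe(pairs, rank=1, dim=2)
print(mean_loss(B, pairs), B)
```

On this toy data the learned row of B should align with the first coordinate axis (up to sign), which is the subspace that actually encodes the target distance.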

Functional · Scenario · Identifiable · MoDELS · Reducible

October 17, 2023

A Kripke-Lewis semantics for belief update and belief revision

Giacomo Bonanno

From arXiv. 37 pages.

We provide a new characterization of both belief update and belief revision in terms of a Kripke-Lewis semantics. We consider frames consisting of a set of states, a Kripke belief relation and a Lewis selection function. Adding a valuation to a frame yields a model. Given a model and a state, we identify the initial belief set K with the set of formulas that are believed at that state, and we identify either the updated belief set or the revised belief set, prompted by the input represented by formula A, as the set of formulas that are the consequents of conditionals that (1) are believed at that state and (2) have A as antecedent. We show that this class of models characterizes both the Katsuno-Mendelzon (KM) belief update functions and the AGM belief revision functions, in the following sense: (1) each model gives rise to a partial belief function that can be completed into a full KM/AGM update/revision function, and (2) for every KM/AGM update/revision function there is a model whose associated belief function coincides with it. The difference between update and revision can be reduced to two semantic properties that appear in a stronger form in revision relative to update, thus confirming the finding by Peppas et al. (1996) that, "for a fixed theory K, revising K is much the same as updating K".
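
The construction can be run on a finite model. In the sketch below, propositions are sets of states, and belief[s] is the set of states the agent considers possible at s (the Kripke belief relation). select(s, A) returns the A-states closest to s (a Lewis selection function). A conditional A > C is true at s' when select(s', A) ⊆ C, and it is believed at s when it is true at every state in belief[s]. So C belongs to the new belief set exactly when C contains the union of select(s', A) over belief[s]. The four-state model and the distance-based selection function are hypothetical examples. The sketch does not encode the extra frame properties that separate KM update from AGM revision.

```python
def believes(state, prop, belief):
    """Initial belief set K: prop is believed iff it holds at all belief-accessible states."""
    return belief[state] <= prop


def revised_content(state, antecedent, belief, select):
    """Strongest proposition believed after revising (or updating) by antecedent."""
    content = set()
    for s in belief[state]:
        content |= select(s, antecedent)
    return frozenset(content)


def believes_after(state, antecedent, consequent, belief, select):
    # True iff the conditional "antecedent > consequent" is believed at state.
    return revised_content(state, antecedent, belief, select) <= consequent


def closest_select(s, antecedent):
    # Hypothetical selection function: the antecedent-states nearest to s by |s - t|.
    best = min(abs(s - t) for t in antecedent)
    return {t for t in antecedent if abs(s - t) == best}


belief = {1: frozenset({1}), 2: frozenset({1, 2})}
A = frozenset({3, 4})
print(believes(1, frozenset({1, 2}), belief))                           # True
print(sorted(revised_content(1, A, belief, closest_select)))            # [3]
print(believes_after(1, A, frozenset({2, 3}), belief, closest_select))  # True
print(believes_after(1, A, frozenset({4}), belief, closest_select))     # False
```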

MoDELS · Learned · Next · Processing (programming language) · Taxonomy

August 12, 2021

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha

From arXiv. Preprint under review.

Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on top of transformers, self-supervised learning, and transfer learning. Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings, and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various benchmarks, both intrinsic and extrinsic. We present a summary of various useful libraries for working with T-PTLMs. Finally, we highlight some future research directions that will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference for learning the core concepts as well as for staying updated with recent developments in T-PTLMs.
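
Masked language modeling, the pretraining task popularized by BERT, is a typical example of the self-supervised objectives the survey covers. The sketch below follows the masking recipe reported in the original BERT paper. About 15% of positions are selected, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The function name and the toy vocabulary are illustrative.

```python
import random


def mask_tokens(tokens, vocab, rng, mask_prob=0.15, mask_token="[MASK]"):
    """Return (inputs, labels); labels[i] is None where no prediction is scored."""
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs.append(mask_token)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # random replacement
            else:
                inputs.append(tok)  # kept unchanged but still predicted
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


rng = random.Random(0)
sentence = "the cat sat on the mat".split()
inputs, labels = mask_tokens(sentence, vocab=["dog", "ran", "hat"], rng=rng)
print(inputs)
print(labels)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on [MASK] always marking the positions it is scored on.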

BERT · Language Representation · state-of-the-art · Comprehensibility · Automatic Question Answering

October 11, 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

From arXiv. 13 pages.

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.
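
Conditioning jointly on both left and right context comes down to the self-attention mask. The sketch below contrasts a bidirectional mask, where every position may attend to every position as in BERT's encoder, with a left-to-right causal mask. It then applies each mask in a masked softmax. The scores are toy values, not a trained model.

```python
import math


def attention_mask(n, bidirectional=True):
    """mask[i][j] is True when position i may attend to position j."""
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    # Softmax over each row, giving zero weight to disallowed positions.
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        exps = [math.exp(s) if allowed else 0.0
                for s, allowed in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0] * 3 for _ in range(3)]  # uniform toy scores
print(masked_softmax(scores, attention_mask(3, bidirectional=True))[0])
print(masked_softmax(scores, attention_mask(3, bidirectional=False))[0])  # [1.0, 0.0, 0.0]
```

With the bidirectional mask the first token already draws on the tokens to its right. With the causal mask it can only attend to itself.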