
We conduct a systematic study of the approximation properties of the Transformer for sequence modeling with long, sparse, and complicated memory. We investigate the mechanisms through which different components of the Transformer, such as dot-product self-attention, positional encoding, and the feed-forward layer, affect its expressive power, and we study their combined effects by establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also suggest natural alternative architectures.
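For readers who want a concrete picture of the dot-product self-attention component named above, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not the construction analyzed in the paper: the function names `softmax` and `self_attention`, the weight matrices `Wq`, `Wk`, `Wv`, and the scaling by the square root of the key dimension are assumptions, and positional encoding, the feed-forward layer, and multiple heads are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(X, W):
    """Multiply a T x d sequence X (list of rows) by a d x m matrix W."""
    return [
        [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]
        for x in X
    ]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (hypothetical helper)."""
    Q, K, V = project(X, Wq), project(X, Wk), project(X, Wv)
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append(
            [sum(weights[t] * V[t][j] for t in range(len(V))) for j in range(len(V[0]))]
        )
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
Y = self_attention(X, identity, identity, identity)
```

With identity projections, each output row is a convex combination of the input rows, with more weight on the row that matches the query.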

Related content

Identifiable · Latent variable · Latent · MoDELS · Discretization

March 19, 2024

Blessing of Dependence: Identifiability and Geometry of Discrete Models with Multiple Binary Latent Variables

Yuqi Gu

from arxiv, To appear in Bernoulli

Identifiability of discrete statistical models with latent variables is known to be challenging to study, yet crucial to a model's interpretability and reliability. This work presents a general algebraic technique to investigate identifiability of discrete models with latent and graphical components. Specifically, motivated by diagnostic tests collecting multivariate categorical data, we focus on discrete models with multiple binary latent variables. We consider the BLESS model in which the latent variables can have arbitrary dependencies among themselves while the latent-to-observed measurement graph takes a "star-forest" shape. We establish necessary and sufficient graphical criteria for identifiability, and reveal an interesting and perhaps surprising geometry of blessing-of-dependence: under the minimal conditions for generic identifiability, the parameters are identifiable if and only if the latent variables are not statistically independent. Thanks to this theory, we can perform formal hypothesis tests of identifiability in the boundary case by testing marginal independence of the observed variables. In addition to the BLESS model, we also use the technique to show identifiability and the blessing-of-dependence geometry for a more flexible model, which has a general measurement graph beyond a star forest. Our results give new understanding of statistical properties of graphical models with latent variables. They also entail useful implications for designing diagnostic tests or surveys that measure binary latent traits.

Round · Robustness · Things · Network structure

March 19, 2024

Developing Algorithms for the Internet of Flying Things Through Environments With Varying Degrees of Realism

Thiago de Souza Lamenza,Josef Kamysek,Bruno Jose Olivieri de Souza,Markus Endler

from arxiv, 11 pages

This work discusses the benefits of having multiple simulated environments with different degrees of realism for the development of algorithms in scenarios populated by autonomous nodes capable of communication and mobility. This approach aids the development experience and generates robust algorithms. It also proposes GrADyS-SIM NextGen as a solution that enables development on a single programming language and toolset over multiple environments with varying levels of realism. Finally, we illustrate the usefulness of this approach with a toy problem that makes use of the simulation framework, taking advantage of the proposed environments to iteratively develop a robust solution.

MoDELS · Processing (programming language) · Scenario · Sample · Parametric model

March 18, 2024

Probabilistic Modeling for Sequences of Sets in Continuous-Time

Yuxin Chang,Alex Boyd,Padhraic Smyth

from arxiv, Oral presentation at AISTATS 2024

Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.

Regularization term · Stochastic gradient descent · Estimation/Estimator · Performer · Range

March 18, 2024

On the Convergence of A Data-Driven Regularized Stochastic Gradient Descent for Nonlinear Ill-Posed Problems

Zehui Zhou

from arxiv, 41 pages, 1 figure

Stochastic gradient descent (SGD) is a promising method for solving large-scale inverse problems, due to its excellent scalability with respect to data size. In this work, we analyze a new data-driven regularized stochastic gradient descent for the efficient numerical solution of a class of nonlinear ill-posed inverse problems in infinite dimensional Hilbert spaces. At each step of the iteration, the method randomly selects one equation from the nonlinear system combined with a corresponding equation from the learned system based on training data to obtain a stochastic estimate of the gradient and then performs a descent step with the estimated gradient. We prove the regularizing property of this method under the tangential cone condition and a priori parameter choice and then derive the convergence rates under the additional source condition and range invariance conditions. Several numerical experiments are provided to complement the analysis.

Reducible · Lecture notes · Example

March 16, 2024

Efficient Algorithms for Complexes of Persistence Modules with Applications

Tamal K. Dey,Florian Russold,Shreyas N. Samaga

from arxiv, This is the full version of a paper accepted at the 40th International Symposium on Computational Geometry (SoCG 2024)

We extend the persistence algorithm, viewed as an algorithm computing the homology of a complex of free persistence or graded modules, to complexes of modules that are not free. We replace persistence modules by their presentations and develop an efficient algorithm to compute the homology of a complex of presentations. To deal with inputs that are not given in terms of presentations, we give an efficient algorithm to compute a presentation of a morphism of persistence modules. This allows us to compute persistent (co)homology of instances giving rise to complexes of non-free modules. Our methods lead to a new efficient algorithm for computing the persistent homology of simplicial towers and they enable efficient algorithms to compute the persistent homology of cosheaves over simplicial towers and cohomology of persistent sheaves on simplicial complexes. We also show that we can compute the cohomology of persistent sheaves over arbitrary finite posets by reducing the computation to a computation over simplicial complexes.

Origin · Square matrix · Rank · Approximation · Controller

March 15, 2024

Supplement Matrix and a Practical Method for Computing Eigenvalues of a Dual Hermitian Matrix

Liqun Qi,Chunfeng Cui

We study dual number symmetric matrices, dual complex Hermitian matrices and dual quaternion Hermitian matrices in a unified frame of dual Hermitian matrices. Suppose we have a ring, which can be the real field, the complex field, or the quaternion ring. Then an $n \times n$ dual Hermitian matrix has $n$ dual number eigenvalues. We define supplement matrices for a dual Hermitian matrix. Supplement matrices are Hermitian matrices in the original ring. The standard parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of the standard part Hermitian matrix in the original ring, while the dual parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of those supplement matrices. Hence, by applying any practical method for computing eigenvalues of Hermitian matrices in the original ring, we have a practical method for computing eigenvalues of a dual Hermitian matrix. We call this method the supplement matrix method. Applications to low rank approximation and generalized inverses of dual matrices, dual least squares problem and formation control are discussed. Numerical experiments are reported.

MoDELS · Marginal distribution · Marginalization · Rank · Applied statistics

March 15, 2024

Probabilistic Models of Profiles for Voting by Evaluation

Antoine Rolland,Jean-Baptiste Aubin,Irène Gannaz,Samuela Leoni

Considering voting rules based on evaluation inputs rather than preference rankings modifies the paradigm of probabilistic studies of voting procedures. This article proposes several simulation models for generating evaluation-based voting inputs. These models can cope with dependent and non-identical marginal distributions of the evaluations received by the candidates. The last part is devoted to fitting these models to real data sets.

Performer · MoDELS · Language modeling · Large language model · MINE

March 15, 2024

Exploring the Potential of Large Language Models in Computational Argumentation

Guizhen Chen,Liying Cheng,Luu Anh Tuan,Lidong Bing

from arxiv, 20 pages, 3 figures

Computational argumentation has become an essential tool in various fields, including artificial intelligence, law, and public policy. It is an emerging research field in natural language processing that attracts increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. As large language models have demonstrated strong abilities in understanding context and generating natural language, it is worthwhile to evaluate the performance of LLMs on various computational argumentation tasks. This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models and LLaMA2 models, under zero-shot and few-shot settings within the realm of computational argumentation. We organize existing tasks into six main categories and standardise the format of fourteen open-sourced datasets. In addition, we present a new benchmark dataset on counter speech generation, that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of these datasets, demonstrating their capabilities in the field of argumentation. Our analysis offers valuable suggestions for evaluating computational argumentation and its integration with LLMs in future research endeavors.

Stationary · Scenario · Linear · Discretization · MoDELS

March 15, 2024

Formalization of Asymptotic Convergence for Stationary Iterative Methods

Mohit Tekriwal,Joshua Miller,Jean-Baptiste Jeannin

from arxiv, This paper has been accepted for publication at the NFM 2024 conference

Solutions to differential equations, which are used to model physical systems, are computed numerically by solving a set of discretized equations. This set of discretized equations is reduced to a large linear system, whose solution is typically found using an iterative solver. We start with an initial guess, $x_0$, and iterate the algorithm to obtain a sequence of solution vectors, $x_k$, which are approximations to the exact solution of the linear system, $x$. The iterative algorithm is said to converge to $x$, in the field of reals, if and only if $x_k$ converges to $x$ in the limit of $k \to \infty$. In this paper, we formally prove the asymptotic convergence of a particular class of iterative methods called the stationary iterative methods, in the Coq theorem prover. We formalize the necessary and sufficient conditions required for the iterative convergence, and extend this result to two classical iterative methods: the Gauss--Seidel method and the Jacobi method. For the Gauss--Seidel method, we also formalize a set of easily testable conditions for iterative convergence, called the Reich theorem, for a particular matrix structure, and apply this on a model problem of the one-dimensional heat equation. We also apply the main theorem of iterative convergence to prove convergence of the Jacobi method on the model problem.
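The iteration described in this abstract can be sketched numerically. The following is a minimal Python illustration of the Jacobi method, not the Coq formalization from the paper. The function names `jacobi` and `heat_matrix`, the stopping tolerance, and the choice of the tridiagonal second-difference matrix as a stand-in for the one-dimensional heat equation model problem are all assumptions made for this example.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Jacobi iteration for A x = b: x_{k+1} = D^{-1} (b - (A - D) x_k).

    Returns the final iterate and the number of iterations performed.
    """
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        # Every component is updated from the previous iterate only.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once successive iterates are close in the max norm.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def heat_matrix(n):
    """Tridiagonal matrix with 2 on the diagonal and -1 next to it (assumed model)."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


A = heat_matrix(5)
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
# Build the right-hand side from a known solution so the error can be checked.
b = [sum(A[i][j] * x_true[j] for j in range(5)) for i in range(5)]
x, iterations = jacobi(A, b, [0.0] * 5)
```

For this matrix the Jacobi iteration matrix has spectral radius below 1, so the iterates $x_k$ approach the exact solution $x$ from any initial guess $x_0$. The Gauss--Seidel method differs only in using already-updated components within each sweep.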

INFORMS · Graph · Reducible · Knowledge graph · Identifiable

August 29, 2018

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Yi Luan,Luheng He,Mari Ostendorf,Hannaneh Hajishirzi

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.