黄色视频在线观看男人插女人的视频在线观看_高中小鲜肉自慰GAY免费_欧美日韩视频在线第一区二区三区_印度人交乣女BBW_亚洲欧洲日产国码无码AV专区_人人射人人干人人插_欧美久久免费频精品99一

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.

相關內容

變換

關注 2

訓練數據 · MoDELS · 泛函 · 優化器 · 數據點 ·

2024 年 12 月 12 日

Capturing the Temporal Dependence of Training Data Influence

Jiachen T. Wang,Dawn Song,James Zou,Prateek Mittal,Ruoxi Jia

from arxiv, Correspondence to Jiachen T. Wang and Ruoxi Jia

Traditional data influence estimation methods, like influence function, assume that learning algorithms are permutation-invariant with respect to training data. However, modern training paradigms, especially for foundation models using stochastic algorithms and multi-stage curricula, are sensitive to data ordering, thus violating this assumption. This mismatch renders influence functions inadequate for answering a critical question in machine learning: How can we capture the dependence of data influence on the optimization trajectory during training? To address this gap, we formalize the concept of trajectory-specific leave-one-out (LOO) influence, which quantifies the impact of removing a data point from a specific iteration during training, accounting for the exact sequence of data encountered and the model's optimization trajectory. However, exactly evaluating the trajectory-specific LOO presents a significant computational challenge. To address this, we propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. Specifically, we compute a training data embedding that encapsulates the cumulative interactions between data and the evolving model parameters. The LOO can then be efficiently approximated through a simple dot-product between the data value embedding and the gradient of the given test data. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics. In particular, we uncover distinct phases of data influence, revealing that data points in the early and late stages of training exert a greater impact on the final model. These insights translate into actionable strategies for managing the computational overhead of data selection by strategically timing the selection process, potentially opening new avenues in data curation research.

Performer · 語言模型化 · 可辨認的 · CASES · 分解的 ·

2024 年 12 月 12 日

Exploring the Limitations of Detecting Machine-Generated Text

Jad Doughman,Osama Mohammed Afzal,Hawau Olamide Toyin,Shady Shehata,Preslav Nakov,Zeerak Talat

from arxiv, Accepted to COLING 2025

Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Such work often presents high-performing detectors. However, humans and machines can produce text in different styles and domains, yet the performance impact of such on machine generated text detection systems remains unclear. In this paper, we audit the classification performance for detecting machine-generated text by evaluating on texts with varying writing styles. We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers. We further find that detection systems are particularly susceptible to misclassify easy-to-read texts while they have high performance for complex texts, leading to concerns about the reliability of detection systems. We recommend that future work attends to stylistic factors and reading difficulty levels of human-written and machine-generated text.

MoDELS · Extensibility · Learning · 代碼 · 輸出層 ·

2024 年 12 月 11 日

Compression of Higher Order Ambisonics with Multichannel RVQGAN

Toni Hirvonen,Mahmoud Namazi

A multichannel extension to the RVQGAN neural coding method is proposed, and realized for data-driven compression of third-order Ambisonics audio. The input- and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function for accounting for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.

表示 · Principle · 成比例 · 原點 · 數據庫 ·

2024 年 12 月 11 日

A Principled Solution to the Disjunction Problem of Diagrammatic Query Representations

Wolfgang Gatterbauer

from arxiv, 41 pages, 27 figures

Finding unambiguous diagrammatic representations for first-order logical formulas and relational queries with arbitrarily nested disjunctions has been a surprisingly long-standing unsolved problem. We refer to this problem as the disjunction problem (of diagrammatic query representations). This work solves the disjunction problem. Our solution unifies, generalizes, and overcomes the shortcomings of prior approaches for disjunctions. It extends the recently proposed Relational Diagrams and is identical for disjunction-free queries. However, it can preserve the relational patterns and the safety for all well-formed Tuple Relational Calculus (TRC) queries, even with arbitrary disjunctions. Additionally, its size is proportional to the original TRC query and can thus be exponentially more succinct than Relational Diagrams.

Learning · MoDELS · 可辨認的 · 潛在 · 生成模型 ·

2024 年 12 月 11 日

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Core Francisco Park,Maya Okawa,Andrew Lee,Hidenori Tanaka,Ekdeep Singh Lubana

from arxiv, NeurIPS 2024 (Spotlight)

Modern generative models demonstrate impressive capabilities, likely stemming from an ability to identify and manipulate abstract concepts underlying their training data. However, fundamental questions remain: what determines the concepts a model learns, the order in which it learns them, and its ability to manipulate those concepts? To address these questions, we propose analyzing a model's learning dynamics via a framework we call the concept space, where each axis represents an independent concept underlying the data generating process. By characterizing learning dynamics in this space, we identify how the speed at which a concept is learned, and hence the order of concept learning, is controlled by properties of the data we term concept signal. Further, we observe moments of sudden turns in the direction of a model's learning dynamics in concept space. Surprisingly, these points precisely correspond to the emergence of hidden capabilities, i.e., where latent interventions show the model possesses the capability to manipulate a concept, but these capabilities cannot yet be elicited via naive input prompting. While our results focus on synthetically defined toy datasets, we hypothesize a general claim on emergence of hidden capabilities may hold: generative models possess latent capabilities that emerge suddenly and consistently during training, though a model might not exhibit these capabilities under naive input prompting.

線性的 · 邊 · 有偏 · 線性回歸 · 通用動力公司 ·

2024 年 12 月 11 日

Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Peiyuan Zhang,Amin Karbasi

from arxiv, 31 pages, 9 figures

Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge the traditional idea by empirically demonstrating Gradient Descent (GD) converges even when the step-size $\eta$ exceeds the threshold of $2/L$, where $L$ is the global smooth constant. This is usually known as the Edge of Stability (EoS) phenomenon. A widely held belief suggests that an objective function with subquadratic growth plays an important role in incurring EoS. In this paper, we provide a more comprehensive answer by considering the task of finding linear interpolator $\beta \in R^{d}$ for regression with loss function $l(\cdot)$, where $\beta$ admits parameterization as $\beta = w^2_{+} - w^2_{-}$. Contrary to the previous work that suggests a subquadratic $l$ is necessary for EoS, our novel finding reveals that EoS occurs even when $l$ is quadratic under proper conditions. This argument is made rigorous by both empirical and theoretical evidence, demonstrating the GD trajectory converges to a linear interpolator in a non-asymptotic way. Moreover, the model under quadratic $l$, also known as a depth-$2$ diagonal linear network, remains largely unexplored under the EoS regime. Our analysis then sheds some new light on the implicit bias of diagonal linear networks when a larger step-size is employed, enriching the understanding of EoS on more practical models.

MoDELS · 相互獨立的 · SR · 操作 · Performer ·

2024 年 12 月 10 日

TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery

Madhav Muthyala,Farshud Sorourifar,Joel A. Paulson

Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models, offering interpretable and compact representations of complex data. Unlike traditional regression methods, SR explores progressively complex feature spaces, which can uncover simple models that generalize well, even from small datasets. Among SR algorithms, the Sure Independence Screening and Sparsifying Operator (SISSO) has proven particularly effective in the natural sciences, helping to rediscover fundamental physical laws as well as discover new interpretable equations for materials property modeling. However, its widespread adoption has been limited by performance inefficiencies and the challenges posed by its FORTRAN-based implementation, especially in modern computing environments. In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework. TorchSISSO leverages GPU acceleration, easy integration, and extensibility, offering a significant speed-up and improved accuracy over the original. We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks, while dramatically reducing computational time and improving accessibility for broader scientific applications.

MoDELS · 隨機場 · 泛函 · MINE · 推斷 ·

2024 年 12 月 10 日

Dual Random Fields and their Application to Mineral Potential Mapping

álvaro I. Riquelme

In various geosciences branches, including mineral exploration, geometallurgical characterization on established mining operations, and remote sensing, the regionalized input variables are spatially well-sampled across the domain of interest, limiting the scope of spatial uncertainty quantification procedures. In turn, response outcomes such as the mineral potential in a given region, mining throughput, metallurgical recovery, or in-situ estimations from remote satellite imagery, are usually modeled from a much-restricted subset of testing samples, collected at certain locations due to accessibility restrictions and the high acquisition costs. Our limited understanding of these functions, in terms of the multi-dimensional complexity of causalities and unnoticed dependencies on inaccessible inputs, may lead to observing changes in such functions based on their geographical location. Pooling together different response functions across the domain is critical to correctly predict outcome responses, the uncertainty associated with these inferred values, and the significance of inputs in such predictions at unexplored areas. This paper introduces the notion of a dual random field (dRF), where the response function itself is considered a regionalized variable. In this way, different established response models across the geographic domain can be considered as observations of a dRF realization, enabling the spatial inference and uncertainty assessment of both response models and their predictions. We explain how dRFs inherit all the properties from classical random fields, allowing the use of standard Gaussian simulation procedures to simulate them. These models are combined to obtain a mineral potential response, providing an example of how to rigorously integrate machine learning approaches with geostatistics.

矩 · 離散化 · 近似 · 情景 · 噪聲 ·

2024 年 12 月 10 日

Strong Convergence of a Splitting Method for the Stochastic Complex Ginzburg-Landau Equation

Marvin Jans,Gabriel J. Lord,Mariya Ptashnyk

from arxiv, 22 pages, 8 figures

We consider the numerical approximation of the stochastic complex Ginzburg-Landau equation with additive noise on the one dimensional torus. The complex nature of the equation means that many of the standard approaches developed for stochastic partial differential equations can not be directly applied. We use an energy approach to prove an existence and uniqueness result as well to obtain moment bounds on the stochastic PDE before introducing our numerical discretization. For such a well studied deterministic equation it is perhaps surprising that its numerical approximation in the stochastic setting has not been considered before. Our method is based on a spectral discretization in space and a Lie-Trotter splitting method in time. We obtain moment bounds for the numerical method before proving our main result: strong convergence on a set of arbitrarily large probability. From this we obtain a result on convergence in probability. We conclude with some numerical experiments that illustrate the effectiveness of our method.

INFORMS · 圖 · 可約的 · 知識圖譜 · 可辨認的 ·

2018 年 8 月 29 日

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Yi Luan,Luheng He,Mari Ostendorf,Hannaneh Hajishirzi

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.