Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as viable linear-time alternatives to transformers with softmax attention. However, these models still underperform transformers, especially on tasks that require in-context retrieval. While more expressive variants of linear transformers, which replace the additive update with the delta rule (DeltaNet), have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba and GLA in terms of perplexity and zero-shot performance on downstream tasks. We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transformer baselines.
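To make the contrast between the additive update and the delta rule concrete, here is a minimal sketch of the sequential (recurrent) form of both updates. It assumes the usual formulation in which the state S is a matrix, the additive rule is S ← S + v kᵀ, and the delta rule is S ← S + β (v − S k) kᵀ. The function names and the toy data are illustrative only, and this is the step-by-step recurrence rather than the hardware-efficient parallel algorithm the abstract describes.

```python
# Minimal sketch: additive linear-attention update vs. the delta rule.
# S is a d_v x d_k matrix stored as a list of rows (an illustrative choice).

def matvec(S, x):
    """Return S @ x for a list-of-rows matrix S."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]

def additive_update(S, k, v):
    """Additive update: S <- S + v k^T (old associations are never erased)."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]

def delta_update(S, k, v, beta=1.0):
    """Delta rule: S <- S + beta * (v - S k) k^T."""
    err = [vi - pi for vi, pi in zip(v, matvec(S, k))]  # prediction error for key k
    return [[S[i][j] + beta * err[i] * k[j] for j in range(len(k))] for i in range(len(v))]

def write_sequence(update, pairs, d_v, d_k):
    """Apply `update` to a zero state for each (key, value) pair, in order."""
    S = [[0.0] * d_k for _ in range(d_v)]
    for k, v in pairs:
        S = update(S, k, v)
    return S

# The same unit-norm key is written twice with two different values.
key = [1.0, 0.0]
pairs = [(key, [1.0, 2.0]), (key, [3.0, 4.0])]

S_add = write_sequence(additive_update, pairs, d_v=2, d_k=2)
S_delta = write_sequence(delta_update, pairs, d_v=2, d_k=2)

print(matvec(S_add, key))    # both values superimposed
print(matvec(S_delta, key))  # only the most recent value
```

With β = 1 and a unit-norm key, the delta rule overwrites the old value for that key, whereas the additive rule returns the sum of everything ever written to it. This is one way to see why the delta rule helps with associative recall.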

Related content

Functional · Approximation · SPM · Path · Representation

December 13, 2024

Approximations of the Green's Function in Multiple Scattering Theory for Crystalline Systems

Xiaoxu Li, Huajie Chen

from arxiv, 25 pages, 25 figures

The multiple scattering theory (MST) is a Green's function method that has been widely used in electronic structure calculations for crystalline disordered systems. The key property of the MST method is the scattering path matrix (SPM) that characterizes the Green's function within a local solution representation. This paper studies various approximations of the SPM, under the condition that an appropriate reference is used for perturbation. In particular, we justify the convergence of the SPM approximations with respect to the size of scattering region and the length of scattering path, which are the central numerical parameters to achieve a linear-scaling MST method. We present numerical experiments on several typical systems to support the theory.

Controller · Constraint · Optimizer · Exact · Robot

December 13, 2024

Distributed Inverse Dynamics Control for Quadruped Robots using Geometric Optimization

Nimesh Khandelwal, Amritanshu Manu, Shakti S. Gupta, Mangal Kothari, Prashanth Krishnamurthy, Farshad Khorrami

This paper presents a distributed inverse dynamics controller (DIDC) for quadruped robots that addresses the limitations of existing reactive controllers: simplified dynamical models, the inability to handle exact friction cone constraints, and the high computational requirements of whole-body controllers. Current methods either ignore friction constraints entirely or use linear approximations, leading to potential slip and instability, while comprehensive whole-body controllers demand significant computational resources. Our approach uses full rigid-body dynamics and enforces exact friction cone constraints through a novel geometric optimization-based solver. DIDC combines the required generalized forces corresponding to the actuated and unactuated spaces by projecting them onto the actuated space while satisfying the physical constraints and maintaining orthogonality between the base and joint tracking objectives. Experimental validation shows that our approach reduces foot slippage, improves orientation tracking, and converges at least two times faster than existing reactive controllers with generic QP-based implementations. The controller enables stable omnidirectional trotting at various speeds and consumes less power than comparable methods while running efficiently on embedded processors.
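The difference between an exact friction cone and a linear approximation can be illustrated with a short check. The sketch below assumes the standard Coulomb model, with contact force (fx, fy, fz), normal component fz and friction coefficient mu, and compares it against a circumscribed pyramid, which is one common linearization. The function names and numbers are hypothetical, and this is not the paper's geometric solver.

```python
import math

def in_exact_cone(force, mu):
    """Exact Coulomb friction cone: ||(fx, fy)|| <= mu * fz with fz >= 0."""
    fx, fy, fz = force
    return fz >= 0.0 and math.hypot(fx, fy) <= mu * fz

def in_pyramid(force, mu):
    """Circumscribed pyramid approximation: |fx| <= mu * fz and |fy| <= mu * fz."""
    fx, fy, fz = force
    return fz >= 0.0 and abs(fx) <= mu * fz and abs(fy) <= mu * fz

mu = 0.7
diagonal_push = (0.6, 0.6, 1.0)  # tangential force along the pyramid's diagonal

print(in_pyramid(diagonal_push, mu))     # accepted by the linear approximation
print(in_exact_cone(diagonal_push, mu))  # rejected by the exact cone
```

A force accepted by the pyramid but lying outside the true cone is one that a real contact cannot sustain, which is how a linearized constraint can permit slip.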

Kernelization · Learning · Machine Learning · Linear · Functional

December 12, 2024

Experimental Machine Learning with Classical and Quantum Data via NMR Quantum Kernels

Vivek Sabarad, T. S. Mahesh

from arxiv, 8 pages, 5 figures

Kernel methods map data into high-dimensional spaces, enabling linear algorithms to learn nonlinear functions without explicitly storing the feature vectors. Quantum kernel methods promise efficient learning by encoding feature maps into exponentially large Hilbert spaces inherent in quantum systems. In this work we implement quantum kernels on a 10-qubit star-topology register in a nuclear magnetic resonance (NMR) platform. We experimentally encode classical data in the evolution of multiple quantum coherence orders using data-dependent unitary transformations and then demonstrate one-dimensional regression and two-dimensional classification tasks. By extending the register to a double-layered star configuration, we propose an extended quantum kernel to handle non-parametrized operator inputs. By numerically simulating the extended quantum kernel, we show classification of entangling and nonentangling unitaries. These results confirm that quantum kernels exhibit strong capabilities in classical as well as quantum machine learning tasks.
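The first sentence of the abstract, learning in a high-dimensional feature space without storing feature vectors, can be checked on a small classical example. The sketch below uses a degree-2 polynomial kernel on 2-D inputs, chosen purely for illustration. It is a classical kernel, not the NMR quantum kernel of the paper, and the function names are hypothetical.

```python
import math

def poly2_kernel(x, y):
    """Kernel value k(x, y) = (x . y)^2, computed without any feature vectors."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map phi(x) for 2-D input whose inner product equals k."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, y = (1.0, 2.0), (3.0, 4.0)

implicit = poly2_kernel(x, y)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))

print(implicit, explicit)  # the two agree up to floating-point rounding
```

A linear algorithm that only needs inner products can therefore work in the feature space by calling the kernel directly. The same principle is what allows a feature space as large as a quantum Hilbert space to be used without ever writing it down.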

CAD · Reducible · MoDELS · prototype · Integration

December 12, 2024

Reducing Meshing Requirements for Electrostatic Problems using a Galerkin Boundary Element Method

Benjamin Marussig, Thomas Rüberg, Jürgen Zechner, Lars Kielhorn, Thomas-Peter Fries

This work focuses on model preparation for electrostatic simulations of CAD designs to realize a rapid virtual prototyping concept. We present a boundary element method (BEM) allowing discontinuous fields between surfaces. The corresponding edges of the CAD model are enhanced with the data required to integrate over non-conforming elements. Finally, we generate a mesh for each CAD surface. The approach is verified via numerical experiments and shows excellent agreement with conforming BEM results.

MoDELS · Extensibility · Learning · Code · Output layer

December 11, 2024

Compression of Higher Order Ambisonics with Multichannel RVQGAN

Toni Hirvonen, Mahmoud Namazi

A multichannel extension to the RVQGAN neural coding method is proposed, and realized for data-driven compression of third-order Ambisonics audio. The input and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function that accounts for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.

Representation · Principle · Proportional · Origin · Database

December 11, 2024

A Principled Solution to the Disjunction Problem of Diagrammatic Query Representations

Wolfgang Gatterbauer

from arxiv, 41 pages, 27 figures

Finding unambiguous diagrammatic representations for first-order logical formulas and relational queries with arbitrarily nested disjunctions has been a surprisingly long-standing unsolved problem. We refer to this problem as the disjunction problem (of diagrammatic query representations). This work solves the disjunction problem. Our solution unifies, generalizes, and overcomes the shortcomings of prior approaches for disjunctions. It extends the recently proposed Relational Diagrams and is identical for disjunction-free queries. However, it can preserve the relational patterns and the safety for all well-formed Tuple Relational Calculus (TRC) queries, even with arbitrary disjunctions. Additionally, its size is proportional to the original TRC query and can thus be exponentially more succinct than Relational Diagrams.

MoDELS · Multimodal · CASE · INFORMS · Performer

December 11, 2024

Using Game Play to Investigate Multimodal and Conversational Grounding in Large Multimodal Models

Sherzod Hakimov, Yerkezhan Abdullayeva, Kushal Koshti, Antonia Schmidt, Yan Weiser, Anne Beyer, David Schlangen

from arxiv, Accepted at COLING 2025

While the situation has improved for text-only models, it again seems to be the case that multimodal (text and image) models currently develop faster than ways to evaluate them. In this paper, we bring a recently developed evaluation paradigm from text models to multimodal models, namely evaluation through goal-oriented game (self-)play, complementing reference-based and preference-based evaluation. Specifically, we define games that challenge a model's capability to represent a situation from visual information and align such representations through dialogue. We find that the largest closed models perform rather well on the games that we define, while even the best open-weight models struggle with them. On further analysis, we find that the exceptional deep captioning capabilities of the largest models drive some of the performance. There is still room to grow for both kinds of models, ensuring the continued relevance of the benchmark.

Functional · Scenario · CASE · Origin · Learning

December 11, 2024

Randomized Lower Bounds for Tarski Fixed Points in High Dimensions

Simina Branzei, Reed Phillips, Nicholas Recker

from arxiv, 12 pages, 2 figures

The Knaster-Tarski theorem, also known as Tarski's theorem, guarantees that every monotone function defined on a complete lattice has a fixed point. We analyze the query complexity of finding such a fixed point on the $k$-dimensional grid of side length $n$ under the $\leq$ relation. Specifically, there is an unknown monotone function $f: \{0,1,\ldots, n-1\}^k \to \{0,1,\ldots, n-1\}^k$ and an algorithm must query a vertex $v$ to learn $f(v)$. A key special case of interest is the Boolean hypercube $\{0,1\}^k$, which is isomorphic to the power set lattice -- the original setting of the Knaster-Tarski theorem. Our lower bound characterizes the randomized and deterministic query complexity of the Tarski search problem on the Boolean hypercube as $\Theta(k)$. More generally, we prove a randomized lower bound of $\Omega\left( k + \frac{k \cdot \log{n}}{\log{k}} \right)$ for the $k$-dimensional grid of side length $n$, which is asymptotically tight in high dimensions when $k$ is large relative to $n$.
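The query model described here can be made concrete with the simplest possible search strategy: start at the bottom vertex and repeatedly apply $f$. The sketch below is only this naive baseline, which uses at most $k(n-1)+1$ queries because every non-final step raises at least one coordinate. It is not an algorithm or lower-bound construction from the paper, and `kleene_fixed_point` and `example_f` are hypothetical names.

```python
def kleene_fixed_point(f, k):
    """Iterate v <- f(v) from the bottom vertex (0, ..., 0) until f(v) == v.

    For a monotone f the iterates only increase, so the loop terminates.
    Returns the fixed point found and the number of queries made to f.
    """
    v = (0,) * k
    queries = 0
    while True:
        fv = f(v)
        queries += 1
        if fv == v:
            return v, queries
        v = fv

def example_f(v):
    """A monotone map on the grid {0, 1, 2, 3}^2 (toy example)."""
    return (max(v[0], 2), max(v[1], v[0]))

point, queries = kleene_fixed_point(example_f, k=2)
print(point, queries)
```

On this toy instance the iteration visits (0, 0), (2, 0) and (2, 2) before stopping. The bounds in the abstract concern how many such queries any algorithm, including a randomized one, must make in the worst case.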

MoDELS · Language modeling · Learning · Fidelity · Lecture notes

April 11, 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models

Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.

entity · Link prediction · Performer · Graph · Knowledge graph

September 26, 2019

Representation Learning with Ordered Relation Paths for Knowledge Graph Completion

Yao Zhu, Hongzhi Liu, Zhonghai Wu, Yang Song, Tao Zhang

Incompleteness is a common problem for existing knowledge graphs (KGs), and KG completion, which aims to predict links between entities, is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration but pay less attention to the order of relations in paths, which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.