There is growing evidence that pretraining on high-quality, carefully thought-out tokens such as code or mathematics plays an important role in improving the reasoning abilities of large language models. For example, Minerva, a PaLM model finetuned on billions of tokens of mathematical documents from arXiv and the web, reported dramatically improved performance on problems that require quantitative reasoning. However, because all known open-source web datasets employ preprocessing that does not faithfully preserve mathematical notation, the benefits of large-scale training on quantitative web documents are unavailable to the research community. We introduce OpenWebMath, an open dataset inspired by these works containing 14.7B tokens of mathematical webpages from Common Crawl. We describe in detail our method for extracting text and LaTeX content and removing boilerplate from HTML documents, as well as our methods for quality filtering and deduplication. Additionally, we run small-scale experiments by training 1.4B-parameter language models on OpenWebMath, showing that models trained on 14.7B tokens of our dataset surpass the performance of models trained on over 20x the amount of general language data. We hope that our dataset, openly released on the Hugging Face Hub, will help spur advances in the reasoning abilities of large language models.
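Since the dataset is released on the Hugging Face Hub, here is a minimal sketch of streaming a few records with the `datasets` library; the repository id "open-web-math/open-web-math" and the record field names are assumptions to verify against the Hub page.

```python
# Minimal sketch: stream a few OpenWebMath records without downloading
# the full 14.7B-token dataset. Repo id and field names are assumptions.
from datasets import load_dataset

ds = load_dataset("open-web-math/open-web-math", split="train", streaming=True)

for record in ds.take(3):
    # Each record is assumed to carry the extracted text (with LaTeX
    # preserved) plus provenance metadata such as the source URL.
    print(record["text"][:200])
```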
Quantum computing has the potential to solve problems that are intractable for classical systems, yet the high error rates in contemporary quantum devices often exceed tolerable limits for useful algorithm execution. Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their states for errors. The syndromes are subsequently interpreted by a decoding algorithm to identify and correct errors in the data qubits. This task is complex due to the multiplicity of error sources affecting both data and syndrome qubits, as well as the syndrome extraction operations. Additionally, identical syndromes can emanate from different error sources, necessitating a decoding algorithm that evaluates syndromes collectively. Although machine learning (ML) decoders such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) have been proposed, they often focus on local syndrome regions and require retraining when adjusting for different code distances. We introduce a transformer-based QEC decoder that employs self-attention to achieve a global receptive field across all input syndromes. It incorporates a mixed-loss training approach, combining both local physical-error and global parity-label losses. Moreover, the transformer architecture's inherent adaptability to variable-length inputs allows for efficient transfer learning, enabling the decoder to adapt to varying code distances without retraining. Evaluation on six code distances and ten different error configurations demonstrates that our model consistently outperforms non-ML decoders, such as Union Find (UF) and Minimum Weight Perfect Matching (MWPM), as well as other ML decoders, achieving the best logical error rates. Furthermore, transfer learning reduces training cost by over 10x.
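To make the architecture and the mixed loss concrete, here is a minimal PyTorch sketch, not the paper's implementation: a transformer encoder over a binary syndrome sequence with a per-syndrome "local" error head and a single "global" parity head. All layer sizes and the loss weight `alpha` are illustrative assumptions.

```python
# Minimal sketch of a transformer decoder over syndromes with a mixed loss.
import torch
import torch.nn as nn

class SyndromeTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=8, nlayers=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # embed each binary syndrome bit
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.local_head = nn.Linear(d_model, 1)   # per-position error logit
        self.global_head = nn.Linear(d_model, 1)  # logical-parity logit

    def forward(self, syndromes):                  # (batch, n_syndromes) floats
        h = self.encoder(self.embed(syndromes.unsqueeze(-1)))
        local = self.local_head(h).squeeze(-1)     # (batch, n_syndromes)
        global_ = self.global_head(h.mean(dim=1)).squeeze(-1)  # (batch,)
        return local, global_

def mixed_loss(local, global_, local_labels, parity_labels, alpha=0.5):
    # Combine a local physical-error loss with a global parity-label loss.
    bce = nn.functional.binary_cross_entropy_with_logits
    return alpha * bce(local, local_labels) + (1 - alpha) * bce(global_, parity_labels)
```

Because self-attention attends across all positions, every prediction sees every syndrome bit, and because nothing in the module fixes the sequence length, the same weights can in principle be fine-tuned at a different code distance.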
The lattice Boltzmann method (LBM) has emerged as a prominent technique for solving fluid dynamics problems due to its algorithmic potential for computational scalability. We introduce XLB, a Python-based differentiable LBM library that harnesses the capabilities of the JAX framework. The architecture of XLB is predicated upon ensuring accessibility, extensibility, and computational performance, enabling it to scale effectively across CPU, multi-GPU, and distributed multi-GPU systems. The framework can be readily augmented with novel boundary conditions, collision models, or simulation capabilities. XLB offers the unique advantages of integrating with JAX's extensive machine learning ecosystem and of utilizing automatic differentiation to tackle physics-based machine learning, optimization, and inverse problems. XLB has been successfully scaled to handle simulations with billions of cells, achieving giga-scale lattice updates per second. XLB is released under the permissive Apache-2.0 license and is available on GitHub at //github.com/Autodesk/XLB.
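For readers unfamiliar with the method, here is a minimal, self-contained sketch of one D2Q9 lattice Boltzmann update (BGK collision plus streaming) in JAX, illustrating the collide-and-stream structure that a library like XLB builds on. This is not XLB's API; the names and the relaxation rate `omega` are illustrative.

```python
# Minimal D2Q9/BGK lattice Boltzmann step in JAX (illustrative, not XLB's API).
import numpy as np
import jax
import jax.numpy as jnp

# D2Q9 lattice velocities and weights.
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = jnp.array([4/9] + [1/9]*4 + [1/36]*4)
cf = jnp.asarray(c, dtype=jnp.float32)

def equilibrium(rho, u):
    # Standard second-order equilibrium with lattice sound speed c_s^2 = 1/3.
    cu = jnp.einsum("qd,xyd->xyq", cf, u)
    usq = jnp.sum(u * u, axis=-1, keepdims=True)
    return w * rho[..., None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

@jax.jit
def step(f, omega=1.0):
    rho = jnp.sum(f, axis=-1)                                  # density
    u = jnp.einsum("xyq,qd->xyd", f, cf) / rho[..., None]      # velocity
    f = f + omega * (equilibrium(rho, u) - f)                  # BGK collision
    # Streaming: shift each population along its lattice velocity (periodic).
    return jnp.stack([jnp.roll(f[..., q], (int(c[q,0]), int(c[q,1])), axis=(0,1))
                      for q in range(9)], axis=-1)
```

Because every operation is a `jax.numpy` primitive, `jax.grad` can differentiate through `step`, which is the property that enables the inverse-problem and physics-based machine learning use cases the abstract mentions.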
Thanks to its great potential for reducing both computational cost and memory requirements, the combination of sketching and Krylov subspace techniques has attracted a lot of attention in the recent literature on projection methods for linear systems, matrix function approximations, and eigenvalue problems. Applying this appealing strategy in the context of linear matrix equations turns out to be far more involved than a straightforward generalization. The difficulties include establishing well-posedness of the projected problem and deriving possible error estimates depending on the sketching properties. Further computational complications include the lack of a natural residual norm estimate and of an explicit basis for the generated subspace. In this paper we propose a new sketched-and-truncated polynomial Krylov subspace method for Sylvester equations that aims to address all these issues. The potential of our novel approach, in terms of both computational time and storage demand, is illustrated with numerical experiments. Comparisons with a state-of-the-art projection scheme based on rational Krylov subspaces are also included.
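For orientation, here is a hedged sketch, in our own notation rather than the paper's, of the sketched Krylov idea in the simpler linear-system setting that the paper generalizes; the Sylvester case involves two bases and raises the well-posedness and residual-estimation issues noted above.

```latex
% Sketched Krylov projection for Ax = b (illustrative notation).
% V_k spans the Krylov subspace K_k(A,b); S is a random sketching matrix
% with d << n rows acting as an approximate subspace embedding.
\begin{aligned}
  \text{classical Galerkin:}\quad & V_k^\top (b - A V_k y_k) = 0,\\
  \text{sketched variant:}\quad   & y_k = \operatorname*{arg\,min}_{y} \| S (b - A V_k y) \|_2,\\
  \text{Sylvester analogue:}\quad & A X + X B = C, \qquad X_k = V_k Y_k W_k^\top.
\end{aligned}
```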
Current approaches to empathetic response generation typically encode the entire dialogue history directly and feed the result into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect to capture the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head attention-based intention fusion module to capture the speaker's intention. Besides, we utilize previous utterances to predict the last utterance, which simulates the human tendency to anticipate what the interlocutor may say next. To balance the optimization rates of utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
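A minimal sketch of what such an attention-based fusion module could look like in PyTorch, purely illustrative rather than InferEM's released code: the last-utterance encoding queries the full-dialogue encoding, and the result is merged back with a residual connection.

```python
# Minimal sketch of multi-head-attention intention fusion (illustrative).
import torch
import torch.nn as nn

class IntentionFusion(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, last_utt, dialogue):
        # last_utt: (batch, L_last, d) encoding of the final utterance;
        # dialogue: (batch, L_ctx, d) encoding of the full history.
        fused, _ = self.attn(query=last_utt, key=dialogue, value=dialogue)
        return self.norm(last_utt + fused)  # residual + norm, a common choice
```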
We present an efficient parallel derandomization method for randomized algorithms that rely on concentration bounds such as the Chernoff bound. This settles a classic problem in parallel derandomization, which dates back to the 1980s. Consider the \textit{set balancing} problem where $m$ sets of size at most $s$ are given in a ground set of size $n$, and we should partition the ground set into two parts such that each set is split evenly up to a small additive (discrepancy) bound. A random partition achieves a discrepancy of $O(\sqrt{s \log m})$ in each set, by the Chernoff bound. We give a deterministic parallel algorithm that matches this bound, using near-linear work and polylogarithmic depth. The previous results were weaker in discrepancy and/or work bounds: Motwani, Naor, and Naor [FOCS'89] and Berger and Rompel [FOCS'89] achieve discrepancy $s^{\varepsilon} \cdot O(\sqrt{s \log m})$ with work $\tilde{O}(m+n+\sum_{i=1}^{m} |S_i|) \cdot m^{\Theta(1/\varepsilon)}$ and polylogarithmic depth; the discrepancy was optimized to $O(\sqrt{s \log m})$ in later work, e.g., by Harris [Algorithmica'19], but the work bound remained high at $\tilde{O}(m^4n^3)$. Ghaffari, Grunau, and Rozhon [FOCS'23] achieve discrepancy $s/\mathrm{poly}(\log(nm)) + O(\sqrt{s \log m})$ with near-linear work and polylogarithmic depth. Notice that this discrepancy is barely sublinear with respect to the trivial bound of $s$. Our method relies on a novel bootstrapping idea that uses crude partitioning algorithms as a subroutine. In particular, we solve the problem recursively: in each iteration, we use the crude partition to split the variables into many smaller parts, and then find a constraint for the variables in each part so that the overall number of variables in the problem is reduced. The scheme relies on an interesting application of the multiplicative weights update method to control the variance losses in each iteration.
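To make the randomized baseline concrete, here is a minimal sketch (ours, purely illustrative) of the random partition whose $O(\sqrt{s \log m})$ discrepancy the deterministic algorithm matches: each ground-set element gets a uniformly random sign, and the discrepancy of a set is the absolute sum of its elements' signs.

```python
# Minimal sketch of the randomized set-balancing baseline: random signs give
# each set of size <= s discrepancy O(sqrt(s log m)) w.h.p. (Chernoff bound).
import math
import random

def random_partition_discrepancy(sets, n, seed=0):
    rng = random.Random(seed)
    sign = [rng.choice((-1, 1)) for _ in range(n)]   # the two-part partition
    return max(abs(sum(sign[i] for i in S)) for S in sets)

# Tiny illustration: m random sets of size s over a ground set of size n.
n, m, s = 10_000, 1_000, 100
sets = [random.Random(j).sample(range(n), s) for j in range(m)]
disc = random_partition_discrepancy(sets, n)
print(disc, "vs Chernoff scale", math.sqrt(s * math.log(m)))
```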
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This allows one to reason about the effects of changes to this process (i.e., interventions) and what would have happened in hindsight (i.e., counterfactuals). We categorize work in CausalML into five groups according to the problems they tackle: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. For each category, we systematically compare its methods and point out open problems. Further, we review modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.
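A worked toy example can make the SCM/intervention framing concrete. The following sketch is entirely illustrative (our own toy model, not from the survey): a confounder Z drives both X and Y, so the observational regression slope differs from the interventional effect obtained by forcing X via do(X := x).

```python
# Minimal SCM sketch: Z -> X -> Y with Z -> Y; intervening on X cuts Z -> X.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    z = rng.normal(size=n)                 # exogenous confounder
    x = 0.8 * z + rng.normal(size=n)       # structural equation for X
    if do_x is not None:                   # intervention do(X := do_x):
        x = np.full(n, do_x)               # replace X's equation, keep Z, Y
    y = 1.5 * x + 0.5 * z + rng.normal(size=n)
    return x, y

x, y = sample(100_000)
print("observational slope:", np.polyfit(x, y, 1)[0])   # ~1.74, biased by Z
x0, y0 = sample(100_000, do_x=0.0)
x1, y1 = sample(100_000, do_x=1.0)
print("interventional effect:", y1.mean() - y0.mean())  # ~1.5, the true effect
```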
Deep learning has shown great potential for modeling the physical dynamics of complex particle systems such as fluids (in Lagrangian descriptions). Existing approaches, however, require supervision of consecutive particle properties, including positions and velocities. In this paper, we consider a partially observable scenario known as fluid dynamics grounding, that is, inferring the state transitions and interactions within fluid particle systems from sequential visual observations of the fluid surface. We propose a differentiable two-stage network named NeuroFluid. Our approach consists of (i) a particle-driven neural renderer, which incorporates fluid physical properties into the volume rendering function, and (ii) a particle transition model optimized to reduce the differences between the rendered and the observed images. NeuroFluid provides the first solution to unsupervised learning of particle-based fluid dynamics by training these two models jointly. It is shown to reasonably estimate the underlying physics of fluids with different initial shapes, viscosities, and densities. It is a potential alternative approach to understanding complex fluid phenomena, such as turbulence, that are difficult to model using traditional methods of mathematical physics.
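The joint-training structure can be sketched in a few lines of PyTorch. Everything below is a toy stand-in, not NeuroFluid's code: the transition model is a flat MLP rather than a particle interaction network, and the "renderer" softly splats particles onto a 2D grid rather than performing volume rendering. What the sketch does show is the key idea: gradients from an image-space loss flow through the differentiable renderer back into the transition model.

```python
# Toy sketch of joint training: transition model + differentiable renderer
# optimized against observed frames (illustrative stand-ins throughout).
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    def __init__(self, n=64, dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n * dim, 256), nn.ReLU(),
                                 nn.Linear(256, n * dim))
    def forward(self, state):                        # state: (batch, n, dim)
        b, n, d = state.shape
        return state + self.net(state.reshape(b, -1)).reshape(b, n, d)

def render(state, res=32):
    # Toy differentiable "renderer": softly splat particles onto a 2D grid.
    xy = state[..., :2].clamp(-1, 1)
    grid = torch.linspace(-1, 1, res, device=state.device)
    gx = torch.exp(-(xy[..., 0:1] - grid) ** 2 / 0.01)   # (batch, n, res)
    gy = torch.exp(-(xy[..., 1:2] - grid) ** 2 / 0.01)
    return torch.einsum("bnx,bny->bxy", gx, gy)          # (batch, res, res)

transition = TransitionModel()
opt = torch.optim.Adam(transition.parameters(), lr=1e-3)

state0 = 0.3 * torch.randn(8, 64, 3)                 # known initial particles
with torch.no_grad():                                # synthetic "observed" frames
    frames = torch.stack([render(state0 + 0.05 * t) for t in range(4)], dim=1)

for _ in range(200):
    state, loss = state0, 0.0
    for t in range(frames.shape[1]):
        loss = loss + nn.functional.mse_loss(render(state), frames[:, t])
        state = transition(state)                    # advance particle state
    opt.zero_grad(); loss.backward(); opt.step()
```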
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inescapably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach, and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. In this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead into future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that remain. We expect this review to serve as a reference for researchers in this field. Related resources are also collected and compiled in our GitHub repository: //github.com/Jyouhou/SceneTextPapers.
Sentiment analysis is a widely studied NLP task where the goal is to determine the opinions, emotions, and evaluations of users towards a product, an entity, or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language-dependent: word embeddings, sentiment lexicons, and even annotated data are language-specific. Further, optimizing models for each language is very time-consuming and labor-intensive, especially for recurrent neural network models, and collecting data for different languages is challenging from a resource perspective. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on one language be reused for sentiment analysis in other languages (Russian, Spanish, Turkish, and Dutch) where data are more limited? Our goal is to build a single model in the language with the largest dataset available for the task and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks on reviews in English. We then translate reviews in the other languages into English and reuse the model to evaluate their sentiment. Experimental results show that our approach of reusing a single model trained on English reviews statistically significantly outperforms the baselines in several different languages.
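The translate-then-reuse pipeline reduces to a two-step composition, sketched below; `translate_to_english` and `english_sentiment_model` are hypothetical stand-ins for whatever machine translation system and trained English RNN are actually used.

```python
# Minimal sketch of the translate-then-reuse pipeline (hypothetical names).
def classify_cross_lingual(review_text, translate_to_english, english_sentiment_model):
    english_text = translate_to_english(review_text)   # e.g. an off-the-shelf MT system
    return english_sentiment_model(english_text)       # reuse the English model as-is
```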