Kernel methods are learning algorithms that enjoy solid theoretical foundations but suffer from significant computational limitations. Sketching, which searches for solutions within a subspace of reduced dimension, is a well-studied approach to alleviating this computational burden. However, statistically accurate sketches, such as the Gaussian one, usually contain few null entries, so applying them to kernel methods and their non-sparse Gram matrices remains slow in practice. In this paper, we show that sparsified Gaussian (and Rademacher) sketches still produce theoretically valid approximations while allowing for substantial time and space savings thanks to an efficient decomposition trick. To support our method, we derive excess risk bounds for both single- and multiple-output kernel problems with generic Lipschitz losses, thereby providing new guarantees for a wide range of applications, from robust regression to multiple quantile regression. Our theoretical results are complemented by experiments showing the empirical superiority of our approach over state-of-the-art sketching methods.
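To make the idea concrete, here is a minimal sketch of a sparsified Gaussian sketch applied to kernel ridge regression. Several choices are assumptions not stated in the abstract: the sketch is built as a Bernoulli(p) mask times a Gaussian matrix rescaled by 1/sqrt(s p), the "decomposition trick" is read as factoring the sketch into a small dense block and a column sub-sampling step, and the squared loss stands in for the generic Lipschitz losses the abstract covers. All function names are hypothetical.

```python
import numpy as np

def sparsified_gaussian_sketch(s, n, p, rng):
    """Return an s x n sketch whose entries are Gaussian with probability p, else 0."""
    mask = rng.random((s, n)) < p            # Bernoulli(p) sparsity pattern
    gauss = rng.standard_normal((s, n))
    return mask * gauss / np.sqrt(s * p)     # assumed rescaling so that E[S.T @ S] = I

def decompose(S):
    """Factor S as S_dense @ P, where P selects the columns of S that are not all zero."""
    cols = np.flatnonzero(np.any(S != 0, axis=0))
    return S[:, cols], cols

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sketched_krr(X, y, S, lam, gamma=1.0):
    """Sketched kernel ridge regression; only the selected Gram columns are computed."""
    n = X.shape[0]
    S_dense, cols = decompose(S)
    K_cols = rbf_kernel(X, X[cols], gamma)       # n x n' block, equals K @ P.T
    KS = K_cols @ S_dense.T                      # equals K @ S.T
    SKS = S_dense @ K_cols[cols] @ S_dense.T     # equals S @ K @ S.T
    # Minimise (1/n) * ||y - K S^T theta||^2 + lam * theta^T S K S^T theta.
    A = KS.T @ KS / n + lam * SKS
    b = KS.T @ y / n
    theta = np.linalg.lstsq(A, b, rcond=None)[0]
    return S.T @ theta                           # coefficients on the n training points

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = np.sin(X[:, 0])
S = sparsified_gaussian_sketch(s=20, n=200, p=0.05, rng=rng)
alpha = sketched_krr(X, y, S, lam=1e-3)
```

Under these assumptions, the full n x n Gram matrix is never formed: only the columns indexed by `cols` are evaluated, which is where the time and space savings would come from.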

Related content

Kernelization

Global model · Regularization · Federated learning · Computational cost · Triplet

April 12, 2023

FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization

Xujing Li, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang

From arXiv, 11 pages, to be published in the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

In the federated learning scenario, geographically distributed clients collaboratively train a global model. Data heterogeneity among clients leads to inconsistent model updates, which slow down model convergence. To alleviate this issue, many methods employ regularization terms to narrow the discrepancy between client-side local models and the server-side global model. However, these methods limit the ability to explore superior local models and ignore the valuable information in historical models. Moreover, although the up-to-date representation method takes both the global and historical local models into account, it incurs a prohibitive computation cost. To accelerate convergence with low resource consumption, we propose a model regularization method named FedTrip, which is designed to restrict global-local divergence and decrease current-historical correlation, thereby alleviating the negative effects of data heterogeneity. FedTrip keeps the current local model close to the global model while pushing it away from historical local models, which helps guarantee the consistency of local updates among clients and efficiently explore superior local models, with negligible additional computation cost for the attached operations. Empirically, we demonstrate the superiority of FedTrip via extensive evaluations. To achieve the target accuracy, FedTrip outperforms the state-of-the-art baselines by significantly reducing the total overhead of client-server communication and local computation.
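The pull-and-push behaviour described above can be illustrated with a small local update rule. This is only a sketch under assumptions: the abstract does not give the regularizer, so the quadratic penalty form, the coefficients `mu` and `xi`, and the function name are all hypothetical and should not be read as the authors' exact formulation.

```python
import numpy as np

def triplet_regularized_step(w, grad, w_global, w_hist, lr=0.1, mu=1.0, xi=0.5):
    """One local SGD step with an assumed triplet-style regularizer.

    Assumed objective: loss(w) + (mu/2) * ||w - w_global||^2 - (xi/2) * ||w - w_hist||^2
    """
    pull = mu * (w - w_global)    # gradient of the term pulling w toward the global model
    push = -xi * (w - w_hist)     # gradient of the term pushing w away from the historical model
    return w - lr * (grad + pull + push)

w = np.zeros(2)
w_new = triplet_regularized_step(
    w, grad=np.zeros(2), w_global=np.ones(2), w_hist=-np.ones(2)
)
```

With a zero task gradient, the step moves `w` closer to `w_global` and farther from `w_hist`, at the cost of two extra vector operations per step.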

Preconditioning · Convergence rate · Parallel · Domain decomposition · Target eigenvalues

April 12, 2023

A Two-Level Block Preconditioned Jacobi-Davidson Method for Multiple and Clustered Eigenvalues of Elliptic Operators

Qigang Liang, Wei Wang, Xuejun Xu

In this paper, we propose a two-level block preconditioned Jacobi-Davidson (BPJD) method for efficiently solving discrete eigenvalue problems resulting from finite element approximations of $2m$th ($m = 1, 2$) order symmetric elliptic eigenvalue problems. Our method effectively computes the first several eigenpairs, including, in particular, multiple and clustered eigenvalues with their corresponding eigenfunctions. The method is highly parallelizable thanks to a new and efficient preconditioner constructed using an overlapping domain decomposition (DD). Each iteration only requires computing a couple of small-scale parallel subproblems and one very small-scale eigenvalue problem. Our theoretical analysis reveals that the convergence rate of the method is bounded by $c(H)(1-C\frac{\delta^{2m-1}}{H^{2m-1}})^{2}$, where $H$ is the diameter of subdomains and $\delta$ is the overlapping size among subdomains. The constant $C$ is independent of the mesh size $h$ and the internal gaps among the target eigenvalues, demonstrating that our method is optimal and cluster robust. Meanwhile, the $H$-dependent constant $c(H)$ decreases monotonically to $1$ as $H \to 0$, which means that more subdomains lead to a better convergence rate. Numerical results supporting our theory are given.

Function approximation · Robustness · Criterion · Hyperparameters · Algorithm

April 11, 2023

A posteriori error bounds for the block-Lanczos method for matrix function approximation

Qichen Xu, Tyler Chen

We extend the error bounds from [SIMAX, Vol. 43, Iss. 2, pp. 787-811 (2022)] for the Lanczos method for matrix function approximation to the block algorithm. Numerical experiments suggest that our bounds are fairly robust to changing block size and have the potential for use as a practical stopping criterion. Further experiments work towards a better understanding of how certain hyperparameters should be chosen in order to maximize the quality of the error bounds, even in the previously studied block-size-one case.
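For readers unfamiliar with the underlying algorithm, the following is a minimal sketch of the standard Lanczos method for approximating f(A)b in the block-size-one case. It does not implement the block algorithm or the error bounds from the paper. The function name is hypothetical, full reorthogonalization is added as a simplifying assumption for numerical stability, and breakdown (a zero off-diagonal entry) is not handled.

```python
import numpy as np

def lanczos_fa(A, b, k, f):
    """Approximate f(A) @ b for symmetric A using k Lanczos iterations."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    q = b / np.linalg.norm(b)
    q_prev = np.zeros(n)
    beta_prev = 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - beta_prev * q_prev
        alpha[j] = q @ w
        w -= alpha[j] * q
        # Full reorthogonalization against all previous Lanczos vectors.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)   # assumed nonzero (no breakdown handling)
            q_prev, q = q, w / beta[j]
            beta_prev = beta[j]
    # Tridiagonal matrix T = Q.T @ A @ Q.
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    # f(T) @ e_1 computed from the eigendecomposition of T.
    fT_e1 = evecs @ (f(evals) * evecs[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)
```

A typical call would be `lanczos_fa(A, b, k=20, f=np.sqrt)` for a symmetric positive definite `A`; the a posteriori bounds discussed in the abstract are meant to indicate when `k` is large enough to stop.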

Singular value decomposition · High-dimensional · Tensor core · Estimation error · Singular value

April 11, 2023

Generative Modeling via Hierarchical Tensor Sketching

Yifan Peng, Yian Chen, E. Miles Stoudenmire, Yuehaw Khoo

We propose a hierarchical tensor-network approach for approximating a high-dimensional probability density via its empirical distribution. The approach leverages randomized singular value decomposition (SVD) techniques and involves solving linear equations for the tensor cores of this tensor network. The complexity of the resulting algorithm scales linearly in the dimension of the high-dimensional density. We analyze the estimation error and demonstrate the effectiveness of the method through several numerical experiments.
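The randomized SVD building block mentioned above can be sketched as follows. This is the generic range-finder form of randomized SVD, not the hierarchical tensor-network algorithm of the paper; the function name and the `oversample` parameter are assumptions made for illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Approximate rank-`rank` SVD of A via a Gaussian random range finder."""
    rng = np.random.default_rng() if rng is None else rng
    # Sketch the column space of A with a Gaussian test matrix.
    omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # Project A onto the sketched subspace and take a small exact SVD.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]
```

When `A` has exact rank at most `rank`, the product `U @ np.diag(s) @ Vt` of the returned factors recovers `A` up to rounding error; for matrices with decaying singular values, the oversampling controls the accuracy.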

Text data · Continuous space · Noise · Discrete · Diffusion model

April 10, 2023

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

From arXiv. Code is available at //github.com/amazon-science/masked-diffusion-lm

Diffusion models based on iterative denoising have recently been proposed and leveraged in various generation tasks such as image generation. However, because they are inherently built for continuous data, existing diffusion models still have limitations in modeling discrete data, e.g., languages. For example, the commonly used Gaussian noise cannot handle discrete corruption well, and objectives defined in continuous spaces fail to be stable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffuse LM, with lower training cost and better performance, inspired by linguistic features in languages. Specifically, we design a linguistically informed forward process that adds corruptions to the text through strategic soft-masking to better noise the textual data. Also, we directly predict the categorical distribution with a cross-entropy loss function in every diffusion step to connect the continuous space and the discrete space in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.

Linear system · Class · System · Robust · Well known

April 8, 2023

A comparison of Krylov methods for Shifted Skew-Symmetric Systems

R. Idema, C. Vuik

From arXiv, 23 pages, 3 figures

It is well known that for general linear systems, only optimal Krylov methods with long recurrences exist. For special classes of linear systems it is possible to find optimal Krylov methods with short recurrences. In this paper we consider the important class of linear systems with a shifted skew-symmetric coefficient matrix. We present the MRS3 solver, a minimal residual method that solves these problems using short vector recurrences. We give an overview of existing Krylov solvers that can be used to solve these problems, and compare them with the MRS3 method, both theoretically and by numerical experiments. From this comparison we argue that the MRS3 solver is the fastest and most robust of these Krylov methods for systems with a shifted skew-symmetric coefficient matrix.

Function space · Mapping · PDE · Discretization · Integral operator

April 7, 2023

Neural Operator: Learning Maps Between Function Spaces

Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar

The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks to learn operators, termed neural operators, that map between infinite-dimensional function spaces. We formulate the neural operator as a composition of linear integral operators and nonlinear activation functions. We prove a universal approximation theorem for our proposed neural operator, showing that it can approximate any given nonlinear continuous operator. The proposed neural operators are also discretization-invariant, i.e., they share the same model parameters across different discretizations of the underlying function spaces. Furthermore, we introduce four classes of efficient parameterization, viz., graph neural operators, multi-pole graph neural operators, low-rank neural operators, and Fourier neural operators. An important application for neural operators is learning surrogate maps for the solution operators of partial differential equations (PDEs). We consider standard PDEs such as the Burgers, Darcy subsurface flow, and the Navier-Stokes equations, and show that the proposed neural operators have superior performance compared to existing machine-learning-based methodologies, while being several orders of magnitude faster than conventional PDE solvers.
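As an illustration of "a linear integral operator followed by a nonlinear activation", the sketch below implements a single one-dimensional Fourier-style layer in NumPy. The details are assumptions rather than the authors' implementation: the integral operator is parameterized by complex weights on the lowest `modes` Fourier modes, a pointwise linear map `W` is added, and `tanh` is used as the activation. All names are hypothetical.

```python
import numpy as np

def fourier_layer(v, weights, W, modes):
    """Apply one Fourier-style operator layer to a function sampled on a uniform periodic grid.

    v:       (n_points, channels) samples of the input function
    weights: (modes, channels, channels) complex multipliers for the lowest modes
    W:       (channels, channels) pointwise linear map
    """
    v_hat = np.fft.rfft(v, axis=0)                       # to Fourier space
    out_hat = np.zeros_like(v_hat)
    # Mix channels mode by mode; higher modes are truncated to zero.
    out_hat[:modes] = np.einsum("kio,ki->ko", weights, v_hat[:modes])
    spectral = np.fft.irfft(out_hat, n=v.shape[0], axis=0)
    return np.tanh(v @ W + spectral)

def sample_input(n):
    """Sample a fixed band-limited two-channel function on an n-point grid over [0, 1)."""
    x = np.arange(n) / n
    return np.stack([np.sin(2 * np.pi * x), np.cos(4 * np.pi * x)], axis=1)

rng = np.random.default_rng(0)
modes, channels = 4, 2
weights = rng.standard_normal((modes, channels, channels)) \
    + 1j * rng.standard_normal((modes, channels, channels))
W = rng.standard_normal((channels, channels))

out_coarse = fourier_layer(sample_input(64), weights, W, modes)
out_fine = fourier_layer(sample_input(128), weights, W, modes)
```

The same `weights` and `W` are applied on both the 64-point and the 128-point grid, and for this band-limited input the two outputs agree at the shared grid points, which is a small instance of the parameter sharing across discretizations described in the abstract.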

Ordering · Clustering methods · Consistency · Dataset · Characterization

April 7, 2023

Consistency between ordering and clustering methods for graphs

Tatsuro Kawamoto, Masaki Ochi, Teruyoshi Kobayashi

From arXiv, 30 pages, 26 figures

A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. While similar characterizations of a dataset would be achieved by both clustering and ordering methods, the former has been studied much more actively than the latter, particularly for the data represented as graphs. This study fills this gap by investigating methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and partition for a set of elements. Based on synthetic and real-world datasets, we evaluate the extents to which an ordering method identifies a module structure and a clustering method identifies a banded structure.
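To see how a single spectral technique can yield both an ordering and a clustering, consider the following minimal sketch based on the Fiedler vector of the combinatorial graph Laplacian. It is a generic textbook construction chosen for illustration, not necessarily one of the methods compared in the paper, and it does not implement the label continuity error, whose definition is not given in the abstract. The function names are hypothetical.

```python
import numpy as np

def fiedler_vector(adj):
    """Eigenvector of the graph Laplacian for the second-smallest eigenvalue."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, evecs = np.linalg.eigh(lap)      # eigenvalues are returned in ascending order
    return evecs[:, 1]

def spectral_ordering(adj):
    """Order the nodes by their Fiedler-vector entries."""
    return np.argsort(fiedler_vector(adj))

def spectral_bisection(adj):
    """Split the nodes into two clusters by the sign of the Fiedler vector."""
    return (fiedler_vector(adj) > 0).astype(int)

# Two 4-node cliques joined by a single edge between nodes 3 and 4.
adj = np.zeros((8, 8))
adj[:4, :4] = 1.0
adj[4:, 4:] = 1.0
np.fill_diagonal(adj, 0.0)
adj[3, 4] = adj[4, 3] = 1.0

order = spectral_ordering(adj)
labels = spectral_bisection(adj)
```

On this graph the ordering lists all nodes of one clique before those of the other, so the sequence and the partition are consistent with each other.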

Periodic sequence · Finite automaton · Sequence · Automata theory · Optimal

April 7, 2023

Magic numbers in periodic sequences

Savinien Kreczman, Luca Prigioniero, Eric Rowland, Manon Stipulanti

From arXiv, 19 pages, 2 figures, 3 tables, accepted at the international conference WORDS 2023

In formal languages and automata theory, the magic number problem can be formulated as follows: for a given integer n, is it possible to find a number d in the range [n,2^n] such that there is no minimal deterministic finite automaton with d states that can be simulated by an optimal nondeterministic finite automaton with exactly n states? If such a number d exists, it is called magic. In this paper, we consider the magic number problem in the framework of deterministic automata with output, which are known to characterize automatic sequences. More precisely, we investigate magic numbers for periodic sequences viewed as either automatic, regular, or constant-recursive.

Prompt · Language modeling · MoDELS · Performer · Processing (programming language)

July 28, 2021

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig

From arXiv. Website: //pretrain.nlpedia.ai/

This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only provide a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website //pretrain.nlpedia.ai/ including a constantly updated survey and paper list.
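The pipeline of input x, template, prompt x', filled string x̂ and output y can be sketched in a few lines. Everything here is a toy assumption for illustration: the template, the `[Z]` slot marker, the answer-to-label mapping, and in particular `toy_lm_score`, which is a hypothetical stand-in for a real language model's text probability.

```python
def build_prompt(x, template="{x} Overall, it was a [Z] movie."):
    """Turn the input x into a prompt x' with one unfilled slot [Z]."""
    return template.format(x=x)

def toy_lm_score(text):
    """Hypothetical stand-in for a language model's probability of `text`."""
    t = text.lower()
    score = 0
    if "loved" in t and "great" in t:
        score += 1
    if "hated" in t and "terrible" in t:
        score += 1
    return score

# Maps each candidate slot filler to the final output label y.
ANSWER_TO_LABEL = {"great": "positive", "terrible": "negative"}

def predict(x, score=toy_lm_score):
    """Fill the slot with the highest-scoring answer, then map it to a label."""
    prompt = build_prompt(x)
    best = max(ANSWER_TO_LABEL, key=lambda z: score(prompt.replace("[Z]", z)))
    x_hat = prompt.replace("[Z]", best)      # the filled string
    return x_hat, ANSWER_TO_LABEL[best]

x_hat, y = predict("I loved this film.")
```

No task-specific training is involved in this sketch: the prediction comes entirely from scoring candidate completions of the prompt, which is the mechanism behind the zero-shot use described above.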