
We develop an algorithm that computes strongly continuous semigroups on infinite-dimensional Hilbert spaces with explicit error control. Given a generator $A$, a time $t>0$, an arbitrary initial vector $u_0$ and an error tolerance $\epsilon>0$, the algorithm computes $\exp(tA)u_0$ with error bounded by $\epsilon$. The algorithm is based on a combination of a regularized functional calculus, suitable contour quadrature rules, and the adaptive computation of resolvents in infinite dimensions. As a particular case, we show that it is possible, even when only allowing pointwise evaluation of coefficients, to compute, with error control, semigroups on the unbounded domain $L^2(\mathbb{R}^d)$ that are generated by partial differential operators with polynomially bounded coefficients of locally bounded total variation. For analytic semigroups (and more general Laplace transform inversion), we provide a quadrature rule whose error decreases like $\exp(-cN/\log(N))$ for $N$ quadrature points, that remains stable as $N\rightarrow\infty$, and which is also suitable for infinite-dimensional operators. Numerical examples are given, including: Schr\"odinger and wave equations on the aperiodic Ammann--Beenker tiling, complex perturbed fractional diffusion equations on $L^2(\mathbb{R})$, and damped Euler--Bernoulli beam equations.
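
To make the contour-quadrature idea concrete, here is a minimal finite-dimensional sketch that approximates $\exp(tA)u_0$ by applying a midpoint rule to the Laplace-inversion integral of the resolvent along a Talbot-type contour (with coefficients proposed by Weideman). It illustrates the approach for a small sectorial matrix only; it is not the paper's error-controlled algorithm for infinite-dimensional operators.

```python
import numpy as np
from scipy.linalg import expm

def semigroup_apply(A, u0, t, N=24):
    """Approximate exp(t*A) @ u0 by numerically inverting the Laplace
    transform (z*I - A)^{-1} u0 along a Talbot-type contour.

    Finite-dimensional illustration only; assumes the spectrum of A lies
    in a sector around the negative real axis."""
    k = np.arange(N)
    theta = -np.pi + (k + 0.5) * 2 * np.pi / N            # midpoint nodes, avoids theta = 0
    z  = (N / t) * (-0.6122 + 0.5017 * theta / np.tan(0.6407 * theta) + 0.2645j * theta)
    dz = (N / t) * (0.5017 / np.tan(0.6407 * theta)
                    - 0.5017 * 0.6407 * theta / np.sin(0.6407 * theta) ** 2
                    + 0.2645j)
    I = np.eye(A.shape[0])
    acc = np.zeros(u0.shape, dtype=complex)
    for zk, dzk in zip(z, dz):
        acc += np.exp(zk * t) * dzk * np.linalg.solve(zk * I - A, u0.astype(complex))
    return (acc / (1j * N)).real

# sanity check against the dense matrix exponential on a small dissipative matrix
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
u0 = np.array([1.0, 1.0])
print(semigroup_apply(A, u0, t=0.5), expm(0.5 * A) @ u0)
```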

Related content

We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient.
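
For concreteness, the displayed update transcribes directly into NumPy. The sketch below assumes user-supplied callables `grad` and `hess` and omits the paper's line search; the toy objective at the end is illustrative only.

```python
import numpy as np

def global_newton(grad, hess, x0, H=1.0, n_iters=100):
    """Regularized Newton step from the abstract:
    x_{k+1} = x_k - (hess(x_k) + sqrt(H * ||grad(x_k)||) * I)^{-1} grad(x_k),
    where H is a user-supplied constant (playing the role of a bound on the
    Hessian's Lipschitz constant)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        lam = np.sqrt(H * np.linalg.norm(g))          # adaptive Levenberg--Marquardt penalty
        x = x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
    return x

# toy convex example: f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
print(global_newton(grad, hess, x0=np.array([3.0, -2.0])))
```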

Theoretically, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. However, in many applications this minimization problem cannot be solved exactly, and instead, a numerical method that computes an approximate minimum over a suitable subfamily of Borel functions has to be used. The quality of the result depends on the adequacy of the subfamily and the performance of the numerical method. In this paper, we derive an expected value representation of the minimal mean square distance which in many applications can efficiently be approximated with a standard Monte Carlo average. This enables us to provide guarantees for the accuracy of any numerical approximation of a given conditional expectation. We illustrate the method by assessing the quality of approximate conditional expectations obtained by linear, polynomial as well as neural network regression in different concrete examples.
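
A minimal synthetic illustration of the underlying error decomposition: when $E[Y|X]$ is known in closed form, the minimal mean squared distance can itself be estimated by a Monte Carlo average and subtracted from the empirical mean squared error of any candidate regression. This is not the paper's expected value representation (which does not require knowledge of $E[Y|X]$); it only shows the kind of accuracy guarantee being targeted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# synthetic model: X ~ N(0,1), Y = sin(X) + noise, so E[Y|X] = sin(X)
X = rng.standard_normal(n)
Y = np.sin(X) + 0.3 * rng.standard_normal(n)

def mse(pred):                      # Monte Carlo estimate of E[(Y - f(X))^2] for predictions f(X)
    return np.mean((Y - pred) ** 2)

minimal = mse(np.sin(X))            # E[(Y - E[Y|X])^2], available here because E[Y|X] is known

# candidate regressions: linear and cubic polynomial least-squares fits (on the same sample)
lin = np.polynomial.polynomial.polyfit(X, Y, 1)
cub = np.polynomial.polynomial.polyfit(X, Y, 3)
for name, coef in [("linear", lin), ("cubic", cub)]:
    approx = np.polynomial.polynomial.polyval(X, coef)
    # Pythagoras: E[(f(X) - E[Y|X])^2] = E[(Y - f(X))^2] - minimal MSE
    print(name, "excess L2 error:", mse(approx) - minimal)
```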

The guesswork quantifies the minimum number of queries needed to guess the state of a quantum ensemble if one is allowed to query only one state at a time. Previous approaches to the computation of the guesswork were based on standard semi-definite programming techniques and therefore yield only approximate results. In contrast, our main result is an algorithm that, upon the input of any qubit ensemble over a discrete ring and with uniform probability distribution, after finitely many steps outputs the exact closed-form analytic expression of its guesswork. The complexity of our guesswork-computing algorithm is factorial in the number of states, with a more-than-quadratic speedup for symmetric ensembles. To find such symmetries, we provide an algorithm that, upon the input of any point set over a discrete ring, after finitely many steps outputs its exact symmetries. The complexity of our symmetries-finding algorithm is polynomial in the number of points. As examples, we compute the guesswork of regular and quasi-regular sets of qubit states.

We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-\alpha})$ for $\alpha > 0$, then the orthogonal greedy algorithm converges at the same rate on the variation space of $\mathbb{D}$. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. These results hold under no additional assumptions on the dictionary beyond the decay rate of the entropy of its convex hull. In addition, they are robust to noise in the target function and can be extended to convergence rates on the interpolation spaces of the variation norm. Finally, we show that these improved rates are sharp and prove a negative result showing that the iterates generated by the orthogonal greedy algorithm cannot in general be bounded in the variation norm of $\mathbb{D}$.
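
For reference, a standard finite-dictionary version of the orthogonal greedy algorithm (orthogonal matching pursuit) looks as follows; the random dictionary here is only a stand-in for the shallow-network dictionaries discussed in the abstract.

```python
import numpy as np

def orthogonal_greedy(D, y, n_steps):
    """Orthogonal greedy algorithm on a finite dictionary: the columns of D
    are the dictionary elements and y is the target; at each step select the
    element most correlated with the residual, then re-project y onto the
    span of all selected elements."""
    selected = []
    residual = y.copy()
    for _ in range(n_steps):
        scores = np.abs(D.T @ residual)
        if selected:
            scores[selected] = -np.inf                       # do not reselect an element
        selected.append(int(np.argmax(scores)))
        basis = D[:, selected]
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)     # orthogonal projection
        residual = y - basis @ coef
    return selected, residual

# toy dictionary of random unit vectors
rng = np.random.default_rng(0)
D = rng.standard_normal((100, 500))
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(100)
idx, r = orthogonal_greedy(D, y, n_steps=20)
print(len(idx), np.linalg.norm(r))
```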

Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it remains a significant challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
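
A rough sketch of the compositional idea: attributes are composed by summing per-attribute energies in the latent space. For simplicity the sketch replaces the ODE sampler described in the abstract with plain Langevin dynamics, and the attribute classifiers and pre-trained generator are assumed to be user-supplied PyTorch modules.

```python
import torch

def composed_energy(z, classifiers, targets):
    """Sum of per-attribute energies on a latent code z: each classifier maps
    z to a logit for its attribute, and the energy is the negative
    log-likelihood of the desired attribute value, plus a Gaussian prior."""
    e = 0.0
    for clf, y in zip(classifiers, targets):          # classifiers are assumed latent-space modules
        logit = clf(z)
        e = e + torch.nn.functional.binary_cross_entropy_with_logits(
            logit, torch.full_like(logit, float(y)))
    return e + 0.5 * (z ** 2).sum()

def langevin_sample(classifiers, targets, dim, steps=200, step_size=1e-2):
    """Gradient-based latent sampling (Langevin dynamics, not the paper's ODE)."""
    z = torch.randn(dim)
    for _ in range(steps):
        z.requires_grad_(True)
        grad, = torch.autograd.grad(composed_energy(z, classifiers, targets), z)
        z = (z - step_size * grad
             + (2 * step_size) ** 0.5 * torch.randn_like(z)).detach()
    return z   # would then be decoded by the pre-trained generator, e.g. StyleGAN's G(z)
```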

This paper introduces a semi-supervised contrastive learning framework and its application to text-independent speaker verification. The proposed framework employs generalized contrastive loss (GCL). GCL unifies losses from two different learning frameworks, supervised metric learning and unsupervised contrastive learning, and thus it naturally yields a loss for semi-supervised learning. In experiments, we applied the proposed framework to text-independent speaker verification on the VoxCeleb dataset. We demonstrate that GCL enables the learning of speaker embeddings in three settings, supervised, semi-supervised, and unsupervised learning, without any change in the definition of the loss function.

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt with through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.
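
As a small taste of the material, here is online (sub)gradient descent, the Euclidean instance of Online Mirror Descent, run on a toy stream of quadratic losses; the regret is measured against the best fixed comparator in hindsight (which for these losses is the mean of the targets).

```python
import numpy as np

def online_gradient_descent(loss_grads, x0, eta=0.1):
    """Online (sub)gradient descent: at round t play x_t, observe the gradient
    of the convex loss l_t at x_t, and update x_{t+1} = x_t - eta * g_t."""
    x = np.asarray(x0, dtype=float)
    played = []
    for grad_t in loss_grads:          # one callable per round: x -> gradient of l_t at x
        played.append(x.copy())
        x = x - eta * grad_t(x)
    return played

# toy stream of quadratic losses l_t(x) = 0.5 * ||x - z_t||^2 with noisy targets z_t
rng = np.random.default_rng(0)
zs = 1.0 + 0.1 * rng.standard_normal((200, 3))
played = online_gradient_descent([lambda x, z=z: x - z for z in zs], x0=np.zeros(3))
regret = (sum(0.5 * np.sum((x - z) ** 2) for x, z in zip(played, zs))
          - sum(0.5 * np.sum((zs.mean(axis=0) - z) ** 2) for z in zs))
print("regret vs. best fixed point:", regret)
```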

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU nonlinearity is the expected transformation of a stochastic regularizer which randomly applies the identity or zero map to a neuron's input. The GELU nonlinearity weights inputs by their magnitude, rather than gates inputs by their sign as in ReLUs. We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
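
For reference, the exact GELU ($x\,\Phi(x)$, with $\Phi$ the standard normal CDF) and its common tanh-based approximation can be written in a few lines:

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    """Exact GELU: x * Phi(x). Inputs are weighted by the probability that a
    standard Gaussian falls below them, rather than hard-gated by sign as in ReLU."""
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh-based approximation used in many implementations."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-3, 3, 7)
print(gelu(x))
print(np.max(np.abs(gelu(x) - gelu_tanh(x))))   # the approximation error is small
```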

Using the 6,638 case descriptions of societal impact submitted for evaluation in the Research Excellence Framework (REF 2014), we replicate the topic model (Latent Dirichlet Allocation, LDA) built in this context and compare the results with factor-analytic results using a traditional word-document matrix (Principal Component Analysis, PCA). Removing a small fraction of documents from the sample, for example, has on average a much larger impact on LDA than on PCA-based models, to the extent that the largest distortion in the case of PCA has less effect than the smallest distortion of LDA-based models. In terms of semantic coherence, however, LDA models outperform PCA-based models. The topic models inform us about the statistical properties of the document sets under study, but the results are statistical and should not be used for semantic interpretation (for example, in grant selection, micro-decision making, or scholarly work) without follow-up using domain-specific semantic maps.
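
A minimal sketch of the two modelling pipelines being compared, run here on a tiny stand-in corpus since the REF case descriptions are not reproduced; a stability check in the spirit of the abstract would refit both models after removing a few documents and compare the recovered topic/component spaces.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA

# tiny illustrative corpus (stand-in for the REF 2014 impact case descriptions)
docs = [
    "flood risk modelling informed national planning policy",
    "flood modelling improved insurance pricing and planning",
    "clinical trial of a new cancer drug changed treatment guidelines",
    "cancer screening research shaped public health guidelines",
    "museum exhibition on medieval manuscripts reached new audiences",
    "digitised medieval manuscripts supported heritage education",
]

X = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
doc_topics = lda.transform(X)                  # topic mixture per document

pca = PCA(n_components=3).fit(X.toarray())     # factor-analytic view of the same word-document matrix
doc_factors = pca.transform(X.toarray())

print(doc_topics.round(2))
print(doc_factors.round(2))
```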

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups for the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$: strongly convex and smooth, strongly convex, smooth, or merely convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
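
To illustrate the distributed setting, the toy script below solves a consensus version of $\min_{\mathbf{x}} \sum_i f_i(\mathbf{x})$ over a ring network with a simple gradient-tracking scheme; this is only a stand-in for the accelerated dual method analyzed in the paper, and the mixing matrix plays the role of the interaction matrix whose spectral gap enters the rates.

```python
import numpy as np

# m agents on a ring; agent i holds the local quadratic f_i(x) = 0.5 * ||x - b_i||^2,
# so the minimizer of F(x) = sum_i f_i(x) is the average of the b_i.
m, d = 8, 3
rng = np.random.default_rng(0)
b = rng.standard_normal((m, d))

W = np.zeros((m, m))                          # symmetric, doubly stochastic gossip matrix (ring)
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

grad = lambda x: x - b                        # row i = gradient of f_i at agent i's local copy
x = np.zeros((m, d))                          # one local copy of the decision variable per agent
y = grad(x)                                   # gradient-tracking variable
eta = 0.2
for _ in range(300):
    x_new = W @ x - eta * y                   # gossip step + descent along the tracked gradient
    y = W @ y + grad(x_new) - grad(x)         # track the network-average gradient
    x = x_new
print(np.max(np.abs(x - b.mean(axis=0))))     # every agent's copy is near the global minimizer
```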
