
It is often observed that stochastic gradient descent (SGD) and its variants implicitly select solutions with good generalization performance; this implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias to the smoothing effect of SGD, which eliminates sharp local minima by convolving the objective with the stochastic gradient noise. We follow this line of research and study the commonly used averaged SGD algorithm, which Izmailov et al. (2018) empirically observed to prefer flat minima and therefore to achieve better generalization. We prove that, in certain problem settings, averaged SGD can efficiently optimize the smoothed objective, which avoids sharp local minima. In experiments, we verify our theory and show that parameter averaging with an appropriate step size indeed leads to a significant improvement in the performance of SGD.
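
To make parameter averaging concrete, here is a minimal sketch, assuming a one-dimensional quadratic objective $f(w) = \frac{1}{2}w^2$ with additive Gaussian gradient noise; the step size, noise level, and the function name `averaged_sgd` are illustrative choices, not the paper's setting. Constant-step SGD keeps fluctuating around the minimum, while the running (Polyak) average of the iterates settles much closer to it.

```python
import numpy as np


def averaged_sgd(w0, step, noise_std, n_steps, seed=0):
    """Constant-step SGD on f(w) = 0.5 * w**2 with noisy gradients.

    Returns the iterate trajectory and the running average of the iterates.
    """
    rng = np.random.default_rng(seed)
    w = float(w0)
    iterates = np.empty(n_steps)
    for t in range(n_steps):
        # Stochastic gradient: true gradient w plus zero-mean Gaussian noise.
        grad = w + noise_std * rng.standard_normal()
        w -= step * grad
        iterates[t] = w
    # Polyak averaging: uniform running average over all iterates so far.
    averages = np.cumsum(iterates) / np.arange(1, n_steps + 1)
    return iterates, averages


iterates, averages = averaged_sgd(w0=5.0, step=0.1, noise_std=1.0, n_steps=20_000)
print(f"last iterate:     {iterates[-1]:+.4f}")
print(f"averaged iterate: {averages[-1]:+.4f}")
print(f"std of last half of iterates: {iterates[10_000:].std():.4f}")
```

Averaging leaves the SGD updates untouched and only post-processes the trajectory, so it adds essentially no cost to training.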

Related content

SGD


Cut point · Manifold · Orthogonal · Canonical · Curvature

July 8, 2024

On the Injectivity Radius of the Stiefel Manifold: Numerical investigations and an explicit construction of a cut point at short distance

Jakob Stoye, Ralf Zimmermann

Arguably, geodesics are the most important geometric objects on a differentiable manifold. They are candidates for shortest paths and are guaranteed to be unique shortest paths when the norm of the starting velocity stays below the so-called injectivity radius of the manifold. In this work, we investigate the injectivity radius of the Stiefel manifold under the canonical metric. The Stiefel manifold $St(n,p)$ is the set of $n$-by-$p$ matrices with orthonormal columns, sometimes also called the space of orthonormal $p$-frames in $\mathbb{R}^n$. Using a standard curvature argument, Rentmeesters showed in 2013 that the injectivity radius of the Stiefel manifold is bounded by $\sqrt{\frac{4}{5}}\pi$. Whether this bound is sharp is an open question. Since the injectivity radius can be defined via cut points of geodesics, we can obtain information about it by investigating geodesics. More precisely, we consider the behavior of special variations of geodesics, called Jacobi fields. By doing so, we are able to present an explicit example of a cut point. In addition, since the theoretical analysis of geodesics for cut points, and especially for conjugate points as a type of cut point, is difficult, we investigate the sharpness of the bound by means of numerical experiments.
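
For readers who want to run such numerical experiments, the following is a minimal sketch of a geodesic on $St(n,p)$ under the canonical metric. It assumes the standard closed form $Y(t) = Y M(t) + Q N(t)$, where $QR$ is the compact QR decomposition of $(I - YY^\top)H$, $A = Y^\top H$, and $\begin{bmatrix} M(t) \\ N(t) \end{bmatrix} = \exp\!\left(t\begin{bmatrix} A & -R^\top \\ R & 0 \end{bmatrix}\right)\begin{bmatrix} I_p \\ 0 \end{bmatrix}$. All function names are hypothetical, and this is not the authors' code. With a unit-speed initial velocity, $t$ is the arc length, so a segment of length $\sqrt{4/5}\,\pi \approx 2.81$ can be generated directly.

```python
import numpy as np


def skew_expm(M):
    """exp(M) for a real skew-symmetric matrix M, via a Hermitian eigendecomposition."""
    # 1j * M is Hermitian, so 1j * M = V diag(w) V^H with real w,
    # which gives exp(M) = V diag(exp(-1j * w)) V^H.
    w, V = np.linalg.eigh(1j * M)
    return ((V * np.exp(-1j * w)) @ V.conj().T).real


def random_stiefel_point(n, p, rng):
    # Reduced QR of a Gaussian matrix yields orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return Q


def random_tangent(Y, rng):
    # Tangent vectors at Y have the form Y A + (I - Y Y^T) B with A skew-symmetric.
    n, p = Y.shape
    S = rng.standard_normal((p, p))
    B = rng.standard_normal((n, p))
    return Y @ (S - S.T) + B - Y @ (Y.T @ B)


def canonical_norm(Y, H):
    # Canonical metric: <H, H>_c = tr(H^T (I - Y Y^T / 2) H).
    return np.sqrt(np.linalg.norm(H) ** 2 - 0.5 * np.linalg.norm(Y.T @ H) ** 2)


def stiefel_geodesic(Y, H, t):
    """Y(t) on the canonical-metric geodesic with Y(0) = Y and Y'(0) = H."""
    p = Y.shape[1]
    A = Y.T @ H
    A = 0.5 * (A - A.T)                     # remove round-off from the skew block
    Q, R = np.linalg.qr(H - Y @ A)          # compact QR of the normal component
    block = np.block([[A, -R.T], [R, np.zeros((p, p))]])
    MN = skew_expm(t * block)[:, :p]        # exp(t * block) @ [I_p; 0]
    return Y @ MN[:p] + Q @ MN[p:]


rng = np.random.default_rng(0)
Y = random_stiefel_point(8, 3, rng)
H = random_tangent(Y, rng)
H /= canonical_norm(Y, H)                   # unit speed: t equals arc length
Yt = stiefel_geodesic(Y, H, np.sqrt(4 / 5) * np.pi)
print(np.linalg.norm(Yt.T @ Yt - np.eye(3)))  # orthonormality is preserved
```

Detecting a cut point numerically would additionally require comparing the length of such a segment with the true Riemannian distance between its endpoints, which this sketch does not attempt.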

MoDELS · Stable Diffusion · Performer · Principal component regression · Extensibility

July 8, 2024

Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing

Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu

From arXiv; accepted at ECCV 2024.

High computational overhead is a troublesome problem for diffusion models. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them focus only on unconditional models, leaving the quantization of widely used pretrained text-to-image models, e.g., Stable Diffusion, largely unexplored. In this paper, we propose PCR (Progressive Calibration and Relaxing), a novel post-training quantization method for text-to-image diffusion models. It consists of a progressive calibration strategy that accounts for the quantization error accumulated across timesteps, and an activation relaxing strategy that improves performance at negligible cost. Additionally, we demonstrate that previous metrics for text-to-image diffusion model quantization are inaccurate due to a distribution gap. To tackle this problem, we propose QDiffBench, a novel benchmark that uses data from the same domain for more accurate evaluation. QDiffBench also considers the generalization performance of the quantized model outside the calibration dataset. Extensive experiments on Stable Diffusion and Stable Diffusion XL demonstrate the superiority of our method and benchmark. Moreover, we are the first to quantize Stable Diffusion XL while maintaining its performance.
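
Post-training quantization in its simplest form maps floating-point tensors to low-bit integers using a scale and zero point estimated on a small calibration set. The following is a minimal sketch of generic asymmetric min-max uniform quantization, assuming 8-bit activations and hypothetical random calibration data; it is not PCR, whose progressive calibration across timesteps and activation relaxing are not reproduced here.

```python
import numpy as np


def calibrate(samples, n_bits=8):
    """Estimate an asymmetric uniform quantizer (scale, zero point) from calibration data."""
    lo = min(float(s.min()) for s in samples)
    hi = max(float(s.max()) for s in samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)      # make sure 0.0 is exactly representable
    qmax = 2**n_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point, qmax


def quantize(x, scale, zero_point, qmax):
    # Round to the integer grid and clip to [0, qmax].
    return np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.int32)


def dequantize(q, scale, zero_point):
    return (q.astype(np.float64) - zero_point) * scale


rng = np.random.default_rng(0)
# Hypothetical calibration activations, e.g. collected from a few forward passes.
calib = [rng.normal(0.0, 1.0, size=1024) for _ in range(8)]
scale, zp, qmax = calibrate(calib)

x = rng.normal(0.0, 1.0, size=4096)
x_hat = dequantize(quantize(x, scale, zp, qmax), scale, zp)
print(f"scale={scale:.5f}, zero_point={zp}, max abs error={np.abs(x - x_hat).max():.5f}")
```

Inside the calibrated range the rounding error is at most half a quantization step; values outside it are clipped, which is one reason the choice of calibration data matters.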

Optimizer · MoDELS · Cosine · Model evaluation · Processing (programming language)

July 8, 2024

Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport

Kotaro Ikeda, Tomoya Uda, Daisuke Okanohara, Sosuke Ito

From arXiv; 26 pages, 5 figures.

We discuss a connection between a generative model, the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, known as stochastic thermodynamics. Based on techniques from stochastic thermodynamics, we derive a speed-accuracy trade-off for diffusion models, that is, a trade-off relationship between the speed and the accuracy of data generation. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is given by the conservative force in stochastic thermodynamics and by the geodesic in the space equipped with the 2-Wasserstein distance from optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for diffusion models with different noise schedules, such as the cosine schedule, conditional optimal transport, and optimal transport.
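
The cosine noise schedule mentioned above is commonly written as $\bar\alpha(t) = f(t)/f(0)$ with $f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right)$ and a small offset $s$, with per-step variances $\beta_t = 1 - \bar\alpha(t)/\bar\alpha(t-1)$. A minimal sketch, assuming that form, the common choice $s = 0.008$, and clipping $\beta_t$ at 0.999; the paper may parameterize its schedules differently.

```python
import numpy as np


def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar(t) for t = 0..T under the cosine schedule."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]


def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped for numerical stability.
    return np.clip(1 - alpha_bar[1:] / alpha_bar[:-1], 0.0, max_beta)


alpha_bar = cosine_alpha_bar(T=1000)
betas = betas_from_alpha_bar(alpha_bar)
print(alpha_bar[0], alpha_bar[500], alpha_bar[-1])
print(betas[:3], betas[-3:])
```

Plugging different schedules of this kind into a simulation of the forward process is one way to compare them numerically, as the abstract describes.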

Periodic · Performer · motivation · Extensibility · Same

July 7, 2024

Balanced assignments of periodic tasks

Héloïse Gachet, Frédéric Meunier

This work deals with the problem of assigning periodic tasks to employees in such a way that each employee performs each task with the same frequency in the long term. The motivation comes from a collaboration with the SNCF, the main French railway company. An almost complete solution is provided in the form of a necessary and sufficient condition that can be checked in polynomial time. A complementary discussion of possible extensions is also provided.
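
As a toy illustration of what "the same frequency in the long term" means (not the paper's condition or algorithm), suppose there are as many employees as tasks in each period and tasks are handed out by a cyclic rotation. Over any $n$ consecutive periods, every employee then performs every task exactly once. The setting and function names below are hypothetical.

```python
from collections import Counter


def rotation_schedule(n_employees, n_periods):
    """Employee e gets task (e + p) mod n in period p: a simple cyclic rotation."""
    return [[(e + p) % n_employees for e in range(n_employees)] for p in range(n_periods)]


def task_frequencies(schedule, employee):
    """Fraction of periods in which `employee` performs each task."""
    counts = Counter(period[employee] for period in schedule)
    return {task: c / len(schedule) for task, c in sorted(counts.items())}


schedule = rotation_schedule(n_employees=4, n_periods=12)
for e in range(4):
    print(e, task_frequencies(schedule, e))
```

The paper's setting is more general than this toy case, which is why its answer takes the form of a checkable condition for when a balanced assignment exists rather than a fixed rotation.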

Loss function (machine learning) · Functional · Loss · Lipschitz · Lipschitz continuity

July 7, 2024

Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Weiwei Kong, Mónica Ribero

Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume that the loss function is Lipschitz continuous and convex, that batches are sampled randomly with replacement, or that the gradient clipping step is omitted. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and release only the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new Rényi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly convex. Moreover, we show that our bounds converge to previously established convex bounds as the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.
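
The mechanics of the analyzed variant (cyclic batches, per-example gradient clipping, Gaussian noise, release of only the last iterate) can be sketched as follows. This is a minimal illustration assuming a least-squares loss, a fixed batch partition visited in the same order every epoch, and illustrative values of the clipping norm `clip` and noise multiplier `sigma`; it performs no privacy accounting, and the function name `dp_sgd_last_iterate` is hypothetical.

```python
import numpy as np


def dp_sgd_last_iterate(X, y, batch_size, epochs, lr, clip, sigma, seed=0):
    """DP-SGD with cyclic batches and per-example clipping on a least-squares loss.

    Only the final iterate is returned, matching the last-iterate release model.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Fixed partition into consecutive batches, visited cyclically every epoch.
    batches = [np.arange(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]
    for _ in range(epochs):
        for idx in batches:
            residual = X[idx] @ w - y[idx]
            per_example = residual[:, None] * X[idx]       # grad of 0.5 * (x.w - y)^2
            norms = np.linalg.norm(per_example, axis=1, keepdims=True)
            clipped = per_example / np.maximum(1.0, norms / clip)  # row norms <= clip
            noise = sigma * clip * rng.standard_normal(d)  # Gaussian mechanism noise
            w -= lr * (clipped.sum(axis=0) + noise) / len(idx)
    return w


rng = np.random.default_rng(1)
X = rng.standard_normal((512, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(512)

w_priv = dp_sgd_last_iterate(X, y, batch_size=64, epochs=30, lr=0.5, clip=1.0, sigma=1.0)
print(np.round(w_priv, 2))
```

Only `w_priv` would be published; the intermediate iterates stay hidden, which is the release model whose privacy the abstract bounds.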

Invariant · Fractal · Functional · Approximation · CASES

July 7, 2024

On the Higuchi fractal dimension of invariant measures for countable idempotent iterated function systems

Elismar R. Oliveira

We investigate the set of invariant idempotent probabilities for countable idempotent iterated function systems (IFS) defined in compact metric spaces. We demonstrate that, with constant weights, there exists a unique invariant idempotent probability. Utilizing Secelean's approach to countable IFSs, we introduce partially finite idempotent IFSs and prove that the sequence of invariant idempotent measures for these systems converges to the invariant measure of the original countable IFS. We then apply these results to approximate such measures with discrete systems, producing, in the one-dimensional case, data series whose Higuchi fractal dimension can be calculated. Finally, we provide numerical approximations for two-dimensional cases and discuss the application of generalized Higuchi dimensions in these scenarios.
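
Since the one-dimensional results are evaluated through the Higuchi fractal dimension of data series, here is a minimal sketch of Higuchi's estimator. For each delay $k$, the normalized curve length $L(k)$ is averaged over the $k$ starting offsets, and the dimension is the slope of $\log L(k)$ against $\log(1/k)$. The default `k_max=10` is an illustrative choice, not a value taken from the paper.

```python
import numpy as np


def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension of a 1-D series x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    ks = np.arange(1, k_max + 1)
    lengths = []
    for k in ks:
        Lm = []
        for m in range(k):                   # starting offset (0-based)
            sub = x[m::k]
            n_m = len(sub) - 1               # number of increments: floor((N - 1 - m) / k)
            if n_m < 1:
                continue
            total = np.abs(np.diff(sub)).sum()
            # Normalize by (N - 1) / (n_m * k), then divide by k.
            Lm.append(total * (N - 1) / (n_m * k) / k)
        lengths.append(np.mean(Lm))
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return slope


rng = np.random.default_rng(0)
print(f"straight line: {higuchi_fd(np.arange(1000.0)):.3f}")
print(f"white noise:   {higuchi_fd(rng.standard_normal(5000)):.3f}")
print(f"random walk:   {higuchi_fd(np.cumsum(rng.standard_normal(5000))):.3f}")
```

A straight line gives a dimension of 1, white noise approaches 2, and a random walk lies near 1.5, which makes these convenient sanity checks before applying the estimator to series produced by an IFS approximation.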

Linear · Latent/hidden variable · Latent · Observed variable · Mutually independent

July 5, 2024

Linear causal disentanglement via higher-order cumulants

Paula Leyes Carreno, Chiara Meroni, Anna Seigal

Linear causal disentanglement is a recent method in causal representation learning to describe a collection of observed variables via latent variables with causal dependencies between them. It can be viewed as a generalization of both independent component analysis and linear structural equation models. We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts, each given by an intervention on a latent variable. We show that one perfect intervention on each latent variable is sufficient and in the worst case necessary to recover parameters under perfect interventions, generalizing previous work to allow more latent than observed variables. We give a constructive proof that computes parameters via a coupled tensor decomposition. For soft interventions, we find the equivalence class of latent graphs and parameters that are consistent with observed data, via the study of a system of polynomial equations. Our results hold assuming the existence of non-zero higher-order cumulants, which implies non-Gaussianity of variables.
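
The non-zero higher-order cumulant assumption can be checked empirically. A minimal sketch, assuming a hypothetical instance of the model in which latent variables follow a linear structural equation model $Z = \Lambda Z + \varepsilon$ and observations are $X = FZ$, with made-up matrices `Lam` and `F`: the empirical third-order cumulant tensor of $X$ is clearly non-zero for skewed (centered exponential) noise and close to zero for Gaussian noise.

```python
import numpy as np


def sample_observations(n, noise, rng):
    """Draw X = F Z with latent linear SEM Z = Lam Z + eps (hypothetical parameters)."""
    Lam = np.array([[0.0, 0.0], [0.8, 0.0]])              # Z2 depends causally on Z1
    F = np.array([[1.0, 0.5], [0.3, 1.0], [0.7, -0.4]])   # 3 observed, 2 latent
    if noise == "gaussian":
        eps = rng.standard_normal((n, 2))
    else:
        eps = rng.exponential(1.0, size=(n, 2)) - 1.0     # zero mean, skewed
    Z = eps @ np.linalg.inv(np.eye(2) - Lam).T            # solves Z = Lam Z + eps row-wise
    return Z @ F.T


def third_cumulant(X):
    """Empirical third-order cumulant tensor (equal to the third central moment)."""
    Xc = X - X.mean(axis=0)
    return np.einsum("ni,nj,nk->ijk", Xc, Xc, Xc) / len(Xc)


rng = np.random.default_rng(0)
for noise in ("gaussian", "exponential"):
    k3 = third_cumulant(sample_observations(200_000, noise, rng))
    print(noise, np.linalg.norm(k3))
```

The snippet only illustrates why Gaussian data, whose cumulants beyond second order vanish, would not satisfy this assumption.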

Tensor · Stream · Approximation · Lecture notes · Linear

July 4, 2024

A sequential multilinear Nyström algorithm for streaming low-rank approximation of tensors in Tucker format

Alberto Bucci, Behnam Hashemi

We present a sequential version of the multilinear Nyström algorithm that is suitable for the low-rank Tucker approximation of tensors given in a streaming format. Accessing the tensor $\mathcal{A}$ exclusively through random sketches of the original data, the algorithm effectively leverages structures in $\mathcal{A}$, such as low-rankness and linear combinations. We present a deterministic analysis of the algorithm and demonstrate its superior speed and efficiency in numerical experiments, including an application in video processing.
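
The matrix (two-mode) building block behind Nyström-type sketching is the generalized Nyström approximation $A \approx (AX)(Y^\top A X)^{\dagger}(Y^\top A)$ with random sketch matrices $X$ and $Y$, where $Y$ has a few more columns than $X$. The following minimal sketch assumes Gaussian sketches and a synthetic exactly-low-rank matrix; it is not the paper's multilinear Tucker algorithm, and the function name is hypothetical. Because both sketches are linear in $A$, they can be accumulated as additive pieces of $A$ arrive, without ever forming $A$.

```python
import numpy as np


def streaming_generalized_nystrom(updates, shape, r, oversample, seed=0):
    """Generalized Nystrom approximation of A = sum(updates), seeing each update once.

    Only the sketches A @ X and Y.T @ A are stored; A itself is never formed.
    """
    rng = np.random.default_rng(seed)
    m, n = shape
    X = rng.standard_normal((n, r))                # right sketch, target rank r
    Y = rng.standard_normal((m, r + oversample))   # left sketch, with oversampling
    AX = np.zeros((m, r))
    YtA = np.zeros((r + oversample, n))
    for E in updates:                              # sketches are linear in A
        AX += E @ X
        YtA += Y.T @ E
    core = Y.T @ AX                                # equals Y^T A X
    # Solve core @ W = YtA in the least-squares sense instead of forming a pseudoinverse.
    W, *_ = np.linalg.lstsq(core, YtA, rcond=None)
    return AX, W                                   # A is approximated by AX @ W


rng = np.random.default_rng(1)
m, n, rank = 60, 40, 5
A = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))  # exactly rank 5
# Stream A as three additive pieces, none of which is low-rank on its own.
P1 = rng.standard_normal((m, n))
P2 = rng.standard_normal((m, n))
pieces = [P1, P2, A - P1 - P2]
AX, W = streaming_generalized_nystrom(pieces, A.shape, r=rank, oversample=3)
rel_err = np.linalg.norm(A - AX @ W) / np.linalg.norm(A)
print(f"relative error: {rel_err:.2e}")
```

For a matrix of exact rank $r$ the reconstruction is exact up to round-off; for general data the oversampling in $Y$ is what keeps the least-squares solve stable.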

MoDELS · Model evaluation · NLP · Extensibility · Identifiable

May 8, 2020

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Although measuring held-out accuracy has been the primary approach to evaluating generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitates comprehensive test ideation, as well as a software tool to quickly generate a large and diverse set of test cases. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
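
To illustrate the template idea behind a minimum functionality test, here is a minimal sketch in plain Python. It does not use the CheckList library's API; the template syntax, the function names, and the keyword-based "model" are hypothetical stand-ins chosen so the example is self-contained.

```python
from itertools import product


def expand_template(template, **fillers):
    """Generate all test sentences by filling each {slot} with every listed value."""
    keys = list(fillers)
    combos = product(*(fillers[k] for k in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]


def toy_sentiment_model(text):
    """Hypothetical keyword classifier standing in for a real model (it ignores negation)."""
    positive = {"good", "great", "excellent"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    return "positive" if any(w in positive for w in words) else "negative"


def run_mft(model, cases, expected):
    """Minimum functionality test: return the cases where the prediction is wrong."""
    return [c for c in cases if model(c) != expected]


# Negation capability: "not <positive adjective>" should be predicted as negative.
cases = expand_template(
    "The {thing} was not {adj}.",
    thing=["food", "service", "flight"],
    adj=["good", "great", "excellent"],
)
failures = run_mft(toy_sentiment_model, cases, expected="negative")
print(f"{len(failures)}/{len(cases)} failures, e.g. {failures[:2]}")
```

Because the toy model ignores negation, every generated case fails, which is the kind of capability-specific failure that template-generated tests are meant to surface even when aggregate accuracy looks fine.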