国产成人精品三级在线_99久久国产精品综合久久国产_久热精品视频一区二区三区_鲁丝丝国产一区二区_国产在线播放剧情演绎_大胸少妇AV一区二区三区_蘑菇视频免费版在线观看版下载

Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum of a loss and a regularizer, available estimates of a lower bound of the sum can be loose. In contrast, ProxSPS only requires a lower bound for the loss which is often readily available. As a consequence, we show that ProxSPS is easier to tune and more stable in the presence of regularization. Furthermore for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning, and results in a network with smaller weight parameters. We also provide an extensive convergence analysis for ProxSPS that includes the non-smooth, smooth, weakly convex and strongly convex setting.

相關內容

正則化項

關注 0

Processing（編程語言） · 線性的 · 操作 · 樣本 · 線性變換 ·

2023 年 6 月 16 日

Images of Gaussian and other stochastic processes under closed, densely-defined, unbounded linear operators

Tadashi Matsumoto,T. J. Sullivan

from arxiv, 7 pages

Gaussian processes (GPs) are widely-used tools in spatial statistics and machine learning and the formulae for the mean function and covariance kernel of a GP $v$ that is the image of another GP $u$ under a linear transformation $T$ acting on the sample paths of $u$ are well known, almost to the point of being folklore. However, these formulae are often used without rigorous attention to technical details, particularly when $T$ is an unbounded operator such as a differential operator, which is common in several modern applications. This note provides a self-contained proof of the claimed formulae for the case of a closed, densely-defined operator $T$ acting on the sample paths of a square-integrable stochastic process. Our proof technique relies upon Hille's theorem for the Bochner integral of a Banach-valued random variable.

非凸 · Principle · SGD · 泛函 · 優化器 ·

2023 年 6 月 16 日

Gradient is All You Need?

Konstantin Riedl,Timo Klock,Carina Geldhauser,Massimo Fornasier

from arxiv, 38 pages, 4 figures

In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that through communication of the particles, CBO exhibits a stochastic gradient descent (SGD)-like behavior despite solely relying on evaluations of the objective function. The fundamental value of such link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom for which zero-order methods ought to be inefficient or not to possess generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.

CASE · 估計/估計量 · 相互獨立的 · CASES · 置信度 ·

2023 年 6 月 15 日

Average Case Error Estimates of the Strong Lucas Test

Semira Einsele,Kenneth Paterson

Reliable probabilistic primality tests are fundamental in public-key cryptography. In adversarial scenarios, a composite with a high probability of passing a specific primality test could be chosen. In such cases, we need worst-case error estimates for the test. However, in many scenarios the numbers are randomly chosen and thus have significantly smaller error probability. Therefore, we are interested in average case error estimates. In this paper, we establish such bounds for the strong Lucas primality test, as only worst-case, but no average case error bounds, are currently available. This allows us to use this test with more confidence. We examine an algorithm that draws odd $k$-bit integers uniformly and independently, runs $t$ independent iterations of the strong Lucas test with randomly chosen parameters, and outputs the first number that passes all $t$ consecutive rounds. We attain numerical upper bounds on the probability on returing a composite. Furthermore, we consider a modified version of this algorithm that excludes integers divisible by small primes, resulting in improved bounds. Additionally, we classify the numbers that contribute most to our estimate.

估計/估計量 · Weight · Performer · 統計量 · 分離的 ·

2023 年 6 月 15 日

One-Step weighting to generalize and transport treatment effect estimates to a target population

Ambarish Chattopadhyay,Eric R. Cohn,Jose R. Zubizarreta

The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.

估計/估計量 · 可約的 · SimPLe · 線性的 · 方差 ·

2023 年 6 月 15 日

Easier Estimation of Extremes under Randomized Response

Jonathan Hehir

from arxiv, 8 pages

In this brief note, we consider estimation of the bitwise combination $x_1 \lor \dots \lor x_n = \max_i x_i$ observing a set of noisy bits $\tilde x_i \in \{0, 1\}$ that represent the true, unobserved bits $x_i \in \{0, 1\}$ under randomized response. We demonstrate that various existing estimators for the extreme bit, including those based on computationally costly estimates of the sum of bits, can be reduced to a simple closed form computed in linear time (in $n$) and constant space, including in an online fashion as new $\tilde x_i$ are observed. In particular, we derive such an estimator and provide its variance using only elementary techniques.

控制器 · Learning · 優化器 · 估計/估計量 · 動力系統 ·

2023 年 6 月 15 日

Optimal Exploration for Model-Based RL in Nonlinear Systems

Andrew Wagenmaker,Guanya Shi,Kevin Jamieson

Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system~\citep{mania2020active}, in practice, if we aim to learn a good controller with a low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to learning a good controller, and (b) how we can best explore so as to minimize uncertainty in such parameters. Inspired by recent work in linear systems~\citep{wagenmaker2021task}, we show that minimizing the controller loss in nonlinear systems translates to estimating the system parameters in a particular, task-dependent metric. Motivated by this, we develop an algorithm able to efficiently explore the system to reduce uncertainty in this metric, and prove a lower bound showing that our approach learns a controller at a near-instance-optimal rate. Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest. We conclude with experiments demonstrating the effectiveness of our method in realistic nonlinear robotic systems.

情景 · 可辨認的 · 極小點 · 線性的 · 值域 ·

2023 年 6 月 15 日

Set Selection under Explorable Stochastic Uncertainty via Covering Techniques

Nicole Megow,Jens Schl?ter

from arxiv, An extended abstract of this paper appears in the proceedings of IPCO 2023

Given subsets of uncertain values, we study the problem of identifying the subset of minimum total value (sum of the uncertain values) by querying as few values as possible. This set selection problem falls into the field of explorable uncertainty and is of intrinsic importance therein as it implies strong adversarial lower bounds for a wide range of interesting combinatorial problems such as knapsack and matchings. We consider a stochastic problem variant and give algorithms that, in expectation, improve upon these adversarial lower bounds. The key to our results is to prove a strong structural connection to a seemingly unrelated covering problem with uncertainty in the constraints via a linear programming formulation. We exploit this connection to derive an algorithmic framework that can be used to solve both problems under uncertainty, obtaining nearly tight bounds on the competitive ratio. This is the first non-trivial stochastic result concerning the sum of unknown values without further structure known for the set. With our novel methods, we lay the foundations for solving more general problems in the area of explorable uncertainty.

估計/估計量 · 均值 · 成比例 · 塑造 · 噪聲 ·

2023 年 6 月 14 日

PLAN: Variance-Aware Private Mean Estimation

Martin Aumüller,Christian Janos Lebeda,Boel Nelson,Rasmus Pagh

Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise (PLAN)}$, a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance $\unicode{x2013}$ in both cases resulting in $\ell_2$ error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of \algorithmname, we empirically evaluate accuracy on both synthetic and real world data.

簇 · 近似 · 圖 · 有偏 · motivation ·

2023 年 6 月 14 日

Multi-class Graph Clustering via Approximated Effective $p$-Resistance

Shota Saito,Mark Herbster

from arxiv, To appear at ICML 2023

This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small ``extent,'' that is a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.

MoDELS · 塑造 · 縮放 · 混合技術 · motivation ·

2023 年 6 月 14 日

Asymptotic theory for extreme value generalized additive models

Takuma Yoshida

from arxiv, 41pages

The classical approach to analyzing extreme value data is the generalized Pareto distribution (GPD). When the GPD is used to explain a target variable with the large dimension of covariates, the shape and scale function of covariates included in GPD are sometimes modeled using the generalized additive models (GAM). In contrast to many results of application, there are no theoretical results on the hybrid technique of GAM and GPD, which motivates us to develop its asymptotic theory. We provide the rate of convergence of the estimator of shape and scale functions, as well as its local asymptotic normality.