Bilevel optimization is one of the fundamental problems in machine learning and optimization. Recent theoretical developments in bilevel optimization focus on finding first-order stationary points in the nonconvex-strongly-convex case. In this paper, we analyze algorithms that can escape saddle points in nonconvex-strongly-convex bilevel optimization. Specifically, we show that perturbed approximate implicit differentiation (AID) with a warm-start strategy finds an $\epsilon$-approximate local minimum of bilevel optimization in $\tilde{O}(\epsilon^{-2})$ iterations with high probability. Moreover, we propose an inexact NEgative-curvature-Originated-from-Noise Algorithm (iNEON), a pure first-order algorithm that can escape saddle points and find a local minimum of stochastic bilevel optimization. As a by-product, we provide the first nonasymptotic analysis of the perturbed multi-step gradient descent ascent (GDmax) algorithm, which converges to a local minimax point for minimax problems.
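The perturbation idea behind saddle-escaping methods can be illustrated on a toy single-level problem. The sketch below is plain perturbed gradient descent, not the paper's perturbed AID (which additionally solves the lower-level problem and approximates the hypergradient). The test function, step size `eta`, gradient threshold `g_thresh`, perturbation `radius`, and waiting period `t_thresh` are all illustrative assumptions. Started exactly at the saddle point, plain gradient descent would never move, while the random perturbation lets the iterate slide toward a local minimum.

```python
import numpy as np


def f(p):
    # Toy objective f(x, y) = x^2/2 + y^4/4 - y^2/2 (hypothetical example):
    # strict saddle at (0, 0), local minima at (0, -1) and (0, 1).
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad(p):
    x, y = p
    return np.array([x, y**3 - y])


def perturbed_gd(p0, eta=0.1, g_thresh=1e-3, radius=1e-2, t_thresh=50,
                 n_steps=1000, seed=0):
    """Gradient descent that adds a random point from a small ball whenever
    the gradient is small and no perturbation was made recently."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    last_perturb = -t_thresh - 1
    for t in range(n_steps):
        if np.linalg.norm(grad(p)) < g_thresh and t - last_perturb > t_thresh:
            # Sample uniformly from a ball of the given radius.
            direction = rng.normal(size=p.shape)
            direction /= np.linalg.norm(direction)
            p = p + radius * rng.uniform() ** (1.0 / p.size) * direction
            last_perturb = t
        p = p - eta * grad(p)
    return p


p_final = perturbed_gd([0.0, 0.0])  # starts exactly at the saddle point
```

Whether the iterate ends near $(0, 1)$ or $(0, -1)$ depends on the sign of the first random perturbation.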

Related content

Saddle point

In mathematics, a saddle point or minimax point is a point on the graph of a function where the slopes (derivatives) in orthogonal directions are all zero, but which is not a local extremum of the function. A saddle point is a critical point that is a relative minimum along one axial direction (between peaks) and a relative maximum along the crossing axis.
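For a twice-differentiable function, the definition above can be checked with the second-derivative test: at a point where the gradient vanishes, a Hessian with both positive and negative eigenvalues indicates a saddle point. This is a minimal sketch; the example function $f(x, y) = x^2 - y^2$ is our own choice. The test is inconclusive for degenerate saddles such as the monkey saddle $x^3 - 3xy^2$, whose Hessian vanishes at the origin.

```python
import numpy as np


def classify_critical_point(hessian, tol=1e-12):
    """Second-derivative test at a critical point (where the gradient is zero)."""
    eigvals = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eigvals > tol):
        return "local minimum"
    if np.all(eigvals < -tol):
        return "local maximum"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"
    return "inconclusive"  # a zero eigenvalue: the test says nothing


# f(x, y) = x^2 - y^2: gradient (2x, -2y) vanishes at the origin,
# and the Hessian is constant.
hessian_at_origin = [[2.0, 0.0], [0.0, -2.0]]
print(classify_critical_point(hessian_at_origin))  # saddle point
```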

Stationary point · Stationary · Optimizer · Analysis · Oracle

June 26, 2023

Near-Optimal Fully First-Order Algorithms for Finding Stationary Points in Bilevel Optimization

Lesi Chen, Yaohua Ma, Jingzhao Zhang

Bilevel optimization has various applications such as hyper-parameter optimization and meta-learning. Designing theoretically efficient algorithms for bilevel optimization is more challenging than standard optimization because the lower-level problem defines the feasibility set implicitly via another optimization problem. One tractable case is when the lower-level problem permits strong convexity. Recent works show that second-order methods can provably converge to an $\epsilon$-first-order stationary point of the problem at a rate of $\tilde{\mathcal{O}}(\epsilon^{-2})$, yet these algorithms require a Hessian-vector product oracle. Kwon et al. (2023) resolved the problem by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this work, we provide an improved analysis demonstrating that the first-order method can also find an $\epsilon$-first-order stationary point within $\tilde {\mathcal{O}}(\epsilon^{-2})$ oracle complexity, which matches the upper bounds for second-order methods in the dependency on $\epsilon$. Our analysis further leads to simple first-order algorithms that can achieve similar near-optimal rates in finding second-order stationary points and in distributed bilevel problems.
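One common way to build a fully first-order hypergradient estimate is a value-function penalty: minimize $f(x, y) + \lambda\,(g(x, y) - \min_z g(x, z))$ and use $\nabla_x f(x, y_\lambda) + \lambda\,(\nabla_x g(x, y_\lambda) - \nabla_x g(x, y^*))$, which needs only gradients of $f$ and $g$. We assume, without the abstract confirming the details, that this is the flavor of method discussed. The quadratic toy problem, step sizes, and function names below are hypothetical; the closed-form hypergradient is used only to measure the approximation error, which shrinks as $\lambda$ grows.

```python
A = 2.0  # coupling constant of the hypothetical lower-level problem


def grad_f(x, y):
    # Upper level f(x, y) = (y - 1)^2 + x^2 / 2; returns (df/dx, df/dy).
    return x, 2.0 * (y - 1.0)


def grad_g(x, y):
    # Lower level g(x, y) = (y - A x)^2 / 2, strongly convex in y.
    r = y - A * x
    return -A * r, r  # (dg/dx, dg/dy)


def inner_gd(grad_y, y0=0.0, lr=0.1, n_steps=500):
    # Minimize a function of y using only its gradient (first-order oracle).
    y = y0
    for _ in range(n_steps):
        y -= lr * grad_y(y)
    return y


def penalty_hypergradient(x, lam):
    """First-order estimate of dF/dx from the value-function penalty."""
    y_star = inner_gd(lambda y: grad_g(x, y)[1])
    y_lam = inner_gd(lambda y: grad_f(x, y)[1] + lam * grad_g(x, y)[1],
                     lr=0.5 / (2.0 + lam))  # the penalized problem is (2 + lam)-smooth
    return grad_f(x, y_lam)[0] + lam * (grad_g(x, y_lam)[0] - grad_g(x, y_star)[0])


def true_hypergradient(x):
    # y*(x) = A x, so F(x) = (A x - 1)^2 + x^2 / 2 and F'(x) = 2A(A x - 1) + x.
    return 2.0 * A * (A * x - 1.0) + x


errors = [abs(penalty_hypergradient(1.0, lam) - true_hypergradient(1.0))
          for lam in (10.0, 100.0, 1e4)]
```

For this toy problem the error works out to $8/(2 + \lambda)$ at $x = 1$, so it decays like $O(1/\lambda)$.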

Heteroscedasticity · Estimation/Estimator · Analysis · Linear regression · Linear

June 25, 2023

Near Optimal Heteroscedastic Regression with Symbiotic Learning

Dheeraj Baby, Aniket Das, Dheeraj Nagaraj, Praneeth Netrapalli

From arXiv. To appear in the Conference on Learning Theory 2023 (COLT'23).

We consider the classical problem of heteroscedastic linear regression, where we are given $n$ samples $(\mathbf{x}_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$ obtained from $y_i = \langle \mathbf{w}^{*}, \mathbf{x}_i \rangle + \epsilon_i \cdot \langle \mathbf{f}^{*}, \mathbf{x}_i \rangle$, where $\mathbf{x}_i \sim N(0,\mathbf{I})$, $\epsilon_i \sim N(0,1)$, and our task is to estimate $\mathbf{w}^{*}$. In addition to the classical applications of heteroscedastic models in fields such as statistics, econometrics, and time series analysis, this setting is particularly relevant in machine learning when data is collected from multiple sources of varying but a priori unknown quality, e.g., in large model training. Our work shows that we can estimate $\mathbf{w}^{*}$ in squared norm up to an error of $\tilde{O}\left(\|\mathbf{f}^{*}\|^2 \cdot \left(\frac{1}{n} + \left(\frac{d}{n}\right)^2\right)\right)$ and prove a matching lower bound (up to logarithmic factors). Our result substantially improves upon the previous best known upper bound of $\tilde{O}\left(\|\mathbf{f}^{*}\|^2\cdot \frac{d}{n}\right)$. Our upper bound result is based on a novel analysis of a simple, classical heuristic going back to at least Davidian and Carroll (1987) and constitutes the first non-asymptotic convergence guarantee for this approach. As a byproduct, our analysis also provides improved rates of estimation for both linear regression and phase retrieval with multiplicative noise, which may be of independent interest. The lower bound result relies on a careful application of Le Cam's two-point method, adapted to work with heavy-tailed random variables where the relevant mutual information quantities are infinite (precluding a direct application of Le Cam's method), and could also be of broader interest.
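To see why exploiting the noise structure helps, the sketch below simulates the model $y_i = \langle \mathbf{w}^{*}, \mathbf{x}_i \rangle + \epsilon_i \langle \mathbf{f}^{*}, \mathbf{x}_i \rangle$ and compares ordinary least squares with weighted least squares that divides each row by its noise scale $|\langle \mathbf{f}^{*}, \mathbf{x}_i \rangle|$. This assumes an oracle that knows $\mathbf{f}^{*}$, which the paper does not assume; heuristics of the kind the abstract analyzes must instead estimate the noise scale from data. The sample size, dimension, and seed are arbitrary choices.

```python
import numpy as np


def simulate(n=2000, d=5, seed=0):
    # Draw data from y_i = <w*, x_i> + eps_i * <f*, x_i> with Gaussian x_i, eps_i.
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d)
    f_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + rng.normal(size=n) * (X @ f_star)
    return X, y, w_star, f_star


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def oracle_wls(X, y, f_star):
    # Dividing each row by |<f*, x_i>| makes the noise homoscedastic with
    # unit variance; then solve ordinary least squares on the rescaled rows.
    scale = np.abs(X @ f_star)
    return np.linalg.lstsq(X / scale[:, None], y / scale, rcond=None)[0]


X, y, w_star, f_star = simulate()
err_ols = np.sum((ols(X, y) - w_star) ** 2)
err_wls = np.sum((oracle_wls(X, y, f_star) - w_star) ** 2)
```

Samples whose noise scale happens to be close to zero are nearly noise-free, so weighting them heavily sharply reduces the estimation error relative to ordinary least squares.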

Networks · Optimizer · Greedy · Node · Extensibility

June 25, 2023

On Large-Scale Multiple Testing Over Networks: An Asymptotic Approach

Mehrdad Pournaderi, Yu Xiang

From arXiv. Accepted to the IEEE Transactions on Signal and Information Processing over Networks.

This work concerns developing communication- and computation-efficient methods for large-scale multiple testing over networks, which is of interest to many practical applications. We take an asymptotic approach and propose two methods, proportion-matching and greedy aggregation, tailored to distributed settings. The proportion-matching method achieves the performance of the global Benjamini–Hochberg (BH) procedure yet only requires a one-shot communication of the (estimated) proportion of true null hypotheses and the number of p-values at each node. By focusing on the asymptotic optimal power, we go beyond the BH procedure by providing an explicit characterization of the asymptotic optimal solution. This leads to the greedy aggregation method, which effectively approximates the optimal rejection regions at each node, while its computational efficiency follows naturally from the greedy approach. Moreover, for both methods, we provide the rate of convergence for both the false discovery rate (FDR) and power. Extensive numerical results over a variety of challenging settings are provided to support our theoretical findings.
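The global BH procedure that serves as the benchmark here is a step-up rule: sort the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le kq/m$, and reject the $k$ smallest. A minimal centralized sketch follows; the distributed proportion-matching and greedy aggregation schemes are not reproduced, and the function name is our own.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return booleans marking which hypotheses the BH step-up procedure
    rejects at target false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest 1-based rank k with p_(k) <= k * q / m; 0 if none qualifies.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.02, 0.025, 0.9]))  # [True, True, False]
```

In the usage line, 0.02 exceeds its own threshold $q/3 \approx 0.0167$ but is still rejected because a larger rank passes; this is the step-up property.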

Analysis · Robustness · Better · Square matrix · Weight

June 24, 2023

Analysis of Scale-Variant Robust Kernel Optimization for Non-linear Least Squares Problems

Shounak Das, Jason Gross

From arXiv. Accepted for publication in the IEEE Transactions on Aerospace and Electronic Systems.

In this article, we present a method for increasing the adaptivity of an existing robust estimation algorithm by learning two parameters to better fit the residual distribution. The analyzed method uses these two parameters to calculate weights for Iteratively Reweighted Least Squares (IRLS). This adaptive nature of the weights can be helpful in situations where the noise level varies in the measurements. We test our algorithm on the point cloud registration problem with synthetic data sets and on LiDAR odometry with open-source real-world data sets. We show that the existing approach needs additional manual tuning of a residual scale parameter, whereas our method learns this parameter directly from data and achieves similar or better performance. We further present the idea of decoupling scale and shape parameters to improve the performance of the algorithm. We give a detailed analysis of our algorithm, along with a comparison with similar well-known algorithms from the literature, to show the benefits of the proposed approach.
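The IRLS loop that such robust kernels plug into can be sketched on a line-fitting problem with gross outliers. Here the weights come from a fixed Cauchy kernel $w = 1/(1 + (r/c)^2)$ with a hand-picked scale `c`. That hand-picked scale is the kind of manual tuning the article's method replaces by learning the scale and shape from data. The kernel choice, data, and iteration count are illustrative assumptions.

```python
import numpy as np


def irls_line_fit(x, y, c=1.0, n_iter=20):
    """Fit y ~ slope * x + intercept by IRLS with Cauchy weights."""
    A = np.column_stack([x, np.ones_like(x)])
    params = np.linalg.lstsq(A, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - A @ params                  # current residuals
        w = 1.0 / (1.0 + (r / c) ** 2)      # down-weight large residuals
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(w) and solve.
        params = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return params  # (slope, intercept)


x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[4, 9, 15]] += 30.0  # three gross outliers
slope, intercept = irls_line_fit(x, y)
```

The initial least-squares fit is pulled upward by the outliers (its intercept is off by roughly 4.5), and a few reweighting passes recover a line very close to $y = 2x + 1$.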

Optimization · Statistic · Smoothing · Robustness · Functional

June 24, 2023

Smoothed $f$-Divergence Distributionally Robust Optimization: Exponential Rate Efficiency and Complexity-Free Calibration

Zhenyuan Liu, Bart P. G. Van Parys, Henry Lam

In data-driven optimization, sample average approximation is known to suffer from the so-called optimizer's curse, which causes an optimistic bias in evaluating the solution performance. This can be tackled by adding a "margin" to the estimated objective value, or via distributionally robust optimization (DRO), a fast-growing approach based on worst-case analysis, which gives a protective bound on the attained objective value. However, in all these existing approaches, a statistically guaranteed bound on the true solution performance either requires restrictive conditions and knowledge of the objective function complexity, or otherwise exhibits an over-conservative rate that depends on the distribution dimension. We argue that a special type of DRO offers strong theoretical advantages with regard to these challenges: it attains a statistical bound on the true solution performance that is the tightest possible in terms of exponential decay rate, for a wide class of objective functions that notably does not hinge on function complexity. Correspondingly, its calibration also does not require any complexity information. This DRO uses an ambiguity set based on a KL-divergence smoothed by the Wasserstein or Lévy–Prokhorov distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized version using smoothed $f$-divergence, is not much harder than standard DRO problems using the $f$-divergence or Wasserstein distance, thus supporting the strengths of such DRO as both statistically optimal and computationally viable.
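As a point of reference for the "standard DRO problems" mentioned above, the worst-case expected loss over a KL ball around the empirical distribution has a well-known one-dimensional dual: $\sup_{Q:\,\mathrm{KL}(Q\|P_n)\le\rho} \mathbb{E}_Q[h] = \min_{\alpha>0}\, \alpha \log \mathbb{E}_{P_n}[e^{h/\alpha}] + \alpha\rho$. The sketch evaluates this dual on a fixed grid of $\alpha$. It implements plain (unsmoothed) KL-DRO only, not the paper's smoothed ambiguity set, and the grid search is a simplification of a proper 1-D minimization.

```python
import numpy as np


def kl_dro_worst_case(losses, rho, alphas=None):
    """Worst-case expected loss over distributions Q with KL(Q || P_n) <= rho,
    where P_n is the empirical distribution of `losses`, computed via
    min_{alpha > 0} alpha * log E_{P_n}[exp(loss / alpha)] + alpha * rho."""
    h = np.asarray(losses, dtype=float)
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)  # illustrative grid for the dual variable
    alphas = np.asarray(alphas, dtype=float)
    h_max = h.max()
    # Log-mean-exp shifted by h_max for numerical stability.
    z = (h[None, :] - h_max) / alphas[:, None]
    log_mean_exp = np.log(np.mean(np.exp(z), axis=1))
    objective = h_max + alphas * log_mean_exp + alphas * rho
    return float(objective.min())


losses = [0.2, 0.5, 1.0, 3.0]
v_small = kl_dro_worst_case(losses, rho=0.1)
v_large = kl_dro_worst_case(losses, rho=0.5)
```

The worst-case value always lies between the empirical mean (by Jensen's inequality) and the largest observed loss, and it grows with the radius $\rho$.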

Normalized · Random variable · Maximum likelihood · Maximum likelihood estimation · Log-likelihood

June 23, 2023

On tracking varying bounds when forecasting bounded time series

Amandine Pierrot, Pierre Pinson

From arXiv. 43 pages, 9 figures.

We consider a new framework where a continuous, though bounded, random variable has unobserved bounds that vary over time. In the context of univariate time series, we look at the bounds as parameters of the distribution of the bounded random variable. We introduce an extended log-likelihood estimation and design algorithms to track the bounds through online maximum likelihood estimation. Since the resulting optimization problem is not convex, we make use of recent theoretical results on Normalized Gradient Descent (NGD) for quasiconvex optimization to eventually derive an Online Normalized Gradient Descent algorithm. We illustrate and discuss the workings of our approach based on both simulation studies and a real-world wind power forecasting problem.
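NGD moves a fixed distance $\eta_t$ along the negative gradient direction $-\nabla f/\|\nabla f\|$, so flat regions of a quasiconvex objective do not stall it. The sketch applies batch NGD to a hypothetical quasiconvex but nonconvex function. The online likelihood and bound-tracking parts of the paper are not reproduced, and the step schedule $\eta_t = \eta_0/\sqrt{t+1}$ is our own choice.

```python
import numpy as np


def normalized_gd(grad, x0, n_steps=2000, eta0=1.0):
    """Normalized gradient descent: step a distance eta_t along -grad/||grad||."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:  # exact stationary point: nothing to normalize
            break
        x = x - eta0 / np.sqrt(t + 1) * g / norm
    return x


CENTER = np.array([3.0, -1.0])


def grad_f(x):
    # Gradient of f(x) = log(1 + ||x - CENTER||^2), which is quasiconvex
    # (its sublevel sets are balls) but not convex far from CENTER.
    d = x - CENTER
    return 2.0 * d / (1.0 + np.sum(d ** 2))


x_hat = normalized_gd(grad_f, [0.0, 0.0])
```

Because each step has length $\eta_t$ regardless of how small the gradient is, the final distance to the minimizer is at most the last step size, about $1/\sqrt{2000} \approx 0.022$ here.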

Nonconvex · Optimizer · Projection · Smoothing · Sample

June 23, 2023

Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data

Ahmet Alacaoglu, Hanbaek Lyu

From arXiv. 32 pages, 1 figure, 1 table.

We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of the Moreau envelope and the gradient mapping. While classical convergence guarantees require i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, the adaptive stochastic gradient algorithm AdaGrad, and the stochastic gradient algorithm with heavy ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and an optimal rate of convergence.
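The sampling scheme can be illustrated with stochastic projected gradient descent fed by correlated Markov chain samples instead of i.i.d. draws. This is a deliberately simple convex toy (minimize $\mathbb{E}[(x - Z)^2/2]$ over $x \in [0, 2]$), whereas the paper targets nonconvex problems. The two-state chain, step schedule, and function names are illustrative assumptions.

```python
import random


def markov_chain_samples(n, stay_prob=0.9, seed=0):
    """Two-state Markov chain on {0.0, 2.0}. Its stationary mean is 1.0,
    but consecutive samples are strongly correlated (not i.i.d.)."""
    rng = random.Random(seed)
    z = 0.0
    for _ in range(n):
        yield z
        if rng.random() > stay_prob:
            z = 2.0 - z  # switch state


def project(x, lo=0.0, hi=2.0):
    # Euclidean projection onto the interval [lo, hi].
    return min(max(x, lo), hi)


def projected_sgd(n=20000, step0=0.5):
    """Minimize E[(x - Z)^2 / 2] over x in [0, 2] with Z drawn from the chain."""
    x, running_sum = 0.0, 0.0
    for t, z in enumerate(markov_chain_samples(n), start=1):
        x = project(x - step0 / t ** 0.5 * (x - z))  # stochastic gradient is x - z
        running_sum += x
    return x, running_sum / n  # last iterate and averaged iterate


x_last, x_avg = projected_sgd()
```

Despite the dependence between samples, the averaged iterate settles near the stationary mean $1.0$, which is the minimizer under the chain's stationary distribution.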

Estimation/Estimator · Reducible · Optimizer · State estimation · CC

June 22, 2023

Iteratively Preconditioned Gradient-Descent Approach for Moving Horizon Estimation Problems

Tianchen Liu, Kushal Chakrabarti, Nikhil Chopra

Moving horizon estimation (MHE) is a widely studied state estimation approach in several practical applications. In the MHE problem, the state estimates are obtained via the solution of an approximated nonlinear optimization problem. However, this optimization step is known to be computationally complex. Given this limitation, this paper investigates the idea of iteratively preconditioned gradient descent (IPG) to solve the MHE problem, with the aim of improving performance over existing solution techniques. To our knowledge, this paper is the first to use the preconditioning technique to reduce the computational cost and accelerate the crucial optimization step of MHE. The convergence guarantee of the proposed iterative approach for a class of MHE problems is presented. Additionally, sufficient conditions for the MHE problem to be convex are also derived. Finally, the proposed method is implemented on a unicycle localization example. The simulation results demonstrate that the proposed approach can achieve better accuracy with reduced computational costs.
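The core idea of iterative preconditioning can be sketched on plain linear least squares rather than MHE. Alongside the gradient step, a preconditioner matrix `K` is refined by its own gradient-type iteration that drives it toward $(A^\top A + \beta I)^{-1}$, so the descent steps become increasingly Newton-like. This reflects our understanding of the IPG idea; the exact update rules, the regularizer `beta`, and the step sizes `alpha` and `delta` used in the paper may differ.

```python
import numpy as np


def ipg_least_squares(A, b, beta=1e-3, delta=1.0, n_iter=2000):
    """Iteratively preconditioned gradient descent for min ||A x - b||^2 / 2."""
    n = A.shape[1]
    M = A.T @ A + beta * np.eye(n)
    alpha = 1.0 / np.linalg.eigvalsh(M).max()  # step size for the K-update
    K = np.zeros((n, n))
    x = np.zeros(n)
    for _ in range(n_iter):
        K = K - alpha * (M @ K - np.eye(n))      # refine the preconditioner
        x = x - delta * K @ (A.T @ (A @ x - b))  # preconditioned gradient step
    return x


rng = np.random.default_rng(0)
A_demo = rng.normal(size=(20, 3))
b_demo = rng.normal(size=20)
x_ipg = ipg_least_squares(A_demo, b_demo)
x_ref = np.linalg.lstsq(A_demo, b_demo, rcond=None)[0]
```

Once `K` is close to $(A^\top A + \beta I)^{-1}$, each step shrinks the error along every eigendirection by a factor of roughly $\beta/(\lambda + \beta)$, which is why a small `beta` gives fast convergence regardless of the conditioning of $A$.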

Optimizer · Identifiable · Reducible · Estimation error · Cost function

June 22, 2023

Efficient Query Re-optimization with Judicious Subquery Selections

Junyi Zhao, Huanchen Zhang, Yihan Gao

Query re-optimization is an adaptive query processing technique that re-invokes the optimizer at certain points in query execution. The goal is to dynamically correct the cardinality estimation errors using the statistics collected at runtime to adjust the query plan to improve the overall performance. We identify a key weakness in existing re-optimization algorithms: their subquery division and re-optimization trigger strategies rely heavily on the optimizer's initial plan, which can be far from optimal. We, therefore, propose QuerySplit, a novel re-optimization algorithm that skips the potentially misleading global plan and instead generates subqueries directly from the logical plan as the basic re-optimization units. By developing a cost function that prioritizes the execution of less "damaging" subqueries, QuerySplit successfully postpones (sometimes avoids) the execution of complex large joins to maximize their probability of having smaller input sizes. We implemented QuerySplit in PostgreSQL and compared our solution against four state-of-the-art re-optimization algorithms using the Join Order Benchmark. Our experiments show that QuerySplit reduces the benchmark execution time by 35% compared to the second-best alternative. The performance gap between QuerySplit and an optimal optimizer is within 4%.

Latent variable/Hidden variable · Estimation/Estimator · MoDELS · Latent · Stochastic gradient descent

June 22, 2023

Efficient preconditioned stochastic gradient descent for estimation in latent variable models

Charlotte Baey, Maud Delattre, Estelle Kuhn, Jean-Benoist Leger, Sarah Lemler

Latent variable models are powerful tools for modeling complex phenomena involving, in particular, partially observed data, unobserved variables, or underlying complex unknown structures. Inference is often difficult due to the latent structure of the model. To deal with parameter estimation in the presence of latent variables, well-known efficient methods exist, such as gradient-based and EM-type algorithms, but they have practical and theoretical limitations. In this paper, we propose an efficient preconditioned stochastic gradient algorithm as an alternative for parameter estimation. Our method includes a preconditioning step based on a positive definite estimate of the Fisher information matrix. We prove convergence results for the proposed algorithm under mild assumptions for very general latent variable models. We illustrate the performance of the proposed methodology through relevant simulations in a nonlinear mixed effects model and in a stochastic block model.
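The preconditioning step can be illustrated in the simplest possible setting: a linear-Gaussian model $y = x^\top w + \text{noise}$ with unit noise variance and no latent variables. In this model each observation contributes $x x^\top$ to the Fisher information. The sketch runs one pass of stochastic gradient ascent on the log-likelihood with step $1/t$, preconditioned by the running average of those contributions plus a small ridge `eps` that keeps the estimate positive definite. The paper's estimator for latent variable models is different, and the names here are our own.

```python
import numpy as np


def fisher_preconditioned_sgd(X, y, eps=1e-6):
    """One pass of SGD with step 1/t, preconditioned by a running
    Fisher-information estimate, for y = x^T w + N(0, 1) noise."""
    n, d = X.shape
    w = np.zeros(d)
    fisher_sum = eps * np.eye(d)  # small ridge keeps the estimate positive definite
    for t in range(n):
        x_t, y_t = X[t], y[t]
        fisher_sum += np.outer(x_t, x_t)
        score = x_t * (y_t - x_t @ w)  # gradient of the log-likelihood of sample t
        # Step 1/(t+1) times the inverse of the averaged estimate
        # fisher_sum/(t+1) equals solving with fisher_sum directly.
        w = w + np.linalg.solve(fisher_sum, score)
    return w


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
w_sgd = fisher_preconditioned_sgd(X_demo, y_demo)
w_ls = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
```

In this special case the update coincides with recursive least squares, so after a single pass the iterate matches the batch least-squares fit up to the tiny ridge. Without latent variables, the preconditioner is what makes a single stochastic pass statistically efficient.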