
We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry on the search space, whose total space is equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining region, where the factorized objective has a large gradient. To the best of our knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers in which the factorized objective is geodesically convex. To prove our results, we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions also rely on a result of independent interest stating that the geodesic ball centered at Y with radius equal to 1/3 of the smallest singular value of Y is a geodesically convex set under the Riemannian quotient geometry; as a corollary, this also implies a quantitative bound on the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
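
Under the Burer-Monteiro factorization X = YY^T, the factor Y is determined only up to an orthogonal transformation, which is precisely the quotient structure referred to above. As a minimal illustration (an assumption-laden sketch, not the paper's code), the distance between the equivalence classes [Y] = {YQ : Q orthogonal} can be computed with an orthogonal Procrustes step:

```python
import numpy as np

def quotient_distance(Y1, Y2):
    """Distance between equivalence classes [Y1], [Y2] under the quotient by O(r):
    min over orthogonal Q of ||Y1 - Y2 Q||_F, solved by an orthogonal Procrustes step."""
    U, _, Vt = np.linalg.svd(Y2.T @ Y1)
    Q = U @ Vt
    return np.linalg.norm(Y1 - Y2 @ Q)

# Y and YQ represent the same PSD matrix X = YY^T, so their quotient distance is zero.
rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
print(np.allclose(Y @ Y.T, (Y @ Q) @ (Y @ Q).T), quotient_distance(Y, Y @ Q))
```

Any landscape statement about the factorized objective g(Y) = f(YY^T) is naturally a statement about such equivalence classes, since g(YQ) = g(Y) for every orthogonal Q.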

Related Content

Explainable AI was born as a pathway to allow humans to explore and understand the inner workings of complex systems. However, establishing what constitutes an explanation and objectively evaluating explainability are not trivial tasks. This paper presents a new model-agnostic metric to measure the Degree of Explainability of information in an objective way. We exploit a specific theoretical model from Ordinary Language Philosophy, Achinstein's Theory of Explanations, implemented with an algorithm relying on deep language models for knowledge graph extraction and information retrieval. To understand whether this metric can measure explainability, we devised a few experiments and user studies involving more than 190 participants, evaluating two realistic systems for healthcare and finance built on well-known AI technology, including Artificial Neural Networks and TreeSHAP. The results we obtained are statistically significant (with P values lower than .01), suggesting that our proposed metric for measuring the Degree of Explainability is robust in several scenarios and that it aligns with concrete expectations.

This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP methods are efficient iterative algorithms for solving image inverse problems formulated as the minimization of the sum of a data-fidelity term and a regularization term. PnP methods perform regularization by plugging a pre-trained denoiser into a proximal algorithm, such as Proximal Gradient Descent (PGD). To ensure convergence of PnP schemes, many works study specific parametrizations of deep denoisers. However, existing results require either unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive conditions on the parameters of the inverse problem. Observing that these limitations can be attributed to the proximal algorithm in use, we study a relaxed version of the PGD algorithm for minimizing the sum of a convex function and a weakly convex one. When combined with a relaxed proximal denoiser, the proposed PnP-$\alpha$PGD algorithm is shown to converge for a wider range of regularization parameters, thus allowing more accurate image restoration.
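
To make the iteration concrete, here is a minimal sketch of a relaxed plug-and-play proximal gradient step in which a denoiser stands in for the proximal operator of the regularizer; the relaxation form, the soft-thresholding "denoiser", and the data-fidelity term below are illustrative assumptions, not the paper's exact PnP-$\alpha$PGD scheme.

```python
import numpy as np

def pnp_relaxed_pgd(x0, grad_fid, denoiser, step=1.0, alpha=0.8, iters=100):
    """Relaxed plug-and-play proximal gradient descent (illustrative sketch).

    x_{k+1} = (1 - alpha) * x_k + alpha * D(x_k - step * grad_fid(x_k)),
    where D plays the role of the proximal operator of the regularizer.
    """
    x = x0.copy()
    for _ in range(iters):
        z = x - step * grad_fid(x)                   # gradient step on the data-fidelity term
        x = (1 - alpha) * x + alpha * denoiser(z)    # relaxed "proximal" step via the denoiser
    return x

# Toy example: denoising y = x_true + noise with a soft-thresholding "denoiser".
rng = np.random.default_rng(1)
x_true = np.zeros(100); x_true[::10] = 1.0
y = x_true + 0.1 * rng.standard_normal(100)
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.1, 0.0)
x_hat = pnp_relaxed_pgd(np.zeros(100), lambda x: x - y, soft, step=1.0, alpha=0.8)
print(np.linalg.norm(x_hat - x_true))
```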

The purpose of this paper is to examine the sampling problem through Euler discretization, where the potential function is assumed to be a mixture of locally smooth distributions and weakly dissipative. We introduce the $\alpha_{G}$-mixture locally smooth and $\alpha_{H}$-mixture locally Hessian smooth conditions, which are novel and typically satisfied by mixtures of distributions. Under our conditions, we prove convergence in Kullback-Leibler (KL) divergence, with the number of iterations needed to reach an $\epsilon$-neighborhood of the target distribution depending only polynomially on the dimension. The convergence rate improves when the potential is $1$-smooth and $\alpha_{H}$-mixture locally Hessian smooth. Our result for potentials that are not strongly convex outside the ball of radius $R$ is obtained by convexifying the non-convex domains. In addition, we provide some useful theoretical properties of $p$-generalized Gaussian smoothing and prove convergence in the $L_{\beta}$-Wasserstein distance for stochastic gradients in a general setting.
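
For reference, the Euler discretization referred to above is the unadjusted Langevin step x_{k+1} = x_k - γ∇U(x_k) + sqrt(2γ) ξ_k with ξ_k standard Gaussian; the sketch below runs it on an illustrative two-component Gaussian mixture (the potential, step size, and iteration count are assumptions chosen for illustration).

```python
import numpy as np

def unadjusted_langevin(grad_U, x0, step=1e-2, iters=10_000, rng=None):
    """Euler discretization of the Langevin diffusion targeting exp(-U):
    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * standard Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(iters):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Illustrative target: a two-component Gaussian mixture in 1D (a "mixture of distributions").
def grad_U(x):
    # U(x) = -log(0.5*N(x; -2, 1) + 0.5*N(x; 2, 1)); gradient via mixture responsibilities.
    w1 = np.exp(-0.5 * (x + 2) ** 2)
    w2 = np.exp(-0.5 * (x - 2) ** 2)
    return (w1 * (x + 2) + w2 * (x - 2)) / (w1 + w2)

chain = unadjusted_langevin(grad_U, x0=np.zeros(1))
print(chain.mean(), chain.std())
```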

Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled discrete distributions, is even harder. Existing approaches to address these problems have shortcomings: they are biased, heuristic, work only for some distributions, and/or cannot be applied to all information-theoretic metrics. Here, we propose a fast, semi-analytical estimator for sparsely sampled distributions that is efficient, precise, and general. Its derivation is grounded in probabilistic considerations and uses a hierarchical Bayesian approach to extract as much information as possible from the few observations available. Our approach provides estimates of the Shannon entropy with precision at least comparable to the state of the art, and most often better. It can also be used to obtain accurate estimates of any other information-theoretic metric, including the notoriously challenging Kullback-Leibler divergence. Here, again, our approach performs consistently better than existing estimators.
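
As a baseline for why sparse samples are difficult, the standard plug-in (maximum-likelihood) entropy estimator below is systematically biased downward when many symbols go unobserved; this is only the naive baseline, not the hierarchical Bayesian estimator proposed in the paper.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in Shannon entropy (in nats) from observed counts; biased low for sparse samples."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p))

# A uniform distribution over 1000 symbols has entropy log(1000) ~ 6.91 nats,
# but with only 100 samples most symbols are never seen and the estimate is far lower.
rng = np.random.default_rng(0)
sample = rng.integers(0, 1000, size=100)
counts = np.bincount(sample, minlength=1000)
print(plugin_entropy(counts), np.log(1000))
```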

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test and to test these hypotheses; that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for exploration and testing can introduce massive selection bias and lead to many false discoveries. Selective inference is a framework that allows for performing valid inference even when the same data is reused for exploration and testing. In this work, we are interested in the problem of selective inference for data clustering, where a clustering procedure is used to hypothesize a separation of the data points into a collection of subgroups, and we then wish to test whether these data-dependent clusters in fact represent meaningful differences within the data. Recent work by Gao et al. [2022] provides a framework for selective inference in this setting, where hierarchical clustering produces the cluster assignments; this was later extended to k-means clustering by Chen and Witten [2022]. Both of these works rely on assuming a known covariance structure for the data, but in practice the noise level needs to be estimated, and this is particularly challenging when the true cluster structure is unknown. In our work, we extend to the setting of noise with unknown variance and provide a selective inference method for this more general setting. Empirical results show that our new method is better able to maintain high power while controlling Type I error when the true noise level is unknown.
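
A small simulation (an illustrative sketch, not the paper's method) of the selection-bias problem described above: clustering the data and then naively testing the difference between the resulting clusters produces strongly anti-conservative p-values even when no true cluster structure exists. The data model, clustering algorithm, and test below are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

# Data drawn from a single Gaussian: any "clusters" found are artifacts of the algorithm.
rng = np.random.default_rng(0)
pvals = []
for _ in range(200):
    x = rng.standard_normal((50, 1))
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(x)
    # Naive two-sample t-test that reuses the same data that defined the clusters.
    t, p = stats.ttest_ind(x[labels == 0, 0], x[labels == 1, 0])
    pvals.append(p)

# Under a valid test, about 5% of p-values would fall below 0.05; here far more do.
print(np.mean(np.array(pvals) < 0.05))
```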

Contention resolution schemes (or CR schemes), introduced by Chekuri, Vondrak and Zenklusen, are a class of randomized rounding algorithms for converting a fractional solution to a relaxation for a down-closed constraint family into an integer solution. A CR scheme takes a fractional point $x$ in a relaxation polytope, rounds each coordinate $x_i$ independently to get a possibly non-feasible set, and then drops some elements in order to satisfy the constraints. Intuitively, a contention resolution scheme is $c$-balanced if every element $i$ is selected with probability at least $c \cdot x_i$. It is known that general matroids admit a $(1-1/e)$-balanced CR scheme, and that this is (asymptotically) optimal. This is in particular true for the special case of uniform matroids of rank one. In this work, we provide a simple and explicit monotone CR scheme for uniform matroids of rank $k$ on $n$ elements with a balancedness of $1 - \binom{n}{k}\:\left(1-\frac{k}{n}\right)^{n+1-k}\:\left(\frac{k}{n}\right)^k$, and show that this is optimal. As $n$ grows, this expression converges from above to $1 - e^{-k}k^k/k!$. While this asymptotic bound can be obtained by combining previously known results, these require defining an exponential-sized linear program, as well as using random sampling and the ellipsoid algorithm. Our procedure, on the other hand, has the advantage of being simple and explicit. This scheme extends naturally into an optimal CR scheme for partition matroids.
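
The stated limit can be checked numerically; the short computation below evaluates the balancedness expression from the abstract for a fixed k and growing n and compares it with 1 - e^{-k} k^k / k! (purely a verification of the formula, with k chosen arbitrarily).

```python
from math import comb, exp, factorial

def balancedness(n, k):
    """Balancedness of the CR scheme for a uniform matroid of rank k on n elements."""
    return 1 - comb(n, k) * (1 - k / n) ** (n + 1 - k) * (k / n) ** k

k = 3
limit = 1 - exp(-k) * k ** k / factorial(k)
for n in (10, 100, 1000, 10000):
    print(n, balancedness(n, k), limit)
# The values decrease toward the limit 1 - e^{-k} k^k / k! as n grows, i.e. convergence from above.
```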

In this paper, by designing a normalized nonmonotone search strategy with the Barzilai--Borwein-type step-size, a novel local minimax method (LMM), which is a globally convergent iterative method, is proposed and analyzed to find multiple (unstable) saddle points of nonconvex functionals in Hilbert spaces. Compared to traditional LMMs with monotone search strategies, this approach, which does not require a strict decrease of the objective functional value at each iterative step, is observed to converge faster with less computation. First, based on a normalized iterative scheme coupled with a local peak selection that pulls the iterative point back onto the solution submanifold, and by generalizing the Zhang--Hager (ZH) search strategy from optimization theory to the LMM framework, a normalized ZH-type nonmonotone step-size search strategy is introduced and a novel nonmonotone LMM is constructed. Its feasibility and global convergence are rigorously established under a relaxed monotonicity requirement on the functional values along the iterative sequence. Second, in order to speed up the convergence of the nonmonotone LMM, a globally convergent Barzilai--Borwein-type LMM (GBBLMM) is presented by explicitly constructing the Barzilai--Borwein-type step-size as a trial step-size of the normalized ZH-type nonmonotone step-size search strategy in each iteration. Finally, the GBBLMM algorithm is implemented to find multiple unstable solutions of two classes of semilinear elliptic boundary value problems with variational structure: semilinear elliptic equations with homogeneous Dirichlet boundary conditions, and linear elliptic equations with semilinear Neumann boundary conditions. Extensive numerical results indicate that our approach is very effective and speeds up the LMMs significantly.
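
For the two ingredients named above, here is a hedged sketch in a plain unconstrained-minimization setting (not on the LMM solution submanifold): a Barzilai--Borwein trial step-size combined with the Zhang--Hager nonmonotone acceptance rule, applied to an illustrative ill-conditioned quadratic.

```python
import numpy as np

def bb_zh_descent(f, grad, x0, eta=0.85, delta=1e-4, iters=200):
    """Gradient descent with a Barzilai--Borwein trial step-size accepted under the
    Zhang--Hager nonmonotone rule (illustrative sketch, not the paper's LMM)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    C, Q, step = f(x), 1.0, 1e-2
    for _ in range(iters):
        # Backtrack from the trial step until the ZH nonmonotone condition holds.
        t = step
        while f(x - t * g) > C - delta * t * (g @ g):
            t *= 0.5
        x_new = x - t * g
        g_new = grad(x_new)
        # Barzilai--Borwein step-size for the next iteration.
        s, y = x_new - x, g_new - g
        step = (s @ s) / (s @ y) if s @ y > 1e-12 else 1e-2
        # Zhang--Hager reference value update: C tracks a weighted average of past values.
        Q_new = eta * Q + 1.0
        C = (eta * Q * C + f(x_new)) / Q_new
        Q = Q_new
        x, g = x_new, g_new
    return x

# Illustrative test: minimize an ill-conditioned quadratic 0.5 * x^T A x.
A = np.diag(np.linspace(1.0, 100.0, 20))
x_min = bb_zh_descent(lambda x: 0.5 * x @ A @ x, lambda x: A @ x, np.ones(20))
print(np.linalg.norm(x_min))
```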

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. It is shown that GD with small initialization behaves similarly to the greedy low-rank learning heuristics (Li et al., 2020) and follows an incremental learning procedure (Gissin et al., 2019): GD sequentially learns solutions with increasing rank until it recovers the ground-truth matrix. Compared to existing works, which analyze only the first learning phase for rank-1 solutions, our result characterizes the whole learning process. Moreover, besides the over-parameterized regime that many prior works focused on, our analysis of the incremental learning procedure also applies to the under-parameterized regime. Finally, we conduct numerical experiments to confirm our theoretical findings.
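
A minimal sketch of the setting analyzed in this abstract: gradient descent with small initialization on a symmetric matrix sensing loss, where the singular values of the factor grow one at a time. The measurement ensemble, problem sizes, and step size are illustrative choices, not those of the paper.

```python
import numpy as np

# Symmetric matrix sensing: recover a rank-2 PSD matrix from random linear measurements.
rng = np.random.default_rng(0)
n, r_true, m = 20, 2, 600
Ustar = rng.standard_normal((n, r_true))
Xstar = Ustar @ Ustar.T                      # low-rank ground-truth matrix
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
A = (A + A.transpose(0, 2, 1)) / 2           # symmetrized, near-isotropic measurement matrices
y = np.einsum('mij,ij->m', A, Xstar)

# Over-parameterized factor U (n x n) with small initialization.
U = 1e-3 * rng.standard_normal((n, n))
step = 0.002
for t in range(2001):
    residual = np.einsum('mij,ij->m', A, U @ U.T) - y
    grad = 2 * np.einsum('m,mij->ij', residual, A) @ U   # gradient of 0.5*||A(UU^T) - y||^2 in U
    U -= step * grad
    if t % 500 == 0:
        # The leading singular values of U grow one after another: incremental rank learning.
        print(t, np.round(np.linalg.svd(U, compute_uv=False)[:4], 3))
print(np.linalg.norm(U @ U.T - Xstar))
```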

The geometric optimisation of crystal structures is a procedure widely used in chemistry that changes the geometrical placement of the particles inside a structure. It is called structural relaxation and constitutes a local minimization problem with a non-convex objective function whose domain complexity increases with the number of particles involved. In this work we study the performance of the two most popular first-order optimisation methods, Gradient Descent and Conjugate Gradient, in structural relaxation; the respective pseudocodes can be found in Section 6. Although frequently employed, these methods have received little study in this context from an algorithmic point of view. In order to accurately define the problem, we provide a thorough derivation of all necessary formulae related to the crystal structure energy function and its differentiation. We run each algorithm with a constant step size, which provides a benchmark for the methods' analysis and direct comparison. We also design dynamic step-size rules and study how these improve the two algorithms' performance. Our results show that there is a trade-off between convergence rate and the probability that an experiment succeeds, hence we construct a function to assign utility to each method based on the respective preferences. The function is built according to a recently introduced model of preference indication concerning algorithms with deadlines and their run times. Finally, building on all our insights from the experimental results, we provide algorithmic recipes that best correspond to each of the presented preferences and select one recipe as optimal for equally weighted preferences.
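
As a concrete illustration of structural relaxation by a first-order method with a constant step size, the sketch below runs gradient descent on a simple pairwise Lennard-Jones energy; the Lennard-Jones potential, the four-particle cluster, and the step size are stand-in assumptions, not the crystal-structure energy function derived in the paper.

```python
import numpy as np

def lj_energy_and_grad(pos, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a small cluster and its gradient w.r.t. positions."""
    n = len(pos)
    energy, grad = 0.0, np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            r = np.linalg.norm(d)
            sr6 = (sigma / r) ** 6
            energy += 4 * eps * (sr6 ** 2 - sr6)
            # dE/d(pos[i]) for this pair: 4*eps*(-12*sr6^2 + 6*sr6) / r^2 * d
            f = 4 * eps * (-12 * sr6 ** 2 + 6 * sr6) / r ** 2 * d
            grad[i] += f
            grad[j] -= f
    return energy, grad

# Constant-step gradient descent ("structural relaxation") of a perturbed tetrahedral cluster.
rng = np.random.default_rng(0)
a = 1.12  # near the pair-equilibrium distance 2^(1/6)*sigma
pos = np.array([[0.0, 0.0, 0.0], [a, 0.0, 0.0],
                [a / 2, a * 0.866, 0.0], [a / 2, a * 0.289, a * 0.816]])
pos += 0.02 * rng.standard_normal(pos.shape)
step = 1e-3
for _ in range(2000):
    _, g = lj_energy_and_grad(pos)
    pos -= step * g
print("relaxed energy:", lj_energy_and_grad(pos)[0])
```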

Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
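
To illustrate the two-stage template mentioned above, here is a minimal sketch of spectral initialization followed by gradient refinement for symmetric matrix sensing; it is a generic instance of the template with illustrative sizes and measurements, not any specific algorithm reviewed in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 15, 2, 400
Ustar = rng.standard_normal((n, r))
Xstar = Ustar @ Ustar.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('mij,ij->m', A, Xstar)

# Stage 1: spectral initialization from the surrogate matrix (1/m) * sum_i y_i A_i.
M = np.einsum('m,mij->ij', y, A) / m
vals, vecs = np.linalg.eigh(M)
idx = np.argsort(vals)[-r:]                                  # top-r eigenpairs
U = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Stage 2: successive refinement by gradient descent on the factorized least-squares loss.
step = 0.005
for _ in range(500):
    residual = np.einsum('mij,ij->m', A, U @ U.T) - y
    U -= step * (2 / m) * np.einsum('m,mij->ij', residual, A) @ U
print(np.linalg.norm(U @ U.T - Xstar) / np.linalg.norm(Xstar))
```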
