
In this paper, we propose a gradient-based block coordinate descent (BCD-G) framework to solve the joint approximate diagonalization of matrices defined on the product of the complex Stiefel manifold and the special linear group. Instead of updating the blocks in a cyclic fashion, we choose the block to optimize based on the Riemannian gradient. To update the first block variable in the complex Stiefel manifold, we use the well-known line search descent method. To update the second block variable in the special linear group, based on four different kinds of elementary transformations, we construct three classes: GLU, GQU and GU, and then obtain three BCD-G algorithms: BCD-GLU, BCD-GQU and BCD-GU. We establish the global and weak convergence of these three algorithms using the Łojasiewicz gradient inequality under the assumption that the iterates are bounded. We also propose a gradient-based Jacobi-type framework to solve the joint approximate diagonalization of matrices defined on the special linear group. As in the BCD-G case, using the GLU and GQU classes of elementary transformations, we focus on the Jacobi-GLU and Jacobi-GQU algorithms and establish their global and weak convergence. All the algorithms and convergence results described in this paper also apply to the real case.

Related content

Robustness · Functional · Loss function (machine learning) · Learning · Examples

June 9, 2023

Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees

Landi Zhu, Mert Gürbüzbalaban, Andrzej Ruszczyński

We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem with the use of the mean-semideviation risk measure. In this setting, we consider a single time-scale algorithm, involving two versions of the inner function value tracking: linearized tracking of a continuously differentiable loss function, and SPIDER tracking of a weakly convex loss function. We adopt the norm of the gradient of the Moreau envelope as our measure of stationarity and show that a sample complexity of $\mathcal{O}(\varepsilon^{-3})$ is achievable in both cases, with only the constant larger in the second case. Finally, we demonstrate the performance of our algorithm with a robust learning example and a weakly convex, non-smooth regression example.
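As a rough illustration of the risk measure named in this abstract, the sketch below computes an empirical first-order mean-upper-semideviation, $\rho(Z) = \mathbb{E}[Z] + \kappa\,\mathbb{E}[\max(0, Z - \mathbb{E}[Z])]$ with $\kappa \in [0, 1]$. The function name `mean_semideviation`, the parameter `kappa`, and the sample losses are assumptions made for illustration. The paper's single time-scale algorithm and its tracking schemes are not reproduced here.

```python
def mean_semideviation(losses, kappa=0.5):
    """Empirical mean plus kappa times the mean upper semideviation.

    Assumed form: rho(Z) = E[Z] + kappa * E[max(0, Z - E[Z])], 0 <= kappa <= 1.
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    if not losses:
        raise ValueError("losses must be non-empty")
    n = len(losses)
    mean = sum(losses) / n
    # Only losses above the mean are penalized (upper semideviation).
    upper = sum(max(0.0, z - mean) for z in losses) / n
    return mean + kappa * upper


# Hypothetical per-sample losses: mean is 3.0, only the loss 6.0 exceeds it.
sample_losses = [1.0, 2.0, 3.0, 6.0]
risk = mean_semideviation(sample_losses, kappa=0.5)  # 3.0 + 0.5 * 0.75 = 3.375
```

The nesting of the mean inside the outer expectation is what makes the objective a two-level composition: the inner level is $\mathbb{E}[Z]$ and the outer level depends on it.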

Square matrix · Linear · Unbiased estimation · Coordinate descent · Motivation

June 9, 2023

Linearly convergent adjoint free solution of least squares problems by random descent

Dirk A. Lorenz, Felix Schneppe, Lionel Tondji

We consider the problem of solving linear least squares problems in a framework where only evaluations of the linear map are possible. We derive randomized methods that do not need any matrix operations other than forward evaluations; in particular, no evaluation of the adjoint map is needed. Our method is motivated by the simple observation that one can get an unbiased estimate of the application of the adjoint. We show convergence of the method and then derive a more efficient method that uses an exact linesearch. This method, called random descent, resembles known methods in other contexts and has the randomized coordinate descent method as a special case. We provide a convergence analysis of the random descent method emphasizing the dependence on the underlying distribution of the random vectors. Furthermore, we investigate the applicability of the method in the context of ill-posed inverse problems and show that the method can have beneficial properties when the unknown solution is rough. We illustrate the theoretical findings in numerical examples. One particular result is that the random descent method actually outperforms established transpose-free methods (TFQMR and CGS) in examples.
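The exact-linesearch idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the direction $d$ is drawn from a standard Gaussian, and the step $t = \langle Ad, Ax - b\rangle / \|Ad\|^2$ minimizes $\|A(x - td) - b\|^2$ along $d$. The names `random_descent` and `apply_A` and the small test problem are hypothetical. Only forward evaluations of the map are used.

```python
import random


def random_descent(apply_A, b, n, steps=500, seed=0):
    """Minimize ||A x - b||^2 using only forward evaluations v -> A v."""
    rng = random.Random(seed)
    x = [0.0] * n
    r = [-bi for bi in b]  # residual A x - b at x = 0 (A is linear)
    for _ in range(steps):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random search direction
        Ad = apply_A(d)                               # one forward evaluation
        denom = sum(v * v for v in Ad)
        if denom == 0.0:
            continue  # direction lies in the null space, no progress possible
        # Exact linesearch: minimizer of ||r - t * Ad||^2 over t.
        t = sum(u * v for u, v in zip(Ad, r)) / denom
        x = [xi - t * di for xi, di in zip(x, d)]
        r = [ri - t * v for ri, v in zip(r, Ad)]      # update residual in place
    return x


# Hypothetical 3x2 problem with a known solution (1, -2).
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def apply_A(v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


b = apply_A([1.0, -2.0])
x = random_descent(apply_A, b, n=2)
```

Drawing `d` uniformly from the standard basis vectors instead of a Gaussian would give a randomized coordinate descent step, the special case the abstract mentions.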

Approximation · Learning · Random variable · Monte Carlo · Stochastic gradient descent

June 8, 2023

Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing

Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, Philippe von Wurstemberger

from arxiv, 71 pages, 4 Figures, 14 Tables; to appear in Math. Finance

In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation, the MC sample size usually needs to be considerably large, resulting in a long computing time to obtain a single approximation. In this paper we introduce a new approximation strategy for parametric approximation problems including the parametric financial pricing problems described above. A central aspect of the approximation strategy proposed in this article is to combine MC algorithms with machine learning techniques to, roughly speaking, learn the random variables (LRV) in MC simulations. In other words, we employ stochastic gradient descent (SGD) optimization methods not to train parameters of standard artificial neural networks (ANNs) but to learn random variables appearing in MC approximations. We numerically test the LRV strategy on various parametric problems with convincing results when compared with standard MC simulations, Quasi-Monte Carlo simulations, SGD-trained shallow ANNs, and SGD-trained deep ANNs. Our numerical simulations strongly indicate that the LRV strategy might be capable of overcoming the curse of dimensionality in the $L^\infty$-norm in several cases where the standard deep learning approach has been proven not to be able to do so. This does not contradict lower bounds established in the scientific literature because this new LRV strategy is outside of the class of algorithms for which lower bounds have been established. The proposed LRV strategy is of a general nature and is not restricted to the parametric financial pricing problems described above, but is applicable to a large class of approximation problems.

Networking · Lipschitz continuity · Continuity · Lipschitz · Smoothness

June 7, 2023

Achieving Consensus over Compact Submanifolds

Jiang Hu, Jiaojiao Zhang, Kangkang Deng

from arxiv, 25 pages

We consider the consensus problem in a decentralized network, focusing on a compact submanifold that acts as a nonconvex constraint set. By leveraging the proximal smoothness of the compact submanifold, which encompasses the local singleton property and the local Lipschitz continuity of the projection operator on the manifold, and establishing the connection between the projection operator and general retraction, we show that the Riemannian gradient descent with a unit step size has locally linear convergence if the network has a satisfactory level of connectivity. Moreover, based on the geometry of the compact submanifold, we prove that a convexity-like regularity condition, referred to as the restricted secant inequality, always holds in an explicitly characterized neighborhood around the solution set of the nonconvex consensus problem. By leveraging this restricted secant inequality and imposing a weaker connectivity requirement on the decentralized network, we present a comprehensive analysis of the linear convergence of the Riemannian gradient descent, taking into consideration appropriate initialization and step size. Furthermore, if the network is well connected, we demonstrate that the local Lipschitz continuity endowed by proximal smoothness is a sufficient condition for the restricted secant inequality, thus contributing to the local error bound. We believe that our established results will find further applications in consensus problems over more general proximally smooth sets. Numerical experiments are conducted to validate our theoretical findings.
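To make the setting in this abstract concrete, here is a small sketch of Riemannian gradient descent with unit step size for consensus on the unit sphere, one example of a compact submanifold. Several choices are assumptions made for illustration rather than details taken from the paper: the objective $\varphi(x) = \tfrac14 \sum_{i,j} W_{ij}\|x_i - x_j\|^2$, a symmetric ring-graph weight matrix `W` whose rows sum to 1, normalization as the retraction, and a starting configuration in which all agents are already close together.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def consensus_step(points, W, step=1.0):
    """One synchronous Riemannian gradient step for every agent on the sphere."""
    new_points = []
    for i, x in enumerate(points):
        # Weighted average of the neighbours' points (row i of W sums to 1).
        avg = [sum(W[i][j] * points[j][k] for j in range(len(points)))
               for k in range(len(x))]
        g = [xk - ak for xk, ak in zip(x, avg)]         # Euclidean gradient
        radial = sum(gk * xk for gk, xk in zip(g, x))
        rg = [gk - radial * xk for gk, xk in zip(g, x)]  # tangent-space projection
        # Retraction: step in the ambient space, then project back to the sphere.
        new_points.append(normalize([xk - step * rk for xk, rk in zip(x, rg)]))
    return new_points


def max_disagreement(points):
    return max(math.dist(p, q) for p in points for q in points)


# Ring of 5 agents; each averages itself and its two neighbours equally.
n = 5
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i][j % n] = 1.0 / 3.0

# Agents start near the north pole, so the iteration stays in a local regime.
points = [normalize([0.3 * math.cos(i), 0.3 * math.sin(i), 1.0]) for i in range(n)]
initial_spread = max_disagreement(points)
for _ in range(100):
    points = consensus_step(points, W)
final_spread = max_disagreement(points)
```

Comparing `initial_spread` with `final_spread` shows whether the agents have agreed on a common point. The sketch says nothing about behaviour from a poorly initialized or weakly connected configuration, which is where the abstract's conditions matter.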

Estimation/estimator · Learning · Principal component regression · Scaling · Operation

June 7, 2023

Estimating Koopman operators with sketching to provably learn large scale dynamical systems

Giacomo Meanti, Antoine Chatalic, Vladimir R. Kostic, Pietro Novelli, Massimiliano Pontil, Lorenzo Rosasco

from arxiv, 9 pages, 4 figures

The theory of Koopman operators makes it possible to deploy non-parametric machine learning algorithms to predict and analyze complex dynamical systems. Estimators such as principal component regression (PCR) or reduced rank regression (RRR) in kernel spaces can be shown to provably learn Koopman operators from finite empirical observations of the system's time evolution. Scaling these approaches to very long trajectories is a challenge and requires introducing suitable approximations to make computations feasible. In this paper, we boost the efficiency of different kernel-based Koopman operator estimators using random projections (sketching). We derive, implement and test the new "sketched" estimators with extensive experiments on synthetic and large-scale molecular dynamics datasets. Further, we establish non-asymptotic error bounds giving a sharp characterization of the trade-offs between statistical learning rates and computational efficiency. Our empirical and theoretical analysis shows that the proposed estimators provide a sound and efficient way to learn large scale dynamical systems. In particular, our experiments indicate that the proposed estimators retain the same accuracy as PCR or RRR, while being much faster.

Unification · Extensibility · Full · Examples · Paper

June 7, 2023

E-unification for Second-Order Abstract Syntax

Nikolai Kudasov

from arxiv, An extended version (with a few more examples and some extra remarks)

Higher-order unification (HOU) concerns unification of (extensions of) $\lambda$-calculus and can be seen as an instance of equational unification ($E$-unification) modulo $\beta\eta$-equivalence of $\lambda$-terms. We study equational unification of terms in languages with arbitrary variable binding constructions modulo arbitrary second-order equational theories. Abstract syntax with general variable binding and parametrised metavariables allows us to work with arbitrary binders without committing to $\lambda$-calculus or using inconvenient and error-prone term encodings, leading to a more flexible framework. In this paper, we introduce $E$-unification for second-order abstract syntax and describe a unification procedure for such problems, merging ideas from both full HOU and general $E$-unification. We prove that the procedure is sound and complete.

Robustness · Anchor · Representation · Reducible · Extensibility

June 7, 2023

A2B: Anchor to Barycentric Coordinate for Robust Correspondence

Weiyue Zhao, Hao Lu, Zhiguo Cao, Xin Li

from arxiv, Accepted by International Journal of Computer Vision

There is a long-standing problem of repeated patterns in correspondence problems, where mismatches frequently occur because of inherent ambiguity. The unique position information associated with repeated patterns makes coordinate representations a useful supplement to appearance representations for improving feature correspondences. However, the issue of appropriate coordinate representation has remained unresolved. In this study, we demonstrate that geometric-invariant coordinate representations, such as barycentric coordinates, can significantly reduce mismatches between features. The first step is to establish a theoretical foundation for geometrically invariant coordinates. We present a seed matching and filtering network (SMFNet) that combines feature matching and consistency filtering with a coarse-to-fine matching strategy in order to acquire reliable sparse correspondences. We then introduce DEGREE, a novel anchor-to-barycentric (A2B) coordinate encoding approach, which generates multiple affine-invariant correspondence coordinates from paired images. DEGREE can be used as a plug-in with standard descriptors, feature matchers, and consistency filters to improve the matching quality. Extensive experiments in synthesized indoor and outdoor datasets demonstrate that DEGREE alleviates the problem of repeated patterns and helps achieve state-of-the-art performance. Furthermore, DEGREE also reports competitive performance in the third Image Matching Challenge at CVPR 2021. This approach offers a new perspective to alleviate the problem of repeated patterns and emphasizes the importance of choosing coordinate representations for feature correspondences.
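The affine invariance that motivates barycentric coordinates in this abstract can be checked with a short sketch. It computes the barycentric weights of a 2D point with respect to three anchor points and recomputes them after applying the same affine map to all four points. The helper names `barycentric` and `affine`, the triangle, and the map are hypothetical. DEGREE's actual encoding and anchor selection are not reproduced here.

```python
def barycentric(p, a, b, c):
    """Barycentric weights (w_a, w_b, w_c) of point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("anchors are collinear; triangle is degenerate")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return (w_a, w_b, 1.0 - w_a - w_b)


def affine(p, M, t):
    """Apply the affine map p -> M p + t to a 2D point."""
    return (M[0][0] * p[0] + M[0][1] * p[1] + t[0],
            M[1][0] * p[0] + M[1][1] * p[1] + t[1])


# Hypothetical anchors and query point in the first image.
a, b, c, p = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)
weights = barycentric(p, a, b, c)  # (0.5, 0.25, 0.25)

# The same configuration seen through an invertible affine map (shear, scale, shift).
M, t = [[2.0, 1.0], [0.0, 3.0]], (5.0, -1.0)
mapped = barycentric(*(affine(q, M, t) for q in (p, a, b, c)))
```

The weights in `mapped` match those in `weights` even though the raw pixel coordinates of every point changed, which is the property that makes such coordinates useful for disambiguating repeated patterns.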

Estimation/estimator · Marginal likelihood function · Hyperparameter · Marginalization · Likelihood

June 6, 2023

Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

Alexander Immer, Tycho F. A. van der Ouderaa, Mark van der Wilk, Gunnar Rätsch, Bernhard Schölkopf

from arxiv, ICML 2023

Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients computed on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and make it possible to trade off estimation accuracy against computational complexity. We derive them using the function-space form of the linearized Laplace, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.

Estimation/estimator · Estimation error · Models · Learned · Unbiased

December 17, 2020

The Causal Learning of Retail Delinquency

Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang

from arxiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be very large. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

Models · Transformer model · Transformation · Inference · Model evaluation

June 23, 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

from arxiv, ICML 2020

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.