We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure, such as lying in a union of subspaces, belonging to an algebraic variety, or being grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Łojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of unions of subspaces and for clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches; in particular, the Riemannian second-order methods achieve high recovery accuracy.
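To make the lifting idea concrete, here is a minimal numpy sketch (the degree-2 monomial map below is an illustrative assumption; the paper's feature maps and Grassmann-based solvers are more general): columns drawn from a union of subspaces give a matrix that is high-rank in the ambient space, yet rank-deficient after the nonlinear feature map, which is what makes rank minimization in feature space meaningful.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_features(X, degree=2):
    """Degree-d monomial (Veronese-like) feature map applied columnwise."""
    n, N = X.shape
    idx = list(combinations_with_replacement(range(n), degree))
    return np.array([[np.prod(X[list(c), j]) for j in range(N)] for c in idx])

rng = np.random.default_rng(0)
n, N = 8, 60
# Columns drawn from a union of two 2-dimensional subspaces: full rank 4 in
# the original space, but rank-deficient after the quadratic lifting.
U1, U2 = rng.standard_normal((n, 2)), rng.standard_normal((n, 2))
X = np.hstack([U1 @ rng.standard_normal((2, N // 2)),
               U2 @ rng.standard_normal((2, N // 2))])
print(np.linalg.matrix_rank(X))                     # 4 (sum of subspace dims)
print(np.linalg.matrix_rank(monomial_features(X)))  # well below C(9, 2) = 36
```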
This paper introduces a novel method for the efficient and accurate computation of volume fractions on unstructured polyhedral meshes, where the phase boundary is an orientable hypersurface, implicitly given as the iso-contour of a sufficiently smooth level-set function. Locally, i.e.~in each mesh cell, we compute a principal coordinate system in which the hypersurface can be approximated as the graph of an osculating paraboloid. A recursive application of the \textsc{Gaussian} divergence theorem then allows us to analytically transform the volume integrals into curve integrals associated with the polyhedron faces, which can easily be approximated numerically by standard \textsc{Gauss-Legendre} quadrature. This face-based formulation makes the method applicable to unstructured meshes and considerably simplifies the numerical procedure for applications in three spatial dimensions. We discuss the theoretical foundations and provide details of the numerical algorithm. Finally, we present numerical results for convex and non-convex hypersurfaces embedded in cuboidal and tetrahedral meshes, showing both high accuracy and third- to fourth-order convergence with spatial resolution.
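The following is a minimal 2D analogue of the face-based idea (an illustration only, not the paper's 3D algorithm): the divergence theorem reduces a volume (here, area) integral to a boundary integral, which standard Gauss-Legendre quadrature then evaluates segment by segment.

```python
import numpy as np

# The divergence theorem turns the area integral A = \int_\Omega 1 dV into
# the boundary integral A = 1/2 \oint (x dy - y dx); each parameter segment
# is evaluated with Gauss-Legendre quadrature.
def enclosed_area(curve, dcurve, t0=0.0, t1=2 * np.pi, n_segments=16, n_gauss=5):
    nodes, weights = np.polynomial.legendre.leggauss(n_gauss)
    edges = np.linspace(t0, t1, n_segments + 1)
    area = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        t = 0.5 * (b - a) * nodes + 0.5 * (a + b)   # map nodes to [a, b]
        x, y = curve(t)
        dx, dy = dcurve(t)
        area += 0.5 * (b - a) * np.sum(weights * 0.5 * (x * dy - y * dx))
    return area

circle = lambda t: (np.cos(t), np.sin(t))
dcircle = lambda t: (-np.sin(t), np.cos(t))
print(enclosed_area(circle, dcircle), np.pi)  # agree to high accuracy
```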
Network traffic data consists of packets of bytes transmitted under different network protocols, and these packets exhibit complex, time-varying, non-linear relationships. Existing state-of-the-art methods address this challenge by fusing features into multiple subsets based on correlations and using hybrid classification techniques that extract spatial and temporal characteristics. This often incurs high computational cost and requires manual support, which limits their use for real-time processing of network traffic. To address this, we propose a novel feature extraction method based on covariance matrices that extracts the spatio-temporal characteristics of network traffic data for detecting malicious network traffic behavior. The covariance matrices in our proposed method not only naturally encode the mutual relationships between different network traffic values but also possess a well-defined geometry: they lie on a Riemannian manifold. This manifold is equipped with distance metrics that facilitate extracting discriminative features for detecting malicious network traffic. We evaluated our model on the NSL-KDD and UNSW-NB15 datasets and showed that our proposed method significantly outperforms conventional methods and other existing studies on these datasets.
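As a sketch of the descriptor pipeline, the snippet below builds a covariance feature from a window of traffic records and compares two such features with the log-Euclidean metric, one common choice of distance on the SPD manifold (the specific metric, window shape, and synthetic data here are assumptions for illustration):

```python
import numpy as np
from scipy.linalg import logm

def covariance_feature(window, eps=1e-6):
    """SPD covariance descriptor of a (samples x features) traffic window."""
    C = np.cov(window, rowvar=False)
    return C + eps * np.eye(C.shape[0])  # regularize to keep it SPD

def log_euclidean_distance(C1, C2):
    """Distance on the SPD manifold under the log-Euclidean metric."""
    return np.linalg.norm(logm(C1) - logm(C2), ord='fro')

rng = np.random.default_rng(0)
benign = covariance_feature(rng.standard_normal((200, 5)))
attack = covariance_feature(3.0 * rng.standard_normal((200, 5)) + 1.0)
print(log_euclidean_distance(benign, attack))  # large gap separates the classes
```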
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitude of the extremal values of the batch Hessian is larger than that of the empirical Hessian. We also derive similar results for the Generalised Gauss-Newton matrix approximation of the Hessian. As a consequence of our theorems, we derive analytical expressions for the maximal learning rate as a function of batch size, informing practical training regimens for both stochastic gradient descent (linear scaling) and adaptive algorithms, such as Adam (square root scaling), for smooth, non-convex deep neural networks. Whilst the linear scaling for stochastic gradient descent has been derived under more restrictive conditions, which we generalise, the square root scaling rule for adaptive optimisers is, to our knowledge, completely novel. We validate our claims on the VGG/WideResNet architectures on the CIFAR-$100$ and ImageNet datasets. Based on our investigations of the sub-sampled Hessian, we develop a stochastic Lanczos quadrature based, on-the-fly learning rate and momentum learner, which avoids the need for expensive multiple evaluations of these key hyper-parameters and shows good preliminary results on the Pre-Residual architecture for CIFAR-$100$.
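The two scaling rules stated above translate directly into a batch-size-dependent learning-rate schedule; a minimal sketch follows (the base learning rates and batch size are made-up values, not ones from the paper):

```python
# Linear scaling for SGD, square-root scaling for adaptive optimisers (Adam),
# relative to a reference (base_lr, base_batch) configuration.
def scaled_lr(base_lr, base_batch, batch, rule):
    if rule == "sgd":      # linear scaling
        return base_lr * batch / base_batch
    if rule == "adam":     # square-root scaling
        return base_lr * (batch / base_batch) ** 0.5
    raise ValueError(rule)

print(scaled_lr(0.1, 128, 512, "sgd"))    # 0.4
print(scaled_lr(1e-3, 128, 512, "adam"))  # 2e-3
```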
Minimum residual methods such as the least-squares finite element method (FEM) or the discontinuous Petrov--Galerkin method with optimal test functions (DPG) usually exclude singular data, e.g., non-square-integrable loads. We consider a DPG method and a least-squares FEM for the Poisson problem. For both methods we analyze regularization approaches that allow the use of $H^{-1}$ loads, and also study the case of point loads. For all cases we prove appropriate convergence orders. We present various numerical experiments that confirm our theoretical results. Our approach extends to general well-posed second-order problems.
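For orientation, here is a minimal 1D least-squares FEM for $-u'' = f$ on $(0,1)$ with $u(0)=u(1)=0$, written as the first-order system $\sigma = u'$, $\sigma' = -f$ and minimizing $J(u,\sigma) = \|\sigma - u'\|^2 + \|\sigma' + f\|^2$ over P1 spaces. This sketch uses a smooth load; the paper's contribution is precisely the regularized treatment of rough ($H^{-1}$ and point) loads, and it also covers DPG.

```python
import numpy as np

n = 64
x = np.linspace(0.0, 1.0, n + 1)
h = x[1] - x[0]
f = lambda t: np.pi**2 * np.sin(np.pi * t)   # exact solution u = sin(pi x)

# P1 element matrices: mass Me, stiffness Ke, and Ce_ij = \int N_i N_j' dx.
Me = h / 6 * np.array([[2.0, 1.0], [1.0, 2.0]])
Ke = 1 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
Ce = 0.5 * np.array([[-1.0, 1.0], [-1.0, 1.0]])

N = n + 1
M = np.zeros((N, N)); K = np.zeros((N, N)); C = np.zeros((N, N))
Fp = np.zeros(N)  # Fp_i = \int N_i' f dx (midpoint rule per element)
for e in range(n):
    d = [e, e + 1]
    M[np.ix_(d, d)] += Me; K[np.ix_(d, d)] += Ke; C[np.ix_(d, d)] += Ce
    Fp[d] += np.array([-1.0, 1.0]) * f(0.5 * (x[e] + x[e + 1]))

# Normal equations of J in the unknowns (sigma, u):
#   (M + K) S - C U = -Fp,   -C^T S + K U = 0,
# with homogeneous Dirichlet conditions imposed on u only.
A = np.block([[M + K, -C], [-C.T, K]])
b = np.concatenate([-Fp, np.zeros(N)])
keep = list(range(N)) + list(range(N + 1, 2 * N - 1))  # drop u(0), u(1)
sol = np.zeros(2 * N)
sol[keep] = np.linalg.solve(A[np.ix_(keep, keep)], b[keep])
u = sol[N:]
print(np.max(np.abs(u - np.sin(np.pi * x))))  # small discretization error
```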
Normalizing flows are invertible neural networks with tractable change-of-volume terms, which allow optimization of their parameters to be efficiently performed via maximum likelihood. However, data of interest are typically assumed to live in some (often unknown) low-dimensional manifold embedded in a high-dimensional ambient space. The result is a modelling mismatch since -- by construction -- the invertibility requirement implies high-dimensional support of the learned distribution. Injective flows, mappings from low- to high-dimensional spaces, aim to fix this discrepancy by learning distributions on manifolds, but the resulting volume-change term becomes more challenging to evaluate. Current approaches either avoid computing this term entirely using various heuristics, or assume the manifold is known beforehand and therefore are not widely applicable. Instead, we propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model, relying on careful use of automatic differentiation and techniques from numerical linear algebra. Both approaches perform end-to-end nonlinear manifold learning and density estimation for data projected onto this manifold. We study the trade-offs between our proposed methods, empirically verify that we outperform approaches ignoring the volume-change term by more accurately learning manifolds and the corresponding distributions on them, and show promising results on out-of-distribution detection. Our code is available at //github.com/layer6ai-labs/rectangular-flows.
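To see what the volume-change term looks like, the sketch below evaluates it by brute force for a tiny injective map (the toy network and dimensions are assumptions; the paper's point is to compute this gradient *without* materializing the Jacobian, which is only feasible here because $d$ is tiny):

```python
import torch

# For an injective map g: R^d -> R^D (d < D), the model log-likelihood
# contains the term -(1/2) * logdet(J^T J), with J the Jacobian of g.
d, D = 2, 5
g = torch.nn.Sequential(torch.nn.Linear(d, 16), torch.nn.Tanh(),
                        torch.nn.Linear(16, D))
z = torch.randn(d)
J = torch.autograd.functional.jacobian(g, z, create_graph=True)  # shape (D, d)
log_vol = 0.5 * torch.logdet(J.T @ J)
log_vol.backward()  # gradients w.r.t. the flow parameters pass through logdet
print(log_vol.item())
```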
This paper is concerned with the inverse problem of constructing a symmetric nonnegative matrix from a realizable spectrum. We reformulate the inverse problem as an underdetermined nonlinear matrix equation over a Riemannian product manifold. To solve it, we develop a Riemannian underdetermined inexact Newton dogleg method for solving a general underdetermined nonlinear equation defined between Riemannian manifolds and Euclidean spaces. The global and quadratic convergence of the proposed method is established under some mild assumptions. We then solve the inverse problem by applying the proposed method to its equivalent nonlinear matrix equation, and we also construct a preconditioner for the perturbed normal Riemannian Newton equation. Numerical tests show the efficiency of the proposed method for solving the inverse problem.
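A Euclidean sketch of the underdetermined Newton idea underlying the method (the toy system is mine; the paper works on a Riemannian product manifold and adds a dogleg globalization): for $F:\mathbb{R}^n \to \mathbb{R}^m$ with $m < n$, iterate with the minimum-norm solution of the linearized equation.

```python
import numpy as np

def F(x):  # toy underdetermined system: unit sphere intersected with a plane
    return np.array([x @ x - 1.0, x.sum() - 1.0])

def J(x):  # its 2 x 3 Jacobian
    return np.vstack([2.0 * x, np.ones_like(x)])

x = np.array([1.0, 0.5, -0.2])
for _ in range(10):
    x = x - np.linalg.pinv(J(x)) @ F(x)  # minimum-norm Newton step
print(x, F(x))  # quadratic convergence to a point on the solution curve
```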
Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, with continuous state and action spaces and a Gaussian policy -- common in computer animation and robotics -- PPO is prone to getting stuck in local optima. In this paper, we observe a tendency of PPO to prematurely shrink the exploration variance, which naturally leads to slow progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for policy mean and variance and training the mean and variance in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks.
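A schematic of the separate-passes idea in PyTorch (a simplified sketch, not the authors' full PPO-CMA algorithm; the network sizes, advantage weighting, and data are placeholders):

```python
import torch

obs_dim, act_dim = 4, 2
mean_net = torch.nn.Sequential(torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
                               torch.nn.Linear(64, act_dim))
logvar_net = torch.nn.Sequential(torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
                                 torch.nn.Linear(64, act_dim))

def gaussian_logp(obs, act):
    mu, logvar = mean_net(obs), logvar_net(obs)
    return (-0.5 * ((act - mu) ** 2 / logvar.exp() + logvar)).sum(-1)

obs = torch.randn(256, obs_dim); act = torch.randn(256, act_dim)
adv = torch.randn(256)
w = adv.clamp(min=0.0)  # weight updates toward positive-advantage actions

# Pass 1: only the variance optimizer steps, so the mean stays unchanged;
# this lets exploration expand along improving directions first.
opt_var = torch.optim.Adam(logvar_net.parameters(), lr=3e-4)
opt_var.zero_grad(); (-(w * gaussian_logp(obs, act)).mean()).backward()
opt_var.step()

# Pass 2: only the mean optimizer steps, with the fresh variance held fixed.
opt_mean = torch.optim.Adam(mean_net.parameters(), lr=3e-4)
opt_mean.zero_grad(); (-(w * gaussian_logp(obs, act)).mean()).backward()
opt_mean.step()
```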
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
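The smoothing device behind DRS admits a compact sketch (function names and constants here are assumptions): replace a non-smooth convex $f$ by $f_\gamma(x) = \mathbb{E}[f(x + \gamma u)]$ with $u \sim \mathcal{N}(0, I)$, whose gradient can be estimated from function values alone.

```python
import numpy as np

def smoothed_grad(f, x, gamma=0.1, n_samples=1000, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian smoothing f_gamma."""
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal((n_samples, x.size))
    vals = np.array([f(x + gamma * ui) for ui in u])
    return ((vals - f(x))[:, None] * u).mean(axis=0) / gamma

f = lambda x: np.abs(x).sum()      # non-smooth convex example: the l1 norm
x = np.array([1.0, -2.0, 0.5])
print(smoothed_grad(f, x))         # approximately sign(x) away from the kinks
```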
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
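As a toy illustration of why the dual is attractive for the restricted linear family (a simplification of the paper's setting, with made-up data): for discriminators $w$ with $\|w\| \le 1$, the adversarial distance $\max_w \mathbb{E}_X[w^\top x] - \mathbb{E}_Y[w^\top y]$ has the closed form $\|\mathrm{mean}(X) - \mathrm{mean}(Y)\|$, so no inner maximization loop is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) + np.array([3.0, -1.0])
Y = rng.standard_normal((500, 2))

gap = X.mean(axis=0) - Y.mean(axis=0)
dual_distance = np.linalg.norm(gap)        # closed-form dual value
w_star = gap / dual_distance               # the optimal linear discriminator
primal = w_star @ X.mean(axis=0) - w_star @ Y.mean(axis=0)
print(dual_distance, primal)               # both equal ||gap||
```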
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal-friendly functions, time-varying graphs, and improved condition numbers.
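For reference, a generic sketch of the centralized primitive involved, Nesterov's accelerated gradient for a smooth, strongly convex objective (the distributed execution the paper analyzes replaces this gradient with local computation plus communication; the test problem below is mine):

```python
import numpy as np

def nesterov(grad, x0, L, mu, n_iters=200):
    """Constant-momentum Nesterov method for an L-smooth, mu-strongly convex f."""
    q = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x, y = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x_next = y - grad(y) / L       # gradient step from the extrapolated point
        y = x_next + q * (x_next - x)  # momentum extrapolation
        x = x_next
    return x

A = np.diag([1.0, 10.0, 100.0]); b = np.array([1.0, 2.0, 3.0])
grad = lambda x: A @ x - b             # quadratic test problem, L = 100, mu = 1
x_star = nesterov(grad, np.zeros(3), L=100.0, mu=1.0)
print(np.max(np.abs(x_star - np.linalg.solve(A, b))))  # near machine precision
```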