
In many applications, we are given access to noisy modulo samples of a smooth function, with the goal being to robustly unwrap the samples, i.e., to estimate the original samples of the function. In a recent work, Cucuringu and Tyagi proposed denoising the modulo samples by first representing them on the unit complex circle and then solving a smoothness-regularized least squares problem -- with smoothness measured w.r.t. the Laplacian of a suitable proximity graph $G$ -- on the product manifold of unit circles. This problem is a nonconvex quadratically constrained quadratic program (QCQP); hence they proposed solving its sphere relaxation, leading to a trust region subproblem (TRS). In terms of theoretical guarantees, $\ell_2$ error bounds were derived for (TRS); however, these bounds are weak in general and do not truly demonstrate the denoising performed by (TRS). In this work, we analyse (TRS) as well as an unconstrained relaxation of (QCQP). For both estimators we provide a refined analysis in the setting of Gaussian noise and derive noise regimes where they provably denoise the modulo observations w.r.t. the $\ell_2$ norm. The analysis is performed in a general setting where $G$ is any connected graph.
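
To make the unconstrained relaxation concrete, here is a minimal numerical sketch, assuming modulo-1 samples, a path (chain) proximity graph, and an illustrative regularization parameter; the helper names (path_laplacian, denoise_modulo) and parameter choices are ours, not taken from the cited work.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of a path (chain) graph on n vertices, a simple stand-in
    for the proximity graph G from the abstract."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def denoise_modulo(y_mod, lam=10.0):
    """Denoise modulo-1 samples via the unconstrained relaxation:
    map the samples to the unit circle, solve (I + lam * L) g = z,
    then read the denoised (mod 1) samples off the angles of g."""
    n = len(y_mod)
    z = np.exp(2j * np.pi * y_mod)                 # unit-circle representation
    A = np.eye(n, dtype=complex) + lam * path_laplacian(n)
    g = np.linalg.solve(A, z)
    g /= np.abs(g)                                 # project back onto the circle
    return np.mod(np.angle(g) / (2 * np.pi), 1.0)

# Toy example: a smooth function observed modulo 1 under Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
f = 4.0 * x ** 2
y_noisy_mod = np.mod(f + 0.05 * rng.standard_normal(x.size), 1.0)
y_denoised_mod = denoise_modulo(y_noisy_mod, lam=10.0)
```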

Related content

In high-dimensional regression, we attempt to estimate a parameter vector $\beta_0\in\mathbb{R}^p$ from $n\lesssim p$ observations $\{(y_i,x_i)\}_{i\leq n}$ where $x_i\in\mathbb{R}^p$ is a vector of predictors and $y_i$ is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g. sparsity) of the estimate $\widehat{\beta}$, while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood. We show that when the statistician has very simple structural information about the distribution of the entries of $\beta_0$, a large gap frequently exists between the best performance achieved by any convex regularizer satisfying a mild technical condition and either (i) the optimal statistical error or (ii) the statistical error achieved by optimal approximate message passing algorithms. Remarkably, a gap occurs at high enough signal-to-noise ratio if and only if the distribution of the coordinates of $\beta_0$ is not log-concave. These conclusions follow from an analysis of standard Gaussian designs. Our lower bounds are expected to be generally tight, and we prove tightness under certain conditions.
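
For readers unfamiliar with approximate message passing, the following is a textbook AMP iteration with a soft-thresholding (Lasso-type) denoiser under a standard Gaussian design. It is only a generic sketch: it does not implement the Bayes-optimal denoisers analysed in the paper, and the function names, threshold rule, and toy data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding denoiser (the proximal map of the l1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_lasso(X, y, theta=1.0, iters=30):
    """Textbook AMP iteration for y = X beta + noise with a standard Gaussian
    design (entries N(0, 1/n)) and a soft-threshold denoiser. The Onsager
    correction uses the fraction of active coordinates."""
    n, p = X.shape
    beta = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        r = beta + X.T @ z                      # effective noisy observation of beta
        tau = np.sqrt(np.mean(z ** 2))          # empirical estimate of the noise level
        beta = soft_threshold(r, theta * tau)
        onsager = (z / n) * np.count_nonzero(beta)
        z = y - X @ beta + onsager
    return beta

# Toy example with a sparse coordinate distribution (which is not log-concave).
rng = np.random.default_rng(0)
n, p, k = 300, 600, 30
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta0 = np.zeros(p)
beta0[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = X @ beta0 + 0.05 * rng.standard_normal(n)
beta_hat = amp_lasso(X, y)
```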

This paper concerns a convex, stochastic zeroth-order optimization (S-ZOO) problem. The objective is to minimize the expectation of a cost function whose gradient is not directly accessible. For this problem, traditional optimization algorithms mostly yield query complexities that grow polynomially with dimensionality (the number of decision variables). Consequently, these methods may not perform well in solving massive-dimensional problems arising in many modern applications. Although more recent methods can be provably dimension-insensitive, almost all of them require arguably more stringent conditions such as an everywhere sparse or compressible gradient. In this paper, we propose a sparsity-inducing stochastic gradient-free (SI-SGF) algorithm, which provably yields a dimension-free (up to a logarithmic term) query complexity in both the convex and strongly convex cases. Such insensitivity to dimensionality growth is proven, for the first time, to be achievable when neither gradient sparsity nor gradient compressibility is satisfied. Our numerical results demonstrate consistency between our theoretical predictions and the empirical performance.
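
The abstract does not spell out the SI-SGF update, so the sketch below is only a generic illustration of the two ingredients it combines: a two-point zeroth-order gradient estimate from noisy function queries and a sparsity-inducing (soft-thresholding) step. The function names, step sizes, and thresholds are assumptions of ours, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def zo_sparse_minimize(f_noisy, d, iters=500, step=0.1, mu=1e-3, thresh=1e-3, seed=0):
    """Generic sparsity-inducing zeroth-order sketch: estimate a directional
    derivative from two noisy function queries along a random direction, take
    a gradient-style step, then soft-threshold the iterate."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        # Two-point estimate of the directional derivative along u.
        g = (f_noisy(x + mu * u) - f_noisy(x - mu * u)) / (2.0 * mu)
        x = soft_threshold(x - step * g * u, thresh)
    return x

# Toy stochastic objective: noisy evaluations of a quadratic with a sparse minimizer.
rng = np.random.default_rng(1)
d = 200
target = np.zeros(d)
target[rng.choice(d, 5, replace=False)] = 1.0
objective = lambda x: np.sum((x - target) ** 2) + 0.01 * rng.standard_normal()
x_hat = zo_sparse_minimize(objective, d)
```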

The CUR decomposition is a technique for low-rank approximation that selects small subsets of the columns and rows of a given matrix to use as bases for its column and row spaces. It has recently attracted much interest, as it has several advantages over traditional low-rank decompositions based on orthonormal bases. These include the preservation of properties such as sparsity or non-negativity, the ability to interpret data, and reduced storage requirements. The problem of finding the skeleton sets that minimize the norm of the residual error is known to be NP-hard, but classical pivoting schemes such as column pivoted QR tend to work well in practice. When combined with randomized dimension reduction techniques, classical pivoting based methods become particularly effective, and have proven capable of very rapidly computing approximate CUR decompositions of large, potentially sparse, matrices. Another class of popular algorithms for computing CUR decompositions is based on drawing the columns and rows randomly from the full index sets, using specialized probability distributions based on leverage scores. Such sampling based techniques are particularly appealing for very large scale problems, and are well supported by theoretical performance guarantees. This manuscript provides a comparative study of the various randomized algorithms for computing CUR decompositions that have recently been proposed. Additionally, it proposes some modifications and simplifications to the existing algorithms that lead to faster execution times.
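
As a concrete example of combining randomized dimension reduction with classical pivoting, here is a minimal sketch that selects column and row skeletons via column-pivoted QR applied to Gaussian sketches and then forms the middle factor with pseudoinverses. The sketch size, oversampling, and function name are illustrative choices, not any specific algorithm from the survey.

```python
import numpy as np
from scipy.linalg import qr

def cur_via_sketched_qr(A, k, oversample=10, seed=0):
    """Approximate CUR decomposition: sketch A with a Gaussian map, run
    column-pivoted QR on the sketch to choose k column indices, repeat on
    A^T for row indices, and form U = C^+ A R^+."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + oversample

    # Column skeleton: pivoted QR on an l x n row sketch of A.
    Y = rng.standard_normal((l, m)) @ A
    _, _, col_piv = qr(Y, pivoting=True, mode="economic")
    cols = np.sort(col_piv[:k])

    # Row skeleton: the same procedure applied to A^T.
    Z = rng.standard_normal((l, n)) @ A.T
    _, _, row_piv = qr(Z, pivoting=True, mode="economic")
    rows = np.sort(row_piv[:k])

    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows

# Toy example: a noisy rank-10 matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((400, 10)) @ rng.standard_normal((10, 300))
A += 1e-3 * rng.standard_normal(A.shape)
C, U, R, cols, rows = cur_via_sketched_qr(A, k=10)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```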

Convergence to a saddle point for convex-concave functions has been studied for decades, while recent years have seen a surge of interest in non-convex (zero-sum) smooth games, motivated by their recent wide applications. It remains an intriguing research challenge how locally optimal points should be defined and which algorithms can converge to such points. An interesting concept is known as the local minimax point, which strongly correlates with the widely-known gradient descent ascent algorithm. This paper aims to provide a comprehensive analysis of local minimax points, such as their relation with other solution concepts and their optimality conditions. We find that local saddle points can be regarded as a special type of local minimax points, called uniformly local minimax points, under mild continuity assumptions. In (non-convex) quadratic games, we show that local minimax points are (in some sense) equivalent to global minimax points. Finally, we study the stability of gradient algorithms near local minimax points. Although gradient algorithms can converge to local/global minimax points in the non-degenerate case, they would often fail in general cases. This implies the necessity of either novel algorithms or concepts beyond saddle points and minimax points in non-convex smooth games.
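
As a point of reference, the snippet below runs plain simultaneous gradient descent ascent on a small quadratic game; the matrices and step sizes are illustrative only, and, as discussed above, gradient algorithms need not converge near local minimax points in general.

```python
import numpy as np

def gda_quadratic_game(A, B, C, eta_x=0.05, eta_y=0.05, iters=2000, seed=0):
    """Simultaneous gradient descent ascent on the quadratic game
    f(x, y) = 0.5 x^T A x + x^T B y - 0.5 y^T C y,
    where x is the minimizing player and y the maximizing player."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    y = rng.standard_normal(C.shape[0])
    for _ in range(iters):
        grad_x = A @ x + B @ y          # df/dx
        grad_y = B.T @ x - C @ y        # df/dy
        x, y = x - eta_x * grad_x, y + eta_y * grad_y
    return x, y

# A two-dimensional game where A is indefinite, so f is non-convex in x;
# whether GDA is stable depends on the step sizes and the coupling term B.
A = np.diag([1.0, -0.5])
B = np.eye(2)
C = np.eye(2)
x_iter, y_iter = gda_quadratic_game(A, B, C)
```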

Game-theoretic attribution techniques based on Shapley values are used to interpret black-box machine learning models, but their exact calculation is generally NP-hard, requiring approximation methods for non-trivial models. As the computation of Shapley values can be expressed as a summation over a set of permutations, a common approach is to sample a subset of these permutations for approximation. Unfortunately, standard Monte Carlo sampling methods can exhibit slow convergence, and more sophisticated quasi-Monte Carlo methods have not yet been applied to the space of permutations. To address this, we investigate new approaches based on two classes of approximation methods and compare them empirically. First, we demonstrate quadrature techniques in an RKHS containing functions of permutations, using the Mallows kernel in combination with kernel herding and sequential Bayesian quadrature. The RKHS perspective also leads to quasi-Monte Carlo type error bounds, with a tractable discrepancy measure defined on permutations. Second, we exploit connections between the hypersphere $\mathbb{S}^{d-2}$ and permutations to create practical algorithms for generating permutation samples with good properties. Experiments show the above techniques provide significant improvements for Shapley value estimates over existing methods, converging to a smaller RMSE in the same number of model evaluations.
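
For context, the standard Monte Carlo permutation-sampling baseline that the kernel-herding and quasi-Monte Carlo constructions aim to improve looks as follows; the toy value function and names are illustrative.

```python
import numpy as np

def shapley_monte_carlo(value, d, n_perms=2000, seed=0):
    """Standard Monte Carlo baseline: Shapley values are an average of
    marginal contributions over permutations, so sample permutations
    uniformly at random and average the contributions."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(d)
    for _ in range(n_perms):
        perm = rng.permutation(d)
        coalition = []
        prev = value(coalition)
        for j in perm:
            coalition.append(j)
            cur = value(coalition)
            phi[j] += cur - prev     # marginal contribution of player j
            prev = cur
    return phi / n_perms

# Toy cooperative game: the value of a coalition S is a concave coverage score.
weights = np.array([3.0, 1.0, 1.0, 0.5])
value = lambda S: float(np.sum(weights[list(S)]) ** 0.5) if S else 0.0
phi = shapley_monte_carlo(value, d=4)
# Efficiency check: attributions sum to the value of the grand coalition.
assert np.isclose(phi.sum(), value(list(range(4))), atol=1e-6)
```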

Adaptive spectral (AS) decompositions associated with a piecewise constant function $u$ yield small subspaces where the characteristic functions comprising $u$ are well approximated. When combined with Newton-like optimization methods for the solution of inverse medium problems, AS decompositions have proved remarkably efficient in providing at each nonlinear iteration a low-dimensional search space. Here, we derive $L^2$-error estimates for the AS decomposition of $u$, truncated after $K$ terms, when $u$ is piecewise constant and consists of $K$ characteristic functions over Lipschitz domains and a background. Our estimates apply both to the continuous and the discrete Galerkin finite element setting. Numerical examples illustrate the accuracy of the AS decomposition for media that either do, or do not, satisfy the assumptions of the theory.

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic ``expansion'' assumption, which states that a low-probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
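
A minimal sketch of the kind of objective analysed here -- fitting a teacher's pseudolabels while penalizing inconsistency of the student across input perturbations -- is shown below, assuming a simple additive-noise perturbation and illustrative models, loss weights, and names; the paper studies population objectives rather than any particular training script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def perturb(x, sigma=0.1):
    """Simple input perturbation standing in for a data augmentation; the
    theory only needs perturbed inputs that stay in a small neighborhood."""
    return x + sigma * torch.randn_like(x)

def self_training_step(student, teacher, x_unlabeled, optimizer, lam=1.0):
    """One step combining a self-training (pseudolabel-fitting) term with an
    input-consistency regularization term."""
    with torch.no_grad():
        pseudo = teacher(x_unlabeled).argmax(dim=1)          # hard pseudolabels
    logits_a = student(perturb(x_unlabeled))
    logits_b = student(perturb(x_unlabeled))
    loss_fit = F.cross_entropy(logits_a, pseudo)             # self-training term
    loss_cons = F.mse_loss(F.softmax(logits_a, dim=1),       # consistency term
                           F.softmax(logits_b, dim=1))
    loss = loss_fit + lam * loss_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random unlabeled data and small MLPs.
d_in, n_classes = 20, 3
teacher = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, n_classes))
student = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, n_classes))
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)
x_unlabeled = torch.randn(128, d_in)
for _ in range(10):
    self_training_step(student, teacher, x_unlabeled, optimizer)
```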

The focus of Part I of this monograph was on the fundamental properties, graph topologies, and spectral representations of graphs. Part II builds on these concepts to address the algorithmic and practical issues centered around data/signal processing on graphs, that is, the focus is on the analysis and estimation of both deterministic and random data on graphs. The fundamental ideas related to graph signals are introduced through a simple and intuitive, yet illustrative and general enough, case study of multisensor temperature field estimation. The concept of systems on a graph is defined using graph signal shift operators, which generalize the corresponding principles from traditional learning systems. At the core of the spectral domain representation of graph signals and systems is the Graph Discrete Fourier Transform (GDFT). The spectral domain representations are then used as the basis to introduce graph signal filtering concepts and address their design, including Chebyshev polynomial approximation series. Ideas related to the sampling of graph signals are presented and further linked with compressive sensing. Localized graph signal analysis in the joint vertex-spectral domain is referred to as vertex-frequency analysis, since it can be considered an extension of classical time-frequency analysis to the graph domain of a signal. Important topics related to the local graph Fourier transform (LGFT) are covered, together with its various forms, including the graph spectral and vertex domain windows and the inversion conditions and relations. A link between the LGFT with spectral varying window and the spectral graph wavelet transform (SGWT) is also established. Realizations of the LGFT and SGWT using polynomial (Chebyshev) approximations of the spectral functions are further considered. Finally, energy versions of the vertex-frequency representations are introduced.
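
To illustrate two of the core ingredients mentioned above (the GDFT and Chebyshev polynomial filter realizations), here is a small numerical sketch on a random graph; the filter response, graph model, and function names are illustrative choices of ours, not taken from the monograph.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of an undirected weighted graph."""
    return np.diag(W.sum(axis=1)) - W

def gdft(L, x):
    """Graph Discrete Fourier Transform: expand the signal x in the
    eigenbasis U of the Laplacian, X = U^T x."""
    lam, U = np.linalg.eigh(L)
    return lam, U, U.T @ x

def chebyshev_filter(L, x, h, order=8):
    """Apply a spectral filter h(lambda) to x without eigendecomposition,
    using a truncated Chebyshev approximation of h on [0, lambda_max] and
    the recursion T_k(t) = 2 t T_{k-1}(t) - T_{k-2}(t)."""
    lam_max = np.linalg.eigvalsh(L).max()
    cheb = np.polynomial.chebyshev.Chebyshev.interpolate(
        lambda t: h((t + 1.0) * lam_max / 2.0), deg=order, domain=[-1.0, 1.0])
    c = cheb.coef
    Ls = (2.0 / lam_max) * L - np.eye(L.shape[0])   # rescale spectrum to [-1, 1]
    t_prev, t_cur = x, Ls @ x                       # T_0 x and T_1 x
    y = c[0] * t_prev + c[1] * t_cur
    for k in range(2, order + 1):
        t_prev, t_cur = t_cur, 2.0 * Ls @ t_cur - t_prev
        y = y + c[k] * t_cur
    return y

# Toy example: low-pass filtering of a random signal on a random graph.
rng = np.random.default_rng(3)
n = 60
W = (rng.random((n, n)) < 0.1).astype(float)
W = np.triu(W, 1); W = W + W.T
L = graph_laplacian(W)
x = rng.standard_normal(n)
lam, U, X = gdft(L, x)                              # spectral content of x
y_smooth = chebyshev_filter(L, x, h=lambda lam: np.exp(-2.0 * lam))
```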

We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes ("Standard" and "NTK"). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases.

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded-norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
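
To make the optimization view concrete, a schematic form of such a Lagrangian relaxation is sketched below in our own notation; the layer maps $h_k$, the input set $\mathcal{X}_0$, and the linear readout $c$ encoding the specification are placeholders, and in practice the intermediate variables are also restricted to simple per-layer bounds so that the inner maximizations stay finite. The largest violation of the specification is

\[
p^\ast \;=\; \max_{x_0 \in \mathcal{X}_0,\; x_1,\dots,x_L} \; c^\top x_L
\quad \text{subject to} \quad x_{k+1} = h_k(x_k), \;\; k = 0,\dots,L-1,
\]

and dualizing the equality constraints with multipliers $\lambda_k$ gives, for every choice of $\lambda = (\lambda_0,\dots,\lambda_{L-1})$,

\[
p^\ast \;\le\; d(\lambda) \;=\; \max_{x_0 \in \mathcal{X}_0,\; x_1,\dots,x_L} \; c^\top x_L + \sum_{k=0}^{L-1} \lambda_k^\top \bigl( x_{k+1} - h_k(x_k) \bigr),
\]

since the penalty term vanishes on feasible points. The objective of $d(\lambda)$ separates into per-layer terms, e.g. $\max_{x_0 \in \mathcal{X}_0} -\lambda_0^\top h_0(x_0)$ and $\max_{x_k} \bigl( \lambda_{k-1}^\top x_k - \lambda_k^\top h_k(x_k) \bigr)$, so every choice of $\lambda$ yields a valid upper bound and the computation can be stopped at any time, consistent with the anytime property claimed above.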
