
from arxiv, Accepted at NeurIPS 2021, including the appendix. In the previous versions (v1 and v2), the experimental results in Table 10 were incorrect; they have been corrected in the current version.

One major problem in black-box adversarial attacks is the high query complexity in the hard-label attack setting, where only the top-1 predicted label is available. In this paper, we propose a novel geometric-based approach called Tangent Attack (TA), which identifies an optimal tangent point of a virtual hemisphere located on the decision boundary to reduce the distortion of the attack. Assuming the decision boundary is locally flat, we theoretically prove that the minimum $\ell_2$ distortion can be obtained by reaching the decision boundary along the tangent line passing through this tangent point in each iteration. To improve the robustness of our method, we further propose a generalized method which replaces the hemisphere with a semi-ellipsoid to adapt to curved decision boundaries. Our approach is free of hyperparameters and pre-training. Extensive experiments conducted on the ImageNet and CIFAR-10 datasets demonstrate that our approach needs only a small number of queries to achieve low-magnitude distortion. The implementation source code is released online at //github.com/machanic/TangentAttack.
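
The tangent-point geometry mentioned in this abstract can be pictured in two dimensions. The sketch below is not the authors' implementation (that is in the repository linked above); it only computes where a line from an external point touches a circle. The helper name `tangent_points` and the coordinates are hypothetical, with `x` standing in for the original image and `c` for the current boundary point.

```python
import math

def tangent_points(x, c, r):
    """Return both points where a line through x touches the circle of radius r centered at c."""
    dx, dy = x[0] - c[0], x[1] - c[1]
    d = math.hypot(dx, dy)
    if d <= r:
        raise ValueError("x must lie strictly outside the circle")
    # In the right triangle (c, tangent point, x), the angle at c satisfies cos(angle) = r / d.
    angle = math.acos(r / d)
    base = math.atan2(dy, dx)  # direction from c towards x
    return [
        (c[0] + r * math.cos(base + s * angle), c[1] + r * math.sin(base + s * angle))
        for s in (1, -1)
    ]

# Hypothetical 2D stand-ins for the high-dimensional setting.
x, c, r = (0.0, 5.0), (0.0, 0.0), 3.0
for k in tangent_points(x, c, r):
    # A tangent line is perpendicular to the radius at the point of contact,
    # so this dot product should be numerically zero.
    dot = (k[0] - c[0]) * (k[0] - x[0]) + (k[1] - c[1]) * (k[1] - x[1])
    print(k, abs(dot) < 1e-9)
```

In the abstract's setting the circle becomes a hemisphere (or a semi-ellipsoid) in image space; the right-angle condition checked in the loop is the part that carries over.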

Related content


SGD · Non-convex · Local minimum · Minimum · Functional

February 18, 2022

Tackling benign nonconvexity with smoothing and stochastic gradients

Harsh Vardhan,Sebastian U. Stich

Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning. While such complex problems can often be successfully optimized in practice by using stochastic gradient descent (SGD), theoretical analysis cannot adequately explain this success. In particular, the standard analyses do not show global convergence of SGD on non-convex functions, and instead show convergence to stationary points (which can also be local minima or saddle points). We identify a broad class of nonconvex functions for which we can show that perturbed SGD (gradient descent perturbed by stochastic noise -- covering SGD as a special case) converges to a global minimum (or a neighborhood thereof), in contrast to gradient descent without noise that can get stuck in local minima far from a global solution. For example, on non-convex functions that are relatively close to a convex-like (strongly convex or PL) function we show that SGD can converge linearly to a global optimum.
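
One way to build intuition for why noise can help is to look at the objective averaged over the perturbation. The sketch below is an illustration under stated assumptions, not the paper's analysis: the objective `f(x) = x**2 + 2*sin(5*x)` is made up, the noise is assumed Gaussian, and the average over noise is taken in closed form instead of being sampled.

```python
import math

def grad(x, sigma=0.0):
    """Gradient of f(x) = x**2 + 2*sin(5*x), averaged over Gaussian noise N(0, sigma**2) on x.

    For Gaussian noise, E[cos(5*(x + sigma*xi))] = exp(-12.5 * sigma**2) * cos(5*x),
    so the averaged gradient has a closed form; sigma = 0 gives the plain gradient.
    """
    return 2.0 * x + 10.0 * math.exp(-12.5 * sigma ** 2) * math.cos(5.0 * x)

def descend(x, sigma, lr=0.01, steps=2000):
    """Run fixed-step gradient descent on the (possibly noise-averaged) objective."""
    for _ in range(steps):
        x -= lr * grad(x, sigma)
    return x

x_plain = descend(3.0, sigma=0.0)   # noiseless descent: stops in a nearby local minimum
x_smooth = descend(3.0, sigma=1.0)  # noise-averaged descent: the oscillation is damped away
print(x_plain, x_smooth)
```

With these made-up constants the noiseless run stays near its starting point, while the averaged run moves close to the bottom of the quadratic part, which is also where the deepest minima of `f` lie.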

Estimation/Estimator · Variance · Random forest · Reducible · Subsampling

February 18, 2022

On Variance Estimation of Random Forests

Tianning Xu,Ruoqing Zhu,Xiaofeng Shao

Ensemble methods based on subsampling, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods focus on a small subsampling size of each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate confidence interval coverage without additional computational costs. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.

MoDELS · Estimation/Estimator · state-of-the-art · Likelihood · Noise

February 18, 2022

Variational Diffusion Models

Diederik P. Kingma,Tim Salimans,Ben Poole,Jonathan Ho

from arxiv, Published at NeurIPS'21. Camera-ready version

Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum.

Distillation · MoDELS · Learned · Federated learning · Continuity

February 17, 2022

Preservation of Global Knowledge by Not-True Distillation in Federated Learning

Gihun Lee,Minchan Jeong,Yongjin Shin,Sangmin Bae,Se-Young Yun

from arxiv, Under review

In federated learning, a strong global model is collaboratively learned by aggregating clients' locally trained models. Although this precludes the need to access clients' data directly, the global model's convergence often suffers from data heterogeneity. This study starts from an analogy to continual learning and suggests that forgetting could be the bottleneck of federated learning. We observe that the global model forgets the knowledge from previous rounds, and the local training induces forgetting the knowledge outside of the local distribution. Based on our findings, we hypothesize that tackling forgetting will relieve the data heterogeneity problem. To this end, we propose a novel and effective algorithm, Federated Not-True Distillation (FedNTD), which preserves the global perspective on locally available data only for the not-true classes. In the experiments, FedNTD shows state-of-the-art performance on various setups without compromising data privacy or incurring additional communication costs.
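
The idea of distilling only over the not-true classes can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the loss, so the temperature-scaled softmax, the KL divergence, and the function names `softmax` and `not_true_distillation` are all assumptions of this sketch.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def not_true_distillation(local_logits, global_logits, true_class, tau=1.0):
    """KL(global || local) computed over every class except the true one."""
    keep = [i for i in range(len(local_logits)) if i != true_class]
    p = softmax([global_logits[i] for i in keep], tau)  # teacher: global model
    q = softmax([local_logits[i] for i in keep], tau)   # student: local model
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical logits for a 4-class problem where class 0 is the true label.
global_logits = [2.0, 1.0, 0.5, -1.0]
local_logits = [5.0, 0.2, 0.9, -1.0]
print(not_true_distillation(local_logits, global_logits, true_class=0))
```

Because the true class is dropped before the softmax, the local model remains free to fit the true label on its own data; only the relative ranking of the other classes is pulled towards the global model.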

Better · Extensibility · MoDELS · Black box · CRAFT

February 17, 2022

Measuring the Transferability of $\ell_\infty$ Attacks by the $\ell_2$ Norm

Sizhe Chen,Qinghua Tao,Zhixing Ye,Xiaolin Huang

Deep neural networks could be fooled by adversarial examples with trivial differences to original samples. To keep the difference imperceptible to human eyes, researchers bound the adversarial perturbations by the $\ell_\infty$ norm, which now commonly serves as the standard to align the strength of different attacks for a fair comparison. However, we propose that using the $\ell_\infty$ norm alone is not sufficient in measuring the attack strength, because even with a fixed $\ell_\infty$ distance, the $\ell_2$ distance also greatly affects the attack transferability between models. Through this discovery, we reach a more in-depth understanding of the attack mechanism, i.e., several existing methods attack black-box models better partly because they craft perturbations with 70% to 130% larger $\ell_2$ distances. Since larger perturbations naturally lead to better transferability, we thereby advocate that the strength of attacks should be simultaneously measured by both the $\ell_\infty$ and $\ell_2$ norms. Our proposal is firmly supported by extensive experiments on the ImageNet dataset from 7 attacks, 4 white-box models, and 9 black-box models.
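
That a fixed $\ell_\infty$ distance leaves the $\ell_2$ distance largely unconstrained can be checked with a few lines of arithmetic. The budget `eps` and the pixel count `n` below are hypothetical values chosen for illustration, not settings from the paper.

```python
import math

def linf(v):
    """Largest absolute entry of the perturbation."""
    return max(abs(t) for t in v)

def l2(v):
    """Euclidean length of the perturbation."""
    return math.sqrt(sum(t * t for t in v))

eps = 8 / 255  # hypothetical l_inf budget
n = 1000       # hypothetical number of pixels
dense = [eps] * n                 # every pixel moved by the full budget
sparse = [eps] + [0.0] * (n - 1)  # a single pixel moved by the full budget

print(linf(dense) == linf(sparse))  # both perturbations sit exactly on the l_inf budget
print(l2(dense) / l2(sparse))       # yet their l_2 norms differ by a factor of sqrt(n)
```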

Color · Approximation · Decomposable · Graph · Class

February 17, 2022

Hardness of the Generalized Coloring Numbers

Michael Breen-McKay,Brian Lavallee,Blair D. Sullivan

from arxiv, 16 pages, 5 figures

The generalized coloring numbers of Kierstead and Yang offer an algorithmically useful characterization of graph classes with bounded expansion. In this work, we consider the hardness and approximability of these parameters. First, we complete the work of Grohe et al. by showing that computing the weak 2-coloring number is NP-hard. Our approach further establishes that determining the weak $r$-coloring number is APX-hard for all $r \geq 2$. We adapt this to the $r$-coloring number as well, proving APX-hardness for all $r \geq 2$. Our reductions also imply that for every fixed $r \geq 2$, no XP algorithm (runtime $O(n^{f(k)})$) exists for testing if either generalized coloring number is at most $k$. Finally, we give an approximation algorithm for the $r$-coloring number which improves both the runtime and approximation factor of the existing approach of Dvořák. Our algorithm greedily orders vertices with small enough $\ell$-reach for every $\ell \leq r$ and achieves an $O(C_{r-1} k^{r-1})$-approximation, where $C_i$ is the $i$th Catalan number.
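
The abstract does not restate the definition, so the sketch below assumes the standard one: for a fixed vertex ordering, `u` is weakly `r`-reachable from `v` if some path of length at most `r` from `v` to `u` has `u` as its earliest vertex in the ordering. The function evaluates a single given ordering only; the weak `r`-coloring number is the minimum over all orderings, which is the part the paper shows to be hard. The function name and the example graph are hypothetical.

```python
from collections import deque

def weak_coloring_number(adj, order, r):
    """Largest weakly r-reachable set size for the graph `adj` under one fixed ordering."""
    pos = {v: i for i, v in enumerate(order)}
    reach = {v: {v} for v in order}  # every vertex weakly reaches itself
    for u in order:
        # BFS from u that only walks through vertices placed after u,
        # so u is the earliest vertex on every path found.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            w = queue.popleft()
            if dist[w] == r:
                continue  # paths longer than r do not count
            for nb in adj[w]:
                if nb not in dist and pos[nb] > pos[u]:
                    dist[nb] = dist[w] + 1
                    reach[nb].add(u)
                    queue.append(nb)
    return max(len(s) for s in reach.values())

# Path graph a - b - c under two different orderings.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(weak_coloring_number(path, ["a", "b", "c"], 2))
print(weak_coloring_number(path, ["b", "a", "c"], 2))
```

Placing the middle vertex first gives the smaller value on this path, which shows how strongly the parameter depends on the ordering.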

Estimation/Estimator · Discriminator · MoDELS · Model evaluation · Neural Networks

February 16, 2022

An Adversarial Approach to Structural Estimation

Tetsuya Kaji,Elena Manresa,Guillaume Pouliot

from arxiv, 70 pages, 4 tables, 16 figures

We propose a new simulation-based estimation method, adversarial estimation, for structural models. The estimator is formulated as the solution to a minimax problem between a generator (which generates synthetic observations using the structural model) and a discriminator (which classifies if an observation is synthetic). The discriminator maximizes the accuracy of its classification while the generator minimizes it. We show that, with a sufficiently rich discriminator, the adversarial estimator attains parametric efficiency under correct specification and the parametric rate under misspecification. We advocate the use of a neural network as a discriminator that can exploit adaptivity properties and attain fast rates of convergence. We apply our method to the elderly's saving decision model and show that our estimator uncovers the bequest motive as an important source of saving across the wealth distribution, not only for the rich.

tuning · Variance · Integration · MoDELS · Scenario

March 29, 2021

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

Xiaosen Wang,Kun He

from arxiv, Accepted by CVPR 2021

Deep neural networks are vulnerable to adversarial examples that mislead the models with imperceptible perturbations. Though adversarial attacks have achieved incredible success rates in the white-box setting, most existing adversaries often exhibit weak transferability in the black-box setting, especially under the scenario of attacking models with defense mechanisms. In this work, we propose a new method called variance tuning to enhance the class of iterative gradient based attack methods and improve their attack transferability. Specifically, at each iteration for the gradient calculation, instead of directly using the current gradient for the momentum accumulation, we further consider the gradient variance of the previous iteration to tune the current gradient so as to stabilize the update direction and escape from poor local optima. Empirical results on the standard ImageNet dataset demonstrate that our method could significantly improve the transferability of gradient-based adversarial attacks. Besides, our method could be used to attack ensemble models or be integrated with various input transformations. Incorporating variance tuning with input transformations on iterative gradient-based attacks in the multi-model setting, the integrated method could achieve an average success rate of 90.1% against nine advanced defense methods, improving the current best attack performance significantly by 85.1%. Code is available at //github.com/JHL-HUST/VT.
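
The update described in this abstract, where the previous iteration's gradient variance is added to the current gradient before momentum accumulation, might look roughly like the sketch below. It is one plausible reading rather than the authors' code (see the repository linked above for that): the toy loss, the uniform neighbor sampling, the L1 normalization, the sign step, and every default parameter value are assumptions of this sketch.

```python
import math
import random

def loss_grad(x):
    """Gradient of a toy loss J(x) = sum(sin(x_i)); stands in for a model's loss gradient."""
    return [math.cos(t) for t in x]

def sign(t):
    return (t > 0) - (t < 0)

def variance_tuned_attack(x0, eps=0.3, steps=10, mu=1.0, n_samples=20, beta=1.5, seed=0):
    rng = random.Random(seed)
    alpha = eps / steps  # per-iteration step size
    x = list(x0)
    momentum = [0.0] * len(x)
    variance = [0.0] * len(x)  # no variance information before the first iteration
    for _step in range(steps):
        g = loss_grad(x)
        # Tune the current gradient with the variance from the previous iteration.
        tuned = [gi + vi for gi, vi in zip(g, variance)]
        norm = sum(abs(t) for t in tuned) or 1.0
        momentum = [mu * m + t / norm for m, t in zip(momentum, tuned)]
        # Variance for the next iteration: mean neighborhood gradient minus the current one.
        avg = [0.0] * len(x)
        for _sample in range(n_samples):
            neighbor = [xi + rng.uniform(-beta * eps, beta * eps) for xi in x]
            for i, gi in enumerate(loss_grad(neighbor)):
                avg[i] += gi / n_samples
        variance = [a - gi for a, gi in zip(avg, g)]
        # Sign step, then clip back into the eps-ball around the original input.
        x = [
            min(max(xi + alpha * sign(m), x0i - eps), x0i + eps)
            for xi, m, x0i in zip(x, momentum, x0)
        ]
    return x

print(variance_tuned_attack([0.1, -0.4, 2.0]))
```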

Optimizer · Minibatch · Better · Estimation/Estimator · Unbiased

March 5, 2021

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Kilian Fatras,Thibault Séjourné,Nicolas Courty,Rémi Flamary

Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e., minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
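
One limit of the plain minibatch strategy is easy to reproduce in one dimension, where the optimal transport plan between two equal-size samples simply matches sorted values. The sketch below compares a dataset with itself: the full-batch distance is zero, but the average over random minibatches is not. It illustrates this effect only and does not implement the unbalanced variant the authors propose; the data, batch size, and function names are hypothetical.

```python
import random

def w1_1d(xs, ys):
    """1D Wasserstein-1 distance between two equal-size samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def minibatch_w1(xs, ys, batch_size, n_batches, rng):
    """Average the 1D distance over independently drawn pairs of minibatches."""
    total = 0.0
    for _ in range(n_batches):
        total += w1_1d(rng.sample(xs, batch_size), rng.sample(ys, batch_size))
    return total / n_batches

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(512)]
print(w1_1d(data, data))                       # 0.0: a sample is at distance zero from itself
print(minibatch_w1(data, data, 16, 200, rng))  # positive, although both inputs are identical
```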

Maximum mean discrepancy · Optimizer · Performer · CASES · tuning

January 30, 2018

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Ben Usman,Kate Saenko,Brian Kulis

from arxiv, ICLR 2018 Conference Invite to Workshop

Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.