Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop of increased research focus on why certain emergent properties surface at scale, this work provides a useful counter-example. We posit that it is possible to optimize for a quantization-friendly training recipe that suppresses large activation magnitude outliers. Here, we find that outlier dimensions are not an inherent product of scale, but rather sensitive to the optimization conditions present during pre-training. This both opens up directions for more efficient quantization and poses the question of whether other emergent properties are inherent or can be altered and conditioned by optimization and architecture design choices. We successfully quantize models ranging in size from 410M to 52B with minimal degradation in performance.
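To illustrate why large activation outliers matter for quantization, the following minimal sketch applies symmetric absmax int8 quantization to a small vector with and without one large value. It is not the recipe from the abstract above: the function names and the toy activation values are assumptions chosen for illustration only.

```python
def absmax_quantize(values, bits=8):
    """Symmetric absmax quantization: the largest magnitude sets the scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in quantized]


def mean_abs_error(values, bits=8):
    """Average reconstruction error after a quantize/dequantize round trip."""
    quantized, scale = absmax_quantize(values, bits)
    restored = dequantize(quantized, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical activations: 101 values spread over [-0.5, 0.5].
typical = [0.01 * i for i in range(-50, 51)]
# The same vector with a single large-magnitude outlier appended.
with_outlier = typical + [60.0]

err_typical = mean_abs_error(typical)
err_outlier = mean_abs_error(with_outlier)
print(f"no outlier:   {err_typical:.5f}")
print(f"with outlier: {err_outlier:.5f}")
```

Because the outlier stretches the scale, the remaining values collapse onto only a few integer codes, so the error for the vector with the outlier is much larger.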

Related content

Scaling

Sparsification · Functional · Scenario · Decomposable · Representation

July 18, 2023

Cut Sparsification and Succinct Representation of Submodular Hypergraphs

Yotam Kenneth, Robert Krauthgamer

In cut sparsification, all cuts of a hypergraph $H=(V,E,w)$ are approximated within $1\pm\epsilon$ factor by a small hypergraph $H'$. This widely applied method was generalized recently to a setting where the cost of cutting each $e\in E$ is provided by a splitting function, $g_e: 2^e\to\mathbb{R}_+$. This generalization is called a submodular hypergraph when the functions $\{g_e\}_{e\in E}$ are submodular, and it arises in machine learning, combinatorial optimization, and algorithmic game theory. Previous work focused on the setting where $H'$ is a reweighted sub-hypergraph of $H$, and measured size by the number of hyperedges in $H'$. We study such sparsification, and also a more general notion of representing $H$ succinctly, where size is measured in bits. In the sparsification setting, where size is the number of hyperedges, we present three results: (i) all submodular hypergraphs admit sparsifiers of size polynomial in $n=|V|$; (ii) monotone-submodular hypergraphs admit sparsifiers of size $O(\epsilon^{-2} n^3)$; and (iii) we propose a new parameter, called spread, to obtain even smaller sparsifiers in some cases. In the succinct-representation setting, we show that a natural family of splitting functions admits a succinct representation of much smaller size than via reweighted subgraphs (almost by factor $n$). This large gap is surprising because for graphs, the most succinct representation is attained by reweighted subgraphs. Along the way, we introduce the notion of deformation, where $g_e$ is decomposed into a sum of functions of small description, and we provide upper and lower bounds for deformation of common splitting functions.
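As a small illustration of what it means for a splitting function $g_e$ to be submodular, the sketch below brute-forces the inequality $g(S)+g(T)\ge g(S\cup T)+g(S\cap T)$ over all subsets of one hyperedge. The all-or-nothing function (cost $w$ whenever the hyperedge is split, 0 otherwise) is the standard hypergraph cut cost; the helper names and the example hyperedge are assumptions, not taken from the paper.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def all_or_nothing(e, weight=1.0):
    """Splitting function that charges `weight` iff S splits the hyperedge e."""
    e = frozenset(e)

    def g(s):
        return weight if 0 < len(s) < len(e) else 0.0

    return g


def is_submodular(g, e):
    """Check g(S) + g(T) >= g(S | T) + g(S & T) for all S, T within e."""
    all_sets = list(subsets(e))
    return all(
        g(s) + g(t) >= g(s | t) + g(s & t)
        for s in all_sets
        for t in all_sets
    )


hyperedge = {"a", "b", "c", "d"}  # hypothetical 4-vertex hyperedge
print(is_submodular(all_or_nothing(hyperedge), hyperedge))
```

The check is exponential in the hyperedge size, so it is only meant for building intuition on tiny examples.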

Networking · MoDELS · Estimation/Estimator · Integration · SimPLe

July 17, 2023

Aberration-Aware Depth-from-Focus

Xinge Yang, Qiang Fu, Mohammed Elhoseiny, Wolfgang Heidrich

from arxiv, [ICCP & TPAMI 2023] Considering optical aberrations during network training can improve the generalizability

Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the decision of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach involves a lightweight network that models lens aberrations at different positions and focus distances, which is then integrated into the conventional network training pipeline. We evaluate the generality of pretrained models on both synthetic and real-world data. Our experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model or modifying the network architecture.

Understandability · MoDELS · Similarity · Regularization term · Better

July 15, 2023

Towards Understanding Adversarial Transferability From Surrogate Training

Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

from arxiv, Accepted by IEEE S&P (Oakland) 2024; 21 pages, 12 figures

Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.

Inference · Performer · Exact · Projection · Approximation

July 14, 2023

Verifying Performance Properties of Probabilistic Inference

Eric Atkinson, Ellie Y. Cheng, Guillaume Baudart, Louis Mandel, Michael Carbin

In this extended abstract, we discuss the opportunity to formally verify that inference systems for probabilistic programming guarantee good performance. In particular, we focus on hybrid inference systems that combine exact and approximate inference to try to exploit the advantages of each. Their performance depends critically on a) the division between exact and approximate inference, and b) the computational resources consumed by exact inference. We describe several projects in this direction. Semi-symbolic Inference (SSI) is a type of hybrid inference system that provides limited guarantees by construction on the exact/approximate division. In addition to these limited guarantees, we also describe ongoing work to extend guarantees to a more complex class of programs, requiring a program analysis to ensure the guarantees. Finally, we also describe work on verifying that inference systems using delayed sampling -- another type of hybrid inference -- execute in bounded memory. Together, these projects show that verification can deliver the performance guarantees that probabilistic programming languages need.

Diversity · Optimizer · Analysis · Constraint · Random walk

July 14, 2023

Rigorous Runtime Analysis of Diversity Optimization with GSEMO on OneMinMax

Denis Antipov, Aneta Neumann, Frank Neumann

from arxiv, The full version of the paper accepted to FOGA 2023 conference

Evolutionary diversity optimization aims at finding a diverse set of solutions that satisfy some constraint on their fitness. In the context of multi-objective optimization, this constraint can require solutions to be Pareto-optimal. In this paper, we study how the GSEMO algorithm with an additional diversity-enhancing heuristic optimizes the diversity of its population on the bi-objective benchmark problem OneMinMax, for which all solutions are Pareto-optimal. We provide a rigorous runtime analysis of the last step of the optimization, when the algorithm starts with a population with second-best diversity, and prove that it finds a population with optimal diversity in expected time $O(n^2)$ when the problem size $n$ is odd. To reach this goal, we analyse the random walk of the population, which reflects the frequency of changes in the population and their outcomes.
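The claim that every OneMinMax solution is Pareto-optimal can be checked by brute force for small $n$. The sketch below assumes the usual definition of OneMinMax, which maximizes the pair (number of zeros, number of ones) of a bit string; the function names are illustrative, and the code does not implement GSEMO or the diversity heuristic itself.

```python
from itertools import product


def one_min_max(x):
    """Bi-objective value of a bit string: (number of zeros, number of ones)."""
    ones = sum(x)
    return (len(x) - ones, ones)


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(
        ai > bi for ai, bi in zip(a, b)
    )


def all_pareto_optimal(n):
    """Check that no bit string of length n is dominated by another one."""
    points = [one_min_max(x) for x in product((0, 1), repeat=n)]
    return not any(dominates(a, b) for a in points for b in points)


print(all_pareto_optimal(5))
```

Since the two objectives always sum to $n$, improving one necessarily worsens the other, which is why no solution can dominate another.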

Neural Networks · Networking · MoDELS · Networks · AI

July 6, 2023

Artistic Strategies to Guide Neural Networks

Varvara Guljajeva, Mar Canet Sola, Isaac Joseph Clarke

Artificial Intelligence is present in the generation and distribution of culture. How do artists exploit neural networks? What impact do these algorithms have on artistic practice? Through a practice-based research methodology, this paper explores the potentials and limits of current AI technology, more precisely deep neural networks, in the context of image, text, form and translation of semiotic spaces. In a relatively short time, the generation of high-resolution images and 3D objects has been achieved. There are models, like CLIP and text2mesh, that do not need the same kind of media input as the output; we call them translation models. Such a twist contributes toward creativity arousal, which manifests itself in art practice and feeds back to the developers' pipeline. Yet again, we see how artworks act as catalysts for technology development. Those creative scenarios and processes are enabled not solely by AI models, but by the hard work behind implementing these new technologies. AI does not create a 'push-a-button' masterpiece but requires a deep understanding of the technology behind it, and a creative and critical mindset. Thus, AI opens new avenues for inspiration and offers novel tool sets, and yet again the question of authorship is asked.

Continuity · Learning · Vision · Computer vision · Batch learning

September 23, 2021

Recent Advances of Continual Learning in Computer Vision: An Overview

Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, Jun Liu

from arxiv, 21 pages, 5 figures

In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.

Extensibility · Learning · Noise distribution · Networking · Representation learning

July 25, 2021

Image Manipulation Detection by Multi-View Multi-Scale Supervision

Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li

from arxiv, Accepted by ICCV 2021

The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst being specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity and overlooks specificity. In this paper, we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial to take into account for current methods based on semantic segmentation networks. Our ideas are realized in a new network which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.

Neural Networks · Networking · Reducible · Continuity · Inference

June 21, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

from arxiv, Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
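The quantization problem described above, distributing real values over a fixed discrete set, can be sketched with uniform affine quantization to 4-bit codes. This is one common scheme among the many a survey like this covers; the function names, the example weights, and the 32-bit floating-point baseline are assumptions made for illustration.

```python
def affine_quantize(values, bits=4):
    """Uniform affine quantization of real values to unsigned `bits`-bit codes."""
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)      # real-valued step between codes
    zero_point = round(qmin - lo / scale)  # integer code that represents 0.0
    codes = [
        min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values
    ]
    return codes, scale, zero_point


def affine_dequantize(codes, scale, zero_point):
    """Recover approximate real values from integer codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.4, 0.0, 0.35, 0.8, 2.0]  # hypothetical weights
codes, scale, zero_point = affine_quantize(weights, bits=4)
restored = affine_dequantize(codes, scale, zero_point)

# Storage ratio relative to an assumed 32-bit floating-point baseline.
compression = 32 / 4
print(codes, restored, compression)
```

In this example the reconstruction error of each value stays within half a quantization step, which is the usual bound for rounding to the nearest code when the range is covered exactly.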

Networking · Residual network · Scaling · Weight · Smoothness

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.