
In this paper we introduce InDistill, a model compression approach that combines knowledge distillation and channel pruning in a unified framework to transfer the critical information flow paths from a heavyweight teacher to a lightweight student. In previous methods, such information is typically collapsed by an encoding stage applied before distillation. By contrast, InDistill applies a pruning operation to the teacher's intermediate layers, reducing their width to that of the corresponding student layers. This enforces architectural alignment, so the intermediate layers can be distilled directly without an encoding stage. Additionally, a curriculum learning-based training scheme is adopted that accounts for the distillation difficulty of each layer and for the critical learning periods in which the information flow paths are formed. The proposed method surpasses state-of-the-art performance on three standard benchmarks, i.e. CIFAR-10, CUB-200, and FashionMNIST, by 3.08%, 14.27%, and 1% mAP, respectively, as well as in more challenging evaluation settings, i.e. ImageNet and CIFAR-100, by 1.97% and 5.65% mAP, respectively.
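A minimal sketch of the width-matching idea, under stated assumptions: feature maps are plain lists of channels, pruning is simulated by dropping teacher output channels (the effect of removing the corresponding filters), and channels are ranked by their L1 norm. That ranking criterion is a hypothetical choice; the abstract does not say which criterion InDistill uses. Once the teacher layer has the student's width, the two maps can be compared element by element with no encoder. The `active_layers` schedule is likewise a hypothetical stand-in for the curriculum, not the paper's schedule.

```python
def channel_l1(feature):
    """L1 norm of each channel; `feature` is a list of channels, each a list of activations."""
    return [sum(abs(v) for v in ch) for ch in feature]


def prune_to_width(teacher_feature, width):
    """Keep the `width` teacher channels with the largest L1 norm (hypothetical criterion),
    preserving their original order so they line up with the student's channels."""
    norms = channel_l1(teacher_feature)
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:width])
    return [teacher_feature[i] for i in keep], keep


def direct_distillation_loss(student_feature, pruned_teacher_feature):
    """Mean squared error between width-matched feature maps; no encoding stage needed."""
    diffs = [(s - t) ** 2
             for s_ch, t_ch in zip(student_feature, pruned_teacher_feature)
             for s, t in zip(s_ch, t_ch)]
    return sum(diffs) / len(diffs)


def active_layers(epoch, epochs_per_stage, num_layers):
    """Hypothetical curriculum: one more intermediate layer is distilled at every stage."""
    return list(range(min(num_layers, epoch // epochs_per_stage + 1)))


teacher = [[1.0, -1.0], [0.1, 0.1], [3.0, 0.0], [0.5, -0.5]]  # 4 channels
student = [[1.0, -1.0], [2.0, 0.0]]                            # 2 channels
pruned, kept = prune_to_width(teacher, width=len(student))
print(kept)                                       # [0, 2]
print(direct_distillation_loss(student, pruned))  # 0.25
```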

Related content

INFORMS


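The INFORMS Journal on Computing publishes high-quality papers that expand the scope of operations research and computing. It seeks original research papers on theory, methods, experiments, systems, and applications; novel surveys and tutorial papers; and papers describing new and useful software tools.

Identifiable · Training data · Learning · MoDELS

August 9, 2023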

Learning of discrete models of variational PDEs from data

Christian Offen, Sina Ober-Blöbaum

We show how to learn discrete field theories from observational data of fields on a space-time lattice. For this, we train a neural network model of a discrete Lagrangian density such that the discrete Euler-Lagrange equations are consistent with the given training data. We thus obtain a structure-preserving machine learning architecture. Lagrangian densities are not uniquely defined by the solutions of a field theory. We introduce a technique to derive regularisers for the training process which optimise numerical regularity of the discrete field theory. Minimisation of the regularisers guarantees that, close to the training data, the discrete field theory behaves robustly and efficiently when used in numerical simulations. Further, we show how to identify structurally simple solutions of the underlying continuous field theory, such as travelling waves. This is possible even when travelling waves are not present in the training data. This contrasts with approaches based on data-driven model order reduction, which struggle to identify suitable latent spaces containing structurally simple solutions when these are not present in the training data. Ideas are demonstrated on examples based on the wave equation and the Schrödinger equation.
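To make the consistency condition concrete, here is a minimal sketch under stated assumptions: a hand-written quadratic discrete Lagrangian for the 1D wave equation stands in for the neural network, finite differences stand in for automatic differentiation, and `make_wave_lagrangian`, `del_residual` and `del_loss` are hypothetical names. The loss is the mean squared discrete Euler-Lagrange (DEL) residual on lattice data, which is what fitting a learned Lagrangian would drive to zero; the paper's regularisers are not included.

```python
import math


def make_wave_lagrangian(dt, dx):
    """Hand-written discrete Lagrangian density for u_tt = u_xx (stand-in for a learned one).
    Its arguments are u[n][j], u[n][j+1] and u[n+1][j] of one lattice cell."""
    def lagrangian(a, b, c):
        return 0.5 * ((c - a) / dt) ** 2 - 0.5 * ((b - a) / dx) ** 2
    return lagrangian


def partial(f, args, i, h=1e-5):
    """Central finite-difference derivative of f with respect to argument i."""
    lo, hi = list(args), list(args)
    lo[i] -= h
    hi[i] += h
    return (f(*hi) - f(*lo)) / (2 * h)


def del_residual(L, u, n, j):
    """DEL residual at interior node (n, j): derivative of the discrete action with
    respect to u[n][j], summed over the three cells that contain that node."""
    return (partial(L, (u[n][j], u[n][j + 1], u[n + 1][j]), 0)          # cell (n, j)
            + partial(L, (u[n][j - 1], u[n][j], u[n + 1][j - 1]), 1)    # cell (n, j-1)
            + partial(L, (u[n - 1][j], u[n - 1][j + 1], u[n][j]), 2))   # cell (n-1, j)


def del_loss(L, u):
    """Mean squared DEL residual over interior nodes (the consistency loss)."""
    res = [del_residual(L, u, n, j)
           for n in range(1, len(u) - 1) for j in range(1, len(u[0]) - 1)]
    return sum(r * r for r in res) / len(res)


def sample_on_lattice(f, dt, dx, steps=20, points=20):
    """Lattice values u[n][j] = f(x_j, t_n)."""
    return [[f(j * dx, n * dt) for j in range(points)] for n in range(steps)]


dt = dx = 0.1
L = make_wave_lagrangian(dt, dx)
travelling = sample_on_lattice(lambda x, t: math.sin(x - t), dt, dx)
not_a_solution = sample_on_lattice(lambda x, t: math.sin(x - 2 * t), dt, dx)
print(del_loss(L, travelling))      # essentially zero (round-off only)
print(del_loss(L, not_a_solution))  # clearly positive
```

With Δt = Δx the travelling wave satisfies the DEL equations exactly on the lattice, while the wave with the wrong speed does not. Note that any nonzero multiple of `L` also gives a zero residual on the travelling wave, a small instance of the non-uniqueness that motivates the regularisers.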

Reparameterization · Linear · Biased · MoDELS · Regularization term

August 9, 2023

How to induce regularization in generalized linear models: A guide to reparametrizing gradient flow

Hung-Hsu Chou, Johannes Maly, Dominik Stöger

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias on general linear models, which encompass various basic classification and regression tasks. In particular, we aim to understand the influence of the model parameters (reparametrization, loss, and link function) on the convergence behavior of gradient flow. Our results provide user-friendly conditions under which the implicit bias can be well described and convergence of the flow is guaranteed. We furthermore show how to use these insights to design reparametrization functions that lead to specific implicit biases, such as $\ell_p$ or trigonometric regularizers.
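The effect of a reparametrization on implicit bias can be seen in a classical toy case. This is an illustration of the phenomenon, not the paper's general conditions: one linear equation in two unknowns is fitted by gradient descent, once directly and once through the entrywise reparametrization w = u ⊙ u started from a small initialization. The function names, step sizes and initialization scale are illustrative choices.

```python
def fit_direct(a, y, lr=0.05, steps=5000):
    """Gradient descent on w for L(w) = 0.5 * (a.w - y)^2, starting from w = 0."""
    w = [0.0] * len(a)
    for _ in range(steps):
        r = sum(ai * wi for ai, wi in zip(a, w)) - y
        w = [wi - lr * r * ai for wi, ai in zip(w, a)]
    return w


def fit_quadratic(a, y, init=1e-2, lr=0.01, steps=5000):
    """Same loss, reparametrized as w = u * u (entrywise), starting from small u."""
    u = [init] * len(a)
    for _ in range(steps):
        w = [ui * ui for ui in u]
        r = sum(ai * wi for ai, wi in zip(a, w)) - y
        # Chain rule: dL/du_i = r * a_i * 2 * u_i
        u = [ui - lr * 2.0 * r * ai * ui for ui, ai in zip(u, a)]
    return [ui * ui for ui in u]


a, y = [1.0, 2.0], 2.0       # one equation, two unknowns: w1 + 2 * w2 = 2
w_direct = fit_direct(a, y)
w_quad = fit_quadratic(a, y)
print(w_direct)  # close to the minimum-l2-norm solution [0.4, 0.8]
print(w_quad)    # close to the sparse, minimum-l1-norm solution [0, 1]
```

Plain gradient descent from zero never leaves the row space of `a` and therefore returns the minimum ℓ2-norm interpolant, whereas the quadratic reparametrization with small initialization ends near the sparse solution: the same loss, but a different implicit regularizer selected by the parametrization.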

MoDELS · Minimax · Directed model · Tempering · Performer

August 9, 2023

Heavy-tailed Bayesian nonparametric adaptation

Sergios Agapiou, Ismaël Castillo

We propose a new Bayesian strategy for adaptation to smoothness in nonparametric models based on heavy-tailed series priors. We illustrate it in a variety of settings, showing in particular that the corresponding Bayesian posterior distributions achieve adaptive rates of contraction in the minimax sense (up to logarithmic factors) without the need to sample hyperparameters. Unlike many existing procedures, where a form of direct model (or estimator) selection is performed, the method can be seen as performing a soft selection through the prior tail. In Gaussian regression, such heavy-tailed priors are shown to lead to (near-)optimal simultaneous adaptation both in the $L^2$- and $L^\infty$-sense. Results are also derived for linear inverse problems, for anisotropic Besov classes, and for certain losses in more general models through the use of tempered posterior distributions. We present numerical simulations corroborating the theory.
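To make "heavy-tailed series prior" concrete, the sketch below draws one random function f = Σ_k σ_k ζ_k φ_k on [0, 1]. It assumes a sine basis φ_k, i.i.d. standard Cauchy variables ζ_k, and polynomially decaying scales σ_k = k^(-2); these specific choices, and the names `sample_heavy_tailed_prior` and `evaluate`, are illustrative assumptions rather than the exact priors analysed in the paper.

```python
import math
import random


def sample_heavy_tailed_prior(num_terms, decay, rng):
    """Coefficients f_k = sigma_k * zeta_k with zeta_k i.i.d. standard Cauchy and
    deterministic scales sigma_k = k ** (-decay) (both are illustrative assumptions)."""
    coeffs = []
    for k in range(1, num_terms + 1):
        zeta = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy by inversion
        coeffs.append(k ** (-decay) * zeta)
    return coeffs


def evaluate(coeffs, x):
    """f(x) = sum_k f_k * sqrt(2) * sin(pi * k * x), an orthonormal sine basis on [0, 1]."""
    return sum(c * math.sqrt(2.0) * math.sin(math.pi * k * x)
               for k, c in enumerate(coeffs, start=1))


rng = random.Random(0)
coeffs = sample_heavy_tailed_prior(num_terms=50, decay=2.0, rng=rng)
values = [evaluate(coeffs, i / 100) for i in range(101)]
```

No hyperparameter is drawn here: the scales are fixed. Following the abstract, the adaptation comes from the heavy tails of ζ_k, which allow a few coefficients to be large when the data support it, rather than from an explicit model-selection step.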

Processing (programming language) · MoDELS · Code · Script · INTERACT

August 8, 2023

Flexible and rigorous numerical modelling of multiphysics processes in fractured porous media using PorePy

Ivar Stefansson, Jhabriel Varela, Eirik Keilegavlen, Inga Berre

From arXiv. Run scripts at DOI: 10.5281/zenodo.8211479

The study of multiphysics processes in fractured porous media is a research field of importance for several subsurface applications and has received considerable attention over the last decade. The dynamics are characterised by strong couplings between processes as well as interaction between the processes and the structure of the fractured medium itself. The rich range of behavior calls for explorative mathematical modelling, such as experimentation with constitutive laws and novel coupling concepts between physical processes. Moreover, efficient simulations of the strong couplings between multiphysics processes and geological structures require the development of tailored numerical methods. We present a modelling framework and its implementation in the open-source simulation toolbox PorePy, which is designed for rapid prototyping of multiphysics processes in fractured porous media. PorePy uses a mixed-dimensional representation of the fracture geometry and generally applies fully implicit couplings between processes. The code design follows the paradigms of modularity and differentiable programming, which together allow for extreme flexibility in experimentation with governing equations with minimal changes to the code base. The code integrity is supported by a multilevel testing framework ensuring the reliability of the code. We present our modelling framework within a context of thermo-poroelasticity in deformable fractured porous media, illustrating the close relation between the governing equations and the source code. We furthermore discuss the design of the testing framework and present simulations showcasing the extendibility of PorePy, as well as the type of results that can be produced by mixed-dimensional simulation tools.
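PorePy's own API is not reproduced here. As a small, library-free sketch of the two ideas the abstract combines, fully implicit coupling and differentiable programming, the block below solves a toy two-equation coupled system with Newton's method and obtains the Jacobian by forward-mode automatic differentiation. `Dual`, `coupled_residual`, `residual_and_jacobian` and `solve_2x2` are hypothetical names, and the two equations are arbitrary and carry no physical meaning.

```python
class Dual:
    """Forward-mode AD value: a number together with its gradient w.r.t. the unknowns."""

    def __init__(self, val, grad):
        self.val, self.grad = val, grad

    def _lift(self, other):
        # Plain numbers become constants with a zero gradient
        return other if isinstance(other, Dual) else Dual(float(other), [0.0] * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, [g + h for g, h in zip(self.grad, o.grad)])

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val,
                    [self.val * h + o.val * g for g, h in zip(self.grad, o.grad)])

    __rmul__ = __mul__

    def __sub__(self, other):
        return self + (-1.0) * self._lift(other)

    def __rsub__(self, other):
        return self._lift(other) - self


def coupled_residual(x):
    """Hypothetical toy system: each equation depends on both unknowns (p, T)."""
    p, T = x
    return [p + 0.1 * T * T - 1.0,
            T - 0.5 * p - 0.2]


def residual_and_jacobian(residual, x):
    """Seed each unknown with a unit gradient; the outputs then carry the Jacobian rows."""
    n = len(x)
    seeded = [Dual(x[i], [1.0 if k == i else 0.0 for k in range(n)]) for i in range(n)]
    out = residual(seeded)
    return [r.val for r in out], [r.grad for r in out]


def solve_2x2(J, b):
    """Cramer's rule for the 2 x 2 Newton system J dx = b."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]


x = [0.0, 0.0]
for _ in range(20):  # fully implicit: both equations are updated together in each Newton step
    r, J = residual_and_jacobian(coupled_residual, x)
    dx = solve_2x2(J, [-ri for ri in r])
    x = [xi + di for xi, di in zip(x, dx)]
```

Changing a governing equation only means editing `coupled_residual`; the Jacobian follows automatically, which is the kind of flexibility the abstract attributes to differentiable programming.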

Functional · Approximation · Estimation/Estimator · Cluster · Analysis

August 7, 2023

When rational functions meet virtual elements: The lightning Virtual Element Method

M. L. Trezzi, U. Zerbinati

We propose a lightning Virtual Element Method that eliminates the stabilisation term by actually computing the virtual component of the local VEM basis functions using a lightning approximation. In particular, the lightning VEM approximates the virtual part of the basis functions using rational functions with poles clustered exponentially close to the corners of each element of the polygonal tessellation. This yields two major advantages. First, the mathematical analysis of a priori error estimates is much easier and essentially identical to that for any other non-conforming Galerkin discretisation. Second, because the lightning VEM truly computes the basis functions, the user can access point-wise values of the numerical solution without needing any reconstruction technique. The cost of the local construction of the VEM basis is the implementation price one has to pay for these advantages, but the embarrassingly parallel nature of this operation will ultimately result in a cost-efficient scheme almost comparable to standard VEM and FEM.
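The phrase "poles clustered exponentially close to the corners" can be illustrated with a short sketch. It assumes the tapered exponential spacing exp(-σ(√n − √j)) with σ = 4 used in lightning Laplace solvers, and poles placed along the exterior bisector of one corner of a unit-square element; the paper's exact placement may differ, and `lightning_poles` and `rational_terms` are hypothetical names.

```python
import cmath
import math


def lightning_poles(corner, outward_angle, n, sigma=4.0):
    """n poles outside the element near `corner` (a complex number), placed along the
    direction `outward_angle` at distances exp(-sigma * (sqrt(n) - sqrt(j))), j = 1..n."""
    direction = cmath.exp(1j * outward_angle)
    return [corner + math.exp(-sigma * (math.sqrt(n) - math.sqrt(j))) * direction
            for j in range(1, n + 1)]


def rational_terms(poles):
    """Simple-pole terms z -> 1 / (z - p); a lightning approximation fits a linear
    combination of such terms (plus a low-degree polynomial) by least squares."""
    return [lambda z, p=p: 1.0 / (z - p) for p in poles]


# Unit square [0, 1] x [0, 1]: at the corner z = 0 the exterior bisector has angle -3*pi/4.
poles = lightning_poles(0j, -3 * math.pi / 4, n=10)
distances = [abs(p) for p in poles]
terms = rational_terms(poles)
value_at_centre = sum(t(0.5 + 0.5j) for t in terms)
```

The closest pole sits roughly 2e-4 from the corner and the farthest at distance 1, so the resolution grows exponentially towards the corner where basis functions may be singular. In a full scheme this placement would be repeated at every vertex of every polygonal element.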

Paper

August 6, 2023

The expressive power of revised Datalog on problems with closure properties

Shiguang Feng

In this paper, we study the expressive power of revised Datalog on the problems that are closed under substructures. We show that revised Datalog cannot define all the problems that are in PTIME and closed under substructures. As a corollary, LFP cannot define all the extension-closed problems that are in PTIME.
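For readers unfamiliar with the least-fixed-point (LFP) semantics the corollary refers to, the sketch below evaluates an ordinary Datalog program bottom-up until nothing new can be derived. It is plain Datalog on a hypothetical edge relation; the revised Datalog studied in the paper is not modelled, and `transitive_closure` is an illustrative name.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog program
         path(x, y) :- edge(x, y).
         path(x, y) :- edge(x, z), path(z, y).
    The immediate-consequence operator is applied until a fixed point is reached,
    which is the least fixed point of the program."""
    path = set(edges)
    while True:
        derived = {(x, y) for (x, z) in edges for (z2, y) in path if z == z2}
        new_path = path | derived
        if new_path == path:
            return path
        path = new_path


print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```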

Knowledge · Distillation · Functional · SOFT · ImageNet (dataset)

August 4, 2023

A closer look at the training dynamics of knowledge distillation

Roy Miles, Krystian Mikolajczyk

In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection layers, as key ingredients. We theoretically show that the projector implicitly encodes information on past examples, enabling relational gradients for the student. We then show that the normalisation of representations is tightly coupled with the training dynamics of this projector, which can have a large impact on the student's performance. Finally, we show that a simple soft maximum function can be used to address significant capacity gap problems. Experimental results on various benchmark datasets demonstrate that using these insights can lead to superior or comparable performance to state-of-the-art knowledge distillation techniques, despite being much more computationally efficient. In particular, we obtain these results across image classification (CIFAR100 and ImageNet), object detection (COCO2017), and on more difficult distillation objectives, such as training data-efficient transformers, whereby we attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
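The three ingredients named above can be wired together in a few lines. This is a minimal sketch under stated assumptions: a bias-free linear projector on the student features, batch normalisation without learned scale or shift on both representations, and a temperature-scaled LogSumExp over element-wise absolute gaps as the soft maximum. These are illustrative choices; the exact normalisation and soft maximum function in the paper may differ, and all function names are hypothetical.

```python
import math


def linear_projector(features, weights):
    """Bias-free linear layer applied to each student feature vector: W x."""
    return [[sum(w * x for w, x in zip(row, f)) for row in weights] for f in features]


def batch_norm(batch, eps=1e-5):
    """Normalise each feature dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][k] = (col[i] - mean) / math.sqrt(var + eps)
    return out


def soft_maximum(values, temperature=1.0):
    """Temperature-scaled LogSumExp: a smooth upper bound on max(values)."""
    m = max(values)
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def distillation_loss(student, teacher, projector_weights, temperature=1.0):
    """Soft maximum of the gaps between normalised projected student features
    and normalised teacher features."""
    z_s = batch_norm(linear_projector(student, projector_weights))
    z_t = batch_norm(teacher)
    gaps = [abs(x - y) for rs, rt in zip(z_s, z_t) for x, y in zip(rs, rt)]
    return soft_maximum(gaps, temperature)


student = [[1.0, 2.0], [3.0, 1.0], [0.0, 0.5]]   # batch of 3 two-dimensional features
teacher = [[0.0, 1.0], [2.0, 2.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical projector weights
loss = distillation_loss(student, teacher, identity)
```

Because LogSumExp lies between the largest gap and the largest gap plus T·log(number of elements), the loss concentrates on the worst-matched features while staying smooth in them.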

MoDELS · Networking · Cluster · Better · Origin

August 4, 2023

Revisiting small-world network models: Exploring technical realizations and the equivalence of the Newman-Watts and Harary models

Seora Son, Eun Ji Choi, Sang Hoon Lee

From arXiv; 11 pages, 5 figures, 1 table.

We address relatively little-known facts about the equivalence and technical realizations of two network models showing the "small-world" property, namely the Newman-Watts and the Harary models. We provide the most accurate (in terms of faithfulness to the original literature) versions of these models to clarify how the variants adopted in one of the most popular network analysis packages deviate from them. The differences in technical realization could be seen as minor details, but we find notable changes caused by these possibly inadvertent modifications. For the Harary model, the stochasticity in the original formulation allows a much wider range of the clustering coefficient and the average shortest path length. For the Newman-Watts model, the drastically different degree distributions also affect the clustering coefficient, which we verify with a higher-order analytic derivation. In the process, we discover that the Newman-Watts model (better known in the network science and physics communities) and the Harary model (better known in the graph theory and mathematics communities) are equivalent under a specific condition of restricted parity in the variables, which bridges two models developed relatively independently in different fields. Our result highlights the importance of each detailed step in constructing network models and the possibility that models are deeply related even if they initially appear distinct in terms of the time period or the academic disciplines from which they emerged.
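The kind of realization detail at stake can be seen in a small Newman-Watts-style sketch: a ring lattice plus random shortcuts, with the local clustering coefficient measured directly. How shortcuts are drawn (here: one attempt per lattice edge with probability p, between a uniformly random pair of distinct, not-yet-adjacent nodes) is an assumption of this sketch, not necessarily the original formulation identified in the paper; it reproduces neither the network package's variant nor the Harary model, and all names are hypothetical.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbours (k even, k < n)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def add_shortcuts(adj, k, p, rng):
    """For each of the n*k/2 lattice edges, with probability p add one shortcut between a
    random pair of distinct, non-adjacent nodes (assumed rule; realizations differ here)."""
    n = len(adj)
    for _ in range(n * k // 2):
        if rng.random() < p:
            while True:
                v, w = rng.sample(range(n), 2)
                if w not in adj[v]:
                    adj[v].add(w)
                    adj[w].add(v)
                    break
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


lattice = ring_lattice(100, 4)
print(average_clustering(lattice))  # 0.5, i.e. 3(k-2)/(4(k-1)) for k = 4
graph = add_shortcuts(ring_lattice(100, 4), k=4, p=0.1, rng=random.Random(1))
print(edge_count(graph), average_clustering(graph))
```

Swapping the shortcut rule (allowing duplicate edges, or rewiring instead of adding) changes the degree distribution and therefore the clustering coefficient, which is the sensitivity the abstract describes.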

Image classification · Feedforward network · INTERACT · Networking · Feedforward

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code, based on the Timm library, along with pre-trained models.
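The two alternating sublayers described above can be sketched in plain Python. Assumptions: the input is N patch embeddings with d channels, `A`, `B` and `C` are hypothetical weight matrices, the feed-forward sublayer uses a GELU nonlinearity, and ResMLP's Affine normalisation layers are omitted (treated as identity). This is not the authors' Timm-based implementation.

```python
import math
import random


def gelu(x):
    """Exact (erf-based) GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul(a, b):
    """Matrix product of nested lists: (m x k) times (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(x, y):
    """Element-wise sum of two equally shaped matrices (the residual connection)."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def resmlp_block(x, A, B, C):
    """One residual block on x (N patches x d channels).
    (i)  cross-patch sublayer: A (N x N) mixes patches with the same weights for every channel;
    (ii) cross-channel sublayer: B (d x h), GELU, C (h x d), applied to each patch separately."""
    z = add(x, matmul(A, x))                                   # x + A x
    hidden = [[gelu(v) for v in row] for row in matmul(z, B)]  # per-patch hidden layer
    return add(z, matmul(hidden, C))                           # z + GELU(z B) C


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, d, h = 4, 3, 6  # patches, channels, hidden width (illustrative sizes)
x = random_matrix(N, d, rng)
y = resmlp_block(x, random_matrix(N, N, rng), random_matrix(d, h, rng), random_matrix(h, d, rng))
```

Sublayer (i) multiplies from the left, so only patches interact; sublayer (ii) multiplies from the right, so only channels within one patch interact. With all weights set to zero the block reduces to the identity through its residual connections.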

Language modeling · MoDELS · IR · Likelihood · Masked language modeling

October 20, 2020

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng

From arXiv; accepted at WSDM 2021.

Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks, including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as a piece of text representative of the "ideal" document. Based on this idea, we construct the Representative wOrds Prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP can achieve strong performance in both zero- and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
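The ROP sampling step can be sketched as follows, under stated assumptions: a Dirichlet-smoothed unigram document language model with μ = 10, two word sets of equal fixed size sampled with replacement, and the query-likelihood log-probability as the representativeness score. The abstract does not specify the smoothing, set sizes or sampling details, and `dirichlet_lm`, `log_likelihood` and `sample_rop_pair` are hypothetical names. The Transformer that learns the pairwise preference is not sketched.

```python
import math
import random
from collections import Counter


def dirichlet_lm(doc_tokens, collection_tokens, mu=10.0):
    """Dirichlet-smoothed unigram document language model P(w | d)."""
    doc_counts, coll_counts = Counter(doc_tokens), Counter(collection_tokens)
    n, total = len(doc_tokens), len(collection_tokens)
    return lambda w: (doc_counts[w] + mu * coll_counts[w] / total) / (n + mu)


def log_likelihood(words, prob):
    """Query-likelihood score: log P(word set | d) under the unigram model."""
    return sum(math.log(prob(w)) for w in words)


def sample_rop_pair(doc_tokens, collection_tokens, size, rng):
    """Sample two word sets from the document model and return them ordered as
    (more representative, less representative) by likelihood."""
    prob = dirichlet_lm(doc_tokens, collection_tokens)
    vocab = sorted(set(collection_tokens))
    weights = [prob(w) for w in vocab]
    first = rng.choices(vocab, weights=weights, k=size)
    second = rng.choices(vocab, weights=weights, k=size)
    if log_likelihood(first, prob) >= log_likelihood(second, prob):
        return first, second
    return second, first


collection = ("neural ranking model for ad hoc retrieval "
              "query likelihood model for document retrieval "
              "pre training language model").split()
doc = "query likelihood model for document retrieval".split()
positive, negative = sample_rop_pair(doc, collection, size=3, rng=random.Random(0))
```

During pre-training, a model would then be trained to score `positive` above `negative` for `doc`, jointly with the MLM objective.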