In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection layers as key ingredients. We theoretically show that the projector implicitly encodes information on past examples, enabling relational gradients for the student. We then show that the normalisation of representations is tightly coupled with the training dynamics of this projector, which can have a large impact on the student's performance. Finally, we show that a simple soft maximum function can be used to address any significant capacity gap problems. Experimental results on various benchmark datasets demonstrate that using these insights can lead to superior or comparable performance to state-of-the-art knowledge distillation techniques, despite being much more computationally efficient. In particular, we obtain these results across image classification (CIFAR100 and ImageNet), object detection (COCO2017), and on more difficult distillation objectives, such as training data efficient transformers, whereby we attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
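The three ingredients named in the abstract (a projection layer, normalisation of the representations, and a soft maximum) can be combined as in the minimal NumPy sketch below. It is an illustration under stated assumptions, not the authors' implementation: the linear projector, the per-feature batch standardisation, the LogSumExp form of the soft maximum, and all function names are assumptions made for this example.

```python
import numpy as np

def standardise(z, eps=1e-6):
    """Normalise each feature to zero mean and unit variance over the batch (assumed scheme)."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

def soft_max_distance(a, b, temperature=1.0):
    """LogSumExp over absolute differences: a smooth upper bound on the largest gap."""
    d = np.abs(a - b).ravel() / temperature
    m = d.max()  # subtract the maximum for numerical stability
    return temperature * (m + np.log(np.exp(d - m).sum()))

def distillation_loss(student_feats, teacher_feats, projector):
    """Project student features into the teacher's space, normalise both, compare."""
    projected = student_feats @ projector  # hypothetical linear projector
    return soft_max_distance(standardise(projected), standardise(teacher_feats))

# Toy usage with random features: batch of 8, student width 4, teacher width 6.
rng = np.random.default_rng(0)
student = rng.normal(size=(8, 4))
teacher = rng.normal(size=(8, 6))
projector = rng.normal(size=(4, 6))
loss = distillation_loss(student, teacher, projector)
```

In a real training loop the projector would be a learnable layer updated jointly with the student; here it is a fixed random matrix only so that the sketch runs on its own.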

Related content

Knowledge

The understanding, judgment, or skills acquired through learning, practice, or exploration.

INFORMS · Question answering · Information extraction · MoDELS · Performer

September 26, 2023

Fine-tuning and aligning question answering models for complex information extraction tasks

Matthias Engelbach, Dennis Klau, Felix Scheerer, Jens Drawehn, Maximilien Kintz

from arxiv, Accepted at: 15th International Conference on Knowledge Discovery and Information Retrieval (KDIR 2023), part of IC3K

The emergence of Large Language Models (LLMs) has boosted performance and possibilities in various NLP tasks. While the usage of generative AI models like ChatGPT opens up new opportunities for several business use cases, their current tendency to hallucinate fake content strongly limits their applicability to document analysis, such as information retrieval from documents. In contrast, extractive language models like question answering (QA) or passage retrieval models guarantee query results to be found within the boundaries of a corresponding context document, which makes them candidates for more reliable information extraction in productive environments of companies. In this work we propose an approach that uses and integrates extractive QA models for improved feature extraction of German business documents such as insurance reports or medical leaflets into a document analysis solution. We further show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features like damage cause explanations or descriptions of medication appearance, even when using only a small set of annotated data. Finally, we discuss the relevance of scoring metrics for evaluating information extraction tasks and deduce a combined metric from Levenshtein distance, F1-Score, Exact Match and ROUGE-L to mimic the assessment criteria from human experts.
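To make the idea of a combined metric concrete, the sketch below computes the four named ingredients for one prediction/reference pair and averages them. The abstract does not say how the paper combines or weights the components, so the equal weights, the whitespace tokenisation, the normalisation of the Levenshtein distance to a similarity, and every function name here are assumptions for illustration only.

```python
from collections import Counter

def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lcs_len(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def f_measure(overlap, n_pred, n_ref):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)

def combined_score(pred, ref, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted mean of four similarity scores in [0, 1] (weights are an assumption)."""
    p_tok, r_tok = pred.split(), ref.split()
    lev_sim = 1 - levenshtein(pred, ref) / max(len(pred), len(ref), 1)
    token_overlap = sum((Counter(p_tok) & Counter(r_tok)).values())
    f1 = f_measure(token_overlap, len(p_tok), len(r_tok))
    exact = float(pred == ref)
    rouge_l = f_measure(lcs_len(p_tok, r_tok), len(p_tok), len(r_tok))
    return sum(w * s for w, s in zip(weights, (lev_sim, f1, exact, rouge_l)))
```

A call such as `combined_score(predicted_span, annotated_span)` returns a value between 0 and 1; fitting the weights against expert judgments, rather than fixing them, would be one way to approach the human-like assessment the abstract describes.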

Controller · Learning · Learner · Robot · Robustness

September 25, 2023

A comparison of controller architectures and learning mechanisms for arbitrary robot morphologies

Jie Luo, Jakub Tomczak, Karine Miras, Agoston E. Eiben

The main question this paper addresses is: What combination of a robot controller and a learning method should be used, if the morphology of the learning robot is not known in advance? Our interest is rooted in the context of morphologically evolving modular robots, but the question is also relevant in general, for system designers interested in widely applicable solutions. We perform an experimental comparison of three controller-and-learner combinations: one approach where controllers are based on modelling animal locomotion (Central Pattern Generators, CPG) and the learner is an evolutionary algorithm, a completely different method using Reinforcement Learning (RL) with a neural network controller architecture, and a combination 'in-between' where controllers are neural networks and the learner is an evolutionary algorithm. We apply these three combinations to a test suite of modular robots and compare their efficacy, efficiency, and robustness. Surprisingly, the usual CPG-based and RL-based options are outperformed by the in-between combination that is more robust and efficient than the other two setups.

Boosting (a method for accelerating model training) · PageRank · Scenario · Continuity · Principle

September 25, 2023

On algorithmically boosting fixed-point computations

Ioannis Avramopoulos, Nikolaos Vasiloglou

The main topic of this paper is algorithms for computing Nash equilibria. We cast our particular methods as instances of a general algorithmic abstraction, namely, a method we call algorithmic boosting, which is also relevant to other fixed-point computation problems. Algorithmic boosting is the principle of computing fixed points by taking (long-run) averages of iterated maps and it is a generalization of exponentiation. We first define our method in the setting of nonlinear maps. Secondly, we restrict attention to convergent linear maps (for computing dominant eigenvectors, for example, in the PageRank algorithm) and show that our algorithmic boosting method can set in motion exponential speedups in the convergence rate. Thirdly, we show that algorithmic boosting can convert a (weak) non-convergent iterator to a (strong) convergent one. We also consider a variational approach to algorithmic boosting providing tools to convert a non-convergent continuous flow to a convergent one. Then, by embedding the construction of averages in the design of the iterated map, we constructively prove the existence of Nash equilibria (and, therefore, Brouwer fixed points). We then discuss implementations of averaging and exponentiation, an important matter even for the scalar case. We finally discuss a relationship between dominant (PageRank) eigenvectors and Nash equilibria.
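The claim that averaging can turn a non-convergent iterator into a convergent one can be seen in a toy case. The sketch below uses plain running (Cesàro) averages of a linear map, which is only a simple stand-in for the idea and not the paper's boosting construction; the two-state periodic matrix `P` and the helper name `averaged_iterates` are assumptions chosen for this example.

```python
import numpy as np

def averaged_iterates(step, x0, n_steps):
    """Apply `step` repeatedly; return the last iterate and the mean of the first n_steps iterates."""
    x = np.asarray(x0, dtype=float)
    total = np.zeros_like(x)
    for _ in range(n_steps):
        total += x
        x = step(x)
    return x, total / n_steps

# A periodic stochastic matrix: the iteration x <- P x swaps the two entries forever.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

last, avg = averaged_iterates(lambda v: P @ v, [1.0, 0.0], 1000)
# `last` is still a corner of the simplex, while `avg` sits at a fixed point of P.
```

The raw iterates oscillate between the two corners and never settle, whereas their average lands on the uniform vector, which `P` maps to itself.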

Learning · Channel · Separable · Precision/Accuracy · Feedforward

September 23, 2023

Tight bounds on Pauli channel learning without entanglement

Senrui Chen, Changhun Oh, Sisi Zhou, Hsin-Yuan Huang, Liang Jiang

from arxiv, 22 pages, 1 figure. Comments welcome!

Entanglement is a useful resource for learning, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize separable states, measurements, and operations between the main system of interest and an ancillary system. These algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. We prove a tight lower bound for learning Pauli channels without entanglement that closes a cubic gap between the best-known upper and lower bounds. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ rounds of measurements. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for characterizing Pauli noise.

Mutually independent · Conditionally independent · Feature selection · Invariance · MoDELS

September 22, 2023

Model-based causal feature selection for general response types

Lucas Kook, Sorawit Saengkyongam, Anton Rask Lundborg, Torsten Hothorn, Jonas Peters

from arxiv, Code available at //github.com/LucasKook/tramicp.git

Discovering causal relationships from observational data is a fundamental yet challenging task. In some applications, it may suffice to learn the causal features of a given response variable, instead of learning the entire underlying causal structure. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings. ICP assumes that the mechanism for generating the response from its direct causes is the same in all settings and exploits this invariance to output a subset of the causal features. The framework of ICP has been extended to general additive noise models and to nonparametric settings using conditional independence testing. However, nonparametric conditional independence testing often suffers from low power (or poor type I error control) and the aforementioned parametric models are not suitable for applications in which the response is not measured on a continuous scale, but rather reflects categories or counts. To bridge this gap, we develop ICP in the context of transformation models (TRAMs), allowing for continuous, categorical, count-type, and uninformatively censored responses (we show that, in general, these model classes do not allow for identifiability when there is no exogenous heterogeneity). We propose TRAM-GCM, a test for invariance of a subset of covariates, based on the expected conditional covariance between environments and score residuals which satisfies uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we propose an additional invariance test, TRAM-Wald, based on the Wald statistic. We implement both proposed methods in the open-source R package "tramicp" and show in simulations that under the correct model specification, our approach empirically yields higher power than nonparametric ICP based on conditional independence testing.

Functional · Conjugate · Precision/Accuracy · Less · Approximation

September 21, 2023

Harmonic functions on finitely-connected tori

Chiu-Yen Kao, Braxton Osting, Édouard Oudet

from arxiv, 19 pages, 12 figures

In this paper, we prove a Logarithmic Conjugation Theorem on finitely-connected tori. The theorem states that a harmonic function can be written as the real part of a function whose derivative is analytic and a finite sum of terms involving the logarithm of the modulus of a modified Weierstrass sigma function. We implement the method using arbitrary precision and use the result to find approximate solutions to the Laplace problem and Steklov eigenvalue problem. Using a posteriori estimation, we show that the solution of the Laplace problem on a torus with a few circular holes has error less than $10^{-100}$ using a few hundred degrees of freedom and the Steklov eigenvalues have similar error.

Knowledge · INFORMS · Language representation · MoDELS · Extensibility

July 28, 2022

MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base

Hui Li, Xuekang Yang, Xin Zhao, Lin Yu, Jiping Zheng, Wei Sun

from arxiv, 11 pages, 6 figures

Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, the factual information contained in the input sentences has not been fully mined, and the external knowledge to be injected has not been strictly checked. As a result, the context information cannot be fully exploited, and either extra noise is introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu and introduces a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.

Microsoft Surface · Neural Networks · Networking · MoDELS · Loss function (machine learning)

May 28, 2021

Incorporating prior financial domain knowledge into neural networks for implied volatility surface prediction

Yu Zheng, Yongxin Yang, Bowei Chen

from arxiv, 8 pages, SIGKDD 2021

In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed, which is used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies which discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmarked models with the option data on the S&P 500 index over 20 years. More importantly, the domain knowledge is satisfied empirically, showing the model is consistent with the existing financial theories and conditions related to the implied volatility surface.

Neural Networks · Parse · Networking · Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute · Parse tree

February 25, 2021

How to represent part-whole hierarchies in a neural network

Geoffrey Hinton

from arxiv, 43 pages, 5 figures

This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.

contrastive · Transformation · Learned · Discriminator · Performer

December 9, 2020

Contrastive Transformation for Self-supervised Correspondence Learning

Ning Wang, Wengang Zhou, Houqiang Li

from arxiv, To appear in AAAI 2021

In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The intra-video learning transforms the image contents across frames within a single video via the frame pair-wise affinity. To obtain the discriminative representation for instance-level separation, we go beyond the intra-video analysis and construct the inter-video affinity to facilitate the contrastive transformation across different videos. By forcing the transformation consistency between intra- and inter-video levels, the fine-grained correspondence associations are well preserved and the instance-level feature discrimination is effectively reinforced. Our simple framework outperforms the recent self-supervised correspondence methods on a range of visual tasks including video object tracking (VOT), video object segmentation (VOS), pose keypoint tracking, etc. It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).