亚洲AV永久无码精品九之,一区二区三区免费观看在线视频播放

In this paper, we provide a comprehensive toolbox for understanding and enhancing self-supervised learning (SSL) methods through the lens of matrix information theory. Specifically, by leveraging the principles of matrix mutual information and joint entropy, we offer a unified analysis for both contrastive and feature decorrelation based methods. Furthermore, we propose the matrix variational masked auto-encoder (M-MAE) method, grounded in matrix information theory, as an enhancement to masked image modeling. The empirical evaluations underscore the effectiveness of M-MAE compared with the state-of-the-art methods, including a 3.9% improvement in linear probing ViT-Base, and a 1% improvement in fine-tuning ViT-Large, both on ImageNet.

相關內容

INFORMS

關注 10

《計算機信息》雜志發表高質量的論文，擴大了運籌學和計算的范圍，尋求有關理論、方法、實驗、系統和應用方面的原創研究論文、新穎的調查和教程論文，以及描述新的和有用的軟件工具的論文。官網鏈接： · 近似 · MoDELS · 泛函 · PAC學習理論 ·

2023 年 11 月 14 日

Scenario Approach for Parametric Markov Models

Ying Liu,Andrea Turrini,Moritz Hahn,Bai Xue,Lijun Zhang

from arxiv, 24 pages, 8 figures; updated to add acknowledgements and data availability

In this paper, we propose an approximating framework for analyzing parametric Markov models. Instead of computing complex rational functions encoding the reachability probability and the reward values of the parametric model, we exploit the scenario approach to synthesize a relatively simple polynomial approximation. The approximation is probably approximately correct (PAC), meaning that with high confidence, the approximating function is close to the actual function with an allowable error. With the PAC approximations, one can check properties of the parametric Markov models. We show that the scenario approach can also be used to check PRCTL properties directly, without synthesizing the polynomial at first hand. We have implemented our algorithm in a prototype tool and conducted thorough experiments. The experimental results demonstrate that our tool is able to compute polynomials for more benchmarks than state of the art tools such as PRISM and Storm, confirming the efficacy of our PAC-based synthesis.

MoDELS · 卷積 · 講稿 · 論文 · 數值分析 ·

2023 年 11 月 11 日

High-Order Numerical Method for 1D Non-local Diffusive Equation

D. Do,H. Nick Zinat Matin,M. L. Delle Monache

from arxiv, 17 pages and 8 figures

In this paper we present a non-local numerical scheme based on the Local Discontinuous Galerkin method for a non-local diffusive partial differential equation with application to traffic flow. In this model, the velocity is determined by both the average of the traffic density as well as the changes in the traffic density at a neighborhood of each point. We discuss nonphysical behaviors that can arise when including diffusion, and our measures to prevent them in our model. The numerical results suggest that this is an accurate method for solving this type of equation and that the model can capture desired traffic flow behavior. We show that computation of the non-local convolution results in $\mathcal{O}(n^2)$ complexity, but the increased computation time can be mitigated with high-order schemes like the one proposed.

INTERACT · 優化器 · 參數空間 · Better · 泛函 ·

2023 年 11 月 10 日

RIGA: A Regret-Based Interactive Genetic Algorithm

Nawal Benabbou,Cassandre Leroy,Thibaut Lust

In this paper, we propose an interactive genetic algorithm for solving multi-objective combinatorial optimization problems under preference imprecision. More precisely, we consider problems where the decision maker's preferences over solutions can be represented by a parameterized aggregation function (e.g., a weighted sum, an OWA operator, a Choquet integral), and we assume that the parameters are initially not known by the recommendation system. In order to quickly make a good recommendation, we combine elicitation and search in the following way: 1) we use regret-based elicitation techniques to reduce the parameter space in a efficient way, 2) genetic operators are applied on parameter instances (instead of solutions) to better explore the parameter space, and 3) we generate promising solutions (population) using existing solving methods designed for the problem with known preferences. Our algorithm, called RIGA, can be applied to any multi-objective combinatorial optimization problem provided that the aggregation function is linear in its parameters and that a (near-)optimal solution can be efficiently determined for the problem with known preferences. We also study its theoretical performances: RIGA can be implemented in such way that it runs in polynomial time while asking no more than a polynomial number of queries. The method is tested on the multi-objective knapsack and traveling salesman problems. For several performance indicators (computation times, gap to optimality and number of queries), RIGA obtains better results than state-of-the-art algorithms.

MoDELS · 剪枝 · 穩健性 · Networking · Neural Networks ·

2023 年 11 月 10 日

Preemptively Pruning Clever-Hans Strategies in Deep Neural Networks

Lorenz Linhardt,Klaus-Robert Müller,Grégoire Montavon

from arxiv, 18 pages + supplement

Robustness has become an important consideration in deep learning. With the help of explainable AI, mismatches between an explained model's decision strategy and the user's domain knowledge (e.g. Clever Hans effects) have been identified as a starting point for improving faulty models. However, it is less clear what to do when the user and the explanation agree. In this paper, we demonstrate that acceptance of explanations by the user is not a guarantee for a machine learning model to be robust against Clever Hans effects, which may remain undetected. Such hidden flaws of the model can nevertheless be mitigated, and we demonstrate this by contributing a new method, Explanation-Guided Exposure Minimization (EGEM), that preemptively prunes variations in the ML model that have not been the subject of positive explanation feedback. Experiments demonstrate that our approach leads to models that strongly reduce their reliance on hidden Clever Hans strategies, and consequently achieve higher accuracy on new data.

估計/估計量 · contrastive · INFORMS · 互信息 · 表示學習 ·

2021 年 6 月 25 日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Alessandro Sordoni,Nouha Dziri,Hannes Schulz,Geoff Gordon,Phil Bachman,Remi Tachet

from arxiv, ICML 2021

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.

Performer · Extensibility · 聯邦學習 · 相似度 · 成對型 ·

2021 年 1 月 7 日

Personalized Cross-Silo Federated Learning on Non-IID Data

Yutao Huang,Lingyang Chu,Zirui Zhou,Lanjun Wang,Jiangchuan Liu,Jian Pei,Yong Zhang

from arxiv, Accepted by AAAI 2021. The API of this work is available at Huawei Cloud (//t.ly/nGN9), free registration is required before use

Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.

元學習 · 語音識別 · MAML · 學成 · 端到端 ·

2019 年 10 月 26 日

Meta Learning for End-to-End Low-Resource Speech Recognition

Jui-Yang Hsu,Yuan-Jui Chen,Hung-yi Lee

from arxiv, 5 pages, submitted to ICASSP 2020

In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.

小樣本學習 · 邊緣化 · 損失函數（機器學習） · 學成 · Principle ·

2018 年 7 月 8 日

Large Margin Few-Shot Learning

Yong Wang,Xiao-Ming Wu,Qimai Li,Jiatao Gu,Wangmeng Xiang,Lei Zhang,Victor O. K. Li

from arxiv, 17 pages, 6 figures

The key issue of few-shot learning is learning to generalize. In this paper, we propose a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the softmax classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning models, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.

注意力機制 · 機器閱讀理解 · Extensibility · state-of-the-art · MoDELS ·

2018 年 4 月 25 日

Reinforced Mnemonic Reader for Machine Reading Comprehension

Minghao Hu,Yuxing Peng,Zhen Huang,Xipeng Qiu,Furu Wei,Ming Zhou

from arxiv, Published in 26th International Joint Conference on Artificial Intelligence (IJCAI), 2018

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing to past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages to predict a more acceptable answer so as to address the convergence suppression problem occurred in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

MoDELS · 注意力機制 · RNN · 標注 · Networking ·

2017 年 12 月 20 日

Order-Free RNN with Visual Attention for Multi-Label Classification

Shang-Fu Chen,Yi-Chen Chen,Chih-Kuan Yeh,Yu-Chiang Frank Wang

from arxiv, Accepted at 32nd AAAI Conference on Artificial Intelligence (AAAI-18)

In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.