
Transfer learning is known to perform well empirically in many applications, yet little literature reports the mechanism behind the scenes. This study establishes both formal derivations and heuristic analysis to formulate a theory of transfer learning in deep learning. Our framework, which uses layer variational analysis, proves that the success of transfer learning can be guaranteed under corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations of the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation. It is particularly advantageous when new-domain data is sufficiently sparse during adaptation. Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method.

Related content


Analysis · Identifiable · Conjugate gradient · Normal equation · Reducible

June 23, 2023

Algorithmic analysis towards time-domain extended source waveform inversion

Pengliang Yang,Wei Zhou

Full waveform inversion (FWI) updates the subsurface model from an initial model by comparing observed and synthetic seismograms. Due to high nonlinearity, FWI is easily trapped in local minima. Extended-domain FWI, including wavefield reconstruction inversion (WRI) and extended source waveform inversion (ESI), is an attractive option to mitigate this issue. This paper makes an in-depth analysis of FWI in the extended domain, identifying key challenges and searching for potential remedies towards practical applications. WRI and ESI are formulated within the same mathematical framework using the Lagrangian-based adjoint-state method, with a special focus on the time-domain formulation using extended sources, while drawing connections between classical FWI, WRI and ESI: both WRI and ESI can be viewed as weighted versions of classical FWI. Because the Hessian is symmetric positive definite, the conjugate gradient method is explored to solve the normal equation efficiently in a matrix-free manner, while both time- and frequency-domain wave equation solvers are feasible. This study finds that the most significant challenge comes from the huge storage demand of keeping time-domain wavefields through iterations. To resolve this challenge, two possible workaround strategies can be considered: extracting sparse frequency-domain wavefields, or working with time-domain data instead of wavefields. We suggest that these options should be explored more intensively for tractable workflows.
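
The matrix-free conjugate gradient solve of a normal equation mentioned in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the names conjugate_gradient, matvec and solve_normal_equation are hypothetical, and a small dense matrix J stands in for the wave-equation operators, which in practice would be applied by a solver rather than stored.

```python
def matvec(M, v):
    """Dense matrix-vector product; stands in for an operator application."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A.

    Only the action v -> A v is needed (apply_A), so A is never formed.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero start
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm is small
            break
        Ap = apply_A(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def solve_normal_equation(J, d):
    """Solve (J^T J) x = J^T d without building J^T J explicitly."""
    Jt = [list(col) for col in zip(*J)]
    return conjugate_gradient(lambda v: matvec(Jt, matvec(J, v)), matvec(Jt, d))


# Toy usage: the data d were generated from x = [1, 2].
J = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
d = [1.0, 4.0, 3.0]
x = solve_normal_equation(J, d)
```

Passing the operator as a function is what makes the solve matrix-free: each iteration costs one application of J and one of its transpose, and only a handful of vectors are kept in memory.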

INFORMS · MASS · SimPLe · LISA · Model evaluation

June 22, 2023

Hyperboloidal discontinuous time-symmetric numerical algorithm with higher order jumps for gravitational self-force computations in the time domain

Lidia J. Gomes Da Silva,Rodrigo Panosso Macedo,Jonathan E. Thompson,Juan A. Valiente Kroon,Leanne Durkan,Oliver Long

Within the next decade the Laser Interferometer Space Antenna (LISA) is due to be launched, providing the opportunity to extract physics from stellar objects and systems, such as Extreme Mass Ratio Inspirals (EMRIs), otherwise undetectable to ground-based interferometers and Pulsar Timing Arrays (PTA). Unlike previous sources detected by the currently available observational methods, these sources can only be simulated using an accurate computation of the gravitational self-force. Whereas the field has seen outstanding progress in the frequency domain, metric reconstruction and self-force calculations are still an open challenge in the time domain. Such computations would not only further corroborate frequency-domain calculations and models, but also allow for full self-consistent evolution of the orbit under the effect of the self-force. Given that we have a priori information about the local structure of the discontinuity at the particle, we will show how to construct discontinuous spatial and temporal discretisations by operating on discontinuous Lagrange and Hermite interpolation formulae and hence recover higher order accuracy. In this work we demonstrate how this technique, in conjunction with a well-suited gauge choice (hyperboloidal slicing) and numerical methods (discontinuous collocation with time-symmetric integration), can provide a relatively simple method-of-lines numerical algorithm for the problem. This is the first of a series of papers studying the behaviour of a point particle prescribing circular geodesic motion in Schwarzschild in the time domain. In this work we describe the numerical machinery necessary for these computations and show that our approach is not only capable of highly accurate flux radiation measurements but is also suitable for evaluating the necessary field and its derivatives at the particle limit.

Learning rate · Networking · Neural Networks · Learning · Variance

June 22, 2023

Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

Xin Yuan,Pedro Savarese,Michael Maire

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, and maintains the inference functionality of the network. To address the optimization difficulty resulting from imbalanced training effort distributed to subnetworks fading in at different growth phases, we propose a learning rate adaptation mechanism that rebalances the gradient contribution of these separate subcomponents. Experimental results show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original computation budget for training. We demonstrate that these gains translate into real wall-clock training speedups.

Learning · Knowledge · Transfer learning · Identifiable · MoDELS

June 21, 2023

Introspective Action Advising for Interpretable Transfer Learning

Joseph Campbell,Yue Guo,Fiona Xie,Simon Stepputtis,Katia Sycara

from arxiv, Accepted to CoLLAs 2023

Transfer learning can be applied in deep reinforcement learning to accelerate the training of a policy in a target task by transferring knowledge from a policy learned in a related source task. This is commonly achieved by copying pretrained weights from the source policy to the target policy prior to training, under the constraint that they use the same model architecture. However, not only does this require a robust representation learned over a wide distribution of states -- often failing to transfer between specialist models trained over single tasks -- but it is largely uninterpretable and provides little indication of what knowledge is transferred. In this work, we propose an alternative approach to transfer learning between tasks based on action advising, in which a teacher trained in a source task actively guides a student's exploration in a target task. Through introspection, the teacher is capable of identifying when advice is beneficial to the student and should be given, and when it is not. Our approach allows knowledge transfer between policies agnostic of the underlying representations, and we empirically show that this leads to improved convergence rates in Gridworld and Atari environments while providing insight into what knowledge is transferred.

Minimax · Performer · MoDELS · Loss · Downstream task

June 21, 2023

Task-Robust Pre-Training for Worst-Case Downstream Adaptation

Jianghui Wang,Cheng Yang,Xingyu Xie,Cong Fang,Zhouchen Lin

Pre-training has achieved remarkable success when transferred to downstream tasks. In machine learning, we care about not only the good performance of a model but also its behavior under reasonable shifts of condition. The same philosophy holds when pre-training a foundation model. However, the foundation model may not behave uniformly well across a series of related downstream tasks. This happens, for example, in mask recovery regression when the recovery ability or the training instances diverge: pattern features are extracted dominantly during pre-training, but semantic features are also required on a downstream task. This paper considers pre-training a model that guarantees uniformly good performance over the downstream tasks. We call this goal downstream-task robustness. Our method first separates the upstream task into several representative ones and applies a simple minimax loss for pre-training. We then design an efficient algorithm to solve the minimax loss and prove its convergence in the convex setting. In the experiments, we show on both large-scale natural language processing and computer vision datasets that our method improves the metrics on worst-case downstream tasks. Additionally, some theoretical explanations for why our loss is beneficial are provided. Specifically, we show that fewer samples are inherently required for the most challenging downstream task in some cases.

Learning · Knowledge · Class · Feature extractor · MoDELS

June 21, 2023

Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning

Depeng Li,Zhigang Zeng

from arxiv, 13 pages, 4 figures. Under review

In the scenario of class-incremental learning (CIL), deep neural networks have to adapt their model parameters to non-stationary data distributions, e.g., the emergence of new classes over time. However, CIL models are challenged by the well-known catastrophic forgetting phenomenon. Typical methods such as rehearsal-based ones rely on storing exemplars of old classes to mitigate catastrophic forgetting, which limits real-world applications considering memory resources and privacy issues. In this paper, we propose a novel rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks. Our approach involves jointly optimizing a plastic CNN feature extractor and an analytical feed-forward classifier. The inaccessibility of historical data is tackled by holistically controlling the parameters of a well-trained model, ensuring that the decision boundary learned fits new classes while retaining recognition of previously learned classes. Specifically, the trainable CNN feature extractor provides task-dependent knowledge separately without interference; and the final classifier integrates task-specific knowledge incrementally for decision-making without forgetting. In each CIL session, it accommodates new tasks by attaching a tiny set of declarative parameters to its backbone, in which only one matrix per task or one vector per class is kept for knowledge retention. Extensive experiments on a variety of task sequences show that our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness. Furthermore, to make the non-growing backbone (i.e., a model with limited network capacity) suffice to train on more incoming tasks, a graceful forgetting implementation on previously learned trivial tasks is empirically investigated.

Knowledge · Graph · Mathematics · Representation · Knowledge graph

November 7, 2022

Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces

Jiahang Cao,Jinyuan Fang,Zaiqiao Meng,Shangsong Liang

from arxiv, 32 pages, 6 figures

Knowledge graph embedding (KGE) is an increasingly popular technique that aims to represent entities and relations of knowledge graphs in low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. Particularly, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.

Active learning · Free energy · Extensibility · Learned · TAP

December 2, 2021

Active Learning for Domain Adaptation: An Energy-based Approach

Binhui Xie,Longhui Yuan,Shuang Li,Chi Harold Liu,Xinjing Cheng,Guoren Wang

from arxiv, Accepted by AAAI 2022. Code is available at //github.com/BIT-DA/EADA

Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped to reach the fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from an observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects the most valuable target samples more effectively than existing approaches that require particular architectures or distance computations. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty in every selection round. Meanwhile, by compactly aligning the free energy of target data around the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.
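
The free-energy score behind this kind of sampling can be sketched as below. This is an illustration under stated assumptions, not the EADA code: it assumes the common energy-based reading of a classifier in which the logits are negative energies, so that F(x) = -T log sum_y exp(logit_y / T), and it assumes that a higher free energy marks a sample as less source-like. The function names and the toy logits are hypothetical, and the uncertainty criterion mentioned in the abstract is left out.

```python
import math


def free_energy(logits, temperature=1.0):
    """Free energy F(x) = -T * log(sum_y exp(logit_y / T)).

    Uses the max-shift trick so large logits do not overflow exp().
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def select_by_free_energy(batch_logits, k):
    """Return indices of the k samples with the highest free energy."""
    scores = [free_energy(logits) for logits in batch_logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy usage with hypothetical logits for three unlabeled target samples.
batch = [[10.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
picked = select_by_free_energy(batch, 2)
```

In this toy batch the confident first sample has the lowest free energy, so it is the one left out of the query set.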

Generalization theory · Pyramid · Video classification · Domain shift · Networking

September 17, 2021

VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Zhiyu Yao,Yunbo Wang,Jianmin Wang,Philip S. Yu,Mingsheng Long

from arxiv, Accepted by IEEE TPAMI, 2021. Code: //github.com/thuml/VideoDG

This paper introduces video domain generalization where most video classification networks degenerate due to the lack of exposure to the target domains of divergent distributions. We observe that the global temporal features are less generalizable, due to the temporal domain shift that videos from other unseen domains may have an unexpected absence or misalignment of the temporal relations. This finding has motivated us to solve video domain generalization by effectively learning the local-relation features of different timescales that are more generalizable, and exploiting them along with the global-relation features to maintain the discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture named the Adversarial Pyramid Network, which improves the generalizability of video features by capturing the local-relation, global-relation, and cross-relation features progressively. On the basis of pyramid features, the second contribution is a new and robust approach of adversarial data augmentation that can bridge different video domains by improving the diversity and quality of augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms the combinations of previous video classification models and existing domain generalization methods on all benchmarks.

GPU · Graph · Identifiable · Neural Networks · Networking

May 31, 2021

On Explainability of Graph Neural Networks via Subgraph Explorations

Hao Yuan,Haiyang Yu,Jie Wang,Kang Li,Shuiwang Ji

from arxiv, Accepted by ICML 2021

We consider the problem of explaining the predictions of graph neural networks (GNNs), which otherwise are considered as black boxes. Existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. In this work, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, our SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes to compute Shapley values for graph data. Our work represents the first attempt to explain GNNs via identifying subgraphs explicitly and directly. Experimental results show that our SubgraphX achieves significantly improved explanations, while keeping computations at a reasonable level.
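
The idea of approximating Shapley values by sampling, mentioned in this abstract, can be sketched with the standard permutation estimator. This is a generic illustration, not SubgraphX's own approximation scheme: the function monte_carlo_shapley and the toy value function are hypothetical, with the value function standing in for a trained GNN's prediction score on the graph restricted to a coalition of players.

```python
import random


def monte_carlo_shapley(players, target, value_fn, num_samples=200, seed=0):
    """Estimate the Shapley value of `target` by sampling random orderings.

    For each sampled ordering, the players that come before `target` form a
    coalition, and the marginal gain from adding `target` is recorded.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset(order[:order.index(target)])
        total += value_fn(coalition | {target}) - value_fn(coalition)
    return total / num_samples


# Toy usage: an additive game, where each player's Shapley value is its weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(coalition):
    """Hypothetical score of a coalition; a real use would query the model."""
    return sum(weights[p] for p in coalition)


estimate = monte_carlo_shapley(list(weights), "b", toy_value)
```

Sampling orderings replaces the exponential sum over all coalitions with a fixed number of model evaluations, at the cost of variance in the estimate when the value function is not additive.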