
Accurate estimation of multiple quality variables is critical for building industrial soft sensor models, which have long been confronted with data efficiency and negative transfer issues. Methods that share backbone parameters among tasks address the data efficiency issue; however, they still fail to mitigate the negative transfer problem. To address this issue, a balanced Mixture-of-Experts (BMoE) is proposed in this work, which consists of a multi-gate mixture of experts (MMoE) module and a task gradient balancing (TGB) module. The MMoE module aims to portray task relationships, while the TGB module dynamically balances the gradients among tasks. The two modules cooperate to mitigate the negative transfer problem. Experiments on a typical sulfur recovery unit demonstrate that BMoE models task relationships and balances the training process effectively, and achieves significantly better performance than baseline models.
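The two ingredients named in the abstract can be illustrated with a minimal sketch. The gating part follows the general multi-gate mixture-of-experts idea (shared experts, one softmax gate per task). The abstract does not state the TGB rule, so `balance_weights` below is an illustrative stand-in (inverse gradient-norm weighting), not the method from the paper. All function names, shapes, and the linear experts are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation per task.

    expert_weights: one weight matrix per shared expert (assumed linear here).
    gate_weights:   one gating matrix per task, mapping x to one score per expert.
    """
    expert_outputs = [linear(w, x) for w in expert_weights]
    dim = len(expert_outputs[0])
    mixed = []
    for gate in gate_weights:
        probs = softmax(linear(gate, x))  # task-specific mixture over experts
        mixed.append([
            sum(p * out[i] for p, out in zip(probs, expert_outputs))
            for i in range(dim)
        ])
    return mixed


def balance_weights(grad_norms):
    """Hypothetical balancing rule (not the paper's TGB): weight each task loss
    inversely to its gradient norm, normalised to sum to the number of tasks."""
    inverse = [1.0 / (g + 1e-12) for g in grad_norms]
    scale = len(grad_norms) / sum(inverse)
    return [scale * v for v in inverse]
```

With two experts and two tasks, a gate that scores both experts equally averages their outputs, while a gate that strongly prefers one expert returns roughly that expert's output. This is how per-task gates can express different task relationships over the same shared experts.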

Related content


Estimation/estimator · Robustness · Networking · Weight · MoDELS

July 16, 2023

Enhancing Energy Efficiency and Reliability in Autonomous Systems Estimation using Neuromorphic Approach

Reza Ahmadvand,Sarah Safura Sharif,Yaser Mike Banad

from arxiv, 10 pages, 14 figures

Energy efficiency and reliability have long been crucial factors for ensuring cost-effective and safe missions in autonomous systems computers. With the rapid evolution of industries such as space robotics and advanced air mobility, the demand for these low size, weight, and power (SWaP) computers has grown significantly. This study focuses on introducing an estimation framework based on spike coding theories and spiking neural networks (SNN), leveraging the efficiency and scalability of neuromorphic computers. Therefore, we propose an SNN-based Kalman filter (KF), a fundamental and widely adopted optimal strategy for well-defined linear systems. Furthermore, based on the modified sliding innovation filter (MSIF) we present a robust strategy called SNN-MSIF. Notably, the weight matrices of the networks are designed according to the system model, eliminating the need for learning. To evaluate the effectiveness of the proposed strategies, we compare them to their algorithmic counterparts, namely the KF and the MSIF, using Monte Carlo simulations. Additionally, we assess the robustness of SNN-MSIF by comparing it to SNN-KF in the presence of modeling uncertainties and neuron loss. Our results demonstrate the applicability of the proposed methods and highlight the superior performance of SNN-MSIF in terms of accuracy and robustness. Furthermore, the spiking pattern observed from the networks serves as evidence of the energy efficiency achieved by the proposed methods, as they exhibited an impressive reduction of approximately 97 percent in emitted spikes compared to possible spikes.
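For readers unfamiliar with the algorithmic baseline mentioned above, here is a minimal sketch of one predict/update cycle of a standard scalar Kalman filter. It shows the conventional KF only, not the spiking (SNN-based) variant proposed in the paper. The parameter names (`a`, `h`, `q`, `r`) and their default values are assumptions chosen for illustration.

```python
def kalman_step(x_est, p_est, z, a=1.0, h=1.0, q=1e-3, r=1e-1):
    """One cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance.
    z:            new measurement.
    a, h:         state-transition and measurement coefficients (assumed known).
    q, r:         process and measurement noise variances (assumed known).
    """
    # Predict: propagate the estimate and its uncertainty through the model.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # Update: correct the prediction using the measurement innovation.
    innovation = z - h * x_pred
    s = h * p_pred * h + r          # innovation variance
    gain = p_pred * h / s           # Kalman gain
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new
```

For example, starting from `x_est=0.0`, `p_est=1.0` with `q=0.0` and `r=1.0`, a measurement `z=1.0` gives a gain of 0.5, so the estimate moves halfway to the measurement and the variance halves.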

Language modeling · tuning · Knowledge · Performer · MoDELS

July 15, 2023

CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models

Weilin Zhao,Yuxiang Huang,Xu Han,Zhiyuan Liu,Zhengyan Zhang,Maosong Sun

Parameter-efficient tuning (PET) has been widely explored in recent years because it tunes much fewer parameters (PET modules) than full-parameter fine-tuning (FT) while still stimulating sufficient knowledge from large language models (LLMs) for downstream tasks. Moreover, when PET is employed to serve multiple tasks, different task-specific PET modules can be built on a frozen LLM, avoiding redundant LLM deployments. Although PET significantly reduces the cost of tuning and deploying LLMs, its inference still suffers from the computational bottleneck of LLMs. To address the above issue, we propose an effective PET framework based on compressed LLMs, named "CPET". In CPET, we evaluate the impact of mainstream LLM compression techniques on PET performance and then introduce knowledge inheritance and recovery strategies to restore the knowledge loss caused by these compression techniques. Our experimental results demonstrate that, owing to the restoring strategies of CPET, collaborating task-specific PET modules with a compressed LLM can achieve comparable performance to collaborating PET modules with the original version of the compressed LLM and outperform directly applying vanilla PET methods to the compressed LLM.

anchor · Attention · Feature extraction · Integration · Learning

July 14, 2023

ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning

Azmine Toushik Wasi,Karlo Šerbetar,Raima Islam,Taki Hasan Rafi,Dong-Kyu Chae

from arxiv, 12 pages, 7 figures. Code: //github.com/takihasan/ARBEx

In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by Vision Transformer with reliability balancing to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to get the most out of the data. We also employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions with reliability balancing, a strategy that leverages anchor points, attention scores, and confidence values to enhance the resilience of label predictions. To ensure correct label classification and improve the model's discriminative power, we introduce anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. This approach provides critical elements for improving the reliability of predictions and has a substantial positive effect on final prediction capabilities. Our adaptive model can be integrated with any deep neural network to forestall challenges in various recognition tasks. Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.

Mixture-of-experts model · Speech recognition · Networking · CC · Performer

July 14, 2023

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Wenxuan Wang,Guodong Ma,Yuke Li,Binbin Du

from arxiv, To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin, Ireland

Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, based on the Mixture of Experts (MoE), many works have made good progress in multilingual and code-switching ASR, but present huge computational complexity with the increase of supported languages. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through the Mixture of Language Experts (MLE), which is guided to learn by a frame-wise language routing mechanism. The weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performances over baseline with comparable computational efficiency.
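The frame-wise routing idea can be sketched as follows: a language-identification score is computed per frame, and each frame is passed to a single language expert, so the per-frame cost does not grow with the number of supported languages. This sketch assumes hard (argmax) routing and takes the LID scores as given. The function name, the callable experts, and the data layout are hypothetical and not taken from the paper.

```python
def route_frames(frames, lid_scores, experts):
    """Send each frame to the expert of its highest-scoring language.

    frames:     list of per-frame feature vectors.
    lid_scores: per-frame list of language scores (one score per expert).
    experts:    list of callables, one per language (hypothetical stand-ins
                for the language-specific expert layers).
    """
    outputs = []
    for frame, scores in zip(frames, lid_scores):
        # Hard routing: pick the single most likely language for this frame.
        language = max(range(len(scores)), key=scores.__getitem__)
        outputs.append(experts[language](frame))
    return outputs
```

In a code-switching utterance, consecutive frames can therefore be handled by different experts, while only one expert is evaluated per frame.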

Knowledge · Graph · Engineering · ChatGPT · Knowledge graph

July 13, 2023

LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT

Lars-Peter Meyer,Claus Stadler,Johannes Frey,Norman Radtke,Kurt Junghanns,Roy Meissner,Gordian Dziwis,Kirill Bulert,Michael Martin

from arxiv, to appear in conference proceedings of AI-Tomorrow-23, 29.+30.6.2023 in Leipzig, Germany

Knowledge Graphs (KGs) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and in industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experience with graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.

Prompt · Learning · Extensibility · Surrogate loss · Lecture notes

May 6, 2022

Prompt Distribution Learning

Yuning Lu,Jianzhuang Liu,Yonggang Zhang,Yajing Liu,Xinmei Tian

from arxiv, Accepted by CVPR 2022

We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle the varying visual representations. In this way, we provide high-quality task-related content for facilitating recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. Thus, we can employ a Gaussian distribution to model them effectively and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it relatively improves the average result by 9.1% compared to human-crafted prompts.

Networking · SimPLe · Automator · INFORMS · Prompt

June 11, 2021

Neural Architecture Search without Training

Joseph Mellor,Jack Turner,Amos Storkey,Elliot J. Crowley

from arxiv, Accepted at ICML 2021 for a long presentation

The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be alleviated if we could partially predict a network's trained accuracy from its initial state. In this work, we examine the overlap of activations between datapoints in untrained networks and motivate how this can give a measure which is usefully indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU, and verify its effectiveness on NAS-Bench-101, NAS-Bench-201, NATS-Bench, and Network Design Spaces. Our approach can be readily combined with more expensive search methods; we examine a simple adaptation of regularised evolutionary search. Code for reproducing our experiments is available at //github.com/BayesWatch/nas-without-training.

MoDELS · Graph convolutional neural network/graph convolutional network · Graph · Graph convolution · Networking

December 14, 2020

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Dong Wang,Di Hu,Xingjian Li,Dejing Dou

from arxiv, Accepted by the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Temporal relational modeling in video is essential for human action understanding tasks such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, it is still a challenge to apply them effectively to long video sequences. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, in this paper, we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs where the nodes represent frames from different moments in the video. Moreover, to enhance the temporal reasoning ability of the proposed model, an auxiliary self-supervised task is proposed to encourage the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at //github.com/redwang/DTGRM.
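One way to picture a multi-level dilated temporal graph is as a set of edge lists, one per dilation, in which each frame is linked to the frames a fixed number of steps before and after it. The sketch below rests on that assumption. The exact neighborhood used by DTGRM is not given in the abstract, and the function names and default dilations are hypothetical.

```python
def dilated_temporal_edges(num_frames, dilation):
    """Directed edges (t, neighbor) linking each frame to the frames
    `dilation` steps before and after it, when those frames exist."""
    edges = []
    for t in range(num_frames):
        for neighbor in (t - dilation, t + dilation):
            if 0 <= neighbor < num_frames:
                edges.append((t, neighbor))
    return edges


def multi_level_graphs(num_frames, dilations=(1, 2, 4)):
    """One edge list per dilation level; larger dilations span longer time ranges."""
    return {d: dilated_temporal_edges(num_frames, d) for d in dilations}
```

Each level keeps the number of edges per frame constant (at most two here), so long-range relations are reached by increasing the dilation instead of densifying the graph.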

Graphics processing unit · Graph · Neural Networks · Networking · Layer

May 24, 2020

Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks

Zonghan Wu,Shirui Pan,Guodong Long,Jing Jiang,Xiaojun Chang,Chengqi Zhang

from arxiv, Accepted by KDD 2020

Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation which means they cannot be applied directly for multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.

SOFT · Hard attention · Attention mechanism · Performer · MoDELS

January 31, 2018

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Tao Shen,Tianyi Zhou,Guodong Long,Jing Jiang,Sen Wang,Chengqi Zhang

from arxiv, 12 pages, 3 figures

Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.
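The division of labor between hard and soft attention can be sketched as follows: a binary mask (the hard selection) first trims the sequence, and ordinary dot-product soft attention is then computed only over the kept tokens. In this sketch the mask is supplied by the caller, and the policy-gradient training of the selector described in the abstract is not shown. The function name and argument layout are assumptions for illustration.

```python
import math


def masked_soft_attention(query, keys, values, keep_mask):
    """Soft attention restricted to tokens kept by a hard selection mask.

    query:     query vector.
    keys:      one key vector per token.
    values:    one value vector per token.
    keep_mask: one boolean per token; True means the token survives trimming.
    """
    kept = [i for i, keep in enumerate(keep_mask) if keep]
    dim = len(values[0])
    if not kept:
        return [0.0] * dim  # nothing selected: return a zero vector

    # Dot-product scores over the kept tokens only.
    scores = [sum(q * k for q, k in zip(query, keys[i])) for i in kept]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the kept value vectors.
    return [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)]
```

Because the softmax runs over the kept tokens only, its cost scales with the number of selected tokens and not with the full sentence length.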