
Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

from arxiv, 17 pages, 6 figures, submitted to IEEE Network Magazine

In the upcoming 6G era, multiple access (MA) will play an essential role in achieving the high throughput required by a wide range of wireless applications. Because MA and interference management are closely related, conventional MA techniques are limited in that they cannot provide near-optimal performance across all interference regimes. Recently, rate-splitting multiple access (RSMA) has been gaining much attention. RSMA splits an individual message into two parts: a common part, decodable by every user, and a private part, decodable only by the intended user. Each user first decodes the common message and then decodes its private message by applying successive interference cancellation (SIC). By doing so, RSMA not only embraces the existing MA techniques as special cases but also provides significant performance gains by efficiently mitigating inter-user interference in a broad range of interference regimes. In this article, we first present the theoretical foundation of RSMA. Subsequently, we put forth four key benefits of RSMA: spectral efficiency, robustness, scalability, and flexibility. Building on this, we describe how RSMA can enable ten promising scenarios and applications along with future research directions to pave the way for 6G.
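The decoding order described in this abstract (common message first, then the private message after SIC) can be illustrated with a minimal sketch. It assumes a toy scalar downlink with Gaussian signalling, where each user is described only by a channel power gain; the function name `rsma_rates` and all parameter names are hypothetical and are not taken from the article.

```python
import math


def rsma_rates(gains, p_common, p_private, noise=1.0):
    """Achievable rates (bits/s/Hz) for one-layer rate splitting.

    Toy model (an assumption, not the article's system model):
    gains[k] is the channel power gain of user k, p_common is the power
    of the common stream, p_private[k] is the power of user k's private
    stream, and noise is the receiver noise power.
    """
    total_private = sum(p_private)

    # Step 1: every user decodes the common stream while treating all
    # private streams as noise. The common rate is limited by the
    # weakest user, because every user must be able to decode it.
    common_rate = min(
        math.log2(1 + g * p_common / (g * total_private + noise))
        for g in gains
    )

    # Step 2: after SIC removes the common stream, each user decodes its
    # own private stream, treating the other private streams as noise.
    private_rates = [
        math.log2(1 + g * p / (g * (total_private - p) + noise))
        for g, p in zip(gains, p_private)
    ]
    return common_rate, private_rates


if __name__ == "__main__":
    common, private = rsma_rates([1.0, 0.5], 2.0, [1.0, 1.0])
    print("common rate:", common)
    print("private rates:", private)
    print("sum rate:", common + sum(private))
```

Setting `p_common` to zero in this sketch removes the common stream, so every user simply treats the other private streams as noise; this is one way to see how a conventional scheme appears as a special case of the power split.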

Related content

Estimation/Estimator · INTERACT · Learning · Networking · Extensibility

August 15, 2023

HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning

Xiaozheng Zheng, Chao Wen, Zhou Xue, Pengfei Ren, Jingyu Wang

from arxiv, Accepted to ICCV 2023. Won first place in the HANDS22 Challenge Task 2. Project page: //zxz267.github.io/HaMuCo

Recent advancements in 3D hand pose estimation have shown promising results, but their effectiveness has primarily relied on the availability of large-scale annotated datasets, the creation of which is a laborious and costly process. To alleviate the label-hungry limitation, we propose a self-supervised learning framework, HaMuCo, that learns a single-view hand pose estimator from multi-view pseudo 2D labels. However, one of the main challenges of self-supervised learning is the presence of noisy labels and the "groupthink" effect from multiple views. To overcome these issues, we introduce a cross-view interaction network that distills the single-view estimator by utilizing the cross-view correlated features and enforcing multi-view consistency to achieve collaborative learning. Both the single-view estimator and the cross-view interaction network are trained jointly in an end-to-end manner. Extensive experiments show that our method can achieve state-of-the-art performance on multi-view self-supervised hand pose estimation. Furthermore, the proposed cross-view interaction network can also be applied to hand pose estimation from multi-view input and outperforms previous methods under the same settings.

Continuity · Momentum · contrastive · Knowledge · Learning

August 14, 2023

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

Hongguang Zhu, Yunchao Wei, Xiaodan Liang, Chunjie Zhang, Yao Zhao

from arxiv, Accepted by ICCV 2023. Code: //github.com/KevinLight831/CTP

Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets. Given the growing nature of real-world data, such an offline training paradigm on ever-expanding data is unsustainable, because models lack the continual learning ability to accumulate knowledge continuously. However, most continual learning studies are limited to uni-modal classification, and existing multi-modal datasets cannot simulate continual non-stationary data stream scenarios. To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset, P9D, which contains over one million product image-text pairs from 9 industries. The data from each industry, treated as an independent task, supports continual learning and conforms to the real-world long-tail nature to simulate pretraining on web data. We comprehensively study the characteristics and challenges of VLCP, and propose a new algorithm: Compatible momentum contrast with Topology Preservation, dubbed CTP. The compatible momentum model absorbs the knowledge of the current and previous-task models to flexibly update the modal feature. Moreover, Topology Preservation transfers the knowledge of embedding across tasks while preserving the flexibility of feature adjustment. The experimental results demonstrate that our method not only achieves superior performance compared with other baselines but also does not bring an expensive training burden. Dataset and codes are available at //github.com/KevinLight831/CTP.

emotion model · MoDELS · INFORMS · Learning · Extensibility

August 14, 2023

Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

Jiaxin Ye, Xin-cheng Wen, Yujie Wei, Yong Xu, Kunhong Liu, Hongming Shan

from arxiv, ICASSP 2023

Speech emotion recognition (SER) plays a vital role in improving the interactions between humans and machines by inferring human emotion and affective states from speech signals. Whereas recent works primarily focus on mining spatiotemporal information from hand-crafted features, we explore how to model the temporal patterns of speech emotions from dynamic temporal scales. Towards that goal, we introduce a novel temporal emotional modeling approach for SER, termed Temporal-aware bI-direction Multi-scale Network (TIM-Net), which learns multi-scale contextual affective representations from various time scales. Specifically, TIM-Net first employs temporal-aware blocks to learn temporal affective representation, then integrates complementary information from the past and the future to enrich contextual representations, and finally fuses multiple time scale features for better adaptation to the emotional variation. Extensive experimental results on six benchmark SER datasets demonstrate the superior performance of TIM-Net, with improvements of 2.34% and 2.61% in the average UAR and WAR over the second-best method on each corpus. The source code is available at //github.com/Jiaxin-Ye/TIM-Net_SER.
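The abstract reports gains in UAR and WAR without expanding the two metrics. As a reminder of how they are commonly computed, here is a minimal sketch: UAR (unweighted average recall) is the plain mean of the per-class recalls, and WAR (weighted average recall) weights each class recall by its share of samples, which equals overall accuracy. The function name `uar_war` and the example labels are hypothetical and do not come from the TIM-Net code.

```python
from collections import Counter


def uar_war(y_true, y_pred):
    """Return (UAR, WAR) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)   # number of samples per true class
    correct = Counter()         # correctly classified samples per class
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            correct[truth] += 1

    # Recall of each class that occurs in y_true.
    recalls = [correct[label] / support[label] for label in support]

    uar = sum(recalls) / len(recalls)             # every class counts equally
    war = sum(correct.values()) / len(y_true)     # every sample counts equally
    return uar, war


if __name__ == "__main__":
    truth = ["happy", "happy", "happy", "happy", "sad"]
    guess = ["happy", "happy", "happy", "sad", "happy"]
    print(uar_war(truth, guess))
```

On imbalanced corpora the two values can differ noticeably, since a rare class that is always misclassified lowers UAR much more than WAR.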

NeRF · Microsoft Surface · Similarity · Data point · Loss

August 14, 2023

S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields

Zeke Xie, Xindi Yang, Yujie Yang, Qi Sun, Yixiang Jiang, Haoran Wang, Yunfeng Cai, Mingming Sun

from arxiv, ICCV 2023 main conference. Code: //github.com/Madaoer/S3IM. 14 pages, 5 figures, 17 tables

Recently, Neural Radiance Field (NeRF) has shown great success in rendering novel-view images of a given scene by learning an implicit representation with only posed RGB images. NeRF and relevant neural field methods (e.g., neural surface representation) typically optimize a point-wise loss and make point-wise predictions, where one data point corresponds to one pixel. Unfortunately, this line of research has failed to use the collective supervision of distant pixels, although it is known that pixels in an image or scene can provide rich structural information. To the best of our knowledge, we are the first to design a nonlocal multiplex training paradigm for NeRF and relevant neural field methods via a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set instead of processing multiple inputs independently. Our extensive experiments demonstrate the unreasonable effectiveness of S3IM in improving NeRF and neural surface representation at almost no extra cost. The improvements in quality metrics can be particularly significant for relatively difficult tasks: e.g., the test MSE loss unexpectedly drops by more than 90% for TensoRF and DVGO over eight novel view synthesis tasks, and NeuS obtains a 198% F-score gain and a 64% Chamfer $L_{1}$ distance reduction over eight surface reconstruction tasks. Moreover, S3IM is consistently robust even with sparse inputs, corrupted images, and dynamic scenes.
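The idea of scoring a set of pixels jointly rather than one by one can be sketched as follows. This is a simplified illustration only, not the authors' S3IM implementation: it assumes scalar pixel intensities in [0, 1], forms random groups of pixels from a batch, and evaluates one global SSIM statistic per group (no sliding window). The names `ssim`, `stochastic_ssim_loss`, `group_size`, and `repeats` are hypothetical.

```python
import random


def ssim(x, y, data_range=1.0):
    """Standard SSIM formula evaluated once over two equal-length value lists."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # usual stabilising constants
    c2 = (0.03 * data_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def stochastic_ssim_loss(pred, target, group_size=16, repeats=4, seed=0):
    """1 - mean SSIM over randomly formed pixel groups (illustrative only)."""
    if len(pred) != len(target) or len(pred) < group_size:
        raise ValueError("need equal-length inputs with at least group_size pixels")
    rng = random.Random(seed)
    indices = list(range(len(pred)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # new random grouping of the same batch
        for start in range(0, len(indices) - group_size + 1, group_size):
            group = indices[start:start + group_size]
            scores.append(ssim([pred[i] for i in group],
                               [target[i] for i in group]))
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    target = [i / 63 for i in range(64)]
    noisy = [min(1.0, v + 0.1) for v in target]
    print(stochastic_ssim_loss(noisy, target))
```

Unlike a point-wise MSE, each score here depends on the means, variances, and covariance of a whole group, so pixels that are far apart in the image can still supervise each other when they land in the same random group.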

Microsoft Surface · Dataset · HTTPS · BASIC · Model evaluation

August 14, 2023

WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmark for Autonomous Driving on Water Surfaces

Shanliang Yao, Runwei Guan, Zhaodong Wu, Yi Ni, Zile Huang, Zixian Zhang, Yong Yue, Weiping Ding, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, Yutao Yue

Autonomous driving on water surfaces plays an essential role in executing hazardous and time-consuming missions, such as maritime surveillance, survivor rescue, environmental monitoring, hydrographic mapping, and waste cleaning. This work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces. Equipped with a 4D radar and a monocular camera, our Unmanned Surface Vehicle (USV) offers all-weather solutions for discerning object-related information, including color, shape, texture, range, velocity, azimuth, and elevation. Focusing on typical static and dynamic objects on water surfaces, we label the camera images and radar point clouds at pixel-level and point-level, respectively. In addition to basic perception tasks, such as object detection, instance segmentation, and semantic segmentation, we also provide annotations for free-space segmentation and waterline segmentation. Leveraging the multi-task and multi-modal data, we conduct benchmark experiments on the uni-modality of radar and camera, as well as the fused modalities. Experimental results demonstrate that 4D radar-camera fusion can considerably improve the accuracy and robustness of perception on water surfaces, especially in adverse lighting and weather conditions. The WaterScenes dataset is publicly available at //waterscenes.github.io.

MoDELS · Language modeling · Automator · Performer · tuning

August 12, 2023

Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation

Ambrose Robinson, William Thorne, Ben P. Wu, Abdullah Pandor, Munira Essat, Mark Stevenson, Xingyi Song

Medical systematic reviews can be very costly and resource intensive. We explore how Large Language Models (LLMs) can support and be trained to perform literature screening when provided with a detailed set of selection criteria. Specifically, we instruction tune LLaMA and Guanaco models to perform abstract screening for medical systematic reviews. Our best model, Bio-SIEVE, outperforms both ChatGPT and trained traditional approaches, and generalises better across medical domains. However, there remains the challenge of adapting the model to safety-first scenarios. We also explore the impact of multi-task training with Bio-SIEVE-Multi, including tasks such as PICO extraction and exclusion reasoning, but find that it is unable to match single-task Bio-SIEVE's performance. We see Bio-SIEVE as an important step towards specialising LLMs for the biomedical systematic review process and explore its future developmental opportunities. We release our models, code and a list of DOIs to reconstruct our dataset for reproducibility.

MoDELS · Processing (programming language) · Uptake · Identifiable · Machine learning modeling

August 11, 2023

Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models

Benjamin Laufer, Jon Kleinberg, Hoda Heidari

from arxiv, 36 pages, 8 figures

Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential domain specialist will either contribute, free-ride, or abstain in its uptake of the technology, and we provide conditions yielding these different strategies. We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.

Pruning · Better · CAP · contrastive · MoDELS

December 14, 2021

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang

from arxiv, Accepted to AAAI 2022

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large numbers of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches consider only task-specific knowledge for downstream tasks and ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and from the fine-tuned model for task-specific knowledge. In addition, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of model parameters retained (i.e., 97% sparsity), CAP achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
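To make the sparsity figure concrete (3% of parameters kept, 97% set to zero), here is a minimal sketch of plain unstructured magnitude pruning on a flat list of weights. This is a generic baseline chosen for illustration; it does not implement CAP's contrastive objective, and the function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` (a fraction in [0, 1)) of the entries become zero."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if not weights:
        return []

    # Number of weights that survive pruning (at least one).
    n_keep = max(1, round(len(weights) * (1.0 - sparsity)))

    # Indices of the n_keep largest-magnitude weights.
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]),
                    reverse=True)
    keep = set(ranked[:n_keep])

    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [((-1) ** i) * (i + 1) / 100 for i in range(100)]
    pruned = magnitude_prune(weights, sparsity=0.97)
    kept = [w for w in pruned if w != 0.0]
    print(len(kept), "of", len(weights), "weights kept:", kept)
```

In an iterative pruning schedule, a step like this would be applied repeatedly with increasing sparsity; the abstract's "snapshots" refer to the intermediate models produced along such a schedule.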

Extensibility · GM · MoDELS · Category · Multi-agent model

February 9, 2021

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

from arxiv, Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.

Understandability · Multimodal · MoDELS · Extensibility · Performer

February 15, 2020

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Xilin Chen, Ming Zhou

We propose UniViLM: a Unified Video and Language pre-training Model for multimodal understanding and generation. Motivated by the recent success of BERT-based pre-training techniques for NLP and image-language tasks, VideoBERT and CBT were proposed to exploit the BERT model for video and language pre-training using narrated instructional videos. Unlike these works, which pre-train only on understanding tasks, we propose a unified video-language pre-training model for both understanding and generation tasks. Our model comprises four components: two single-modal encoders, a cross encoder, and a decoder, all with a Transformer backbone. We first pre-train our model to learn a universal representation for both video and language on a large instructional video dataset. Then we fine-tune the model on two multimodal tasks: an understanding task (text-based video retrieval) and a generation task (multimodal video captioning). Our extensive experiments show that our method improves the performance of both understanding and generation tasks and achieves state-of-the-art results.