苹果电影在线观看免费高清,日韩欧美国产AⅤ另类,久热精品视频一区二区三区,中文字幕人妻综合色在线,丰满少妇人妻久久久久久久久

Despite previous efforts in melody-to-lyric generation research, there is still a significant compatibility gap between generated lyrics and melodies, negatively impacting the singability of the outputs. This paper bridges the singability gap with a novel approach to generating singable lyrics by jointly Learning wOrding And Formatting during Melody-to-Lyric training (LOAF-M2L). After general-domain pretraining, our proposed model acquires length awareness first from a large text-only lyric corpus. Then, we introduce a new objective informed by musicological research on the relationship between melody and lyrics during melody-to-lyric training, which enables the model to learn the fine-grained format requirements of the melody. Our model achieves 3.75% and 21.44% absolute accuracy gains in the outputs' number-of-line and syllable-per-line requirements compared to naive fine-tuning, without sacrificing text fluency. Furthermore, our model demonstrates a 63.92% and 74.18% relative improvement of music-lyric compatibility and overall quality in the subjective evaluation, compared to the state-of-the-art melody-to-lyric generation model, highlighting the significance of formatting learning.

相關內容

Learning

關注 12

SimPLe · Learning · 優化器 · 樣本 · 線性的 ·

2023 年 8 月 28 日

SimFBO: Towards Simple, Flexible and Communication-efficient Federated Bilevel Learning

Yifan Yang,Peiyao Xiao,Kaiyi Ji

Federated bilevel optimization (FBO) has shown great potential recently in machine learning and edge computing due to the emerging nested optimization structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However, existing FBO algorithms often involve complicated computations and require multiple sub-loops per iteration, each of which contains a number of communication rounds. In this paper, we propose a simple and flexible FBO framework named SimFBO, which is easy to implement without sub-loops, and includes a generalized server-side aggregation and update for improving communication efficiency. We further propose System-level heterogeneity robust FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous local computation. We show that SimFBO and ShroFBO provably achieve a linear convergence speedup with partial client participation and client sampling without replacement, as well as improved sample and communication complexities. Experiments demonstrate the effectiveness of the proposed methods over existing FBO algorithms.

MoDELS · 優化器 · 求逆 · 全 · 代價 ·

2023 年 8 月 26 日

Beyond Inverted Pendulums: Task-optimal Simple Models of Legged Locomotion

Yu-Ming Chen,Jianshu Hu,Michael Posa

Reduced-order models (ROM) are popular in online motion planning due to their simplicity. A good ROM for control captures critical task-relevant aspects of the full dynamics while remaining low dimensional. However, planning within the reduced-order space unavoidably constrains the full model, and hence we sacrifice the full potential of the robot. In the community of legged locomotion, this has lead to a search for better model extensions, but many of these extensions require human intuition, and there has not existed a principled way of evaluating the model performance and discovering new models. In this work, we propose a model optimization algorithm that automatically synthesizes reduced-order models, optimal with respect to a user-specified distribution of tasks and corresponding cost functions. To demonstrate our work, we optimized models for a bipedal robot Cassie. We show in simulation that the optimal ROM reduces the cost of Cassie's joint torques by up to 23% and increases its walking speed by up to 54%. We also show hardware result that the real robot walks on flat ground with 10% lower torque cost. All videos and code can be found at //sites.google.com/view/ymchen/research/optimal-rom.

Performer · Less · 多峰值 · 模型性能 · MoDELS ·

2023 年 8 月 26 日

A Comprehensive Survey for Evaluation Methodologies of AI-Generated Music

Zeyu Xiong,Weitao Wang,Jing Yu,Yue Lin,Ziyan Wang

In recent years, AI-generated music has made significant progress, with several models performing well in multimodal and complex musical genres and scenes. While objective metrics can be used to evaluate generative music, they often lack interpretability for musical evaluation. Therefore, researchers often resort to subjective user studies to assess the quality of the generated works, which can be resource-intensive and less reproducible than objective metrics. This study aims to comprehensively evaluate the subjective, objective, and combined methodologies for assessing AI-generated music, highlighting the advantages and disadvantages of each approach. Ultimately, this study provides a valuable reference for unifying generative AI in the field of music evaluation.

E2E · MoDELS · 語音識別 · 端到端 · Learning ·

2023 年 8 月 25 日

Decoupled Structure for Improved Adaptability of End-to-End Models

Keqi Deng,Philip C. Woodland

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications. The E2E ASR model implicitly learns an internal language model (LM) which characterises the training distribution of the source domain, and the E2E trainable nature makes the internal LM difficult to adapt to the target domain with text-only data To solve this problem, this paper proposes decoupled structures for attention-based encoder-decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models, which can achieve flexible domain adaptation in both offline and online scenarios while maintaining robust intra-domain performance. To this end, the acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component (i.e. internal LM) replaceable. When encountering a domain shift, the internal LM can be directly replaced during inference by a target-domain LM, without re-training or using domain-specific paired speech-text data. Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed that the proposed decoupled structure gave 15.1% and 17.2% relative word error rate reductions on the TED-LIUM 2 and AESRC2020 corpora while still maintaining performance on intra-domain data.

Agent · 機器人 · Extensibility · Performer · 情景 ·

2023 年 8 月 24 日

Exploring Emerging Technologies for Requirements Elicitation Interview Training: Empirical Assessment of Robotic and Virtual Tutors

Binnur G?rer,Fatma Ba?ak Aydemir

from arxiv, Author submitted manuscript

Requirements elicitation interviews are a widely adopted technique, where the interview success heavily depends on the interviewer's preparedness and communication skills. Students can enhance these skills through practice interviews. However, organizing practice interviews for many students presents scalability challenges, given the time and effort required to involve stakeholders in each session. To address this, we propose REIT, an extensible architecture for Requirements Elicitation Interview Training system based on emerging educational technologies. REIT consists of two phases: the interview phase, wherein students act as interviewers while the system assumes the role of an interviewee, and the feedback phase, during which the system assesses students' performance and offers contextual and behavioral feedback to enhance their interviewing skills. We demonstrate the applicability of REIT through two implementations: RoREIT with a physical robotic agent and VoREIT with a virtual voice-only agent. We empirically evaluated both instances with a group of graduate students. The participants appreciated both systems. They demonstrated higher learning gain when trained with RoREIT, but they found VoREIT more engaging and easier to use. These findings indicate that each system has its distinct benefits and drawbacks, suggesting that \gensys{} can be configured for various educational settings based on preferences and available resources.

TOOLS · AI · MoDELS · 評論員 · 正則化項 ·

2023 年 8 月 24 日

A Survey of AI Music Generation Tools and Models

Yueyue Zhu,Jared Baca,Banafsheh Rekabdar,Reza Rawassizadeh

In this work, we provide a comprehensive survey of AI music generation tools, including both research projects and commercialized applications. To conduct our analysis, we classified music generation approaches into three categories: parameter-based, text-based, and visual-based classes. Our survey highlights the diverse possibilities and functional features of these tools, which cater to a wide range of users, from regular listeners to professional musicians. We observed that each tool has its own set of advantages and limitations. As a result, we have compiled a comprehensive list of these factors that should be considered during the tool selection process. Moreover, our survey offers critical insights into the underlying mechanisms and challenges of AI music generation.

Prompt · MoDELS · 學成 · Extensibility · 向量化 ·

2022 年 3 月 10 日

Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou,Jingkang Yang,Chen Change Loy,Ziwei Liu

from arxiv, CVPR 2022. TL;DR: We propose a conditional prompt learning approach to solve the generalizability issue of static prompts

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.

學成 · Machine Learning · INTERACT · 圖 · INFORMS ·

2021 年 5 月 27 日

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

David Ahmedt-Aristizabal,Mohammad Ali Armin,Simon Denman,Clinton Fookes,Lars Petersson

With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.

損失函數（機器學習） · 學習的學習 · 學成 · entity · 泛函 ·

2019 年 9 月 9 日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Jiawei Wu,Wenhan Xiong,William Yang Wang

from arxiv, 11pages, 5 figures, accepted to EMNLP 2019

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.

注意力機制 · MoDELS · 學成 · Performer · 卷積神經網絡 ·

2018 年 5 月 20 日

Attention U-Net: Learning Where to Look for the Pancreas

Ozan Oktay,Jo Schlemper,Loic Le Folgoc,Matthew Lee,Mattias Heinrich,Kazunari Misawa,Kensaku Mori,Steven McDonagh,Nils Y Hammerla,Bernhard Kainz,Ben Glocker,Daniel Rueckert

from arxiv, Accepted to published in MIDL'18 (Revised Version) / OpenReview link: //openreview.net/forum?id=Skft7cijM

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.