久草精品视频在线观看_69WW无码免费视频播放_久久一区二区免费播放_亚洲欧美一区二区三区不卡_精品视频国产狼友视频第二页_久久国内精品久久久久久久久_日本无乱码高清在线电影

Autoregressive (AR) Transformer-based sequence models are known to have difficulty generalizing to sequences longer than those seen during training. When applied to text-to-speech (TTS), these models tend to drop or repeat words or produce erratic output, especially for longer utterances. In this paper, we introduce enhancements aimed at AR Transformer-based encoder-decoder TTS systems that address these robustness and length generalization issues. Our approach uses an alignment mechanism to provide cross-attention operations with relative location information. The associated alignment position is learned as a latent property of the model via backprop and requires no external alignment information during training. While the approach is tailored to the monotonic nature of TTS input-output alignment, it is still able to benefit from the flexible modeling power of interleaved multi-head self- and cross-attention operations. A system incorporating these improvements, which we call Very Attentive Tacotron, matches the naturalness and expressiveness of a baseline T5-based TTS system, while eliminating problems with repeated or dropped words and enabling generalization to any practical utterance length.

相關內容

語音合成

關注 491

語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)（Speech Synthesis），也稱為文(wen)語(yu)(yu)轉換(huan)（Text-to-Speech, TTS,它是將(jiang)任意的輸(shu)入(ru)(ru)文(wen)本(ben)轉換(huan)成(cheng)(cheng)(cheng)自然流暢的語(yu)(yu)音(yin)(yin)輸(shu)出。語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)涉及到(dao)人(ren)工智能(neng)(neng)、心理學(xue)、聲學(xue)、語(yu)(yu)言(yan)學(xue)、數字信(xin)(xin)號處(chu)理、計算機(ji)科學(xue)等(deng)(deng)(deng)多個學(xue)科技(ji)術，是信(xin)(xin)息處(chu)理領域中的一項前(qian)沿技(ji)術。隨著(zhu)(zhu)計算機(ji)技(ji)術的不(bu)斷提(ti)高，語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)技(ji)術從早期的共振峰合(he)(he)成(cheng)(cheng)(cheng),逐(zhu)步(bu)發(fa)展(zhan)為波形拼接合(he)(he)成(cheng)(cheng)(cheng)和統(tong)(tong)計參數語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)，再發(fa)展(zhan)到(dao)混(hun)合(he)(he)語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)；合(he)(he)成(cheng)(cheng)(cheng)語(yu)(yu)音(yin)(yin)的質量(liang)、自然度(du)已經得到(dao)明顯提(ti)高，基(ji)本(ben)能(neng)(neng)滿足一些特定場(chang)合(he)(he)的應(ying)(ying)用需(xu)求。目前(qian)，語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)技(ji)術在銀行、醫院等(deng)(deng)(deng)的信(xin)(xin)息播報系統(tong)(tong)、汽車導航系統(tong)(tong)、自動應(ying)(ying)答呼叫中心等(deng)(deng)(deng)都有(you)廣泛應(ying)(ying)用，取得了巨大(da)的經濟效益。另外，隨著(zhu)(zhu)智能(neng)(neng)手機(ji)、MP3、PDA 等(deng)(deng)(deng)與我們(men)(men)生活(huo)密切相關的媒(mei)介的大(da)量(liang)涌現，語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)的應(ying)(ying)用也在逐(zhu)漸向(xiang)娛樂(le)、語(yu)(yu)音(yin)(yin)教(jiao)學(xue)、康復(fu)治療等(deng)(deng)(deng)領域深入(ru)(ru)。可以說(shuo)語(yu)(yu)音(yin)(yin)合(he)(he)成(cheng)(cheng)(cheng)正在影響著(zhu)(zhu)人(ren)們(men)(men)生活(huo)的方方面(mian)面(mian)。

解碼 · MoDELS · 超參數 · 黑盒 · Continuity ·

2024 年 12 月 10 日

FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks

Bocheng Chen,Hanqing Guo,Qiben Yan

Defense in large language models (LLMs) is crucial to counter the numerous attackers exploiting these systems to generate harmful content through manipulated prompts, known as jailbreak attacks. Although many defense strategies have been proposed, they often require access to the model's internal structure or need additional training, which is impractical for service providers using LLM APIs, such as OpenAI APIs or Claude APIs. In this paper, we propose a moving target defense approach that alters decoding hyperparameters to enhance model robustness against various jailbreak attacks. Our approach does not require access to the model's internal structure and incurs no additional training costs. The proposed defense includes two key components: (1) optimizing the decoding strategy by identifying and adjusting decoding hyperparameters that influence token generation probabilities, and (2) transforming the decoding hyperparameters and model system prompts into dynamic targets, which are continuously altered during each runtime. By continuously modifying decoding strategies and prompts, the defense effectively mitigates the existing attacks. Our results demonstrate that our defense is the most effective against jailbreak attacks in three of the models tested when using LLMs as black-box APIs. Moreover, our defense offers lower inference costs and maintains comparable response quality, making it a potential layer of protection when used alongside other defense methods.

正則化項 · 優化器 · 估計/估計量 · Minimax · 情景 ·

2024 年 12 月 9 日

Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization

Ferdinand Genans,Antoine Godichon-Baggioni,Fran?ois-Xavier Vialard,Olivier Wintenberger

from arxiv, The result where based on Lemma A.1 in arXiv:1812.09150 which is wrong

Optimal Transport (OT) based distances are powerful tools for machine learning to compare probability measures and manipulate them using OT maps. In this field, a setting of interest is semi-discrete OT, where the source measure $\mu$ is continuous, while the target $\nu$ is discrete. Recent works have shown that the minimax rate for the OT map is $\mathcal{O}(t^{-1/2})$ when using $t$ i.i.d. subsamples from each measure (two-sample setting). An open question is whether a better convergence rate can be achieved when the full information of the discrete measure $\nu$ is known (one-sample setting). In this work, we answer positively to this question by (i) proving an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cells estimation and density support estimation, and (ii) proposing a Stochastic Gradient Descent (SGD) algorithm with adaptive entropic regularization and averaging acceleration. To nearly achieve the desired fast rate, characteristic of non-regular parametric problems, we design an entropic regularization scheme decreasing with the number of samples. Another key step in our algorithm consists of using a projection step that permits to leverage the local strong convexity of the regularized OT problem. Our convergence analysis integrates online convex optimization and stochastic gradient techniques, complemented by the specificities of the OT semi-dual. Moreover, while being as computationally and memory efficient as vanilla SGD, our algorithm achieves the unusual fast rates of our theory in numerical experiments.

Learning · Markov · Processing（編程語言） · 強化學習 · 控制器 ·

2024 年 12 月 9 日

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Juan Sebastian Rojas,Chi-Guhn Lee

Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to episodic and discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various subtasks simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online manner, without the use of an explicit bi-level optimization scheme or an augmented state-space.

向量化 · 動力系統 · 平滑 · 離散化 · 周期的 ·

2024 年 12 月 9 日

Floer Homology: From Generalized Morse-Smale Dynamical Systems to Forman's Combinatorial Vector Fields

Marzieh Eidi,Jürgen Jost

from arxiv, 23 pages, 18 figures

We construct a Floer type boundary operator for generalised Morse-Smale dynamical systems on compact smooth manifolds by counting the number of suitable flow lines between closed (both homoclinic and periodic) orbits and isolated critical points. The same principle works for the discrete situation of general combinatorial vector fields, defined by Forman, on CW complexes. We can thus recover the $\mathbb{Z}_2$ homology of both smooth and discrete structures directly from the flow lines (V-paths) of our vector field.

Performer · 語言模型化 · 有向 · 表示 · MoDELS ·

2024 年 12 月 9 日

Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective

Van-Cuong Pham,Thien Huu Nguyen

from arxiv, EMNLP 2024

Activation Editing, which involves directly editting the internal representations of large language models (LLMs) to alter their behaviors and achieve desired properties, has emerged as a promising area of research. Existing works primarily treat LLMs' activations as points in space and modify them by adding steering vectors. However, this approach is limited in its ability to achieve greater performance improvement while maintaining the necessary consistency of activation magnitudes. To overcome these issues, we propose a novel editing method that views activations in terms of their directions and magnitudes. Our method, named Householder Pseudo-Rotation (HPR), mimics the rotation transformation, thus preserving activation norms and resulting in an improved performance on various safety benchmarks.

Learning · 圖 · INFORMS · 學習器 · Projection ·

2024 年 12 月 7 日

CoRemix: Supporting Informal Learning in Scratch Community With Visual Graph and Generative AI

Yunnong Chen,Yishu Shen,Ruiyi Liu,Xinyu Yu,Lingyun Sun,Liuqing Chen

from arxiv, 22 pages,7 figures,5 tables

Online programming communities provide a space for novices to engage with computing concepts, allowing them to learn and develop computing skills using user-generated projects. However, the lack of structured guidance in the informal learning environment often makes it difficult for novices to experience progressively challenging learning opportunities. Learners frequently struggle with understanding key project events and relations, grasping computing concepts, and remixing practices. This study introduces CoRemix, a generative AI-powered learning system that provides a visual graph to present key events and relations for project understanding. We propose a visual-textual scaffolding to help learners construct the visual graph and support remixing practice. Our user study demonstrates that CoRemix, compared to the baseline, effectively helps learners break down complex projects, enhances computing concept learning, and improves their experience with community resources for learning and remixing.

噪聲 · Guidance · MoDELS · Prompt · 有向 ·

2024 年 12 月 6 日

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation

Ruoyu Wang,Huayang Huang,Ye Zhu,Olga Russakovsky,Yu Wu

from arxiv, 18 pages, 18 figures, 6 tables

Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models. In the conventional setup, the text prompt provides explicit, user-defined guidance, directing the generation process by denoising a randomly sampled Gaussian noise. In this work, we reveal that the often-overlooked noise itself encodes inherent generative tendencies, acting as a "silent prompt" that implicitly guides the output. This implicit guidance, embedded in the noise scheduler design of diffusion model formulations and their training stages, generalizes across a wide range of T2I models and backbones. Building on this insight, we introduce NoiseQuery, a novel strategy that selects optimal initial noise from a pre-built noise library to meet diverse user needs. Our approach not only enhances high-level semantic alignment with text prompts, but also allows for nuanced adjustments of low-level visual attributes, such as texture, sharpness, shape, and color, which are typically challenging to control through text alone. Extensive experiments across various models and target attributes demonstrate the strong performance and zero-shot transferability of our approach, requiring no additional optimization.

估計/估計量 · 數據集 · CASES · 前向 · Mobileye ·

2024 年 12 月 6 日

EvTTC: An Event Camera Dataset for Time-to-Collision Estimation

Kaizhen Sun,Jinghang Li,Kuan Dai,Bangyan Liao,Wei Xiong,Yi Zhou

from arxiv, 8 pages, 7 figures, 5 tables

Time-to-Collision (TTC) estimation lies in the core of the forward collision warning (FCW) functionality, which is key to all Automatic Emergency Braking (AEB) systems. Although the success of solutions using frame-based cameras (e.g., Mobileye's solutions) has been witnessed in normal situations, some extreme cases, such as the sudden variation in the relative speed of leading vehicles and the sudden appearance of pedestrians, still pose significant risks that cannot be handled. This is due to the inherent imaging principles of frame-based cameras, where the time interval between adjacent exposures introduces considerable system latency to AEB. Event cameras, as a novel bio-inspired sensor, offer ultra-high temporal resolution and can asynchronously report brightness changes at the microsecond level. To explore the potential of event cameras in the above-mentioned challenging cases, we propose EvTTC, which is, to the best of our knowledge, the first multi-sensor dataset focusing on TTC tasks under high-relative-speed scenarios. EvTTC consists of data collected using standard cameras and event cameras, covering various potential collision scenarios in daily driving and involving multiple collision objects. Additionally, LiDAR and GNSS/INS measurements are provided for the calculation of ground-truth TTC. Considering the high cost of testing TTC algorithms on full-scale mobile platforms, we also provide a small-scale TTC testbed for experimental validation and data augmentation. All the data and the design of the testbed are open sourced, and they can serve as a benchmark that will facilitate the development of vision-based TTC techniques.

Learning · 圖 · Extensibility · motivation · 講稿 ·

2022 年 6 月 27 日

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

Zhen Wang,Weirui Kuang,Yuexiang Xie,Liuyi Yao,Yaliang Li,Bolin Ding,Jingren Zhou

from arxiv, Accpeted by KDD'2022; We have released FederatedScope for users on //github.com/alibaba/FederatedScope

The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and the existing frameworks such as TFF and FATE has made the deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of FGL-related framework increases the efforts for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously gains many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at //github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.

知識 (knowledge) · Machine Learning · MoDELS · 學成 · Conformer ·

2022 年 5 月 10 日

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Julian W?rmann,Daniel Bogdoll,Etienne Bührle,Han Chen,Evaristus Fuh Chuo,Kostadin Cvejoski,Ludger van Elst,Tobias Glei?ner,Philip Gottschall,Stefan Griesche,Christian Hellert,Christian Hesels,Sebastian Houben,Tim Joseph,Niklas Keil,Johann Kelsch,Hendrik K?nigshof,Erwin Kraft,Leonie Kreuser,Kevin Krone,Tobias Latka,Denny Mattern,Stefan Matthes,Mohsin Munir,Moritz Nekolla,Adrian Paschke,Maximilian Alexander Pintz,Tianming Qiu,Faraz Qureishi,Syed Tahseen Raza Rizvi,J?rg Reichardt,Laura von Rueden,Stefan Rudolph,Alexander Sagel,Gerhard Schunk,Hao Shen,Hendrik Stapelbroek,Vera Stehr,Gurucharan Srinivas,Anh Tuan Tran,Abhishek Vivekanandan,Ya Wang,Florian Wasserrab,Tino Werner,Christian Wirth,Stefan Zwicklbauer

from arxiv, 93 pages

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.