
from arxiv, Corrected typos. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker. Most established approaches to date involve a two-step process, whereby an intermediate representation from the video, such as a spectrogram, is extracted first and then passed to a vocoder to produce the raw audio. Some recent work has focused on end-to-end synthesis, whereby the generation of raw audio and any intermediate representations is performed jointly. All such approaches train almost exclusively on audio-visual datasets, i.e. every audio sample has a corresponding video sample. This precludes the use of abundant audio-only datasets which may not have a corresponding visual modality (e.g. audiobooks, radio podcasts, speech recognition datasets etc.), as well as audio-only architectures that have been developed by the audio machine learning community over the years. In this paper we propose to train encoder-decoder models on more than 3,500 hours of audio data at 24 kHz, and then use the pre-trained decoders to initialize the audio decoders for the video-to-speech synthesis task. The pre-training step uses audio samples only and does not require labels or corresponding samples from other modalities (visual, text). We demonstrate that this pre-training step improves the reconstructed speech and that it is an unexplored way to improve the quality of the generator in a cross-modal task while only requiring samples from one of the modalities. We conduct experiments using both raw audio and mel spectrograms as target outputs and benchmark our models against existing work.

Related content


Statistics · MoDELS · SimPLe · DATE · Networks

21 September 2023

Overcoming near-degeneracy in the autologistic actor attribute model

Alex Stivala

from arxiv, Added a paper to literature survey

The autologistic actor attribute model, or ALAAM, is the social influence counterpart of the better-known exponential-family random graph model (ERGM) for social selection. Extensive experience with ERGMs has shown that the problem of near-degeneracy which often occurs with simple models can be overcome by using "geometrically weighted" or "alternating" statistics. In the much more limited empirical applications of ALAAMs to date, the problem of near-degeneracy, although theoretically expected, appears to have been less of an issue. In this work I present a comprehensive survey of ALAAM applications, showing that this model has to date only been used with relatively small networks, in which near-degeneracy does not appear to be a problem. I show near-degeneracy does occur in simple ALAAM models of larger empirical networks, define some geometrically weighted ALAAM statistics analogous to those for ERGM, and demonstrate that models with these statistics do not suffer from near-degeneracy and hence can be estimated where they could not be with the simple statistics.

Processing (programming language) · Stream Processing · Stream · Integration · CC

21 September 2023

An implicit unified gas-kinetic wave-particle method for radiative transport process

Chang Liu, Weiming Li, Peng Song, Kun Xu

The unified gas-kinetic wave-particle (UGKWP) method has been developed in recent years for multiscale gas, plasma, and multiphase flow transport processes. In this work, we propose an implicit unified gas-kinetic wave-particle (IUGKWP) method to remove the CFL time step constraint. Based on the local integral solution of the radiative transfer equation (RTE), the particle transport processes are categorized into the long-$\lambda$ streaming process and the short-$\lambda$ streaming process by comparison with a local physical characteristic time $t_p$. In the construction of the IUGKWP method, the long-$\lambda$ streaming process is tracked by the implicit Monte Carlo (IMC) method; the short-$\lambda$ streaming process is evolved by solving the implicit moment equations; and the photon distribution is closed by a local integral solution of the RTE. In the IUGKWP method, the multiscale flux of radiation energy and the multiscale closure of the photon distribution are constructed based on the local integral solution. The IUGKWP method preserves the second-order asymptotic expansion of the RTE in the optically thick regime and adapts its computational complexity to the flow regime. The numerical dissipation is well controlled, and the teleportation error is significantly reduced in the optically thick regime. The computational complexity of the IUGKWP method decreases exponentially as the Knudsen number approaches zero, and the computational efficiency is remarkably improved in the optically thick regime. The IUGKWP method is formulated on a generalized unstructured mesh, and 2D and 3D algorithms are developed. Numerical tests are presented to validate the capability of IUGKWP in capturing the multiscale photon transport process. The algorithm and code will be applied to engineering applications of inertial confinement fusion (ICF).

Identifiable · Learning · Machine Learning · MoDELS · Performer

21 September 2023

Identification of pneumonia on chest x-ray images through machine learning

Eduardo Augusto Roeder

from arxiv, In Brazilian Portuguese, 30 pages, 16 figures. This thesis was prepared under the guidance of Prof. Dr. Akihito Inca Atahualpa Urdiales

Pneumonia is the leading infectious cause of infant death in the world. When it is identified early, the patient's prognosis can be altered, and imaging exams can help confirm the diagnosis. Performing and interpreting the exams as soon as possible is vital for good treatment, and the most common exam for this pathology is the chest X-ray. The objective of this study was to develop software that identifies the presence or absence of pneumonia in chest radiographs. The software was developed as a computational model based on machine learning, using a transfer learning technique. For the training process, images were collected from a database available online containing children's chest X-ray images taken at a hospital in China. After training, the model was exposed to new images and reached 98% sensitivity and 97.3% specificity on the sample used for testing. It can be concluded that it is possible to develop software that identifies pneumonia in chest X-ray images.
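For readers unfamiliar with the two metrics reported above, the following minimal Python sketch shows how sensitivity and specificity are computed from binary labels. The function name and the toy labels are illustrative assumptions only; they are not the thesis's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = pneumonia present, 0 = absent.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true cases that are detected
    specificity = tn / (tn + fp)  # fraction of healthy cases that are cleared
    return sensitivity, specificity


# Toy labels, made up purely for illustration.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(truth, preds)
```

A high sensitivity means few pneumonia cases are missed, while a high specificity means few healthy radiographs are flagged.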

Discretization · remeshing · bulk · Mutually independent · Exact

21 September 2023

A structure-preserving finite element method for the multi-phase Mullins-Sekerka problem with triple junctions

Tokuhiro Eto, Harald Garcke, Robert Nürnberg

from arxiv, 27 pages, 11 figures

We consider a sharp interface formulation for the multi-phase Mullins-Sekerka flow. The flow is characterized by a network of curves evolving such that the total surface energy of the curves is reduced, while the areas of the enclosed phases are conserved. Making use of a variational formulation, we introduce a fully discrete finite element method. Our discretization features a parametric approximation of the moving interfaces that is independent of the discretization used for the equations in the bulk. The scheme can be shown to be unconditionally stable and to satisfy an exact volume conservation property. Moreover, an inherent tangential velocity for the vertices on the discrete curves leads to asymptotically equidistributed vertices, meaning no remeshing is necessary in practice. Several numerical examples, including a convergence experiment for the three-phase Mullins-Sekerka flow, demonstrate the capabilities of the introduced method.

Cluster · Performer · Pruning · Analysis · Dataset

21 September 2023

Cluster-based pruning techniques for audio data

Boris Bergsma, Marta Brzezinska, Oleg V. Yazyev, Milos Cernak

Deep learning models have become widely adopted in various domains, but their performance relies heavily on a vast amount of data. Datasets often contain a large number of irrelevant or redundant samples, which can lead to computational inefficiencies during training. In this work, we introduce, for the first time in the audio domain, k-means clustering as a method for efficient data pruning. K-means clustering groups similar samples together, allowing the size of the dataset to be reduced while preserving its representative characteristics. As an example, we perform clustering analysis on the keyword spotting (KWS) dataset. We discuss how k-means clustering can significantly reduce the size of audio datasets while maintaining the classification performance across neural networks (NNs) with different architectures. We further comment on the role of scaling analysis in identifying the optimal pruning strategies for a large number of samples. Our studies serve as a proof-of-principle, demonstrating the potential of data selection with distance-based clustering algorithms for the audio domain and highlighting promising research avenues.
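The general idea of cluster-based pruning can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which samples are kept per cluster, so the rule used here (keep the fraction of each cluster closest to its centroid), the farthest-point initialization, and the names `kmeans` and `prune` are all hypothetical choices. In practice the points would be audio feature vectors rather than 2D toy data.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption, for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest(points, centroids):
    """Index of the closest centroid for every point (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length float tuples."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        assign = nearest(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, nearest(points, centroids)


def prune(points, k, keep_fraction):
    """Return sorted indices of the samples kept after pruning each cluster."""
    centroids, assign = kmeans(points, k)
    kept = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        idx.sort(key=lambda i: math.dist(points[i], centroids[c]))
        n_keep = math.ceil(keep_fraction * len(idx))
        kept.extend(idx[:n_keep])
    return sorted(kept)


# Two well-separated toy blobs; half of each cluster is retained.
features = [(0, 0), (0.2, 0), (0, 0.2), (0.1, 0.1),
            (10, 10), (10.2, 10), (10, 10.2), (10.1, 10.1)]
kept_indices = prune(features, k=2, keep_fraction=0.5)
```

Keeping the samples nearest each centroid favors prototypical examples; keeping the farthest ones instead would favor hard or atypical examples, and which choice works better is an empirical question.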

Speech recognition · Integration · Processing (programming language) · MoDELS · Automatic speech recognition

21 September 2023

CPPF: A contextual and post-processing-free model for automatic speech recognition

Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

from arxiv, Submitted to ICASSP2024

ASR systems have become increasingly widespread in recent years. However, their textual outputs often require post-processing tasks before they can be practically utilized. To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model. This integration not only shortens the multi-stage pipeline, but also prevents the propagation of cascading errors, resulting in direct generation of post-processed text. In this study, we focus on ASR-related processing tasks, including Contextual ASR and multiple ASR post processing tasks. To achieve this objective, we introduce the CPPF model, which offers a versatile and highly effective alternative to ASR processing. CPPF seamlessly integrates these tasks without any significant loss in recognition performance.

Representation · Invariant · Descriptor · Robustness · Invariance

20 September 2023

Enhancing motion trajectory segmentation of rigid bodies using a novel screw-based trajectory-shape representation

Arno Verduyn, Maxim Vochten, Joris De Schutter

from arxiv, This work has been submitted to the IEEE International Conference on Robotics and Automation (ICRA) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Trajectory segmentation refers to dividing a trajectory into meaningful consecutive sub-trajectories. This paper focuses on trajectory segmentation for 3D rigid-body motions. Most segmentation approaches in the literature represent the body's trajectory as a point trajectory, considering only its translation and neglecting its rotation. We propose a novel trajectory representation for rigid-body motions that incorporates both translation and rotation, and additionally exhibits several invariant properties. This representation consists of a geometric progress rate and a third-order trajectory-shape descriptor. Concepts from screw theory were used to make this representation time-invariant and also invariant to the choice of body reference point. This new representation is validated for a self-supervised segmentation approach, both in simulation and using real recordings of human-demonstrated pouring motions. The results show a more robust detection of consecutive submotions with distinct features and a more consistent segmentation compared to conventional representations. We believe that other existing segmentation methods may benefit from using this trajectory representation to improve their invariance.

Decomposed

20 September 2023

Tropical cryptography III: digital signatures

Jiale Chen, Dima Grigoriev, Vladimir Shpilrain

from arxiv, 7 pages

We use tropical algebras as platforms for a very efficient digital signature protocol. Security relies on computational hardness of factoring one-variable tropical polynomials; this problem is known to be NP-hard.
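To make the underlying objects concrete, the sketch below implements one-variable tropical polynomials in Python. It assumes the min-plus convention (tropical addition is `min`, tropical multiplication is ordinary `+`); the abstract does not state which convention is used, and the helper names `trop_eval` and `trop_mul` are hypothetical. It illustrates only the algebra, not the signature protocol itself.

```python
INF = float("inf")  # tropical "zero": the identity element for min


def trop_eval(coeffs, x):
    """Evaluate a tropical polynomial: min over i of (coeffs[i] + i * x)."""
    return min(c + i * x for i, c in enumerate(coeffs))


def trop_mul(p, q):
    """Tropical product: out[k] = min over i + j == k of (p[i] + q[j])."""
    out = [INF] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = min(out[i + j], a + b)
    return out


p = [1, 0]        # min(1, x)
q = [3, 2, 0]     # min(3, 2 + x, 2x)
product = trop_mul(p, q)

# As functions, the tropical product is the ordinary sum of the factors.
consistent = all(
    trop_eval(product, x) == trop_eval(p, x) + trop_eval(q, x)
    for x in range(-5, 6)
)
```

Multiplying two polynomials takes a simple double loop, whereas recovering `p` and `q` from `product` is the factoring problem on whose hardness the abstract says security relies.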

Sample · MoDELS · Mixture model · Singular · Performer

19 September 2023

Implementing a new fully stepwise decomposition-based sampling technique for the hybrid water level forecasting model in real-world application

Ziqian Zhang, Nana Bao, Xingting Yan, Aokai Zhu, Chenyang Li, Mingyu Liu

In real-world hydrological time series forecasting, such as water level prediction, time-variant non-stationary signals need to be pre-processed properly. Decomposition methods are good candidates and are widely used for this pre-processing. However, decomposition methods combined with an inappropriate sampling technique may introduce future data that is not available in practical applications, resulting in incorrect decomposition-based forecasting models. In this work, a novel Fully Stepwise Decomposition-Based (FSDB) sampling technique is designed for decomposition-based forecasting models, strictly avoiding the introduction of future information. This sampling technique, combined with decomposition methods such as Variational Mode Decomposition (VMD) and Singular Spectrum Analysis (SSA), is applied to predict water level time series at three different stations in the Guoyang and Chaohu basins in China. Results of the VMD-based hybrid model using the FSDB sampling technique show that the Nash-Sutcliffe Efficiency (NSE) coefficient is increased by 6.4%, 28.8% and 7.0% at the three stations respectively, compared with those obtained from the currently most advanced sampling technique. For the SSA-based experiments, NSE is increased by 3.2%, 3.1% and 1.1% respectively. We conclude that the newly developed FSDB sampling technique can be used to enhance the performance of decomposition-based hybrid models in real-world water level time series forecasting.
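The Nash-Sutcliffe Efficiency used as the evaluation metric above has a standard definition, $\mathrm{NSE} = 1 - \sum_t (o_t - p_t)^2 / \sum_t (o_t - \bar{o})^2$, where $o_t$ are observations and $p_t$ are predictions. A minimal Python sketch follows; the function name and the toy water levels are illustrative assumptions and are not taken from the paper.

```python
def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared error of always predicting the observed mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


# Toy water levels, made up purely for illustration.
levels = [1.0, 2.0, 3.0, 4.0]
perfect = nse(levels, levels)
baseline = nse(levels, [2.5, 2.5, 2.5, 2.5])
```

Values below zero indicate a model that performs worse than simply predicting the observed mean.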

INFORMS · Performer · Lecture notes · Supervision · Phonetics

19 September 2023

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

from arxiv, Proceedings of INTERSPEECH

We introduce a monaural neural speaker embeddings extractor that computes an embedding for each speaker present in a speech mixture. To allow for supervised training, a teacher-student approach is employed: the teacher computes the target embeddings from each speaker's utterance before the utterances are added to form the mixture, and the student embedding extractor is then tasked to reproduce those embeddings from the speech mixture at its input. The system much more reliably verifies the presence or absence of a given speaker in a mixture than a conventional speaker embedding extractor, and even exhibits comparable performance to a multi-channel approach that exploits spatial information for embedding extraction. Further, it is shown that a speaker embedding computed from a mixture can be used to check for the presence of that speaker in another mixture.