In indoor scenes, reverberation is a major factor degrading the perceived quality and intelligibility of speech. In this work, we propose a generative dereverberation method. Our approach is based on a probabilistic model that combines a recurrent variational auto-encoder (RVAE) network with the convolutive transfer function (CTF) approximation. Unlike most previous approaches, we use the output of our RVAE as the prior of the clean speech. Our target is the maximum a posteriori (MAP) estimate of the clean speech, which is obtained iteratively with the expectation-maximization (EM) algorithm. The proposed method thus integrates network-based speech prior modelling with CTF-based observation modelling. Experiments on single-channel speech dereverberation show that the proposed generative method noticeably outperforms advanced discriminative networks.
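The CTF approximation models each reverberant STFT coefficient as a short convolution, along the frame axis, of the clean STFT coefficients in the same frequency band. The NumPy sketch below covers only this observation model. It assumes the clean STFT `S` and the per-frequency CTF filters `H` are already given, and the function name `ctf_observe` is hypothetical. The RVAE prior and the EM-based MAP estimation are not shown.

```python
import numpy as np

def ctf_observe(S, H):
    """Reverberant STFT under the CTF approximation (illustrative sketch).

    S: clean STFT, shape (F, T), complex.
    H: per-frequency CTF filters, shape (F, L), complex (assumed known here).
    Returns X with X[f, t] = sum_{l=0}^{L-1} H[f, l] * S[f, t - l].
    """
    F, T = S.shape
    X = np.zeros((F, T), dtype=np.result_type(S, H))
    for l in range(min(H.shape[1], T)):
        # Add the clean STFT delayed by l frames, weighted by the l-th CTF tap
        # of each frequency band (broadcast over frames).
        X[:, l:] += H[:, l:l + 1] * S[:, :T - l]
    return X
```

Feeding in a unit impulse at frame 0 in every band returns the CTF taps themselves in the first `L` frames, which is a quick sanity check of the convolution.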
We consider the problem of using scientific machine learning (SciML) to predict solutions of high-Mach-number fluid flows over irregular geometries. In this setting, data are limited, so models should perform well in the low-data regime. We show that Neural Basis Functions (NBF), which learns a basis of behavior modes from the data and then uses this basis to make predictions, is more effective than a basis-unaware baseline model. We also identify open challenges in predicting solutions for this type of problem.
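To illustrate the general idea of basis-aware prediction, the sketch below represents each solution field as a combination of a few modes. It predicts a new field by interpolating the mode coefficients over a scalar flow parameter, for example the Mach number. This is not the NBF model. As a stand-in for a learned basis, it uses a proper orthogonal decomposition (POD) basis computed with an SVD, and the helper names `fit_basis` and `predict` are hypothetical.

```python
import numpy as np

def fit_basis(snapshots, k):
    """POD basis used here as a stand-in for a learned basis (hypothetical helper).

    snapshots: shape (n_points, n_train), one training solution field per column.
    Returns the top-k left singular vectors, shape (n_points, k).
    """
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :k]

def predict(basis, snapshots, train_params, new_param):
    """Predict a field at new_param by interpolating basis coefficients.

    train_params must be increasing (required by np.interp).
    """
    coeffs = basis.T @ snapshots  # (k, n_train): coefficients of each training field
    c_new = np.array([np.interp(new_param, train_params, c) for c in coeffs])
    return basis @ c_new          # reconstruct the field from the interpolated coefficients
```

Interpolating a handful of coefficients, rather than every grid value, is what lets a basis-aware model generalize from few training solutions. It is exact when the coefficients vary linearly with the parameter, as in the synthetic check used here.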
We establish that a large and flexible class of long, high-redundancy error-correcting codes can be efficiently and accurately decoded with guessing random additive noise decoding (GRAND). Performance evaluation demonstrates that simple concatenated codes can be constructed that outperform the low-density parity-check (LDPC) codes found in the 5G New Radio standard. The concatenated structure enables many desirable features, including low-complexity, hardware-friendly encoding and decoding; high flexibility in length and rate through modularity; and high levels of parallelism in encoding and decoding, which enable low latency. Central to this is a method through which any soft-input (SI) GRAND algorithm can provide soft output (SO) in the form of an accurate a-posteriori estimate of the likelihood that a decoding is correct or, in the case of list decoding, the likelihood that each element of the list is correct. The key feature distinguishing SOGRAND from other methods is that it provides an estimate of the likelihood that the correct decoding has not been found, even when only a single decoding is returned. This per-block SO can be converted into accurate per-bit SO by a weighted sum that includes a term for the SI. Crucially, SOGRAND adds negligible computation and memory to the existing decoding process, and using it yields a practical alternative to LDPC codes.
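For readers unfamiliar with GRAND, a minimal hard-detection version for a binary linear code is sketched below. It guesses noise patterns from most to least likely, removes each one from the received word, and stops at the first result that satisfies all parity checks. This is an assumption-laden illustration, not SOGRAND. It uses hard decisions rather than soft input, orders patterns by Hamming weight, abandons after `max_weight`, and produces no soft output. The function name `grand_hard` is hypothetical.

```python
import numpy as np
from itertools import combinations

def grand_hard(H, y, max_weight):
    """Hard-detection GRAND with abandonment for a binary linear code (sketch).

    H: parity-check matrix (m x n) over GF(2), integer array.
    y: received hard bits, integer array of length n.
    Noise patterns are queried in order of increasing Hamming weight, which is
    the maximum-likelihood order on a binary symmetric channel with p < 1/2.
    Returns (codeword, or None if abandoned, number of guesses made).
    """
    n = H.shape[1]
    guesses = 0
    for w in range(max_weight + 1):
        for positions in combinations(range(n), w):
            guesses += 1
            c = y.copy()
            for i in positions:
                c[i] ^= 1                    # remove the guessed noise bit
            if not np.any((H @ c) % 2):      # zero syndrome: c is a codeword
                return c, guesses
    return None, guesses                     # abandon: no codeword within max_weight
```

For the (7,4) Hamming code with a single flipped bit, the decoder recovers the transmitted all-zero codeword after four guesses: one for the zero-weight pattern and three weight-one patterns.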
In many applications, one wishes to compute extreme eigenvalues and eigenvectors of large Hermitian matrices with efficient and compact algorithms. In particular, orthogonalization-free methods, which find eigenspaces of extreme eigenvalues without explicitly computing orthogonal vectors in each iteration, are preferred for large-scale problems. For the top $p$ eigenvalues, the simplest orthogonalization-free method is to find the best rank-$p$ approximation to a positive semi-definite Hermitian matrix with algorithms that solve the unconstrained Burer-Monteiro formulation. We show that the nonlinear conjugate gradient method for the unconstrained Burer-Monteiro formulation is equivalent to a Riemannian conjugate gradient method on a quotient manifold with the Bures-Wasserstein metric, so its global convergence to a stationary point can be proven. Numerical tests suggest that it is efficient for computing the largest $p$ eigenvalues of large-scale matrices when these eigenvalues are nearly uniformly distributed.
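Below is a minimal NumPy sketch of nonlinear conjugate gradient on the Burer-Monteiro objective $f(Y) = \tfrac14\|YY^\top - A\|_F^2$, whose gradient is $(YY^\top - A)Y$ for symmetric $A$. Several assumptions are not taken from the abstract: a real symmetric PSD matrix, the Polak-Ribière+ choice of $\beta$, and an exact line search that exploits the fact that $f(Y + tD)$ is a quartic polynomial in $t$. When the iteration reaches a global minimizer, $YY^\top$ is the best rank-$p$ approximation of $A$. The eigenvalues of the small $p \times p$ matrix $Y^\top Y$ are then the top $p$ eigenvalues of $A$, obtained without orthogonalizing during the iterations. Function names are hypothetical.

```python
import numpy as np

def bm_objective_and_grad(Y, A):
    """f(Y) = 1/4 ||Y Y^T - A||_F^2 and its gradient (Y Y^T - A) Y, for symmetric A."""
    R = Y @ Y.T - A
    return 0.25 * np.sum(R * R), R @ Y

def exact_line_search(Y, D, A):
    """Exactly minimise the quartic phi(t) = f(Y + t D) over real t."""
    R = Y @ Y.T - A
    S = Y @ D.T + D @ Y.T
    T = D @ D.T
    # 4 phi(t) = ||T||^2 t^4 + 2<S,T> t^3 + (||S||^2 + 2<R,T>) t^2 + 2<R,S> t + ||R||^2
    quartic = np.array([np.sum(T * T), 2 * np.sum(S * T),
                        np.sum(S * S) + 2 * np.sum(R * T),
                        2 * np.sum(R * S), np.sum(R * R)])
    crit = np.roots(np.polyder(quartic))  # stationary points of phi
    cands = crit.real                     # the minimiser is a real root; extra candidates are harmless
    return cands[np.argmin(np.polyval(quartic, cands))]

def bm_nonlinear_cg(A, p, max_iter=1000, tol=1e-10, seed=0):
    """Nonlinear CG (Polak-Ribiere+) on the unconstrained Burer-Monteiro objective."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((A.shape[0], p))
    _, G = bm_objective_and_grad(Y, A)
    D = -G
    for _ in range(max_iter):
        Y = Y + exact_line_search(Y, D, A) * D
        _, G_new = bm_objective_and_grad(Y, A)
        if np.linalg.norm(G_new) < tol:
            break
        # Polak-Ribiere+ coefficient; a negative value restarts with steepest descent.
        beta = max(0.0, np.sum(G_new * (G_new - G)) / np.sum(G * G))
        D = -G_new + beta * D
        G = G_new
    return Y
```

Only products with $A$ and small $p \times p$ quantities are needed in the loop. Orthonormalizing $Y$ once at the end (for example with a QR factorization) followed by a small Rayleigh-Ritz step would also recover eigenvectors.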
When two stiff inclusions are closely located, the gradient of the solution may become arbitrarily large as the distance between them tends to zero. Since the gradient blows up in the narrow region between the inclusions, fine meshes are required to compute it, which makes its numerical computation challenging. Recent studies have shown that the major singularity can be extracted explicitly, so it suffices to compute the residual term, for which only regular meshes are required. In this paper, we show through numerical simulations that the singular-term characterization method can be used efficiently to compute the gradient when two strongly convex stiff domains of general shape are closely located.
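As an illustration of an explicitly extractable singular term, consider the special case of two disks of equal radius $r$ at distance $\epsilon$. This case is an assumption made here, not the general strongly convex setting of the paper. A singular function used in the literature for disks is $q(x) = \frac{1}{2\pi}\left(\log|x - p_1| - \log|x - p_2|\right)$, where $p_1$ and $p_2$ are the fixed points of the combined reflections with respect to the two circles. The sketch below computes these points and evaluates $\nabla q$ at the midpoint of the gap. There $|\nabla q| = 1/(\pi a)$ with $a = \sqrt{r\epsilon + \epsilon^2/4}$, so it blows up like $\epsilon^{-1/2}$. Function names are hypothetical. The constant multiplying $q$ in the decomposition of the solution and the regular residual are not computed.

```python
import numpy as np

def fixed_points(r, eps):
    """Fixed points p1, p2 of the combined circle reflections (hypothetical helper).

    Disks of radius r are centred at (-(r + eps/2), 0) and (r + eps/2, 0),
    so their separation is eps. Reflecting (-a, 0) in circle 2 gives (a, 0),
    and reflecting (a, 0) in circle 1 gives (-a, 0), with a = sqrt(c^2 - r^2).
    """
    c = r + eps / 2.0
    a = np.sqrt(c ** 2 - r ** 2)  # equals sqrt(r * eps + eps**2 / 4)
    return np.array([-a, 0.0]), np.array([a, 0.0])

def grad_q(x, p1, p2):
    """Gradient of q(x) = (1/2pi) * (log|x - p1| - log|x - p2|)."""
    d1, d2 = x - p1, x - p2
    return (d1 / (d1 @ d1) - d2 / (d2 @ d2)) / (2.0 * np.pi)
```

Shrinking $\epsilon$ by a factor of 4 roughly doubles $|\nabla q|$ at the gap midpoint. This is the kind of singular behaviour that is handled analytically, leaving only a regular residual for the mesh.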
With the rising popularity of Large Language Models (LLMs), there has been increasing interest in compression techniques that enable their efficient deployment. This study focuses on Post-Training Quantization (PTQ) of LLMs. Building on recent advances, we introduce QuantEase, a layer-wise quantization framework in which individual layers are quantized separately. The problem is framed as a discrete-structured non-convex optimization, which motivates algorithms based on Coordinate Descent (CD). These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach uses straightforward updates that rely solely on matrix and vector operations, avoiding matrix inversion or decomposition. We also explore an outlier-aware variant of our approach that retains significant weights (outliers) in full precision. In empirical evaluations across various LLMs and datasets, our proposal attains state-of-the-art perplexity and zero-shot accuracy, with relative improvements of up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's ability to achieve near- or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, without requiring non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to a factor of two in perplexity.
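The coordinate-descent idea can be sketched for a single output row of a linear layer. Assume calibration inputs `X` (samples × input features), full-precision weights `w`, and a uniform grid with step `scale` and integer range `[qmin, qmax]`. The layer-wise objective $\|X(w - \hat w)\|_2^2$ is then quadratic in each coordinate. With the other coordinates fixed, the best grid value for coordinate $j$ is the grid point nearest to $w_j + \sum_{k \ne j}\Sigma_{jk}(w_k - \hat w_k)/\Sigma_{jj}$, where $\Sigma = X^\top X$, so the updates need only matrix-vector products and no inversion or decomposition. This is a hypothetical simplification, not QuantEase's implementation. It omits the outlier-aware variant, vectorization across rows, and the paper's linear algebra optimizations, and the function names are illustrative.

```python
import numpy as np

def quantize_to_grid(x, scale, qmin, qmax):
    """Nearest point of the uniform grid {scale * k : qmin <= k <= qmax}."""
    return np.clip(np.round(x / scale), qmin, qmax) * scale

def layer_loss(X, w, q):
    """Layer-wise reconstruction error ||X (w - q)||^2."""
    r = X @ (w - q)
    return float(r @ r)

def cd_quantize_row(X, w, scale, qmin=-8, qmax=7, n_sweeps=10):
    """Cyclic coordinate descent over grid-valued weights (illustrative sketch)."""
    sigma = X.T @ X
    q = quantize_to_grid(w, scale, qmin, qmax)  # round-to-nearest initialisation
    e = w - q                                    # current quantization error
    for _ in range(n_sweeps):
        for j in range(len(w)):
            if sigma[j, j] <= 0:                 # feature never active: leave as is
                continue
            # s = sum_{k != j} sigma_jk * e_k
            s = sigma[j] @ e - sigma[j, j] * e[j]
            # The objective in q_j is sigma_jj * (target - q_j)^2 + const,
            # so the best grid value is the one nearest to target.
            target = w[j] + s / sigma[j, j]
            q[j] = quantize_to_grid(target, scale, qmin, qmax)
            e[j] = w[j] - q[j]
    return q
```

Because each update can only keep or lower the objective, starting from round-to-nearest guarantees that the result is never worse than round-to-nearest on the calibration data.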
We present ParrotTTS, a modularized text-to-speech synthesis model that leverages disentangled self-supervised speech representations. It can effectively train a multi-speaker variant using transcripts from a single speaker. ParrotTTS adapts to a new language in low-resource settings and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual or parallel examples, ParrotTTS can transfer voices across languages while preserving speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results in monolingual and multilingual scenarios. ParrotTTS outperforms state-of-the-art multilingual TTS models while using only a fraction of the paired data they require.
Many real-life signals, particularly in geophysics and physics applications, are defined on spherical domains. In this work, we extend the iterative filtering algorithm, developed for decomposing non-stationary signals defined on Euclidean spaces, to spherical domains. We review the properties of the classical Iterative Filtering method, present its extension, and study its convergence in the discrete setting. In particular, by leveraging the theory of Generalized Locally Toeplitz sequences, we characterize spectrally the operators associated with the spherical extension of Iterative Filtering, and we give a counterexample showing that this extension need not converge. Finally, we propose a convergent version, called Spherical Iterative Filtering, and present numerical results of its application to spherical data.
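The Euclidean starting point can be sketched in one dimension. Classical Iterative Filtering extracts an intrinsic mode function (IMF) by repeatedly subtracting a local moving average of the signal, then repeats the procedure on the remainder. The sketch below assumes a plain centred moving average with a fixed half-width and a fixed number of inner iterations. Actual implementations choose the filter (for example its length) from the data and use a stopping criterion, and the spherical extension replaces this 1-D convolution with an operator on the sphere. Function names are hypothetical.

```python
import numpy as np

def moving_average(s, half_width):
    """Centred moving average with reflective padding at the boundaries."""
    k = 2 * half_width + 1
    padded = np.pad(s, half_width, mode="reflect")
    return np.convolve(padded, np.ones(k) / k, mode="valid")  # same length as s

def extract_imf(s, half_width, n_iter=5):
    """One IMF: repeatedly subtract the local moving average (fixed iteration count)."""
    imf = np.asarray(s, dtype=float).copy()
    for _ in range(n_iter):
        imf = imf - moving_average(imf, half_width)
    return imf

def iterative_filtering(s, half_widths, n_iter=5):
    """Peel off one IMF per half-width; returns (list of IMFs, final remainder)."""
    remainder = np.asarray(s, dtype=float).copy()
    imfs = []
    for hw in half_widths:
        imf = extract_imf(remainder, hw, n_iter)
        imfs.append(imf)
        remainder = remainder - imf
    return imfs, remainder
```

With a window spanning exactly one period of a 40-cycle component, the first IMF recovers that component away from the boundaries, while the slow 2-cycle component stays in the remainder. The IMFs plus the remainder always add back to the input.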
The estimation of causal effects is a fundamental goal in causal inference, but it is challenging for several reasons. One is that the exposure (or treatment) is naturally continuous in many real-world scenarios. When the exposure is continuous, dichotomizing it at a pre-defined threshold may give a biased understanding of causal relationships. In this paper, we propose a novel causal inference framework that measures the causal effect of a continuous exposure. We define the average causal derivative effect as the expectation of the derivative of the potential outcomes at a specific exposure level. We propose a matching method to estimate this quantity and a permutation approach to test the hypothesis of no local causal effect. We also investigate the asymptotic properties of the proposed estimator and examine its performance through simulation studies. Finally, we apply the framework to a real-data example of patients with Chronic Obstructive Pulmonary Disease (COPD).
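To make the estimand concrete, a hypothetical matching-based finite-difference estimator is sketched below. It is not the estimator proposed in the paper. It assumes exposures `A`, covariates `X` (shape n × d), and outcomes `Y`, and keeps only units whose exposure lies within `bandwidth` of the level `a`. It matches each of these units to the unit in that window with the nearest covariates and a sufficiently different exposure, then averages the slopes $(Y_i - Y_j)/(A_i - A_j)$. The permutation test and the asymptotic analysis are not shown.

```python
import numpy as np

def matched_derivative_effect(A, X, Y, a, bandwidth, min_gap=1e-3):
    """Hypothetical matching estimate of E[dY(a)/da] near exposure level a.

    A: exposures (n,); X: covariates (n, d); Y: outcomes (n,).
    min_gap avoids dividing by tiny exposure differences.
    """
    idx = np.flatnonzero(np.abs(A - a) <= bandwidth)  # units near exposure level a
    slopes = []
    for i in idx:
        mask = (idx != i) & (np.abs(A[idx] - A[i]) >= min_gap)
        others = idx[mask]
        if others.size == 0:
            continue
        # Nearest neighbour in covariate space among eligible units in the window.
        d = np.linalg.norm(X[others] - X[i], axis=1)
        j = others[np.argmin(d)]
        slopes.append((Y[i] - Y[j]) / (A[i] - A[j]))
    return float(np.mean(slopes)) if slopes else float("nan")
```

With noise-free outcomes $Y = 2A + 3X$ and a binary covariate matched exactly, every slope equals 2, the true derivative. With real data, residual covariate imbalance and outcome noise would make the estimate approximate.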
The trace plot is seldom used in meta-analysis, yet it is very informative. In this article, we define and illustrate the trace plot and discuss why it is important. The Bayesian version of the plot combines the posterior density of tau, the between-study standard deviation, with the shrunken estimates of the study effects as a function of tau. With a small or moderate number of studies, tau is not estimated precisely, and parameter estimates and shrunken study-effect estimates can vary widely depending on the value of tau. The trace plot visualizes this sensitivity to tau alongside a plot showing which values of tau are plausible and which are implausible. A comparable frequentist or empirical Bayes version gives similar results. The concepts are illustrated with examples in meta-analysis and meta-regression; implementation in R is facilitated in a Bayesian or frequentist framework by the bayesmeta and metafor packages, respectively.
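The shrunken estimates plotted against tau can be computed in closed form under the normal-normal random-effects model. Suppose tau is given, the within-study variances are $v_i$, and the overall mean $\mu$ has a flat prior. The posterior mean of study $i$ is then $B_i\hat\mu(\tau) + (1 - B_i)y_i$, where $B_i = v_i/(v_i + \tau^2)$ and $\hat\mu(\tau)$ is the inverse-variance weighted mean with weights $1/(v_i + \tau^2)$. The Python sketch below computes these traces over a grid of tau values. The function name is hypothetical and this is not the bayesmeta or metafor API. The posterior density of tau, the other half of the Bayesian trace plot, is not computed.

```python
import numpy as np

def trace_shrinkage(y, se, taus):
    """Shrunken study-effect estimates as a function of tau (one row per tau).

    y: observed study effects; se: their standard errors; taus: grid of tau > 0.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(se, dtype=float) ** 2      # within-study variances
    rows = []
    for tau in taus:
        w = 1.0 / (v + tau ** 2)
        mu = np.sum(w * y) / np.sum(w)        # weighted overall mean given tau
        B = v / (v + tau ** 2)                # shrinkage factor toward mu
        rows.append(B * mu + (1.0 - B) * y)
    return np.array(rows)                     # shape (len(taus), number of studies)
```

At tau near 0, every study estimate collapses to the common-effect (fixed-effect) mean. As tau grows, the estimates move back toward the observed study effects, and plotting each column against tau gives the traces.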
Knowledge graphs are important resources for many artificial intelligence tasks but often suffer from incompleteness. In this work, we propose using pre-trained language models for knowledge graph completion. We treat triples in knowledge graphs as textual sequences and propose a novel framework named Knowledge Graph Bidirectional Encoder Representations from Transformer (KG-BERT) to model these triples. Our method takes the entity and relation descriptions of a triple as input and computes a score for the triple with the KG-BERT language model. Experimental results on multiple benchmark knowledge graphs show that our method achieves state-of-the-art performance on triple classification, link prediction, and relation prediction tasks.
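The triple-as-sequence input can be sketched from the usual KG-BERT description: a [CLS] token, then the head-entity, relation, and tail-entity text separated by [SEP] tokens. Entity tokens get one segment id and relation tokens another. Several details are assumptions. Whitespace splitting stands in for BERT's WordPiece tokenizer, the segment assigned to each [SEP] is a choice made here, and the function name `kgbert_input` is hypothetical. In the full model, the final [CLS] representation would feed a classification layer that outputs the triple score.

```python
def kgbert_input(head_desc, relation, tail_desc):
    """Build (tokens, segment_ids) for a triple (illustrative sketch).

    Whitespace tokenization is a simplification of WordPiece.
    Segment 0 marks entity text, segment 1 marks relation text.
    """
    h, r, t = head_desc.split(), relation.split(), tail_desc.split()
    tokens = ["[CLS]"] + h + ["[SEP]"] + r + ["[SEP]"] + t + ["[SEP]"]
    segments = (
        [0] * (len(h) + 2)    # [CLS], head tokens, first [SEP]
        + [1] * (len(r) + 1)  # relation tokens, second [SEP]
        + [0] * (len(t) + 1)  # tail tokens, final [SEP]
    )
    return tokens, segments
```

For link prediction, the same builder would be applied to each candidate tail (or head), and the candidates would be ranked by the model's score.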