Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to unseen conditions. To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models. Specifically, in a training phase, a clean speech prior distribution is learnt in the short-time Fourier transform (STFT) domain using score-based diffusion models, allowing it to unconditionally generate clean speech from Gaussian noise. Then, we develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference. The noise parameters are learnt simultaneously with clean speech estimation through an iterative expectation-maximisation (EM) approach. To the best of our knowledge, this is the first work exploring diffusion-based generative models for unsupervised speech enhancement, demonstrating promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method. It thus opens a new direction for future research in unsupervised speech enhancement.
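As a minimal illustration of the inference loop (not the paper's STFT-domain implementation), the sketch below alternates Langevin posterior sampling under a known Gaussian prior score with an EM update of an unknown noise variance; the prior, sampler, step sizes, and dimensions are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the clean signal x has a standard Gaussian prior, so its score
# is known in closed form: grad_x log p(x) = -x. Observation y = x + noise.
d = 64
x_true = rng.standard_normal(d)
sigma_true = 0.5
y = x_true + sigma_true * rng.standard_normal(d)

def prior_score(x):
    return -x  # a learnt diffusion score network would replace this

sigma2 = 1.0   # initial noise-variance guess, refined by the M-step
x = y.copy()
for em_iter in range(20):
    # E-step: Langevin posterior sampling with the combined score
    # grad_x [log p(y|x) + log p(x)] = (y - x)/sigma2 + prior_score(x)
    eps = 1e-2
    for _ in range(200):
        score = (y - x) / sigma2 + prior_score(x)
        x = x + eps * score + np.sqrt(2 * eps) * rng.standard_normal(d)
    # M-step: crude single-sample update of the noise variance
    sigma2 = np.mean((y - x) ** 2)

print(f"estimated noise std: {np.sqrt(sigma2):.3f} (true: {sigma_true})")
```

In the paper's setting, the analytic prior score would be replaced by the learnt score model and the noise model would be richer, but the alternation between posterior sampling and noise-parameter estimation is the same.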
Uncertainty quantification using Bayesian methods is a growing area of research. Bayesian model mixing (BMM) is a recent development which combines the predictions from multiple models such that each model's best qualities are preserved in the final result. Practical tools and analysis suites that facilitate such methods are therefore needed. Taweret introduces BMM to existing Bayesian uncertainty quantification efforts. Currently, Taweret contains three individual Bayesian model mixing techniques, each pertaining to a different type of problem structure; we encourage the future inclusion of user-developed mixing methods. Taweret's first use case is in nuclear physics, but the package has been structured such that it should be adaptable to any research area engaged in model comparison or model mixing.
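To convey the flavour of model mixing, the generic sketch below (not Taweret's actual API; the weighting scheme and toy models are illustrative assumptions) combines two models' Gaussian predictions with global weights derived from each model's evidence:

```python
import numpy as np

def mix_predictions(means, variances, log_evidence):
    """Combine K models' Gaussian predictions (arrays of shape (K, n)) with
    global weights derived from each model's log evidence (shape (K,))."""
    w = np.exp(log_evidence - np.max(log_evidence))
    w /= w.sum()
    mixed_mean = np.einsum("k,kn->n", w, means)
    # variance of a mixture: weighted variances plus spread of the means
    mixed_var = np.einsum("k,kn->n", w, variances + means**2) - mixed_mean**2
    return mixed_mean, mixed_var

# Two hypothetical models that are each accurate in a different regime
x = np.linspace(0, 1, 5)
means = np.stack([x**2, np.sin(x)])
variances = np.stack([0.01 * np.ones_like(x), 0.04 * np.ones_like(x)])
mean, var = mix_predictions(means, variances, log_evidence=np.array([-1.0, -2.0]))
print(mean, var)
```

This global-weight case reduces to Bayesian model averaging; BMM methods generalise it, e.g. with input-dependent weights, so that each model dominates where it performs best.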
We consider (nonparametric) sparse additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with a sparse group Lasso and more general sparse group Slope-type penalties on the coefficients of univariate components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifiers are inherently adaptive to the unknown sparsity and smoothness. We show that under a certain sparse group restricted eigenvalue condition, the sparse group Lasso classifier is nearly minimax (up to log-factors) within the entire range of analytic, Sobolev and Besov classes, while the sparse group Slope classifier achieves the exact minimax order (without the extra log-factors) for sparse and moderately dense setups. The performance of the proposed classifier is illustrated on a real-data example.
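In generic form (the notation here is ours and may differ from the paper's conventions), the sparse group Lasso classifier minimizes a penalized logistic loss over the expansion coefficients $c_{jk}$ of the univariate components:

$$ \min_{c}\; \frac{1}{n}\sum_{i=1}^{n}\log\!\bigl(1+e^{-y_i f(\mathbf{x}_i)}\bigr) + \lambda_1 \sum_{j=1}^{d}\|c_j\|_2 + \lambda_2\,\|c\|_1, \qquad f(\mathbf{x})=\sum_{j=1}^{d}\sum_{k} c_{jk}\,\phi_k(x_j), $$

where $c_j$ collects the coefficients of the $j$-th component: the group term switches entire components off, the $\ell_1$ term sparsifies each retained expansion, and replacing both penalties by their sorted, Slope-type counterparts yields the sparse group Slope classifier.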
We present a component-based model order reduction procedure to efficiently and accurately solve parameterized incompressible flows governed by the Navier-Stokes equations. Our approach leverages a non-overlapping optimization-based domain decomposition technique to determine the control variable that minimizes jumps across the interfaces between sub-domains. To solve the resulting constrained optimization problem, we propose both Gauss-Newton and sequential quadratic programming methods, which effectively transform the constrained problem into an unconstrained formulation. Furthermore, we integrate model order reduction techniques into the optimization framework to speed up computations. In particular, we incorporate localized training and adaptive enrichment to reduce the burden associated with training the local reduced-order models. Numerical results are presented to demonstrate the validity and effectiveness of the overall methodology.
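To illustrate the optimization-based coupling idea on a deliberately simple case (a 1D Poisson problem, not the Navier-Stokes setting of the paper), the sketch below determines an interface control value that drives the flux jump between two non-overlapping subdomains to zero via Gauss-Newton; all discretisation details are illustrative assumptions.

```python
import numpy as np

def solve_subdomain(f, h, u_left, u_right):
    """Finite-difference solve of -u'' = f on one subdomain with Dirichlet data."""
    n = f.size  # number of interior nodes
    A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    b = f.copy()
    b[0] += u_left / h**2
    b[-1] += u_right / h**2
    return np.linalg.solve(A, b)

def flux_jump(g, n=20):
    """Mismatch of the normal derivatives at the interface x = 0.5 when
    both subdomains use the control g as interface Dirichlet data."""
    h = 0.5 / (n + 1)
    f = np.ones(n)
    u1 = solve_subdomain(f, h, u_left=0.0, u_right=g)   # on [0, 0.5]
    u2 = solve_subdomain(f, h, u_left=g, u_right=0.0)   # on [0.5, 1]
    # one-sided differences with an O(h) correction using u'' = -f
    flux_left = (g - u1[-1]) / h - 0.5 * h * f[-1]
    flux_right = (u2[0] - g) / h + 0.5 * h * f[0]
    return flux_left - flux_right

# Gauss-Newton on the scalar control, with a finite-difference Jacobian
g = 0.0
for _ in range(5):
    r = flux_jump(g)
    J = (flux_jump(g + 1e-6) - r) / 1e-6
    g -= r / J
print(f"interface value: {g:.4f} (exact u(0.5) = 0.125 for -u''=1, u(0)=u(1)=0)")
```

Here the residual is linear in the control, so Gauss-Newton converges in one step; in the nonlinear Navier-Stokes setting the same loop is iterated, with reduced-order models replacing the subdomain solves.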
Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider these problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits. For the regret minimization setting in $K$-armed as well as linear bandit problems, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate reward distortions, and exhibit sublinear regret. For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm. For the linearly parameterized setting, our algorithm achieves a regret upper bound of the same order as that of the standard Optimism in the Face of Uncertainty Linear (OFUL) bandit algorithm and, unlike OFUL, handles distortions and an arm-dependent noise model. For the best arm identification problem in the $K$-armed bandit setting, we propose algorithms, derive guarantees on their performance, and also show that these algorithms are order optimal by proving matching fundamental limits on performance. For best arm identification in linear bandits, we propose an algorithm and establish sample complexity guarantees. Finally, we present simulation experiments which demonstrate the advantages of using distortion-aware learning algorithms in a vehicular traffic routing application.
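As a sketch of how a distortion enters a UCB-style index (illustrative only; the weighting function and exploration bonus are generic choices, not necessarily the paper's), the distorted value of an arm can be estimated from the empirical distribution of its rewards:

```python
import numpy as np

def distorted_mean(rewards, w):
    """CPT-style distorted value of an empirical reward distribution:
    sum_i x_(i) * [w(i/n) - w((i-1)/n)], with x sorted in decreasing order."""
    x = np.sort(rewards)[::-1]
    n = x.size
    p = np.arange(n + 1) / n
    return np.sum(x * (w(p[1:]) - w(p[:-1])))

# Tversky-Kahneman probability weighting; 0.69 is a common parameter choice
w = lambda p: p**0.69 / (p**0.69 + (1 - p)**0.69) ** (1 / 0.69)

rng = np.random.default_rng(0)
arm_means, T = [0.4, 0.5, 0.6], 2000
pulls = [[rng.binomial(1, m)] for m in arm_means]    # pull each arm once
for t in range(len(arm_means), T):
    # distortion-aware UCB index: distorted empirical value + exploration bonus
    idx = [distorted_mean(np.array(r), w) + np.sqrt(2 * np.log(t) / len(r))
           for r in pulls]
    a = int(np.argmax(idx))
    pulls[a].append(rng.binomial(1, arm_means[a]))
print([len(r) for r in pulls])  # the distorted-optimal arm accumulates most pulls
```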
In the emerging field of mechanical metamaterials, periodic lattice structures are a frequent primary ingredient. However, aperiodic lattices offer unique advantages with respect to failure, e.g., buckling or fracture: avoiding repeated patterns prevents global failure modes, and the local failures that occur instead can beneficially delay structural collapse. It is therefore expedient to develop models that efficiently compute the effective mechanical properties of lattices from general features, while addressing the challenge of handling topologies (or graphs) of different sizes. In this paper, we develop a deep learning model to effectively predict the energetically-equivalent mechanical properties of linear elastic lattices. Treating the lattice as a graph and defining material and geometrical features on it, we show that Graph Neural Networks provide more accurate predictions than a dense, fully connected strategy, thanks to the geometric inductive bias of the graph representation, which is closer to the underlying equilibrium laws of mechanics solved in the direct problem. Leveraging this surrogate for the efficient forward evaluation of a vast number of lattices enables the inverse problem, i.e., obtaining a structure with a prescribed behaviour, which is ultimately suitable for multiscale structural optimization problems.
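A minimal numpy sketch of the kind of message passing involved (a single graph-convolution layer with a mean-pool readout; the toy graph, features, and random weights are illustrative assumptions, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy lattice graph: nodes are joints, edges are struts (undirected)
n_nodes, n_feat = 6, 4
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0], [0, 3]])
H = rng.standard_normal((n_nodes, n_feat))   # nodal material/geometry features

# Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2
A = np.eye(n_nodes)
A[edges[:, 0], edges[:, 1]] = A[edges[:, 1], edges[:, 0]] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# One graph-convolution layer, H' = ReLU(A_hat H W), followed by a global
# mean-pool readout mapping the whole graph to effective elastic properties
W1 = rng.standard_normal((n_feat, 8))
W2 = rng.standard_normal((8, 3))
H1 = np.maximum(A_hat @ H @ W1, 0.0)
y_pred = H1.mean(axis=0) @ W2                # e.g. (E_xx, E_yy, G_xy)
print(y_pred)
```

Because each layer aggregates only over connected neighbours, the prediction respects the lattice connectivity, which is the geometric inductive bias referred to above.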
We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing human evaluations to 17 modern metrics for the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization: none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation, we release all generated image datasets, human evaluation data, and a modular library to compute 17 common metrics for 9 different encoders at https://github.com/layer6ai-labs/dgm-eval.
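For reference, the Fréchet distance underlying FID-style metrics compares Gaussian fits to two sets of encoder features; the sketch below is encoder-agnostic (features from DINOv2-ViT-L/14 or Inception-V3 would simply be plugged in) and uses random arrays as stand-ins:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussian fits to two feature sets of shape (n, d):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):   # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))

# Stand-in features; in practice these come from a frozen pretrained encoder
rng = np.random.default_rng(0)
print(frechet_distance(rng.standard_normal((500, 16)),
                       0.5 + rng.standard_normal((500, 16))))
```

The metric itself is encoder-independent, which is why the choice of feature extractor, the focus of the study above, changes its behaviour so strongly.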
The spatial autoregressive (SAR) model is extended by introducing a Markov switching dynamics for the weight matrix and spatial autoregressive parameter. The framework enables the identification of regime-specific connectivity patterns and strengths and the study of the spatiotemporal propagation of shocks in a system with a time-varying spatial multiplier matrix. The proposed model is applied to disaggregated CPI data from 15 EU countries to examine cross-price dependencies. The analysis identifies distinct connectivity structures and spatial weights across the states, which capture shifts in consumer behaviour, with marked cross-country differences in the spillover from one price category to another.
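Concretely (in generic notation that may differ from the paper's, with covariates omitted for brevity), the model lets both the spatial weight matrix and the autoregressive parameter switch with a latent Markov state $s_t$:

$$ y_t = \rho_{s_t} W_{s_t}\, y_t + \varepsilon_t, \qquad \text{so that} \qquad y_t = (I_N - \rho_{s_t} W_{s_t})^{-1} \varepsilon_t, $$

where the regime-dependent spatial multiplier $(I_N - \rho_{s_t} W_{s_t})^{-1}$ governs how a shock to one price category propagates to the others.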
The Volterra integral-functional series is the classic approach to nonlinear black-box modeling of dynamical systems. It is widely employed in many domains, including radiophysics, aerodynamics, and electronic and electrical engineering, among others. Identifying the time-varying functional parameters, also known as Volterra kernels, poses a difficulty due to the curse of dimensionality: the number of model parameters grows exponentially as the complexity of the input-output response increases. The least squares method (LSM) is widely acknowledged as the standard approach to this parameter identification problem. Unfortunately, the LSM suffers from many drawbacks, such as sensitivity to outliers causing biased estimation, multicollinearity, overfitting, and inefficiency with large datasets. This paper presents an alternative approach based on direct estimation of the Volterra kernels using the collocation method. Two model examples are studied. It is found that the collocation method presents a promising alternative, surpassing the traditional least squares method for Volterra kernel identification, including the case when the input and output signals suffer from considerable measurement errors.
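For reference, the (time-varying) Volterra series represents the output as a sum of multidimensional convolutions of the input with the kernels $K_n$, which makes the dimensionality issue explicit:

$$ y(t) = \sum_{n=1}^{N} \int_0^t \cdots \int_0^t K_n(t, s_1, \ldots, s_n)\, \prod_{i=1}^{n} x(s_i)\, \mathrm{d}s_1 \cdots \mathrm{d}s_n, $$

so the $n$-th kernel is a function of $n+1$ arguments, and the number of discretised kernel values grows exponentially with the order $n$.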
Latent variable models (LVMs) are commonly used to capture the underlying dependencies, patterns, and hidden structure in observed data. Source duplication is a by-product of the data hankelisation pre-processing step common to single-channel LVM applications, and it hinders practical LVM utilisation. In this article, a Python package titled spectrally-regularised-LVMs is presented. The package addresses the source duplication issue via the addition of a novel spectral regularisation term, providing a framework for spectral regularisation in single-channel LVM applications and thereby making it easier to investigate and utilise spectrally regularised LVMs. This is achieved via symbolic or explicit representations of potential LVM objective functions, which are incorporated into a framework that applies spectral regularisation during the LVM parameter estimation process. The objective of this package is to provide a consistent linear LVM optimisation framework which incorporates spectral regularisation and caters to single-channel time-series applications.
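For context, the hankelisation step referred to above embeds a single-channel signal into a trajectory matrix whose rows are lagged copies of the signal. The generic sketch below (the `hankelise` helper is ours; the package's own preprocessing interface may differ) shows the construction:

```python
import numpy as np
from scipy.linalg import hankel

def hankelise(x, window):
    """Embed a 1-D signal into a (window, N - window + 1) Hankel trajectory
    matrix whose columns are successive windows of the signal."""
    return hankel(x[:window], x[window - 1:])

x = np.arange(8.0)          # stand-in single-channel time series
X = hankelise(x, window=3)  # column j is x[j : j + 3]
print(X)
# [[0. 1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5. 6.]
#  [2. 3. 4. 5. 6. 7.]]
```

Because each lagged row carries essentially the same spectral content, latent sources recovered from such a matrix tend to be duplicated, which is the issue the spectral regularisation term penalises.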
This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions from a derived ASR uncertainty measure. The ASR model is pretrained only on a simulated noisy speech corpus and does not take advantage of the CPC2 data. The accurate prediction performance on the CPC2 evaluation therefore suggests that the proposed systems are robust to unseen scenarios.
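As a sketch of the non-intrusive idea (the helper names, uncertainty measure, and mapping below are hypothetical illustrations, not the actual CPC2 system), a scalar uncertainty such as the mean entropy of the ASR output posteriors can be mapped to an intelligibility score with a fitted logistic function:

```python
import numpy as np
from scipy.optimize import curve_fit

def mean_entropy(posteriors):
    """Average per-frame entropy of ASR output distributions, shape (T, vocab)."""
    p = np.clip(posteriors, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=1).mean())

def logistic(u, a, b):
    """Map an uncertainty value to an intelligibility score in [0, 100]."""
    return 100.0 / (1.0 + np.exp(a * (u - b)))

rng = np.random.default_rng(0)
# Stand-in ASR posteriors for one utterance (100 frames, 30 output tokens)
print(mean_entropy(rng.dirichlet(np.ones(30), size=100)))

# Fit the uncertainty-to-intelligibility mapping on stand-in listener data
u = rng.uniform(0.5, 3.0, size=50)
scores = logistic(u, 2.0, 1.5) + rng.normal(0, 3, size=50)
(a, b), _ = curve_fit(logistic, u, scores, p0=(1.0, 1.5))
print(f"fitted mapping: a={a:.2f}, b={b:.2f}")
```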