This paper proposes a deep sound-field denoiser, a deep neural network (DNN) based denoising of optically measured sound-field images. Sound-field imaging using optical methods has gained considerable attention due to its ability to achieve high-spatial-resolution imaging of acoustic phenomena that conventional acoustic sensors cannot accomplish. However, the optically measured sound-field images are often heavily contaminated by noise because of the low sensitivity of optical interferometric measurements to airborne sound. Here, we propose a DNN-based sound-field denoising method. Time-varying sound-field image sequences are decomposed into harmonic complex-amplitude images by using a time-directional Fourier transform. The complex images are converted into two-channel images consisting of real and imaginary parts and denoised by a nonlinear-activation-free network. The network is trained on a sound-field dataset obtained from numerical acoustic simulations with randomized parameters. We compared the method with conventional ones, such as image filters, a spatiotemporal filter, and other DNN architectures, on numerical and experimental data. The experimental data were measured by parallel phase-shifting interferometry and holographic speckle interferometry. The proposed deep sound-field denoiser significantly outperformed the conventional methods on both the numerical and experimental data. Code is available on GitHub: //github.com/nttcslab/deep-sound-field-denoiser.
We give a broad survey of inequalities for the number of linear extensions of finite posets. We review many examples, discuss open problems, and present recent results on the subject. We emphasize the bounds, the equality conditions of the inequalities, and the computational complexity aspects of the results.
We survey recent contributions to finite element exterior calculus over manifolds and surfaces within a comprehensive formalism for the error analysis of vector-valued partial differential equations over manifolds. Our primary focus is on uniformly bounded commuting projections over manifolds: these projections map from Sobolev de Rham complexes onto finite element de Rham complexes, commute with the differential operators, and satisfy uniform bounds in Lebesgue norms. They enable the Galerkin theory of Hilbert complexes for a large range of intrinsic finite element methods over manifolds. However, these intrinsic finite element methods are generally not computable and thus primarily of theoretical interest. This leads to our second point: estimating the geometric variational crime incurred by transitioning to computable approximate problems. Lastly, our third point addresses how to estimate the approximation error of the intrinsic finite element method in terms of the mesh size. If the solution is not continuous, then such an estimate is achieved via modified Cl\'ement or Scott-Zhang interpolants that facilitate a broken Bramble--Hilbert lemma.
Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model. We then leverage a neural audio synthesizer to create audio embeddings for sample text utterances and perform generalized zero-shot classification on unseen intents using cosine similarity. We also propose a multimodal training strategy that incorporates lexical information into the audio representation to improve zero-shot performance. Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.
We investigate the link between regularised self-transport problems and maximum likelihood estimation in Gaussian mixture models (GMM). This link suggests that self-transport followed by a clustering technique leads to principled estimators at a reasonable computational cost. Also, robustness, sparsity and stability properties of the optimal transport plan arguably make the regularised self-transport a statistical tool of choice for the GMM.
We consider the community recovery problem on a multilayer variant of the hypergraph stochastic block model (HSBM). Each layer is associated with an independent realization of a d-uniform HSBM on N vertices. Given the similarity matrix containing the aggregated number of hyperedges incident to each pair of vertices, the goal is to obtain a partition of the N vertices into disjoint communities. In this work, we investigate a semidefinite programming (SDP) approach and obtain information-theoretic conditions on the model parameters that guarantee exact recovery both in the assortative and the disassortative cases.
Previous audio-visual speech separation methods use the synchronization of the speaker's facial movement and speech in the video to supervise the speech separation in a self-supervised way. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network for learning the combination of three modalities, audio, face, and sign language information, for better solving the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which three modalities of audio, face, and sign language coexist. Experiment results show that the proposed model has better performance and robustness than the usual audio-visual system. Besides, sign language modality can also be used alone to supervise speech separation tasks, and the introduction of sign language is helpful for hearing-impaired people to learn and communicate. Last, our model is a general speech separation framework and can achieve very competitive separation performance on two open-source audio-visual datasets. The code is available at //github.com/iveveive/SLNSpeech
At least two, different approaches to define and solve statistical models for the analysis of economic systems exist: the typical, econometric one, interpreting the Gravity Model specification as the expected link weight of an arbitrary probability distribution, and the one rooted into statistical physics, constructing maximum-entropy distributions constrained to satisfy certain network properties. In a couple of recent, companion papers they have been successfully integrated within the framework induced by the constrained minimisation of the Kullback-Leibler divergence: specifically, two, broad classes of models have been devised, i.e. the integrated and the conditional ones, defined by different, probabilistic rules to place links, load them with weights and turn them into proper, econometric prescriptions. Still, the recipes adopted by the two approaches to estimate the parameters entering into the definition of each model differ. In econometrics, a likelihood that decouples the binary and weighted parts of a model, treating a network as deterministic, is typically maximised; to restore its random character, two alternatives exist: either solving the likelihood maximisation on each configuration of the ensemble and taking the average of the parameters afterwards or taking the average of the likelihood function and maximising the latter one. The difference between these approaches lies in the order in which the operations of averaging and maximisation are taken - a difference that is reminiscent of the quenched and annealed ways of averaging out the disorder in spin glasses. The results of the present contribution, devoted to comparing these recipes in the case of continuous, conditional network models, indicate that the annealed estimation recipe represents the best alternative to the deterministic one.
We propose a hybrid Finite Volume (FV) - Spectral Element Method (SEM) for modelling aeroacoustic phenomena based on the Lighthill's acoustic analogy. First the fluid solution is computed employing a FV method. Then, the sound source term is projected onto the acoustic grid and the inhomogeneous Lighthill's wave equation is solved employing the SEM. The novel projection method computes offline the intersections between the acoustic and the fluid grids in order to preserve the accuracy. The proposed intersection algorithm is shown to be robust, scalable and able to efficiently compute the geometric intersection of arbitrary polyhedral elements. We then analyse the properties of the projection error, showing that if the fluid grid is fine enough we are able to exploit the accuracy of the acoustic solver and we numerically assess the obtained theoretical estimates. Finally, we address two relevant aeroacoustic benchmarks, namely the corotating vortex pair and the noise induced by a laminar flow around a squared cylinder, to demonstrate in practice the effectiveness of the projection method when dealing with high order solvers. The flow computations are performed with OpenFOAM [46], an open-source finite volume library, while the inhomogeneous Lighthill's wave equation is solved with SPEED [31], an opensource spectral element library.
This paper presents the workspace optimization of one-translational two-rotational (1T2R) parallel manipulators using a dimensionally homogeneous constraint-embedded Jacobian. The mixed degrees of freedom of 1T2R parallel manipulators, which cause dimensional inconsistency, make it difficult to optimize their architectural parameters. To solve this problem, a point-based approach with a shifting property, selection matrix, and constraint-embedded inverse Jacobian is proposed. A simplified formulation is provided, eliminating the complex partial differentiation required in previous approaches. The dimensional homogeneity of the proposed method was analytically proven, and its validity was confirmed by comparing it with the conventional point-based method using a 3-PRS manipulator. Furthermore, the approach was applied to an asymmetric 2-RRS/RRRU manipulator with no parasitic motion. This mechanism has a T-shape combination of limbs with different kinematic parameters, making it challenging to derive a dimensionally homogeneous Jacobian using the conventional method. Finally, optimization was performed, and the results show that the proposed method is more efficient than the conventional approach. The efficiency and simplicity of the proposed method were verified using two distinct parallel manipulators.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.