Tracking the fundamental frequency (f0) of a monophonic instrumental performance is effectively a solved problem with several solutions achieving 99% accuracy. However, the related task of automatic music transcription requires a further processing step to segment an f0 contour into discrete notes. This sub-task of note segmentation is necessary to enable a range of applications including musicological analysis and symbolic music generation. Building on CREPE, a state-of-the-art monophonic pitch tracking solution based on a simple neural network, we propose a simple and effective method for post-processing CREPE's output to achieve monophonic note segmentation. The proposed method demonstrates state-of-the-art results on two challenging datasets of monophonic instrumental music. Our approach also gives a 97% reduction in the total number of parameters used when compared with other deep learning based methods.
In this paper, the surface of revolution discrete element method (SR-DEM) is introduced to simulate systems of particles with closed surfaces of revolution. Due to the cylindrical symmetry of a surface of revolution, the geometry of any cross-section about the axis of rotation remains the same. Taking advantage of this geometric feature, a node-to-cross-section contact algorithm is proposed for efficient contact detection between particles with a surface of revolution. In our SR-DEM framework, the contact algorithm is realized in a master-slave fashion: the master particle is approximated by its surface nodes, while the slave particle is represented by a signed distance field (SDF) of the cross-section about the axis of rotation. This hybrid formulation in both 2D and 3D space allows a very efficient contact calculation yet relatively simple code implementation. We then apply SR-DEM to simulate particle-particle, particle-wall impact, granular packing in a cylindrical container, and tablets in a rotating drum, to demonstrate SR-DEM's ability to predict the post-impact velocities, packing porosity, and dynamic angle of repose, respectively. Finally, we suggest a simple approach to find an optimal surface resolution, by increasing the number of surface nodes until some of the bulk properties that could characterize the system converge.
Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline.
The present work is devoted to strong approximations of a generalized Ait-Sahalia model arising from mathematical finance. The numerical study of the considered model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients and positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and able to preserve positivity of the original model unconditionally, i.e., for any time step-size h>0. A mean-square convergence rate of order 0.5 is also obtained for the proposed scheme in both non-critical and general critical cases. Our work is motivated by the need to justify the multi-level Monte Carlo (MLMC) simulations for the underlying model, where the rate of mean-square convergence is required and the preservation of positivity is desirable particularly for large discretization time steps. To the best of our knowledge, this is the first paper to propose an unconditionally positivity preserving explicit scheme with order 1/2 of mean-square convergence for the model. Numerical experiments are finally provided to confirm the theoretical findings.
The Levin method is a well-known technique for evaluating oscillatory integrals, which operates by solving a certain ordinary differential equation in order to construct an antiderivative of the integrand. It was long believed that this approach suffers from "low-frequency breakdown," meaning that the accuracy of the calculated value of the integral deteriorates when the integrand is only slowly oscillating. Recently presented experimental evidence, however, suggests that if a Chebyshev spectral method is used to discretize the differential equation and the resulting linear system is solved via a truncated singular value decomposition, then no low-frequency breakdown occurs. Here, we provide a proof that this is the case, and our proof applies not only when the integrand is slowly oscillating, but even in the case of stationary points. Our result puts adaptive schemes based on the Levin method on a firm theoretical foundation and accounts for their behavior in the presence of stationary points. We go on to point out that by combining an adaptive Levin scheme with phase function methods for ordinary differential equations, a large class of oscillatory integrals involving special functions, including products of such functions and the compositions of such functions with slowly-varying functions, can be easily evaluated without the need for symbolic computations. Finally, we present the results of numerical experiments which illustrate the consequences of our analysis and demonstrate the properties of the adaptive Levin method.
Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models. However, it tends to get impractical for high-dimensional problems due to the size of the inherent correlation matrix and the associated high-dimensional hyper-parameter tuning problem. To address these issues, a new method, called sliced GE-Kriging (SGE-Kriging), is developed in this paper for reducing both the size of the correlation matrix and the number of hyper-parameters. We first split the training sample set into multiple slices, and invoke Bayes' theorem to approximate the full likelihood function via a sliced likelihood function, in which multiple small correlation matrices are utilized to describe the correlation of the sample set rather than one large one. Then, we replace the original high-dimensional hyper-parameter tuning problem with a low-dimensional counterpart by learning the relationship between the hyper-parameters and the derivative-based global sensitivity indices. The performance of SGE-Kriging is finally validated by means of numerical experiments with several benchmarks and a high-dimensional aerodynamic modeling problem. The results show that the SGE-Kriging model features an accuracy and robustness that is comparable to the standard one but comes at much less training costs. The benefits are most evident for high-dimensional problems with tens of variables.
We propose a finite difference scheme for the numerical solution of a two-dimensional singularly perturbed convection-diffusion partial differential equation whose solution features interacting boundary and interior layers, the latter due to discontinuities in source term. The problem is posed on the unit square. The second derivative is multiplied by a singular perturbation parameter, $\epsilon$, while the nature of the first derivative term is such that flow is aligned with a boundary. These two facts mean that solutions tend to exhibit layers of both exponential and characteristic type. We solve the problem using a finite difference method, specially adapted to the discontinuities, and applied on a piecewise-uniform (Shishkin). We prove that that the computed solution converges to the true one at a rate that is independent of the perturbation parameter, and is nearly first-order. We present numerical results that verify that these results are sharp.
Multichannel convolutive blind speech source separation refers to the problem of separating different speech sources from the observed multichannel mixtures without much a priori information about the mixing system. Multichannel nonnegative matrix factorization (MNMF) has been proven to be one of the most powerful separation frameworks and the representative algorithms such as MNMF and the independent low-rank matrix analysis (ILRMA) have demonstrated great performance. However, the sparseness properties of speech source signals are not fully taken into account in such a framework. It is well known that speech signals are sparse in nature, which is considered in this work to improve the separation performance. Specifically, we utilize the Bingham and Laplace distributions to formulate a disjoint constraint regularizer, which is subsequently incorporated into both MNMF and ILRMA. We then derive majorization-minimization rules for updating parameters related to the source model, resulting in the development of two enhanced algorithms: s-MNMF and s-ILRMA. Comprehensive simulations are conducted, and the results unequivocally demonstrate the efficacy of our proposed methodologies.
We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. The stochastic ensembles are formulated as families of distributions and trained to approximate the Bayesian posterior with variational inference. We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and a novel non-parametric version of dropout and evaluate them on a toy problem and CIFAR image classification. For both tasks, we test the quality of the posteriors directly against Hamiltonian Monte Carlo simulations. Our results show that stochastic ensembles provide more accurate posterior estimates than other popular baselines for Bayesian inference.
Despite increasing interest in the automatic detection of media frames in NLP, the problem is typically simplified as single-label classification and adopts a topic-like view on frames, evading modelling the broader document-level narrative. In this work, we revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives, including conflict and its resolution, and integrate it with the narrative framing of key entities in the story as heroes, victims or villains. We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions, and present an annotated data set of English news articles, and a case study on the framing of climate change in articles from news outlets across the political spectrum. Finally, we explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches, and present a novel retrieval-based method which is both effective and transparent in its predictions. We conclude with a discussion of opportunities and challenges for future work on document-level models of narrative framing.
We consider wave scattering from a system of highly contrasting resonators with time-modulated material parameters. In this setting, the wave equation reduces to a system of coupled Helmholtz equations that models the scattering problem. We consider the one-dimensional setting. In order to understand the energy of the system, we prove a novel higher-order discrete, capacitance matrix approximation of the subwavelength resonant quasifrequencies. Further, we perform numerical experiments to support and illustrate our analytical results and show how periodically time-dependent material parameters affect the scattered wave field.