In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated from text-to-audio models. We considered three scenarios: a) augmenting the training dataset with data generated by text-to-audio models; b) using a mixed training dataset combining real and synthetic text-driven generated data; and c) using a training dataset composed entirely of synthetic audio. In all cases, the performance of the classification models was tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, and that performance remains consistent when a subset of the recorded dataset is replaced with synthetic data. However, performance drops when the audio recognition models are trained entirely on generated audio.
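Purely as an illustration of the three training-set scenarios (not the paper's pipeline), the following sketch assembles the datasets; `generate_audio` is a hypothetical stand-in for whatever text-to-audio model is used.

```python
import random

def generate_audio(prompt: str):
    """Placeholder for a text-to-audio model call (hypothetical)."""
    raise NotImplementedError  # replace with a real text-to-audio generation call

def build_training_sets(real_clips, class_prompts, n_synth_per_class, replace_fraction=0.5):
    """Assemble the three training-set scenarios described above.

    real_clips:        list of (audio, label) pairs recorded from real sources
    class_prompts:     dict mapping label -> text prompt for the generator
    n_synth_per_class: synthetic clips to generate per class
    """
    synthetic = [(generate_audio(prompt), label)
                 for label, prompt in class_prompts.items()
                 for _ in range(n_synth_per_class)]

    # a) augmentation: all real data plus generated data
    augmented = real_clips + synthetic

    # b) mixed: replace a fraction of the real clips with synthetic ones
    n_keep = int(len(real_clips) * (1 - replace_fraction))
    mixed = random.sample(real_clips, n_keep) + synthetic

    # c) fully synthetic training set
    synthetic_only = synthetic

    return augmented, mixed, synthetic_only
```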
Recently, 2D convolution has been found to be ill-suited to sound event detection (SED): it enforces translation equivariance on sound events along the frequency axis, which is not a shift-invariant dimension. To address this issue, dynamic convolution has been used to model the frequency dependency of sound events. In this paper, we propose the first full-dynamic method, named full-frequency dynamic convolution (FFDConv). FFDConv generates frequency-specific kernels for every frequency band, so frequency-dependent modeling is built directly into its structure, physically equipping 2D convolution with the capability of frequency-dependent modeling. FFDConv not only outperforms the baseline by 6.6% on the DESED real validation dataset in terms of PSDS1, but also outperforms the other full-dynamic methods. In addition, by visualizing features of sound events, we observe that FFDConv effectively extracts coherent features in specific frequency bands, consistent with the vocal continuity of sound events. This demonstrates that FFDConv has a strong frequency-dependent perception ability.
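For readers unfamiliar with the idea of frequency-dependent kernels, the following is a minimal, simplified PyTorch sketch: it learns one static kernel per frequency bin, rather than dynamically generating kernels from the input as FFDConv does, and all names and shapes are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqDependentConv2d(nn.Module):
    """Toy frequency-dependent 2D convolution: one kernel per frequency band.

    Simplified illustration of frequency-dependent modeling: instead of sharing a
    single kernel across the frequency axis, a separate kernel is stored for every
    output frequency bin (FFDConv itself generates these kernels dynamically).
    """

    def __init__(self, c_in, c_out, n_freq, kernel_size=3):
        super().__init__()
        k = kernel_size
        # One (c_out, c_in, k, k) kernel per frequency bin.
        self.weight = nn.Parameter(torch.randn(n_freq, c_out, c_in, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(c_out))
        self.k = k

    def forward(self, x):                                  # x: (batch, c_in, time, freq)
        pad = self.k // 2
        xp = F.pad(x, (pad, pad, pad, pad))                # pad time and freq axes
        outs = []
        for f in range(x.shape[-1]):                       # loop over frequency bins
            patch = xp[..., f:f + self.k]                  # (batch, c_in, time+2*pad, k)
            out_f = F.conv2d(patch, self.weight[f], self.bias)  # (batch, c_out, time, 1)
            outs.append(out_f)
        return torch.cat(outs, dim=-1)                     # (batch, c_out, time, freq)

# Usage: a log-mel spectrogram batch of shape (B, 1, T, F)
x = torch.randn(2, 1, 100, 64)
layer = FreqDependentConv2d(c_in=1, c_out=8, n_freq=64)
print(layer(x).shape)  # torch.Size([2, 8, 100, 64])
```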
With the emergence of new technologies and a growing number of wireless networks, we face the problem of radio spectrum shortage. As a result, identifying the state of the wireless channel spectrum, both to exploit its idle periods and to boost network security, is a pivotal issue. Detecting and classifying protocols at the MAC sublayer enables Cognitive Radio users to improve spectrum utilization and minimize potential interference. In this paper, we classify the Wi-Fi and Bluetooth protocols, the most widely used MAC-sublayer protocols in the ISM radio band. With the advent of various wireless technologies, especially in the 2.4 GHz band, the ISM spectrum has become crowded and high-traffic, suffering from a lack of spectrum resources and from user interference; identifying and classifying protocols is therefore an effective and useful approach. Leveraging machine learning techniques, known for their strong classification capabilities, we apply the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) algorithms to classify frames into three classes: Wi-Fi, Wi-Fi Beacon, and Bluetooth. To capture the signals, we use a USRP N210 software-defined radio and record real data in an indoor environment under different conditions, with the transmitters and receivers of the two protocols present or absent. From this dataset, after studying the time- and frequency-domain characteristics of the protocols, we extract the frame width and the silence gap between consecutive frames as time features and the PAPR of each frame as a power feature. Comparing the classification results under the different conditions, including added Gaussian noise, we find that the nonlinear SVM with an RBF kernel and KNN perform best, with 97.83% and 98.12% classification accuracy, respectively.
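Assuming the three features described above have already been extracted into a feature matrix, the classification stage could be sketched with scikit-learn roughly as follows (synthetic placeholder data is used so the snippet runs end to end; it is not the study's code).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# X: one row per frame with the three features described above
#    [frame_width, silence_gap, papr]; y: 0 = Wi-Fi, 1 = Wi-Fi Beacon, 2 = Bluetooth.
# Random data is used here only so the example is self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = rng.integers(0, 3, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

for name, model in [("RBF-SVM", svm), ("KNN", knn)]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```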
The method of multivariable Mendelian randomization uses genetic variants to instrument multiple exposures, to estimate the effect that a given exposure has on an outcome conditional on all other exposures included in a linear model. Unfortunately, the inclusion of every additional exposure makes a weak instruments problem more likely, because we require conditionally strong genetic predictors of each exposure. This issue is well appreciated in practice, with different versions of F-statistics routinely reported as measures of instrument strength. Less transparently, however, these F-statistics are sometimes used to guide instrument selection, and even to decide whether to report empirical results. Rather than discarding findings with low F-statistics, weak instrument-robust methods can provide valid inference under weak instruments. For multivariable Mendelian randomization with two-sample summary data, we encourage use of the inference strategy of Andrews (2018), which reports both robust and non-robust confidence sets, along with a statistic that measures how reliable the non-robust confidence set is in terms of coverage. We also propose a novel adjusted-Kleibergen statistic that corrects for overdispersion heterogeneity in genetic associations with the outcome.
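For context, a common two-sample summary-data MVMR formulation (written here in generic notation, not taken from the paper) is
\[
\hat{\Gamma}_j \;=\; \sum_{k=1}^{K} \theta_k \, \hat{\gamma}_{jk} \;+\; \varepsilon_j, \qquad j = 1, \dots, J,
\]
where $\hat{\Gamma}_j$ is the estimated association of genetic variant $j$ with the outcome, $\hat{\gamma}_{jk}$ its estimated association with exposure $k$, and $\theta_k$ the effect of exposure $k$ on the outcome conditional on the other exposures. Weak instruments arise when the $\hat{\gamma}_{jk}$ provide little conditionally independent variation for some exposure, which is what conditional F-statistics are intended to flag.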
This note extends results of classical parametric statistics, such as the Fisher and Wilks theorems, to modern setups with a high or infinite parameter dimension, a limited sample size, and possible model misspecification. We consider a special class of stochastically linear smooth (SLS) models satisfying three major conditions: the stochastic component of the log-likelihood is linear in the model parameter, while the expected log-likelihood is a smooth and concave function. For the penalized maximum likelihood estimator (pMLE), we establish three types of results: (1) concentration in a small vicinity of the ``truth''; (2) Fisher and Wilks expansions; (3) risk bounds. In all results, the remainder is given explicitly and can be evaluated in terms of the effective sample size and the effective parameter dimension, which allows us to identify the so-called \emph{critical parameter dimension}. The results are also dimension- and coordinate-free. The obtained finite-sample expansions are of special interest because they can be used not only for obtaining risk bounds but also for inference, studying the asymptotic distribution, analyzing resampling procedures, etc. The main tool for all these expansions is the so-called ``basic lemma'' about linearly perturbed optimization. Despite their generality, all the presented bounds are nearly sharp, and the classical asymptotic results can be obtained as simple corollaries. Our results indicate that the use of advanced fourth-order expansions allows us to relax the critical dimension condition $ \mathbb{p}^{3} \ll n $ from Spokoiny (2023a) to $ \mathbb{p}^{3/2} \ll n $. Examples for classical models such as logistic regression, log-density estimation, and precision matrix estimation illustrate the applicability of the general results.
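Schematically, and with notation chosen here purely for illustration, the SLS conditions state that the log-likelihood $L(\theta)$ satisfies
\[
L(\theta) - \mathbb{E} L(\theta) = \langle \xi, \theta \rangle + \mathrm{const},
\]
that is, the stochastic component is linear in $\theta$, while $\theta \mapsto \mathbb{E} L(\theta)$ is smooth and concave. The penalized MLE is then of the form
\[
\tilde{\theta} = \operatorname*{argmax}_{\theta} \bigl\{ L(\theta) - \operatorname{pen}(\theta) \bigr\},
\]
where a quadratic penalty $\operatorname{pen}(\theta) = \tfrac{1}{2} \| G \theta \|^{2}$ is a typical choice; the note's exact assumptions and penalty may differ in detail.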
Clinical machine learning research and AI-driven clinical decision support models rely on clinically accurate labels. Manually extracting these labels with the help of clinical specialists is often time-consuming and expensive. This study tests the feasibility of automatic span- and document-level diagnosis extraction from unstructured Dutch echocardiogram reports. We included 115,692 unstructured echocardiogram reports from the UMCU, a large university hospital in the Netherlands. A randomly selected subset was manually annotated for the occurrence and severity of eleven commonly described cardiac characteristics. We developed and tested several automatic labelling techniques at both the span and document level, using weighted and macro F1-score, precision, and recall for performance evaluation. We compared the performance of span labelling against document labelling methods, which included both direct document classifiers and indirect document classifiers that rely on span classification results. The SpanCategorizer and MedRoBERTa.nl models outperformed all other span and document classifiers, respectively. The weighted F1-score varied between characteristics, ranging from 0.60 to 0.93 for SpanCategorizer and from 0.96 to 0.98 for MedRoBERTa.nl. Direct document classification was superior to indirect document classification using span classifiers. SetFit achieved competitive document classification performance using only 10% of the training data. Utilizing a reduced label set yielded near-perfect document classification results. We recommend using our published SpanCategorizer and MedRoBERTa.nl models for span- and document-level diagnosis extraction from Dutch echocardiography reports. For settings with limited training data, SetFit may be a promising alternative for document classification.
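As a toy illustration of the "indirect" document labelling route, a document label can be derived from span-level predictions, for instance by taking the most severe span per characteristic; the severity scale, label names, and aggregation rule below are assumptions made for illustration, not the study's actual procedure.

```python
from collections import defaultdict

# Severity ordering assumed for illustration; the study's label set may differ.
SEVERITY_ORDER = ["absent", "mild", "moderate", "severe"]

def document_labels_from_spans(span_predictions):
    """Derive per-characteristic document labels from span-level predictions.

    span_predictions: list of (characteristic, severity) tuples produced by a
    span classifier for one report. The document label for each cardiac
    characteristic is taken as the most severe span mentioning it.
    """
    doc = defaultdict(lambda: "absent")
    for characteristic, severity in span_predictions:
        if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(doc[characteristic]):
            doc[characteristic] = severity
    return dict(doc)

# Example: two spans about aortic regurgitation, one about LV systolic function
spans = [("aortic_regurgitation", "mild"),
         ("aortic_regurgitation", "moderate"),
         ("lv_systolic_function", "severe")]
print(document_labels_from_spans(spans))
# {'aortic_regurgitation': 'moderate', 'lv_systolic_function': 'severe'}
```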
This paper establishes a universal framework for the nonlocal modeling of anisotropic damage at finite strains. By combining two recent works, the new framework allows for the flexible incorporation of different established hyperelastic finite-strain material formulations into anisotropic damage, whilst ensuring mesh-independent results by employing a generic set of micromorphic gradient-extensions. First, the anisotropic damage model, which generally satisfies the damage growth criterion, is investigated for the specific choice of a Neo-Hookean material on a single element. Next, the model is applied with different gradient-extensions in structural simulations of an asymmetrically notched specimen to identify an efficient choice in the form of a volumetric-deviatoric regularization. Thereafter, the universal framework, here specified without loss of generality for a Neo-Hookean material with a volumetric-deviatoric gradient-extension, is successfully applied to the complex simulation of a pressure-loaded rotor blade. Upon acceptance of the manuscript, we will make the code of the material subroutines publicly available at https://doi.org/10.5281/zenodo.11171630.
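For orientation, a compressible Neo-Hookean strain energy with a volumetric-deviatoric split, degraded here by a scalar damage variable $D$ purely for brevity (the framework in the paper treats anisotropic, i.e. tensorial, damage and may use a different volumetric function), can be written as
\[
\Psi(\mathbf{C}, D) = (1 - D)\, \Psi_0(\mathbf{C}), \qquad
\Psi_0(\mathbf{C}) = \frac{\mu}{2}\bigl(\operatorname{tr}\bar{\mathbf{C}} - 3\bigr) + \frac{\kappa}{2}\,(\ln J)^{2},
\qquad \bar{\mathbf{C}} = J^{-2/3}\,\mathbf{C}, \quad J = \sqrt{\det \mathbf{C}},
\]
with shear modulus $\mu$ and bulk modulus $\kappa$.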
Two-sample tests for multivariate data, and especially for non-Euclidean data, are not well explored. This paper presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test utilizes a common pattern that was previously overlooked and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well for finite samples, facilitating its application to large data sets. The new test is illustrated in two applications: the assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
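To make the general construction concrete, here is a generic edge-count permutation test on a k-nearest-neighbour similarity graph; it illustrates the class of graph-based two-sample tests, not the paper's new statistic, and the Euclidean distance used here can be replaced by any domain-specific dissimilarity.

```python
import numpy as np
from scipy.spatial.distance import squareform, pdist

def knn_edges(dist, k=5):
    """Undirected edge set of a k-nearest-neighbour similarity graph."""
    n = dist.shape[0]
    edges = set()
    for i in range(n):
        for j in np.argsort(dist[i])[1:k + 1]:   # skip the point itself
            edges.add((min(i, j), max(i, j)))
    return list(edges)

def cross_edge_count(edges, labels):
    """Number of graph edges connecting observations from different samples."""
    return sum(labels[i] != labels[j] for i, j in edges)

def graph_two_sample_pvalue(x, y, k=5, n_perm=1000, seed=0):
    """Permutation p-value for a simple edge-count graph test.

    A small cross-sample edge count indicates that the two samples
    separate on the similarity graph (a location-type difference).
    """
    pooled = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    dist = squareform(pdist(pooled))             # any user-supplied dissimilarity works here
    edges = knn_edges(dist, k)
    obs = cross_edge_count(edges, labels)
    rng = np.random.default_rng(seed)
    perm = [cross_edge_count(edges, rng.permutation(labels)) for _ in range(n_perm)]
    return (1 + sum(p <= obs for p in perm)) / (n_perm + 1)

# Example: two Gaussian samples differing in mean
rng = np.random.default_rng(1)
print(graph_two_sample_pvalue(rng.normal(0, 1, (50, 3)), rng.normal(1, 1, (50, 3))))
```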
This paper presents a scalable physics-based block preconditioner for mixed-dimensional models in beam-solid interaction and their application in engineering. In particular, it studies the linear systems arising from a regularized mortar-type approach for embedding geometrically exact beams into solid continua. Due to the lack of block diagonal dominance of the arising 2 × 2 block system, an approximate block factorization preconditioner is used. It exploits the sparsity structure of the beam sub-block to construct a sparse approximate inverse, which is not only used to explicitly form an approximation of the Schur complement, but also acts as a smoother within the prediction step of the resulting SIMPLE-type preconditioner. The correction step utilizes an algebraic multigrid method. Although, for now, the beam sub-block is tackled by a one-level method only, the multi-level nature of the computationally demanding correction step delivers a scalable preconditioner in practice. In numerical test cases, the influence of different algorithmic parameters on the quality of the sparse approximate inverse is studied, and the weak scaling behavior of the proposed preconditioner is demonstrated on up to 1000 MPI ranks, before the preconditioner is finally applied to the analysis of steel-reinforced concrete structures in civil engineering.
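A much-simplified sketch of such a SIMPLE-type block preconditioner is given below, with a diagonal inverse standing in for the sparse approximate inverse and direct solves standing in for the smoother and the AMG correction; it is illustrative only and not the solver described in the paper.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def simple_type_preconditioner(A11, A12, A21, A22):
    """SIMPLE-type preconditioner for the block system [[A11, A12], [A21, A22]].

    A diagonal inverse of A11 plays the role of the sparse approximate inverse;
    the approximate Schur complement is solved directly (stand-in for AMG).
    """
    n1, n2 = A11.shape[0], A22.shape[0]
    D11_inv = sp.diags(1.0 / A11.diagonal())              # approximate inverse of A11
    S = (A22 - A21 @ D11_inv @ A12).tocsc()               # approximate Schur complement
    S_solve = spla.splu(S).solve
    A11_solve = spla.splu(A11.tocsc()).solve              # predictor solve on the first block

    def apply(r):
        r1, r2 = r[:n1], r[n1:]
        x1 = A11_solve(r1)                                 # prediction step
        x2 = S_solve(r2 - A21 @ x1)                        # correction step
        x1 = x1 - D11_inv @ (A12 @ x2)                     # update of the first block
        return np.concatenate([x1, x2])

    return spla.LinearOperator((n1 + n2, n1 + n2), matvec=apply)

# Tiny synthetic block system, only to exercise the preconditioner
n1, n2 = 200, 50
A11 = sp.diags([4.0] * n1) + sp.random(n1, n1, density=0.01, random_state=0) * 0.1
A11 = (A11 + A11.T) * 0.5
A12 = sp.random(n1, n2, density=0.02, random_state=1)
A21 = A12.T.copy()
A22 = sp.diags([3.0] * n2)
A = sp.bmat([[A11, A12], [A21, A22]]).tocsr()
b = np.ones(n1 + n2)
M = simple_type_preconditioner(A11.tocsr(), A12.tocsr(), A21.tocsr(), A22.tocsr())
x, info = spla.gmres(A, b, M=M)
print("GMRES converged:", info == 0)
```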
To date, most investigations of surprisal and entropy effects in reading have been conducted at the group level, disregarding individual differences. In this work, we revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) for human reading times, as a measure of processing effort, by incorporating information about language users' cognitive capacities. To do so, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate whether modulating surprisal and entropy relative to cognitive scores increases the accuracy of reading time predictions, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. Our study finds that in most cases, incorporating cognitive capacities increases the predictive power of surprisal and entropy for reading times, and that, in general, high performance in the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, implying that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.
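One generic way to operationalize "modulating surprisal and entropy relative to cognitive scores" is an interaction term in a reading time regression; the sketch below uses synthetic data and ordinary least squares and is not the paper's actual analysis (which may, for example, rely on mixed-effects models or additional predictors).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: one row per word-and-reader observation
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "surprisal": rng.gamma(2.0, 2.0, n),          # LM surprisal of the word
    "entropy": rng.gamma(2.0, 1.5, n),            # contextual entropy estimate
    "cognitive_score": rng.normal(0.0, 1.0, n),   # standardized psychometric score of the reader
})
# Simulated reading time with a predictability effect that weakens for high scorers
df["reading_time"] = (200 + 15 * df.surprisal + 5 * df.entropy
                      - 6 * df.surprisal * df.cognitive_score
                      + rng.normal(0, 20, n))

# Interaction terms test whether cognitive capacity modulates predictability effects
baseline = smf.ols("reading_time ~ surprisal + entropy", data=df).fit()
modulated = smf.ols("reading_time ~ surprisal * cognitive_score + entropy * cognitive_score",
                    data=df).fit()
print(baseline.rsquared, modulated.rsquared)              # does modulation improve fit?
print(modulated.params["surprisal:cognitive_score"])      # sign/size of the modulation
```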