Semantic similarity between natural language texts is typically measured either by looking at the overlap between subsequences (e.g., BLEU) or by using embeddings (e.g., BERTScore, S-BERT). In this paper, we argue that when we are only interested in measuring semantic similarity, it is better to predict the similarity directly with a model fine-tuned for that task. Using a model fine-tuned for the STS-B task from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity is better aligned with our expectations of a robust semantic similarity measure than other approaches.
We introduce a semi-explicit time-stepping scheme of second order for linear poroelasticity satisfying a weak coupling condition. Here, semi-explicit means that the system, which needs to be solved in each step, decouples and hence improves the computational efficiency. The construction and the convergence proof are based on the connection to a differential equation with two time delays, namely one and two times the step size. Numerical experiments confirm the theoretical results and indicate the applicability to higher-order schemes.
We combine the recent relaxation approach with multiderivative Runge-Kutta methods to preserve conservation or dissipation of entropy functionals for ordinary and partial differential equations. Relaxation methods are minor modifications of explicit and implicit schemes, requiring only the solution of a single scalar equation per time step in addition to the baseline scheme. We demonstrate the robustness of the resulting methods for a range of test problems including the 3D compressible Euler equations. In particular, we point out improved error growth rates for certain entropy-conservative problems including nonlinear dispersive wave equations.
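To make the "single scalar equation per time step" concrete, the following is a minimal sketch of the relaxation idea under simplifying assumptions: it uses the classical fourth-order Runge-Kutta method rather than a multiderivative scheme, and a harmonic oscillator whose conserved quantity is the quadratic energy $(q^2+p^2)/2$. For a quadratic invariant the scalar equation for the relaxation parameter $\gamma$ has a closed-form nonzero root; a general entropy would need a scalar root-finder instead. All function names are illustrative and not taken from the paper.

```python
def rhs(u):
    """Harmonic oscillator q' = p, p' = -q; the energy (q^2 + p^2)/2 is conserved."""
    q, p = u
    return (p, -q)


def energy(u):
    return 0.5 * (u[0] ** 2 + u[1] ** 2)


def axpy(u, a, v):
    """Return u + a*v for 2-tuples."""
    return (u[0] + a * v[0], u[1] + a * v[1])


def rk4_increment(u, dt):
    """Classical RK4 increment d, so that the baseline update is u + d."""
    k1 = rhs(u)
    k2 = rhs(axpy(u, 0.5 * dt, k1))
    k3 = rhs(axpy(u, 0.5 * dt, k2))
    k4 = rhs(axpy(u, dt, k3))
    return tuple(
        dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
        for a, b, c, e in zip(k1, k2, k3, k4)
    )


def relaxed_step(u, dt):
    """One relaxation step: scale the increment by gamma so the energy is unchanged.

    Solving |u + gamma*d|^2 = |u|^2 for the nonzero root gives
    gamma = -2 <u, d> / <d, d>  (closed form only because the energy is quadratic).
    """
    d = rk4_increment(u, dt)
    ud = u[0] * d[0] + u[1] * d[1]
    dd = d[0] ** 2 + d[1] ** 2
    gamma = -2.0 * ud / dd if dd > 0.0 else 1.0
    # The new state is interpreted as an approximation at time t + gamma*dt.
    return axpy(u, gamma, d), gamma * dt


def integrate(u0, dt, n_steps, relax):
    """Integrate with the baseline RK4 scheme (relax=False) or its relaxed variant."""
    u, t = u0, 0.0
    for _ in range(n_steps):
        if relax:
            u, step = relaxed_step(u, dt)
        else:
            u, step = axpy(u, 1.0, rk4_increment(u, dt)), dt
        t += step
    return u, t
```

Calling `integrate((1.0, 0.0), 0.1, 1000, relax=True)` and comparing `energy` of the result with the `relax=False` run shows the baseline scheme slowly losing energy while the relaxed variant keeps it constant up to rounding.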
We import the algebro-geometric notion of a complete collineation into the study of maximum likelihood estimation in directed Gaussian graphical models. A complete collineation produces a perturbation of sample data, which we call a stabilisation of the sample. While a maximum likelihood estimate (MLE) may not exist or be unique given sample data, it is always unique given a stabilisation. We relate the MLE given a stabilisation to the MLE given original sample data, when one exists, providing necessary and sufficient conditions for the MLE given a stabilisation to be one given the original sample. For linear regression models, we show that the MLE given any stabilisation is the minimal norm choice among the MLEs given an original sample. We show that the MLE has a well-defined limit as the stabilisation of a sample tends to the original sample, and that the limit is an MLE given the original sample, when one exists. Finally, we study which MLEs given a sample can arise as such limits. We reduce this to a question regarding the non-emptiness of certain algebraic varieties.
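For intuition on the minimal-norm statement, the textbook single-response regression case can be written out. The setting and notation below ($y$, $X$, $\beta$, the Moore-Penrose pseudoinverse $X^{+}$, and the Euclidean norm) are assumptions made for illustration and may be narrower than the models treated in the paper; the display concerns only the regression coefficients.

```latex
% Assumed setting (not taken from the abstract):
% y \in \mathbb{R}^n, X \in \mathbb{R}^{n \times p},
% y = X\beta + \varepsilon, \varepsilon \sim N(0, \sigma^2 I_n).
\[
  \hat\beta \in \arg\min_{\beta} \|y - X\beta\|_2^2
  \iff
  X^\top X \hat\beta = X^\top y ,
\]
\[
  \{\hat\beta\} = X^{+} y + \ker X ,
  \qquad
  \hat\beta_{\min} = X^{+} y
  = \arg\min \bigl\{ \|\beta\|_2 : X^\top X \beta = X^\top y \bigr\} .
\]
```

When $X$ has full column rank, $\ker X = \{0\}$ and the solution is unique; otherwise $X^{+} y$ is the solution orthogonal to $\ker X$, which is why it has the smallest Euclidean norm.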
Scale-free dynamics, formalized by selfsimilarity, provides a versatile paradigm that is widely used to model temporal dynamics in real-world data. However, its practical use has mostly remained univariate so far. By contrast, modern applications often demand multivariate data analysis. Accordingly, models for multivariate selfsimilarity were recently proposed. Nevertheless, they have remained rarely used in practice because robust estimation procedures for the vector of selfsimilarity parameters have been lacking. Building upon recent mathematical developments, the present work puts forth an efficient estimation procedure based on the theoretical study of the multiscale eigenstructure of the wavelet spectrum of multivariate selfsimilar processes. The estimation performance is studied theoretically in the asymptotic limits of large scale and sample sizes, and computationally for finite-size samples. As a practical outcome, a fully operational and documented multivariate signal processing estimation toolbox is made freely available and is ready for practical use on real-world data. Its potential benefits are illustrated in epileptic seizure prediction from multi-channel EEG data.
Owing to the singularity of the solution of linear subdiffusion problems, most time-stepping methods on uniform meshes can achieve only $O(\tau)$ accuracy, where $\tau$ denotes the time step. The present work aims to discover why one type of Crank-Nicolson scheme (the averaging Crank-Nicolson scheme) for subdiffusion yields only $O(\tau^\alpha)$ ($\alpha<1$) accuracy, which is much lower than desired. The existing well-developed error analysis for subdiffusion, which has been successfully applied to many time-stepping methods such as the fractional BDF-$p$ ($1\leq p\leq 6$), requires that singular points lie off the path of the contour integrals involved. The averaging Crank-Nicolson scheme in this work is quite natural but fails to meet this requirement. By resorting to the residue theorem, a novel sharp error analysis is developed in this study, upon which correction methods are further designed to obtain the optimal $O(\tau^2)$ accuracy. All results are verified by numerical tests.
Quadratic NURBS-based discretizations of the Galerkin method suffer from volumetric locking when applied to nearly-incompressible linear elasticity. Volumetric locking causes not only smaller displacements than expected, but also large-amplitude spurious oscillations of normal stresses. Continuous-assumed-strain (CAS) elements have been recently introduced to remove membrane locking in quadratic NURBS-based discretizations of linear plane curved Kirchhoff rods (Casquero et al., CMAME, 2022). In this work, we propose two generalizations of CAS elements (named CAS1 and CAS2 elements) to overcome volumetric locking in quadratic NURBS-based discretizations of nearly-incompressible linear elasticity. CAS1 elements linearly interpolate the strains at the knots in each direction for the term in the variational form involving the first Lam\'e parameter while CAS2 elements linearly interpolate the dilatational strains at the knots in each direction. For both element types, a displacement vector with $C^1$ continuity across element boundaries results in assumed strains with $C^0$ continuity across element boundaries. In addition, the implementation of the two locking treatments proposed in this work does not require any additional global or element matrix operations such as matrix inversions or matrix multiplications. The locking treatments are applied at the element level and the nonzero pattern of the global stiffness matrix is preserved. The numerical examples solved in this work show that CAS1 and CAS2 elements, using either two or three Gauss-Legendre quadrature points per direction, are effective locking treatments since they not only result in more accurate displacements for coarse meshes, but also remove the spurious oscillations of normal stresses.
Quantum computing has recently emerged as a transformative technology. Yet, its promised advantages rely on efficiently translating quantum operations into viable physical realizations. In this work, we use generative machine learning models, specifically denoising diffusion models (DMs), to facilitate this transformation. Leveraging text-conditioning, we steer the model to produce desired quantum operations within gate-based quantum circuits. Notably, DMs allow us to sidestep, during training, the exponential overhead inherent in the classical simulation of quantum dynamics, a consistent bottleneck in preceding ML techniques. We demonstrate the model's capabilities across two tasks: entanglement generation and unitary compilation. The model excels at generating new circuits and supports typical DM extensions such as masking and editing to, for instance, align the circuit generation to the constraints of the targeted quantum device. Given their flexibility and generalization abilities, we envision DMs as pivotal in quantum circuit synthesis, enhancing both practical applications and insights into theoretical quantum computation.
This work explores the degree to which grammar acquisition is driven by language `simplicity' and the source modality (speech vs. text) of data. Using BabyBERTa as a probe, we find that grammar acquisition is largely driven by exposure to speech data, and in particular through exposure to two of the BabyLM training corpora: AO-Childes and Open Subtitles. We arrive at this finding by examining various ways of presenting input data to our model. First, we assess the impact of various sequence-level complexity-based curricula. We then examine the impact of learning over `blocks' -- covering spans of text that are balanced for the number of tokens in each of the source corpora (rather than the number of lines). Finally, we explore curricula that vary the degree to which the model is exposed to different corpora. In all cases, we find that over-exposure to AO-Childes and Open Subtitles significantly drives performance. We verify these findings through a comparable control dataset in which exposure to these corpora, and speech more generally, is limited by design. Our findings indicate that it is not the proportion of tokens occupied by high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data. We hope this encourages future research into the use of more developmentally plausible linguistic data (which tends to be scarcer) to augment general purpose pre-training regimes.
We propose a new framework for the simultaneous inference of monotone and smoothly time-varying functions under complex temporal dynamics, using monotone rearrangement and nonparametric estimation. We capitalize on a Gaussian approximation for the nonparametric monotone estimator and construct asymptotically correct simultaneous confidence bands (SCBs) by carefully designed bootstrap methods. We investigate two general and practical scenarios. The first is the simultaneous inference of monotone smooth trends from moderately high-dimensional time series, and the proposed algorithm has been employed for the joint inference of temperature curves from multiple areas. Specifically, most existing methods are designed for a single monotone smooth trend. In such cases, our proposed SCB empirically exhibits the narrowest width among existing approaches while maintaining confidence levels, and has been used for testing several hypotheses tailored to global warming. The second scenario involves simultaneous inference of monotone and smoothly time-varying regression coefficients in time-varying coefficient linear models. The proposed algorithm has been utilized for testing the impact of sunshine duration on temperature, which is believed to be increasing due to the increasingly severe greenhouse effect. The validity of the proposed methods has been justified in theory as well as by extensive simulations.
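As a rough illustration of the monotone rearrangement step, the sketch below applies the plain sorting-based increasing rearrangement to a noisy curve sampled on a uniform grid. This is an assumption made for illustration: the estimator in the paper may use a smoothed variant of the rearrangement, and the noisy curve here is only a stand-in for a nonparametric fit. The function names and the test curve $t^2$ are hypothetical.

```python
import random


def monotone_rearrangement(values):
    """Increasing rearrangement of a curve sampled on a uniform grid: sort the values."""
    return sorted(values)


def mean_squared_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def demo(n_points=50, noise_sd=0.1, seed=0):
    """Build an increasing curve, a noisy non-monotone estimate, and its rearrangement."""
    rng = random.Random(seed)
    grid = [i / (n_points - 1) for i in range(n_points)]
    truth = [t ** 2 for t in grid]  # increasing on [0, 1]
    # Stand-in for an unconstrained nonparametric estimate (assumption).
    raw = [f + rng.gauss(0.0, noise_sd) for f in truth]
    rearranged = monotone_rearrangement(raw)
    return truth, raw, rearranged
```

When the target curve is itself monotone, sorting cannot increase the squared error on the grid, so `mean_squared_error(rearranged, truth)` is never larger than `mean_squared_error(raw, truth)`; this is the property that makes rearrangement attractive as a post-processing step.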
In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure as well as a null hypothesis testing procedure that are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.
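To illustrate what a summary of a variable importance trajectory might look like, the sketch below computes two simple candidates from importance estimates at several time points: the average over time and the least-squares linear trend. The abstract does not name the summaries it uses, so both choices, and all function names, are assumptions; the code gives point estimates only and does not implement the inference or testing procedures.

```python
def mean_importance(values):
    """Average of the importance estimates over the time points."""
    if not values:
        raise ValueError("need at least one time point")
    return sum(values) / len(values)


def linear_trend(times, values):
    """Least-squares slope of importance against time."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx
```

For a hypothetical trajectory with importance 0.1, 0.2, 0.3, 0.4 at times 1 through 4, `mean_importance` summarizes the overall level and `linear_trend` summarizes whether the variable becomes more or less important over time.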