Domain-specific terminology extraction is an important task in text analysis. A term in a corpus is said to be "bursty" when its occurrences are concentrated in a few of many documents. Being content rich, bursty terms are highly suited for subject matter characterization, and serve as natural candidates for technical terminology. Multiple measures of term burstiness have been proposed in the literature. However, the statistical significance testing paradigm has remained underexplored in text analysis, including in relation to term burstiness. To test these waters, we propose as our main contribution a multinomial language model-based exact test of statistical significance for term burstiness. Because the exact test is computationally prohibitive, we advance a heuristic formula designed to serve as a proxy for test P-values. As a complementary theoretical contribution, we derive a previously unreported relationship connecting the inverse document frequency and inverse collection frequency (two foundational quantities in text analysis) under the multinomial language model. The relation is used in the evaluation of our heuristic. Using the GENIA Term corpus benchmark, we compare our approach against established methods, demonstrating our heuristic's potential in identifying domain-specific technical terms. We hope this demonstration of statistical significance testing in text analysis serves as a springboard for future research.
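To make the quantities named above concrete, the following is a minimal sketch, not the exact test, the heuristic, or the derived relationship from the paper. It computes the inverse document frequency and inverse collection frequency of a term, plus the document frequency one would expect if every token were drawn independently from a single multinomial distribution. The function names and the toy corpus are hypothetical.

```python
import math

def idf_icf(docs, term):
    """Return (idf, icf) for `term` over a list of tokenized documents."""
    n_docs = len(docs)
    total_tokens = sum(len(d) for d in docs)
    doc_freq = sum(1 for d in docs if term in d)   # documents containing the term
    coll_freq = sum(d.count(term) for d in docs)   # total occurrences in the corpus
    return math.log(n_docs / doc_freq), math.log(total_tokens / coll_freq)

def expected_doc_freq(docs, term):
    """Expected document frequency under an i.i.d. multinomial token model.

    A document of length n misses the term with probability (1 - p)^n,
    where p is the term's relative collection frequency.
    """
    total_tokens = sum(len(d) for d in docs)
    p = sum(d.count(term) for d in docs) / total_tokens
    return sum(1.0 - (1.0 - p) ** len(d) for d in docs)

# Hypothetical toy corpus: "gene" is concentrated in a single document.
docs = [
    ["gene", "gene", "gene", "gene"],
    ["the", "cell"],
    ["the", "protein"],
    ["the", "assay"],
]
idf, icf = idf_icf(docs, "gene")
expected = expected_doc_freq(docs, "gene")
```

In this toy corpus "gene" appears in one document, while the multinomial model would expect it in more than two, which is the kind of gap a burstiness measure is meant to capture.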
We consider the problem of chance constrained optimization, in which one seeks to optimize a function while satisfying constraints, both of which are affected by uncertainties. Real-world instances of this problem are particularly challenging because of their inherent computational cost. To tackle such problems, we propose a new Bayesian optimization method. It applies to the situation where the uncertainty comes from some of the inputs, so that it becomes possible to define an acquisition criterion in the joint controlled-uncontrolled input space. The main contribution of this work is an acquisition criterion that accounts for both the average improvement in the objective function and the constraint reliability. The criterion is derived following the Stepwise Uncertainty Reduction logic, and its maximization provides both optimal controlled and uncontrolled parameters. Analytical expressions are given to efficiently calculate the criterion. Numerical studies on test functions are presented. Experimental comparisons with alternative sampling criteria show that a good match between the sampling criterion and the problem contributes to the efficiency of the overall optimization. As a side result, an expression for the variance of the improvement is given.
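For readers unfamiliar with improvement-based acquisition criteria, the sketch below shows the classical closed-form expected improvement for a deterministic minimization problem under a Gaussian prediction. It is a swapped-in textbook criterion, not the chance-constrained criterion proposed in this work; the function name and arguments are hypothetical.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Classical expected improvement (minimization) for a prediction N(mu, sigma^2).

    f_best is the best (lowest) objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf
```

The first term rewards points predicted to beat the current best, and the second rewards points whose prediction is still uncertain.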
Singularly perturbed boundary value problems pose a significant challenge for numerical approximation because of the presence of sharp boundary layers. These sharp boundary layers are responsible for the stiffness of solutions, which leads to large computational errors if not properly handled. It is well known that classical numerical methods, as well as Physics-Informed Neural Networks (PINNs), require special treatment near the boundary, e.g., extensive mesh refinement or more densely placed collocation points, in order to obtain an accurate approximate solution, especially inside the stiff boundary layer. In this article, we modify the PINNs and construct our new semi-analytic SL-PINNs suitable for singularly perturbed boundary value problems. Performing the boundary layer analysis, we first find the corrector functions describing the singular behavior of the stiff solutions inside boundary layers. Then we obtain the SL-PINN approximations of the singularly perturbed problems by embedding the explicit correctors in the structure of PINNs or by training the correctors together with the PINN approximations. Our numerical experiments confirm that our new SL-PINN methods produce stable and accurate approximations for stiff solutions.
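As an illustration of what a corrector looks like, consider a standard one-dimensional reaction-diffusion model problem. This is an assumed textbook example, not necessarily one of the problems treated in the article, and the symbols $u^0$, $\theta_L$ and $\theta_R$ are introduced here only for the illustration.

```latex
\begin{align*}
  &\text{Model problem:} && -\varepsilon u'' + u = f \ \text{in } (0,1), \quad u(0) = u(1) = 0, \quad 0 < \varepsilon \ll 1, \\
  &\text{Limit problem } (\varepsilon = 0): && u^0 = f, \\
  &\text{Correctors:} && \theta_L(x) = -f(0)\, e^{-x/\sqrt{\varepsilon}}, \qquad
                         \theta_R(x) = -f(1)\, e^{-(1-x)/\sqrt{\varepsilon}}, \\
  &\text{Corrector equation:} && -\varepsilon \theta'' + \theta = 0 \quad \text{for } \theta \in \{\theta_L, \theta_R\}, \\
  &\text{Approximation:} && u \approx u^0 + \theta_L + \theta_R .
\end{align*}
```

The limit solution $u^0 = f$ generally violates the boundary conditions. Each corrector cancels that mismatch at its own endpoint, up to exponentially small terms, and decays over a layer of width of order $\sqrt{\varepsilon}$, which is the sharp behavior a plain network struggles to resolve.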
The joint modeling of multiple longitudinal biomarkers together with a time-to-event outcome is a challenging modeling task of continued scientific interest. In particular, the computational complexity of high dimensional (generalized) mixed effects models often restricts the flexibility of shared parameter joint models, even when the subject-specific marker trajectories follow highly nonlinear courses. We propose a parsimonious multivariate functional principal components representation of the shared random effects. This allows better scalability, as the dimension of the random effects does not directly increase with the number of markers, only with the chosen number of principal component basis functions used in the approximation of the random effects. The functional principal component representation additionally allows the estimation of highly flexible subject-specific random trajectories without parametric assumptions. The modeled trajectories can thus be distinctly different for each biomarker. We build on the framework of flexible Bayesian additive joint models implemented in the R-package 'bamlss', which also supports estimation of nonlinear covariate effects via Bayesian P-splines. The flexible yet parsimonious functional principal components basis used in the estimation of the joint model is first estimated in a preliminary step. We validate our approach in a simulation study and illustrate its advantages by analyzing a study on primary biliary cholangitis.
We adopt the integral definition of the fractional Laplace operator and study an optimal control problem on Lipschitz domains that involves a fractional elliptic partial differential equation (PDE) as state equation and a control variable that enters the state equation as a coefficient; pointwise constraints on the control variable are considered as well. We establish the existence of optimal solutions and analyze first-order optimality conditions as well as necessary and sufficient second-order optimality conditions. Regularity estimates for optimal variables are also analyzed. We develop two finite element discretization strategies: a semidiscrete scheme in which the control variable is not discretized, and a fully discrete scheme in which the control variable is discretized with piecewise constant functions. For both schemes, we analyze the convergence properties of discretizations and derive error estimates.
Speech recognition has become an important task in the development of machine learning and artificial intelligence. In this study, we explore the task of keyword spotting using machine learning and deep learning techniques for speech recognition. We implement feature engineering by converting raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which we use as inputs to our models. We experiment with several different algorithms such as Hidden Markov Model with Gaussian Mixture, Convolutional Neural Networks and variants of Recurrent Neural Networks including Long Short-Term Memory and the Attention mechanism. In our experiments, RNN with BiLSTM and Attention achieves the best performance with an accuracy of 93.9%.
The ability to predict upcoming events has been hypothesized to comprise a key aspect of natural and machine cognition. This is supported by trends in deep reinforcement learning (RL), where self-supervised auxiliary objectives such as prediction are widely used to support representation learning and improve task performance. Here, we study the effects predictive auxiliary objectives have on representation learning across different modules of an RL system and how these mimic representational changes observed in the brain. We find that predictive objectives improve and stabilize learning particularly in resource-limited architectures, and we identify settings where longer predictive horizons better support representational transfer. Furthermore, we find that representational changes in this RL system bear a striking resemblance to changes in neural activity observed in the brain across various experiments. Specifically, we draw a connection between the auxiliary predictive model of the RL system and hippocampus, an area thought to learn a predictive model to support memory-guided behavior. We also connect the encoder network and the value learning network of the RL system to visual cortex and striatum in the brain, respectively. This work demonstrates how representation learning in deep RL systems can provide an interpretable framework for modeling multi-region interactions in the brain. The deep RL perspective taken here also suggests an additional role of the hippocampus in the brain -- that of an auxiliary learning system that benefits representation learning in other regions.
Anderson acceleration (AA) is a technique for accelerating the convergence of an underlying fixed-point iteration. AA is widely used within computational science, with applications ranging from electronic structure calculation to the training of neural networks. Despite AA's widespread use, relatively little is understood about it theoretically. An important and unanswered question in this context is: To what extent can AA actually accelerate convergence of the underlying fixed-point iteration? While simple enough to state, this question appears rather difficult to answer. For example, it is unanswered even in the simplest (non-trivial) case where the underlying fixed-point iteration consists of applying a two-dimensional affine function. In this note we consider a restarted variant of AA, with a restart window of size one, applied to solve symmetric linear systems. Several results are derived from the analytical solution of a nonlinear eigenvalue problem characterizing residual propagation of the AA iteration. These include a complete characterization of the method applied to $2 \times 2$ linear systems, rigorously quantifying how the asymptotic convergence factor depends on the initial iterate, and quantifying by how much AA accelerates the underlying fixed-point iteration. We also prove that even if the underlying fixed-point iteration diverges, the associated AA iteration may still converge.
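The iteration discussed above can be sketched as follows. This is one common formulation of AA with window size one, restarted after every accelerated step; it is an assumption that this matches the variant analyzed in the note. The affine map q(x) = M x + b, the matrix M, the vector b and all function names are hypothetical choices for illustration.

```python
import math

def matvec(M, x):
    """Dense matrix-vector product for small lists of lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def fixed_point_map(M, b, x):
    """q(x) = M x + b; its fixed point solves (I - M) x = b."""
    return [mx_i + b_i for mx_i, b_i in zip(matvec(M, x), b)]

def restarted_aa1(M, b, x0, tol=1e-12, max_cycles=200):
    """Each cycle: one plain fixed-point step, one AA(1) step, then restart."""
    x = list(x0)
    for _ in range(max_cycles):
        q0 = fixed_point_map(M, b, x)
        r0 = [q - xi for q, xi in zip(q0, x)]          # residual at x
        if math.sqrt(sum(r * r for r in r0)) < tol:
            break
        x1 = q0                                        # plain fixed-point step
        q1 = fixed_point_map(M, b, x1)
        r1 = [q - xi for q, xi in zip(q1, x1)]         # residual at x1
        dr = [a - c for a, c in zip(r1, r0)]
        denom = sum(d * d for d in dr)
        # gamma minimizes || r1 - gamma * (r1 - r0) || in the Euclidean norm.
        gamma = sum(a * d for a, d in zip(r1, dr)) / denom if denom > 0.0 else 0.0
        x = [a - gamma * (a - c) for a, c in zip(q1, q0)]
    return x

# Hypothetical symmetric 2 x 2 example with a convergent underlying iteration.
M = [[0.5, 0.1], [0.1, 0.2]]
b = [1.0, 2.0]
solution = restarted_aa1(M, b, [0.0, 0.0])
```

Setting `gamma` to zero recovers two plain fixed-point steps per cycle, so the least-squares choice of `gamma` can only reduce the intermediate residual norm relative to the unaccelerated iteration.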
The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different and are thus prone to misinterpretation even under controlled conditions.
The locations of different mRNA molecules can be revealed by multiplexed in situ RNA detection. By assigning detected mRNA molecules to individual cells, it is possible to identify many different cell types in parallel. This in turn enables investigation of the spatial cellular architecture in tissue, which is crucial for furthering our understanding of biological processes and diseases. However, cell typing typically depends on the segmentation of cell nuclei, which is often done based on images of a DNA stain, such as DAPI. Limiting cell definition to a nuclear stain makes it fundamentally difficult to determine accurate cell borders, and thereby also difficult to assign mRNA molecules to the correct cell. As such, we have developed a computational tool that segments cells solely based on the local composition of mRNA molecules. First, a small neural network is trained to compute attractive and repulsive edges between pairs of mRNA molecules. The signed graph is then partitioned by a mutex watershed into components corresponding to different cells. We evaluated our method on two publicly available datasets and compared it against the current state-of-the-art and older baselines. We conclude that combining neural networks with combinatorial optimization is a promising approach for cell segmentation of in situ transcriptomics data.
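The partitioning step described above can be sketched with a simple, unoptimized version of the mutex watershed on a signed graph. In the tool the edge weights would come from the trained network; here they are hard-coded hypothetical values, and the function name and edge format are assumptions made for this illustration.

```python
def mutex_watershed(n_nodes, edges):
    """Partition nodes of a signed graph.

    edges: iterable of (u, v, weight); weight >= 0 is attractive,
    weight < 0 is repulsive. Edges are processed by decreasing |weight|.
    Returns one cluster label per node.
    """
    parent = list(range(n_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    mutex_pairs = []  # node pairs whose clusters must never be merged
    for u, v, w in sorted(edges, key=lambda e: -abs(e[2])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same cluster
        if w < 0:
            mutex_pairs.append((u, v))
        elif not any({find(a), find(b)} == {ru, rv} for a, b in mutex_pairs):
            parent[ru] = rv  # merge only if no mutual exclusion forbids it
    return [find(i) for i in range(n_nodes)]

# Hypothetical example: four molecules, two cells.
edges = [(0, 1, 0.9), (2, 3, 0.8), (1, 2, -0.7), (0, 3, 0.3)]
labels = mutex_watershed(4, edges)
```

The weak attractive edge between nodes 0 and 3 is ignored because the stronger repulsive edge between nodes 1 and 2 has already declared their two clusters mutually exclusive. The linear scan over `mutex_pairs` keeps the sketch short; a practical implementation would store constraints per cluster.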
Test-negative designs are widely used for post-market evaluation of vaccine effectiveness. Unlike classical test-negative designs, in which only healthcare-seekers with symptoms are included, recent test-negative designs have involved individuals with various reasons for testing, especially in an outbreak setting. While including these data can increase sample size and hence improve precision, concerns have been raised about whether they will introduce bias into the current framework of test-negative designs, thereby demanding a formal statistical examination of this modified design. In this article, using statistical derivations, causal graphs, and numerical simulations, we show that the standard odds ratio estimator may be biased if various reasons for testing are not accounted for. To eliminate this bias, we identify three categories of reasons for testing, namely symptoms, disease-unrelated reasons, and case contact tracing, and characterize the associated statistical properties and estimands. Based on our characterization, we propose stratified estimators that can incorporate multiple reasons for testing to achieve consistent estimation and improve precision by maximizing the use of data. The performance of our proposed method is demonstrated through simulation studies.
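To illustrate why pooling across reasons for testing can mislead, the sketch below compares a crude odds ratio with a Mantel-Haenszel pooled odds ratio over two strata. Mantel-Haenszel is a standard stratified estimator used here as a stand-in; it is not claimed to be the estimator proposed in the article. All counts are made up for illustration, and the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a: vaccinated test-positive, b: unvaccinated test-positive,
    c: vaccinated test-negative, d: unvaccinated test-negative."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio over a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Made-up counts, one 2 x 2 table per reason for testing.
strata = [
    (10, 40, 50, 50),   # tested because of symptoms
    (5, 5, 200, 50),    # tested for disease-unrelated reasons
]
pooled_table = [sum(col) for col in zip(*strata)]   # collapse the strata
crude_or = odds_ratio(*pooled_table)
stratified_or = mantel_haenszel_or(strata)
crude_ve = 1.0 - crude_or            # effectiveness ignoring reason for testing
stratified_ve = 1.0 - stratified_or  # effectiveness after stratification
```

In this made-up example both strata have an odds ratio of 0.25, yet collapsing them gives a crude odds ratio of about 0.13, which would overstate vaccine effectiveness (about 87% instead of 75%).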