Due to the large number of scientific publications appearing daily, it is impossible to manually review each one. Therefore, an automatic extraction of key information is desirable. In this paper, we examine STEREO, a tool for extracting statistics from scientific papers using regular expressions. By adapting an existing regular expression inclusion algorithm for our use case, we decrease the number of regular expressions used in STEREO by about $33.8\%$. We reveal common patterns from the condensed rule set that can be used for the creation of new rules. We also apply STEREO, which was previously trained in the life-sciences and medical domain, to a new scientific domain, namely Human-Computer Interaction (HCI), and re-evaluate it. According to our research, statistics in the HCI domain are similar to those in the medical domain, although a higher percentage of APA-conforming statistics were found in the HCI domain. Additionally, we compare extraction on PDF and LaTeX source files, finding LaTeX to be more reliable for extraction.
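As a minimal sketch of regular-expression-based statistics extraction, the Python snippet below matches one APA-style t-test report such as "t(23) = 2.45, p < .05". The pattern, the name `APA_T_TEST`, and the helper `extract_t_tests` are hypothetical illustrations and are not taken from STEREO's rule set, which is not reproduced in the abstract.

```python
import re

# Hypothetical pattern (an assumption, not one of STEREO's rules):
# an APA-style t-test report such as "t(23) = 2.45, p < .05".
APA_T_TEST = re.compile(
    r"\bt\s*\(\s*(?P<df>\d+(?:\.\d+)?)\s*\)"   # degrees of freedom in parentheses
    r"\s*=\s*(?P<t>-?\d*\.?\d+)"               # test statistic
    r"\s*,\s*p\s*(?P<rel>[<>=])\s*(?P<p>0?\.\d+)"  # p-value with its relation
)


def extract_t_tests(text):
    """Return one dict per APA-style t-test report found in `text`."""
    return [
        {
            "df": float(m.group("df")),
            "t": float(m.group("t")),
            "rel": m.group("rel"),
            "p": float(m.group("p")),
        }
        for m in APA_T_TEST.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The difference was significant, t(23) = 2.45, p < .05."
    print(extract_t_tests(sample))
    # [{'df': 23.0, 't': 2.45, 'rel': '<', 'p': 0.05}]
```

A real rule set would need many such patterns (for F-tests, correlations, chi-square tests, and non-APA variants), which is where reducing redundant rules becomes relevant.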
In the realm of urban transportation, metro systems serve as crucial and sustainable means of public transit. However, their substantial energy consumption poses a challenge to the goal of sustainability. Disturbances such as delays and passenger flow changes can further exacerbate this issue by negatively affecting energy efficiency in metro systems. To tackle this problem, we propose a policy-based reinforcement learning approach that reschedules the metro timetable and optimizes energy efficiency in metro systems under disturbances by adjusting the dwell time and cruise speed of trains. Our experiments conducted in a simulation environment demonstrate the superiority of our method over baseline methods, achieving a traction energy consumption reduction of up to 10.9% and an increase in regenerative braking energy utilization of up to 47.9%. This study provides an effective solution to the energy-saving problem of urban rail transit.
The Fisher information matrix is a quantity of fundamental importance for information geometry and asymptotic statistics. In practice, it is widely used to quickly estimate the expected information available in a data set and guide experimental design choices. In many modern applications, it is intractable to analytically compute the Fisher information and Monte Carlo methods are used instead. The standard Monte Carlo method produces estimates of the Fisher information that can be biased when the Monte Carlo noise is non-negligible. Most problematic is noise in the derivatives, as this leads to an overestimation of the available constraining power, given by the inverse Fisher information. In this work we find another simple estimate that is oppositely biased and produces an underestimate of the constraining power. This estimator can either be used to give approximate bounds on the parameter constraints or can be combined with the standard estimator to give improved, approximately unbiased estimates. Both the alternative and the combined estimators are asymptotically unbiased, so they can also be used as a convergence check of the standard approach. We discuss potential limitations of these estimators and provide methods to assess their reliability. These methods accelerate the convergence of Fisher forecasts, as unbiased estimates can be achieved with fewer Monte Carlo samples, and so can be used to reduce the simulated data set size by several orders of magnitude.
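To illustrate why noise in the derivatives inflates the standard estimate, consider the simplest setting. The following is an assumption made here for illustration and is not stated in the abstract: a Gaussian likelihood with mean $\mu(\theta)$, a known parameter-independent covariance $C$, and a single parameter $\theta$. Writing $\mu' = \partial\mu/\partial\theta$, the Fisher information is

\[
F = \mu'^{\top} C^{-1} \mu' .
\]

Suppose the derivative is estimated from $N$ simulations as $\hat{\mu}' = \mu' + \epsilon$, where the noise $\epsilon$ (a hypothetical symbol) has zero mean and covariance $\Sigma/N$. Then

\[
\mathbb{E}\big[\hat{F}\big]
= \mathbb{E}\big[(\mu' + \epsilon)^{\top} C^{-1} (\mu' + \epsilon)\big]
= F + \frac{1}{N}\,\mathrm{tr}\big(C^{-1}\Sigma\big) \;\ge\; F ,
\]

because the cross term vanishes in expectation and $\mathrm{tr}(C^{-1}\Sigma) \ge 0$ for positive definite $C$ and positive semidefinite $\Sigma$. Under these assumptions the plug-in estimate overestimates $F$, so the forecast variance $F^{-1}$ is too small, and the bias decays as $1/N$.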
The use of AI systems in healthcare for the early screening of diseases is of great clinical importance. Deep learning has shown great promise in medical imaging, but the reliability and trustworthiness of AI systems limit their deployment in real clinical scenes, where patient safety is at stake. Uncertainty estimation plays a pivotal role in producing a confidence evaluation along with the prediction of the deep model. This is particularly important in medical imaging, where the uncertainty in the model's predictions can be used to identify areas of concern or to provide additional information to the clinician. In this paper, we review the various types of uncertainty in deep learning, including aleatoric uncertainty and epistemic uncertainty. We further discuss how they can be estimated in medical imaging. More importantly, we review recent advances in deep learning models that incorporate uncertainty estimation in medical imaging. Finally, we discuss the challenges and future directions in uncertainty estimation in deep learning for medical imaging. We hope this review will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of uncertainty estimation models in medical imaging.
Relation extraction (RE) is a fundamental task in information extraction, whose extension to multilingual settings has been hindered by the lack of supervised resources comparable in size to large English datasets such as TACRED (Zhang et al., 2017). To address this gap, we introduce the MultiTACRED dataset, covering 12 typologically diverse languages from 9 language families, which is created by machine-translating TACRED instances and automatically projecting their entity annotations. We analyze translation and annotation projection quality, identify error categories, and experimentally evaluate fine-tuned pretrained mono- and multilingual language models in common transfer learning scenarios. Our analyses show that machine translation is a viable strategy to transfer RE instances, with native speakers judging more than 83% of the translated instances to be linguistically and semantically acceptable. We find monolingual RE model performance to be comparable to the English original for many of the target languages, and that multilingual models trained on a combination of English and target language data can outperform their monolingual counterparts. However, we also observe a variety of translation and annotation projection errors, both due to the MT systems and linguistic features of the target languages, such as pronoun-dropping, compounding and inflection, that degrade dataset quality and RE model performance.
Conventional harvesting problems for natural resources often assume physiological homogeneity of the body length/weight among individuals. However, such assumptions generally are not valid in real-world problems, where heterogeneity plays an essential role in the planning of biological resource harvesting. Furthermore, it is difficult to observe heterogeneity directly from the available data. This paper presents a novel optimal control framework for the cost-efficient harvesting of biological resources for application in fisheries management. The heterogeneity is incorporated into the resource dynamics, which is the population dynamics in this case, through a probability density that can be distorted from the reality. Subsequently, the distortion, which is the model uncertainty, is penalized through a divergence, leading to a non-standard dynamic differential game wherein the Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation has a unique nonlinear partial differential term. Here, the existence and uniqueness results of the HJBI equation are presented along with an explicit monotone finite difference method. Finally, the proposed optimal control is applied to a harvesting problem with recreationally, economically, and ecologically important fish species using collected field data.
Signal region detection is one of the challenging problems in modern statistics and has broad applications, especially in genetic studies. We propose a novel approach that couples effectively with high-dimensional tests and is distinct from existing methods based on scan or knockoff statistics. The idea is to conduct binary segmentation with re-search and arrangement based on a sequence of dynamic tests to increase detection accuracy and reduce computation. Theoretical and empirical studies demonstrate that our approach enjoys favorable theoretical guarantees with fewer restrictions and exhibits superior numerical performance with faster computation. Compared to scan-based methods, our procedure is capable of detecting shorter or longer regions with unbalanced signal strengths while allowing for more dependence structures. Relative to the knockoff framework, which only controls the false discovery rate, our approach attains higher detection accuracy while controlling the family-wise error rate. Utilizing the UK Biobank data to identify the genetic regions related to breast cancer, we confirm previous findings and identify a number of new regions that suggest strong association with the risk of breast cancer and deserve further investigation.
Early detection of breast cancer is crucial for improving patient outcomes. The Institut Catal\`a de la Salut (ICS) has launched the DigiPatICS project to develop and implement artificial intelligence algorithms to assist with the diagnosis of cancer. In this paper, we propose a new approach to the color normalization problem in HER2-stained histopathological images of breast cancer tissue, posed as a style transfer problem. We combine the Color Deconvolution technique with the Pix2Pix GAN network to present a novel approach to correct the color variations between different HER2 stain brands. Our approach focuses on maintaining the HER2 score of the cells in the transformed images, which is crucial for the HER2 analysis. Results demonstrate that our final model outperforms the state-of-the-art image style transfer methods in maintaining the cell classes in the transformed images and is as effective as them in generating realistic images.
In this paper, we have shown a method of improving the quality of neural machine translation by translating/transliterating named entities as a preprocessing step. Through experiments we have shown the performance gain of our system. For evaluation we considered three types of named entities, viz. person names, location names and organization names. The system was able to correctly translate almost all the named entities. For person names the accuracy was 99.86%, for location names the accuracy was 99.63% and for organization names the accuracy was 99.05%. Overall, the accuracy of the system was 99.52%.
The hull of a linear code (i.e., a finite field vector space)~\({\mathcal C}\) is defined to be the vector space formed by the intersection of~\({\mathcal C}\) with its dual~\({\mathcal C}^{\perp}.\) Constructing vector spaces with a specified hull dimension has important applications and it is therefore of interest to study minimum distance properties of such spaces. In this paper, we use the probabilistic method to obtain spaces with a given hull dimension and minimum distance and also derive Gilbert-Varshamov type sufficient conditions for their existence.
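To make the definition of the hull concrete, the following minimal Python sketch computes the hull dimension of a binary linear code from a generator matrix. It relies on the standard identity $\dim(\mathcal{C} \cap \mathcal{C}^{\perp}) = \operatorname{rank}(G) - \operatorname{rank}(G G^{\top})$ for the Euclidean inner product. The restriction to the field with two elements and the function names `gf2_rank` and `hull_dimension` are assumptions made for illustration; the abstract concerns general finite fields and a probabilistic existence argument, not this computation.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of lists of 0/1 integers."""
    rows = [list(r) for r in rows]  # work on a copy
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in every other row (addition over GF(2) is XOR).
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def hull_dimension(G):
    """Dimension of the hull of the binary code spanned by the rows of G."""
    # Gram matrix G G^T over GF(2).
    gram = [[sum(a & b for a, b in zip(r, s)) % 2 for s in G] for r in G]
    return gf2_rank(G) - gf2_rank(gram)


if __name__ == "__main__":
    # Generator matrix [I | P] of a [7,4] binary Hamming code.
    hamming = [
        [1, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ]
    print(hull_dimension(hamming))  # 3
```

For the Hamming code above the dual is the $[7,3]$ simplex code, which is contained in the code itself, so the hull has dimension 3.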
Under-approximations of reachable sets and tubes have been receiving growing research attention due to their important roles in control synthesis and verification. Available under-approximation methods applicable to continuous-time linear systems typically assume the ability to compute transition matrices and their integrals exactly, which is not feasible in general, and/or suffer from high computational costs. In this note, we attempt to overcome these drawbacks for a class of linear time-invariant (LTI) systems, where we propose a novel method to under-approximate finite-time forward reachable sets and tubes, utilizing approximations of the matrix exponential and its integral. In particular, we consider the class of continuous-time LTI systems with an identity input matrix and initial and input values belonging to full dimensional sets that are affine transformations of closed unit balls. The proposed method yields computationally efficient under-approximations of reachable sets and tubes, when implemented using zonotopes, with first-order convergence guarantees in the sense of the Hausdorff distance. To illustrate its performance, we implement our approach in three numerical examples, where linear systems of dimensions ranging between 2 and 200 are considered.