This paper presents and evaluates a set of methods to classify individual Scopus publications using their references back to the second generation, where each publication can be assigned fractionally to up to five ASJC (All Science Journal Classification) categories, excluding the Multidisciplinary area and the miscellaneous categories. Based on proposals by Glänzel et al. (1999a, 1999b, 2021), some additional parameters are established that allow different results to be obtained depending on how category membership is weighted and how the acceptance thresholds for multiple assignments are set. Various classifications are obtained and then compared with each other, with the original ASJC Scopus journal classification, and with the AAC (Authors Assignation Collection) classification of a previous study (Alvarez-Llorente et al., 2023), in which the papers' corresponding authors assigned them the most appropriate categories. Classifications in which a high threshold is set for allowing assignments to multiple categories, combined with the use of first- and second-generation references and averaging over the number of references, provide the most promising results, improving on other reference-based reclassification proposals in terms of granularity, and on the Scopus classification itself in such aspects as the homogeneity of the publications assigned to a category. They also show greater coincidence with the AAC classification.
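As a rough illustration of the reference-based fractional assignment idea (a sketch only, not the exact weighting schemes evaluated in the paper), the following Python snippet accumulates fractional ASJC category weights over a publication's references and applies an acceptance threshold; the category codes, threshold value, and cap of five categories are illustrative assumptions.

from collections import Counter

def classify_by_references(ref_categories, threshold=0.2, max_cats=5):
    # ref_categories: list of lists with the ASJC categories of each
    # reference's journal; each reference contributes fractionally.
    weights = Counter()
    for cats in ref_categories:
        for cat in cats:
            weights[cat] += 1.0 / len(cats)  # fractional counting
    total = sum(weights.values())
    shares = {cat: w / total for cat, w in weights.items()}
    # Keep at most max_cats categories whose share clears the threshold,
    # then renormalize the surviving shares to sum to one.
    kept = dict(sorted(shares.items(), key=lambda kv: -kv[1])[:max_cats])
    kept = {c: s for c, s in kept.items() if s >= threshold}
    if not kept:  # fall back to the single best category
        best = max(shares, key=shares.get)
        kept = {best: shares[best]}
    norm = sum(kept.values())
    return {c: s / norm for c, s in kept.items()}

print(classify_by_references([["1702", "1712"], ["1702"], ["2604"]]))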
We present a novel deep-learning-based method to cluster words in documents, which we apply to detecting and recognizing tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, or header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M, PubTabNet, and FinTabNet datasets. Compared to current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy while requiring a significantly smaller model.
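A minimal PyTorch sketch of the pairwise-relation idea: a generic transformer encoder over word features (bounding box plus a word embedding) followed by a pair scorer that emits one adjacency-matrix logit per relation type. The feature sizes, layer counts, and names here are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class WordRelationModel(nn.Module):
    # Encode word features with a transformer and score every word pair,
    # one logit per relation type (same row, column, header, table).
    def __init__(self, d_model=128, n_relations=4):
        super().__init__()
        self.input_proj = nn.Linear(4 + 300, d_model)  # bbox + embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pair_scorer = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_relations))

    def forward(self, feats):                      # (batch, n_words, 304)
        h = self.encoder(self.input_proj(feats))   # (batch, n_words, d)
        n = h.size(1)
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)  # (b, n, n, d)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)  # (b, n, n, d)
        return self.pair_scorer(torch.cat([hi, hj], dim=-1))

model = WordRelationModel()
logits = model(torch.randn(2, 12, 304))            # 12 words per page
adjacency = logits.sigmoid() > 0.5                 # thresholded relations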
This work proposes a novel variational approximation of partial differential equations on moving geometries determined by explicit boundary representations. The main benefit of the proposed formulation is its ability to handle large displacements of explicitly represented domain boundaries without generating body-fitted meshes or resorting to remeshing techniques. For the space discretization, we use a background mesh and an unfitted method that relies on integration on cut cells only. We perform this intersection by using clipping algorithms. To deal with the mesh movement, we pull back the equations to a reference configuration (the spatial mesh at the initial time slab times the time interval) that is constant in time. This way, the geometrical intersection algorithm is only required in 3D (space), rather than in 4D (space-time), another key property of the proposed scheme. At the end of each time slab, we compute the deformed mesh, intersect the deformed boundary with the background mesh, and consider an exact transfer operator between meshes to compute jump terms in the time-discontinuous Galerkin integration. The transfer is also computed using geometrical intersection algorithms. We demonstrate the applicability of the method to fluid problems around rotating (2D and 3D) geometries described by oriented boundary meshes. We also provide a set of numerical experiments that show the optimal convergence of the method.
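For concreteness, the pullback step can be illustrated on a scalar advection-diffusion model problem (a generic sketch under assumed notation; the paper treats fluid problems). Let $\phi(\cdot,t):\Omega_0\to\Omega(t)$ be the deformation map, $F=\hat\nabla\phi$ its gradient, and $J=\det F$. Then $\partial_t u-\nabla\cdot(\nu\nabla u)=f$ on $\Omega(t)$ pulls back, for $\hat u=u\circ\phi$, to
\[
\partial_t \hat u-\bigl(F^{-T}\hat\nabla\hat u\bigr)\cdot\partial_t\phi-\frac{1}{J}\,\hat\nabla\cdot\bigl(J\,\nu\,F^{-1}F^{-T}\hat\nabla\hat u\bigr)=\hat f \quad\text{on } \Omega_0,
\]
where $\hat\nabla$ is the gradient in reference coordinates, so that all geometric computations (in particular, the cut-cell intersections) live on the time-independent reference configuration.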
In this paper, we propose the R\'enyi information generating function (RIGF) and discuss its various properties. The relation between the RIGF and the Shannon entropy of order $q>0$ is established, and several bounds are obtained. The RIGF of the escort distribution is also derived. Furthermore, we introduce the R\'enyi divergence information generating function (RDIGF) and discuss its behavior under monotone transformations. Next, we propose the Jensen-R\'enyi information generating function (JRIGF) and establish its properties. In addition, we present non-parametric and parametric estimators of the RIGF. For illustrative purposes, a simulation study is carried out, and a real data set relating to the failure times of electronic components is analyzed. Finally, the non-parametric and parametric estimators are compared in terms of absolute bias and mean squared error (MSE).
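As a hedged illustration of the estimation part (the paper's exact RIGF definition and estimators may differ), the following Python sketch compares a non-parametric kernel plug-in estimate of the R\'enyi integral $\int f^q\,dx$ with a parametric plug-in under an exponential model, a natural choice for failure-time data; all parameter values are illustrative.

import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)  # stand-in failure-time data
q = 1.5

# Non-parametric: kernel-density plug-in for int f^q dx via quadrature.
kde = gaussian_kde(x)
grid = np.linspace(0.0, 1.5 * x.max(), 4000)
nonparam = trapezoid(kde(grid) ** q, grid)

# Parametric: exponential model fitted by MLE; for rate lam,
# int f^q dx = lam**(q - 1) / q in closed form.
lam = 1.0 / x.mean()
param = lam ** (q - 1) / q

print(nonparam, param)  # the two plug-ins to compare (bias, MSE)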
In this paper, we propose a weak Galerkin (WG) finite element method for the Maxwell eigenvalue problem. By restricting subspaces, we transform the mixed form of the Maxwell eigenvalue problem into a simple elliptic equation. We then give the WG numerical scheme for the Maxwell eigenvalue problem. Furthermore, we obtain optimal error estimates of arbitrarily high convergence order and prove that the numerical eigenvalues are lower bounds of the exact ones. Numerical experiments confirm the accuracy of the theoretical analysis and the lower-bound property.
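For reference, the standard curl-curl form of the Maxwell eigenvalue problem (with unit material coefficients, a simplifying assumption) reads: find $(\mathbf{E},\lambda)$ with $\mathbf{E}\neq 0$ such that
\[
\nabla\times(\nabla\times\mathbf{E})=\lambda\,\mathbf{E}\ \text{in }\Omega,\qquad \nabla\cdot\mathbf{E}=0\ \text{in }\Omega,\qquad \mathbf{E}\times\mathbf{n}=0\ \text{on }\partial\Omega.
\]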
Reducing wealth inequality and increasing utility are critical issues. This study reveals the effects of redistribution and consumption morals on wealth inequality and utility. To this end, we present a novel approach that couples the dynamic model of capital, consumption, and utility in macroeconomics with the interaction model of joint business and redistribution in econophysics. With this approach, we calculate capital (wealth), utility based on consumption, and the Gini index of their inequality, using redistribution and consumption thresholds as moral parameters. The results show that under-redistribution and waste exacerbate inequality; that, conversely, over-redistribution and stinginess reduce utility; and that balanced, moderate morals achieve both reduced inequality and increased utility. These findings provide renewed economic and numerical support for the moral importance known from philosophy, anthropology, and religion. The revival of redistribution and consumption morals should promote the transformation to a human mutual-aid economy, as advocated by philosophers and anthropologists, instead of the capitalist economy that has produced the current inequality. The practical challenge is to implement bottom-up social business, built on worker co-ops and platform cooperatives as communities countervailing the state and the market, and grounded in moral consensus and its operation.
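The Gini index used as the inequality measure can be computed directly from its mean-absolute-difference definition; a minimal Python sketch, independent of the paper's specific dynamic model:

import numpy as np

def gini(w):
    # Gini index of a nonnegative wealth array via the mean absolute
    # difference: G = sum_ij |w_i - w_j| / (2 * n^2 * mean(w)).
    w = np.asarray(w, dtype=float)
    diff = np.abs(w[:, None] - w[None, :]).sum()
    return diff / (2 * len(w) ** 2 * w.mean())

print(gini(np.ones(100)))                           # 0.0, perfect equality
print(gini(np.concatenate([np.zeros(99), [1.0]])))  # 0.99, one holder owns all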
We marshal the arguments for preferring Bayesian hypothesis testing and confidence sets to frequentist ones. We define admissible solutions to inference problems, noting that Bayesian solutions are admissible. We give seven weaker common-sense criteria for solutions to inference problems, all failed by these frequentist methods but satisfied by any admissible method. We note that pseudo-Bayesian methods, made by handicapping Bayesian methods to satisfy criteria on the type I error rate, are frequentist rather than Bayesian in nature. We give five examples showing the differences between Bayesian and frequentist methods: the first requiring little calculus, the second showing abstractly what is wrong with these frequentist methods, the third illustrating information conservation, the fourth showing that the same problems arise in everyday statistical problems, and the fifth illustrating how, on some real-life inference problems, Bayesian methods require less data than fixed sample-size (resp. pseudo-Bayesian) frequentist hypothesis testing by factors exceeding 3000 (resp. 300) without recourse to informative priors. To address the issue of different parties with opposing interests reaching agreement on a prior, we illustrate the beneficial effects of a Bayesian "let the data decide" policy, both on results under a wide variety of conditions and on the motivation to reach a common prior by consent. We show that in general the frequentist confidence level contains less relevant Shannon information than the Bayesian posterior, and we give an example where no deterministic frequentist critical region gives any relevant information even though the Bayesian posterior contains up to the maximum possible amount. In contrast, use of the Bayesian prior allows the construction of non-deterministic critical regions for which the Bayesian posterior can be recovered from the frequentist confidence.
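A toy binomial example in this spirit (purely illustrative; not one of the paper's five examples) contrasts a one-sided p-value with the posterior probability of the hypothesis under a uniform prior:

from scipy import stats

# Is a coin biased towards heads? Data: 60 heads in 100 flips.
n, k = 100, 60

# Frequentist: one-sided p-value under H0: theta = 0.5.
p_value = stats.binom.sf(k - 1, n, 0.5)       # P(X >= 60 | theta = 0.5)

# Bayesian: uniform Beta(1, 1) prior gives a Beta(1 + k, 1 + n - k)
# posterior; report the posterior probability that theta > 0.5.
posterior = stats.beta(1 + k, 1 + n - k)
prob_biased = posterior.sf(0.5)

print(f"p-value {p_value:.4f}, posterior P(theta > 0.5) = {prob_biased:.4f}")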
This paper studies the linear reconstruction of partially observed functional data recorded on a discrete grid. We propose a novel estimation approach based on approximate factor models with increasing rank, taking potential covariate information into account. Whereas alternative reconstruction procedures commonly involve some preliminary smoothing, our method separates the signal from the noise and reconstructs missing fragments in one step. We establish uniform convergence rates for our estimator and introduce a new method for constructing simultaneous prediction bands for the missing trajectories. A simulation study examines the performance of the proposed methods in finite samples. Finally, a real data application to temperature curves demonstrates that our theory provides a simple and effective method for recovering missing fragments.
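A minimal sketch of the factor-model reconstruction idea on synthetic curves (the rank, grid, and estimation details below are simplifying assumptions, not the paper's estimator):

import numpy as np

rng = np.random.default_rng(1)
n, p, r = 200, 50, 3
grid = np.linspace(0, 1, p)
loadings = np.column_stack([np.sin((j + 1) * np.pi * grid) for j in range(r)])
X = rng.normal(size=(n, r)) @ loadings.T + 0.1 * rng.normal(size=(n, p))

half = p // 2
x_obs = X[0, :half]            # curve 0 is observed on the first half only

# Estimate the loadings by PCA on the fully observed curves ...
mu = X[1:].mean(axis=0)
_, _, Vt = np.linalg.svd(X[1:] - mu, full_matrices=False)
L_hat = Vt[:r].T               # (p, r) estimated loadings

# ... then regress the observed fragment on the restricted loadings to get
# factor scores and predict the missing fragment in one step (no smoothing).
s_hat, *_ = np.linalg.lstsq(L_hat[:half], x_obs - mu[:half], rcond=None)
x_missing_hat = mu[half:] + L_hat[half:] @ s_hat
print(np.round(x_missing_hat[:5], 2))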
Due to its ubiquitous and contact-free nature, the use of WiFi infrastructure for performing sensing tasks has tremendous potential. However, the channel state information (CSI) measured by a WiFi receiver suffers from errors in both its gain and its phase, which can significantly hinder sensing tasks. By analyzing these errors from different WiFi receivers, a mathematical model of these gain and phase errors is developed in this work. Based on these models, several theoretically justified preprocessing algorithms for correcting such errors at a receiver, and thus obtaining clean CSI, are presented. Simulation results show that, at typical system parameters, the developed algorithms for cleaning CSI can reduce the noise by $40$% and $200$%, respectively, compared to baseline methods for gain correction and phase correction, without significantly impacting the computational cost. The superiority of the proposed methods is also validated in a real-world testbed for respiration rate monitoring (an example sensing task), where they improve the estimation signal-to-noise ratio by $20$% compared to baseline methods.
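For flavor, a widely used baseline-style correction is linear phase sanitization, which removes the slope and offset that timing and frequency offsets induce across subcarriers; a Python sketch (illustrative of the problem setting, not necessarily the paper's algorithm):

import numpy as np

def sanitize_phase(csi):
    # Remove the linear slope and constant offset from the unwrapped CSI
    # phase across subcarriers (a common correction for timing and
    # carrier-frequency offsets; illustrative, not the paper's method).
    phase = np.unwrap(np.angle(csi))
    k = np.arange(len(phase))
    slope = (phase[-1] - phase[0]) / (k[-1] - k[0])
    clean_phase = phase - slope * k - phase.mean()
    return np.abs(csi) * np.exp(1j * clean_phase)

# Toy CSI over 64 subcarriers with an injected linear phase error.
true = np.exp(1j * 0.3 * np.sin(np.linspace(0, 4, 64)))
corrupted = true * np.exp(1j * (0.05 * np.arange(64) + 1.2))
print(np.round(np.angle(sanitize_phase(corrupted))[:4], 3))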
Spinodal metamaterials, with architectures inspired by natural phase-separation processes, have emerged as a significant alternative to periodic and symmetric morphologies when designing mechanical metamaterials with extreme performance. While their elastic mechanical properties have been systematically determined, their large-deformation, nonlinear responses have been challenging to predict and design, in part due to limited data sets and the need for complex nonlinear simulations. This work presents a novel physics-enhanced machine learning (ML) and optimization framework tailored to the challenges of designing intricate spinodal metamaterials with customized mechanical properties in large-deformation scenarios where computational modeling is restrictive and experimental data are sparse. By utilizing large-deformation experimental data directly, this approach facilitates the inverse design of spinodal structures with precise finite-strain mechanical responses. The framework sheds light on instability-induced pattern formation in spinodal metamaterials -- observed experimentally and in selected nonlinear simulations -- by leveraging physics-based inductive biases in the form of nonconvex energetic potentials. Altogether, this combined ML, experimental, and computational effort provides a route to the efficient and accurate design of complex spinodal metamaterials for large-deformation scenarios where energy absorption and the prediction of nonlinear failure mechanisms are essential.
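A minimal PyTorch sketch of the physics-based inductive bias: a learned scalar energy potential whose derivative gives the stress, assuming a 1D strain measure for simplicity (the paper's nonconvex potentials and spinodal parameterization are richer than this illustration):

import torch
import torch.nn as nn

class LearnedPotential(nn.Module):
    # Scalar energy density W(strain); the stress follows by automatic
    # differentiation, so the learned response is conservative by design.
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, strain):
        return self.net(strain)

potential = LearnedPotential()
strain = torch.linspace(0.0, 0.4, 80).reshape(-1, 1).requires_grad_(True)
energy = potential(strain)
stress = torch.autograd.grad(energy.sum(), strain, create_graph=True)[0]
# Training would fit (strain, stress) pairs from large-deformation
# experiments; nonconvexity of W is what permits instability-induced patterns.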
We present a theoretical framework for the extraction and transformation of text documents. We propose a two-phase process in which the first phase extracts span-tuples from a document and the second phase maps the content of the span-tuples into new documents. We base the extraction phase on the framework of document spanners and the transformation phase on the theory of polyregular functions, the class of regular string-to-string functions with polynomial growth. To support practical extract-transform scenarios, we propose an extension of document spanners described by regex formulas from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans. We prove that this extension, called regex multispanners, has the same desirable properties as standard spanners described by regex formulas. In our framework, an Extract-Transform (ET) program is given by a regex multispanner followed by a polyregular function. In this paper, we study the expressibility and the evaluation problem of ET programs whose transformation function is linear, called linear ET programs. We show that linear ET programs are as expressive as non-deterministic streaming string transducers under bag semantics, and that linear ET programs are closed under composition. Finally, we present an enumeration algorithm for evaluating any linear ET program over a document with linear-time preprocessing and constant delay.
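A toy Python sketch of the extract-transform pipeline, using regex named groups as a stand-in for a regex spanner (note the limitation: classical named groups map each variable to a single span, whereas the paper's multispanners map variables to sets of spans):

import re

# Extraction: a regex with named groups plays the role of a regex spanner,
# mapping variables x and y to spans of the input document.
doc = "alice:reads bob:writes carol:reads"
spanner = re.compile(r"(?P<x>\w+):(?P<y>\w+)")

# Transformation: a simple (linear) string-to-string function over the
# extracted span-tuples, emitting one new document per match.
outputs = [f"{m.group('y')} <- {m.group('x')}" for m in spanner.finditer(doc)]
print(outputs)   # ['reads <- alice', 'writes <- bob', 'reads <- carol']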