This paper evaluates six strategies for mitigating imbalanced data: oversampling, undersampling, ensemble methods, specialized algorithms, class weight adjustments, and a no-mitigation approach referred to as the baseline. These strategies were tested on 58 real-life binary imbalanced datasets with imbalance ratios ranging from 3 to 120. We conducted a comparative analysis of 10 undersampling algorithms, 5 oversampling algorithms, 2 ensemble methods, and 3 specialized algorithms across eight performance metrics: accuracy, area under the ROC curve (AUC), balanced accuracy, F1-measure, G-mean, the Matthews correlation coefficient, precision, and recall. Additionally, we assessed the six strategies on altered datasets, derived from real-life data, with both low (3) and high (100 or 300) imbalance ratios (IR). The principal finding is that the effectiveness of each strategy varies significantly depending on the metric used. The paper also examines a selection of newer algorithms within the categories of specialized algorithms, oversampling, and ensemble methods. The findings suggest that the current hierarchy of best-performing strategies for each metric is unlikely to change with the introduction of newer algorithms.
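Two of the simplest strategies above, class weighting and random oversampling, can be sketched in a few lines; the function names and toy data below are illustrative, not taken from the paper's experiments.

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for c, n_c in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == c]
        for _ in range(target - n_c):
            out_x.append(rng.choice(pool))
            out_y.append(c)
    return out_x, out_y

X = [[0.1], [0.2], [0.3], [0.4], [0.9]]   # toy features
y = [0, 0, 0, 0, 1]                        # imbalance ratio 4:1
w = class_weights(y)
Xb, yb = random_oversample(X, y)
print(w[1] / w[0])      # 4.0 -- minority class weighted 4x the majority
print(Counter(yb))      # balanced after oversampling: 4 of each class
```

Inverse-frequency weighting and duplication-based oversampling are the baseline variants of these strategy families; the more sophisticated samplers compared in the paper build on the same balancing idea.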
The increasing availability of temporal data poses a challenge to the time-series and signal-processing domains due to its high numerosity and complexity. Symbolic representation outperforms raw data in a variety of engineering applications thanks to its storage efficiency, reduced numerosity, and noise reduction. The most recent symbolic aggregate approximation technique, called ABBA, performs strongly at preserving essential shape information of time series and enhancing downstream applications. However, ABBA cannot handle multiple time series with consistent symbols: identical symbols drawn from distinct time series do not carry the same meaning. Moreover, effective ABBA digitization involves the tedious task of tuning hyperparameters, such as the number of symbols or the tolerance. We therefore present a joint symbolic aggregate approximation that guarantees symbolic consistency, and show how the digitization hyperparameter can itself be optimized alongside the compression tolerance ahead of time. In addition, we propose a novel computing paradigm that enables parallel computation of the symbolic approximation. Extensive experiments demonstrate its strong performance and speed in symbolic approximation and reconstruction.
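ABBA itself digitizes piecewise-linear segments rather than segment means, but the classic SAX-style pipeline below, a minimal sketch with illustrative names, shows the symbolic-aggregation idea the abstract builds on: aggregate the series, then map each aggregate to a symbol via breakpoints.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-width segment."""
    n = len(series)
    return [sum(series[i * n // n_segments:(i + 1) * n // n_segments]) /
            (((i + 1) * n // n_segments) - (i * n // n_segments))
            for i in range(n_segments)]

def digitize(values, breakpoints, alphabet="abcd"):
    """Map each value to a symbol by the breakpoint interval it falls in."""
    out = []
    for v in values:
        idx = sum(1 for b in breakpoints if v > b)
        out.append(alphabet[idx])
    return "".join(out)

series = [0.0, 0.1, 0.9, 1.0, 0.5, 0.4, -0.8, -0.9]
means = paa(series, 4)                   # [0.05, 0.95, 0.45, -0.85]
word = digitize(means, [-0.5, 0.0, 0.5])
print(word)                              # 'cdca'
```

Using one shared set of breakpoints for every series is the simplest way to make symbols comparable across series; the joint approximation proposed in the paper addresses this consistency problem for the ABBA family.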
Semantic segmentation techniques for extracting building footprints from high-resolution remote sensing images are widely used in fields such as urban planning. However, large-scale building extraction demands greater diversity in training samples. In this paper, we construct a Global Building Semantic Segmentation (GBSS) dataset (the dataset will be released), which comprises 116.9k pairs of samples (about 742k buildings) from six continents. Because the building samples vary significantly in size and style, the dataset can serve as a more challenging benchmark for evaluating the generalization and robustness of building semantic segmentation models. We validated the dataset through quantitative and qualitative comparisons with existing datasets, and further confirmed its potential for transfer learning by conducting experiments on subsets.
In this paper, we propose the application of shrinkage strategies to estimate coefficients in the Bell regression models when prior information about the coefficients is available. The Bell regression models are well-suited for modeling count data with multiple covariates. Furthermore, we provide a detailed explanation of the asymptotic properties of the proposed estimators, including asymptotic biases and mean squared errors. To assess the performance of the estimators, we conduct numerical studies using Monte Carlo simulations and evaluate their simulated relative efficiency. The results demonstrate that the suggested estimators outperform the unrestricted estimator when prior information is taken into account. Additionally, we present an empirical application to demonstrate the practical utility of the suggested estimators.
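As a generic illustration of the idea (not the paper's exact estimator), a Stein-type shrinkage estimate pulls the unrestricted estimate toward a restricted one that encodes the prior information, with a data-driven weight; all names and numbers below are illustrative.

```python
def shrinkage_estimate(beta_u, beta_r, test_stat, p):
    """Stein-type shrinkage: combine the unrestricted estimate beta_u with
    the restricted estimate beta_r (which encodes the prior information);
    the pull toward beta_r weakens as the test statistic against the
    restriction grows."""
    c = (p - 2) / test_stat
    return [br + (1.0 - c) * (bu - br) for bu, br in zip(beta_u, beta_r)]

beta_u = [1.0, 2.0, 3.0]      # unrestricted fit (illustrative numbers)
beta_r = [0.0, 0.0, 0.0]      # restriction: all coefficients zero
beta_s = shrinkage_estimate(beta_u, beta_r, test_stat=10.0, p=3)
print(beta_s)                 # ~ [0.9, 1.8, 2.7]: mild pull toward zero
```

When the test statistic is large (prior information contradicted by data), the estimate stays close to the unrestricted fit; when it is small, the estimate borrows strength from the restriction, which is the source of the efficiency gains the simulations measure.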
The amount of information in SAT is estimated and compared with the amount of information in fixed-code algorithms. A remark on the Kolmogorov complexity of SAT is made. It is argued that SAT can be polynomial-time solvable, or not, depending on the information content of the solving algorithm.
In the authors' previous paper (Zhang et al. 2022), exponential convergence was proved for the perfectly matched layer (PML) approximation of scattering problems with periodic surfaces in 2D. However, due to the overlapping of singularities, an exceptional case, namely when the wave number is a half integer, had to be excluded from the proof. Numerical results for these cases nevertheless still exhibit a fast convergence rate, which motivates us to examine them more deeply. In this paper, we focus on these cases and prove the fast convergence result for the discretized formulation. Numerical examples are also presented to support our theoretical results.
This paper considers the convergence of complex block Jacobi diagonalization methods under a large class of generalized serial pivot strategies. Global convergence of the block methods is proven for Hermitian, normal, and $J$-Hermitian matrices. To obtain convergence results for block methods that solve other eigenvalue problems, such as the generalized eigenvalue problem, we study the convergence of a general block iterative process that uses the complex block Jacobi annihilators and operators.
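The block Jacobi methods studied here generalize the classical scalar Jacobi iteration, whose mechanics are easy to sketch: each rotation annihilates one off-diagonal pivot and monotonically shrinks the off-diagonal norm. The minimal pure-Python sweep below (illustrative, real symmetric case only) shows that structure.

```python
import math

def jacobi_sweep(A):
    """One cyclic sweep of classical Jacobi rotations on a symmetric
    matrix A (list of lists), annihilating each off-diagonal pivot in turn."""
    n = len(A)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if A[p][q] == 0.0:
                continue
            theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
            c, s = math.cos(theta), math.sin(theta)
            for k in range(n):          # apply rotation to columns p, q
                Akp, Akq = A[k][p], A[k][q]
                A[k][p] = c * Akp - s * Akq
                A[k][q] = s * Akp + c * Akq
            for k in range(n):          # apply rotation to rows p, q
                Apk, Aqk = A[p][k], A[q][k]
                A[p][k] = c * Apk - s * Aqk
                A[q][k] = s * Apk + c * Aqk
    return A

def off(A):
    """Off-diagonal Frobenius norm, the quantity Jacobi methods drive to 0."""
    return math.sqrt(sum(A[i][j] ** 2
                         for i in range(len(A)) for j in range(len(A))
                         if i != j))

A = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 1.0]]
before = off(A)
jacobi_sweep(A)
print(off(A) < before)   # True: the sweep reduces the off-diagonal norm
```

In the block variants analyzed in the paper, the scalar pivot is replaced by a pivot submatrix annihilated by a block rotation, and the convergence question becomes whether this off-norm decrease persists under generalized serial pivot orderings.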
This paper presents a critical analysis of generative Artificial Intelligence (AI) detection tools in higher education assessments. The rapid advancement and widespread adoption of generative AI, particularly in education, necessitate a reevaluation of traditional academic integrity mechanisms. We explore the effectiveness, vulnerabilities, and ethical implications of AI detection tools in the context of preserving academic integrity. Our study synthesises insights from various case studies, newspaper articles, and student testimonies to scrutinise the practical and philosophical challenges associated with AI detection. We argue that reliance on detection mechanisms is misaligned with an educational landscape in which AI plays an increasingly widespread role. This paper advocates for a strategic shift towards robust assessment methods and educational policies that embrace generative AI usage while ensuring academic integrity and authenticity in assessments.
This paper presents a regularized recursive identification algorithm with simultaneous on-line estimation of both the model parameters and the algorithm's hyperparameters. A new kernel is proposed to facilitate the development of the algorithm. The performance of this novel scheme is compared with that of the recursive least-squares algorithm in simulation.
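The paper's algorithm additionally estimates its hyperparameters online; the sketch below shows only the plain recursive least-squares core it is compared against, with the initial covariance P playing the role of a simple regularizer. All names and data are illustrative.

```python
def rls_step(theta, P, phi, y, forgetting=1.0):
    """One recursive least-squares update.
    theta: parameter estimates, P: covariance matrix (list of lists),
    phi: regressor vector, y: new measurement."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]
    err = y - sum(phi[i] * theta[i] for i in range(n))  # prediction error
    theta = [theta[i] + gain[i] * err for i in range(n)]
    P = [[(P[i][j] - gain[i] * Pphi[j]) / forgetting for j in range(n)]
         for i in range(n)]
    return theta, P

# identify y = 2*u1 - u2 from noise-free data
theta = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]   # large P ~ weak prior on the parameters
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
for phi, y in data:
    theta, P = rls_step(theta, P, phi, y)
print(theta)   # ~ [2.0, -1.0]
```

Interpreting the initial P as a prior covariance is what connects RLS to the regularized, kernel-based view taken in the paper: the kernel choice shapes that prior, and its hyperparameters can then be estimated alongside theta.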
Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration to compute a residual error relative to a potential solution vector. To run asynchronous iterations efficiently, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While several termination protocols have been proposed for asynchronous iterations, only very few of them are based on global residual computation and guarantee effective convergence, and the most effective and efficient existing solutions require two reduction operations, which constitutes an important source of termination delay. In this paper, we present new, non-intrusive protocols to compute a residual error under asynchronous iterations that require only one reduction operation. Under various communication models, we show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.
The paper introduces a new meshfree pseudospectral method based on Gaussian radial basis function (RBF) collocation to solve fractional Poisson equations. Hypergeometric functions are used to represent the fractional Laplacian of Gaussian RBFs, enabling an efficient computation of the stiffness matrix entries. Unlike existing RBF-based methods, our approach ensures a Toeplitz structure in the stiffness matrix when the RBF centers are equally spaced, enabling efficient matrix-vector multiplications using fast Fourier transforms. We conduct a comprehensive study of shape parameter selection, addressing challenges related to ill-conditioning and numerical stability. The main contributions of our work include a rigorous stability analysis and error estimates for the Gaussian RBF collocation method, representing, to the best of our knowledge, a first attempt at rigorous analysis of RBF-based methods for fractional PDEs. We conduct numerical experiments to validate our analysis and provide practical insights for implementation.
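The fast matrix-vector product mentioned above rests on a standard trick: any Toeplitz matrix embeds into a larger circulant matrix, which FFTs diagonalize, so the product costs O(m log m) instead of O(n^2). A minimal pure-Python sketch of that trick (illustrative code, not the paper's implementation):

```python
import cmath

def fft(a, inverse=False):
    """Radix-2 Cooley-Tukey FFT (len(a) must be a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def toeplitz_matvec(col, row, x):
    """Multiply the Toeplitz matrix with first column `col` and first row
    `row` by x via circulant embedding and FFT."""
    n = len(x)
    m = 1
    while m < 2 * n:
        m *= 2
    # first column of the embedding circulant matrix
    c = list(col) + [0.0] * (m - 2 * n + 1) + list(reversed(row[1:]))
    xp = list(x) + [0.0] * (m - n)
    fc, fx = fft(c), fft(xp)
    y = fft([a * b for a, b in zip(fc, fx)], inverse=True)
    return [v.real / m for v in y[:n]]  # un-normalized inverse FFT

# 3x3 symmetric Toeplitz matrix [[2,1,0],[1,2,1],[0,1,2]]
col = [2.0, 1.0, 0.0]
res = toeplitz_matvec(col, col, [1.0, 1.0, 1.0])
print(res)   # ~ [3.0, 4.0, 3.0]
```

With equally spaced RBF centers the stiffness matrix has exactly this structure, which is why every iteration of a Krylov solver applied to it reduces to a handful of FFTs.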