Judging whether an integer is divisible by a prime such as 2 or 3 may appear trivial to human beings, but can be less straightforward for computers. Here, we tested multiple deep learning architectures and feature engineering approaches on classifying integers based on their residues when divided by small primes. We found that classification performance critically depends on the feature space. We also evaluated Automated Machine Learning (AutoML) platforms from Amazon, Google and Microsoft, and found that they failed on this task without appropriately engineered features. Furthermore, we introduced a method that utilizes linear regression on Fourier series basis vectors, and demonstrated its effectiveness. Finally, we evaluated Large Language Models (LLMs) such as GPT-4, GPT-J, LLaMA and Falcon, and demonstrated that they also fail at this task. In conclusion, feature engineering remains an important task for improving performance and increasing the interpretability of machine-learning models, even in the era of AutoML and LLMs.
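To illustrate the kind of feature engineering the abstract refers to, the sketch below maps integers onto sine/cosine features with period p and fits an off-the-shelf classifier. The number of harmonics, the use of scikit-learn's LogisticRegression, and the data sizes are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fourier_features(n, p, harmonics=3):
    """Encode integers as sin/cos features with period p (illustrative choice)."""
    n = np.asarray(n, dtype=float).reshape(-1, 1)
    k = np.arange(1, harmonics + 1)
    angles = 2 * np.pi * n * k / p
    return np.hstack([np.sin(angles), np.cos(angles)])

p = 3                                        # classify residues modulo 3
rng = np.random.default_rng(0)
ints = rng.integers(0, 10**6, size=5000)
y = ints % p

X = fourier_features(ints, p)
clf = LogisticRegression(max_iter=1000).fit(X[:4000], y[:4000])
print("held-out accuracy:", clf.score(X[4000:], y[4000:]))
```

Because the sine/cosine features are exactly periodic in the residue, a linear decision rule separates the classes, which is the intuition behind the Fourier-basis approach.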
This work introduces novel deep learning methodologies for intrusion detection, with the objective of building a reliable mechanism for identifying malicious attacks. A deep learning based solution framework is developed, consisting of three approaches. The first approach is a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained with seven optimizers: Adamax, SGD, Adagrad, Adam, RMSprop, Nadam, and Adadelta. The model is evaluated on the NSL-KDD dataset for multi-class attack classification and performs best with the Adamax optimizer in terms of accuracy, detection rate, and false alarm rate. The results of the LSTM-RNN with the Adamax optimizer are compared with existing shallow machine learning and deep learning models on the same metrics. The framework further includes a multi-model methodology consisting of a Recurrent Neural Network (RNN), an LSTM-RNN, and a Deep Neural Network (DNN). These models are evaluated on the benchmark datasets KDD99, NSL-KDD, and UNSW-NB15. They learn the features directly from the data and perform multi-class attack classification; the RNN and LSTM-RNN models deliver considerable performance compared to other existing methods on the KDD99 and NSL-KDD datasets.
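A minimal sketch of an LSTM classifier compiled with the Adamax optimizer, in the spirit of the first approach. Treating the 41 NSL-KDD features as a length-41 sequence, the 5-class attack grouping, and the layer sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf

# Illustrative sizes: 41 NSL-KDD features viewed as a length-41 sequence,
# and a common 5-class attack grouping (normal, DoS, Probe, R2L, U2R).
n_features, n_classes = 41, 5

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features, 1)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adamax(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
```

Swapping the optimizer argument (e.g., "sgd", "adagrad", "rmsprop") reproduces the kind of optimizer comparison described in the abstract.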
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility in eliminating the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
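To make the bagging setup concrete, here is a minimal sketch of a subsample-aggregated ridge predictor: each bag is a ridge fit on a simple random sample drawn without replacement, and predictions are averaged across bags. The use of scikit-learn, the default penalty, and the specific subsample fraction are assumptions for illustration, not the paper's exact construction; in the paper's spirit, the subsample size would itself be chosen by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def bagged_ridge_predict(X, y, X_new, n_bags=50, subsample=0.5, alpha=1.0, seed=0):
    """Average ridge predictions over bags fit on subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = max(1, int(subsample * n))
    preds = np.zeros((n_bags, X_new.shape[0]))
    for b in range(n_bags):
        idx = rng.choice(n, size=k, replace=False)   # simple random sample of rows
        preds[b] = Ridge(alpha=alpha).fit(X[idx], y[idx]).predict(X_new)
    return preds.mean(axis=0)

# Toy usage on a well-specified linear model in the proportional regime (p comparable to n).
rng = np.random.default_rng(1)
n, p = 500, 200
beta = rng.normal(size=p) / np.sqrt(p)
X, X_new = rng.normal(size=(n, p)), rng.normal(size=(50, p))
y = X @ beta + rng.normal(size=n)
yhat = bagged_ridge_predict(X, y, X_new, subsample=0.6)
```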
Supervised deep learning was recently introduced in high-contrast imaging (HCI) through the SODINN algorithm, a convolutional neural network designed for exoplanet detection in angular differential imaging (ADI) datasets. The benchmarking of HCI algorithms within the Exoplanet Imaging Data Challenge (EIDC) showed that (i) SODINN can produce a high number of false positives in the final detection maps, and (ii) algorithms processing images in a more local manner perform better. This work aims to improve the SODINN detection performance by introducing new local processing approaches and adapting its learning process accordingly. We propose NA-SODINN, a new deep learning binary classifier based on a convolutional neural network (CNN) that better captures image noise correlations in ADI-processed frames by identifying noise regimes. Our new approach was tested against its predecessor, as well as two SODINN-based hybrid models and a more standard annular-PCA approach, through local receiver operating characteristic (ROC) analysis of ADI sequences from the VLT/SPHERE and Keck/NIRC-2 instruments. Results show that NA-SODINN enhances SODINN in both sensitivity and specificity, especially in the speckle-dominated noise regime. NA-SODINN is also benchmarked against the complete set of detection algorithms submitted to the EIDC, where its final detection score matches or outperforms the most powerful detection algorithms. Through this supervised machine learning case, the study illustrates and reinforces the importance of adapting the detection task to the local content of processed images.
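For orientation, a patch-based binary CNN classifier of the kind described (companion signal versus residual speckle noise) might look like the sketch below; the patch size, layer widths, and optimizer are placeholder assumptions and do not reproduce the published SODINN/NA-SODINN architecture or its noise-regime handling.

```python
import tensorflow as tf

# Illustrative binary "companion vs. residual speckle" classifier on small ADI patches.
patch = 31                                   # assumed patch size, not from the paper
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(patch, patch, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
# model.fit(patches_train, labels_train, validation_split=0.1, epochs=20)
```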
Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing or not. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems or those with particularly big algorithmic gains will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases, or practitioners processing very large data.
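The race between the two machines can be made concrete with a toy crossover calculation: a classical computer executing many more operations per second against a quantum computer running an algorithm with a better scaling exponent. All constants and the square-root-style speedup below are illustrative assumptions, not figures from the paper.

```python
# Toy crossover calculation between a fast classical machine and a slower quantum
# machine running a better-scaling algorithm; all constants are illustrative assumptions.
def runtime_classical(n, ops_per_sec=1e9):
    return n**2 / ops_per_sec          # classical algorithm needs ~n^2 operations

def runtime_quantum(n, ops_per_sec=1e4):
    return n / ops_per_sec             # quantum algorithm needs only ~n operations

n = 1
while runtime_quantum(n) >= runtime_classical(n):
    n *= 2                             # grow the problem until the quantum machine wins
print("quantum advantage kicks in around n ≈", n)
```

With these toy numbers the quantum machine only wins once the problem size exceeds roughly 10^5, which captures the paper's point that small and moderate problems stay classical.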
Benchmark suites are crucial for assessing the performance of evolutionary algorithms, but the constituent problems are often too complex to provide clear intuition about an algorithm's strengths and weaknesses. To address this gap, we introduce DOSSIER ("Diagnostic Overview of Selection Schemes In Evolutionary Runs"), a diagnostic suite initially composed of eight handcrafted metrics. These metrics are designed to empirically measure specific capacities for exploitation, exploration, and their interactions. We consider exploitation both with and without constraints, and we divide exploration into two aspects: diversity exploration (the ability to simultaneously explore multiple pathways) and valley-crossing exploration (the ability to cross wider and wider fitness valleys). We apply DOSSIER to six popular selection schemes: truncation, tournament, fitness sharing, lexicase, nondominated sorting, and novelty search. Our results confirm that simple schemes (e.g., tournament and truncation) emphasize exploitation. For more sophisticated schemes, however, our diagnostics revealed interesting dynamics. Lexicase selection performed moderately well across all diagnostics that did not incorporate valley crossing, but faltered dramatically whenever valleys were present, performing worse than even random search. Fitness sharing was the only scheme to effectively contend with valley crossing, but it struggled with the other diagnostics. Our study highlights the utility of using diagnostics to gain nuanced insights into selection scheme characteristics, which can inform the design of new selection methods.
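As a concrete reference point for the simplest scheme in the suite, a standard tournament selection step can be sketched as follows; the tournament size and the maximization convention are generic choices, not parameters taken from the study.

```python
import random

def tournament_select(population, fitness, k=7, rng=random):
    """Return the fittest of k individuals drawn uniformly at random (exploit-leaning)."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]

# Toy usage: select parents from a population of bitstrings scored by their sum.
pop = [[random.randint(0, 1) for _ in range(32)] for _ in range(100)]
fit = [sum(ind) for ind in pop]
parents = [tournament_select(pop, fit) for _ in range(10)]
```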
Obtaining the solutions of partial differential equations with machine learning methods has drawn increasing attention in scientific computation and engineering applications. In this work, we propose a coupled Extreme Learning Machine (CELM) method that incorporates the physical laws to solve a class of fourth-order biharmonic equations by reformulating them into two well-posed Poisson problems. We introduce several activation functions, such as tangent, Gaussian, sine, and trigonometric functions, to evaluate the performance of the CELM method; the sine and trigonometric functions demonstrate a remarkable ability to minimize the approximation error of the CELM model. Finally, several numerical experiments are performed to study how to initialize the weights and biases of the hidden units in the CELM model and to explore the required number of hidden units. Numerical results show that the proposed CELM algorithm achieves high precision and efficiency for biharmonic equations on both regular and irregular domains.
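Stripped of the PDE coupling, the extreme-learning-machine building block is a single hidden layer with fixed random weights and a one-shot least-squares solve for the output weights. The sketch below fits a smooth 1D function with a sine activation; the random-weight range, hidden-layer size, and target function are arbitrary illustrative choices, and the coupling of two such networks for the two Poisson sub-problems is not shown.

```python
import numpy as np

# Core ELM step: random hidden layer with sine activation, output weights from one
# least-squares solve. The weight/bias ranges are an arbitrary initialization choice.
rng = np.random.default_rng(0)
n_hidden = 200
x = np.linspace(0.0, 1.0, 400).reshape(-1, 1)
target = np.sin(2 * np.pi * x).ravel()            # stand-in for a PDE solution

W = rng.uniform(-10.0, 10.0, size=(1, n_hidden))  # fixed random input weights
b = rng.uniform(-10.0, 10.0, size=n_hidden)       # fixed random biases
H = np.sin(x @ W + b)                             # hidden-layer feature matrix
beta, *_ = np.linalg.lstsq(H, target, rcond=None) # trainable output weights
print("max fit error:", np.abs(H @ beta - target).max())
```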
Symmetry is a cornerstone of much of mathematics, and many probability distributions possess symmetries characterized by their invariance to a collection of group actions. Thus, many mathematical and statistical methods rely on such symmetry holding and ostensibly fail if symmetry is broken. This work considers under what conditions a sequence of probability measures asymptotically gains such symmetry or invariance to a collection of group actions. Considering the many symmetries of the Gaussian distribution, this work effectively proposes a non-parametric type of central limit theorem. That is, a Lipschitz function of a high dimensional random vector will be asymptotically invariant to the actions of certain compact topological groups. Applications of this include a partial law of the iterated logarithm for uniformly random points in an $\ell_p^n$-ball and an asymptotic equivalence between classical parametric statistical tests and their randomization counterparts even when invariance assumptions are violated.
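Schematically, the kind of asymptotic-invariance statement being described can be written as follows; the notation (a sequence of compact groups $G_n$ acting on $\mathbb{R}^n$, measures $\mu_n$, and a uniform Lipschitz bound on $f$) is chosen here for illustration and is not taken verbatim from the paper:
\[
  \sup_{g \in G_n} \Bigl| \, \mathbb{E}_{X \sim \mu_n} f(g \cdot X) - \mathbb{E}_{X \sim \mu_n} f(X) \, \Bigr| \;\longrightarrow\; 0
  \quad (n \to \infty), \qquad \text{for all } f \text{ with } \operatorname{Lip}(f) \le 1 .
\]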
Models of complex technological systems inherently contain interactions and dependencies among their input variables that affect their joint influence on the output. Such models are often computationally expensive, and few sensitivity analysis methods can effectively process such complexities. Moreover, the sensitivity analysis field as a whole pays limited attention to the nature of interaction effects, whose understanding can prove to be critical for the design of safe and reliable systems. In this paper, we introduce and extensively test a simple binning approach for computing sensitivity indices and demonstrate how complementing it with the smart visualization method, simulation decomposition (SimDec), can provide important insights into the behavior of complex engineering models. The simple binning approach computes first- and second-order effects as well as a combined sensitivity index, and is considerably more computationally efficient than Sobol' indices. Together, the resulting sensitivity analysis framework provides an efficient and intuitive way to analyze the behavior of complex systems containing interactions and dependencies.
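To illustrate the flavor of a binning estimator (as a generic sketch, not the paper's exact formulation), a first-order index can be estimated by partitioning an input into quantile bins and comparing the variance of the per-bin output means with the total output variance:

```python
import numpy as np

def first_order_binned(x, y, n_bins=20):
    """Generic binning estimate of a first-order index: Var_x(E[y | bin]) / Var(y)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    y_mean, y_var = y.mean(), y.var()
    s = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            s += mask.mean() * (y[mask].mean() - y_mean) ** 2
    return s / y_var

# Toy check on an additive model where the analytical first-order share of x1 is 0.8.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=100_000), rng.uniform(size=100_000)
y = 2.0 * x1 + 1.0 * x2
print(first_order_binned(x1, y))   # ≈ 0.8
```

The estimate requires only a single set of model evaluations, which is the source of the efficiency advantage over sampling-based Sobol' estimators.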
Differential geometric approaches are ubiquitous in several fields of mathematics, physics and engineering, and their discretizations enable the development of network-based mathematical and computational frameworks, which are essential for large-scale data science. The Forman-Ricci curvature (FRC) - a statistical measure based on Riemannian geometry and designed for networks - is known for its high capacity for extracting geometric information from complex networks. However, extracting information from dense networks remains challenging due to the combinatorial explosion of high-order network structures. Motivated by this challenge, we develop a set-theoretic representation theory for high-order network cells and FRC, together with their associated concepts and properties, which provides an alternative and efficient formulation for computing high-order FRC in complex networks. We provide pseudo-code, a software implementation coined FastForman, and a benchmark comparison with alternative implementations. Crucially, our representation theory reveals previous computational bottlenecks and accelerates the computation of FRC. As a consequence, our findings open new research possibilities in complex systems where higher-order geometric computations are required.
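For context, the simplest (edge-only) form of Forman-Ricci curvature on an unweighted graph with no higher-order cells reduces to a purely degree-based expression; the sketch below uses that commonly cited reduction and networkx, and deliberately omits the higher-order (clique) contributions that FastForman is designed to handle efficiently.

```python
import networkx as nx

def forman_curvature_edges(G):
    """Edge Forman-Ricci curvature for an unweighted graph with no 2-cells:
    F(u, v) = 4 - deg(u) - deg(v)."""
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

# Toy usage on a small benchmark graph.
G = nx.karate_club_graph()
frc = forman_curvature_edges(G)
print("most negative edge curvature:", min(frc.values()))
```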
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
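To make the initialization point tangible, the toy experiment below pushes a random vector through a deep stack of linear+ReLU layers and compares a naive unit-variance initialization with a fan-in-scaled (He-style) initialization; the depth, width, and ReLU choice are arbitrary illustrative settings.

```python
import numpy as np

# Forward-signal analogue of gradient explosion/vanishing: with naive initialization the
# activation scale blows up with depth, while fan-in scaling keeps it roughly stable.
rng = np.random.default_rng(0)
depth, width = 50, 256
x = rng.normal(size=width)

for name, scale in [("naive std=1", 1.0), ("He (fan-in) init", np.sqrt(2.0 / width))]:
    h = x.copy()
    for _ in range(depth):
        W = rng.normal(scale=scale, size=(width, width))
        h = np.maximum(W @ h, 0.0)     # linear layer followed by ReLU
    print(f"{name:18s} final activation norm: {np.linalg.norm(h):.3e}")
```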