This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measurement Units (PMUs), the model aims to learn system behaviors and effectively identify potential security threats. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, relay Snort alarms, and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbours) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.
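A minimal sketch of the kind of pipeline the abstract describes, using scikit-learn; the synthetic features, dataset shape, and hyperparameters are illustrative assumptions standing in for the paper's PMU-derived data, not the authors' actual pipeline:

```python
# Sketch: train/evaluate a Random Forest disturbance classifier.
# Synthetic data stands in for merged PMU measurements and relay/Snort logs;
# labels distinguish normal operation (0) from a disturbance event (1).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.3f}")
```

The same fit/score structure applies to the Logistic Regression and K-Nearest Neighbours baselines, which differ only in the estimator class.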
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions: a test of sample exchangeability given the dependency structure of the features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or to find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches that do not rely on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
We study the problem of maximizing information divergence from a new perspective using logarithmic Voronoi polytopes. We show that for linear models, the maximum is always achieved at the boundary of the probability simplex. For toric models, we present an algorithm that combines the combinatorics of the chamber complex with numerical algebraic geometry. We pay special attention to reducible models and models of maximum likelihood degree one.
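For context, the quantity being maximized can be written as follows; the notation here is a standard formulation of divergence maximization from a model and is assumed rather than taken verbatim from the paper:

```latex
% For a model M contained in the probability simplex Delta_{n-1}, the
% information divergence from M is the minimum KL divergence to the model,
\[
  D(p \,\|\, \mathcal{M}) \;=\; \min_{q \in \mathcal{M}}
    \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i},
\]
% and the problem studied is its maximization over the simplex:
\[
  \max_{p \,\in\, \Delta_{n-1}} D(p \,\|\, \mathcal{M}).
\]
```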
This paper proposes a new framework using physics-informed neural networks (PINNs) to simulate complex structural systems that consist of single and double beams based on Euler-Bernoulli and Timoshenko theory, where the double beams are connected with a Winkler foundation. In particular, forward and inverse problems for the Euler-Bernoulli and Timoshenko partial differential equations (PDEs) are solved using nondimensional equations with the physics-informed loss function. Higher-order complex beam PDEs are efficiently solved for forward problems to compute the transverse displacements and cross-sectional rotations with less than 1e-3 percent error. Furthermore, inverse problems are robustly solved to determine the unknown dimensionless model parameters and applied force in the entire space-time domain, even in the case of noisy data. The results suggest that PINNs are a promising strategy for solving problems in engineering structures and machines involving beam systems.
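As an illustration of the forward problem, the standard (dimensional) Euler-Bernoulli equation and a typical composite PINN loss are shown below; the paper's exact nondimensionalization and loss weights may differ:

```latex
% Euler-Bernoulli beam with flexural rigidity EI, mass density rho A,
% transverse displacement w, and distributed load f:
\[
  EI \,\frac{\partial^4 w}{\partial x^4}
  + \rho A \,\frac{\partial^2 w}{\partial t^2} = f(x,t),
\]
% minimized by a network through a weighted sum of residual terms:
\[
  \mathcal{L} = \lambda_{\mathrm{PDE}}\,\mathcal{L}_{\mathrm{PDE}}
              + \lambda_{\mathrm{BC}}\,\mathcal{L}_{\mathrm{BC}}
              + \lambda_{\mathrm{IC}}\,\mathcal{L}_{\mathrm{IC}}
              + \lambda_{\mathrm{data}}\,\mathcal{L}_{\mathrm{data}}.
\]
```

For the inverse problem, the unknown dimensionless parameters simply become trainable variables alongside the network weights, optimized against the same loss.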
We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment can be characterized by a prespecified density ratio model. We describe gains in efficiency and provide a general means to construct estimators achieving these gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype.
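One common way to prespecify a density ratio model is an exponential tilt; this is shown as an illustration of what "characterized by a prespecified density ratio model" can mean, and the paper's actual choice may differ:

```latex
% Exponential tilt between a weakly aligned source P_1 and the target P_0,
% with a fixed vector of summaries s and a finite-dimensional theta
% quantifying the degree of misalignment (theta = 0 gives full alignment):
\[
  \frac{dP_1}{dP_0}(z) \;=\;
  \frac{\exp\{\theta^{\top} s(z)\}}{\int \exp\{\theta^{\top} s(z')\}\, dP_0(z')}.
\]
```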
Data Spaces are an emerging concept for the trusted implementation of data-based applications and business models, offering a high degree of flexibility and sovereignty to all stakeholders. As Data Spaces are currently emerging in different domains such as mobility, health or food, semantic interfaces need to be identified and implemented to ensure the technical interoperability of these Data Spaces. This paper consolidates data models from 13 different domains and analyzes the ontological dissonance of these domains. Using a network graph, central data models and ontology attributes are identified, while the semantic heterogeneity of these domains is described qualitatively. The research outlook describes how these results help to connect different Data Spaces across domains.
Multiple-input multiple-output (MIMO) systems will play a crucial role in future wireless communication, but improving their signal detection performance to increase transmission efficiency remains a challenge. To address this issue, we propose extending the discrete signal detection problem in MIMO systems to a continuous one and applying the Hamiltonian Monte Carlo method, an efficient Markov chain Monte Carlo algorithm. In our previous studies, we used a mixture of normal distributions as the prior distribution. In this study, we propose using a mixture of t-distributions, which further improves detection performance. Based on our theoretical analysis and computer simulations, the proposed method can achieve near-optimal signal detection with polynomial computational complexity. This high-performance and practical MIMO signal detection could contribute to the development of the 6th-generation mobile network.
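A sketch of the kind of continuous relaxation the abstract describes, not the authors' implementation: a symmetric two-component Student-t mixture prior centered on the BPSK symbols {-1, +1}. The degrees of freedom and scale are illustrative choices:

```python
# Student-t mixture prior for relaxing a discrete constellation to a
# continuous density that HMC can sample. nu and scale are illustrative.
import math

def t_pdf(x, nu, loc=0.0, scale=1.0):
    """Density of a Student-t distribution with nu degrees of freedom."""
    z = (x - loc) / scale
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + z * z / nu) ** (-(nu + 1) / 2) / scale

def mixture_prior(x, nu=3.0, scale=0.1):
    """Equal-weight t-mixture with modes at the BPSK symbols -1 and +1."""
    return 0.5 * t_pdf(x, nu, loc=-1.0, scale=scale) \
         + 0.5 * t_pdf(x, nu, loc=+1.0, scale=scale)

# The density is sharply peaked at the symbols but heavy-tailed, which can
# help the sampler move between modes compared with a normal mixture.
print(mixture_prior(1.0), mixture_prior(0.0))
```

For higher-order constellations (e.g. QAM), the same construction places one t-component at each symbol per real dimension.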
This paper investigates a swarm of autonomous mobile robots in the Euclidean plane under the semi-synchronous ($\cal SSYNC$) scheduler. Each robot has a target function that determines a destination point from the robots' positions. Conventionally, all robots in the swarm take the same target function. We allow the robots to take different target functions, and investigate how the number of distinct target functions affects the problem-solving ability of the swarm, regarding the target function as a resource for solving a problem, like time. Specifically, we ask how many distinct target functions are necessary and sufficient to solve a problem $\Pi$; this number is called the minimum algorithm size (MAS) for $\Pi$. The MAS is defined to be $\infty$ if $\Pi$ is unsolvable even when every robot has a unique target function. We show that the problems form an infinite hierarchy with respect to their MASs: for each integer $c > 0$, and for $\infty$, the set of problems whose MAS is $c$ is non-empty, which implies that the target function is a resource that cannot be replaced by, e.g., time. We thus propose the MAS as a natural measure of the complexity of a problem. We establish the MASs for solving the gathering and related problems from any initial configuration, i.e., in a self-stabilizing manner. For example, the MAS for the gathering problem is 2. It is 3 for the problem of gathering {\bf all non-faulty} robots at a single point, regardless of the number $(< n)$ of crash failures. It is, however, $\infty$ for the problem of gathering all robots at a single point in the presence of at most one crash failure.
We present a new transport-based approach to efficiently perform sequential Bayesian inference of static model parameters. The strategy is based on the extraction of conditional distributions from the joint distribution of parameters and data, via the estimation of structured (e.g., block-triangular) transport maps. This gives explicit surrogate models for the likelihood functions and their gradients, allowing gradient-based characterizations of the posterior density via transport maps in a model-free, online phase. This framework is well suited for parameter estimation with complex noise models, including nuisance parameters, and when the forward model is only known as a black box. The method is applied numerically to the characterization of ice thickness from conductivity measurements.
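A schematic of the block-triangular (Knothe-Rosenblatt-type) construction underlying such conditional extractions; the notation is illustrative, not taken verbatim from the paper:

```latex
% A lower block-triangular map
\[
  S(y, \theta) \;=\; \bigl( S_1(y),\; S_2(y, \theta) \bigr)
\]
% pushing the joint density pi(y, theta) forward to a standard reference
% yields, for an observed y*, the conditional (posterior) distribution of
% theta through the component map theta -> S_2(y^*, theta).
```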
Modeling seeks to tame complexity during software development by supporting design, analysis, and stakeholder communication. Paradoxically, educators' experiences indicate that students often perceive modeling as adding complexity instead of reducing it. In this position paper, I analyse modeling education through the lens of complexity science, a theoretical framework for the study of complex systems. I revisit pedagogical literature where complexity science has been used as a framework for general education and for subject-specific education in disciplines such as medicine, project management, and sustainability. I then revisit complexity-related challenges from the modeling education literature, discuss them in the light of complexity science, and present recommendations for taming complexity when teaching modeling.
This research aims to develop kernel GNG, a kernelized version of the growing neural gas (GNG) algorithm, and to investigate the features of the networks it generates. The GNG is an unsupervised artificial neural network that can transform a dataset into an undirected graph, thereby extracting the features of the dataset as a graph. The GNG is widely used in vector quantization, clustering, and 3D graphics. Kernel methods are often used to map a dataset to feature space, with support vector machines being the most prominent application. This paper introduces the kernel GNG approach and explores the characteristics of the networks it produces. Five kernels are used in this study: the Gaussian, Laplacian, Cauchy, inverse multiquadric (IMQ), and log kernels. The results show that the average degree and the average clustering coefficient decrease as the kernel parameter increases for the Gaussian, Laplacian, Cauchy, and IMQ kernels. Thus, if fewer edges and a lower clustering coefficient (i.e., fewer triangles) are desired, a kernel GNG with a larger parameter value is more appropriate.
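A sketch of the five kernels named above, written as functions of the squared Euclidean distance; the parameterizations are common textbook forms and may differ from the paper's, and the distance computation shows the standard kernel trick a kernelized GNG would rely on:

```python
# Common forms of the five kernels, as functions of r2 = ||x - y||^2.
import math

def gaussian(r2, sigma):   # exp(-r^2 / (2 sigma^2))
    return math.exp(-r2 / (2 * sigma ** 2))

def laplacian(r2, sigma):  # exp(-r / sigma)
    return math.exp(-math.sqrt(r2) / sigma)

def cauchy(r2, sigma):     # 1 / (1 + r^2 / sigma^2)
    return 1.0 / (1.0 + r2 / sigma ** 2)

def imq(r2, c):            # inverse multiquadric: 1 / sqrt(r^2 + c^2)
    return 1.0 / math.sqrt(r2 + c ** 2)

def log_kernel(r2, beta):  # -log(r^beta + 1), conditionally positive definite
    return -math.log(r2 ** (beta / 2) + 1.0)

# Kernel trick: squared distance in feature space without an explicit map,
# d2(x, y) = k(x, x) - 2 k(x, y) + k(y, y).
def feature_dist2(k, x, y, **params):
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return k(0.0, **params) - 2 * k(r2, **params) + k(0.0, **params)

print(feature_dist2(gaussian, (0.0, 0.0), (1.0, 1.0), sigma=1.0))
```

In a kernel GNG, the winner and second-winner units for each input would be selected by this feature-space distance instead of the plain Euclidean one.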