We propose a novel methodology for validating software product line (PL) models by integrating Statistical Model Checking (SMC) with Process Mining (PM). Our approach focuses on the feature-oriented language QFLan in the PL engineering domain, which allows modeling PLs with rich cross-tree and quantitative constraints, as well as aspects of dynamic PLs such as staged configurations. This richness leads to models with infinite state spaces, requiring simulation-based analysis techniques such as SMC; we illustrate this with a running example whose state space is infinite. SMC generates samples of system dynamics to estimate properties such as the probability of an event or the expected value of a quantity. PM, on the other hand, applies data-driven techniques to execution logs to identify and reason about the underlying execution process. In this paper, we propose, for the first time, applying PM techniques to the byproducts of SMC simulations to enhance the utility of SMC analyses. Typically, when SMC results are unexpected, modelers must determine, in a black-box manner, whether they stem from actual system characteristics or from bugs in the model. We improve on this by using PM to provide a white-box perspective on the observed system dynamics. The samples generated by SMC are fed into PM tools, which produce a compact graphical representation of the observed dynamics. The mined PM model is then transformed into a QFLan model, making it accessible to PL engineers. Using two well-known PL models, we demonstrate the effectiveness and scalability of our methodology in pinpointing issues and suggesting fixes. Additionally, we show its generality by applying it to the security domain.
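To make the SMC step concrete, the following minimal sketch (not QFLan's actual tooling; the simulator simulate_run and property check satisfies are hypothetical placeholders) estimates an event probability from independent simulation runs, with the number of samples chosen via the Chernoff-Hoeffding bound.

```python
import math
import random

def smc_estimate(simulate_run, satisfies, eps=0.01, delta=0.05, seed=0):
    """Estimate the probability that a property holds on a random run.

    simulate_run(rng) -> one sampled trajectory of the model (hypothetical).
    satisfies(trace)  -> True if the property holds on that trajectory.
    eps, delta        -> additive error and confidence parameters.
    """
    # Chernoff-Hoeffding bound: n samples guarantee |p_hat - p| <= eps
    # with probability at least 1 - delta.
    n = math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
    rng = random.Random(seed)
    hits = sum(satisfies(simulate_run(rng)) for _ in range(n))
    return hits / n, n  # estimated probability and number of samples used
```

The sampled traces themselves are the byproduct referred to above: they are what would subsequently be exported as an event log for the PM tools.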
We present a new method for causal discovery in linear structural vector autoregressive models. We adapt an idea designed for independent observations to the time series setting while retaining its favorable properties, namely explicit error control for false causal discoveries, at least asymptotically. We apply our method to several real-world bivariate time series datasets and discuss its findings, which mostly agree with common understanding. The arrow of time in a model can be interpreted as background knowledge on possible causal mechanisms. Hence, our ideas could be extended to incorporate other forms of background knowledge, even for independent observations.
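As a point of reference, a common formulation of the linear structural vector autoregressive model underlying such analyses (notation ours, not necessarily that of the paper) is
\[
x_t = B_0 x_t + \sum_{\tau=1}^{p} B_\tau x_{t-\tau} + e_t ,
\]
where $B_0$ encodes the instantaneous (contemporaneous) causal effects, $B_1,\dots,B_p$ the lagged effects, and $e_t$ are mutually independent error terms; causal discovery then amounts to identifying the structure of $B_0$ (and the lag matrices) from data.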
This work explores the dimension reduction problem for Bayesian nonparametric regression and density estimation. More precisely, we are interested in estimating a functional parameter $f$ over the unit ball in $\mathbb{R}^d$ that depends only on a $d_0$-dimensional subspace of $\mathbb{R}^d$, with $d_0 < d$. It is well known that rescaled Gaussian process priors over the function space achieve smoothness adaptation and posterior contraction at near minimax-optimal rates. Moreover, hierarchical extensions of this approach, equipped with subspace projection, can also adapt to the intrinsic dimension $d_0$ (\cite{Tokdar2011DimensionAdapt}). When the ambient dimension $d$ does not vary with $n$, the minimax rate remains of the order $n^{-\beta/(2\beta +d_0)}$. However, this holds only up to multiplicative constants that can become prohibitively large when $d$ grows. The dependence of the contraction rate on the ambient dimension has not been fully explored yet, and this work provides a first insight: we let the dimension $d$ grow with $n$ and, by combining the arguments of \cite{Tokdar2011DimensionAdapt} and \cite{Jiang2021VariableSelection}, we derive a growth rate for $d$ that still leads to posterior consistency at the minimax rate. The optimality of this growth rate is then discussed. Additionally, we provide a set of assumptions under which consistent estimation of $f$ leads to a correct estimation of the subspace projection, assuming that $d_0$ is known.
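In symbols, the dimension-reduction assumption described above can be written (in our notation) as
\[
f(x) = g(Px), \qquad x \in \mathbb{R}^d,
\]
where $P$ is a projection onto a $d_0$-dimensional subspace and $g$ is a $\beta$-smooth function; the associated minimax rate is then $n^{-\beta/(2\beta + d_0)}$, up to multiplicative constants that may depend on the ambient dimension $d$.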
This paper discusses Newton's method and its hybridization with machine learning for the steady-state Navier-Stokes-Darcy model discretized by mixed finite element methods. First, a Newton iterative method is introduced for solving the corresponding discretized problem. Under certain standard conditions, it is proved that this method converges quadratically, with a convergence rate independent of the finite element mesh size. Next, a deep learning algorithm is proposed for solving this nonlinear coupled problem. Following the ideas of an earlier work by Huang, Wang and Yang (2020), an Int-Deep algorithm is constructed by combining the two previous methods so as to further improve computational efficiency and robustness. A series of numerical examples is reported to demonstrate the performance of the proposed methods.
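For orientation, writing the discretized nonlinear problem abstractly as $F(u_h) = 0$ (our notation), one Newton step reads
\[
F'(u_h^{(n)})\,\delta^{(n)} = -F(u_h^{(n)}), \qquad u_h^{(n+1)} = u_h^{(n)} + \delta^{(n)},
\]
and quadratic convergence with a mesh-independent rate means $\|u_h^{(n+1)} - u_h\| \le C\,\|u_h^{(n)} - u_h\|^2$ with the constant $C$ independent of the mesh size $h$.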
This study addresses a class of mixed-integer linear programming (MILP) problems that involve uncertainty in the objective function parameters. The parameters are assumed to form a random vector whose probability distribution can only be observed through a finite training data set. Unlike most related studies in the literature, we also consider uncertainty in the underlying data set. The data uncertainty is described by a set of linear constraints for each random sample, and the uncertainty in the distribution (for a fixed realization of the data) is defined using a type-1 Wasserstein ball centered at the empirical distribution of the data. The overall problem is formulated as a three-level distributionally robust optimization (DRO) problem. First, we prove that the three-level problem admits a single-level MILP reformulation if the class of loss functions is restricted to biaffine functions. Second, we show that for several particular forms of data uncertainty the outlined problem can be solved reasonably fast by leveraging the nominal MILP problem. Finally, we conduct a computational study in which the out-of-sample performance of our model and the computational complexity of the proposed MILP reformulation are explored numerically for several application domains.
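Schematically (our notation, omitting the specific constraint sets of the paper), the three-level problem has the form
\[
\min_{y \in Y} \;\; \max_{\widehat{\xi} \in \Xi} \;\; \sup_{Q \in \mathcal{B}_{\varepsilon}(\widehat{P}_{\widehat{\xi}})} \; \mathbb{E}_{Q}\big[\ell(y,\xi)\big],
\]
where $\Xi$ collects the linear data-uncertainty constraints on the samples $\widehat{\xi}$, $\widehat{P}_{\widehat{\xi}}$ is the empirical distribution of the (perturbed) data, and $\mathcal{B}_{\varepsilon}$ is a type-1 Wasserstein ball of radius $\varepsilon$ centered at it.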
We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of Lightning with backends supporting CPUs as well as NVIDIA and AMD GPUs, and compare the results to other commonly used high-performance simulator packages, demonstrating where Lightning's implementations give performance leads. We show improved CPU performance by employing explicit SIMD intrinsics and multi-threading, as well as batched task-based execution across multiple GPUs and distributed forward and gradient-based quantum circuit executions across multiple nodes. Our data show that we can comfortably simulate a variety of circuits, with examples of up to 30 qubits on a single device or node and up to 41 qubits using multiple nodes.
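A minimal usage sketch (assuming the pennylane and pennylane-lightning packages are installed; the circuit and parameters are illustrative only) of running and differentiating a small circuit on the lightning.qubit backend:

```python
import pennylane as qml
from pennylane import numpy as np

# Lightning's CPU state-vector simulator; GPU/HPC backends such as
# lightning.gpu or lightning.kokkos can be selected the same way if installed.
dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev)
def circuit(theta):
    qml.RX(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

theta = np.array(0.3, requires_grad=True)
print(circuit(theta))            # forward execution
print(qml.grad(circuit)(theta))  # gradient-based execution
```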
We present novel improvements in the context of symbol-based multigrid procedures for solving large block structured linear systems. We study the application of an aggregation-based grid transfer operator that transforms the symbol of a block Toeplitz matrix from matrix-valued to scalar-valued at the coarser level. Our convergence analysis of the Two-Grid Method (TGM) reveals the connection between the features of the scalar-valued symbol at the coarser level and the properties of the original matrix-valued one. This allows us to prove the convergence of a V-cycle multigrid with standard grid transfer operators for scalar Toeplitz systems at the coarser levels. Consequently, we extend the class of suitable smoothers for block Toeplitz matrices, focusing on the efficiency of block strategies, particularly the relaxed block Jacobi method. General conditions on the smoothing parameters are derived, with emphasis on practical applications where these parameters can be computed at negligible computational cost. We test the proposed strategies on linear systems stemming from the discretization of differential problems with $\mathbb{Q}_{d}$ Lagrangian FEM or B-splines with non-maximal regularity. In both cases, the numerical results show computational advantages compared to existing methods for block structured linear systems.
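For reference, the relaxed block Jacobi smoother referred to above takes the standard form (our notation)
\[
x^{(k+1)} = x^{(k)} + \omega\, D^{-1}\big(b - A x^{(k)}\big),
\]
where $D$ is the block diagonal of $A$ and $\omega$ is the relaxation parameter; the conditions derived in the paper concern admissible choices of $\omega$ that can be computed at negligible cost.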
The multiple testing problem arises when fitting multivariate generalized linear models to high-dimensional data. We show that the sign-flip test can be combined with permutation-based procedures to address the multiple testing problem.
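As a rough illustration of such a combination (a generic sketch, not the paper's exact procedure; it assumes per-observation score contributions are available as a matrix), random sign flips generate the null distribution of each coefficient's statistic, and taking the maximum over coefficients yields permutation-style familywise error control:

```python
import numpy as np

def signflip_maxT(scores, n_flips=999, seed=0):
    """Sign-flip test with max-statistic multiplicity correction.

    scores : (n_obs, n_tests) array of per-observation score contributions,
             assumed (asymptotically) sign-symmetric under the null.
    Returns familywise-error-adjusted p-values, one per tested coefficient.
    """
    rng = np.random.default_rng(seed)
    n, m = scores.shape
    # observed standardized statistic for each coefficient
    obs = np.abs(scores.sum(axis=0)) / np.sqrt((scores ** 2).sum(axis=0))
    max_null = np.empty(n_flips)
    for b in range(n_flips):
        eps = rng.choice([-1.0, 1.0], size=n)[:, None]   # random sign flips
        flipped = eps * scores
        stat = np.abs(flipped.sum(axis=0)) / np.sqrt((flipped ** 2).sum(axis=0))
        max_null[b] = stat.max()                          # max over coefficients
    # single-step max-statistic adjusted p-values
    return (1 + (max_null[None, :] >= obs[:, None]).sum(axis=1)) / (n_flips + 1)
```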
Classical tests are available for the two-sample problem of comparing distribution functions. Among these, the Kolmogorov-Smirnov test also provides a graphical interpretation of the test results, in different forms. Here, we propose modifications of the Kolmogorov-Smirnov test with higher power. The proposed tests are based on the so-called global envelope test, which allows for graphical interpretation similarly to the Kolmogorov-Smirnov test. The tests are based on rank statistics and are also suitable for the comparison of $n$ samples, with $n \geq 2$. We compare the alternatives for the two-sample case through an extensive simulation study and discuss their interpretation. Finally, we apply the tests to real data: we compare the height distributions of boys and girls at different ages, as well as the sepal length distributions of different flower species, using the proposed methodologies.
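To contrast the two approaches in symbols (our notation): the two-sample Kolmogorov-Smirnov test rejects based on the statistic
\[
D = \sup_{x} \big| \widehat{F}_1(x) - \widehat{F}_2(x) \big|,
\]
whereas a global envelope test works with the whole functional statistic, e.g. $T(x) = \widehat{F}_1(x) - \widehat{F}_2(x)$, and constructs bounds $T_{\mathrm{low}}(x) \le T(x) \le T_{\mathrm{upp}}(x)$ from rank statistics of permuted samples such that, under the null hypothesis, the observed curve stays inside the envelope simultaneously for all $x$ with probability $1-\alpha$; the points where the curve exits the envelope indicate where the distributions differ.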
We propose Stein-type estimators for zero-inflated Bell regression models by incorporating information on model parameters. These estimators combine the advantages of unrestricted and restricted estimators. We derive the asymptotic distributional properties, including bias and mean squared error, for the proposed shrinkage estimators. Monte Carlo simulations demonstrate the superior performance of our shrinkage estimators across various scenarios. Furthermore, we apply the proposed estimators to analyze a real dataset, showcasing their practical utility.
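For context, Stein-type shrinkage estimators of this kind typically take the form (generic notation; the exact weights are as defined in the paper)
\[
\widehat{\beta}^{\,S} = \widehat{\beta}^{\,R} + \Big(1 - \frac{c}{\mathcal{T}_n}\Big)\big(\widehat{\beta}^{\,U} - \widehat{\beta}^{\,R}\big),
\]
where $\widehat{\beta}^{\,U}$ and $\widehat{\beta}^{\,R}$ are the unrestricted and restricted estimators, $\mathcal{T}_n$ is a test statistic measuring how compatible the data are with the restriction, and $c$ is a shrinkage constant; a positive-part version truncates the shrinkage weight at zero.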
We establish a theoretical framework for the particle relaxation method used to generate uniform particle distributions in Smoothed Particle Hydrodynamics (SPH). We achieve this by reformulating particle relaxation as an optimization problem, whose objective function is the integral difference between the discrete particle-based and the smoothed analytical volume fractions. The analysis demonstrates that the particle relaxation method in the domain interior is essentially equivalent to a gradient descent approach for solving this optimization problem, and this equivalence extends to bounded domains once a proper boundary term is introduced. Additionally, each periodic particle distribution has a spatially uniform particle volume, denoted the characteristic volume. The relaxed particle distribution attains the largest characteristic volume, which is determined by the kernel cut-off radius. This insight enables us to control the relaxed particle distribution by selecting the target kernel cut-off radius for a given kernel function.
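In symbols (our notation), the gradient-descent view amounts to updating the particle positions as
\[
\mathbf{r}_i^{(k+1)} = \mathbf{r}_i^{(k)} - \eta\, \nabla_{\mathbf{r}_i} E\big(\mathbf{r}^{(k)}\big),
\]
where $E$ is the objective above, i.e. the integrated difference between the discrete particle-based and the smoothed analytical volume fractions, and $\eta$ plays the role of the (pseudo-)time step of the relaxation iteration.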