Stateful Coverage-Based Greybox Fuzzing (SCGF) is considered the state-of-the-art approach to network protocol greybox fuzzing. During fuzzing, SCGF constructs the state machine of the target protocol by identifying protocol states; heuristic methods then select promising states, together with the corresponding seeds and mutation regions, to conduct fuzz testing effectively. Nevertheless, existing SCGF methods prioritise the selection of protocol states without considering the correspondence between program basic-block coverage and protocol states. To address this gap, this paper proposes a state-map-based reverse state selection method for SCGF. The approach prioritises the coverage information of fuzzing seeds and exploits the correspondence between the program's basic-block coverage and protocol states, with the objective of improving bitmap coverage. The state map simplifies the state machine representation, and the design of different state types enables an optimised construction of message sequences; the resulting reduction in message-sequence length further improves test case execution efficiency. Building on these optimisations of SCGF, we developed SMGFuzz and conducted experiments on ProFuzzBench to assess its testing efficiency. The results indicate that, compared to AFLNet, SMGFuzz achieved an average increase of 12.48% in edge coverage, a 50.1% increase in unique crashes, and a 40.2% increase in test case execution speed over a period of 24 hours.
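As a purely illustrative sketch (the abstract does not give algorithmic details, so the function name, data structures, and selection heuristic below are hypothetical), one plausible reading of "reverse" state selection is to start from rarely hit bitmap edges and map them back to the protocol states and seeds whose executions covered them:

import heapq

def reverse_select(bitmap_hits, edge_to_states, state_to_seeds, budget=4):
    # bitmap_hits: dict edge -> hit count; edge_to_states: dict edge -> protocol states
    # whose executions covered that edge; state_to_seeds: dict state -> seeds reaching it.
    rare_edges = heapq.nsmallest(budget, bitmap_hits, key=bitmap_hits.get)  # least-covered edges
    chosen = {s for e in rare_edges for s in edge_to_states.get(e, ())}
    return {s: state_to_seeds[s] for s in chosen if s in state_to_seeds}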
The various limitations of Generative AI, such as hallucinations and model failures, have made it crucial to understand the role of different modalities in Visual Language Model (VLM) predictions. Our work investigates how the integration of information from the image and text modalities influences the performance and behavior of VLMs in visual question answering (VQA) and reasoning tasks. We measure this effect through answer accuracy, reasoning quality, model uncertainty, and modality relevance. We study the interplay between the text and image modalities in different configurations where visual content is essential for solving the VQA task. Our contributions include (1) the Semantic Interventions (SI)-VQA dataset, (2) a benchmark study of various VLM architectures under different modality configurations, and (3) the Interactive Semantic Interventions (ISI) tool. The SI-VQA dataset serves as the foundation for the benchmark, while the ISI tool provides an interface to test and apply semantic interventions to image and text inputs, enabling more fine-grained analysis. Our results show that complementary information between modalities improves answer and reasoning quality, whereas contradictory information harms model performance and confidence. Text annotations embedded in images have minimal impact on accuracy and uncertainty, while slightly increasing image relevance. In this study, we evaluate state-of-the-art VLMs that allow us to extract attention coefficients for each modality; this attention analysis confirms the dominant role of image inputs over text in VQA tasks. A key finding is PaliGemma's harmful overconfidence, which poses a higher risk of silent failures compared to the LLaVA models. This work sets the foundation for rigorous analysis of modality integration, supported by datasets specifically designed for this purpose.
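As a hypothetical illustration of how attention coefficients could be turned into a modality-relevance score (the abstract does not state the exact definition, so the function and its inputs are assumptions), one can compare the attention mass that answer tokens place on image tokens versus text tokens:

import numpy as np

def modality_relevance(attn, image_token_mask):
    # attn: (n_answer_tokens, n_input_tokens) attention weights, averaged over heads/layers
    # image_token_mask: (n_input_tokens,) boolean mask marking the image-token positions
    mass = attn.sum(axis=0)
    image_share = mass[image_token_mask].sum() / mass.sum()
    return image_share, 1.0 - image_share  # (image relevance, text relevance)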
The Gibbs sampler (a.k.a. Glauber dynamics and heat-bath algorithm) is a popular Markov Chain Monte Carlo algorithm which iteratively samples from the conditional distributions of a probability measure $\pi$ of interest. Under the assumption that $\pi$ is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy and provide a sharp characterization of the associated contraction rate. Assuming that evaluating conditionals is cheap compared to evaluating the joint density, our results imply that the number of full evaluations of $\pi$ needed for the Gibbs sampler to mix grows linearly with the condition number and is independent of the dimension. If $\pi$ is non-strongly log-concave, the convergence rate in entropy degrades from exponential to polynomial. Our techniques are versatile and extend to Metropolis-within-Gibbs schemes and the Hit-and-Run algorithm. A comparison with gradient-based schemes and the connection with the optimization literature are also discussed.
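For concreteness, a minimal random-scan Gibbs sampler for a strongly log-concave toy target, a standard bivariate Gaussian with correlation $\rho$ (the target, parameter values, and step count are chosen here purely for illustration), iterates as follows; its mixing degrades as the condition number $(1+\rho)/(1-\rho)$ grows, consistent with the condition-number dependence described above:

import numpy as np

rng = np.random.default_rng(0)
rho = 0.9                      # correlation; condition number is (1 + rho) / (1 - rho)
x = np.zeros(2)
samples = []
for _ in range(10_000):
    i = rng.integers(2)        # random scan: pick a coordinate uniformly at random
    j = 1 - i
    # exact conditional of x_i given x_j under the bivariate Gaussian target:
    x[i] = rho * x[j] + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples.append(x.copy())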
We consider a two-dimensional sharp-interface model for solid-state dewetting of thin films with anisotropic surface energies on curved substrates, where the film/vapor interface and substrate surface are represented by an evolving and a static curve, respectively. The model is governed by anisotropic surface diffusion for the evolving curve, with appropriate boundary conditions at the contact points where the two curves meet. The continuum model obeys an energy decay law and preserves the area enclosed between the two curves. We introduce an arclength parameterization for the substrate curve, which plays a crucial role in a structure-preserving approximation as it straightens the curved substrate and tracks length changes between contact points. Based on this insight, we introduce a symmetrized weak formulation which leads to a parametric approximation that is unconditionally energy-stable with respect to the discrete energy. We also provide an error estimate for the enclosed area, which depends on the substrate profile and vanishes in the case of a flat substrate. Furthermore, we introduce a correction to the discrete normals to enable exact area preservation for general curved substrates. The resulting nonlinear system is efficiently solved using a hybrid iterative algorithm that combines the Picard and Newton methods. Numerical results are presented to show the robustness and structure-preserving properties of the introduced method for simulating solid-state dewetting on various curved substrates.
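For reference, a commonly used form of the governing law in such two-dimensional models (stated here in the standard orientation-angle formulation; the abstract itself does not spell out the equations) is anisotropic surface diffusion for the film/vapor interface,
\[
\mathcal{V} = \partial_{ss}\mu, \qquad \mu = \big[\gamma(\theta) + \gamma''(\theta)\big]\,\kappa,
\]
where $\mathcal{V}$ is the normal velocity of the evolving curve, $s$ the arclength, $\kappa$ the curvature, $\theta$ the orientation angle of the curve, and $\gamma(\theta)$ the anisotropic surface energy density.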
We propose a method utilizing physics-informed neural networks (PINNs) to solve Poisson equations that serve as control variates in the computation of transport coefficients via fluctuation formulas, such as the Green--Kubo and generalized Einstein-like formulas. By leveraging approximate solutions to the Poisson equation constructed through neural networks, our approach significantly reduces the variance of the estimator at hand. We provide an extensive numerical analysis of the estimators and detail a methodology for training neural networks to solve these Poisson equations. The approximate solutions are then incorporated into Monte Carlo simulations as effective control variates, demonstrating the suitability of the method for moderately high-dimensional problems where fully deterministic solutions are computationally infeasible.
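The underlying variance-reduction principle can be summarized for a generic observable $\phi$ and generator $\mathcal{L}$ of the sampled dynamics (the Green--Kubo-type estimators treated in the paper carry additional time-integration structure not shown here):
\[
-\mathcal{L}u = \phi - \mathbb{E}_\pi[\phi], \qquad \widehat{\phi} = \phi + \mathcal{L}\hat{u}.
\]
Since $\mathbb{E}_\pi[\mathcal{L}\hat{u}] = 0$ under the stationary measure $\pi$, the modified observable $\widehat{\phi}$ has the same mean as $\phi$, while its variance shrinks as the neural-network approximation $\hat{u}$ approaches the true Poisson solution $u$ (and vanishes when $\hat{u} = u$).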
Finding vertex-to-vertex correspondences in real-world graphs is a challenging task with applications in a wide variety of domains. Structural matching based on graph connectivity has attracted considerable attention, while the integration of all the other information stemming from vertex and edge attributes has been mostly left aside. Here we present the Graph Attributes and Structure Matching (GASM) algorithm, which provides high-quality solutions by integrating all the available information in a unified framework. Parameters quantifying the reliability of the attributes tune how much the solutions should rely on the structure or on the attributes. We further show that even without attributes GASM consistently finds solutions as good as or better than state-of-the-art algorithms, with similar processing times.
We present a streamlined and simplified exponential lower bound on the length of proofs in intuitionistic implicational logic, adapted to Gordeev and Haeusler's dag-like natural deduction.
Functional Ordinary Kriging is the most widely used method to predict a curve at a given spatial point. However, uncertainty quantification for such predictions remains an open issue. In this article, we propose a distribution-free prediction method based on two different modulation functions and two conformity scores. Through simulations and benchmark data analyses, we demonstrate the advantages of our approach compared to standard methods.
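A minimal sketch of the kind of distribution-free band such an approach produces, assuming a split-conformal setting with a sup-norm conformity score and one particular choice of modulation function (the two modulation functions and conformity scores actually used in the article are not specified in the abstract), is:

import numpy as np

def split_conformal_band(pred_cal, y_cal, pred_new, alpha=0.1, eps=1e-8):
    # pred_cal, y_cal: (n_cal, n_grid) predicted/observed calibration curves on a common grid
    # pred_new: (n_grid,) functional kriging prediction at the new spatial location
    resid = np.abs(y_cal - pred_cal)
    s = resid.mean(axis=0) + eps              # one possible modulation function
    scores = (resid / s).max(axis=1)          # sup-norm conformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal quantile rank
    q = np.inf if k > n else np.sort(scores)[k - 1]
    return pred_new - q * s, pred_new + q * s # lower and upper prediction band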
Mediation analysis aims to identify and estimate the effect of an exposure on an outcome that is mediated through one or more intermediate variables. In the presence of multiple intermediate variables, two pertinent methodological questions arise: estimating mediated effects when mediators are correlated, and performing high-dimensional mediation analysis when the number of mediators exceeds the sample size. This paper presents a two-step procedure for high-dimensional mediation analysis. The first step selects a reduced number of candidate mediators using an ad-hoc lasso penalty. The second step applies a procedure we previously developed to estimate the mediated and direct effects, accounting for the correlation structure among the retained candidate mediators. We compare the performance of the proposed two-step procedure with state-of-the-art methods using simulated data. Additionally, we demonstrate its practical application by estimating the causal role of DNA methylation in the pathway between smoking and rheumatoid arthritis using real data.
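A schematic version of the two steps, written with scikit-learn and a generic product-of-coefficients estimator standing in for the authors' previously developed second-step procedure (the penalty level and the stand-in estimator are assumptions made for illustration only), is:

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def two_step_mediation(X, M, Y, lasso_alpha=0.1):
    # X: (n,) exposure, M: (n, p) mediators with possibly p > n, Y: (n,) outcome
    # Step 1: a lasso of Y on (X, M) screens a reduced set of candidate mediators.
    design = np.column_stack([X, M])
    keep = Lasso(alpha=lasso_alpha).fit(design, Y).coef_[1:] != 0
    M_sel = M[:, keep]
    # Step 2 (illustrative stand-in): joint outcome model plus exposure-mediator models.
    fit_y = LinearRegression().fit(np.column_stack([X, M_sel]), Y)
    direct = fit_y.coef_[0]                    # direct effect of the exposure
    beta = fit_y.coef_[1:]                     # mediator -> outcome, given the other mediators
    alpha_xm = np.array([LinearRegression().fit(X.reshape(-1, 1), M_sel[:, j]).coef_[0]
                         for j in range(M_sel.shape[1])])
    mediated = alpha_xm * beta                 # effect mediated through each retained mediator
    return keep, direct, mediated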
To avoid ineffective collisions between the equilibrium states, the hybrid method with deviational particles (HDP) has been proposed to integrate the Fokker-Planck-Landau system, leaving open the issue of sampling deviational particles from the high-dimensional source term. In this paper, we present an adaptive sampling (AS) strategy that first adaptively reconstructs a piecewise constant approximation of the source term, based on sequential clustering via discrepancy estimation, and then samples deviational particles directly from the resulting adaptive piecewise constant function without rejection. The mixture discrepancy, which can be easily calculated thanks to its explicit analytical expression, is employed as the measure of uniformity instead of the star discrepancy, whose calculation is NP-hard. The resulting method, dubbed the HDP-AS method, runs approximately ten times faster than the HDP method while maintaining the same accuracy on the Landau damping, two-stream instability, bump-on-tail, and Rosenbluth test problems.
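The rejection-free sampling step can be illustrated as follows, assuming the adaptive reconstruction has already produced axis-aligned cells with constant values (the data layout and function name are illustrative, not the paper's implementation):

import numpy as np

def sample_piecewise_constant(cell_lo, cell_hi, cell_values, n, rng=None):
    # cell_lo, cell_hi: (m, d) lower/upper corners of m axis-aligned cells
    # cell_values: (m,) nonnegative constant value of the (absolute) source term per cell
    rng = rng or np.random.default_rng()
    weights = cell_values * np.prod(cell_hi - cell_lo, axis=1)   # probability mass of each cell
    p = weights / weights.sum()
    idx = rng.choice(len(p), size=n, p=p)        # choose cells directly, no rejection step
    u = rng.random((n, cell_lo.shape[1]))
    return cell_lo[idx] + u * (cell_hi[idx] - cell_lo[idx])      # uniform within each chosen cell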
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inferences conditioned on explanations. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency-map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
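The similarity-space comparison relies on Shepard's universal law of generalization, under which perceived similarity decays exponentially with psychological distance,
\[
g(x, y) = \exp\!\big(-c\, d(x, y)\big),
\]
where $d(x, y)$ is the distance between stimuli $x$ and $y$ in the similarity space and $c > 0$ a sensitivity parameter; in the proposed theory, this quantifies how closely the AI's saliency-map explanation matches the explanation the explainee would give themselves.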