Tensor-based morphometry (TBM) aims to show local differences in brain volumes with respect to a common template. TBM images are smooth, but they exhibit higher values (especially in diseased groups) in some brain regions, notably around the lateral ventricles. More specifically, our voxelwise analysis shows both a mean-variance relationship in these areas and evidence of spatially dependent skewness. We propose a model for three-dimensional functional data in which the mean, variance, and skewness functions vary smoothly across brain locations, modeling the voxelwise distributions as skew-normal. The smooth effects of age and sex are estimated on a reference population of cognitively normal subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and mapped across the whole brain. The three parameter functions allow us to transform each TBM image (in the reference population as well as in a test set) into a Gaussian process. These subject-specific normative maps are used to derive indices of deviation from a healthy condition, with which to assess the individual risk of pathological degeneration.
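The core normalizing step at a single voxel can be sketched with a probability-integral transform: fit the three skew-normal parameters on the reference values, then map each observation through the skew-normal CDF and the standard normal quantile function. This is a minimal illustration with simulated data and made-up parameter values, not the authors' smooth spatial estimation procedure:

```python
import numpy as np
from scipy.stats import skewnorm, norm

rng = np.random.default_rng(0)

# Illustrative data: one voxel's values across a reference population,
# drawn from a skew-normal distribution (parameters are made up).
voxel_values = skewnorm.rvs(a=4.0, loc=0.0, scale=1.5, size=500, random_state=rng)

# Fit the three skew-normal parameters (shape, location, scale) at this voxel.
a_hat, loc_hat, scale_hat = skewnorm.fit(voxel_values)

# Probability-integral transform: skew-normal CDF followed by the standard
# normal quantile function maps each observation to a Gaussian score.
z = norm.ppf(skewnorm.cdf(voxel_values, a_hat, loc=loc_hat, scale=scale_hat))
```

Repeating this at every voxel (with the parameters varying smoothly in space, as the abstract describes) yields the subject-specific Gaussian normative map.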
Forward regression is a crucial methodology for automatically identifying important predictors from a large pool of potential covariates. When predictor correlation is moderate, forward selection techniques can achieve screening consistency. However, this property breaks down in the presence of substantially correlated variables, especially in high-dimensional datasets where strong correlations exist among predictors. Other model selection methods in the literature face the same dilemma. To address these challenges, we introduce a novel decorrelated forward (DF) selection framework for generalized mean regression models, covering such prevalent cases as linear, logistic, Poisson, and quasi-likelihood regression. The DF selection framework stands out because of its ability to convert generalized mean regression models into linear ones, thus providing a clear interpretation of the forward selection process. It also offers a closed-form expression for each forward iteration, improving practical applicability and efficiency. Theoretically, we establish the screening consistency of DF selection and determine an upper bound on the size of the selected submodel. To reduce the computational burden, we develop a thresholding DF algorithm that provides a stopping rule for the forward-searching process. Simulations and two real-data applications show the outstanding performance of our method compared with existing model selection methods.
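The decorrelation idea can be illustrated in the simplest linear case: at each step, residualize the response and the remaining candidate columns on the already-selected columns before scoring them, so that correlation with selected predictors does not mislead the search. This is a generic sketch with a toy dataset, not the paper's DF framework for generalized models:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection for a linear model: residualize the response
    and the remaining columns on the selected columns (QR projection), then
    add the candidate with the largest absolute inner product with the
    response residual."""
    selected = []
    for _ in range(k):
        if selected:
            Q, _ = np.linalg.qr(X[:, selected])
            ry = y - Q @ (Q.T @ y)
            rX = X - Q @ (Q.T @ X)
        else:
            ry, rX = y, X
        scores = np.abs(rX.T @ ry)
        scores[selected] = -np.inf  # never re-select a column
        selected.append(int(np.argmax(scores)))
    return selected

# Toy design with a correlated pair of columns: y depends on columns 0 and 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
X[:, 1] = 0.7 * X[:, 0] + 0.7 * rng.standard_normal(200)  # correlated with col 0
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(200)
```

On this toy design, `forward_select(X, y, 2)` recovers the two active columns despite the correlated decoy column 1.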
Data depth has emerged as an invaluable nonparametric measure for ranking multivariate samples. A key contribution of depth-based two-sample comparison is the quality index, or Q statistic (Liu and Singh, 1993). Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-sample homogeneity tests, which assess mean and/or scale changes in distributions, often suffer from low statistical power or indeterminate asymptotic distributions. To overcome these challenges, we introduce DEEPEAST (depth-explored same-attraction sample-to-sample central-outward ranking), a technique for improving statistical power in two-sample tests via the same-attraction function. We propose two novel and powerful depth-based test statistics, the sum test statistic and the product test statistic, which are rooted in Q statistics, share a "common attractor", and are applicable across all depth functions. We further derive the asymptotic distribution of these statistics for various depth functions. To assess the power gain, we apply three depth functions: Mahalanobis depth (Liu and Singh, 1993), spatial depth (Brown, 1958; Gower, 1974), and projection depth (Liu, 1992). Through two-sample simulations using a strategic block permutation algorithm, we demonstrate that our sum and product statistics exhibit superior power and compare favourably with popular methods in the literature. Our tests are further validated through analysis of Raman spectral data acquired from cellular and tissue samples, highlighting the effective discrimination between healthy and cancerous samples.
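The underlying Q statistic of Liu and Singh (1993) estimates P{D(X; F) ≤ D(Y; F)}: for each point of the second sample, the fraction of first-sample points with no larger depth, averaged over the second sample. A minimal sketch with Mahalanobis depth (this illustrates the classical Q index only, not the paper's sum and product statistics):

```python
import numpy as np

def mahalanobis_depth(points, sample):
    """Mahalanobis depth of each row of `points` with respect to `sample`:
    MD(y) = 1 / (1 + (y - mu)' S^{-1} (y - mu))."""
    mu = sample.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d = points - mu
    return 1.0 / (1.0 + np.einsum("ij,jk,ik->i", d, S_inv, d))

def q_index(x, y):
    """Liu-Singh Q statistic: average, over each point of y, of the fraction
    of x-points whose depth (w.r.t. x) is no larger than that point's."""
    dx = mahalanobis_depth(x, x)
    dy = mahalanobis_depth(y, x)
    return float(np.mean([np.mean(dx <= d) for d in dy]))

rng = np.random.default_rng(2)
x = rng.standard_normal((300, 2))
y = rng.standard_normal((300, 2))  # same distribution: Q should be near 1/2
```

Under homogeneity Q concentrates around 1/2; location or scale shifts in the second sample pull it away from 1/2, which is what the tests exploit.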
We show that Pfaffians, or contiguity relations, of hypergeometric functions of several variables give a direct sampling algorithm for toric models in statistics, which is a Markov chain on a lattice generated by a matrix $A$. A correspondence between graphical models and $A$-hypergeometric systems is discussed, and we give a sum formula for special values of $A$-hypergeometric polynomials. Some hypergeometric series of interest for statistics are also presented.
A partially identified model, in which the parameters cannot be uniquely identified, often arises in statistical analysis. Researchers frequently use Bayesian inference to analyze such models, but when an off-the-shelf MCMC sampling algorithm is applied to a partially identified model, the computational performance can be poor. Importance sampling with a transparent reparameterization (TP) is one known remedy. This method is preferable because the model is known to be rendered identified with respect to the new parameterization and, at the same time, it may allow faster, i.i.d. Monte Carlo sampling through conjugate convenience priors. In this paper, we explain the importance sampling method with a TP and with a pseudo-TP. We introduce the pseudo-TP, an alternative to the TP, since finding a TP is sometimes difficult. We then test the methods' performance in several scenarios and compare it to that of an off-the-shelf MCMC method, Gibbs sampling, applied in the original parameterization. While importance sampling with TP (ISTP) generally outperforms the off-the-shelf MCMC methods, as seen in compute time and trace plots, finding the TP the method requires may not be easy. The pseudo-TP method, on the other hand, shows mixed results and room for improvement, since it relies on an approximation that may not be adequate for a given model and dataset.
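The ISTP recipe can be sketched on a textbook partially identified model: $y_i \sim N(\theta_1 + \theta_2, 1)$, where only the sum $\mu = \theta_1 + \theta_2$ is identified. Sample $\mu$ i.i.d. from its conjugate posterior under a convenience prior, reweight by the ratio of the true induced prior to the convenience prior, and draw the non-identified direction from its conditional prior. This toy example (all priors and data are made up) is only an illustration of the mechanism, not the paper's models:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Toy partially identified model: y_i ~ N(theta1 + theta2, 1) with independent
# N(0, 1) priors on theta1 and theta2.  Only mu = theta1 + theta2 is
# identified, and mu has induced prior N(0, 2).
y = rng.normal(1.0, 1.0, size=100)
n, ybar = y.size, y.mean()

# Sample mu i.i.d. from its conjugate posterior under a convenience prior
# N(0, 1), then importance-reweight by true-prior / convenience-prior.
post_var = 1.0 / (n + 1.0)
post_mean = n * ybar * post_var
mu = rng.normal(post_mean, np.sqrt(post_var), size=5000)
w = norm.pdf(mu, 0.0, np.sqrt(2.0)) / norm.pdf(mu, 0.0, 1.0)
w /= w.sum()

# Non-identified direction: theta1 | mu ~ N(mu / 2, 1/2) (conditional prior,
# since the likelihood depends on theta1 only through mu).
theta1 = rng.normal(mu / 2.0, np.sqrt(0.5))

post_mean_mu = float(np.sum(w * mu))  # importance-sampling posterior mean of mu
```

Because every draw is independent, there are no mixing issues of the kind that plague a Gibbs sampler run in the original, non-identified parameterization.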
Goodness-of-fit tests for the distribution of the composed error term in a Stochastic Frontier Model (SFM) are suggested. The focus is on the case of a normal/gamma SFM and the heavy-tailed stable/gamma SFM. In the first case the moment generating function is used as a tool, while in the latter case the characteristic function of the error term is employed. In both cases our test statistics are formulated as weighted integrals of properly standardized data. The new normal/gamma test is consistent and is shown to have an intrinsic relation to moment-based tests. The finite-sample behavior of resampling versions of both tests is investigated by Monte Carlo simulation, and several real-data applications are also included.
This paper addresses the convergence analysis of a variant of the Levenberg-Marquardt method (LMM) designed for nonlinear least-squares problems with non-zero residual. This variant, called LMM with Singular Scaling (LMMSS), allows the LMM scaling matrix to be singular, encompassing a broader class of regularizers that has proven useful in certain applications. To establish local convergence results, the LMM parameter is chosen carefully based on the gradient linearization error, which is dictated by the nonlinearity and size of the residual. Under completeness and local error bound assumptions, we prove that the distance from an iterate to the set of stationary points goes to zero superlinearly and that the iterative sequence converges. Furthermore, we also study a globalized version of the method obtained using linesearch and prove that any limit point of the generated sequence is stationary. Examples are provided to illustrate our theoretical results.
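The basic iteration behind LMMSS can be sketched as solving $(J^\top J + \mu D^\top D)\,d = -J^\top r$ with a scaling matrix $D$ that is allowed to be singular. The following toy sketch (a zero-residual two-variable problem, a simple residual-based choice of $\mu$, and a diagonal singular $D$, all made up for illustration) exercises the step; the paper's actual parameter rule and analysis are more refined:

```python
import numpy as np

def lmmss_step(r, J, x, mu, D):
    """One Levenberg-Marquardt step with a (possibly singular) scaling
    matrix D: solve (J'J + mu * D'D) d = -J'r and return x + d."""
    A = J.T @ J + mu * (D.T @ D)
    return x + np.linalg.solve(A, -J.T @ r)

# Toy nonlinear least-squares problem: r(x) = (x0^2 + x1 - 2, x0 - 1),
# whose solution is (1, 1).
def residual(x):
    return np.array([x[0] ** 2 + x[1] - 2.0, x[0] - 1.0])

def jacobian(x):
    return np.array([[2.0 * x[0], 1.0], [1.0, 0.0]])

D = np.diag([1.0, 0.0])  # singular scaling: regularize the x0 direction only
x = np.array([3.0, 3.0])
for _ in range(30):
    r, J = residual(x), jacobian(x)
    x = lmmss_step(r, J, x, mu=0.1 * np.linalg.norm(r), D=D)
```

Here $\mu$ shrinks with the residual norm, so the step approaches a Gauss-Newton step near the solution, which is the regime in which the superlinear local rates are established.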
Stateful Coverage-Based Greybox Fuzzing (SCGF) is considered the state-of-the-art method for network protocol greybox fuzzing. During protocol fuzzing, SCGF constructs the state machine of the target protocol by identifying protocol states. Optimal states are selected for fuzzing using heuristic methods, along with corresponding seeds and mutation regions, to conduct fuzz testing effectively. Nevertheless, existing SCGF methodologies prioritise the selection of protocol states without considering the correspondence between program basic-block coverage information and protocol states. To address this gap, this paper proposes a state-map-based reverse state selection method for SCGF. This approach prioritises the coverage information of fuzzing seeds and delves deeper into the correspondence between the basic-block coverage information of the program and the protocol state, with the objective of improving bitmap coverage. The state map is employed to simplify the state machine representation. Furthermore, the design of different types of states enables an optimised method of constructing message sequences; the resulting reduction in the length of message sequences further improves the efficiency of test case execution. By optimising SCGF, we developed SMGFuzz and conducted experiments on Profuzzbench to assess its testing efficiency. The results indicate that, compared to AFLNet, SMGFuzz achieved an average increase of 12.48% in edge coverage, a 50.1% increase in unique crashes, and a 40.2% increase in test case execution speed over a period of 24 hours.
The broad class of multivariate unified skew-normal (SUN) distributions has been recently shown to possess important conjugacy properties. When used as priors for the coefficients vector in probit, tobit, and multinomial probit models, these distributions yield posteriors that still belong to the SUN family. Although this result has led to important advancements in Bayesian inference and computation, its applicability beyond likelihoods associated with fully-observed, discretized, or censored realizations from multivariate Gaussian models remains unexplored. This article fills this gap by proving that the wider family of multivariate unified skew-elliptical (SUE) distributions, which extends SUNs to more general perturbations of elliptical densities, guarantees conjugacy for broader classes of models, beyond those relying on fully-observed, discretized, or censored Gaussians. This result leverages the closure of the SUE family under linear combinations, conditioning, and marginalization to prove that this family is conjugate to the likelihood induced by multivariate regression models for fully-observed, censored, or dichotomized realizations from skew-elliptical distributions. This advancement enlarges the set of models that enable conjugate Bayesian inference to general formulations arising from elliptical and skew-elliptical families, including the multivariate Student's t and skew-t, among others.
Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We use Lipschitz predictors and determine fixed-time and any-time PAC-Bayesian bounds in the batch learning setting. Highlights of our methodology are its capacity for causal forecasting and its potential application to data with spatial and temporal short- and long-range dependence. We then test the performance of our learning methodology on data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process, using linear predictors.
We propose a new second-order accurate lattice Boltzmann formulation for linear elastodynamics that is stable for arbitrary combinations of material parameters under a CFL-like condition. The construction of the numerical scheme uses an equivalent first-order hyperbolic system of equations as an intermediate step, for which a vectorial lattice Boltzmann formulation is introduced. The only difference from conventional lattice Boltzmann formulations is the use of vector-valued populations, so that all computational benefits of the algorithm are preserved. Using the asymptotic expansion technique and the notion of pre-stability structures, we further establish second-order consistency as well as analytical stability estimates. Lastly, we introduce a second-order consistent initialization of the populations as well as a boundary formulation for Dirichlet boundary conditions on 2D rectangular domains. All theoretical derivations are numerically verified by convergence studies using manufactured solutions and long-term stability tests.