We investigate the multiplicity model with m values of some test statistic independently drawn from a mixture of no effect (null) and positive effect (alternative), where we seek to identify the alternative test results with a controlled error rate. We are interested in the case where the alternatives are rare. A number of multiple testing procedures filter the set of ordered p-values in order to eliminate the nulls. Such an approach can only work if the p-values originating from the alternatives form one or several identifiable clusters. The Benjamini and Hochberg (BH) method, for example, assumes that this cluster occurs in a small interval $(0,\Delta)$ and filters out all or most of the ordered p-values $p_{(r)}$ above a linear threshold $s \times r$. In repeated applications this filter controls the false discovery rate via the slope s. We propose a new adaptive filter that deletes the p-values from regions of uniform distribution. In cases where a single cluster remains, the p-values in an interval are declared alternatives, with the mid-point and the length of the interval chosen by controlling the data-dependent FDR at a desired level.
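The linear-threshold filter attributed to BH above can be illustrated with a minimal sketch of the standard step-up procedure. It assumes the slope is $s = \alpha/m$ for a target level $\alpha$. The function name `benjamini_hochberg` and the example p-values are illustrative only, and the sketch does not implement the adaptive filter proposed in the abstract.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Standard BH step-up rule: return a boolean mask of rejected hypotheses."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                 # indices that sort the p-values
    sorted_p = p[order]                   # ordered p-values p_(1) <= ... <= p_(m)
    # Linear threshold s * r with slope s = alpha / m (assumed choice of slope).
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if below.size > 0:
        k = below[-1]                     # largest rank r with p_(r) <= s * r
        rejected[order[:k + 1]] = True    # reject everything up to that rank
    return rejected

# Made-up p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_mask = benjamini_hochberg(example_p, alpha=0.05)
```

Every ordered p-value above the last crossing of the line $s \times r$ is filtered out, which is the behaviour the abstract describes.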
We propose a new Riemannian gradient descent method for computing spherical area-preserving mappings of topological spheres using a Riemannian retraction-based framework with theoretically guaranteed convergence. The objective function is based on the stretch energy functional, and the minimization is constrained on a power manifold of unit spheres embedded in 3-dimensional Euclidean space. Numerical experiments on several mesh models demonstrate the accuracy and stability of the proposed framework. Comparisons with two existing state-of-the-art methods for computing area-preserving mappings demonstrate that our algorithm is both competitive and more efficient. Finally, we present a concrete application to the problem of landmark-aligned surface registration of two brain models.
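To make the retraction-based iteration concrete, here is a minimal sketch of Riemannian gradient descent on a single unit sphere in 3-dimensional Euclidean space. It does not use the stretch energy or a power manifold. The objective is a toy quadratic form $x^\top A x$ chosen purely for illustration, and the function name, step size and iteration count are assumptions.

```python
import numpy as np

def riemannian_gd_sphere(A, x0, step=0.1, iters=500):
    """Minimize the toy objective f(x) = x^T A x over the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                   # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x       # project onto the tangent space at x
        y = x - step * rgrad                  # step along the negative Riemannian gradient
        x = y / np.linalg.norm(y)             # retraction: normalize back onto the sphere
    return x

A_toy = np.diag([1.0, 2.0, 3.0])
x_min = riemannian_gd_sphere(A_toy, np.array([1.0, 1.0, 1.0]))
```

For this toy problem the minimizer is a unit eigenvector of the smallest eigenvalue of `A_toy`. On a power manifold of spheres, the same projection and normalization would be applied to each vertex independently.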
We establish an invariance principle for polynomial functions of $n$ independent high-dimensional random vectors, and also show that the obtained rates are nearly optimal. Both the dimension of the vectors and the degree of the polynomial are permitted to grow with $n$. Specifically, we obtain a finite sample upper bound for the error of approximation by a polynomial of Gaussians, measured in Kolmogorov distance, and extend it to functions that are approximately polynomial in a mean squared error sense. We give a corresponding lower bound that shows the invariance principle holds up to polynomial degree $o(\log n)$. The proof is constructive and adapts an asymmetrisation argument due to V. V. Senatov. As applications, we obtain a higher-order delta method with possibly non-Gaussian limits, and generalise a number of known results on high-dimensional and infinite-order U-statistics, and on fluctuations of subgraph counts.
The Fisher-Rao distance between two probability distributions of a statistical model is defined as the Riemannian geodesic distance induced by the Fisher information metric. In order to calculate the Fisher-Rao distance in closed form, we need (1) to elicit a formula for the Fisher-Rao geodesics, and (2) to integrate the Fisher length element along those geodesics. We consider several numerically robust approximation and bounding techniques for the Fisher-Rao distances: First, we report generic upper bounds on Fisher-Rao distances based on closed-form 1D Fisher-Rao distances of submodels. Second, we describe several generic approximation schemes depending on whether the Fisher-Rao geodesics or pregeodesics are available in closed form or not. In particular, we obtain a generic method to guarantee an arbitrarily small additive error on the approximation provided that Fisher-Rao pregeodesics and tight lower and upper bounds are available. Third, we consider the case of Fisher metrics being Hessian metrics, and report generic tight upper bounds on the Fisher-Rao distances using techniques of information geometry. Uniparametric and biparametric statistical models always have Fisher Hessian metrics, and in general a simple test allows one to check whether the Fisher information matrix yields a Hessian metric or not. Fourth, we consider elliptical distribution families and show how to apply the above techniques to these models. We also propose two new distances based either on the Fisher-Rao lengths of curves serving as proxies of Fisher-Rao geodesics, or on the Birkhoff/Hilbert projective cone distance. Last, we consider an alternative group-theoretic approach for statistical transformation models based on the notion of maximal invariant, which yields insights into the structure of the Fisher-Rao distance formula that may be used fruitfully in applications.
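As a small illustration of using curve lengths as proxies, the sketch below integrates the Fisher length element along a straight segment in the $(\mu, \sigma)$ parameterization of the univariate normal family, where the Fisher metric is $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$ and a closed-form distance is known. The choice of family, the midpoint discretization and all function names are assumptions made for this example. Since the length of any curve joining two points is at least the geodesic distance, the result is an upper bound.

```python
import numpy as np

def fisher_matrix_normal(mu, sigma):
    """Fisher information matrix of N(mu, sigma^2) in the (mu, sigma) coordinates."""
    return np.array([[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]])

def segment_length(theta_a, theta_b, n=1000):
    """Fisher-Rao length of the straight segment from theta_a to theta_b (midpoint rule)."""
    a = np.asarray(theta_a, dtype=float)
    b = np.asarray(theta_b, dtype=float)
    ts = np.linspace(0.0, 1.0, n + 1)
    pts = a + ts[:, None] * (b - a)
    total = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        mid = 0.5 * (p + q)
        d = q - p
        total += np.sqrt(d @ fisher_matrix_normal(mid[0], mid[1]) @ d)
    return total

def fisher_rao_normal(theta_a, theta_b):
    """Closed-form Fisher-Rao distance between two univariate normal distributions."""
    (m1, s1), (m2, s2) = theta_a, theta_b
    delta = ((m1 - m2) ** 2 + 2.0 * (s1 - s2) ** 2) / (4.0 * s1 * s2)
    return np.sqrt(2.0) * np.arccosh(1.0 + delta)

upper = segment_length((0.0, 1.0), (2.0, 1.0))      # length of a non-geodesic curve
exact = fisher_rao_normal((0.0, 1.0), (2.0, 1.0))   # geodesic distance
```

When the two means coincide, the straight segment is itself a geodesic and the two quantities agree up to discretization error. Otherwise the segment length strictly overestimates the distance.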
In many circumstances, given an ordered sequence of one or more types of elements/symbols, the objective is to determine whether one of the elements, say the type 1 element, occurs at random. Such a method can be useful in determining the existence of any non-random pattern in the wins or losses of a player in a series of games played. Existing tests based on the total number of runs or on the length of the longest run (Mosteller (1941)) can be used for testing the null hypothesis of randomness in the entire sequence, but not for a specific type of element. Additionally, the Runs Test tends to show results that contradict the intuition suggested by graphs of, say, win proportions over time, owing to the method used to compute runs. This paper develops a test approach to address this problem by computing the gaps between two consecutive type 1 elements and thereafter, following the idea of "pattern" in occurrence and "directional" trend (increasing, decreasing or constant), employs the exact Binomial test, Kendall's Tau and the Siegel-Tukey test for the scale problem. Further modifications suggested by Jan Vegelius (1982) have been applied in the Siegel-Tukey test to adjust for tied ranks and achieve more accurate results. This approach is distribution-free and suitable for small sample sizes. Comparisons with the conventional runs test also show the superiority of the proposed approach under the null hypothesis of randomness in the occurrence of type 1 elements.
Regression models that incorporate smooth functions of predictor variables to explain the relationships with a response variable have gained widespread usage and proved successful in various applications. By incorporating smooth functions of predictor variables, these models can capture complex relationships between the response and predictors while still allowing for interpretation of the results. In situations where the relationships between a response variable and predictors are explored, it is not uncommon to assume that these relationships adhere to certain shape constraints. Examples of such constraints include monotonicity and convexity. The scam package for R has become a popular package to carry out the full fitting of exponential family generalized additive modelling with shape restrictions on smooths. The paper aims to extend the existing framework of shape-constrained generalized additive models (SCAM) to accommodate smooth interactions of covariates, linear functionals of shape-constrained smooths and incorporation of residual autocorrelation. The methods described in this paper are implemented in the recent version of the package scam, available on the Comprehensive R Archive Network (CRAN).
The increasing reliance on numerical methods for controlling dynamical systems and training machine learning models underscores the need to devise algorithms that dependably and efficiently navigate complex optimization landscapes. Classical gradient descent methods offer strong theoretical guarantees for convex problems; however, they demand meticulous hyperparameter tuning for non-convex ones. The emerging paradigm of learning to optimize (L2O) automates the discovery of algorithms with optimized performance by leveraging learning models and data; yet it lacks a theoretical framework to analyze the convergence and robustness of the learned algorithms. In this paper, we fill this gap by harnessing nonlinear system theory. Specifically, we propose an unconstrained parametrization of all convergent algorithms for smooth non-convex objective functions. Notably, our framework is directly compatible with automatic differentiation tools, ensuring convergence by design while learning to optimize.
Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor type I error control) and additive noise models are not suitable for applications in which the response is not measured on a continuous scale, but reflects categories or counts. Here, we develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes, generally, do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose TRAM-GCM based on the expected conditional covariance between environments and score residuals with uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we also consider TRAM-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
In logistic regression modeling, Firth's modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth's modified estimator can be formulated as a penalized maximum likelihood estimator in which Jeffreys' prior is adopted as the penalty term. Despite its widespread use in practice, the formal verification of the corresponding estimate's existence has not been established. In this study, we establish the existence theorem of Firth's modified estimate in binomial logistic regression models, assuming only the full column rankness of the design matrix. We also discuss other binomial regression models obtained through alternating link functions and prove the existence of similar penalized maximum likelihood estimates for such models.
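The penalized formulation can be written as $\ell^*(\beta) = \ell(\beta) + \tfrac{1}{2}\log\det(X^\top W X)$ with $W = \mathrm{diag}\{\pi_i(1-\pi_i)\}$ for binary responses. The sketch below evaluates it on a small, made-up, completely separated data set. The data and the function names are hypothetical and serve only to show the contrast with the ordinary log-likelihood. This is an illustration, not the existence proof.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli logistic log-likelihood, computed stably with logaddexp."""
    eta = X @ beta
    sign = 2.0 * y - 1.0                      # +1 for y = 1, -1 for y = 0
    return -np.sum(np.logaddexp(0.0, -sign * eta))

def firth_penalized_log_likelihood(beta, X, y):
    """Log-likelihood plus the Jeffreys-prior penalty 0.5 * log det(X^T W X)."""
    eta = X @ beta
    e = np.exp(-np.abs(eta))
    w = e / (1.0 + e) ** 2                    # pi * (1 - pi), stable for large |eta|
    info = X.T @ (w[:, None] * X)             # Fisher information X^T W X
    _, logdet = np.linalg.slogdet(info)
    return log_likelihood(beta, X, y) + 0.5 * logdet

# Hypothetical separated data: y = 1 exactly when the covariate is positive.
X_sep = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_sep = np.array([0.0, 0.0, 1.0, 1.0])
small, large = np.array([0.0, 1.0]), np.array([0.0, 50.0])
```

On these data the ordinary log-likelihood keeps increasing as the slope grows, so no maximizer exists, whereas the penalized version is lower at the larger slope because the penalty term decreases without bound.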
This study introduces a reduced-order model (ROM) for analyzing the transient diffusion-deformation of hydrogels. The full-order model (FOM) describing hydrogel transient behavior consists of a coupled system of partial differential equations in which chemical potential and displacements are coupled. This system is formulated in a monolithic fashion and solved using the Finite Element Method (FEM). The ROM employs proper orthogonal decomposition as a model order reduction approach. We test the ROM performance through benchmark tests on hydrogel swelling behavior and a case study simulating co-axial printing. Finally, we embed the ROM into an optimization problem to identify the model material parameters of the coupled problem using full-field data. We verify that the ROM can predict hydrogels' diffusion-deformation evolution and material properties, significantly reducing computation time compared to the FOM. The results demonstrate the ROM's accuracy and computational efficiency. This work paves the way towards advanced practical applications of ROMs, e.g., in the context of feedback error control in hydrogel 3D printing.
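Proper orthogonal decomposition is commonly computed from a singular value decomposition of a snapshot matrix. The sketch below shows that generic construction on synthetic data. The energy-based truncation rule, the function name `pod_basis` and the snapshot data are assumptions for illustration and do not come from the hydrogel model.

```python
import numpy as np

def pod_basis(snapshots, energy=0.999999):
    """Return the leading POD modes capturing the requested energy fraction."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of modes whose cumulative energy reaches the target.
    r = min(int(np.searchsorted(cumulative, energy)) + 1, s.size)
    return U[:, :r], s

# Synthetic snapshots: columns are states at successive times (rank 2 by construction).
x = np.linspace(0.0, 1.0, 50)
t = np.linspace(0.0, 1.0, 20)
S = np.outer(np.sin(np.pi * x), np.exp(-t)) + 0.5 * np.outer(np.sin(2 * np.pi * x), np.exp(-4 * t))

Phi, singular_values = pod_basis(S)
S_reduced = Phi.T @ S            # reduced coordinates of each snapshot
S_reconstructed = Phi @ S_reduced
```

In a projection-based ROM, the full-order operators would then be projected onto the columns of `Phi`, so that the time-stepping works with a handful of unknowns instead of the full finite element system.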
In logistic regression modeling, Firth's modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth's modified estimator can be formulated as a penalized maximum likelihood estimator in which Jeffreys' prior is adopted as the penalty term. Despite its widespread use in practice, the formal verification of the corresponding estimate's existence has not been established. In this study, we establish the existence theorem of Firth's modified estimate in binomial logistic regression models, assuming only the full column rankness of the design matrix. We also discuss multinomial logistic regression models. Unlike the binomial regression case, we show through an example that the Jeffreys-prior penalty term does not necessarily diverge to negative infinity as the parameter diverges.