This paper proposes a new generalized linear model with the fractional binomial distribution. Zero-inflated Poisson and negative binomial distributions are used for count data with many zeros, and the corresponding zero-inflated Poisson and negative binomial regression models are widely used to analyze the association of such a count variable with covariates. In this work, we develop a regression model with the fractional binomial distribution that can serve as an additional tool for modeling a count response variable with covariates. Data analysis results show that, on some occasions, our model outperforms the existing zero-inflated regression models.
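As a point of reference for the comparison mentioned above, the following sketch fits the standard zero-inflated Poisson regression baseline with statsmodels on simulated zero-inflated counts; the fractional binomial GLM itself is not available in standard libraries, so only the comparator model is shown.

```python
# Baseline zero-inflated Poisson regression (one of the comparators discussed above),
# illustrated with statsmodels on simulated data; the fractional binomial GLM is not
# part of standard libraries, so it is not sketched here.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

# Simulate zero-inflated counts: a point mass at zero mixed with a Poisson regression.
p_zero = 0.3                              # probability of a structural zero
mu = np.exp(0.5 + 0.8 * x)                # Poisson mean linked to the covariate
y = np.where(rng.random(n) < p_zero, 0, rng.poisson(mu))

# Fit the zero-inflated Poisson model; exog_infl gives an intercept-only inflation part.
model = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1)), inflation='logit')
result = model.fit(maxiter=200, disp=False)
print(result.summary())
```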
Generative diffusion models apply the concept of Langevin dynamics in physics to machine learning, attracting considerable interest from engineering, statistics, and physics, yet a complete picture of their inherent mechanisms is still lacking. In this paper, we provide a transparent physics analysis of diffusion models, formulating the fluctuation theorem, entropy production, equilibrium measure, and Franz-Parisi potential to understand the dynamic process and intrinsic phase transitions. Our analysis is rooted in a path-integral representation of both the forward and backward dynamics, and in treating the reverse diffusion generative process as statistical inference, where the time-dependent state variables serve as quenched disorder akin to that in spin glass theory. Our study thus links stochastic thermodynamics, statistical inference, and geometry-based analysis into a coherent picture of how generative diffusion models work.
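To make the forward/backward picture concrete, here is a minimal toy sketch of the two diffusions for one-dimensional Gaussian data, where the score of the time-marginal is available in closed form; it illustrates the reverse generative dynamics started from the equilibrium measure, not the paper's path-integral or spin-glass analysis.

```python
# Toy illustration of the forward (noising) and reverse (generative) diffusions,
# using 1-D Gaussian data so that the score of p_t is known exactly.
import numpy as np

rng = np.random.default_rng(1)
m0, s0 = 2.0, 0.5          # data distribution: N(m0, s0^2)
T, n_steps, n_samples = 3.0, 600, 5000
dt = T / n_steps

def marginal(t):
    # Forward Ornstein-Uhlenbeck dx = -x dt + sqrt(2) dW started from N(m0, s0^2):
    # the marginal p_t stays Gaussian with the mean/variance below.
    mean = m0 * np.exp(-t)
    var = 1.0 + (s0**2 - 1.0) * np.exp(-2.0 * t)
    return mean, var

def score(x, t):
    mean, var = marginal(t)
    return -(x - mean) / var          # exact score d/dx log p_t(x)

# Reverse-time dynamics: start from (approximately) the equilibrium measure N(0, 1)
# and integrate the reverse SDE dx = [-f(x) + g^2 * score] dtau + g dW back to t = 0.
x = rng.normal(size=n_samples)
for k in range(n_steps):
    t = T - k * dt
    drift = x + 2.0 * score(x, t)     # -f(x) + g^2 * score with f(x) = -x, g = sqrt(2)
    x = x + drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=n_samples)

print("generated mean/std:", x.mean(), x.std(), "target:", m0, s0)
```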
Many partial differential equations (PDEs), such as the Navier--Stokes equations in fluid mechanics, inelastic deformation in solids, and transient parabolic and hyperbolic equations, do not have an exact, primal variational structure. Recently, a variational principle based on the dual (Lagrange multiplier) field was proposed. The essential idea in this approach is to treat the given PDE as constraints and to invoke an arbitrarily chosen auxiliary potential with strong convexity properties to be optimized. This leads to a convex dual functional that is minimized subject to Dirichlet boundary conditions on the dual variables, with the guarantee that even PDEs that do not possess a variational structure in primal form can be solved via a variational principle. The vanishing of the first variation of the dual functional is, up to Dirichlet boundary conditions on the dual fields, the weak form of the primal PDE problem with the dual-to-primal change of variables incorporated. We derive the dual weak form for the linear, one-dimensional, transient convection-diffusion equation. A Galerkin discretization is used to obtain the discrete equations, with the trial and test functions chosen as linear combinations of either RePU activation functions (a shallow neural network) or B-spline basis functions; the corresponding stiffness matrix is symmetric. For transient problems, a space-time Galerkin implementation is used with tensor-product B-splines as approximating functions. Numerical results are presented for the steady-state and transient convection-diffusion equations and for transient heat conduction. The proposed method delivers sound accuracy for ODEs and PDEs, and rates of convergence are established in the $L^2$ norm and $H^1$ seminorm for the steady-state convection-diffusion problem.
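For orientation, the sketch below solves the same steady, one-dimensional convection-diffusion model problem with a conventional primal Galerkin finite element discretization and measures a discrete $L^2$ error against the exact solution; the paper's dual weak form and its RePU/B-spline bases are not reproduced here.

```python
# A minimal primal Galerkin solver for -nu u'' + a u' = 1 on (0,1), u(0) = u(1) = 0,
# with linear (hat) elements, used only to illustrate the kind of L2-error /
# convergence study reported above.
import numpy as np

def solve_cd(n_el, nu=0.05, a=1.0):
    h = 1.0 / n_el
    n_nodes = n_el + 1
    K = np.zeros((n_nodes, n_nodes))
    F = np.zeros(n_nodes)
    # element matrix: diffusion + convection contributions for linear basis functions
    Ke = nu / h * np.array([[1.0, -1.0], [-1.0, 1.0]]) \
         + a / 2.0 * np.array([[-1.0, 1.0], [-1.0, 1.0]])
    Fe = h / 2.0 * np.array([1.0, 1.0])            # source term f = 1
    for e in range(n_el):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += Ke
        F[idx] += Fe
    # homogeneous Dirichlet boundary conditions at both ends
    u = np.zeros(n_nodes)
    u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], F[1:-1])
    return u

def exact(x, nu=0.05, a=1.0):
    return (x - (np.exp(a * x / nu) - 1.0) / (np.exp(a / nu) - 1.0)) / a

for n_el in (16, 32, 64, 128):
    x = np.linspace(0.0, 1.0, n_el + 1)
    d = solve_cd(n_el) - exact(x)
    h = 1.0 / n_el
    err = np.sqrt(h * np.sum((d[:-1] ** 2 + d[1:] ** 2) / 2.0))  # trapezoidal L2 error
    print(f"n_el={n_el:4d}  L2 error ~ {err:.3e}")
```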
We consider the problem of quantifying how an input perturbation impacts the outputs of large language models (LLMs), a fundamental task for model reliability and post-hoc interpretability. A key obstacle in this domain is disentangling meaningful changes in model responses from the intrinsic stochasticity of LLM outputs. To overcome this, we introduce Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. DBPA constructs empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling. Comparing Monte Carlo estimates in this reduced-dimensionality space enables tractable frequentist inference without relying on restrictive distributional assumptions. The framework is model-agnostic, supports the evaluation of arbitrary input perturbations on any black-box LLM, yields interpretable p-values, supports multiple perturbation testing with controlled error rates, and provides scalar effect sizes for any chosen similarity or distance metric. We demonstrate the effectiveness of DBPA in evaluating perturbation impacts, showing its versatility for perturbation analysis.
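A minimal sketch of the testing mechanics described above, with the LLM sampling and embedding steps replaced by a placeholder that returns points in a low-dimensional similarity space; the effect size and the permutation null are illustrative choices, not necessarily those used in DBPA.

```python
# Sketch: compare output distributions for an original vs. perturbed input via a
# permutation test on embedded samples. `embed_outputs` is a stand-in for the actual
# LLM sampling + sentence-embedding pipeline.
import numpy as np

rng = np.random.default_rng(0)

def embed_outputs(shift, n=200, dim=5):
    # Placeholder: pretend each sampled LLM response is already embedded into a
    # low-dimensional similarity space; `shift` mimics the perturbation effect.
    return rng.normal(loc=shift, scale=1.0, size=(n, dim))

def energy_statistic(a, b):
    # Scalar effect size: difference of mean cross- and within-sample distances.
    d = lambda u, v: np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)
    return 2 * d(a, b).mean() - d(a, a).mean() - d(b, b).mean()

null_samples = embed_outputs(shift=0.0)       # outputs for the original input
alt_samples = embed_outputs(shift=0.3)        # outputs for the perturbed input

obs = energy_statistic(null_samples, alt_samples)
pooled = np.vstack([null_samples, alt_samples])
n = len(null_samples)
perm_stats = []
for _ in range(999):                          # Monte Carlo permutation null
    idx = rng.permutation(len(pooled))
    perm_stats.append(energy_statistic(pooled[idx[:n]], pooled[idx[n:]]))
p_value = (1 + np.sum(np.array(perm_stats) >= obs)) / (1 + len(perm_stats))
print(f"effect size = {obs:.3f}, p-value = {p_value:.3f}")
```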
A time-varying bivariate copula joint model, which models the repeatedly measured longitudinal outcome at each time point and the survival data jointly through both random effects and time-varying bivariate copulas, is proposed in this paper. A regular joint model typically supposes that there exist subject-specific latent random effects or classes shared by the longitudinal and time-to-event processes and that the two processes are conditionally independent given these latent variables. Under this assumption, the joint likelihood of the two processes is straightforward to derive, and their association, as well as heterogeneity among the population, is naturally introduced by the unobservable latent variables. However, because of the unobservable nature of these latent variables, the conditional independence assumption is difficult to verify. Therefore, besides the random effects, a time-varying bivariate copula is introduced to account for the extra time-dependent association between the two processes. The proposed model includes a regular joint model as a special case under some copulas. Simulation studies indicate that the parameter estimators in the proposed model are robust against copula misspecification and that it has superior performance in predicting survival probabilities compared to the regular joint model. A real data application to the primary biliary cirrhosis (PBC) data is presented.
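The following sketch illustrates the copula ingredient only: a Gaussian copula with a time-varying dependence parameter ties a Gaussian longitudinal margin to a Weibull survival margin in one subject's likelihood contribution; the shared random effects and the handling of censoring in the full model are omitted, and all parameter values are made up.

```python
# Illustrative only: a time-varying Gaussian copula linking a longitudinal margin
# and a survival margin in a single likelihood contribution.
import numpy as np
from scipy.stats import norm, weibull_min

def gaussian_copula_logpdf(u, v, rho):
    z1, z2 = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1 - rho**2)
            + (2 * rho * z1 * z2 - rho**2 * (z1**2 + z2**2)) / (2 * (1 - rho**2)))

def joint_loglik_contribution(y, t_event, t_obs, beta, sigma, shape, scale, gamma):
    rho = np.tanh(gamma[0] + gamma[1] * t_obs)                    # time-varying dependence
    u = norm.cdf(y, loc=beta[0] + beta[1] * t_obs, scale=sigma)   # longitudinal margin
    v = weibull_min.cdf(t_event, c=shape, scale=scale)            # survival margin
    # joint log-density f(y, T) = log f_Y(y) + log f_T(T) + log c(F_Y(y), F_T(T))
    return (norm.logpdf(y, loc=beta[0] + beta[1] * t_obs, scale=sigma)
            + weibull_min.logpdf(t_event, c=shape, scale=scale)
            + gaussian_copula_logpdf(u, v, rho))

# Example: one subject measured at t = 1.5 with biomarker y = 0.8 and event time 4.2.
print(joint_loglik_contribution(0.8, 4.2, 1.5,
                                beta=(0.5, 0.1), sigma=0.4,
                                shape=1.3, scale=5.0, gamma=(0.2, 0.15)))
```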
While generalized linear mixed models are a fundamental tool in applied statistics, many specifications, such as those involving categorical factors with many levels or interaction terms, can be computationally challenging to estimate due to the need to compute or approximate high-dimensional integrals. Variational inference is a popular way to perform such computations, especially in the Bayesian context. However, naive use of such methods can provide unreliable uncertainty quantification. We show that this is indeed the case for mixed models, proving that standard mean-field variational inference dramatically underestimates posterior uncertainty in high dimensions. We then show how appropriately relaxing the mean-field assumption leads to methods whose uncertainty quantification does not deteriorate in high dimensions and whose total computational cost scales linearly with the number of parameters and observations. Our theoretical and numerical results focus on mixed models with Gaussian or binomial likelihoods, and rely on connections to random graph theory to obtain sharp high-dimensional asymptotic analysis. We also provide generic results, which are of independent interest, relating the accuracy of variational inference to the convergence rate of the corresponding coordinate ascent algorithm that is used to find it. Our proposed methodology is implemented in the R package vglmer, available at https://github.com/mgoplerud/vglmer. Numerical results with simulated and real data examples illustrate the favourable computation cost versus accuracy trade-off of our approach compared to various alternatives.
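The mean-field underestimation can be seen in a toy Gaussian random-intercept model with known variances, where both the mean-field coordinate ascent solution and the exact Gaussian posterior are available in closed form; the sketch below is illustrative and is not the paper's methodology or the vglmer implementation.

```python
# Toy illustration: mean-field CAVI for y_ij = mu + b_j + eps recovers the posterior
# means but underestimates the posterior variance of mu relative to the exact posterior.
import numpy as np

rng = np.random.default_rng(0)
J, n_j = 200, 5                        # many groups, few observations per group
sigma2, tau2, s0 = 1.0, 1.0, 100.0     # noise var, random-effect var, prior var of mu
groups = np.repeat(np.arange(J), n_j)
y = 1.0 + rng.normal(0, np.sqrt(tau2), J)[groups] + rng.normal(0, np.sqrt(sigma2), J * n_j)

# --- mean-field CAVI: q(mu) q(b_1) ... q(b_J), all Gaussian ---
m_mu, m_b = 0.0, np.zeros(J)
v_mu = 1.0 / (len(y) / sigma2 + 1.0 / s0)     # fixed by the mean-field factorization
v_b = 1.0 / (n_j / sigma2 + 1.0 / tau2)
group_sums = np.bincount(groups, weights=y)
for _ in range(100):                          # coordinate ascent iterations
    m_b = v_b * (group_sums - n_j * m_mu) / sigma2
    m_mu = v_mu * (y.sum() - n_j * m_b.sum()) / sigma2

# --- exact Gaussian posterior of (mu, b) for comparison ---
X = np.column_stack([np.ones(len(y)), (groups[:, None] == np.arange(J)).astype(float)])
prior_prec = np.diag([1.0 / s0] + [1.0 / tau2] * J)
post_cov = np.linalg.inv(X.T @ X / sigma2 + prior_prec)

print("mean-field var of mu :", v_mu)
print("exact posterior var  :", post_cov[0, 0])   # noticeably larger than v_mu
```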
The domain decomposition (DD) nonlinear-manifold reduced-order model (NM-ROM) represents a computationally efficient method for integrating underlying physics principles into a neural network-based, data-driven approach. Compared to linear subspace methods, NM-ROMs offer superior expressivity and enhanced reconstruction capabilities, while DD enables cost-effective, parallel training of autoencoders by partitioning the domain into algebraic subdomains. In this work, we investigate the scalability of this approach by implementing a "bottom-up" strategy: training NM-ROMs on smaller domains and subsequently deploying them on larger, composable ones. Applying this method to the two-dimensional, time-dependent Burgers' equation shows that extrapolating from smaller to larger domains is both stable and effective. The approach achieves a relative error of 1% while providing a speedup of nearly 700 times.
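As a stripped-down illustration of the nonlinear-manifold ingredient, the sketch below compresses snapshots of a moving one-dimensional pulse (a stand-in for Burgers' snapshots on a single subdomain) with a small autoencoder; the domain decomposition, subdomain coupling, and bottom-up training strategy described above are not reproduced.

```python
# Minimal autoencoder compression of solution snapshots onto a low-dimensional manifold.
import numpy as np
import torch
from torch import nn

x = np.linspace(0.0, 1.0, 128)
times = np.linspace(0.0, 0.5, 200)
snapshots = np.array([np.exp(-200 * (x - 0.2 - t) ** 2) for t in times])  # moving pulse
data = torch.tensor(snapshots, dtype=torch.float32)

latent = 3                                     # small latent dimension of the manifold
encoder = nn.Sequential(nn.Linear(128, 32), nn.Tanh(), nn.Linear(32, latent))
decoder = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, 128))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(2000):                      # plain full-batch reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(data)), data)
    loss.backward()
    opt.step()

with torch.no_grad():
    recon = decoder(encoder(data))
    rel_err = torch.norm(recon - data) / torch.norm(data)
print(f"relative reconstruction error: {rel_err.item():.3e}")
```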
Under a multinormal distribution with an arbitrary unknown covariance matrix, the main purpose of this paper is to propose a framework to reconcile the Bayesian and frequentist approaches, Fisher's reporting of $p$-values, Neyman-Pearson's optimal theory, and Wald's decision theory for the problems of testing the mean against restricted alternatives (closed convex cones). To proceed, tests constructed via the likelihood ratio (LR) and the union-intersection (UI) principles are studied. For the problems of testing against restricted alternatives, we first show that the LRT and the UIT are not proper Bayes tests; they are, however, shown to be the integrated LRT and the integrated UIT, respectively. For the problem of testing against the positive orthant space alternative, the null distributions of both the LRT and the UIT depend on the unknown nuisance covariance matrix, so Fisher's approach of reporting $p$-values is difficult to adopt. On the other hand, according to the definition of the level of significance, both the LRT and the UIT are shown to be power-dominated by the corresponding LRT and UIT for testing against the half-space alternative, respectively. Hence, both the LRT and the UIT are $\alpha$-inadmissible; these results run counter to common statistical sense. Neither Fisher's approach of reporting $p$-values alone nor Neyman-Pearson's optimal theory for the power function alone is a satisfactory criterion for evaluating the performance of tests. Wald's decision theory via $d$-admissibility may shed light on resolving these challenging issues of balancing type I error and power.
Several hypothesis testing methods have been proposed to validate the assumption of isotropy in spatial point patterns. A majority of these methods are characterised by an unknown distribution of the test statistic under the null hypothesis of isotropy. Parametric approaches to approximating the distribution involve simulation of patterns from a user-specified isotropic model. Alternatively, nonparametric replicates of the test statistic under isotropy can be used to waive the need for specifying a model. In this paper, we first develop a general framework which allows for the integration of a selected nonparametric replication method into isotropy testing. We then conduct a large simulation study comprising application-like scenarios to assess the performance of tests with different parametric and nonparametric replication methods. In particular, we explore distortions in test size and power caused by model misspecification, and demonstrate the advantages of nonparametric replication in such scenarios.
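For concreteness, the sketch below implements the parametric route in its simplest form: a pair-orientation test statistic is compared against replicates simulated from a user-specified isotropic model (complete spatial randomness); the nonparametric replication strategies studied in the paper would replace the simulation step.

```python
# Bare-bones Monte Carlo isotropy test with parametric (CSR) replicates.
import numpy as np

rng = np.random.default_rng(0)

def pair_orientation_stat(points, r_max=0.1):
    # Kolmogorov-type distance between the orientation distribution of close pairs
    # and the uniform distribution on [0, pi); under isotropy it should be small.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    mask = np.triu(dist < r_max, k=1)
    angles = np.mod(np.arctan2(diff[..., 1], diff[..., 0])[mask], np.pi)
    u = np.sort(angles) / np.pi
    return np.max(np.abs(u - np.arange(1, len(u) + 1) / len(u)))

def simulate_csr(n):
    return rng.uniform(size=(n, 2))          # unit-square binomial (CSR-like) pattern

observed = simulate_csr(200)                 # plug in a real point pattern here
t_obs = pair_orientation_stat(observed)
t_rep = [pair_orientation_stat(simulate_csr(len(observed))) for _ in range(199)]
p_value = (1 + sum(t >= t_obs for t in t_rep)) / (1 + len(t_rep))
print(f"test statistic = {t_obs:.3f}, Monte Carlo p-value = {p_value:.3f}")
```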
This paper investigates logical consequence defined in terms of probability distributions, for a classical propositional language using a standard notion of probability. We examine three distinct probabilistic consequence notions, which we call material consequence, preservation consequence, and symmetric consequence. While material consequence is fully classical for any threshold, preservation consequence and symmetric consequence are subclassical, with only symmetric consequence gradually approaching classical logic in the limit where the threshold equals 1. Our results extend earlier results obtained by J. Paris in a SET-FMLA setting to the SET-SET setting, and consider open thresholds besides closed ones. In the SET-SET setting, in particular, they reveal that probability-1 preservation does not yield classical logic but supervaluationism, and that, conversely, positive-probability preservation yields subvaluationism.
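Checking preservation consequence at a closed threshold can be phrased as a small linear program over probability distributions on valuations: the consequence holds iff the minimum, over distributions giving every premise probability at least $t$, of the largest conclusion probability is still at least $t$. The sketch below implements this for two atoms; the examples echo the subclassical behaviour described above.

```python
# Decision procedure for closed-threshold preservation consequence over two atoms,
# via linear programming over distributions on the four valuations.
import itertools
import numpy as np
from scipy.optimize import linprog

atoms = ("p", "q")
valuations = [dict(zip(atoms, bits))
              for bits in itertools.product([0, 1], repeat=len(atoms))]

def preserves(premises, conclusions, t):
    n = len(valuations)
    # variables: p_1..p_n (distribution over valuations) and s (bound on conclusions)
    c = np.zeros(n + 1)
    c[-1] = 1.0                                            # minimize s
    A_ub, b_ub = [], []
    for prem in premises:                                  # P(premise) >= t
        A_ub.append([-float(prem(v)) for v in valuations] + [0.0])
        b_ub.append(-t)
    for concl in conclusions:                              # P(conclusion) <= s
        A_ub.append([float(concl(v)) for v in valuations] + [-1.0])
        b_ub.append(0.0)
    A_eq, b_eq = [[1.0] * n + [0.0]], [1.0]                # probabilities sum to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * n + [(0, 1)])
    if res.status != 0:
        return True                                        # premises unsatisfiable at t
    return res.fun >= t - 1e-9                             # no counterexample exists

p = lambda v: v["p"]
q = lambda v: v["q"]
p_or_q = lambda v: v["p"] or v["q"]
p_and_q = lambda v: v["p"] and v["q"]

# {p, q} |= p&q survives at threshold 1 but fails at 0.6, while the SET-SET classical
# pattern {p or q} |= {p, q} already fails at threshold 1 (supervaluationist behaviour).
print(preserves([p, q], [p_and_q], t=1.0))    # True
print(preserves([p, q], [p_and_q], t=0.6))    # False
print(preserves([p_or_q], [p, q], t=1.0))     # False
```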
Let $N$ components be partitioned into two communities, denoted ${\cal P}_+$ and ${\cal P}_-$, possibly of different sizes. Assume that they are connected via a directed and weighted Erd\H{o}s-R\'enyi random graph (DWER) with unknown parameter $p \in (0, 1)$. The weights assigned to the existing connections are of mean-field type, scaling as $N^{-1}$. At each time unit, we observe the state of each component: either it sends a signal to its successors (in the directed graph) or it remains silent. In this paper, we show that it is possible to find the communities ${\cal P}_+$ and ${\cal P}_-$ based only on the activity of the $N$ components observed over $T$ time units. More specifically, we propose a simple algorithm for which the probability of {\it exact recovery} converges to $1$ as long as $(N/T^{1/2})\log(NT) \to 0$ as $T$ and $N$ diverge. Interestingly, this simple algorithm does not require any prior knowledge of the other model parameters (e.g., the edge probability $p$). The key step in our analysis is to derive an asymptotic approximation of the one-unit time-lagged covariance matrix associated with the states of the $N$ components as $N$ diverges. This asymptotic approximation relies on studying the behavior of the solutions of a Stein-type matrix equation satisfied by the simultaneous (0-lagged) covariance matrix of the states of the components. This study is challenging, especially because the simultaneous covariance matrix is random, since it depends on the underlying DWER random graph.
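The flavour of the recovery step can be illustrated with an invented toy dynamics in which the sign of a component's influence on its successors is determined by its community: the lag-one covariance matrix estimated from the observed activity then carries the community labels in its leading singular vector. This is only a guess at the mechanism for illustration, not the paper's algorithm or model.

```python
# Toy illustration: recover two communities from the lag-one covariance of observed
# binary activity. The dynamics and thresholds below are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, T, p, a = 60, 20000, 0.5, 4.0
labels = np.array([+1] * (N // 2) + [-1] * (N - N // 2))      # ground-truth communities
A = rng.random((N, N)) < p                                    # directed ER graph
W = (a / N) * A * labels[:, None]                             # W[j, i]: effect of j on i

X = np.zeros((T, N))
X[0] = rng.random(N) < 0.5
for t in range(T - 1):
    drive = 0.5 + X[t] @ W                                    # mean-field input to each unit
    X[t + 1] = rng.random(N) < np.clip(drive, 0.05, 0.95)

# lag-one cross-covariance between states at time t+1 (rows) and time t (columns)
Xc = X - X.mean(axis=0)
C1 = Xc[1:].T @ Xc[:-1] / (T - 1)

_, _, Vt = np.linalg.svd(C1)
estimate = np.sign(Vt[0])                                     # leading right singular vector
agreement = max(np.mean(estimate == labels), np.mean(estimate == -labels))
print(f"fraction of correctly recovered labels: {agreement:.3f}")
```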