Rejective sampling improves design and estimation efficiency of single-phase sampling when auxiliary information in a finite population is available. When such auxiliary information is unavailable, we propose to use two-phase rejective sampling (TPRS), which involves measuring auxiliary variables for the sample of units in the first phase, followed by the implementation of rejective sampling for the outcome in the second phase. We explore the asymptotic design properties of double expansion and regression estimators under TPRS. We show that TPRS enhances the efficiency of the double expansion estimator, rendering it comparable to a regression estimator. We further refine the design to accommodate varying importance of covariates and extend it to multi-phase sampling.
Gaussian elimination (GE) is the most used dense linear solver. Error analysis of GE with selected pivoting strategies on well-conditioned systems can focus on studying the behavior of growth factors. Although exponential growth is possible with GE with partial pivoting (GEPP), growth tends to stay much smaller in practice. Support for this behavior was provided recently by Huang and Tikhomirov's average-case analysis of GEPP, which showed GEPP growth factors for Gaussian matrices stay at most polynomial with very high probability. GE with complete pivoting (GECP) has also seen a lot of recent interest, with improvements to both lower and upper bounds on worst-case GECP growth provided by Bisain, Edelman and Urschel in 2023. We are interested in studying how GEPP and GECP behave on the same linear systems as well as studying large growth on particular subclasses of matrices, including orthogonal matrices. Moreover, as a means to better address the question of why large growth is rarely encountered, we further study matrices with a large difference in growth between using GEPP and GECP, and we explore how the smaller growth strategy dominates behavior in a small neighborhood of the initial matrix.
Numerical simulations of real-world phenomenon are implemented with at least two parts: the computational scheme and the computational domain. In the context of hemodynamics, the computational domain of a simulation represents the blood vessel network through which blood flows. Such blood vessel networks can contain millions of individual vessels that are joined together to form a in series and parallel to form the network. It is computationally unfeasible to explicitly simulate blood flow in all blood vessels. Here, from imaged data of a single porcine left coronary arterial tree, we develop a data-pipeline to obtain computational domains for hemodynmaic simulations from a graph representing the coronary vascular tree. Further, we develop a method to ascertain which subregions of the left ventricle are most likely to be perfused via a given artery using a comparison with the American Heart Association division of the left ventricle as a sense check.
The problem of binary hypothesis testing between two probability measures is considered. New sharp bounds are derived for the best achievable error probability of such tests based on independent and identically distributed observations. Specifically, the asymmetric version of the problem is examined, where different requirements are placed on the two error probabilities. Accurate nonasymptotic expansions with explicit constants are obtained for the error probability, using tools from large deviations and Gaussian approximation. Examples are shown indicating that, in the asymmetric regime, the approximations suggested by the new bounds are significantly more accurate than the approximations provided by either of the two main earlier approaches -- normal approximation and error exponents.
Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to represent the imaging scene and facilitate downstream visual tasks comprehensively. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, existing methods cannot effectively and efficiently extract modality-specific and modality-fused features constrained by the inherent local reductive bias (CNN) or quadratic computational complexity (Transformers). To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. Firstly, a dual-level feature extractor is designed to capture long-range features from single-modality images by extracting low and high-level features from CNN and Mamba blocks. Then, a dual-phase feature fusion module is proposed to obtain fusion features that combine complementary information from different modalities. It uses the channel exchange method for shallow fusion and the enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, the fused image reconstruction module utilizes the inverse transformation of the feature extraction to generate the fused result. Through extensive experiments, our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion. Additionally, in a unified benchmark, MambaDFuse has also demonstrated improved performance in downstream tasks such as object detection. Code with checkpoints will be available after the peer-review process.
Analyses of voting algorithms often overlook informational externalities shaping individual votes. For example, pre-polling information often skews voters towards candidates who may not be their top choice, but who they believe would be a worthwhile recipient of their vote. In this work, we aim to understand the role of external information in voting outcomes. We study this by analyzing (1) the probability that voting outcomes align with external information, and (2) the effect of external information on the total utility across voters, or social welfare. In practice, voting mechanisms elicit coarse information about voter utilities, such as ordinal preferences, which initially prevents us from directly analyzing the effect of informational externalities with standard voting mechanisms. To overcome this, we present an intermediary mechanism for learning how preferences change with external information which does not require eliciting full cardinal preferences. With this tool in hand, we find that voting mechanisms are generally more likely to select the alternative most favored by the external information, and when external information reflects the population's true preferences, social welfare increases in expectation.
In this paper, we analyze the preservation of asymptotic properties of partially dissipative hyperbolic systems when switching to a discrete setting. We prove that one of the simplest consistent and unconditionally stable numerical methods - the central finite-differences scheme - preserves both the asymptotic behaviour and the parabolic relaxation limit of one-dimensional partially dissipative hyperbolic systems which satisfy the Kalman rank condition. The large time asymptotic-preserving property is achieved by conceiving time-weighted perturbed energy functionals in the spirit of the hypocoercivity theory. For the relaxation-preserving property, drawing inspiration from the observation that solutions in the continuous case exhibit distinct behaviours in low and high frequencies, we introduce a novel discrete Littlewood-Paley theory tailored to the central finite-difference scheme. This allows us to prove Bernstein-type estimates for discrete differential operators and leads to a new relaxation result: the strong convergence of the discrete linearized compressible Euler equations with damping towards the discrete heat equation, uniformly with respect to the mesh parameter.
The high volatility of renewable energies calls for more energy efficiency. Thus, different physical systems need to be coupled efficiently although they run on various time scales. Here, the port-Hamiltonian (pH) modeling framework comes into play as it has several advantages, e.g., physical properties are encoded in the system structure and systems running on different time scales can be coupled easily. Additionally, pH systems coupled by energy-preserving conditions are still pH. Furthermore, in the energy transition hydrogen becomes an important player and unlike in natural gas, its temperature-dependence is of importance. Thus, we introduce an infinite dimensional pH formulation of the compressible non-isothermal Euler equations to model flow with temperature-dependence. We set up the underlying Stokes-Dirac structure and deduce the boundary port variables. We introduce coupling conditions into our pH formulation, such that the whole network system is pH itself. This is achieved by using energy-preserving coupling conditions, i.e., mass conservation and equality of total enthalpy, at the coupling nodes. Furthermore, to close the system a third coupling condition is needed. Here, equality of the outgoing entropy at coupling nodes is used and included into our systems in a structure-preserving way. Following that, we adapt the structure-preserving aproximation methods from the isothermal to the non-isothermal case. Academic numerical examples will support our analytical findings.
The macroscopic behaviors of materials are determined by interactions that occur at multiple lengths and time scales. Depending on the application, describing, predicting, and understanding these behaviors require models that rely on insights from electronic and atomic scales. In such cases, classical simplified approximations at those scales are insufficient, and quantum-based modeling is required. In this paper, we study how quantum effects can modify the mechanical properties of systems relevant to materials engineering. We base our study on a high-fidelity modeling framework that combines two computationally efficient models rooted in quantum first principles: Density Functional Tight Binding (DFTB) and many-body dispersion (MBD). The MBD model is applied to accurately describe non-covalent van der Waals interactions. Through various benchmark applications, we demonstrate the capabilities of this framework and the limitations of simplified modeling. We provide an open-source repository containing all codes, datasets, and examples presented in this work. This repository serves as a practical toolkit that we hope will support the development of future research in effective large-scale and multiscale modeling with quantum-mechanical fidelity.
When analyzing large datasets, it is common to select a model prior to making inferences. For reliable inferences, it is important to make adjustments that account for the model selection process, resulting in selective inferences. Our paper introduces an asymptotic pivot to infer about the effects of selected variables on conditional quantile functions. Utilizing estimators from smoothed quantile regression, our proposed pivot is easy to compute and ensures asymptotically-exact selective inferences without making strict distributional assumptions about the response variable. At the core of the pivot is the use of external randomization, which enables us to utilize the full sample for both selection and inference without the need to partition the data into independent data subsets or discard data at either step. On simulated data, we find that: (i) the asymptotic confidence intervals based on our pivot achieve the desired coverage rates, even in cases where sample splitting fails due to insufficient sample size for inference; (ii) our intervals are consistently shorter than those produced by sample splitting across various models and signal settings. We report similar findings when we apply our approach to study risk factors for low birth weights in a publicly accessible dataset of US birth records from 2022.
The problem of constructing optimal factoring automata arises in the context of unification factoring for the efficient execution of logic programs. Given an ordered set of $n$ strings of length $m$, the problem is to construct a trie-like tree structure of minimum size in which the leaves in left-to-right order represent the input strings in the given order. Contrary to standard tries, the order in which the characters of a string are encountered can be different on different root-to-leaf paths. Dawson et al. [ACM Trans. Program. Lang. Syst. 18(5):528--563, 1996] gave an algorithm that solves the problem in time $O(n^2 m (n+m))$. In this paper, we present an improved algorithm with running-time $O(n^2m)$.