In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and how the variant profile changes with the disease subtype. Motivated by this application, we develop multiresolution nonparametric Bayes tests for differential mutation rates across groups. The multiresolution approach yields fast and accurate detection of spatial clusters of rare variants, and our nonparametric Bayes framework provides great flexibility for modeling the intensities of rare variants. We also establish theoretical properties, including weak consistency of our Dirichlet Process-Poisson-Gamma mixture over multiple resolutions. Simulation studies illustrate excellent small-sample properties relative to competitors, and we apply the method to detect rare variants related to common variable immunodeficiency from whole exome sequencing data on 215 patients and 60,027 control subjects.
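As a hedged illustration of the Poisson-Gamma building block underlying such tests (not the paper's multiresolution Dirichlet process mixture), the sketch below computes a Bayes factor comparing group-specific versus shared mutation rates in a single genomic window; the function names, the Gamma(a, b) prior, and the exposures n1, n2 are illustrative assumptions.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(S, n, a, b):
    """log m(S) for S ~ Poisson(n * lam), lam ~ Gamma(a, b) (rate b):
    m(S) = n^S b^a Gamma(a+S) / (S! Gamma(a) (n+b)^(a+S))."""
    return (S * np.log(n) + a * np.log(b) + gammaln(a + S)
            - gammaln(S + 1) - gammaln(a) - (a + S) * np.log(n + b))

def log_bf_diff_rates(S1, n1, S2, n2, a=1.0, b=1.0):
    """log Bayes factor: group-specific rates (H1) vs a shared rate (H0).
    A single-window sketch, not the paper's multiresolution procedure."""
    log_m1 = log_marginal(S1, n1, a, b) + log_marginal(S2, n2, a, b)
    # H0: one shared rate; joint marginal likelihood of (S1, S2)
    log_m0 = (S1 * np.log(n1) + S2 * np.log(n2) + a * np.log(b)
              + gammaln(a + S1 + S2) - gammaln(S1 + 1) - gammaln(S2 + 1)
              - gammaln(a) - (a + S1 + S2) * np.log(n1 + n2 + b))
    return log_m1 - log_m0

# a large count imbalance at equal exposure yields a positive log-BF
print(log_bf_diff_rates(S1=12, n1=215, S2=1, n2=215))
```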
Model averaging (MA), a technique for combining estimators from a set of candidate models, has attracted increasing attention in machine learning and statistics. In the existing literature, there is an implicit understanding that MA can be viewed as a form of shrinkage estimation that draws the response vector towards the subspaces spanned by the candidate models. This paper explores this perspective by establishing connections between MA and shrinkage in a linear regression setting with multiple nested models. We first demonstrate that the optimal MA estimator is the best linear estimator with monotone non-increasing weights in a Gaussian sequence model. The Mallows MA, which estimates the weights by minimizing Mallows' $C_p$ criterion, is a variation of the positive-part Stein estimator. Motivated by these connections, we develop a novel MA procedure based on blockwise Stein estimation. Our resulting Stein-type MA estimator is asymptotically optimal across a broad parameter space when the variance is known. Numerical results support our theoretical findings. The connections established in this paper may open up new avenues for investigating MA from different perspectives. A discussion of some topics for future research concludes the paper.
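To make the Mallows criterion concrete, here is a minimal sketch of Mallows model averaging over nested least-squares models, assuming a known noise variance $\sigma^2$; the helper name and the generic simplex-constrained solver are our choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

def mallows_ma_weights(y, X, sigma2):
    """Mallows MA over the nested models X[:, :k], k = 1..p: minimize
    C_p(w) = ||y - sum_k w_k yhat_k||^2 + 2*sigma2*sum_k w_k*k
    over the probability simplex. Illustrative sketch only."""
    n, p = X.shape
    # least-squares fitted values of each nested model, stacked column-wise
    fits = np.column_stack(
        [X[:, :k] @ np.linalg.lstsq(X[:, :k], y, rcond=None)[0]
         for k in range(1, p + 1)])
    ks = np.arange(1, p + 1)              # model sizes (penalty term)

    def cp(w):
        return np.sum((y - fits @ w) ** 2) + 2.0 * sigma2 * (w @ ks)

    res = minimize(cp, np.full(p, 1.0 / p),
                   bounds=[(0.0, 1.0)] * p,
                   constraints=({'type': 'eq',
                                 'fun': lambda w: w.sum() - 1.0},))
    return res.x
```

Since $C_p(w)$ is a convex quadratic in $w$, in practice the weights are usually computed by quadratic programming rather than a general-purpose solver.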
Recent advances in deep learning methods such as large language models (LLMs) and diffusion models have created a need for improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. Towards this goal, we study the advantages of FP8 data formats for post-training quantization across 75 unique network architectures covering a wide range of tasks, including machine translation, language modeling, text generation, image classification, generation, and segmentation. We examine three different FP8 representations (E5M2, E4M3, and E3M4) to study the effects of varying degrees of trade-off between dynamic range and precision on model accuracy. Based on our extensive study, we develop a quantization workflow that generalizes across different network architectures. Our empirical results show that FP8 formats outperform INT8 in multiple aspects, including workload coverage (92.64% vs. 65.87%), model accuracy, and suitability for a broader range of operations. Furthermore, our findings suggest that E4M3 is better suited for NLP models, whereas E3M4 performs marginally better than E4M3 on computer vision tasks. The code is publicly available in Intel Neural Compressor: https://github.com/intel/neural-compressor.
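To make the range/precision trade-off concrete, here is a small sketch that simulates round-to-nearest quantization for an IEEE-like EeMm format. It assumes the standard exponent bias $2^{e-1}-1$ and a reserved top exponent; actual FP8 variants differ slightly in their special values (e.g., OCP E4M3 extends the top binade, giving a maximum of 448 rather than 240).

```python
import numpy as np

def fp8_max(exp_bits, man_bits):
    # Largest normal value of an IEEE-like EeMm format; e.g. E5M2 -> 57344.
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias          # top exponent reserved
    return (2.0 - 2.0 ** (-man_bits)) * 2.0 ** max_exp

def fake_quantize_fp8(x, exp_bits, man_bits):
    # "Fake quant": round float32 values to the nearest FP8 grid point.
    bias = 2 ** (exp_bits - 1) - 1
    m = fp8_max(exp_bits, man_bits)
    x = np.clip(np.asarray(x, dtype=np.float64), -m, m)
    # per-element exponent, floored at the subnormal threshold
    e = np.floor(np.log2(np.maximum(np.abs(x), 1e-38)))
    e = np.maximum(e, 1 - bias)
    ulp = 2.0 ** (e - man_bits)                   # spacing at that binade
    return np.round(x / ulp) * ulp

for fmt in [(5, 2), (4, 3), (3, 4)]:              # E5M2, E4M3, E3M4
    print(fmt, fp8_max(*fmt), fake_quantize_fp8([0.1234, 3.7], *fmt))
```

Running the loop shows the trade-off directly: E5M2 covers the widest range but with the coarsest grid, while E3M4 has the finest grid over the narrowest range.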
We introduce a general abstract framework for database repairing that differentiates between integrity constraints and so-called query constraints. The former are used to model consistency and desirable properties of the data (such as functional dependencies and independencies), while the latter relate two database instances according to their answers to the query constraints. The framework also admits a distinction between hard and soft queries, allowing one both to preserve the answers of a core set of queries and to define a distance between instances based on query answers. Finally, we present an instantiation of this framework by defining logic-based metrics in K-teams (a notion recently introduced for the logical modelling of relational data with semiring annotations). We exemplify how various notions of repairs from the literature can be modelled in our unifying framework.
Multiscale coupling methods are important methodologies for the modeling and simulation of materials with defects, aiming to achieve the (quasi-)optimal balance of accuracy and efficiency. The a posteriori analysis and corresponding adaptive algorithms play a crucial role in the efficient implementation of multiscale coupling methods. This paper proposes a unified framework for residual-based a posteriori error estimates that can be applied to general consistent multiscale coupling methods. In particular, we prove that the error estimator based on the residual force provides an upper bound on the true approximation error. As prototypical examples, we present a variety of adaptive computations based on this reliable error estimator for blended atomistic-to-continuum (a/c) coupling methods, including the energy-based blended quasi-continuum (BQCE), the force-based blended quasi-continuum (BQCF) and the recently developed blended ghost force correction (BGFC) methods. We develop a coarse-grained technique for the efficient evaluation of the error estimator. A robust adaptive algorithm is then proposed and validated on different types of crystalline defects, some of which have not been considered in the previous literature on adaptive a/c coupling methods. The results demonstrate that the adaptive algorithm attains the same optimal convergence rate of the error as the a priori error estimate, but at considerably lower computational cost. This study provides valuable insights into the design and implementation of adaptive multiscale methods, and represents a significant contribution to the literature on a/c coupling methods.
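Schematically, the reliability result says that the true error is controlled by a computable dual norm of the residual force: with $u$ the exact equilibrium, $u_h$ the coupled approximation, and $\mathcal{F}$ the force residual, one has (norms and constants depend on the specific coupling scheme; this generic formulation is ours, not the paper's exact statement)
\[
\| u - u_h \|_{\mathcal{U}} \;\lesssim\; \eta(u_h),
\qquad
\eta(u_h) \;:=\; \sup_{0 \ne v \in \mathcal{U}} \frac{\langle \mathcal{F}(u_h), v \rangle}{\| v \|_{\mathcal{U}}},
\]
and the adaptive algorithm refines where local contributions to $\eta(u_h)$ are largest.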
Tensor parameters that are amortized or regularized over large tensor powers, often called "asymptotic" tensor parameters, play a central role in several areas including algebraic complexity theory (constructing fast matrix multiplication algorithms), quantum information (entanglement cost and distillable entanglement), and additive combinatorics (bounds on cap sets, sunflower-free sets, etc.). Examples are the asymptotic tensor rank, asymptotic slice rank and asymptotic subrank. Recent works (Costa-Dalai, Blatter-Draisma-Rupniewski, Christandl-Gesmundo-Zuiddam) have investigated notions of discreteness (no accumulation points) or "gaps" in the values of such tensor parameters. We prove a general discreteness theorem for asymptotic tensor parameters of order-three tensors and use this to prove that (1) over any finite field (and in fact any finite set of coefficients in any field), the asymptotic subrank and the asymptotic slice rank have no accumulation points, and (2) over the complex numbers, the asymptotic slice rank has no accumulation points. Central to our approach are two new general lower bounds on the asymptotic subrank of tensors, which measures how much a tensor can be diagonalized. The first lower bound says that the asymptotic subrank of any concise three-tensor is at least the cube-root of the smallest dimension. The second lower bound says that any concise three-tensor that is "narrow enough" (has one dimension much smaller than the other two) has maximal asymptotic subrank. Our proofs rely on new lower bounds on the maximum rank in matrix subspaces that are obtained by slicing a three-tensor in the three different directions. We prove that for any concise tensor, the product of any two such maximum ranks must be large, and as a consequence there are always two distinct directions with large max-rank.
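Writing $\widetilde{Q}(T)$ for the asymptotic subrank of a concise tensor $T \in \mathbb{F}^{n_1} \otimes \mathbb{F}^{n_2} \otimes \mathbb{F}^{n_3}$ with $n_1 \le n_2 \le n_3$ (our notation; the precise meaning of "narrow enough" is as in the paper), the two lower bounds can be stated as
\[
\widetilde{Q}(T) \;\ge\; n_1^{1/3},
\qquad\text{and}\qquad
\widetilde{Q}(T) = n_1 \;\text{ whenever } n_1 \text{ is sufficiently small relative to } n_2, n_3,
\]
where $\widetilde{Q}(T) = n_1$ is the maximal possible value for a concise tensor with smallest dimension $n_1$.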
Probability measures on the sphere form an important class of statistical models and are used, for example, in modeling directional data or shapes. Due to their widespread use, but also as an algorithmic building block, efficient sampling of distributions on the sphere is highly desirable. We propose a shrinkage-based and an idealized geodesic slice sampling Markov chain, designed to generate approximate samples from distributions on the sphere. In particular, the shrinkage-based algorithm works in any dimension, is straightforward to implement, and has no tuning parameters. We verify reversibility and show that, under weak regularity conditions, geodesic slice sampling is uniformly ergodic. Numerical experiments show that the proposed slice samplers achieve excellent mixing on challenging targets, including the Bingham distribution and mixtures of von Mises-Fisher distributions. In these settings our approach outperforms standard samplers such as random-walk Metropolis-Hastings and Hamiltonian Monte Carlo.
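For concreteness, here is a minimal sketch of one transition of a shrinkage-based geodesic slice sampler in the spirit of the proposed method (implementation details are ours): pick a slice level, draw a random great circle through the current point, and shrink the angle interval towards the current point until acceptance.

```python
import numpy as np

def geodesic_slice_step(x, log_density, rng):
    """One shrinkage-based geodesic slice sampling transition on the unit
    sphere S^{d-1}; log_density is the target log-density w.r.t. the
    uniform measure. Illustrative sketch, not the paper's exact algorithm."""
    d = x.size
    log_t = log_density(x) + np.log(rng.uniform())   # slice threshold
    v = rng.standard_normal(d)
    v -= (v @ x) * x                                 # project out x
    v /= np.linalg.norm(v)                           # unit tangent direction
    geodesic = lambda theta: np.cos(theta) * x + np.sin(theta) * v
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta              # bracket contains 0
    while True:                                      # terminates: gamma(0) = x
        prop = rng.uniform(lo, hi)
        y = geodesic(prop)
        if log_density(y) > log_t:
            return y
        if prop < 0.0:                               # shrink towards theta = 0
            lo = prop
        else:
            hi = prop

# usage: x = geodesic_slice_step(x, lambda z: kappa * (mu @ z),
#                                np.random.default_rng())  # von Mises-Fisher
```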
For an infinite class of finite graphs of unbounded size, we define a limit object, to be called wide limit, relative to some computationally restricted class of functions. The properties of the wide limit then reflect how a computationally restricted viewer "sees" a generic instance from the class. The construction uses arithmetic forcing with random variables [10]. We prove sufficient conditions for universal and existential sentences to be valid in the limit, provide several examples, and prove that such a limit object can then be expanded to a model of weak arithmetic. We then take the wide limit of all finite pointed paths to obtain a model of arithmetic where the problem OntoWeakPigeon is total but Leaf (the complete problem for $\textbf{PPA}$) is not. This logical separation of the oracle classes of total NP search problems in our setting implies that Leaf is not reducible to OntoWeakPigeon even if some errors are allowed in the reductions.
Monocular depth estimation is a crucial task for measuring distance relative to a camera, which is important for applications such as robot navigation and self-driving. Traditional frame-based methods suffer from performance drops due to the limited dynamic range and motion blur. Therefore, recent works leverage novel event cameras to complement or guide the frame modality via frame-event feature fusion. However, event streams exhibit spatial sparsity, leaving some areas unperceived, especially in regions with marginal light changes. As a result, direct fusion methods, e.g., RAMNet, often ignore the contribution of the most confident regions of each modality. This leads to structural ambiguity in the modality fusion process, thus degrading the depth estimation performance. In this paper, we propose a novel Spatial Reliability-oriented Fusion Network (SRFNet) that can estimate depth with fine-grained structure in both daytime and nighttime scenes. Our method consists of two key technical components. First, we propose an attention-based interactive fusion (AIF) module that applies the spatial priors of events and frames as initial masks and learns the consensus regions to guide the inter-modal feature fusion. The fused features are then fed back to enhance the frame and event feature learning. Meanwhile, an output head generates a fused mask, which is iteratively updated for learning consensual spatial priors. Second, we propose a Reliability-oriented Depth Refinement (RDR) module to estimate dense depth with fine-grained structure based on the fused features and masks. We evaluate the effectiveness of our method on synthetic and real-world datasets, and show that, even without pretraining, our method outperforms prior methods, e.g., RAMNet, especially in night scenes. Our project homepage: https://vlislab22.github.io/SRFNet.
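As a loose sketch of how spatial reliability masks can gate a frame-event fusion step (this is our illustrative reading of the abstract, not SRFNet's actual AIF architecture; layer shapes and names are assumptions):

```python
import torch
import torch.nn as nn

class MaskGuidedFusion(nn.Module):
    """Sketch: reliability masks gate each modality, per-pixel attention
    weights combine them, and an output head re-estimates the fused mask
    for the next iteration. Illustrative only."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Conv2d(2 * ch, 2, kernel_size=1)   # per-modality weights
        self.mask_head = nn.Conv2d(ch, 1, kernel_size=1)  # fused-mask head

    def forward(self, f_frame, f_event, m_frame, m_event):
        # gate each modality by its spatial reliability prior
        gated = torch.cat([f_frame * m_frame, f_event * m_event], dim=1)
        w = torch.softmax(self.attn(gated), dim=1)        # consensus weights
        fused = w[:, :1] * f_frame + w[:, 1:] * f_event
        new_mask = torch.sigmoid(self.mask_head(fused))   # refined prior
        return fused, new_mask
```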
Discovering causal relationships from observational data is a fundamental yet challenging task. In some applications, it may suffice to learn the causal features of a given response variable, instead of learning the entire underlying causal structure. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings. ICP assumes that the mechanism for generating the response from its direct causes is the same in all settings and exploits this invariance to output a subset of the causal features. The framework of ICP has been extended to general additive noise models and to nonparametric settings using conditional independence testing. However, nonparametric conditional independence testing often suffers from low power (or poor type I error control), and the aforementioned parametric models are not suitable for applications in which the response is not measured on a continuous scale, but rather reflects categories or counts. To bridge this gap, we develop ICP in the context of transformation models (TRAMs), allowing for continuous, categorical, count-type, and uninformatively censored responses (we show that, in general, these model classes do not allow for identifiability when there is no exogenous heterogeneity). We propose TRAM-GCM, a test for invariance of a subset of covariates based on the expected conditional covariance between environments and score residuals, which satisfies uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we propose an additional invariance test, TRAM-Wald, based on the Wald statistic. We implement both proposed methods in the open-source R package "tramicp" and show in simulations that, under correct model specification, our approach empirically yields higher power than nonparametric ICP based on conditional independence testing.
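Schematically, and with details that go beyond what the abstract states, a GCM-type invariance statistic of this kind averages products of residuals: with environment indicators $E_i$, candidate covariate set $X_i$, and score residuals $\hat{s}_i$ from the fitted TRAM,
\[
R_i = \big(E_i - \hat{\mathbb{E}}[E_i \mid X_i]\big)\,\hat{s}_i,
\qquad
T_n = \frac{\sqrt{n}\,\bar{R}_n}{\hat{\sigma}_R},
\]
and invariance of the covariate set is rejected when $|T_n|$ is large (vector-valued environments lead to a maximum- or quadratic-form version of the statistic).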
Dual-path is a popular architecture for speech separation models (e.g., Sepformer) that splits long sequences into overlapping chunks for its intra- and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, consisting of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters.
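One plausible reading of the SPGM structure, sketched in PyTorch (the pooling statistics, layer sizes, and gating form are our assumptions, not the published design):

```python
import torch
import torch.nn as nn

class SPGMBlock(nn.Module):
    """Sketch of single-path global modulation: parameter-free global
    pooling over time, followed by a light-weight gate that modulates
    the local features. Illustrative only; the actual SPGM may differ."""
    def __init__(self, dim):
        super().__init__()
        # the only learned parameters: a small projection producing the gate
        self.proj = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, time, dim) local features
        mean = x.mean(dim=1)              # parameter-free global pooling
        std = x.std(dim=1)
        g = self.proj(torch.cat([mean, std], dim=-1))   # global gate
        return x * g.unsqueeze(1)         # broadcast gate over time

# usage: y = SPGMBlock(dim=256)(torch.randn(4, 200, 256))
```

The design point the sketch illustrates is that the global path carries almost no parameters, so nearly the entire parameter budget stays in the local (intra-chunk) transformer layers.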