Evaluating the expected information gain (EIG) is a critical task in many areas of computational science and statistics, necessitating the approximation of nested integrals. Available techniques for this problem based on quasi-Monte Carlo (QMC) methods have focused on enhancing the efficiency of either the inner or outer integral approximation. In this work, we introduce a novel approach that extends the scope of these efforts to address inner and outer expectations simultaneously. Leveraging the principles of Owen's scrambling of digital nets, we develop a randomized QMC (rQMC) method that improves the convergence behavior of the approximation of nested integrals. We also indicate how to combine this methodology with importance sampling to address a measure concentration arising in the inner integral. Our method capitalizes on the unique structure of nested expectations to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we handle integrands exhibiting infinite variation in the Hardy--Krause sense, paving the way for theoretically sound error estimates. As the main contribution of this work, we derive asymptotic error bounds for the bias and variance of our estimator, along with regularity conditions under which these error bounds can be attained. In addition, we provide nearly optimal sample sizes for the rQMC approximations, which are helpful for the actual numerical implementations. Moreover, we verify the quality of our estimator through numerical experiments in the context of EIG estimation. Specifically, we compare the computational efficiency of our rQMC method against standard nested MC integration across two case studies: one in thermo-mechanics and the other in pharmacokinetics. These examples highlight our approach's computational savings and enhanced applicability.
Reachability and other path-based measures on temporal graphs can be used to understand spread of infection, information, and people in modelled systems. Due to delays and errors in reporting, temporal graphs derived from data are unlikely to perfectly reflect reality, especially with respect to the precise times at which edges appear. To reflect this uncertainty, we consider a model in which some number $\zeta$ of edge appearances may have their timestamps perturbed by $\pm\delta$ for some $\delta$. Within this model, we investigate temporal reachability and consider the problem of determining the maximum number of vertices any vertex can reach under these perturbations. We show that this problem is intractable in general but is efficiently solvable when $\zeta$ is sufficiently large. We also give algorithms which solve this problem in several restricted settings. We complement this with some contrasting results concerning the complexity of related temporal eccentricity problems under perturbation.
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy a large number of interactive samples, and the communication between these samples is crucial in speeding up the convergence. To justify the validity of these sampling strategies, the concept of interacting particles naturally calls for the mean-field theory. The theory establishes a correspondence between particle interactions encoded in a set of coupled ODEs/SDEs and a PDE that characterizes the evolution of the underlying distribution. This bridges numerical algorithms with the PDE theory used to show convergence in time. Many mathematical machineries are developed to provide the mean-field analysis, and we showcase two such examples: The coupling method and the compactness argument built upon the martingale strategy. The former has been deployed to show the convergence of ensemble Kalman sampler and ensemble Kalman inversion, and the latter will be shown to be immensely powerful in proving the validity of the Vlasov-Boltzmann simulator.
We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the existence or non-existence of edges (causal connections) in a graph based on testing a series of conditional independence hypotheses. In settings where the ultimate scientific goal is to use the selected graph to inform estimation of some causal effect of interest (e.g., by selecting a valid and sufficient set of adjustment variables), we argue that a "cautious" approach to graph selection should control the probability of falsely removing edges and prefer dense, rather than sparse, graphs. We propose a simple inversion of the usual conditional independence testing procedure: to remove an edge, test the null hypothesis of conditional association greater than some user-specified threshold, rather than the null of independence. This equivalence testing formulation to testing independence constraints leads to a procedure with desriable statistical properties and behaviors that better match the inferential goals of certain scientific studies, for example observational epidemiological studies that aim to estimate causal effects in the face of causal model uncertainty. We illustrate our approach on a data example from environmental epidemiology.
Model averaging (MA), a technique for combining estimators from a set of candidate models, has attracted increasing attention in machine learning and statistics. In the existing literature, there is an implicit understanding that MA can be viewed as a form of shrinkage estimation that draws the response vector towards the subspaces spanned by the candidate models. This paper explores this perspective by establishing connections between MA and shrinkage in a linear regression setting with multiple nested models. We first demonstrate that the optimal MA estimator is the best linear estimator with monotonically non-increasing weights in a Gaussian sequence model. The Mallows MA (MMA), which estimates weights by minimizing the Mallows' $C_p$ over the unit simplex, can be viewed as a variation of the sum of a set of positive-part Stein estimators. Indeed, the latter estimator differs from the MMA only in that its optimization of Mallows' $C_p$ is within a suitably relaxed weight set. Motivated by these connections, we develop a novel MA procedure based on a blockwise Stein estimation. The resulting Stein-type MA estimator is asymptotically optimal across a broad parameter space when the variance is known. Numerical results support our theoretical findings. The connections established in this paper may open up new avenues for investigating MA from different perspectives. A discussion on some topics for future research concludes the paper.
High-dimensional, higher-order tensor data are gaining prominence in a variety of fields, including but not limited to computer vision and network analysis. Tensor factor models, induced from noisy versions of tensor decompositions or factorizations, are natural potent instruments to study a collection of tensor-variate objects that may be dependent or independent. However, it is still in the early stage of developing statistical inferential theories for the estimation of various low-rank structures, which are customary to play the role of signals of tensor factor models. In this paper, we attempt to ``decode" the estimation of a higher-order tensor factor model by leveraging tensor matricization. Specifically, we recast it into mode-wise traditional high-dimensional vector/fiber factor models, enabling the deployment of conventional principal components analysis (PCA) for estimation. Demonstrated by the Tucker tensor factor model (TuTFaM), which is induced from the noisy version of the widely-used Tucker decomposition, we summarize that estimations on signal components are essentially mode-wise PCA techniques, and the involvement of projection and iteration will enhance the signal-to-noise ratio to various extent. We establish the inferential theory of the proposed estimators, conduct rich simulation experiments, and illustrate how the proposed estimations can work in tensor reconstruction, and clustering for independent video and dependent economic datasets, respectively.
With the rapid growth of machine learning (ML) workloads in datacenters, existing congestion control (CC) algorithms fail to deliver the required performance at scale. ML traffic is bursty and bulk-synchronous and thus requires quick reaction and strong fairness. We show that existing CC algorithms that use delay as a main signal react too slowly and are not always fair. We design SMaRTT, a simple sender-based CC algorithm that combines delay, ECN, and optional packet trimming for fast and precise window adjustments. At the core of SMaRTT lies the novel QuickAdapt algorithm that accurately estimates the bandwidth at the receiver. We show how to combine SMaRTT with a new per-packet traffic load-balancing algorithm called REPS to effectively reroute packets around congested hotspots as well as flaky or failing links. Our evaluation shows that SMaRTT alone outperforms EQDS, Swift, BBR, and MPRDMA by up to 50% on modern datacenter networks.
We introduce a nonconforming hybrid finite element method for the two-dimensional vector Laplacian, based on a primal variational principle for which conforming methods are known to be inconsistent. Consistency is ensured using penalty terms similar to those used to stabilize hybridizable discontinuous Galerkin (HDG) methods, with a carefully chosen penalty parameter due to Brenner, Li, and Sung [Math. Comp., 76 (2007), pp. 573-595]. Our method accommodates elements of arbitrarily high order and, like HDG methods, it may be implemented efficiently using static condensation. The lowest-order case recovers the $P_1$-nonconforming method of Brenner, Cui, Li, and Sung [Numer. Math., 109 (2008), pp. 509-533], and we show that higher-order convergence is achieved under appropriate regularity assumptions. The analysis makes novel use of a family of weighted Sobolev spaces, due to Kondrat'ev, for domains admitting corner singularities.
The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less -- and are often more -- powerful than the original ones. Our general framework is explicitly instantiated in three classes of multiple testing problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformalized selection. Extensive numerical experiments show that our proposed method significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties.
With the advent of massive data sets much of the computational science and engineering community has moved toward data-intensive approaches in regression and classification. However, these present significant challenges due to increasing size, complexity and dimensionality of the problems. In particular, covariance matrices in many cases are numerically unstable and linear algebra shows that often such matrices cannot be inverted accurately on a finite precision computer. A common ad hoc approach to stabilizing a matrix is application of a so-called nugget. However, this can change the model and introduce error to the original solution. It is well known from numerical analysis that ill-conditioned matrices cannot be accurately inverted. In this paper we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed adapted to a kD-tree partitioning of the observations. Numerically unstable covariance matrices with large condition numbers can be transformed into well conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the Best Linear Unbiased Predictor (BLUP) and Generalized Least Squares (GLS) model, but is numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem, but with the same accuracy as the traditional iterative approach. For very ill-conditioned cases the speedup is infinite. In addition, decay estimates of the multilevel covariance matrices are derived based on high dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high performance computing and computational applied mathematics.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.