In the geosciences, a recurring problem is estimating the spatial mean of a physical field using weighted averages of point observations. An important variant arises when individual observations are counted with some probability less than one. This can occur in different contexts, from missing data to estimating statistics across subsamples. In such situations, the spatial mean is a ratio of random variables, whose statistics involve approximate estimators derived through series expansion. The present paper considers truncated estimators of the variance of the spatial mean and their general structure in the presence of missing data. To all orders, the variance estimator depends only on the first and second moments of the underlying field, and convergence requires these moments to be finite. Furthermore, convergence occurs if either the probability of counting individual observations is larger than 1/2 or the number of point observations is large. When the point observations are weighted uniformly, the estimators are easily found using combinatorics and involve Stirling numbers of the second kind.
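As background for the last claim, here is a minimal Python sketch of the combinatorial ingredient: Stirling numbers of the second kind computed from the standard recurrence $S(n,k) = k\,S(n-1,k) + S(n-1,k-1)$. The connection to the uniformly weighted variance estimators is the paper's result and is not reproduced here.

```python
# Stirling numbers of the second kind S(n, k): the number of ways to
# partition a set of n elements into k non-empty blocks. They satisfy
# the recurrence S(n, k) = k * S(n-1, k) + S(n-1, k-1).

def stirling2(n, k):
    """Compute S(n, k) by dynamic programming over the recurrence."""
    if k == 0:
        return 1 if n == 0 else 0
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]

# One place these numbers enter moment calculations: raw moments of a
# binomially thinned count (each item kept with probability p) expand
# in falling factorials with Stirling-number coefficients.
print([stirling2(4, k) for k in range(5)])  # [0, 1, 7, 6, 1]
```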
Distributed stochastic optimization has recently drawn great attention due to its effectiveness in solving large-scale machine learning problems. Though numerous algorithms have been proposed and successfully applied to practical problems, their theoretical guarantees mainly rely on certain boundedness conditions on the stochastic gradients, ranging from uniform boundedness to the relaxed growth condition. In addition, how to characterize the data heterogeneity among the agents and its impact on algorithmic performance remains challenging. Motivated by these issues, we revisit the classical Federated Averaging (FedAvg) algorithm for solving the distributed stochastic optimization problem and establish convergence results, under only a mild variance condition on the stochastic gradients, for smooth nonconvex objective functions. Almost sure convergence to a stationary point is also established under the same condition. Moreover, we discuss a more informative measure of data heterogeneity as well as its implications.
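For readers unfamiliar with the algorithm being analyzed, the following is a minimal NumPy sketch of FedAvg on a toy problem with heterogeneous local quadratic objectives; the objective, step sizes, and agent count are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: agent i holds a local quadratic f_i(x) = 0.5 * ||x - c_i||^2,
# so the global objective (1/m) * sum_i f_i is minimized at mean(c_i).
num_agents, dim = 10, 5
centers = rng.normal(size=(num_agents, dim))  # heterogeneous local optima

def local_stochastic_grad(x, i):
    # Gradient of f_i plus zero-mean noise (a mild variance condition).
    return (x - centers[i]) + 0.1 * rng.normal(size=dim)

def fedavg(rounds=200, local_steps=5, lr=0.1):
    x = np.zeros(dim)                        # global model
    for _ in range(rounds):
        updates = []
        for i in range(num_agents):          # full participation
            x_i = x.copy()
            for _ in range(local_steps):     # local SGD
                x_i -= lr * local_stochastic_grad(x_i, i)
            updates.append(x_i)
        x = np.mean(updates, axis=0)         # server-side averaging
    return x

x_hat = fedavg()
# Distance to the global minimizer: small, up to gradient-noise error.
print(np.linalg.norm(x_hat - centers.mean(axis=0)))
```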
This paper introduces a matrix quantile factor model for matrix-valued data with a low-rank structure. We estimate the row and column factor spaces by minimizing the empirical check loss function over all panels. We show the estimates converge at rate $1/\min\{\sqrt{p_1p_2}, \sqrt{p_2T}, \sqrt{p_1T}\}$ in average Frobenius norm, where $p_1$, $p_2$ and $T$ are the row dimensionality, column dimensionality and length of the matrix sequence, respectively. This rate is faster than that of the quantile estimates obtained by ``flattening'' the matrix model into a large vector model. Smoothed estimates are given and their central limit theorems are derived under some mild conditions. We provide three consistent criteria to determine the pair of row and column factor numbers. Extensive simulation studies and an empirical study justify our theory.
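For reference, here is a small sketch of the check (pinball) loss whose empirical version is minimized above; the matrix factor structure itself is not reproduced. Minimizing the empirical check loss at level $\tau$ over a constant recovers the $\tau$-quantile:

```python
import numpy as np

def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

# Minimizing the empirical check loss over a constant recovers the
# tau-quantile: here the 0.9-quantile of a standard normal sample.
rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
grid = np.linspace(-3, 3, 2001)
losses = [check_loss(y - q, 0.9).mean() for q in grid]
print(grid[np.argmin(losses)])  # close to 1.2816, the N(0,1) 0.9-quantile
```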
Modelling in biology must adapt to increasingly complex and massive data, which calls into question the efficiency of the inference algorithms used to estimate model parameters. Many of these algorithms are based on stochastic optimization processes that waste a significant part of the computation time on rejection sampling. We introduce the Fixed Landscape Inference MethOd (\textit{flimo}), a new likelihood-free inference method for continuous state-space stochastic models. It applies deterministic gradient-based optimization algorithms to obtain a point estimate of the parameters, minimizing the discrepancy between the data and simulations as measured by some prescribed summary statistics. In this sense, it is analogous to Approximate Bayesian Computation (ABC). Like ABC, it can also provide an approximation of the distribution of the parameters. Three applications are proposed: a standard theoretical example, namely the inference of the parameters of g-and-k distributions; a population genetics problem that is less simple than it seems, namely the inference of a selective value from time series in a Wright-Fisher model; and simulations from a Ricker model, representing chaotic population dynamics. In the first two applications, the results show a drastic reduction of the computational time needed for the inference phase compared to other methods, with equivalent accuracy. Even when likelihood-based methods are applicable, the simplicity and efficiency of \textit{flimo} make it a compelling alternative. Implementations in Julia and in R are available on \url{//metabarcoding.org/flimo}. To run \textit{flimo}, the user must simply be able to simulate data according to the chosen model.
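The following is a minimal Python sketch of the fixed-landscape idea (an illustration, not the Julia/R packages themselves): the random draws are frozen once, so the simulator becomes a deterministic function of the parameters, and a standard deterministic optimizer can be applied to a summary-statistic discrepancy. The toy model, estimating the mean and scale of a normal sample, is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Observed data and its summary statistics (mean and std).
data = rng.normal(loc=2.0, scale=0.5, size=500)
obs_stats = np.array([data.mean(), data.std()])

# Fix the randomness once: with z frozen, the simulator below is a
# smooth, deterministic function of theta, so gradient-based
# optimization applies directly.
z = rng.normal(size=(30, 500))  # 30 fixed simulation replicates

def objective(theta):
    mu, log_sigma = theta
    sims = mu + np.exp(log_sigma) * z            # simulated datasets
    sim_stats = np.stack([sims.mean(axis=1), sims.std(axis=1)], axis=1)
    return np.mean(np.sum((sim_stats - obs_stats) ** 2, axis=1))

res = minimize(objective, x0=np.array([0.0, 0.0]), method="BFGS")
print(res.x[0], np.exp(res.x[1]))  # approx. 2.0 and 0.5
```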
The Horvitz-Thompson (HT), the Rao-Hartley-Cochran (RHC) and the generalized regression (GREG) estimators of the finite population mean are considered when the observations are from an infinite-dimensional space. We compare these estimators based on their asymptotic distributions under some commonly used sampling designs and some superpopulations satisfying linear regression models. We show that the GREG estimator is asymptotically at least as efficient as either of the other two estimators under the sampling designs considered in this paper. Further, we show that the use of some well-known sampling designs utilizing auxiliary information may have an adverse effect on the performance of the GREG estimator when the degree of heteroscedasticity present in the linear regression model is not very large; on the other hand, the use of those designs improves its performance when the degree of heteroscedasticity is large. We develop methods for determining the degree of heteroscedasticity, which in turn determines the choice of the appropriate sampling design to be used with the GREG estimator. We also investigate the consistency of the covariance operators of the above estimators. We carry out numerical studies using real and synthetic data, and their results support our theoretical findings.
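As a reference point for the simplest of the three estimators, here is a sketch of the Horvitz-Thompson estimator of a population mean under Poisson sampling with scalar observations; the infinite-dimensional setting, the RHC estimator, and the GREG estimator of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite population and unequal first-order inclusion probabilities
# proportional to a positive auxiliary size variable.
N, n_expected = 10_000, 500
y = rng.gamma(shape=2.0, scale=3.0, size=N)     # study variable
aux = y + rng.uniform(0.0, 1.0, size=N)         # auxiliary size measure
pi = np.minimum(n_expected * aux / aux.sum(), 1.0)

# Poisson sampling: each unit enters independently with probability pi_i.
sampled = rng.random(N) < pi

# Horvitz-Thompson estimator of the population mean: weight each sampled
# observation by the inverse of its inclusion probability.
ht_mean = np.sum(y[sampled] / pi[sampled]) / N
print(ht_mean, y.mean())  # HT estimate vs. true population mean
```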
We show that most structured prediction problems can be solved in linear time and space by treating them as partial orderings of the tokens in the input string. Our method computes real numbers for each token in the input string and sorts the tokens accordingly, producing as few as 2 total orders of the tokens. Each total order possesses a set of edges oriented from smaller to greater tokens. The intersection of the total orders yields a partial order over the set of input tokens, which is then decoded into a directed graph representing the desired structure. Experiments show that our method achieves 95.4 LAS and 96.9 UAS using an intersection of 2 total orders, and 95.7 LAS and 97.1 UAS with 4, on the English Penn Treebank dependency parsing benchmark. Our method is also the first linear-complexity coreference resolution model and achieves 79.2 F1 on the English OntoNotes benchmark, comparable with the state of the art.
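A minimal sketch of the decoding step just described: each token receives one real number per total order, and an edge from token i to token j survives only if i precedes j in every order. The scores below are hand-picked for illustration.

```python
import numpy as np

# Each row: the real number assigned to each of 5 tokens under one
# total order (here 2 total orders, as in the parsing experiments).
scores = np.array([
    [0.1, 0.9, 0.4, 0.7, 0.2],   # total order 1
    [0.3, 0.8, 0.6, 0.5, 0.1],   # total order 2
])

n = scores.shape[1]
# Edge i -> j is kept iff token i is smaller than token j in EVERY
# total order; the intersection of the total orders is a partial order.
edges = [(i, j) for i in range(n) for j in range(n)
         if i != j and np.all(scores[:, i] < scores[:, j])]
print(edges)  # tokens 0 and 4 remain incomparable: a genuine partial order
```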
The moment-sum-of-squares (moment-SOS) hierarchy is one of the most celebrated and widely applied methods for approximating the minimum of an n-variate polynomial over a feasible region defined by polynomial (in)equalities. A key feature of the hierarchy is that, at a fixed level, it can be formulated as a semidefinite program of size polynomial in the number of variables n. Although this suggests that it can therefore be computed in polynomial time, this is not necessarily the case. Indeed, as O'Donnell (2017) and later Raghavendra & Weitz (2017) show, there exist examples where the SOS representations used in the hierarchy have exponential bit-complexity. We study the computational complexity of the moment-SOS hierarchy, complementing and expanding upon earlier work of Raghavendra & Weitz (2017). In particular, we establish algebraic and geometric conditions under which polynomial-time computation is guaranteed to be possible.
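To make the object of study concrete, here is a minimal sketch of an SOS bound for a univariate polynomial posed as a small semidefinite program; cvxpy is used here only as a convenient modeling tool, and the example is an illustration, not taken from the paper.

```python
import cvxpy as cp

# Find the largest gamma such that p(x) - gamma is a sum of squares,
# for p(x) = x^4 - 3x^2 + 2 (true minimum: -1/4 at x = +-sqrt(3/2)).
# Write p(x) - gamma = z^T Q z with z = (1, x, x^2) and Q PSD, and
# match the coefficients of 1, x, x^2, x^3, x^4.
Q = cp.Variable((3, 3), PSD=True)
gamma = cp.Variable()
constraints = [
    Q[0, 0] == 2 - gamma,          # constant term
    2 * Q[0, 1] == 0,              # coefficient of x
    2 * Q[0, 2] + Q[1, 1] == -3,   # coefficient of x^2
    2 * Q[1, 2] == 0,              # coefficient of x^3
    Q[2, 2] == 1,                  # coefficient of x^4
]
prob = cp.Problem(cp.Maximize(gamma), constraints)
prob.solve()
print(gamma.value)  # approx. -0.25: the SOS bound is tight here
```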
We develop a class of data-driven generative models that approximate the solution operator for parameter-dependent partial differential equations (PDEs). We propose a novel probabilistic formulation of the operator learning problem, based on recently developed denoising diffusion probabilistic models (DDPM), in order to learn the input-to-output mapping between problem parameters and solutions of the PDE. To achieve this goal we adapt DDPM to the supervised learning setting, in which the solution operator for the PDE is represented by a class of conditional distributions. The probabilistic formulation combined with DDPM allows for automatic quantification of confidence intervals for the learned solutions. Furthermore, the framework is directly applicable to learning from noisy data sets. We compare the computational performance of the developed method with the Fourier Neural Operator (FNO). Our results show that our method achieves comparable accuracy and recovers the noise magnitude when applied to data sets with outputs corrupted by additive noise.
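A minimal PyTorch sketch of the conditional denoising training objective that such a formulation builds on: a network predicts the noise added to a PDE solution, conditioned on the problem parameters. The architecture, dimensions, and synthetic data below are placeholder assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

# Placeholder dimensions: y is a (discretized) PDE solution, c is the
# conditioning vector of problem parameters.
dim_y, dim_c, T = 64, 8, 1000

# Standard DDPM noise schedule (linear betas) and cumulative products.
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

# A small conditional noise-prediction network eps_theta(y_t, c, t).
net = nn.Sequential(
    nn.Linear(dim_y + dim_c + 1, 128), nn.SiLU(),
    nn.Linear(128, 128), nn.SiLU(),
    nn.Linear(128, dim_y),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def training_step(y0, c):
    """One DDPM step: corrupt y0 at a random time, predict the noise."""
    b = y0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(y0)
    ab = alpha_bars[t].unsqueeze(1)
    y_t = ab.sqrt() * y0 + (1 - ab).sqrt() * eps       # forward process
    inp = torch.cat([y_t, c, t.unsqueeze(1) / T], dim=1)
    loss = ((net(inp) - eps) ** 2).mean()              # noise-matching
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Synthetic stand-in for (parameters, solution) training pairs.
c = torch.randn(32, dim_c)
y0 = torch.tanh(c @ torch.randn(dim_c, dim_y))
print(training_step(y0, c))
```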
Causal discovery procedures aim to deduce causal relationships among variables in a multivariate dataset. While various methods have been proposed for estimating a single causal model or a single equivalence class of models, less attention has been given to quantifying uncertainty in causal discovery in terms of confidence statements. The primary challenge in causal discovery is determining a causal ordering among the variables, and our work offers a framework for constructing confidence sets of causal orderings that the data do not rule out. Our methodology applies to structural equation models and is based on a residual bootstrap procedure to test the goodness-of-fit of causal orderings. We demonstrate the asymptotic validity of the confidence set constructed with this goodness-of-fit test and explain how it may be used to form sub/supersets of ancestral relationships, as well as confidence intervals for causal effects that incorporate model uncertainty.
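A toy sketch of the general recipe on a bivariate example: fit regressions along each candidate ordering, measure residual dependence, calibrate the statistic with a residual bootstrap, and keep the orderings that are not rejected. The simple dependence statistic below is a crude illustrative stand-in for the paper's goodness-of-fit test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SEM with non-Gaussian (uniform) noise: X1 -> X2.
n = 2000
x1 = rng.uniform(-1, 1, n)
x2 = 2 * x1 + rng.uniform(-1, 1, n)
data = np.column_stack([x1, x2])

def fit_and_stat(cause, effect):
    """OLS of effect on cause; a crude dependence statistic between the
    residual and the regressor beyond linear correlation."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause)
    r = effect - b * cause
    return b, r, abs(np.corrcoef(r**2, cause**2)[0, 1])

def pvalue(cause, effect, B=500):
    """Residual bootstrap: regenerate data under the fitted model with
    independently resampled residuals, giving a null distribution."""
    b, r, t_obs = fit_and_stat(cause, effect)
    t_boot = np.empty(B)
    for k in range(B):
        c_star = rng.choice(cause, size=n, replace=True)
        e_star = b * c_star + rng.choice(r, size=n, replace=True)
        _, _, t_boot[k] = fit_and_stat(c_star, e_star)
    return (1 + np.sum(t_boot >= t_obs)) / (B + 1)

# Keep every ordering the goodness-of-fit test does not reject.
conf_set = [(i, j) for i, j in [(0, 1), (1, 0)]
            if pvalue(data[:, i], data[:, j]) > 0.05]
print(conf_set)  # expected: [(0, 1)], only the true ordering survives
```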
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of the network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep-network limit.
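A small sketch of the kind of diagnostic such experiments rest on: estimating the exponent $\beta$ in $\max_k \|W_{k+1} - W_k\| \sim C\,L^{-\beta}$ from stacks of layer weights at several depths $L$. The synthetic weights below are constructed to obey a $1/L$ scaling; trained weights from real networks would replace them.

```python
import numpy as np

def increment_exponent(weight_stacks):
    """Fit beta in max_k ||W_{k+1} - W_k|| ~ C * L^(-beta) by a log-log
    regression; weight_stacks maps depth L -> array of shape (L, d, d)."""
    depths, deltas = [], []
    for L, W in weight_stacks.items():
        diffs = np.linalg.norm(np.diff(W, axis=0), axis=(1, 2))
        depths.append(L)
        deltas.append(diffs.max())
    slope, _ = np.polyfit(np.log(depths), np.log(deltas), 1)
    return -slope

# Illustration with synthetic weight stacks built to have 1/L increments.
rng = np.random.default_rng(0)
stacks = {L: rng.normal(size=(L, 4, 4)).cumsum(axis=0) / L
          for L in [16, 32, 64, 128]}
print(increment_exponent(stacks))  # approx. 1, the constructed scaling
```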
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that the error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and the proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural-network-based models, under different simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are correctly accounted for.
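The abstract leaves the proposed estimators unspecified, so as generic background here is a sketch contrasting a naive difference-in-means with a textbook inverse-propensity-weighted (IPW) estimator on synthetic confounded lending data; IPW is an illustrative stand-in, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic confounded data: creditworthiness u drives both the lender's
# decision (treatment a) and repayment (outcome y). The true effect of a
# on repayment is +2 for everyone.
n = 100_000
u = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-2 * u))        # better borrowers get credit
a = rng.random(n) < propensity
y = 2 * a + 3 * u + rng.normal(size=n)       # repayment

# Naive difference in means: confounded, biased upward.
naive = y[a].mean() - y[~a].mean()

# Inverse-propensity weighting with the (here, known) propensity score
# removes the confounding bias.
ipw = np.mean(a * y / propensity) - np.mean(~a * y / (1 - propensity))
print(naive, ipw)  # naive far from 2; IPW close to the true effect 2
```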