In this paper, we consider inference in the context of a factor model for tensor-valued time series. We study the consistency of the estimated common factors and loadings space when using estimators based on minimising quadratic loss functions. Building on the observation that such loss functions are adequate only if sufficiently many moments exist, we extend our results to the case of heavy-tailed distributions by considering estimators based on minimising the Huber loss function, which applies an $L_{1}$-norm weight to outliers. We show that this class of estimators is robust to the presence of heavy tails, even when only the second moment of the data exists.
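A minimal sketch of the contrast the abstract draws between quadratic and Huber loss, simplified to a matrix (rather than tensor) factor model and using alternating Huber regressions; the data-generating process, rank, and tolerance below are illustrative assumptions, not the estimator analysed in the paper.

```python
# Hedged sketch: robust loading estimation via alternating Huber regressions
# for a simplified matrix factor model X_t = Lambda f_t + e_t with heavy tails.
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
T, p, r = 200, 20, 2
Lam_true = rng.normal(size=(p, r))
F_true = rng.normal(size=(T, r))
X = F_true @ Lam_true.T + rng.standard_t(df=2.5, size=(T, p))  # heavy-tailed noise

# initialise loadings with ordinary PCA (quadratic loss)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Lam = Vt[:r].T

for _ in range(10):                      # alternate Huber regressions
    F = np.vstack([HuberRegressor(fit_intercept=False).fit(Lam, x).coef_
                   for x in X])          # factor for each time point
    Lam = np.vstack([HuberRegressor(fit_intercept=False).fit(F, X[:, j]).coef_
                     for j in range(p)]) # loading for each series

# compare the estimated and true loading spaces via canonical correlations
Q_hat, _ = np.linalg.qr(Lam)
Q_true, _ = np.linalg.qr(Lam_true)
print("largest canonical correlation:",
      np.linalg.svd(Q_hat.T @ Q_true, compute_uv=False).max())
```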
This study develops an asymptotic theory for estimating the time-varying characteristics of locally stationary functional time series (LSFTS). We investigate a kernel-based method to estimate the time-varying covariance operator and the time-varying mean function of an LSFTS. In particular, we derive the convergence rate of the kernel estimator of the covariance operator and the associated eigenvalues and eigenfunctions, and establish a central limit theorem for the kernel-based locally weighted sample mean. As applications of our results, we discuss methods for testing the equality of time-varying mean functions in two functional samples.
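As a hedged illustration of the kernel-based local estimators the abstract describes, the sketch below computes a locally weighted sample mean and a discretised covariance operator for curves on a common grid; the Epanechnikov kernel, bandwidth, and toy data-generating process are choices made here for illustration.

```python
# Hedged sketch: kernel-weighted local mean and covariance operator for
# functional data observed on a common grid of m points.
import numpy as np

rng = np.random.default_rng(1)
T, m = 400, 50                               # number of curves, grid points
grid = np.linspace(0, 1, m)
u_t = np.arange(1, T + 1) / T                # rescaled time
# slowly varying mean plus noise: X_t(s) = u_t * sin(2*pi*s) + error
X = np.sin(2 * np.pi * grid)[None, :] * u_t[:, None] + 0.3 * rng.normal(size=(T, m))

def local_estimates(u0, h=0.1):
    """Kernel-weighted mean function and covariance operator at rescaled time u0."""
    z = (u_t - u0) / h
    w = np.maximum(0.75 * (1 - z ** 2), 0.0)          # Epanechnikov weights
    w = w / w.sum()
    mean_fn = w @ X                                     # local mean function (m,)
    Xc = X - mean_fn
    cov_op = (Xc * w[:, None]).T @ Xc                   # discretised operator (m, m)
    return mean_fn, cov_op

mean_mid, cov_mid = local_estimates(0.5)
eigvals = np.linalg.eigvalsh(cov_mid)[::-1]
print("leading eigenvalues of the local covariance operator:", eigvals[:3])
```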
Analysis of networks that evolve dynamically requires the joint modelling of individual snapshots and time dynamics. This paper proposes a new flexible two-way heterogeneity model towards this goal. The new model equips each node of the network with two heterogeneity parameters, one to characterize the propensity to form ties with other nodes statically and the other to differentiate the tendency to retain existing ties over time. With $n$ observed networks each having $p$ nodes, we develop a new asymptotic theory for the maximum likelihood estimation of $2p$ parameters when $np\rightarrow \infty$. We overcome the global non-convexity of the negative log-likelihood function by virtue of its local convexity, and propose a novel method-of-moments estimator as the initial value for a simple algorithm that leads to the consistent local maximum likelihood estimator (MLE). To establish the upper bounds for the estimation error of the MLE, we derive a new uniform deviation bound, which is of independent interest. The theory of the model and its usefulness are further supported by extensive simulations and a data analysis examining social interactions of ants.
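To fix ideas, here is a hedged simulation sketch of a dynamic network in which each node carries one parameter governing tie formation and one governing tie persistence; the logistic parameterisation and dynamics below are assumptions made for illustration and may differ from the paper's exact model and likelihood.

```python
# Hedged sketch: simulate n snapshots of a p-node network with node-specific
# formation parameters (alpha) and retention parameters (beta).
import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 50                                   # nodes, observed snapshots
alpha = rng.normal(-1.0, 0.5, size=p)           # static propensity to form ties
beta = rng.normal(0.5, 0.5, size=p)             # tendency to retain existing ties

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

form = sigmoid(alpha[:, None] + alpha[None, :])   # formation probabilities
keep = sigmoid(beta[:, None] + beta[None, :])     # retention probabilities

A = rng.random((p, p)) < form                     # initial snapshot
A = np.triu(A, 1); A = A | A.T
snapshots = [A]
for _ in range(n - 1):
    U = rng.random((p, p))
    new_A = np.where(snapshots[-1], U < keep, U < form)   # retain or form ties
    new_A = np.triu(new_A, 1); new_A = new_A | new_A.T
    snapshots.append(new_A)

print("mean density over time:", np.mean([S.mean() for S in snapshots]))
```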
Empirical best prediction (EBP) is a well-known method for producing reliable proportion estimates when the primary data source provides only a small sample, or none at all, from the finite populations of interest. There are at least two potential challenges in implementing the existing EBP methodology. First, one must accurately link the sample to the finite population frame. This may be difficult or even impossible because of the absence of identifiers that can be used to link the sample and the frame. Secondly, the finite population frame typically contains limited auxiliary variables, which may not be adequate for building a reasonable working predictive model. We propose a data linkage approach in which we replace the finite population frame by a big sample that does not have the binary outcome variable of interest but has a large set of auxiliary variables. Our proposed method calls for fitting the assumed model using data from the smaller sample, imputing the outcome variable for all units of the big sample, and finally using these imputed values to obtain a standard weighted proportion estimate from the big sample. We develop a new adjusted maximum likelihood method to avoid boundary estimates of the model variance encountered with the commonly used maximum likelihood estimation method. We propose an estimator of the mean squared prediction error (MSPE) using a parametric bootstrap method and address computational issues by developing an efficient EM algorithm. We illustrate the proposed methodology in the context of election projection for small areas.
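The prediction step of the proposed approach can be sketched as follows, with heavy hedging: a logistic working model is fitted on the small sample, the outcome is imputed for every unit of the big sample, and a weighted proportion is formed; the adjusted maximum likelihood, EBP, and parametric bootstrap MSPE components of the paper are not shown, and all data below are synthetic.

```python
# Hedged sketch of the imputation + weighted-proportion step only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_small, n_big, k = 300, 20000, 5
X_small = rng.normal(size=(n_small, k))           # auxiliaries in the small sample
y_small = rng.binomial(1, 1 / (1 + np.exp(-X_small[:, 0])))   # observed outcome
X_big = rng.normal(size=(n_big, k))               # auxiliaries in the big sample
w_big = rng.uniform(0.5, 2.0, size=n_big)         # survey weights of the big sample

model = LogisticRegression().fit(X_small, y_small)   # working predictive model
p_hat = model.predict_proba(X_big)[:, 1]             # imputed outcome probabilities

weighted_proportion = np.sum(w_big * p_hat) / np.sum(w_big)
print("imputed weighted proportion estimate:", weighted_proportion)
```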
In experimental and observational studies, there is often interest in understanding the mechanism through which an intervention program improves the final outcome. Causal mediation analyses have been developed for this purpose but are primarily considered for the case of perfect treatment compliance, with a few exceptions that require the exclusion restriction assumption. In this article, we consider a semiparametric framework for assessing causal mediation in the presence of treatment noncompliance without the exclusion restriction. We propose a set of assumptions to identify the natural mediation effects for the entire study population and, further, the principal natural mediation effects within subpopulations characterized by the potential compliance behavior. We derive the efficient influence functions for the principal natural mediation effect estimands and motivate a set of multiply robust estimators for inference. The multiply robust estimators remain consistent for their respective estimands under four types of misspecification of the working models and are efficient when all nuisance models are correctly specified. We further introduce a nonparametric extension of the proposed estimators by incorporating machine learners to estimate the nuisance functions. Sensitivity analysis methods are also discussed for addressing key identification assumptions. We demonstrate the proposed methods via simulations and an application to a real data example.
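For orientation only, the sketch below computes plug-in natural direct and indirect effects in a deliberately simplified setting (perfect compliance, linear working models, sequential ignorability). It is not the paper's multiply robust estimator for the noncompliance setting; it merely illustrates the natural mediation estimands.

```python
# Hedged sketch: plug-in natural effects via the mediational g-formula.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 2))                       # baseline covariates
A = rng.binomial(1, 0.5, size=n)                  # randomised treatment
M = 0.8 * A + X[:, 0] + rng.normal(size=n)        # mediator
Y = 1.0 * A + 1.5 * M + X[:, 1] + rng.normal(size=n)

med = LinearRegression().fit(np.column_stack([A, X]), M)   # mediator model
out = LinearRegression().fit(np.column_stack([A, M, X]), Y)  # outcome model

def mean_outcome(a, a_star):
    """Plug-in estimate of E[Y(a, M(a_star))]."""
    M_astar = med.predict(np.column_stack([np.full(n, a_star), X]))
    return out.predict(np.column_stack([np.full(n, a), M_astar, X])).mean()

nde = mean_outcome(1, 0) - mean_outcome(0, 0)     # natural direct effect
nie = mean_outcome(1, 1) - mean_outcome(1, 0)     # natural indirect effect
print("NDE:", round(nde, 2), " NIE:", round(nie, 2))   # roughly 1.0 and 1.2 here
```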
Continuous normalizing flows are widely used in generative tasks, where a flow network transports from a data distribution $P$ to a normal distribution. A flow model that can transport from $P$ to an arbitrary $Q$, where both $P$ and $Q$ are accessible via finite samples, would be of interest in various applications, particularly in recently developed telescoping density ratio estimation (DRE), which calls for the construction of intermediate densities to bridge between $P$ and $Q$. In this work, we propose such a ``Q-malizing flow'' via a neural-ODE model that is trained to transport invertibly from $P$ to $Q$ (and vice versa) from empirical samples and is regularized by minimizing the transport cost. The trained flow model allows us to perform infinitesimal DRE along the time-parametrized $\log$-density by training an additional continuous-time flow network using a classification loss, which estimates the time-partial derivative of the $\log$-density. Integrating the time-score network along time provides a telescopic DRE between $P$ and $Q$ that is more stable than a one-step DRE. The effectiveness of the proposed model is empirically demonstrated on mutual information estimation from high-dimensional data and energy-based generative models of image data.
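A hedged toy sketch of the transport idea: a small velocity network is integrated with forward Euler to push samples of $P$ toward samples of $Q$, with a transport-cost penalty. The kernel-MMD matching term and the Euler discretisation are stand-ins chosen here for illustration; the paper's training objective, ODE solver, and time-score network are not reproduced.

```python
# Hedged sketch: a 2-D flow from P to Q with a transport-cost regulariser.
import torch

torch.manual_seed(0)
d, n_steps, lam = 2, 10, 0.1
P = torch.randn(512, d)                             # samples from P
Q = torch.randn(512, d) * 0.5 + 2.0                 # samples from Q

velocity = torch.nn.Sequential(
    torch.nn.Linear(d + 1, 64), torch.nn.Tanh(), torch.nn.Linear(64, d))

def mmd(x, y, bw=1.0):
    """Gaussian-kernel maximum mean discrepancy between two samples."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bw ** 2)).mean()
    return k(x, x) + k(y, y) - 2 * k(x, y)

opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)
for it in range(500):
    x, cost = P, 0.0
    for s in range(n_steps):                         # forward Euler in time
        t = torch.full((x.shape[0], 1), s / n_steps)
        v = velocity(torch.cat([x, t], dim=1))
        cost = cost + (v ** 2).mean() / n_steps      # transport-cost penalty
        x = x + v / n_steps
    loss = mmd(x, Q) + lam * cost
    opt.zero_grad(); loss.backward(); opt.step()

print("final matching loss:", float(mmd(x, Q)))
```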
Recently, there has been a growing interest in efficient numerical algorithms based on tensor networks and low-rank techniques to approximate high-dimensional functions and solutions to high-dimensional PDEs. In this paper, we propose a new tensor rank reduction method based on coordinate transformations that can greatly increase the efficiency of high-dimensional tensor approximation algorithms. The idea is simple: given a multivariate function, determine a coordinate transformation so that the function in the new coordinate system has smaller tensor rank. We restrict our analysis to linear coordinate transformations, which gives rise to a new class of functions that we refer to as tensor ridge functions. Leveraging Riemannian gradient descent on matrix manifolds, we develop an algorithm that determines a quasi-optimal linear coordinate transformation for tensor rank reduction. The results we present for rank reduction via linear coordinate transformations open the possibility for generalizations to larger classes of nonlinear transformations. Numerical applications are presented and discussed for linear and nonlinear PDEs.
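The core idea can be seen on a two-variable toy example: the same function has high numerical rank in the original coordinates but rank one after a linear rotation. The sketch below uses a fixed 45-degree rotation rather than the paper's learned transformation via Riemannian gradient descent, which is not shown.

```python
# Hedged sketch: rank reduction of a bivariate function by a linear rotation.
import numpy as np

grid = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(grid, grid, indexing="ij")

def f(x, y):
    return np.exp(-0.5 * (x + y) ** 2)        # not separable in (x, y)

F = f(X, Y)                                    # values in original coordinates

# rotated coordinates u = (x + y)/sqrt(2), v = (x - y)/sqrt(2);
# express (x, y) in terms of (u, v) on the same tensor-product grid
U, V = X, Y
G = f((U + V) / np.sqrt(2), (U - V) / np.sqrt(2))   # = exp(-u^2), separable

def numerical_rank(M, tol=1e-10):
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

print("rank in original coordinates:", numerical_rank(F))
print("rank after rotation:         ", numerical_rank(G))
```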
We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth, which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks -- used in standard counterfactual reasoning that contemplates two worlds, real and imaginary -- in terms of the (causal) treewidth of the underlying SCM structure. In particular, we show that the former (causal) treewidth is no more than twice the latter plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with a partially specified SCM that is coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.
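A hedged sketch of the construction underlying the bound: duplicate every endogenous node of a small SCM structure into real and imaginary copies sharing the same exogenous parents, and compare (heuristic) treewidths of the moralised graphs. The example DAG and the min-degree heuristic are illustrative choices; the paper works with causal treewidth and exact bounds.

```python
# Hedged sketch: twin-network construction and approximate treewidths.
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree
from networkx.algorithms.moral import moral_graph

# SCM structure: exogenous nodes "U_*" point into endogenous nodes X -> Y -> Z.
scm = nx.DiGraph([("U_X", "X"), ("U_Y", "Y"), ("U_Z", "Z"),
                  ("X", "Y"), ("Y", "Z")])
exogenous = {n for n in scm if n.startswith("U_")}

# Twin network: real and imaginary copies of each endogenous node, sharing
# the same exogenous parents.
twin = nx.DiGraph()
for u, v in scm.edges:
    for world in ("real", "imag"):
        uu = u if u in exogenous else f"{u}_{world}"
        vv = v if v in exogenous else f"{v}_{world}"
        twin.add_edge(uu, vv)

tw_scm, _ = treewidth_min_degree(moral_graph(scm))
tw_twin, _ = treewidth_min_degree(moral_graph(twin))
print("approx. treewidth of SCM structure:", tw_scm)
print("approx. treewidth of twin network: ", tw_twin)   # bounded by 2*tw_scm + 1
```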
We study the parametric online changepoint detection problem, where the underlying distribution of the streaming data changes from a known distribution to an alternative that is of a known parametric form but with unknown parameters. We propose a joint detection/estimation scheme, which we call Window-Limited CUSUM, that combines the cumulative sum (CUSUM) test with a sliding window-based consistent estimate of the post-change parameters. We characterize the optimal choice of the window size and show that the Window-Limited CUSUM enjoys first-order asymptotic optimality as the average run length approaches infinity under the optimal choice of window size. Compared to existing schemes with similar asymptotic optimality properties, our test can be computed much faster because it recursively updates the CUSUM statistic using the estimate of the post-change parameters. A parallel variant is also proposed that facilitates the practical implementation of the test. Numerical simulations corroborate our theoretical findings.
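A hedged sketch of the idea for a Gaussian mean change with known pre-change mean 0 and unit variance: the post-change mean is estimated from a sliding window of recent observations and plugged into a recursive CUSUM update. The window size, threshold, and exact recursion below are simplifications of the scheme analysed in the paper.

```python
# Hedged sketch: window-limited CUSUM for a Gaussian mean shift.
import numpy as np

rng = np.random.default_rng(5)
nu, post_mean, w, threshold = 500, 0.7, 50, 10.0   # change time, shift, window, b
x = np.concatenate([rng.normal(0, 1, nu), rng.normal(post_mean, 1, 1000)])

S, alarm = 0.0, None
for t, xt in enumerate(x, start=1):
    window = x[max(0, t - w):t]                    # most recent w observations
    theta_hat = max(window.mean(), 1e-3)           # estimated post-change mean
    llr = theta_hat * xt - 0.5 * theta_hat ** 2    # Gaussian log-likelihood ratio
    S = max(0.0, S + llr)                          # recursive CUSUM update
    if S > threshold:
        alarm = t
        break

print("true change time:", nu, " alarm raised at:", alarm)
```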
We study the problem of change point (CP) detection with high dimensional time series, within a frequency-domain framework. The overarching goal is to locate all change points and, for each change point, delineate which series are activated by the change and over which set of frequencies. The working assumption is that only a few series are activated per change and frequency. We solve the problem by computing a CUSUM tensor based on spectra estimated from blocks of the observed time series. A frequency-specific projection approach is applied to the CUSUM tensor for dimension reduction. The projection direction is estimated by a proposed sparse tensor decomposition algorithm. Finally, the projected CUSUM vectors across frequencies are aggregated by a sparsified wild binary segmentation for change point detection. We provide theoretical guarantees on the number of estimated change points and the convergence rate of their locations. We derive error bounds for the estimated projection direction for identifying the frequency-specific series that are activated in a change. We provide data-driven rules for the choice of parameters. We illustrate the efficacy of the proposed method by simulation and a stock returns application.
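A hedged sketch of the first step only: estimate cross-spectra on blocks of the series and stack frequency-specific CUSUM statistics into a tensor. The raw block periodogram, the block length, and the omission of the projection, sparse tensor decomposition, and wild binary segmentation steps are all simplifications of the paper's procedure.

```python
# Hedged sketch: CUSUM tensor built from block cross-periodograms.
import numpy as np

rng = np.random.default_rng(6)
p, B, L = 12, 20, 64                      # series, number of blocks, block length
x = rng.normal(size=(B * L, p))
# a change affecting 2 series at frequency 0.25 in the second half of the sample
x[B * L // 2:, :2] += np.sin(2 * np.pi * 0.25 * np.arange(B * L // 2))[:, None]

blocks = x.reshape(B, L, p)
dft = np.fft.rfft(blocks, axis=1) / np.sqrt(L)
periodograms = np.einsum("bfi,bfj->bfij", dft, dft.conj())   # (B, n_freq, p, p)

# CUSUM over the block index, separately for each frequency
cumsums = np.cumsum(periodograms, axis=0)
total = cumsums[-1]
k = np.arange(1, B)[:, None, None, None]
cusum = (cumsums[:-1] - (k / B) * total) * np.sqrt(B / (k * (B - k)))

freq = np.fft.rfftfreq(L)
strongest = np.abs(cusum).max(axis=(0, 2, 3))       # largest CUSUM per frequency
print("frequency with the strongest change signal:", freq[np.argmax(strongest)])
```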
BCART (Bayesian Classification and Regression Trees) and BART (Bayesian Additive Regression Trees) are popular Bayesian regression models widely applicable in modern regression problems. Their popularity is intimately tied to the ability to flexibly model complex responses depending on high-dimensional inputs while simultaneously being able to quantify uncertainties. This ability to quantify uncertainties is key, as it allows researchers to perform appropriate inferential analyses in settings that have generally been too difficult to handle using the Bayesian approach. However, surprisingly little work has been done to evaluate the sensitivity of these modern regression models to violations of modeling assumptions. In particular, we consider influential observations, which one would reasonably imagine to be common -- or at least a concern -- in the big-data setting. In this paper, we consider both the problem of detecting influential observations and adjusting predictions so that they are not unduly affected by such potentially problematic data. We consider three detection diagnostics for Bayesian tree models, one an analogue of Cook's distance and the others taking the form of a divergence measure and a conditional predictive density metric, and then propose an importance sampling algorithm to re-weight previously sampled posterior draws so as to remove the effects of influential data in a computationally efficient manner. Finally, our methods are demonstrated on real-world data where blind application of the models can lead to poor predictions and inference.
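The re-weighting idea can be illustrated, with heavy hedging, on a simple normal-mean model instead of BART/BCART: case-deletion importance weights proportional to the reciprocal of the likelihood of the suspect observation re-weight existing posterior draws to approximate the posterior with that observation removed. The paper's diagnostics and tree-model likelihoods are more involved than this toy example.

```python
# Hedged sketch: case-deletion importance re-weighting of posterior draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y = np.concatenate([rng.normal(0, 1, 99), [8.0]])    # one gross outlier
n = len(y)

# exact posterior draws of the mean under a flat prior and N(theta, 1) likelihood
draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=5000)

i = n - 1                                             # index of the suspect point
log_w = -stats.norm.logpdf(y[i], loc=draws, scale=1)  # w_s ∝ 1 / p(y_i | theta_s)
w = np.exp(log_w - log_w.max())
w /= w.sum()

print("posterior mean with all data:    ", draws.mean().round(3))
print("re-weighted (obs. removed) mean: ", np.sum(w * draws).round(3))
```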