The Markov property is widely imposed in analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically, and has the power approaching one. Our proposal makes novel contributions in several ways. We utilize and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs a nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimize the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and the three data applications.
This paper focuses on testing for the presence of alpha in time-varying factor pricing models, specifically when the number of securities N is larger than the time dimension of the return series T. We introduce a maximum-type test that performs well in scenarios where the alternative hypothesis is sparse. We establish the limit null distribution of the proposed maximum-type test statistic and demonstrate its asymptotic independence from the sum-type test statistics proposed by Ma et al.(2020).Additionally, we propose an adaptive test by combining the maximum-type test and sum-type test, and we show its advantages under various alternative hypotheses through simulation studies and two real data applications.
We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result under single-tuple inserts or deletes to the database. Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree (heavy) and low-degree (light) values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration. We define the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold. By appropriately choosing this threshold, our approach recovers a number of prior results when restricted to hierarchical queries. We show that for a restricted class of hierarchical queries, our approach achieves worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.
Randomized trials balance all covariates on average and provide the gold standard for estimating treatment effects. Chance imbalances nevertheless exist more or less in realized treatment allocations and intrigue an important question: what should we do in case the treatment groups differ with respect to some important baseline characteristics? A common strategy is to conduct a {\it preliminary test} of the balance of baseline covariates after randomization, and invoke covariate adjustment for subsequent inference if and only if the realized allocation fails some prespecified criterion. Although such practice is intuitive and popular among practitioners, the existing literature has so far only evaluated its properties under strong parametric model assumptions in theory and simulation, yielding results of limited generality. To fill this gap, we examine two strategies for conducting preliminary test-based covariate adjustment by regression, and evaluate the validity and efficiency of the resulting inferences from the randomization-based perspective. As it turns out, the preliminary-test estimator based on the analysis of covariance can be even less efficient than the unadjusted difference in means, and risks anticonservative confidence intervals based on normal approximation even with the robust standard error. The preliminary-test estimator based on the fully interacted specification is on the other hand less efficient than its counterpart under the {\it always-adjust} strategy, and yields overconservative confidence intervals based on normal approximation. Based on theory and simulation, we echo the existing literature and do not recommend the preliminary-test procedure for covariate adjustment in randomized trials.
In this article we perform an asymptotic analysis of parallel Bayesian logspline density estimators. Such estimators are useful for the analysis of datasets that are partitioned into subsets and stored in separate databases without the capability of accessing the full dataset from a single computer. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the normalized density estimator itself in place of the sampling algorithm. We then derive an error bound for the mean integrated squared error of the full dataset posterior estimator. The error bound depends upon the parameters that arise in logspline density estimation and the numerical approximation procedure. In our analysis, we identify the choices for the parameters that result in the error bound scaling optimally in relation to the number of samples. This provides our method with increased estimation accuracy, while also minimizing the computational cost.
A bootstrap procedure for constructing prediction bands for a stationary functional time series is proposed. The procedure exploits a general vector autoregressive representation of the time-reversed series of Fourier coefficients appearing in the Karhunen-Loeve representation of the functional process. It generates backward-in-time, functional replicates that adequately mimic the dependence structure of the underlying process in a model-free way and have the same conditionally fixed curves at the end of each functional pseudo-time series. The bootstrap prediction error distribution is then calculated as the difference between the model-free, bootstrap-generated future functional observations and the functional forecasts obtained from the model used for prediction. This allows the estimated prediction error distribution to account for the innovation and estimation errors associated with prediction and the possible errors due to model misspecification. We establish the asymptotic validity of the bootstrap procedure in estimating the conditional prediction error distribution of interest, and we also show that the procedure enables the construction of prediction bands that achieve (asymptotically) the desired coverage. Prediction bands based on a consistent estimation of the conditional distribution of the studentized prediction error process also are introduced. Such bands allow for taking more appropriately into account the local uncertainty of prediction. Through a simulation study and the analysis of two data sets, we demonstrate the capabilities and the good finite-sample performance of the proposed method.
We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of sequential action-computation scheme. The high-level intuition is that to perform optimism-based exploration, agents would explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at \textit{each environment timestep}, we provide a perspective to view MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
In this paper, we investigate the impact of numerical instability on the reliability of sampling, density evaluation, and evidence lower bound (ELBO) estimation in variational flows. We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map -- which affects sampling -- and the numerical inverse flow map does not accurately recover the initial input -- which affects density and ELBO computations. Surprisingly though, we find that results produced by flows are often accurate enough for applications despite the presence of serious numerical instability. In this work, we treat variational flows as dynamical systems, and leverage shadowing theory to elucidate this behavior via theoretical guarantees on the error of sampling, density evaluation, and ELBO estimation. Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice.
Graphs are important data representations for describing objects and their relationships, which appear in a wide diversity of real-world scenarios. As one of a critical problem in this area, graph generation considers learning the distributions of given graphs and generating more novel graphs. Owing to their wide range of applications, generative models for graphs, which have a rich history, however, are traditionally hand-crafted and only capable of modeling a few statistical properties of graphs. Recent advances in deep generative models for graph generation is an important step towards improving the fidelity of generated graphs and paves the way for new kinds of applications. This article provides an extensive overview of the literature in the field of deep generative models for graph generation. Firstly, the formal definition of deep generative models for the graph generation and the preliminary knowledge are provided. Secondly, taxonomies of deep generative models for both unconditional and conditional graph generation are proposed respectively; the existing works of each are compared and analyzed. After that, an overview of the evaluation metrics in this specific domain is provided. Finally, the applications that deep graph generation enables are summarized and five promising future research directions are highlighted.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.