An important issue for many economic experiments is how the experimenter can ensure sufficient power for rejecting one or more hypotheses. Here, we apply methods developed mainly within the area of clinical trials for testing multiple hypotheses simultaneously in adaptive, two-stage designs. Our main goal is to illustrate how this approach can be used to improve the power of economic experiments. Having briefly introduced the relevant theory, we perform a simulation study supported by the open-source R package asd in order to evaluate the power of several different designs. The simulations show that the power to reject at least one hypothesis can be improved while still ensuring strong control of the overall Type I error probability, and without increasing the total sample size and thus the costs of the study. The derived designs are further illustrated by applying them to two different real-world data sets from experimental economics.
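For concreteness, the following minimal Monte Carlo sketch (written in Python rather than with the asd package, and with assumed design parameters) illustrates the kind of power calculation involved: two experimental arms are compared with a control at stage 1, the apparently better arm is carried forward to stage 2, the stage-wise z-statistics are combined with the inverse-normal rule, and a Bonferroni split of the significance level over the two elementary hypotheses gives conservative strong control of the familywise error rate.

```python
import numpy as np
from scipy.stats import norm

def simulate_power(effects=(0.3, 0.5), n1=50, n2=100, alpha=0.05,
                   n_sim=20_000, rng=np.random.default_rng(0)):
    """Monte Carlo power of rejecting at least one hypothesis in a
    two-stage drop-the-loser design (assumed parameters, known unit variance)."""
    w1 = np.sqrt(n1 / (n1 + n2))          # inverse-normal combination weights
    w2 = np.sqrt(n2 / (n1 + n2))
    crit = norm.ppf(1 - alpha / 2)        # Bonferroni split over the two hypotheses
    rejections = 0
    for _ in range(n_sim):
        # Stage 1: z-statistic of each experimental arm vs control
        z1 = np.array([rng.normal(d * np.sqrt(n1 / 2), 1) for d in effects])
        best = int(z1.argmax())           # carry the apparently better arm forward
        # Stage 2: only the selected arm is continued
        z2 = rng.normal(effects[best] * np.sqrt(n2 / 2), 1)
        z_comb = w1 * z1[best] + w2 * z2  # combined test statistic
        rejections += z_comb > crit
    return rejections / n_sim

print(simulate_power())  # estimated probability of rejecting at least one hypothesis
```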
This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment in order to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
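A rough Python sketch of the overall idea, under assumptions that go beyond the abstract (a shallow regression tree to define strata, Neyman allocation estimated from first-wave arm variances, and simple Bernoulli assignment rather than stratified block randomization within strata):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_stratification_tree(X1, y1, max_depth=2):
    """Grow a shallow tree on wave-1 covariates and outcomes; its leaves define strata."""
    tree = DecisionTreeRegressor(max_depth=max_depth, min_samples_leaf=30)
    tree.fit(X1, y1)
    return tree

def neyman_probs(tree, X1, y1, d1):
    """Within-stratum treatment probabilities from wave-1 arm standard deviations."""
    leaves = tree.apply(X1)
    probs = {}
    for leaf in np.unique(leaves):
        m = leaves == leaf
        s1 = y1[m & (d1 == 1)].std()      # treated-outcome sd in this stratum
        s0 = y1[m & (d1 == 0)].std()      # control-outcome sd in this stratum
        probs[leaf] = s1 / (s1 + s0) if (s1 + s0) > 0 else 0.5
    return probs

def assign_wave2(tree, probs, X2, rng=np.random.default_rng(1)):
    """Assign wave-2 units stratum by stratum (simple Bernoulli draws for illustration)."""
    leaves = tree.apply(X2)
    return np.array([rng.random() < probs[leaf] for leaf in leaves], dtype=int)
```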
This paper concerns the construction of confidence intervals in standard seroprevalence surveys. In particular, we discuss methods for constructing confidence intervals for the proportion of individuals in a population infected with a disease using a sample of antibody test results and measurements of the test's false positive and false negative rates. We begin by documenting erratic behavior in the coverage probabilities of standard Wald and percentile bootstrap intervals when applied to this problem. We then consider two alternative sets of intervals constructed with test inversion. The first set of intervals is approximate, using either an asymptotic or a bootstrap approximation to the finite-sample distribution of a chosen test statistic. We consider several choices of test statistic, including maximum likelihood estimators and generalized likelihood ratio statistics. We show with simulation that, at empirically relevant parameter values and sample sizes, the coverage probabilities for these intervals are close to their nominal level and are approximately equi-tailed. The second set of intervals is shown to contain the true parameter value with probability at least equal to the nominal level, but can be conservative in finite samples.
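A simplified Python sketch of a test-inversion interval of the first kind, assuming for illustration that the false positive and false negative rates are plugged in as known rather than profiled over their validation samples, and using an asymptotic chi-squared calibration of the generalized likelihood ratio:

```python
import numpy as np
from scipy.stats import binom, chi2

def invert_glr_interval(x, n, fpr, fnr, alpha=0.05, grid=np.linspace(0, 1, 2001)):
    """Grid-based test inversion for the prevalence pi, with x positives out of
    n tests and plugged-in false positive / false negative rates."""
    def loglik(pi):
        # probability of a positive test result given prevalence pi
        p = np.clip(pi * (1 - fnr) + (1 - pi) * fpr, 1e-12, 1 - 1e-12)
        return binom.logpmf(x, n, p)
    ll = np.array([loglik(pi) for pi in grid])
    glr = 2 * (ll.max() - ll)                       # generalized likelihood ratio statistic
    accept = grid[glr <= chi2.ppf(1 - alpha, df=1)] # invert the asymptotic test
    return accept.min(), accept.max()

print(invert_glr_interval(x=50, n=3000, fpr=0.005, fnr=0.10))
```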
A matching is a set of edges in a graph no two of which share an endpoint. A matching $M$ is called acyclic if the subgraph induced on the endpoints of the edges in $M$ is acyclic. Given a graph $G$ and an integer $k$, the Acyclic Matching Problem asks whether $G$ contains an acyclic matching of size $k$. The problem is known to be NP-complete. In this paper, we investigate the complexity of the problem from several angles. First, we prove that the problem remains NP-complete for the class of planar bipartite graphs of maximum degree three and arbitrarily large girth. The problem also remains NP-complete for the class of planar line graphs with maximum degree four. Moreover, we study the parameterized complexity of the problem. In particular, we prove that the problem is W[1]-hard on bipartite graphs with respect to the parameter $k$. On the other hand, the problem is fixed-parameter tractable with respect to $k$ for line graphs, $C_4$-free graphs, and every proper minor-closed class of graphs (including graphs of bounded tree-width and planar graphs).
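To make the definition concrete, a brute-force Python check (exponential in $k$, so suitable only for tiny graphs) of whether a graph admits an acyclic matching of size $k$, using networkx:

```python
import itertools
import networkx as nx

def is_acyclic_matching(G, M):
    """M is an acyclic matching if its edges are pairwise disjoint and the
    subgraph induced on the matched vertices contains no cycle."""
    nodes = [v for e in M for v in e]
    if len(nodes) != len(set(nodes)):     # edges must share no endpoints
        return False
    return nx.is_forest(G.subgraph(nodes))

def has_acyclic_matching(G, k):
    """Exhaustively try all k-subsets of edges (illustration only)."""
    return any(is_acyclic_matching(G, M)
               for M in itertools.combinations(G.edges(), k))

G = nx.cycle_graph(6)
print(has_acyclic_matching(G, 2))   # True, e.g. the edges (0,1) and (3,4)
```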
Volatility forecasting is crucial to risk management and portfolio construction. One particular challenge in assessing volatility forecasts is how to construct a robust proxy for the unknown true volatility. In this work, we show that the empirical loss comparison between two volatility predictors hinges on the deviation of the volatility proxy from the true volatility. We then establish non-asymptotic deviation bounds for three robust volatility proxies, two of which are based on clipped data, and the third of which is based on exponentially weighted Huber loss minimization. In particular, in order for the Huber approach to adapt to non-stationary financial returns, we propose to solve a tuning-free weighted Huber loss minimization problem to jointly estimate the volatility and the optimal robustification parameter at each time point. We then inflate this robustification parameter and use it to update the volatility proxy to achieve an optimal balance between the bias and variance of the global empirical loss. We also extend this Huber method to construct volatility predictors. Finally, we exploit the proposed robust volatility proxy to compare different volatility predictors on Bitcoin market data. It turns out that when the sample size is limited, applying the robust volatility proxy gives a more consistent and stable evaluation of volatility forecasts.
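A stylized Python sketch of an exponentially weighted Huber volatility proxy, with a fixed, crudely chosen robustification parameter rather than the tuning-free joint estimation and inflation steps described above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_loss(u, tau):
    """Standard Huber loss, quadratic inside [-tau, tau] and linear outside."""
    a = np.abs(u)
    return np.where(a <= tau, 0.5 * u**2, tau * a - 0.5 * tau**2)

def ew_huber_vol_proxy(returns, halflife=20, tau=None):
    """Exponentially weighted Huber estimate of the second moment of returns,
    returned on the standard-deviation scale as a volatility proxy."""
    r2 = np.asarray(returns) ** 2
    n = len(r2)
    w = 0.5 ** (np.arange(n)[::-1] / halflife)    # exponential weights, newest largest
    w /= w.sum()
    if tau is None:
        tau = 3 * np.std(r2)                      # crude default, for illustration only
    obj = lambda theta: np.sum(w * huber_loss(r2 - theta, tau))
    res = minimize_scalar(obj, bounds=(0.0, r2.max() + 1e-12), method="bounded")
    return np.sqrt(res.x)

rng = np.random.default_rng(0)
print(ew_huber_vol_proxy(rng.standard_t(df=3, size=500) * 0.02))
```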
One of the possible objectives when designing experiments is to build or formulate a model for predicting future observations. When the primary objective is prediction, some typical approaches in the planning phase are to use well-established small-sample experimental designs (e.g., Definitive Screening Designs) and to construct predictive models using widely used model selection algorithms such as the Lasso. These design and analytic strategies, however, do not guarantee high prediction performance, partly because the small sample sizes prevent partitioning the data into training and validation sets, a strategy that is commonly used in machine learning to improve out-of-sample prediction. In this work, we propose a novel framework for building high-performance predictive models from experimental data that capitalizes on the advantage of having both training and validation sets. However, instead of partitioning the data into two mutually exclusive subsets, we propose a weighting scheme based on the fractional random weight bootstrap that emulates data partitioning by assigning anti-correlated training and validation weights to each observation. The proposed methodology, called Self-Validated Ensemble Modeling (SVEM), proceeds in the spirit of bagging: it iterates through bootstraps of anti-correlated weights and fitted models, with the final SVEM model being the average of the bootstrapped models. We investigate the performance of the SVEM algorithm with several model-building approaches such as stepwise regression, the Lasso, and the Dantzig selector. Finally, through simulation and case studies, we show that SVEM generally produces models with better prediction performance than one-shot model selection approaches.
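An illustrative Python sketch of the SVEM idea with a Lasso base learner, where the exponential fractional weights and the penalty grid are assumptions made for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

def svem_lasso(X, y, X_new, n_boot=100, alphas=np.logspace(-3, 1, 20),
               rng=np.random.default_rng(0)):
    """Average over bootstraps of anti-correlated fractional weights; in each
    bootstrap the penalty is tuned against the validation weights of the same draw."""
    preds = []
    n = len(y)
    for _ in range(n_boot):
        u = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
        w_train, w_valid = -np.log(u), -np.log1p(-u)    # anti-correlated weights
        best_model, best_err = None, np.inf
        for a in alphas:
            model = Lasso(alpha=a, max_iter=10_000)
            model.fit(X, y, sample_weight=w_train)      # "training" fit
            err = np.average((y - model.predict(X)) ** 2, weights=w_valid)
            if err < best_err:                          # "validation" selection
                best_model, best_err = model, err
        preds.append(best_model.predict(X_new))
    return np.mean(preds, axis=0)                       # ensemble (bagged) prediction
```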
We introduce a notion of "simulation" for labelled graphs, in which edges of the simulated graph are realized by regular expressions in the simulating graph, and prove that the tiling problem (aka "domino problem") for the simulating graph is at least as difficult as that for the simulated graph. We apply this to the Cayley graph of the "lamplighter group" $L=\mathbb Z/2\wr\mathbb Z$, and more generally to "Diestel-Leader graphs". We prove that these graphs simulate the plane, and thus deduce that the seeded tiling problem is unsolvable on the group $L$. We note that $L$ does not contain any plane in its Cayley graph, so our undecidability criterion by simulation covers cases not covered by Jeandel's criterion based on translation-like action of a product of finitely generated infinite groups. Our approach to tiling problems is strongly based on categorical constructions in graph theory.
Evolving diverse sets of high-quality solutions has gained increasing interest in the evolutionary computation literature in recent years. With this paper, we contribute to this area of research by examining evolutionary diversity optimisation approaches for the classical Traveling Salesperson Problem (TSP). We study the impact of using different diversity measures for a given set of tours and the ability of evolutionary algorithms to obtain a diverse set of high-quality solutions when adopting these measures. Our studies show that a large variety of diverse high-quality tours can be achieved by using our approaches. Furthermore, we compare our approaches in terms of theoretical properties and the final set of tours obtained by the evolutionary diversity optimisation algorithm.
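As an example of a diversity measure for a set of tours (one plausible choice, not necessarily among the measures studied in the paper), the following Python snippet computes the average pairwise edge-set distance, i.e. the average fraction of edges that two tours do not share:

```python
import itertools
import numpy as np

def tour_edges(tour):
    """Undirected edge set of a tour given as a permutation of city indices."""
    return {frozenset((tour[i], tour[(i + 1) % len(tour)])) for i in range(len(tour))}

def edge_diversity(tours):
    """Average pairwise fraction of non-shared edges over all pairs of tours."""
    dists = [1 - len(tour_edges(a) & tour_edges(b)) / len(tour_edges(a))
             for a, b in itertools.combinations(tours, 2)]
    return float(np.mean(dists))

tours = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [0, 1, 3, 2, 4]]
print(edge_diversity(tours))
```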
Accurate forecasting is one of the fundamental concerns in the econometric time-series literature. Practitioners and policy makers often want to predict outcomes over an entire future time horizon rather than just a single $k$-step-ahead point. These series, apart from their own possible non-linear dependence, are often also influenced by many external predictors. In this paper, we construct prediction intervals for time-aggregated forecasts in a high-dimensional regression setting. Our approach is based on quantiles of residuals obtained by the popular LASSO routine. We allow for general heavy-tailed, long-memory, and nonlinear stationary error processes and stochastic predictors. Through a series of systematically arranged consistency results, we provide theoretical guarantees for our proposed quantile-based method in all of these scenarios. After validating our approach using simulations, we also propose a novel bootstrap-based method that can boost the coverage of the theoretical intervals. Finally, analyzing EPEX Spot data, we construct prediction intervals for hourly electricity prices over horizons spanning 17 weeks and contrast them with selected Bayesian and bootstrap interval forecasts.
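A stylized Python sketch of the quantile-of-residuals construction, with assumed simplifications (a plain Lasso regression on exogenous predictors and non-overlapping in-sample blocks of length $h$ to approximate the distribution of the aggregated forecast error):

```python
import numpy as np
from sklearn.linear_model import Lasso

def aggregated_prediction_interval(X, y, X_future, h, alpha=0.1, lasso_alpha=0.01):
    """Interval for the sum of the next h outcomes, built from quantiles of
    in-sample time-aggregated Lasso residuals."""
    model = Lasso(alpha=lasso_alpha, max_iter=10_000).fit(X, y)
    resid = y - model.predict(X)
    # time-aggregated residuals over non-overlapping in-sample blocks of length h
    n_blocks = len(resid) // h
    block_sums = resid[: n_blocks * h].reshape(n_blocks, h).sum(axis=1)
    point = model.predict(X_future[:h]).sum()            # aggregated point forecast
    lo, hi = np.quantile(block_sums, [alpha / 2, 1 - alpha / 2])
    return point + lo, point + hi
```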
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks, while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model"), is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground-truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to deal with discourse phenomena that are problematic for the Transformer remains a challenge. In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. As large-scale document-level parallel corpora are usually not available, we introduce a two-step training method to take full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English and IWSLT French-English datasets show that our approach significantly improves over the Transformer.
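A schematic Python (PyTorch) sketch of the assumed second training step: only the parameters of the newly added context encoder (identified here by a hypothetical name prefix) are updated on the limited document-level data, while the pretrained sentence-level Transformer parameters stay frozen.

```python
import torch

def second_step_optimizer(model, new_module_prefix="context_encoder", lr=1e-4):
    """Freeze all pretrained parameters and return an optimizer over the newly
    added document-level context parameters only (prefix is an assumed naming)."""
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(new_module_prefix)
    new_params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(new_params, lr=lr)
```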