Technological advancements have made it possible to deliver mobile health interventions to individuals. A novel framework that has emerged from such advancements is the just-in-time adaptive intervention (JITAI), which aims to deliver the right type of support to individuals at the moment their needs arise. The micro-randomized trial (MRT) design has recently been proposed to test the proximal effects of these JITAIs. However, the extant MRT framework only considers components with a fixed number of categories, all added at the beginning of the study. We propose a flexible MRT (FlexiMRT) design which allows new categories to be added to the components during the study. The proposed design is motivated by our collaboration on the DIAMANTE study, which learns to deliver text messages that encourage physical activity among patients with diabetes and depression. We developed a new test statistic and the corresponding sample size calculator for the FlexiMRT using an approach similar to the generalized estimating equation (GEE) method for longitudinal data. Simulation studies were conducted to evaluate the sample size calculator, and an R Shiny application implementing it was developed.
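As a rough point of reference for the kind of calculation involved, the sketch below computes an MRT-style sample size from a normal approximation under working independence; it is only a hedged illustration with made-up inputs, not the FlexiMRT calculator developed in the paper.

    import math
    from scipy.stats import norm

    # Rough normal-approximation sample size for detecting a standardized proximal
    # effect in a micro-randomized trial; illustrative only, not the FlexiMRT method.
    def approx_mrt_sample_size(d, T, p=0.5, alpha=0.05, power=0.8):
        # d: standardized proximal effect; T: decision points per participant;
        # p: randomization probability of the treatment category.
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        per_person_information = T * p * (1 - p) * d ** 2
        return math.ceil(z ** 2 / per_person_information)

    print(approx_mrt_sample_size(d=0.1, T=210, p=0.6))  # about 16 participants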
Strategic test allocation plays a major role in the control of both emerging and existing pandemics (e.g., COVID-19, HIV). Widespread testing supports effective epidemic control by (1) reducing transmission via identifying cases, and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest, an individual's positive infectious status, is often a latent variable. In addition, the presence of both network and temporal dependence reduces the data to a single observation. As testing entire populations regularly is neither efficient nor feasible, standard approaches to testing recommend simple rule-based strategies (e.g., symptom-based testing, contact tracing) that do not take individual risk into account. In this work, we study an adaptive sequential design involving $n$ individuals over a period of $\tau$ time-steps, which allows for unspecified dependence among individuals and across time. Our causal target parameter is the mean latent outcome we would have obtained after one time-step if, starting at time $t$ given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. We propose an Online Super Learner for adaptive sequential surveillance that learns the optimal choice of testing strategies over time while adapting to the current state of the outbreak. Relying on a series of working models, the proposed method learns across samples, through time, or both, depending on the underlying (unknown) structure of the data. We present an identification result for the latent outcome in terms of the observed data, and demonstrate the superior performance of the proposed strategy in a simulation modeling a residential university environment during the COVID-19 pandemic.
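To make the resource-constrained allocation concrete, here is a minimal sketch (not the paper's Online Super Learner) that spends a fixed testing budget either deterministically on the highest predicted risks or stochastically in proportion to risk; the risk scores are simulated stand-ins for a fitted model.

    import numpy as np

    # Toy allocation of a fixed testing budget given model-predicted infection risk.
    rng = np.random.default_rng(0)
    n, budget = 500, 50
    predicted_risk = rng.beta(2, 20, size=n)            # stand-in for a fitted risk model

    top_k_tests = np.argsort(predicted_risk)[-budget:]  # deterministic: test the riskiest k

    probs = predicted_risk / predicted_risk.sum()       # stochastic intervention analogue:
    stochastic_tests = rng.choice(n, size=budget, replace=False, p=probs)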
A new unimodal distribution family indexed by the mode and three other parameters is derived from a mixture of a Gumbel distribution for the maximum and a Gumbel distribution for the minimum. Properties of the proposed distribution are explored, including model identifiability and flexibility in capturing heavy-tailed data that exhibit different directions of skewness over a wide range. Both frequentist and Bayesian methods are developed to infer the parameters of the new distribution. Simulation studies are conducted to demonstrate satisfactory performance of both methods. By fitting the proposed model to simulated data and data from an application in hydrology, we show that the proposed flexible distribution is especially suitable for data containing extreme values in either direction, with the mode being a location parameter of interest. A regression model for the mode of a response given covariates can be readily formulated based on the proposed unimodal distribution; we apply such a model to data from an application in criminology to reveal interesting features that are obscured by outliers.
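For intuition about the building blocks, the sketch below evaluates the density of a simple two-component mixture of a max-Gumbel and a min-Gumbel using scipy; the weights and parameters are arbitrary, and this is not the mode-parameterized family derived in the paper.

    import numpy as np
    from scipy.stats import gumbel_r, gumbel_l

    # Density of an illustrative mixture of a Gumbel (maximum) and a Gumbel (minimum).
    def gumbel_mixture_pdf(x, w, loc_max, scale_max, loc_min, scale_min):
        return (w * gumbel_r.pdf(x, loc=loc_max, scale=scale_max)
                + (1 - w) * gumbel_l.pdf(x, loc=loc_min, scale=scale_min))

    x = np.linspace(-10, 10, 401)
    dens = gumbel_mixture_pdf(x, w=0.6, loc_max=0.0, scale_max=1.5, loc_min=0.0, scale_min=1.0)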
Emerging connected vehicle (CV) data sets that have recently become commercially available enable analysts to develop a variety of powerful performance measures without the need to deploy field infrastructure. This paper presents several tools that use CV data to evaluate the quality of signal progression. These include both performance measures for high-level analysis and visualizations for examining the details of coordinated operation. With CV data, it is possible to assess not only the movement of traffic on the corridor but also its origin-destination (O-D) path through the corridor, and the tools can be applied to selected O-D paths or to all O-D paths in the corridor. Results for real-world operation of an eight-intersection signalized arterial are presented. A series of high-level performance measures are used to evaluate overall performance by time of day and direction, with differing results by metric. Next, the details of the operation are examined with two visualization tools: a cyclic time-space diagram and an empirical platoon progression diagram. Comparing visualizations of only end-to-end journeys on the corridor with those of all journeys reveals several features that are visible only in the latter. The study demonstrates the utility of CV trajectory data both for obtaining high-level performance measures and for drilling down into the details of operation.
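The transform behind a cyclic time-space diagram is simply wrapping trajectory timestamps by the signal cycle length so that successive cycles overlay one another; the sketch below shows this with made-up trajectory points and an assumed 120 s cycle.

    import numpy as np

    # Wrap trajectory timestamps by the cycle length to build a cyclic time-space diagram.
    cycle_length_s = 120.0                                        # assumed cycle length
    timestamps_s = np.array([10.0, 95.0, 130.0, 250.0, 371.0])    # seconds past a reference time
    distance_m = np.array([0.0, 400.0, 650.0, 1500.0, 2300.0])    # position along the corridor

    cyclic_time_s = np.mod(timestamps_s, cycle_length_s)
    # Plot cyclic_time_s (x-axis) against distance_m (y-axis); platoons that arrive on
    # green in successive cycles then stack on top of each other.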
Non-Fungible Token (NFT) marketplaces on the Ethereum blockchain saw astonishing growth in 2021, and the trend shows no sign of stopping, with a monthly trading volume of \$6 billion in January 2022. However, questions have arisen about such a high trading volume. The primary concern is wash trading, a market manipulation in which a single entity trades an NFT multiple times to artificially inflate the volume. This paper describes several methodologies for identifying wash trading on Ethereum, from its inception to January 2022, and explores its tangible impact on NFTs. We find that 5.66% of all collections are affected by wash trading, with a total artificial volume of \$3,406,110,774. We study two different ways of profiting from wash trading: increasing the price of NFTs by showing artificial interest in the asset, and exploiting the reward token system of some marketplaces. We show that the latter is safer for wash traders since it guarantees a higher expected profit. Our findings indicate that wash trading is a frequent event in the blockchain ecosystem, that reward token systems can stimulate market manipulation, and that marketplaces can introduce countermeasures using the methodologies described in this paper.
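One simple heuristic in this spirit, shown below with toy data, flags a token whose trades form a closed loop among a small set of addresses; the paper's detection rules are more comprehensive, so treat this only as a sketch.

    import networkx as nx

    # Flag closed trading loops for a single NFT: addresses that both sell and (directly
    # or indirectly) buy back the same token are candidates for wash trading.
    trades = [("0xA", "0xB"), ("0xB", "0xA"), ("0xA", "0xC")]   # (seller, buyer), toy data
    g = nx.DiGraph(trades)
    suspicious_groups = [c for c in nx.strongly_connected_components(g) if len(c) > 1]
    print(suspicious_groups)  # [{'0xA', '0xB'}]: the token circulated in a loop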
Background: The COVID-19 pandemic has had a profound impact on health, everyday life, and economies around the world. An important complication that can arise in connection with a COVID-19 infection is acute kidney injury. A recent observational cohort study of COVID-19 patients treated at multiple sites of a tertiary care center in Berlin, Germany identified risk factors for the development of (severe) acute kidney injury. Since inference based on a single study can be unreliable, we validate these findings and potentially adjust the results by including external information from other studies on acute kidney injury and COVID-19. Methods: We synthesize the results of the main study with those of other trials via a Bayesian meta-analysis. The external information is used to construct a predictive distribution and to derive posterior estimates for the study of interest. We focus on important potential risk factors for the development of acute kidney injury, such as mechanical ventilation, use of vasopressors, hypertension, obesity, diabetes, gender, and smoking. Results: Our results show that, depending on the degree of heterogeneity in the data, the estimated effect sizes may be refined considerably by including external data. Our findings confirm that mechanical ventilation and the use of vasopressors are important risk factors for the development of acute kidney injury in COVID-19 patients. Hypertension also appears to be a risk factor that should not be ignored. Shrinkage weights depended to a large extent on the estimated heterogeneity in the model. Conclusions: Our work shows how external information can be used to adjust the results of a primary study using a Bayesian meta-analytic approach. How much information is borrowed from external studies depends on the degree of heterogeneity present in the model.
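The basic shrinkage mechanism can be illustrated with a normal-normal calculation: external log-odds-ratio estimates define a predictive distribution toward which the primary-study estimate is pulled. The numbers and the fixed heterogeneity below are illustrative assumptions, not results from the paper.

    import numpy as np

    # Normal-normal shrinkage of a primary-study log-odds ratio toward a predictive
    # distribution built from external studies (fixed between-study SD tau; toy numbers).
    theta_new, se_new = 0.9, 0.35                   # primary study estimate and SE
    theta_ext = np.array([0.6, 1.1, 0.8])           # external study estimates
    se_ext = np.array([0.30, 0.40, 0.25])
    tau = 0.2                                       # assumed heterogeneity

    w_ext = 1.0 / (se_ext**2 + tau**2)
    mu_pred = np.sum(w_ext * theta_ext) / np.sum(w_ext)
    var_pred = 1.0 / np.sum(w_ext) + tau**2         # predictive variance for a new study

    shrinkage = (1.0 / var_pred) / (1.0 / var_pred + 1.0 / se_new**2)
    theta_post = (1 - shrinkage) * theta_new + shrinkage * mu_pred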
Mainstream methods for clinical trial design do not yet use prior probabilities of clinical hypotheses, mainly due to a concern that poor priors may lead to weak designs. To address this concern, we illustrate a conservative approach to trial design that ensures the frequentist operating characteristics of the primary trial outcome are stronger than the design prior. Compared with current approaches to Bayesian design, we focus on defining a sample size cost commensurate with the prior, to guard against the possibility of prior-data conflict. Our approach is ethical, in that it calls for quantification of the level of clinical equipoise at the design stage and requires the design to be capable of disturbing this initial equipoise by a pre-specified amount. Four examples are discussed, illustrating the design of phase II-III trials with binary or time-to-event endpoints. The resulting sample sizes are shown to be conducive to strong levels of overall evidence, whether positive or negative, increasing the conclusiveness of the design and the associated trial outcome. The levels of negative evidence provided by standard group sequential designs are found to be negligible, underscoring the importance of complementing traditional efficacy boundaries with futility rules.
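A standard ingredient of such designs is the assurance, i.e., frequentist power averaged over the design prior; the Monte Carlo sketch below computes it for a two-arm trial with a normal endpoint under an assumed prior, and is not the paper's specific conservative construction.

    import numpy as np
    from scipy.stats import norm

    # Assurance: expected power under the design prior for a two-arm trial with a
    # normal endpoint (one-sided test). All numbers are illustrative.
    rng = np.random.default_rng(1)
    n_per_arm, sigma, alpha = 120, 1.0, 0.025
    prior_mean, prior_sd = 0.25, 0.15               # design prior on the treatment effect

    effects = rng.normal(prior_mean, prior_sd, size=20_000)
    se = sigma * np.sqrt(2 / n_per_arm)
    power_given_effect = 1 - norm.cdf(norm.ppf(1 - alpha) - effects / se)
    assurance = power_given_effect.mean()
    print(round(float(assurance), 3))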
Most published work on differential privacy (DP) focuses exclusively on meeting privacy constraints by adding noise to the query drawn from a pre-specified parametric distribution, typically with one or two degrees of freedom. The accuracy of the response and its utility for the intended use are frequently overlooked. Considering that many database queries are categorical in nature (e.g., a label, a ranking, etc.), or can be quantized, the parameters that define the randomized mechanism's distribution are finite in number. Thus, it is reasonable to search, through numerical optimization, for the probability masses that meet the privacy constraints while minimizing the query distortion. Considering the modulo summation of random noise as the DP mechanism, the goal of this paper is to introduce a tractable framework to design the optimum noise probability mass function (PMF) for database queries with a discrete and finite answer set, optimizing an expected distortion metric for a given $(\epsilon,\delta)$. We first show that the optimum PMF can be obtained by solving a mixed integer linear program (MILP). Then, we derive closed-form solutions for the optimum PMF that minimizes the probability of error in two special cases. We show numerically that the proposed optimal mechanisms significantly outperform the state of the art.
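For a candidate noise PMF on the integers modulo $k$, the $(\epsilon,\delta)$ guarantee of the modulo-addition mechanism can be checked directly, since changing the query value only shifts the output distribution cyclically; the sketch below performs that feasibility check on a toy PMF (it does not solve the MILP from the paper).

    import numpy as np

    # Smallest delta achieved at a given epsilon by a modulo-k additive-noise mechanism
    # with noise PMF `pmf`, for queries whose value can change by up to `sensitivity`.
    def dp_delta(pmf, eps, sensitivity):
        pmf = np.asarray(pmf, dtype=float)
        worst = 0.0
        for d in range(1, sensitivity + 1):
            shifted = np.roll(pmf, d)   # output PMF after the query shifts by d (mod k)
            worst = max(worst,
                        np.maximum(pmf - np.exp(eps) * shifted, 0.0).sum(),
                        np.maximum(shifted - np.exp(eps) * pmf, 0.0).sum())
        return worst

    pmf = np.array([0.5, 0.2, 0.05, 0.0, 0.0, 0.0, 0.05, 0.2])   # k = 8, mass centered at 0
    print(dp_delta(pmf, eps=1.0, sensitivity=1))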
Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from poor resolution scalability, i.e., performance drops drastically when they are presented with input resolutions unseen during training. We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training, for improved performance on a wide spectrum of, mostly unseen, testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales. More importantly, to alternate among varying resolutions, we propose a global-local positional embedding strategy that changes smoothly conditioned on input sizes. This allows ResFormer to cope with novel resolutions effectively. We conduct extensive experiments for image classification on ImageNet. The results provide strong quantitative evidence that ResFormer has promising scaling ability across a wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1 accuracy of 75.86% and 81.72% when evaluated on relatively low and high resolutions, respectively (i.e., 96 and 640), which are 48% and 7.49% better than DeiT-B. We also demonstrate, among other things, that ResFormer is flexible and can be easily extended to semantic segmentation and video action recognition.
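A common way to implement a scale consistency objective is a KL divergence between class predictions produced at two resolutions, with the higher resolution acting as the teacher; the PyTorch sketch below follows that generic formulation and may differ from the exact loss used by ResFormer.

    import torch
    import torch.nn.functional as F

    # Generic scale-consistency loss: align low-resolution predictions with
    # (detached) high-resolution predictions via KL divergence.
    def scale_consistency_loss(logits_low, logits_high, tau=1.0):
        teacher = F.softmax(logits_high.detach() / tau, dim=-1)
        student_log = F.log_softmax(logits_low / tau, dim=-1)
        return F.kl_div(student_log, teacher, reduction="batchmean") * tau ** 2

    logits_640 = torch.randn(8, 1000)   # predictions from the high-resolution replica
    logits_96 = torch.randn(8, 1000)    # predictions from the low-resolution replica
    loss = scale_consistency_loss(logits_96, logits_640)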
Multifidelity methods are widely used for estimating quantities of interest (QoI) in computational science by employing numerical simulations of differing costs and accuracies. Many methods approximate numerical-valued statistics, e.g., scalar statistics, that represent only limited information about the QoI. Further quantification of uncertainty, e.g., for risk assessment, failure probabilities, or confidence intervals, requires estimation of the full distribution. In this paper, we generalize the ideas in [Xu et al., SIAM J. Sci. Comput. 44.1 (2022), A150-A175] to develop a multifidelity method that approximates the full distribution of scalar-valued QoI. The main advantage of our approach compared with alternative methods is that we require no particular relationship among the high- and lower-fidelity models (e.g., a model hierarchy), and we do not assume any knowledge of model statistics, including correlations and other cross-model statistics, before the procedure starts. Under suitable assumptions within this framework, we achieve provable convergence in the 1-Wasserstein metric of an algorithmically constructed distributional emulator via an exploration-exploitation strategy. We also prove that crucial policy actions taken by our algorithm are budget-asymptotically optimal. Numerical experiments are provided to support our theoretical analysis.
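The accuracy notion used here, the 1-Wasserstein distance between the emulated and reference distributions of the QoI, is easy to compute empirically; the sketch below evaluates it for two synthetic samples standing in for a small set of high-fidelity runs and a cheap emulator.

    import numpy as np
    from scipy.stats import wasserstein_distance

    # Empirical 1-Wasserstein distance between two samples of a scalar QoI.
    rng = np.random.default_rng(2)
    high_fidelity_qoi = rng.normal(0.0, 1.0, size=200)       # few expensive model runs
    emulator_qoi = rng.normal(0.1, 1.1, size=20_000)         # many cheap emulator draws
    print(wasserstein_distance(high_fidelity_qoi, emulator_qoi))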
Some classical uncertainty quantification problems require the estimation of multiple expectations. Estimating all of them accurately is crucial, as it can have a major impact on the downstream analysis, and standard Monte Carlo methods can be costly for this task. We propose a new procedure, based on importance sampling and control variates, for estimating multiple expectations more efficiently from the same sample. We first show that there exists a family of optimal estimators combining both importance sampling and control variates, which, however, cannot be used in practice because they require knowledge of the very expectations to be estimated. Motivated by the form of these optimal estimators and some of their interesting properties, we therefore propose an adaptive algorithm. The general idea is to adaptively update the parameters of the estimators so that they approach the optimal ones. We then suggest a quantitative stopping criterion that exploits the trade-off between approaching these optimal parameters and retaining a sufficient remaining budget. This remaining budget is then used to draw a new independent sample from the final sampling distribution, yielding unbiased estimators of the expectations. We show how to apply our procedure to sensitivity analysis, by estimating Sobol' indices and quantifying the impact of the input distributions. Finally, realistic test cases show the practical value of the proposed algorithm and its significant improvement over estimating the expectations separately.
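For a single expectation, combining importance sampling with a control variate looks as follows; the toy example below estimates a Gaussian tail probability with a proposal shifted toward the tail and a linear control variate with known mean, whereas the paper's adaptive scheme handles several expectations and updates the parameters on the fly.

    import numpy as np

    # Importance sampling with a control variate for E_p[f(X)], p = N(0,1), f = 1{X > 2}.
    rng = np.random.default_rng(3)
    n = 100_000
    x = rng.normal(1.0, 1.0, size=n)                             # proposal q = N(1, 1)
    w = np.exp(-0.5 * x**2) / np.exp(-0.5 * (x - 1.0) ** 2)      # density ratio p/q
    f = (x > 2.0).astype(float)

    h = x - 1.0                                                  # control variate, E_q[h] = 0
    fw = f * w
    beta = np.cov(fw, h)[0, 1] / np.var(h)                       # estimated optimal coefficient
    estimate = np.mean(fw - beta * h)                            # compare with 1 - Phi(2) ~ 0.0228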