Many classical inferential approaches fail when interference exists among the population units, that is, when the treatment status of one unit affects the potential outcomes of other units. In this setting, the null hypothesis of no spillover effects is non-sharp. An interesting approach to tackling the non-sharp nature of the null hypothesis is to construct conditional randomization tests such that the null is sharp on a restricted sub-population. In randomized experiments, such conditional randomization tests retain finite-sample validity. However, these approaches can pose computational challenges, as finding the appropriate sub-populations based on the experimental design can involve solving an NP-hard problem. In this paper, we view the network among the population units as a random variable rather than as fixed. We propose a new approach that builds a conditional quasi-randomization test. Our main idea is to build the (non-sharp) null distribution of no spillover effects using random graph null models. We show that our method is exactly valid in finite samples under mild assumptions. Our method displays enhanced power over other methods, with substantial improvements in complex experimental designs. We highlight that the method reduces to a simple permutation test, making it easy to implement in practice. We conduct a simulation study to verify the finite-sample validity of our approach and illustrate our methodology by testing for interference in a weather insurance adoption experiment run in rural China.
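The abstract notes that the proposed test reduces to a simple permutation test. As a rough illustration of that mechanic only, and not of the authors' conditional quasi-randomization procedure (which draws from a random-graph null model on a conditioned sub-population), here is a generic permutation test of a difference-in-means statistic with hypothetical outcomes and exposure labels:

```python
import numpy as np

def permutation_test(y, z, n_perm=2000, seed=None):
    """Generic two-group permutation test of a difference-in-means statistic.

    y : outcomes for the (conditioned) sub-population
    z : binary exposure labels (e.g., exposed vs. not exposed to treated neighbours)
    Returns a two-sided p-value obtained by uniformly re-randomizing the labels.
    """
    rng = np.random.default_rng(seed)
    y, z = np.asarray(y, float), np.asarray(z, int)
    stat = lambda labels: y[labels == 1].mean() - y[labels == 0].mean()
    observed = stat(z)
    null_stats = np.array([stat(rng.permutation(z)) for _ in range(n_perm)])
    return (1 + np.sum(np.abs(null_stats) >= abs(observed))) / (1 + n_perm)
```

In the paper's setting the labels would be re-drawn from the quasi-randomization distribution implied by the random graph null model rather than permuted uniformly; the sketch only shows the basic resampling logic.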
In this paper, we consider an experimental setting where units enter the experiment sequentially. Our goal is to form stopping rules that lead to estimators of treatment effects with a given precision. We propose a fixed-width confidence interval design (FWCID) in which the experiment terminates once a pre-specified confidence interval width is achieved. We show that under this design, the difference-in-means estimator is a consistent estimator of the average treatment effect and that standard confidence intervals have asymptotic guarantees of coverage and efficiency for several versions of the design. In addition, we propose a version of the design that we call the fixed power design (FPD), where a given power is asymptotically guaranteed for a given treatment effect without the need to specify the variances of the outcomes under treatment or control. This design also yields a consistent difference-in-means estimator with correct coverage of the corresponding standard confidence interval. We complement our theoretical findings with Monte Carlo simulations in which we compare our proposed designs with standard designs in the sequential experiments literature, showing that our designs outperform them in several important aspects. We believe our results to be relevant for many experimental settings where units enter sequentially, such as clinical trials, as well as online A/B tests used by the tech and e-commerce industry.
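To make the fixed-width stopping idea concrete, the following is a minimal sketch, not the paper's exact FWCID design: units are assigned alternately to two arms, and sampling stops once the standard normal confidence interval for the difference in means is narrower than a target width. The outcome model and all parameter values are hypothetical.

```python
import numpy as np
from scipy import stats

def fwcid_experiment(draw_outcome, target_width, alpha=0.05, min_n=10, max_n=100_000):
    """Illustrative fixed-width stopping rule (a sketch, not the paper's design).

    draw_outcome(arm) returns one outcome for arm 0 (control) or 1 (treatment).
    Units are assigned alternately; the experiment stops once the two-sided
    (1 - alpha) normal confidence interval for the difference in means is at
    most `target_width` wide.
    """
    y = {0: [], 1: []}
    z = stats.norm.ppf(1 - alpha / 2)
    width = np.inf
    for n in range(1, max_n + 1):
        arm = n % 2
        y[arm].append(draw_outcome(arm))
        if min(len(y[0]), len(y[1])) >= min_n:
            se = np.sqrt(np.var(y[1], ddof=1) / len(y[1]) +
                         np.var(y[0], ddof=1) / len(y[0]))
            width = 2 * z * se
            if width <= target_width:
                break
    return np.mean(y[1]) - np.mean(y[0]), width, n

# Example with hypothetical normal outcomes and a true effect of 0.5.
rng = np.random.default_rng(0)
est, width, n = fwcid_experiment(lambda arm: rng.normal(0.5 * arm, 1.0), target_width=0.4)
print(est, width, n)
```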
First-order shape optimization methods generally require a large number of iterations to reach a locally optimal design. While higher-order methods can significantly reduce the number of iterations, they exhibit only local convergence, necessitating a sufficiently close initial guess. In this work, we present an unregularized shape-Newton method and combine shape optimization with homotopy (or continuation) methods in order to allow the use of higher-order methods even if the initial design is far from a solution. The idea of homotopy methods is to continuously connect the problem of interest with a simpler problem and to follow the corresponding solution path with a predictor-corrector scheme. We use a shape-Newton method as the corrector and arbitrary-order shape derivatives for the predictor. Moreover, we also apply homotopy methods to multi-objective shape optimization to efficiently obtain well-distributed points on a Pareto front. Finally, our results are substantiated with a set of numerical experiments.
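To illustrate the predictor-corrector idea in the simplest possible setting, the sketch below applies Euler-predictor/Newton-corrector continuation to a scalar root-finding homotopy. It shows only the general scheme, not the shape-Newton method or shape derivatives developed in the paper; the example problem is made up.

```python
import numpy as np

def homotopy_newton(F, dF_dx, dF_dt, x0, steps=20, newton_iters=5):
    """Minimal predictor-corrector continuation for F(x, t) = 0, t in [0, 1]."""
    x, t = float(x0), 0.0
    dt = 1.0 / steps
    for _ in range(steps):
        # Predictor: follow the tangent dx/dt = -F_t / F_x of the solution path.
        x = x - dF_dt(x, t) / dF_dx(x, t) * dt
        t += dt
        # Corrector: a few Newton iterations on x -> F(x, t) at the new parameter value.
        for _ in range(newton_iters):
            x = x - F(x, t) / dF_dx(x, t)
    return x

# Example: deform the easy problem x - 1 = 0 into x**3 + x - 3 = 0.
F = lambda x, t: (1 - t) * (x - 1) + t * (x**3 + x - 3)
dF_dx = lambda x, t: (1 - t) + t * (3 * x**2 + 1)
dF_dt = lambda x, t: (x**3 + x - 3) - (x - 1)
print(homotopy_newton(F, dF_dx, dF_dt, x0=1.0))  # root near 1.213
```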
We study the properties of a family of distances between functions of a single variable. These distances are examples of integral probability metrics, and have been used previously for comparing probability measures on the line; special cases include the Earth Mover's Distance and the Kolmogorov Metric. We examine their properties for general signals, proving that they are robust to a broad class of deformations. We also establish corresponding robustness results for the induced sliced distances between multivariate functions. Finally, we establish error bounds for approximating the univariate metrics from finite samples, and prove that these approximations are robust to additive Gaussian noise. The results are illustrated in numerical experiments, which include comparisons with Wasserstein distances.
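Two of the special cases mentioned above, the Earth Mover's Distance and the Kolmogorov metric, can be computed between one-dimensional samples directly from empirical CDFs. The following sketch is only an illustration of that sample-based computation; the paper treats general univariate signals and establishes the corresponding approximation and robustness guarantees.

```python
import numpy as np

def emd_and_kolmogorov(x, y):
    """Empirical 1-D distances between two samples via their empirical CDFs.

    Returns the Earth Mover's (1-Wasserstein) distance, i.e. the integral of
    |F_x - F_y|, and the Kolmogorov metric sup |F_x - F_y|.
    """
    grid = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    gaps = np.abs(Fx - Fy)
    emd = np.sum(gaps[:-1] * np.diff(grid))  # area between the step CDFs
    kolmogorov = gaps.max()
    return emd, kolmogorov

rng = np.random.default_rng(0)
print(emd_and_kolmogorov(rng.normal(0, 1, 500), rng.normal(0.3, 1, 500)))
```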
The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets that allows a granular evaluation of text-generative vision-language models and their comparison with discriminative vision-language models. To improve the assessment of coarse answers on fine-grained classification tasks, we suggest using the semantic hierarchy of the label space to automatically generate follow-up questions about the ground-truth category. Finally, we compare traditional NLP and LLM-based metrics for the problem of evaluating model predictions given ground-truth answers, and we perform a human evaluation study on which we base our choice of the final metric. We apply our benchmark to a suite of vision-language models and present a detailed comparison of their abilities on object, action, and attribute classification. Our contributions aim to lay the foundation for more precise and meaningful assessments, facilitating targeted progress in the exciting field of vision-language modeling.
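The hierarchy-driven follow-up questioning can be illustrated with a toy sketch. The label tree and question wording below are hypothetical and are not the benchmark's actual taxonomy or templates; the sketch only shows the idea of walking the hierarchy when a model's answer is a correct but coarse ancestor of the ground-truth label.

```python
# Toy child -> parent hierarchy (hypothetical, for illustration only).
hierarchy = {
    "golden retriever": "dog", "labrador": "dog", "dog": "animal", "cat": "animal",
}
children = {}
for child, parent in hierarchy.items():
    children.setdefault(parent, []).append(child)

def follow_up_question(model_answer, ground_truth):
    """Return a follow-up question if the answer is a correct but coarse ancestor."""
    node, ancestors = ground_truth, []
    while node in hierarchy:
        node = hierarchy[node]
        ancestors.append(node)
    if model_answer not in ancestors:
        return None  # answer is exact, wrong, or off-hierarchy: no follow-up needed
    options = ", ".join(children[model_answer])
    return f"Which kind of {model_answer} is shown in the image? Options: {options}."

print(follow_up_question("dog", "golden retriever"))
```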
I review some of the main methods for selecting tuning parameters in nonparametric and $\ell_1$-penalized estimation. For nonparametric estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation, penalization, and aggregation in the context of series estimation. For $\ell_1$-penalized estimation, I consider methods based on the theory of self-normalized moderate deviations, the bootstrap, Stein's unbiased risk estimation, and cross-validation in the context of Lasso estimation. I explain the intuition behind each of the methods and discuss their comparative advantages. I also discuss some extensions.
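As a concrete example of one of the reviewed selectors, the sketch below uses K-fold cross-validation to pick the Lasso penalty on simulated data via scikit-learn's LassoCV. The data-generating process and settings are illustrative only, not recommendations from the review.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulate a sparse linear model: only the first 5 of 50 coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.normal(size=n)

# 5-fold cross-validation over an automatically generated grid of penalty levels.
lasso = LassoCV(cv=5).fit(X, y)
print("selected penalty:", lasso.alpha_)
print("nonzero coefficients:", np.sum(lasso.coef_ != 0))
```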
In the mixture-of-experts model, a common assumption is linearity between the response variable and the covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to capture nonlinear relationships. We establish the identifiability of the proposed model under mild conditions and introduce a practical estimation algorithm. We demonstrate the performance of our approach through numerical studies, including simulations and real data analysis.
When multitudes of features can plausibly be associated with a response, both privacy considerations and model parsimony suggest grouping them to increase the predictive power of a regression model. Specifically, identifying groups of predictors significantly associated with the response variable eases further downstream analysis and decision-making. This paper proposes a new data analysis methodology that utilizes the high-dimensional predictor space to construct an implicit network with weighted edges in order to identify significant associations between the response and the predictors. Using a population model for groups of predictors defined via network-wide metrics, a new supervised grouping algorithm is proposed that determines the correct groups with probability tending to one as the sample size diverges to infinity. To this end, we establish several theoretical properties of the estimates of the network-wide metrics. A novel model-assisted bootstrap procedure that substantially decreases computational complexity is developed, facilitating the assessment of uncertainty in the estimates of the network-wide metrics. The proposed methods account for several challenges that arise in the high-dimensional data setting, including (i) a large number of predictors, (ii) uncertainty regarding the true statistical model, and (iii) model selection variability. The performance of the proposed methods is demonstrated through numerical experiments, data from sports analytics, and breast cancer data.
We consider the computation of statistical moments of solutions to operator equations with stochastic data. We remark that applying physics-informed neural networks (PINNs) -- referred to as TPINNs -- allows us to solve the induced tensor operator equations with minimal changes to existing PINN code, while enabling the handling of non-linear and time-dependent operators. We propose two types of architectures, referred to as vanilla and multi-output TPINNs, and investigate their benefits and limitations. Exhaustive numerical experiments are performed, demonstrating applicability and performance and raising a variety of promising new research avenues.
Central Bank Digital Currency (CBDC) is a novel form of money that could be issued and regulated by central banks, offering benefits such as programmability, security, and privacy. However, the design of a CBDC system presents numerous technical and social challenges. This paper presents the design and prototype of a non-custodial wallet, a device that enables users to store and spend CBDC in various contexts. To address the challenges of designing a CBDC system, we conducted a series of workshops with internal and external stakeholders, using methods such as storytelling, metaphors, and provotypes to communicate CBDC concepts, elicit user feedback and critique, and incorporate normative values into the technical design. We derived basic guidelines for designing CBDC systems that balance technical and social aspects, and reflect user needs and values. Our paper contributes to the CBDC discourse by demonstrating a practical example of how CBDC could be used in everyday life and by highlighting the importance of a user-centred approach.
In large-scale systems, there are fundamental challenges when centralised techniques are used for task allocation: the number of interactions is limited by resource constraints such as those on computation, storage, and network communication. Scalability can be increased by implementing the system as a distributed task-allocation system, sharing tasks across many agents; however, this also increases the resource cost of communication and synchronisation, and such systems are themselves difficult to scale. In this paper we present four algorithms to address these problems. In combination, they enable each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system according to how optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource-usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than approaches without knowledge retention when system connectivity is impacted, and it is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.