We provide new insights into the Oregon experiment's finding that Medicaid increased emergency department (ED) use. Using nonparametric causal machine learning methods, we find economically meaningful treatment effect heterogeneity in the impact of Medicaid coverage on ED use. The effect distribution is widely dispersed, with significant positive effects concentrated among high-use individuals. A small group in the right tail - about 14% of participants - with significant increases in ED use drives the overall effect. The remaining individualized treatment effects are either indistinguishable from zero or negative. The average treatment effect is therefore not representative of the individualized treatment effect for most people. We identify four priority groups with large and statistically significant increases in ED use - men, prior SNAP participants, adults under 50 years old, and those with pre-lottery ED use classified as primary care treatable. Our results point to an essential role for intensive margin effects: Medicaid increases utilization among those already accustomed to ED use and who use the emergency department for all types of care. We leverage the heterogeneous effects to estimate optimal assignment rules that prioritize insurance applications in similar expansions.
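To make the estimation step concrete, the following is a minimal sketch of recovering individualized treatment effects with a nonparametric T-learner (two random forests) on synthetic data; the paper's actual estimator, covariates, and outcome are not specified in this abstract, so every variable below is illustrative.

```python
# Minimal sketch: individualized treatment effects via a T-learner.
# Synthetic data only; the paper's estimator and variables may differ.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))              # covariates (e.g. age, prior ED use)
T = rng.integers(0, 2, size=n)           # hypothetical coverage indicator
tau = np.where(X[:, 1] > 1.0, 1.5, 0.0)  # effect concentrated in a small right tail
Y = X[:, 0] + tau * T + rng.normal(size=n)   # stylized ED-use outcome

# T-learner: fit separate outcome models for treated and control units
mu1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 1], Y[T == 1])
mu0 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 0], Y[T == 0])
cate = mu1.predict(X) - mu0.predict(X)       # individualized effect estimates

print("ATE estimate:", cate.mean())
print("share with effect > 1:", (cate > 1.0).mean())   # mass in the right tail
```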
Many studies have demonstrated that mobile applications are a common means of collecting massive amounts of personal data. This goes unnoticed by most users, who are also unaware that many different organizations receive this data, even from multiple apps in parallel. This paper assesses different techniques to identify the organizations that receive personal data flows in the Android ecosystem, namely the WHOIS service, SSL certificate inspection, and privacy policy textual analysis. Based on our findings, we propose a fully automated method that combines the most successful techniques, achieving a 94.73% precision score in identifying the recipient organization. We further demonstrate our method by evaluating 1,000 Android apps and exposing the corporations that collect the users' personal data.
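As an illustration of the SSL-certificate inspection technique, here is a minimal Python sketch that connects to a recipient domain and reads the organization from the certificate subject. The host is illustrative, only OV/EV certificates carry an organizationName, and the full pipeline described above also combines WHOIS lookups and privacy policy analysis.

```python
# Minimal sketch of the SSL-certificate inspection step: read the
# organizationName from a server's certificate subject, if present.
import socket
import ssl

def cert_organization(host: str, port: int = 443, timeout: float = 5.0):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as ssock:
            cert = ssock.getpeercert()
    # 'subject' is a tuple of RDNs, e.g. ((('organizationName', 'Example Inc.'),), ...)
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None  # domain-validated certificates typically omit the organization

if __name__ == "__main__":
    print(cert_organization("www.google.com"))  # illustrative host; may print None
```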
In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal of the computer draw of the fixture, which remains hidden from the public.
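A minimal sketch of the classification idea, assuming a simplified ranking by points only (no UEFA tie-breakers) and an illustrative group standing: a team is indifferent in its last-round match if its final rank is unchanged whatever its own result, for every outcome of the simultaneous match.

```python
# Sketch of classifying a last-round match as weakly/strongly stakeless.
# Points-only ranking and the example standing are illustrative; the paper's
# scheme uses the full tie-breaking rules and a statistical simulation model.
from itertools import product

POINTS = {"W": 3, "D": 1, "L": 0}
FLIP = {"W": "L", "D": "D", "L": "W"}   # result from the opponent's perspective

def rank(points, team):
    # rank by points only; tied teams share the better rank
    return 1 + sum(p > points[team] for t, p in points.items() if t != team)

def final_points(base, match, result, other, other_result):
    pts = dict(base)
    (a, b), (c, d) = match, other
    pts[a] += POINTS[result];        pts[b] += POINTS[FLIP[result]]
    pts[c] += POINTS[other_result];  pts[d] += POINTS[FLIP[other_result]]
    return pts

def classify(base, match, other):
    indifferent = []
    for team in match:
        per_other = {}   # other-match outcome -> possible ranks for `team`
        for res, other_res in product("WDL", repeat=2):
            pts = final_points(base, match, res, other, other_res)
            per_other.setdefault(other_res, set()).add(rank(pts, team))
        # indifferent if, for every outcome of the other match,
        # the team's rank does not depend on its own result
        indifferent.append(all(len(s) == 1 for s in per_other.values()))
    if all(indifferent):
        return "strongly stakeless"
    if any(indifferent):
        return "weakly stakeless"
    return "competitive"

# Illustrative final-round standing: A has already won the group, D is last.
base = {"A": 13, "B": 7, "C": 6, "D": 0}
print(classify(base, match=("A", "D"), other=("B", "C")))   # -> strongly stakeless
```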
The Coronavirus disease 2019 (COVID-19) outbreak quickly spread around the world, resulting in over 240 million infections and 4 million deaths by October 2021. While the virus spreads from person to person silently, fear has also been spreading around the globe. The COVID-19 information from the Australian Government is convincing but not timely or detailed, and there is much information on social networks mixing facts and rumors. As software engineers, we spontaneously and rapidly constructed a COVID-19 information dashboard that aggregates reliable, semi-automatically checked information from different sources to provide a one-stop information-sharing site about the latest status in Australia. Inspired by the Johns Hopkins University COVID-19 Map, our dashboard presents case statistics, case distribution, government policy, and the latest news with interactive visualizations. In this paper, we present first-hand participant observations, in which the authors acted as founders of //covid-19-au.com/, a site serving more than 830K users with 14M page views since March 2020. Based on our first-hand experience, we summarize 9 lessons for developers, researchers and instructors. These lessons may inspire development, research and teaching in software engineering for coping with similar public crises in the future.
As machine learning algorithms become increasingly integrated into crucial decision-making scenarios, such as healthcare, recruitment, and risk assessment, there have been increasing concerns about the privacy and fairness of such systems. Federated learning has been viewed as a promising solution for collaboratively training machine learning models among multiple parties while maintaining the privacy of their local data. However, federated learning also poses new challenges in mitigating potential bias against certain populations (e.g., demographic groups), as this typically requires centralized access to the sensitive information (e.g., race, gender) of each data point. Motivated by the importance and challenges of group fairness in federated learning, in this work we propose FairFed, a novel algorithm to enhance group fairness via a fairness-aware aggregation method, which aims to provide fair model performance across different sensitive groups (e.g., racial or gender groups) while maintaining high utility. This formulation further provides flexibility for each client to use a customized local debiasing strategy. We build our FairFed algorithm around the secure aggregation protocol of federated learning. Running federated training on widely investigated fairness datasets, we demonstrate that our proposed method outperforms state-of-the-art fair federated learning frameworks under highly heterogeneous sensitive attribute distributions. We also investigate the performance of FairFed on naturally distributed real-life data collected from different geographical locations or departments within an organization.
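The sketch below illustrates a fairness-aware aggregation step in the spirit described above: clients report a local fairness gap alongside their model updates, and the server shrinks the weight of clients whose local gap deviates most from the global one. The exact FairFed update rule, fairness metric, and secure-aggregation implementation are in the paper; beta and the toy numbers here are assumptions.

```python
# Sketch of fairness-aware aggregation: down-weight clients whose local
# fairness gap disagrees most with the global gap. Illustrative only.
import numpy as np

def fairness_aware_aggregate(updates, sizes, local_gaps, global_gap, beta=1.0):
    """updates: list of parameter vectors; sizes: local sample counts;
    local_gaps: per-client fairness gaps; global_gap: gap of the global model."""
    sizes = np.asarray(sizes, dtype=float)
    base = sizes / sizes.sum()                     # standard FedAvg weights
    deviation = np.abs(np.asarray(local_gaps) - global_gap)
    adjusted = base * np.exp(-beta * deviation)    # penalize mismatched clients
    weights = adjusted / adjusted.sum()
    return np.average(np.stack(updates), axis=0, weights=weights)

# Toy example: three clients, the third has a very different local gap.
updates = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
print(fairness_aware_aggregate(updates, sizes=[100, 120, 80],
                               local_gaps=[0.05, 0.07, 0.40], global_gap=0.06))
```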
Collaborative tasks are ubiquitous activities in which some form of communication is required to reach a joint goal. Collaborative building is one such task. We wish to develop an intelligent builder agent in a simulated building environment (Minecraft) that can build whatever users wish to build simply by talking to the agent. To achieve this goal, such agents need to be able to take the initiative by asking clarification questions when further information is needed. Existing work on the Minecraft Corpus Dataset only learns to execute instructions, neglecting the importance of asking for clarification. In this paper, we extend the Minecraft Corpus Dataset by annotating all builder utterances into eight types, including clarification questions, and propose a new builder agent model capable of determining when to ask for clarification and when to execute instructions. Experimental results show that our model achieves state-of-the-art performance on the collaborative building task with a substantial improvement. We also define two new tasks, the learning-to-ask task and the joint learning task. The latter consists of solving both the collaborative building and learning-to-ask tasks jointly.
This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes we mean that "large" clusters and "small" clusters may be heterogeneous, and, in particular, the effects of the treatment may vary across clusters of differing sizes. In order to permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using simple random sampling. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study shows the practical relevance of our theoretical results.
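To fix ideas, the following sketch contrasts the two estimands on simulated data in which cluster sizes are random and treatment effects grow with cluster size, so the equally-weighted and size-weighted estimators diverge; the data-generating process is purely illustrative.

```python
# Sketch: equally-weighted vs. size-weighted cluster-level ATE estimators
# under cluster-level treatment assignment. Illustrative DGP only.
import numpy as np

rng = np.random.default_rng(1)
G = 400
sizes = rng.integers(5, 200, size=G)            # random, non-ignorable cluster sizes
treat = rng.integers(0, 2, size=G)              # treatment assigned at the cluster level
cluster_means = np.array([
    np.mean(rng.normal(loc=treat[g] * (0.2 + 0.01 * sizes[g]), size=sizes[g]))
    for g in range(G)                            # effect grows with cluster size
])

def diff_in_means(y, d, w=None):
    w = np.ones_like(y, dtype=float) if w is None else np.asarray(w, dtype=float)
    return (np.average(y[d == 1], weights=w[d == 1])
            - np.average(y[d == 0], weights=w[d == 0]))

print("equally-weighted ATE:", diff_in_means(cluster_means, treat))
print("size-weighted ATE:   ", diff_in_means(cluster_means, treat, w=sizes))
```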
Exponential growth in digital information outlets and the race to publish have made scientific misinformation more prevalent than ever. However, the task of fact-verifying a given scientific claim is not straightforward even for researchers. Scientific claim verification requires in-depth knowledge and substantial labor from domain experts to substantiate supporting and refuting evidence from credible scientific sources. The SciFact dataset and corresponding task provide a benchmarking leaderboard for the community to develop automatic scientific claim verification systems that extract and assimilate relevant evidence rationales from source abstracts. In this work, we propose a modular approach that sequentially carries out binary classification for every prediction subtask in the SciFact leaderboard. Our simple classifier-based approach uses reduced abstract representations to retrieve relevant abstracts, which are then used to train the rationale-selection model. Finally, we carry out two-step stance prediction that first filters out non-relevant rationales and then identifies supporting or refuting rationales for a given claim. Experimentally, our system RerrFact, with no fine-tuning, a simple design, and a fraction of the model parameters, fares competitively on the leaderboard against large-scale, modular, and joint modeling approaches. We make our codebase available at //github.com/ashishrana160796/RerrFact.
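As an illustration of the modular, stage-by-stage design, here is a minimal sketch of the first stage: retrieving abstracts for a claim from reduced representations (title plus first sentence) with TF-IDF similarity. The actual retrieval and the downstream rationale-selection and two-step stance models are learned classifiers; the corpus and claim below are made up.

```python
# Sketch of the abstract-retrieval stage over reduced abstract
# representations. Toy corpus and claim; real stages use trained models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {
    "doc1": ("Vitamin D and immunity", "Vitamin D supplementation reduces infection risk."),
    "doc2": ("Coffee and cognition", "Caffeine intake is associated with short-term alertness."),
}
claim = "Vitamin D lowers the risk of respiratory infection."

reduced = {k: f"{title}. {first}" for k, (title, first) in corpus.items()}
vec = TfidfVectorizer().fit(list(reduced.values()) + [claim])
doc_matrix = vec.transform(list(reduced.values()))
scores = cosine_similarity(vec.transform([claim]), doc_matrix)[0]

ranked = sorted(zip(reduced.keys(), scores), key=lambda x: -x[1])
print(ranked)   # doc1 ranks first; top-k abstracts feed the rationale-selection stage
```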
In this paper we study the finite-sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie's (2003) kappa theorem. Our framework presumes a binary endogenous explanatory variable ("treatment") and a binary instrumental variable, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which we show is weight-normalized, is likely to dominate the others in many contexts. A notable exception is in settings with one-sided noncompliance, where certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. We use a simulation study and three empirical applications to illustrate our findings. In applications to the causal effects of college education using the college proximity instrument (Card, 1995) and the causal effects of childbearing using the sibling sex composition instrument (Angrist and Evans, 1998), the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both. Overall, our results suggest that (i) the relative performance of different kappa weighting estimators varies with features of the data-generating process; and that (ii) the normalized version of Tan's (2006) estimator may be an attractive alternative in many contexts. Applied researchers with access to a binary instrumental variable should also consider covariate balancing or doubly robust estimators of the LATE.
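For intuition about the normalized-versus-unnormalized distinction, the sketch below implements one representative pair of inverse-propensity-weighted LATE estimators (an unnormalized ratio and its Hajek-normalized counterpart) on synthetic data; the paper studies several kappa-based variants, so this pair is only illustrative.

```python
# Sketch: unnormalized vs. Hajek-normalized IPW estimators of the LATE
# with a binary instrument Z, binary treatment D and instrument propensity
# pi(X). Synthetic data; true LATE = 1 among compliers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20000
X = rng.normal(size=(n, 2))
pi = 1 / (1 + np.exp(-0.5 * X[:, 0]))            # instrument propensity depends on X
Z = rng.binomial(1, pi)                          # binary instrument
complier = rng.binomial(1, 0.6, size=n)          # latent complier status
D = np.where(complier == 1, Z, rng.binomial(1, 0.3, size=n))   # treatment take-up
Y = 1.0 * D + X[:, 1] + rng.normal(size=n)

pi_hat = LogisticRegression().fit(X, Z).predict_proba(X)[:, 1]
w1_raw, w0_raw = Z / pi_hat, (1 - Z) / (1 - pi_hat)

def ipw_late(w1, w0, normalize):
    if normalize:                                 # Hajek-style: weights sum to one in each arm
        w1, w0 = w1 / w1.sum(), w0 / w0.sum()
    else:                                         # unnormalized: divide by the sample size
        w1, w0 = w1 / n, w0 / n
    return (np.sum(w1 * Y) - np.sum(w0 * Y)) / (np.sum(w1 * D) - np.sum(w0 * D))

print("unnormalized:", ipw_late(w1_raw, w0_raw, normalize=False))
print("normalized:  ", ipw_late(w1_raw, w0_raw, normalize=True))
```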
Context: Forgetting is defined as a gradual process of losing information. Even though many studies demonstrate the effect of forgetting in software development, to the best of our knowledge, no study explores its impact using a controlled experiment approach. Objective: We would like to provide insights into the impact of forgetting in software development projects. We want to examine whether the recency and frequency of interaction impact forgetting in software development. Methods: We will conduct an experiment that examines the impact of forgetting in software development. Participants will first complete an initial task. According to their performance on the initial task, they will be assigned to either the experiment group or the control group. The experiment group will then complete two additional tasks to increase their exposure to the code. Both groups will then complete a final task to determine whether the additional exposure to the code benefits the experiment group's performance on the final task. Finally, we will conduct a survey and a recall task with the same participants to collect data about their perceptions of forgetting and to quantify their memory performance, respectively.
Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, FPFL, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.
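A minimal sketch of the modified method of differential multipliers on a toy constrained problem: gradient descent on the parameters, gradient ascent on the multiplier, plus a quadratic damping term on the constraint. The paper applies this scheme to fairness-constrained empirical risk minimization and then lifts it to the private federated setting; the objective and constraint below are illustrative only.

```python
# Sketch of the modified method of differential multipliers (MMDM) on a toy
# constrained problem; the paper uses it for fairness-constrained ERM.
import numpy as np

def f(theta):               # "training loss": distance to an unconstrained optimum
    return 0.5 * np.sum((theta - np.array([2.0, 0.0])) ** 2)

def g(theta):               # "fairness constraint": want g(theta) = 0
    return theta[0] + theta[1] - 1.0

theta, lam = np.zeros(2), 0.0
eta, c = 0.05, 5.0          # step size and damping coefficient (illustrative)
for _ in range(2000):
    grad_f = theta - np.array([2.0, 0.0])
    grad_g = np.array([1.0, 1.0])
    # descend the damped Lagrangian f + lam*g + (c/2)*g^2 in theta
    theta -= eta * (grad_f + (lam + c * g(theta)) * grad_g)
    # ascend in the multiplier
    lam += eta * g(theta)

print("theta:", theta, "constraint residual:", g(theta))   # ~[1.5, -0.5], residual ~0
```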