Online coding environments can help computing students gain programming practice at their own pace. Informative feedback can be especially beneficial during such self-guided, independent study phases. This research aims to identify the feedback types applied by CodingBat, Scratch, and Blockly. Tutoring feedback as coined by Susanne Narciss, together with the specification of subtypes by Keuning, Jeuring, and Heeren, constitutes the theoretical basis. Accordingly, the five categories of elaborated feedback (knowledge about task requirements, knowledge about concepts, knowledge about mistakes, knowledge about how to proceed, and knowledge about meta-cognition) and their subtypes were used to analyze the available feedback options. The study revealed difficulties in identifying clear-cut boundaries between feedback types, as the offered feedback usually integrates more than one type or subtype. Moreover, currently defined feedback types do not rigorously distinguish between individualized and generic feedback. The lack of granularity is also evident in the absence of subtypes relating to the knowledge type of the task. The analysis thus has implications for the future design and investigation of applied tutoring feedback. It encourages future research on feedback types and their implementation in the context of programming exercises, with the goal of defining feedback types that match the demands of novice programmers.
When training a machine learning classifier on data where one of the classes is intrinsically rare, the classifier will often assign too few sources to the rare class. To address this, it is common to up-weight the examples of the rare class to ensure it is not ignored. For the same reason, it is also common to train on restricted data in which the balance of source types is closer to equal. Here we show that these practices can bias the model toward over-assigning sources to the rare class. We also explore how to detect when training data bias has had a statistically significant impact on the trained model's predictions, and how to reduce that impact. While the magnitude of the impact of the techniques developed here will vary with the details of the application, for most cases it should be modest. They are, however, universally applicable whenever a machine learning classification model is used, making them analogous to Bessel's correction to the sample variance.
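As an illustration only, and not the procedure developed in the paper, the sketch below trains a class-weighted scikit-learn classifier on data with a rare class and then applies a standard prior-shift correction so that the predicted probabilities reflect the true class balance rather than the effectively rebalanced training prior. The synthetic data, the choice of classifier, and the roughly 1% rare-class prior are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_common, n_rare = 10_000, 100                     # true rare-class prior ~ 1%
X = np.vstack([rng.normal(0.0, 1.0, (n_common, 2)),
               rng.normal(1.5, 1.0, (n_rare, 2))])
y = np.concatenate([np.zeros(n_common), np.ones(n_rare)])

# Up-weighting the rare class makes the model behave as if the classes were balanced.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
p_balanced = np.clip(clf.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)

# Prior-shift correction: rescale the posterior odds from the effective training
# prior (0.5 under balanced weights) back to the population prior.
pi_train, pi_true = 0.5, n_rare / (n_common + n_rare)
odds = (p_balanced / (1 - p_balanced)) * (pi_true / (1 - pi_true)) / (pi_train / (1 - pi_train))
p_corrected = odds / (1 + odds)

print("rare-class assignments, weighted model:  ", int((p_balanced > 0.5).sum()))
print("rare-class assignments, prior-corrected: ", int((p_corrected > 0.5).sum()))
```

In this toy setting the weighted model assigns far more sources to the rare class than actually exist, and the prior correction pulls that count back toward the population rate.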
Maximal Extractable Value (MEV) represents excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that allow a miner to front-run users' transactions. The most common type of MEV involves what is known as a sandwich attack against a user trading on a popular class of automated market makers known as CFMMs. In this first paper of a series on MEV, we analyze game theoretic properties of MEV in CFMMs that we call \textit{reordering} and \textit{routing} MEV. In the case of reordering, we show conditions under which the maximum price impact caused by the reordering of sandwich attacks in a sequence of trades, relative to the average price impact, is $O(\log n)$ in the number of user trades. In the case of routing, we present examples where the existence of MEV both degrades and, counterintuitively, \emph{improves} the quality of routing. We construct an analogue of the price of anarchy for this setting and demonstrate that if the impact of a sandwich attack is localized in a suitable sense, then the price of anarchy is constant. Combined, our results provide improvements that both MEV searchers and CFMM designers can utilize for estimating the costs and profits of MEV.
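To make the sandwich-attack mechanism concrete, here is a toy simulation on a constant-product CFMM ($xy = k$) with no fees. The pool reserves, trade sizes, and attack size are arbitrary assumptions, and the example is an illustration of the generic attack rather than the paper's model.

```python
def swap(pool_in, pool_out, amount_in):
    """Constant-product swap with no fees: pay amount_in, receive amount_out."""
    k = pool_in * pool_out
    new_in = pool_in + amount_in
    new_out = k / new_in
    return pool_out - new_out, new_in, new_out

reserve_x, reserve_y = 1_000.0, 1_000.0            # initial pool, spot price 1
user_dx = 10.0

# Without an attack, the user trades directly against the pool.
dy_clean, _, _ = swap(reserve_x, reserve_y, user_dx)

# Sandwich: the attacker front-runs with 50 X, the user trades at a worse price,
# and the attacker then sells the acquired Y back into the pool.
dy_front, x1, y1 = swap(reserve_x, reserve_y, 50.0)
dy_user, x2, y2 = swap(x1, y1, user_dx)
dx_back, _, _ = swap(y2, x2, dy_front)

print(f"user receives {dy_clean:.3f} Y unsandwiched vs {dy_user:.3f} Y sandwiched")
print(f"attacker profit: {dx_back - 50.0:.3f} X")
```

The user's execution price worsens and the attacker ends the round-trip with more X than it started with, which is the per-trade price impact the reordering analysis aggregates over a sequence of trades.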
We propose application-layer coding schemes to recover lost data in delay-sensitive uplink (sensor-to-gateway) communications in the Internet of Things. Built on an approach that combines retransmissions and forward erasure correction, the proposed schemes' salient features include low computational complexity and the ability to exploit sporadic receiver feedback for efficient data recovery. Reduced complexity is achieved by keeping the number of coded transmissions as low as possible and by devising a mechanism to compute the optimal degree of a coded packet in O(1). Our major contributions are: (a) An enhancement to an existing scheme called windowed coding, whose complexity is greatly reduced and data recovery performance is improved by our proposed approach. (b) A technique that combines elements of windowed coding with a new feedback structure to further reduce the coding complexity and improve data recovery. (c) A coded forwarding scheme in which a relay node provides further resilience against packet loss by overhearing source-to-destination communications and making forwarding decisions based on overheard information.
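For intuition only, the sketch below shows the generic ingredients such schemes build on: a coded packet formed by XOR-ing source packets of the current window, and a feedback-informed choice of which packets to combine so the receiver can decode immediately. The window size, packet contents, and loss pattern are illustrative assumptions rather than the proposed schemes themselves.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(window, indices):
    """XOR the window packets at `indices` into a single coded packet."""
    coded = window[indices[0]]
    for i in indices[1:]:
        coded = xor(coded, window[i])
    return coded

# Sender: a window of 4 source packets; feedback reports that packet 2 was lost.
window = [bytes([i]) * 8 for i in range(4)]
received = {0: window[0], 1: window[1], 3: window[3]}

# Feedback lets the sender pick a combination the receiver can decode at once:
# here a degree-3 coded packet mixing the lost packet with two delivered ones.
indices = [1, 2, 3]
coded = encode(window, indices)

# Receiver: XOR out the packets it already holds to recover the missing one.
recovered = coded
for i in indices:
    if i in received:
        recovered = xor(recovered, received[i])

assert recovered == window[2]
print("packet 2 recovered from a single coded transmission")
```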
We consider studies where multiple measures on an outcome variable are collected over time, but some subjects drop out before the end of follow-up. Analyses of such data often proceed under either a 'last observation carried forward' or a 'missing at random' assumption. We consider two alternative strategies for identification; the first is closely related to the difference-in-differences methodology in the causal inference literature. The second enables correction for violations of the parallel trend assumption, so long as one has access to a valid 'bespoke instrumental variable'. These are compared with existing approaches, first conceptually and then in an analysis of data from the Framingham Heart Study.
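To make the parallel-trend idea behind the first strategy concrete, one two-period identification of the dropouts' unobserved mean outcome can be written as below; the notation is ours for illustration (with $R = 1$ indicating subjects still under follow-up at time $t$) and is not necessarily the paper's estimand.

\[
E[Y_t \mid R = 0] \;=\; E[Y_{t-1} \mid R = 0] \;+\; \big( E[Y_t \mid R = 1] - E[Y_{t-1} \mid R = 1] \big)
\]

That is, dropouts are assumed to experience the same mean change over time as those who remain, which is precisely the parallel trend assumption whose violations the second, bespoke instrumental variable strategy is designed to correct.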
Learning controllers from data for stabilizing dynamical systems typically follows a two-step process of first identifying a model and then constructing a controller based on the identified model. However, learning a model means identifying a generic description of the system dynamics, which can require large amounts of data and can involve extracting information that is unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number of states required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than is necessary for learning a model.
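As a minimal numerical illustration of the dimension argument (not the paper's cylinder-flow experiment), the sketch below stabilizes a system whose internal dimension is $n = 2$ even though the observed representation is 50-dimensional; the matrices, pole locations, and the use of pole placement are assumptions made for the example.

```python
import numpy as np
from scipy.signal import place_poles

n, p = 2, 50                                   # internal (McMillan) dimension vs. observation dimension
A = np.array([[1.1, 0.3],                      # unstable discrete-time dynamics
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
C = np.random.default_rng(0).normal(size=(p, n))   # high-dimensional observations y = C x
# C is only defined to emphasize the point: the feedback design below never touches
# the p-dimensional representation, only the n internal states.

K = place_poles(A, B, [0.5, 0.4]).gain_matrix      # stabilizing state feedback u = -K x
closed_loop = A - B @ K
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(closed_loop))))
```

The closed-loop spectral radius drops below one, and nothing in the design depends on the 50-dimensional observation space.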
Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Even keeping the potential benefits of AI in mind, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than we are; some say this is like playing with fire and speculate that it could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk, which consists of three parts: First, we review how systems can be made safer today, drawing on time-tested concepts from hazard analysis and systems safety that have been designed to steer large processes in safer directions. Next, we discuss strategies for having long-term impacts on the safety of future systems. Finally, we discuss a crucial concept for making AI systems safer: improving the balance between safety and general capabilities. We hope this document and the presented concepts and tools serve as a useful guide for understanding how to analyze AI x-risk.
We identify a new class of vulnerabilities in implementations of differential privacy. Specifically, they arise when computing basic statistics such as sums, due to discrepancies between the implemented arithmetic using finite data types (namely, ints or floats) and the idealized arithmetic over the reals or integers. These discrepancies cause the sensitivity of the implemented statistics (i.e., how much one individual's data can affect the result) to be much higher than the sensitivity we expect. Consequently, essentially all differential privacy libraries fail to introduce enough noise to hide individual-level information as required by differential privacy, and we show that this may be exploited in realistic attacks on differentially private query systems. In addition to presenting these vulnerabilities, we also provide a number of solutions, which modify or constrain the way in which the sum is implemented in order to recover the idealized or near-idealized bounds on sensitivity.
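As a toy illustration of the floating-point discrepancy (not the attacks or the fixes from the paper), the following shows a left-to-right summation whose value changes by 2.0 when a single record of value 1.0 is removed, i.e., by more than the idealized sensitivity of 1.0. The specific constants are chosen only to trigger round-to-nearest-even behavior in IEEE doubles.

```python
big = 2.0 ** 53                    # above 2**53, adjacent doubles are 2.0 apart

def running_sum(values):
    total = 0.0
    for v in values:
        total += v                 # each addition rounds to the nearest double
    return total

dataset = [1.0, 1.0, big]
neighbor = [1.0, big]              # the same dataset with one record of value 1.0 removed

# Idealized sensitivity of the sum is 1.0, but the implemented sums differ by 2.0.
print(running_sum(dataset) - running_sum(neighbor))
```

Noise calibrated to the idealized sensitivity of 1.0 is therefore too small to mask the presence or absence of that record in the implemented sum.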
While vaccinations continue to be rolled out to curb the ongoing COVID-19 pandemic, their verification is becoming a requirement for individuals to rejoin many social activities or to travel. Blockchain technology has been widely proposed to manage vaccination records and their verification in many politically bound regions. However, the high contagiousness of COVID-19 calls for a global vaccination campaign. Therefore, a blockchain for vaccination management must scale up to support such a campaign and be adaptable to the requirements of different countries. While there have been many proposals of blockchain frameworks that balance the access and immutability of vaccination records, their scalability, a critical feature, has not yet been addressed. In this paper, we propose a scalable and cooperative Global Immunization Information Blockchain-based System (GEOS) that leverages the global interoperability of immunization information systems. We model GEOS and describe its requirements, features, and operation. We analyze the communications and the delays incurred by the national and international consensus processes and by blockchain interoperability in GEOS. Such communications are pivotal in enabling global-scale interoperability and access to electronic vaccination records for verification. We show that GEOS ably keeps up with the global vaccination rates of COVID-19 as an example of its scalability.
Recommender systems are the algorithms which select, filter, and personalize content across many of the world's largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. Our overarching question is how to ensure that recommender systems enact the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and operation, and also critically depends on insights from diverse fields including social science, ethics, economics, psychology, policy and law. This paper is a multidisciplinary effort to synthesize theory and practice from different perspectives, with the goal of providing a shared language, articulating current design approaches, and identifying open problems. It is not a comprehensive survey of this large space, but a set of highlights identified by our diverse author cohort. We collect a set of values that seem most relevant to recommender systems operating across different domains, then examine them from the perspectives of current industry practice, measurement, product design, and policy approaches. Important open problems include multi-stakeholder processes for defining values and resolving trade-offs, better values-driven measurements, recommender controls that people use, non-behavioral algorithmic feedback, optimization for long-term outcomes, causal inference of recommender effects, academic-industry research collaborations, and interdisciplinary policy-making.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary, as well as beneficial, to take a robust approach: to apply an optimization method that learns as it goes, improving from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in various fields and has led to spectacular successes in modeling and in systems that are now part of our daily lives.
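As one concrete instance of such a process (a generic sketch, not a result from the manuscript), the snippet below runs online gradient descent against a stream of quadratic losses that are revealed only after each decision is committed; the losses and step sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
targets = rng.normal(loc=1.0, size=(200, 2))       # loss at round t: ||x - targets[t]||^2

x = np.zeros(2)                                    # initial decision
for t, z in enumerate(targets, start=1):
    # The learner commits to x, then the round's loss (and its gradient) is revealed.
    grad = 2 * (x - z)
    x = x - (0.5 / np.sqrt(t)) * grad              # standard O(1/sqrt(t)) step size

print("learned decision:", x)
print("average target:  ", targets.mean(axis=0))
```

Despite never seeing the whole problem up front, the sequence of decisions drifts toward the region where the accumulated losses are small.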