Learners are often introduced to programming via dedicated languages such as Scratch, where block-based commands are assembled visually in order to control the interactions of graphical sprites. Automated testing of such programs is an important prerequisite for supporting debugging, providing hints, or assessing learning outcomes. However, writing tests for Scratch programs can be challenging: The game-like and randomised nature of typical Scratch programs makes it difficult to identify specific timed input sequences used to control the programs. Furthermore, precise test assertions to check the resulting program states are incompatible with the fundamental principle of creative freedom in programming in Scratch, where correct program behaviour may be implemented with deviations in the graphical appearance or timing of the program. The event-driven and actor-oriented nature of Scratch programs, however, makes them a natural fit for describing program behaviour using finite state machines. In this paper, we introduce a model-based testing approach by extending Whisker, an automated testing framework for Scratch programs. The model-based extension describes expected program behaviour in terms of state machines, which makes it feasible to check the abstract behaviour of a program independent of exact timing and pixel-precise graphical details, and to automatically derive test inputs testing even challenging programs. A video demonstrating model-based testing with Whisker is available at the following URL: //youtu.be/edgCNbGSGEY
In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding. However, the increase in the number and category of information data will also increase the cost of information acquisition. For technology-based enterprises or users, in addition to general papers, patents, etc., policies related to technology or the development of their industries should also belong to a type of scientific and technological resources. The cost and difficulty of acquiring users. Extracting valuable science and technology policy resources from a huge amount of data with mixed contents and providing accurate and fast retrieval will help to break down information barriers and reduce the cost of information acquisition, which has profound social significance and social utility. This article focuses on the difficulties and problems in the field of science and technology policy, and introduces related technologies and developments.
Video-based programming tutorials are a popular form of tutorial used by authors to guide learners to code. Still, the interactivity of these videos is limited primarily to control video flow. There are existing works with increased interactivity that are shown to improve the learning experience. Still, these solutions require setting up a custom recording environment and are not well-integrated with the playback environment. This paper describes our integrated ITSS environment and evaluates the ease of authoring and playback of our interactive programming tutorials. Our environment is designed to run within the browser sandbox and is less intrusive to record interactivity actions. We develop a recording approach that tracks the author's interactivity actions (e.g., typing code, highlighting words, scrolling panels) on the browser and stored in text and audio formats. We replay these actions using the recorded artefacts for learners to have a more interactive, integrated and realistic playback of the author's actions instead of watching video frames. Our design goals are 1) efficient recording and playback, 2) extensible interactivity features to help students learn better, and 3) a scalable web-based environment. Our first user study of 20 participants who carry out the author tasks agree that it is efficient and easy to author interactive videos in our environment with no additional software needed. Our second user study of 84 students using the environment agrees that the increased interactivity can help them learn better over a video-based tutorial. Our performance test shows that the environment can scale to support up to 500 concurrent users. We hope our open-source environment enable more educators to create interactive programming tutorials.
Generating a test suite for a quantum program such that it has the maximum number of failing tests is an optimization problem. For such optimization, search-based testing has shown promising results in the context of classical programs. To this end, we present a test generation tool for quantum programs based on a genetic algorithm, called QuSBT (Search-based Testing of Quantum Programs). QuSBT automates the testing of quantum programs, with the aim of finding a test suite having the maximum number of failing test cases. QuSBT utilizes IBM's Qiskit as the simulation framework for quantum programs. We present the tool architecture in addition to the implemented methodology (i.e., the encoding of the search individual, the definition of the fitness function expressing the search problem, and the test assessment w.r.t. two types of failures). Finally, we report results of the experiments in which we tested a set of faulty quantum programs with QuSBT to assess its effectiveness. Repository (code and experimental results): //github.com/Simula-COMPLEX/qusbt-tool Video: //youtu.be/3apRCtluAn4
We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that given query access to $f$, and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
The dynamic response of the legged robot locomotion is non-Lipschitz and can be stochastic due to environmental uncertainties. To test, validate, and characterize the safety performance of legged robots, existing solutions on observed and inferred risk can be incomplete and sampling inefficient. Some formal verification methods suffer from the model precision and other surrogate assumptions. In this paper, we propose a scenario sampling based testing framework that characterizes the overall safety performance of a legged robot by specifying (i) where (in terms of a set of states) the robot is potentially safe, and (ii) how safe the robot is within the specified set. The framework can also help certify the commercial deployment of the legged robot in real-world environment along with human and compare safety performance among legged robots with different mechanical structures and dynamic properties. The proposed framework is further deployed to evaluate a group of state-of-the-art legged robot locomotion controllers from various model-based, deep neural network involved, and reinforcement learning based methods in the literature. Among a series of intended work domains of the studied legged robots (e.g. tracking speed on sloped surface, with abrupt changes on demanded velocity, and against adversarial push-over disturbances), we show that the method can adequately capture the overall safety characterization and the subtle performance insights. Many of the observed safety outcomes, to the best of our knowledge, have never been reported by the existing work in the legged robot literature.
Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the used scheduling algorithm. Many scheduling heuristics have been proposed in existing works; nevertheless, they are often tested in oversimplified environments. We provide an extensible simulation environment designed for prototyping and benchmarking task schedulers, which contains implementations of various scheduling algorithms and is open-sourced, in order to be fully reproducible. We use this environment to perform a comprehensive analysis of workflow scheduling algorithms with a focus on quantifying the effect of scheduling challenges that have so far been mostly neglected, such as delays between scheduler invocations or partially unknown task durations. Our results indicate that network models used by many previous works might produce results that are off by an order of magnitude in comparison to a more realistic model. Additionally, we show that certain implementation details of scheduling algorithms which are often neglected can have a large effect on the scheduler's performance, and they should thus be described in great detail to enable proper evaluation.
We present a novel static analysis technique to derive higher moments for program variables for a large class of probabilistic loops with potentially uncountable state spaces. Our approach is fully automatic, meaning it does not rely on externally provided invariants or templates. We employ algebraic techniques based on linear recurrences and introduce program transformations to simplify probabilistic programs while preserving their statistical properties. We develop power reduction techniques to further simplify the polynomial arithmetic of probabilistic programs and define the theory of moment-computable probabilistic loops for which higher moments can precisely be computed. Our work has applications towards recovering probability distributions of random variables and computing tail probabilities. The empirical evaluation of our results demonstrates the applicability of our work on many challenging examples.
Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.