The importance of programming education has lead to dedicated educational programming environments, where users visually arrange block-based programming constructs that typically control graphical, interactive game-like programs. The Scratch programming environment is particularly popular, with more than 70 million registered users at the time of this writing. While the block-based nature of Scratch helps learners by preventing syntactical mistakes, there nevertheless remains a need to provide feedback and support in order to implement desired functionality. To support individual learning and classroom settings, this feedback and support should ideally be provided in an automated fashion, which requires tests to enable dynamic program analysis. The Whisker framework enables automated testing of Scratch programs, but creating these automated tests for Scratch programs is challenging. In this paper, we therefore investigate how to automatically generate Whisker tests. This raises important challenges: First, game-like programs are typically randomised, leading to flaky tests. Second, Scratch programs usually consist of animations and interactions with long delays, inhibiting the application of classical test generation approaches. Evaluation on common programming exercises, a random sample of 1000 Scratch user programs, and the 1000 most popular Scratch programs demonstrates that our approach enables Whisker to reliably accelerate test executions, and even though many Scratch programs are small and easy to cover, there are many unique challenges for which advanced search-based test generation using many-objective algorithms is needed in order to achieve high coverage.
Modern code review is a critical and indispensable practice in a pull-request development paradigm that prevails in Open Source Software (OSS) development. Finding a suitable reviewer in projects with massive participants thus becomes an increasingly challenging task. Many reviewer recommendation approaches (recommenders) have been developed to support this task which apply a similar strategy, i.e. modeling the review history first then followed by predicting/recommending a reviewer based on the model. Apparently, the better the model reflects the reality in review history, the higher recommender's performance we may expect. However, one typical scenario in a pull-request development paradigm, i.e. one Pull-Request (PR) (such as a revision or addition submitted by a contributor) may have multiple reviewers and they may impact each other through publicly posted comments, has not been modeled well in existing recommenders. We adopted the hypergraph technique to model this high-order relationship (i.e. one PR with multiple reviewers herein) and developed a new recommender, namely HGRec, which is evaluated by 12 OSS projects with more than 87K PRs, 680K comments in terms of accuracy and recommendation distribution. The results indicate that HGRec outperforms the state-of-the-art recommenders on recommendation accuracy. Besides, among the top three accurate recommenders, HGRec is more likely to recommend a diversity of reviewers, which can help to relieve the core reviewers' workload congestion issue. Moreover, since HGRec is based on hypergraph, which is a natural and interpretable representation to model review history, it is easy to accommodate more types of entities and realistic relationships in modern code review scenarios. As the first attempt, this study reveals the potentials of hypergraph on advancing the pragmatic solutions for code reviewer recommendation.
We consider the problem of extracting joint and individual signals from multi-view data, that is data collected from different sources on matched samples. While existing methods for multi-view data decomposition explore single matching of data by samples, we focus on double-matched multi-view data (matched by both samples and source features). Our motivating example is the miRNA data collected from both primary tumor and normal tissues of the same subjects; the measurements from two tissues are thus matched both by subjects and by miRNAs. Our proposed double-matched matrix decomposition allows to simultaneously extract joint and individual signals across subjects, as well as joint and individual signals across miRNAs. Our estimation approach takes advantage of double-matching by formulating a new type of optimization problem with explicit row space and column space constraints, for which we develop an efficient iterative algorithm. Numerical studies indicate that taking advantage of double-matching leads to superior signal estimation performance compared to existing multi-view data decomposition based on single-matching. We apply our method to miRNA data as well as data from the English Premier League soccer matches, and find joint and individual multi-view signals that align with domain specific knowledge.
Approximately 50% of development resources are devoted to UI development tasks [9]. Occupying a large proportion of development resources, developing icons can be a time-consuming task, because developers need to consider not only effective implementation methods but also easy-to-understand descriptions. In this paper, we present Auto-Icon+, an approach for automatically generating readable and efficient code for icons from design artifacts. According to our interviews to understand the gap between designers (icons are assembled from multiple components) and developers (icons as single images), we apply a heuristic clustering algorithm to compose the components into an icon image. We then propose an approach based on a deep learning model and computer vision methods to convert the composed icon image to fonts with descriptive labels, thereby reducing the laborious manual effort for developers and facilitating UI development. We quantitatively evaluate the quality of our method in the real world UI development environment and demonstrate that our method offers developers accurate, efficient, readable, and usable code for icon designs, in terms of saving 65.2% implementing time.
Video-based programming tutorials are a popular form of tutorial used by authors to guide learners to code. Still, the interactivity of these videos is limited primarily to control video flow. There are existing works with increased interactivity that are shown to improve the learning experience. Still, these solutions require setting up a custom recording environment and are not well-integrated with the playback environment. This paper describes our integrated ITSS environment and evaluates the ease of authoring and playback of our interactive programming tutorials. Our environment is designed to run within the browser sandbox and is less intrusive to record interactivity actions. We develop a recording approach that tracks the author's interactivity actions (e.g., typing code, highlighting words, scrolling panels) on the browser and stored in text and audio formats. We replay these actions using the recorded artefacts for learners to have a more interactive, integrated and realistic playback of the author's actions instead of watching video frames. Our design goals are 1) efficient recording and playback, 2) extensible interactivity features to help students learn better, and 3) a scalable web-based environment. Our first user study of 20 participants who carry out the author tasks agree that it is efficient and easy to author interactive videos in our environment with no additional software needed. Our second user study of 84 students using the environment agrees that the increased interactivity can help them learn better over a video-based tutorial. Our performance test shows that the environment can scale to support up to 500 concurrent users. We hope our open-source environment enable more educators to create interactive programming tutorials.
Modern web services routinely provide REST APIs for clients to access their functionality. These APIs present unique challenges and opportunities for automated testing, driving the recent development of many techniques and tools that generate test cases for API endpoints using various strategies. Understanding how these techniques compare to one another is difficult, as they have been evaluated on different benchmarks and using different metrics. To fill this gap, we performed an empirical study aimed to understand the landscape in automated testing of REST APIs and guide future research in this area. We first identified, through a systematic selection process, a set of 10 state-of-the-art REST API testing tools that included tools developed by both researchers and practitioners. We then applied these tools to a benchmark of 20 real-world open-source RESTful services and analyzed their performance in terms of code coverage achieved and unique failures triggered. This analysis allowed us to identify strengths, weaknesses, and limitations of the tools considered and of their underlying strategies, as well as implications of our findings for future research in this area.
With the rapid growth of software, using third-party libraries (TPLs) has become increasingly popular. The prosperity of the library usage has provided the software engineers with handful of methods to facilitate and boost the program development. Unfortunately, it also poses great challenges as it becomes much more difficult to manage the large volume of libraries. Researches and studies have been proposed to detect and understand the TPLs in the software. However, most existing approaches rely on syntactic features, which are not robust when these features are changed or deliberately hidden by the adversarial parties. Moreover, these approaches typically model each of the imported libraries as a whole, therefore, cannot be applied to scenarios where the host software only partially uses the library code segments. To detect both fully and partially imported TPLs at the semantic level, we propose ModX, a framework that leverages novel program modularization techniques to decompose the program into finegrained functionality-based modules. By extracting both syntactic and semantic features, it measures the distance between modules to detect similar library module reuse in the program. Experimental results show that ModX outperforms other modularization tools by distinguishing more coherent program modules with 353% higher module quality scores and beats other TPL detection tools with on average 17% better in precision and 8% better in recall.
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error, that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.
Blockchain and smart contract technology are novel approaches to data and code management that facilitate trusted computing by allowing for development in a distributed and decentralized manner. Testing smart contracts comes with its own set of challenges which have not yet been fully identified and explored. Although existing tools can identify and discover known vulnerabilities and their interactions on the Ethereum blockchain through random search or symbolic execution, these tools generally do not produce test suites suitable for human oracles. In this paper, we present AGSOLT (Automated Generator of Solidity Test Suites). We demonstrate its efficiency by implementing two search algorithms to automatically generate test suites for stand-alone Solidity smart contracts, taking into account some of the blockchain-specific challenges. To test AGSOLT, we compared a random search algorithm and a genetic algorithm on a set of 36 real-world smart contracts. We found that AGSOLT is capable of achieving high branch coverage with both approaches and even discovered some errors in some of the most popular Solidity smart contracts on Github.
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the used scheduling algorithm. Many scheduling heuristics have been proposed in existing works; nevertheless, they are often tested in oversimplified environments. We provide an extensible simulation environment designed for prototyping and benchmarking task schedulers, which contains implementations of various scheduling algorithms and is open-sourced, in order to be fully reproducible. We use this environment to perform a comprehensive analysis of workflow scheduling algorithms with a focus on quantifying the effect of scheduling challenges that have so far been mostly neglected, such as delays between scheduler invocations or partially unknown task durations. Our results indicate that network models used by many previous works might produce results that are off by an order of magnitude in comparison to a more realistic model. Additionally, we show that certain implementation details of scheduling algorithms which are often neglected can have a large effect on the scheduler's performance, and they should thus be described in great detail to enable proper evaluation.
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments on matrix decomposition that favored a (block) LU decomposition-the factorization of a matrix into the product of lower and upper triangular matrices. And now, matrix decomposition has become a core technology in machine learning, largely due to the development of the back propagation algorithm in fitting a neural network. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition and given the paucity of scope to present this discussion, e.g., the separated analysis of the Euclidean space, Hermitian space, Hilbert space, and things in the complex domain. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields.