Automated program repair is an emerging technology that seeks to automatically rectify bugs and vulnerabilities using learning, search, and semantic analysis. Trust in automatically generated patches is necessary for achieving greater adoption of program repair. Towards this goal, we survey more than 100 software practitioners to understand the artifacts and setups needed to enhance trust in automatically generated patches. Based on the feedback from the survey on developer preferences, we quantitatively evaluate existing test-suite based program repair tools. We find that they cannot produce high-quality patches within a top-10 ranking and an acceptable time period of 1 hour. The developer feedback from our qualitative study and the observations from our quantitative examination of existing repair tools point to actionable insights to drive program repair research. Specifically, we note that producing repairs within an acceptable time-bound is very much dependent on leveraging an abstract search space representation of a rich enough search space. Moreover, while additional developer inputs are valuable for generating or ranking patches, developers do not seem to be interested in a significant human-in-the-loop interaction.
Transfer learning has witnessed remarkable progress in recent years, for example, with the introduction of augmentation-based contrastive self-supervised learning methods. While a number of large-scale empirical studies on the transfer performance of such models have been conducted, there is not yet an agreed-upon set of control baselines, evaluation practices, and metrics to report, which often hinders a nuanced and calibrated understanding of the real efficacy of the methods. We share an evaluation standard that aims to quantify and communicate transfer learning performance in an informative and accessible setup. This is done by baking a number of simple yet critical control baselines in the evaluation method, particularly the blind-guess (quantifying the dataset bias), scratch-model (quantifying the architectural contribution), and maximal-supervision (quantifying the upper-bound). To demonstrate how the evaluation standard can be employed, we provide an example empirical study investigating a few basic questions about self-supervised learning. For example, using this standard, the study shows the effectiveness of existing self-supervised pre-training methods is skewed towards image classification tasks versus dense pixel-wise predictions. In general, we encourage using/reporting the suggested control baselines in evaluating transfer learning in order to gain a more meaningful and informative understanding.
It is imperative for all stakeholders that digital forensics investigations produce reliable results to ensure the field delivers a positive contribution to the pursuit of justice across the globe. Some aspects of these investigations are inevitably contingent on trust, however this is not always explicitly considered or critically evaluated. Erroneously treating features of the investigation as trusted can be enormously damaging to the overall reliability of an investigations findings as well as the confidence that external stakeholders can have in it. As an example, digital crime scenes can be manipulated by tampering with the digital artefacts left on devices, yet recent studies have shown that efforts to detect occurrences of this are rare and argue that this leaves digital forensics investigations vulnerable to accusations of inaccuracy. In this paper a new approach to digital forensics is considered based on the concept of Zero Trust, an increasingly popular design in network security. Zero Trust describes the practitioner mindset and principles upon which the reliance on trust in network components is eliminated in favour of dynamic verification of network interactions. An initial definition of Zero Trust Digital Forensics will be proposed and then a specific example considered showing how this strategy can be applied to digital forensic investigations to mitigate against the specific risk of evidence tampering. A definition of Zero Trust Digital Forensics is proposed, specifically that it is a strategy adopted by investigators whereby each aspect of an investigation is assumed to be unreliable until verified. A new principle will be introduced, namely the multifaceted verification of digital artefacts that can be used by practitioners who wish to adopt a Zero Trust Digital Forensics strategy during their investigations...
The ongoing NIST standardization process has shown that Proof of Knowledge (PoK) based signatures have become an important type of possible post-quantum signatures. Regarding code-based cryptography, the original approach for PoK based signatures is the Stern protocol which allows to prove the knowledge of a small weight vector solving a given instance of the Syndrome Decoding (SD) problem over F2. It features a soundness error equal to 2/3. This protocol was improved a few years later by V\'eron who proposed a variation of the scheme based on the General Syndrome Decoding (GSD) problem which leads to better results in term of communication. A few years later, the AGS protocol introduced a variation of the V\'eron protocol based on Quasi-Cyclic (QC) matrices. The AGS protocol permits to obtain an asymptotic soundness error of 1/2 and an improvement in term of communications. In the present paper, we introduce the Quasi-Cyclic Stern PoK which constitutes an adaptation of the AGS scheme in a SD context, as well as several new optimizations for code-based PoK. Our main optimization on the size of the signature can't be applied to GSD based protocols such as AGS and therefore motivated the design of our new protocol. In addition, we also provide a special soundness proof that is compatible with the use of the Fiat-Shamir transform for 5-round protocols. This approach is valid for our protocol but also for the AGS protocol which was lacking such a proof. We compare our results with existing signatures including the recent code-based signatures based on PoK leveraging the MPC in the head paradigm. In practice, our new protocol is as fast as AGS while reducing its associated signature length by 20%. As a consequence, it constitutes an interesting trade-off between signature length and execution time for the design of a code-based signature relying only on the difficulty of the SD problem.
The most natural method for evaluating program repair systems is to run them on bug datasets, such as Defects4J. Yet, using this evaluation technique on arbitrary real-world programs requires heavy configuration. In this paper, we propose a purely static method to evaluate the potential of the search space of repair approaches. This new method enables researchers and practitioners to encode the search spaces of repair approaches and select potentially useful ones without struggling with tool configuration and execution. We encode the search spaces by specifying the repair strategies they employ. Next, we use the specifications to check whether past commits lie in repair search spaces. For a repair approach, including many human-written past commits in its search space indicates its potential to generate useful patches. We implement our evaluation method in LighteR. LighteR gets a Git repository and outputs a list of commits whose source code changes lie in repair search spaces. We run LighteR on 55,309 commits from the history of 72 Github repositories with and show that LighteR's precision and recall are 77% and 92%, respectively. Overall, our experiments show that our novel method is both lightweight and effective to study the search space of program repair approaches.
Biomimicry is a powerful science that takes advantage of nature's remarkable ability to devise innovative solutions to challenging problems. In this work, we use asymptotic methods to develop the mathematical foundations for the exchange of design inspiration and features between biological hearing systems, signal processing algorithms and acoustic metamaterials. Our starting point is a concise asymptotic analysis of high-contrast acoustic metamaterials. We are able to fine tune this graded structure to mimic the biomechanical properties of the cochlea, at the same scale. We then turn our attention to developing a biomimetic signal processing algorithm. We use the response of the cochlea-like metamaterial as an initial filtering layer and then add additional biomimetic processing stages, designed to mimic the human auditory system's ability to recognise the global properties of natural sounds. This demonstrates the three-way exchange of ideas that, thanks to our analysis, is possible between signal processing, metamaterials and biology.
We present Wolverine2, an integrated Debug-Localize-Repair environment for heap manipulating programs. Wolverine2 provides an interactive debugging environment: while concretely executing a program via on an interactive shell supporting common debugging facilities, Wolverine2 displays the abstract program states (as box-and-arrow diagrams) as a visual aid to the programmer, packages a novel, proof-directed repair algorithm to quickly synthesize the repair patches and a new bug localization algorithm to reduce the search space of repairs. Wolverine2 supports "hot-patching" of the generated patches to provide a seamless debugging environment, and also facilitates new debug-localize-repair possibilities: \textit{specification refinement} and \textit{checkpoint-based hopping}. We evaluate Wolverine2 on 6400 buggy programs (generated using automated fault injection) on a variety of data-structures like singly, doubly, and circular linked lists, AVL trees, Red-Black trees, Splay Trees and Binary Search Trees; Wolverine2 could repair all the buggy instances within realistic programmer wait-time (less than 5 sec in most cases). Wolverine2 could also repair more than 80\% of the 247 (buggy) student submissions where a reasonable attempt was made.
Recently, many works have tried to augment the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM (Zhang and Yang, 2018) has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture. This limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model, it requires only subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves an inference speed up to 6.15 times faster than those of state-ofthe-art methods, along with a better performance. The experimental results also show that the proposed method can be easily incorporated with pre-trained models like BERT.
Interest in the field of Explainable Artificial Intelligence has been growing for decades and has accelerated recently. As Artificial Intelligence models have become more complex, and often more opaque, with the incorporation of complex machine learning techniques, explainability has become more critical. Recently, researchers have been investigating and tackling explainability with a user-centric focus, looking for explanations to consider trustworthiness, comprehensibility, explicit provenance, and context-awareness. In this chapter, we leverage our survey of explanation literature in Artificial Intelligence and closely related fields and use these past efforts to generate a set of explanation types that we feel reflect the expanded needs of explanation for today's artificial intelligence applications. We define each type and provide an example question that would motivate the need for this style of explanation. We believe this set of explanation types will help future system designers in their generation and prioritization of requirements and further help generate explanations that are better aligned to users' and situational needs.
Reinforcement learning (RL) algorithms have been around for decades and been employed to solve various sequential decision-making problems. These algorithms however have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to drive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that demand multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, multi-agent transfer learning. The merits and demerits of the reviewed methods will be analyzed and discussed, with their corresponding applications explored. It is envisaged that this review provides insights about various MADRL methods and can lead to future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of recent years. We believe the identified problems deserve more attention, and highlight the implications on both the properties of GANs and the trajectory of research on probabilistic models. We recently proposed an alternative method that sidesteps these problems.