In this paper, we present the findings from a survey study investigating how developers and managers define and trade off developer productivity and software quality (two related lenses into software development). We found that developers and managers, as cohorts, are not well aligned in their views of what it means to be productive (developers think of productivity in terms of activity, while more managers think of productivity in terms of performance). We also found that developers are not accurate at predicting their managers' views of productivity. In terms of quality, we found that individual developers and managers have quite varied views of what quality means to them, but as cohorts they are closely aligned in their different views, with the majority in both groups defining quality in terms of robustness. Over half of the developers and managers reported that quality can be traded for higher productivity and explained why this trade-off can be justified, while one third consider quality a necessary part of productivity that cannot be traded off. We also present a new descriptive framework for quality, TRUCE, that we synthesize from the survey responses. We call for more discussion between developers and managers about what they each consider important software quality attributes, and for open debate about how software quality relates to developer productivity and what trade-offs should or should not be made.
The task of determining which network architectures provide the best ratio in terms of operation and management efforts vs. performance guarantees is not trivial. In this paper, we investigate the complexity of operating different types of architectures from the perspective of the space of network parameters that need to be monitored and configured. We present OPLEX, a novel framework based on the analysis of YANG data models of network implementations that enables operators to compare architecture options based on the dimension of the parameter space. We implement OPLEX as part of an operator-friendly tool that can be used to determine the space associated with an architecture in an automatic and flexible way. The benefits of the proposed framework are demonstrated in the use case of Internet Exchange Point (IXP) network architectures, for which we take advantage of the rich set of publicly available data. We also exploit the results of a survey and direct consultations we conducted with operators and vendors of IXPs on their perception of complexity when operating different architectures. OPLEX is flexible, builds upon data models with widespread usage in the community, and provides a practical solution geared towards operators for characterizing the complexity of network architecture options.
As machine learning is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we propose a novel human evaluation framework HIVE (Human Interpretability of Visual Explanations) for diverse interpretability methods in computer vision; to the best of our knowledge, this is the first work of its kind. We argue that human studies should be the gold standard in properly evaluating how interpretable a method is to human users. While human studies are often avoided due to challenges associated with cost, study design, and cross-method comparison, we describe how our framework mitigates these issues and conduct IRB-approved studies of four methods that represent the diversity of interpretability work: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations (regardless of whether they are actually correct) engender human trust, yet are not distinct enough for users to distinguish between correct and incorrect predictions. Lastly, we open-source our framework to enable future studies and to encourage more human-centered approaches to interpretability.
Compared with lecturer-marked assessment, peer assessment is a more comprehensive learning process, but a number of problems are associated with it. In this research work, we study the impact of peer assessment on group learning activities in order to provide a complete and systematic review and to improve the practice and quality of the peer-assessment process. Pilot studies were conducted in the form of surveys, focus group interviews, and questionnaires. Preliminary surveys were conducted with 582 students and 276 responses were received, giving a response rate of 47.4%. The results show that 37% of students would choose individual work over group work if given the choice. In the case study, 82.1% of the 28 students enjoyed working in a group using Facebook as a communication tool. 89.3% of the students could demonstrate their skills through group work and, most importantly, 82.1% of them agreed that peer assessment is an impartial method of assessment with the help of Facebook as proof of self-contribution. Our suggestions for making group work a pleasant experience are identifying and taking action against freeloaders, giving credit to deserving students, educating students on how to give constructive feedback, and making the assessment process transparent to all.
Information technology and software services are pervasive, occupying the centre of most aspects of contemporary societies. This has given rise to commonly expected norms and expectations around how such systems should work, appropriate penalties for violating these expectations, and more importantly, indicators of how to reduce the consequences of violations and sanctions. Evidence for expectation violations and ensuing sanctions exists in a range of portals used by individuals and groups to start new friendships, explore new ideas, and provide feedback for products and services. Therein lie insights that could lead to functional socio-technical systems, and general awareness and anticipation of human actions (and interactions) when using information technology and software services. However, limited previous work has examined such artifacts to provide these understandings. To contribute to such understandings and theoretical advancement, we study expectation violations in mobile apps, considered among the most engaging socio-technical systems. We used content analysis, expectancy violation theory (EVT), and expectation confirmation theory (ECT) to explore the evidence and nature of sanctions in app reviews for a specific domain of apps. Our outcomes show that users respond to expectation violations with sanctions when their app does not work as anticipated, that developers seem to target specific market niches when providing services in an app domain, and that users within an app domain respond with similar sanctions. We contribute to the advancement of expectation violation theories, and we provide practical insights for the mobile app community.
Despite recent advances in modern machine learning algorithms, the opaqueness of their underlying mechanisms continues to be an obstacle in adoption. To instill confidence and trust in artificial intelligence systems, Explainable Artificial Intelligence has emerged as a response for improving the explainability of modern machine learning algorithms. Inductive Logic Programming (ILP), a subfield of symbolic artificial intelligence, plays a promising role in generating interpretable explanations because of its intuitive logic-driven framework. ILP effectively leverages abductive reasoning to generate explainable first-order clausal theories from examples and background knowledge. However, several challenges in developing methods inspired by ILP need to be addressed for their successful application in practice. For example, existing ILP systems often have a vast solution space, and the induced solutions are very sensitive to noise and disturbances. This survey paper summarizes recent advances in ILP and discusses statistical relational learning and neural-symbolic algorithms, which offer synergistic views on ILP. Following a critical review of the recent advances, we delineate observed challenges and highlight potential avenues of further ILP-motivated research toward developing self-explanatory artificial intelligence systems.
The gender gap in computer science (CS) research is a well-studied problem, with an estimated ratio of 15–30% women researchers. However, far less is known about gender representation in specific fields within CS. Here, we investigate the gender gap in one large field, computer systems. To this end, we combined data from 53 leading systems conferences with external demographic and bibliometric data to evaluate the ratio of women authors and the factors that might affect this ratio. Our main findings are that women represent only about 10% of systems researchers, and that this ratio is not associated with various conference factors such as size, prestige, double-blind reviewing, and inclusivity policies. Author research experience also does not significantly affect this ratio, although author country and work sector do. The 10% ratio of women authors is significantly lower than that of CS as a whole. Our findings suggest that focusing on inclusivity policies alone cannot address this large gap. Increasing women's participation in systems research will require addressing the systemic causes of their exclusion, which are even more pronounced in systems than in the rest of CS.
Most, if not all, modern software systems are highly configurable to tailor both their functional and non-functional properties to a variety of stakeholders. Due to their black-box nature, it is difficult, if not impossible, to analyze and understand their behavior, in particular how combinations of configuration options interact with regard to performance, which is of great importance for advancing the controllability of the underlying software system. This paper proposes a tool, dubbed LONViZ, the first of its kind, to facilitate the exploratory analysis of black-box configurable software systems. It starts from a systematic sampling over the configuration space of the underlying system. LONViZ then seeks to construct a structurally stable local optima network (LON) by synthesizing multiple repeats of the sampling results. Finally, exploratory analysis can be conducted on the stable LON from both qualitative and quantitative perspectives. In our experiments, we choose four widely used real-world configurable software systems and develop benchmark platforms under 42 different running environments. From our empirical study, we find that LONViZ enables both qualitative and quantitative analysis and discloses various interesting hidden patterns and properties of different software systems.
The phenomenon of architecture erosion can negatively impact the maintenance and evolution of software systems, and it manifests in a variety of symptoms during software development. While erosion is often considered rather late, its symptoms can act as early warnings to software developers, if detected in time. In addition to static source code analysis, code reviews can be a source for detecting erosion symptoms and subsequently taking action. In this study, we investigate the erosion symptoms discussed in code reviews, their trends, and the actions taken by developers. Specifically, we conducted an empirical study with the two most active Open Source Software (OSS) projects in the OpenStack community (i.e., Nova and Neutron). We manually checked 21,274 code review comments retrieved by keyword search and random selection, and identified 502 code review comments (from 472 discussion threads) that discuss erosion. Our findings show that (1) the proportion of erosion symptoms in code reviews is rather low yet notable, and the most frequently identified erosion symptoms are architectural violation, duplicate functionality, and cyclic dependency; (2) the declining trend of the identified erosion symptoms in the two OSS projects indicates that the architecture tends to stabilize over time; and (3) most code reviews that identify erosion symptoms have a positive impact on removing those symptoms, but a few symptoms still remain and are ignored by developers. The results suggest that (1) code review provides a practical way to reduce erosion symptoms; and (2) analyzing the trend of erosion symptoms can provide insight into the erosion status of software systems, and subsequently help avoid the potential risk of architecture erosion.
The increased use of Unmanned Aerial Vehicles (UAVs) in numerous domains will result in high traffic densities in the low-altitude airspace. Consequently, UAV Traffic Management (UTM) systems that allow the integration of UAVs in the low-altitude airspace are gaining a lot of momentum. Furthermore, the 5th generation of mobile networks (5G) will most likely provide the underlying support for UTM systems by providing connectivity to UAVs, enabling control, tracking, and communication with remote applications and services. However, UAVs may need to communicate with services with different communication Quality of Service (QoS) requirements, ranging from best-effort services to Ultra-Reliable Low-Latency Communications (URLLC) services. Indeed, 5G can ensure efficient QoS enhancements using new technologies, such as network slicing and Multi-access Edge Computing (MEC). In this context, Network Functions Virtualization (NFV) is considered one of the pillars of 5G systems, providing QoS-aware Management and Orchestration (MANO) of softwarized services across cloud and MEC platforms. The MANO process for UAV services can be further enhanced using information provided by the UTM system, such as the UAVs' flight plans. In this paper, we propose an extended framework for the management and orchestration of UAV services in a MEC-NFV environment by combining the functionalities of the MEC-NFV management and orchestration framework with those of a UTM system. Moreover, we propose an Integer Linear Programming (ILP) model of the placement scheme of our framework and evaluate its performance.
Conversational systems have come a long way after decades of research and development, from Eliza and Parry in the 1960s and 1970s, to task-completion systems as in the ATIS project, to intelligent personal assistants such as Siri, and to today's social chatbots like XiaoIce. Social chatbots' appeal lies not only in their ability to respond to users' diverse requests, but also in their ability to establish an emotional connection with users. The latter is done by satisfying the users' essential needs for communication, affection, and social belonging. The design of social chatbots must focus on user engagement and take both intellectual quotient (IQ) and emotional quotient (EQ) into account. Users should want to engage with the social chatbot; as such, we define the success metric for social chatbots as conversation-turns per session (CPS). Using XiaoIce as an illustrative example, we discuss key technologies in building social chatbots, from core chat to visual sense to skills. We also show how XiaoIce can dynamically recognize emotion and engage the user throughout long conversations with appropriate interpersonal responses. As we become the first generation of humans ever living with AI, social chatbots that are well-designed to be both useful and empathic will soon be ubiquitous.