Without a specific functional context, non-functional requirements can only be treated as cross-cutting concerns, applied uniformly across all features of an application. This, however, neglects the heterogeneity of non-functional requirements, which arises from stakeholder interests and the distinct functional scopes of software systems and which jointly determines how these requirements must be satisfied. Earlier studies showed that the differing types and objectives of non-functional requirements lead to specifications that are either vague or unbalanced. We propose a task-analytic approach that elicits and models user tasks in order to capture the interests stakeholders pursue with the software product. Stakeholder interests are structurally related to user tasks, and each interest can be specified individually as a constraint on a specific user task. These constraints give DevOps teams concrete guidance on how a stakeholder's interest can be satisfied sufficiently throughout the software lifecycle. Our structured approach intertwines task-oriented functional requirements with non-functional stakeholder interests to specify constraints at the level of user tasks. We also present the results of a case study with domain experts, which reveals that our task-modeling and interest-tailoring method improves the comprehensibility of non-functional requirements as well as of their impact on the functional requirements, i.e., the users' tasks.
Empirical results in software engineering have long indicated that findings are unlikely to apply to all software systems or to every domain: results need to be evaluated in specified contexts and limited to the type of systems they were extracted from. This is a known issue, and it requires the establishment of a classification of software types. This paper makes two contributions: the first is to evaluate the quality of the current landscape of software classifications; the second is a case study showing how to create a classification of software types using a curated set of software systems. Our contributions show that existing, and very likely even new, classification attempts are bound to fail due to one or more issues, which we name the `antipatterns' of software classification tasks. We collected 7 such antipatterns that emerge from both our case study and the existing classifications. Because these antipatterns represent recurring issues in a classification, we discuss practical ways to help researchers avoid these pitfalls. It becomes clear that classification attempts must also face the daunting task of formulating a taxonomy of software types, with the objective of establishing a hierarchy of categories in a classification.
In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and to compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal of the computer draw of the fixture, which remains hidden from the public.
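The notion of a stakeless match can be made concrete with a small Monte Carlo experiment. The sketch below is not the paper's estimated statistical model: it assumes a hypothetical strength-based result model, a points-only ranking that ignores UEFA tiebreakers, and a generic double-round-robin fixture. It classifies each last-round match as live, weakly stakeless, or strongly stakeless by enumerating all nine result combinations of the final round.

```python
import itertools
import random

TEAMS = range(4)
# Single round robin for 4 teams; doubled for a UEFA-style group stage.
ROUNDS = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)]]
ROUNDS = ROUNDS + [[(b, a) for (a, b) in rnd] for rnd in ROUNDS]  # 6 rounds

STRENGTH = [1.8, 1.4, 1.0, 0.6]  # hypothetical team strengths

def sample_result(home, away, rng):
    """Sample (home_points, away_points) from a toy win/draw/loss model."""
    p_home = STRENGTH[home] / (STRENGTH[home] + STRENGTH[away])
    u = rng.random()
    if u < 0.25:               # fixed draw probability (assumption)
        return 1, 1
    return (3, 0) if u < 0.25 + 0.75 * p_home else (0, 3)

def ranks(points):
    """Rank by points only; tiebreakers are ignored (simplification)."""
    return [1 + sum(q > p for q in points) for p in points]

def classify_last_round(rng):
    """Simulate rounds 1-5, then test each final-round match for stakelessness."""
    points = [0, 0, 0, 0]
    for rnd in ROUNDS[:-1]:
        for h, a in rnd:
            ph, pa = sample_result(h, a, rng)
            points[h] += ph
            points[a] += pa
    last = ROUNDS[-1]
    # Enumerate all 3^2 = 9 outcome combinations of the final round.
    finals = []
    for outcome in itertools.product([(3, 0), (1, 1), (0, 3)], repeat=2):
        pts = points[:]
        for (h, a), (ph, pa) in zip(last, outcome):
            pts[h] += ph
            pts[a] += pa
        finals.append(ranks(pts))
    fixed = [all(r[t] == finals[0][t] for r in finals) for t in TEAMS]
    labels = []
    for h, a in last:
        n = fixed[h] + fixed[a]  # teams whose rank cannot change
        labels.append("strong" if n == 2 else "weak" if n == 1 else "live")
    return labels

rng = random.Random(42)
counts = {"strong": 0, "weak": 0, "live": 0}
for _ in range(20000):
    for label in classify_last_round(rng):
        counts[label] += 1
print(counts)
```

Varying the fixture (which pairs meet in the last round) under this toy model changes the frequency of stakeless matches, which is the quantity the schedule comparison in the paper optimises.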
Recruitment in large organisations often involves interviewing a large number of candidates. The process is resource-intensive and complex, so it is important to carry it out efficiently and effectively. Planning the selection process comprises several problems, each of which maps to a well-known computing problem. Research that looks at each of these problems in isolation is rich and mature; research that takes an integrated view of the problem, however, is not common. In this paper, we address two of the most important aspects of the application-processing problem, namely review/interview panel creation and interview scheduling. We have explored various algorithmic options and customised them to solve these panel-creation and interview-scheduling problems. We have implemented our approach as a prototype system and used it to automatically plan the interview process for a real-life data set; our system produces a distinctly better plan than the existing, predominantly manual practice. We have evaluated these design options experimentally on a real data set and present our observations. Our prototype, experimental process, and results may be a very good starting point for a full-fledged development project to automate application processing.
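To illustrate how these two subproblems map onto classical computing problems, here is a minimal sketch. It is not the paper's system; all names and data are hypothetical, and the specific mappings chosen (panel creation as greedy set cover over interviewer skills, scheduling as greedy conflict-free slot assignment) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical inputs: interviewer skills and candidate skill requirements.
interviewers = {"ana": {"java", "ml"}, "bo": {"java"},
                "cy": {"ml", "stats"}, "dee": {"stats"}}
candidates = {"c1": {"java", "ml"}, "c2": {"stats"}, "c3": {"java"}}
slots = ["mon-9", "mon-10", "tue-9"]

def make_panel(required, pool, size=2):
    """Greedy set-cover-style panel creation: pick interviewers until the
    candidate's required skills are covered (a classic NP-hard subproblem)."""
    panel, uncovered = [], set(required)
    while uncovered and len(panel) < size:
        best = max(pool, key=lambda p: len(interviewers[p] & uncovered))
        panel.append(best)
        uncovered -= interviewers[best]
        pool = [p for p in pool if p != best]
    return panel

def schedule(panels, slots):
    """Greedy scheduling: assign each candidate the earliest slot in which
    no panel member is already booked."""
    busy = defaultdict(set)          # slot -> interviewers already booked
    plan = {}
    for cand, panel in panels.items():
        for slot in slots:
            if not busy[slot] & set(panel):
                busy[slot] |= set(panel)
                plan[cand] = (slot, panel)
                break
    return plan

panels = {c: make_panel(req, list(interviewers)) for c, req in candidates.items()}
print(schedule(panels, slots))
```

An integrated planner of the kind the paper describes would additionally balance interviewer load across panels and optimise the two stages jointly rather than greedily, which is where the customised algorithmic options come in.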
The Coronavirus disease 2019 (COVID-19) outbreak quickly spread around the world, resulting in over 240 million infections and 4 million deaths by October 2021. While the virus spread silently from person to person, fear also spread around the globe. The COVID-19 information from the Australian Government was convincing but neither timely nor detailed, while social networks carried abundant information mixing facts and rumours. As software engineers, we spontaneously and rapidly constructed a COVID-19 information dashboard that aggregates reliable information, semi-automatically checked from different sources, to provide a one-stop information-sharing site on the latest status in Australia. Inspired by the Johns Hopkins University COVID-19 Map, our dashboard offers case statistics, case distribution, government policy, and the latest news, with interactive visualization. In this paper, we present participant observations in which the authors acted as founders of //covid-19-au.com/, serving more than 830K users with 14M page views since March 2020. Based on our first-hand experience, we summarize 9 lessons for developers, researchers, and instructors. These lessons may inspire development, research, and teaching in software engineering for coping with similar public crises in the future.
Conventionally, generation of natural language for dialogue agents may be viewed as a statistical learning problem: determine the patterns in human-provided data and generate appropriate responses with similar statistical properties. However, dialogue can also be regarded as a goal-directed process, in which speakers attempt to accomplish a specific task. Reinforcement learning (RL) algorithms are designed specifically for solving such goal-directed problems, but the most direct way to apply RL -- through trial-and-error learning in human conversations -- is costly. In this paper, we study how offline reinforcement learning can instead be used to train dialogue agents entirely from static datasets collected from human speakers. Our experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents that better accomplish task goals.
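To make the offline-RL framing concrete, here is a minimal tabular sketch on a hypothetical toy dialogue task: a noisy behaviour policy logs a static dataset, and Q-learning is then run over that dataset alone, with no further environment interaction. The paper's actual approach combines offline RL with language models; the states, actions, and dynamics below are purely illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy goal-directed "dialogue": states are dialogue stages, actions are
# utterance types. The task succeeds only if the agent asks for the slot
# value before confirming.
ACTIONS = ["greet", "ask_slot", "confirm", "end"]

def step(state, action):
    """Hand-crafted toy dynamics (an assumption, not the paper's task)."""
    if state == "start" and action == "greet":
        return "mid", 0.0, False
    if state == "mid" and action == "ask_slot":
        return "informed", 0.0, False
    if state == "informed" and action == "confirm":
        return "done", 1.0, True
    return "done", -0.2, True  # any other move ends the dialogue poorly

def collect_dataset(n_episodes, rng):
    """Log transitions from a behaviour policy (here: uniformly random)."""
    data = []
    for _ in range(n_episodes):
        state, done = "start", False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            data.append((state, action, reward, nxt, done))
            state = nxt
    return data

def offline_q_learning(data, gamma=0.95, lr=0.1, epochs=200):
    """Fit Q from the fixed dataset only -- no further environment calls."""
    Q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s2, done in data:
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += lr * (target - Q[(s, a)])
    return Q

rng = random.Random(0)
Q = offline_q_learning(collect_dataset(2000, rng))
for s in ["start", "mid", "informed"]:
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```

Even though the behaviour policy rarely completes the task, the learned greedy policy recovers the goal-directed sequence greet, ask_slot, confirm; practical offline RL methods additionally regularise against actions unsupported by the data, which this tabular toy does not need.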
We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that given query access to $f$, and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
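The testing guarantee can be stated compactly. Writing $\mathcal{P}_d$ for the class of polynomials of degree at most $d$, the distribution-free distance is the $\mathcal{D}$-mass of the disagreement set, and the tester has one-sided error:
\[
  \operatorname{dist}_{\mathcal{D}}(f, P) \;=\; \Pr_{x \sim \mathcal{D}}\bigl[f(x) \neq P(x)\bigr],
  \qquad
  \operatorname{dist}_{\mathcal{D}}(f, \mathcal{P}_d) \;=\; \inf_{P \in \mathcal{P}_d} \operatorname{dist}_{\mathcal{D}}(f, P),
\]
\[
  f \in \mathcal{P}_d \;\Longrightarrow\; \Pr[\text{accept}] = 1,
  \qquad
  \operatorname{dist}_{\mathcal{D}}(f, \mathcal{P}_d) \geq \varepsilon \;\Longrightarrow\; \Pr[\text{reject}] \geq \tfrac{2}{3},
\]
using $(d/\varepsilon)^{O(1)}$ queries to $f$ together with samples from $\mathcal{D}$.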
As machine learning algorithms become increasingly integrated in crucial decision-making scenarios, such as healthcare, recruitment, and risk assessment, there have been increasing concerns about the privacy and fairness of such systems. Federated learning has been viewed as a promising solution for the collaborative training of machine learning models among multiple parties while maintaining the privacy of their local data. However, federated learning also poses new challenges in mitigating potential bias against certain populations (e.g., demographic groups), as this typically requires centralized access to the sensitive information (e.g., race, gender) of each data point. Motivated by the importance and challenges of group fairness in federated learning, we propose FairFed, a novel algorithm that enhances group fairness via a fairness-aware aggregation method, aiming to provide fair model performance across different sensitive groups (e.g., racial or gender groups) while maintaining high utility. This formulation also provides more flexibility in customising local debiasing strategies for each client. We build our FairFed algorithm around the secure aggregation protocol of federated learning. Running federated training on widely investigated fairness datasets, we demonstrate that our proposed method outperforms state-of-the-art fair federated learning frameworks under highly heterogeneous sensitive-attribute distributions. We also investigate the performance of FairFed on naturally distributed real-life data collected from different geographical locations or departments within an organization.
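The following sketch illustrates the general idea of fairness-aware aggregation. It is not the exact FairFed algorithm (and omits secure aggregation entirely): it merely shows one plausible reweighting scheme in which the server down-weights clients whose locally measured fairness gap deviates most from the global estimate. All quantities are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 5, 10

updates = rng.normal(size=(n_clients, dim))            # local model updates
n_samples = np.array([120, 80, 200, 50, 150])          # local dataset sizes
local_gaps = np.array([0.05, 0.20, 0.08, 0.30, 0.10])  # e.g. local parity gaps

def fairness_aware_aggregate(updates, n_samples, local_gaps, beta=5.0):
    """Reweight the plain FedAvg weights by how far each client's fairness
    gap lies from the sample-weighted global gap; beta sets the strength
    of the fairness penalty (an illustrative choice)."""
    base = n_samples / n_samples.sum()        # plain FedAvg weights
    global_gap = np.dot(base, local_gaps)     # weighted global estimate
    penalty = np.exp(-beta * np.abs(local_gaps - global_gap))
    weights = base * penalty
    weights /= weights.sum()
    return weights @ updates, weights

agg_update, weights = fairness_aware_aggregate(updates, n_samples, local_gaps)
print("aggregation weights:", np.round(weights, 3))
```

Because each client only reports a scalar fairness statistic alongside its update, the local debiasing strategy producing that statistic can be customised per client, which matches the flexibility argument in the abstract.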
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets have demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods that allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.
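One simple way to lift a sentence-pair NLI model to full documents is sketched below. The scorer is a stand-in for any trained NLI model, and the aggregation shown (maximum entailment score over premise spans, with the best span doubling as extracted evidence) is one straightforward choice, not necessarily the paper's method.

```python
from typing import Callable, List

def split_into_spans(document: str, span_len: int = 2) -> List[str]:
    """Naive span splitter: group sentences into overlapping windows so a
    long premise fits the sentence-pair interface of an NLI model."""
    sents = [s.strip() for s in document.split(".") if s.strip()]
    return [". ".join(sents[i:i + span_len]) for i in range(len(sents))]

def document_nli(document: str, hypothesis: str,
                 entailment_score: Callable[[str, str], float]) -> float:
    """Score a hypothesis against a whole document by taking the maximum
    entailment score over premise spans; the best-scoring span also acts
    as the extracted evidence (the retrieval signal from the abstract)."""
    spans = split_into_spans(document)
    scores = [entailment_score(span, hypothesis) for span in spans]
    best = max(range(len(scores)), key=scores.__getitem__)
    print("evidence span:", spans[best])
    return scores[best]

# Toy stand-in scorer: lexical overlap in place of a real NLI model.
def toy_score(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

doc = ("The contract starts in May. Either party may terminate with notice. "
       "Fees are fixed.")
print(document_nli(doc, "a party may terminate the contract", toy_score))
```

In practice the toy scorer would be replaced by a trained sentence-pair NLI classifier, and cross-document comparison of entailment and contradiction scores supports the discrepancy and consensus analysis described above.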
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed both on HPC clusters and in the cloud. An important aspect of executing such graphs is the scheduling algorithm used. Many scheduling heuristics have been proposed in existing works; nevertheless, they are often tested in oversimplified environments. We provide an extensible simulation environment designed for prototyping and benchmarking task schedulers, which contains implementations of various scheduling algorithms and is open source, so that our results are fully reproducible. We use this environment to perform a comprehensive analysis of workflow scheduling algorithms, focusing on quantifying the effect of scheduling challenges that have so far been mostly neglected, such as delays between scheduler invocations or partially unknown task durations. Our results indicate that the network models used by many previous works can produce results that are off by an order of magnitude in comparison to a more realistic model. Additionally, we show that certain implementation details of scheduling algorithms, although often neglected, can have a large effect on the scheduler's performance, and they should thus be described in great detail to enable proper evaluation.
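For readers unfamiliar with the heuristics such simulators benchmark, here is a minimal list-scheduling sketch in the spirit of HEFT on a toy task graph. It is not the paper's simulation environment: the DAG, the durations, and the zero-cost network model are all simplifying assumptions, and it is precisely such simplifications (e.g., ignoring transfer costs and scheduler delays) whose impact the paper quantifies.

```python
from functools import lru_cache

tasks = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}  # task -> duration
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}  # task -> predecessors
n_workers = 2

@lru_cache(maxsize=None)
def upward_rank(task):
    """Critical-path priority: duration plus max rank over successors."""
    succs = [t for t, ps in deps.items() if task in ps]
    return tasks[task] + max((upward_rank(s) for s in succs), default=0.0)

def schedule():
    """List scheduling: process tasks by decreasing rank, placing each on
    the worker that lets it start earliest (after its predecessors)."""
    order = sorted(tasks, key=upward_rank, reverse=True)
    worker_free = [0.0] * n_workers
    finish, plan = {}, []
    for t in order:
        ready = max((finish[p] for p in deps.get(t, [])), default=0.0)
        w = min(range(n_workers), key=lambda i: max(worker_free[i], ready))
        start = max(worker_free[w], ready)
        worker_free[w] = finish[t] = start + tasks[t]
        plan.append((t, w, start, finish[t]))
    return plan

for t, w, s, f in schedule():
    print(f"task {t}: worker {w}, start {s:.1f}, finish {f:.1f}")
```

A realistic evaluation would replace the instantaneous-start assumption with delayed scheduler invocations, noisy duration estimates, and a non-trivial network model, which is exactly what the simulation environment is built to support.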
Autonomous driving has achieved significant milestones in research and development over the last decade. Interest in the field is increasing, as the deployment of self-operating vehicles on roads promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate more reliably without human intervention. However, intelligent decision-making in autonomous cars is not generally understandable by humans in the current state of the art, and this deficiency hinders the technology from being socially acceptable. Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how these decisions are reached in order to be regulatory compliant across many jurisdictions. Our study sheds comprehensive light on developing explainable artificial intelligence (XAI) approaches for autonomous vehicles. In particular, we make the following contributions. First, we provide a thorough overview of the present gaps with respect to explanations in the state-of-the-art autonomous vehicle industry. Second, we present a taxonomy of explanations and explanation receivers in this field. Third, we propose a framework for an architecture of end-to-end autonomous driving systems and justify the role of XAI in both debugging and regulating such systems. Finally, as future research directions, we provide a field guide on XAI approaches for autonomous driving that can improve operational safety and transparency, towards achieving public approval by regulators, manufacturers, and all engaged stakeholders.