Dependency management bots automatically open pull requests to update software dependencies on behalf of developers. Early research shows that developers are suspicious of updates performed by dependency management bots and feel tired of overwhelming notifications from these bots. Despite this, dependency management bots are becoming increasingly popular. Such contrast motivates us to investigate Dependabot, currently the most visible bot on GitHub, to reveal the effectiveness and limitations of state-of-art dependency management bots. We use exploratory data analysis and a developer survey to evaluate the effectiveness of Dependabot in keeping dependencies up-to-date, interacting with developers, reducing update suspicion, and reducing notification fatigue. We obtain mixed findings. On the positive side, projects do reduce technical lag after Dependabot adoption and developers are highly receptive to its pull requests. On the negative side, its compatibility scores are too scarce to be effective in reducing update suspicion; developers tend to configure Dependabot toward reducing the number of notifications; and 11.3% of projects have deprecated Dependabot in favor of other alternatives. The survey confirms our findings and provides insights into the key missing features of Dependabot. Based on our findings, we derive and summarize the key characteristics of an ideal dependency management bot which can be grouped into four dimensions: configurability, autonomy, transparency, and self-adaptability.
Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, and making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 52 relevant studies that have used LLMs for software testing, from both the software testing and LLMs perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative ones. It also analyzes the commonly used LLMs, the types of prompt engineering that are employed, as well as the accompanied techniques with these LLMs. It also summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in this area, highlighting potential avenues for exploration, and identifying gaps in our current understanding of the use of LLMs in software testing.
Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely on the assumption that data are missing at random and can yield biased results if this assumption is violated. In observational studies with outcome data missing not at random, Heckman's sample selection model can be used to adjust for bias due to missing data. In this paper, we review Heckman's method and a similar approach proposed by Tchetgen Tchetgen and Wirth (2017). We then discuss how to apply these methods to Mendelian randomization analyses using individual-level data, with missing data for either the exposure or outcome or both. We explore whether genetic variants associated with participation can be used as instruments for selection. We then describe how to obtain missingness-adjusted Wald ratio, two-stage least squares and inverse variance weighted estimates. The two methods are evaluated and compared in simulations, with results suggesting that they can both mitigate selection bias but may yield parameter estimates with large standard errors in some settings. In an illustrative real-data application, we investigate the effects of body mass index on smoking using data from the Avon Longitudinal Study of Parents and Children.
Nowadays, cars offer many possibilities to explore the world around you by providing location-based information displayed on a 2D-Map. However, this information is often only available to front-seat passengers while being restricted to in-car displays. To propose a more natural way of interacting with the environment, we implemented an augmented reality head-mounted display to overlay points of interest onto the real world. We aim to compare multiple selection techniques for digital objects located outside a moving car by investigating head gaze with dwell time, head gaze with hardware button, eye gaze with hardware button, and hand pointing with gesture confirmation. Our study was conducted in a moving car under real-world conditions (N=22), with significant results indicating that hand pointing usage led to slower and less precise content selection while eye gaze was preferred by participants and performed on par with the other techniques.
Modern image-to-text systems typically adopt the encoder-decoder framework, which comprises two main components: an image encoder, responsible for extracting image features, and a transformer-based decoder, used for generating captions. Taking inspiration from the analysis of neural networks' robustness against adversarial perturbations, we propose a novel gray-box algorithm for creating adversarial examples in image-to-text models. Unlike image classification tasks that have a finite set of class labels, finding visually similar adversarial examples in an image-to-text task poses greater challenges because the captioning system allows for a virtually infinite space of possible captions. In this paper, we present a gray-box adversarial attack on image-to-text, both untargeted and targeted. We formulate the process of discovering adversarial perturbations as an optimization problem that uses only the image-encoder component, meaning the proposed attack is language-model agnostic. Through experiments conducted on the ViT-GPT2 model, which is the most-used image-to-text model in Hugging Face, and the Flickr30k dataset, we demonstrate that our proposed attack successfully generates visually similar adversarial examples, both with untargeted and targeted captions. Notably, our attack operates in a gray-box manner, requiring no knowledge about the decoder module. We also show that our attacks fool the popular open-source platform Hugging Face.
Interaction-driven modeling of diseases over real-world contact data has been shown to promote the understanding of the spread of diseases in communities. This temporal modeling follows the path-preserving order and timing of the contacts, which are essential for accurate modeling. Yet, other important aspects were overlooked. Various airborne pathogens differ in the duration of exposure needed for infection. Also, from the individual perspective, Covid-19 progression differs between individuals, and its severity is statistically correlated with age. Here, we enrich an interaction-driven model of Covid-19 and similar airborne viral diseases with (a) meetings duration and (b) personal disease progression. The enriched model enables predicting outcomes at both the population and the individual levels. It further allows predicting individual risk of engaging in social interactions as a function of the virus characteristics and its prevalence in the population. We further showed that the enigmatic nature of asymptomatic transmission stems from the latent effect of the network density on this transmission and that asymptomatic transmission has a substantial impact only in sparse communities.
Tests executed by human testers are still widespread in practice and fill the gap left by limitations of automated approaches. Among the human-centered approaches, exploratory testing is the de facto approach in agile teams. Although it is focused on the expertise and creativity of the tester, the activity of exploratory testing may benefit from support provided by an automated agent that interacts with the human testers. This paper presents a chatbot, called BotExpTest, designed to support testers while performing exploratory tests of software applications. We implemented BotExpTest on top of the instant messaging social platform Discord; this version includes functionalities to report bugs and issues, time management of test sessions, guidelines for app testing, and presentation of exploratory testing strategies. To assess BotExpTest, we conducted a user study with six software engineering professionals. They carried out two sessions performing exploratory tests along with BotExpTest. Participants were capable of revealing bugs and found the experience to interact with the chatbot positive. Preliminary analyses indicate that chatbot-enabled exploratory testing may be as effective as similar approaches and help testers to uncover different bugs. Bots are shown to be valuable resources for Software Engineering, and initiatives like BotExpTest may help to improve the effectiveness of testing activities like exploratory testing.
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.