Numerical software typically ships with built-in hyperparameters, and carefully tuning these hyperparameters can yield significant performance gains for specific applications. We developed MindOpt Tuner, a new automatic tuning tool that supports a wide range of numerical software, including optimization and other solvers. MindOpt Tuner uses elastic cloud resources, features a web-based task management panel, integrates with IPython notebooks, and provides both command-line tools and Python APIs. Our experiments with COIN-OR Cbc, an open-source mixed-integer optimization solver, demonstrate remarkable improvements of the tuned parameters over the default ones on the MIPLIB2017 test set, including over 100x acceleration on several problem instances. Additionally, the results show that MindOpt Tuner achieves higher tuning efficiency than SMAC3, a state-of-the-art automatic tuning tool.
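As a rough, hedged illustration of what automatic parameter tuning does (this is not MindOpt Tuner's API; the parameter names, values, and `run_solver` stub below are placeholders), a black-box tuner can be as simple as a random search over a solver's parameter space:

```python
# Generic illustration of black-box parameter tuning, NOT MindOpt Tuner's API.
# Randomly sample parameter settings, measure solver runtime on one instance,
# and keep the fastest configuration found.
import random
import time

PARAM_SPACE = {                       # hypothetical tunable parameters
    "cut_level": [0, 1, 2],
    "use_heuristics": [True, False],
    "branching_rule": ["default", "pseudo_cost"],
}

def run_solver(instance_path, params):
    """Placeholder: invoke the target solver with `params` and return its wall-clock time."""
    start = time.time()
    # ... call the solver here, e.g. through its Python API or a subprocess ...
    return time.time() - start

best_params, best_time = None, float("inf")
for _ in range(50):                   # simple random search over the space
    candidate = {name: random.choice(values) for name, values in PARAM_SPACE.items()}
    elapsed = run_solver("instance.mps", candidate)
    if elapsed < best_time:
        best_params, best_time = candidate, elapsed
print(best_params, best_time)
```

Real tuners replace the random search with smarter search strategies and distribute the solver runs, but the interface is essentially the same: a parameter space, an instance set, and a runtime (or quality) metric to minimize.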
Most real-world classification tasks suffer from label noise to some extent. Such noise in the data adversely affects the generalization error of learned models and complicates the evaluation of noise-handling methods, as their performance cannot be measured accurately without clean labels. In label noise research, either noisy data or overly simple simulated data are typically accepted as a baseline, into which additional noise with known properties is injected. In this paper, we propose SYNLABEL, a framework that aims to improve upon these methodologies. It allows a noiseless dataset informed by real data to be created, by either pre-specifying or learning a function and defining it as the ground-truth function from which labels are generated. Furthermore, by resampling a number of values for selected features in the function domain, evaluating the function, and aggregating the resulting labels, each data point can be assigned a soft label or label distribution. Such distributions allow for direct injection and quantification of label noise. The generated datasets serve as a clean baseline of adjustable complexity into which different types of noise can be introduced. We illustrate how the framework can be applied, how it enables quantification of label noise, and how it improves over existing methodologies.
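For intuition, the soft-label construction described above can be sketched as follows; this is an illustrative toy, not the SYNLABEL implementation, and the ground-truth function, sampler, and class count are made up for the example:

```python
# Illustrative sketch (not the SYNLABEL API): derive a soft label for one data
# point by resampling selected features, evaluating a fixed ground-truth
# function, and aggregating the resulting labels into a distribution.
import numpy as np

def soft_label(x, ground_truth_fn, resample_idx, sampler, n_samples=100, n_classes=3):
    """x: 1-D feature vector; ground_truth_fn: maps a feature vector to a class index;
    resample_idx: indices of features to resample; sampler: draws replacement values."""
    counts = np.zeros(n_classes)
    for _ in range(n_samples):
        x_new = x.copy()
        x_new[resample_idx] = sampler(len(resample_idx))   # perturb selected features
        counts[ground_truth_fn(x_new)] += 1
    return counts / counts.sum()                           # empirical label distribution

# Toy usage with a hand-specified ground-truth function over three classes.
rng = np.random.default_rng(0)
gt = lambda v: int(v[0] + v[1] > 1.0) + int(v[0] + v[1] > 2.0)
p = soft_label(np.array([0.9, 0.4, 0.1]), gt, resample_idx=[1],
               sampler=lambda k: rng.normal(0.5, 0.3, size=k))
print(p)  # a label distribution rather than a single hard label
```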
Software Testing is a well-established area in software engineering, encompassing various techniques and methodologies to ensure the quality and reliability of software systems. However, with the advent of generative artificial intelligence (GenAI) systems, new challenges arise in the testing domain. These systems, capable of generating novel and creative outputs, introduce unique complexities that require novel testing approaches. In this paper, I aim to explore the challenges posed by generative AI systems and discuss potential opportunities for future research in the field of testing. I will touch on the specific characteristics of GenAI systems that make traditional testing techniques inadequate or insufficient. By addressing these challenges and pursuing further research, we can enhance our understanding of how to safeguard GenAI and pave the way for improved quality assurance in this rapidly evolving domain.
We present a new procedure to automatically infer size bounds for integer programs. Size bounds are important for deducing bounds on runtime complexity or, more generally, for the resource analysis of programs. We show that our technique is complete (i.e., it always computes finite size bounds) for a subclass of loops, possibly with non-linear arithmetic. Moreover, we present a novel approach to integrate this complete technique into an incomplete approach that infers size and runtime bounds of general integer programs. We prove completeness of our integration for an important subclass of integer programs. We implemented our new algorithm in the automated complexity analysis tool KoAT to evaluate its power, in particular on programs with non-linear arithmetic.
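As a hedged worked example (not taken from the paper), consider a simple loop whose variable sizes grow non-linearly:

```python
# Hedged worked example (not from the paper): a simple integer loop together with
# a finite, non-linear size bound that a complete technique should be able to find.
# For initial values x0 >= 0 and y0, the loop body runs at most x0 times and y
# grows by at most x0 per iteration, so |y| <= |y0| + x0 * x0 bounds the size of y.
x, y = 10, 0          # stand-ins for arbitrary initial values x0 >= 0 and y0
while x > 0:
    x = x - 1
    y = y + x
print(x, y)           # here y = x0 * (x0 - 1) / 2 = 45, within the bound 0 + 100
```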
Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that differ from those observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques that focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs; no existing model- and supervisor-independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and use it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess, which generates ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is -- to the best of our knowledge -- the most extensive comparative study of DNN supervisors, considering their capabilities to detect four distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: those best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
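A minimal sketch of gradient-guided latent sampling toward ambiguity is shown below; it is not the AmbiGuess implementation, and `decoder`, `classifier`, and the loss are assumptions chosen only to illustrate steering a latent code toward a point that two classes claim with roughly equal probability:

```python
# Minimal sketch of gradient-guided latent sampling toward ambiguity; NOT the
# AmbiGuess implementation. `decoder` maps a latent vector to an image and
# `classifier` returns class logits; both are assumed to be pre-trained.
import torch
import torch.nn.functional as F

def sample_ambiguous(decoder, classifier, z_dim, class_a, class_b, steps=200, lr=0.05):
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        probs = F.softmax(classifier(decoder(z)), dim=-1)
        # Steer the decoded image toward ~50/50 probability for the two classes.
        loss = (probs[0, class_a] - 0.5) ** 2 + (probs[0, class_b] - 0.5) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()        # candidate ambiguous input
```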
For decades, much software engineering research has been dedicated to devising automated solutions aimed at enhancing developer productivity and elevating software quality. The past two decades have witnessed an unparalleled surge in the development of intelligent solutions tailored for software engineering tasks. This momentum has established the Artificial Intelligence for Software Engineering (AI4SE) area, which has swiftly become one of the most active and popular areas within the software engineering field. This Future of Software Engineering (FoSE) paper navigates through several focal points. It commences with a succinct introduction and history of AI4SE. Thereafter, it underscores the core challenges inherent to AI4SE, particularly highlighting the need to realize trustworthy and synergistic AI4SE. Next, the paper paints a vision of the potential leaps achievable if AI4SE's key challenges are surmounted, suggesting a transition towards Software Engineering 2.0. Two strategic roadmaps are then laid out: one centered on realizing trustworthy AI4SE, and the other on fostering synergistic AI4SE. While this paper may not serve as a conclusive guide, its intent is to catalyze further progress. The ultimate aspiration is to position AI4SE as a linchpin in redefining the horizons of software engineering, propelling us toward Software Engineering 2.0.
Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another perspective, these generative models imply clues about the precise correlation between words and pixels. In this work, we propose a simple but effective method that utilizes the attention mechanism in the denoising network of text-to-image diffusion models. Without re-training or inference-time optimization, the semantic grounding of phrases can be attained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly-supervised semantic segmentation setting, and our method achieves superior performance to prior methods. In addition, the acquired word-pixel correlation is found to generalize to the learned text embeddings of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task called "personalized referring image segmentation" with a new dataset. Experiments in various situations demonstrate the advantages of our method over strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.
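The general recipe of turning cross-attention into segmentation can be sketched as follows; this is a hedged illustration rather than the paper's exact method, and it assumes the attention maps have already been collected (e.g., via hooks on the denoising network) and resized to a common resolution:

```python
# Hedged illustration of aggregating cross-attention maps into a coarse mask for
# one phrase; the exact layers, resolutions, and weighting depend on the model.
import numpy as np

def phrase_mask(attn_maps, token_ids, threshold=0.5):
    """attn_maps: array of shape (num_layers_and_steps, H, W, num_tokens), already
    resized to a common spatial resolution; token_ids: token positions of the phrase."""
    score = attn_maps[..., token_ids].mean(axis=(0, -1))                 # average over layers/steps and tokens
    score = (score - score.min()) / (score.max() - score.min() + 1e-8)   # normalize to [0, 1]
    return score > threshold                                             # binary mask for the phrase
```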
Developers often perform repetitive code editing activities for various reasons (e.g., code refactoring) during software development. Pre-trained code editing models have achieved state-of-the-art (SOTA) results: they are first pre-trained with pre-training tasks and then fine-tuned on the code editing task. Existing pre-training tasks are mainly code infilling tasks (e.g., masked language modeling), which are derived from the natural language processing field and are not designed for automatic code editing. This paper proposes a novel pre-training task specialized in code editing and presents an effective pre-trained code editing model named CodeEditor. Our pre-training task further improves the performance and generalization ability of code editing models. Specifically, we collect a large number of real-world code snippets as the ground truth and use a powerful generator to rewrite them into mutated versions. Then, we pre-train our CodeEditor to edit the mutated versions back into the corresponding ground truth, so that it learns edit patterns. We conduct experiments on four code editing datasets and evaluate the pre-trained CodeEditor in three settings. (1) In the fine-tuning setting, we train the pre-trained CodeEditor with the four datasets and evaluate it on the test data. CodeEditor outperforms the SOTA baselines by 15%, 25.5%, 9.4%, and 26.6% on the four datasets. (2) In the few-shot setting, we train the pre-trained CodeEditor with limited data and evaluate it on the test data. CodeEditor substantially outperforms all baselines. (3) In the zero-shot setting, CodeEditor correctly edits 1,113 programs while the SOTA baselines fail to do so.
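A toy sketch of the pre-training data construction is given below; the `mutate` function is a deliberately simplistic stand-in for the paper's powerful generator (it only renames a random token):

```python
# Toy sketch of building (mutated, original) pre-training pairs. The `mutate`
# function is a deliberately simplistic stand-in for the paper's generator.
import random

IDENTIFIER_POOL = ["tmp", "val", "res", "item"]

def mutate(snippet: str) -> str:
    """Rewrite a snippet into a plausible mutated version by renaming one token."""
    tokens = snippet.split()
    tokens[random.randrange(len(tokens))] = random.choice(IDENTIFIER_POOL)
    return " ".join(tokens)

def make_pretraining_pair(snippet: str) -> dict:
    return {"input": mutate(snippet), "target": snippet}   # model learns to edit input -> target

print(make_pretraining_pair("def add ( a , b ) : return a + b"))
```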
A key challenge to ensuring the rapid transition of robotic systems from the industrial sector to more ubiquitous applications is the development of algorithms that can guarantee safe operation while in close proximity to humans. Motion planning and control methods, for instance, must be able to certify safety while operating in real time in arbitrary environments and in the presence of model uncertainty. This paper proposes Wrench Analysis for Inertial Transport using Reachability (WAITR), a certifiably safe motion planning and control framework for serial link manipulators that manipulate unsecured objects in arbitrary environments. WAITR uses reachability analysis to construct over-approximations of the contact wrench applied to unsecured objects, capturing uncertainty in the manipulator dynamics, the object dynamics, and contact parameters such as the coefficient of friction. An optimization problem formulation is presented that can be solved in real time to generate provably safe motions for manipulating the unsecured objects. This paper illustrates that WAITR outperforms state-of-the-art methods in a variety of simulation experiments and demonstrates its performance in the real world.
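As a deliberately simplified, hedged illustration of reasoning over whole uncertainty intervals rather than point estimates (far simpler than WAITR's reachability analysis), consider checking a no-sliding condition for an unsecured object under interval uncertainty:

```python
# Toy, heavily simplified illustration (not WAITR): check a no-sliding condition
# for an unsecured object over entire uncertainty intervals instead of point values.
G = 9.81  # gravitational acceleration [m/s^2]

def safe_against_sliding(accel_interval, mu_interval):
    """accel_interval: (min, max) horizontal acceleration of the supporting surface [m/s^2];
    mu_interval: (min, max) coefficient of friction. Because the object's mass cancels,
    sliding is avoided for every value in the intervals iff max accel <= min mu * G."""
    return accel_interval[1] <= mu_interval[0] * G

print(safe_against_sliding((0.0, 2.5), (0.4, 0.6)))   # True: 2.5 <= 0.4 * 9.81
```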
Humans possess the cognitive ability to comprehend scenes in a compositional manner. To empower AI systems with similar capabilities, object-centric learning aims to acquire representations of individual objects from visual scenes without any supervision. Although recent advances in object-centric learning have made remarkable progress on complex synthetic datasets, applying these methods to complex real-world scenes remains a major challenge. One of the essential reasons is the scarcity of real-world datasets specifically tailored to object-centric learning. To address this problem, we propose a versatile real-world dataset of tabletop scenes for object-centric learning called OCTScenes, which is meticulously designed to serve as a benchmark for comparing, evaluating, and analyzing object-centric learning methods. OCTScenes contains 5000 tabletop scenes with a total of 15 objects. Each scene is captured in 60 frames covering a 360-degree perspective. Consequently, OCTScenes is a versatile benchmark dataset that can simultaneously support the evaluation of object-centric learning methods based on single images, videos, and multiple views. Extensive experiments with representative object-centric learning methods are conducted on OCTScenes. The results demonstrate the shortcomings of state-of-the-art methods in learning meaningful representations from real-world data, despite their impressive performance on complex synthetic datasets. Furthermore, OCTScenes can serve as a catalyst for the advancement of existing methods, inspiring them to adapt to real-world scenes. Dataset and code are available at //huggingface.co/datasets/Yinxuan/OCTScenes.
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
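A minimal sketch of the resulting "train large, then compress" recipe, using standard PyTorch pruning and dynamic quantization utilities (the model, sparsity level, and precision here are placeholders, not the paper's exact setup):

```python
# Minimal sketch of "train large, then compress" with standard PyTorch utilities;
# the model, sparsity level, and precision are placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a large model that was trained for only a small number of iterations.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# 1) Prune: zero out the 60% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")     # make the pruning permanent

# 2) Quantize: convert the Linear layers to int8 for inference.
compressed = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(compressed)
```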