亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Asynchronous waits are one of the most prevalent root causes of flaky tests and a major time-influential factor of web application testing. To investigate the characteristics of asynchronous wait flaky tests and their fixes in web testing, we build a dataset of 49 reproducible flaky tests, from 26 open-source projects, caused by asynchronous waits, along with their corresponding developer-written fixes. Our study of these flaky tests reveals that in approximately 63% of them (31 out of 49), developers addressed Asynchronous Wait flaky tests by adapting the wait time, even for cases where the root causes lie elsewhere. Based on this finding, we propose TRaf, an automated time-based repair method for asynchronous wait flaky tests in web applications. TRaf tackles the flakiness issues by suggesting a proper waiting time for each asynchronous call in a web application, using code similarity and past change history. The core insight is that as developers often make similar mistakes more than once, hints for the efficient wait time exist in the current or past codebase. Our analysis shows that TRaf can suggest a shorter wait time to resolve the test flakiness compared to developer-written fixes, reducing the test execution time by 11.1%. With additional dynamic tuning of the new wait time, TRaf further reduces the execution time by 20.2%.

相關內容

Leveraging Graphics Processing Units (GPUs) to accelerate scientific software has proven to be highly successful, but in order to extract more performance, GPU programmers must overcome the high latency costs associated with their use. One method of reducing or hiding this latency cost is to use asynchronous streams to issue commands to the GPU. While performant, the streams model is an invasive abstraction, and has therefore proven difficult to integrate into general-purpose libraries. In this work, we enumerate the difficulties specific to library authors in adopting streams, and present recent work on addressing them. Finally, we present a unified asynchronous programming model for use in the Portable, Extensible, Toolkit for Scientific Computation (PETSc) to overcome these challenges. The new model shows broad performance benefits while remaining ergonomic to the user.

Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, proving a fundamental tool in the development of faithful NLEs.

Performance bugs are non-functional bugs that can even manifest in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge-base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with the various prompt variations and state of the art methods in the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent or better than a developer in ~60% of the cases, getting ~39% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers.

DBSCAN is a well-known density-based clustering algorithm to discover arbitrary shape clusters. While conceptually simple in serial, the algorithm is challenging to efficiently parallelize on manycore GPU architectures. Common pitfalls, such as asynchronous range query calls, result in high thread execution divergence in many implementations. In this paper, we propose a new framework for GPU-accelerated DBSCAN, and describe two tree-based algorithms within that framework. Both algorithms fuse the search for neighbors with updating cluster information, but differ in their treatment of dense regions of the data. We show that the time taken to compute clusters is at most twice that of determination of the neighbors. We compare the proposed algorithms with existing CPU and GPU implementations, and demonstrate their competitiveness and performance using a fast traversal structure (bounding volume hierarchy) for low dimensional data. We also show that the memory usage can be reduced by processing object neighbors dynamically without storing them.

The increasing access to data poses both opportunities and risks in deep learning, as one can manipulate the behaviors of deep learning models with malicious training samples. Such attacks are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving state-of-the-art results in certified poisoning robustness. However, the practical implications of these approaches remain unclear. Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness. For evaluations, we use ImageNet resized to a resolution of 64 by 64 to enable evaluations at a larger scale than previous ones. Firstly, we demonstrate a simple yet practical approach to scaling base models, which improves the efficiency of training and inference for aggregation defenses. Secondly, we provide empirical evidence supporting the data-to-complexity ratio, i.e. the ratio between the data set size and sample complexity, as a practical estimation of the maximum number of base models that can be deployed while preserving accuracy. Last but not least, we point out how aggregation defenses boost poisoning robustness empirically through the poisoning overfitting phenomenon, which is the key underlying mechanism for the empirical poisoning robustness of aggregations. Overall, our findings provide valuable insights for practical implementations of aggregation defenses to mitigate the threat of data poisoning.

Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.

The extremal theory of forbidden 0--1 matrices studies the asymptotic growth of the function $\mathrm{Ex}(P,n)$, which is the maximum weight of a matrix $A\in\{0,1\}^{n\times n}$ whose submatrices avoid a fixed pattern $P\in\{0,1\}^{k\times l}$. This theory has been wildly successful at resolving problems in combinatorics, discrete and computational geometry, structural graph theory, and the analysis of data structures, particularly corollaries of the dynamic optimality conjecture. All these applications use acyclic patterns, meaning that when $P$ is regarded as the adjacency matrix of a bipartite graph, the graph is acyclic. The biggest open problem in this area is to bound $\mathrm{Ex}(P,n)$ for acyclic $P$. Prior results have only ruled out the strict $O(n\log n)$ bound conjectured by Furedi and Hajnal. It is consistent with prior results that $\forall P. \mathrm{Ex}(P,n)\leq n\log^{1+o(1)} n$, and also consistent that $\forall \epsilon>0.\exists P. \mathrm{Ex}(P,n) \geq n^{2-\epsilon}$. In this paper we establish a stronger lower bound on the extremal functions of acyclic $P$. Specifically, we give a new construction of relatively dense 0--1 matrices with $\Theta(n(\log n/\log\log n)^t)$ 1s that avoid an acyclic $X_t$. Pach and Tardos have conjectured that this type of result is the best possible, i.e., no acyclic $P$ exists for which $\mathrm{Ex}(P,n)\geq n(\log n)^{\omega(1)}$.

During the usage phase, a technical product system is in permanent interaction with its environment. This interaction can lead to failures that significantly endanger the safety of the user and negatively affect the quality and reliability of the product. Conventional methods of failure analysis focus on the technical product system. The interaction of the product with its environment in the usage phase is not sufficiently considered, resulting in undetected potential failures of the product that lead to complaints. For this purpose, a methodology for failure identification is developed, which is continuously improved through product usage scenarios. The use cases are modelled according to a systems engineering approach with four views. The linking of the product system, physical effects, events and environmental factors enable the analysis of fault chains. These four parameters are subject to great complexity and must be systematically analysed using databases and expert knowledge. The scenarios are continuously updated by field data and complaints. The new approach can identify potential failures in a more systematic and holistic way. Complaints provide direct input on the scenarios. Unknown, previously unrecognized events can be systematically identified through continuous improvement. The complexity of the relationship between the product system and its environmental factors can thus be adequately taken into account in product development. Keywords: failure analysis, methodology, product development, systems engineering, scenario analysis, scenario improvement, environmental factors, product environment, continuous improvement.

In the past few decades, artificial intelligence (AI) technology has experienced swift developments, changing everyone's daily life and profoundly altering the course of human society. The intention of developing AI is to benefit humans, by reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, such as making unreliable decisions in safety-critical scenarios or undermining fairness by inadvertently discriminating against one group. Thus, trustworthy AI has attracted immense attention recently, which requires careful consideration to avoid the adverse effects that AI may bring to humans, so that humans can fully trust and live in harmony with AI technologies. Recent years have witnessed a tremendous amount of research on trustworthy AI. In this survey, we present a comprehensive survey of trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving trustworthy AI. Trustworthy AI is a large and complex area, involving various dimensions. In this work, we focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the accordant and conflicting interactions among different dimensions and discuss potential aspects for trustworthy AI to investigate in the future.

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

北京阿比特科技有限公司