亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs in the test case construction process, aiming to shed light on their practical efficacy, challenges encountered, and implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. Additionally, delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contributes with nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations. By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process.

相關內容

This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.

This paper introduces the Quantified Boolean Bayesian Network (QBBN), which provides a unified view of logical and probabilistic reasoning. The QBBN is meant to address a central problem with the Large Language Model (LLM), which has become extremely popular in Information Retrieval, which is that the LLM hallucinates. A Bayesian Network, by construction, cannot hallucinate, because it can only return answers that it can explain. We show how a Bayesian Network over an unbounded number of boolean variables can be configured to represent the logical reasoning underlying human language. We do this by creating a key-value version of the First-Order Calculus, for which we can prove consistency and completeness. We show that the model is trivially trained over fully observed data, but that inference is non-trivial. Exact inference in a Bayesian Network is intractable (i.e. $\Omega(2^N)$ for $N$ variables). For inference, we investigate the use of Loopy Belief Propagation (LBP), which is not guaranteed to converge, but which has been shown to often converge in practice. Our experiments show that LBP indeed does converge very reliably, and our analysis shows that a round of LBP takes time $O(N2^n)$, where $N$ bounds the number of variables considered, and $n$ bounds the number of incoming connections to any factor, and further improvements may be possible. Our network is specifically designed to alternate between AND and OR gates in a Boolean Algebra, which connects more closely to logical reasoning, allowing a completeness proof for an expanded version of our network, and also allows inference to follow specific but adequate pathways, that turn out to be fast.

This paper presents a Nonlinear Model Predictive Control (NMPC) scheme targeted at motion planning for mechatronic motion systems, such as drones and mobile platforms. NMPC-based motion planning typically requires low computation times to be able to provide control inputs at the required rate for system stability, disturbance rejection, and overall performance. Although there exist various ways in literature to reduce the solution times in NMPC, such times may not be low enough to allow real-time implementations. This paper presents ASAP-MPC, an approach to handle varying, sometimes restrictively large, solution times with an asynchronous update scheme, always allowing for full convergence and real-time execution. The NMPC algorithm is combined with a linear state feedback controller tracking the optimised trajectories for improved robustness against possible disturbances and plant-model mismatch. ASAP-MPC seamlessly merges trajectories, resulting from subsequent NMPC solutions, providing a smooth and continuous overall trajectory for the motion system. This frameworks applicability to embedded applications is shown on two different experiment setups where a state-of-the-art method fails: a quadcopter flying through a cluttered environment in hardware-in-the-loop simulation and a scale model truck-trailer manoeuvring in a structured lab environment.

This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators. We assess the performance of three LLMs and one open-source LM in Question-Answering (QA) and evaluation tasks using the TriviaQA (Joshi et al., 2017) dataset. Results indicate a significant disparity, with LLMs exhibiting lower performance in evaluation tasks compared to generation tasks. Intriguingly, we discover instances of unfaithful evaluation where models accurately evaluate answers in areas where they lack competence, underscoring the need to examine the faithfulness and trustworthiness of LLMs as evaluators. This study contributes to the understanding of "the Generative AI Paradox" (West et al., 2023), highlighting a need to explore the correlation between generative excellence and evaluation proficiency, and the necessity to scrutinize the faithfulness aspect in model evaluations.

We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, \emph{peeking with expectation-based averaged capital} (PEAK), builds upon the testing-as-betting framework and provides a non-asymptotic $\alpha$-level test across any stopping time. PEAK is computationally tractable and efficiently rejects hypotheses that are incorrect across all potential distributions that satisfy our nonparametric assumption, enabling joint composite hypothesis testing on multiple streams of data. We numerically validate our theoretical findings under the best arm identification and threshold identification in the bandit setting, illustrating the computational efficiency of our method against state-of-the-art testing methods.

We present a novel approach to exploring innovation problem and solution domains using LLM fine-tuning with a custom idea database. By semantically traversing the bi-directional problem and solution tree at different temperature levels we achieve high diversity in solution edit distance while still remaining close to the original problem statement semantically. In addition to finding a variety of solutions to a given problem, this method can also be used to refine and clarify the original problem statement. As further validation of the approach, we implemented a proof-of-concept Slack bot to serve as an innovation assistant.

In the present paper, we prove a new theorem, resulting in an update formula for linear regression model residuals calculating the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed with high efficiency in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximations of the cross-validated residuals and associated Predicted Residual Sum of Squares (PRESS) statistic. We also suggest strategies for efficient estimation of the minimum PRESS value and full PRESS function over a selected interval of regularisation values. The computational effectiveness of the parameter selection for Ridge- and Tikhonov regression modelling resulting from our theoretical findings and heuristic arguments is demonstrated in several applications with real and highly multivariate datasets.

Recent advances in visual anomaly detection research have seen AUROC and AUPRO scores on public benchmark datasets such as MVTec and VisA converge towards perfect recall, giving the impression that these benchmarks are near-solved. However, high AUROC and AUPRO scores do not always reflect qualitative performance, which limits the validity of these metrics in real-world applications. We argue that the artificial ceiling imposed by the lack of an adequate evaluation metric restrains progression of the field, and it is crucial that we revisit the evaluation metrics used to rate our algorithms. In response, we introduce Per-IMage Overlap (PIMO), a novel metric that addresses the shortcomings of AUROC and AUPRO. PIMO retains the recall-based nature of the existing metrics but introduces two distinctions: the assignment of curves (and respective area under the curve) is per-image, and its X-axis relies solely on normal images. Measuring recall per image simplifies instance score indexing and is more robust to noisy annotations. As we show, it also accelerates computation and enables the usage of statistical tests to compare models. By imposing low tolerance for false positives on normal images, PIMO provides an enhanced model validation procedure and highlights performance variations across datasets. Our experiments demonstrate that PIMO offers practical advantages and nuanced performance insights that redefine anomaly detection benchmarks -- notably challenging the perception that MVTec AD and VisA datasets have been solved by contemporary models. Available on GitHub: //github.com/jpcbertoldo/aupimo.

The inherent diversity of computation types within individual Deep Neural Network (DNN) models imposes a corresponding need for a varied set of computation units within hardware processors. This diversity poses a significant constraint on computation efficiency during the execution of different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations. This transformation seamlessly enables the execution of various DNN models using a single General-Purpose Matrix Multiplication (GEMM) accelerator. Extensive experimental results spanning different DNN models demonstrate that our approach preserves network accuracy while providing both generality and application-specific levels of computation efficiency. This allows a broad spectrum of DNN models to be executed using a single GEMM accelerator, eliminating the need for additional special function units.

We study inductive bias in Transformers in the infinitely over-parameterized Gaussian process limit and argue transformers tend to be biased towards more permutation symmetric functions in sequence space. We show that the representation theory of the symmetric group can be used to give quantitative analytical predictions when the dataset is symmetric to permutations between tokens. We present a simplified transformer block and solve the model at the limit, including accurate predictions for the learning curves and network outputs. We show that in common setups, one can derive tight bounds in the form of a scaling law for the learnability as a function of the context length. Finally, we argue WikiText dataset, does indeed possess a degree of permutation symmetry.

北京阿比特科技有限公司