亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Successful deployment of Deep Neural Networks (DNNs), particularly in safety-critical systems, requires their validation with an adequate test set to ensure a sufficient degree of confidence in test outcomes. Mutation analysis, a well-established technique for measuring test adequacy in traditional software, has been adapted to DNNs in recent years. This technique is based on generating mutants that ideally aim to be representative of actual faults and thus can be used for test adequacy assessment. In this paper, we investigate for the first time whether and how mutation operators that directly modify the trained DNN model (i.e., post-training operators) can be used for reliably assessing the test inputs of DNNs. Our results show that these operators, though they do not aim to represent realistic faults, exhibit strong, non-linear relationships with faults. Inspired by this finding and considering the significant computational advantage of post-training operators compared to the operators that modify the training data or program (i.e., pre-training operators), we propose and evaluate TEASMA, an approach based on posttraining mutation for assessing the adequacy of DNNs test sets. In practice, TEASMA allows engineers to decide whether they will be able to trust test results and thus validate the DNN before its deployment. Based on a DNN model`s training set, TEASMA provides a methodology to build accurate DNNspecific prediction models of the Fault Detection Rate (FDR) of a test set from its mutation score, thus enabling its assessment. Our large empirical evaluation, across multiple DNN models, shows that predicted FDR values have a strong linear correlation (R2 >= 0.94) with actual values. Consequently, empirical evidence suggests that TEASMA provides a reliable basis for confidently deciding whether to trust test results or improve the test set of a DNN model.

相關內容

An increasing number of regulations propose the notion of AI audits as an enforcement mechanism for achieving transparency and accountability for AI systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently have little to no agreed upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit, and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits for hiring algorithms can be performed, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion as informed by experiences in performing these audits in practice highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of such methodology.

With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there are increasing interests in distilling the capabilies of close-sourced LLMs to smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers, for the student model to learn. However, such standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in which the student attempts to solve a task first, then the teacher provides an adaptive refinement for the student to improve. Instead of feeding the student with teacher's prior, personalised distillation enables personalised learning for the student model, as it only learns on examples it makes mistakes upon and learns to improve its own solution. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples that incur a data-collection cost of 4-6$, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.

Terms of Service (ToS) form an integral part of any agreement as it defines the legal relationship between a service provider and an end-user. Not only do they establish and delineate reciprocal rights and responsibilities, but they also provide users with information on essential aspects of contracts that pertain to the use of digital spaces. These aspects include a wide range of topics, including limitation of liability, data protection, etc. Users tend to accept the ToS without going through it before using any application or service. Such ignorance puts them in a potentially weaker situation in case any action is required. Existing methodologies for the detection or classification of unfair clauses are however obsolete and show modest performance. In this research paper, we present SOTA(State of The Art) results on unfair clause detection from ToS documents based on unprecedented custom BERT Fine-tuning in conjunction with SVC(Support Vector Classifier). The study shows proficient performance with a macro F1-score of 0.922 at unfair clause detection, and superior performance is also shown in the classification of unfair clauses by each tag. Further, a comparative analysis is performed by answering research questions on the Transformer models utilized. In order to further research and experimentation the code and results are made available on //github.com/batking24/Unfair-TOS-An-Automated-Approach-based-on-Fine-tuning-BERT-in-conjunction-with-ML.

Learning accurate, data-driven predictive models for multiple interacting agents following unknown dynamics is crucial in many real-world physical and social systems. In many scenarios, dynamics prediction must be performed under incomplete observations, i.e., only a subset of agents are known and observable from a larger topological system while the behaviors of the unobserved agents and their interactions with the observed agents are not known. When only incomplete observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to learn a closed-form model in these variables using either analytic or data-driven techniques. In this work, we propose STEMFold, a spatiotemporal attention-based generative model, to learn a stochastic manifold to predict the underlying unmeasured dynamics of the multi-agent system from observations of only visible agents. Our analytical results motivate STEMFold design using a spatiotemporal graph with time anchors to effectively map the observations of visible agents to a stochastic manifold with no prior information about interaction graph topology. We empirically evaluated our method on two simulations and two real-world datasets, where it outperformed existing networks in predicting complex multiagent interactions, even with many unobserved agents.

Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and diagnosing and assessing of this disease rely on accurate measurement of vertebral bone and intervertebral disc geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient image segmentation of individual instances (disks and vertebrae) of the lumbar spine in an automated way, which is termed as instance image segmentation. In this work, we proposed SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of Transformer and Convolutional Neural Network (CNN). Specifically, we designed a parallel dual-path architecture to merge CNN layers and Transformer layers, and we integrated a novel position embedding into the self-attention module of Transformer, enhancing the utilization of positional information for more accurate segmentation. To further improves model performance, we introduced a new data augmentation technique to create synthetic yet realistic MR image dataset, named SSMSpine, which is made publicly available. We evaluated our SymTC and the other 15 existing image segmentation models on our private in-house dataset and the public SSMSpine dataset, using two metrics, Dice Similarity Coefficient and 95% Hausdorff Distance. The results show that our SymTC has the best performance for segmenting vertebral bones and intervertebral discs in lumbar spine MR images. The SymTC code and SSMSpine dataset are available at //github.com/jiasongchen/SymTC.

Implicit neural representations (INRs) are a rapidly growing research field, which provides alternative ways to represent multimedia signals. Recent applications of INRs include image super-resolution, compression of high-dimensional signals, or 3D rendering. However, these solutions usually focus on visual data, and adapting them to the audio domain is not trivial. Moreover, it requires a separately trained model for every data sample. To address this limitation, we propose HyperSound, a meta-learning method leveraging hypernetworks to produce INRs for audio signals unseen at training time. We show that our approach can reconstruct sound waves with quality comparable to other state-of-the-art models.

This preprint presents the current status of research into the development and application of an autonomous, self-driving compost turner. The aim is to overcome challenges in the composting industry, such as adverse working conditions, by automating the composting process. The preprint provides a comprehensive overview of the overall concept of the self-driving compost turner, including the hardware architecture with sensors, navigation module and control module. In addition, the methodical development of the architecture of concepts, models and their subsequent software integration in ROS using model-based systems engineering is described. The validation and verification of the overall system is carried out in an industrial environment using three scenarios. The capabilities of the compost turner are demonstrated by autonomously following predefined trajectories in the composting plant and performing the required composting tasks. The results show that the autonomous compost turner is capable of performing the required activities. In addition, the compost turner has intelligent processing capabilities for compost data as well as its transmission, visualization and storage in a cloud server. It is important to note that this work is a preprint that represents the current state of research. The authors aim to publish the full paper in a peer-reviewed journal in the near future.

Understanding and effectively managing Technical Debt (TD) remains a vital challenge in software engineering. While many studies on code-level TD have been published, few illustrate the business impact of low-quality source code. In this study, we combine two publicly available datasets to study the association between code quality on the one hand, and defect count and implementation time on the other hand. We introduce a value-creation model, derived from regression analyses, to explore relative changes from a baseline. Our results show that the associations vary across different intervals of code quality. Furthermore, the value model suggests strong non-linearities at the extremes of the code quality spectrum. Most importantly, the model suggests amplified returns on investment in the upper end. We discuss the findings within the context of the "broken windows" theory and recommend organizations to diligently prevent the introduction of code smells in files with high churn. Finally, we argue that the value-creation model can be used to initiate discussions regarding the return on investment in refactoring efforts.

As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research. This paper examines the variations in safety challenges faced by LLMs across different languages and discusses approaches to alleviating such concerns. By comparing how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages, we observe that (1) LLMs tend to generate unsafe responses much more often when a malicious prompt is written in a lower-resource language, and (2) LLMs tend to generate more irrelevant responses to malicious prompts in lower-resource languages. To understand where the discrepancy can be attributed, we study the effect of instruction tuning with reinforcement learning from human feedback (RLHF) or supervised finetuning (SFT) on the HH-RLHF dataset. Surprisingly, while training with high-resource languages improves model alignment, training in lower-resource languages yields minimal improvement. This suggests that the bottleneck of cross-lingual alignment is rooted in the pretraining stage. Our findings highlight the challenges in cross-lingual LLM safety, and we hope they inform future research in this direction.

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.

北京阿比特科技有限公司