99热日韩这里只有国产中文精品_国产亚洲欧美丝袜在线观看三区_天堂8中文在线官网_影性视片品偷免区区中内_特黄50人伦交大战免费视频_成人在线观看一区_国产福利一区二区三区精品

The proliferation of malware, particularly through the use of packing, presents a significant challenge to static analysis and signature-based malware detection techniques. The application of packing to the original executable code renders extracting meaningful features and signatures challenging. To deal with the increasing amount of malware in the wild, researchers and anti-malware companies started harnessing machine learning capabilities with very promising results. However, little is known about the effects of packing on static machine learning-based malware detection and classification systems. This work addresses this gap by investigating the impact of packing on the performance of static machine learning-based models used for malware detection and classification, with a particular focus on those using visualisation techniques. To this end, we present a comprehensive analysis of various packing techniques and their effects on the performance of machine learning-based detectors and classifiers. Our findings highlight the limitations of current static detection and classification systems and underscore the need to be proactive to effectively counteract the evolving tactics of malware authors.

相關內容

Packing

關注 0

Oracle · 代碼 · 可理解性 · TOG · 評論員 ·

2024 年 12 月 12 日

Doc2Oracle: Investigating the Impact of Javadoc Comments on Test Oracle Generation

Soneya Binta Hossain,Raygan Taylor,Matthew Dwyer

Code documentation is a critical aspect of software development, serving as a bridge between human understanding and machine-readable code. Beyond assisting developers in understanding and maintaining code, documentation also plays a critical role in automating various software engineering tasks, such as test oracle generation (TOG). In Java, Javadoc comments provide structured, natural language documentation embedded directly in the source code, typically detailing functionality, usage, parameters, return values, and exceptions. While prior research has utilized Javadoc comments in test oracle generation (TOG), there has not been a thorough investigation into their impact when combined with other contextual information, nor into identifying the most relevant components for generating correct and strong test oracles, or understanding their role in detecting real bugs. In this study, we dive deep into investigating the impact of Javadoc comments on TOG.

MoDELS · 控制器 · 機器人 · 多樣性 · 可理解性 ·

2024 年 12 月 12 日

Robotics meets Fluid Dynamics: A Characterization of the Induced Airflow below a Quadrotor as a Turbulent Jet

Leonard Bauersfeld,Koen Muller,Dominic Ziegler,Filippo Coletti,Davide Scaramuzza

from arxiv, 7+1 pages

The widespread adoption of quadrotors for diverse applications, from agriculture to public safety, necessitates an understanding of the aerodynamic disturbances they create. This paper introduces a computationally lightweight model for estimating the time-averaged magnitude of the induced flow below quadrotors in hover. Unlike related approaches that rely on expensive computational fluid dynamics (CFD) simulations or drone specific time-consuming empirical measurements, our method leverages classical theory from turbulent flows. By analyzing over 16 hours of flight data from drones of varying sizes within a large motion capture system, we show for the first time that the combined flow from all drone propellers is well-approximated by a turbulent jet after 2.5 drone-diameters below the vehicle. Using a novel normalization and scaling, we experimentally identify model parameters that describe a unified mean velocity field below differently sized quadrotors. The model, which requires only the drone's mass, propeller size, and drone size for calculations, accurately describes the far-field airflow over a long-range in a very large volume which is impractical to simulate using CFD. Our model offers a practical tool for ensuring safer operations near humans, optimizing sensor placements and drone control in multi-agent scenarios. We demonstrate the latter by designing a controller that compensates for the downwash of another drone, leading to a four times lower altitude deviation when passing below.

Performer · 語言模型化 · 可辨認的 · CASES · 分解的 ·

2024 年 12 月 12 日

Exploring the Limitations of Detecting Machine-Generated Text

Jad Doughman,Osama Mohammed Afzal,Hawau Olamide Toyin,Shady Shehata,Preslav Nakov,Zeerak Talat

from arxiv, Accepted to COLING 2025

Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Such work often presents high-performing detectors. However, humans and machines can produce text in different styles and domains, yet the performance impact of such on machine generated text detection systems remains unclear. In this paper, we audit the classification performance for detecting machine-generated text by evaluating on texts with varying writing styles. We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers. We further find that detection systems are particularly susceptible to misclassify easy-to-read texts while they have high performance for complex texts, leading to concerns about the reliability of detection systems. We recommend that future work attends to stylistic factors and reading difficulty levels of human-written and machine-generated text.

AI · 結點 · MoDELS · ResNet · 圖像分類器 ·

2024 年 12 月 11 日

Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node

Imran Latif,Alex C. Newkirk,Matthew R. Carbone,Arslan Munir,Yuewei Lin,Jonathan Koomey,Xi Yu,Zhiuha Dong

The expansion of artificial intelligence (AI) applications has driven substantial investment in computational infrastructure, especially by cloud computing providers. Quantifying the energy footprint of this infrastructure requires models parameterized by the power demand of AI hardware during training. We empirically measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node during the training of open-source image classifier (ResNet) and large-language models (Llama2-13b). The maximum observed power draw was approximately 8.4 kW, 18% lower than the manufacturer-rated 10.2 kW, even with GPUs near full utilization. Holding model architecture constant, increasing batch size from 512 to 4096 images for ResNet reduced total training energy consumption by a factor of 4. These findings can inform capacity planning for data center operators and energy use estimates by researchers. Future work will investigate the impact of cooling technology and carbon-aware scheduling on AI workload energy consumption.

數據集 · Machine Learning · Learning · MoDELS · ML ·

2024 年 12 月 11 日

Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning

Lucas Lange,Maurice-Maximilian Heykeroth,Erhard Rahm

from arxiv, Accepted at 21st Conference on Database Systems for Business, Technology and Web (BTW 2025)

Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, helping to inform data and privacy modifications for better outcomes based on dataset characteristics.

有向 · RecSys · Learning · 教程 · MoDELS ·

2024 年 12 月 11 日

A Tutorial of Personalized Federated Recommender Systems: Recent Advances and Future Directions

Jing Jiang,Chunxu Zhang,Honglei Zhang,Zhiwei Li,Yidong Li,Bo Yang

from arxiv, A technical tutorial will appear at The Web Conference 2025

Personalization stands as the cornerstone of recommender systems (RecSys), striving to sift out redundant information and offer tailor-made services for users. However, the conventional cloud-based RecSys necessitates centralized data collection, posing significant risks of user privacy breaches. In response to this challenge, federated recommender systems (FedRecSys) have emerged, garnering considerable attention. FedRecSys enable users to retain personal data locally and solely share model parameters with low privacy sensitivity for global model training, significantly bolstering the system's privacy protection capabilities. Within the distributed learning framework, the pronounced non-iid nature of user behavior data introduces fresh hurdles to federated optimization. Meanwhile, the ability of federated learning to concurrently learn multiple models presents an opportunity for personalized user modeling. Consequently, the development of personalized FedRecSys (PFedRecSys) is crucial and holds substantial significance. This tutorial seeks to provide an introduction to PFedRecSys, encompassing (1) an overview of existing studies on PFedRecSys, (2) a comprehensive taxonomy of PFedRecSys spanning four pivotal research directions-client-side adaptation, server-side aggregation, communication efficiency, privacy and protection, and (3) exploration of open challenges and promising future directions in PFedRecSys. This tutorial aims to establish a robust foundation and spark new perspectives for subsequent exploration and practical implementations in the evolving realm of RecSys.

Engineering · Analysis · 語言模型化 · ESE · MoDELS ·

2024 年 12 月 10 日

Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering

Matheus de Morais Le?a,Lucas Valen?a,Reydne Santos,Ronnie de Souza Santos

The use of large language models (LLMs) for qualitative analysis is gaining attention in various fields, including software engineering, where qualitative methods are essential for understanding human and social factors. This study aimed to investigate how LLMs are currently used in qualitative analysis and their potential applications in software engineering research, focusing on the benefits, limitations, and practices associated with their use. A systematic mapping study was conducted, analyzing 21 relevant studies to explore reported uses of LLMs for qualitative analysis. The findings indicate that LLMs are primarily used for tasks such as coding, thematic analysis, and data categorization, offering benefits like increased efficiency and support for new researchers. However, limitations such as output variability, challenges in capturing nuanced perspectives, and ethical concerns related to privacy and transparency were also identified. The study emphasizes the need for structured strategies and guidelines to optimize LLM use in qualitative research within software engineering, enhancing their effectiveness while addressing ethical considerations. While LLMs show promise in supporting qualitative analysis, human expertise remains crucial for interpreting data, and ongoing exploration of best practices will be vital for their successful integration into empirical software engineering research.

MoDELS · 語言模型化 · Learning · 逼真度 · 講稿 ·

2024 年 4 月 11 日

Best Practices and Lessons Learned on Synthetic Data for Language Models

Ruibo Liu,Jerry Wei,Fangyu Liu,Chenglei Si,Yanzhe Zhang,Jinmeng Rao,Steven Zheng,Daiyi Peng,Diyi Yang,Denny Zhou,Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.

MoDELS · 講稿 · Learning · Sphering · 表示 ·

2023 年 11 月 2 日

A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

Hang Chen,Keqing Du,Chenguang Li,Xinyu Yang

from arxiv, under review

The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

Faster R-CNN · domain shift · R-CNN · 目標檢測 · 可約的 ·

2018 年 3 月 8 日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Yuhua Chen,Wen Li,Christos Sakaridis,Dengxin Dai,Luc Van Gool

from arxiv, Accepted to CVPR 2018

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.