Containerization allows bundling applications and their dependencies into a single image. The containerization framework Docker eases the use of this concept, enables sharing images publicly, and has gained high momentum. However, it can lead to users creating and sharing images that include private keys or API secrets, either by mistake or out of negligence. This leakage impairs the creator's security and that of everyone using the image. Yet, the extent of this practice and how to counteract it remain unclear. In this paper, we analyze 337,171 images from Docker Hub and 8,076 other private registries, unveiling that 8.5% of images indeed include secrets. Specifically, we find 52,107 private keys and 3,158 leaked API secrets, both opening a large attack surface, i.e., putting authentication and the confidentiality of privacy-sensitive data at stake and even allowing active attacks. We further document that those leaked keys are used in the wild: while we discovered 1,060 certificates relying on compromised keys being issued by public certificate authorities, based on further active Internet measurements, we find 275,269 TLS and SSH hosts using leaked private keys for authentication. To counteract this issue, we discuss how our methodology can be used to prevent secret leakage and reuse.
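To make the scanning idea concrete, the following is a minimal sketch of how exported image layers could be searched for candidate secrets. It is not the paper's actual pipeline: the two regular expressions (a PEM private-key header and an AWS-style access-key prefix) are illustrative placeholders, and the file-size cutoff is an arbitrary assumption.

```python
import re
import tarfile

# Illustrative patterns only; a real rule set would be far more extensive.
PATTERNS = {
    "private_key": re.compile(rb"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    "aws_access_key": re.compile(rb"AKIA[0-9A-Z]{16}"),
}

def scan_layer(layer_tar_path):
    """Scan one exported image layer (a tar archive) for candidate secrets."""
    findings = []
    with tarfile.open(layer_tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile() or member.size > 10 * 1024 * 1024:
                continue  # skip directories and very large files
            data = tar.extractfile(member).read()
            for label, pattern in PATTERNS.items():
                if pattern.search(data):
                    findings.append((member.name, label))
    return findings

# Usage: image contents can typically be exported with `docker save`, after
# which the per-layer tar archives can be scanned individually.
if __name__ == "__main__":
    for path, label in scan_layer("layer.tar"):
        print(f"{label} candidate in {path}")
```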
The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we present experimental results on probabilistic regression, anomaly detection, and predictive forecasting, leveraging emerging domains in Machine Learning (ML) such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.
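As a brief numerical illustration of the hypoexponential model (a sum of independent exponential stages with distinct rates), the sketch below samples such a sum and compares it against the standard closed-form density. The per-stage rates are placeholders chosen for illustration, not values derived from MNO data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-segment rates (1/ms), e.g. radio, transport, core segments;
# these are placeholders, not values from the paper.
rates = np.array([2.0, 5.0, 10.0])

# User-plane latency modelled as a sum of independent exponential stages,
# i.e. a hypoexponential random variable.
samples = sum(rng.exponential(1.0 / r, size=100_000) for r in rates)

def hypoexp_pdf(t, lam):
    """Closed-form hypoexponential density for distinct rates (partial fractions)."""
    t = np.asarray(t, dtype=float)
    lam = np.asarray(lam, dtype=float)
    pdf = np.zeros_like(t)
    for i, li in enumerate(lam):
        others = np.delete(lam, i)
        weight = np.prod(others / (others - li))
        pdf += weight * li * np.exp(-li * t)
    return pdf

print("empirical mean latency:", samples.mean())
print("analytical mean (sum of 1/rate):", (1.0 / rates).sum())
print("density at t=0.5 ms:", hypoexp_pdf([0.5], rates)[0])
```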
Both CNN-based and Transformer-based object detection with bounding-box representations have been extensively studied in computer vision and medical image analysis, but circular object detection in medical images is still underexplored. Inspired by the recent anchor-free CNN-based circular object detection method (CircleNet) for ball-shaped glomeruli detection in renal pathology, in this paper we present CircleFormer, a Transformer-based circular medical object detection method with dynamic anchor circles. Specifically, queries with a circle representation in the Transformer decoder iteratively refine the circular object detection results, and a circle cross-attention module is introduced to compute the similarity between circular queries and image features. A generalized circle IoU (gCIoU) is also proposed to serve as a new regression loss for circular object detection. Moreover, our approach is easy to generalize to the segmentation task by adding a simple segmentation branch to CircleFormer. We evaluate our method on circular nuclei detection and segmentation on the public MoNuSeg dataset, and the experimental results show that our method achieves promising performance compared with state-of-the-art approaches. The effectiveness of each component is validated via ablation studies as well. Our code is released at: \url{//github.com/zhanghx-iim-ahu/CircleFormer}.
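For readers unfamiliar with circle representations, the sketch below computes the plain IoU between two circles given as (cx, cy, r) using the standard lens-area formula; it illustrates the quantity that gCIoU generalizes, and does not reproduce the paper's exact gCIoU penalty term.

```python
import math

def circle_iou(c1, c2):
    """IoU between two circles given as (cx, cy, r)."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                      # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):               # one circle contained in the other
        inter = math.pi * min(r1, r2) ** 2
    else:                                 # partial overlap (lens area)
        a1 = r1 ** 2 * math.acos((d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1))
        a2 = r2 ** 2 * math.acos((d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2))
        a3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                             * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - a3
    union = math.pi * (r1 ** 2 + r2 ** 2) - inter
    return inter / union

print(circle_iou((0.0, 0.0, 1.0), (0.5, 0.0, 1.0)))  # ~0.52
```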
Detecting visually similar images is a particularly useful signal when computing product recommendations. Embedding similarity, which utilizes pre-trained computer vision models to extract high-level image features, has demonstrated remarkable efficacy in identifying images with similar compositions. However, there is a lack of methods for evaluating the embeddings generated by these models, as conventional loss and performance metrics do not adequately capture their performance in image similarity search tasks. In this paper, we evaluate the viability of image embeddings from numerous pre-trained computer vision models using a novel approach named CorrEmbed. Our approach computes the correlation between distances in image embeddings and distances in human-generated tag vectors. We extensively evaluate numerous pre-trained Torchvision models using this metric, revealing an intuitive linear-scaling relationship between ImageNet1k accuracy scores and tag-correlation scores. Importantly, our method also identifies deviations from this pattern, providing insights into how different models capture high-level image features. By offering a robust performance evaluation of these pre-trained models, CorrEmbed serves as a valuable tool for researchers and practitioners seeking to develop effective, data-driven approaches to similar-item recommendation in fashion retail.
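The evaluation idea can be sketched in a few lines: correlate pairwise distances in embedding space with pairwise distances between tag vectors for the same items. The sketch below uses toy random data, cosine distance, and Pearson correlation purely for illustration; the paper's exact distance and correlation choices are not assumed here.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Toy stand-ins: image embeddings from a pre-trained model and
# human-generated tag vectors (multi-hot) for the same 200 items.
embeddings = rng.normal(size=(200, 512))
tags = (rng.random((200, 50)) < 0.2).astype(float)

# Pairwise distances in both spaces, then their correlation.
emb_dist = pdist(embeddings, metric="cosine")
tag_dist = pdist(tags, metric="cosine")

score, _ = pearsonr(emb_dist, tag_dist)
print("tag-correlation score:", score)
```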
The Software Bill of Materials (SBOM) has emerged as a promising solution, providing a machine-readable inventory of the software components used and thus bolstering supply chain security. This paper presents an extensive study of the practical aspects of SBOM adoption. Leveraging an analysis of 4,786 GitHub discussions from 510 SBOM-related projects, our research delineates key topics, challenges, and solutions intrinsic to the effective utilization of SBOMs. Furthermore, we shed light on commonly used tools and frameworks for generating SBOMs, exploring their respective strengths and limitations. Our findings underscore the pivotal role SBOMs play in ensuring resilient software development practices and highlight the imperative of their widespread integration to bolster supply chain security. The insights accrued from our study hold significance as valuable input for prospective research and development in this crucial domain.
We introduce ACER, an AST-based call graph generator framework. ACER leverages tree-sitter to interface with any language. We opted to focus on generators that operate on abstract syntax trees (ASTs) due to their speed and simplicity in certain scenarios; however, a fully qualified intermediate representation usually provides far better information at the cost of requiring compilation. To evaluate our framework, we created two context-insensitive Java generators and compared them to existing open-source Java generators.
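As a language-agnostic illustration of the AST-based approach (not ACER's actual API, which builds on tree-sitter), the sketch below uses Python's standard ast module to collect a context-insensitive caller-to-callee edge list from source text.

```python
import ast

SOURCE = """
def helper():
    pass

def main():
    helper()
    print("done")
"""

class CallGraphBuilder(ast.NodeVisitor):
    """Collect context-insensitive call edges: enclosing function -> called name."""

    def __init__(self):
        self.current = None   # name of the function currently being visited
        self.edges = set()

    def visit_FunctionDef(self, node):
        prev, self.current = self.current, node.name
        self.generic_visit(node)
        self.current = prev

    def visit_Call(self, node):
        if self.current is not None and isinstance(node.func, ast.Name):
            self.edges.add((self.current, node.func.id))
        self.generic_visit(node)

builder = CallGraphBuilder()
builder.visit(ast.parse(SOURCE))
print(sorted(builder.edges))  # [('main', 'helper'), ('main', 'print')]
```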
This report describes the state of the art in verifiable computation. The problem being solved, the verifiable computation problem, is the following. Suppose we have two computing agents. The first agent is the verifier, and the second agent is the prover. The verifier wants the prover to perform a computation. The verifier sends a description of the computation to the prover. Once the prover has completed the task, the prover returns the output to the verifier. The output contains a proof. The verifier can use this proof to check whether the prover computed the output correctly. The check is not required to verify the algorithm used in the computation; instead, it checks that the prover computed the output using the computation specified by the verifier. The effort required for the check should be much less than that required to perform the computation itself. This state-of-the-art report surveys 128 papers from the literature comprising more than 4,000 pages. Other papers and books were surveyed but were omitted. The papers surveyed were overwhelmingly mathematical. We have summarised the major concepts that form the foundations of verifiable computation. The report contains two main sections. The first, larger section covers the theoretical foundations for probabilistically checkable and zero-knowledge proofs. The second section contains a description of current practice in verifiable computation. Two further reports will cover (i) military applications of verifiable computation and (ii) a collection of technical demonstrators. The first of these is intended for those who want to know what applications are enabled by the current state of the art in verifiable computation; the second is for those who want to see practical tools and conduct experiments themselves.
We present DiffBIR, which leverages pretrained text-to-image diffusion models for the blind image restoration problem. Our framework adopts a two-stage pipeline. In the first stage, we pretrain a restoration module across diversified degradations to improve generalization capability in real-world scenarios. The second stage leverages the generative ability of latent diffusion models to achieve realistic image restoration. Specifically, we introduce an injective modulation sub-network, LAControlNet, for finetuning, while the pre-trained Stable Diffusion is left unchanged to maintain its generative ability. Finally, we introduce a controllable module that allows users to balance quality and fidelity by introducing latent image guidance into the denoising process during inference. Extensive experiments demonstrate its superiority over state-of-the-art approaches for both blind image super-resolution and blind face restoration tasks on synthetic and real-world datasets. The code is available at //github.com/XPixelGroup/DiffBIR.
Large language models (LLMs) have sparked a new wave of exciting AI applications. Hosting these models at scale requires significant memory resources. One crucial memory bottleneck for deployment stems from the context window. It is commonly recognized that model weights are memory hungry; however, the size of the key-value embeddings stored during the generation process (the KV cache) can easily surpass the model size. The enormous size of the KV cache puts constraints on the inference batch size, which is crucial for high-throughput inference workloads. Inspired by an interesting observation about the attention scores, we hypothesize the persistence of importance: only pivotal tokens, which had a substantial influence at one step, will significantly influence future generations. Based on our empirical verification and theoretical analysis around this hypothesis, we propose Scissorhands, a system that maintains the memory usage of the KV cache at a fixed budget without finetuning the model. In essence, Scissorhands manages the KV cache by storing the pivotal tokens with a higher probability. We validate that Scissorhands reduces the inference memory usage of the KV cache by up to 5X without compromising model quality. We further demonstrate that Scissorhands can be combined with 4-bit quantization, traditionally used to compress model weights, to achieve up to 20X compression.
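The budgeted-eviction idea can be illustrated with a toy loop that keeps a fixed number of cached tokens and drops the one with the lowest accumulated attention score whenever the budget is exceeded. This is a minimal sketch, not the paper's system: the KV pairs and attention rows below are random placeholders standing in for values produced by a real attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)

budget = 8           # fixed KV-cache budget (number of cached tokens)
d = 16               # head dimension

keys, values, scores = [], [], []

for step in range(32):
    # Placeholder KV pair for the newly generated token.
    keys.append(rng.normal(size=d))
    values.append(rng.normal(size=d))
    scores.append(0.0)

    # Placeholder attention row over all cached tokens (in a real model this
    # comes from the current decoding step of the attention layer).
    attn = rng.dirichlet(np.ones(len(keys)))
    for i, a in enumerate(attn):
        scores[i] += a  # accumulate per-token importance

    # Evict the least important cached token once the budget is exceeded
    # ("persistence of importance": low past influence -> low future influence).
    if len(keys) > budget:
        drop = int(np.argmin(scores))
        for buf in (keys, values, scores):
            del buf[drop]

print("cached tokens:", len(keys))  # stays at the budget
```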
Many real-world data stream applications suffer not only from concept drift but also from class imbalance. Yet, very few existing studies have investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in class-imbalanced data streams, are not taken into account by existing approaches when learning from class-imbalanced data streams. In this work, we propose a drift-adaptable oversampling strategy to synthesise minority class examples based on stream clustering. The motivation is that stream clustering methods continuously update themselves to reflect the characteristics of the current underlying concept, including data difficulty factors. This nature can potentially be used to compress past information without explicitly caching data in memory. Based on the compressed information, synthetic examples can be created within the region that recently generated new minority class examples. Experiments with artificial and real-world data streams show that the proposed approach can handle concept drift involving different minority class decompositions better than existing approaches, especially when the data stream is severely class imbalanced and presents high proportions of safe and borderline minority class examples.
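A minimal sketch of the cluster-guided oversampling idea follows, using scikit-learn's MiniBatchKMeans as a stand-in for a proper stream-clustering method: synthetic minority examples are drawn around the centres of clusters updated from recent minority-class data. The window size, cluster count, and spread are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Stand-in for a stream clustering model maintained over minority-class examples;
# partial_fit lets it be updated incrementally as the stream (and concept) evolves.
clusterer = MiniBatchKMeans(n_clusters=5, random_state=0)

# Placeholder: a recent window of minority-class examples from the stream.
recent_minority = rng.normal(size=(100, 2))
clusterer.partial_fit(recent_minority)

def synthesise(n_samples, spread=0.1):
    """Draw synthetic minority examples around recently updated cluster centres."""
    centres = clusterer.cluster_centers_
    picks = rng.integers(len(centres), size=n_samples)
    noise = rng.normal(scale=spread, size=(n_samples, centres.shape[1]))
    return centres[picks] + noise

print(synthesise(10).shape)  # (10, 2)
```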
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, resorting only to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in the multiple modalities of the image to help understand the meaning of scene texts; e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, the Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting the visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators that guide the message passing from one graph to another to exploit the contexts in various modalities, so as to refine the features of the nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and clearly improves the performance on two VQA tasks that require reading scene texts.
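To convey what "an aggregator guiding message passing from one sub-graph to another" might look like, the sketch below refines one set of node features with attention-weighted context from another set (e.g., scene-text nodes attending over visual nodes). It is a generic, illustrative formulation in PyTorch, not the paper's exact aggregator design; all dimensions and features are toy placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy node features for two sub-graphs (e.g. visual objects and scene-text nodes).
visual = torch.randn(5, 64)     # 5 visual nodes
semantic = torch.randn(3, 64)   # 3 scene-text (semantic) nodes

proj_q = torch.nn.Linear(64, 64)
proj_k = torch.nn.Linear(64, 64)

# Attention-guided message passing: each semantic node attends over visual nodes
# and its feature is refined with the aggregated visual context.
attn = F.softmax(proj_q(semantic) @ proj_k(visual).T / 64 ** 0.5, dim=-1)  # (3, 5)
refined_semantic = semantic + attn @ visual

print(refined_semantic.shape)  # torch.Size([3, 64])
```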