一本色道综合久久欧美日韩精品,国产黄网永久免费视频,日本NV精品在线看,亚洲亚中文久久精品无码长长,天天射天天色综合

Recently, many Deep Learning fuzzers have been proposed for testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only support a limited set of corner case test inputs. Furthermore, a substantial number of developer APIs crucial for library development remain untested, as they are typically not well-documented and lack clear usage guidelines. To fill this gap, we propose a novel fuzzer named Orion, which combines guided test input generation and corner case test input generation based on a set of fuzzing rules constructed from historical data that is known to trigger vulnerabilities in the implementation of DL APIs. To extract the fuzzing rules, we first conduct an empirical study regarding the root cause analysis of 376 vulnerabilities in two of the most popular DL libraries, i.e., PyTorch and TensorFlow. We then construct the rules based on the root causes of the historical vulnerabilities. Our evaluation shows that Orion reports 135 vulnerabilities on the latest releases of TensorFlow and PyTorch, 76 of which were confirmed by the library developers. Among the 76 confirmed vulnerabilities, 69 are previously unknown, and 7 have already been fixed. The rest are awaiting further confirmation. Regarding end-user APIs, Orion was able to detect 31.8% and 90% more vulnerabilities on TensorFlow and PyTorch, respectively, compared to the state-of-the-art conventional fuzzer, i.e., DeepRel. When compared to the state-of-the-art LLM-based DL fuzzer, AtlasFuzz, Orion detected 13.63% more vulnerabilities on TensorFlow and 18.42% more vulnerabilities on PyTorch. Regarding developer APIs, Orion stands out by detecting 117% more vulnerabilities on TensorFlow and 100% more vulnerabilities on PyTorch compared to the most relevant fuzzer designed for developer APIs, such as FreeFuzz.

相關內容

PyTorch

關注 354

PyTorch

截斷點 · 近似 · HER · Facebook AI Research · binary ·

2024 年 2 月 13 日

Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting

Simina Branzei,MohammadTaghi Hajiaghayi,Reed Phillips,Suho Shin,Kun Wang

We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time. We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.

不變 · 環 · 秩 · Automator · Notability ·

2024 年 2 月 12 日

Ranking LLM-Generated Loop Invariants for Program Verification

Saikat Chakraborty,Shuvendu K. Lahiri,Sarah Fakhoury,Madanlal Musuvathi,Akash Lal,Aseem Rastogi,Aditya Senthilnathan,Rahul Sharma,Nikhil Swamy

from arxiv, Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-findings 2023)

Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier. The source code and the experimental data for this paper are available in \url{//github.com/microsoft/NeuralInvariantRanker}.

近似 · Networking · Neural Networks · 泛函 · 有限差分 ·

2024 年 2 月 12 日

Correctness Verification of Neural Networks Approximating Differential Equations

Petros Ellinas,Rahul Nellikath,Ignasi Ventura,Jochen Stiasny,Spyros Chatzivasileiadis

Verification of Neural Networks (NNs) that approximate the solution of Partial Differential Equations (PDEs) is a major milestone towards enhancing their trustworthiness and accelerating their deployment, especially for safety-critical systems. If successful, such NNs can become integral parts of simulation software tools which can accelerate the simulation of complex dynamic systems more than 100 times. However, the verification of these functions poses major challenges; it is not straightforward how to efficiently bound them or how to represent the derivative of the NN. This work addresses both these problems. First, we define the NN derivative as a finite difference approximation. Then, we formulate the PDE residual bounding problem alongside the Initial Value Problem's error propagation. Finally, for the first time, we tackle the problem of bounding an NN function without a priori knowledge of the output domain. For this, we build a parallel branching algorithm that combines the incomplete CROWN solver and Gradient Attack for termination and domain rejection conditions. We demonstrate the strengths and weaknesses of the proposed framework, and we suggest further work to enhance its efficiency.

情景 · 估計/估計量 · MoDELS · 標注 · Performer ·

2024 年 2 月 12 日

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Weijie Tu,Weijian Deng,Dylan Campbell,Stephen Gould,Tom Gedeon

from arxiv, 12 pages, 8 figures, this version is not fully edited and will be updated soon

Vision--Language Models (VLMs) have emerged as the dominant approach for zero-shot recognition, adept at handling diverse scenarios and significant distribution changes. However, their deployment in risk-sensitive areas requires a deeper understanding of their uncertainty estimation capabilities, a relatively uncharted area. In this study, we explore the calibration properties of VLMs across different architectures, datasets, and training strategies. In particular, we analyze the uncertainty estimation performance of VLMs when calibrated in one domain, label set or hierarchy level, and tested in a different one. Our findings reveal that while VLMs are not inherently calibrated for uncertainty, temperature scaling significantly and consistently improves calibration, even across shifts in distribution and changes in label set. Moreover, VLMs can be calibrated with a very small set of examples. Through detailed experimentation, we highlight the potential applications and importance of our insights, aiming for more reliable and effective use of VLMs in critical, real-world scenarios.

圖像檢索 · 穩健性 · MoDELS · 可辨認的 · Extensibility ·

2024 年 2 月 11 日

Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval

Qiwei Tian,Chenhao Lin,Zhengyu Zhao,Qian Li,Chao Shen

Adversarial training has achieved substantial performance in defending image retrieval against adversarial examples. However, existing studies in deep metric learning (DML) still suffer from two major limitations: weak adversary and model collapse. In this paper, we address these two limitations by proposing collapse-aware triplet decoupling (CA-TRIDE). Specifically, TRIDE yields a strong adversary by spatially decoupling the perturbation targets into the anchor and the other candidates. Furthermore, CA prevents the consequential model collapse, based on a novel metric, collapseness, which is incorporated into the optimization of perturbation. We also identify two drawbacks of the existing robustness metric in image retrieval and propose a new metric for a more reasonable robustness evaluation. Extensive experiments on three datasets demonstrate that CA-TRIDE outperforms existing defense methods in both conventional and new metrics.

MoDELS · 語言模型化 · 可約的 · 視覺問答 · Vision ·

2024 年 2 月 11 日

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal,Jihan Yin,Erhan Bas

from arxiv, AAAI 2024

Instruction tuned Large Vision Language Models (LVLMs) have significantly advanced in generalizing across a diverse set of multi-modal tasks, especially for Visual Question Answering (VQA). However, generating detailed responses that are visually grounded is still a challenging task for these models. We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a (M)ultimodal (Hal)lucination (Detect)ion Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect consists of 16k fine-grained annotations on VQA examples, making it the first comprehensive multi-modal hallucination detection dataset for detailed image descriptions. Unlike previous work that only consider object hallucination, we additionally annotate both entity descriptions and relationships that are unfaithful. To demonstrate the potential of this dataset for hallucination prevention, we optimize InstructBLIP through our novel Fine-grained Direct Preference Optimization (FDPO). We also train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling. We perform human evaluation on both FDPO and rejection sampling, and find that they reduce hallucination rates in InstructBLIP by 41% and 55% respectively. We also find that our reward model generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57% respectively, and has strong correlation with human evaluated accuracy scores.

Performance · 統計量 · 估計/估計量 · MoDELS · HTTPS ·

2024 年 2 月 9 日

Introducing Grid WAR: Rethinking WAR for Starting Pitchers

Ryan S. Brill,Abraham J. Wyner

The baseball statistic "Wins Above Replacement" (WAR) has emerged as one of the most popular evaluation metrics. But it is not readily observed and tabulated; WAR is an estimate of a parameter in a vaguely defined model with all its attendant assumptions. Industry-standard models of WAR for starting pitchers from FanGraphs and Baseball Reference all assume that season-long averages are sufficient statistics for a pitcher's performance. This provides an invalid mathematical foundation for many reasons, especially because WAR should not be linear with respect to any counting statistic. To repair this defect, as well as many others, we devise a new measure, Grid WAR, which accurately estimates a starting pitcher's WAR on a per-game basis. The convexity of Grid WAR diminishes the impact of "blow-up" games and upweights exceptional games, raising the valuation of pitchers like Sandy Koufax, Whitey Ford, and Catfish Hunter who exhibit fundamental game-by-game variance. Grid WAR is designed to accurately measure past performance, but also has predictive value insofar as a pitcher's Grid WAR is better than WAR at predicting future performance. Finally, at //gridwar.xyz we host a Shiny app which displays the Grid WAR results of each MLB game since 1952, including career, season, and game level results, which updates automatically every morning.

MoDELS · 優化器 · Analysis · 推斷 · 估計/估計量 ·

2022 年 9 月 19 日

A Survey of Deep Causal Model

Zongyu Li,Zhenfeng Zhu

The concept of causality plays an important role in human cognition . In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference against counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper focuses on the survey of the deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we assist a detailed and comprehensive classification and analysis of relevant datasets and source code.

Single-Shot · Branch · 目標檢測 · 推斷 · MS ·

2018 年 4 月 8 日

Single-Shot Object Detection with Enriched Semantics

Zhishuai Zhang,Siyuan Qiao,Cihang Xie,Wei Shen,Bo Wang,Alan L. Yuille

We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.

Performance · 相似度度量 · Performer · state-of-the-art · 圖像檢索 ·

2018 年 4 月 6 日

Cross-Domain Image Matching with Deep Feature Maps

Bailey Kong,James Supancic,Deva Ramanan,Charless C. Fowlkes

We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for this specialized domains. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Our proposed metric significantly improves performance in matching crime scene shoeprints to laboratory test impressions. We also show its effectiveness in other cross-domain image retrieval problems: matching facade images to segmentation labels and aerial photos to map images. Finally, we introduce a discriminatively trained variant and fine-tune our system through our proposed metric, obtaining state-of-the-art performance.