亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Compiled software is delivered as executable binary code. Developers write source code to express the software semantics, but the compiler converts it to a binary format that the CPU can directly execute. Therefore, binary code analysis is critical to applications in reverse engineering and computer security tasks where source code is not available. However, unlike source code and natural language that contain rich semantic information, binary code is typically difficult for human engineers to understand and analyze. While existing work uses AI models to assist source code analysis, few studies have considered binary code. In this paper, we propose a COntrastive learning Model for Binary cOde Analysis, or COMBO, that incorporates source code and comment information into binary code during representation learning. Specifically, we present three components in COMBO: (1) a primary contrastive learning method for cold-start pre-training, (2) a simplex interpolation method to incorporate source code, comments, and binary code, and (3) an intermediate representation learning algorithm to provide binary code embeddings. Finally, we evaluate the effectiveness of the pre-trained representations produced by COMBO using three indicative downstream tasks relating to binary code: algorithmic functionality classification, binary code similarity, and vulnerability detection. Our experimental results show that COMBO facilitates representation learning of binary code visualized by distribution analysis, and improves the performance on all three downstream tasks by 5.45% on average compared to state-of-the-art large-scale language representation models. To the best of our knowledge, COMBO is the first language representation model that incorporates source code, binary code, and comments into contrastive code representation learning and unifies multiple tasks for binary code analysis.

相關內容

Recent work has shown how to prompt large language models with explanations to obtain strong performance on textual reasoning tasks, i.e., the chain-of-thought paradigm. However, subtly different explanations can yield widely varying downstream task accuracy. Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion. We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework. We first evaluate explanations for each in-context example in isolation according to two proxy metrics, log likelihood and accuracy on new examples. Then, we search over combinations of explanations to find one that yields high performance against a silver-labeled development set. Across four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference, results show that our proxy metrics correlate with ground truth accuracy and our overall method can effectively improve prompts over crowdworker annotations and naive search strategies

The study of the generalized Hamming weight of linear codes is a significant research topic in coding theory as it conveys the structural information of the codes and determines their performance in various applications. However, determining the generalized Hamming weights of linear codes, especially the weight hierarchy, is generally challenging. In this paper, we investigate the generalized Hamming weights of a class of linear code $\C$ over $\bF_q$, which is constructed from defining sets. These defining sets are either special simplicial complexes or their complements in $\bF_q^m$. We determine the complete weight hierarchies of these codes by analyzing the maximum or minimum intersection of certain simplicial complexes and all $r$-dimensional subspaces of $\bF_q^m$, where $1\leq r\leq {\rm dim}_{\bF_q}(\C)$.

Guidance in conditional diffusion generation is of great importance for sample quality and controllability. However, existing guidance schemes are to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and unable to adapt to new conditions. On the other hand, training-free methods such as universal guidance, though more flexible, have yet to demonstrate comparable performance. In this work, through a comprehensive investigation into the design space, we show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion, enjoying the best of both worlds. Employing calibration as a general guideline, we propose several pre-conditioning techniques to better exploit pretrained off-the-shelf classifiers for guiding diffusion generation. Extensive experiments on ImageNet validate our proposed method, showing that state-of-the-art diffusion models (DDPM, EDM, DiT) can be further improved (up to 20%) using off-the-shelf classifiers with barely any extra computational cost. With the proliferation of publicly available pretrained classifiers, our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks. The code is available at //github.com/AlexMaOLS/EluCD/tree/main.

Testing is an important aspect of software development, but unfortunately, it is often neglected. While test quality analyses such as code coverage or mutation analysis inform developers about the quality of their tests, such reports are viewed only sporadically during continuous integration or code review, if they are considered at all, and their impact on the developers' testing behavior therefore tends to be negligible. To actually influence developer behavior, it may rather be necessary to motivate developers directly within their programming environment, while they are coding. We introduce IntelliGame, a gamified plugin for the popular IntelliJ Java Integrated Development Environment, which rewards developers for positive testing behavior using a multi-level achievement system: A total of 27 different achievements, each with incremental levels, provide affirming feedback when developers exhibit commendable testing behavior, and provide an incentive to further continue and improve this behavior. A controlled experiment with 49 participants given a Java programming task reveals substantial differences in the testing behavior triggered by IntelliGame: Incentivized developers write more tests, achieve higher coverage and mutation scores, run their tests more often, and achieve functionality earlier.

Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models only focus on the cropped text instance images, which require the bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity, however instance-level OCR models have very low accuracy when processing the whole image for the document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieved 64.4 F1-score and a 22.8% character error rate (CER), respectively, which outperforms the baseline results with 48.5 F1-score and 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives 87.8 F1-score and 4.98% CER with minimal additional pre or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in the reading order, which is practical for real-world applications.

Locally repairable codes (LRCs), which can recover any symbol of a codeword by reading only a small number of other symbols, have been widely used in real-world distributed storage systems, such as Microsoft Azure Storage and Ceph Storage Cluster. Since binary linear LRCs can significantly reduce coding and decoding complexity, constructions of binary LRCs are of particular interest. The aim of this paper is to construct dimensional optimal binary locally repairable codes with disjoint local repair groups. We introduce how to connect intersection subspaces with binary locally repairable codes and construct dimensional optimal binary linear LRCs with locality $2^b$ ($b\geq 3$) and minimum distance $d\geq 6$ by employing intersection subspaces deduced from the direct sum. This method will sufficiently increase the number of possible repair groups of dimensional optimal LRCs, and thus efficiently expanding the range of the construction parameters while keeping the largest code rates compared with all known binary linear LRCs with minimum distance $d\geq 6$ and locality $2^b$ ($b\geq 3$).

Tensor-valued data arise frequently from a wide variety of scientific applications, and many among them can be translated into an alteration detection problem of tensor dependence structures. In this article, we formulate the problem under the popularly adopted tensor-normal distributions and aim at two-sample correlation/partial correlation comparisons of tensor-valued observations. Through decorrelation and centralization, a separable covariance structure is employed to pool sample information from different tensor modes to enhance the power of the test. Additionally, we propose a novel Sparsity-Exploited Reranking Algorithm (SERA) to further improve the multiple testing efficiency. The algorithm is approached through reranking of the p-values derived from the primary test statistics, by incorporating a carefully constructed auxiliary tensor sequence. Besides the tensor framework, SERA is also generally applicable to a wide range of two-sample large-scale inference problems with sparsity structures, and is of independent interest. The asymptotic properties of the proposed test are derived and the algorithm is shown to control the false discovery at the pre-specified level. We demonstrate the efficacy of the proposed method through intensive simulations and two scientific applications.

While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, this may not account for perturbations encountered in several real world settings. In many such cases although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space, without having access to the data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations, and the outer minimization finding model parameters by optimizing the loss on adversarial perturbations generated from the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations -- object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.

We consider the problem of referring image segmentation. Given an input image and a natural language expression, the goal is to segment the object referred by the language expression in the image. Existing works in this area treat the language expression and the input image separately in their representations. They do not sufficiently capture long-range correlations between these two modalities. In this paper, we propose a cross-modal self-attention (CMSA) module that effectively captures the long-range dependencies between linguistic and visual features. Our model can adaptively focus on informative words in the referring expression and important regions in the input image. In addition, we propose a gated multi-level fusion module to selectively integrate self-attentive cross-modal features corresponding to different levels in the image. This module controls the information flow of features at different levels. We validate the proposed approach on four evaluation datasets. Our proposed approach consistently outperforms existing state-of-the-art methods.

Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

北京阿比特科技有限公司