Tens of thousands of engineers use Sourcegraph day-to-day to search for code and rely on it to make progress on software development tasks. We face a key challenge in designing a query language that accommodates the needs of a broad spectrum of users. Our experience shows that users express different and often contradictory preferences for how queries should be interpreted. These preferences stem from users with differing usage contexts, technical experience, and implicit expectations from using prior tools. At the same time, designing a code search query language poses unique challenges because it intersects traditional search engines and full-fledged programming languages. For example, code search queries adopt certain syntactic conventions in the interest of simplicity and terseness but invariably risk encoding implicit semantics that are ambiguous at face value (a single space in a query could mean three or more semantically different things depending on surrounding terms). Users often need to disambiguate intent with additional syntax so that a query expresses what they actually want to search. This need to disambiguate is one of the primary frustrations we've seen users experience with writing search queries in the last three years. We share our observations that led us to a fresh perspective where code search behavior can straddle seemingly ambiguous queries. We develop Automated Query Evaluation (AQE), a new technique that automatically generates and adaptively runs alternative query interpretations in frustration-prone conditions. We evaluate AQE with an A/B test across more than 10,000 unique users on our publicly available code search instance. Our main result shows that relative to the control group, users are on average 22% more likely to click on a search result at all on any given day when AQE is active.
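The idea of generating alternative interpretations and running them adaptively can be sketched in a few lines. This is a minimal illustration and not Sourcegraph's implementation: the three readings of a space (literal phrase, ordered terms with anything in between, unordered AND of terms), the in-memory `CORPUS`, the function names, and the "fall back only when the stricter reading finds nothing" trigger are all assumptions made for the example.

```python
import re

# Hypothetical in-memory corpus standing in for indexed lines of code.
CORPUS = [
    "func parseQuery(input string) Query {",
    "// parse the query string",
    "query := parse(input)",
    "return Query{}",
]

def interpretations(query):
    """Yield (label, predicate) pairs for a query, strictest reading first."""
    terms = query.split()
    # 1. The space is a literal character: match the exact phrase.
    yield "literal", lambda line: query in line
    if len(terms) > 1:
        # 2. The space means "anything in between", with terms in order.
        pattern = re.compile(".*".join(re.escape(t) for t in terms))
        yield "ordered", lambda line: pattern.search(line) is not None
        # 3. The space means AND: every term appears somewhere in the line.
        yield "and", lambda line: all(t in line for t in terms)

def search(query, corpus=CORPUS):
    """Run interpretations in order and stop at the first one with results."""
    for label, matches in interpretations(query):
        hits = [line for line in corpus if matches(line)]
        if hits:
            return label, hits
    return "none", []
```

Under these assumptions, `search("parse the")` is answered by the literal reading, while `search("parse input")` finds nothing literally and falls back to the ordered reading without the user adding any disambiguating syntax.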

Related content

Code

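Code is a key knowledge-resource section of Zhuanzhi that collects paper source code, reproduction code, classic engineering code, and similar material for users to browse and download.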

Scenario · Analysis · Knowledge · Identifiable · Category

February 9, 2023

Pushing the Boundaries of Private, Large-Scale Query Answering

Brendan Avent, Aleksandra Korolova

from arxiv, Workshop on Privacy-Preserving Artificial Intelligence (PPAI-23), AAAI, 2023

We address the problem of efficiently and effectively answering large numbers of queries on a sensitive dataset while ensuring differential privacy (DP). We separately analyze this problem in two distinct settings, grounding our work in a state-of-the-art DP mechanism for large-scale query answering: the Relaxed Adaptive Projection (RAP) mechanism. The first setting is a classic setting in DP literature where all queries are known to the mechanism in advance. Within this setting, we identify challenges in the RAP mechanism's original analysis, then overcome them with an enhanced implementation and analysis. We then extend the capabilities of the RAP mechanism to be able to answer a more general and powerful class of queries (r-of-k thresholds) than previously considered. Empirically evaluating this class, we find that the mechanism is able to answer orders of magnitude larger sets of queries than prior works, and does so quickly and with high utility. We then define a second setting motivated by real-world considerations and whose definition is inspired by work in the field of machine learning. In this new setting, a mechanism is only given partial knowledge of queries that will be posed in the future, and it is expected to answer these future-posed queries with high utility. We formally define this setting and how to measure a mechanism's utility within it. We then comprehensively empirically evaluate the RAP mechanism's utility within this new setting. From this evaluation, we find that even with weak partial knowledge of the future queries that will be posed, the mechanism is able to efficiently and effectively answer arbitrary queries posed in the future. Taken together, the results from these two settings advance the state of the art on differentially private large-scale query answering.

Constraint · Cognition · CASE · Similarity · Vision

February 9, 2023

A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

Uri Berger, Lea Frermann, Gabriel Stanovsky, Omri Abend

from arxiv, Accepted to EACL 2023 Findings

We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict the probability of expressing a property from raw images, and find evidence supporting the claim that linguistic properties are constrained by visual context across languages. We complement this investigation with a corpus study, taking the test case of numerals. Specifically, we use existing annotations (number or type of objects) to investigate the effect of different visual conditions on the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We additionally discuss possible applications for language generation.

Language modeling · TOOLS · MoDELS · Performer · SimPLe

February 9, 2023

Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.

Weight · Learning · Similarity · Dynamical systems · MoDELS

February 8, 2023

Learning Dynamical Systems by Leveraging Data from Similar Systems

Lei Xin, Lintao Ye, George Chiu, Shreyas Sundaram

from arxiv, 14 pages, 9 figures. Submitting to IEEE Transactions on Automatic Control. arXiv admin note: text overlap with arXiv:2204.05446

We consider the problem of learning the dynamics of a linear system when one has access to data generated by an auxiliary system that shares similar (but not identical) dynamics, in addition to data from the true system. We use a weighted least squares approach, and provide a finite sample error bound of the learned model as a function of the number of samples and various system parameters from the two systems as well as the weight assigned to the auxiliary data. We show that the auxiliary data can help to reduce the intrinsic system identification error due to noise, at the price of adding a portion of error that is due to the differences between the two system models. We further provide a data-dependent bound that is computable when some prior knowledge about the systems is available. This bound can also be used to determine the weight that should be assigned to the auxiliary data during the model training stage.
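As a rough illustration of the weighted least squares idea, consider a scalar system x_{t+1} = a·x_t. The sketch below pools transitions from the true system and from an auxiliary system, scaling the auxiliary terms by a weight `q`. The scalar setting, the function names, and the noise-free toy trajectories are assumptions for illustration only; the paper treats general linear systems and derives error bounds that this sketch does not reproduce.

```python
def simulate(a, x0, steps):
    """Noise-free trajectory of x_{t+1} = a * x_t (toy data only)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs

def weighted_ls(true_traj, aux_traj, q):
    """Minimise sum_true (x' - a x)^2 + q * sum_aux (x' - a x)^2 over a."""
    num = den = 0.0
    for traj, weight in ((true_traj, 1.0), (aux_traj, q)):
        # Each consecutive pair (x, x_next) is one observed transition.
        for x, x_next in zip(traj, traj[1:]):
            num += weight * x * x_next
            den += weight * x * x
    return num / den

true_traj = simulate(0.5, 1.0, 3)  # true system, a = 0.5
aux_traj = simulate(0.6, 1.0, 3)   # similar auxiliary system, a = 0.6

a_true_only = weighted_ls(true_traj, aux_traj, q=0.0)  # ignores auxiliary data
a_pooled = weighted_ls(true_traj, aux_traj, q=1.0)     # equal weighting
```

Because the toy data are noise-free, the pooled estimate only shows the cost side of the trade-off: it is pulled away from the true parameter toward the auxiliary one. With noisy true-system data, the same weighting would also reduce the noise-driven error, which is the balance the weight `q` is meant to control.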

Inference · Statistic · Motivation · Specialization · Square matrix

February 8, 2023

Reactmine: a statistical search algorithm for inferring chemical reactions from time series data

Julien Martinelli, Jeremy Grignard, Sylvain Soliman, Annabelle Ballesta, François Fages

Inferring chemical reaction networks (CRN) from concentration time series is a challenge encouraged by the growing availability of quantitative temporal data at the cellular level. This motivates the design of algorithms to infer the preponderant reactions between the molecular species observed in a given biochemical process, and build CRN structure and kinetics models. Existing ODE-based inference methods such as SINDy resort to least square regression combined with sparsity-enforcing penalization, such as Lasso. However, we observe that these methods fail to learn sparse models when the input time series are only available in wild type conditions, i.e. without the possibility to play with combinations of zeroes in the initial conditions. We present a CRN inference algorithm which enforces sparsity by inferring reactions in a sequential fashion within a search tree of bounded depth, ranking the inferred reaction candidates according to the variance of their kinetics on their supporting transitions, and re-optimizing the kinetic parameters of the CRN candidates on the whole trace in a final pass. We show that Reactmine succeeds both on simulation data by retrieving hidden CRNs where SINDy fails, and on two real datasets, one of fluorescence videomicroscopy of cell cycle and circadian clock markers, the other one of biomedical measurements of systemic circadian biomarkers possibly acting on clock gene expression in peripheral organs, by inferring preponderant regulations in agreement with previous model-based analyses. The code is available at //gitlab.inria.fr/julmarti/crninf/ together with introductory notebooks.

Regularizer · Cost · INFORMS · ONCE · SCAN

February 7, 2023

Energy Complexity of Regular Languages

Fırat Kıyak, A. C. Cem Say

from arxiv, A preliminary version of this paper by Yılmaz, Ö., Kıyak, F., Üngör, M., and Say, A.C.C. appeared in the CIAA 2022 proceedings. Contains proofs omitted from that version due to space constraints, as well as additional material

Each step that results in a bit of information being "forgotten" by a computing device has an intrinsic energy cost. Although any Turing machine can be rewritten to be thermodynamically reversible without changing the recognized language, finite automata that are restricted to scan their input once in "real-time" fashion can only recognize the members of a proper subset of the class of regular languages in this reversible manner. We study the energy expenditure associated with the computations of deterministic and quantum finite automata. We prove that zero-error quantum finite automata have no advantage over their classical deterministic counterparts in terms of the maximum obligatory thermodynamic cost associated with any step during the recognition of different regular languages. We also demonstrate languages for which "error can be traded for energy", i.e. whose zero-error recognition is associated with computation steps having provably bigger obligatory energy cost when compared to their bounded-error recognition by real-time finite-memory quantum devices. We show that regular languages can be classified according to the intrinsic energy requirements on the recognizing automaton as a function of input length, and prove upper and lower bounds.

Comprehensibility · Directed · Block · Engineering · Attention

February 7, 2023

An Empirical Study on Software Bill of Materials: Where We Stand and the Road Ahead

Boming Xia, Tingting Bi, Zhenchang Xing, Qinghua Lu, Liming Zhu

from arxiv, Accepted to ICSE 2023; Camera-ready version

The rapid growth of software supply chain attacks has attracted considerable attention to software bill of materials (SBOM). SBOMs are a crucial building block to ensure the transparency of software supply chains that helps improve software supply chain security. Although there are significant efforts from academia and industry to facilitate SBOM development, it is still unclear how practitioners perceive SBOMs and what the challenges of adopting SBOMs in practice are. Furthermore, existing SBOM-related studies tend to be ad hoc and lack a software engineering focus. To bridge this gap, we conducted the first empirical study to interview and survey SBOM practitioners. We applied a mixed qualitative and quantitative method for gathering data from 17 interviewees and 65 survey respondents from 15 countries across five continents to understand how practitioners perceive the SBOM field. We summarized 26 statements and grouped them into three topics on SBOM's states of practice. Based on the study results, we derived a goal model and highlighted future directions where practitioners can put in their effort.

Language modeling · MoDELS · Performer · Word sense disambiguation · AIM

February 7, 2023

What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories

Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre, German Rigau

from arxiv, Presented at GWC2023

Language Models are the core of almost any Natural Language Processing system nowadays. One of their particularities is their contextualized representations, a game-changing feature when disambiguation between word senses is necessary. In this paper we aim to explore to what extent language models are capable of discerning among senses at inference time. We performed this analysis by prompting commonly used Language Models such as BERT or RoBERTa to perform the task of Word Sense Disambiguation (WSD). We leverage the relation between word senses and domains, and cast WSD as a textual entailment problem, where the different hypotheses refer to the domains of the word senses. Our results show that this approach is indeed effective, close to supervised systems.

Text classification · Annotation · Extensibility · state-of-the-art · Regularizer

February 15, 2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han

from arxiv, 12 pages; Accepted to WWW 2021

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution -- an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage the fully-connected attentions to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines.