
A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty, such as imperfect automated metrics and insufficiently sized test sets. The theory has practical applications, such as determining the number of samples needed to reliably distinguish the performance of a set of Text Generation systems in a given setting. We showcase the application of the theory on the WMT 21 and Spot-The-Bot evaluation data and outline how it can be leveraged to improve the evaluation protocol regarding the reliability, robustness, and significance of the evaluation outcome.
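As a concrete illustration of the kind of guideline such a theory can yield, the sketch below computes how many test samples are needed to separate two systems using a standard two-sample power calculation; the formula and numbers are generic statistics, not the paper's actual derivation.

```python
# Hedged sketch: samples needed per system to distinguish two text
# generation systems whose per-sample metric scores differ by `delta`
# on average, with per-sample standard deviation `sigma`.
# Generic two-sided two-sample z-test power calculation.
import math
from scipy.stats import norm

def samples_per_system(delta, sigma, alpha=0.05, power=0.8):
    """Samples needed per system for a two-sided two-sample z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# e.g. a 0.01 mean metric gap with 0.1 per-sample noise:
print(samples_per_system(delta=0.01, sigma=0.1))  # ~1570 samples each
```

Smaller gaps or noisier metrics blow this number up quadratically, which is exactly why an evaluation protocol needs to budget its test set size in advance.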

Related Content

Automator is an application developed by Apple for their Mac OS X system. Through simple point-and-click and drag-and-drop operations, a series of actions can be combined into a workflow, helping you automate complex, repeatable tasks. Automator also works across many different kinds of programs, including the Finder, the Safari web browser, iCal, Address Book, and others. It can also work with third-party applications such as Microsoft Office, Adobe Photoshop, or Pixelmator.

Domain generation algorithms (DGAs) can be categorized into three types: zero-knowledge, partial-knowledge, and full-knowledge. While prior research has focused mainly on the zero-knowledge and full-knowledge types, we characterize their anti-detection ability and practicality and find that zero-knowledge DGAs offer little resistance to detectors, while full-knowledge DGAs suffer from low practicality due to the strong assumption that they are fully detector-aware. Given these observations, we propose PKDGA, a partial-knowledge-based domain generation algorithm with both high anti-detection ability and high practicality. PKDGA employs a reinforcement learning architecture, allowing it to evolve automatically based only on easily observable feedback from detectors. We evaluate PKDGA using a comprehensive set of real-world datasets, and the results demonstrate that it reduces the detection performance of existing detectors from 91.7% to 52.5%. We further apply PKDGA to the Mirai malware, and the evaluations show that the proposed method is quite lightweight and time-efficient.
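A deliberately minimal sketch of the partial-knowledge loop: a generator that sees only a detector's binary verdicts and adapts via REINFORCE. The real PKDGA uses a learned sequence model; the position-wise categorical policy and the `detector` stub below are illustrative assumptions.

```python
# Toy partial-knowledge DGA: adapt a domain generator using only a
# black-box detector's binary feedback, via a REINFORCE update.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH, LR = 10, 0.1
rng = np.random.default_rng(0)
logits = np.zeros((LENGTH, len(ALPHABET)))  # per-position policy

def detector(domain: str) -> bool:
    """Hypothetical stub: flags low vowel ratios as DGA-like."""
    vowels = sum(c in "aeiou" for c in domain)
    return vowels / len(domain) < 0.3  # True -> detected

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    idx = [rng.choice(len(ALPHABET), p=p) for p in probs]
    domain = "".join(ALPHABET[i] for i in idx)
    reward = 0.0 if detector(domain) else 1.0   # evasion -> reward 1
    for pos, i in enumerate(idx):               # REINFORCE step
        grad = -probs[pos]
        grad[i] += 1.0                          # grad of log pi
        logits[pos] += LR * (reward - 0.5) * grad  # 0.5 as baseline
```

The point of the sketch is the information flow: the generator never inspects the detector's parameters or features, only its verdicts, which is what makes the partial-knowledge assumption practical.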

Multi-hop machine reading comprehension (MRC) is a challenging task that aims to answer a question based on disjoint pieces of information spread across different passages. Evaluation metrics and datasets are a vital part of multi-hop MRC: without them it is impossible to train and evaluate models, and the challenges posed by new datasets are often an important motivation for improving existing models. Given the growing attention this field has received, a detailed review is both necessary and worthwhile. This study presents a comprehensive survey of recent advances in multi-hop MRC evaluation metrics and datasets. We first define the multi-hop MRC problem, then examine the evaluation metrics with respect to their multi-hop aspects. We also review in detail 15 multi-hop datasets published between 2017 and 2022, and conclude with a comprehensive analysis. Finally, we discuss open issues in this field.
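To make the multi-hop aspect of such metrics concrete, here is a hedged sketch in the spirit of HotpotQA's joint metric, combining token-level answer F1 with set-level supporting-fact F1; it is a simplification, not any dataset's official scorer.

```python
# Sketch of a joint multi-hop metric: the model must both answer
# correctly and identify the supporting facts in each passage.
from collections import Counter

def f1(pred_items, gold_items):
    common = Counter(pred_items) & Counter(gold_items)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    p = overlap / len(pred_items)
    r = overlap / len(gold_items)
    return 2 * p * r / (p + r)

def joint_f1(pred_answer, gold_answer, pred_facts, gold_facts):
    ans = f1(pred_answer.split(), gold_answer.split())
    sp = f1(list(pred_facts), list(gold_facts))
    return ans * sp  # both hops must be right to score well

print(joint_f1("paris", "paris",
               {("doc1", 0), ("doc2", 3)},
               {("doc1", 0), ("doc2", 3)}))  # 1.0
```

The multiplicative combination is what distinguishes these metrics from single-hop MRC scoring: a correct answer reached without the right supporting evidence is penalized.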

Conditional effect estimation has great scientific and policy importance because interventions may impact subjects differently depending on their characteristics. Previous work has focused primarily on estimating the conditional average treatment effect (CATE), which contrasts counterfactual mean outcomes under interventions in which all subjects receive treatment versus all subjects receive control. However, these interventions may be unrealistic in certain policy scenarios. Furthermore, identification of the CATE requires that all subjects have a non-zero probability of receiving treatment (positivity), which may be unrealistic in practice. In this paper, we propose conditional effects based on incremental propensity score interventions, which are stochastic interventions under which the odds of treatment are multiplied by some user-specified factor. These effects do not require positivity for identification and can be better suited for modeling real-world policies in which people cannot be forced into treatment. We develop a projection estimator, the "Projection-Learner", and a flexible nonparametric estimator, the "I-DR-Learner", each of which can estimate all the conditional effects we propose. We derive model-agnostic error guarantees for both estimators and show that both satisfy a form of double robustness, whereby the Projection-Learner attains parametric efficiency and the I-DR-Learner attains oracle efficiency under weak convergence conditions on the nuisance function estimators. We then propose a summary of treatment effect heterogeneity, the variance of a conditional derivative, and derive a nonparametric estimator for this effect that also satisfies a form of double robustness. Finally, we demonstrate our estimators with an analysis of the effect of ICU admission on mortality using a dataset from the (SPOT)light prospective cohort study.
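The intervention itself is easy to state: multiplying the odds of treatment by a user-specified factor delta shifts a propensity score pi(x) to delta*pi(x) / (delta*pi(x) + 1 - pi(x)). A minimal numerical sketch of that shift (our own illustration, not the paper's estimators):

```python
# Incremental propensity score intervention: multiply treatment odds
# by delta. Subjects with pi = 0 or pi = 1 are left at 0 or 1, which
# is why identification does not require positivity.
import numpy as np

def incremental_propensity(pi: np.ndarray, delta: float) -> np.ndarray:
    """Propensity under the odds-multiplier intervention."""
    return delta * pi / (delta * pi + 1.0 - pi)

pi = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
print(incremental_propensity(pi, delta=2.0))
# [0.   0.182  0.667  0.947  1.]  -- treatment odds doubled
```

Unlike the all-treated/all-control contrasts behind the CATE, this intervention nudges everyone's treatment probability rather than forcing anyone, which is what makes it a plausible model of real policies.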

We propose a new wavelet-based method for density estimation when the data are size-biased. More specifically, we consider a power of the density of interest, where this power exceeds 1/2. Warped wavelet bases are employed, where warping is attained by some continuous cumulative distribution function. A special case is the conventional orthonormal wavelet estimation, where the warping distribution is the standard continuous uniform. We show that both linear and nonlinear wavelet estimators are consistent, with optimal and/or near-optimal rates. Monte Carlo simulations are performed to compare four special settings which are easy to interpret in practice. An application with a real dataset on fatal traffic accidents involving alcohol illustrates the method. We observe that warped bases provide more flexible and superior estimates for both simulated and real data. Moreover, we find that estimating the power of a density (for instance, its square root) further improves the results.
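A minimal sketch of the warped linear wavelet idea with a Haar basis, omitting the size-bias correction and the density-power step for brevity: warp the data through a CDF G, estimate on [0, 1], and map back with the chain-rule factor g = G'. The choices of G and of the resolution level are illustrative assumptions.

```python
# Warped linear wavelet (Haar) density estimate, simplified:
# f_hat(x) = g(x) * sum_k alpha_k * phi_{j,k}(G(x)).
import numpy as np
from scipy import stats

def warped_haar_density(x_grid, data, j, G, g):
    u_data = G(data)                          # warped sample in [0, 1]
    scale = 2.0 ** (j / 2)
    k_data = np.floor(2 ** j * u_data).astype(int)
    counts = np.bincount(np.clip(k_data, 0, 2 ** j - 1),
                         minlength=2 ** j)
    alpha = scale * counts / len(data)        # empirical coefficients
    u = np.clip(G(x_grid), 0, 1 - 1e-12)
    k = np.floor(2 ** j * u).astype(int)
    return g(x_grid) * scale * alpha[k]       # chain-rule factor g

data = stats.expon.rvs(size=5000, random_state=0)
xs = np.linspace(0.01, 5, 200)
fhat = warped_haar_density(xs, data, j=4,
                           G=stats.expon.cdf, g=stats.expon.pdf)
```

When the warping CDF matches the data well, the warped sample is nearly uniform, so few coefficients carry the structure; this is the flexibility that the abstract attributes to warped bases.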

We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when only limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach that has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions from unpaired samples of the marginals alone is possible only under certain conditions on the data distribution, and we discuss the types of conditional independence assumptions under which this can be achieved, which in turn guides the architecture design. Experimental results show that even a tiny amount of paired data (5 minutes) is sufficient to learn to relate the two modalities (here, graphemes and phonemes) when a massive amount of unpaired data is available, paving the way toward adopting this principled approach for all seq2seq models in low-resource regimes.
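One plausible form of the bound such a variational approximation targets, written in generic notation of our own rather than the paper's exact objective:

```latex
% Generic ELBO for a noisy-channel model p(x, y) = p(y) p(x | y), where
% only unpaired samples of x are observed and q(y | x) is the variational
% posterior; our notation, not necessarily the paper's exact objective.
\log p(x) \;\ge\;
  \mathbb{E}_{q(y \mid x)}\bigl[\log p(x \mid y)\bigr]
  \;-\; \mathrm{KL}\bigl(q(y \mid x)\,\big\|\,p(y)\bigr)
```

With categorical y, the expectation cannot be reparameterized, which is one reason a KL-based encoder loss in the spirit of wake-sleep becomes attractive.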

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
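A small sketch of recommendation (a): combining metrics with different strengths by averaging per-segment ranks, so that no single metric's scale dominates. The metric names and scores below are placeholders, not ACES results.

```python
# Rank-average ensemble of MT metrics: each metric votes with ranks
# rather than raw scores, neutralizing scale differences.
import numpy as np
from scipy.stats import rankdata

def rank_ensemble(scores_by_metric):
    """dict of metric -> per-segment scores (higher = better)."""
    ranks = [rankdata(s) for s in scores_by_metric.values()]
    return np.mean(ranks, axis=0)

combined = rank_ensemble({
    "surface_overlap": np.array([0.71, 0.69, 0.93]),
    "source_based":    np.array([0.40, 0.55, 0.40]),
})
print(combined)  # [1.75 2.   2.25] -- segment 3 wins on average rank
```

A surface-overlap metric and a source-based metric fail on different phenomena, so even this naive combination tends to be harder to fool than either alone, which is the intuition behind the recommendation.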

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision about. This problem first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods ranging from classification-based to density-based to distance-based approaches. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
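For concreteness, here is the classic classification-based baseline from that 2017 starting point (maximum softmax probability, Hendrycks & Gimpel): flag an input as OOD when the classifier's top softmax score falls below a threshold tuned on in-distribution data.

```python
# Maximum softmax probability (MSP) baseline for OOD detection.
# `logits` would come from any trained classifier.
import numpy as np

def msp_score(logits: np.ndarray) -> np.ndarray:
    """Per-example confidence: maximum softmax probability."""
    z = logits - logits.max(axis=1, keepdims=True)  # stability shift
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def is_ood(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    return msp_score(logits) < threshold

logits = np.array([[5.0, 0.1, 0.2],    # peaked -> in-distribution
                   [0.9, 1.0, 1.1]])   # flat -> likely OOD
print(is_ood(logits))                  # [False  True]
```

Density-based and distance-based methods replace the softmax confidence with a likelihood or a feature-space distance, but the thresholding pattern is the same.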

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information, and it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting during adaptation.
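As a sketch of the general DRO idea the thesis builds on, here is the exponentiated-gradient weight update of group DRO (Sagawa et al.), a common instantiation rather than the thesis's specific parametric reformulation:

```python
# Group DRO sketch: instead of minimizing average loss, maintain a
# mixture weight over groups (e.g. domains) that drifts toward the
# worst-performing ones, approximating a min-max objective.
import numpy as np

def group_dro_weights(group_losses, q, eta=0.1):
    """One multiplicative-weights step toward the worst-case mixture."""
    q = q * np.exp(eta * np.asarray(group_losses))
    return q / q.sum()

q = np.ones(3) / 3                        # three domains, uniform start
for losses in [[0.2, 0.9, 0.4], [0.3, 1.1, 0.5]]:
    q = group_dro_weights(losses, q)
print(q)  # weight shifts toward the hardest domain (group 2)
```

The model is then trained on the q-weighted loss, so a domain that keeps losing pulls more gradient signal, which is what buys robustness under distribution shift.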

Detection and recognition of text in natural images are two main problems in computer vision with a wide variety of applications, such as the analysis of sports videos, autonomous driving, and industrial automation, to name a few. They share common challenges arising from how text is represented and from the environmental conditions that affect it. Current state-of-the-art scene text detection and/or recognition methods exploit recent advances in deep learning architectures and report superior accuracy on benchmark datasets when tackling multi-resolution and multi-oriented text. However, several challenges remain for text in the wild, where existing methods underperform because their models cannot generalize to unseen data and labeled data is insufficient. Thus, unlike previous surveys in this field, the objectives of this survey are as follows: first, to offer the reader not only a review of recent advances in scene text detection and recognition, but also the results of extensive experiments conducted in a unified evaluation framework that assesses pre-trained models of the selected methods on challenging cases and applies the same evaluation criteria to all of them. Second, to identify several open challenges for detecting or recognizing text in the wild, namely in-plane rotation, multi-oriented and multi-resolution text, perspective distortion, illumination reflection, partial occlusion, complex fonts, and special characters. Finally, the paper presents insights into potential research directions that address some of the challenges scene text detection and recognition techniques still face.
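As a taste of what a unified evaluation framework standardizes, the sketch below implements the usual detection criterion: a predicted box counts as a true positive when its IoU with a ground-truth box reaches 0.5. It handles axis-aligned boxes only; rotated or polygonal text regions, common in the wild, require polygon intersection instead.

```python
# Intersection-over-union for axis-aligned boxes, the standard
# matching criterion in scene text detection evaluation.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25/175 ~ 0.143 -> miss
```

Pinning down one such criterion and applying it to every method is what makes cross-paper accuracy numbers comparable, which is precisely the gap this survey's experiments address.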

As a crucial component in task-oriented dialog systems, the Natural Language Generation (NLG) module converts a dialog act represented in semantic form into a natural language response. The success of traditional template-based or statistical models typically relies on heavily annotated data, which is infeasible for new domains. It is therefore pivotal for an NLG system to generalize well with limited labelled data in real applications. To this end, we present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems. We further develop the SC-GPT model, which is pre-trained on a large set of annotated NLG corpora to acquire a controllable generation ability and fine-tuned with only a few domain-specific labels to adapt to new domains. Experiments on FewShotWoz and the large Multi-Domain-WOZ dataset show that the proposed SC-GPT significantly outperforms existing methods, as measured by various automatic metrics and human evaluations.
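A sketch of how a dialog act can be serialized into the flat string a model like SC-GPT is conditioned on; the exact template below is an assumption for illustration, not SC-GPT's verbatim format.

```python
# Linearize a semantic dialog act into a conditioning string for a
# GPT-style generator (hypothetical serialization template).
def linearize_dialog_act(intent, slots):
    """inform(name=..., area=...) -> 'inform ( name = ... ; ... )'"""
    pairs = " ; ".join(f"{k} = {v}" for k, v in slots.items())
    return f"{intent} ( {pairs} )"

da = linearize_dialog_act("inform", {"name": "Blue Spice",
                                     "area": "centre"})
print(da)  # inform ( name = Blue Spice ; area = centre )
# Training pairs couple this string with the target response, so at
# inference the model generates a response conditioned on the act.
```

Because the dialog act is just another token sequence, a single pre-trained model can absorb many domains' acts and then adapt to a new schema from only a handful of labelled examples.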
