三级电影一区二区三区_中文字幕无线在线视频观看_精品72久久久久久久中文字幕_日本A级作爱片金瓶双艳_午夜一级毛片福利视频_国产区日韩精品一区二区三区_日韩在线中文字幕网站

The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build a 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%), without appreciably modifying the input semantics. To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings while only classifying 1% of human-written sequences as AI-generated. We open-source our models, code and data.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · 量子計算 · MoDELS · 統計量 · Processing（編程語言） ·

2023 年 12 月 5 日

Quantum delegation with an off-the-shelf device

Anne Broadbent,Arthur Mehta,Yuming Zhao

from arxiv, 42 pages. This version (v2) contains new results that were not presented in an earlier version (v1) of this paper. We have also rephrased the OTS model to focus on the OTS device being generic and efficient

Given that reliable cloud quantum computers are becoming closer to reality, the concept of delegation of quantum computations and its verifiability is of central interest. Many models have been proposed, each with specific strengths and weaknesses. Here, we put forth a new model where the client trusts only its classical processing, makes no computational assumptions, and interacts with a quantum server in a single round. In addition, during a set-up phase, the client specifies the size $n$ of the computation and receives an untrusted, off-the-shelf (OTS) quantum device that is used to report the outcome of a single measurement. We show how to delegate polynomial-time quantum computations in the OTS model. This also yields an interactive proof system for all of QMA, which, furthermore, we show can be accomplished in statistical zero-knowledge. This provides the first relativistic (one-round), two-prover zero-knowledge proof system for QMA. As a proof approach, we provide a new self-test for n EPR pairs using only constant-sized Pauli measurements, and show how it provides a new avenue for the use of simulatable codes for local Hamiltonian verification. Along the way, we also provide an enhanced version of a well-known stability result due to Gowers and Hatami and show how it completes a common argument used in self-testing.

ChatGPT · Performer · ReQuEST · 樣例 · Prompt ·

2023 年 12 月 4 日

ChatGPT and post-test probability

Samuel J. Weisenthal

from arxiv, 138 pages, 4 tables. Updated abstract, methods, affiliations, and acknowledgements

Reinforcement learning-based large language models, such as ChatGPT, are believed to have potential to aid human experts in many domains, including healthcare. There is, however, little work on ChatGPT's ability to perform a key task in healthcare: formal, probabilistic medical diagnostic reasoning. This type of reasoning is used, for example, to update a pre-test probability to a post-test probability. In this work, we probe ChatGPT's ability to perform this task. In particular, we ask ChatGPT to give examples of how to use Bayes rule for medical diagnosis. Our prompts range from queries that use terminology from pure probability (e.g., requests for a "posterior probability") to queries that use terminology from the medical diagnosis literature (e.g., requests for a "post-test probability"). We show how the introduction of medical variable names leads to an increase in the number of errors that ChatGPT makes. Given our results, we also show how one can use prompt engineering to facilitate ChatGPT's partial avoidance of these errors. We discuss our results in light of recent commentaries on sensitivity and specificity. We also discuss how our results might inform new research directions for large language models.

MoDELS · fMRI · 縮放 · Performer · 語言模型化 ·

2023 年 12 月 4 日

Scaling laws for language encoding models in fMRI

Richard Antonello,Aditya Vaidya,Alexander G. Huth

from arxiv, Accepted to the Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.

TOOLS · Bioinformatics · 蓋樂世（Galaxy） · Analysis · Performer ·

2023 年 12 月 4 日

Right-sizing compute resource allocations for bioinformatics tools with Total Perspective Vortex

Nuwan Goonasekera,Catherine Bromhead,Simon Gladman,Nate Coraor,Bjorn Gruning,Enis Afgan

In biomedical research, computational methods have become indispensable and their use is increasing, making the efficient allocation of computing resources paramount. Practitioners routinely allocate resources far in excess of what is required for batch processing jobs, leading to not just inflated wait times and costs, but also unnecessary carbon emissions. This is not without reason however, as accurately determining resource needs is complex, affected by the nature of tools, data size, and analysis parameters, especially on popular servers that handle numerous jobs. The Galaxy platform, a web-based hub for biomedical analysis used globally by scientists, exemplifies this challenge. Serving nearly half a million registered users and managing around 2 million monthly jobs, Galaxy's growth outpaces the resources at its disposal. This is necessitating smarter resource utilization. To address this, we have developed a tool named Total Perspective Vortex (TPV) - a software package that right-sizes resource allocations for each job. TPV is able to dynamically set resource requirements for individual jobs and perform meta-scheduling across heterogeneous resources. It also includes a first-ever community-curated database of default resource requirements for nearly 1,000 popular bioinformatics tools. Deployments in Galaxy Australia and Europe demonstrate its effectiveness with meta-scheduling user jobs and an improved experience for systems administrators managing Galaxy servers.

REST · 同質 · AIM · 成對型 · 值域 ·

2023 年 12 月 4 日

Spectral homogeneity cross frequencies can be a quality metric for the large-scale resting EEG preprocessing

Shiang Hu,Jie Ruan,Nicolas Langer,Jorge Bosch-Bayard,Zhao Lv,Dezhong Yao,Pedro Antonio Valdes-Sosa

The brain projects require the collection of massive electrophysiological data, aiming to the longitudinal, sectional, or populational neuroscience studies. Quality metrics automatically label the data after centralized preprocessing. However, although the waveforms-based metrics are partially useful, they may be unreliable by neglecting the spectral profiles. Here, we detected the phenomenon of parallel log spectra (PaLOS) that the scalp EEG power in the log scale were parallel to each other from 10% of 2549 HBN EEG. This phenomenon was reproduced in 8% of 412 PMDT EEG from 4 databases. We designed the PaLOS index (PaLOSi) to indicate this phenomenon by decomposing the cross-spectra at different frequencies into the common principal component spaces. We found that the PaLOS biophysically implied a prominently dominant dipole in the source space which was implausible for the resting EEG. And it may be practically resulted from excessive preprocessing. Compared with the 1966 normative EEG cross-spectra, the HBN and the PMDT EEG with PaLOS presented generally much higher electrode pairwise coherences and higher similarity of coherence-based network patterns, which went against the known frequency dependent characteristic of coherence networks. We suggest the PaLOSi should lay in the range of 0.4-0.7 for large resting EEG quality assurance.

解碼 · LDPC · 連結 · 估計/估計量 · 似然 ·

2023 年 12 月 4 日

Soft-output (SO) GRAND and long, low rate codes to outperform 5 LDPCs

Peihong Yuan,Muriel Medard,Kevin Galligan,Ken R. Duffy

from arxiv, arXiv admin note: substantial text overlap with arXiv:2305.05777

We establish that a large and flexible class of long, high redundancy error correcting codes can be efficiently and accurately decoded with guessing random additive noise decoding (GRAND). Performance evaluation demonstrates that it is possible to construct simple concatenated codes that outperform low-density parity-check (LDPC) codes found in the 5G New Radio standard. The concatenated structure enables many desirable features, including: low-complexity hardware-friendly encoding and decoding; high levels of flexibility in length and rate through modularity; and high levels of parallelism in encoding and decoding that enable low latency. Central to this is the development of a method through which any soft-input (SI) GRAND algorithm can provide soft-output (SO) in the form of an accurate a-posteriori estimate of the likelihood that a decoding is correct or, in the case of list decoding, the likelihood that each element of the list is correct. The key distinguishing feature of SOGRAND in comparison to other methods is the provision of an estimate that the correct decoding has not been found, even when providing a single decoding. That per-block SO can be converted into accurate per-bit SO by a weighted sum that includes a term for the SI. Crucially, implementing SOGRAND adds negligible computation and memory to the existing decoding process, and using it results in a practical alternative to LDPC codes.

組合性 · Attention · MoDELS · motivation · Continuity ·

2023 年 12 月 1 日

Contextualized word senses: from attention to compositionality

Pablo Gamallo

The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very fruitful, they continue to be models with little or no interpretability and explainability. One of the tasks for which they are best suited is the encoding of the contextual sense of words using contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and semantic notions such as selection preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures for a given semantic task, namely the similarity calculation of word senses in context. The results obtained show that it is possible to be competitive with linguistically motivated models instead of using the black boxes underlying complex neural architectures.

展開 · FAST · 分離的 · 數據集 · Better ·

2023 年 12 月 1 日

Unfolder: Fast localization and image rectification of a document with a crease from folding in half

A. M. Ershov,D. V. Tropin,E. E. Limonova,D. P. Nikolaev,V. V. Arlazarov

from arxiv, This is a preprint of the article accepted for publication in the journal "Computer Optics"

Presentation of folded documents is not an uncommon case in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky since a crease can divide the document contents into separate planes. To unfold the document, one could hold the edges potentially obscuring it in a captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach Unfolder developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of a crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured when document placed on a table and when held in hand. The Unfolder algorithm allowed for a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for Unfolder was only 0.25 s/image on an iPhone XR.

GPS · Analysis · 潛在 · Extensibility · MoDELS ·

2023 年 12 月 1 日

Auto-encoding GPS data to reveal individual and collective behaviour

Saint-Clair Chabert-Liddell,Nicolas Bez,Pierre Gloaguen,Sophie Donnet,Stéphanie Mahévas

We propose an innovative and generic methodology to analyse individual and collective behaviour through individual trajectory data. The work is motivated by the analysis of GPS trajectories of fishing vessels collected from regulatory tracking data in the context of marine biodiversity conservation and ecosystem-based fisheries management. We build a low-dimensional latent representation of trajectories using convolutional neural networks as non-linear mapping. This is done by training a conditional variational auto-encoder taking into account covariates. The posterior distributions of the latent representations can be linked to the characteristics of the actual trajectories. The latent distributions of the trajectories are compared with the Bhattacharyya coefficient, which is well-suited for comparing distributions. Using this coefficient, we analyse the variation of the individual behaviour of each vessel during time. For collective behaviour analysis, we build proximity graphs and use an extension of the stochastic block model for multiple networks. This model results in a clustering of the individuals based on their set of trajectories. The application to French fishing vessels enables us to obtain groups of vessels whose individual and collective behaviours exhibit spatio-temporal patterns over the period 2014-2018.

MoDELS · 模型評估 · NLP · Extensibility · 可辨認的 ·

2020 年 5 月 8 日

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Marco Tulio Ribeiro,Tongshuang Wu,Carlos Guestrin,Sameer Singh

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.