Given a binary word relation $\tau$ over $A^*$ and a finite language $X \subseteq A^*$, a $\tau$-Gray cycle over $X$ consists of a permutation $\left(w_{[i]}\right)_{0 \le i \le |X|-1}$ of $X$ such that each word $w_{[i]}$ is an image under $\tau$ of the previous word $w_{[i-1]}$. We define the complexity measure $\lambda_{A,\tau}(n)$, equal to the largest cardinality of a language $X$ having words of length at most $n$ and such that some $\tau$-Gray cycle over $X$ exists. The present paper is concerned with $\tau = \sigma_k$, the so-called $k$-character substitution, such that $(u, v) \in \sigma_k$ holds if, and only if, the Hamming distance of $u$ and $v$ is $k$. We present loopless (resp., constant amortized time) algorithms for computing specific maximum-length $\sigma_k$-Gray cycles.
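As a minimal illustration of the objects involved (not the paper's algorithms), the following Python sketch checks whether a proposed ordering of words forms a $\sigma_k$-Gray cycle, i.e., consecutive words (cyclically) have equal length and Hamming distance exactly $k$; the function names are ours.

    def hamming(u, v):
        # Hamming distance is only defined for words of equal length.
        return sum(a != b for a, b in zip(u, v))

    def is_sigma_k_gray_cycle(words, k):
        # words: a proposed ordering w[0], ..., w[|X|-1] of a finite language X.
        # Each word must be obtained from its predecessor (cyclically, so
        # words[-1] precedes words[0]) by substituting exactly k characters.
        if len(set(words)) != len(words):
            return False  # must be a permutation of X, no repeats
        return all(
            len(words[i - 1]) == len(words[i])
            and hamming(words[i - 1], words[i]) == k
            for i in range(len(words))
        )

    # Example: a sigma_1-Gray cycle over all binary words of length 2.
    print(is_sigma_k_gray_cycle(["00", "01", "11", "10"], 1))  # True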
In-context learning with large language models (LLMs) has recently attracted increasing attention due to its superior few-shot performance on various tasks. However, its performance on text-to-SQL parsing still has much room for improvement. In this paper, we hypothesize that a crucial aspect of LLMs to improve for text-to-SQL parsing is their multi-step reasoning ability. Thus, we systematically study how to enhance LLMs' reasoning ability through chain-of-thought (CoT) style prompting, including the original chain-of-thought prompting (Wei et al., 2022b) and least-to-most prompting (Zhou et al., 2023). Our experiments demonstrate that iterative prompting as in Zhou et al. (2023) may be unnecessary for text-to-SQL parsing and that using detailed reasoning steps tends to suffer from more error propagation. Based on these findings, we propose a new CoT-style prompting method for text-to-SQL parsing. It brings 5.2- and 6.5-point absolute gains on the Spider development set and the Spider Realistic set, respectively, over the standard prompting method without reasoning steps, and 2.4- and 1.5-point absolute gains over the least-to-most prompting method.
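A hedged sketch of what a CoT-style text-to-SQL prompt might look like; the exemplar, schema, question, and reasoning format below are invented for illustration and do not reproduce the paper's actual prompts.

    # Hypothetical one-shot prompt skeleton; model settings and the paper's
    # exemplars are not reproduced here.
    EXEMPLAR = """-- Schema: singer(singer_id, name, age)
    -- Question: How many singers are older than 30?
    -- Let's think step by step:
    -- 1. We need a count of rows in the table `singer`.
    -- 2. Only rows with age > 30 should be counted.
    SELECT count(*) FROM singer WHERE age > 30;
    """

    def build_prompt(schema: str, question: str) -> str:
        # Prepend the worked exemplar, then ask for the new query.
        return (
            EXEMPLAR
            + f"\n-- Schema: {schema}"
            + f"\n-- Question: {question}"
            + "\n-- Let's think step by step:"
        )

    print(build_prompt("concert(concert_id, venue, year)",
                       "How many concerts were held in 2014?"))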
Emotion recognition in text, the task of identifying emotions such as joy or anger, is a challenging problem in NLP with many applications. One of the challenges is the shortage of available datasets annotated with emotions. Certain existing datasets are small, follow different emotion taxonomies, and display imbalance in their emotion distribution. In this work, we studied the impact of data augmentation techniques particularly when applied to small imbalanced datasets, for which current state-of-the-art models (such as RoBERTa) under-perform. Specifically, we utilized four data augmentation methods (Easy Data Augmentation (EDA), static and contextual embedding-based augmentation, and ProtAugment) on three datasets that come from different sources and vary in size, emotion categories, and distributions. Our experimental results show that using the augmented data when training the classifier model leads to significant improvements. Finally, we conducted two case studies: a) directly using the popular ChatGPT API to paraphrase text using different prompts, and b) using external data to augment the training set. Results show the promising potential of these methods.
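To make the augmentation setup concrete, here is a minimal Python sketch of how a small, imbalanced training set could be expanded with paraphrases; `paraphrase` is our placeholder for any of the methods mentioned (EDA, embedding-based substitution, ProtAugment, or a ChatGPT-based paraphraser) and is not an API from the paper.

    import random
    from collections import Counter

    def paraphrase(text: str) -> str:
        # Placeholder: swap in EDA, embedding-based substitution, ProtAugment,
        # or a ChatGPT prompt such as "Paraphrase the following sentence: ...".
        # Returning the input unchanged keeps the sketch runnable.
        return text

    def augment_minority_classes(samples, target_per_class):
        # samples: list of (text, emotion_label) pairs.
        counts = Counter(label for _, label in samples)
        augmented = list(samples)
        for label, count in counts.items():
            pool = [text for text, lab in samples if lab == label]
            for _ in range(max(0, target_per_class - count)):
                augmented.append((paraphrase(random.choice(pool)), label))
        return augmented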
In a recent breakthrough, Chen, Hirahara, and Ren prove that $\mathsf{S_2E}/_1 \not\subset \mathsf{SIZE}[2^n/n]$ by giving a single-valued $\mathsf{FS_2P}$ algorithm for the Range Avoidance Problem ($\mathsf{Avoid}$) that works for infinitely many input lengths $n$. Building on their work, we present a simple single-valued $\mathsf{FS_2P}$ algorithm for $\mathsf{Avoid}$ that works for all input lengths $n$. As a result, we obtain the circuit lower bound $\mathsf{S_2E} \not\subset \mathsf{SIZE}[2^n/n]$ and many other corollaries: 1. Near-maximum circuit lower bounds for $\mathsf{\Sigma_2E} \cap \mathsf{\Pi_2E}$ and $\mathsf{ZPE}^{\mathsf{NP}}$. 2. Pseudodeterministic $\mathsf{FZPP}^{\mathsf{NP}}$ constructions for: Ramsey graphs, rigid matrices, pseudorandom generators, two-source extractors, linear codes, hard truth tables, and $K^{poly}$-random strings.
The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiveness decreases significantly against JWMM. The detection of JWMM is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed at enhancing the static detection of JWMM. JWBinder performs language-specific data-flow analysis to capture the cross-language interoperations and then characterizes the functionalities of JWMM through a unified high-level structure called the Inter-language Program Dependency Graph. An extensive evaluation on one of the most representative real-world anti-virus platforms, VirusTotal, shows that JWBinder effectively enhances anti-virus systems from various vendors and increases the overall successful detection rate against JWMM from 49.1\% to 86.2\%. Additionally, we assess the side effects and runtime overhead of JWBinder, corroborating its practical viability in real-world applications.
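As an illustration of the idea behind a unified cross-language dependency structure (not JWBinder's actual implementation), the sketch below merges a JavaScript data-flow graph and a WebAssembly data-flow graph and links them along interoperation call sites; the node names and edge attribute are invented.

    import networkx as nx

    def build_ipdg(js_dfg: nx.DiGraph, wasm_dfg: nx.DiGraph, interop_calls):
        # js_dfg / wasm_dfg: per-language data-flow graphs.
        # interop_calls: pairs (js_node, wasm_node) where a JS value flows into
        # an exported WebAssembly function (or vice versa).
        ipdg = nx.compose(js_dfg, wasm_dfg)  # union of both graphs
        for js_node, wasm_node in interop_calls:
            ipdg.add_edge(js_node, wasm_node, kind="cross-language")
        return ipdg

    # Toy example: a JS string flows into a Wasm decoding routine.
    js = nx.DiGraph([("js:payload", "js:WebAssembly.instantiate")])
    wasm = nx.DiGraph([("wasm:decode", "wasm:eval_sink")])
    ipdg = build_ipdg(js, wasm, [("js:WebAssembly.instantiate", "wasm:decode")])
    print(ipdg.number_of_nodes(), ipdg.number_of_edges())  # 4 3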
Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks and also generalizes to a wide range of downstream tasks. However, ACL needs tremendous running time to generate the adversarial variants of all training data, which limits its scalability to large datasets. To speed up ACL, this paper proposes a robustness-aware coreset selection (RCS) method. RCS does not require label information and searches for an informative subset that minimizes a representational divergence, which is the distance of the representation between natural data and their virtual adversarial variants. The vanilla solution of RCS via traversing all possible subsets is computationally prohibitive. Therefore, we theoretically transform RCS into a surrogate problem of submodular maximization, of which the greedy search is an efficient solution with an optimality guarantee for the original problem. Empirically, our comprehensive results corroborate that RCS can speed up ACL by a large margin without significantly hurting the robustness transferability. Notably, to the best of our knowledge, we are the first to conduct ACL efficiently on the large-scale ImageNet-1K dataset to obtain an effective robust representation via RCS. Our source code is at //github.com/GodXuxilie/Efficient_ACL_via_RCS.
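A minimal, framework-agnostic sketch of the greedy selection idea described above, assuming a user-supplied gain function; this is our illustration, not the released implementation linked in the abstract.

    def greedy_coreset(batches, budget, divergence_gain):
        # batches: candidate data batches; budget: number of batches to keep.
        # divergence_gain(selected, candidate) should return how much adding
        # `candidate` to `selected` reduces the representational divergence
        # between natural data and their virtual adversarial variants.
        # Greedy maximization of the surrogate submodular objective enjoys the
        # usual (1 - 1/e)-style guarantee for that surrogate problem.
        selected = []
        remaining = list(batches)
        while remaining and len(selected) < budget:
            best = max(remaining, key=lambda b: divergence_gain(selected, b))
            selected.append(best)
            remaining.remove(best)
        return selected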
In this paper, we consider the counting function $E_P(y) = |P_{y} \cap Z^{n_x}|$ for a parametric polyhedron $P_{y} = \{x \in R^{n_x} \colon A x \leq b + B y\}$, where $y \in R^{n_y}$. We give a new representation of $E_P(y)$, called a \emph{piece-wise step-polynomial with periodic coefficients}, which is a generalization of piece-wise step-polynomials and of integer/rational Ehrhart quasi-polynomials. It gives the fastest way to calculate $E_P(y)$ in certain scenarios. The most important cases are the following: 1) We show that, for any $y \in Q^k$ and any parametric polyhedron $P_y$ defined by a standard-form system $A x = y,\, x \geq 0$ with a fixed number of equalities, $E_P(y)$ can be computed by a $poly\bigl(n, \|A\|_{\infty}\bigr)$-time algorithm; 2) Assuming that the co-dimension is fixed, we show that integer/rational Ehrhart quasi-polynomials of a polytope can be computed by FPT algorithms, parameterized by the sub-determinants of $A$ or its elements; 3) Our representation of $E_P$ is more efficient than other known approaches if $A$ has bounded elements, especially if it is additionally sparse. Additionally, we provide a discussion of possible applications in the area of compiler optimization. Under some ``natural'' assumptions on the program code, our approach has the fastest complexity bounds.
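A small worked example (ours, not taken from the paper) of a step-polynomial with a periodic coefficient: for the one-parameter polyhedron $P_y = \{x \in R \colon 0 \leq 2 x \leq y\}$ and $y \geq 0$, one has $E_P(y) = |P_y \cap Z| = \lfloor y/2 \rfloor + 1 = y/2 + \bigl(1 - \{y/2\}\bigr)$, where $\{\cdot\}$ denotes the fractional part. The degree-one coefficient is the constant $1/2$, while the constant term $1 - \{y/2\}$ is a periodic function of $y$ with period $2$.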
The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ ``Introspective Tips'' to facilitate LLMs in self-optimizing their decision-making. By introspectively examining its trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.
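A hedged sketch of the prompt-level loop this describes; the function names, prompt wording, and tip format are ours, not the paper's, and `llm` stands for any text-completion callable.

    def summarize_trajectory_into_tips(llm, trajectory):
        # Ask the model to introspect on a past trajectory and distill short tips.
        prompt = (
            "You attempted a text game. Here is the trajectory:\n"
            f"{trajectory}\n"
            "Write at most three short tips that would improve your next attempt."
        )
        return llm(prompt)

    def act_with_tips(llm, observation, tips):
        # No fine-tuning: the tips are simply prepended to the decision prompt.
        prompt = (
            f"Tips from previous attempts:\n{tips}\n"
            f"Current observation:\n{observation}\n"
            "Choose the next action:"
        )
        return llm(prompt)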
Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, or organization. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems were successful in producing decent recognition accuracy, they often required considerable human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.
A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built for only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB, in which sememes are manually annotated for over $15$ thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different kinds of synset information. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained at //github.com/thunlp/BabelNet-Sememe-Prediction.
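To make the prediction task concrete, here is a minimal embedding-based baseline of the kind the task admits (our illustration, not one of the paper's two models): score each candidate sememe against the average embedding of a synset's surface forms.

    import numpy as np

    def predict_sememes(synset_words, word_vecs, sememe_vecs, top_k=5):
        # synset_words: surface forms of a BabelNet synset (possibly multilingual).
        # word_vecs / sememe_vecs: dicts mapping words / sememes to vectors.
        vecs = [word_vecs[w] for w in synset_words if w in word_vecs]
        if not vecs:
            return []
        synset_vec = np.mean(vecs, axis=0)
        scores = {
            s: float(np.dot(v, synset_vec) /
                     (np.linalg.norm(v) * np.linalg.norm(synset_vec) + 1e-9))
            for s, v in sememe_vecs.items()
        }
        # Return the top-k sememes by cosine similarity.
        return sorted(scores, key=scores.get, reverse=True)[:top_k]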
Manually labeling objects by tracing their boundaries is a laborious process. Polygon-RNN and its successor Polygon-RNN++ produce polygonal annotations in a recurrent manner using a CNN-RNN architecture, allowing interactive correction via humans in the loop. We propose a new framework that alleviates the sequential nature of Polygon-RNN by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). Our model, Curve-GCN, is trained end-to-end. It supports object annotation by either polygons or splines, facilitating labeling efficiency for both line-based and curved objects. We show that Curve-GCN outperforms all existing approaches in automatic mode, including the powerful PSP-DeepLab, and is significantly more efficient in interactive mode than Polygon-RNN++. Our model runs at 29.3 ms in automatic mode and 2.6 ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++, respectively.
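A minimal sketch of the core operation this contrasts with a recurrent decoder: every polygon vertex is a node of a cyclic graph, and all vertex offsets are predicted in parallel by graph propagation; the layer structure, tensor shapes, and feature extractor below are placeholders, not the paper's architecture.

    import torch
    import torch.nn as nn

    class VertexGCNLayer(nn.Module):
        # One propagation step over a cycle graph of N polygon vertices:
        # each node mixes its own features with those of its two neighbours.
        def __init__(self, dim):
            super().__init__()
            self.self_fc = nn.Linear(dim, dim)
            self.neigh_fc = nn.Linear(dim, dim)

        def forward(self, feats):              # feats: (N, dim)
            neigh = torch.roll(feats, 1, dims=0) + torch.roll(feats, -1, dims=0)
            return torch.relu(self.self_fc(feats) + self.neigh_fc(neigh))

    # All vertex offsets are produced simultaneously (no recurrence):
    dim, n_vertices = 64, 40
    feats = torch.randn(n_vertices, dim)       # placeholder per-vertex CNN features
    layer, head = VertexGCNLayer(dim), nn.Linear(dim, 2)
    offsets = head(layer(feats))               # (N, 2) x/y offsets, predicted in parallel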