欧美综合一本热第九页-欧美日韩一区二区视频在线观看

Lately, instruction-based techniques have made significant strides in improving performance in few-shot learning scenarios. They achieve this by bridging the gap between pre-trained language models and fine-tuning for specific downstream tasks. Despite these advancements, the performance of Large Language Models (LLMs) in information extraction tasks like Named Entity Recognition (NER), using prompts or instructions, still falls short of supervised baselines. The reason for this performance gap can be attributed to the fundamental disparity between NER and LLMs. NER is inherently a sequence labeling task, where the model must assign entity-type labels to individual tokens within a sentence. In contrast, LLMs are designed as a text generation task. This distinction between semantic labeling and text generation leads to subpar performance. In this paper, we transform the NER task into a text-generation task that can be readily adapted by LLMs. This involves enhancing source sentences with task-specific instructions and answer choices, allowing for the identification of entities and their types within natural language. We harness the strength of LLMs by integrating supervised learning within them. The goal of this combined strategy is to boost the performance of LLMs in extraction tasks like NER while simultaneously addressing hallucination issues often observed in LLM-generated content. A novel corpus Contract NER comprising seven frequently observed contract categories, encompassing named entities associated with 18 distinct legal entity types is released along with our baseline models. Our models and dataset are available to the community for future research * .

知識薈萃

精品入門和進階教程、論文和代碼整理等

查看相關VIP內容、論文、資訊等

MoDELS · Minimax · 有向模型 · 回火 · Performer ·

2024 年 3 月 6 日

Heavy-tailed Bayesian nonparametric adaptation

Sergios Agapiou,Isma?l Castillo

We propose a new Bayesian strategy for adaptation to smoothness in nonparametric models based on heavy tailed series priors. We illustrate it in a variety of settings, showing in particular that the corresponding Bayesian posterior distributions achieve adaptive rates of contraction in the minimax sense (up to logarithmic factors) without the need to sample hyperparameters. Unlike many existing procedures, where a form of direct model (or estimator) selection is performed, the method can be seen as performing a soft selection through the prior tail. In Gaussian regression, such heavy tailed priors are shown to lead to (near-)optimal simultaneous adaptation both in the $L^2$- and $L^\infty$-sense. Results are also derived for linear inverse problems, for anisotropic Besov classes, and for certain losses in more general models through the use of tempered posterior distributions. We present numerical simulations corroborating the theory.

變換 · Performer · Metamaterial · 設計 · GROUP ·

2024 年 3 月 6 日

Deployable polyhedrons with one-DOF radial transformation

Yuanqing Gu,Yan Chen

Deployable polyhedrons can transform between Platonic and Archimedean polyhedrons to meet the demands of various engineering applications. However, the existing design solutions are often with multiple degrees of freedom and complicated mechanism links and joints, which greatly limited their potential in practice. Combining the fundamentals of solid geometry and mechanism kinematics, this paper proposes a family of kirigami Archimedean polyhedrons based on the N-fold-symmetric loops of spatial 7R linkage, which perform one-DOF radial transformation following tetrahedral, octahedral, or icosahedral symmetry. Moreover, in each symmetric polyhedral group, three different transforming paths can be achieved from one identical deployed configuration. We also demonstrated that such design strategy can be readily applied to polyhedral tessellation. This work provides a family of rich solutions for deployable polyhedrons to facilitate their applications in aerospace exploration, architecture, metamaterials and so on.

Better · 語音識別 · 自動語音識別 · 縮放 · MoDELS ·

2024 年 3 月 5 日

Zipformer: A faster and better encoder for automatic speech recognition

Zengwei Yao,Liyong Guo,Xiaoyu Yang,Wei Kang,Fangjun Kuang,Yifan Yang,Zengrui Jin,Long Lin,Daniel Povey

from arxiv, Published as a conference paper at ICLR 2024

The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm allows us to retain some length information; 4) new activation functions SwooshR and SwooshL work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explictly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at //github.com/k2-fsa/icefall.

Nuance · 學習器 · 小樣本學習 · 類別 · 歸納偏好 ·

2024 年 3 月 5 日

Few-shot Learner Parameterization by Diffusion Time-steps

Zhongqi Yue,Pan Zhou,Richang Hong,Hanwang Zhang,Qianru Sun

from arxiv, Accepted by CVPR 2024

Even when using large multi-modal foundation models, few-shot learning is still challenging -- if there is no proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent. Building on this, we propose Time-step Few-shot (TiF) learner. We train class-specific low-rank adapters for a text-conditioned DM to make up for the lost attributes, such that images can be accurately reconstructed from their noisy ones given a prompt. Hence, at a small time-step, the adapter and prompt are essentially a parameterization of only the nuanced class attributes. For a test image, we can use the parameterization to only extract the nuanced class attributes for classification. TiF learner significantly outperforms OpenCLIP and its adapters on a variety of fine-grained and customized few-shot learning tasks. Codes are in //github.com/yue-zhongqi/tif.

Performer · GPUs · 中央處理器 (CPU) · 樣例 · 前向 ·

2024 年 3 月 4 日

Hybrid quantum programming with PennyLane Lightning on HPC platforms

Ali Asadi,Amintor Dusko,Chae-Yeun Park,Vincent Michaud-Rioux,Isidor Schoch,Shuli Shu,Trevor Vincent,Lee James O'Riordan

from arxiv, For all data and workloads, see //github.com/PennyLaneAI/lightning-on-hpc

We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of Lightning with backends supporting CPUs, as well as NVidia and AMD GPUs, and compare the results to other commonly used high-performance simulator packages, demonstrating where Lightning's implementations give performance leads. We show improved CPU performance by employing explicit SIMD intrinsics and multi-threading, batched task-based execution across multiple GPUs, and distributed forward and gradient-based quantum circuit executions across multiple nodes. Our data shows we can comfortably simulate a variety of circuits, giving examples with up to 30 qubits on a single device or node, and up to 41 qubits using multiple nodes.

暫退法 · 集成 · Networking · MoDELS · SOUPS ·

2024 年 3 月 1 日

Fine-tuning with Very Large Dropout

Jianyu Zhang,Léon Bottou

from arxiv, 13 pages

It is impossible today to pretend that the practice of machine learning is compatible with the idea that training and testing data follow the same distribution. Several authors have recently used ensemble techniques to show how scenarios involving multiple data distributions are best served by representations that are both richer than those obtained by regularizing for the best in-distribution performance, and richer than those obtained under the influence of the implicit sparsity bias of common stochastic gradient procedures. This contribution investigates the use of very high dropout rates instead of ensembles to obtain such rich representations. Although training a deep network from scratch using such dropout rates is virtually impossible, fine-tuning a large pre-trained model under such conditions is not only possible but also achieves out-of-distribution performances that exceed those of both ensembles and weight averaging methods such as model soups. This result has practical significance because the importance of the fine-tuning scenario has considerably grown in recent years. This result also provides interesting insights on the nature of rich representations and on the intrinsically linear nature of fine-tuning a large network using a comparatively small dataset.

MoDELS · Analysis · 估計/估計量 · 統計量 · 數學 ·

2024 年 3 月 1 日

The `Why' behind including `Y' in your imputation model

Lucy D'Agostino McGowan,Sarah C. Lotspeich,Staci A. Hepler

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e., single imputation with fixed values) and stochastic imputation (i.e., single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This paper aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations. We offer a better understanding of the considerations involved in imputing missing covariates and emphasize when it is necessary to include the outcome variable in the imputation model.

MoDELS · Processing（編程語言） · Vision · Continuity · HTTPS ·

2023 年 2 月 20 日

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Xiao Wang,Guangyao Chen,Guangwu Qian,Pengcheng Gao,Xiao-Yong Wei,Yaowei Wang,Yonghong Tian,Wen Gao

from arxiv, Accepted by Machine Intelligence Research

With the urgent demand for generalized deep models, many pre-trained big models are proposed, such as BERT, ViT, GPT, etc. Inspired by the success of these models in single domains (like computer vision and natural language processing), the multi-modal pre-trained big models have also drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper could provide new insights and helps fresh researchers to track the most cutting-edge works. Specifically, we firstly introduce the background of multi-modal pre-training by reviewing the conventional deep learning, pre-training works in natural language process, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network architectures, and knowledge enhanced pre-training. After that, we introduce the downstream tasks used for the validation of large-scale MM-PTMs, including generative, classification, and regression tasks. We also give visualization and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future works. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: //github.com/wangxiao5791509/MultiModal_BigModels_Survey

MoDELS · Better · Vision · Processing（編程語言） · 自然語言處理 ·

2022 年 2 月 21 日

VLP: A Survey on Vision-Language Pre-training

Feilong Chen,Duzhen Zhang,Minglun Han,Xiuyi Chen,Jing Shi,Shuang Xu,Bo Xu

from arxiv, A Survey on Vision-Language Pre-training

In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey on VLP. We hope that this survey can shed light on future research in the VLP field.

Performer · Machine Learning · 模型性能 · MoDELS · Processing（編程語言） ·

2021 年 8 月 2 日

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu,Luwei Xiao,Yixuan Sun,Junhang Zhang,Tianlong Ma,Liang He

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.