
MoDELS · Few-shot Learning · Target Domain · Prompt · Learning ·

July 12, 2023

Prompt Generate Train (PGT): A framework for few-shot domain adaptation, alignment, and uncertainty calibration of a retriever augmented generation (RAG) model for domain specific open book question-answering

C. S. Krishna

from arxiv, 10

We present a framework, Prompt, Generate, Train (PGT), for efficiently developing a generative question-answering model for open-book question answering over a proprietary collection of text documents. The framework adapts a retriever augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This yields an aligned, uncertainty-calibrated model that is competitive with GPT-4-based in-context retrieval augmented generation at generating relevant answers, at lower serving cost. The synthetic data pipeline generates high-quality synthetic training data using a medium-sized LLM, Flan-T5 XXL, and a novel consistency filtering scheme. The pipeline is designed to generate both abstractive and extractive questions that span the entire corpus. Using samples from this dataset, the framework fine-tunes a smaller RAG model comprising a dense retriever and a smaller LLM. In parallel, the framework trains a reward model to score domain-grounded answers higher than hallucinated answers. In the next phase, the framework aligns the RAG model with the target domain using reinforcement learning. This step improves the RAG model's ability to generate grounded answers and to ignore out-of-domain questions. In the final phase, the framework calibrates the model's uncertainty for extractive question answering. This is desirable because the model can then be integrated into a cascading system in which the RAG model's answer is surfaced only when the model is confident in it.
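
The abstract does not say how the confidence threshold for the cascade is chosen. One common approach, shown here as a minimal sketch under that assumption, is to pick the lowest confidence threshold whose answered subset still meets a target precision on a held-out set, and to defer every lower-confidence question to a fallback system. The names `pick_threshold`, `cascade`, `rag_answer`, and `fallback` are hypothetical and are not part of PGT.

```python
def pick_threshold(scored, target_precision):
    """scored: list of (confidence, is_correct) pairs from a held-out set.

    Returns the lowest threshold whose answered subset (confidence >= threshold)
    reaches target_precision, or None if no threshold does.
    """
    best = None
    for t in sorted({c for c, _ in scored}, reverse=True):
        answered = [ok for c, ok in scored if c >= t]
        if sum(answered) / len(answered) >= target_precision:
            best = t  # lower thresholds answer more questions, so keep the lowest one
    return best


def cascade(question, rag_answer, fallback, threshold):
    """Surface the RAG answer only when its confidence clears the threshold."""
    answer, confidence = rag_answer(question)
    if threshold is not None and confidence >= threshold:
        return answer, "rag"
    return fallback(question), "fallback"


# Usage on a toy held-out set of (confidence, correct) pairs.
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
threshold = pick_threshold(scored, target_precision=0.75)
```

Choosing the lowest qualifying threshold maximizes how often the cheaper RAG model answers while keeping precision at the target on the held-out data.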

Related content

MoDELS


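The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Performer · 3D · Attention · Transformation ·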

September 1, 2023

SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection

Xuan He, Fan Yang, Kailun Yang, Jiacheng Lin, Haolong Fu, Meng Wang, Jin Yuan, Zhiyong Li

from arxiv, Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). Code will be made publicly available at //github.com/mikasa3lili/SSD-MonoDETR

Transformer-based methods have recently demonstrated superior performance for monocular 3D object detection, which aims to predict 3D attributes from a single 2D image. Most existing transformer-based methods leverage both visual and depth representations to explore valuable query points on objects, and the quality of the learned query points has a great impact on detection accuracy. Unfortunately, the unsupervised attention mechanisms in existing transformers are prone to generating low-quality query features due to inaccurate receptive fields, especially on hard objects. To tackle this problem, this paper proposes a novel Supervised Scale-aware Deformable Attention (SSDA) for monocular 3D object detection. Specifically, SSDA presets several masks with different scales and uses depth and visual features to adaptively learn a scale-aware filter for object query augmentation. By imposing scale awareness, SSDA can accurately predict the receptive field of an object query, supporting robust query feature generation. In addition, SSDA is trained with a Weighted Scale Matching (WSM) loss that supervises scale prediction and yields more confident results than unsupervised attention mechanisms. Extensive experiments on the KITTI and Waymo Open datasets demonstrate that SSDA significantly improves detection accuracy, especially on moderate and hard objects, yielding state-of-the-art performance compared with existing approaches. Our code will be made publicly available at //github.com/mikasa3lili/SSD-MonoDETR.
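
As a rough illustration of combining preset masks of different scales with a learned scale-aware filter, the sketch below pools visual features inside square windows of several preset sizes around one query location and mixes them with softmax weights predicted from the concatenated visual and depth features at that location. The window sizes, the linear scale predictor `w_scale`, and all function names are assumptions; this is not the SSDA implementation, and it omits the deformable sampling and the WSM loss.

```python
import numpy as np


def window_mean(feat, y, x, half):
    """Mean feature over a (2*half+1) x (2*half+1) window centred at (y, x), clipped at borders."""
    H, W, _ = feat.shape
    y0, y1 = max(0, y - half), min(H, y + half + 1)
    x0, x1 = max(0, x - half), min(W, x + half + 1)
    return feat[y0:y1, x0:x1].mean(axis=(0, 1))


def scale_aware_query_feature(visual, depth, y, x, w_scale, halves=(0, 1, 2)):
    """Hypothetical scale-aware filtering for a single object query.

    visual: (H, W, C) visual features; depth: (H, W, D) depth features.
    w_scale: (len(halves), C + D) weights of a hypothetical linear scale predictor.
    Returns the aggregated (C,) query feature and the per-scale weights.
    """
    cue = np.concatenate([visual[y, x], depth[y, x]])
    logits = w_scale @ cue
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()  # softmax over the preset mask scales
    pooled = np.stack([window_mean(visual, y, x, h) for h in halves])  # (S, C)
    return weights @ pooled, weights
```

With all-zero `w_scale` the scale weights are uniform, so the result is simply the average of the pooled features; training the predictor is what would let each query favor the window that matches its object's size.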

Square Matrix · Learning · Neural Networks · Integration · Robotics ·

September 1, 2023

Spiking based Cellular Learning Automata (SCLA) algorithm for mobile robot motion formulation

Vahid Pashaei Rad, Vahid Azimi Rad, Saleh Valizadeh Sotubadi

In this paper, we propose a new method, SCLA (Spiking-based Cellular Learning Automata), that enables a mobile robot to reach a target from any random initial point. The method integrates cellular automata and spiking neural networks. The environment consists of multiple squares of the same size, and the robot observes only the squares neighboring its current square. The robot can move only up, down, left, or right. The environment returns feedback to the learning automata to optimize their decision making in subsequent steps, which trains the cellular automata. Simultaneously, a spiking neural network is trained to implement long-term improvements and reductions of the paths. The results show that integrating cellular automata with a spiking neural network both reinforces the proper paths and reduces training time.
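
A minimal sketch of the cellular learning-automata half of this idea, assuming one learning automaton per grid cell, a reward whenever a move brings the robot closer to the target (Manhattan distance), no update on penalty, and the standard linear reward-inaction (L_R-I) update. The reward rule, the learning rate `a`, and the class and method names are assumptions, and the spiking neural network component is not modeled.

```python
import random

ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right


class GridAutomata:
    """Hypothetical grid of learning automata, one action-probability vector per cell."""

    def __init__(self, n, target, a=0.1, seed=0):
        self.n, self.target, self.a = n, target, a
        self.rng = random.Random(seed)
        self.p = {(r, c): [0.25] * 4 for r in range(n) for c in range(n)}

    def dist(self, cell):
        return abs(cell[0] - self.target[0]) + abs(cell[1] - self.target[1])

    def step(self, cell, action):
        dr, dc = ACTIONS[action]
        r, c = cell[0] + dr, cell[1] + dc
        if 0 <= r < self.n and 0 <= c < self.n:
            return (r, c)
        return cell  # moving off the grid leaves the robot in place

    def reward(self, cell, action):
        # L_R-I: move probability mass toward the rewarded action; probabilities still sum to 1.
        probs = self.p[cell]
        for i in range(4):
            if i == action:
                probs[i] += self.a * (1 - probs[i])
            else:
                probs[i] *= 1 - self.a

    def train(self, episodes=2000):
        starts = [c for c in self.p if c != self.target]
        for _ in range(episodes):
            cell = self.rng.choice(starts)
            for _ in range(4 * self.n * self.n):
                if cell == self.target:
                    break
                action = self.rng.choices(range(4), weights=self.p[cell])[0]
                nxt = self.step(cell, action)
                if self.dist(nxt) < self.dist(cell):  # environment feedback
                    self.reward(cell, action)
                cell = nxt

    def greedy_path(self, start):
        """Follow each cell's most probable action until the target (or a step limit)."""
        path, cell = [start], start
        while cell != self.target and len(path) <= 2 * self.n:
            probs = self.p[cell]
            cell = self.step(cell, probs.index(max(probs)))
            path.append(cell)
        return path
```

Because only distance-reducing moves are ever rewarded, once a cell has been rewarded at least once its most probable action is a move toward the target, so the greedy path shortens with training.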

cache · Layer · Performer · Better · Docker ·

August 31, 2023

Charliecloud's layer-free, Git-based container build cache

Reid Priedhorsky, Jordan Ogas, Claude H. Davis IV, Z. Noah Hounshel, Ashlyn Lee, Benjamin Stormer, R. Shane Goff

from arxiv, 12 pages, 12 figures

A popular approach to deploying scientific applications in high performance computing (HPC) is Linux containers, which package an application and all its dependencies as a single unit. The container image is built by interpreting instructions in a machine-readable recipe, and building is faster with a build cache that stores instruction results for re-use. The standard approach (used, e.g., by Docker and Podman) is a many-layered union filesystem that encodes differences between layers as tar archives. Our experiments show that our layer-free, Git-based cache performs similarly to layered caches on both build time and disk usage, with a considerable advantage for many-instruction recipes. Our approach also has structural advantages: a better diff format, lower cache overhead, and better file de-duplication. These results show that a Git-based cache for layer-free container implementations is not only possible but may outperform the layered approach on important dimensions.
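
The core idea behind an instruction-level build cache can be sketched as follows, assuming each step's cache key is a hash of the previous step's key plus the instruction text, so that editing one instruction invalidates it and every later step. This is a hypothetical illustration, not Charliecloud's code. In a Git-based design the stored states could be commits, and Git's content-addressed blob storage would store identical files only once across states.

```python
import hashlib


class BuildCache:
    """Hypothetical instruction-level build cache (an illustration, not Charliecloud's code)."""

    def __init__(self):
        self.store = {}  # cache key -> image state after running that instruction
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(parent_key, instruction):
        # Chain the parent key with the instruction text: changing one instruction
        # changes its key and the keys of every instruction after it.
        return hashlib.sha256(f"{parent_key}\n{instruction}".encode()).hexdigest()

    def build(self, recipe, execute):
        state_key, state = "root", {}
        for instruction in recipe:
            k = self.key(state_key, instruction)
            if k in self.store:
                self.hits += 1
            else:
                self.misses += 1
                self.store[k] = execute(dict(state), instruction)  # run on a copy
            state_key, state = k, self.store[k]
        return dict(state)  # copy so callers cannot mutate cached states


def toy_execute(state, instruction):
    """Toy interpreter: 'WRITE <file> <content>' sets one file in the image state."""
    _, name, content = instruction.split(maxsplit=2)
    state[name] = content
    return state
```

Rebuilding an unchanged recipe hits the cache at every step; changing only the last instruction reuses every earlier step and re-runs just that one.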

Python · Learning · Processing (programming language) · Machine Learning · Online ·

August 31, 2023

TurboGP: A flexible and advanced python based GP library

Lino Rodriguez-Coayahuitl, Alicia Morales-Reyes, Hugo Jair Escalante

We introduce TurboGP, a Genetic Programming (GP) library written entirely in Python and designed specifically for machine learning tasks. TurboGP implements modern features not available in other GP implementations, such as island and cellular population schemes, different types of genetic operations (migration, protected crossovers), and online learning, among others. TurboGP's most distinctive characteristic is its native support for different types of GP nodes, which allows different levels of abstraction; this makes TurboGP particularly useful for processing a wide variety of data sources.
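
For readers new to tree-based GP, here is a minimal sketch of two building blocks: evaluating an expression tree and subtree crossover. The depth check, which returns a copy of the first parent when the offspring is too deep, is one common safeguard; whether it matches what TurboGP calls a "protected crossover" is an assumption, and none of these names are TurboGP's API. Protected division (returning 1.0 when the divisor is near zero) is a standard GP convention, included for illustration.

```python
import copy
import operator
import random


def protected_div(a, b):
    # Protected division: return 1.0 instead of failing when the divisor is (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0


# name -> (function, arity); terminals are the input "x" or float constants
FUNCTIONS = {"add": (operator.add, 2), "sub": (operator.sub, 2),
             "mul": (operator.mul, 2), "div": (protected_div, 2)}
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, depth):
    """Grow a random tree as nested lists, [name, child, child], or a terminal."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return [name] + [random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name][1])]


def evaluate(tree, x):
    if isinstance(tree, list):
        fn, _ = FUNCTIONS[tree[0]]
        return fn(*(evaluate(child, x) for child in tree[1:]))
    return x if tree == "x" else tree


def depth(tree):
    if not isinstance(tree, list):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def node_paths(tree, prefix=()):
    """Yield the index path of every node (the root is the empty path)."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, prefix + (i,))


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def subtree_crossover(rng, a, b, max_depth=6):
    """Replace a random subtree of a with a random subtree of b; reject overly deep offspring."""
    pa = rng.choice(list(node_paths(a)))
    donor = copy.deepcopy(get_node(b, rng.choice(list(node_paths(b)))))
    if not pa:
        child = donor
    else:
        child = copy.deepcopy(a)
        get_node(child, pa[:-1])[pa[-1]] = donor
    return child if depth(child) <= max_depth else copy.deepcopy(a)
```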

Modality · Origin · CASE · AIM · SimPLe ·

August 31, 2023

Combining swap structures: the case of Paradefinite Ivlev-like modal logics based on FDE

Marcelo E. Coniglio

from arxiv, 32 pages

The aim of this paper is to combine several Ivlev-like modal systems characterized by 4-valued non-deterministic matrices (Nmatrices) with IDM4, a 4-valued expansion of Belnap-Dunn's logic FDE with an implication introduced by Pynko in 1999. To do this, we introduce a new methodology for combining logics characterized by means of swap structures, based on what we call superposition of snapshots. In particular, the combination of IDM4 with Tm, the 4-valued Ivlev version of KT, is analyzed in more detail. From the semantic perspective, the idea is to combine the 4-valued swap structures (Nmatrices) for Tm (and several of its extensions) with the 4-valued twist structure (logical matrix) for IDM4. This superposition produces a universe of 6 snapshots, 3 of which are designated. The multioperators over the new universe are defined by combining the specifications of the given swap and twist structures. This gives rise to 6 different paradefinite Ivlev-like modal logics, each characterized by a 6-valued Nmatrix and conservatively extending both the original modal logic and IDM4. This important feature allows us to regard the proposed construction as a genuine technique for combining logics. In addition, a classicality operator in the sense of logics of evidence and truth (LETs) can be defined in the combined logics. A sound and complete Hilbert-style axiomatization is also presented for the 6 combined systems, together with a very simple Prolog program that implements the swap-structure semantics for the 6 systems and thereby gives a decision procedure for satisfiability, refutability, and validity of formulas in these logics.
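
To make "non-deterministic matrix" concrete, the sketch below decides consequence in a matrix whose connectives return sets of admissible values, which is the defining feature of an Nmatrix; a legal valuation picks one admissible value for every subformula. It covers only the FDE fragment (negation, conjunction, disjunction) with Belnap-Dunn values encoded as (told-true, told-false) pairs, so every set here is a singleton. Pynko's implication, Ivlev's modalities, and the 6-valued combined structures are not encoded, since their tables are not given here. Python is used instead of the paper's Prolog, and all names are hypothetical.

```python
# FDE truth values as (told_true, told_false) pairs: T, B (both), N (neither), F.
VALUES = [(1, 0), (1, 1), (0, 0), (0, 1)]
DESIGNATED = {v for v in VALUES if v[0] == 1}  # T and B are designated


# Connectives return *sets* of admissible values, as in an Nmatrix.
# For FDE each set is a singleton, i.e. the matrix is deterministic.
def neg(a):
    return {(a[1], a[0])}


def conj(a, b):
    return {(min(a[0], b[0]), max(a[1], b[1]))}


def disj(a, b):
    return {(max(a[0], b[0]), min(a[1], b[1]))}


OPS = {"not": neg, "and": conj, "or": disj}


def subformulas(f, acc):
    """Append the subformulas of f to acc in post-order (children first), skipping duplicates."""
    if f[0] != "var":
        for child in f[1:]:
            subformulas(child, acc)
    if f not in acc:
        acc.append(f)
    return acc


def valuations(formulas):
    """Yield every legal valuation (dict: subformula -> value) over the given formulas."""
    subs = []
    for f in formulas:
        subformulas(f, subs)

    def extend(i, val):
        if i == len(subs):
            yield dict(val)
            return
        f = subs[i]
        choices = VALUES if f[0] == "var" else OPS[f[0]](*(val[c] for c in f[1:]))
        for v in choices:  # a genuine Nmatrix may offer several values here
            val[f] = v
            yield from extend(i + 1, val)
        val.pop(f, None)

    yield from extend(0, {})


def entails(premises, conclusion):
    """premises |= conclusion iff every legal valuation designating all premises designates it."""
    for val in valuations(list(premises) + [conclusion]):
        if all(val[p] in DESIGNATED for p in premises) and val[conclusion] not in DESIGNATED:
            return False
    return True
```

Excluded middle fails (take p to be N) and explosion fails (take p to be B), which is the paradefinite behavior the combined logics inherit from FDE.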

Bayesian Inference · Approximation · Log-likelihood · Inference · MoDELS ·

August 31, 2023

Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC

Marko Järvenpää, Jukka Corander

from arxiv, Major revision: Improved writing, some content was reorganised, more consistent notation, some new theoretical insights, extended numerical experiments, redesigned presentation of the numerical results. 55 pages, 20 figures

We present a framework for approximate Bayesian inference when, due to computational constraints, only a limited number of noisy log-likelihood evaluations can be obtained, a situation that is increasingly common in applications of complex models. We model the log-likelihood function with a Gaussian process (GP); the main methodological innovation is to use this model to emulate the progression that an exact Metropolis-Hastings (MH) sampler would take if it were applicable. Informative log-likelihood evaluation locations are selected with a sequential experimental design strategy until, according to the GP model, the MH accept/reject decision can be made with sufficient accuracy. The resulting approximate sampler is conceptually simple and sample-efficient. It is also more robust to violations of GP modelling assumptions than earlier, related "Bayesian optimisation-like" methods tailored for Bayesian inference. We discuss some theoretical aspects and various interpretations of the resulting approximate MH sampler, and demonstrate its benefits in the context of Bayesian and generalised Bayesian likelihood-free inference for simulator-based statistical models.
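
A heavily simplified sketch of the core loop, under several assumptions: a one-dimensional parameter, a flat prior, a zero-mean GP with a fixed RBF kernel, and a toy noisy log-likelihood standing in for an expensive simulator. At each MH step the GP posterior gives the probability that the accept condition f(prop) - f(theta) > log u holds; while that probability is within `eps` of neither 0 nor 1, the sketch evaluates the log-likelihood at whichever of the two points is more uncertain. The paper's sequential design strategy is more sophisticated; all names and settings here are hypothetical.

```python
import math

import numpy as np

rng = np.random.default_rng(0)


def noisy_loglik(theta, noise_sd=0.1):
    """Hypothetical expensive simulator: N(0, 1) log-density (up to a constant) plus noise."""
    return -0.5 * theta**2 + noise_sd * rng.normal()


def rbf(a, b, ell=1.0, sf2=4.0):
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)


def gp_posterior(X, y, Xs, noise_var=0.01):
    """Zero-mean GP posterior mean and covariance of the latent log-likelihood at Xs."""
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf(Xs, Xs) - V.T @ V


def gp_emulated_mh(n_iter=300, step=1.0, eps=0.05, max_extra_evals=5):
    X = list(np.linspace(-3.0, 3.0, 7))  # initial design
    y = [noisy_loglik(x) for x in X]
    theta, chain = 0.0, []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        log_u = math.log(1.0 - rng.uniform())  # 1 - u lies in (0, 1]
        extra = 0
        while True:
            m, C = gp_posterior(np.array(X), np.array(y), np.array([theta, prop]))
            diff_mean = m[1] - m[0]
            diff_sd = math.sqrt(max(C[0, 0] + C[1, 1] - 2 * C[0, 1], 1e-12))
            # P(accept) = P(f(prop) - f(theta) > log u) under the GP posterior.
            p_acc = 0.5 * (1 + math.erf((diff_mean - log_u) / (diff_sd * math.sqrt(2))))
            if min(p_acc, 1 - p_acc) < eps or extra >= max_extra_evals:
                break
            # Decision too uncertain: evaluate where the GP is least certain.
            x_new = theta if C[0, 0] > C[1, 1] else prop
            X.append(x_new)
            y.append(noisy_loglik(x_new))
            extra += 1
        if p_acc > 0.5:
            theta = prop
        chain.append(theta)
    return np.array(chain), len(X)
```

Evaluations are spent only when the accept/reject decision is genuinely ambiguous under the GP, which is where the sample efficiency comes from.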

SQL · Prompt · Similarity · MoDELS · Examples ·

August 31, 2023

Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval

Chunxi Guo, Zhiliang Tian, Jintao Tang, Pancheng Wang, Zhihua Wen, Kang Yang, Ting Wang

Text-to-SQL is the task of converting a natural language question into a Structured Query Language (SQL) query that retrieves information from a database. Large language models (LLMs) perform well on natural language generation tasks, but they are not specifically pre-trained to understand the syntax and semantics of SQL commands. In this paper, we propose an LLM-based framework for Text-to-SQL that retrieves helpful demonstration examples to prompt LLMs. However, questions over different database schemas can vary widely, even when their underlying intentions are similar and the corresponding SQL queries are alike. Consequently, it is crucial to identify appropriate SQL demonstrations. We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity. We also model the relationships between question tokens and database schema items (i.e., tables and columns) to filter out schema-related information. Our framework adapts the range of the database schema included in prompts to balance prompt length against useful information. A fallback mechanism provides a more detailed schema if the generated SQL query fails. Our framework outperforms state-of-the-art models and demonstrates strong generalization ability on three cross-domain Text-to-SQL benchmarks.
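
A minimal sketch of skeleton-based demonstration retrieval, assuming the schema mentions in the question are already known (passed in as `schema_terms`) and that numbers are the only literal values to mask. The paper additionally models question-token/schema relations; here masking is plain string matching, and similarity is `difflib.SequenceMatcher` over skeleton tokens, both chosen for illustration only.

```python
import re
from difflib import SequenceMatcher


def skeleton(question, schema_terms):
    """Mask schema mentions and numeric literals to expose the question's structure."""
    out = []
    for tok in re.findall(r"\w+|[^\w\s]", question.lower()):
        if tok in schema_terms:
            out.append("[schema]")
        elif tok.isdigit():
            out.append("[value]")
        else:
            out.append(tok)
    return out


def retrieve(question, schema_terms, pool, k=1):
    """pool: list of (skeleton_tokens, sql) demonstrations; return the k most similar."""
    q = skeleton(question, schema_terms)
    ranked = sorted(pool, key=lambda ex: SequenceMatcher(None, q, ex[0]).ratio(), reverse=True)
    return ranked[:k]


pool = [
    (skeleton("How many students are older than 20?", {"students"}),
     "SELECT count(*) FROM students WHERE age > 20"),
    (skeleton("List the names of all teachers.", {"names", "teachers"}),
     "SELECT name FROM teachers"),
]
```

A question about singers retrieves the students example even though the two share no schema, because their skeletons are identical once tables and values are masked.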

Networking · Inversion · Normalizing · Reducible · MoDELS ·

August 31, 2023

Invertible normalizing flow neural networks by JKO scheme

Chen Xu, Xiuyuan Cheng, Yao Xie

Normalizing flows are a class of deep generative models for efficient sampling and density estimation. In practice, a flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. This paper develops a neural ODE flow network inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or running inner loops of score matching or variational learning. Because the JKO scheme unfolds the dynamics of the gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and the difficulty of end-to-end training of deep flow networks. We also develop an adaptive time reparameterization of the flow network with progressive refinement of the trajectory in probability space, which improves training efficiency and accuracy in practice. Using numerical experiments on synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples than existing flow and diffusion models, at significantly reduced computational and memory cost.
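
For reference, the JKO scheme takes one implicit step, of step size h, of the Wasserstein gradient flow of a free energy F. Assuming a standard Gaussian target, so that V(x) = |x|^2 / 2 and F is the KL divergence to N(0, I) up to a constant, restricting the next density to the pushforward of the current one by an invertible map T, and replacing the squared Wasserstein distance by the transport cost of T (an upper bound), gives a per-block objective that needs only samples from the current density. That is what makes block-by-block training possible; the notation below is ours, not necessarily the paper's.

```latex
\rho_{k+1} = \arg\min_{\rho} \; F(\rho) + \frac{1}{2h} W_2^2(\rho, \rho_k),
\qquad F(\rho) = \int V \, d\rho + \int \rho \log \rho .

% With \rho = T_\# \rho_k and T invertible:
\int \rho \log \rho = \int \rho_k \log \rho_k - \mathbb{E}_{x \sim \rho_k} \log \lvert \det \nabla T(x) \rvert,
\qquad W_2^2(T_\# \rho_k, \rho_k) \le \mathbb{E}_{x \sim \rho_k} \lVert T(x) - x \rVert^2 .

% Dropping the term that does not depend on T gives the block objective:
T_{k+1} = \arg\min_{T} \; \mathbb{E}_{x \sim \rho_k} \Big[ V(T(x)) - \log \lvert \det \nabla T(x) \rvert + \frac{1}{2h} \lVert T(x) - x \rVert^2 \Big].
```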

Performer · Prompt · tuning · MoDELS · Image Classification ·

August 31, 2023

Federated Adaptive Prompt Tuning for Multi-domain Collaborative Learning

Shangchao Su, Mingzhao Yang, Bin Li, Xiangyang Xue

Federated learning (FL) enables multiple clients to collaboratively train a global model without disclosing their data. Previous work often requires training all model parameters. However, powerful pre-trained models make it possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models such as CLIP. Compared with direct federated prompt tuning, our core idea is to adaptively unlock domain-specific knowledge for each test sample in order to provide it with a personalized prompt. To implement this idea, we design an adaptive prompt tuning module consisting of a meta prompt, an adaptive network, and a set of keys. The server randomly generates a set of keys and assigns a unique key to each client. All clients then cooperatively train the global adaptive network and meta prompt using their local datasets and the frozen keys. Ultimately, the global aggregated model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets in two different settings: supervised and unsupervised. The results show that FedAPT achieves better performance with less than 10% of the parameters of the fully trained model, and that the global model performs well across diverse client domains simultaneously.
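
The abstract states that clients jointly train the global adaptive network and meta prompt while the keys stay frozen. A minimal sketch of the server-side aggregation step, assuming standard size-weighted FedAvg applied only to the trainable prompt parameters; the parameter names `meta_prompt`, `adaptive_net`, and `keys` and the function name are hypothetical.

```python
import numpy as np


def fedavg_prompt_params(client_updates, client_sizes, frozen=("keys",)):
    """Size-weighted FedAvg over trainable prompt parameters; frozen entries are not aggregated.

    client_updates: list of dicts mapping parameter name -> np.ndarray (same names per client).
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    names = [n for n in client_updates[0] if n not in frozen]
    return {
        n: sum(size / total * upd[n] for upd, size in zip(client_updates, client_sizes))
        for n in names
    }


u1 = {"meta_prompt": np.ones((2, 3)), "adaptive_net": np.full(4, 2.0), "keys": np.zeros(5)}
u2 = {"meta_prompt": np.zeros((2, 3)), "adaptive_net": np.full(4, 6.0), "keys": np.ones(5)}
agg = fedavg_prompt_params([u1, u2], client_sizes=[3, 1])
```

Only the small prompt-side tensors travel to the server and back, which is consistent with the abstract's point that FL can work with far fewer learnable parameters than full-model training.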

Image Classification · Feedforward Network · INTERACT · Networking · Feedforward ·

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
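
A NumPy sketch of one residual block as described above: (i) a linear layer across patches that applies the same N x N map to every channel, then (ii) a two-layer MLP across channels applied to each patch independently. The per-channel affine normalization and the per-channel residual scales `ls1`/`ls2` follow our reading of the ResMLP design and should be treated as assumptions, as should the parameter names and the tanh approximation of GELU.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))


def affine(x, alpha, beta):
    # Per-channel affine transform, used here in place of LayerNorm.
    return alpha * x + beta


def resmlp_block(x, p):
    """x: (N patches, C channels); p: dict of parameters (shapes noted in comments)."""
    # (i) Cross-patch linear layer: the same (N, N) matrix mixes patches for every channel.
    z = affine(x, p["alpha1"], p["beta1"])                            # alpha1, beta1: (C,)
    x = x + p["ls1"] * (p["W_patch"] @ z + p["b_patch"][:, None])    # W_patch: (N, N)
    # (ii) Two-layer MLP over channels, applied to each patch independently.
    z = affine(x, p["alpha2"], p["beta2"])
    h = gelu(z @ p["W1"] + p["b1"])                                   # W1: (C, hidden)
    return x + p["ls2"] * (h @ p["W2"] + p["b2"])                     # W2: (hidden, C)


rng = np.random.default_rng(0)
N, C, HID = 4, 3, 6
params = {
    "alpha1": np.ones(C), "beta1": np.zeros(C), "ls1": np.full(C, 0.1),
    "W_patch": rng.normal(size=(N, N)), "b_patch": np.zeros(N),
    "alpha2": np.ones(C), "beta2": np.zeros(C), "ls2": np.full(C, 0.1),
    "W1": rng.normal(size=(C, HID)), "b1": np.zeros(HID),
    "W2": rng.normal(size=(HID, C)), "b2": np.zeros(C),
}
x = rng.normal(size=(N, C))
out = resmlp_block(x, params)
```

The block preserves the (patches, channels) shape, so blocks can be stacked; no step mixes patch and channel dimensions at the same time, which is what keeps the architecture simple.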