Retrieving answers quickly, cheaply, and without hallucinations from a combination of structured and unstructured data using language models is a major hurdle, and it is what prevents the use of language models in knowledge retrieval automation. The problem is accentuated when one wants to integrate a speech interface on top of a text-based knowledge retrieval system. Moreover, for commercial search and chatbot applications, complete reliance on commercial large language models (LLMs) such as GPT-3.5 can be very costly. In the present study, the authors address this problem by first developing a keyword-based search framework that augments discovery of the context, within the document, to be provided to the LLM. The keywords are generated by a relatively smaller LLM and cached for comparison with keywords generated by the same smaller LLM for the query raised. This significantly reduces the time and cost of finding the context within documents. Once the context is set, a larger LLM uses it to provide answers based on a prompt tailored for Q&A. This research work demonstrates that using keywords in context identification reduces the overall inference time and cost of information retrieval. Given this reduction, a speech-based interface for user input and response readout was integrated with the keyword-augmented retrieval framework, allowing seamless interaction with the language model.
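
The keyword caching and matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_extract` is a hypothetical stand-in for the smaller LLM, and ranking by raw keyword overlap is an assumed scoring rule that the abstract does not specify.

```python
from typing import Callable, Dict, List, Set


def build_keyword_cache(
    chunks: List[str], extract: Callable[[str], Set[str]]
) -> Dict[int, Set[str]]:
    """Run the keyword extractor once per document chunk and cache the result."""
    return {i: extract(chunk) for i, chunk in enumerate(chunks)}


def select_context(
    query: str,
    chunks: List[str],
    cache: Dict[int, Set[str]],
    extract: Callable[[str], Set[str]],
    top_k: int = 1,
) -> List[str]:
    """Return the top_k chunks whose cached keywords overlap most with the query's."""
    query_keywords = extract(query)
    ranked = sorted(
        cache,
        key=lambda i: len(cache[i] & query_keywords),  # assumed scoring rule
        reverse=True,
    )
    return [chunks[i] for i in ranked[:top_k]]


def toy_extract(text: str) -> Set[str]:
    """Hypothetical stand-in for the smaller LLM: lowercase words longer than 3 letters."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


chunks = [
    "Refunds are processed within five business days.",
    "Shipping to Europe takes about two weeks.",
]
cache = build_keyword_cache(chunks, toy_extract)
context = select_context("How long does shipping take?", chunks, cache, toy_extract)
```

Only the selected `context` would then be passed to the larger LLM, so the expensive model sees a short prompt and the chunk keywords are computed once rather than per query.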

Related content

December 15, 2023

Stochastic interpolants with data-dependent couplings

Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, Eric Vanden-Eijnden

Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities, whereby samples from the base are computed conditionally given samples from the target in a way that is different from (but does not preclude) incorporating information about class labels or continuous embeddings. This enables us to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.

December 15, 2023

Proofs as stateful programs: A first-order logic with abstract Hoare triples, and an interpretation into an imperative language

Thomas Powell

from arxiv, 32 pages

We introduce an extension of first-order logic that comes equipped with additional predicates for reasoning about an abstract state. Sequents in the logic comprise a main formula together with pre- and postconditions in the style of Hoare logic, and the axioms and rules of the logic ensure that the assertions about the state compose in the correct way. The main result of the paper is a realizability interpretation of our logic that extracts programs into a mixed functional/imperative language. All programs expressible in this language act on the state in a sequential manner, and we make this intuition precise by interpreting them in a semantic metatheory using the state monad. Our basic framework is very general, and our intention is that it can be instantiated and extended in a variety of different ways. We outline in detail one such extension: A monadic version of Heyting arithmetic with a wellfounded while rule, and conclude by outlining several other directions for future work.

December 15, 2023

Inferring Causality from Time Series data based on Structural Causal Model and its application to Neural Connectomics

Rahul Biswas, SuryaNarayana Sripada, Somabha Mukherjee

Inferring causation from time series data is of scientific interest in different disciplines, particularly in neural connectomics. While different approaches exist in the literature with parametric modeling assumptions, we focus on a non-parametric model for time series satisfying a Markovian structural causal model with stationary distribution and without concurrent effects. We show that the model structure can be used to its advantage to obtain an elegant algorithm for causal inference from time series based on conditional dependence tests, coined the Causal Inference in Time Series (CITS) algorithm. We describe Pearson's partial correlation and the Hilbert-Schmidt criterion as candidates for such conditional dependence tests that can be used in CITS for the Gaussian and non-Gaussian settings, respectively. We prove the mathematical guarantee of the CITS algorithm in recovering the true causal graph, under standard mixing conditions on the underlying time series. We also conduct a comparative evaluation of the performance of CITS against other existing methodologies on simulated datasets. We then describe the utility of the methodology in neural connectomics -- in inferring causal functional connectivity from time series of neural activity -- and demonstrate its application to a real neurobiological dataset of electrophysiological recordings from the mouse visual cortex recorded by Neuropixel probes.
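
As a small illustration of the kind of conditional dependence test the abstract mentions for the Gaussian setting, the sketch below computes Pearson's partial correlation of two variables given a single conditioning variable. It is not the CITS algorithm itself: the function names and the toy data are assumptions made for illustration, and a real test would also need a significance threshold and lagged conditioning sets.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(
    x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> float:
    """Partial correlation of x and y given one conditioning variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Toy data: z drives both x and y; e1 and e2 are orthogonal noise terms.
z = [-1, -1, -1, -1, 1, 1, 1, 1]
e1 = [-1, -1, 1, 1, -1, -1, 1, 1]
e2 = [-1, 1, -1, 1, -1, 1, -1, 1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

marginal = pearson(x, y)             # x and y look correlated
conditional = partial_corr(x, y, z)  # the dependence vanishes once z is given
```

Here `x` and `y` are marginally correlated only through the common driver `z`, so conditioning on `z` removes the association, which is the pattern such a test uses to reject a direct edge.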

December 15, 2023

Is ChatGPT a game changer for geocoding -- a benchmark for geocoding address parsing techniques

Zhengcong Yin, Diya Li, Daniel W. Goldberg

The remarkable success of GPT models across various tasks, including toponymy recognition, motivates us to assess the performance of the GPT-3 model in the geocoding address parsing task. To ensure that the evaluation more accurately mirrors performance in real-world scenarios with diverse user input qualities, and to resolve the pressing need for a 'gold standard' evaluation dataset for geocoding systems, we introduce a benchmark dataset of low-quality address descriptions synthesized from human input patterns mined from actual input logs of a geocoding system in production. This dataset has 21 different input errors and variations; contains over 239,000 address records uniquely selected from streets across all 50 U.S. states and D.C.; and consists of three subsets to be used as training, validation, and testing sets. Building on this, we train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models. The evaluation results indicate that the Bidirectional LSTM-CRF model achieves the best performance, ahead of the transformer-based models and the GPT-3 model. Transformer-based models demonstrate results very comparable to those of the Bidirectional LSTM-CRF model. The GPT-3 model, though trailing in performance, shows potential in the address parsing task with few-shot examples, with room for improvement through additional fine-tuning. We open-source the code and data of this benchmark so that researchers can use it for future model development or extend it to evaluate similar tasks, such as document geocoding.

December 15, 2023

Effective and Imperceptible Adversarial Textual Attack via Multi-objectivization

Shengcai Liu, Ning Lu, Wenjing Hong, Chao Qian, Ke Tang

The field of adversarial textual attack has grown significantly over the last few years, where the commonly considered objective is to craft adversarial examples (AEs) that can successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out by previous studies. As a consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written text, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address this issue. Specifically, we reformulate the problem of crafting AEs as a multi-objective optimization problem, where attack imperceptibility is considered as an auxiliary objective. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings. Exhaustive experiments involving 44237 instances demonstrate that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than recently proposed attack approaches. A human evaluation study also shows that the AEs crafted by HydraText are harder to distinguish from human-written text. Finally, these AEs exhibit good transferability and can bring notable robustness improvement to the target model through adversarial training.
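
To make the multi-objective framing concrete, the sketch below filters a candidate set down to its Pareto front under two objectives that are both maximized. It illustrates Pareto dominance only and is not HydraText: the candidate tuples, the `attack_score` and `similarity` fields, and the numeric values are hypothetical.

```python
from typing import List, Tuple

# (label, attack_score, similarity): both objectives are to be maximized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    ("a", 0.9, 0.6),
    ("b", 0.7, 0.9),
    ("c", 0.6, 0.5),
    ("d", 0.9, 0.8),
]
front = pareto_front(candidates)
```

Candidates `b` and `d` survive because neither beats the other on both objectives at once, which is the trade-off a multi-objective search keeps open instead of collapsing everything into attack success alone.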

December 14, 2023

TinyGSM: achieving >80% on GSM8k with small language models

Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

Small-scale models offer various computational advantages, and yet the extent to which size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break the 80% barrier on the GSM8K benchmark remains 34B. Our work studies how high-quality datasets may be the key for small language models to acquire mathematical reasoning. We introduce TinyGSM, a synthetic dataset of 12.3M grade school math problems paired with Python solutions, generated fully by GPT-3.5. After finetuning on TinyGSM, we find that a duo of a 1.3B generation model and a 1.3B verifier model can achieve 81.5% accuracy, outperforming existing models that are orders of magnitude larger. This also rivals the performance of the GPT-3.5 "teacher" model (77.4%), from which our model's training data is generated. Our approach is simple and has two key components: 1) the high-quality dataset TinyGSM, 2) the use of a verifier, which selects the final outputs from multiple candidate generations.
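
The verifier component, which picks a final output from several candidate generations, can be sketched as a simple best-of-n selection. This is an assumed minimal form: `toy_verifier` and its fixed score table are hypothetical stand-ins for a learned verifier model, and the abstract does not state how scores are produced or aggregated.

```python
from typing import Callable, Sequence


def select_with_verifier(
    question: str,
    candidates: Sequence[str],
    verifier_score: Callable[[str, str], float],
) -> str:
    """Return the candidate that the verifier scores highest for this question."""
    if not candidates:
        raise ValueError("need at least one candidate generation")
    return max(candidates, key=lambda c: verifier_score(question, c))


# Hypothetical stand-in for a learned verifier: a fixed score table.
TOY_SCORES = {
    "result = 3 * 4 + 2": 0.91,
    "result = 3 + 4 * 2": 0.12,
    "result = 3 * (4 + 2)": 0.35,
}


def toy_verifier(question: str, candidate: str) -> float:
    """Look up a made-up score; unknown candidates score 0."""
    return TOY_SCORES.get(candidate, 0.0)


best = select_with_verifier(
    "Three boxes hold 4 pens each, plus 2 loose pens. How many pens?",
    list(TOY_SCORES),
    toy_verifier,
)
```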

December 14, 2023

Exploration of visual prompt in Grounded pre-trained open-set detection

Qibo Chen, Weizhong Jin, Shuchang Li, Mengdi Liu, Li Yu, Jian Jiang, Xiaozheng Wang

from arxiv, Accepted at ICASSP 2024

Text prompts are crucial for generalizing pre-trained open-set object detection models to new categories. However, current methods for text prompts are limited as they require manual feedback when generalizing to new categories, which restricts their ability to model complex scenes, often leading to incorrect detection results. To address this limitation, we propose a novel visual prompt method that learns new category knowledge from a few labeled images, which generalizes the pre-trained detection model to the new category. To allow visual prompts to represent new categories adequately, we propose a statistical-based prompt construction module that is not limited by predefined vocabulary lengths, thus allowing more vectors to be used when representing categories. We further utilize the category dictionaries in the pre-training dataset to design task-specific similarity dictionaries, which make visual prompts more discriminative. We evaluate the method on the ODinW dataset and show that it outperforms existing prompt learning methods and performs more consistently in combinatorial inference.

December 14, 2023

The application of accumulation tests in Peaks-Over-Threshold modeling with Norwegian Fire insurance Data

Bowen Liu, Malwane M. A. Ananda

from arxiv, 19 pages, 4 figures, 11 tables

Modeling excesses remains an important topic in insurance data modeling. Among the alternatives for modeling excesses, the Peaks Over Threshold (POT) framework with the Generalized Pareto distribution (GPD) is regarded as an efficient approach due to its flexibility. However, the selection of an appropriate threshold for such a framework is a major difficulty. To address this difficulty, we applied several accumulation tests along with the Anderson-Darling test to determine an optimal threshold. Based on the selected thresholds, the fitted GPD and the estimated quantiles can be obtained. We applied the procedure to the well-known Norwegian Fire Insurance data and constructed confidence intervals for the Value-at-Risk (VaR). The accumulation test approach provides satisfactory performance in modeling the high quantiles of the Norwegian Fire Insurance data compared to previous graphical methods.
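
Once a threshold has been chosen and a GPD fitted to the excesses, a high quantile such as VaR follows from the standard POT tail estimator. The sketch below evaluates that closed form for given parameters; it does not perform threshold selection or fitting, and the parameter values in the example are made up rather than taken from the Norwegian Fire Insurance data.

```python
import math


def pot_var(
    p: float,
    threshold: float,
    scale: float,
    shape: float,
    n_total: int,
    n_exceed: int,
) -> float:
    """VaR at level p from a GPD fitted to excesses over `threshold`.

    Uses the POT tail estimator
        P(X > x) = (n_exceed / n_total) * (1 + shape * (x - threshold) / scale) ** (-1 / shape)
    and solves P(X > x) = 1 - p for x. Meaningful for p > 1 - n_exceed / n_total.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tail = (n_total / n_exceed) * (1.0 - p)
    if abs(shape) < 1e-12:
        # Exponential limit of the GPD as the shape parameter tends to 0.
        return threshold - scale * math.log(tail)
    return threshold + (scale / shape) * (tail ** (-shape) - 1.0)


# Made-up parameters: 100 of 1000 observations exceed the threshold 10.
var_99 = pot_var(0.99, threshold=10.0, scale=2.0, shape=0.5,
                 n_total=1000, n_exceed=100)
var_99_exponential = pot_var(0.99, threshold=10.0, scale=2.0, shape=0.0,
                             n_total=1000, n_exceed=100)
```

The heavier tail (`shape=0.5`) gives a noticeably larger 99% VaR than the exponential limit with the same scale, which is why the threshold and the fitted shape matter so much for high quantiles.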

December 13, 2023

ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models

Jie Yan, Jing Liu, Zhong-yuan Zhang

Variational autoencoder (VAE) and generative adversarial networks (GAN) have found widespread applications in clustering and have achieved significant success. However, the potential of these approaches may be limited due to VAE's mediocre generation capability or GAN's well-known instability during adversarial training. In contrast, denoising diffusion probabilistic models (DDPMs) represent a new and promising class of generative models that may unlock fresh dimensions in clustering. In this study, we introduce an innovative expectation-maximization (EM) framework for clustering using DDPMs. In the E-step, we aim to derive a mixture of Gaussian priors for the subsequent M-step. In the M-step, our focus lies in learning clustering-friendly latent representations for the data by employing the conditional DDPM and matching the distribution of latent representations to the mixture of Gaussian priors. We present a rigorous theoretical analysis of the optimization process in the M-step, proving that the optimizations are equivalent to maximizing the lower bound of the Q function within the vanilla EM framework under certain constraints. Comprehensive experiments validate the advantages of the proposed framework, showcasing superior performance in clustering, unsupervised conditional generation and latent representation learning.

December 13, 2023

An a posteriori error estimate for a 0D/2D coupled model

Hussein Albazzal, Alexei Lozinski, Roberta Tittarelli

This work is motivated by the need for efficient numerical simulations of gas flows in the serpentine channels used in proton-exchange membrane fuel cells. In particular, we consider the Poisson problem in a 2D domain composed of several long straight rectangular sections and several bends (corners). In order to speed up the resolution, we propose a 0D model in the rectangular parts of the channel and a Finite Element resolution in the bends. To find a good compromise between precision and computing time, the challenge is twofold: how to choose a suitable position for the interface between the 0D and the 2D models, and how to control the discretization error in the bends. We present an a posteriori error estimator based on an equilibrated flux reconstruction in the subdomains where the Finite Element method is applied. The estimates give a global upper bound on the error, measured in the energy norm of the difference between the exact and approximate solutions on the whole domain. They are guaranteed, meaning that they feature no undetermined constants. Global lower bounds for the error are also derived. An adaptive algorithm is proposed that uses the estimator to address the aforementioned twofold challenge. A numerical validation of the estimator and the algorithm completes the work.