Researchers and practitioners operating on a limited budget face the cost-performance trade-off dilemma. The challenging decision often centers on whether to use a large LLM with better performance or a smaller one with reduced costs. This has motivated recent research in the optimisation of LLM calls. Either a cascading strategy is used, where a smaller LLM or both are called sequentially, or a routing strategy is used, where only one model is ever called. Both scenarios are dependent on a decision criterion which is typically implemented by an extra neural model. In this work, we propose a simpler solution; we use only the uncertainty of the generations of the small LLM as the decision criterion. We compare our approach with both cascading and routing strategies using three different pairs of pre-trained small and large LLMs, on nine different tasks and against approaches that require an additional neural model. Our experiments reveal this simple solution optimally balances cost and performance, outperforming existing methods on 25 out of 27 experimental setups.
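
The decision rule described in this abstract can be sketched as a threshold on the small model's uncertainty. The abstract does not say which uncertainty measure is used, so the sketch below assumes mean per-token entropy. The function names, the two model callables, and the threshold are hypothetical and only illustrate the cascading idea.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical interface: the small model returns its answer together with
# one probability distribution per generated token.
SmallModel = Callable[[str], Tuple[str, List[List[float]]]]
LargeModel = Callable[[str], str]


def mean_token_entropy(token_distributions: List[List[float]]) -> float:
    """Average Shannon entropy (in nats) over per-token distributions."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)


def cascade(
    prompt: str,
    small_model: SmallModel,
    large_model: LargeModel,
    threshold: float,
) -> Tuple[str, str]:
    """Call the small model first; fall back to the large model only when
    the small model's uncertainty exceeds the (assumed) threshold."""
    answer, distributions = small_model(prompt)
    if mean_token_entropy(distributions) <= threshold:
        return answer, "small"
    return large_model(prompt), "large"
```

In this sketch the threshold is the knob that trades cost against performance: raising it keeps more queries on the small model, lowering it sends more of them to the large one.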

Related content

Large language models

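Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural-language text but also understand its meaning in depth and handle a range of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking, with parameter counts rising from just over a billion at the outset to one trillion today. Larger parameter counts let models capture the subtleties of human language more finely and understand its complexity more deeply. Over the past year, large language models have improved markedly at absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, it will keep extending its range of applications, providing more intelligent and personalized services and further improving how people live and work.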

Task-oriented dialogue systems · DST (Digital Sky Technologies) · Diversity · MoDELS · Datasets

June 13, 2024

Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking

James D. Finch, Jinho D. Choi

We demonstrate substantial performance gains in zero-shot dialogue state tracking (DST) by enhancing training data diversity through synthetic data generation. Existing DST datasets are severely limited in the number of application domains and slot types they cover due to the high costs of data collection, restricting their adaptability to new domains. This work addresses this challenge with a novel, fully automatic data generation approach that creates synthetic zero-shot DST datasets. Distinguished from previous methods, our approach can generate dialogues across a massive range of application domains, complete with silver-standard dialogue state annotations and slot descriptions. This technique is used to create the D0T dataset for training zero-shot DST models, encompassing an unprecedented 1,000+ domains. Experiments on the MultiWOZ benchmark show that training models on diverse synthetic data improves Joint Goal Accuracy by 6.7%, achieving results competitive with models 13.5 times larger than ours.
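
The headline metric in this abstract, Joint Goal Accuracy, is commonly defined as the fraction of dialogue turns whose predicted state matches the gold state on every slot. The sketch below follows that common definition. The abstract does not spell out the exact evaluation protocol, so the representation of a state as a slot-to-value dictionary and the function name are assumptions.

```python
from typing import Dict, List

# Assumed representation: a dialogue state maps slot names to values.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns where the whole predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        raise ValueError("need at least one turn")
    # A turn counts only if every slot matches; one wrong slot fails the turn.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Because a single wrong slot fails the whole turn, the metric is strict, which is why gains of a few points are usually treated as meaningful.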

Optimizer · Performer · Identifiable · INFORMS · Estimation/Estimator

June 13, 2024

Covariate Selection for Optimizing Balance with Covariate-Adjusted Response-Adaptive Randomization

Ziqing Guo, Yang Liu, Lucy Xia

from arxiv, 54 pages, 4 figures

Balancing influential covariates is crucial for valid treatment comparisons in clinical studies. While covariate-adaptive randomization is commonly used to achieve balance, its performance can be inadequate when the number of baseline covariates is large. It is therefore essential to identify the influential factors associated with the outcome and ensure balance among these critical covariates. In this article, we propose a novel covariate-adjusted response-adaptive randomization that integrates the patients' responses and covariate information to sequentially select significant covariates and maintain their balance. We establish theoretically the consistency of our covariate selection method and demonstrate that the improved covariate balancing, as evidenced by a faster convergence rate of the imbalance measure, leads to higher efficiency in estimating treatment effects. Furthermore, we provide extensive numerical and empirical studies to illustrate the benefits of our proposed method across various settings.

Inference · Model evaluation · Speech recognition · Automatic speech recognition · MoDELS

June 13, 2024

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.
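
The character error rate quoted in this abstract is conventionally the character-level edit distance between hypothesis and reference, divided by the reference length. The sketch below implements that conventional definition. It is not the authors' evaluation script, and the function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 4.26% corresponds to roughly four character edits per hundred reference characters.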

Latent · End-to-end · MoDELS · Supervision · Learning

June 12, 2024

Enhancing End-to-End Autonomous Driving with Latent World Model

Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the actually observed features in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.

Analysis · INFORMS · Identifiable · Branch · CRAFT

June 12, 2024

Generator-Based Fuzzers with Type-Based Targeted Mutation

Soha Hussein, Stephen McCamant, Mike Whalen

from arxiv, Fixing rendering of figure

As with any fuzzer, directing Generator-Based Fuzzers (GBF) to reach particular code targets can increase the fuzzer's effectiveness. In previous work, coverage-guided fuzzers used a mix of static analysis, taint analysis, and constraint-solving approaches to address this problem. However, none of these techniques were particularly crafted for GBF where input generators are used to construct program inputs. The observation is that input generators carry information about the input structure that is naturally present through the typing composition of the program input. In this paper, we introduce a type-based mutation heuristic, along with constant string lookup, for Java GBF. Our key intuition is that if one can identify which sub-part (types) of the input will likely influence the branching decision, then focusing on mutating the choices of the generators constructing these types is likely to achieve the desired coverage. We used our technique to fuzz AWSLambda applications. Results compared to a baseline GBF tool show an almost 20% average improvement in application coverage, and larger improvements when third-party code is included.

Estimation/Estimator · 3D · Cluster · Reviewer · LIDAR

June 11, 2024

Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation

Porter Jenkins, Michael Selander, J. Stockton Jenkins, Andrew Merrill, Kyle Armstrong

from arxiv, Accepted to KDD 2024

Product assortment selection is a critical challenge facing physical retailers. Effectively aligning inventory with the preferences of shoppers can increase sales and decrease out-of-stocks. However, in real-world settings the problem is challenging due to the combinatorial explosion of product assortment possibilities. Consumer preferences are typically heterogeneous across space and time, making inventory-preference alignment challenging. Additionally, existing strategies rely on syndicated data, which tends to be aggregated and low resolution and to suffer from high latency. To solve these challenges we introduce a real-time recommendation system, which we call \ours. Our system utilizes recent advances in 3D computer vision for perception and automatic, fine-grained sales estimation. These perceptual components run on the edge of the network and facilitate real-time reward signals. Additionally, we develop a Bayesian payoff model to account for noisy estimates from 3D LIDAR data. We rely on spatial clustering to allow the system to adapt to heterogeneous consumer preferences, and a graph-based candidate generation algorithm to address the combinatorial search problem. We test our system in real-world stores across two 6-8 week A/B tests with beverage products and demonstrate a 35% and 27% increase in sales, respectively. Finally, we monitor the deployed system for a period of 28 weeks with an observational study and show a 9.4% increase in sales.

Estimation/Estimator · Estimation error · Variance · Controller · MoDELS

June 10, 2024

Model-Free Error Assessment for Breadth-First Studies, with Applications to Cell-Perturbation Experiments

Jackson Loper, Jeffrey Regier

from arxiv, Submitted to JASA

With the advent of high-throughput screenings, it has become increasingly common for studies to devote limited resources to estimating many parameters imprecisely rather than to estimating a few parameters well. In these studies, only two or three independent replicates measure each parameter, and therefore it is challenging to assess the variance of these measurements. One solution is to pool variance estimates across different parameters using a parametric model of estimator error. However, such models are difficult to specify correctly, especially in the presence of "batch effects." In this paper, we propose new model-free methods for assessing and controlling estimator error. Our focus is on type S error, which is of particular importance in many settings. To produce tight confidence intervals without making unrealistic assumptions, we improve on Hoeffding's bounds for sums of bounded random variables and obtain the tightest possible Chernoff-Cramér bound. Our methods compare favorably with existing practice for high-throughput screenings, such as methods based on the Irreproducible Discovery Rate (IDR) and the Benjamini-Hochberg procedure. Existing practices fail to control error at the nominal level in some cases and are needlessly conservative in others.
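
For context on the baseline this abstract improves on, the classical two-sided Hoeffding inequality for the mean of n independent variables bounded in [a, b] gives a confidence-interval half-width of (b - a) * sqrt(ln(2 / delta) / (2n)). The sketch below computes only that classical bound, not the authors' tightened Chernoff-Cramér bound. The function name and the default range are assumptions.

```python
import math


def hoeffding_half_width(
    n: int, delta: float, lower: float = 0.0, upper: float = 1.0
) -> float:
    """Half-width t such that P(|sample mean - true mean| >= t) <= delta,
    by the classical two-sided Hoeffding inequality for n independent
    variables bounded in [lower, upper]."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    return (upper - lower) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

With three replicates in [0, 1] and delta = 0.05, the half-width is about 0.78, which covers most of the range. That illustrates why interval tightness matters so much when each parameter has only two or three replicates.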

Learning · Reinforcement learning · Robots · Automator · Performer

June 10, 2024

Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery

Paul Maria Scheikl, Eleonora Tagliabue, Balázs Gyenes, Martin Wagner, Diego Dall'Alba, Paolo Fiorini, Franziska Mathis-Ullrich

Automation holds the potential to assist surgeons in robotic interventions, shifting their mental workload from visuomotor control to high-level decision making. Reinforcement learning has shown promising results in learning complex visuomotor policies, especially in simulation environments where many samples can be collected at low cost. A core challenge is learning policies in simulation that can be deployed in the real world, thereby overcoming the sim-to-real gap. In this work, we bridge the visual sim-to-real gap with an image-based reinforcement learning pipeline based on pixel-level domain adaptation and demonstrate its effectiveness on an image-based task in deformable object manipulation. We choose a tissue retraction task because of its importance in the clinical reality of precise cancer surgery. After training in simulation on domain-translated images, our policy requires no retraining to perform tissue retraction with a 50% success rate on the real robotic system using raw RGB images. Furthermore, our sim-to-real transfer method makes no assumptions on the task itself and requires no paired images. This work introduces the first successful application of visual sim-to-real transfer for robotic manipulation of deformable objects in the surgical field, which represents a notable step towards the clinical translation of cognitive surgical robotics.

Knowledge · ChatGPT · MoDELS · Learning · Extensibility

June 6, 2024

Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political Relation Classification

Yibo Hu, Erick Skorupa Parolin, Latifur Khan, Patrick T. Brandt, Javier Osorio, Vito J. D'Orazio

from arxiv, ACL 2024

Is it possible to accurately classify political relations within evolving event ontologies without extensive annotations? This study investigates zero-shot learning methods that use expert knowledge from an existing annotation codebook, and evaluates the performance of advanced ChatGPT (GPT-3.5/4) and a natural language inference (NLI)-based model called ZSP. ChatGPT uses the codebook's labeled summaries as prompts, whereas ZSP breaks down the classification task into context, event mode, and class disambiguation to refine task-specific hypotheses. This decomposition enhances interpretability, efficiency, and adaptability to schema changes. The experiments reveal ChatGPT's strengths and limitations, and crucially show ZSP's outperformance of dictionary-based methods and its competitive edge over some supervised models. These findings affirm the value of ZSP for validating event records and advancing ontology development. Our study underscores the efficacy of leveraging transfer learning and existing domain expertise to enhance research efficiency and scalability.

Tensor · Approximation · Estimation/Estimator · Performer · TOOLS

June 6, 2024

A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation

Hugo Lebeau, Florent Chatelain, Romain Couillet

This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computational threshold. Relying on standard tools from the theory of large random matrices, we characterize the large-dimensional spectral behavior of the unfoldings of the data tensor and exhibit relevant signal-to-noise ratios governing the detectability of the principal directions of the signal. These results allow us to accurately predict the reconstruction performance of truncated multilinear SVD (MLSVD) in the non-trivial regime. This is particularly important since it serves as an initialization of the higher-order orthogonal iteration (HOOI) scheme, whose convergence to the best low-multilinear-rank approximation depends entirely on its initialization. We give a sufficient condition for the convergence of HOOI and show that the number of iterations before convergence tends to $1$ in the large-dimensional limit.