
We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the guarantee stated is not conditionally valid. The trustworthiness of the filtering step may vary based on the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both of these challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) in order to adaptively issue weaker guarantees when they are required to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on biography and medical question-answering datasets.
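The threshold-and-filter step described above can be sketched as follows. This is a minimal illustration of plain split conformal calibration, not the conditional procedure the abstract proposes. It assumes each calibration response is summarized by a single score (here, hypothetically, the highest score given to any incorrect claim in that response), and all function names and numbers are made up for the example.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold exists,
        # so every claim will be filtered out.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def filter_claims(claims, threshold):
    """Keep only the claims whose score strictly exceeds the threshold."""
    return [text for text, score in claims if score > threshold]

# Hypothetical calibration scores, one per calibration response.
cal_scores = [0.12, 0.35, 0.20, 0.50, 0.41, 0.28, 0.33]
tau = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical (claim, score) pairs from a new response.
claims = [("claim A", 0.90), ("claim B", 0.30), ("claim C", 0.48)]
kept = filter_claims(claims, tau)
print(tau, kept)
```

With seven calibration scores and alpha = 0.25, the threshold is the sixth smallest score, so any claim scoring at or below it is dropped. An imperfect scoring function therefore costs accurate claims, which is the second deficiency the abstract names.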

Related content


MoDELS · Robustness · Image classification · Language modeling · Large language models

December 13, 2024

Robust image classification with multi-modal large language models

Francesco Villani,Igor Maljkovic,Dario Lazzaro,Angelo Sotgiu,Antonio Emanuele Cinà,Fabio Roli

Deep Neural Networks are vulnerable to adversarial examples, i.e., carefully crafted input samples that can cause models to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. However, most of these approaches focus on a single data modality, overlooking the relationships between visual patterns and textual descriptions of the input. In this paper, we propose a novel defense, Multi-Shield, designed to combine and complement these defenses with multi-modal information to further enhance their robustness. Multi-Shield leverages multi-modal large language models to detect adversarial examples and abstain from uncertain classifications when there is no alignment between textual and visual representations of the input. Extensive evaluations on CIFAR-10 and ImageNet datasets, using robust and non-robust image classification models, demonstrate that Multi-Shield can be easily integrated to detect and reject adversarial examples, outperforming the original defenses.

Integration · Approximation · Design · Numerical analysis

December 12, 2024

A new way of deriving implicit Runge-Kutta methods based on repeated integrals

Hana Mizerová,Katarína Tvrdá

from arxiv, 24 pages, 8 figures, 11 tables

Runge-Kutta methods have an irreplaceable position among numerical methods designed to solve ordinary differential equations. Implicit methods in particular are well suited to approximating solutions of stiff initial value problems. We propose a new way of deriving coefficients of implicit Runge-Kutta methods. This approach, based on repeated integrals, yields both new and well-known Butcher tableaux. We discuss the properties of the newly derived methods and compare them with standard collocation implicit Runge-Kutta methods in a series of numerical experiments, including the Prothero-Robinson problem.
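To illustrate why implicit methods suit stiff problems, here is a minimal sketch using backward Euler, the simplest implicit Runge-Kutta method, on a Prothero-Robinson test problem y' = lam * (y - g(t)) + g'(t) with exact solution y = g(t). It does not reproduce the repeated-integral derivation from the abstract. The choices g = sin, lam = -1000 and step size 0.1 are assumptions made only for this example.

```python
import math

def backward_euler(lam, g, dg, y0, h, steps):
    """Backward Euler for y' = lam * (y - g(t)) + dg(t), starting at t = 0.

    The problem is linear in y, so the implicit stage equation
        y_next = y + h * (lam * (y_next - g(t_next)) + dg(t_next))
    is solved for y_next in closed form.
    """
    t, y = 0.0, y0
    for _ in range(steps):
        t_next = t + h
        y = (y + h * (dg(t_next) - lam * g(t_next))) / (1.0 - h * lam)
        t = t_next
    return y

def forward_euler(lam, g, dg, y0, h, steps):
    """Explicit Euler on the same problem, for comparison."""
    t, y = 0.0, y0
    for _ in range(steps):
        y = y + h * (lam * (y - g(t)) + dg(t))
        t = t + h
    return y

lam, h, steps = -1000.0, 0.1, 10  # stiff: |h * lam| = 100
implicit_error = abs(backward_euler(lam, math.sin, math.cos, 0.0, h, steps) - math.sin(1.0))
explicit_error = abs(forward_euler(lam, math.sin, math.cos, 0.0, h, steps) - math.sin(1.0))
print(implicit_error, explicit_error)
```

At this step size the explicit iteration multiplies its error by 1 + h * lam = -99 on every step and diverges, while the implicit iteration divides it by 1 - h * lam = 101 and stays close to the exact solution.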

Approximation · Maximum a posteriori · Paper · Operations · INFORMS

December 12, 2024

Enzymatic cycle-based receivers for approximate maximum a posteriori demodulation of concentration modulated signals

Chun Tung Chou

Molecular communication is a bio-inspired communication paradigm where molecules are used as the information carrier. This paper considers a molecular communication network where the transmitter uses concentration modulated signals for communication. Our focus is to design receivers that can demodulate these signals. We want the receivers to use enzymatic cycles as their building blocks and to work approximately as a maximum a posteriori (MAP) demodulator. No receivers with all these features exist in the current molecular communication literature. We consider enzymatic cycles because they are a very common class of chemical reactions found in living cells. In addition, a MAP receiver has good statistical performance. In this paper, we study the operating regime of an enzymatic cycle and how the parameters of the enzymatic cycles can be chosen so that the receiver can approximately implement a MAP demodulator. We use simulation to study the performance of this receiver. We show that we can reduce the bit-error ratio of the demodulator if the enzymatic cycle operates in specific parameter regimes.

Regularization term · Maximal · Estimate/estimator · Analysis · Linear

December 12, 2024

A priori and a posteriori error estimates for discontinuous Galerkin time-discrete methods via maximal regularity

Georgios Akrivis,Stig Larsson

from arxiv, 14 pages

The maximal regularity property of discontinuous Galerkin methods for linear parabolic equations is used together with variational techniques to establish a priori and a posteriori error estimates of optimal order under optimal regularity assumptions. The analysis is set in the maximal regularity framework of UMD Banach spaces. Similar results were proved in an earlier work, based on the consistency analysis of Radau IIA methods. The present error analysis, which is based on variational techniques, is of independent interest, but the main motivation is that it extends to nonlinear parabolic equations, in contrast to the earlier work. Both autonomous and nonautonomous linear equations are considered.

Ensemble · MoDELS · Model evaluation · Interpretability · Performer

December 12, 2024

Beyond forecast leaderboards: Measuring individual model importance based on contribution to ensemble accuracy

Minsu Kim,Evan L. Ray,Nicholas G. Reich

from arxiv, 28 pages, 8 figures in the main text; includes supplementary material

Ensemble forecasts often outperform forecasts from individual standalone models, and have been used to support decision-making and policy planning in various fields. As collaborative forecasting efforts to create effective ensembles grow, so does interest in understanding individual models' relative importance in the ensemble. To this end, we propose two practical methods that measure the difference between ensemble performance when a given model is or is not included in the ensemble: a leave-one-model-out algorithm and a leave-all-subsets-of-models-out algorithm, which is based on the Shapley value. We explore the relationship between these metrics, forecast accuracy, and the similarity of errors, both analytically and through simulations. We illustrate this measure of the value a component model adds to an ensemble in the presence of other models using US COVID-19 death forecasts. This study offers valuable insight into individual models' unique features within an ensemble, which standard accuracy metrics alone cannot reveal.
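The leave-one-model-out idea can be sketched as below: a model's importance is how much the ensemble error grows when that model is removed. This is an illustrative simplification, not the authors' implementation. It assumes point forecasts, an equally weighted mean ensemble, and absolute error as the accuracy measure, and the model names and numbers are hypothetical.

```python
def ensemble_mean(values):
    """Equally weighted mean ensemble of point forecasts."""
    return sum(values) / len(values)

def leave_one_model_out(forecasts, truth):
    """Return {model: error without the model - error with all models}.

    A positive value means removing the model hurts the ensemble,
    so the model adds value; a negative value means the opposite.
    """
    full_error = abs(ensemble_mean(list(forecasts.values())) - truth)
    importance = {}
    for name in forecasts:
        reduced = [f for other, f in forecasts.items() if other != name]
        importance[name] = abs(ensemble_mean(reduced) - truth) - full_error
    return importance

# Hypothetical point forecasts for one target whose observed value is 12.
forecasts = {"model_a": 10.0, "model_b": 14.0, "model_c": 18.0}
scores = leave_one_model_out(forecasts, truth=12.0)
print(scores)
```

Here model_b has the same individual error as model_a but zero importance, because dropping it leaves the ensemble mean unchanged. A leaderboard of individual errors cannot show this, since importance depends on how a model's errors relate to those of the other members.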

Learning · Dataset · Machine Learning · Regularization term · Performer

December 11, 2024

Benchmarking learned algorithms for computed tomography image reconstruction tasks

Maximilian B. Kiss,Ander Biguri,Zakhar Shumaylov,Ferdia Sherry,K. Joost Batenburg,Carola-Bibiane Schönlieb,Felix Lucka

Computed tomography (CT) is a widely used non-invasive diagnostic method in various fields, and recent advances in deep learning have led to significant progress in CT image reconstruction. However, the lack of large-scale, open-access datasets has hindered the comparison of different types of learned methods. To address this gap, we use the 2DeteCT dataset, a real-world experimental computed tomography dataset, for benchmarking machine-learning-based CT image reconstruction algorithms. We categorize these methods into post-processing networks, learned/unrolled iterative methods, learned regularizer methods, and plug-and-play methods, and provide a pipeline for easy implementation and evaluation. Using key performance metrics, including SSIM and PSNR, our benchmarking results showcase the effectiveness of various algorithms on tasks such as full data reconstruction, limited-angle reconstruction, sparse-angle reconstruction, low-dose reconstruction, and beam-hardening corrected reconstruction. With this benchmarking study, we provide an evaluation of a range of algorithms representative of different categories of learned reconstruction methods on a recently published dataset of real-world experimental CT measurements. The reproducible setup of methods and CT image reconstruction tasks in an open-source toolbox enables straightforward addition and comparison of new methods later on. The toolbox also provides the option to load the 2DeteCT dataset differently for extensions to other problems and different CT reconstruction tasks.

AI · MoDELS · Model evaluation · Generalization theory · Language modeling

December 11, 2024

LA4SR: illuminating the dark proteome with generative AI

David R. Nelson,Ashish Kumar Jaiswal,Noha Ismail,Alexandra Mystikou,Kourosh Salehi-Ashtiani

AI language models (LMs) show promise for biological sequence analysis. We re-engineered open-source LMs (GPT-2, BLOOM, DistilRoBERTa, ELECTRA, and Mamba, ranging from 70M to 12B parameters) for microbial sequence classification. The models achieved F1 scores up to 95 and operated 16,580x faster and at 2.9x the recall of BLASTP. They effectively classified the algal dark proteome - uncharacterized proteins comprising about 65% of total proteins - validated on new data including a new, complete Hi-C/Pacbio Chlamydomonas genome. Larger (>1B) LA4SR models reached high accuracy (F1 > 86) when trained on less than 2% of available data, rapidly achieving strong generalization capacity. High accuracy was achieved when training data had intact or scrambled terminal information, demonstrating robust generalization to incomplete sequences. Finally, we provide custom AI explainability software tools for attributing amino acid patterns to AI generative processes and interpret their outputs in evolutionary and biophysical contexts.

Subspace · Optimizer · Regularization term · Linear · Learning

December 11, 2024

Efficient gradient-based methods for bilevel learning via recycling Krylov subspaces

Matthias J. Ehrhardt,Silvia Gazzola,Sebastian J. Scott

from arxiv, 27 pages, 11 figures

Many optimization problems require hyperparameters, i.e., parameters that must be specified in advance, such as regularization parameters and parametric regularizers in variational regularization methods for inverse problems, and dictionaries in compressed sensing. A data-driven approach to determine appropriate hyperparameter values is via a nested optimization framework known as bilevel learning. Even when it is possible to apply a gradient-based solver to the bilevel optimization problem, construction of the gradients, known as hypergradients, is computationally challenging, each one requiring both a solution of a minimization problem and a linear system solve. These systems do not change much during the iterations, which motivates us to apply recycling Krylov subspace methods, wherein information from one linear system solve is re-used to solve the next linear system. Existing recycling strategies often employ eigenvector approximations called Ritz vectors. In this work we propose a novel recycling strategy based on a new concept, Ritz generalized singular vectors, which acknowledge the bilevel setting. Additionally, while existing iterative methods primarily terminate according to the residual norm, this new concept allows us to define a new stopping criterion that directly approximates the error of the associated hypergradient. The proposed approach is validated through extensive numerical testing in the context of an inverse problem in imaging.

Integration · MoDELS · Reviewer · Feasible · Processing (programming language)

December 11, 2024

A computational framework to predict weld integrity and microstructural heterogeneity: application to hydrogen transmission

J. Wijnen,J. Parker,M. Gagliano,E. Martínez-Pañeda

We present a novel computational framework to assess the structural integrity of welds. In the first stage of the simulation framework, local fractions of microstructural constituents within weld regions are predicted based on steel composition and welding parameters. The resulting phase fraction maps are used to define heterogeneous properties that are subsequently employed in structural integrity assessments using an elastoplastic phase field fracture model. The framework is particularised to predicting failure in hydrogen pipelines, demonstrating its potential to assess the feasibility of repurposing existing pipeline infrastructure to transport hydrogen. First, the process model is validated against experimental microhardness maps for vintage and modern pipeline welds. Additionally, the influence of welding conditions on hardness and residual stresses is investigated, demonstrating that variations in heat input, filler material composition, and weld bead order can significantly affect the properties within the weld region. Coupled hydrogen diffusion-fracture simulations are then conducted to determine the critical pressure at which hydrogen transport pipelines will fail. To this end, the model is enriched with a microstructure-sensitive description of hydrogen transport and hydrogen-dependent fracture resistance. The analysis of an X52 pipeline reveals that even 2 mm defects in a hard heat-affected zone can drastically reduce the critical failure pressure.

Functional · SimPLe · Agent · MoDELS · Design

November 26, 2024

A Behavior Tree-inspired programming language for autonomous agents

Oliver Biggar,Iman Shames

We propose a design for a functional programming language for autonomous agents, built off the ideas and motivations of Behavior Trees (BTs). BTs are a popular model for designing agents' behavior in robotics and AI. However, as their use has grown dramatically, the simple model of BTs has come to be limiting. There is a growing push to increase the functionality of BTs, with the end goal of BTs evolving into a programming language in their own right, centred around the defining BT properties of modularity and reactiveness. In this paper, we examine how the BT model must be extended in order to grow into such a language. We identify some fundamental problems which must be solved: implementing 'reactive' selection, 'monitoring' safety-critical conditions, and passing data between actions. We provide a variety of small examples which demonstrate that these problems are complex, and that current BT approaches do not handle them in a manner consistent with modularity. We instead provide a simple set of modular programming primitives for handling these use cases, and show how they can be combined to build complex programs. We present a full specification for our BT-inspired language, and give an implementation in the functional programming language Haskell. Finally, we demonstrate our language by translating a large and complex BT into a simple, unambiguous program.
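For readers unfamiliar with the baseline model, a classical Behavior Tree can be sketched in a few lines. This shows only the standard Sequence and Fallback semantics that the abstract starts from, not the authors' language or their Haskell implementation. The robot scenario, the state dictionary and all names are hypothetical.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

def sequence(*children):
    """Tick children in order; stop at the first one that does not succeed."""
    def tick(state):
        for child in children:
            status = child(state)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS
    return tick

def fallback(*children):
    """Tick children in order; stop at the first one that does not fail."""
    def tick(state):
        for child in children:
            status = child(state)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE
    return tick

# Hypothetical leaves: each is a function from a mutable state dict to a Status.
def battery_ok(state):
    return Status.SUCCESS if state["battery"] > 20 else Status.FAILURE

def recharge(state):
    state["log"].append("recharge")
    return Status.RUNNING

def patrol(state):
    state["log"].append("patrol")
    return Status.SUCCESS

# Patrol only while the battery is fine; otherwise recharge first.
tree = sequence(fallback(battery_ok, recharge), patrol)

charged = {"battery": 50, "log": []}
drained = {"battery": 10, "log": []}
print(tree(charged), charged["log"])
print(tree(drained), drained["log"])
```

The tree is re-evaluated from the root on every tick, which is what makes it reactive: the battery check runs again each time before patrolling. Leaves here communicate only through a shared mutable dictionary, which is an example of the data-passing problem the abstract identifies.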