Evaluations of model editing currently only use the `next few tokens' of completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier, which correlate well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that, while some methods (ROME and MEMIT) perform well at making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.
We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level for every time point, not just on average over the observed sequence.
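As a rough illustration of the kind of update involved, the sketch below tracks a conformal threshold by a gradient step on the pinball loss with a decaying step size; the initial threshold and the schedule $\eta_t = \eta_0 t^{-\gamma}$ are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

def online_conformal(scores, alpha=0.1, eta0=1.0, gamma=0.6):
    """Track a conformal threshold q_t with a decaying step size.

    scores : sequence of nonconformity scores s_t (revealed after each prediction)
    alpha  : target miscoverage level
    eta0, gamma : step-size schedule eta_t = eta0 * t**(-gamma); an
                  illustrative choice with gamma in (1/2, 1)
    Returns the coverage indicator for each time step.
    """
    q = 0.0                       # initial threshold (assumption)
    covered = []
    for t, s in enumerate(scores, start=1):
        covered.append(s <= q)    # prediction set {y : score(x_t, y) <= q}
        err = float(s > q)        # 1 if the true label was not covered
        eta_t = eta0 * t ** (-gamma)
        q = q + eta_t * (err - alpha)   # gradient step on the pinball loss
    return np.array(covered)

# toy usage: with i.i.d. scores, coverage should settle near 1 - alpha
rng = np.random.default_rng(0)
cov = online_conformal(rng.exponential(size=5000), alpha=0.1)
print(cov.mean())
```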
Sequential recommendation systems that model dynamic preferences based on a user's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of information, such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method that employs multimodal data of items, such as images, texts, and categories. First, we extract image and text features from pre-trained VGG and BERT models and convert categories into multi-labeled forms. Subsequently, attention operations are performed independently on the item sequence and the multimodal representations. Finally, the individual attention outputs are integrated through an attention fusion function. In addition, we apply a multitask learning loss for each modality to improve generalization performance. Experimental results on the Amazon datasets show that the proposed method outperforms conventional sequential recommendation systems.
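A minimal sketch of this kind of architecture is given below: precomputed image, text, and category features are each passed through their own self-attention over the item sequence, and the per-modality summaries are combined by an attention fusion with auxiliary per-modality heads. The dimensions, the fusion rule, and the use of the last position as sequence summary are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultimodalSeqRec(nn.Module):
    """Illustrative sketch: per-modality self-attention over an item sequence,
    followed by attention-based fusion across modalities.  VGG/BERT features
    are assumed precomputed; all sizes are placeholders."""
    def __init__(self, d_img=512, d_txt=768, n_cat=100, d=128, n_items=10000):
        super().__init__()
        mods = ("img", "txt", "cat")
        self.proj = nn.ModuleDict({
            "img": nn.Linear(d_img, d),
            "txt": nn.Linear(d_txt, d),
            "cat": nn.Linear(n_cat, d),   # multi-hot category vector
        })
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(d, num_heads=4, batch_first=True) for m in mods
        })
        self.fusion = nn.Linear(d, 1)           # scores each modality summary
        self.next_item = nn.Linear(d, n_items)  # main task: next-item logits
        self.aux = nn.ModuleDict({m: nn.Linear(d, n_items) for m in mods})

    def forward(self, feats):                   # feats[m]: (B, L, d_m)
        summaries, aux_logits = [], {}
        for m, x in feats.items():
            h = self.proj[m](x)
            h, _ = self.attn[m](h, h, h)        # attention within one modality
            s = h[:, -1]                        # last position as sequence summary
            summaries.append(s)
            aux_logits[m] = self.aux[m](s)      # per-modality auxiliary task
        S = torch.stack(summaries, dim=1)       # (B, M, d)
        w = torch.softmax(self.fusion(S), dim=1)
        fused = (w * S).sum(dim=1)              # attention fusion across modalities
        return self.next_item(fused), aux_logits
```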
In clinical settings, we often face the challenge of building prediction models from small observational data sets. For example, such a data set might come from a single medical center in a multi-center study. Differences between centers might be large, thus requiring center-specific models based on the data from the target center. Still, we want to borrow information from the external centers to deal with small sample sizes. Existing approaches assign weights either to each external data set or to each external observation. To incorporate information on differences between data sets and between observations, we propose an approach that combines both into weights that can be incorporated into a likelihood for fitting regression models. Specifically, we suggest weights at the data set level that incorporate information on how well the models providing the observation weights distinguish between data sets. Technically, this takes the form of inverse probability weighting. We explore different scenarios in which covariates and outcomes differ among data sets, informing our simulation design for evaluating the method. The concept of effective sample size is used to understand the effectiveness of our subgroup modeling approach. We demonstrate the approach in a clinical application, predicting applied radiotherapy doses for cancer patients. Overall, the proposed approach improves prediction performance when external data sets are similar to the target data set. We thus provide a method for quantifying the similarity of external data sets to the target data set and for using this similarity to include external observations, improving prediction performance on a target data set with few observations.
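The following sketch conveys the inverse-probability-weighting idea: a membership model distinguishing target from external observations yields observation-level weights, and how well that model separates the two sets yields a data-set-level weight. The specific formulas used here (odds-type observation weights, a data-set weight of $2(1-\mathrm{AUC})$) are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import roc_auc_score

def combined_weights(X_target, X_external):
    """Sketch of combined data-set-level and observation-level weights
    for one external data set, via a target-vs-external membership model."""
    Xs = np.vstack([X_target, X_external])
    y = np.r_[np.ones(len(X_target)), np.zeros(len(X_external))]
    clf = LogisticRegression(max_iter=1000).fit(Xs, y)
    p = clf.predict_proba(X_external)[:, 1]        # P(target | x) for external obs.
    obs_w = p / (1 - p + 1e-12)                    # inverse-probability / odds weights
    auc = roc_auc_score(y, clf.predict_proba(Xs)[:, 1])
    set_w = max(0.0, 2.0 * (1.0 - auc))            # 1 if indistinguishable, 0 if separable
    return set_w * obs_w

# usage sketch (X_t, y_t: target center; X_e, y_e: one external center; hypothetical arrays):
# w_e = combined_weights(X_t, X_e)
# model = LinearRegression().fit(np.vstack([X_t, X_e]),
#                                np.r_[y_t, y_e],
#                                sample_weight=np.r_[np.ones(len(y_t)), w_e])
```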
This paper focuses on the numerical scheme for delay-type stochastic McKean-Vlasov equations (DSMVEs) driven by fractional Brownian motion with Hurst parameter $H\in (0,1/2)\cup (1/2,1)$. The existence and uniqueness of the solutions to such DSMVEs, whose drift coefficients contain polynomial delay terms, are proved by exploiting the Banach fixed point theorem. Then the propagation of chaos between the interacting particle system and the non-interacting system in the $\mathcal{L}^p$ sense is shown. We find that even if the delay term satisfies the polynomial growth condition, the unmodified classical Euler-Maruyama scheme can still approximate the corresponding interacting particle system without particle corruption. The convergence rates are revealed for $H\in (0,1/2)\cup (1/2,1)$. Finally, as an example that closely fits the original equation, a stochastic opinion dynamics model with both extrinsic memory and intrinsic memory is simulated to illustrate the plausibility of the theoretical results.
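A minimal sketch of an unmodified Euler-Maruyama scheme for an interacting particle system of this type is given below, with fractional Gaussian noise generated by a Cholesky factorization of its exact covariance. The drift, the polynomial delay term, and the parameter values are illustrative assumptions and do not reproduce the equation studied in the paper.

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

def fgn_increments(n_steps, dt, H, n_paths, rng):
    """Fractional Gaussian noise increments via Cholesky of the exact
    covariance (adequate for small n_steps)."""
    k = np.arange(n_steps)
    gamma = 0.5 * dt**(2*H) * (np.abs(k + 1)**(2*H) - 2*np.abs(k)**(2*H) + np.abs(k - 1)**(2*H))
    L = cholesky(toeplitz(gamma), lower=True)
    return rng.standard_normal((n_paths, n_steps)) @ L.T

def em_particles(N=200, T=1.0, n_steps=200, tau=0.1, H=0.7, sigma=0.3, seed=0):
    """Plain Euler-Maruyama for an interacting particle system with a
    polynomial delay term; the drift below is purely illustrative."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    lag = int(round(tau / dt))
    dB = fgn_increments(n_steps, dt, H, N, rng)
    X = np.zeros((N, n_steps + 1))              # constant initial segment X_s = 0, s <= 0
    for n in range(n_steps):
        x = X[:, n]
        x_del = X[:, max(n - lag, 0)]           # delayed state
        mean_field = x.mean()                   # empirical measure enters via its mean
        drift = -x + x_del - x_del**3 + 0.5 * (mean_field - x)   # polynomial delay term
        X[:, n + 1] = x + drift * dt + sigma * dB[:, n]
    return X
```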
We examine the problem of variance components testing in general mixed effects models using the likelihood ratio test. We account for the presence of nuisance parameters, i.e. the fact that some untested variances might also be equal to zero. Two main issues arise in this context, leading to a non-regular setting. First, under the null hypothesis the true parameter value lies on the boundary of the parameter space. Moreover, due to the presence of nuisance parameters, the exact location of these boundary points is not known, which prevents the use of classical asymptotic theory of maximum likelihood estimation. Second, in the specific context of nonlinear mixed-effects models, the Fisher information matrix is singular at the true parameter value. We address these two points by proposing a shrinked parametric bootstrap procedure, which is straightforward to apply even for nonlinear models. We show that the procedure is consistent, solving both the boundary and the singularity issues, and we provide a verifiable criterion for the applicability of our theoretical results. We show through a simulation study that, compared to the asymptotic approach, our procedure has better small-sample performance and is more robust to the presence of nuisance parameters. A real data application is also provided.
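For concreteness, the skeleton below shows a plain parametric bootstrap of the likelihood ratio statistic for a single random-intercept variance component, using statsmodels with illustrative modelling choices; it deliberately omits the shrinkage step that is central to the proposed procedure and is only meant to fix the overall structure of such a test.

```python
import numpy as np
import statsmodels.api as sm

def bootstrap_lrt_variance_component(y, X, groups, B=500, seed=0):
    """Parametric bootstrap LRT for H0: random-intercept variance = 0.
    Generic skeleton only; the paper's shrinkage of the estimates used to
    simulate bootstrap samples is not included."""
    rng = np.random.default_rng(seed)
    null = sm.OLS(y, X).fit()                               # model without the random effect
    full = sm.MixedLM(y, X, groups=groups).fit(reml=False)  # model with the random intercept
    lrt_obs = 2 * (full.llf - null.llf)

    sigma = np.sqrt(null.scale)                             # residual s.d. under H0
    lrt_boot = np.empty(B)
    for b in range(B):
        y_b = X @ null.params + rng.normal(0, sigma, size=len(y))   # simulate under H0
        null_b = sm.OLS(y_b, X).fit()
        full_b = sm.MixedLM(y_b, X, groups=groups).fit(reml=False)
        lrt_boot[b] = 2 * (full_b.llf - null_b.llf)
    p_value = (1 + np.sum(lrt_boot >= lrt_obs)) / (B + 1)
    return lrt_obs, p_value
```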
Probabilistic graphical models are widely used to model complex systems under uncertainty. Traditionally, Gaussian directed graphical models are applied for the analysis of large networks with continuous variables, as they provide conditional and marginal distributions in closed form, simplifying the inferential task. The Gaussianity and linearity assumptions are often adequate, yet they can lead to poor performance in some practical applications. In this paper, we model each variable in a graph $G$ as a polynomial regression on its parents to capture complex relationships between individual variables, together with a utility function of polynomial form. We develop a message-passing algorithm to propagate information throughout the network solely using moments, which enables the expected utility scores to be calculated exactly. Our propagation method scales up well and enables inference to be performed in terms of a finite number of expectations. We illustrate how the proposed methodology works with examples and in applications to decision problems in energy planning and real-time clinical decision support.
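To make the moment-based propagation concrete, the toy sketch below pushes the first two moments through a single polynomial-regression edge $Y = c_0 + c_1 X + c_2 X^2 + \varepsilon$ and verifies the result by Monte Carlo; it illustrates why only finitely many expectations of the parent are needed, and is not the paper's full message-passing algorithm over a network.

```python
import numpy as np

def propagate_moments(m, coeffs, noise_var):
    """Moment propagation through one polynomial-regression edge:
        Y = c0 + c1*X + c2*X**2 + eps,  eps ~ (0, noise_var) independent of X.
    m = (m1, m2, m3, m4) are the first four moments of X.
    Returns E[Y] and E[Y^2]."""
    m1, m2, m3, m4 = m
    c0, c1, c2 = coeffs
    EY = c0 + c1 * m1 + c2 * m2
    EY2 = (c0**2 + c1**2 * m2 + c2**2 * m4
           + 2*c0*c1*m1 + 2*c0*c2*m2 + 2*c1*c2*m3 + noise_var)
    return EY, EY2

# sanity check against Monte Carlo for a standard normal parent
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = 0.5 + 2.0*x - 0.3*x**2 + rng.normal(0, 0.2, x.size)
print(propagate_moments((0, 1, 0, 3), (0.5, 2.0, -0.3), 0.2**2))
print(y.mean(), (y**2).mean())
```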
We present a new technique for visualizing high-dimensional data called cluster MDS (cl-MDS), which addresses a common difficulty of dimensionality reduction methods: preserving both local and global structures of the original sample in a single 2-dimensional visualization. Its algorithm combines the well-known multidimensional scaling (MDS) tool with the $k$-medoids data clustering technique, and enables hierarchical embedding, sparsification and estimation of 2-dimensional coordinates for additional points. While cl-MDS is a generally applicable tool, we also include specific recipes for atomic structure applications. We apply this method to non-linear data of increasing complexity, where different layers of locality are relevant, showing a clear improvement in retrieval and visualization quality.
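A rough reading of the combination can be sketched as follows: a global MDS embedding of the $k$-medoids fixes the large-scale layout, and each cluster is then embedded locally and translated so that its medoid sits at its global anchor. The tiny k-medoids helper, the anchoring rule, and the treatment of small clusters are simplified illustrations rather than the reference implementation.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import squareform, pdist

def k_medoids(D, k, n_iter=50, seed=0):
    """Tiny k-medoids on a precomputed distance matrix D (toy version;
    assumes no cluster becomes empty)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = np.array([np.where(labels == j)[0][
                            np.argmin(D[np.ix_(labels == j, labels == j)].sum(axis=1))]
                        for j in range(k)])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, labels

def cl_mds_sketch(X, k=4, seed=0):
    """Simplified cl-MDS-style embedding: global MDS of the medoids plus
    local MDS within each cluster, translated onto the medoid anchors."""
    D = squareform(pdist(X))
    medoids, labels = k_medoids(D, k, seed=seed)
    anchors = MDS(n_components=2, dissimilarity="precomputed",
                  random_state=seed).fit_transform(D[np.ix_(medoids, medoids)])
    coords = np.zeros((len(X), 2))
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) < 3:                      # tiny clusters: pin to the anchor
            coords[idx] = anchors[j]
            continue
        local = MDS(n_components=2, dissimilarity="precomputed",
                    random_state=seed).fit_transform(D[np.ix_(idx, idx)])
        m_pos = local[idx.tolist().index(medoids[j])]
        coords[idx] = local - m_pos + anchors[j]
    return coords
```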
Triply periodic minimal surfaces (TPMS) are emerging as an important way of designing microstructures. However, there has been limited use of commercial CAD/CAM/CAE software packages for TPMS design and manufacturing. This is mainly because TPMS are typically described in the functional representation (F-rep) format, while modern CAD/CAM/CAE tools are built upon the boundary representation (B-rep) format. One possible solution to this gap is translating TPMS to STEP, the standard data exchange format of CAD/CAM/CAE. Following this direction, this paper proposes a new translation method with error-controlling and $C^2$ continuity-preserving features. It is based on an approximation-error-driven TPMS sampling algorithm and a constrained-PIA algorithm. The sampling algorithm controls the deviation between the original and translated models; with it, an error bound of $2\epsilon$ on the deviation can be ensured if two conditions, called $\epsilon$-density and $\epsilon$-approximation, are satisfied. The constrained-PIA algorithm enforces $C^2$ continuity constraints during TPMS approximation while attaining high efficiency. A theoretical convergence proof of this algorithm is also given. The effectiveness of the translation method is demonstrated by a series of examples and comparisons.
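The basic progressive-iterative approximation (PIA) update underlying the constrained-PIA algorithm can be sketched for a clamped B-spline curve: control points start at the sampled points and are repeatedly corrected by the residuals at the sample parameters. The knot placement, parameterization, and example data below are illustrative, and the $C^2$ constraints and error control of the paper are not included.

```python
import numpy as np
from scipy.interpolate import BSpline

def pia_fit(Q, degree=3, n_iter=200, tol=1e-10):
    """Plain PIA for a clamped B-spline curve through sampled points Q (n x d)."""
    n = len(Q)
    t = np.linspace(0.0, 1.0, n)                       # uniform sample parameters
    # clamped knot vector with interior knots from parameter averages
    interior = [t[i + 1:i + degree + 1].mean() for i in range(n - degree - 1)]
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    # collocation matrix A[i, j] = N_j(t_i)
    A = np.empty((n, n))
    for j in range(n):
        c = np.zeros(n); c[j] = 1.0
        A[:, j] = BSpline(knots, c, degree)(t)
    P = Q.astype(float).copy()                         # P^(0) = data points
    for _ in range(n_iter):
        residual = Q - A @ P                           # Q_i - C^(k)(t_i)
        P += residual                                  # PIA update
        if np.abs(residual).max() < tol:
            break
    return BSpline(knots, P, degree)

# usage: fit a B-spline curve to samples of a closed planar profile
s = np.linspace(0, 2 * np.pi, 40)
Q = np.c_[np.cos(s), np.sin(2 * s)]
curve = pia_fit(Q)
```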
Retrieval-augmented machine translation leverages examples from a translation memory by retrieving similar instances. These examples are used to condition the predictions of a neural decoder. We aim to improve the upstream retrieval step and consider a fixed downstream edit-based model: the multi-Levenshtein Transformer. The task consists of finding a set of examples that maximizes the overall coverage of the source sentence. To this end, we rely on the theory of submodular functions and explore new algorithms to optimize this coverage. We evaluate the resulting performance gains for the machine translation task.
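The coverage objective is monotone submodular, so greedy selection by marginal gain is the natural baseline with the usual $(1 - 1/e)$ guarantee. The sketch below scores candidates by raw n-gram overlap with the source sentence, which is only an illustrative proxy for the coverage function actually optimized.

```python
def ngrams(tokens, n_max=4):
    return {tuple(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}

def greedy_cover(source, candidates, k=3):
    """Greedy maximization of n-gram coverage of the source by a set of
    retrieved examples (monotone submodular objective)."""
    target = ngrams(source.split())
    covered, selected = set(), []
    pool = [(c, ngrams(c.split()) & target) for c in candidates]
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda it: len(it[1] - covered))
        if not (best[1] - covered):
            break                       # no marginal gain left
        selected.append(best[0])
        covered |= best[1]
        pool.remove(best)
    return selected, len(covered) / max(len(target), 1)

examples, coverage = greedy_cover(
    "the cat sat on the mat",
    ["the cat sat on a chair", "a dog on the mat", "the weather is nice"],
)
print(examples, coverage)
```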
Inference for functional linear models in the presence of heteroscedastic errors has received insufficient attention given its practical importance; in fact, even a central limit theorem has not been established in this case. At issue, conditional mean estimates have complicated sampling distributions due to the infinite-dimensional regressors, where truncation bias and scaling issues are compounded by non-constant variance under heteroscedasticity. As a foundation for distributional inference, we establish a central limit theorem for the estimated conditional mean under general dependent errors, and subsequently we develop a paired bootstrap method to provide better approximations of sampling distributions. The proposed paired bootstrap does not follow the standard bootstrap algorithm for finite-dimensional regressors, as that version fails outside of a narrow window of implementation with functional regressors; the reason is a bias that arises with functional regressors in a naive bootstrap construction. Our bootstrap proposal incorporates debiasing and thereby attains much broader validity and flexibility with respect to truncation parameters for inference under heteroscedasticity; even when the naive approach is valid, the proposed bootstrap method performs better numerically. The bootstrap is applied to construct confidence intervals for centered projections and to conduct hypothesis tests for multiple conditional means. Our theoretical results on bootstrap consistency are demonstrated through simulation studies and illustrated with a real data example.
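The following sketch shows a plain paired bootstrap for the conditional mean in a truncated functional principal-component regression: pairs $(X_i, Y_i)$ are resampled and the estimate is recomputed on each resample. The FPC regression, the truncation level, and the basic bootstrap interval are illustrative choices, and the debiasing that gives the proposed method its broader validity is not shown.

```python
import numpy as np

def fpc_conditional_mean(Xc, y, x0c, k=3):
    """E[Y | X = x0] via truncated functional PC regression: PCA of the
    (already centered) discretized curves, then least squares of y on the
    first k scores; integrals are approximated by sums over the grid."""
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    phi = Vt[:k]                          # top-k principal directions
    scores = Xc @ phi.T
    x0_scores = phi @ x0c
    coef, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    return y.mean() + x0_scores @ coef

def paired_bootstrap_ci(X, y, x0, k=3, B=1000, alpha=0.05, seed=0):
    """Paired bootstrap CI for the conditional mean at x0: resample
    (X_i, y_i) pairs, re-center, re-estimate; plain version without the
    paper's debiasing."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    est = fpc_conditional_mean(X - mu, y, x0 - mu, k)
    boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, len(y), len(y))
        Xb, yb = X[idx], y[idx]
        mub = Xb.mean(axis=0)
        boot[b] = fpc_conditional_mean(Xb - mub, yb, x0 - mub, k)
    lo, hi = np.quantile(boot - est, [alpha / 2, 1 - alpha / 2])
    return est, (est - hi, est - lo)      # basic bootstrap interval
```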