Test log-likelihood is commonly used to compare different models of the same data or different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations and (ii) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on root mean squared error.
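As a minimal illustration of how the two criteria can disagree, the toy sketch below (my own construction, not an example from the paper) scores two Gaussian predictive distributions for the same held-out data: one has the correct mean but an overconfident variance, the other a slightly biased mean but a well-calibrated spread.

```python
# Toy sketch: test log-likelihood and RMSE can rank two predictive
# distributions in opposite ways. All numbers here are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y_test = rng.normal(loc=0.0, scale=1.0, size=10_000)   # held-out targets

# Model A: correct mean, overconfident spread. Model B: biased mean, correct spread.
mean_a, std_a = 0.0, 0.3
mean_b, std_b = 0.3, 1.0

def test_log_likelihood(y, mean, std):
    return norm.logpdf(y, loc=mean, scale=std).mean()

def rmse(y, mean):
    return np.sqrt(np.mean((y - mean) ** 2))

for name, m, s in [("A", mean_a, std_a), ("B", mean_b, std_b)]:
    print(f"model {name}: TLL = {test_log_likelihood(y_test, m, s):.2f}, "
          f"RMSE = {rmse(y_test, m):.2f}")
# Model A wins on RMSE (its point forecast is unbiased) but loses on test
# log-likelihood (its overconfident variance is heavily penalized).
```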
Gibbs posteriors are proportional to a prior distribution multiplied by an exponentiated loss function, with a key tuning parameter that weights the information in the loss relative to the prior and controls posterior uncertainty. Gibbs posteriors provide a principled framework for likelihood-free Bayesian inference, but in many situations, relying on a single tuning parameter inevitably leads to poor uncertainty quantification. In particular, regardless of the value of that parameter, credible regions have far from the nominal frequentist coverage even in large samples. We propose a sequential extension of Gibbs posteriors to address this problem. We prove that the proposed sequential posterior exhibits concentration and satisfies a Bernstein-von Mises theorem, which holds under easy-to-verify conditions in Euclidean space and on manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for traditional likelihood-based Bayesian posteriors on manifolds. All methods are illustrated with an application to principal component analysis.
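To make the tuned object concrete, the sketch below (a toy example under my own choices of loss and prior, not the paper's construction) samples a Gibbs posterior for a location parameter under an absolute-error loss using random-walk Metropolis; the learning rate omega is exactly the single knob whose effect on credible-region coverage motivates the sequential extension.

```python
# Toy sketch: a Gibbs posterior is proportional to
#   prior(theta) * exp(-omega * sum_i loss(theta, x_i)),
# where omega weights the loss against the prior. Here theta is a location
# parameter and the loss is absolute error, so the posterior targets the median.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=200) + 2.0   # data with unknown location around 2

omega = 1.0          # learning-rate / tuning parameter
prior_scale = 10.0   # Gaussian prior on theta

def log_gibbs_posterior(theta):
    log_prior = -0.5 * (theta / prior_scale) ** 2
    loss = np.sum(np.abs(x - theta))
    return log_prior - omega * loss

# Random-walk Metropolis sampling of the Gibbs posterior.
theta, samples = 0.0, []
for _ in range(20_000):
    prop = theta + 0.1 * rng.standard_normal()
    if np.log(rng.uniform()) < log_gibbs_posterior(prop) - log_gibbs_posterior(theta):
        theta = prop
    samples.append(theta)

samples = np.array(samples[5_000:])   # discard burn-in
print("posterior mean %.3f, 95%% credible interval (%.3f, %.3f)"
      % (samples.mean(), *np.quantile(samples, [0.025, 0.975])))
# Rescaling omega stretches or shrinks this posterior; no single value
# guarantees credible regions with nominal frequentist coverage, which is the
# limitation the sequential extension targets.
```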
Model-based reinforcement learning has drawn considerable interest in recent years, given its promise to improve sample efficiency. Moreover, when using deep-learned models, it is potentially possible to learn compact models from complex sensor data. However, the effectiveness of these learned models, particularly their capacity to plan, i.e., to improve the current policy, remains unclear. In this work, we study MuZero, a well-known deep model-based reinforcement learning algorithm, and explore to what extent it achieves its learning objective of a value-equivalent model and how useful the learned models are for policy improvement. Amongst various other insights, we conclude that the model learned by MuZero cannot effectively generalize to evaluate unseen policies, which limits the extent to which we can additionally improve the current policy by planning with the model.
The distribution-free chain ladder of Mack justified the use of the chain ladder predictor and enabled Mack to derive an estimator of conditional mean squared error of prediction for the chain ladder predictor. Classical insurance loss models, i.e. of compound Poisson type, are not consistent with Mack's distribution-free chain ladder. However, for a sequence of compound Poisson loss models indexed by exposure (e.g. number of contracts), we show that the chain ladder predictor and Mack's estimator of conditional mean squared error of prediction can be derived by considering large exposure asymptotics. Hence, quantifying chain ladder prediction uncertainty can be done with Mack's estimator without relying on the validity of the model assumptions of the distribution-free chain ladder.
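For readers less familiar with the actuarial objects involved, the sketch below (illustrative figures and a textbook implementation, not the paper's asymptotic argument) computes the chain ladder predictor and Mack's estimator of the conditional mean squared error of prediction on a small cumulative run-off triangle.

```python
# Toy sketch: chain ladder predictor and Mack's MSEP estimator.
import numpy as np

# Cumulative claims triangle: rows = accident years, cols = development years;
# NaN marks the unobserved lower-right part. Figures are made up.
C = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 2000., 2400., np.nan],
    [ 900., 1700., np.nan, np.nan],
    [1200., np.nan, np.nan, np.nan],
])
n = C.shape[0]

# Chain ladder development factors f_j and Mack's variance parameters sigma_j^2.
f = np.ones(n - 1)
sigma2 = np.zeros(n - 1)
for j in range(n - 1):
    rows = np.arange(0, n - 1 - j)               # origins with columns j and j+1 observed
    f[j] = C[rows, j + 1].sum() / C[rows, j].sum()
    if len(rows) > 1:
        sigma2[j] = (C[rows, j] * (C[rows, j + 1] / C[rows, j] - f[j]) ** 2).sum() / (len(rows) - 1)
# Mack's extrapolation for the last variance parameter.
sigma2[n - 2] = min(sigma2[n - 3] ** 2 / sigma2[n - 4], sigma2[n - 4], sigma2[n - 3])

# Chain ladder predictor: complete the triangle by multiplying by the factors.
C_hat = C.copy()
for i in range(1, n):
    for j in range(n - i, n):
        C_hat[i, j] = C_hat[i, j - 1] * f[j - 1]

# Mack's estimator of the conditional MSEP of the ultimate claim per accident year.
for i in range(1, n):
    msep = 0.0
    for k in range(n - 1 - i, n - 1):
        msep += (sigma2[k] / f[k] ** 2) * (1.0 / C_hat[i, k] + 1.0 / C[:n - 1 - k, k].sum())
    msep *= C_hat[i, n - 1] ** 2
    print("origin %d: ultimate %.0f, prediction s.e. %.0f" % (i, C_hat[i, n - 1], np.sqrt(msep)))
```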
Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for the prediction uncertainties of machine learning regression problems, capable of more efficient corrections than uniform variance (or temperature) scaling. The original version of BVS uses uncertainty-based binning, which aims to improve calibration conditional on uncertainty, i.e. consistency. I explore here several adaptations of BVS, in particular with alternative loss functions and a binning scheme based on an input feature (X), in order to improve adaptivity, i.e. calibration conditional on X. The performance of BVS and its proposed variants is tested on a benchmark dataset for the prediction of atomization energies and compared to the results of isotonic regression.
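A minimal sketch of the baseline scheme follows (my own simplification: bins defined by quantiles of the predicted uncertainty and a per-bin scale obtained by matching the mean squared z-score, not necessarily the loss used in the paper).

```python
# Toy sketch of uncertainty-binned variance scaling: split a calibration set
# into bins by predicted uncertainty and rescale variances per bin, instead of
# applying one global (temperature-like) factor to all points.
import numpy as np

def fit_bvs(y_true, y_pred, sigma_pred, n_bins=10):
    """Return bin edges and per-bin variance scaling factors."""
    edges = np.quantile(sigma_pred, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover everything at predict time
    bin_idx = np.clip(np.searchsorted(edges, sigma_pred, side="right") - 1, 0, n_bins - 1)
    scales = np.ones(n_bins)
    for k in range(n_bins):
        mask = bin_idx == k
        if mask.any():
            z2 = ((y_true[mask] - y_pred[mask]) / sigma_pred[mask]) ** 2
            scales[k] = z2.mean()                  # variance scaling factor for bin k
    return edges, scales

def apply_bvs(sigma_pred, edges, scales):
    """Rescale predicted uncertainties with the fitted binwise factors."""
    bin_idx = np.clip(np.searchsorted(edges, sigma_pred, side="right") - 1, 0, len(scales) - 1)
    return sigma_pred * np.sqrt(scales[bin_idx])

# Toy usage: a model that is increasingly overconfident at large predicted uncertainty.
rng = np.random.default_rng(2)
sigma_pred = rng.uniform(0.1, 2.0, size=5000)
y_pred = np.zeros(5000)
y_true = rng.normal(0.0, sigma_pred * (1.0 + sigma_pred))
edges, scales = fit_bvs(y_true, y_pred, sigma_pred)
print("per-bin scaling factors:", np.round(scales, 2))
```

The X-based variant described in the abstract would keep the same fitting logic but define the bin edges on an input feature rather than on the predicted uncertainty.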
We call into question the recently popularized method of direct model editing as a means of correcting factual errors in LLM generations. We contrast model editing with three similar but distinct approaches that pursue better-defined objectives: (1) retrieval-based architectures, which decouple factual memory from the inference and linguistic capabilities embodied in LLMs; (2) concept erasure methods, which aim to prevent systemic bias in generated text; and (3) attribution methods, which aim to ground generations in identified textual sources. We argue that direct model editing cannot be trusted as a systematic remedy for the disadvantages inherent to LLMs, and that while it has proven potential for improving model explainability, it creates risks by reinforcing the notion that models can be trusted for factuality. We call for cautious promotion and application of model editing as part of the LLM deployment process, and for responsibly limiting the use cases of LLMs to those not relying on editing as a critical component.
We describe a polynomial time algorithm that takes as input a polygon with axis-parallel sides but irrational vertex coordinates, and outputs a set of as few rectangles as possible into which it can be dissected by axis-parallel cuts and translations. The number of rectangles is the rank of the Dehn invariant of the polygon. The same method can also be used to dissect an axis-parallel polygon into a simple polygon with the minimum possible number of edges. When rotations or reflections are allowed, we can approximate the minimum number of rectangles to within a factor of two.
Data-driven approximations of ordinary differential equations offer a promising alternative to classical methods of discovering a dynamical system model, particularly in complex systems lacking explicit first principles. This paper focuses on a complex system whose dynamics are described by a system of such equations coupled through a complex network. Numerous real-world systems, including financial, social, and neural systems, belong to this class of dynamical models. We propose essential elements for approximating these dynamical systems with neural networks, including the necessary biases and an appropriate neural architecture. Emphasizing the differences from static supervised learning, we advocate for evaluating generalization beyond the classical assumptions of statistical learning theory. To estimate confidence in predictions at inference time, we introduce a dedicated null model. By studying various complex network dynamics, we demonstrate that the neural approximations of dynamics generalize across complex network structures, sizes, and statistical properties of inputs. Our comprehensive framework enables accurate and reliable deep learning approximations of high-dimensional, nonlinear dynamical systems.
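One way to encode a shared-dynamics inductive bias of this kind (the specific architecture below is my assumption, not necessarily the one proposed in the paper) is to learn a self-term and a pairwise interaction term that are shared across all nodes and aggregated over the adjacency matrix:

```python
# Toy sketch: approximate coupled network dynamics of the form
#   dx_i/dt = f(x_i) + sum_j A_ij g(x_i, x_j)
# with two small neural networks shared across nodes. The sharing is what lets
# the learned dynamics transfer to graphs of other sizes and structures.
import torch
import torch.nn as nn

class NetworkDynamics(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.self_term = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.pair_term = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x, A):
        # x: (n_nodes, 1) node states, A: (n_nodes, n_nodes) adjacency matrix.
        n = x.shape[0]
        xi = x.unsqueeze(1).expand(n, n, 1)          # state of node i, repeated over j
        xj = x.unsqueeze(0).expand(n, n, 1)          # state of node j, repeated over i
        pair = self.pair_term(torch.cat([xi, xj], dim=-1)).squeeze(-1)  # (n, n)
        return self.self_term(x) + (A * pair).sum(dim=1, keepdim=True)  # dx/dt, (n, 1)

# The model would be trained to match observed derivatives (or finite
# differences of trajectories) and rolled out with any ODE integrator.
model = NetworkDynamics()
A = (torch.rand(10, 10) < 0.3).float()
x = torch.rand(10, 1)
print(model(x, A).shape)   # torch.Size([10, 1])
```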
A core principle in statistical learning is that smoothness of target functions makes it possible to break the curse of dimensionality. However, learning a smooth function seems to require enough samples close to one another to obtain meaningful estimates of high-order derivatives, which is hard in machine learning problems where the ratio between the number of samples and the input dimension is relatively small. By deriving new lower bounds on the generalization error, this paper formalizes this intuition, before investigating the role of constants and transitory regimes, which are usually not depicted in classical learning theory statements even though they play a dominant role in practice.
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically used objectives are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.
Compared with cheap addition operations, multiplication operations have much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions and thereby reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input features as the output response. The influence of this new similarity measure on the optimization of neural networks is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset without any multiplication in the convolution layers.
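To make the similarity measure concrete, the sketch below (an illustrative forward pass only; it is not the official AdderNet implementation and omits the full-precision backward rule and adaptive learning rates) computes the output response as the negative $\ell_1$ distance between each filter and each input patch:

```python
# Toy sketch of an adder layer: replace the cross-correlation of a convolution
# with the negative l1 distance between each filter and each input patch, so the
# forward pass uses only additions, subtractions, and absolute values.
import torch
import torch.nn.functional as F

def adder2d(x, weight, stride=1, padding=0):
    # x: (N, C_in, H, W), weight: (C_out, C_in, k, k)
    c_out, c_in, k, _ = weight.shape
    patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)   # (N, C_in*k*k, L)
    w = weight.view(c_out, -1)                                             # (C_out, C_in*k*k)
    # Output response = -sum |w - patch| over the patch dimension.
    out = -(patches.unsqueeze(1) - w.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)  # (N, C_out, L)
    h_out = (x.shape[2] + 2 * padding - k) // stride + 1
    w_out = (x.shape[3] + 2 * padding - k) // stride + 1
    return out.view(x.shape[0], c_out, h_out, w_out)

x = torch.randn(2, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)
print(adder2d(x, w, padding=1).shape)   # torch.Size([2, 16, 8, 8])
```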