清纯唯美另类亚洲欧美综合,AV乱码AV免费AⅤ成人,久久综合久久综合久久综合久久综合久久综合

Bayesian coresets speed up posterior inference in the large-scale data regime by approximating the full-data log-likelihood function with a surrogate log-likelihood based on a small, weighted subset of the data. But while Bayesian coresets and methods for construction are applicable in a wide range of models, existing theoretical analysis of the posterior inferential error incurred by coreset approximations only apply in restrictive settings -- i.e., exponential family models, or models with strong log-concavity and smoothness assumptions. This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of coreset approximations that reflect the full range of applicability of Bayesian coresets. The lower bounds require only mild model assumptions typical of Bayesian asymptotic analyses, while the upper bounds require the log-likelihood functions to satisfy a generalized subexponentiality criterion that is weaker than conditions used in earlier work. The lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and to provide a theoretical explanation for the previously-observed poor empirical performance of importance sampling-based construction methods. The upper bounds are used to analyze the performance of recent subsample-optimize methods. The flexibility of the theory is demonstrated in validation experiments involving multimodal, unidentifiable, heavy-tailed Bayesian posterior distributions.

相關內容

對數似然

關注 0

Lyapunov · Extensibility · Performer · Continuity · 近似 ·

2024 年 6 月 29 日

An extension of the low-rank Lyapunov ADI to non-zero initial values and its applications

Jonas Schulze,Jens Saak

from arxiv, v2: replace "instationary" by "nonstationary"

We derive the Alternating-Direction Implicit (ADI) method based on a commuting operator split and apply the results to the continuous time algebraic Lyapunov equation with low-rank constant term and approximate solution. Previously, it has been mandatory to start the low-rank ADI (LR-ADI) with an all-zero initial value. Our approach retains the known efficient iteration schemes of low-rank increments and residual to arbitrary low-rank initial values for the LR-ADI method. We further generalize some of the known properties of the LR-ADI for Lyapunov equations to larger classes of algorithms or problems. We investigate the performance of arbitrary initial values using two outer iterations in which LR-ADI is typically called. First, we solve an algebraic Riccati equation with the Newton method. Second, we solve a differential Riccati equation with a first-order Rosenbrock method. Numerical experiments confirm that the proposed new initial value of the alternating-directions implicit (ADI) can lead to a significant reduction in the total number of ADI steps, while also showing a 17% and 8x speed-up over the zero initial value for the two equation types, respectively.

貝葉斯推斷 · 推斷 · 邊緣化 · 似然 · 參數空間 ·

2024 年 6 月 28 日

Bayesian inference: More than Bayes's theorem

Thomas J. Loredo,Robert L. Wolpert

from arxiv, 35 pages, 11 figures; accepted for publication in Frontiers in Astronomy and Space Sciences (special issue for iid2022: Statistical Methods for Event Data - Illuminating the Dynamic Universe); fixed minor typo

Bayesian inference gets its name from *Bayes's theorem*, expressing posterior probabilities for hypotheses about a data generating process as the (normalized) product of prior probabilities and a likelihood function. But Bayesian inference uses all of probability theory, not just Bayes's theorem. Many hypotheses of scientific interest are *composite hypotheses*, with the strength of evidence for the hypothesis dependent on knowledge about auxiliary factors, such as the values of nuisance parameters (e.g., uncertain background rates or calibration factors). Many important capabilities of Bayesian methods arise from use of the law of total probability, which instructs analysts to compute probabilities for composite hypotheses by *marginalization* over auxiliary factors. This tutorial targets relative newcomers to Bayesian inference, aiming to complement tutorials that focus on Bayes's theorem and how priors modulate likelihoods. The emphasis here is on marginalization over parameter spaces -- both how it is the foundation for important capabilities, and how it may motivate caution when parameter spaces are large. Topics covered include the difference between likelihood and probability, understanding the impact of priors beyond merely shifting the maximum likelihood estimate, and the role of marginalization in accounting for uncertainty in nuisance parameters, systematic error, and model misspecification.

縮放 · Weight · 測試誤差 · Integration · Scaling Law ·

2024 年 6 月 28 日

Scaling laws for learning with real and surrogate data

Ayush Jain,Andrea Montanari,Eren Sasoglu

from arxiv, Added new experiments

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources, e.g. data collected under different circumstances or synthesized by generative models. We refer to such data as `surrogate data.' We introduce a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze mathematically this method under several classical statistical models, and validate our findings empirically on datasets from different domains. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution. Surprisingly, this can happen even when the surrogate data is unrelated to the original ones. We trace back this behavior to the classical Stein's paradox. $(ii)$ In order to reap the benefit of surrogate data, it is crucial to use optimally weighted ERM. $(iii)$ The test error of models trained on mixtures of real and surrogate data is approximately described by a scaling law. This scaling law can be used to predict the optimal weighting scheme, and to choose the amount of surrogate data to add.

MoDELS · 查準率/準確率 · 集成 · Performer · Performance ·

2024 年 6 月 28 日

Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation

Junsung Park,Hyunjung Shim

from arxiv, 5 pages, 5 figures, accepted in AAAI 2024 Edge Intelligence Workshop

Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model. Despite the impressive achievement in recent WSSS methods, we identify that introducing weak labels with high mean Intersection of Union (mIoU) does not guarantee high segmentation performance. Existing studies have emphasized the importance of prioritizing precision and reducing noise to improve overall performance. In the same vein, we propose ORANDNet, an advanced ensemble approach tailored for WSSS. ORANDNet combines Class Activation Maps (CAMs) from two different classifiers to increase the precision of pseudo-masks (PMs). To further mitigate small noise in the PMs, we incorporate curriculum learning. This involves training the segmentation model initially with pairs of smaller-sized images and corresponding PMs, gradually transitioning to the original-sized pairs. By combining the original CAMs of ResNet-50 and ViT, we significantly improve the segmentation performance over the single-best model and the naive ensemble model, respectively. We further extend our ensemble method to CAMs from AMN (ResNet-like) and MCTformer (ViT-like) models, achieving performance benefits in advanced WSSS models. It highlights the potential of our ORANDNet as a final add-on module for WSSS models.

相互獨立的 · 成對型 · Copulas · Nuance · 相關系數 ·

2024 年 6 月 27 日

On asymptotic independence in higher dimensions

Bikramjit Das,Vicky Fasen-Hartmann

from arxiv, 10 pages. A short version of the discussion between pairwise and mutual asymptotic independence mentioned in this paper appeared in a previous version of arXiv:2309.15511 (see v1), but has been subsequently removed. The current paper introduces this concept with additional ideas, notes and examples

In the study of extremes, the presence of asymptotic independence signifies that extreme events across multiple variables are probably less likely to occur together. Although well-understood in a bivariate context, the concept remains relatively unexplored when addressing the nuances of joint occurrence of extremes in higher dimensions. In this paper, we propose a notion of mutual asymptotic independence to capture the behavior of joint extremes in dimensions larger than two and contrast it with the classical notion of (pairwise) asymptotic independence. Furthermore, we define $k$-wise asymptotic independence which lies in between pairwise and mutual asymptotic independence. The concepts are compared using examples of Archimedean, Gaussian and Marshall-Olkin copulas among others. Notably, for the popular Gaussian copula, we provide explicit conditions on the correlation matrix for mutual asymptotic independence to hold; moreover, we are able to compute exact tail orders for various tail events.

INFORMS · PID · Principle · 可辨認的 · IB ·

2024 年 6 月 27 日

Partial information decomposition: redundancy as information bottleneck

Artemy Kolchinsky

from arxiv, Entropy, 2024

The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. Here, we show that this goal can be formulated as a type of information bottleneck (IB) problem, termed the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that best predict the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction--compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.

MoDELS · 估計/估計量 · 塊坐標下降 · 坐標下降 · INFORMS ·

2024 年 6 月 27 日

On scalable ARMA models

Yuchang Lin,Wenyu Li,Qianqian Zhu,Guodong Li

from arxiv, 67 pages, 3 figures, 7 tables

This paper considers both the least squares and quasi-maximum likelihood estimation for the recently proposed scalable ARMA model, a parametric infinite-order vector AR model, and their asymptotic normality is also established. It makes feasible the inference on this computationally efficient model, especially for economic and financial time series. An efficient block coordinate descent algorithm is further introduced to search for estimates, and a Bayesian information criterion with selection consistency is suggested for model selection. Simulation experiments are conducted to illustrate their finite sample performance, and a real application on six macroeconomic indicators illustrates the usefulness of the proposed methodology.

Principle · 時間步 · Integration · 動力系統 · 近似 ·

2024 年 6 月 27 日

Simpson's quadrature for a nonlinear variational symplectic scheme

Fran?ois Dubois,Juan Antonio Rojas-Quintero

from arxiv, arXiv admin note: text overlap with arXiv:2406.16423

We propose a variational symplectic numerical method for the time integration of dynamical systems issued from the least action principle. We assume a quadratic internal interpolation of the state between two time steps and we approximate the action in one time step by the Simpson's quadrature formula. The resulting scheme is nonlinear and symplectic. First numerical experiments concern a nonlinear pendulum and we have observed experimentally very good convergence properties.

方差 · Processing（編程語言） · Markov · 極大 · 得分 ·

2024 年 6 月 26 日

Demonic variance and a non-determinism score for Markov decision processes

Jakob Piribauer

from arxiv, This is the extended version of a conference paper accepted for publication at MFCS 2024

This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: For a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance of the values of X in two independent execution of M in which also the non-deterministic choices are resolved independently by two distinct schedulers. It is shown that the demonic variance is between 1 and 2 times as large as the maximal variance of X in M that can be achieved by a single scheduler. This allows defining a non-determinism score for M and X measuring how strongly the difference of X in two executions of M can be influenced by the non-deterministic choices. Properties of MDPs M with extremal values of the non-determinism score are established. Further, the algorithmic problems of computing the maximal variance and the demonic variance are investigated for two random variables, namely weighted reachability and accumulated rewards. In the process, also the structure of schedulers maximizing the variance and of scheduler pairs realizing the demonic variance is analyzed.

哈希學習 · 圖像檢索 · 卷積神經網絡 · 優化器 · 損失函數（機器學習） ·

2020 年 6 月 10 日

A survey on deep hashing for image retrieval

Xiaopeng Zhang

Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.