
The bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive because they require drawing a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly optimize a small set of high-quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
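For context, the naive i.i.d. bootstrap that the abstract uses as its baseline can be sketched as follows. This is only an illustration of the standard resampling procedure, not the centroid-optimization method proposed in the paper; the function name `bootstrap_std_error`, the toy data, and the number of resamples are assumptions made for this example.

```python
import random
import statistics


def bootstrap_std_error(data, statistic, num_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` with the naive i.i.d. bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(num_resamples):
        # Draw n points with replacement from the observed data.
        resample = rng.choices(data, k=n)
        replicates.append(statistic(resample))
    # The spread of the replicates approximates the sampling variability.
    return statistics.stdev(replicates)


# Hypothetical toy data, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
se = bootstrap_std_error(data, statistics.mean)
print(f"bootstrap standard error of the mean: {se:.3f}")
```

Each replicate requires recomputing the statistic on a fresh resample. When the statistic is a trained deep network, that cost is what makes a large number of i.i.d. resamples expensive.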

Related content

Bootstrap


SimPLe · Estimation/Estimator · Learned · Deep Learning · Performer

December 14, 2021

Calibrated and Sharp Uncertainties in Deep Learning via Simple Density Estimation

Volodymyr Kuleshov, Shachi Deshpande

Predictive uncertainties can be characterized by two properties: calibration and sharpness. This paper argues for reasoning about uncertainty in terms of these properties and proposes simple algorithms for enforcing them in deep learning. Our methods focus on the strongest notion of calibration, distribution calibration, and enforce it by fitting a low-dimensional density or quantile function with a neural estimator. The resulting approach is much simpler and more broadly applicable than previous methods across both classification and regression. Empirically, we find that our methods improve predictive uncertainties on several tasks with minimal computational and implementation overhead. Our insights suggest simple and improved ways of training deep learning models that lead to accurate uncertainties that should be leveraged to improve performance across downstream applications.

Reducible · Approximation · ResNet · Residual Network · Training Data

December 14, 2021

Adaptive Projected Residual Networks for Learning Parametric Maps from Sparse Data

Thomas O'Leary-Roseberry, Xiaosong Du, Anirban Chaudhuri, Joaquim R. R. A. Martins, Karen Willcox, Omar Ghattas

We present a parsimonious surrogate framework for learning high-dimensional parametric maps from limited training data. The need for parametric surrogates arises in many applications that require repeated queries of complex computational models. These applications include such "outer-loop" problems as Bayesian inverse problems, optimal experimental design, and optimal design and control under uncertainty, as well as real-time inference and control problems. Many high-dimensional parametric mappings admit low-dimensional structure, which can be exploited by mapping-informed reduced bases of the inputs and outputs. Exploiting this property, we develop a framework for learning low-dimensional approximations of such maps by adaptively constructing ResNet approximations between reduced bases of their inputs and outputs. Motivated by recent approximation theory for ResNets as discretizations of control flows, we prove a universal approximation property of our proposed adaptive projected ResNet framework, which motivates a related iterative algorithm for the ResNet construction. This strategy represents a confluence of the approximation theory and the algorithm since both make use of sequentially minimizing flows. In numerical examples we show that these parsimonious, mapping-informed architectures are able to achieve remarkably high accuracy given little training data, making them a desirable surrogate strategy to be implemented for minimal computational investment in training data generation.

Performer · Linear · Storage · Exact · Approximation

December 13, 2021

On Exact and Approximate Policies for Linear Tape Scheduling in Data Centers

Carlos H. Cardonha, Andre A. Cire, Lucas C. Villa Real

from arxiv, 32 pages, 6 tables, 8 figures

This paper investigates scheduling policies for file retrieval in linear storage devices, such as magnetic tapes. Tapes are the technology of choice for long-term storage in data centers due to their low cost per capacity, reliability, and data security. While scheduling problems associated with data retrieval in tapes are classical, existing works focus on more straightforward heuristic approaches due to limited computational times imposed by standard tape specifications. Our first contribution is a theoretical investigation of three standard policies, presenting their worst-case performance and special cases of practical relevance for which they are optimal. Next, we show that the problem is polynomially solvable via two interleaved recursive models, albeit with high computational complexity. We leverage our previous results to develop two new scheduling policies with constant-ratio performance and low computational cost. Finally, we investigate properties associated with the online variant of the problem, presenting a new constant-factor competitive algorithm. Our numerical analysis on synthetic and real-world tapes from an industry partner provides insights into dataset configurations where each policy is more effective, which is of relevance to data center managers. In particular, our new best-performing policy is practical for large datasets and significantly improves upon standard algorithms in the area.

Bootstrap · Networking · Machine Learning · Neural Networks · Learned

December 13, 2021

Neural Bootstrapper

Minsuk Shin, Hyungjoo Cho, Hyun-seok Min, Sungbin Lim

from arxiv, 19 pages, 13 figures. Accepted for NeurIPS 2021. Corresponding Author: Sungbin Lim

Bootstrapping has been a primary tool for ensemble and uncertainty quantification in machine learning and statistics. However, because it requires repeated training and resampling, bootstrapping deep neural networks is computationally burdensome; hence it is difficult to apply in practice to uncertainty estimation and related tasks. To overcome this computational bottleneck, we propose a novel approach called Neural Bootstrapper (NeuBoots), which learns to generate bootstrapped neural networks through single model training. NeuBoots injects the bootstrap weights into the high-level feature layers of the backbone network and outputs the bootstrapped predictions of the target, without additional parameters or repetitive computations from scratch. We apply NeuBoots to various machine learning tasks related to uncertainty quantification, including prediction calibration in image classification and semantic segmentation, active learning, and detection of out-of-distribution samples. Our empirical results show that NeuBoots outperforms other bagging-based methods at a much lower computational cost without losing the validity of bootstrapping.

Approximation · Scenario · CASE · Mutually Independent · Optimizer

December 13, 2021

Verified Approximation Algorithms

Robin Eßmann, Tobias Nipkow, Simon Robillard, Ujkan Sulejmani

We present the first formal verification of approximation algorithms for NP-complete optimization problems: vertex cover, independent set, set cover, center selection, load balancing, and bin packing. We uncover incompletenesses in existing proofs and improve the approximation ratio in one case. All proofs are uniformly invariant-based.
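As a concrete instance of the kind of algorithm listed in this abstract, here is a minimal sketch of the classic textbook 2-approximation for vertex cover, which repeatedly takes both endpoints of an uncovered edge. The abstract does not say which vertex cover algorithm was verified, so this is an illustrative assumption, and the function name `approx_vertex_cover` is hypothetical.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover at most twice the size of an optimal one.

    Loop invariant: every edge seen so far has at least one endpoint in `cover`.
    """
    cover = set()
    for u, v in edges:
        # An edge with no endpoint in the cover is uncovered: take both ends.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# Hypothetical example: the path graph 1 - 2 - 3 - 4.
edges = [(1, 2), (2, 3), (3, 4)]
cover = approx_vertex_cover(edges)
print(sorted(cover))
```

The edges that trigger an addition share no endpoints, so they form a matching. Any cover must contain at least one endpoint of each matched edge, which gives the factor of two.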

Proposal Distribution · QRS · Sample · Rejection Sampling · Discretization

December 10, 2021

Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs

Bryan Eikema, Germán Kruszewski, Hady Elsahar, Marc Dymetman

Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions. However, they do not provide a mechanism for obtaining exact samples from these distributions. Monte Carlo techniques can aid us in obtaining samples if some proposal distribution that we can easily sample from is available. For instance, rejection sampling can provide exact samples but is often difficult or impossible to apply due to the need to find a proposal distribution that upper-bounds the target distribution everywhere. Approximate Markov chain Monte Carlo sampling techniques like Metropolis-Hastings are usually easier to design, exploiting a local proposal distribution that performs local edits on an evolving sample. However, these techniques can be inefficient due to the local nature of the proposal distribution and do not provide an estimate of the quality of their samples. In this work, we propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality, while providing explicit convergence bounds and diagnostics. QRS capitalizes on the availability of high-quality global proposal distributions obtained from deep learning models. We demonstrate the effectiveness of QRS sampling for discrete EBMs over text for the tasks of controlled text generation with distributional constraints and paraphrase generation. We show that we can sample from such EBMs with arbitrary precision at the cost of sampling efficiency.
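The classical rejection sampling that the abstract contrasts with QRS can be sketched for a small discrete distribution as below. This is plain rejection sampling, not QRS itself. The names `rejection_sample`, `target_weights`, `proposal_probs`, and `bound`, and the toy distribution, are assumptions made for this example. The sketch requires `bound * proposal_probs[x] >= target_weights[x]` for every outcome, which is the upper-bound condition the abstract describes as hard to satisfy in practice.

```python
import random


def rejection_sample(target_weights, proposal_probs, bound, rng):
    """Draw one exact sample from the distribution proportional to target_weights.

    Assumes target_weights[x] <= bound * proposal_probs[x] for every outcome x.
    """
    outcomes = list(proposal_probs)
    weights = [proposal_probs[x] for x in outcomes]
    while True:
        # Propose from the easy-to-sample distribution q.
        x = rng.choices(outcomes, weights=weights, k=1)[0]
        # Accept with probability P(x) / (bound * q(x)), which is at most 1.
        if rng.random() < target_weights[x] / (bound * proposal_probs[x]):
            return x


# Hypothetical unnormalized target and uniform proposal.
target_weights = {"a": 6.0, "b": 3.0, "c": 1.0}
proposal_probs = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
bound = 18.0  # max over x of target_weights[x] / proposal_probs[x]

rng = random.Random(0)
samples = [rejection_sample(target_weights, proposal_probs, bound, rng)
           for _ in range(20000)]
freq_a = samples.count("a") / len(samples)
print(f"empirical frequency of 'a': {freq_a:.3f}")
```

In this toy case the expected acceptance rate is the target's normalizer divided by the bound, here 10/18. A loose bound therefore wastes proposals, which is the efficiency side of the trade-off the abstract refers to.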

Monte Carlo · Monte Carlo Method · Variance Reduction · Approximation · Sample

December 10, 2021

Frozen Gaussian Sampling: A Mesh-free Monte Carlo Method For Approximating Semiclassical Schrödinger Equations

Yantong Xie, Zhennan Zhou

from arxiv, 41 pages, 7 figures

In this paper, we develop a Monte Carlo algorithm named Frozen Gaussian Sampling (FGS) to solve the semiclassical Schrödinger equation based on the frozen Gaussian approximation. Due to the highly oscillatory structure of the wave function, traditional mesh-based algorithms suffer from "the curse of dimensionality", which gives rise to a more severe computational burden when the semiclassical parameter $\varepsilon$ is small. FGS outperforms the existing algorithms in that it is mesh-free in computing the physical observables and is suitable for high-dimensional problems. In this work, we provide detailed procedures to implement FGS for both Gaussian and WKB initial data cases, where the sampling strategies on the phase space balance the need for variance reduction and sampling convenience. Moreover, we rigorously prove that, to reach a certain accuracy, the number of samples needed for FGS is independent of the scaling parameter $\varepsilon$. Furthermore, the complexity of the FGS algorithm scales sublinearly with respect to the microscopic degrees of freedom and, in particular, is insensitive to the dimension number. The performance of FGS is validated through several typical numerical experiments, including simulating scattering by the barrier potential, formation of caustics, and computing high-dimensional physical observables without a mesh.

Manifold · Approximation · Data Points · Linear · Curse of Dimensionality

March 7, 2019

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Barak Sober, David Levin

In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there has been a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.

Identifiable · Row · Statistic · Microarray Data · Similarity Measure

September 13, 2018

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

Amichai Painsky

The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem for gene expression data sets, in which each row can only be a member of a single bicluster while columns can participate in multiple ones. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters through a combination of existing biclustering algorithms and combinatorial auction techniques. We devise an approach for tuning the threshold for our algorithm based on comparison to a null model in the spirit of the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span non-overlapping row submatrices, while considering their unique nature. The Gap statistic approach succeeds in identifying appropriate thresholds in all our examples.

Approximation · INFORMS · SimPLe · Rank · Linear

January 2, 2018

Practical sketching algorithms for low-rank matrix approximation

Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.