Many data-science problems can be formulated as an inverse problem, where the parameters are estimated by minimizing a proper loss function. When complicated black-box models are involved, derivative-free optimization tools are often needed. The ensemble Kalman filter (EnKF) is a particle-based derivative-free Bayesian algorithm originally designed for data assimilation. Recently, it has been applied to inverse problems for computational efficiency. The resulting algorithm, known as ensemble Kalman inversion (EKI), involves running an ensemble of particles with EnKF update rules so they can converge to a minimizer. In this article, we investigate EKI convergence in general nonlinear settings. To improve convergence speed and stability, we consider applying EKI with non-constant step-sizes and covariance inflation. We prove that EKI can hit critical points with finite steps in non-convex settings. We further prove that EKI converges to the global minimizer polynomially fast if the loss function is strongly convex. We verify the analysis presented with numerical experiments on two inverse problems.
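The update that drives the particles can be sketched in a few lines. The following is a minimal sketch of the basic EKI step as it is commonly written, not the exact scheme analyzed in the article: the function name `eki_step`, its arguments, and the way the step size enters (by scaling the noise covariance) are assumptions made for illustration, and covariance inflation is left out.

```python
import numpy as np

def eki_step(ensemble, forward, y, gamma, step):
    """One basic ensemble Kalman inversion update (illustrative sketch).

    ensemble: (J, d) array, one particle per row
    forward:  black-box map from a (d,) parameter to a (k,) prediction
    y:        (k,) observed data
    gamma:    (k, k) observation-noise covariance
    step:     positive step size; may change between calls
    """
    preds = np.array([forward(u) for u in ensemble])   # (J, k)
    du = ensemble - ensemble.mean(axis=0)               # parameter deviations
    dg = preds - preds.mean(axis=0)                     # prediction deviations
    n_particles = ensemble.shape[0]
    c_ug = du.T @ dg / n_particles                      # (d, k) cross-covariance
    c_gg = dg.T @ dg / n_particles                      # (k, k) prediction covariance
    s = c_gg + gamma / step                             # symmetric positive definite
    # Kalman-type correction; no derivative of `forward` is evaluated.
    correction = np.linalg.solve(s, (y - preds).T).T @ c_ug.T
    return ensemble + correction
```

Calling `eki_step` repeatedly, possibly with a different `step` at each iteration, moves the ensemble toward parameters whose predictions match `y`, using only evaluations of `forward`.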

Related content

Loss function (machine learning)

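A loss function, also called a distance function or metric function in AI, measures an abstract notion of distance: the error between the true data and the predicted data. It quantifies how much the model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the better the model's robustness. The loss function is the core of the empirical risk function and an important component of the structural risk function.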
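As a concrete instance of L(Y, f(x)), here is a minimal sketch that uses the squared error averaged over the samples, which also gives the empirical risk for that loss. The choice of squared error and the name `squared_loss` are illustrative assumptions; the description above does not single out a particular loss.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Mean squared error between true values Y and predictions f(x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Non-negative by construction; zero only when predictions match exactly.
    return float(np.mean((y_true - y_pred) ** 2))
```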

Performer · Ground truth · Criterion · Generalization theory · Model selection ·

December 14, 2021

Convex transport potential selection with semi-dual criterion

Adrien Vacher, François-Xavier Vialard

Over the past few years, numerous computational models have been developed to solve Optimal Transport (OT) in a stochastic setting, where distributions are represented by samples. In such situations, the goal is to find a transport map that has good generalization properties on unseen data, ideally the closest map to the ground truth, unknown in practical settings. However, in the absence of ground truth, no quantitative criterion has been put forward to measure its generalization performance although it is crucial for model selection. We propose to leverage the Brenier formulation of OT to perform this task. Theoretically, we show that this formulation guarantees that, up to a distortion parameter that depends on the smoothness/strong convexity and a statistical deviation term, the selected map achieves the lowest quadratic error to the ground truth. This criterion, estimated via convex optimization, enables parameter and model selection among entropic regularization of OT, input convex neural networks and smooth and strongly convex nearest-Brenier (SSNB) models. Finally, we conduct an experiment questioning the use of OT in domain adaptation. Thanks to the criterion, we can identify the potential that is closest to the true OT map between the source and the target, and we observe that this selected potential is not the one that performs best for the downstream transfer classification task.

Estimation/Estimator · Importance sampling · Kalman filter · Unbiased estimation · Mixture distribution ·

December 14, 2021

The ensemble Kalman filter for rare event estimation

Fabian Wagner, Iason Papaioannou, Elisabeth Ullmann

We present a novel sampling-based method for estimating probabilities of rare or failure events. Our approach is founded on the Ensemble Kalman filter (EnKF) for inverse problems. Therefore, we reformulate the rare event problem as an inverse problem and apply the EnKF to generate failure samples. To estimate the probability of failure, we use the final EnKF samples to fit a distribution model and apply Importance Sampling with respect to the fitted distribution. This leads to an unbiased estimator if the density of the fitted distribution admits positive values within the whole failure domain. To handle multi-modal failure domains, we localise the covariance matrices in the EnKF update step around each particle and fit a mixture distribution model in the Importance Sampling step. For affine linear limit-state functions, we investigate the continuous-time limit and large time properties of the EnKF update. We prove that the mean of the particles converges to a convex combination of the most likely failure point and the mean of the optimal Importance Sampling density if the EnKF is applied without noise. We provide numerical experiments to compare the performance of the EnKF with Sequential Importance Sampling.
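The importance-sampling step can be illustrated in isolation. The following is a minimal one-dimensional sketch, assuming a standard normal nominal distribution and a hand-picked Gaussian proposal N(mu, 1) in place of a distribution fitted to EnKF samples; the function name and its parameters are hypothetical, and the EnKF stage itself is omitted.

```python
import numpy as np

def importance_sampling_failure_prob(limit_state, mu, n_samples, seed=0):
    """Estimate P(limit_state(X) <= 0) for X ~ N(0, 1) with proposal N(mu, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mu, scale=1.0, size=n_samples)
    # Likelihood ratio of the nominal density to the proposal density
    # (the normalizing constants cancel because both have unit variance).
    weights = np.exp(-0.5 * x**2 + 0.5 * (x - mu) ** 2)
    fails = limit_state(x) <= 0.0          # indicator of the failure domain
    return float(np.mean(weights * fails))
```

Because the Gaussian proposal has positive density on the whole failure domain, the estimator is unbiased, which matches the condition stated in the abstract. For example, `importance_sampling_failure_prob(lambda x: 3.0 - x, mu=3.5, n_samples=100_000)` estimates the probability that a standard normal variable exceeds 3.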

Extensibility · Approximation · Cost function · state-of-the-art · Black box ·

December 13, 2021

Reverse image filtering using total derivative approximation and accelerated gradient descent

Fernando J. Galetto, Guang Deng

In this paper, we address a new problem of reversing the effect of an image filter, which can be linear or nonlinear. The assumption is that the algorithm of the filter is unknown and the filter is available as a black box. We formulate this inverse problem as minimizing a local patch-based cost function and use total derivative to approximate the gradient which is used in gradient descent to solve the problem. We analyze factors affecting the convergence and quality of the output in the Fourier domain. We also study the application of accelerated gradient descent algorithms in three gradient-free reverse filters, including the one proposed in this paper. We present results from extensive experiments to evaluate the complexity and effectiveness of the proposed algorithm. Results demonstrate that the proposed algorithm outperforms the state-of-the-art in that (1) it is at the same level of complexity as that of the fastest reverse filter, but it can reverse a larger number of filters, and (2) it can reverse the same list of filters as that of the very complex reverse filter, but its complexity is much smaller.

Neural Networks · Optimizer · Networking · Objective function · MoDELS ·

December 13, 2021

Acceleration techniques for optimization over trained neural network ensembles

Keliang Wang, Leonardo Lozano, Carlos Cardonha, David Bergman

from arxiv, 17 pages, 4 tables, 2 figures

We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit (ReLU) activation. Recent literature has explored the use of a single neural network to model either uncertain or complex elements within an objective function. However, it is well known that ensembles of neural networks produce more stable predictions and have better generalizability than models with single neural networks, which suggests the application of ensembles of neural networks in a decision-making pipeline. We study how to incorporate a neural network ensemble as the objective function of an optimization model and explore computational approaches for the ensuing problem. We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network. We develop two acceleration techniques for our model: the first is a preprocessing procedure to tighten bounds for critical neurons in the neural network, while the second is a set of valid inequalities based on Benders decomposition. Experimental evaluations of our solution methods are conducted on one global optimization problem and two real-world data sets; the results suggest that our optimization algorithm outperforms the adaptation of a state-of-the-art approach in terms of computational time and optimality gaps.
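For reference, the standard big-$M$ encoding of a single ReLU neuron reads as follows. This is the textbook formulation, given as a sketch under stated assumptions and not necessarily the exact model used in the paper: the pre-activation is $a = w^\top x + b$, the bounds $L \le a \le U$ with $L < 0 < U$ are assumed known, and $z$ is a binary indicator that the neuron is active.

$$
\begin{aligned}
y &\ge w^\top x + b, \\
y &\ge 0, \\
y &\le w^\top x + b - L\,(1 - z), \\
y &\le U z, \\
z &\in \{0, 1\}.
\end{aligned}
$$

With $z = 1$ the constraints force $y = w^\top x + b \ge 0$, and with $z = 0$ they force $y = 0$ and $w^\top x + b \le 0$, so $y = \max(0, w^\top x + b)$ in both cases. Tighter values of $L$ and $U$ give a tighter linear relaxation, which is why bound-tightening preprocessing helps.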

Subsampling · Estimation/Estimator · Inference · Statistic · Confidence ·

December 13, 2021

Scalable subsampling: computation, aggregation and inference

Dimitris N. Politis

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat \theta _n$ in order to conduct nonparametric inference such as the construction of confidence intervals and hypothesis tests. Subsampling has seen a resurgence in the Big Data era where the standard, full-resample size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size $b$ can be computationally challenging with both $b$ and the sample size $n$ being very large. In the paper at hand, we show how a set of appropriately chosen, non-random subsamples can be used to conduct effective -- and computationally feasible -- distribution estimation via subsampling. Further, we show how the same set of subsamples can be used to yield a procedure for subsampling aggregation -- also known as subagging -- that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same (or better) rate of convergence as compared to $\hat \theta _n$. The paper is concluded by showing how to conduct inference, e.g., confidence intervals, based on the scalable subagging estimator instead of the original $\hat \theta _n$.
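To make the idea of non-random subsamples concrete, here is a minimal sketch that assumes the subsamples are blocks of `b` consecutive observations whose starting points are `h` apart. The abstract does not specify the construction, so this block layout and the names `block_subsamples` and `subagging` are illustrative assumptions.

```python
import numpy as np

def block_subsamples(x, b, h):
    """Yield deterministic blocks x[s : s + b] with starts s = 0, h, 2h, ..."""
    n = len(x)
    for start in range(0, n - b + 1, h):
        yield x[start:start + b]

def subagging(x, statistic, b, h):
    """Average `statistic` over the deterministic blocks.

    Returns the aggregated estimate and the per-block values; the latter
    can serve as an empirical approximation of the sampling distribution.
    """
    values = np.array([statistic(block) for block in block_subsamples(x, b, h)])
    return float(values.mean()), values
```

With `h = b` the blocks do not overlap, so the statistic is evaluated on about `n / b` blocks of size `b` and is never computed on the full sample.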

Forward-backward algorithm · Momentum · Nesterov momentum · Functional · Objective function ·

December 11, 2021

Convergence Rate Analysis of Accelerated Forward-Backward Algorithm with Generalized Nesterov Momentum Scheme

Yizun Lin, Si Li, Yunzhong Zhang

from arxiv, 20 pages, 3 figures

Nesterov's accelerated forward-backward algorithm (AFBA) is an efficient algorithm for solving a class of two-term convex optimization models consisting of a differentiable function with a Lipschitz continuous gradient plus a nondifferentiable function with a closed form of its proximity operator. It has been shown that the iterative sequence generated by AFBA with a modified Nesterov's momentum scheme converges to a minimizer of the objective function with an $o\left(\frac{1}{k^2}\right)$ convergence rate in terms of the function value (FV-convergence rate) and an $o\left(\frac{1}{k}\right)$ convergence rate in terms of the distance between consecutive iterates (DCI-convergence rate). In this paper, we propose a more general momentum scheme with an introduced power parameter $\omega\in(0,1]$ and show that AFBA with the proposed momentum scheme converges to a minimizer of the objective function with an $o\left(\frac{1}{k^{2\omega}}\right)$ FV-convergence rate and an $o\left(\frac{1}{k^{\omega}}\right)$ DCI-convergence rate. The generality of the proposed momentum scheme provides us a variety of parameter selections for different scenarios, which makes the resulting algorithm more flexible to achieve better performance. We then employ AFBA with the proposed momentum scheme to solve the smoothed hinge loss $\ell_1$-support vector machine model. Numerical results demonstrate that the proposed generalized momentum scheme outperforms two existing momentum schemes.
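For orientation, the following is a minimal sketch of an accelerated forward-backward iteration on an $\ell_1$-regularized least-squares problem. It uses the classical FISTA momentum sequence, not the generalized scheme with power parameter $\omega$ proposed in the paper, whose update is not given in the abstract. The problem choice and function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (the backward step)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_forward_backward(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with classical momentum."""
    lipschitz = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()                               # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)               # forward (gradient) step at y
        x_new = soft_threshold(y - grad / lipschitz, lam / lipschitz)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The smooth term is handled by the gradient step and the nondifferentiable term by its closed-form proximity operator, which is the two-term structure the abstract describes.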

Frequentist school · Contraction · MoDELS · Maximum a posteriori · Lipschitz ·

December 10, 2021

Laplace priors and spatial inhomogeneity in Bayesian inverse problems

Sergios Agapiou, Sven Wang

Spatially inhomogeneous functions, which may be smooth in some regions and rough in other regions, are modelled naturally in a Bayesian manner using so-called Besov priors which are given by random wavelet expansions with Laplace-distributed coefficients. This paper studies theoretical guarantees for such prior measures: specifically, we examine their frequentist posterior contraction rates in the setting of non-linear inverse problems with Gaussian white noise. Our results are first derived under a general local Lipschitz assumption on the forward map. We then verify the assumption for two non-linear inverse problems arising from elliptic partial differential equations, the Darcy flow model from geophysics as well as a model for the Schrödinger equation appearing in tomography. In the course of the proofs, we also obtain novel concentration inequalities for penalized least squares estimators with $\ell^1$ wavelet penalty, which have a natural interpretation as maximum a posteriori (MAP) estimators. The true parameter is assumed to belong to some spatially inhomogeneous Besov class $B^{\alpha}_{11}$, $\alpha>0$. In a setting with direct observations, we complement these upper bounds with a lower bound on the rate of contraction for arbitrary Gaussian priors. An immediate consequence of our results is that while Laplace priors can achieve minimax-optimal rates over $B^{\alpha}_{11}$-classes, Gaussian priors are limited to a (by a polynomial factor) slower contraction rate. This gives information-theoretical justification for the intuition that Laplace priors are more compatible with $\ell^1$ regularity structure in the underlying parameter.

Mutually independent · Linear · TG · Proposal distribution · Functional ·

December 10, 2021

Solving linear Bayesian inverse problems using a fractional total variation-Gaussian (FTG) prior and transport map

Zejun Sun, Guang-Hui Zheng

Bayesian inference is widely used in many scientific and engineering problems, especially in linear inverse problems in the infinite-dimensional setting, where the unknowns are functions. In such problems, choosing an appropriate prior distribution is an important task. In particular, when the function to infer has much detailed information, such as sharp jumps, corners, and discontinuous or nonsmooth oscillations, the so-called total variation-Gaussian (TG) prior has been proposed in function space to address it. However, the TG prior tends to produce a blocky (staircase) effect in numerical results. In this work, we present a fractional order-TG (FTG) hybrid prior to deal with such problems, where the fractional order total variation (FTV) term is used to capture the detailed information of the unknowns while the Gaussian measure ensures a well-defined posterior measure. For the numerical implementation of linear inverse problems in function spaces, we also propose an efficient independence sampler based on a transport map, which uses a proposal distribution derived from a diagonal map, and the acceptance probability associated with the proposal is independent of the discretization dimensionality. In order to take full advantage of the transport map, the hierarchical Bayesian framework is applied to flexibly determine the regularization parameter. Finally, we provide some numerical examples to demonstrate the performance of the FTG prior and the efficiency and robustness of the proposed independence sampler method.

Reducible · MoDELS · Performer · state-of-the-art · Contraction ·

December 9, 2021

Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction

Hyungjin Chung, Byeongsu Sim, Jong Chul Ye

Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, their application to inverse problems has demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside: they are inherently slow to sample from, needing a few thousand steps of iteration to generate images from pure Gaussian noise. In this work, we show that starting from Gaussian noise is unnecessary. Instead, starting from a single forward diffusion with better initialization significantly reduces the number of sampling steps in the reverse conditional diffusion. This phenomenon is formally explained by the contraction theory of stochastic difference equations like our conditional diffusion strategy: the alternating applications of reverse diffusion followed by a non-expansive data consistency step. The new sampling strategy, dubbed Come-Closer-Diffuse-Faster (CCDF), also reveals a new insight on how the existing feed-forward neural network approaches for inverse problems can be synergistically combined with the diffusion models. Experimental results with super-resolution, image inpainting, and compressed sensing MRI demonstrate that our method can achieve state-of-the-art reconstruction performance with significantly fewer sampling steps.

Better · Reinforcement learning · Learned · Performer · Optimization ·

April 24, 2018

Accelerated Reinforcement Learning

K. Lakshmanan

from arxiv, The proof is not complete as it has to be shown the algorithm tracks the ODE

Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems. This has recently been extended to non-convex and also stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show the convergence using the ODE method. We tested this algorithm on a scheduling problem, in which an incoming job is scheduled into one of four queues based on the queue lengths. The experimental results show that the algorithm using Nesterov's acceleration performs significantly better than algorithms that do not use acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with the actor-critic algorithm.