
The classical Ka\v{c}anov scheme for the solution of nonlinear variational problems can be interpreted as a fixed point iteration method that updates a given approximation by solving a linear problem in each step. Based on this observation, we introduce a modified Ka\v{c}anov method that allows for (adaptive) damping and thereby enables a new convergence analysis under more general assumptions and for a wider range of applications. For instance, in the specific context of quasilinear diffusion models, our new approach no longer requires a standard monotonicity condition on the nonlinear diffusion coefficient. Moreover, we propose two different adaptive strategies for the practical selection of the damping parameters involved.
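As a purely illustrative sketch of this idea, the following one-dimensional finite-difference code applies a damped Ka\v{c}anov iteration with a fixed damping parameter to a model quasilinear problem; the diffusion coefficient, discretization, and damping rule are our own assumptions, not those analyzed in the paper.

```python
# A minimal 1D finite-difference sketch of a damped Kacanov iteration for the
# quasilinear model -(mu(|u'|^2) u')' = f on (0,1) with u(0)=u(1)=0.
# The coefficient mu and the fixed damping delta below are illustrative
# assumptions, not the paper's (adaptive) choices.
import numpy as np

def damped_kacanov(f, mu, n=200, delta=0.5, iters=50, tol=1e-10):
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    u = np.zeros(n + 1)                      # initial guess
    rhs = f(x[1:-1])
    for _ in range(iters):
        du = np.diff(u) / h                  # gradient on each cell
        a = mu(du**2)                        # frozen (linearized) coefficient
        # assemble the tridiagonal stiffness matrix for -(a u')'
        main = (a[:-1] + a[1:]) / h**2
        off = -a[1:-1] / h**2
        A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
        u_tilde = np.zeros(n + 1)
        u_tilde[1:-1] = np.linalg.solve(A, rhs)
        step = u_tilde - u
        u = u + delta * step                 # damped fixed-point update
        if np.linalg.norm(step) * np.sqrt(h) < tol:
            break
    return x, u

# usage: a bounded diffusion coefficient and a constant load
x, u = damped_kacanov(lambda x: np.ones_like(x),
                      lambda s: 1.0 + 1.0 / (1.0 + s))
```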

Related content

Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, an FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
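The following toy sketch is only meant to illustrate the outer halving loop of progressive distillation; the "sampler" (a network mapping a latent and a time directly to the next latent), the loss, and all hyperparameters are simplifying assumptions rather than the parameterizations proposed in the paper.

```python
# A schematic sketch of the progressive-distillation outer loop: a student is
# trained to reproduce two deterministic teacher sampling steps in one, then
# becomes the teacher, halving the step count each round.
import copy
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))
    def forward(self, z, t):
        # t is a scalar in [0, 1], broadcast onto the batch
        tcol = torch.full((z.shape[0], 1), float(t))
        return self.net(torch.cat([z, tcol], dim=1))

def distill_one_round(teacher, n_steps, batches=200, batch_size=128, lr=1e-3):
    """Train a student taking n_steps/2 steps to match a teacher taking n_steps."""
    student = copy.deepcopy(teacher)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(batches):
        z = torch.randn(batch_size, 2)                    # start from noise
        i = torch.randint(0, n_steps // 2, (1,)).item()   # a student step index
        t = 1.0 - i / (n_steps // 2)
        dt_teacher = 1.0 / n_steps
        with torch.no_grad():                             # two teacher steps
            mid = teacher(z, t)
            target = teacher(mid, t - dt_teacher)
        loss = ((student(z, t) - target) ** 2).mean()     # one student step
        opt.zero_grad(); loss.backward(); opt.step()
    return student

# usage: start from a (pretrained; here an untrained toy) 8-step sampler, halve twice
model, steps = ToyDenoiser(), 8
for _ in range(2):
    model = distill_one_round(model, steps)
    steps //= 2
```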

Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce the variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There, our approach accelerates training by up to an order of magnitude.
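As a small illustration of one ingredient, the sketch below solves a kernel system $(K+\sigma^2 I)x=y$ with preconditioned CG, using a partial (pivoted) Cholesky factor plus the noise term as the preconditioner; the kernel, rank, and tolerances are illustrative choices, not the paper's settings.

```python
# Preconditioned CG for (K + sigma^2 I) x = y with a rank-k pivoted-Cholesky
# preconditioner P = L L^T + sigma^2 I, applied through the Woodbury identity.
import numpy as np

def pivoted_cholesky(K, rank):
    n = K.shape[0]
    L, d = np.zeros((n, rank)), np.diag(K).copy()
    for i in range(rank):
        j = int(np.argmax(d))                 # greedy pivot on residual diagonal
        L[:, i] = (K[:, j] - L[:, :i] @ L[j, :i]) / np.sqrt(d[j])
        d -= L[:, i] ** 2
    return L                                  # K is approximately L @ L.T

def pcg(Khat, y, apply_Pinv, tol=1e-8, maxiter=500):
    x = np.zeros_like(y)
    r = y - Khat @ x
    z = apply_Pinv(r)
    p = z.copy()
    for _ in range(maxiter):
        Kp = Khat @ p
        alpha = (r @ z) / (p @ Kp)
        x += alpha * p
        r_new = r - alpha * Kp
        if np.linalg.norm(r_new) < tol * np.linalg.norm(y):
            break
        z_new = apply_Pinv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# toy RBF kernel matrix with noise
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K, sigma2 = np.exp(-0.5 * sq), 1e-2
Khat = K + sigma2 * np.eye(len(X))

L = pivoted_cholesky(K, rank=30)              # preconditioner P = L L^T + sigma^2 I
M = np.linalg.inv(sigma2 * np.eye(L.shape[1]) + L.T @ L)
apply_Pinv = lambda v: (v - L @ (M @ (L.T @ v))) / sigma2   # Woodbury identity

x = pcg(Khat, rng.standard_normal(len(X)), apply_Pinv)
```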

We consider the Cauchy problem for a second-order nonlinear evolution equation in a Hilbert space. This equation represents the abstract generalization of the Ball integro-differential equation. The general nonlinear case is considered with respect to the terms of the equation that involve the square of a norm of a gradient. A three-layer semi-discrete scheme is proposed in order to find an approximate solution. In this scheme, the approximation of the nonlinear terms that depend on the gradient is carried out by using an integral mean. We show that the solution of the nonlinear discrete problem, together with the corresponding difference analogue of its first-order derivative, is uniformly bounded. For the solution of the corresponding linear discrete problem, high-order \textit{a priori} estimates are obtained by using two-variable Chebyshev polynomials. Based on these estimates, we prove the stability of the nonlinear discrete problem. For smooth solutions, we provide error estimates for the approximate solution. An iteration method is applied in order to find an approximate solution at each temporal step. The convergence of the iteration process is proved.
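As a purely illustrative model case (our notation, not necessarily the exact abstract equation or scheme analyzed in the paper), a Kirchhoff-type equation $u'' + M(\|A^{1/2}u\|^2)\,Au = f$ can be discretized by a three-layer scheme in which the nonlinear coefficient is evaluated through an integral mean over the previous layers rather than pointwise:
\[
\frac{u_{k+1}-2u_k+u_{k-1}}{\tau^2}
+\Big(\int_0^1 M\big(\|A^{1/2}(s\,u_k+(1-s)\,u_{k-1})\|^2\big)\,ds\Big)\,
A\,\frac{u_{k+1}+u_{k-1}}{2}=f_k .
\]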

Bayesian bandit algorithms with approximate inference have been widely used in practice with superior performance. Yet few studies are available regarding a fundamental understanding of their performance. In this paper, we propose a Bayesian bandit algorithm, which we call Generalized Bayesian Upper Confidence Bound (GBUCB), for bandit problems in the presence of approximate inference. Our theoretical analysis demonstrates that in the Bernoulli multi-armed bandit, GBUCB can achieve $O(\sqrt{T}(\log T)^c)$ frequentist regret if the inference error measured by symmetrized Kullback-Leibler divergence is controllable. This analysis relies on a novel sensitivity analysis for quantile shifts with respect to inference errors. To the best of our knowledge, our work provides the first theoretical regret bound that is better than $o(T)$ in the setting of approximate inference. Our experimental evaluations on multiple approximate inference settings corroborate our theory, showing that our GBUCB is consistently superior to BUCB and Thompson sampling.
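A minimal Bayes-UCB sketch for the Bernoulli case is given below with exact Beta posteriors; GBUCB's generalization widens the quantile to account for approximate-inference error, and the quantile schedule used here is only a standard illustrative stand-in.

```python
# Bayesian upper-confidence-bound play for a Bernoulli bandit with Beta(1,1)
# priors: pull the arm whose posterior upper quantile is largest.
import numpy as np
from scipy.stats import beta

def bayes_ucb(true_means, horizon=5000, c=3, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    a, b = np.ones(k), np.ones(k)            # Beta(1, 1) posteriors
    regret = 0.0
    for t in range(1, horizon + 1):
        level = 1.0 - 1.0 / (t * np.log(horizon) ** c)   # shrinking tail mass
        ucb = beta.ppf(level, a, b)                       # posterior quantiles
        arm = int(np.argmax(ucb))
        reward = float(rng.random() < true_means[arm])
        a[arm] += reward
        b[arm] += 1.0 - reward
        regret += max(true_means) - true_means[arm]
    return regret

print(bayes_ucb([0.3, 0.5, 0.55]))
```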

Second-order optimization methods are among the most widely used approaches for convex optimization problems, and have recently been used to optimize non-convex problems such as deep learning models. Widely used second-order methods such as quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation yields a poor approximation of the Newton step because it relies only on first-order derivative information. In this study, we propose an approximate Newton sketch-based stochastic optimization algorithm for large-scale empirical risk minimization. Specifically, we compute a partial column Hessian of size ($d\times m$) with $m\ll d$ randomly selected variables, then use the \emph{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. We then integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradient methods. The results of numerical experiments on both convex and non-convex functions show that the proposed approach obtains a better approximation of Newton's method, exhibiting performance competitive with that of state-of-the-art first-order and stochastic quasi-Newton methods. Furthermore, we provide a theoretical convergence analysis for convex functions.
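The sketch below illustrates the column-sampled Nystr\"om Hessian idea on regularized logistic regression: $m$ exact Hessian columns are formed, and a damped approximate Newton step is applied to the gradient via the Woodbury identity without ever forming the $d\times d$ Hessian. The damping $\rho$, column count $m$, and the plain (non-stochastic) outer loop are illustrative simplifications, not the paper's algorithm.

```python
# Nystrom-type partial-Hessian Newton step for regularized logistic regression.
import numpy as np

def logistic_grad_hess_cols(w, X, y, lam, idx):
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    g = X.T @ (p - y) / n + lam * w
    D = p * (1.0 - p)                              # Hessian = X^T diag(D) X / n + lam I
    C = X.T @ (D[:, None] * X[:, idx]) / n         # m exact Hessian columns (d x m)
    C[idx, np.arange(len(idx))] += lam
    return g, C

def nystrom_newton_step(g, C, idx, rho=0.5):
    W = C[idx, :]                                  # m x m principal block
    # (rho I + C W^+ C^T)^{-1} g via Woodbury, assuming W is invertible;
    # rho is chosen conservatively so directions outside the sketch stay stable
    inner = np.linalg.solve(rho * W + C.T @ C, C.T @ g)
    return -(g - C @ inner) / rho

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 100))
y = (X @ rng.standard_normal(100) + 0.3 * rng.standard_normal(2000) > 0).astype(float)
w, lam, m = np.zeros(100), 1e-2, 30
for _ in range(20):
    idx = rng.choice(100, size=m, replace=False)   # random coordinate subset
    g, C = logistic_grad_hess_cols(w, X, y, lam, idx)
    w += nystrom_newton_step(g, C, idx)
```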

The $p$-center problem (pCP) is a fundamental problem in location science, where we are given customer demand points and possible facility locations, and we want to choose $p$ of these locations to open a facility such that the maximum distance of any customer demand point to its closest open facility is minimized. State-of-the-art solution approaches for pCP use its connection to the set cover problem and solve pCP in an iterative fashion by repeatedly solving set cover problems. The classical textbook integer programming (IP) formulation of pCP is usually dismissed due to its size and weak linear programming (LP) relaxation bounds. We present a novel solution approach that works on a new IP formulation obtained by a projection from the classical formulation. The formulation is solved by means of branch-and-cut, where cuts for demand points are iteratively generated. Moreover, the formulation can be strengthened with combinatorial information to obtain a much tighter LP relaxation. In particular, we present a novel way to use lower bound information to obtain stronger cuts. We show that the LP relaxation bound of our strengthened formulation has the same strength as the best known bound in the literature, which is based on a semi-relaxation. Finally, we also present a computational study on instances from the literature, the largest of which have more than 700,000 customers and locations. Our solution algorithm is competitive with highly sophisticated set-cover-based solution algorithms, which depend on various components and parameters.
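For reference, a sketch of the classical textbook IP formulation referred to above (our notation): $y_j$ indicates whether location $j$ is opened, $x_{ij}$ assigns demand point $i$ to location $j$, and $d_{ij}$ is their distance.
\begin{align*}
\min\;& z\\
\text{s.t.}\;& \textstyle\sum_{j} x_{ij} = 1 && \text{for all } i,\\
& x_{ij} \le y_j && \text{for all } i,j,\\
& \textstyle\sum_{j} y_j = p,\\
& z \ge \textstyle\sum_{j} d_{ij}\,x_{ij} && \text{for all } i,\\
& x_{ij},\,y_j \in \{0,1\}.
\end{align*}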

In this paper we propose a new methodology for testing the parametric forms of the mean and variance functions in regression models, based on weighted residual empirical processes and their martingale transformations. The dimensions of the parameter vectors can diverge as the sample size goes to infinity. We then study the convergence of weighted residual empirical processes and their martingale transformations under the null and alternative hypotheses in the diverging-dimension setting. The proposed tests based on weighted residual empirical processes can detect local alternatives distinct from the null at the fastest possible rate of order $n^{-1/2}$, but are not asymptotically distribution-free. The tests based on martingale-transformed weighted residual empirical processes, on the other hand, are asymptotically distribution-free, yet, unexpectedly, can only detect local alternatives converging to the null at a much slower rate of order $n^{-1/4}$, which is somewhat different from existing asymptotically distribution-free tests based on martingale transformations. As the tests based on the residual empirical process are not distribution-free, we propose a smooth residual bootstrap and verify the validity of its approximation in diverging-dimension settings. Simulation studies and a real data example are conducted to illustrate the effectiveness of our tests.
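The snippet below sketches a generic smooth residual bootstrap for a lack-of-fit test of a linear mean function; the cumulative-residual statistic, smoothing bandwidth, and null model are illustrative stand-ins for the weighted processes studied in the paper.

```python
# Smooth residual bootstrap for a lack-of-fit test: the statistic is the sup of
# a cumulative residual process ordered by fitted values.
import numpy as np

def cusum_stat(X, y, beta):
    res = y - X @ beta
    order = np.argsort(X @ beta)                      # order by fitted value
    return np.max(np.abs(np.cumsum(res[order]))) / np.sqrt(len(y))

def smooth_residual_bootstrap_pvalue(X, y, B=500, h=0.1, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # fit the null model
    res_c = (y - X @ beta) - (y - X @ beta).mean()
    stat = cusum_stat(X, y, beta)
    boot = np.empty(B)
    for b in range(B):
        eps = rng.choice(res_c, size=len(y), replace=True)
        eps += h * res_c.std() * rng.standard_normal(len(y))   # smoothing
        y_b = X @ beta + eps                          # regenerate under the null
        beta_b = np.linalg.lstsq(X, y_b, rcond=None)[0]
        boot[b] = cusum_stat(X, y_b, beta_b)
    return float(np.mean(boot >= stat))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.standard_normal((300, 4))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0, 0.5]) + 0.2 * X[:, 1] ** 2 + rng.standard_normal(300)
print(smooth_residual_bootstrap_pvalue(X, y))
```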

We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.
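As a small illustration of the objective being optimized, the following sketch evaluates a given pooling design under non-identical priors by Monte-Carlo simulation of the DND decoder; the random Bernoulli design used here is only a baseline to evaluate, not the gradient-optimized design proposed in the paper.

```python
# Monte-Carlo estimate of the expected number of errors made by the Definite
# Non-Defectives (DND) decoder for a fixed pooling design and per-item priors.
import numpy as np

def dnd_decode(A, outcomes):
    # DND rule: anyone appearing in at least one negative test is non-defective;
    # everyone else is declared defective.
    in_negative_test = A[~outcomes].sum(axis=0) > 0
    return ~in_negative_test

def expected_errors(A, priors, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    errs = 0
    for _ in range(trials):
        x = rng.random(len(priors)) < priors          # true infection vector
        outcomes = (A & x).any(axis=1)                # positive iff the pool hits an infection
        errs += np.sum(dnd_decode(A, outcomes) != x)
    return errs / trials

n, T = 100, 30
priors = np.random.default_rng(1).uniform(0.005, 0.05, size=n)
A = np.random.default_rng(2).random((T, n)) < 0.1     # Bernoulli(0.1) pooling design
print(expected_errors(A, priors))
```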

We study Martin-L\"{o}f random (ML-random) points with respect to computable probability measures on sample and parameter spaces (Bayes models). We consider four variants of conditional random sequences with respect to the conditional distributions: two of them are defined by ML-randomness on Bayes models and the others are defined by blind tests for conditional distributions. We consider a weak criterion for conditional ML-randomness and show that only the variants of ML-randomness on Bayes models satisfy the criterion. We show that these four variants of conditional randomness are identical when the conditional probability measure is computable and the posterior distribution converges weakly to almost all parameters. We compare ML-randomness on Bayes models with randomness for uniformly computable parametric models. It is known that two computable probability measures are orthogonal if and only if their sets of ML-random points are disjoint; we extend these results to uniformly computable parametric models. Finally, we present an algorithmic solution to a classical problem in Bayesian statistics: the posterior distributions converge weakly to almost all parameters if and only if the posterior distributions converge weakly to all ML-random parameters.

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
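As a toy illustration of why informative side information helps (and emphatically not the paper's matrix-based estimator), the snippet below reweights the empirical mean of a two-component Gaussian mixture by a noisy membership indicator, which already yields a much better, though still biased, estimate of the target component's mean than the pooled average. The mixture parameters and the noise level of the side information are illustrative assumptions.

```python
# Side-information-weighted mean for a single component of a 2-component
# spherical Gaussian mixture, compared against the pooled (unweighted) mean.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
mu0, mu1 = np.zeros(d), np.ones(d)                  # component means
z = rng.random(n) < 0.5                             # latent memberships
X = np.where(z[:, None], mu1, mu0) + rng.standard_normal((n, d))

# side information: the true membership flipped with probability 0.2
w = np.where(rng.random(n) < 0.8, z, ~z).astype(float)

pooled = X.mean(axis=0)
weighted = (w[:, None] * X).sum(axis=0) / w.sum()   # side-information-weighted mean

print("error, pooled  :", np.linalg.norm(pooled - mu1))
print("error, weighted:", np.linalg.norm(weighted - mu1))
```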
