好男人在线观看免费2019,欧美丰满大乳屁股流白浆,午夜影院顶级大片,亚洲日韩一区二区三区综合无码,欧美精品一区视频

Gradient flows are a powerful tool for optimizing functionals in general metric spaces, including the space of probabilities endowed with the Wasserstein metric. A typical approach to solving this optimization problem relies on its connection to the dynamic formulation of optimal transport and the celebrated Jordan-Kinderlehrer-Otto (JKO) scheme. However, this formulation involves optimization over convex functions, which is challenging, especially in high dimensions. In this work, we propose an approach that relies on the recently introduced input-convex neural networks (ICNN) to parametrize the space of convex functions in order to approximate the JKO scheme, as well as in designing functionals over measures that enjoy convergence guarantees. We derive a computationally efficient implementation of this JKO-ICNN framework and experimentally demonstrate its feasibility and validity in approximating solutions of low-dimensional partial differential equations with known solutions. We also demonstrate its viability in high-dimensional applications through an experiment in controlled generation for molecular discovery.

相關內容

優化器

關注 4

Continuity · MoDELS · 離散化 · 模態 · SCAN ·

2022 年 2 月 4 日

Geometric Model Checking of Continuous Space

Nick Bezhanishvili,Vincenzo Ciancia,David Gabelaia,Gianluca Grilletti,Diego Latella,Mieke Massink

Topological Spatial Model Checking is a recent paradigm where model checking techniques are developed for the topological interpretation of Modal Logic. The Spatial Logic of Closure Spaces, SLCS, extends Modal Logic with reachability connectives that, in turn, can be used for expressing interesting spatial properties, such as "being near to" or "being surrounded by". SLCS constitutes the kernel of a solid logical framework for reasoning about discrete space, such as graphs and digital images, interpreted as quasi discrete closure spaces. Following a recently developed geometric semantics of Modal Logic, we propose an interpretation of SLCS in continuous space, admitting a geometric spatial model checking procedure, by resorting to models based on polyhedra. Such representations of space are increasingly relevant in many domains of application, due to recent developments of 3D scanning and visualisation techniques that exploit mesh processing. We introduce PolyLogicA, a geometric spatial model checker for SLCS formulas on polyhedra and demonstrate feasibility of our approach on two 3D polyhedral models of realistic size. Finally, we introduce a geometric definition of bisimilarity, proving that it characterises logical equivalence.

優化器 · 提議分布 · 縮放 · MCMC · tuning ·

2022 年 2 月 4 日

Optimal Scaling of MCMC Beyond Metropolis

Sanket Agrawal,Dootika Vats,Krzysztof ?atuszyński,Gareth O. Roberts

The problem of optimally scaling the proposal distribution in a Markov chain Monte Carlo algorithm is critical to the quality of the generated samples. Much work has gone into obtaining such results for various Metropolis-Hastings (MH) algorithms. Recently, acceptance probabilities other than MH are being employed in problems with intractable target distributions. There is little resource available on tuning the Gaussian proposal distributions for this situation. We obtain optimal scaling results for a general class of acceptance functions, which includes Barker's and Lazy-MH. In particular, optimal values for the Barker's algorithm are derived and found to be significantly different from that obtained for the MH algorithm. Our theoretical conclusions are supported by numerical simulations indicating that when the optimal proposal variance is unknown, tuning to the optimal acceptance probability remains an effective strategy.

Networking · Neural Networks · 估計/估計量 · 近似 · 泛函 ·

2022 年 2 月 4 日

Self-adaptive deep neural network: Numerical approximation to functions and PDEs

Zhiqiang Cai,Jingshuang Chen,Min Liu

from arxiv, Published in Journal of Computational Physics

Designing an optimal deep neural network for a given task is important and challenging in many machine learning applications. To address this issue, we introduce a self-adaptive algorithm: the adaptive network enhancement (ANE) method, written as loops of the form train, estimate and enhance. Starting with a small two-layer neural network (NN), the step train is to solve the optimization problem at the current NN; the step estimate is to compute a posteriori estimator/indicators using the solution at the current NN; the step enhance is to add new neurons to the current NN. Novel network enhancement strategies based on the computed estimator/indicators are developed in this paper to determine how many new neurons and when a new layer should be added to the current NN. The ANE method provides a natural process for obtaining a good initialization in training the current NN; in addition, we introduce an advanced procedure on how to initialize newly added neurons for a better approximation. We demonstrate that the ANE method can automatically design a nearly minimal NN for learning functions exhibiting sharp transitional layers as well as discontinuous solutions of hyperbolic partial differential equations.

泛化理論 · Neural Networks · Networking · 學成 · 核化 ·

2022 年 2 月 4 日

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

Wei Huang,Chunrui Liu,Yilan Chen,Tianyu Liu,Richard Yi Da Xu

from arxiv, 19pages, 5 figures

PAC-Bayesian is an analysis framework where the training error can be expressed as the weighted average of the hypotheses in the posterior distribution whilst incorporating the prior knowledge. In addition to being a pure generalization bound analysis tool, PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, making them a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we call probabilistic neural network learned using training objectives derived from PAC-Bayesian bounds as {\it PAC-Bayesian learning}. Despite their empirical success, the theoretical analysis of PAC-Bayesian learning for neural networks is rarely explored. This paper proposes a new class of convergence and generalization analysis for PAC-Bayes learning when it is used to train the over-parameterized neural networks by the gradient descent method. For a wide probabilistic neural network, we show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as its kernel. Based on this finding, we further characterize the uniform PAC-Bayesian generalization bound which improves over the Rademacher complexity-based bound for non-probabilistic neural network. Finally, drawing the insight from our theoretical results, we propose a proxy measure for efficient hyperparameters selection, which is proven to be time-saving.

Storage · 離散化 · 全 · 近似 · 樣例 ·

2022 年 2 月 2 日

The full approximation storage multigrid scheme: A 1D finite element example

Ed Bueler

from arxiv, 19 pages, 7 figures

This note describes the full approximation storage (FAS) multigrid scheme for an easy one-dimensional nonlinear boundary value problem. The problem is discretized by a simple finite element (FE) scheme. We apply both FAS V-cycles and F-cycles, with a nonlinear Gauss-Seidel smoother, to solve the resulting finite-dimensional problem. The mathematics of the FAS restriction and prolongation operators, in the FE case, are explained. A self-contained Python program implements the scheme. Optimal performance, i.e. work proportional to the number of unknowns, is demonstrated for both kinds of cycles, including convergence nearly to discretization error in a single F-cycle.

優化器 · Neural Networks · AdaGrad · 移動平均 · 泛化理論 ·

2021 年 7 月 5 日

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Bohan Wang,Qi Meng,Wei Chen,Tie-Yan Liu

from arxiv, ICML 2021

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers explained it by investigating the implicit regularization effect of optimization algorithms. A remarkable progress is the work (Lyu&Li, 2019), which proves gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Except GD, adaptive algorithms such as AdaGrad, RMSProp and Adam are popular owing to their rapid training process. However, theoretical guarantee for the generalization of adaptive optimization algorithms is still lacking. In this paper, we study the implicit regularization of adaptive optimization algorithms when they are optimizing the logistic loss on homogeneous deep neural networks. We prove that adaptive algorithms that adopt exponential moving average strategy in conditioner (such as Adam and RMSProp) can maximize the margin of the neural network, while AdaGrad that directly sums historical squared gradients in conditioner can not. It indicates superiority on generalization of exponential moving average strategy in the design of the conditioner. Technically, we provide a unified framework to analyze convergent direction of adaptive optimization algorithms by constructing novel adaptive gradient flow and surrogate margin. Our experiments can well support the theoretical findings on convergent direction of adaptive optimization algorithms.

優化器 · 可約的 · 近似 · 控制器 · Principle ·

2020 年 6 月 29 日

Differential Dynamic Programming Neural Optimizer

Guan-Horng Liu,Tianrong Chen,Evangelos A. Theodorou

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to the Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in the Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameter over existing methods. We show that the algorithm is competitive against state-ofthe-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon the optimal control theory.

采樣法 · 方差 · 圖形處理器 · INFORMS · 泛化理論 ·

2020 年 6 月 24 日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Weilin Cong,Rana Forsati,Mahmut Kandemir,Mehrdad Mahdavi

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

離散化 · 歐氏空間 · 近似 · 神經網絡 · SimPLe ·

2020 年 4 月 13 日

Products of Euclidean metrics and applications to proximity questions among curves

Ioannis Z. Emiris,Ioannis Psarros

from arxiv, 18 pages

The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fr\'echet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms, our algorithm is especially efficient when the length of the curves is bounded.

重要性采樣 · 樣本空間 · 方差減小 · 樣本 · 蒙特卡羅 ·

2018 年 8 月 23 日

Learning to Importance Sample in Primary Sample Space

Quan Zheng,Matthias Zwicker

from arxiv, Submitted to SIGGRAPH ASIA'18

Importance sampling is one of the most widely used variance reduction strategies in Monte Carlo rendering. In this paper, we propose a novel importance sampling technique that uses a neural network to learn how to sample from a desired density represented by a set of samples. Our approach considers an existing Monte Carlo rendering algorithm as a black box. During a scene-dependent training phase, we learn to generate samples with a desired density in the primary sample space of the rendering algorithm using maximum likelihood estimation. We leverage a recent neural network architecture that was designed to represent real-valued non-volume preserving ('Real NVP') transformations in high dimensional spaces. We use Real NVP to non-linearly warp primary sample space and obtain desired densities. In addition, Real NVP efficiently computes the determinant of the Jacobian of the warp, which is required to implement the change of integration variables implied by the warp. A main advantage of our approach is that it is agnostic of underlying light transport effects, and can be combined with many existing rendering techniques by treating them as a black box. We show that our approach leads to effective variance reduction in several practical scenarios.