亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds-up inference using natural gradient descent, and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as the current SVGP methods, but it is faster and more accurate.

相關內容

Processing 是一門(men)開(kai)源編程語(yu)言和(he)(he)與之(zhi)配套的集(ji)成開(kai)發(fa)環境(IDE)的名稱(cheng)。Processing 在電(dian)子藝(yi)術和(he)(he)視覺設(she)計社區被(bei)用(yong)(yong)來教授編程基礎,并運(yun)用(yong)(yong)于大量的新(xin)媒(mei)體和(he)(he)互動藝(yi)術作品中。

Gaussian process (GP) regression is a fundamental tool in Bayesian statistics. It is also known as kriging and is the Bayesian counterpart to the frequentist kernel ridge regression. Most of the theoretical work on GP regression has focused on a large-$n$ asymptotics, characterising the behaviour of GP regression as the amount of data increases. Fixed-sample analysis is much more difficult outside of simple cases, such as locations on a regular grid. In this work we perform a fixed-sample analysis that was first studied in the context of approximation theory by Driscoll & Fornberg (2002), called the "flat limit". In flat-limit asymptotics, the goal is to characterise kernel methods as the length-scale of the kernel function tends to infinity, so that kernels appear flat over the range of the data. Surprisingly, this limit is well-defined, and displays interesting behaviour: Driscoll & Fornberg showed that radial basis interpolation converges in the flat limit to polynomial interpolation, if the kernel is Gaussian. Leveraging recent results on the spectral behaviour of kernel matrices in the flat limit, we study the flat limit of Gaussian process regression. Results show that Gaussian process regression tends in the flat limit to (multivariate) polynomial regression, or (polyharmonic) spline regression, depending on the kernel. Importantly, this holds for both the predictive mean and the predictive variance, so that the posterior predictive distributions become equivalent. Our results have practical consequences: for instance, they show that optimal GP predictions in the sense of leave-one-out loss may occur at very large length-scales, which would be invisible to current implementations because of numerical difficulties.

We consider zero-sum games on infinite graphs, with objectives specified as sets of infinite words over some alphabet of colors. A well-studied class of objectives is the one of $\omega$-regular objectives, due to its relation to many natural problems in theoretical computer science. We focus on the strategy complexity question: given an objective, how much memory does each player require to play as well as possible? A classical result is that finite-memory strategies suffice for both players when the objective is $\omega$-regular. We show a reciprocal of that statement: when both players can play optimally with a chromatic finite-memory structure (i.e., whose updates can only observe colors) in all infinite game graphs, then the objective must be $\omega$-regular. This provides a game-theoretic characterization of $\omega$-regular objectives, and this characterization can help in obtaining memory bounds. Moreover, a by-product of our characterization is a new one-to-two-player lift: to show that chromatic finite-memory structures suffice to play optimally in two-player games on infinite graphs, it suffices to show it in the simpler case of one-player games on infinite graphs. We illustrate our results with the family of discounted-sum objectives, for which $\omega$-regularity depends on the value of some parameters.

Optimal experimental design (OED) plays an important role in the problem of identifying uncertainty with limited experimental data. In many applications, we seek to minimize the uncertainty of a predicted quantity of interest (QoI) based on the solution of the inverse problem, rather than the inversion model parameter itself. In these scenarios, we develop an efficient method for goal-oriented optimal experimental design (GOOED) for large-scale Bayesian linear inverse problem that finds sensor locations to maximize the expected information gain (EIG) for a predicted QoI. By deriving a new formula to compute the EIG, exploiting low-rank structures of two appropriate operators, we are able to employ an online-offline decomposition scheme and a swapping greedy algorithm to maximize the EIG at a cost measured in model solutions that is independent of the problem dimensions. We provide detailed error analysis of the approximated EIG, and demonstrate the efficiency, accuracy, and both data- and parameter-dimension independence of the proposed algorithm for a contaminant transport inverse problem with infinite-dimensional parameter field.

We prove in this paper that optimizing wide ReLU neural networks (NNs) with at least one hidden layer using l2-regularization on the parameters enforces multi-task learning due to representation-learning - also in the limit of width to infinity. This is in contrast to multiple other results in the literature, in which idealized settings are assumed and where wide (ReLU)-NNs loose their ability to benefit from multi-task learning in the infinite width limit. We deduce the ability of multi-task learning from proving an exact quantitative macroscopic characterization of the learned NN in an appropriate function space.

Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of the influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Comparing with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation and convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.

Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, and Wide ResNet 28-10 architectures, our methodology improves upon both deep and batch ensembles.

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces; when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.

北京阿比特科技有限公司