
Intrinsic Gaussian Markov Random Fields (IGMRFs) can be used to induce conditional dependence in Bayesian hierarchical models. IGMRFs have both a precision matrix, which defines the neighbourhood structure of the model, and a precision, or scaling, parameter. Previous studies have shown the importance of selecting this scaling parameter appropriately for different types of IGMRF, as it can have a substantial impact on posterior results. Here, we focus on the two-dimensional case, where tuning of the parameter is achieved by mapping it to the marginal standard deviation of a two-dimensional IGMRF. We compare the effects of scaling various classes of IGMRF, including an application to blood pressure data using MCMC methods.
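The mapping from the scaling parameter to a marginal standard deviation can be illustrated with a small sketch. It assumes a first-order IGMRF on a regular lattice and uses one common convention for the reference standard deviation: the geometric mean of the marginal standard deviations obtained from the generalized inverse of the structure matrix. The abstract does not state which convention the authors use, so the function names and the convention itself are illustrative assumptions.

```python
import numpy as np

def lattice_igmrf_precision(n_rows, n_cols):
    """Structure matrix of a first-order IGMRF on an n_rows x n_cols lattice
    (a graph Laplacian over horizontal and vertical neighbours)."""
    n = n_rows * n_cols
    Q = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    j = rr * n_cols + cc
                    Q[i, i] += 1.0
                    Q[j, j] += 1.0
                    Q[i, j] -= 1.0
                    Q[j, i] -= 1.0
    return Q

def reference_sd(Q):
    """Geometric mean of the marginal standard deviations implied by the
    generalized inverse of Q (assumed convention, see the text above)."""
    marginal_var = np.diag(np.linalg.pinv(Q, rcond=1e-10, hermitian=True))
    return float(np.exp(0.5 * np.mean(np.log(marginal_var))))

def scale_precision(Q):
    """Rescale Q so that its reference standard deviation equals one."""
    return Q * reference_sd(Q) ** 2

Q = lattice_igmrf_precision(5, 5)
Q_scaled = scale_precision(Q)
```

After this rescaling, a precision parameter placed in front of `Q_scaled` has the same interpretation across lattices of different sizes, which is the kind of comparability that motivates scaling in the first place.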

Related content

We give a review of recent ANOVA-like procedures for testing group differences based on data in a metric space and present a new such procedure. Our statistic is based on the classic Levene's test for detecting differences in dispersion. It uses only pairwise distances of data points and can be computed quickly and precisely in situations where the computation of barycenters ("generalized means") in the data space is slow, possible only by approximation, or even infeasible. We show the asymptotic normality of our test statistic and present simulation studies for spatial point pattern data, in which we compare the various procedures in a 1-way ANOVA setting. As an application, we perform a 2-way ANOVA on a data set of bubbles in a mineral flotation process.

This paper proposes a general procedure to analyse high-dimensional spatio-temporal count data, with special emphasis on relative risk estimation in cancer epidemiology. Model fitting is carried out using integrated nested Laplace approximations over a partition of the spatio-temporal domain. This is a simple idea that works very well in this context as the models are defined to borrow strength locally in space and time, providing reliable risk estimates. Parallel and distributed strategies are proposed to speed up computations in a setting where Bayesian model fitting is generally prohibitively time-consuming and even infeasible. We evaluate the whole procedure in a simulation study with a twofold objective: to estimate risks accurately and to detect extreme risk areas while avoiding false positives/negatives. We show that our method outperforms classical global models. A real data analysis comparing the global models and the new procedure is also presented.

Ensembles of networks arise in various fields where multiple independent networks are observed on the same set of nodes, for example, a collection of brain networks constructed on the same brain regions for different individuals. However, there are few models that describe both the variations and characteristics of networks in an ensemble at the same time. In this paper, we propose to model the ensemble of networks using a Dirichlet Process Mixture of Exponential Random Graph Models (DPM-ERGMs), which divides the ensemble into different clusters and models each cluster of networks using a separate Exponential Random Graph Model (ERGM). By employing a Dirichlet process mixture, the number of clusters can be determined automatically and changed adaptively with the data provided. Moreover, in order to perform full Bayesian inference for DPM-ERGMs, we employ the intermediate importance sampling technique inside the Metropolis-within-slice sampling scheme, which addresses the problem of sampling from the intractable ERGMs on an infinite sample space. We also demonstrate the performance of DPM-ERGMs with both simulated and real datasets.

Using some extensions of a theorem of Heppes on finitely supported discrete probability measures, we address the problems of classification and testing based on projections. In particular, when the support of the distributions is known in advance (as for instance for multivariate Bernoulli distributions), a single suitably chosen projection determines the distribution. Several applications of these results are considered.

Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds up inference using natural gradient descent, and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as the current SVGP methods, but it is faster and more accurate.

Transformer-based architectures have become the de facto models used for a range of Natural Language Processing tasks. In particular, the BERT-based models achieved significant accuracy gains for GLUE tasks, CoNLL-03 and SQuAD. However, BERT-based models have a prohibitive memory footprint and latency. As a result, deploying BERT-based models in resource-constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second-order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra-low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further. We extensively test our proposed method on BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at most $2.3\%$ performance degradation, even with ultra-low precision quantization down to 2 bits, corresponding to up to $13\times$ compression of the model parameters, and up to $4\times$ compression of the embedding table as well as activations. Among all tasks, we observed the highest performance loss for BERT fine-tuned on SQuAD. By probing into the Hessian-based analysis as well as visualization, we show that this is related to the fact that the current training/fine-tuning strategy of BERT does not converge for SQuAD.

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, that is, when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
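The idea of sampling without discretisation error can be illustrated with the exact transition of a plain Cox-Ingersoll-Ross (CIR) diffusion, which is a scaled noncentral chi-squared distribution. The sketch below is not the stochastic-gradient version from the paper: it assumes the shape parameters `a` are known exactly, and it assumes the parameterization dθ = (a − θ) dt + √(2θ) dW, whose stationary law is Gamma(a, 1). Normalising independent Gamma(a_i, 1) variates gives a Dirichlet(a) draw on the simplex.

```python
import numpy as np

def cir_exact_step(theta, a, h, rng):
    """One exact transition over time h of the CIR diffusion
    d(theta) = (a - theta) dt + sqrt(2 * theta) dW (assumed parameterization).
    No time-discretisation is involved, so theta stays non-negative."""
    decay = np.exp(-h)
    scale = (1.0 - decay) / 2.0
    nonc = theta * decay / scale  # noncentrality depends on the current state
    return scale * rng.noncentral_chisquare(2.0 * a, nonc)

rng = np.random.default_rng(0)
a = np.array([0.1, 0.1, 2.0])   # sparse target: two components near zero
theta = np.ones_like(a)
samples = []
for _ in range(5000):
    theta = cir_exact_step(theta, a, h=0.5, rng=rng)
    samples.append(theta / theta.sum())  # map to the simplex
samples = np.array(samples)
```

Because each step draws from the true transition density, small components can approach zero without ever crossing the boundary, which is where a discretised Langevin step tends to misbehave.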

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods makes it possible to get the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.
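The difference between the two kinds of noise can be shown on a toy example. The sketch below uses a hypothetical linear policy (`action = W @ state`), not the networks or algorithms evaluated in the paper, and the function names are illustrative. With action-space noise, the same state can yield a different action on every call; with parameter noise, the weights are perturbed once and the perturbed policy is then deterministic, so the same state always maps to the same action.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))   # weights of a hypothetical linear policy
state = rng.normal(size=4)

def act_with_action_noise(W, state, sigma, rng):
    """Action-space noise: fresh Gaussian noise is added to every action."""
    return W @ state + sigma * rng.normal(size=W.shape[0])

def perturb_parameters(W, sigma, rng):
    """Parameter-space noise: perturb the weights once, e.g. per episode."""
    return W + sigma * rng.normal(size=W.shape)

# Action noise: two visits to the same state give different actions.
a1 = act_with_action_noise(W, state, 0.1, rng)
a2 = act_with_action_noise(W, state, 0.1, rng)

# Parameter noise: the perturbed policy is consistent within the episode.
W_noisy = perturb_parameters(W, 0.1, rng)
b1 = W_noisy @ state
b2 = W_noisy @ state
```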

Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
