
Structure learning via MCMC sampling is known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. Theoretical results on the mixing behavior are lacking. In this work, we prove the rapid mixing of a random walk Metropolis-Hastings algorithm, which reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in $n$ and $p$, under some high-dimensional assumptions. A series of high-dimensional consistency results is obtained, including the strong selection consistency of an empirical Bayes model for structure learning. Our proof is based on two new results. First, we derive a general mixing time bound on finite state spaces, which can be applied to local MCMC schemes for other model selection problems. Second, we construct high-probability search paths on the space of equivalence classes with node degree constraints by proving a combinatorial property of DAG comparisons. Simulation studies on the proposed MCMC sampler are conducted to illustrate the main theoretical findings.
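
To make the sampling scheme concrete, here is a minimal sketch of a generic random walk Metropolis-Hastings sampler on a finite model space. The `score` and `neighbors` callables are placeholders for a posterior score over equivalence classes and for the paper's local moves, neither of which is specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_metropolis_hastings(score, neighbors, init, n_iter=10_000):
    """Generic random walk Metropolis-Hastings on a finite state space.

    score(state)     -> unnormalized log posterior of a model/state
    neighbors(state) -> list of states reachable in one local move
    """
    state, log_post = init, score(init)
    samples = []
    for _ in range(n_iter):
        nbrs = neighbors(state)
        prop = nbrs[rng.integers(len(nbrs))]
        # Hastings correction accounts for unequal neighborhood sizes.
        log_acc = (score(prop) - log_post
                   + np.log(len(nbrs)) - np.log(len(neighbors(prop))))
        if np.log(rng.uniform()) < log_acc:
            state, log_post = prop, score(prop)
        samples.append(state)
    return samples
```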

Related content

When two different parties use the same learning rule on their own data, how can we test whether the distributions of the two outcomes are similar? In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions. We say that a learning rule is TV indistinguishable if the expected TV distance between the posterior distributions of its outputs, executed on two training data sets drawn independently from the same distribution, is small. We first investigate the learnability of hypothesis classes using TV indistinguishable learners. Our main results are information-theoretic equivalences between TV indistinguishability and existing algorithmic stability notions such as replicability and approximate differential privacy. Then, we provide statistical amplification and boosting algorithms for TV indistinguishable learners.
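
When the learner's output space is finite, a Monte Carlo estimate of the expected TV distance is straightforward to sketch. The `learner` and `sample_data` interfaces below are hypothetical stand-ins, not the paper's formal setup.

```python
import numpy as np

def empirical_tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def expected_tv(learner, sample_data, n_outputs, n_trials=200, n_runs=500):
    """Monte Carlo estimate of E[TV] between the output distributions of a
    randomized learner run on two independent samples from the same source."""
    tvs = []
    for _ in range(n_trials):
        S1, S2 = sample_data(), sample_data()
        # Histogram the learner's (randomized) outputs on each sample.
        out1 = np.bincount([learner(S1) for _ in range(n_runs)], minlength=n_outputs)
        out2 = np.bincount([learner(S2) for _ in range(n_runs)], minlength=n_outputs)
        tvs.append(empirical_tv(out1 / n_runs, out2 / n_runs))
    return float(np.mean(tvs))
```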

Stein Variational Gradient Descent (SVGD) is a nonparametric particle-based deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel will remain Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of the Gaussian-SVGD, i.e., SVGD projected to the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium if the target is Gaussian. In the general case, we propose a density-based and a particle-based implementation of the Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances of this framework empirically outperforms existing approaches. Our results make concrete contributions towards obtaining a deeper understanding of both SVGD and GVI.
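
For intuition, the following sketch implements the standard SVGD particle update with a bilinear kernel $k(x, y) = x^\top y + 1$ for a Gaussian target; the step size, initialization, and iteration count are illustrative choices, not the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_svgd(mu, Sigma, n_particles=100, step=0.05, n_iter=2000):
    """SVGD with a bilinear kernel k(x, y) = x @ y + 1 for a Gaussian
    target N(mu, Sigma); a Gaussian initialization stays Gaussian."""
    d = len(mu)
    X = rng.standard_normal((n_particles, d))   # Gaussian initialization
    Sinv = np.linalg.inv(Sigma)
    for _ in range(n_iter):
        grad_logp = -(X - mu) @ Sinv            # score of the target
        K = X @ X.T + 1.0                       # bilinear kernel matrix
        # phi(x_i) = mean_j [ k(x_j, x_i) grad_logp(x_j) + grad_{x_j} k(x_j, x_i) ]
        # where grad_{x_j} k(x_j, x_i) = x_i for the bilinear kernel.
        phi = (K @ grad_logp + n_particles * X) / n_particles
        X = X + step * phi
    return X
```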

The efficient representation of random fields on geometrically complex domains is crucial for Bayesian modelling in engineering and machine learning. Today's prevalent random field representations are restricted to unbounded domains or are too restrictive in terms of possible field properties. As a result, new techniques leveraging the historically established link between stochastic PDEs (SPDEs) and random fields are especially appealing for engineering applications with complex geometries which already have a finite element discretisation for solving the physical conservation equations. Unlike the dense covariance matrix of a random field, its inverse, the precision matrix, is usually sparse and equal to the stiffness matrix of a Helmholtz-like SPDE. In this paper, we use the SPDE representation to develop a scalable framework for large-scale statistical finite element analysis (statFEM) and Gaussian process (GP) regression on geometrically complex domains. We use the SPDE formulation to obtain the relevant prior probability densities with a sparse precision matrix. The properties of the priors are governed by the parameters and possibly fractional order of the Helmholtz-like SPDE, so that we can model anisotropic, non-homogeneous random fields with arbitrary smoothness on bounded domains and manifolds. We assemble the sparse precision matrix using the same finite element mesh used for solving the physical conservation equations. The observation models for statFEM and GP regression are such that the posterior probability densities are Gaussians with a closed-form mean and precision. The expressions for the mean vector and the precision matrix can be evaluated using only sparse matrix operations. We demonstrate the versatility of the proposed framework and its convergence properties with one- and two-dimensional Poisson and thin-shell examples.
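
As an illustration of the sparse-precision construction, the sketch below assembles a 1D Whittle-Matérn-type precision matrix in the style of the standard SPDE-GMRF approach, with linear elements and a lumped mass matrix. The Neumann boundary treatment and parameterization are simplifying assumptions, not the paper's statFEM formulation.

```python
import numpy as np
import scipy.sparse as sp

def spde_precision_1d(n_nodes, h, kappa, tau=1.0):
    """Sparse GMRF precision for the Whittle-Matern SPDE
    (kappa^2 - Laplacian) u = W on a 1D mesh with spacing h,
    using linear finite elements and a lumped mass matrix."""
    # Stiffness matrix G of the negative Laplacian (linear elements, 1D).
    main = 2.0 * np.ones(n_nodes) / h
    off = -np.ones(n_nodes - 1) / h
    main[[0, -1]] = 1.0 / h                    # Neumann boundary
    G = sp.diags([off, main, off], [-1, 0, 1], format="csc")
    # Lumped (diagonal) mass matrix C.
    c = h * np.ones(n_nodes)
    c[[0, -1]] = h / 2
    C_inv = sp.diags(1.0 / c)
    A = kappa**2 * sp.diags(c) + G
    # Q = tau^2 * A C^{-1} A stays sparse because C is diagonal.
    return tau**2 * (A @ C_inv @ A)
```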

Additive spatial statistical models with weakly stationary process assumptions have become standard in spatial statistics. However, one disadvantage of such models is the computation time, which rapidly increases with the number of data points. The goal of this article is to apply an existing subsampling strategy to standard spatial additive models and to derive the spatial statistical properties. We call this strategy the ``spatial data subset model'' approach, which can be applied to big datasets in a computationally feasible way. Our approach has the advantage that one does not require any additional restrictive model assumptions; that is, computational gains increase as model assumptions are removed when using our model framework. This provides one solution to the computational bottlenecks that occur when applying methods such as Kriging to ``big data''. We provide several properties of this new spatial data subset model approach in terms of moments, sill, nugget, and range under several sampling designs. The biggest advantage of our approach is that it is scalable to a dataset of any size that can be stored. We present the results of the spatial data subset model approach on simulated datasets, and on a large dataset consisting of 150,000 observations of daytime land surface temperatures measured by the MODIS instrument onboard the Terra satellite.
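
A minimal sketch of the partition-and-fit idea follows, assuming a hypothetical `fit_gp` routine for the per-block model; the sampling designs studied in the paper are more general than the quantile binning used here.

```python
import numpy as np

def fit_subset_models(coords, y, n_blocks, fit_gp):
    """Partition spatial data into blocks and fit an independent model per
    block; predictions within a block use only that block's data."""
    # Assign each site to a block by binning its first coordinate (illustrative).
    edges = np.quantile(coords[:, 0], np.linspace(0, 1, n_blocks + 1))
    labels = np.clip(np.searchsorted(edges, coords[:, 0], side="right") - 1,
                     0, n_blocks - 1)
    models = {}
    for b in range(n_blocks):
        idx = labels == b
        models[b] = fit_gp(coords[idx], y[idx])   # e.g. a local Kriging fit
    return models, labels
```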

A new multivariate density estimator for stationary sequences is obtained by Fourier inversion of the thresholded empirical characteristic function. This estimator does not depend on the choice of parameters related to the smoothness of the density; it is directly adaptive. We establish oracle inequalities valid for independent, $\alpha$-mixing and $\tau$-mixing sequences, which allow us to derive optimal convergence rates, up to a logarithmic loss. On general anisotropic Sobolev classes, the estimator adapts to the regularity of the unknown density and also achieves directional adaptivity. In particular, if the observations are drawn from $X \in \mathbb{R}^d$, $d \ge 1$, and $A$ is an invertible matrix, the estimator achieves the rate implied by the regularity of $AX$, which may be more regular than $X$. The estimator is easy to implement and numerically efficient. It depends on the calibration of a parameter for which we propose an innovative numerical selection procedure, using the Euler characteristic of the thresholded areas.
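
The estimator is simple enough to sketch in one dimension: compute the empirical characteristic function on a grid, zero it below a threshold, and invert numerically. The threshold level and grid below are illustrative choices, not the paper's calibrated procedure.

```python
import numpy as np

def thresholded_ecf_density(X, x_grid, t_max=50.0, n_t=2048):
    """1D density estimate by Fourier inversion of the thresholded
    empirical characteristic function (illustrative threshold)."""
    n = len(X)
    t = np.linspace(-t_max, t_max, n_t)
    ecf = np.exp(1j * np.outer(t, X)).mean(axis=1)   # phi_hat(t)
    thresh = np.sqrt(np.log(n) / n)                  # illustrative level
    ecf = np.where(np.abs(ecf) >= thresh, ecf, 0.0)
    # f_hat(x) = (1/2pi) int e^{-itx} phi_hat(t) dt, via a Riemann sum.
    dt = t[1] - t[0]
    integrand = np.exp(-1j * np.outer(x_grid, t)) * ecf
    f_hat = (integrand.sum(axis=1) * dt).real / (2 * np.pi)
    return np.clip(f_hat, 0.0, None)
```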

We examine the behaviour of the Laplace and saddlepoint approximations in the high-dimensional setting, where the dimension of the model is allowed to increase with the number of observations. Approximations to the joint density, the marginal posterior density and the conditional density are considered. Our results show that, under mild assumptions on the model, the error of the joint density approximation is $O(p^4/n)$ if $p = o(n^{1/4})$ for both the Laplace and saddlepoint approximations, and $O(p^3/n)$ if $p = o(n^{1/3})$ under additional assumptions on the second derivative of the log-likelihood. Stronger results are obtained for the approximation to the marginal posterior density.
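
For reference, here is a minimal sketch of the Laplace approximation to a log normalizing constant. It uses the BFGS inverse-Hessian estimate as a crude stand-in for the exact Hessian at the mode, which an actual implementation would compute directly.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_evidence(log_post, theta0):
    """Laplace approximation to log of the integral of exp(log_post(theta)):
    log Z ~ log_post(mode) + (p/2) log(2 pi) - (1/2) log det(-Hessian)."""
    res = minimize(lambda th: -log_post(th), theta0, method="BFGS")
    p = len(res.x)
    # BFGS returns an approximate inverse Hessian of the negative log posterior.
    sign, logdet_inv = np.linalg.slogdet(res.hess_inv)
    return -res.fun + 0.5 * p * np.log(2 * np.pi) + 0.5 * logdet_inv
```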

We develop a new formulation of deep learning based on the Mori-Zwanzig (MZ) formalism of irreversible statistical mechanics. The new formulation is built upon the well-known duality between deep neural networks and discrete dynamical systems, and it allows us to directly propagate quantities of interest (conditional expectations and probability density functions) forward and backward through the network by means of exact linear operator equations. Such new equations can be used as a starting point to develop new effective parameterizations of deep neural networks, and provide a new framework to study deep learning via operator-theoretic methods. The proposed MZ formulation of deep learning naturally introduces a new concept, i.e., the memory of the neural network, which plays a fundamental role in low-dimensional modeling and parameterization. By using the theory of contraction mappings, we develop sufficient conditions for the memory of the neural network to decay with the number of layers. This allows us to rigorously transform deep networks into shallow ones, e.g., by reducing the number of neurons per layer (using projection operators), or by reducing the total number of layers (using the decay property of the memory operator).
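
The duality the formulation builds on can be made concrete: a deep network is a discrete dynamical system, and observables propagate backward exactly through composition (Koopman-type) operators. The sketch below shows only this exact, memoryless pullback, not the projected MZ decomposition with memory terms.

```python
import numpy as np

def layer(W, b):
    """One network layer viewed as a discrete-time dynamical system step."""
    return lambda x: np.tanh(x @ W + b)

def pullback(u, F):
    """Composition (Koopman-type) operator: (L u)(x) = u(F(x)), which
    propagates an observable backward through one layer exactly."""
    return lambda x: u(F(x))

rng = np.random.default_rng(3)
layers = [layer(rng.standard_normal((4, 4)) / 2, np.zeros(4)) for _ in range(5)]
u = lambda x: float(np.sum(x**2))   # quantity of interest at the output
for F in reversed(layers):          # pull the observable back to the input
    u = pullback(u, F)
# u(x0) now equals the quantity of interest evaluated at the network output.
x0 = rng.standard_normal(4)
print(u(x0))
```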

A nonlinear sea-ice problem is considered in a least-squares finite element setting. The corresponding variational formulation approximating simultaneously the stress tensor and the velocity is analysed. In particular, the least-squares functional is coercive and continuous in an appropriate solution space, which proves the well-posedness of the problem. As the method does not require a compatibility condition between the finite element spaces, the formulation allows the use of piecewise polynomial spaces of the same approximation order for both the stress and the velocity approximations. A Newton-type iterative method is used to linearize the problem and numerical tests are provided to illustrate the theory.
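
Schematically, a first-order-system least-squares functional of this kind penalizes the residuals of the balance law and the constitutive relation simultaneously; the operator $\mathcal{A}$ below stands in for the (here unspecified) sea-ice rheology.

```latex
% Schematic first-order-system least-squares functional: given the balance
% law -div(sigma) = f and the constitutive relation sigma = A(grad v), seek
% the minimizer over (sigma, v) in the product finite element space.
\[
  \mathcal{J}(\boldsymbol{\sigma}, \mathbf{v}; \mathbf{f})
  = \left\| \nabla \cdot \boldsymbol{\sigma} + \mathbf{f} \right\|_{0}^{2}
  + \left\| \boldsymbol{\sigma} - \mathcal{A}(\nabla \mathbf{v}) \right\|_{0}^{2}.
\]
```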

A Markov network characterizes the conditional independence structure, or Markov property, among a set of random variables. Existing work focuses on specific families of distributions (e.g., exponential families) and/or certain structures of graphs, and most of it can only handle variables of a single data type (continuous or discrete). In this work, we characterize the conditional independence structure in general distributions for all data types (i.e., continuous, discrete, and mixed-type) with a Generalized Precision Matrix (GPM). In addition, we allow general functional relations among variables, thus giving rise to a Markov network structure learning algorithm in one of the most general settings. To deal with the computational challenge of the problem, especially for large graphs, we unify all cases under the same umbrella of a regularized score matching framework. We validate the theoretical results and demonstrate the scalability empirically in various settings.
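
As a toy instance of the score matching framework, the sketch below writes the regularized score matching loss for a Gaussian model with precision matrix `Theta`. The paper's setting is far more general (arbitrary distributions and mixed data types via the GPM), so the Gaussian score here is purely illustrative.

```python
import numpy as np

def score_matching_loss(X, Theta, mu, lam=0.1):
    """Regularized score matching for a Gaussian model with precision Theta:
    J = E[ tr(grad_x s(x)) + 0.5 * ||s(x)||^2 ] + lam * ||offdiag(Theta)||_1,
    where the model score is s(x) = -Theta (x - mu)."""
    S = -(X - mu) @ Theta                    # model scores at the data points
    trace_term = -np.trace(Theta)            # tr(grad_x s) is constant here
    quad_term = 0.5 * np.mean(np.sum(S**2, axis=1))
    l1 = lam * np.abs(Theta - np.diag(np.diag(Theta))).sum()
    return trace_term + quad_term + l1
```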

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
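
The reward-regularization idea behind R-NaD can be sketched schematically: the acting player's reward is penalized by a log-ratio term toward a regularization policy, which shifts the fixed point of the learning dynamics toward a regularized equilibrium. The form below is a simplified reading that ignores the opponent's symmetric term and the periodic updates of the regularization policy; it is not the paper's full algorithm.

```python
import numpy as np

def regularized_reward(r, pi, pi_reg, eta=0.2):
    """Schematic R-NaD-style reward transformation: penalize the acting
    player's reward by the log-ratio of the current policy probability pi
    to the regularization policy probability pi_reg for the chosen action."""
    return r - eta * np.log(pi / pi_reg)
```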
