
We consider multivariate centered Gaussian models for the random variable $Z=(Z_1,\ldots, Z_p)$, invariant under the action of a subgroup of the group of permutations on $\{1,\ldots, p\}$. Using the representation theory of the symmetric group on the field of reals, we derive the distribution of the maximum likelihood estimate of the covariance parameter $\Sigma$ and also the analytic expression of the normalizing constant of the Diaconis-Ylvisaker conjugate prior for the precision parameter $K=\Sigma^{-1}$. We can thus perform Bayesian model selection in the class of complete Gaussian models invariant under the action of a subgroup of the symmetric group, which we could also call complete RCOP models. We illustrate our results with a toy example of dimension $4$ and several examples for selection within cyclic groups, including a high dimensional example with $p=100$.
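
As a concrete illustration of the projection underlying the MLE, the sketch below averages the sample covariance over a permutation group: when $\Sigma$ is constrained to be invariant under the group, the likelihood depends on the data only through this group average, which is itself invariant and is therefore the MLE of the saturated invariant model. The function names (`invariant_mle`, `cyclic_group`) and the cyclic-group toy example are illustrative choices, not taken from the paper.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix P with (P x)[i] = x[perm[i]]."""
    p = len(perm)
    P = np.zeros((p, p))
    P[np.arange(p), perm] = 1.0
    return P

def cyclic_group(p):
    """The p cyclic shifts of (0, ..., p-1)."""
    return [tuple(np.roll(np.arange(p), k)) for k in range(p)]

def invariant_mle(X, group):
    """MLE of Sigma in the centered Gaussian model invariant under `group`.

    When Sigma = P Sigma P^T for every P in the group, tr(Sigma^{-1} S)
    equals tr(Sigma^{-1} S_bar), where S_bar is the group average of the
    sample covariance S, so the likelihood depends on the data only
    through S_bar; S_bar is itself invariant, hence it is the MLE.
    """
    n, p = X.shape
    S = X.T @ X / n                      # centered model, no mean subtraction
    S_bar = np.zeros_like(S)
    for g in group:
        P = perm_matrix(g)
        S_bar += P @ S @ P.T
    return S_bar / len(group)

# Toy example: p = 4, invariance under the cyclic group C_4.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
Sigma_hat = invariant_mle(X, cyclic_group(4))
print(np.round(Sigma_hat, 3))            # circulant structure, up to sampling noise
```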

Related Content

Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this has, explicitly or implicitly, attempted to find a data representation that has an invariant relationship with the target. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features and build an invariant predictor. However, these methods have generalization guarantees only when both data representation and classifiers come from a linear model class. We propose Invariant Causal Representation Learning (iCaRL), an approach that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation (i.e., a set of latent variables encoding the data) given the target and the environment belongs to general exponential family distributions. Based on this, we show that it is possible to identify the data representation up to simple transformations. We also prove that all direct causes of the target can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach outperforms a variety of baseline methods. Finally, in the discussion, we further explore the aforementioned assumption and propose a more general hypothesis, called the Agnostic Hypothesis: there exists a set of hidden causal factors affecting both inputs and outcomes. The Agnostic Hypothesis can provide a unifying view of machine learning. More importantly, it can inspire a new direction to explore a general theory for identifying hidden causal factors, which is key to enabling the OOD generalization guarantees.

Georeferenced compositional data are prominent in many scientific fields and in spatial statistics. This work addresses the problem of proposing models and methods to analyze and predict, through kriging, this type of data. To this purpose, a novel class of transformations, named the Isometric $\alpha$-transformation ($\alpha$-IT), is proposed, which encompasses the traditional Isometric Log-Ratio (ILR) transformation. It is shown that the ILR is the limit case of the $\alpha$-IT as $\alpha$ tends to 0 and that $\alpha=1$ corresponds to a linear transformation of the data. Unlike the ILR, the proposed transformation accepts 0s in the compositions when $\alpha>0$. Maximum likelihood estimation of the parameter $\alpha$ is established. Prediction using kriging on $\alpha$-IT transformed data is validated on synthetic spatial compositional data, using prediction scores computed either in the geometry induced by the $\alpha$-IT, or in the simplex. Application to land cover data shows that the relative superiority of the various approaches w.r.t. a prediction objective depends on whether the compositions contain any zero components. When all components are positive, the limit cases (ILR or linear transformations) are optimal for none of the considered metrics. An intermediate geometry, corresponding to the $\alpha$-IT with maximum likelihood estimate, better describes the dataset in a geostatistical setting. When the proportion of compositions with 0s is not negligible, some side effects of the transformation get amplified as $\alpha$ decreases, entailing poor kriging performance both within the $\alpha$-IT geometry and for metrics in the simplex.
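
The abstract does not give the formula for the $\alpha$-IT, so the sketch below follows the standard $\alpha$-transformation from the compositional data literature, made isometric with a Helmert sub-matrix; the paper's exact normalization may differ. It illustrates the two limit cases named above, the ILR as $\alpha \to 0$ and an affine map at $\alpha = 1$, and it tolerates zeros whenever $\alpha > 0$. The names `alpha_it` and `helmert_sub` are introduced here for illustration.

```python
import numpy as np

def helmert_sub(D):
    """(D-1) x D Helmert sub-matrix: orthonormal rows, all orthogonal to 1_D."""
    H = np.zeros((D - 1, D))
    for k in range(1, D):
        H[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        H[k - 1, k] = -k / np.sqrt(k * (k + 1))
    return H

def alpha_it(x, alpha):
    """Isometric alpha-transformation of a composition x (entries sum to 1).

    As alpha -> 0 this reduces to the ILR transform H @ clr(x); at
    alpha = 1 it is an affine (linear) map of x. Zeros in x are allowed
    whenever alpha > 0.
    """
    x = np.asarray(x, dtype=float)
    D = x.size
    H = helmert_sub(D)
    if alpha == 0.0:                        # ILR limit (requires x > 0)
        clr = np.log(x) - np.log(x).mean()
        return H @ clr
    u = D * x**alpha / np.sum(x**alpha)     # power-transformed composition
    return H @ ((u - 1.0) / alpha)

x = np.array([0.5, 0.3, 0.15, 0.05])
for a in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(a, np.round(alpha_it(x, a), 4))   # converges to the ILR as a -> 0
```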

The matrix normal model, the family of Gaussian matrix-variate distributions whose covariance matrix is the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor models. We show nonasymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics. In contrast to existing bounds, our results do not rely on the factors being well-conditioned or sparse. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and the overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that an iterative procedure to compute the MLE known as the flip-flop algorithm converges linearly with high probability. Our main tool is geodesic strong convexity in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels. We also provide numerical evidence that combining the flip-flop algorithm with a simple shrinkage estimator can improve performance in the undersampled regime.
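
A minimal sketch of the flip-flop iteration for the matrix normal model, assuming the usual alternating updates of the two Kronecker factors and a trace normalization to fix the scale ambiguity; the shrinkage variant mentioned at the end of the abstract is not shown, and the tolerance and iteration count are illustrative.

```python
import numpy as np

def flip_flop(X, n_iter=50, tol=1e-8):
    """Flip-flop MLE for the matrix normal model (sketch).

    X has shape (N, m, n); each sample is an m x n matrix whose
    vectorization has covariance Kron(B, A), with A the m x m row factor
    and B the n x n column factor. The factors are identified only up to
    a scalar, so B is rescaled to have trace n after each update.
    """
    N, m, n = X.shape
    A, B = np.eye(m), np.eye(n)
    for _ in range(n_iter):
        A_old = A
        Binv = np.linalg.inv(B)
        A = sum(Xi @ Binv @ Xi.T for Xi in X) / (N * n)
        Ainv = np.linalg.inv(A)
        B = sum(Xi.T @ Ainv @ Xi for Xi in X) / (N * m)
        B *= n / np.trace(B)                 # fix the scale ambiguity
        if np.linalg.norm(A - A_old) < tol:
            break
    return A, B

# Simulate matrix-variate data with a known Kronecker covariance.
rng = np.random.default_rng(1)
m, n, N = 5, 4, 500
A0 = np.cov(rng.standard_normal((m, 3 * m)))
B0 = np.cov(rng.standard_normal((n, 3 * n)))
La, Lb = np.linalg.cholesky(A0), np.linalg.cholesky(B0)
X = np.stack([La @ rng.standard_normal((m, n)) @ Lb.T for _ in range(N)])
A_hat, B_hat = flip_flop(X)
```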

We study the problem of estimating a rank-$1$ signal in the presence of rotationally invariant noise, a class of perturbations more general than Gaussian noise. Principal Component Analysis (PCA) provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime. Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an alternative estimator with the potential to improve the accuracy of PCA. However, the existing analysis of AMP requires an initialization that is both correlated with the signal and independent of the noise, which is often unrealistic in practice. In this work, we combine the two methods, and propose to initialize AMP with PCA. Our main result is a rigorous asymptotic characterization of the performance of this estimator. Both the AMP algorithm and its analysis differ from those previously derived in the Gaussian setting: at every iteration, our AMP algorithm requires a specific term to account for PCA initialization, while in the Gaussian case, PCA initialization affects only the first iteration of AMP. The proof is based on a two-phase artificial AMP that first approximates the PCA estimator and then mimics the true AMP. Our numerical simulations show an excellent agreement between AMP results and theoretical predictions, and suggest an interesting open direction on achieving Bayes-optimal performance.
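
The paper's AMP for rotationally invariant noise uses iteration-specific correction terms that are not reproduced here. As a simpler illustration of the PCA-initialized recursion, the sketch below runs standard Gaussian-noise AMP with a tanh denoiser on a spiked Wigner instance, starting from the leading eigenvector; all names and parameter choices are illustrative.

```python
import numpy as np

def amp_spiked_wigner(Y, n_iter=20):
    """PCA-initialized AMP for a rank-1 spike in Gaussian noise (sketch).

    Model: Y = (lam / n) * x x^T + W with W from the GOE. The leading
    eigenvector of Y (i.e., PCA) is the starting point, the denoiser is
    tanh (suited to a +/-1 signal), and b is the usual Onsager
    coefficient for Gaussian noise; rotationally invariant noise needs
    additional, noise-dependent correction terms not shown here.
    """
    n = Y.shape[0]
    _, eigvecs = np.linalg.eigh(Y)
    x = np.sqrt(n) * eigvecs[:, -1]          # PCA initialization, norm sqrt(n)
    f_prev = np.zeros(n)
    for _ in range(n_iter):
        f = np.tanh(x)
        b = np.mean(1.0 - f ** 2)            # Onsager correction coefficient
        x = Y @ f - b * f_prev
        f_prev = f
    return np.sign(x)

# Synthetic spiked Wigner instance above the PCA detection threshold.
rng = np.random.default_rng(2)
n, lam = 2000, 2.0
x0 = rng.choice([-1.0, 1.0], size=n)
W = rng.standard_normal((n, n)) / np.sqrt(n)
Y = lam / n * np.outer(x0, x0) + (W + W.T) / np.sqrt(2)
x_hat = amp_spiked_wigner(Y)
print("overlap with the planted signal:", abs(x_hat @ x0) / n)
```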

In this work we consider a class of non-linear eigenvalue problems that admit a spectrum similar to that of a Hamiltonian matrix, in the sense that the spectrum is symmetric with respect to both the real and imaginary axis. More precisely, we present a method to iteratively approximate the eigenvalues of such non-linear eigenvalue problems closest to a given purely real or imaginary shift, while preserving the symmetries of the spectrum. To this end, the presented method exploits the equivalence between the considered non-linear eigenvalue problem and the eigenvalue problem associated with a linear but infinite-dimensional operator. To compute the eigenvalues closest to the given shift, we apply a specifically chosen shift-invert transformation to this linear operator and compute the eigenvalues with the largest modulus of the new shifted and inverted operator using an (infinite) Arnoldi procedure. The advantage of the chosen shift-invert transformation is that the spectrum of the transformed operator has a "real skew-Hamiltonian"-like structure. Furthermore, it is proven that the Krylov space constructed by applying this operator satisfies an orthogonality property in terms of a specifically chosen bilinear form. By taking this property into account in the orthogonalization process, it is ensured that even in the presence of rounding errors, the obtained approximation for, e.g., a simple, purely imaginary eigenvalue is simple and purely imaginary. The presented work can thus be seen as an extension of [V. Mehrmann and D. Watkins, "Structure-Preserving Methods for Computing Eigenpairs of Large Sparse Skew-Hamiltonian/Hamiltonian Pencils", SIAM J. Sci. Comput. (22.6), 2001], to the considered class of non-linear eigenvalue problems. Although the presented method is initially defined on function spaces, it can be implemented using finite dimensional linear algebra operations.
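
The method itself operates on an infinite-dimensional linearization with a structured bilinear form, which is beyond a short sketch; the following illustrates only the underlying shift-and-invert Arnoldi mechanics on an ordinary linear, finite-dimensional eigenvalue problem, with plain Gram-Schmidt in place of the structured orthogonalization the paper uses.

```python
import numpy as np

def shift_invert_arnoldi(A, shift, k=20, seed=3):
    """Plain shift-and-invert Arnoldi for a linear eigenproblem (sketch).

    Eigenvalues of A closest to `shift` become the largest-modulus
    eigenvalues of (A - shift I)^{-1}, so a few Arnoldi steps on that
    operator recover them; Ritz values are mapped back at the end.
    The explicit inverse is used only for brevity; a factorization
    would be used in practice.
    """
    n = A.shape[0]
    M = np.linalg.inv(A - shift * np.eye(n))
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    q = np.random.default_rng(seed).standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = M @ Q[:, j]
        for i in range(j + 1):              # Gram-Schmidt against previous vectors
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    ritz = np.linalg.eigvals(H[:k, :k])
    return shift + 1.0 / ritz               # map back to eigenvalues of A

A = np.random.default_rng(4).standard_normal((50, 50))
approx = shift_invert_arnoldi(A, shift=0.5)
print(sorted(approx, key=lambda z: abs(z - 0.5))[:3])
```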

Selecting powerful predictors for an outcome is a cornerstone task for machine learning. However, some types of questions can only be answered by identifying the predictors that causally affect the outcome. A recent approach to this causal inference problem leverages the invariance property of a causal mechanism across differing experimental environments (Peters et al., 2016; Heinze-Deml et al., 2018). This method, invariant causal prediction (ICP), has a substantial computational defect -- the runtime scales exponentially with the number of possible causal variables. In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors. Each of these tests relies on the minimization of a novel loss function -- the Wasserstein variance -- that is derived from tools in optimal transport theory and is used to quantify distributional variability across environments. We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
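
The abstract does not define the Wasserstein variance precisely; a natural one-dimensional version, assumed here purely for illustration, is the average squared 2-Wasserstein distance from each environment's residual distribution to their Wasserstein barycenter, which for 1-D distributions reduces to averaging quantile functions. The function name and grid size below are assumptions.

```python
import numpy as np

def wasserstein_variance(samples_by_env, n_grid=200):
    """One-dimensional 'Wasserstein variance' across environments (sketch).

    For 1-D distributions, the 2-Wasserstein barycenter has a quantile
    function equal to the average of the environments' quantile functions,
    so the variance reduces to an average squared deviation of quantiles.
    The exact functional used in the paper may differ in normalization.
    """
    qs = np.linspace(0.005, 0.995, n_grid)
    Q = np.stack([np.quantile(np.asarray(s), qs) for s in samples_by_env])
    Q_bar = Q.mean(axis=0)                        # barycenter quantiles
    return np.mean((Q - Q_bar) ** 2)              # average squared W2 distance

# Residuals that are stable across environments give a small value;
# residuals whose distribution shifts with the environment do not.
rng = np.random.default_rng(5)
stable = [rng.standard_normal(500) for _ in range(3)]
shifted = [rng.standard_normal(500) + mu for mu in (0.0, 0.5, 1.0)]
print(wasserstein_variance(stable), wasserstein_variance(shifted))
```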

Functional graphical models explore dependence relationships of random processes. This is achieved through estimating the precision matrix of the coefficients from the Karhunen-Loève expansion. This paper deals with the problem of estimating functional graphs that consist of the same random processes and share some of the dependence structure. By estimating a single graph we would obscure what is unique to each subgroup within the data. By estimating a different graph for each subgroup we would divide our sample size. Instead, we propose a method that allows joint estimation of the graphs while taking into account the intrinsic differences of each subgroup. This is achieved by a hierarchical penalty that first penalizes on a common level and then on an individual level. We develop a computational method for our estimator that deals with the non-convex nature of the objective function. We compare the performance of our method with existing ones on a number of different simulated scenarios. We apply our method to an EEG data set that consists of an alcoholic and a non-alcoholic group, to construct brain networks.
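
The form of the hierarchical penalty is not spelled out in the abstract; the sketch below shows one standard common-then-individual construction from the joint graphical-model literature (a group term shared across subgroups plus an $\ell_1$ term per subgroup), offered only as an illustration of the idea, not as the paper's exact penalty.

```python
import numpy as np

def hierarchical_penalty(Thetas, lam_common, lam_indiv):
    """Hierarchical penalty on a list of precision matrices (sketch).

    The common-level term is a group lasso over subgroups (each edge is
    penalized once, jointly, encouraging a shared sparsity pattern);
    the individual-level term is an l1 penalty within each subgroup,
    allowing edges to differ across subgroups.
    """
    stack = np.stack(Thetas)                        # shape (K, p, p)
    off = ~np.eye(stack.shape[1], dtype=bool)       # penalize off-diagonals only
    common = np.sqrt((stack[:, off] ** 2).sum(axis=0)).sum()
    indiv = np.abs(stack[:, off]).sum()
    return lam_common * common + lam_indiv * indiv

# Example: three subgroups, 5 x 5 precision matrices of Karhunen-Loève scores.
rng = np.random.default_rng(7)
Thetas = [np.eye(5) + 0.1 * rng.standard_normal((5, 5)) for _ in range(3)]
Thetas = [0.5 * (T + T.T) for T in Thetas]
print(hierarchical_penalty(Thetas, lam_common=1.0, lam_indiv=0.1))
```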

Simultaneous statistical inference problems are at the basis of almost any scientific discovery process. We consider a class of simultaneous inference problems that are invariant under permutations, meaning that all components of the problem are oblivious to the labelling of the multiple instances under consideration. For any such problem we identify the optimal solution which is itself permutation invariant, the most natural condition one could impose on the set of candidate solutions. Interpreted differently, for any possible value of the parameter we find a tight (non-asymptotic) lower bound on the statistical performance of any procedure that obeys the aforementioned condition. By generalizing the standard decision theoretic notions of permutation invariance, we show that the results apply to a myriad of popular problems in simultaneous inference, so that the ultimate benchmark for each of these problems is identified. The connection to the nonparametric empirical Bayes approach of Robbins is discussed in the context of asymptotic attainability of the bound uniformly in the parameter value.

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
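
As a sketch of the kind of gradient-informed Metropolis-Hastings step the abstract describes, the code below works with +/-1 spins of an Ising model: the gradient of the log-likelihood gives first-order estimates of the change from flipping each spin, these scores define the proposal over which spin to flip, and a standard accept step keeps the target invariant. The function names, the softmax temperature of 1/2, and the toy chain are assumptions made for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def gwg_step(x, J, b, rng):
    """One gradient-informed Metropolis-Hastings step on an Ising model (sketch).

    x is a vector of +/-1 spins and the target is proportional to
    exp(f(x)) with f(x) = 0.5 * x'Jx + b'x and zero-diagonal J. The
    gradient of f gives the change in f from flipping each spin (exact
    here because J has zero diagonal); these scores define an informed
    proposal, and the MH correction keeps the target invariant.
    """
    def logp(z):
        return 0.5 * z @ J @ z + b @ z

    def flip_scores(z):
        return -2.0 * z * (J @ z + b)      # f(flip_i(z)) - f(z) for each i

    q = softmax(flip_scores(x) / 2.0)      # proposal over which spin to flip
    i = rng.choice(len(x), p=q)
    x_new = x.copy()
    x_new[i] = -x_new[i]
    q_rev = softmax(flip_scores(x_new) / 2.0)
    log_accept = logp(x_new) - logp(x) + np.log(q_rev[i]) - np.log(q[i])
    return x_new if np.log(rng.uniform()) < log_accept else x

# Ferromagnetic Ising chain with periodic boundary conditions.
rng = np.random.default_rng(6)
n = 30
J = np.zeros((n, n))
for i in range(n):
    J[i, (i + 1) % n] = J[(i + 1) % n, i] = 0.5
b = np.zeros(n)
x = rng.choice([-1.0, 1.0], size=n)
for _ in range(1000):
    x = gwg_step(x, J, b, rng)
print("magnetization:", x.mean())
```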

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computational complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.
