
We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum likelihood estimator achieves the minimax lower bound. However, this optimization-based estimator is computationally intractable because the objective function is highly nonconvex and the feasible set involves discrete structures. To address the computational challenge, we propose a Bayesian approach that uses a continuous spike-and-slab prior to estimate high-dimensional Gaussian mixtures whose cluster centers exhibit sparsity. Posterior inference can be computed efficiently using an easy-to-implement Gibbs sampler. We further prove that the posterior contraction rate of the proposed Bayesian method is minimax optimal. The mis-clustering rate is obtained as a by-product using tools from matrix perturbation theory. The proposed Bayesian sparse Gaussian mixture model does not require pre-specifying the number of clusters, which can be adaptively estimated via the Gibbs sampler. The validity and usefulness of the proposed method are demonstrated through simulation studies and the analysis of a real-world single-cell RNA sequencing dataset.
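For orientation, a continuous spike-and-slab prior of the kind described above is commonly written as a two-component Gaussian mixture on each coordinate of a cluster center; the formulation below is an illustrative sketch with hypothetical hyperparameters $\tau_0 \ll \tau_1$, not necessarily the exact prior used in the paper.

```latex
% Illustrative continuous spike-and-slab prior on coordinate j of cluster center \mu_k;
% the hyperparameters \tau_0 \ll \tau_1 and the Bernoulli weight \pi are assumptions.
\mu_{kj} \mid \gamma_{kj} \sim \gamma_{kj}\,\mathcal{N}(0, \tau_1^2)
  + (1 - \gamma_{kj})\,\mathcal{N}(0, \tau_0^2),
\qquad \gamma_{kj} \sim \mathrm{Bernoulli}(\pi).
```

Because both mixture components are continuous Gaussians, the full conditionals of $\mu_{kj}$ and $\gamma_{kj}$ stay in standard families, which helps keep the Gibbs sampler easy to implement.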

Related content

We propose a multivariate probability distribution that models a linear correlation between binary and continuous variables. The proposed distribution is a natural extension of the previously developed multivariate binary distribution. As an application of the proposed distribution, we develop a factor analysis for a mixture of continuous and binary variables. We also discuss improper solutions associated with factor analysis. As a prescription to avoid improper solutions, we propose a constraint that each row vector of the factor loading matrix has the same norm. We numerically validate the proposed factor analysis and the norm-constraint prescription by analyzing real datasets.
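Stated compactly, the norm constraint in the prescription above reads as follows; the common norm value $c$ is whatever constant the fitting procedure fixes (an assumption here, not specified in the abstract).

```latex
% Equal-row-norm constraint on the factor loading matrix \Lambda = (\lambda_{jk});
% \lambda_{j\cdot} denotes the j-th row and c > 0 is a common constant.
\|\lambda_{j\cdot}\|_2 = c \quad \text{for every observed variable } j.
```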

Mixture of experts (MoE) is a well-principled finite mixture model construction for prediction, allowing the gating network (mixture weights) to learn from the predictors (explanatory variables) together with the experts' network (mixture component densities). We investigate the estimation properties of MoEs in a high-dimensional setting, where the number of predictors is much larger than the sample size, for which the literature lacks computational and especially theoretical results. We consider the class of finite MoE models with softmax gating functions and Gaussian regression experts, and focus on the theoretical properties of their $l_1$-regularized estimation via the Lasso. We provide a lower bound on the regularization parameter of the Lasso penalty that ensures an $l_1$-oracle inequality is satisfied by the Lasso estimator according to the Kullback--Leibler loss. We further state an $l_1$-ball oracle inequality for the $l_1$-penalized maximum likelihood estimator obtained from model selection.
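To fix ideas, the Lasso estimator studied here can be sketched as a penalized conditional maximum likelihood problem over softmax gates and Gaussian experts; the exact set of penalized coordinates and any penalty weighting are assumptions in this generic form.

```latex
% Generic Lasso-penalized MoE objective: softmax gates g_k(x; \omega),
% Gaussian experts \phi(y; \beta_k^\top x, \sigma_k^2), parameters \psi = (\omega, \beta, \sigma).
\hat{\psi} \in \arg\min_{\psi}\;
  -\frac{1}{n}\sum_{i=1}^{n}
  \log\!\Big(\sum_{k=1}^{K} g_k(x_i;\omega)\,
  \phi\big(y_i;\,\beta_k^{\top} x_i,\,\sigma_k^2\big)\Big)
  \;+\; \lambda \sum_{k=1}^{K}\big(\|\omega_k\|_1 + \|\beta_k\|_1\big).
```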

We revisit the following problem: given a set of indices $S = \{1, \dots, n\}$ and weights $w_1, \dots, w_n \in \mathbb{R}_{> 0}$, provide samples from $S$ with distribution $p(i) = w_i / W$ where $W = \sum_j w_j$ gives the proper normalization. In the static setting, there is a simple data structure due to Walker, called the Alias Table, that allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed, or weights may change over time; here, existing solutions restrict the permissible weights, require rebuilding of the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Construction of the data structure for an arbitrary distribution takes time $O(n)$, sampling takes expected time $O(1)$, and updates of size $\Delta = O(W / n)$ can be processed in time $O(1)$. To evaluate the efficiency of the data structure, we conduct an experimental study. The results suggest that the dynamic sampling performance is comparable to the static Alias Table with a minor slowdown.
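As a point of reference for the static baseline, here is a minimal sketch of Walker's alias method in Vose's standard $O(n)$-construction form; it illustrates the constant-time sampling mentioned above and is not the dynamic data structure proposed in the paper.

```python
import random

def build_alias_table(weights):
    """Vose's O(n) construction of prob/alias tables for positive weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]        # rescale so the mean is 1
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l             # fill bucket s, borrow mass from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                          # remaining buckets are full
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias):
    """Draw one index in O(1): uniform bucket choice, then a biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```

Each draw costs one uniform integer, one uniform float, and at most one table lookup, which is why the static structure is the natural yardstick for the dynamic variant.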

We develop a general theoretical and algorithmic framework for sparse approximation and structured prediction in $\mathcal{P}_2(\Omega)$ with Wasserstein barycenters. The barycenters are sparse in the sense that they are computed from an available dictionary of measures, but the approximations only involve a reduced number of atoms. We show that the best reconstruction from the class of sparse barycenters is characterized by a notion of best $n$-term barycenter, which we introduce and which can be understood as a natural extension of the classical concept of best $n$-term approximation in Banach spaces. We show that the best $n$-term barycenter is the minimizer of a highly non-convex, bi-level optimization problem, and we develop algorithmic strategies for practical numerical computation. We next leverage this approximation tool to build interpolation strategies that incur a reduced computational cost and that can be used for structured prediction and for metamodelling of parametrized families of measures. We illustrate the potential of the method through the specific problem of Model Order Reduction (MOR) of parametrized PDEs. Since our approach is sparse, adaptive, and preserves mass by construction, it has the potential to overcome known bottlenecks of classical linear methods in hyperbolic conservation laws transporting discontinuities. It also paves the way towards MOR for measure-valued PDE problems such as gradient flows.
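One plausible way to formalize the best $n$-term barycenter, stated here only to fix ideas (the paper's precise definition may differ), is to minimize the Wasserstein distance to the target over barycenters built from at most $n$ atoms of the dictionary.

```latex
% Hypothetical formalization: \{\nu_1,\dots,\nu_K\} is the dictionary, \mu the target,
% \mathrm{Bar}((\lambda_k,\nu_k)_{k\in\Lambda}) the barycenter with weights \lambda on \Lambda,
% and \Sigma_\Lambda the probability simplex over \Lambda.
\min_{\substack{\Lambda \subseteq \{1,\dots,K\} \\ |\Lambda| \le n}}\;
\min_{\lambda \in \Sigma_\Lambda}\;
W_2\!\Big(\mathrm{Bar}\big((\lambda_k, \nu_k)_{k \in \Lambda}\big),\, \mu\Big).
```

The discrete choice of the support set $\Lambda$ nested inside the continuous optimization over weights is what gives the problem its bi-level, non-convex character.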

Mixture models of Plackett-Luce (PL) -- one of the most fundamental ranking models -- are an active research area of both theoretical and practical significance. Most previously proposed parameter estimation algorithms instantiate the EM algorithm, often with random initialization. However, such an initialization scheme may not yield a good initial estimate, and the algorithms require multiple restarts, incurring a large time complexity. As for the EM procedure, while the E-step can be performed efficiently, maximizing the log-likelihood in the M-step is difficult due to the combinatorial nature of the PL likelihood function (Gormley and Murphy 2008). Therefore, previous authors favor algorithms that maximize surrogate likelihood functions (Zhao et al. 2018, 2020). However, the final estimate may deviate from the true maximum likelihood estimate as a consequence. In this paper, we address these known limitations. We propose an initialization algorithm that can provide a provably accurate initial estimate and an EM algorithm that maximizes the true log-likelihood function efficiently. Experiments on both synthetic and real datasets show that our algorithm is competitive with baseline algorithms in terms of accuracy and speed, especially on datasets with a large number of items.
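For context, the standard Plackett-Luce probability of a full ranking $\sigma$ (best item first) of $m$ items with positive utilities $w_1, \dots, w_m$ is the sequential-choice product below; mixtures of such terms are what make the M-step combinatorially hard.

```latex
% Plackett-Luce likelihood of the ranking \sigma: at each position i, item \sigma(i)
% is chosen among the items not yet ranked, proportionally to its utility.
P(\sigma \mid w) \;=\; \prod_{i=1}^{m}
  \frac{w_{\sigma(i)}}{\sum_{j=i}^{m} w_{\sigma(j)}}.
```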

We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis of how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results. We start by reviewing theoretical bounds on the performance of approximate dynamic programming algorithms. We then introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation, both online and offline. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. According to our results: (i) high-entropy data distributions are well-suited for learning in an offline manner; and (ii) a certain degree of data diversity (data coverage) and data quality (closeness to the optimal policy) are jointly desirable for offline learning.
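As a simple illustration of the 'high-entropy' property in (i), one can measure the Shannon entropy of the empirical state-action visitation distribution of an offline dataset; the tuple format of the transitions below is an assumption, not the paper's data interface.

```python
import numpy as np
from collections import Counter

def state_action_entropy(transitions):
    """Shannon entropy (in nats) of the empirical (state, action) distribution
    for an offline dataset given as (s, a, r, s_next) tuples."""
    counts = Counter((s, a) for s, a, _, _ in transitions)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum())
```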

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative $\textit{Brassica napus}$ and $\textit{Zea mays}$ phenotypes based on gene expression data.
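A minimal sketch of the comparison-of-two-models view of the out-of-sample $R^2$: on held-out folds, compare the squared-error loss of the fitted model against a baseline that predicts the training-fold mean. This is the naive cross-validated version with an illustrative linear model; it is not the paper's unbiased estimator and it omits the standard error.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

def oos_r2(X, y, model=None, n_splits=5, seed=0):
    """Naive cross-validated out-of-sample R^2: 1 - SSE(model) / SSE(mean baseline)."""
    model = model if model is not None else LinearRegression()
    sse_model = sse_base = 0.0
    for tr, te in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        model.fit(X[tr], y[tr])
        sse_model += np.sum((y[te] - model.predict(X[te])) ** 2)
        sse_base += np.sum((y[te] - y[tr].mean()) ** 2)
    return 1.0 - sse_model / sse_base
```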

The paper addresses the problem of estimating the model parameters of the logistic exponential distribution based on a progressive type-I hybrid censored sample. The maximum likelihood estimates are obtained and computed numerically using the Newton-Raphson method. Further, the Bayes estimates are derived under squared error, LINEX, and generalized entropy loss functions. Two types of prior distributions (independent and bivariate) are considered for the purpose of Bayesian estimation. It is seen that the Bayes estimates do not have explicit forms. Thus, Lindley's approximation technique is employed to obtain approximate Bayes estimates. Interval estimates of the parameters are constructed based on the normal approximation of the maximum likelihood estimates and the normal approximation of the log-transformed maximum likelihood estimates. The highest posterior density credible intervals are obtained by using the importance sampling method. Furthermore, numerical computations are reported to review some of the results obtained in the paper. A real-life dataset is considered for the purpose of illustration.
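For reference, the asymmetric LINEX loss with shape parameter $a \neq 0$ and the standard form of its Bayes estimator (the posterior-expectation expression, stated here as background rather than as the paper's derivation) are:

```latex
% LINEX loss and its Bayes estimator under the posterior \pi(\theta \mid x);
% a > 0 penalizes over-estimation more heavily, a < 0 penalizes under-estimation.
L(\hat{\theta}, \theta) = e^{a(\hat{\theta} - \theta)} - a(\hat{\theta} - \theta) - 1,
\qquad
\hat{\theta}_{\mathrm{LINEX}} = -\frac{1}{a}\,\log \mathbb{E}\!\left[e^{-a\theta} \mid x\right].
```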

When estimating a Global Average Treatment Effect (GATE) under network interference, units can have widely different relationships to the treatment depending on a combination of the structure of their network neighborhood, the structure of the interference mechanism, and how the treatment was distributed in their neighborhood. In this work, we introduce a sequential procedure to generate and select graph- and treatment-based covariates for GATE estimation under regression adjustment. We show that it is possible to simultaneously achieve low bias and considerably reduce variance with such a procedure. To tackle inferential complications caused by our feature generation and selection process, we introduce a way to construct confidence intervals based on a block bootstrap. We illustrate that our selection procedure and subsequent estimator can achieve good performance in terms of root mean squared error in several semi-synthetic experiments with Bernoulli designs, comparing favorably to an oracle estimator that takes advantage of regression adjustments for the known underlying interference structure. We apply our method to a real-world experimental dataset with strong evidence of interference and demonstrate that it can estimate the GATE reasonably well without knowing the interference process a priori.
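A hedged sketch of the kind of graph- and treatment-based covariates that could enter such a regression adjustment; the specific features (degree, treated-neighbor count and fraction) and the networkx-based inputs are illustrative assumptions, not the paper's generated feature set.

```python
import numpy as np
import networkx as nx

def neighborhood_features(G, treatment):
    """Per-unit covariates from the graph G and a {node: 0/1} treatment mapping:
    own degree, number of treated neighbors, and fraction of treated neighbors."""
    feats = []
    for v in G.nodes():
        nbrs = list(G.neighbors(v))
        n_treated = sum(treatment[u] for u in nbrs)
        frac = n_treated / len(nbrs) if nbrs else 0.0
        feats.append([len(nbrs), n_treated, frac])
    return np.asarray(feats)
```

Such covariates would then be regressed jointly with each unit's own treatment indicator when adjusting the GATE estimate.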

In this paper, we propose novel Gaussian process-gated hierarchical mixtures of experts (GPHMEs) in which Gaussian processes are used for building both the gates and the experts. Unlike in other mixtures of experts, where the gating models are linear in the input, the gating functions of our model are inner nodes built with Gaussian processes based on random features, which are non-linear and non-parametric. Further, the experts are also built with Gaussian processes and provide predictions that depend on test data. The optimization of the GPHMEs is carried out by variational inference. There are several advantages of the proposed GPHMEs. One is that they outperform tree-based HME benchmarks that partition the data in the input space. Another advantage is that they achieve good performance with reduced complexity. A third advantage of the GPHMEs is that they provide interpretability of deep Gaussian processes and, more generally, of deep Bayesian neural networks. Our GPHMEs demonstrate excellent performance for large-scale data sets, even with quite modest sizes.
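To make the 'Gaussian processes based on random features' gating idea concrete, here is a hedged sketch of a random Fourier feature map followed by a sigmoid gate at one inner node; the feature dimension, RBF lengthscale, and gate parameterization are all assumptions rather than the paper's specification.

```python
import numpy as np

def random_fourier_features(X, n_features=100, lengthscale=1.0, seed=0):
    """Random Fourier feature map approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def inner_node_gate(X, theta, **rff_kwargs):
    """Probability of routing each input to, say, the left child of one inner node."""
    Z = random_fourier_features(X, **rff_kwargs)      # non-linear, non-parametric features
    return 1.0 / (1.0 + np.exp(-Z @ theta))           # theta has length n_features
```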
