
Partially recorded data are frequently encountered in many applications, and are usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, and then applying a clustering algorithm to the resulting altered dataset. Here, we develop clustering methodology through a model-based approach using the marginal density for the observed values, assuming a finite mixture model of multivariate $t$ distributions. We compare our approximate algorithm to the corresponding full expectation-maximization (EM) approach, which models the missing values in the incomplete dataset under a missing at random (MAR) assumption, as well as to case deletion and imputation methods. Since only the observed values are utilized, our approach is computationally more efficient than imputation or full EM. Simulation studies demonstrate that our approach recovers the true cluster partition more reliably than case deletion and imputation under various missingness mechanisms, and is at least competitive with the full EM approach even when the MAR assumption is violated. Our methodology is demonstrated on a problem of clustering gamma-ray bursts and is implemented at https://github.com/emilygoren/MixtClust.
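
A minimal sketch (not the MixtClust implementation) of the key computational ingredient: evaluating the marginal multivariate-$t$ log-density on only the observed coordinates of a partially recorded case. It relies on the fact that any sub-vector of a multivariate $t$ retains the same degrees of freedom, with the location and scale restricted to the observed coordinates. The function name and the NaN convention for missing entries are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_t

def observed_t_logpdf(x, mu, sigma, df):
    """Log-density of the observed entries of x (NaN marks missing)
    under a p-variate t(mu, sigma, df) distribution."""
    obs = ~np.isnan(x)                      # mask of observed coordinates
    if not obs.any():
        return 0.0                          # no information from this case
    return multivariate_t.logpdf(
        x[obs], loc=mu[obs], shape=sigma[np.ix_(obs, obs)], df=df
    )

# Example: a 3-dimensional observation with the second coordinate missing.
mu = np.zeros(3)
sigma = np.eye(3)
x = np.array([0.5, np.nan, -1.2])
print(observed_t_logpdf(x, mu, sigma, df=4.0))
```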

Related content

In this article, we study possible relations between the structural connectome and cognitive profiles using a multi-response nonparametric regression model under group sparsity. The aim is to identify the brain regions having a significant effect on cognitive functioning. We consider nine different attributes for each brain region as our predictors; each node thus corresponds to a collection of nodal attributes. Hence, these nodal graph metrics may naturally be grouped together for each node, motivating us to introduce group sparsity for feature selection. We propose Gaussian RBF-nets with a novel group-sparsity-inducing prior to model the unknown mean functions. The covariance structure of the multivariate response is characterized in terms of a linear factor modeling framework. For posterior computation, we develop an efficient Markov chain Monte Carlo sampling algorithm. In simulations, the proposed method substantially outperforms its competitors. Applying our proposed method to a Human Connectome Project (HCP) dataset, we identify the important brain regions and nodal attributes for cognitive functioning, as well as interesting low-dimensional dependency structures among the cognition-related test scores.
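
As a rough frequentist analogue of the group-sparsity prior described above (the paper itself is Bayesian and uses MCMC), the following sketch applies block soft-thresholding, the group-lasso proximal operator, to per-region coefficient groups, so that all nine nodal attributes of a region are selected or zeroed out together. The grouping, dimensions, and penalty weight are illustrative.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Zero out entire coefficient groups whose Euclidean norm is <= lam."""
    out = np.zeros_like(beta)
    for idx in groups:
        g = beta[idx]
        norm = np.linalg.norm(g)
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * g   # shrink the surviving group
    return out

# 4 regions x 9 nodal attributes = 36 coefficients, grouped by region.
rng = np.random.default_rng(0)
beta = rng.normal(size=36)
groups = [np.arange(9 * r, 9 * (r + 1)) for r in range(4)]
print(group_soft_threshold(beta, groups, lam=2.5))
```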

We introduce tools from numerical analysis and high-dimensional probability for precision control and complexity analysis of subdivision-based algorithms in computational geometry, and we combine these tools with the continuous amortization framework from exact computation. We apply them to a well-known example from the subdivision family: the adaptive subdivision algorithm due to Plantinga and Vegter. The only existing complexity estimate for this rather fast algorithm was an exponential worst-case upper bound for its interval arithmetic version. We go beyond the worst case by considering both average and smoothed analyses, and we prove polynomial-time complexity estimates for both the interval arithmetic and finite-precision versions of the Plantinga-Vegter algorithm.
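
A toy one-dimensional analogue of interval-arithmetic subdivision in the spirit of the Plantinga-Vegter algorithm: a box is discarded when the interval evaluation of $f$ excludes zero and subdivided otherwise. The actual algorithm operates on 2-D/3-D boxes and also tests a gradient condition; this sketch only illustrates the exclusion test and the adaptive recursion whose cost the complexity analysis bounds.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def i_sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])

def i_mul(a: Interval, b: Interval) -> Interval:
    prods = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(prods), max(prods))

def f_interval(x: Interval) -> Interval:
    # Interval extension of f(t) = t^2 - 2.
    return i_sub(i_mul(x, x), (2.0, 2.0))

def subdivide(box: Interval, eps: float, out: List[Interval]) -> None:
    lo, hi = f_interval(box)
    if lo > 0.0 or hi < 0.0:
        return                      # 0 is excluded: discard this box
    if box[1] - box[0] < eps:
        out.append(box)             # small enough: keep as a candidate
        return
    mid = 0.5 * (box[0] + box[1])
    subdivide((box[0], mid), eps, out)
    subdivide((mid, box[1]), eps, out)

boxes: List[Interval] = []
subdivide((0.0, 4.0), 1e-3, boxes)
print(boxes)    # tiny boxes clustered around sqrt(2)
```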

We deal with the shape reconstruction of inclusions in elastic bodies. For solving this inverse problem in practice, data-fitting functionals are used. These perform better in practice than the rigorous monotonicity methods from [5], but lack a rigorously proven convergence theory. We therefore show how the monotonicity methods can be converted into a regularization method for a data-fitting functional without losing the convergence properties of the monotonicity methods; retaining these guarantees is a significant advantage over standard regularization techniques. In more detail, we introduce constraints on the minimization of the residual based on the monotonicity methods and prove the existence and uniqueness of a minimizer, as well as the convergence of the method for noisy data. In addition, we compare numerical reconstructions of inclusions based on the monotonicity-based regularization with a standard approach (one-step linearization with Tikhonov-like regularization), which also demonstrates the robustness of our method to noise in practice.
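
A schematic of the constrained residual minimization, with invented ingredients: a random linear map stands in for the elasticity forward operator, and the box constraints play the role of the bounds a monotonicity test would supply (background value pinned outside a suspected inclusion, free inside it).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))            # stand-in for the forward operator F
kappa_true = np.array([1.0, 1.0, 3.0, 1.0, 1.0])   # inclusion in component 2
y = A @ kappa_true + 0.01 * rng.normal(size=20)    # noisy measurements

residual = lambda kappa: np.sum((A @ kappa - y) ** 2)

# Box constraints as a monotonicity test might provide: the background value
# 1.0 is pinned down outside the suspected inclusion, free inside it.
bounds = [(1.0, 1.0), (1.0, 1.0), (1.0, 5.0), (1.0, 1.0), (1.0, 1.0)]

res = minimize(residual, x0=np.ones(5), bounds=bounds)
print(res.x)    # recovers a contrast close to 3.0 in component 2
```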

This article considers spectral community detection in the regime of sparse networks with heterogeneous degree distributions, for which we devise an algorithm that efficiently retrieves communities. Specifically, we demonstrate that a conveniently parametrized form of regularized Laplacian matrix can be used to perform spectral clustering in sparse networks without suffering from their degree heterogeneity. We also exhibit important connections between this proposed matrix and the now-popular non-backtracking matrix, the Bethe-Hessian matrix, and the standard Laplacian matrix. Interestingly, unlike competing methods, our proposed parametrization inherently accounts for the hardness of the classification problem. These findings are summarized in the form of an algorithm capable of both estimating the number of communities and achieving high-quality community reconstruction.
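
A minimal sketch of spectral clustering with a regularized Laplacian in the spirit of the algorithm above. The parametrization $L_r = (D + rI)^{-1/2} A (D + rI)^{-1/2}$ and the default $r$ equal to the mean degree are illustrative choices, not the paper's tuned parametrization.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def regularized_spectral_clustering(A, k, r=None):
    """Cluster the nodes of adjacency matrix A (dense, symmetric) into k groups."""
    d = A.sum(axis=1)
    if r is None:
        r = d.mean()                          # heuristic regularizer
    d_inv_sqrt = 1.0 / np.sqrt(d + r)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = eigsh(L, k=k, which="LA")       # top-k eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)

# Two noisy blocks of 30 nodes each.
rng = np.random.default_rng(2)
B = np.kron(np.eye(2), np.ones((30, 30)))
A = (rng.random((60, 60)) < 0.05 + 0.15 * B).astype(float)
A = np.triu(A, 1); A = A + A.T
print(regularized_spectral_clustering(A, k=2))
```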

Node clustering is a powerful tool in the analysis of networks. We introduce DIGRAC, a graph neural network framework that obtains node embeddings for directed networks in a self-supervised manner, with a novel probabilistic imbalance loss that can be used for network clustering. We propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, directionality is not treated here as a nuisance, but rather contains the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing GNN methods, and can naturally incorporate node features, unlike existing spectral methods. Experimental results on synthetic data, in the form of directed stochastic block models, and real-world data at different scales demonstrate that our method, based on flow imbalance, attains state-of-the-art results on directed graph clustering for a wide range of noise and sparsity levels, graph structures, and topologies.
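
A minimal numpy illustration of the pairwise flow-imbalance idea: for clusters $C_i$ and $C_j$, compare the total edge weight flowing $i \to j$ against $j \to i$. The normalization below is one plausible choice; DIGRAC's actual probabilistic loss operates on soft assignments inside a GNN.

```python
import numpy as np

def pairwise_flow_imbalance(W, labels, i, j):
    """Imbalance in [0, 1] between clusters i and j of a weighted digraph W."""
    mi, mj = labels == i, labels == j
    w_ij = W[np.ix_(mi, mj)].sum()      # weight flowing from C_i to C_j
    w_ji = W[np.ix_(mj, mi)].sum()      # weight flowing from C_j to C_i
    total = w_ij + w_ji
    return 0.0 if total == 0 else abs(w_ij - w_ji) / total

# A 4-node digraph where all edges point from cluster 0 to cluster 1.
W = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])
print(pairwise_flow_imbalance(W, labels, 0, 1))   # 1.0: maximally imbalanced
```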

We introduce a novel rule-based approach for handling regression problems. The new methodology carries elements from two frameworks: (i) it provides information about the uncertainty of the parameters of interest using Bayesian inference, and (ii) it allows the incorporation of expert knowledge through rule-based systems. The blending of these two frameworks can be particularly beneficial for domains such as engineering, where, even though the significance of uncertainty quantification motivates a Bayesian approach, there is no simple way to incorporate researcher intuition into the model. We validate our models by applying them to synthetic applications: a simple linear regression problem and two more complex structures based on partial differential equations. Finally, we review the advantages of our methodology, which include the simplicity of the implementation, the uncertainty reduction due to the added information and, on some occasions, better point predictions, and we address its limitations, mainly from the computational complexity perspective: the difficulty of choosing an appropriate algorithm and the added computational burden.
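
A minimal conjugate sketch of the idea of injecting expert knowledge into Bayesian regression: a rule such as "the slope is near 2" becomes an informative Gaussian prior. The full methodology uses rule-based systems and MCMC; here the noise variance is assumed known so that the posterior is available in closed form, and all numbers are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(30), rng.uniform(0, 1, 30)])
beta_true = np.array([0.5, 2.0])
sigma2 = 0.25
y = X @ beta_true + np.sqrt(sigma2) * rng.normal(size=30)

# Expert rule encoded as a prior: slope ~ N(2, 0.1), intercept weakly N(0, 10).
mu0 = np.array([0.0, 2.0])
Sigma0_inv = np.diag([1 / 10.0, 1 / 0.1])

# Conjugate posterior for beta with known noise variance sigma2.
post_prec = X.T @ X / sigma2 + Sigma0_inv
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (X.T @ y / sigma2 + Sigma0_inv @ mu0)
print(post_mean, np.sqrt(np.diag(post_cov)))   # point estimate + uncertainty
```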

Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research deals only with flat entities and ignores nested entities. Span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, neglect of boundary information, under-utilization of spans that partially match entities, and difficulty recognizing long entities. To tackle these issues, we propose a two-stage entity identifier. First, we generate span proposals by filtering seed spans and performing boundary regression to locate entities; we then label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can in principle be covered, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
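
An illustrative, untrained sketch of the two-stage pipeline: stage one enumerates seed spans, filters them by a quality score, and adjusts the survivors' boundaries by regressed offsets; stage two would classify the adjusted proposals. The scoring and offset functions are random stand-ins for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(4)
n_tokens, max_width = 12, 4

# Stage 1a: enumerate seed spans up to a maximum width.
seeds = [(s, e) for s in range(n_tokens)
         for e in range(s, min(s + max_width, n_tokens))]

# Stage 1b: filter by a (stand-in) span quality score.
scores = rng.random(len(seeds))
keep = [sp for sp, sc in zip(seeds, scores) if sc > 0.8]

# Stage 1c: boundary regression nudges each surviving span's endpoints,
# so entities longer than max_width remain reachable in principle.
def adjust(span):
    ds, de = rng.integers(-1, 2, size=2)       # stand-in regressed offsets
    s = int(np.clip(span[0] + ds, 0, n_tokens - 1))
    e = int(np.clip(span[1] + de, s, n_tokens - 1))
    return (s, e)

proposals = [adjust(sp) for sp in keep]
print(proposals)    # stage 2 would assign an entity label to each proposal
```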

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some losses, such as k-means and its non-linear kernelized version (both centroid based) and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. But every so often the clustering output using these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model uses as training pairs examples of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model gains the ability to generalize well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets, or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta-clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.
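
A minimal sketch of the training-pair construction the meta-clustering idea rests on: each training example is an entire dataset (input) paired with its cluster identities (output). Here make_blobs stands in for the synthetic generators, with the number of clusters and spread varied across examples; the recurrent model that consumes these pairs is not shown.

```python
import numpy as np
from sklearn.datasets import make_blobs

def sample_training_pair(rng):
    """Return one (dataset, cluster-labels) training pair."""
    k = rng.integers(2, 6)                 # vary the number of clusters
    X, y = make_blobs(n_samples=200, centers=int(k), n_features=2,
                      cluster_std=rng.uniform(0.5, 2.0),
                      random_state=int(rng.integers(1_000_000)))
    return X, y

rng = np.random.default_rng(5)
pairs = [sample_training_pair(rng) for _ in range(100)]
print(pairs[0][0].shape, np.unique(pairs[0][1]))
```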

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive with popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden when tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity, without resorting to relaxations or heuristics in the optimization or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and employs whitening, a data pre-processing step.
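
A brute-force stand-in for the Mixed Integer Optimization formulation of best-subset selection, exhaustively searching all supports of size at most $k$ (the sparsity bound discussed above). This is viable only for tiny numbers of features and exists to make the objective concrete; the paper's contribution is precisely making the MIO solve tractable at realistic sizes.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Return (support, coefficients) minimizing ||y - X_S b||^2 over |S| <= k."""
    n, p = X.shape
    best = (np.inf, (), None)
    for size in range(1, k + 1):
        for S in itertools.combinations(range(p), size):
            XS = X[:, S]
            b, _, _, _ = np.linalg.lstsq(XS, y, rcond=None)
            err = np.sum((y - XS @ b) ** 2)
            if err < best[0]:
                best = (err, S, b)
    return best[1], best[2]

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 8))
y = X[:, [1, 4]] @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=50)
print(best_subset(X, y, k=2))    # should recover support (1, 4)
```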

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the overall problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared to existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real datasets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
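
A heavily simplified illustration of the search-problem setup: side information arrives as per-sample scores that correlate with membership in the component of interest, and a weighted moment recovers that component's mean without fitting the whole mixture. The weighting scheme is invented for the demo; the paper's matrix-based algorithm is more general.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
z = rng.random(n) < 0.3                       # latent membership (target: 30%)
X = np.where(z[:, None], rng.normal(3.0, 1.0, (n, 2)),
                         rng.normal(-1.0, 1.0, (n, 2)))

# Side information: a noisy score that is higher for the target component.
s = z + 0.3 * rng.normal(size=n)
w = (s > 0.5).astype(float)                   # turn scores into weights

mu_hat = (w[:, None] * X).sum(axis=0) / w.sum()
print(mu_hat)   # pulled strongly toward the target component mean (3, 3)
```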
