
This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program that seeks an affine transform to turn the data into a one-dimensional point cloud concentrating around $-1$ and $1$, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable geometric properties when the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision without a good initialization. We also propose a general methodology for clustering with flexible choices of feature transforms and loss objectives.
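
To make the idea concrete, here is a minimal numerical sketch (not the authors' exact program): plain gradient descent on the quartic loss $\frac{1}{n}\sum_i ((w^\top x_i + b)^2 - 1)^2$, followed by thresholding the resulting one-dimensional point cloud at zero. The learning rate, iteration count, clipping, and standardization step are illustrative choices, and in practice such a non-convex fit may need several random restarts.

```python
import numpy as np

def cluster_by_affine_transform(X, lr=0.02, n_iters=2000, seed=0):
    """Toy gradient descent on the quartic loss mean(((w @ x + b)^2 - 1)^2),
    so that the affine image of the data concentrates near -1 and +1.
    An illustrative stand-in for the paper's non-convex program, not its exact loss."""
    rng = np.random.default_rng(seed)
    # Standardize coordinates for numerical stability (still an affine change of variables).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n, d = X.shape
    w = 0.1 * rng.normal(size=d)
    b = 0.0
    for _ in range(n_iters):
        z = np.clip(X @ w + b, -3.0, 3.0)      # clip to guard against early overshoot
        g = 4.0 * (z**2 - 1.0) * z / n         # d(loss)/dz
        w -= lr * (X.T @ g)
        b -= lr * g.sum()
    return ((X @ w + b) > 0).astype(int)       # threshold the 1-D point cloud at 0

# Elongated two-component mixture where spherical methods such as k-means struggle.
rng = np.random.default_rng(1)
cov = [[6.0, 0.0], [0.0, 0.05]]
X = np.vstack([rng.multivariate_normal([0.0, -1.0], cov, 200),
               rng.multivariate_normal([0.0, 1.0], cov, 200)])
labels = cluster_by_affine_transform(X)
```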

Related content

Adaptive Resonance Theory (ART) is considered an effective approach to continual learning thanks to its ability to handle the plasticity-stability dilemma. In general, however, the clustering performance of ART-based algorithms strongly depends on the specification of a similarity threshold, i.e., a vigilance parameter, which is data-dependent and specified by hand. This paper proposes an ART-based topological clustering algorithm with a mechanism that automatically estimates a similarity threshold from the distribution of data points. In addition, to improve information extraction performance, a divisive hierarchical clustering algorithm capable of continual learning is proposed by introducing a hierarchical structure into the proposed algorithm. Experimental results demonstrate that the proposed algorithm achieves clustering performance comparable to that of recently proposed state-of-the-art hierarchical clustering algorithms.
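
As a rough illustration of the threshold-free idea (not the paper's ART model), the sketch below runs a simple online, prototype-based clustering pass in which the vigilance-like distance threshold is set automatically from a low quantile of pairwise distances in a random subsample; the quantile choice and running-mean prototype update are assumptions made for brevity.

```python
import numpy as np

def art_like_clustering(X, sample_size=200, quantile=0.2, seed=0):
    """Toy online, prototype-based clustering in the spirit of ART. The vigilance-like
    distance threshold is estimated automatically as a low quantile of the pairwise
    distances in a random subsample (a heuristic stand-in for the paper's mechanism)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    S = X[idx]
    pairwise = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    vigilance = np.quantile(pairwise[np.triu_indices(len(S), k=1)], quantile)

    prototypes, counts, labels = [], [], []
    for x in X:
        if prototypes:
            d = np.linalg.norm(np.asarray(prototypes) - x, axis=1)
            j = int(np.argmin(d))
        if not prototypes or d[j] > vigilance:
            prototypes.append(x.astype(float))                 # open a new category node
            counts.append(1)
            labels.append(len(prototypes) - 1)
        else:
            counts[j] += 1
            prototypes[j] += (x - prototypes[j]) / counts[j]   # running-mean prototype update
            labels.append(j)
    return np.array(labels), np.asarray(prototypes)
```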

Dual-energy computed tomography (DECT) is an advanced CT scanning technique enabling material characterization not possible with conventional CT scans. It allows the reconstruction of energy decay curves at each 3D image voxel, representing varying image attenuation at different effective scanning energy levels. In this paper, we develop novel functional data analysis (FDA) techniques and adapt them to the analysis of DECT decay curves. More specifically, we construct functional mixture models that integrate spatial context in mixture weights, with mixture component densities being constructed upon the energy decay curves as functional observations. We design unsupervised clustering algorithms by developing dedicated expectation maximization (EM) algorithms for the maximum likelihood estimation of the model parameters. To our knowledge, this is the first article to adapt statistical FDA tools and model-based clustering to take advantage of the full spectral information provided by DECT. We evaluate our methods on 91 head and neck cancer DECT scans. We compare our unsupervised clustering results to tumor contours traced manually by radiologists, as well as to several baseline algorithms. Given the inter-rater variability even among experts at delineating head and neck tumors, and given the potential importance of tissue reactions surrounding the tumor itself, our proposed methodology has the potential to add value in downstream machine learning applications for clinical outcome prediction based on DECT data in head and neck cancer.
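
A highly simplified sketch of the model-based idea is given below: each decay curve is treated as a vector, components are isotropic Gaussians, and spatial context enters by smoothing the EM responsibilities over voxel coordinates. The isotropic components and the kernel smoothing of the mixture weights are simplifications for illustration, not the paper's exact functional mixture specification.

```python
import numpy as np

def functional_mixture_em(curves, coords, K=2, n_iters=50, sigma_s=2.0, seed=0):
    """Simplified EM for clustering decay curves (one row of `curves` per voxel).
    Components are isotropic Gaussians over the curve values; spatial context is
    injected by kernel-smoothing the responsibilities over the voxel coordinates
    in `coords` (a crude stand-in for spatially varying mixture weights)."""
    rng = np.random.default_rng(seed)
    n, T = curves.shape
    means = curves[rng.choice(n, K, replace=False)]            # initialize centers from data
    var = np.full(K, curves.var())
    weights = np.full((n, K), 1.0 / K)                         # per-voxel mixture weights

    # Gaussian spatial kernel between voxels, row-normalized, used to smooth the weights.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma_s ** 2))
    W /= W.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities under the isotropic Gaussian components.
        sq = ((curves[:, None, :] - means[None, :, :]) ** 2).sum(-1)    # (n, K)
        logp = -0.5 * sq / var - 0.5 * T * np.log(2 * np.pi * var) + np.log(weights)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: component means and variances; spatially smoothed mixture weights.
        nk = r.sum(axis=0)
        means = (r.T @ curves) / nk[:, None]
        var = np.maximum((r * sq).sum(axis=0) / (nk * T), 1e-8)
        weights = W @ r                                        # neighboring voxels vote on the weights
    return r.argmax(axis=1)
```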

Clustering of mixed-type datasets can be a particularly challenging task as it requires taking into account the associations between variables with different levels of measurement, i.e., nominal, ordinal and/or interval. In some cases, hierarchical clustering is considered a suitable approach, as it makes few assumptions about the data and its solution can be easily visualized. Since most hierarchical clustering approaches assume variables are measured on the same scale, a simple strategy for clustering mixed-type data is to homogenize the variables before clustering. This means either recoding the continuous variables as categorical ones or vice versa. However, typical discretization of continuous variables implies a loss of information. In this work, an agglomerative hierarchical clustering approach for mixed-type data is proposed, which relies on a barycentric coding of continuous variables. The proposed approach minimizes information loss and is compatible with the framework of correspondence analysis. The utility of the method is demonstrated on real and simulated data.
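
The following sketch conveys the homogenize-then-cluster recipe: continuous variables are softly coded against a small set of quantile anchors (a rough stand-in for the barycentric coding, which avoids hard discretization), categorical variables are one-hot encoded, and the resulting matrix is fed to standard agglomerative linkage. The bin count, bandwidth, and linkage method below are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def mixed_type_agglomerative(cont, cat, n_clusters=3, n_bins=5):
    """Sketch of hierarchical clustering for mixed data: continuous columns are
    softly assigned to quantile anchors (a crude stand-in for barycentric coding),
    categorical columns are one-hot encoded, and the homogenized matrix is
    clustered with average linkage."""
    blocks = []
    for j in range(cont.shape[1]):
        x = cont[:, j]
        centers = np.quantile(x, np.linspace(0, 1, n_bins))
        w = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * x.std() ** 2 + 1e-12))
        blocks.append(w / w.sum(axis=1, keepdims=True))        # soft anchor memberships
    for j in range(cat.shape[1]):
        values = np.unique(cat[:, j])
        blocks.append((cat[:, j, None] == values[None, :]).astype(float))  # one-hot coding
    Z = linkage(np.hstack(blocks), method="average", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```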

It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLM). Marginal likelihoods combined with model priors are then used in different search algorithms to compute the posterior marginal probabilities of models and individual covariates. This allows performing Bayesian model selection and model averaging. For large sample sizes, even the Laplace approximation becomes computationally challenging because the optimisation routine involved needs to evaluate the likelihood on the full set of data in multiple iterations. As a consequence, the algorithm is not scalable for large datasets. To address this problem, we suggest using a version of a popular batch stochastic gradient descent (BSGD) algorithm for estimating the marginal likelihood of a GLM by subsampling from the data. We further combine the algorithm with Markov chain Monte Carlo (MCMC) based methods for Bayesian model selection and provide some theoretical results on the convergence of the estimates. Finally, we report results from experiments illustrating the performance of the proposed algorithm.
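
For a logistic-regression GLM with a Gaussian prior, the computation might look roughly like the sketch below: mini-batch (subsampled) gradient ascent on the log posterior locates an approximate MAP estimate, and the standard Laplace formula then gives the log marginal likelihood. The batch size, learning rate, and prior variance are illustrative, and the full-data Hessian at the end is kept only for clarity.

```python
import numpy as np

def laplace_marginal_loglik_sgd(X, y, tau2=1.0, batch=256, epochs=50, lr=0.5, seed=0):
    """Sketch for a logistic-regression GLM: find the MAP estimate with mini-batch
    SGD (gradients subsampled from the data, rescaled to be unbiased for the full
    gradient), then apply the Laplace approximation to the marginal likelihood.
    The prior is N(0, tau2 * I); all tuning constants are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            p = 1 / (1 + np.exp(-X[idx] @ beta))
            grad = (n / len(idx)) * X[idx].T @ (y[idx] - p) - beta / tau2
            beta += lr * grad / n                    # ascent on the log posterior
    # Laplace approximation at the (approximate) MAP.
    p = 1 / (1 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    logprior = -0.5 * (beta @ beta) / tau2 - 0.5 * d * np.log(2 * np.pi * tau2)
    H = X.T @ (X * (p * (1 - p))[:, None]) + np.eye(d) / tau2   # negative Hessian at the MAP
    _, logdet = np.linalg.slogdet(H)
    return loglik + logprior + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
```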

Informed Markov chain Monte Carlo (MCMC) methods have been proposed as scalable solutions to Bayesian posterior computation on high-dimensional discrete state spaces, but theoretical results about their convergence behavior in general settings are lacking. In this article, we propose a class of MCMC schemes called informed importance tempering (IIT), which combine importance sampling and informed local proposals, and derive generally applicable spectral gap bounds for IIT estimators. Our theory shows that IIT samplers have remarkable scalability when the target posterior distribution concentrates on a small set. Further, both our theory and numerical experiments demonstrate that the informed proposal should be chosen with caution: the performance of some proposals may be very sensitive to the shape of the target distribution. We find that the "square-root proposal weighting" scheme tends to perform well in most settings.
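
A minimal sketch of the rejection-free mechanism with the square-root weighting $h(r)=\sqrt{r}$ is given below: the chain always jumps to a neighbor chosen with probability proportional to $\sqrt{\pi(y)/\pi(x)}$, and the reciprocal of the proposal normalizer serves as the importance weight in a self-normalized estimator. The toy one-dimensional target and neighborhood structure are assumptions for illustration only.

```python
import numpy as np

def iit_sample(log_pi, neighbors, x0, n_steps=5000, seed=0):
    """Sketch of informed importance tempering on a discrete space with the
    square-root proposal weighting h(r) = sqrt(r). From state x the chain always
    jumps to a neighbor y with probability proportional to sqrt(pi(y)/pi(x));
    the weight 1/Z(x), where Z(x) normalizes the jump probabilities, corrects
    the estimator back to the target pi. Assumes the neighborhood is small
    enough to enumerate."""
    rng = np.random.default_rng(seed)
    x = x0
    states, weights = [], []
    for _ in range(n_steps):
        nbrs = neighbors(x)
        h = np.exp(0.5 * (np.array([log_pi(y) for y in nbrs]) - log_pi(x)))
        Z = h.sum()
        states.append(x)
        weights.append(1.0 / Z)                      # importance weight of the current state
        x = nbrs[rng.choice(len(nbrs), p=h / Z)]     # rejection-free informed move
    return states, np.array(weights)

# Toy target on {0,...,49}: pi(x) proportional to exp(-|x - 25|); neighbors are x-1 and x+1.
log_pi = lambda x: -abs(x - 25.0)
neighbors = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 49]
states, w = iit_sample(log_pi, neighbors, x0=0)
post_mean = np.sum(w * np.array(states)) / w.sum()   # self-normalized IIT estimate
```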

Second-order optimization methods are among the most widely used optimization approaches for convex optimization problems, and have recently been used to optimize non-convex optimization problems such as deep learning models. The widely used second-order optimization methods such as quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes a poor approximation of the Newton step because it relies only on first-order derivatives. In this study, we propose an approximate Newton sketch-based stochastic optimization algorithm for large-scale empirical risk minimization. Specifically, we compute a partial column Hessian of size ($d\times m$) with $m\ll d$ randomly selected variables, then use the \emph{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. We then integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradient methods. The results of numerical experiments on both convex and non-convex functions show that the proposed approach obtains a better approximation of Newton's method, exhibiting performance competitive with that of state-of-the-art first-order and stochastic quasi-Newton methods. Furthermore, we provide a theoretical convergence analysis for convex functions.
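
A sketch of one such sketched Newton step for $\ell_2$-regularized logistic regression is shown below: only $m$ randomly chosen Hessian columns (a $d\times m$ block) are formed, the Nystr\"om formula approximates the full Hessian, and the Woodbury identity yields the update direction without ever building a $d\times d$ matrix. The damping parameter, loss, and variable names are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def nystrom_newton_step(X, y, w, m=20, lam=1e-3, rho=1e-2, seed=0):
    """One Nystrom-sketched Newton step for L2-regularized logistic regression.
    Only m columns of the Hessian are formed (d x m); the Nystrom formula
    approximates the full Hessian, and the Woodbury identity gives the step
    without forming or inverting a d x d matrix."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = 1 / (1 + np.exp(-X @ w))
    g = X.T @ (p - y) / n + lam * w                   # full gradient
    D = p * (1 - p) / n                               # per-sample Hessian weights
    idx = rng.choice(d, size=m, replace=False)
    C = X.T @ (D[:, None] * X[:, idx])                # m Hessian columns, d x m
    C[idx, np.arange(m)] += lam                       # add the ridge term's diagonal entries
    W = C[idx, :]                                     # m x m intersection block
    # Solve (C W^{-1} C^T + rho I) step = -g via the Woodbury identity.
    inner = W + (C.T @ C) / rho
    step = -(g / rho - C @ np.linalg.solve(inner, C.T @ g) / rho**2)
    return step

# One damped sketched-Newton update (e.g. inside an SGD/SVRG-style loop):
# w = w + step_size * nystrom_newton_step(X_batch, y_batch, w)
```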

This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effect estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. We furthermore propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some losses, such as k-means, its non-linear version kernelized k-means (centroid based), and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. However, every so often the clustering output obtained with these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model uses as training pairs examples of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model gains the ability to generalize well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.
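
To illustrate the learning-to-cluster setup (with an architecture chosen purely for brevity, not the paper's model), the sketch below trains a small bidirectional LSTM on synthetically generated two-cluster datasets, scoring both label permutations since cluster identities are only defined up to relabeling.

```python
import torch
import torch.nn as nn

class MetaClusterer(nn.Module):
    """Toy stand-in for a learned clustering model: a bidirectional LSTM reads a
    whole 2-D dataset as a sequence and emits a per-point cluster logit (2 clusters)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                      # x: (batch, n_points, 2)
        h, _ = self.rnn(x)
        return self.head(h).squeeze(-1)        # (batch, n_points) logits

def synthetic_task(n=100):
    """One training example: a random 2-cluster Gaussian mixture and its labels."""
    centers = torch.randn(2, 2) * 3
    labels = torch.randint(0, 2, (n,))
    points = centers[labels] + 0.5 * torch.randn(n, 2)
    return points.unsqueeze(0), labels.float().unsqueeze(0)

model = MetaClusterer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for step in range(200):                        # short run, for illustration only
    x, y = synthetic_task()
    logits = model(x)
    # Cluster identities are only defined up to permutation: score both labelings.
    loss = torch.min(bce(logits, y), bce(logits, 1 - y))
    opt.zero_grad()
    loss.backward()
    opt.step()
```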

We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyper-parameters, distance thresholds, or a pre-specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging between 1077 and 8.1 million samples, shows substantial performance gains when compared with existing clustering techniques.
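
A compact sketch of the first-neighbor principle: link every sample to its nearest neighbor and read the clusters off as connected components of that graph (one agglomeration pass). The brute-force $O(n^2)$ distance matrix below is for illustration only; the paper's method is far more scalable.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def first_neighbor_clustering(X):
    """Link every sample to its first (nearest) neighbor and take connected
    components of the resulting graph as clusters, with no hyper-parameters,
    thresholds, or a pre-specified number of clusters."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                       # index of each sample's first neighbor
    adj = csr_matrix((np.ones(n), (np.arange(n), nn)), shape=(n, n))
    adj = adj + adj.T                           # symmetrize: i~j if either is the other's 1-NN
    _, labels = connected_components(adj, directed=False)
    return labels
```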

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
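
The sketch below illustrates the surrogate on fixed per-sample losses: the worst-case reweighted average over a chi-square ball around the uniform empirical weights, computed here with a generic constrained solver, is compared against the familiar mean-plus-standard-deviation approximation. The radius $\rho$ and the solver choice are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def robust_risk(losses, rho=1.0):
    """Worst-case reweighted average of per-sample losses over a chi-square ball of
    radius rho/n around the uniform weights, computed with a generic constrained
    solver. This inner supremum is the convex surrogate: it is approximately
    mean(losses) + sqrt(2 * rho * var(losses) / n)."""
    n = len(losses)
    u = np.full(n, 1.0 / n)
    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
            {"type": "ineq", "fun": lambda p: rho / n - 0.5 * n * ((p - u) ** 2).sum()}]
    res = minimize(lambda p: -(p @ losses), u, bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP", options={"maxiter": 500})
    return -res.fun

# Compare the exact robust risk with the mean + standard-deviation surrogate.
losses = np.random.default_rng(0).exponential(size=100)
approx = losses.mean() + np.sqrt(2.0 * losses.var() / len(losses))
print(robust_risk(losses), approx)
```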
