{"id":364,"date":"2013-12-06T15:10:19","date_gmt":"2013-12-06T20:10:19","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=364"},"modified":"2013-12-06T15:10:19","modified_gmt":"2013-12-06T20:10:19","slug":"k-means-clustering","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/12\/06\/k-means-clustering\/","title":{"rendered":"K-Means Clustering"},"content":{"rendered":"<p align=\"justify\"><span style=\"font-family: Arial, Helvetica, sans-serif; font-size: xx-small;\"><em>The Algorithm<\/em><\/span><span style=\"font-family: 'Times New Roman', Times, serif;\"><br \/>\nK-means (<a href=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/kmeans.html#macqueen\">MacQueen, 1967<\/a>) is one of the simplest unsupervised learning algorithms for the well-known clustering problem. The procedure partitions a given data set into a number of clusters (say, k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; a good choice is to place them as far away from each other as possible. The next step is to take each point of the data set and associate it with the nearest centroid. When every point has been assigned, the first step is complete and an initial grouping has been made. At this point we recalculate k new centroids as the barycenters of the clusters resulting from the previous step, and then reassign each data point to its nearest new centroid. A loop has been generated. As a result of this loop, the k centroids change their location step by step until no more changes occur. 
In other words, the centroids no longer move.<br \/>\nFinally, this algorithm aims at minimizing an\u00a0<em>objective function<\/em>, in this case a squared error function. The objective function<\/span><\/p>\n<p align=\"center\"><img decoding=\"async\" alt=\"J = sum over j=1..k, sum over i=1..n of || x_i^(j) - c_j ||^2\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image009.gif\" align=\"absmiddle\" \/>\u00a0,<\/p>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">where\u00a0<img decoding=\"async\" alt=\"|| x_i^(j) - c_j ||^2\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image011.gif\" align=\"absmiddle\" \/>\u00a0is a chosen distance measure between a data point\u00a0<img decoding=\"async\" alt=\"x_i^(j)\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image013.gif\" align=\"absmiddle\" \/>\u00a0and the cluster centre\u00a0<img decoding=\"async\" alt=\"c_j\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image015.gif\" align=\"absmiddle\" \/>; the objective is thus an indicator of the distance of the\u00a0<em>n<\/em>\u00a0data points from their respective cluster centres.<\/span><\/p>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">The algorithm is composed of the following steps:<\/span><\/p>\n<p>&nbsp;<\/p>\n<table width=\"75%\" border=\"1\" align=\"center\">\n<tbody>\n<tr>\n<td>\n<ol>\n<li><em><span style=\"font-family: 'Times New Roman', Times, serif;\">Place K points into the space represented by the objects that are being clustered. 
These points represent initial group centroids.<br \/>\n<\/span><\/em><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\"><em>Assign each object to the group that has the closest centroid.<br \/>\n<\/em><\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\"><em>When all objects have been assigned, recalculate the positions of the K centroids.<br \/>\n<\/em><\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\"><em>Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.<\/em><\/span><\/li>\n<\/ol>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Although it can be proved that the procedure will always terminate, the k-means algorithm does not necessarily find the optimal configuration corresponding to the global minimum of the objective function. The algorithm is also highly sensitive to the initial randomly selected cluster centres; it can be run multiple times with different starting points to reduce this effect.<\/span><\/p>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">K<\/span><span style=\"font-family: 'Times New Roman', Times, serif;\">-means is a simple algorithm that has been adapted to many problem domains. 
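The four numbered steps above can be sketched in a few lines of Python. This is a minimal illustration added for concreteness, not code from the original post; it initializes the centroids by randomly choosing k of the samples (one popular starting strategy, discussed in the remarks below), and keeps an old centroid in place if its cluster becomes empty.

```python
import random

def kmeans(points, k, max_iters=100):
    """Cluster a list of numeric tuples into k groups by plain k-means."""
    # Step 1: place K points into the space; here, k distinct random samples.
    centroids = random.sample(points, k)
    for _ in range(max_iters):
        # Step 2: assign each object to the group with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Step 3: recalculate each centroid as the barycenter of its cluster.
        new_centroids = []
        for j in range(k):
            if clusters[j]:
                dim = len(clusters[j][0])
                new_centroids.append(tuple(
                    sum(p[d] for p in clusters[j]) / len(clusters[j])
                    for d in range(dim)))
            else:
                # Empty cluster: keep the old centroid (see remarks below).
                new_centroids.append(centroids[j])
        # Step 4: repeat until the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Because of the random initialization, different runs can converge to different partitions, which is why the text recommends running the algorithm several times.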
As we are going to see, it is a good candidate for extension to work with fuzzy feature vectors.<\/span><\/p>\n<p align=\"justify\">\n<p align=\"justify\"><span style=\"font-family: Arial, Helvetica, sans-serif; font-size: xx-small;\"><em>An example<\/em><\/span><br \/>\n<span style=\"font-family: 'Times New Roman', Times, serif;\">Suppose that we have n sample feature vectors\u00a0<strong>x<\/strong><sub>1<\/sub>,\u00a0<strong>x<\/strong><sub>2<\/sub>, &#8230;,\u00a0<strong>x<\/strong><sub>n<\/sub>\u00a0all from the same class, and we know that they fall into k compact clusters, k &lt; n. Let\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0be the mean of the vectors in cluster i. If the clusters are well separated, we can use a minimum-distance classifier to separate them. That is, we can say that\u00a0<strong>x<\/strong>\u00a0is in cluster i if ||\u00a0<strong>x<\/strong>\u00a0&#8211;\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0|| is the minimum of all the k distances. This suggests the following procedure for finding the k means:<\/span><\/p>\n<ul>\n<li>\n<div align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Make initial guesses for the means\u00a0<strong>m<\/strong><sub>1<\/sub>,\u00a0<strong>m<\/strong><sub>2<\/sub>, &#8230;,\u00a0<strong>m<\/strong><sub>k<\/sub><\/span><\/div>\n<\/li>\n<li>\n<div align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Until there are no changes in any mean<\/span><\/div>\n<ul>\n<li>\n<div align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Use the estimated means to classify the samples into clusters<\/span><\/div>\n<\/li>\n<\/ul>\n<ul>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">For i from 1 to k<\/span>\n<ul>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">Replace\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0with the mean of all of the samples for cluster i<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span 
style=\"font-family: 'Times New Roman', Times, serif;\">end_for<\/span><\/li>\n<\/ul>\n<\/li>\n<li>\n<div align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">end_until<\/span><\/div>\n<\/li>\n<\/ul>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Here is an example showing how the means\u00a0<strong>m<\/strong><sub>1<\/sub>\u00a0and\u00a0<strong>m<\/strong><sub>2<\/sub>\u00a0move into the centers of two clusters.<\/span><\/p>\n<p align=\"center\"><img decoding=\"async\" alt=\"\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image016.gif\" \/><\/p>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\"><strong>Remarks<\/strong><br \/>\nThis is a simple version of the k-means procedure. It can be viewed as a greedy algorithm for partitioning the n samples into k clusters so as to minimize the sum of the squared distances to the cluster centers. It does have some weaknesses:<\/span><\/p>\n<ul>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">The way to initialize the means was not specified. One popular way to start is to randomly choose k of the samples.<\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">The results produced depend on the initial values for the means, and it frequently happens that suboptimal partitions are found. The standard solution is to try a number of different starting points.<\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">It can happen that the set of samples closest to\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0is empty, so that\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0cannot be updated. 
This is an annoyance that must be handled in an implementation, but that we shall ignore.<\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">The results depend on the metric used to measure ||\u00a0<strong>x<\/strong>\u00a0&#8211;\u00a0<strong>m<\/strong><sub>i<\/sub>\u00a0||. A popular solution is to normalize each variable by its standard deviation, though this is not always desirable.<\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">The results depend on the value of k.<\/span><\/li>\n<\/ul>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">This last problem is particularly troublesome, since we often have no way of knowing how many clusters exist. In the example shown above, the same algorithm applied to the same data produces the following 3-means clustering. Is it better or worse than the 2-means clustering<\/span>?<\/p>\n<p align=\"center\"><img decoding=\"async\" alt=\"\" src=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/images\/image017.gif\" \/><\/p>\n<p align=\"justify\"><span style=\"font-family: 'Times New Roman', Times, serif;\">Unfortunately there is no general theoretical solution to find the optimal number of clusters for any given data set. 
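Such comparisons can be made concrete by evaluating the objective function J for each candidate clustering. The helper below is a small sketch added here, not part of the original tutorial; it assumes points and centroids are numeric tuples grouped as in the procedure above.

```python
def objective(centroids, clusters):
    """Sum of squared Euclidean distances from every point to its cluster centre."""
    total = 0.0
    for centre, members in zip(centroids, clusters):
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centre))
    return total
```

Because J necessarily shrinks as k grows, its raw value cannot select k on its own; criteria such as the Schwarz Criterion add a penalty for model complexity to offset this.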
A simple approach is to compare the results of multiple runs with different values of k and choose the best one according to a given criterion (for instance the Schwarz Criterion &#8211; see\u00a0<a href=\"http:\/\/home.deib.polimi.it\/matteucc\/Clustering\/tutorial_html\/kmeans.html#moore\">Moore&#8217;s slides<\/a>); we must be careful, though, because increasing k yields smaller error function values by definition, but also an increasing risk of overfitting.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p align=\"justify\"><span style=\"font-family: Arial, Helvetica, sans-serif; font-size: xx-small;\"><em>Bibliography<\/em><\/span><\/p>\n<div align=\"justify\">\n<ul>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\"><a name=\"macqueen\"><\/a>J. B. MacQueen (1967): &#8220;Some Methods for Classification and Analysis of Multivariate Observations&#8221;,\u00a0<em>Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability<\/em>, Berkeley, University of California Press, 1:281-297<\/span><\/li>\n<li><a name=\"moore\"><\/a><span style=\"font-family: 'Times New Roman', Times, serif;\">Andrew Moore: \u201cK-means and Hierarchical Clustering &#8211; Tutorial Slides\u201d<br \/>\n<a href=\"http:\/\/www-2.cs.cmu.edu\/~awm\/tutorials\/kmeans.html\">http:\/\/www-2.cs.cmu.edu\/~awm\/tutorials\/kmeans.html<\/a><\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">Brian T. 
Luke: \u201cK-Means Clustering\u201d<br \/>\n<a href=\"http:\/\/fconyx.ncifcrf.gov\/~lukeb\/kmeans.html\">http:\/\/fconyx.ncifcrf.gov\/~lukeb\/kmeans.html<\/a><\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">Tariq Rashid: \u201cClustering\u201d<br \/>\n<a href=\"http:\/\/www.cs.bris.ac.uk\/home\/tr1690\/documentation\/fuzzy_clustering_initial_report\/node11.html\">http:\/\/www.cs.bris.ac.uk\/home\/tr1690\/documentation\/fuzzy_clustering_initial_report\/node11.html<\/a><\/span><\/li>\n<li><span style=\"font-family: 'Times New Roman', Times, serif;\">Hans-Joachim Mucha and Hizir Sofyan: \u201cNonhierarchical Clustering\u201d<br \/>\n<a href=\"http:\/\/www.quantlet.com\/mdstat\/scripts\/xag\/html\/xaghtmlframe149.html\">http:\/\/www.quantlet.com\/mdstat\/scripts\/xag\/html\/xaghtmlframe149.html<\/a><\/span><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The Algorithm K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms for the well-known clustering problem. 
The procedure follows a simple&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[],"class_list":["post-364","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=364"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/364\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}