{"id":48,"date":"2013-03-06T15:33:26","date_gmt":"2013-03-06T20:33:26","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=48"},"modified":"2013-03-06T15:33:26","modified_gmt":"2013-03-06T20:33:26","slug":"48","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/03\/06\/48\/","title":{"rendered":"How to Make Bubble Charts"},"content":{"rendered":"<div>By\u00a0<strong>NATHAN YAU<\/strong><\/div>\n<div id=\"lead-in\">\n<p><img loading=\"lazy\" decoding=\"async\" alt=\"Crime Rates by State\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/5-edited-version1-425x285.png\" width=\"425\" height=\"285\" \/><\/p>\n<div>Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an affinity for proportional bubbles on an x-y axis. This tutorial is for the static version of the motion chart: the bubble chart.<\/div>\n<div id=\"tut-meta\">\n<div><a href=\"http:\/\/flowingdata.com\/2010\/11\/23\/how-to-make-bubble-charts\/5-edited-version-2\/\">Demo<\/a><\/div>\n<div><a href=\"http:\/\/flowingdata.com\/?s2member_file_download=2010\/bubbles-tutorial-code.zip\">Download Source<\/a><\/div>\n<div><\/div>\n<\/div>\n<div><\/div>\n<\/div>\n<div>\n<p>A bubble chart can also just be straight up proportionally sized bubbles, but here we&#8217;re going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension.<\/p>\n<p>The advantage of this chart type is that it lets you compare three variables at once. One is on the x-axis, one is on the y-axis, and the third is represented by area size of bubbles. Have a look at\u00a0<a href=\"http:\/\/flowingdata.com\/2010\/11\/23\/how-to-make-bubble-charts\/5-edited-version-2\/\">the final chart<\/a>\u00a0to see what we&#8217;re making.<\/p>\n<h2>Step 0. 
Download R<\/h2>\n<p>We&#8217;re going to use R to do this, so\u00a0<a href=\"http:\/\/www.r-project.org\/\">download that<\/a>\u00a0before moving on. It&#8217;s free and open-source, so you have nothing to lose. Plus it&#8217;s a\u00a0<a href=\"http:\/\/flowingdata.com\/2010\/11\/17\/r-is-the-need-to-know-stat-software\/\">need-to-know-name of 2011<\/a>, so you might as well get to know it now. You can thank me later.<\/p>\n<h2>Step 1. Load the data<\/h2>\n<p>Assuming you already have R open, the first thing we&#8217;ll do is load the data. We&#8217;re examining the same crime data that we used for our last tutorial. I&#8217;ve added state population this time around. One note about the data: the crime numbers are actually for 2005, while the populations are for 2008. This isn&#8217;t a huge deal since we&#8217;re more interested in relative populations than in the raw values, but keep that in mind.<\/p>\n<p>Okay, moving on. You can download the tab-delimited file\u00a0<a href=\"http:\/\/datasets.flowingdata.com\/crimeRatesByState2005.tsv\">here<\/a>\u00a0and keep it local, but the easiest way is to load it directly into R with the line of code below:<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_303461\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>crime &lt;-\u00a0<\/code><code>read.csv<\/code><code>(<\/code><code>\"<a href=\"http:\/\/datasets.flowingdata.com\/crimeRatesByState2005.tsv\">http:\/\/datasets.flowingdata.com\/crimeRatesByState2005.tsv<\/a>\"<\/code><code>, header=<\/code><code>TRUE<\/code><code>, sep=<\/code><code>\"\\t\"<\/code><code>)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>You&#8217;re telling R to download the data and read it as a tab-delimited file with a header. This loads it as a data frame in the\u00a0<code>crime<\/code>\u00a0variable.<\/p>\n<h2>Step 2. 
Draw some circles<\/h2>\n<p>Now we can get right to drawing circles with the\u00a0<code>symbols()<\/code>\u00a0command. Pass it values for the x-axis, y-axis, and circles, and it&#8217;ll spit out a bubble chart for you.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_910864\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>symbols<\/code><code>(crime$murder, crime$burglary, circles=crime$population)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Run the line of code above, and you&#8217;ll get this:<\/p>\n<p>Circles incorrectly sized by radius instead of area. Large values appear much bigger.<img loading=\"lazy\" decoding=\"async\" title=\"1-wrong-sized-circles\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/1-wrong-sized-circles-575x519.png\" width=\"575\" height=\"519\" \/><\/p>\n<p>All done, right? Wrong. That was a test. The above sizes the radius of the circles by population. We want to size them by\u00a0<em>area<\/em>. The relative proportions are all out of whack if you size by radius.<\/p>\n<h2>Step 3. Size the circles correctly<\/h2>\n<p>To size the radii correctly, we look to the equation for the area of a circle:<\/p>\n<p>Area of circle = \u03c0r<sup>2<\/sup><\/p>\n<p>In this case, the area of the circle is the population. We want to know\u00a0<em>r<\/em>. 
Move some things around and we get this:<\/p>\n<p>r = \u221a(Area of circle \/ \u03c0)<\/p>\n<p>Substitute population for the area of the circle, and translate to R, and we get this:<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_609213\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>radius &lt;-\u00a0<\/code><code>sqrt<\/code><code>( crime$population\/\u00a0<\/code><code>pi<\/code>\u00a0<code>)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>symbols<\/code><code>(crime$murder, crime$burglary, circles=radius)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Circles correctly sized by area, but the range of sizes is too much. The chart is cluttered and unreadable.<img loading=\"lazy\" decoding=\"async\" title=\"2-correctsize-too-big\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/2-correctsize-too-big-575x530.png\" width=\"575\" height=\"530\" \/><\/p>\n<p>Yay. Properly scaled circles. They&#8217;re way too big, though, for this chart to be useful. By default,\u00a0<code>symbols()<\/code>\u00a0sizes the largest bubble to one inch, and then scales the rest accordingly. We can change that by using the\u00a0<code>inches<\/code>\u00a0argument. Whatever value you pass will take the place of the one-inch default. 
While we&#8217;re at it, let&#8217;s add color and change the x- and y-axis labels.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_29523\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>symbols<\/code><code>(crime$murder, crime$burglary, circles=radius, inches=0.35, fg=<\/code><code>\"white\"<\/code><code>, bg=<\/code><code>\"red\"<\/code><code>, xlab=<\/code><code>\"Murder Rate\"<\/code><code>, ylab=<\/code><code>\"Burglary Rate\"<\/code><code>)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Notice we use\u00a0<code>fg<\/code>\u00a0to change the border color and\u00a0<code>bg<\/code>\u00a0to change the fill color. Here&#8217;s what we get:<\/p>\n<p>Scale the circles to make the chart more readable, and use the\u00a0<code>fg<\/code>\u00a0and\u00a0<code>bg<\/code>\u00a0arguments to change colors.<img loading=\"lazy\" decoding=\"async\" title=\"3-sized-circles-by-area\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/3-sized-circles-by-area-575x530.png\" width=\"575\" height=\"530\" \/><\/p>\n<p>Now we&#8217;re getting somewhere.<\/p>\n<p>By the way, you can make a chart with other shapes too with\u00a0<code>symbols()<\/code>. You can make squares, rectangles, thermometers, boxplots, and stars. They take different arguments than the circle. The squares, for example, are sized by the length of a side. 
Again, make sure you size them appropriately.<\/p>\n<p>Here&#8217;s what squares look like, using the line of code below.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_136370\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>symbols<\/code><code>(crime$murder, crime$burglary, squares=<\/code><code>sqrt<\/code><code>(crime$population), inches=0.5)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>You can use squares sized by area instead of circles, too.<img loading=\"lazy\" decoding=\"async\" title=\"crime-squares-no-labels\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/crime-squares-no-labels-575x457.png\" width=\"575\" height=\"457\" \/><\/p>\n<p>Let&#8217;s stick with circles for now.<\/p>\n<h2>Step 4. Add labels<\/h2>\n<p>As it is, the chart shows some sense of distribution, but we don&#8217;t know which circle represents each state. So let&#8217;s add labels. We do this with\u00a0<code>text()<\/code>, whose arguments are x-coordinates, y-coordinates, and the actual text to print. We have all of these. Like the bubbles, the\u00a0<em>x<\/em>\u00a0is murders and the\u00a0<em>y<\/em>\u00a0is burglaries. The actual labels are state names, which are in the first column of our data frame.<\/p>\n<p>With that in mind, we do this:<\/p>\n<p>&nbsp;<\/p>\n<div id=\"highlighter_717590\">\n<div>\n<div>\n<table>\n<tbody>\n<tr>\n<td><code>text<\/code><code>(crime$murder, crime$burglary, crime$state, cex=0.5)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The\u00a0<code>cex<\/code>\u00a0argument controls text size. It is 1 by default. Values greater than 1 make the labels bigger, and values less than 1 make them smaller. 
The labels will center on the x- and y-coordinates.<\/p>\n<p>Here&#8217;s what it looks like.<\/p>\n<p>Add labels so you know what each circle represents.<img loading=\"lazy\" decoding=\"async\" title=\"4-added-labels\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/4-added-labels-575x521.png\" width=\"575\" height=\"521\" \/><\/p>\n<h2>Step 5. Clean up<\/h2>\n<p>Finally, as per usual, I clean up in Adobe Illustrator. You can mess around with this in R, if you like, but I&#8217;ve found it&#8217;s way easier to save my file as a PDF and do what I want with Illustrator. I uncluttered the state labels to make them more readable, rotated the y-axis labels so that they&#8217;re horizontal, added a legend for population, and removed the outside border. I also brought Georgia to the front, because most of it was hidden by Texas.<\/p>\n<p>Here&#8217;s the\u00a0<a href=\"http:\/\/flowingdata.com\/2010\/11\/23\/how-to-make-bubble-charts\/5-edited-version-2\/\">final version<\/a>. Click the image to see it in full.<\/p>\n<p>Cleanup and a key make the chart more informative.<a href=\"http:\/\/flowingdata.com\/2010\/11\/23\/how-to-make-bubble-charts\/5-edited-version-2\/\" rel=\"attachment wp-att-12941\"><img loading=\"lazy\" decoding=\"async\" title=\"5-edited-version\" alt=\"\" src=\"http:\/\/flowingdata.com\/wp-content\/uploads\/2010\/11\/5-edited-version1-575x385.png\" width=\"575\" height=\"385\" \/><\/a><\/p>\n<p>And there you go. Type\u00a0<code>?symbols<\/code>\u00a0in R for more plotting options. 
Go wild.<\/p>\n<p>For more examples, guidance, and all-around data goodness like this,\u00a0<a href=\"http:\/\/book.flowingdata.com\/\">buy Visualize This<\/a>, the new FlowingData book.<\/p>\n<div><\/div>\n<\/div>\n<div><\/div>\n<div id=\"end\"><\/div>\n<div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>By\u00a0NATHAN YAU Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-48","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/48","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=48"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/48\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=48"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=48"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=48"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}