{"id":762,"date":"2015-02-17T18:14:23","date_gmt":"2015-02-18T01:14:23","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=762"},"modified":"2015-02-17T18:14:23","modified_gmt":"2015-02-18T01:14:23","slug":"10-r-packages-which-will-master-you-in-r-tool","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2015\/02\/17\/10-r-packages-which-will-master-you-in-r-tool\/","title":{"rendered":"10 R packages which will master you in R Tool"},"content":{"rendered":"<p>R can be more prickly and obscure than other languages like Python or Java. The good news is that there are tons of packages which provide simple and familiar interfaces on top of Base R. This post is about ten packages I love, use every day, and wish I had known about earlier.<\/p>\n<ol>\n<li>\n<pre>install.packages(\"sqldf\")<\/pre>\n<p>One of the steepest parts of the R learning curve is the syntax. It took me a while to get over using\u00a0<code>&lt;-<\/code>\u00a0instead of\u00a0<code>=<\/code>. I often hear people ask\u00a0<i>How do I just do a<\/i>\u00a0<code>VLOOKUP<\/code><i>?!?<\/i>\u00a0R is great for general data munging tasks, but it takes a while to master. I think it\u2019s safe to say that\u00a0<code>sqldf<\/code>\u00a0was my R \u201ctraining wheels\u201d.<\/p>\n<p><code>sqldf<\/code>\u00a0lets you perform SQL queries on your R data frames. 
People coming over from SAS will find it very familiar and anyone with basic SQL skills will have no trouble using it, because\u00a0<code>sqldf<\/code>\u00a0uses\u00a0<a href=\"http:\/\/www.sqlite.org\/lang.html\" target=\"_nofollow\" rel=\"noopener\">SQLite syntax.<\/a><\/p>\n<div id=\"gist4750570\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div id=\"file-sqldf_examples-r-LC1\">\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-sqldf_examples-r-LC1\"><em>library(sqldf)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC2\"><\/div>\n<div id=\"file-sqldf_examples-r-LC3\"><em>sqldf(\"SELECT<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC4\"><em>day<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC5\"><em>, avg(temp) as avg_temp<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC6\"><em>FROM beaver2<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC7\"><em>GROUP BY<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC8\"><em>day;\")<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC9\"><\/div>\n<div id=\"file-sqldf_examples-r-LC10\"><em># day avg_temp<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC11\"><em>#1 307 37.57931<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC12\"><em>#2 308 37.71308<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC13\"><\/div>\n<div id=\"file-sqldf_examples-r-LC14\"><em># beaver1 and beaver2 come with base R<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC15\"><em>beavers &lt;- sqldf(\"select * from beaver1<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC16\"><em>union all<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC17\"><em>select * from beaver2;\")<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC18\"><em>#head(beavers)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC19\"><em># day time temp activ<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC20\"><em>#1 346 840 36.33 0<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC21\"><em>#2 346 850 36.34 0<\/em><\/div>\n<div 
id=\"file-sqldf_examples-r-LC22\"><em>#3 346 900 36.35 0<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC23\"><em>#4 346 910 36.42 0<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC24\"><em>#5 346 920 36.55 0<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC25\"><em>#6 346 930 36.69 0<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC26\"><\/div>\n<div id=\"file-sqldf_examples-r-LC27\"><em>movies &lt;- data.frame(<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC28\"><em>title=c(\"The Great Outdoors\", \"Caddyshack\", \"Fletch\", \"Days of Thunder\", \"Crazy Heart\"),<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC29\"><em>year=c(1988, 1980, 1985, 1990, 2009)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC30\"><em>)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC31\"><em>boxoffice &lt;- data.frame(<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC32\"><em>title=c(\"The Great Outdoors\", \"Caddyshack\", \"Fletch\", \"Days of Thunder\", \"Top Gun\"),<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC33\"><em>revenue=c(43455230, 39846344, 59600000, 157920733, 353816701)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC34\"><em>)<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC35\"><\/div>\n<div id=\"file-sqldf_examples-r-LC36\"><em>sqldf(\"SELECT<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC37\"><em>m.*<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC38\"><em>, b.revenue<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC39\"><em>FROM<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC40\"><em>movies m<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC41\"><em>INNER JOIN<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC42\"><em>boxoffice b<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC43\"><em>ON m.title = b.title;\")<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC44\"><\/div>\n<div id=\"file-sqldf_examples-r-LC45\"><em># title year revenue<\/em><\/div>\n<div 
id=\"file-sqldf_examples-r-LC46\"><em>#1 The Great Outdoors 1988 43455230<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC47\"><em>#2 Caddyshack 1980 39846344<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC48\"><em>#3 Fletch 1985 59600000<\/em><\/div>\n<div id=\"file-sqldf_examples-r-LC49\"><em>#4 Days of Thunder 1990 157920733<\/em><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><\/div>\n<div><\/div>\n<\/div>\n<\/li>\n<li>\n<pre>install.packages(\"forecast\")<\/pre>\n<p>I don\u2019t do time series analysis very often, but when I do\u00a0<code>forecast<\/code>\u00a0is my library of choice.\u00a0<code>forecast<\/code>\u00a0makes it incredibly easy to fit time series models like ARIMA, ARMA, AR, Exponential Smoothing, etc.<\/p>\n<div id=\"gist4750929\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div id=\"file-forecast_example-r-LC1\">\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-forecast_example-r-LC1\">library(forecast)<\/div>\n<div id=\"file-forecast_example-r-LC2\"><\/div>\n<div id=\"file-forecast_example-r-LC3\"># mdeaths: Monthly Deaths from Lung Diseases in the UK<\/div>\n<div id=\"file-forecast_example-r-LC4\">fit &lt;- auto.arima(mdeaths)<\/div>\n<div id=\"file-forecast_example-r-LC5\">#customize your confidence intervals<\/div>\n<div id=\"file-forecast_example-r-LC6\">forecast(fit, level=c(80, 95, 99), h=3)<\/div>\n<div id=\"file-forecast_example-r-LC7\"># Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 Lo 99 Hi 99<\/div>\n<div id=\"file-forecast_example-r-LC8\">#Jan 1980 1822.863 1564.192 2081.534 1427.259 2218.467 1302.952 2342.774<\/div>\n<div id=\"file-forecast_example-r-LC9\">#Feb 1980 1923.190 1635.530 2210.851 1483.251 2363.130 1345.012 2501.368<\/div>\n<div id=\"file-forecast_example-r-LC10\">#Mar 1980 1789.153 1495.048 2083.258 1339.359 2238.947 1198.023 2380.283<\/div>\n<div id=\"file-forecast_example-r-LC11\"><\/div>\n<div 
id=\"file-forecast_example-r-LC12\">plot(forecast(fit), shadecols=\"oldstyle\")<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4750929\/raw\/d2492ba0d64e125e3b0b54ee69e45bedaeda80ef\/forecast_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<p>My favorite feature is the resulting forecast plot.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft\" title=\"10 R packages which will master you in R Tool\" src=\"http:\/\/blog.yhathq.com\/static\/img\/forecast.png\" alt=\"forecast 10 R packages which will master you in R Tool\" width=\"480\" height=\"300\" \/><\/li>\n<\/ol>\n<h5><\/h5>\n<h5>3.install.packages(\"plyr\")<\/h5>\n<p>When I first started using R, I was using basic control operations for manipulating data (for, if, while, etc.). I quickly learned that this was an amateur move, and that there was a better way to do it.<\/p>\n<p>In R, the\u00a0<a title=\"A brief introduction to apply functions in R\" href=\"http:\/\/nsaunders.wordpress.com\/2010\/08\/20\/a-brief-introduction-to-apply-in-r\/#more-2058\"><code>apply<\/code>\u00a0family of functions<\/a>\u00a0is the preferred way to call a function on each element of a list or vector. While Base R has this out of the box, its usage can be tricky to master. 
I\u2019ve found the\u00a0<a title=\"Plyr Package for R\" href=\"http:\/\/cran.r-project.org\/web\/packages\/plyr\/index.html\"><code>plyr<\/code>\u00a0package<\/a>\u00a0to be an easy-to-use substitute for\u00a0<code>split<\/code>,\u00a0<code>apply<\/code>,\u00a0<code>combine<\/code>\u00a0functionality in Base R.<\/p>\n<p><code>plyr<\/code>\u00a0gives you several functions (<code>ddply<\/code>,\u00a0<code>daply<\/code>,\u00a0<code>dlply<\/code>,\u00a0<code>adply<\/code>,\u00a0<code>ldply<\/code>) following a common blueprint: split a data structure into groups, apply a function to each group, and return the results in a data structure.<\/p>\n<p><code>ddply<\/code>\u00a0splits a data frame and returns a data frame (hence the dd).\u00a0<code>daply<\/code>\u00a0splits a data frame and returns an array (hence the da). Hopefully you\u2019re getting the idea here.<\/p>\n<div id=\"gist4750641\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-plyr_examples-r-LC1\">library(plyr)<\/div>\n<div id=\"file-plyr_examples-r-LC2\"><\/div>\n<div id=\"file-plyr_examples-r-LC3\"># split a data frame by Species, summarize it, then convert the results<\/div>\n<div id=\"file-plyr_examples-r-LC4\"># into a data frame<\/div>\n<div id=\"file-plyr_examples-r-LC5\">ddply(iris, .(Species), summarise,<\/div>\n<div id=\"file-plyr_examples-r-LC6\">mean_petal_length=mean(Petal.Length)<\/div>\n<div id=\"file-plyr_examples-r-LC7\">)<\/div>\n<div id=\"file-plyr_examples-r-LC8\"># Species mean_petal_length<\/div>\n<div id=\"file-plyr_examples-r-LC9\">#1 setosa 1.462<\/div>\n<div id=\"file-plyr_examples-r-LC10\">#2 versicolor 4.260<\/div>\n<div id=\"file-plyr_examples-r-LC11\">#3 virginica 5.552<\/div>\n<div id=\"file-plyr_examples-r-LC12\"><\/div>\n<div id=\"file-plyr_examples-r-LC13\"># split a data frame by Species, summarize it, then convert the results<\/div>\n<div id=\"file-plyr_examples-r-LC14\"># into 
an array<\/div>\n<div id=\"file-plyr_examples-r-LC15\">unlist(daply(iris[,4:5], .(Species), colwise(mean)))<\/div>\n<div id=\"file-plyr_examples-r-LC16\"># setosa.Petal.Width versicolor.Petal.Width virginica.Petal.Width<\/div>\n<div id=\"file-plyr_examples-r-LC17\"># 0.246 1.326 2.026<\/div>\n<\/div>\n<\/div>\n<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<pre class=\"\">4.install.packages(\"stringr\")<\/pre>\n<p>I find base R\u2019s string functionality to be extremely difficult and cumbersome to use. Another package written by\u00a0<a title=\"Hadley Wickham blog\" href=\"http:\/\/had.co.nz\/\">Hadley Wickham<\/a>,\u00a0<code>stringr<\/code>, provides some much-needed string operators in R. Many of the functions use data structures that aren\u2019t commonly used when doing basic analysis.<\/p>\n<p><img decoding=\"async\" title=\"10 R packages which will master you in R Tool\" src=\"http:\/\/blog.yhathq.com\/static\/img\/stringr.png\" alt=\"stringr 10 R packages which will master you in R Tool\" \/><br \/>\n<code>stringr<\/code>\u00a0is remarkably easy to use. 
Nearly all of the functions (and all of the important ones) are prefixed with \u201cstr\u201d so they\u2019re very easy to remember.<\/p>\n<div id=\"gist4750670\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div id=\"file-stringr_example-r-LC1\">\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-stringr_example-r-LC1\">library(stringr)<\/div>\n<div id=\"file-stringr_example-r-LC2\"><\/div>\n<div id=\"file-stringr_example-r-LC3\">names(iris)<\/div>\n<div id=\"file-stringr_example-r-LC4\">#[1] \"Sepal.Length\" \"Sepal.Width\" \"Petal.Length\" \"Petal.Width\" \"Species\"<\/div>\n<div id=\"file-stringr_example-r-LC5\">names(iris) &lt;- str_replace_all(names(iris), \"[.]\", \"_\")<\/div>\n<div id=\"file-stringr_example-r-LC6\">names(iris)<\/div>\n<div id=\"file-stringr_example-r-LC7\">#[1] \"Sepal_Length\" \"Sepal_Width\" \"Petal_Length\" \"Petal_Width\" \"Species\"<\/div>\n<div id=\"file-stringr_example-r-LC8\"><\/div>\n<div id=\"file-stringr_example-r-LC9\">s &lt;- c(\"Go to Heaven for the climate, Hell for the company.\")<\/div>\n<div id=\"file-stringr_example-r-LC10\">str_extract_all(s, \"[H][a-z]+ \")<\/div>\n<div id=\"file-stringr_example-r-LC11\">#[[1]]<\/div>\n<div id=\"file-stringr_example-r-LC12\">#[1] \"Heaven \" \"Hell \"<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4750670\/raw\/fbc3df4d2725c0bba67a391a038add9702cc8722\/stringr_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<h5 id=\"dbdriver\">5.<a href=\"http:\/\/blog.yhathq.com\/posts\/10-R-packages-I-wish-I-knew-about-earlier.html#dbdriver\" target=\"_nofollow\" rel=\"noopener\">The database driver package of your 
choice<\/a><\/h5>\n<pre>install.packages(\"RPostgreSQL\")\ninstall.packages(\"RMySQL\")\ninstall.packages(\"RMongo\")\ninstall.packages(\"RODBC\")\ninstall.packages(\"RSQLite\")<\/pre>\n<p>Everyone does it when they first start (myself included). You\u2019ve just written an awesome query in your preferred SQL editor. Everything is perfect \u2013 the column names are all\u00a0<a title=\"Snake Case Wikipedia Page\" href=\"http:\/\/en.wikipedia.org\/wiki\/Snake_case\">snake case<\/a>, the dates have the right datatype, you finally debugged the\u00a0<code>\"must appear in the GROUP BY clause or be used in an aggregate function\"<\/code>\u00a0issue. You\u2019re ready to do some analysis in R, so you run the query in your SQL editor, copy the results to a CSV (or\u2026God forbid\u2026 .xlsx) and read it into R.\u00a0<strong>You don\u2019t have to do this!<\/strong><\/p>\n<p>R has great drivers for nearly every conceivable database. On the off chance you\u2019re using a database which doesn\u2019t have a standalone driver (SQL Server), you can always use\u00a0<a title=\"RODBC R SQL Driver\" href=\"http:\/\/cran.r-project.org\/web\/packages\/RODBC\/index.html\"><code>RODBC<\/code><\/a>.<\/p>\n<div id=\"gist4750770\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-dbdriver_example-r-LC1\">library(RPostgreSQL)<\/div>\n<div id=\"file-dbdriver_example-r-LC2\"><\/div>\n<div id=\"file-dbdriver_example-r-LC3\">drv &lt;- dbDriver(\"PostgreSQL\")<\/div>\n<div id=\"file-dbdriver_example-r-LC4\">db &lt;- dbConnect(drv, dbname=\"ncaa\",<\/div>\n<div id=\"file-dbdriver_example-r-LC5\">user=\"YOUR USER NAME\", password=\"YOUR PASSWORD\")<\/div>\n<div id=\"file-dbdriver_example-r-LC6\"><\/div>\n<div id=\"file-dbdriver_example-r-LC7\">q &lt;- \"SELECT<\/div>\n<div id=\"file-dbdriver_example-r-LC8\">*<\/div>\n<div 
id=\"file-dbdriver_example-r-LC9\">FROM<\/div>\n<div id=\"file-dbdriver_example-r-LC10\">game_scores;\u201d<\/div>\n<div id=\"file-dbdriver_example-r-LC11\"><\/div>\n<div id=\"file-dbdriver_example-r-LC12\">data &lt;- dbGetQuery(db, q)<\/div>\n<div id=\"file-dbdriver_example-r-LC13\">head(data)<\/div>\n<div id=\"file-dbdriver_example-r-LC14\">#id school game_date spread school_score opponent opp_score was_home<\/div>\n<div id=\"file-dbdriver_example-r-LC15\">#1 45111 Boston College 1985-11-16 6.0 21 Syracuse 41 False<\/div>\n<div id=\"file-dbdriver_example-r-LC16\">#2 45112 Boston College 1985-11-02 13.5 12 Penn State 16 False<\/div>\n<div id=\"file-dbdriver_example-r-LC17\">#3 45113 Boston College 1985-10-26 -11.0 17 Cincinnati 24 False<\/div>\n<div id=\"file-dbdriver_example-r-LC18\">#4 45114 Boston College 1985-10-12 -2.0 14 Army 45 False<\/div>\n<div id=\"file-dbdriver_example-r-LC19\">#5 45115 Boston College 1985-09-28 5.0 10 Miami 45 True<\/div>\n<div id=\"file-dbdriver_example-r-LC20\">#6 45116 Boston College 1985-09-21 6.5 29 Pittsburgh 22 False<\/div>\n<div id=\"file-dbdriver_example-r-LC21\">nrow(data)<\/div>\n<div id=\"file-dbdriver_example-r-LC22\">#[1] 30932<\/div>\n<div id=\"file-dbdriver_example-r-LC23\">ncol(data)<\/div>\n<div id=\"file-dbdriver_example-r-LC24\">#[1] 8<\/div>\n<\/div>\n<\/div>\n<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4750770\/raw\/d41663533a5fcff430c65cd812bcd88fb57855d9\/dbdriver_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<p>Next time you\u2019ve got that perfect query written, just paste it into R and execute it using\u00a0<a title=\"PostgreSQL Driver for R\" href=\"http:\/\/cran.r-project.org\/web\/packages\/RPostgreSQL\/index.html\"><code>RPostgreSQL<\/code><\/a>,\u00a0<a title=\"mySQL Driver for R\" href=\"http:\/\/cran.r-project.org\/web\/packages\/RMySQL\/index.html\"><code>RMySQL<\/code><\/a>,\u00a0<a title=\"MongoDB Driver for R\" 
href=\"http:\/\/cran.r-project.org\/web\/packages\/RMongo\/index.html\"><code>RMongo<\/code><\/a>,\u00a0<a title=\"SQLite Driver for R\" href=\"http:\/\/cran.r-project.org\/web\/packages\/RSQLite\/index.html\"><code>RSQLite<\/code><\/a>, or\u00a0<code>RODBC<\/code>. In addition to sparing you from having hundreds of CSV files sitting around, running the query in R saves you time both in I\/O and in converting datatypes. Dates, times, and datetimes will be automatically set to their R equivalent. It also makes your R script reproducible, so you or someone else on your team can easily produce the same results.<\/p>\n<h5><a href=\"http:\/\/cran.r-project.org\/web\/packages\/lubridate\/lubridate.pdf\" target=\"_nofollow\" rel=\"noopener\">\u00a0<\/a><\/h5>\n<pre>6.install.packages(\"lubridate\")<\/pre>\n<p>I\u2019ve never had great luck with dates in R. I\u2019ve never fully grasped the idiosyncrasies of working with\u00a0<a title=\"POSIX Wikipedia Page\" href=\"http:\/\/en.wikipedia.org\/wiki\/POSIX\">POSIXs<\/a>\u00a0vs. R Dates. Enter\u00a0<a title=\"Lubridate Github Repository\" href=\"https:\/\/github.com\/hadley\/lubridate\"><code>lubridate<\/code><\/a>.<\/p>\n<p><code>lubridate<\/code>\u00a0is one of those magical libraries that just seems to do\u00a0<em>exactly<\/em>\u00a0what you expect it to. The functions all have obvious names like\u00a0<code>year<\/code>,\u00a0<code>month<\/code>,\u00a0<code>ymd<\/code>, and\u00a0<code>ymd_hms<\/code>. 
It\u2019s similar to\u00a0<a title=\"\" href=\"http:\/\/momentjs.com\/\"><code>Moment.js<\/code><\/a>\u00a0for those familiar with JavaScript.<\/p>\n<div id=\"gist4750846\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div id=\"file-lubridate_example-r-LC1\">\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-lubridate_example-r-LC1\">library(lubridate)<\/div>\n<div id=\"file-lubridate_example-r-LC2\"><\/div>\n<div id=\"file-lubridate_example-r-LC3\">year(\"2012-12-12\")<\/div>\n<div id=\"file-lubridate_example-r-LC4\">#[1] 2012<\/div>\n<div id=\"file-lubridate_example-r-LC5\">day(\"2012-12-12\")<\/div>\n<div id=\"file-lubridate_example-r-LC6\">#[1] 12<\/div>\n<div id=\"file-lubridate_example-r-LC7\">ymd(\"2012-12-12\")<\/div>\n<div id=\"file-lubridate_example-r-LC8\">#1 parsed with %Y-%m-%d<\/div>\n<div id=\"file-lubridate_example-r-LC9\">#[1] \"2012-12-12 UTC\"<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4750846\/raw\/fa910f237b7a95417e37ef720e49be547d90dcd5\/lubridate_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<p>Here\u2019s a really handy reference card that I found in a\u00a0<a href=\"http:\/\/www.jstatsoft.org\/v40\/i03\/paper\">paper<\/a>. It covers just about everything you might conceivably want to do to a date. I\u2019ve also found this\u00a0<a href=\"http:\/\/blog.yhathq.com\/static\/pdf\/R_date_cheat_sheet.pdf\">Date Cheat Sheet<\/a>\u00a0to be a handy reference.<\/p>\n<h5><a href=\"http:\/\/docs.ggplot2.org\/current\/\" target=\"_nofollow\" rel=\"noopener\">\u00a0<\/a><\/h5>\n<pre>7.install.packages(\"ggplot2\")<\/pre>\n<p>Another Hadley Wickham package and probably his most widely known one.\u00a0<code>ggplot2<\/code>\u00a0ranks high on everyone\u2019s list of favorite R packages. It\u2019s easy to use and it produces some great looking plots. 
It\u2019s a great way to present your work, and there are many resources available to help you get started.<\/p>\n<ul>\n<li><a title=\"ggplot2: Elegant Graphics for Data Analysis\" href=\"http:\/\/amzn.to\/WbYXus\">Elegant Graphics for Data Analysis by Hadley Wickham (Amazon)<\/a><\/li>\n<li><a title=\"Yaksis Blog - A Rosetta Stone for ggplot and Excel graphics\" href=\"http:\/\/www.yaksis.com\/posts\/r-chart-chooser.html\">A Rosetta Stone for Excel to\u00a0<code>ggplot<\/code>\u00a0(Yaksis Blog)<\/a><\/li>\n<li><a title=\"Hadley Wickham Google presentation on ggplot2\" href=\"http:\/\/www.youtube.com\/watch?v=TaxJwC_MP9Q\">Hadley Wickham ggplot2 Presentation at Google (YouTube)<\/a><\/li>\n<li><a title=\"R Graphics Cookbook on Amazon by Winston Chang\" href=\"http:\/\/amzn.to\/WYrFAu\">R Graphics Cookbook by Winston Chang (Amazon)<\/a><\/li>\n<\/ul>\n<h5><a href=\"http:\/\/cran.r-project.org\/web\/packages\/qcc\/index.html\" target=\"_nofollow\" rel=\"noopener\">\u00a0<\/a><\/h5>\n<pre>8.install.packages(\"qcc\")<\/pre>\n<p><code>qcc<\/code>\u00a0is a library for\u00a0<a title=\"Statistical Process Control on Wikipedia\" href=\"http:\/\/en.wikipedia.org\/wiki\/Statistical_process_control\">statistical quality control<\/a>. Back in the 1950s, the now-defunct\u00a0<a title=\" western electric company\" href=\"http:\/\/en.wikipedia.org\/wiki\/Western_Electric\">Western Electric Company<\/a>\u00a0was looking for a better way to detect problems with telephone and electrical lines. They came up with a\u00a0<a title=\"Western Electric Rules\" href=\"http:\/\/en.wikipedia.org\/wiki\/Western_Electric_rules\">set of rules<\/a>\u00a0to help them identify problematic lines. 
The rules compare new points against the historical mean and standard deviation of a series of datapoints to help judge whether a new set of points is experiencing a mean shift.<\/p>\n<p>The classic example is monitoring a machine that produces\u00a0<a title=\"Lug Nuts on Wikipedia\" href=\"http:\/\/en.wikipedia.org\/wiki\/Lug_nut\">lug nuts<\/a>. Let\u2019s say the machine is supposed to produce 2.5 inch long lug nuts. We measure a series of lug nuts: 2.48, 2.47, 2.51, 2.52, 2.54, 2.42, 2.52, 2.58, 2.51. Is the machine broken? Well, it\u2019s hard to tell, but the Western Electric Rules can help.<\/p>\n<div id=\"gist4751004\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-qcc_example-r-LC1\">library(qcc)<\/div>\n<div id=\"file-qcc_example-r-LC2\"><\/div>\n<div id=\"file-qcc_example-r-LC3\"># series of values w\/ a mean of 10 and a little random noise added in<\/div>\n<div id=\"file-qcc_example-r-LC4\">x &lt;- rep(10, 100) + rnorm(100)<\/div>\n<div id=\"file-qcc_example-r-LC5\"># a test series w\/ a mean of 11<\/div>\n<div id=\"file-qcc_example-r-LC6\">new.x &lt;- rep(11, 15) + rnorm(15)<\/div>\n<div id=\"file-qcc_example-r-LC7\"># qcc will flag the new points<\/div>\n<div id=\"file-qcc_example-r-LC8\">qcc(x, newdata=new.x, type=\"xbar.one\")<\/div>\n<\/div>\n<\/div>\n<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4751004\/raw\/9768d9f51a0ad149de51d24a2f2f1a2df12a4d5b\/qcc_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<p>While you might not be monitoring telephone lines,\u00a0<code>qcc<\/code>\u00a0can help you monitor transaction volumes, visitors or logins on your website, database operations, and lots of other processes.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft\" title=\"10 R packages which will master you in R Tool\" 
src=\"http:\/\/blog.yhathq.com\/static\/img\/qcc.png\" alt=\"qcc 10 R packages which will master you in R Tool\" width=\"560\" height=\"400\" \/><\/p>\n<pre>9.install.packages(\"reshape2\")<\/pre>\n<p>I always find that the hardest part of any sort of analysis is getting the data into the right format.\u00a0<code>reshape2<\/code>\u00a0is yet another package by Hadley Wickham that specializes in converting data from\u00a0<a title=\"wide data format\" href=\"http:\/\/en.wikipedia.org\/wiki\/Wide_and_narrow_data#Wide\">wide<\/a>\u00a0to\u00a0<a title=\"long data format\" href=\"http:\/\/en.wikipedia.org\/wiki\/Wide_and_narrow_data#Narrow\">long<\/a>\u00a0format and vice versa. I use it all the time in conjunction with\u00a0<code>ggplot2<\/code>\u00a0and\u00a0<code>plyr<\/code>.<\/p>\n<div id=\"gist4751055\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div id=\"file-reshape_example-r-LC1\">\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-reshape_example-r-LC1\">library(reshape2)<\/div>\n<div id=\"file-reshape_example-r-LC2\"><\/div>\n<div id=\"file-reshape_example-r-LC3\"># generate a unique id for each row; this lets us go back to wide format later<\/div>\n<div id=\"file-reshape_example-r-LC4\">iris$id &lt;- 1:nrow(iris)<\/div>\n<div id=\"file-reshape_example-r-LC5\"><\/div>\n<div id=\"file-reshape_example-r-LC6\">iris.lng &lt;- melt(iris, id=c(\"id\", \"Species\"))<\/div>\n<div id=\"file-reshape_example-r-LC7\">head(iris.lng)<\/div>\n<div id=\"file-reshape_example-r-LC8\"># id Species variable value<\/div>\n<div id=\"file-reshape_example-r-LC9\">#1 1 setosa Sepal.Length 5.1<\/div>\n<div id=\"file-reshape_example-r-LC10\">#2 2 setosa Sepal.Length 4.9<\/div>\n<div id=\"file-reshape_example-r-LC11\">#3 3 setosa Sepal.Length 4.7<\/div>\n<div id=\"file-reshape_example-r-LC12\">#4 4 setosa Sepal.Length 4.6<\/div>\n<div id=\"file-reshape_example-r-LC13\">#5 5 setosa 
Sepal.Length 5.0<\/div>\n<div id=\"file-reshape_example-r-LC14\">#6 6 setosa Sepal.Length 5.4<\/div>\n<div id=\"file-reshape_example-r-LC15\"><\/div>\n<div id=\"file-reshape_example-r-LC16\">iris.wide &lt;- dcast(iris.lng, id + Species ~ variable)<\/div>\n<div id=\"file-reshape_example-r-LC17\">head(iris.wide)<\/div>\n<div id=\"file-reshape_example-r-LC18\"># id Species Sepal.Length Sepal.Width Petal.Length Petal.Width<\/div>\n<div id=\"file-reshape_example-r-LC19\">#1 1 setosa 5.1 3.5 1.4 0.2<\/div>\n<div id=\"file-reshape_example-r-LC20\">#2 2 setosa 4.9 3.0 1.4 0.2<\/div>\n<div id=\"file-reshape_example-r-LC21\">#3 3 setosa 4.7 3.2 1.3 0.2<\/div>\n<div id=\"file-reshape_example-r-LC22\">#4 4 setosa 4.6 3.1 1.5 0.2<\/div>\n<div id=\"file-reshape_example-r-LC23\">#5 5 setosa 5.0 3.6 1.4 0.2<\/div>\n<div id=\"file-reshape_example-r-LC24\">#6 6 setosa 5.4 3.9 1.7 0.4<\/div>\n<div id=\"file-reshape_example-r-LC25\"><\/div>\n<div id=\"file-reshape_example-r-LC26\">library(ggplot2)<\/div>\n<div id=\"file-reshape_example-r-LC27\"><\/div>\n<div id=\"file-reshape_example-r-LC28\"># plots a histogram for each numeric column in the dataset<\/div>\n<div id=\"file-reshape_example-r-LC29\">p &lt;- ggplot(aes(x=value, fill=Species), data=iris.lng)<\/div>\n<div id=\"file-reshape_example-r-LC30\">p + geom_histogram() +<\/div>\n<div id=\"file-reshape_example-r-LC31\">facet_wrap(~variable, scales=\"free\")<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4751055\/raw\/549edb23a71fd279ca5f7cbe59ce063374694356\/reshape_example.R\">\u00a0<\/a><\/div>\n<\/div>\n<p>It\u2019s a great way to quickly take a look at a dataset and get your bearings. 
You can use the\u00a0<code>melt<\/code>\u00a0function to convert wide data to long data, and\u00a0<code>dcast<\/code>\u00a0to go from long to wide.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"10 R packages which will master you in R Tool\" src=\"http:\/\/blog.yhathq.com\/static\/img\/melt_color_plot.png\" alt=\"melt color plot 10 R packages which will master you in R Tool\" width=\"560\" height=\"400\" \/><\/p>\n<h5><a href=\"http:\/\/cran.r-project.org\/web\/packages\/randomForest\/index.html\" target=\"_nofollow\" rel=\"noopener\">\u00a0<\/a><\/h5>\n<pre>10.install.packages(\"randomForest\")<\/pre>\n<p>This list wouldn\u2019t be complete without including at least\u00a0<em>one<\/em>\u00a0machine learning package you can\u00a0<a title=\"impress your friends\" href=\"http:\/\/www.r-bloggers.com\/10-r-one-liners-to-impress-your-friends\/\">impress your friends<\/a>\u00a0with.\u00a0<a title=\"Random Forest on Wikipedia\" href=\"http:\/\/en.wikipedia.org\/wiki\/Random_forest\">Random Forest<\/a>\u00a0is a great algorithm to start with. 
It\u2019s easy to use, supports supervised and unsupervised learning, works with many different types of datasets, and most importantly\u00a0<b>it\u2019s effective!<\/b>\u00a0Here\u2019s how it works in R.<\/p>\n<div id=\"gist4751178\">\n<div>\n<table cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div class=\"box-wrapper light\">\n<div class=\"box light\">\n<div id=\"file-randomforest_example-r-LC1\">library(randomForest)<\/div>\n<div id=\"file-randomforest_example-r-LC2\"><\/div>\n<div id=\"file-randomforest_example-r-LC3\"># download Titanic Survivors data<\/div>\n<div id=\"file-randomforest_example-r-LC4\">data &lt;- read.table(\"http:\/\/math.ucdenver.edu\/RTutorial\/titanic.txt\", header=TRUE, sep=\"\\t\")<\/div>\n<div id=\"file-randomforest_example-r-LC5\"># make survived into a yes\/no<\/div>\n<div id=\"file-randomforest_example-r-LC6\">data$Survived &lt;- as.factor(ifelse(data$Survived==1, \"yes\", \"no\"))<\/div>\n<div id=\"file-randomforest_example-r-LC7\"><\/div>\n<div id=\"file-randomforest_example-r-LC8\"># split into a training (~75%) and test (~25%) set; idx is a logical vector<\/div>\n<div id=\"file-randomforest_example-r-LC9\">idx &lt;- runif(nrow(data)) &lt;= .75<\/div>\n<div id=\"file-randomforest_example-r-LC10\">data.train &lt;- data[idx,]<\/div>\n<div id=\"file-randomforest_example-r-LC11\">data.test &lt;- data[!idx,]<\/div>\n<div id=\"file-randomforest_example-r-LC12\"><\/div>\n<div id=\"file-randomforest_example-r-LC13\"># train a random forest<\/div>\n<div id=\"file-randomforest_example-r-LC14\">rf &lt;- randomForest(Survived ~ PClass + Age + Sex,<\/div>\n<div id=\"file-randomforest_example-r-LC15\">data=data.train, importance=TRUE, na.action=na.omit)<\/div>\n<div id=\"file-randomforest_example-r-LC16\"><\/div>\n<div id=\"file-randomforest_example-r-LC17\"># how important is each variable in the model<\/div>\n<div id=\"file-randomforest_example-r-LC18\">imp &lt;- importance(rf)<\/div>\n<div id=\"file-randomforest_example-r-LC19\">o 
&lt;- order(imp[,3], decreasing=T)<\/div>\n<div id=\"file-randomforest_example-r-LC20\">imp[o,]<\/div>\n<div id=\"file-randomforest_example-r-LC21\"># no yes MeanDecreaseAccuracy MeanDecreaseGini<\/div>\n<div id=\"file-randomforest_example-r-LC22\">#Sex 51.49855 53.30255 55.13458 63.46861<\/div>\n<div id=\"file-randomforest_example-r-LC23\">#PClass 25.48715 24.12522 28.43298 22.31789<\/div>\n<div id=\"file-randomforest_example-r-LC24\">#Age 20.08571 14.07954 24.64607 19.57423<\/div>\n<div id=\"file-randomforest_example-r-LC25\"><\/div>\n<div id=\"file-randomforest_example-r-LC26\"># confusion matrix [[True Neg, False Pos], [False Neg, True Pos]]; exact counts vary with the random split<\/div>\n<div id=\"file-randomforest_example-r-LC27\">table(data.test$Survived, predict(rf, data.test), dnn=list(\"actual\", \"predicted\"))<\/div>\n<div id=\"file-randomforest_example-r-LC28\"># predicted<\/div>\n<div id=\"file-randomforest_example-r-LC29\">#actual no yes<\/div>\n<div id=\"file-randomforest_example-r-LC30\"># no 427 16<\/div>\n<div id=\"file-randomforest_example-r-LC31\"># yes 117 195<\/div>\n<\/div>\n<\/div>\n<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div><a href=\"https:\/\/gist.github.com\/glamp\/4751178\/raw\/99bce228fab31fe63292919e48ff242114d7c629\/randomforest_example.R\">\u00a0<\/a><\/div>\n<div>Courtesy:<a href=\"http:\/\/www.yhathq.com\/\" target=\"_blank\" rel=\"noopener\">http:\/\/www.yhathq.com<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>R can be more prickly and obscure than other languages like Python or Java. 
The good news is that there are tons of packages which&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-762","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=762"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/762\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}