{"id":673,"date":"2014-10-20T13:12:23","date_gmt":"2014-10-20T20:12:23","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=673"},"modified":"2014-10-20T13:12:23","modified_gmt":"2014-10-20T20:12:23","slug":"use-sql-queries-to-manipulate-data-frames-in-r-with-sqldf-package","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2014\/10\/20\/use-sql-queries-to-manipulate-data-frames-in-r-with-sqldf-package\/","title":{"rendered":"Use SQL queries to manipulate data frames in R with sqldf package"},"content":{"rendered":"<p>I&#8217;ve covered a few topics in the past including the <a href=\"http:\/\/gettinggeneticsdone.blogspot.com\/2009\/11\/split-apply-and-combine-in-r-using-plyr.html\">plyr<\/a> package, which is kind of like &#8220;GROUP BY&#8221; for R, and the <a href=\"http:\/\/gettinggeneticsdone.blogspot.com\/2009\/04\/merging-data-from-different-files-using.html\">merge function for merging datasets<\/a>. I only recently found the <a href=\"http:\/\/code.google.com\/p\/sqldf\/\">sqldf package<\/a> for R, and it&#8217;s already one of the most useful packages I&#8217;ve ever installed. The main function in the package is sqldf(), which takes a quoted string as an argument. You can treat data frames as tables as if they were in a relational database. You can use some of the finer aspects of SQL like the INNER JOIN or the subquery, which are extremely difficult operations to mimic using standard R programming. While this isn&#8217;t an SQL tutorial, try out some of these commands to see what sqldf can do for you. Read more about the <a href=\"http:\/\/code.google.com\/p\/sqldf\/\">sqldf package here<\/a>.<\/p>\n<pre class=\"brush:plain\">&gt; # install the package\n&gt; install.packages(\"sqldf\")\n&gt; \n&gt; #load it\n&gt; library(sqldf)\n&gt; \n&gt; # set the random seed\n&gt; set.seed(42)\n&gt; \n&gt; #generate some data\n&gt; df1 = data.frame(id=1:10,class=rep(c(\"case\",\"ctrl\"),5))\n&gt; df2 = data.frame(id=1:10,cov=round(runif(10)*10,1))\n&gt; \n&gt; #look at the data\n&gt; df1\n\u00a0\u00a0 id class\n1\u00a0\u00a0 1\u00a0 case\n2\u00a0\u00a0 2\u00a0 ctrl\n3\u00a0\u00a0 3\u00a0 case\n4\u00a0\u00a0 4\u00a0 ctrl\n5\u00a0\u00a0 5\u00a0 case\n6\u00a0\u00a0 6\u00a0 ctrl\n7\u00a0\u00a0 7\u00a0 case\n8\u00a0\u00a0 8\u00a0 ctrl\n9\u00a0\u00a0 9\u00a0 case\n10 10\u00a0 ctrl\n&gt; df2\n\u00a0\u00a0 id cov\n1\u00a0\u00a0 1 9.1\n2\u00a0\u00a0 2 9.4\n3\u00a0\u00a0 3 2.9\n4\u00a0\u00a0 4 8.3\n5\u00a0\u00a0 5 6.4\n6\u00a0\u00a0 6 5.2\n7\u00a0\u00a0 7 7.4\n8\u00a0\u00a0 8 1.3\n9\u00a0\u00a0 9 6.6\n10 10 7.1\n&gt; \n&gt; # do an inner join\n&gt; sqldf(\"select * from df1 join df2 on df1.id=df2.id\")\n\u00a0\u00a0 id class id cov\n1\u00a0\u00a0 1\u00a0 case\u00a0 1 9.1\n2\u00a0\u00a0 2\u00a0 ctrl\u00a0 2 9.4\n3\u00a0\u00a0 3\u00a0 case\u00a0 3 2.9\n4\u00a0\u00a0 4\u00a0 ctrl\u00a0 4 8.3\n5\u00a0\u00a0 5\u00a0 case\u00a0 5 6.4\n6\u00a0\u00a0 6\u00a0 ctrl\u00a0 6 5.2\n7\u00a0\u00a0 7\u00a0 case\u00a0 7 7.4\n8\u00a0\u00a0 8\u00a0 ctrl\u00a0 8 1.3\n9\u00a0\u00a0 9\u00a0 case\u00a0 9 6.6\n10 10\u00a0 ctrl 10 7.1\n&gt; \n&gt; # where clauses\n&gt; sqldf(\"select * from df1 join df2 on df1.id=df2.id where class='case'\")\n\u00a0 id class id cov\n1\u00a0 1\u00a0 case\u00a0 1 9.1\n2\u00a0 3\u00a0 case\u00a0 3 2.9\n3\u00a0 5\u00a0 case\u00a0 5 6.4\n4\u00a0 7\u00a0 case\u00a0 7 7.4\n5\u00a0 9\u00a0 case\u00a0 9 6.6\n&gt; \n&gt; # lots of sql fun\n&gt; sqldf(\"select df1.id, df2.cov as covariate from df1 join df2 on df1.id=df2.id where class='case' and cov&gt;3 order by cov\")\n\u00a0 id covariate\n1\u00a0 5\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 6.4\n2\u00a0 9\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 6.6\n3\u00a0 7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 7.4\n4\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 9.1<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve covered a few topics in the past including the plyr package, which is kind of like &#8220;GROUP BY&#8221; for R, and the merge function&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-673","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/673","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=673"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/673\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=673"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=673"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=673"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}