{"id":298,"date":"2013-11-08T15:36:59","date_gmt":"2013-11-08T20:36:59","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=298"},"modified":"2013-11-08T15:36:59","modified_gmt":"2013-11-08T20:36:59","slug":"how-to-join-data-frames-in-r-inner-outer-left-right","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/11\/08\/how-to-join-data-frames-in-r-inner-outer-left-right\/","title":{"rendered":"How to join data frames in R (inner, outer, left, right)?"},"content":{"rendered":"<div id=\"question-header\"><\/div>\n<div id=\"mainbar\">\n<div id=\"question\" data-questionid=\"1299871\">\n<table>\n<tbody>\n<tr>\n<td><a title=\"This question shows research effort; it is useful and clear\">\u00a0<\/a><\/td>\n<td>\n<div itemprop=\"description\">\n<p>Given two data frames<\/p>\n<pre><code>df1 = data.frame(CustomerId=c(1:6),Product=c(rep(\"Toaster\",3),rep(\"Radio\",3)))\ndf2 = data.frame(CustomerId=c(2,4,6),State=c(rep(\"Alabama\",2),rep(\"Ohio\",1)))\n\n&gt; df1\n  CustomerId Product\n           1 Toaster\n           2 Toaster\n           3 Toaster\n           4   Radio\n           5   Radio\n           6   Radio\n\n&gt; df2\n  CustomerId   State\n           2 Alabama\n           4 Alabama\n           6    Ohio<\/code><\/pre>\n<p>How can I do database style, i.e., sql style, joins? That is, how do I get:<\/p>\n<ul>\n<li>An inner join of df1 and df1<\/li>\n<li>An outer join of df1 and df2<\/li>\n<li>A left outer join of df1 and df2<\/li>\n<li>A right outer join of df1 and df2<\/li>\n<\/ul>\n<p>P.S. IKT-JARQ (I Know This &#8211; Just Adding R Questions)<\/p>\n<p>Extra credit:<\/p>\n<p>How can I do a sql style select statement?<\/p>\n<\/div>\n<div><a title=\"show questions tagged 'r'\" href=\"http:\/\/stackoverflow.com\/questions\/tagged\/r\" rel=\"tag\">r<\/a>\u00a0<a title=\"show questions tagged 'join'\" href=\"http:\/\/stackoverflow.com\/questions\/tagged\/join\" rel=\"tag\">join<\/a>\u00a0<a title=\"show questions tagged 'r-faq'\" href=\"http:\/\/stackoverflow.com\/questions\/tagged\/r-faq\" rel=\"tag\">r-faq<\/a><\/div>\n<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div id=\"answers\"><a name=\"tab-top\"><\/a><\/p>\n<div id=\"answers-header\">\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div>\n<p>By using the merge function and its optional parameters:<\/p>\n<p>Inner join:\u00a0<code>merge(df1, df2)<\/code>\u00a0will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify\u00a0<code>merge(df1, df2, by=\"CustomerId\")<\/code>\u00a0to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.<\/p>\n<p>Outer join:\u00a0<code>merge(x = df1, y = df2, by = \"CustomerId\", all = TRUE)<\/code><\/p>\n<p>Left outer:\u00a0<code>merge(x = df1, y = df2, by = \"CustomerId\", all.x=TRUE)<\/code><\/p>\n<p>Right outer:\u00a0<code>merge(x = df1, y = df2, by = \"CustomerId\", all.y=TRUE)<\/code><\/p>\n<p>Cross join:\u00a0<code>merge(x = df1, y = df2, by = NULL)<\/code><\/p>\n<p><span style=\"text-decoration: line-through;\">Just as with the inner join, you would probably want to explicitly pass &#8220;CustomerId&#8221; to R as the matching variable.<\/span>\u00a0I think it&#8217;s almost always best to explicitly state the identifiers on which you want to merge; it&#8217;s safer if the input data.frames change unexpectedly and easier to read later on.<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div id=\"careers3\"><\/div>\n<div id=\"answer-1307824\" data-answerid=\"1307824\">\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<div>\n<p>I would recommend checking out\u00a0<a href=\"http:\/\/crantastic.org\/packages\/sqldf\">Gabor Grothendieck&#8217;s sqldf package<\/a>, which allows you to express these operations in SQL.<\/p>\n<pre><code>library(sqldf)\n\n## inner join\ndf3 &lt;- sqldf(\"SELECT CustomerId, Product, State \n       FROM df1 JOIN df2 USING(CustomerID)\")\n\n## left join (substitute 'right' for right join)\ndf4 &lt;- sqldf(\"SELECT CustomerId, Product, State \n       FROM df1 LEFT JOIN df2 USING(CustomerID)\")<\/code><\/pre>\n<p>I find the SQL syntax to be simpler and more natural than its R equivalent (but this may just reflect my RDBMS bias).<\/p>\n<p>See\u00a0<a href=\"http:\/\/code.google.com\/p\/sqldf\/#Example%5F4.%5FJoin\">Gabor&#8217;s sqldf Google Code page<\/a>\u00a0for more information<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>\u00a0 Given two data frames df1 = data.frame(CustomerId=c(1:6),Product=c(rep(&#8220;Toaster&#8221;,3),rep(&#8220;Radio&#8221;,3))) df2 = data.frame(CustomerId=c(2,4,6),State=c(rep(&#8220;Alabama&#8221;,2),rep(&#8220;Ohio&#8221;,1))) &gt; df1 CustomerId Product 1 Toaster 2 Toaster 3 Toaster 4 Radio 5 Radio&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-298","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=298"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/298\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=298"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=298"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}