{"id":646,"date":"2014-06-20T11:53:50","date_gmt":"2014-06-20T18:53:50","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=646"},"modified":"2014-06-20T11:53:50","modified_gmt":"2014-06-20T18:53:50","slug":"for-loops-and-how-to-avoid-them","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2014\/06\/20\/for-loops-and-how-to-avoid-them\/","title":{"rendered":"For loops (and how to avoid them)"},"content":{"rendered":"<p><span style=\"color: #444444;\">When I was starting out in R, I tried to clean and recode data using\u00a0<\/span><i style=\"color: #444444;\">for()<\/i><span style=\"color: #444444;\">\u00a0loops,\u00a0usually with a few\u00a0<\/span><i style=\"color: #444444;\">if()<\/i><span style=\"color: #444444;\">\u00a0statements in the loop as well, and found the whole thing complicated and frustrating.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">In this post, I&#8217;ll go over how you can avoid\u00a0<\/span><i style=\"color: #444444;\">for()<\/i><span style=\"color: #444444;\">\u00a0loops to improve the quality and speed of your programming, as well as your sanity.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">So here we have our classic dataset called mydata.Rdata<\/span><i style=\"color: #444444;\">\u00a0<\/i><span style=\"color: #444444;\">(you can download this if you want, link at the right):<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><\/p>\n<div class=\"separator\" style=\"color: #444444;\"><a style=\"color: #205b87;\" href=\"http:\/\/4.bp.blogspot.com\/-BLNIicTAbZg\/UOz10XoDiGI\/AAAAAAAANCo\/eJw8GeLVNnk\/s1600\/mydata.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/4.bp.blogspot.com\/-BLNIicTAbZg\/UOz10XoDiGI\/AAAAAAAANCo\/eJw8GeLVNnk\/s320\/mydata.png\" alt=\"\" width=\"320\" height=\"188\" 
border=\"0\" \/><\/a><\/div>\n<p><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">And if I were in Stata and wanted to create an age group variable, I could just do:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">gen Agegroup=1<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">replace Agegroup=2 if Age&gt;10 &amp; Age&lt;20<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">replace Agegroup=3 if Age&gt;=20<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">But when I try this in R, it fails:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><a style=\"color: #205b87;\" href=\"http:\/\/1.bp.blogspot.com\/-BJvQtVV4HkE\/UOz10bTL4UI\/AAAAAAAANCs\/C6YNNsrJpGI\/s1600\/failcode.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/1.bp.blogspot.com\/-BJvQtVV4HkE\/UOz10bTL4UI\/AAAAAAAANCs\/C6YNNsrJpGI\/s640\/failcode.png\" alt=\"\" width=\"640\" height=\"88\" border=\"0\" \/><\/a><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">Why does it fail? It fails because Age is a\u00a0<\/span><i style=\"color: #444444;\">vector<\/i><span style=\"color: #444444;\">, so the condition\u00a0<\/span><span style=\"color: #444444;\">if(mydata$Age&lt;10)<\/span><span style=\"color: #444444;\">\u00a0is asking &#8220;is the vector Age less than 10&#8221;, which is not what we want to know. \u00a0We want to ask, row by row, whether each element of Age is less than 10, so we need to specify\u00a0the element of the vector we&#8217;re referring to. 
We don&#8217;t specify the element and thus we get the warning &#8220;only the first element will be used&#8221; (since R 4.2.0, this is an error rather than a warning). \u00a0So when this fails, the first way people try to solve this problem is with a crazy\u00a0<\/span><i style=\"color: #444444;\">for()<\/i><span style=\"color: #444444;\">\u00a0loop like this:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">###########Unnecessarily\u00a0long and ugly code below#######<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">mydata$Agegroup1&lt;-0<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\"><br \/>\n<\/span><span style=\"color: #444444;\">for (i in \u00a01:10){<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 if(mydata$Age[i]&gt;10 &amp; mydata$Age[i]&lt;20){<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 \u00a0 mydata$Agegroup1[i]&lt;-1<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 }<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 if(mydata$Age[i]&gt;=20){<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 \u00a0 mydata$Agegroup1[i]&lt;-2<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 }<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">}<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">Here we tell R to go down the rows from i=1 to i=10, and for each of those rows indexed by i, check to see what value of Age it is, and then assign Agegroup1 a value of 1 or 2. 
\u00a0<\/span><b style=\"color: #444444;\">This works, but at a high cost<\/b><span style=\"color: #444444;\">\u00a0&#8211; you can easily make a mistake with all those indexed vectors, and also\u00a0<\/span><b style=\"color: #444444;\"><i>for()<\/i>\u00a0loops that fill in a data frame one element at a time can take a lot of computing time<\/b><span style=\"color: #444444;\">, which would be a big deal if this dataset were 10000 observations instead of 10.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">So how can we avoid doing this?<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">One of the most useful functions I have found is one that I have referred to a number of times in my blog so far &#8211; the\u00a0<\/span><span style=\"color: #444444;\"><a style=\"color: #205b87;\" href=\"http:\/\/stat.ethz.ch\/R-manual\/R-patched\/library\/base\/html\/ifelse.html\" target=\"_blank\" rel=\"noopener\">ifelse()<\/a><\/span><span style=\"color: #444444;\">\u00a0function. \u00a0The\u00a0<\/span><span style=\"color: #444444;\">ifelse()<\/span><span style=\"color: #444444;\">\u00a0function evaluates a condition, and then assigns one value if it&#8217;s true and another value if it&#8217;s false. \u00a0The great part about it is that it can read in a vector and check each element of the vector one by one, so\u00a0<\/span><b style=\"color: #444444;\">you don&#8217;t need indices or a loop<\/b><span style=\"color: #444444;\">. You don&#8217;t even need to initialize some new variable before you run the statement. 
\u00a0Like this:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">mydata$newvariable&lt;-ifelse(<span style=\"color: blue;\">Condition of some variable<\/span>,<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\"><span style=\"color: #38761d;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Value of\u00a0<\/span><\/span><span style=\"color: #38761d;\">new variable if condition is true<\/span><span style=\"color: #444444;\">,\u00a0<\/span><br style=\"color: #444444;\" \/><span style=\"color: #990000;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Value of new variable if condition is false<\/span><span style=\"color: #444444;\">)<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\"><br \/>\n<\/span><span style=\"color: #444444;\">so for example:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">mydata$Old&lt;-ifelse(<span style=\"color: blue;\">mydata$Age&gt;40<\/span>,<span style=\"color: #38761d;\">1<\/span>,<span style=\"color: #990000;\">0<\/span>)<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">This says, check to see if the elements of the vector mydata$Age are greater than 40: if an element is greater than 40, it assigns the value of 1 to mydata$Old, and if it&#8217;s not greater than 40, it assigns the value of 0 to mydata$Old.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">But we wanted to assign values 0, 1, and 2 to an Agegroup variable. 
\u00a0To do this, we can use\u00a0<\/span><b style=\"color: #444444;\">nested\u00a0ifelse()\u00a0statements<\/b><span style=\"color: #444444;\">:<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">mydata$Agegroup2&lt;-ifelse(mydata$Age&gt;10 &amp; mydata$Age&lt;20,1, \u00a0 \u00a0\u00a0<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ifelse(mydata$Age&gt;=20, 2,0))<\/span><\/p>\n<div style=\"color: #444444;\"><\/div>\n<div style=\"color: #444444;\">Now this says, first check whether each element of the Age vector is &gt;10 and &lt;20. \u00a0If it is, assign 1 to Agegroup2. \u00a0If it&#8217;s not, then evaluate the next\u00a0<i>ifelse()<\/i>\u00a0statement, whether Age&gt;=20. \u00a0If it is, assign Agegroup2 a value of 2. \u00a0If it&#8217;s not any of those, then assign it 0. \u00a0We can see that both the loop and the\u00a0<i>ifelse()<\/i>\u00a0statements give us the same result:<\/div>\n<div style=\"color: #444444;\"><\/div>\n<div class=\"separator\" style=\"color: #444444;\"><a style=\"color: #205b87;\" href=\"http:\/\/4.bp.blogspot.com\/-jrGbK-ElWKY\/UO13qJzFI1I\/AAAAAAAANDM\/O28xL79tD1o\/s1600\/resultsagegroup.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/4.bp.blogspot.com\/-jrGbK-ElWKY\/UO13qJzFI1I\/AAAAAAAANDM\/O28xL79tD1o\/s400\/resultsagegroup.png\" alt=\"\" width=\"400\" height=\"166\" border=\"0\" \/><\/a><\/div>\n<div style=\"color: #444444;\"><\/div>\n<p><span style=\"color: #444444;\">You can nest\u00a0<\/span><i style=\"color: #444444;\">ifelse()<\/i><span style=\"color: #444444;\">\u00a0statements as much as you like.\u00a0<\/span><b style=\"color: #444444;\">Just be careful about your final category<\/b><span style=\"color: #444444;\">\u00a0&#8211; it assigns the last value to whatever values are left over that didn&#8217;t meet any condition (except missing values: when the condition is NA, ifelse() returns NA rather than the final value) 
so make sure you want that to happen.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444; text-decoration: underline;\">Other examples of ways to use the\u00a0<i>ifelse()<\/i>\u00a0function:<\/span><\/p>\n<ul style=\"color: #444444;\">\n<li>If you want to\u00a0<b>add a column with the mean of Weight by sex for each individual,<\/b>\u00a0you can do this with\u00a0<i>ifelse()<\/i>\u00a0like this:<\/li>\n<\/ul>\n<div style=\"color: #444444;\">\n<div>mydata$meanweight.bysex&lt;-ifelse(mydata$Sex==0,<\/div>\n<div>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0mean(mydata$Weight[mydata$Sex==0], na.rm=TRUE),<\/div>\n<div>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0mean(mydata$Weight[mydata$Sex==1], na.rm=TRUE))<\/div>\n<\/div>\n<div style=\"color: #444444;\"><\/div>\n<p>&nbsp;<\/p>\n<div class=\"separator\" style=\"color: #444444;\"><a style=\"color: #205b87;\" href=\"http:\/\/1.bp.blogspot.com\/-Wsp0KaOwqFc\/UO2c9d5n7VI\/AAAAAAAANDo\/xXYYXBcxLVc\/s1600\/meanweightsex.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"http:\/\/1.bp.blogspot.com\/-Wsp0KaOwqFc\/UO2c9d5n7VI\/AAAAAAAANDo\/xXYYXBcxLVc\/s1600\/meanweightsex.png\" alt=\"\" border=\"0\" \/><\/a><\/div>\n<p>&nbsp;<\/p>\n<ul style=\"color: #444444;\">\n<li>If you want to\u00a0<b>recode missing values<\/b>:<\/li>\n<\/ul>\n<p><span style=\"color: #444444;\">mydata$Height.recode&lt;-ifelse(is.na(mydata$Height),<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 9999,\u00a0<\/span><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 mydata$Height)<\/span><\/p>\n<div style=\"color: #444444;\"><\/div>\n<div style=\"color: #444444;\">\n<ul>\n<li>If you want to\u00a0<b>combine two variables together into a new 
one<\/b>, such as to create a new ID variable based on year (which I added to this dataframe) and ID:<\/li>\n<\/ul>\n<div>\n<div>mydata$ID.long&lt;-ifelse(mydata$ID&lt;10,<\/div>\n<div>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 paste(mydata$year, &quot;-0&quot;, mydata$ID, sep=&quot;&quot;),<\/div>\n<div>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 paste(mydata$year, &quot;-&quot;, mydata$ID, sep=&quot;&quot;))<\/div>\n<\/div>\n<\/div>\n<div style=\"color: #444444;\"><\/div>\n<div style=\"color: #444444;\"><\/div>\n<div class=\"separator\" style=\"color: #444444;\"><a style=\"color: #205b87;\" href=\"http:\/\/2.bp.blogspot.com\/-0sVqhZldAxQ\/UO27nymJV6I\/AAAAAAAANEE\/LcLbb6dZZKI\/s1600\/id.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"http:\/\/2.bp.blogspot.com\/-0sVqhZldAxQ\/UO27nymJV6I\/AAAAAAAANEE\/LcLbb6dZZKI\/s1600\/id.png\" alt=\"\" border=\"0\" \/><\/a><\/div>\n<div style=\"color: #444444;\"><\/div>\n<div style=\"color: #444444;\"><span style=\"text-decoration: underline;\">Other ways to avoid the for loop:<\/span><\/div>\n<div style=\"color: #444444;\">\n<ul>\n<li><b>The apply functions<\/b>: \u00a0If you think you have to use a loop because you have to apply some sort of function to each observation in your data, think again! Use the\u00a0<i>apply()<\/i>\u00a0functions instead. 
\u00a0For example:\n<ul>\n<li>If you have a lot of missing values and want to recode them all at once, or want to sum up the number of times you see a certain value in a row,\u00a0<a style=\"color: #205b87;\" href=\"http:\/\/rforpublichealth.blogspot.com\/2012\/09\/the-infamous-apply-function.html\" target=\"_blank\" rel=\"noopener\">check out my post on the apply function here<\/a>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>You can also\u00a0<b>use other functions<\/b>\u00a0such as\u00a0<i>cut()<\/i>\u00a0to do the age grouping above.\u00a0<a style=\"color: #205b87;\" href=\"http:\/\/rforpublichealth.blogspot.com\/2012\/09\/from-continuous-to-categorical.html\" target=\"_blank\" rel=\"noopener\">Here&#8217;s the post on how this function works<\/a>, so I won&#8217;t go over it again, except to say if you convert from a factor to a numeric, *always* convert to a character before converting it to numeric (otherwise as.numeric() returns the factor&#8217;s internal integer codes rather than the labels):<\/li>\n<\/ul>\n<\/div>\n<p><span style=\"color: #444444;\">mydata$Agegroup3&lt;-as.numeric(as.character(cut(mydata$Age, c(0,10,20,100),labels=0:2)))<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">Note that cut() uses right-closed intervals by default, so an Age of exactly 20 falls into group 1 here, not group 2 as in the loop above.<\/span><br style=\"color: #444444;\" \/><br style=\"color: #444444;\" \/><span style=\"color: #444444;\">Basically,\u00a0<\/span><b style=\"color: #444444;\">any time you think you have to do a loop, think about how you can do it with another function<\/b><span style=\"color: #444444;\">. 
It will save you a lot of time and mistakes in your code.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My experience when starting out in R was trying to clean and recode data using\u00a0for()\u00a0loops,\u00a0usually with a few\u00a0if()\u00a0statements in the loop as well, and finding&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-646","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/646","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=646"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/646\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=646"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=646"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=646"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}