{"id":544,"date":"2014-02-28T13:17:55","date_gmt":"2014-02-28T18:17:55","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=544"},"modified":"2014-02-28T13:17:55","modified_gmt":"2014-02-28T18:17:55","slug":"simple-spatial-correlograms-for-cross-country-analysis-in-r","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2014\/02\/28\/simple-spatial-correlograms-for-cross-country-analysis-in-r\/","title":{"rendered":"Simple Spatial Correlograms for Cross-Country Analysis in R"},"content":{"rendered":"<p>Accounting for temporal dependence in econometric analysis is important, as the presence of temporal dependence violates the assumption that observations are independent units. Historically, much less attention has been paid to correcting for spatial dependence, which, if present, also violates this independence assumption.<\/p>\n<p>The comparability of temporal and spatial dependence is useful for illustrating why spatial dependence may matter. For example, a time-series analysis of a county\u2019s GDP would examine how related this year\u2019s figure was to preceding years, whereas a spatial analysis would examine how related one country\u2019s figure was to that of their neighbours.<\/p>\n<p>However, these temporal and spatial issues are not completely analogous. This difference is highlighted with the creation of lagged variables. In time-series, the lag structure is intuitively easy to comprehend. A variable y_{t} has lags of y_{t-1}, \u2026, y_{t-k}.<\/p>\n<p>Lagging a spatial variable introduces more ambiguity. A spatially lagged variable of y_{i} can be calculated as \\rho(y_{i}) where \\rho is a spatial weighting matrix. Interested readers should consult further sources on this, but needless to say there are a number of ways (nearest neighbour, distance etc.) in which we could define \\rho.<\/p>\n<p>The difference between spatial and temporal processes in the creation of lagged variables is somewhat compounded when we consider that it is not possible to create \u2018lags of lags\u2019 in the same manner as in time-series (so \\rho^2 is not the valid). Your\u00a0neighbour\u2019s\u00a0neighbour might be your neighbour.<\/p>\n<p>Given the difficulty in creating spatial lags of spatial lags, is there an easy way of recreating a simple time-series correlogram for a spatial variable? Yes, indeed it is. The solution I present here is not an exclusive way to analyze a spatially dependent variable.<\/p>\n<p>For this analysis I use different countries spatial coordinates and create a correlogram of population density in 1900 (these data are taken from this<a href=\"http:\/\/ideas.repec.org\/p\/ess\/wpaper\/id2792.html\" target=\"_blank\" rel=\"noopener\">paper<\/a>). The first step involves loading these data into the R workspace such that the data frame contains the latitude and longitude coordinates alongside the potentially spatially autocorrelated variable.<\/p>\n<p>&nbsp;<\/p>\n<div>\n<div id=\"highlighter_999477\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div>1<\/div>\n<div>2<\/div>\n<div>3<\/div>\n<div>4<\/div>\n<div>5<\/div>\n<div>6<\/div>\n<div>7<\/div>\n<div>8<\/div>\n<div>9<\/div>\n<\/td>\n<td>\n<div>\n<div><code>&gt; pop &lt;- <\/code><code>read.csv<\/code><code>(<\/code><code>\"dat2.csv\"<\/code><code>)<\/code><\/div>\n<div><code>&gt; <\/code><code>head<\/code><code>(pop)<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><code>lon\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 lat\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 country\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 dens<\/code><\/div>\n<div><code>1\u00a0 66.02554\u00a0 33.83319\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Afghanistan\u00a0 79.2543748<\/code><\/div>\n<div><code>2\u00a0 17.55091 -12.29862\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Angola\u00a0 24.2375278<\/code><\/div>\n<div><code>3\u00a0 20.07006\u00a0 41.14267\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Albania 282.7854408<\/code><\/div>\n<div><code>4\u00a0 54.33089\u00a0 23.91317 United Arab Emirates\u00a0\u00a0 5.0438742<\/code><\/div>\n<div><code>5\u00a0 44.93819\u00a0 40.29368\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Armenia\u00a0\u00a0 0.2136146<\/code><\/div>\n<div><code>6 134.48679 -25.73251\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Australia\u00a0\u00a0 4.8896695<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>With the data frame, I calculated a distance matrix with the \u2018rdist.earth\u2019 function from the package \u2018<a href=\"http:\/\/cran.r-project.org\/web\/packages\/fields\/index.html\" target=\"_blank\" rel=\"noopener\">fields<\/a>\u2019. This function measures the distance between each set of data points based on their coordinates. This function recognizes that the world is not flat, and as such calculates what are known as great-circle distances.<\/p>\n<p>&nbsp;<\/p>\n<div>\n<div id=\"highlighter_123506\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div>1<\/div>\n<div>2<\/div>\n<div>3<\/div>\n<div>4<\/div>\n<div>5<\/div>\n<\/td>\n<td>\n<div>\n<div><code># calculate great cirlce distances with fields package<\/code><\/div>\n<div><code>library<\/code><code>(fields)<\/code><\/div>\n<div><code># popdists matrix<\/code><\/div>\n<div><code>popdists &lt;- <\/code><code>as.matrix<\/code><code>(<\/code><code>rdist.earth<\/code><code>(<\/code><code>cbind<\/code><code>(pop$lon, pop$lat), miles = F, R = <\/code><code>NULL<\/code><code>))<\/code><\/div>\n<div><code>diag<\/code><code>(popdists) &lt;- 0<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>With this distance based matrix it is possible to calculate a correlogram. This is performed through the function \u2018autocorr\u2019. There are three arguments. The first is w is the distance-based weights matrix. The second, x is the variable of interest (note: the order of x and w should be the same as in the original data frame), and the third are the distance bands.<\/p>\n<p>&nbsp;<\/p>\n<div>\n<div id=\"highlighter_103080\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div>1<\/div>\n<div>2<\/div>\n<div>3<\/div>\n<div>4<\/div>\n<div>5<\/div>\n<div>6<\/div>\n<div>7<\/div>\n<div>8<\/div>\n<div>9<\/div>\n<div>10<\/div>\n<div>11<\/div>\n<div>12<\/div>\n<div>13<\/div>\n<div>14<\/div>\n<div>15<\/div>\n<div>16<\/div>\n<div>17<\/div>\n<div>18<\/div>\n<div>19<\/div>\n<\/td>\n<td>\n<div>\n<div><code># autocorr function<\/code><\/div>\n<div><code>autocorr &lt;- <\/code><code>function<\/code><code>(w,x,dist=500){<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>aa &lt;- <\/code><code>ceiling<\/code><code>(<\/code><code>max<\/code><code>(w)\/dist)<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>dists &lt;- <\/code><code>seq<\/code><code>(0,aa*dist,dist)<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>cors &lt;- <\/code><code>NULL<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>for<\/code><code>(i <\/code><code>in<\/code> <code>1:aa){<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>w1 &lt;- <\/code><code>ifelse<\/code><code>(w &gt; dists[i] &amp; w &lt;= dists[i+1], 1, 0) <\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>w2 &lt;- w1<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>for<\/code><code>(j <\/code><code>in<\/code> <code>1:<\/code><code>dim<\/code><code>(w1)[1]){<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><code>nu &lt;- <\/code><code>sum<\/code><code>(w1[j,])<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><code>if<\/code><code>(nu&gt;0){<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><code>w2[j,] &lt;- w1[j,]\/nu<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><code>}\u00a0 <\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>}<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>lag &lt;- w2 %*% x<\/code><\/div>\n<div><code>\u00a0\u00a0\u00a0\u00a0<\/code><code>cors &lt;- <\/code><code>c<\/code><code>(cors,<\/code><code>cor<\/code><code>(x,lag))<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>}<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>return<\/code><code>(cors)<\/code><\/div>\n<div><code>}<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The function works in the following manner. For each step there is a distance band. Here, I have specified that the distance bands are 500km. For the first-step, the function calculates the correlation of the variable of interest (x) and its spatially lagged neighbours \u2013 who are defined as having a centre-point within a 500km radius. The next step calculates this correlation for neighbours falling into the 500km to 1,000 km distance band, so there will be a new spatial weighting matrix \\rho. This is an iterative procedure, and continues up until the bands exceed the furthest neighbours in the data frame.<\/p>\n<p>&nbsp;<\/p>\n<div>\n<div id=\"highlighter_1617\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div>1<\/div>\n<div>2<\/div>\n<div>3<\/div>\n<div>4<\/div>\n<div>5<\/div>\n<div>6<\/div>\n<div>7<\/div>\n<div>8<\/div>\n<div>9<\/div>\n<div>10<\/div>\n<div>11<\/div>\n<div>12<\/div>\n<div>13<\/div>\n<div>14<\/div>\n<div>15<\/div>\n<div>16<\/div>\n<div>17<\/div>\n<div>18<\/div>\n<div>19<\/div>\n<div>20<\/div>\n<div>21<\/div>\n<div>22<\/div>\n<div>23<\/div>\n<div>24<\/div>\n<div>25<\/div>\n<div>26<\/div>\n<\/td>\n<td>\n<div>\n<div><code># at 500km<\/code><\/div>\n<div><code>ac1 &lt;- <\/code><code>autocorr<\/code><code>(w=popdists,x=pop$dens,dist=500)<\/code><\/div>\n<div><code># 1000 km<\/code><\/div>\n<div><code>ac2 &lt;- <\/code><code>autocorr<\/code><code>(w=popdists,x=pop$dens,dist=1000)<\/code><\/div>\n<div><\/div>\n<div><code># MC analysis<\/code><\/div>\n<div><code>it &lt;- 1000<\/code><\/div>\n<div><code>mc &lt;- <\/code><code>matrix<\/code><code>(<\/code><code>NA<\/code><code>,nrow=it,ncol=<\/code><code>length<\/code><code>(ac1))<\/code><\/div>\n<div><code>for<\/code><code>(i <\/code><code>in<\/code> <code>1:it){<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>pop$rand &lt;- <\/code><code>sample<\/code><code>(pop$dens,<\/code><code>length<\/code><code>(pop$dens),replace=F)<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>mc[i,] &lt;- <\/code><code>autocorr<\/code><code>(w=popdists,x=pop$rand,dist=500)<\/code><\/div>\n<div><code>}<\/code><\/div>\n<div><\/div>\n<div><code>library<\/code><code>(ggplot2)<\/code><\/div>\n<div><code>ac1 &lt;- <\/code><code>data.frame<\/code><code>(<\/code><code>cbind<\/code><code>(ac1,<\/code><code>seq<\/code><code>(500,20000,500)))<\/code><\/div>\n<div><code>ac1 &lt;- <\/code><code>cbind<\/code><code>(ac1,<\/code><code>t<\/code><code>(<\/code><code>apply<\/code><code>(mc,2,quantile, probs = <\/code><code>c<\/code><code>(0.025,0.975))))<\/code><\/div>\n<div><code>names<\/code><code>(ac1) &lt;- <\/code><code>c<\/code><code>(<\/code><code>\"ac\"<\/code><code>,<\/code><code>\"dist\"<\/code><code>,<\/code><code>\"lci\"<\/code><code>,<\/code><code>\"uci\"<\/code><code>)<\/code><\/div>\n<div><\/div>\n<div><code>ggplot<\/code><code>(ac1, <\/code><code>aes<\/code><code>(dist, ac)) +<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>geom_point<\/code><code>(colour = <\/code><code>\"darkblue\"<\/code><code>, size = 3) +<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>geom_line<\/code><code>(colour = <\/code><code>\"red\"<\/code><code>) +<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>scale_x_continuous<\/code><code>(<\/code><code>'Great Circle Distance (KM)'<\/code><code>,limits=<\/code><code>c<\/code><code>(0,20000)) + <\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>scale_y_continuous<\/code><code>(<\/code><code>'Autocorrelation'<\/code><code>,limits=<\/code><code>c<\/code><code>(-0.4,0.7)) +<\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>theme_bw<\/code><code>() + <\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>geom_hline<\/code><code>(yintercept=0) +\u00a0\u00a0 <\/code><\/div>\n<div><code>\u00a0\u00a0<\/code><code>geom_smooth<\/code><code>(<\/code><code>aes<\/code><code>(ymin = lci, ymax = uci), stat=<\/code><code>\"identity\"<\/code><code>,fill=<\/code><code>\"blue\"<\/code><code>,colour=<\/code><code>\"darkblue\"<\/code><code>) <\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>I also calculated 95% confidence intervals using a simple Monte Carlo technique. This code, and that to generate the nice ggplot2 figure is shown below.<\/p>\n<p>The analysis in the above provides a simple example of how R can be used to generate descriptive spatial statistics. Those interested in implementing spatial methodologies should examine the wide variety of packages on CRAN which can be used for spatial analysis. The \u2018<a href=\"http:\/\/cran.r-project.org\/web\/packages\/spdep\/index.html\" target=\"_blank\" rel=\"noopener\">spdep<\/a>\u2018 package provides most of the essentials. The recent release of the \u2018<a href=\"http:\/\/cran.r-project.org\/web\/packages\/splm\/index.html\" target=\"_blank\" rel=\"noopener\">splm<\/a>\u2018\u00a0package, containing functions for panel regression analysis, is also noteworthy, and potentially very important, contribution.<\/p>\n<p><a href=\"http:\/\/i2.wp.com\/diffuseprior.files.wordpress.com\/2012\/05\/rplot1.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" title=\"rplot1\" alt=\"\" src=\"http:\/\/i2.wp.com\/diffuseprior.files.wordpress.com\/2012\/05\/rplot1.png?w=456\" width=\"456\" height=\"320\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Accounting for temporal dependence in econometric analysis is important, as the presence of temporal dependence violates the assumption that observations are independent units. Historically, much&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-544","post","type-post","status-publish","format-standard","hentry","category-phddissertation"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/544","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=544"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/544\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}