{"id":1111,"date":"2019-11-08T17:07:28","date_gmt":"2019-11-09T00:07:28","guid":{"rendered":"http:\/\/www.zhuoyao.net\/?p=1111"},"modified":"2019-11-08T17:07:28","modified_gmt":"2019-11-09T00:07:28","slug":"exercise-2-nested-logit-model","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2019\/11\/08\/exercise-2-nested-logit-model\/","title":{"rendered":"Exercise 2: Nested logit model"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e2nlogit.html\">https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e2nlogit.html<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Kenneth Train and Yves Croissant<\/h4>\n\n\n\n<h4 class=\"wp-block-heading\">2019-07-22<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The data set <code>HC<\/code> from <code>mlogit<\/code> contains data in <code>R<\/code> format on the choice of heating and central cooling system for 250 single-family, newly built houses in California.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The alternatives are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Gas central heat with cooling <code>gcc<\/code>,<\/li><li>Electric central resistance heat with cooling <code>ecc<\/code>,<\/li><li>Electric room resistance heat with cooling <code>erc<\/code>,<\/li><li>Electric heat pump, which also provides cooling <code>hpc<\/code>,<\/li><li>Gas central heat without cooling <code>gc<\/code>,<\/li><li>Electric central resistance heat without cooling <code>ec<\/code>,<\/li><li>Electric room resistance heat without cooling <code>er<\/code>.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Heat pumps necessarily provide both heating and cooling, so a heat pump without cooling is not an alternative.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The variables are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>depvar<\/code> gives the name of the chosen alternative,<\/li><li><code>ich.alt<\/code> are the 
installation costs for the heating portion of the system,<\/li><li><code>icca<\/code> is the installation cost for cooling,<\/li><li><code>och.alt<\/code> are the operating costs for the heating portion of the system,<\/li><li><code>occa<\/code> is the operating cost for cooling,<\/li><li><code>income<\/code> is the annual income of the household.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Note that the full installation cost of alternative <code>gcc<\/code> is <code>ich.gcc+icca<\/code>, and similarly for the operating cost and for the other alternatives with cooling.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Run a nested logit model on the data with two nests and one log-sum coefficient that applies to both nests. Note that the model is specified to have the cooling alternatives (<code>gcc<\/code>, <code>ecc<\/code>, <code>erc<\/code>, <code>hpc<\/code>) in one nest and the non-cooling alternatives (<code>gc<\/code>, <code>ec<\/code>, <code>er<\/code>) in another nest.<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"mlogit\")\ndata(\"HC\", package = \"mlogit\")\nHC &lt;- mlogit.data(HC, varying = c(2:8, 10:16), choice = \"depvar\", shape = \"wide\")\ncooling.modes &lt;- index(HC)$alt %in% c('gcc', 'ecc', 'erc', 'hpc')\nroom.modes &lt;- index(HC)$alt %in% c('erc', 'er')\n# the cooling costs icca and occa are the same for every alternative,\n# so set them to zero for the alternatives without cooling\nHC$icca[!cooling.modes] &lt;- 0\nHC$occa[!cooling.modes] &lt;- 0\n# create income variables that are non-zero only for the cooling\n# alternatives and only for the room alternatives, respectively\nHC$inc.cooling &lt;- HC$inc.room &lt;- 0\nHC$inc.cooling[cooling.modes] &lt;- HC$income[cooling.modes]\nHC$inc.room[room.modes] &lt;- HC$income[room.modes]\n# create an intercept for the cooling alternatives\nHC$int.cooling &lt;- as.numeric(cooling.modes)\n# estimate the model with only one nest elasticity\nnl &lt;- mlogit(depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | 0, HC,\n             nests = list(cooling = c('gcc','ecc','erc','hpc'), \n             other = c('gc', 'ec', 'er')), 
un.nest.el = TRUE)\nsummary(nl)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room + \n##     inc.cooling + int.cooling | 0, data = HC, nests = list(cooling = c(\"gcc\", \n##     \"ecc\", \"erc\", \"hpc\"), other = c(\"gc\", \"ec\", \"er\")), un.nest.el = TRUE)\n## \n## Frequencies of alternatives:\n##    ec   ecc    er   erc    gc   gcc   hpc \n## 0.004 0.016 0.032 0.004 0.096 0.744 0.104 \n## \n## bfgs method\n## 11 iterations, 0h:0m:0s \n## g'(-H)^-1g = 7.26E-06 \n## successive function values within tolerance limits \n## \n## Coefficients :\n##              Estimate Std. Error z-value  Pr(>|z|)    \n## ich         -0.554878   0.144205 -3.8478 0.0001192 ***\n## och         -0.857886   0.255313 -3.3601 0.0007791 ***\n## icca        -0.225079   0.144423 -1.5585 0.1191212    \n## occa        -1.089458   1.219821 -0.8931 0.3717882    \n## inc.room    -0.378971   0.099631 -3.8038 0.0001425 ***\n## inc.cooling  0.249575   0.059213  4.2149 2.499e-05 ***\n## int.cooling -6.000415   5.562423 -1.0787 0.2807030    \n## iv           0.585922   0.179708  3.2604 0.0011125 ** \n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n## \n## Log-Likelihood: -178.12<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\"><li>The estimated log-sum coefficient is 0.59. What does this estimate tell you about the degree of correlation in unobserved factors over alternatives within each nest?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The correlation is approximately 1 \u2212 0.59 = 0.41. It\u2019s a moderate correlation.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Test the hypothesis that the log-sum coefficient is 1.0 (the value \nthat it takes for a standard logit model). 
Can the hypothesis that the \ntrue model is standard logit be rejected?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>We can use a t-test of the hypothesis that the log-sum coefficient equals 1. The t-statistic is:<\/p><\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>(coef(nl)['iv'] - 1) \/ sqrt(vcov(nl)['iv', 'iv'])<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        iv \n## -2.304171<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The critical value of t for 95% confidence is 1.96. Since the absolute value of the t-statistic, 2.30, exceeds 1.96, we can reject the hypothesis at 95% confidence.<\/p><\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>We can also use a likelihood ratio test, because the multinomial logit is a special case of the nested logit model.<\/p><\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code># First estimate the multinomial logit model by dropping the nests\nml &lt;- update(nl, nests = NULL)\nlrtest(nl, ml)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Likelihood ratio test\n## \n## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n##   #Df  LogLik Df  Chisq Pr(>Chisq)  \n## 1   8 -178.12                       \n## 2   7 -180.29 -1 4.3234    0.03759 *\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Note that the hypothesis is rejected at 95% confidence (p = 0.038), but not at 99% confidence.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Re-estimate the model with the room alternatives in one nest and the\n central alternatives in another nest. 
(Note that a heat pump is a \ncentral system.)<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>nl2 &lt;- update(nl, nests = list(central = c('ec', 'ecc', 'gc', 'gcc', 'hpc'), \n                    room = c('er', 'erc')))\nsummary(nl2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room + \n##     inc.cooling + int.cooling | 0, data = HC, nests = list(central = c(\"ec\", \n##     \"ecc\", \"gc\", \"gcc\", \"hpc\"), room = c(\"er\", \"erc\")), un.nest.el = TRUE)\n## \n## Frequencies of alternatives:\n##    ec   ecc    er   erc    gc   gcc   hpc \n## 0.004 0.016 0.032 0.004 0.096 0.744 0.104 \n## \n## bfgs method\n## 10 iterations, 0h:0m:0s \n## g'(-H)^-1g = 5.87E-07 \n## gradient close to zero \n## \n## Coefficients :\n##              Estimate Std. Error z-value Pr(>|z|)  \n## ich          -1.13818    0.54216 -2.0993  0.03579 *\n## och          -1.82532    0.93228 -1.9579  0.05024 .\n## icca         -0.33746    0.26934 -1.2529  0.21024  \n## occa         -2.06328    1.89726 -1.0875  0.27681  \n## inc.room     -0.75722    0.34292 -2.2081  0.02723 *\n## inc.cooling   0.41689    0.20742  2.0099  0.04444 *\n## int.cooling -13.82487    7.94031 -1.7411  0.08167 .\n## iv            1.36201    0.65393  2.0828  0.03727 *\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n## \n## Log-Likelihood: -180.02<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\"><li>What does the estimate imply about the substitution patterns across alternatives? Do you think the estimate is plausible?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The log-sum coefficient is over 1. This implies that there is more \nsubstitution across nests than within nests. 
I don\u2019t think this is very \nreasonable, but people can differ on their concepts of what\u2019s \nreasonable.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Is the log-sum coefficient significantly different from 1?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The t-statistic is:<\/p><\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>(coef(nl2)['iv'] - 1) \/ sqrt(vcov(nl2)['iv', 'iv'])<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        iv \n## 0.5535849<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>lrtest(nl2, ml)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Likelihood ratio test\n## \n## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n##   #Df  LogLik Df  Chisq Pr(>Chisq)\n## 1   8 -180.02                     \n## 2   7 -180.29 -1 0.5268      0.468<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Neither the t-statistic (0.55) nor the likelihood ratio test (p = 0.468) allows us to reject the hypothesis that the log-sum coefficient equals 1 at standard confidence levels.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>How does the value of the log-likelihood function compare for this \nmodel relative to the model in exercise 1, where the cooling \nalternatives are in one nest and the non-cooling alternatives in the other \nnest?<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>logLik(nl)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## 'log Lik.' -178.1247 (df=8)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>logLik(nl2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## 'log Lik.' -180.0231 (df=8)<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The ln<em>L<\/em> is worse (more negative): \u2212180.02 compared with \u2212178.12 for the model in exercise 1. Both models have the same number of parameters (df = 8), so their log-likelihoods can be compared directly. 
All in all, this seems like a less appropriate nesting structure.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Rerun the model that has the cooling alternatives in one nest and \nthe non-cooling alternatives in the other nest (as in exercise 1), \nwith a separate log-sum coefficient for each nest.<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code># allow a separate log-sum coefficient for each nest\nnl3 &lt;- update(nl, un.nest.el = FALSE)\nsummary(nl3)<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\"><li>Which nest is estimated to have the higher correlation in unobserved\n factors? Can you think of a real-world reason for this nest to have a \nhigher correlation?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The correlation in the cooling nest is around 1 \u2212 0.60 = 0.40 and that \nfor the non-cooling nest is around 1 \u2212 0.45 = 0.55. So the correlation is \nhigher in the non-cooling nest. Perhaps there is more variation in comfort when \nthere is no cooling, and this variation in comfort is shared by all the \nnon-cooling alternatives.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Are the two log-sum coefficients significantly different from each \nother? 
That is, can you reject the hypothesis that the model in exercise 1 is the true model?<\/li><\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>We can use a likelihood ratio test with models <code>nl<\/code> and <code>nl3<\/code>.<\/p><\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>lrtest(nl, nl3)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Likelihood ratio test\n## \n## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | \n##     0\n##   #Df  LogLik Df  Chisq Pr(>Chisq)\n## 1   8 -178.12                     \n## 2   9 -178.04  1 0.1758      0.675<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The restricted model is the one from exercise 1, which has one log-sum \ncoefficient. The unrestricted model is the one we just estimated. The \ntest statistic is 0.1758 (p = 0.675). The critical value of chi-squared with 1 \ndegree of freedom is 3.84 at the 95% confidence level. We therefore \ncannot reject the hypothesis that the two nests have the same log-sum \ncoefficient.<\/p><\/blockquote>\n\n\n\n<ol class=\"wp-block-list\"><li>Rewrite the code to allow three nests. For simplicity, estimate only\n one log-sum coefficient, which is applied to all three nests. Estimate a\n model with alternatives <code>gcc<\/code>, <code>ecc<\/code> and <code>erc<\/code> in a nest, <code>hpc<\/code> in a nest alone, and alternatives <code>gc<\/code>, <code>ec<\/code> and <code>er<\/code> in a nest. 
Does this model seem better or worse than the model in exercise 1, which puts alternative <code>hpc<\/code> in the same nest as alternatives <code>gcc<\/code>, <code>ecc<\/code> and <code>erc<\/code>?<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>nl4 &lt;- update(nl, nests=list(n1 = c('gcc', 'ecc', 'erc'), n2 = c('hpc'),\n                    n3 = c('gc', 'ec', 'er')))\nsummary(nl4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room + \n##     inc.cooling + int.cooling | 0, data = HC, nests = list(n1 = c(\"gcc\", \n##     \"ecc\", \"erc\"), n2 = c(\"hpc\"), n3 = c(\"gc\", \"ec\", \"er\")), \n##     un.nest.el = TRUE)\n## \n## Frequencies of alternatives:\n##    ec   ecc    er   erc    gc   gcc   hpc \n## 0.004 0.016 0.032 0.004 0.096 0.744 0.104 \n## \n## bfgs method\n## 8 iterations, 0h:0m:0s \n## g'(-H)^-1g = 3.71E-08 \n## gradient close to zero \n## \n## Coefficients :\n##               Estimate Std. Error z-value  Pr(>|z|)    \n## ich          -0.838394   0.100546 -8.3384 &lt; 2.2e-16 ***\n## och          -1.331598   0.252069 -5.2827 1.273e-07 ***\n## icca         -0.256131   0.145564 -1.7596   0.07848 .  \n## occa         -1.405656   1.207281 -1.1643   0.24430    \n## inc.room     -0.571352   0.077950 -7.3297 2.307e-13 ***\n## inc.cooling   0.311355   0.056357  5.5247 3.301e-08 ***\n## int.cooling -10.413384   5.612445 -1.8554   0.06354 .  \n## iv            0.956544   0.180722  5.2929 1.204e-07 ***\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n## \n## Log-Likelihood: -180.26<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The ln<em>L<\/em> for this model is \u2212180.26, which is lower (more negative) than the \u2212178.12 obtained by the model with two nests. Since both models have the same number of parameters, this model seems worse than the model in exercise 1.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e2nlogit.html Kenneth Train and Yves Croissant 2019-07-22 The data set HC from mlogit contains data in R format on the choice of heating and central&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-1111","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1111","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=1111"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1111\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=1111"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=1111"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=1111"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}