{"id":1146,"date":"2019-11-08T17:51:04","date_gmt":"2019-11-09T00:51:04","guid":{"rendered":"http:\/\/www.zhuoyao.net\/?p=1146"},"modified":"2022-11-22T06:47:27","modified_gmt":"2022-11-22T06:47:27","slug":"exercise-1-multinomial-logit-model","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2019\/11\/08\/exercise-1-multinomial-logit-model\/","title":{"rendered":"Exercise 1: Multinomial logit model"},"content":{"rendered":"\n<p><a href=\"https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e1mlogit.html\">https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e1mlogit.html<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Kenneth Train and Yves Croissant<\/h4>\n\n\n\n<h4 class=\"wp-block-heading\">2019-07-22<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The problem set uses data on choice of heating system in California houses. The data set <code>Heating<\/code> from the <code>mlogit<\/code> package contains the data in <code>R<\/code>\n format. The observations consist of single-family houses in California \nthat were newly built and had central air-conditioning. The choice is \namong heating systems. 
Five types of systems are considered to have been\n possible:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>gas central (<code>gc<\/code>),<\/li>\n\n\n\n<li>gas room (<code>gr<\/code>),<\/li>\n\n\n\n<li>electric central (<code>ec<\/code>),<\/li>\n\n\n\n<li>electric room (<code>er<\/code>),<\/li>\n\n\n\n<li>heat pump (<code>hp<\/code>).<\/li>\n<\/ul>\n\n\n\n<p>There are 900 observations with the following variables:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>idcase<\/code> gives the observation number (1-900),<\/li>\n\n\n\n<li><code>depvar<\/code> identifies the chosen alternative (<code>gc<\/code>, <code>gr<\/code>, <code>ec<\/code>, <code>er<\/code>, <code>hp<\/code>),<\/li>\n\n\n\n<li><code>ic.alt<\/code> is the installation cost for the 5 alternatives,<\/li>\n\n\n\n<li><code>oc.alt<\/code> is the annual operating cost for the 5 alternatives,<\/li>\n\n\n\n<li><code>income<\/code> is the annual income of the household,<\/li>\n\n\n\n<li><code>agehed<\/code> is the age of the household head,<\/li>\n\n\n\n<li><code>rooms<\/code> is the number of rooms in the house,<\/li>\n\n\n\n<li><code>region<\/code> a factor with levels <code>ncostl<\/code> (northern coastal region), <code>scostl<\/code> (southern coastal region), <code>mountn<\/code> (mountain region), <code>valley<\/code> (central valley region).<\/li>\n<\/ul>\n\n\n\n<p>Note that the attributes of the alternatives, namely, installation \ncost and operating cost, take a different value for each alternative. \nTherefore, there are 5 installation costs (one for each of the 5 \nsystems) and 5 operating costs. To estimate the logit model, the \nresearcher needs data on the attributes of all the alternatives, not \njust the attributes for the chosen alternative. 
For example, it is not \nsufficient for the researcher to determine how much was paid for the \nsystem that was actually installed (i.e., the bill for the installation).\n The researcher needs to determine how much it would have cost to \ninstall each of the systems if they had been installed. The importance \nof costs in the choice process (i.e., the coefficients of installation \nand operating costs) is determined through comparison of the costs of \nthe chosen system with the costs of the non-chosen systems.<\/p>\n\n\n\n<p>For these data, the costs were calculated as the amount the system \nwould cost if it were installed in the house, given the characteristics \nof the house (such as size), the price of gas and electricity in the \nhouse location, and the weather conditions in the area (which determine \nthe necessary capacity of the system and the amount it will be run). \nThese costs are conditional on the house having central air-conditioning.\n (That\u2019s why the installation cost of gas central is lower than that for\n gas room: the central system can use the air-conditioning ducts that \nhave been installed.)<\/p>\n\n\n\n<p>In a logit model, each variable takes a different value in each \nalternative. In our case, for example, we want to know the \ncoefficient of installation cost in the logit model of system choice. \nThe variable installation cost in the model actually consists of five \nvariables in the dataset: <code>ic.gc<\/code>, <code>ic.gr<\/code>, <code>ic.ec<\/code>, <code>ic.er<\/code> and <code>ic.hp<\/code>,\n for the installation costs of the five systems. In the current code, \nthere are two variables in the logit model. The first variable is called\n <code>ic<\/code> for installation cost. 
This variable consists of five variables in the dataset: <code>ic.gc<\/code> in the first alternative, <code>ic.gr<\/code> in the second alternative, etc.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Run a model with installation cost and operating cost, without intercepts<\/li>\n\n\n\n<li>Do the estimated coefficients have the expected signs?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"mlogit\")\ndata(\"Heating\", package = \"mlogit\")\nH &lt;- mlogit.data(Heating, shape = \"wide\", choice = \"depvar\", varying = c(3:12))\nm &lt;- mlogit(depvar ~ ic + oc | 0, H)\nsummary(m)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ic + oc | 0, data = H, method = \"nr\")\n## \n## Frequencies of alternatives:\n##       ec       er       gc       gr       hp \n## 0.071111 0.093333 0.636667 0.143333 0.055556 \n## \n## nr method\n## 4 iterations, 0h:0m:0s \n## g'(-H)^-1g = 1.56E-07 \n## gradient close to zero \n## \n## Coefficients :\n##       Estimate  Std. Error z-value  Pr(&gt;|z|)    \n## ic -0.00623187  0.00035277 -17.665 &lt; 2.2e-16 ***\n## oc -0.00458008  0.00032216 -14.217 &lt; 2.2e-16 ***\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n## \n## Log-Likelihood: -1095.2<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Yes, they are negative as expected, meaning that as the cost of a \nsystem rises (and the costs of the other systems remain the same) the \nprobability of that system being chosen falls.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Are both coefficients significantly different from zero?<\/li>\n<\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Yes, the z-statistics (\u221217.7 and \u221214.2) exceed 1.96 in absolute value, which is the critical value at the 95% confidence level.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>How closely do the average probabilities match the shares of customers choosing each alternative?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>apply(fitted(m, outcome = FALSE), 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##         ec         er         gc         gr         hp \n## 0.10413057 0.05141477 0.51695653 0.24030898 0.08718915<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Not very well. 63.67% of the sample chose <code>gc<\/code> (as shown \nat the top of the summary) and yet the estimated model gives an average \nprobability of only 51.70%. The other alternatives are also fairly \npoorly predicted. We will find how to fix this problem in one of the \nmodels below.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The ratio of coefficients usually provides economically meaningful information.<\/li>\n<\/ol>\n\n\n\n<p>The willingness to pay (<em>wtp<\/em>) through higher installation cost for a one-dollar reduction in \noperating costs is the ratio of the operating cost coefficient to the \ninstallation cost coefficient. 
What is the estimated <em>wtp<\/em> from this model? Is it reasonable in magnitude?<\/p>\n\n\n\n<p><em>U<\/em> = <em>\u03b2<\/em><sub>ic<\/sub> <em>ic<\/em> + <em>\u03b2<\/em><sub>oc<\/sub> <em>oc<\/em><\/p>\n\n\n\n<p><em>dU<\/em> = <em>\u03b2<\/em><sub>ic<\/sub> <em>d<\/em>(<em>ic<\/em>) + <em>\u03b2<\/em><sub>oc<\/sub> <em>d<\/em>(<em>oc<\/em>) = 0 \u21d2 \u2212<em>d<\/em>(<em>ic<\/em>)\/<em>d<\/em>(<em>oc<\/em>) \u2223<sub><em>dU<\/em>=0<\/sub> = <em>\u03b2<\/em><sub>oc<\/sub>\/<em>\u03b2<\/em><sub>ic<\/sub><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>coef(m)&#91;\"oc\"]\/coef(m)&#91;\"ic\"]<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        oc \n## 0.7349453<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The model implies that the decision-maker is willing to pay $0.73 \n(i.e., 73 cents) in higher installation cost in order to reduce annual \noperating costs by $1.<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A $1 reduction in annual operating costs recurs each year. It is \nunreasonable to think that the decision-maker is willing to pay \nonly 73 cents as a one-time payment in return for a $1\/year stream of \nsavings. This unreasonable implication is another reason (along with the \ninaccurate average probabilities) to believe this model is not so good. \nWe will find below how the model can be improved.<\/p>\n<\/blockquote>\n\n\n\n<p>We can use the estimated <em>wtp<\/em> to obtain an estimate of the discount rate that is implied by the model\n of choice of heating system. 
The present value of the future \noperating costs is the discounted sum of operating costs over the life \nof the system: <em>PV<\/em> = \u2211<sub><em>t<\/em>=1<\/sub><sup><em>L<\/em><\/sup> <em>OC<\/em>\/(1+<em>r<\/em>)<sup><em>t<\/em><\/sup>, where <em>r<\/em> is the discount rate and <em>L<\/em> is the life of the system. As <em>L<\/em> rises, the <em>PV<\/em> approaches <em>OC<\/em>\/<em>r<\/em>. Therefore, for a system with a sufficiently long life (which we will assume these systems have), a one-dollar reduction in <em>OC<\/em> reduces the present value of future operating costs by (1\/<em>r<\/em>).\n This means that if the person choosing the system were incurring the \ninstallation costs and the operating costs over the life of the system, \nand rationally traded off the two at a discount rate of <em>r<\/em>, the decision-maker\u2019s <em>wtp<\/em> for operating cost reductions would be (1\/<em>r<\/em>). Given this, what value of <em>r<\/em> is implied by the estimated <em>wtp<\/em> that you calculated in part (c)? Is this reasonable?<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>U<\/em> = <em>a<\/em> <em>LC<\/em>,<\/p>\n<\/blockquote>\n\n\n\n<p>where <em>LC<\/em> is lifecycle cost, equal to the sum of installation cost and the present value of operating costs: <em>LC<\/em> = <em>IC<\/em> + (1\/<em>r<\/em>) <em>OC<\/em>. 
Substituting, we have <em>U<\/em> = <em>a<\/em> <em>IC<\/em> + (<em>a<\/em>\/<em>r<\/em>) <em>OC<\/em>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The model estimates <em>a<\/em> as \u22120.00623 and <em>a<\/em>\/<em>r<\/em> as \u22120.00458. So <em>r<\/em> = <em>a<\/em>\/(<em>a<\/em>\/<em>r<\/em>) = \u22120.00623\/\u22120.00458 = 1.36, or a 136% discount rate. This is not reasonable, because it is far too high.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Estimate a model that imposes the constraint that <em>r<\/em> = 0.12 (such that <em>wtp<\/em> = 8.33). Test the hypothesis that <em>r<\/em> = 0.12.<\/li>\n<\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>To impose this constraint, we create a lifecycle cost that embodies the constraint, <em>lcc<\/em> = <em>ic<\/em> + <em>oc<\/em>\/0.12, and estimate the model with this variable.<\/p>\n<\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>H$lcc &lt;- H$ic + H$oc \/ 0.12\nmlcc &lt;- mlogit(depvar ~ lcc | 0, H)\nlibrary(\"lmtest\")\nlrtest(m, mlcc)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Likelihood ratio test\n## \n## Model 1: depvar ~ ic + oc | 0\n## Model 2: depvar ~ lcc | 0\n##   #Df  LogLik Df  Chisq Pr(&gt;Chisq)    \n## 1   2 -1095.2                         \n## 2   1 -1248.7 -1 306.93  &lt; 2.2e-16 ***\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>qchisq(0.05, df = 1, lower.tail = FALSE)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## &#91;1] 3.841459<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>We perform a likelihood ratio test. The ln<em>L<\/em> for the constrained model is \u22121248.7. The ln<em>L<\/em> for the unconstrained model is \u22121095.2. The test statistic is twice the difference in ln<em>L<\/em>: 2(1248.7\u22121095.2) = 307.\n This test is for one restriction (i.e., a restriction on the relation of \nthe coefficient of operating cost to that of installation cost). We \ntherefore compare 307 with the critical value of chi-squared with 1 degree of freedom. This critical value for 95% confidence is 3.84. Since the statistic exceeds the critical value, we reject the hypothesis that <em>r<\/em> = 0.12.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Add alternative-specific constants to the model. With <em>J<\/em> alternatives, at most <em>J<\/em>\u22121 alternative-specific constants can be estimated. The coefficients of the <em>J<\/em>\u22121 constants are interpreted as relative to the <em>J<\/em>th alternative. 
Normalize the constant for the alternative <code>hp<\/code> to 0.<\/li>\n\n\n\n<li>How well do the estimated probabilities match the shares of customers choosing each alternative?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>mc &lt;- mlogit(depvar ~ ic + oc, H, reflevel = 'hp')\nsummary(mc)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ic + oc, data = H, reflevel = \"hp\", \n##     method = \"nr\")\n## \n## Frequencies of alternatives:\n##       hp       ec       er       gc       gr \n## 0.055556 0.071111 0.093333 0.636667 0.143333 \n## \n## nr method\n## 6 iterations, 0h:0m:0s \n## g'(-H)^-1g = 9.58E-06 \n## successive function values within tolerance limits \n## \n## Coefficients :\n##                   Estimate  Std. Error z-value  Pr(&gt;|z|)    \n## ec:(intercept)  1.65884594  0.44841936  3.6993 0.0002162 ***\n## er:(intercept)  1.85343697  0.36195509  5.1206 3.045e-07 ***\n## gc:(intercept)  1.71097930  0.22674214  7.5459 4.485e-14 ***\n## gr:(intercept)  0.30826328  0.20659222  1.4921 0.1356640    \n## ic             -0.00153315  0.00062086 -2.4694 0.0135333 *  \n## oc             -0.00699637  0.00155408 -4.5019 6.734e-06 ***\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n## \n## Log-Likelihood: -1008.2\n## McFadden R^2:  0.013691 \n## Likelihood ratio test : chisq = 27.99 (p.value = 8.3572e-07)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>apply(fitted(mc, outcome = FALSE), 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##         hp         ec         er         gc         gr \n## 0.05555556 0.07111111 0.09333333 0.63666667 0.14333333<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Note that they match exactly: alternative-specific constants in a \nlogit model ensure that the average probabilities equal the observed \nshares.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Calculate the <em>wtp<\/em> and discount rate <em>r<\/em> that are implied by the estimates. Are these reasonable?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>wtp &lt;- coef(mc)&#91;\"oc\"] \/ coef(mc)&#91;\"ic\"]\nwtp<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       oc \n## 4.563385<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>r &lt;- 1 \/ wtp\nr<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        oc \n## 0.2191356<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The decision-maker is willing to pay $4.56 for a $1-per-year stream of savings. This implies <em>r<\/em> = 0.22: the decision-maker applies a 22% discount rate. These results are \ncertainly more reasonable than in the previous model. The decision-maker\n is still estimated to be valuing savings somewhat less than would seem \nrational (i.e., applying a higher discount rate than seems reasonable). \nHowever, we need to remember that the decision-maker here is the \nbuilder. 
If home buyers were perfectly informed, then the builder would \nadopt the buyer\u2019s discount rate. However, the builder would adopt a \nhigher discount rate if home buyers were not perfectly informed about \n(or did not believe) the stream of savings.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>This model contains constants for all alternatives <code>ec<\/code>&#8211;<code>er<\/code>&#8211;<code>gc<\/code>&#8211;<code>gr<\/code>, with the constant for alternative <code>hp<\/code> normalized to zero. Suppose you had included constants for alternatives <code>ec<\/code>&#8211;<code>er<\/code>&#8211;<code>gc<\/code>&#8211;<code>hp<\/code>, with the constant for alternative <code>gr<\/code> normalized to zero. What would be the estimated coefficient of the constant for alternative <code>gc<\/code>? Figure this out logically rather than actually estimating the model.<\/li>\n<\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>We know that when <code>hp<\/code> is left out, the constant for alternative <code>gc<\/code> is 1.71098, meaning that the average impact of unincluded factors is 1.71098 higher for alternative <code>gc<\/code> than for alternative <code>hp<\/code>. Similarly, the constant for alternative <code>gr<\/code> is 0.30826. If <code>gr<\/code> were left out instead of <code>hp<\/code>, then all the constants would be relative to alternative <code>gr<\/code>. The constant for alternative <code>gc<\/code> would then be 1.71098\u22120.30826 = 1.40272. That is, the average impact of unincluded factors is 1.40272 higher for alternative <code>gc<\/code> than for alternative <code>gr<\/code>. Similarly for the other alternatives. 
Note that the constant for alternative <code>hp<\/code> would be 0\u22120.30826 = \u22120.30826, since <code>hp<\/code> is normalized to zero in the model with <code>hp<\/code> left out. Re-estimating with <code>gr<\/code> as the reference level confirms these values.<\/p>\n<\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>update(mc, reflevel = \"gr\")<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ ic + oc, data = H, reflevel = \"gr\",     method = \"nr\")\n## \n## Coefficients:\n## ec:(intercept)  er:(intercept)  gc:(intercept)  hp:(intercept)  \n##      1.3505827       1.5451737       1.4027160      -0.3082633  \n##             ic              oc  \n##     -0.0015332      -0.0069964<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Now try some models with sociodemographic variables entering.<\/li>\n\n\n\n<li>Enter installation cost divided by income, instead of installation \ncost. With this specification, the magnitude of the installation cost \ncoefficient is inversely related to income, such that high-income \nhouseholds are less concerned with installation costs than lower-income \nhouseholds. Does dividing installation cost by income seem to make the \nmodel better or worse?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>mi &lt;- mlogit(depvar ~ oc + I(ic \/ income), H)\nsummary(mi)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = depvar ~ oc + I(ic\/income), data = H, method = \"nr\")\n## \n## Frequencies of alternatives:\n##       ec       er       gc       gr       hp \n## 0.071111 0.093333 0.636667 0.143333 0.055556 \n## \n## nr method\n## 6 iterations, 0h:0m:0s \n## g'(-H)^-1g = 1.03E-05 \n## successive function values within tolerance limits \n## \n## Coefficients :\n##                  Estimate Std. 
Error z-value  Pr(&gt;|z|)    \n## er:(intercept)  0.0639934  0.1944893  0.3290  0.742131    \n## gc:(intercept)  0.0563481  0.4650251  0.1212  0.903555    \n## gr:(intercept) -1.4653063  0.5033845 -2.9109  0.003604 ** \n## hp:(intercept) -1.8700773  0.4364248 -4.2850 1.827e-05 ***\n## oc             -0.0071066  0.0015518 -4.5797 4.657e-06 ***\n## I(ic\/income)   -0.0027658  0.0018944 -1.4600  0.144298    \n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n## \n## Log-Likelihood: -1010.2\n## McFadden R^2:  0.011765 \n## Likelihood ratio test : chisq = 24.052 (p.value = 5.9854e-06)<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The model seems to get worse. The ln<em>L<\/em> is lower (\u22121010.2 versus \u22121008.2) and the coefficient on installation cost becomes insignificant (|z| = 1.46, below 1.96).<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Instead of dividing installation cost by income, enter \nalternative-specific income effects. What do the estimates imply about \nthe impact of income on the choice of central systems versus room \nsystems? Do these income terms enter significantly?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>mi2 &lt;- mlogit(depvar ~ oc + ic | income, H, reflevel = \"hp\")<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The model implies that as income rises, the probability of heat pump \nrises relative to all the others (since income in the heat pump alternative is \nnormalized to zero, and the others enter with negative signs such that \nthey are lower than that for heat pumps). 
Also, as income rises, the \nprobability of gas room drops relative to the other non-heat-pump \nsystems (since its income coefficient is the most negative).<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Do these income terms enter significantly? No: the likelihood ratio, Wald, and score tests below all give p-values above 0.3. It seems that income \ndoesn\u2019t really have an effect. Maybe this is because income is for the \nfamily that lives in the house, whereas the builder made the decision about \nwhich system to install.<\/p>\n<\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>lrtest(mc, mi2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Likelihood ratio test\n## \n## Model 1: depvar ~ ic + oc\n## Model 2: depvar ~ oc + ic | income\n##   #Df  LogLik Df  Chisq Pr(&gt;Chisq)\n## 1   6 -1008.2                     \n## 2  10 -1005.9  4 4.6803     0.3217<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>waldtest(mc, mi2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Wald test\n## \n## Model 1: depvar ~ ic + oc\n## Model 2: depvar ~ oc + ic | income\n##   Res.Df Df  Chisq Pr(&gt;Chisq)\n## 1    894                     \n## 2    890  4 4.6456     0.3256<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>scoretest(mc, mi2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n##  score test\n## \n## data:  depvar ~ oc + ic | income\n## chisq = 4.6761, df = 4, p-value = 0.3222\n## alternative hypothesis: unconstrained model<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Try other models. 
Determine which model you think is best from these data.<\/li>\n<\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>I\u2019m not going to give what I consider my best model: your ideas on what\u2019s best are what matter here.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We now consider the use of the logit model for prediction. 
\nEstimate a model with installation costs, operating costs, and \nalternative-specific constants. Calculate the probabilities for each \nhouse explicitly. Check to be sure that the mean probabilities are the \nsame as you got in exercise 4.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code># design matrix: one row per (house, alternative) pair\nX &lt;- model.matrix(mc)\nalt &lt;- index(H)$alt\nchid &lt;- index(H)$chid\n# exponentiated systematic utility of each row\neXb &lt;- as.numeric(exp(X %*% coef(mc)))\n# logit denominator: sum over the alternatives of each house\nSeXb &lt;- tapply(eXb, chid, sum)\nP &lt;- eXb \/ SeXb&#91;chid]\n# one row per house, one column per alternative\nP &lt;- matrix(P, ncol = 5, byrow = TRUE)\nhead(P)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##            &#91;,1]       &#91;,2]      &#91;,3]      &#91;,4]       &#91;,5]\n## &#91;1,] 0.05107444 0.07035738 0.6329116 0.1877416 0.05791494\n## &#91;2,] 0.04849337 0.06420595 0.6644519 0.1558322 0.06701658\n## &#91;3,] 0.07440281 0.08716904 0.6387765 0.1439919 0.05565974\n## &#91;4,] 0.07264503 0.11879833 0.5657376 0.1879231 0.05489595\n## &#91;5,] 0.09223575 0.10238514 0.5670663 0.1561227 0.08219005\n## &#91;6,] 0.09228184 0.10466584 0.6366615 0.1152634 0.05112739<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>apply(P, 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## &#91;1] 0.07111111 0.09333333 0.63666666 0.14333334 0.05555556<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This can be computed much more simply using the <code>fitted<\/code> function, with the <code>outcome<\/code> argument set to <code>FALSE<\/code> so that the probabilities for all the alternatives (and not only the chosen one) are returned.<\/p>\n<\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>apply(fitted(mc, outcome = FALSE), 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##         hp         ec         er         gc         gr \n## 0.05555556 0.07111111 0.09333333 0.63666667 0.14333333<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The California Energy Commission (CEC) 
is considering whether to \noffer rebates on heat pumps. The CEC wants to predict the effect of the \nrebates on the heating system choices of customers in California. The \nrebates will be set at 10% of the installation cost. Using the estimated\n coefficients from the model in exercise 6, calculate new probabilities \nand predicted shares using the new installation cost of heat pumps. How \nmuch do the rebates raise the share of houses with heat pumps?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>Hn &lt;- H\n# apply the 10% rebate to the heat pump installation cost only\nHn&#91;Hn$alt == \"hp\", \"ic\"] &lt;- 0.9 * Hn&#91;Hn$alt == \"hp\", \"ic\"]\napply(predict(mc, newdata = Hn), 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##         hp         ec         er         gc         gr \n## 0.06446230 0.07045486 0.09247026 0.63064443 0.14196814<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>We estimate the model with the actual costs. Then we change the costs\n and calculate probabilities with the new costs. The average probability\n is the predicted share for an alternative. At the original costs, the \nheat pump share is 0.0556 (i.e., about 5.6%). This share is predicted to rise to 0.0645 (about 6.4%) when rebates are given, an increase of roughly 0.9 percentage points.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Suppose a new technology is developed that provides more efficient \ncentral heating. The new technology costs $200 more than the central \nelectric system. However, it saves 25% of the electricity, such that its\n operating costs are 75% of the operating costs of <code>ec<\/code>. We \nwant to predict the potential market penetration of this technology. \nNote that there are now six alternatives: the original five alternatives\n plus this new one. 
Calculate the probability and predict the market \nshare (i.e., the average probability) for all six alternatives, using \nthe model that is estimated on the original five alternatives. (Be sure \nto use the original installation cost for heat pumps, rather than the \nreduced cost in exercise 7.) What is the predicted market share for the \nnew technology? From which of the original five systems does the new \ntechnology draw the most customers?<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>X &lt;- model.matrix(mc)\n# start from the ec rows, so the new technology inherits the ec constant\nXn &lt;- X&#91;alt == \"ec\",]\n# $200 higher installation cost, 25% lower operating cost\nXn&#91;, \"ic\"] &lt;- Xn&#91;, \"ic\"] + 200\nXn&#91;, \"oc\"] &lt;- Xn&#91;, \"oc\"] * 0.75\nunchid &lt;- unique(index(H)$chid)\nrownames(Xn) &lt;- paste(unchid, 'new', sep = \".\")\nchidb &lt;- c(chid, unchid)\n# append the new rows and regroup so each house has six consecutive rows\nX &lt;- rbind(X, Xn)\nX &lt;- X&#91;order(chidb), ]\neXb &lt;- as.numeric(exp(X %*% coef(mc)))\nSeXb &lt;- as.numeric(tapply(eXb, sort(chidb), sum))\nP &lt;- eXb \/ SeXb&#91;sort(chidb)]\nP &lt;- matrix(P, ncol = 6, byrow = TRUE)\napply(P, 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## &#91;1] 0.06311578 0.08347713 0.57145108 0.12855080 0.04977350\n## &#91;6] 0.10363170<\/code><\/pre>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The new technology captures a market share of 0.1036. That is, it gets slightly more than ten percent of the market.<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>It draws the same percent (about 10%) from each system. This means \nthat it draws the most in absolute terms from the most popular system, \ngas central. For example, gas central drops from 0.637 to 0.571; this is an absolute drop of 0.6367\u22120.5715 = 0.0652 and a percent drop of 0.0652\/0.6367, or about 10%. 
Of the 10.36 percentage points of market share attained by the new \ntechnology, about 6.5 come from gas central. The other systems drop by\n about the same percent, which is less in absolute terms.<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The same percent drop for all systems is a consequence of the independence of irrelevant alternatives (IIA) \nproperty of logit. To me, this property seems unreasonable in this \napplication. The new technology is a type of electric system. It seems \nreasonable that it would draw more from other electric systems than from\n gas systems. Models like nested logit, probit, and mixed logit allow \nmore flexible and, in this case, more realistic substitution patterns.<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/e1mlogit.html Kenneth Train and Yves Croissant 2019-07-22 There are 900 observations with the following variables: Note that the attributes of the alternatives, namely, installation cost&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":967,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-1146","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=1146"}],"version-history":[{"count":1,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1146\/revisions"}],"predecessor-version":[{"id":1237,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1146\/revisions\/1237"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media\/967"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=1146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=1146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=1146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}