{"id":1144,"date":"2019-11-08T17:50:08","date_gmt":"2019-11-09T00:50:08","guid":{"rendered":"http:\/\/www.zhuoyao.net\/?p=1144"},"modified":"2022-11-22T06:53:46","modified_gmt":"2022-11-22T06:53:46","slug":"random-utility-model-and-the-multinomial-logit-model","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2019\/11\/08\/random-utility-model-and-the-multinomial-logit-model\/","title":{"rendered":"Random utility model and the multinomial logit model"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Random utility model<\/h2>\n\n\n\n<p>The utility for alternative <em>l<\/em> is written as: <em>U<\/em><em>l<\/em>=<em>V<\/em><em>l<\/em>+<em>\u03f5<\/em><em>l<\/em> where <em>V<\/em><em>l<\/em> is a function of some observable covariates and unknown parameters to be estimated, and <em>\u03f5<\/em><em>l<\/em> is a random deviate which contains all the unobserved determinants of the utility. Alternative <em>l<\/em> is therefore chosen if <em>\u03f5<\/em><em>j<\/em>&lt;(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>j<\/em>)+<em>\u03f5<\/em><em>l<\/em>\u2200<em>j<\/em>\u2260<em>l<\/em><\/p>\n\n\n\n<p> and the probability of choosing this alternative is then:P(<em>\u03f5<\/em>1&lt;<em>V<\/em><em>l<\/em>\u2212<em>V<\/em>1+<em>\u03f5<\/em><em>l<\/em>,<em>\u03f5<\/em>2&lt;<em>V<\/em><em>l<\/em>\u2212<em>V<\/em>2+<em>\u03f5<\/em><em>l<\/em>,&#8230;,<em>\u03f5<\/em><em>J<\/em>&lt;<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>J<\/em>+<em>\u03f5<\/em><em>l<\/em>).<\/p>\n\n\n\n<p>Denoting <em>F<\/em>\u2212<em>l<\/em> the cumulative density function of all the <em>\u03f5<\/em>s except <em>\u03f5<\/em><em>l<\/em><\/p>\n\n\n\n<p>, this probability is:(P<em>l<\/em>\u2223<em>\u03f5<\/em><em>l<\/em>)=<em>F<\/em>\u2212<em>l<\/em>(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em>1+<em>\u03f5<\/em><em>l<\/em>,&#8230;,<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>J<\/em>+<em>\u03f5<\/em><em>l<\/em>).<\/p>\n\n\n\n<p>Note that this probability is conditional on the value of <em>\u03f5<\/em><em>l<\/em>. The unconditional probability (which depends only on <em>\u03b2<\/em>\n and on the value of the observed explanatory variables) is obtained by \nintegrating out the conditional probability using the marginal density \nof <em>\u03f5<\/em><em>l<\/em>, denoted <em>f<\/em><em>l<\/em><\/p>\n\n\n\n<p>\ud83d\ude1b<em>l<\/em>=\u222b<em>F<\/em>\u2212<em>l<\/em>(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em>1+<em>\u03f5<\/em><em>l<\/em>,&#8230;,<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>J<\/em>)+<em>\u03f5<\/em><em>l<\/em>)<em>f<\/em><em>l<\/em>(<em>\u03f5<\/em><em>l<\/em>)<em>d<\/em><em>\u03f5<\/em><em>l<\/em>.<\/p>\n\n\n\n<p>The conditional probability is an integral of dimension <em>J<\/em>\u22121<\/p>\n\n\n\n<p> and the computation of the unconditional probability adds on more dimension of integration.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The distribution of the error terms<\/h2>\n\n\n\n<p>The multinomial logit model (McFadden 1974) is a special case of the model developed in the previous section. It is based on three hypothesis.<\/p>\n\n\n\n<p>The first hypothesis is the independence of the errors. In this case,\n the univariate distribution of the errors can be used, which leads to \nthe following conditional and unconditional probabilities:(P<em>l<\/em>\u2223<em>\u03f5<\/em><em>l<\/em>)=\u220f<em>j<\/em>\u2260<em>l<\/em><em>F<\/em><em>j<\/em>(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>j<\/em>+<em>\u03f5<\/em><em>l<\/em>)&nbsp;and&nbsp;P<em>l<\/em>=\u222b\u220f<em>j<\/em>\u2260<em>l<\/em><em>F<\/em><em>j<\/em>(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>j<\/em>+<em>\u03f5<\/em><em>l<\/em>)<em>f<\/em><em>l<\/em>(<em>\u03f5<\/em><em>l<\/em>)<em>d<\/em><em>\u03f5<\/em><em>l<\/em>,<\/p>\n\n\n\n<p>which means that the conditional probability is the product of <em>J<\/em>\u22121<\/p>\n\n\n\n<p>\n univariate cumulative density functions and the evaluation of only a \none-dimensional integral is required to compute the unconditional \nprobability.<\/p>\n\n\n\n<p>The second hypothesis is that each <em>\u03f5<\/em><\/p>\n\n\n\n<p> follows a Gumbel distribution, whose density and probability functions are respectively:<em>f<\/em>(<em>z<\/em>)=1<em>\u03b8<\/em><em>e<\/em>\u2212<em>z<\/em>\u2212<em>\u03bc<\/em><em>\u03b8<\/em><em>e<\/em>\u2212<em>e<\/em>\u2212<em>z<\/em>\u2212<em>\u03bc<\/em><em>\u03b8<\/em>&nbsp;and&nbsp;<em>F<\/em>(<em>z<\/em>)=\u222b<em>z<\/em>\u2212\u221e<em>f<\/em>(<em>t<\/em>)<em>d<\/em><em>t<\/em>=<em>e<\/em>\u2212<em>e<\/em>\u2212<em>z<\/em>\u2212<em>\u03bc<\/em><em>\u03b8<\/em>,<\/p>\n\n\n\n<p>where <em>\u03bc<\/em> is the location parameter and <em>\u03b8<\/em> the scale parameter. The first two moments of the Gumbel distribution are E(<em>z<\/em>)=<em>\u03bc<\/em>+<em>\u03b8<\/em><em>\u03b3<\/em>, where <em>\u03b3<\/em> is the Euler-Mascheroni constant (\u22480.577) and V(<em>z<\/em>)=<em>\u03c0<\/em>26<em>\u03b8<\/em>2. The mean of <em>\u03f5<\/em><em>j<\/em> is not identified if <em>V<\/em><em>j<\/em> contains an intercept. We can then, without loss of generality suppose that <em>\u03bc<\/em><em>j<\/em>=0,\u2200<em>j<\/em>. Moreover, the overall scale of utility is not identified. Therefore, only <em>J<\/em>\u22121 scale parameters may be identified, and a natural choice of normalization is to impose that one of the <em>\u03b8<\/em><em>j<\/em><\/p>\n\n\n\n<p> is equal to 1.<\/p>\n\n\n\n<p>The last hypothesis is that the errors are identically distributed. \nAs the location parameter is not identified for any error term, this \nhypothesis is essentially an homoscedasticity hypothesis, which means \nthat the scale parameter of the Gumbel distribution is the same for all \nthe alternatives. As one of them has been previously set to 1, we can \ntherefore suppose that, without loss of generality, <em>\u03b8<\/em><em>j<\/em>=1,\u2200<em>j<\/em>\u22081&#8230;<em>J<\/em><\/p>\n\n\n\n<p>. The conditional and unconditional probabilities then further simplify to:(P<em>l<\/em>\u2223<em>\u03f5<\/em><em>l<\/em>)=\u220f<em>j<\/em>\u2260<em>l<\/em><em>e<\/em>\u2212<em>e<\/em>\u2212(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>j<\/em>+<em>\u03f5<\/em><em>l<\/em>)&nbsp;and&nbsp;P<em>l<\/em>=\u222b+\u221e\u2212\u221e\u220f<em>j<\/em>\u2260<em>l<\/em><em>e<\/em>\u2212<em>e<\/em>\u2212(<em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>j<\/em>+<em>t<\/em>)<em>e<\/em>\u2212<em>t<\/em><em>e<\/em>\u2212<em>e<\/em>\u2212<em>t<\/em><em>d<\/em><em>t<\/em>.<\/p>\n\n\n\n<p>The probabilities have then very simple, closed forms, which \ncorrespond to the logit transformation of the deterministic part of the \nutility.<em>P<\/em><em>l<\/em>=<em>e<\/em><em>V<\/em><em>l<\/em>\u2211<em>J<\/em><em>j<\/em>=1<em>e<\/em><em>V<\/em><em>j<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">IIA property<\/h2>\n\n\n\n<p>If we consider the probabilities of choice for two alternatives <em>l<\/em> and <em>m<\/em>, we have <em>P<\/em><em>l<\/em>=<em>e<\/em><em>V<\/em><em>l<\/em>\u2211<em>j<\/em><em>e<\/em><em>V<\/em><em>j<\/em> and <em>P<\/em><em>m<\/em>=<em>e<\/em><em>V<\/em><em>m<\/em>\u2211<em>j<\/em><em>e<\/em><em>V<\/em><em>j<\/em><\/p>\n\n\n\n<p>. The ratio of these two probabilities is:<em>P<\/em><em>l<\/em><em>P<\/em><em>m<\/em>=<em>e<\/em><em>V<\/em><em>l<\/em><em>e<\/em><em>V<\/em><em>m<\/em>=<em>e<\/em><em>V<\/em><em>l<\/em>\u2212<em>V<\/em><em>m<\/em>.<\/p>\n\n\n\n<p>This probability ratio for the two alternatives depends only on the \ncharacteristics of these two alternatives and not on those of other \nalternatives. This is called the IIA property (for independence of \nirrelevant alternatives). IIA relies on the hypothesis that the errors \nare identical and independent. It is not a problem by itself and may \neven be considered as a useful feature for a well specified model. \nHowever, this hypothesis may be in practice violated, especially if some\n important variables are omitted.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Interpretation<\/h2>\n\n\n\n<p>In a linear model, the coefficients are the marginal effects of the \nexplanatory variables on the explained variable. This is not the case \nfor the multinomial logit model. However, meaningful results can be \nobtained using relevant transformations of the coefficients.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Marginal effects<\/h3>\n\n\n\n<p>The marginal effects are the derivatives of the probabilities with \nrespect to the covariates, which can be be choice situation-specific (<em>z<\/em><em>i<\/em>) or alternative specific (<em>x<\/em><em>i<\/em><em>j<\/em><\/p>\n\n\n\n<p>):\u2202<em>P<\/em><em>i<\/em><em>l<\/em>\u2202<em>z<\/em><em>i<\/em>\u2202<em>P<\/em><em>i<\/em><em>l<\/em>\u2202<em>x<\/em><em>i<\/em><em>l<\/em>\u2202<em>P<\/em><em>i<\/em><em>l<\/em>\u2202<em>x<\/em><em>i<\/em><em>k<\/em>===<em>P<\/em><em>i<\/em><em>l<\/em>(<em>\u03b2<\/em><em>l<\/em>\u2212\u2211<em>j<\/em><em>P<\/em><em>i<\/em><em>j<\/em><em>\u03b2<\/em><em>j<\/em>)<em>\u03b3<\/em><em>P<\/em><em>i<\/em><em>l<\/em>(1\u2212<em>P<\/em><em>i<\/em><em>l<\/em>)\u2212<em>\u03b3<\/em><em>P<\/em><em>i<\/em><em>l<\/em><em>P<\/em><em>i<\/em><em>k<\/em>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For a choice situation specific variable, the sign of the \nmarginal effect is not necessarily the sign of the coefficient. \nActually, the sign of the marginal effect is given by (<em>\u03b2<\/em><em>l<\/em>\u2212\u2211<em>j<\/em><em>P<\/em><em>i<\/em><em>j<\/em><em>\u03b2<\/em><em>j<\/em>)<\/li>\n<\/ul>\n\n\n\n<p>, which is positive if the coefficient for alternative <em>l<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n is greater than a weighted average of the coefficients for all the \nalternatives, the weights being the probabilities of choosing the \nalternatives. In this case, the sign of the marginal effect can be \nestablished with no ambiguity only for the alternatives with the lowest \nand the greatest coefficients.<\/li>\n\n\n\n<li>For an alternative-specific variable, the sign of the coefficient\n can be directly interpreted. The marginal effect is obtained by \nmultiplying the coefficient by the product of two probabilities which is\n at most 0.25. The rule of thumb is therefore to divide the coefficient \nby 4 in order to have an upper bound of the marginal effect.<\/li>\n<\/ul>\n\n\n\n<p>Note that the last equation can be rewritten: d<em>P<\/em><em>i<\/em><em>l<\/em>\/<em>P<\/em><em>i<\/em><em>l<\/em>d<em>x<\/em><em>i<\/em><em>k<\/em>=\u2212<em>\u03b3<\/em><em>P<\/em><em>i<\/em><em>k<\/em>. Therefore, when a characteristic of alternative <em>k<\/em> changes, the relative change of the probabilities for every alternatives except <em>k<\/em><\/p>\n\n\n\n<p> are the same, which is a consequence of the IIA property.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Marginal rates of substitution<\/h3>\n\n\n\n<p>Coefficients are marginal utilities, which cannot be interpreted. \nHowever, ratios of coefficients are marginal rates of substitution. For \nexample, if the observable part of utility is: <em>V<\/em>=<em>\u03b2<\/em><em>o<\/em>+<em>\u03b2<\/em>1<em>x<\/em>1+<em>\u03b2<\/em><em>x<\/em>2+<em>\u03b2<\/em><em>x<\/em>3, join variations of <em>x<\/em>1 and <em>x<\/em>2 which ensure the same level of utility are such that: <em>d<\/em><em>V<\/em>=<em>\u03b2<\/em>1<em>d<\/em><em>x<\/em>1+<em>\u03b2<\/em>2<em>d<\/em><em>x<\/em>2=0<\/p>\n\n\n\n<p> so that:\u2212<em>d<\/em><em>x<\/em>2<em>d<\/em><em>x<\/em>1\u2223<em>d<\/em><em>V<\/em>=0=<em>\u03b2<\/em>1<em>\u03b2<\/em>2.<\/p>\n\n\n\n<p>For example, if <em>x<\/em>2 is transport cost (in $), <em>x<\/em>1 transport time (in hours), <em>\u03b2<\/em>1=1.5 and <em>\u03b2<\/em>2=0.2, <em>\u03b2<\/em>1<em>\u03b2<\/em>2=30<\/p>\n\n\n\n<p>\n is the marginal rate of substitution of time in terms of $ and the \nvalue of 30 means that to reduce the travel time of one hour, the \nindividual is willing to pay at most 30$ more. Stated more simply, time \nvalue is 30$ per hour.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Consumer\u2019s surplus<\/h3>\n\n\n\n<p>Consumer\u2019s surplus has a very simple expression for multinomial logit models, which was first derived by Small and Rosen (1981). The level of utility attained by an individual is <em>U<\/em><em>j<\/em>=<em>V<\/em><em>j<\/em>+<em>\u03f5<\/em><em>j<\/em>, <em>j<\/em> being the chosen alternative. The expected utility, from the searcher\u2019s point of view is then: E(max<em>j<\/em><em>U<\/em><em>j<\/em>)<\/p>\n\n\n\n<p>,\n where the expectation is taken over the values of all the error terms. \nIts expression is simply, up to an additive unknown constant, the log of\n the denominator of the logit probabilities, often called the \u201clog-sum\u201d:E(<em>U<\/em>)=ln\u2211<em>j<\/em>=1<em>J<\/em><em>e<\/em><em>V<\/em><em>j<\/em>+<em>C<\/em>.<\/p>\n\n\n\n<p>If the marginal utility of income (<em>\u03b1<\/em>) is known and constant, the expected surplus is simply E(<em>U<\/em>)<em>\u03b1<\/em><\/p>\n\n\n\n<p>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Application<\/h2>\n\n\n\n<p>Random utility models are fitted using the <code>mlogit<\/code> function. Basically, only two arguments are mandatory, <code>formula<\/code> and <code>data<\/code>, if a <code>mlogit.data<\/code> object (and not an ordinary <code>data.frame<\/code>) is provided.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ModeCanada<\/h3>\n\n\n\n<p>We first use the <code>ModeCanada<\/code> data set, which was already coerced to a <code>mlogit.data<\/code> object (called <code>MC<\/code>) in the previous section. The same model can then be estimated using as <code>data<\/code> argument this <code>mlogit.data<\/code> object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"mlogit\")\ndata(\"ModeCanada\", package = \"mlogit\")\nMC &lt;- mlogit.data(ModeCanada, subset = noalt == 4, chid.var = \"case\",\n                  alt.var = \"alt\", drop.index = TRUE)\nml.MC1 &lt;- mlogit(choice ~ cost + freq + ovt | income | ivt, MC)<\/code><\/pre>\n\n\n\n<p>or a <code>data.frame<\/code>. In this latter case, further arguments that will be passed to <code>mlogit.data<\/code> should be indicated:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ml.MC1b &lt;- mlogit(choice ~ cost + freq + ovt | income | ivt, ModeCanada,\nsubset = noalt == 4, alt.var = \"alt\", chid.var = \"case\")<\/code><\/pre>\n\n\n\n<p><code>mlogit<\/code> provides two further useful arguments:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>reflevel<\/code> indicates which alternative is the \u201creference\u201d\n alternative, i.e., the one for which the coefficients of choice \nsituation specific covariates are set to 0,<\/li>\n\n\n\n<li><code>alt.subset<\/code> indicates a subset of alternatives on which \nthe estimation has to be performed; in this case, only the lines that \ncorrespond to the selected alternatives are used and all the choice \nsituations where not selected alternatives has been chosen are removed.<\/li>\n<\/ul>\n\n\n\n<p>We estimate the model on the subset of three alternatives (we exclude <code>bus<\/code> whose market share is negligible in our sample) and we set <code>car<\/code>\n as the reference alternative. Moreover, we use a total transport time \nvariable computed as the sum of the in vehicle and the out of vehicle \ntime variables.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MC$time &lt;- with(MC, ivt + ovt)\nml.MC1 &lt;- mlogit(choice ~ cost + freq | income | time, MC, \nalt.subset = c(\"car\", \"train\", \"air\"), reflevel = \"car\")<\/code><\/pre>\n\n\n\n<p>The main results of the model are computed and displayed using the <code>summary<\/code> method:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>summary(ml.MC1)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## \n## Call:\n## mlogit(formula = choice ~ cost + freq | income | time, data = MC, \n##     alt.subset = c(\"car\", \"train\", \"air\"), reflevel = \"car\", \n##     method = \"nr\")\n## \n## Frequencies of alternatives:\n##     car   train     air \n## 0.45757 0.16721 0.37523 \n## \n## nr method\n## 6 iterations, 0h:0m:0s \n## g'(-H)^-1g = 6.94E-06 \n## successive function values within tolerance limits \n## \n## Coefficients :\n##                      Estimate  Std. Error  z-value  Pr(&gt;|z|)    \n## train:(intercept) -0.97034440  0.26513065  -3.6599 0.0002523 ***\n## air:(intercept)   -1.89856552  0.68414300  -2.7751 0.0055185 ** \n## cost              -0.02849715  0.00655909  -4.3447 1.395e-05 ***\n## freq               0.07402902  0.00473270  15.6420 &lt; 2.2e-16 ***\n## train:income      -0.00646892  0.00310366  -2.0843 0.0371342 *  \n## air:income         0.02824632  0.00365435   7.7295 1.088e-14 ***\n## car:time          -0.01402405  0.00138047 -10.1589 &lt; 2.2e-16 ***\n## train:time        -0.01096877  0.00081834 -13.4036 &lt; 2.2e-16 ***\n## air:time          -0.01755120  0.00399181  -4.3968 1.099e-05 ***\n## ---\n## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n## \n## Log-Likelihood: -1951.3\n## McFadden R^2:  0.31221 \n## Likelihood ratio test : chisq = 1771.6 (p.value = &lt; 2.22e-16)<\/code><\/pre>\n\n\n\n<p>The frequencies of the different alternatives in the sample are first\n indicated. Next, some information about the optimization are displayed:\n the Newton-Ralphson method (with analytic gradient and hessian) is \nused, as it is the most efficient method for this simple model for which\n the log-likelihood function is concave. Note that very few iterations \nand computing time are required to estimate this model. Then the usual \ntable of coefficients is displayed followed by some goodness of fit \nmeasures: the value of the log-likelihood function, which is compared to\n the value when only intercepts are introduced, which leads to the \ncomputation of the McFadden <em>R<\/em>2<\/p>\n\n\n\n<p> and to the likelihood ratio test.<\/p>\n\n\n\n<p>The <code>fitted<\/code> method can be used either to obtain the probability of actual choices (<code>type = \"outcome\"<\/code>) or the probabilities for all the alternatives (<code>type = \"probabilities\"<\/code>).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>head(fitted(ml.MC1, type = \"outcome\"))<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       109       110       111       112       113       114 \n## 0.1909475 0.3399941 0.1470527 0.3399941 0.3399941 0.2440011<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>head(fitted(ml.MC1, type = \"probabilities\"), 4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##           car     train       air\n## 109 0.4206404 0.3884120 0.1909475\n## 110 0.3696476 0.2903582 0.3399941\n## 111 0.4296769 0.4232704 0.1470527\n## 112 0.3696476 0.2903582 0.3399941<\/code><\/pre>\n\n\n\n<p>Note that the log-likelihood is the sum of the log of the fitted \noutcome probabilities and that, as the model contains intercepts, the \naverage fitted probabilities for every alternative equals the market \nshares of the alternatives in the sample.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sum(log(fitted(ml.MC1, type = \"outcome\")))<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## &#91;1] -1951.344<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>logLik(ml.MC1)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## 'log Lik.' -1951.344 (df=9)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>apply(fitted(ml.MC1, type = \"probabilities\"), 2, mean)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       car     train       air \n## 0.4575659 0.1672084 0.3752257<\/code><\/pre>\n\n\n\n<p>Predictions can be made using the <code>predict<\/code> method. If no data is provided, predictions are made for the sample mean values of the covariates.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>predict(ml.MC1)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       car     train       air \n## 0.5066362 0.2116876 0.2816761<\/code><\/pre>\n\n\n\n<p>Assume, for example, that we wish to predict the effect of a reduction of train transport time of 20%. We first create a new <code>data.frame<\/code> simply by multiplying train transport time by 0.8 and then using the <code>predict<\/code> method with this new <code>data.frame<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NMC &lt;- MC\nNMC&#91;index(NMC)$alt == \"train\", \"time\"] &lt;- 0.8 *\nNMC&#91;index(NMC)$alt == \"train\", \"time\"]\nOprob &lt;- fitted(ml.MC1, type = \"probabilities\")\nNprob &lt;- predict(ml.MC1, newdata = NMC)\nrbind(old = apply(Oprob, 2, mean), new = apply(Nprob, 2, mean))<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##           car     train       air\n## old 0.4575659 0.1672084 0.3752257\n## new 0.4044736 0.2635801 0.3319462<\/code><\/pre>\n\n\n\n<p>If, for the first individuals in the sample, we compute the ratio of the probabilities of the air and the car mode, we obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>head(Nprob&#91;, \"air\"] \/ Nprob&#91;, \"car\"])<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       109       110       111       112       113       114 \n## 0.4539448 0.9197791 0.3422401 0.9197791 0.9197791 0.6021092<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>head(Oprob&#91;, \"air\"] \/ Oprob&#91;, \"car\"])<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##       109       110       111       112       113       114 \n## 0.4539448 0.9197791 0.3422401 0.9197791 0.9197791 0.6021092<\/code><\/pre>\n\n\n\n<p>which is an illustration of the IIA property. If train time changes, \nit changes the probabilities of choosing air and car, but not their \nratio.<\/p>\n\n\n\n<p>We next compute the surplus for individuals of the sample induced by \ntrain time reduction. This requires the computation of the log-sum term \n(also called inclusive value or inclusive utility) for every choice \nsituation, which is:iv<em>i<\/em>=ln\u2211<em>j<\/em>=1<em>J<\/em><em>e<\/em><em>\u03b2<\/em>\u22a4<em>x<\/em><em>i<\/em><em>j<\/em>.<\/p>\n\n\n\n<p>For this purpose, we use the <code>logsum<\/code> function, which works on a vector of <code>coefficients<\/code> and a <code>model.matrix<\/code>. The basic use of <code>logsum<\/code> consists on providing as unique argument (called <code>coef<\/code>) a <code>mlogit<\/code> object. In this case, the <code>model.matrix<\/code> and the <code>coef<\/code> are extracted from the same model.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ivbefore &lt;- logsum(ml.MC1)<\/code><\/pre>\n\n\n\n<p>To compute the log-sum after train time reduction, we must provide a <code>model.matrix<\/code> which is not the one corresponding to the fitted model. This can be done using the <code>X<\/code> argument which is a matrix or an object from which a <code>model.matrix<\/code> can be extracted. This can also be done by filling the <code>data<\/code> argument (a <code>data.frame<\/code> or an object from which a <code>data.frame<\/code> can be extracted using a <code>model.frame<\/code> method), and eventually the <code>formula<\/code> argument (a <code>formula<\/code> or an object for which the <code>formula<\/code> method can be applied). If no formula is provided but if <code>data<\/code> is a <code>mlogit.data<\/code> object, the formula is extracted from it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ivafter &lt;- logsum(ml.MC1, data = NMC)<\/code><\/pre>\n\n\n\n<p>Surplus variation is then computed as the difference of the log-sums \ndivided by the opposite of the cost coefficient which can be interpreted\n as the marginal utility of income:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>surplus &lt;- - (ivafter - ivbefore) \/ coef(ml.MC1)&#91;\"cost\"]\nsummary(surplus)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n##  0.5852  2.8439  3.8998  4.6971  5.8437 31.3912<\/code><\/pre>\n\n\n\n<p>Consumer\u2019s surplus variation range from 0.6 to 31 Canadian $, with a median value of about 4$.<\/p>\n\n\n\n<p>Marginal effects are computed using the <code>effects<\/code> method. By default, they are computed at the sample mean, but a <code>data<\/code>\n argument can be provided. The variation of the probability and of the \ncovariate can be either absolute or relative. This is indicated with the\n <code>type<\/code> argument which is a combination of two <code>a<\/code> (as absolute) and <code>r<\/code> (as relative) characters. For example, <code>type = \"ar\"<\/code> means that what is measured is an absolute variation of the probability for a relative variation of the covariate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>effects(ml.MC1, covariate = \"income\", type = \"ar\")<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        car      train        air \n## -0.1822177 -0.1509079  0.3331256<\/code><\/pre>\n\n\n\n<p>The results indicate that, for a 100% increase of income, the probability of choosing <code>air<\/code> increases by 33 points of percentage, as the probabilities of choosing <code>car<\/code> and <code>train<\/code> decrease by 18 and 15 points of percentage.<\/p>\n\n\n\n<p>For an alternative specific covariate, a matrix of marginal effects is displayed.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>effects(ml.MC1, covariate = \"cost\", type = \"rr\")<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##              car      train        air\n## car   -0.9131273  0.9376923  0.9376923\n## train  0.3358005 -1.2505014  0.3358005\n## air    1.2316679  1.2316679 -3.1409703<\/code><\/pre>\n\n\n\n<p>The cell in the <em>l<\/em>th row and the <em>c<\/em>th column indicates the change of the probability of choosing alternative <em>c<\/em> when the cost of alternative <em>l<\/em><\/p>\n\n\n\n<p> changes. As <code>type = \"rr\"<\/code>,\n elasticities are computed. For example, a 10% change of train cost \nincreases the probabilities of choosing car and air by 3.36%. Note that \nthe relative changes of the probabilities of choosing one of these two \nmodes are equal, which is a consequence of the IIA property.<\/p>\n\n\n\n<p>Finally, in order to compute travel time valuation, we divide the \ncoefficients of travel times (in minutes) by the coefficient of monetary\n cost (in $).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>coef(ml.MC1)&#91;grep(\"time\", names(coef(ml.MC1)))] \/\n    coef(ml.MC1)&#91;\"cost\"] * 60 <\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##   car:time train:time   air:time \n##   29.52728   23.09447   36.95360<\/code><\/pre>\n\n\n\n<p>The value of travel time ranges from 23 for train to 37 Canadian $ per hour for plane.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">NOx<\/h3>\n\n\n\n<p>The second example is a data set used by Fowlie (2010), called <code>NOx<\/code>.\n She analyzed the effect of an emissions trading program (the NOx budget\n program which seeks to reduce the emission of nitrogen oxides) on the \nbehavior of producers. More precisely, coal electricity plant managers \nmay adopt one out of fifteen different technologies in order to comply \nto the emissions defined by the program. Some of them require high \ninvestment (the capital cost is <code>kcost<\/code>) and are very \nefficient to reduce emissions, some other require much less investment \nbut are less efficient and the operating cost (denoted <code>vcost<\/code>) is then higher, especially because pollution permits must be purchased to offset emissions exceeding their allocation.<\/p>\n\n\n\n<p>The focus of the paper is on the effects of the regulatory \nenvironment on manager\u2019s behavior. Some firms are deregulated, whereas \nother are either regulated or public. Rate of returns is applied for \nregulated firms, which means that they perceive a \u201cfair\u201d rate of return \non their investment. Public firms also enjoy significant cost of capital\n advantages. Therefore, the main hypothesis of the paper is that public \nand regulated firms will adopt much more capitalistic intensive \ntechnologies than deregulated and public ones, which means that the \ncoefficient of capital cost should take a higher negative value for \nderegulated firms. Capital cost is interacted with the age of the plant \n(measured as a deviation from the sample mean age), as firms should \nweight capital costs more heavily for older plants, as they have less \ntime to recover these costs.<\/p>\n\n\n\n<p>Multinomial logit models are estimated for the three subsamples \ndefined by the regulatory environment. The 15 technologies are not \navailable for every plants, the sample is therefore restricted to \navailable technologies, using the <code>available<\/code> covariate. Three technology dummies are introduced: <code>post<\/code> for post-combustion polution control technology, <code>cm<\/code> for combustion modification technology and <code>lnb<\/code> for low NOx burners technology.<\/p>\n\n\n\n<p>A last model is estimated for the whole sample, but the parameters \nare allowed to be proportional to each other. The scedasticity function \nis described in the fourth part of the formula, it contains here only \none covariate, <code>env<\/code>. Note also that for the last model, the author introduced a specific capital cost coefficient for deregulated firms.<a href=\"https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/c3.rum.html#fn1\"><sup>1<\/sup><\/a><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>data(\"NOx\", package = \"mlogit\")\nNOx$kdereg &lt;- with(NOx, kcost * (env == \"deregulated\"))\nNOxml &lt;- mlogit.data(NOx, chid.var = \"chid\", alt.var = \"alt\",\nid.var = \"id\")\nml.pub &lt;- mlogit(choice ~ post + cm + lnb + vcost + kcost + kcost:age |\n- 1, subset = available &amp; env == \"public\", data = NOxml)\nml.reg &lt;- update(ml.pub, subset = available &amp; env == \"regulated\")\nml.dereg &lt;- update(ml.pub, subset = available &amp; env == \"deregulated\")\nml.pool &lt;- mlogit(choice ~ post + cm + lnb + vcost + kcost + kcost:age +\nkdereg | - 1 | 0 | env, subset = available == 1, data = NOxml,\nmethod = \"bhhh\")<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"texreg\")\nhtmlreg(list(Public = ml.pub, Deregulated = ml.dereg, Regulated = ml.reg,\n             Pooled = ml.pool), caption = \"Environmental compliance choices.\",\n        omit.coef = \"(post)|(cm)|(lnb)\", float.pos = \"hbt\", label = \"tab:nox\")<\/code><\/pre>\n\n\n\n<p>\n&lt;!DOCTYPE HTML PUBLIC \u201c-\/\/W3C\/\/DTD HTML 4.01 Transitional\/\/EN\u201d \u201c<a href=\"http:\/\/www.w3.org\/TR\/html4\/loose.dtd\">http:\/\/www.w3.org\/TR\/html4\/loose.dtd<\/a>\u201d&gt;\n<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><th>\n\n<\/th><th>\n<strong>Public<\/strong>\n<\/th><th>\n<strong>Deregulated<\/strong>\n<\/th><th>\n<strong>Regulated<\/strong>\n<\/th><th>\n<strong>Pooled<\/strong>\n<\/th><\/tr><tr><td>\nvcost\n<\/td><td>\n-1.56<sup>***<\/sup>\n<\/td><td>\n-0.19<sup>***<\/sup>\n<\/td><td>\n-0.28<sup>***<\/sup>\n<\/td><td>\n-0.31<sup>***<\/sup>\n<\/td><\/tr><tr><td>\n<\/td><td>\n(0.36)\n<\/td><td>\n(0.06)\n<\/td><td>\n(0.06)\n<\/td><td>\n(0.04)\n<\/td><\/tr><tr><td>\nkcost\n<\/td><td>\n0.04\n<\/td><td>\n-0.06<sup>**<\/sup>\n<\/td><td>\n0.01\n<\/td><td>\n0.01\n<\/td><\/tr><tr><td>\n<\/td><td>\n(0.11)\n<\/td><td>\n(0.02)\n<\/td><td>\n(0.03)\n<\/td><td>\n(0.02)\n<\/td><\/tr><tr><td>\nkcost:age\n<\/td><td>\n-0.08\n<\/td><td>\n-0.04<sup>**<\/sup>\n<\/td><td>\n-0.02<sup>*<\/sup>\n<\/td><td>\n-0.02<sup>***<\/sup>\n<\/td><\/tr><tr><td>\n<\/td><td>\n(0.04)\n<\/td><td>\n(0.01)\n<\/td><td>\n(0.01)\n<\/td><td>\n(0.01)\n<\/td><\/tr><tr><td>\nkdereg\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n-0.07<sup>***<\/sup>\n<\/td><\/tr><tr><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n(0.01)\n<\/td><\/tr><tr><td>\nsig.envderegulated\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n0.32<sup>**<\/sup>\n<\/td><\/tr><tr><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n(0.12)\n<\/td><\/tr><tr><td>\nsig.envpublic\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n-0.33<sup>***<\/sup>\n<\/td><\/tr><tr><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n<\/td><td>\n(0.08)\n<\/td><\/tr><tr><td>\nAIC\n<\/td><td>\n168.92\n<\/td><td>\n690.15\n<\/td><td>\n731.48\n<\/td><td>\n1634.22\n<\/td><\/tr><tr><td>\nLog Likelihood\n<\/td><td>\n-78.46\n<\/td><td>\n-339.07\n<\/td><td>\n-359.74\n<\/td><td>\n-808.11\n<\/td><\/tr><tr><td>\nNum. obs.\n<\/td><td>\n113\n<\/td><td>\n227\n<\/td><td>\n292\n<\/td><td>\n632\n<\/td><\/tr><tr><td>\n<em><strong>p &lt; 0.001; <\/strong>p &lt; 0.01; <\/em>p &lt; 0.05\n<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Results are presented in the preceeding table, using the <code>texreg<\/code> package (Leifeld 2013).\n Coefficients are very different on the sub-samples defined by the \nregulatory environment. Note in particular that the capital cost \ncoefficient is positive and insignificant for public and regulated \nfirms, as it is significantly negative for deregulated firms. Errors \nseems to have significant larger variance for deregulated firms and \nlower ones for public firms compared to regulated firms. The hypothesis \nthat the coefficients (except the <code>kcost<\/code> one) are identical up to a multiplicative scalar can be performed using a likelihood ratio test:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>stat &lt;- 2 * (logLik(ml.dereg) + logLik(ml.reg) +\n             logLik(ml.pub) - logLik(ml.pool))\nstat<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## 'log Lik.' 61.67179 (df=6)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>pchisq(stat, df = 9, lower.tail = FALSE)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## 'log Lik.' 6.377284e-10 (df=6)<\/code><\/pre>\n\n\n\n<p>The hypothesis is strongly rejected.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bibliography<\/h2>\n\n\n\n<p>Fowlie, Meredith. 2010. \u201cEmissions Trading, Electricity Restructuring, and Investment in Pollution Abatement.\u201d <em>American Economic Review<\/em> 100 (3): 837\u201369. <a href=\"https:\/\/doi.org\/10.1257\/aer.100.3.837\">https:\/\/doi.org\/10.1257\/aer.100.3.837<\/a>.<\/p>\n\n\n\n<p>Leifeld, Philip. 2013. \u201ctexreg: Conversion of Statistical Model Output in R to LaTeX and HTML Tables.\u201d <em>Journal of Statistical Software<\/em> 55 (8): 1\u201324. <a href=\"https:\/\/www.jstatsoft.org\/v55\/i08\/\">https:\/\/www.jstatsoft.org\/v55\/i08\/<\/a>.<\/p>\n\n\n\n<p>McFadden, D. 1974. \u201cThe Measurement of Urban Travel Demand.\u201d <em>Journal of Public Economics<\/em> 3: 303\u201328.<\/p>\n\n\n\n<p>Small, K. A., and H. S. Rosen. 1981. \u201cApplied Welfare Economics with Discrete Choice Models.\u201d <em>Econometrica<\/em> 49: 105\u201330.<\/p>\n\n\n\n<p>Toomet, Ott, and Arne Henningsen. 2010. <em>MaxLik: Maximum Likelihood Estimation<\/em>. <a href=\"https:\/\/CRAN.R-project.org\/package=maxLik\">https:\/\/CRAN.R-project.org\/package=maxLik<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Note the use of the <code>method<\/code> argument, set to <code>bhhh<\/code>. <code>mlogit<\/code> use its own optimisation functions, but borrows its syntax from package <code>maxLik<\/code> (Toomet and Henningsen 2010). The default method is <code>bfgs<\/code>, except for the basic model, for which it is <code>nr<\/code>. As the default algorithm failed to converged, we use here <code>bhhh<\/code>.<a href=\"https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/c3.rum.html#fnref1\">\u21a9<\/a><\/li>\n<\/ol>\n\n\n\n<p><a href=\"https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/c3.rum.html\">https:\/\/cran.r-project.org\/web\/packages\/mlogit\/vignettes\/c3.rum.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Random utility model The utility for alternative l is written as: Ul=Vl+\u03f5l where Vl is a function of some observable covariates and unknown parameters to&hellip; <\/p>\n","protected":false},"author":1,"featured_media":965,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-1144","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=1144"}],"version-history":[{"count":1,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1144\/revisions"}],"predecessor-version":[{"id":1240,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1144\/revisions\/1240"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media\/965"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=1144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=1144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=1144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}