{"id":226,"date":"2013-06-09T00:45:39","date_gmt":"2013-06-09T05:45:39","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=226"},"modified":"2013-06-09T00:45:39","modified_gmt":"2013-06-09T05:45:39","slug":"frequencies-and-crosstabs","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/06\/09\/frequencies-and-crosstabs\/","title":{"rendered":"Frequencies and Crosstabs"},"content":{"rendered":"<p>This section describes the creation of frequency and contingency tables from categorical variables, along with tests of independence, measures of association, and methods for graphically displaying results.<\/p>\n<h2>Generating Frequency Tables<\/h2>\n<p><strong>R<\/strong>\u00a0provides many methods for creating frequency and contingency tables. Three are described below. In the following examples, assume that A, B, and C represent categorical variables.<\/p>\n<h3>table<\/h3>\n<p>You can generate frequency tables using the\u00a0<strong>table( )<\/strong>\u00a0function, tables of proportions using the\u00a0<strong>prop.table( )<\/strong>\u00a0function, and marginal frequencies using\u00a0<strong>margin.table( )<\/strong>.<\/p>\n<p><code># 2-Way Frequency Table<br \/>\nattach(mydata)<br \/>\nmytable &lt;- table(A,B) # A will be rows, B will be columns<br \/>\nmytable # print table<\/p>\n<p>margin.table(mytable, 1) # A frequencies (summed over B)<br \/>\nmargin.table(mytable, 2) # B frequencies (summed over A)<\/p>\n<p>prop.table(mytable) # cell proportions<br \/>\nprop.table(mytable, 1) # row proportions<br \/>\nprop.table(mytable, 2) # column proportions<\/code><\/p>\n<p><strong>table( )<\/strong>\u00a0can also generate multidimensional tables based on 3 or more categorical variables. 
In this case, use the\u00a0<strong>ftable( )<\/strong>\u00a0function to print the results more attractively.<\/p>\n<p><code># 3-Way Frequency Table<br \/>\nmytable &lt;- table(A, B, C)<br \/>\nftable(mytable)<\/code><\/p>\n<p><strong>table( ) ignores missing values.\u00a0<\/strong>To include\u00a0<strong>NA<\/strong>\u00a0as a category in counts, include the table option exclude=NULL if the variable is a vector. If the variable is a factor you have to create a new factor using newfactor &lt;- factor(oldfactor, exclude=NULL).<\/p>\n<h3>xtabs<\/h3>\n<p>The\u00a0<strong>xtabs( )<\/strong>\u00a0function allows you to create crosstabulations using formula style input.<\/p>\n<p><code># 3-Way Frequency Table<br \/>\nmytable &lt;- xtabs(~A+B+C, data=mydata)<br \/>\nftable(mytable) # print table<br \/>\nsummary(mytable) # chi-square test of independence<\/code><\/p>\n<p>If a variable is included on the left side of the formula, it is assumed to be a vector of frequencies (useful if the data have already been tabulated).<\/p>\n<h3>CrossTable<\/h3>\n<p>The\u00a0<strong>CrossTable( )<\/strong>\u00a0function in the\u00a0<strong><a href=\"http:\/\/cran.r-project.org\/web\/packages\/gmodels\/index.html\">gmodels<\/a><\/strong>\u00a0package produces crosstabulations modeled after PROC FREQ in\u00a0<strong>SAS<\/strong>\u00a0or CROSSTABS in\u00a0<strong>SPSS<\/strong>. 
It has a wealth of options.<\/p>\n<p><code># 2-Way Cross Tabulation<br \/>\nlibrary(gmodels)<br \/>\nCrossTable(mydata$myrowvar, mydata$mycolvar)<\/code><\/p>\n<p>There are options to report percentages (row, column, cell), specify decimal places, produce Chi-square, Fisher, and McNemar tests of independence, report expected and residual values (Pearson, standardized, adjusted standardized), include missing values as valid, annotate with row and column titles, and format as\u00a0<strong>SAS<\/strong>\u00a0or\u00a0<strong>SPSS<\/strong>\u00a0style output!<br \/>\nSee\u00a0<strong>help(CrossTable)<\/strong>\u00a0for details.<\/p>\n<h2>Tests of Independence<\/h2>\n<h3>Chi-Square Test<\/h3>\n<p>For 2-way tables you can use\u00a0<strong>chisq.test(<\/strong><em>mytable<\/em><strong>)<\/strong>\u00a0to test independence of the row and column variables. By default, the p-value is calculated from the asymptotic chi-squared distribution of the test statistic. Optionally, the p-value can be derived via Monte Carlo simulation by setting simulate.p.value=TRUE.<\/p>\n<h3>Fisher Exact Test<\/h3>\n<p><strong>fisher.test(<\/strong><em>x<\/em><strong>)<\/strong>\u00a0provides an exact test of independence.\u00a0<em>x<\/em>\u00a0is a two dimensional contingency table in matrix form.<\/p>\n<h3>Mantel&#8211;Haenszel Test<\/h3>\n<p>Use the\u00a0<strong>mantelhaen.test(<\/strong><em>x<\/em><strong>)<\/strong>\u00a0function to perform a Cochran-Mantel-Haenszel chi-squared test of the null hypothesis that two nominal variables are conditionally independent in each stratum, assuming that there is no three-way interaction.\u00a0<em>x<\/em>\u00a0is a 3 dimensional contingency table, where the last dimension refers to the strata.<\/p>\n<h3>Loglinear Models<\/h3>\n<p>You can use the\u00a0<strong>loglm( )<\/strong>\u00a0function in the\u00a0<strong>MASS<\/strong>\u00a0package to produce log-linear models. 
For example, let&#8217;s assume we have a 3-way contingency table based on variables A, B, and C.<\/p>\n<p><code>library(MASS)<br \/>\nmytable &lt;- xtabs(~A+B+C, data=mydata)<\/code><\/p>\n<p>We can perform the following tests:<\/p>\n<p><strong>Mutual Independence<\/strong>: A, B, and C are pairwise independent. <code>loglm(~A+B+C, mytable)<\/code><\/p>\n<p><strong>Partial Independence<\/strong>: A is partially independent of B and C (i.e., A is independent of the composite variable BC). <code>loglm(~A+B+C+B*C, mytable)<\/code><\/p>\n<p><strong>Conditional Independence:<\/strong>\u00a0A is independent of B, given C. <code>loglm(~A+B+C+A*C+B*C, mytable)<\/code><\/p>\n<p><strong>No Three-Way Interaction<\/strong>: <code>loglm(~A+B+C+A*B+A*C+B*C, mytable)<\/code><\/p>\n<p>Martin Theus and Stephan Lauer have written an excellent article on\u00a0<a href=\"http:\/\/home.vrweb.de\/~martin.theus\/theus.pdf\">Visualizing Loglinear Models<\/a>, using\u00a0<a href=\"http:\/\/www.statmethods.net\/advgraphs\/mosaic.html\">mosaic plots<\/a>. There is also a great tutorial example by Kevin Quinn on\u00a0<a href=\"http:\/\/www.stat.washington.edu\/quinn\/classes\/536\/S\/loglinexample.html\">analyzing loglinear models<\/a>\u00a0via\u00a0<a href=\"http:\/\/www.statmethods.net\/advstats\/glm.html\">glm<\/a>.<\/p>\n<h2>Measures of Association<\/h2>\n<p>The\u00a0<strong>assocstats(<\/strong><em>mytable<\/em><strong>)<\/strong>\u00a0function in the\u00a0<a href=\"http:\/\/cran.r-project.org\/web\/packages\/vcd\/index.html\"><strong>vcd<\/strong><\/a>\u00a0package calculates the phi coefficient, contingency coefficient, and Cramer&#8217;s V for an rxc table. The\u00a0<strong>Kappa(<\/strong><em>mytable<\/em><strong>)<\/strong>\u00a0function in the\u00a0<a href=\"http:\/\/cran.r-project.org\/web\/packages\/vcd\/index.html\"><strong>vcd<\/strong><\/a>\u00a0package calculates Cohen&#8217;s kappa and weighted kappa for a confusion matrix. 
See Richard Darlington&#8217;s article on\u00a0<a href=\"http:\/\/node101.psych.cornell.edu\/Darlington\/crosstab\/TABLE0.HTM\">Measures of Association in Crosstab Tables<\/a>\u00a0for an excellent review of these statistics.<\/p>\n<h2>Visualizing results<\/h2>\n<p>Use\u00a0<a href=\"http:\/\/www.statmethods.net\/graphs\/bar.html\">bar<\/a>\u00a0and\u00a0<a href=\"http:\/\/www.statmethods.net\/graphs\/pie.html\">pie charts<\/a>\u00a0for visualizing frequencies in one dimension.<\/p>\n<p>Use the\u00a0<strong><a href=\"http:\/\/www.statmethods.net\/advgraphs\/mosaic.html\">vcd<\/a><\/strong>\u00a0package for visualizing relationships among categorical data (e.g. mosaic and association plots).<\/p>\n<p>Use the\u00a0<a href=\"http:\/\/www.statmethods.net\/advstats\/ca.html\"><strong>ca<\/strong><\/a>\u00a0package for correspondence analysis (visually exploring relationships between rows and columns in contingency tables).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This section describes the creation of frequency and contingency tables from categorical variables, along with tests of independence, measures of association, and methods for graphically&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-226","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/226","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=226"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/226\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=226"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=226"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}