{"id":141,"date":"2013-03-31T23:45:53","date_gmt":"2013-04-01T04:45:53","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=141"},"modified":"2013-03-31T23:45:53","modified_gmt":"2013-04-01T04:45:53","slug":"reshaping-and-pivot-tables","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/03\/31\/reshaping-and-pivot-tables\/","title":{"rendered":"Reshaping and Pivot Tables"},"content":{"rendered":"<h2>Reshaping by pivoting DataFrame objects<\/h2>\n<p>Data is often stored in CSV files or databases in so-called \u201cstacked\u201d or \u201crecord\u201d format:<\/p>\n<div>\n<div>\n<pre>In [1450]: df\nOut[1450]: \n                  date variable     value\n0  2000-01-03 00:00:00        A  0.469112\n1  2000-01-04 00:00:00        A -0.282863\n2  2000-01-05 00:00:00        A -1.509059\n3  2000-01-03 00:00:00        B -1.135632\n4  2000-01-04 00:00:00        B  1.212112\n5  2000-01-05 00:00:00        B -0.173215\n6  2000-01-03 00:00:00        C  0.119209\n7  2000-01-04 00:00:00        C -1.044236\n8  2000-01-05 00:00:00        C -0.861849\n9  2000-01-03 00:00:00        D -2.104569\n10 2000-01-04 00:00:00        D -0.494929\n11 2000-01-05 00:00:00        D  1.071804<\/pre>\n<\/div>\n<\/div>\n<p>For the curious here is how the above DataFrame was created:<\/p>\n<div>\n<div>\n<pre>import pandas.util.testing as tm; tm.N = 3\ndef unpivot(frame):\n    N, K = frame.shape\n    data = {'value' : frame.values.ravel('F'),\n            'variable' : np.asarray(frame.columns).repeat(N),\n            'date' : np.tile(np.asarray(frame.index), K)}\n    return DataFrame(data, columns=['date', 'variable', 'value'])\ndf = unpivot(tm.makeTimeDataFrame())<\/pre>\n<\/div>\n<\/div>\n<p>To select out everything for variable\u00a0<tt>A<\/tt>\u00a0we could do:<\/p>\n<div>\n<div>\n<pre>In [1451]: df[df['variable'] == 'A']\nOut[1451]: \n                 date variable     value\n0 2000-01-03 00:00:00        A  0.469112\n1 2000-01-04 00:00:00        A 
-0.282863\n2 2000-01-05 00:00:00        A -1.509059<\/pre>\n<\/div>\n<\/div>\n<p>But suppose we wish to do time series operations with the variables. A better representation would be one where the\u00a0<tt>columns<\/tt>\u00a0are the unique variables and an\u00a0<tt>index<\/tt>\u00a0of dates identifies individual observations. To reshape the data into this form, use the\u00a0<tt>pivot<\/tt>\u00a0function:<\/p>\n<div>\n<div>\n<pre>In [1452]: df.pivot(index='date', columns='variable', values='value')\nOut[1452]: \nvariable           A         B         C         D\ndate                                              \n2000-01-03  0.469112 -1.135632  0.119209 -2.104569\n2000-01-04 -0.282863  1.212112 -1.044236 -0.494929\n2000-01-05 -1.509059 -0.173215 -0.861849  1.071804<\/pre>\n<\/div>\n<\/div>\n<p>If the\u00a0<tt>values<\/tt>\u00a0argument is omitted, and the input DataFrame has more than one column of values that are not used as column or index inputs to\u00a0<tt>pivot<\/tt>, then the resulting \u201cpivoted\u201d DataFrame will have\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing-hierarchical\"><em>hierarchical columns<\/em><\/a>\u00a0whose topmost level indicates the respective value column:<\/p>\n<div>\n<div>\n<pre>In [1453]: df['value2'] = df['value'] * 2\n\nIn [1454]: pivoted = df.pivot('date', 'variable')\n\nIn [1455]: pivoted\nOut[1455]: \n               value                                  value2                      \\\nvariable           A         B         C         D         A         B         C   \ndate                                                                               \n2000-01-03  0.469112 -1.135632  0.119209 -2.104569  0.938225 -2.271265  0.238417   \n2000-01-04 -0.282863  1.212112 -1.044236 -0.494929 -0.565727  2.424224 -2.088472   \n2000-01-05 -1.509059 -0.173215 -0.861849  1.071804 -3.018117 -0.346429 -1.723698   \n\nvariable           D  \ndate                  \n2000-01-03 -4.209138  \n2000-01-04 -0.989859  
\n2000-01-05  2.143608<\/pre>\n<\/div>\n<\/div>\n<p>Of course, you can then select subsets from the pivoted DataFrame:<\/p>\n<div>\n<div>\n<pre>In [1456]: pivoted['value2']\nOut[1456]: \nvariable           A         B         C         D\ndate                                              \n2000-01-03  0.938225 -2.271265  0.238417 -4.209138\n2000-01-04 -0.565727  2.424224 -2.088472 -0.989859\n2000-01-05 -3.018117 -0.346429 -1.723698  2.143608<\/pre>\n<\/div>\n<\/div>\n<p>Note that this returns a view on the underlying data in the case where the data are homogeneously-typed.<\/p>\n<div id=\"reshaping-by-stacking-and-unstacking\">\n<h2>Reshaping by stacking and unstacking<\/h2>\n<p>Closely related to the\u00a0<tt>pivot<\/tt>\u00a0function are the\u00a0<tt>stack<\/tt>\u00a0and\u00a0<tt>unstack<\/tt>\u00a0functions, currently available on Series and DataFrame. These functions are designed to work together with\u00a0<tt>MultiIndex<\/tt>\u00a0objects (see the section on\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing-hierarchical\"><em>hierarchical indexing<\/em><\/a>). Here is essentially what these functions do:<\/p>\n<blockquote>\n<ul>\n<li><tt>stack<\/tt>: \u201cpivot\u201d a level of the (possibly hierarchical) column labels, returning a DataFrame whose index has a new inner-most level of row labels.<\/li>\n<li><tt>unstack<\/tt>: the inverse operation of\u00a0<tt>stack<\/tt>: \u201cpivot\u201d a level of the (possibly hierarchical) row index to the column axis, producing a reshaped DataFrame with a new inner-most level of column labels.<\/li>\n<\/ul>\n<\/blockquote>\n<p>The clearest way to explain is by example. 
Let\u2019s take a prior example data set from the hierarchical indexing section:<\/p>\n<div>\n<div>\n<pre>In [1457]: tuples = zip(*[['bar', 'bar', 'baz', 'baz',\n   ......:                 'foo', 'foo', 'qux', 'qux'],\n   ......:                ['one', 'two', 'one', 'two',\n   ......:                 'one', 'two', 'one', 'two']])\n   ......:\n\nIn [1458]: index = MultiIndex.from_tuples(tuples, names=['first', 'second'])\n\nIn [1459]: df = DataFrame(randn(8, 2), index=index, columns=['A', 'B'])\n\nIn [1460]: df2 = df[:4]\n\nIn [1461]: df2\nOut[1461]: \n                     A         B\nfirst second                    \nbar   one     0.721555 -0.706771\n      two    -1.039575  0.271860\nbaz   one    -0.424972  0.567020\n      two     0.276232 -1.087401<\/pre>\n<\/div>\n<\/div>\n<p>The\u00a0<tt>stack<\/tt>\u00a0function \u201ccompresses\u201d a level in the DataFrame\u2019s columns to produce either:<\/p>\n<blockquote>\n<ul>\n<li>A Series, in the case of a simple column Index<\/li>\n<li>A DataFrame, in the case of a\u00a0<tt>MultiIndex<\/tt>\u00a0in the columns<\/li>\n<\/ul>\n<\/blockquote>\n<p>If the columns have a\u00a0<tt>MultiIndex<\/tt>, you can choose which level to stack. 
The stacked level becomes the new lowest level in a\u00a0<tt>MultiIndex<\/tt>\u00a0on the columns:<\/p>\n<div>\n<div>\n<pre>In [1462]: stacked = df2.stack()\n\nIn [1463]: stacked\nOut[1463]: \nfirst  second   \nbar    one     A    0.721555\n               B   -0.706771\n       two     A   -1.039575\n               B    0.271860\nbaz    one     A   -0.424972\n               B    0.567020\n       two     A    0.276232\n               B   -1.087401\ndtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>With a \u201cstacked\u201d DataFrame or Series (having a\u00a0<tt>MultiIndex<\/tt>\u00a0as the\u00a0<tt>index<\/tt>), the inverse operation of\u00a0<tt>stack<\/tt>\u00a0is\u00a0<tt>unstack<\/tt>, which by default unstacks the\u00a0<strong>last level<\/strong>:<\/p>\n<div>\n<div>\n<pre>In [1464]: stacked.unstack()\nOut[1464]: \n                     A         B\nfirst second                    \nbar   one     0.721555 -0.706771\n      two    -1.039575  0.271860\nbaz   one    -0.424972  0.567020\n      two     0.276232 -1.087401\n\nIn [1465]: stacked.unstack(1)\nOut[1465]: \nsecond        one       two\nfirst                      \nbar   A  0.721555 -1.039575\n      B -0.706771  0.271860\nbaz   A -0.424972  0.276232\n      B  0.567020 -1.087401\n\nIn [1466]: stacked.unstack(0)\nOut[1466]: \nfirst          bar       baz\nsecond                      \none    A  0.721555 -0.424972\n       B -0.706771  0.567020\ntwo    A -1.039575  0.276232\n       B  0.271860 -1.087401<\/pre>\n<\/div>\n<\/div>\n<p id=\"reshaping-unstack-by-name\">If the indexes have names, you can use the level names instead of specifying the level numbers:<\/p>\n<div>\n<div>\n<pre>In [1467]: stacked.unstack('second')\nOut[1467]: \nsecond        one       two\nfirst                      \nbar   A  0.721555 -1.039575\n      B -0.706771  0.271860\nbaz   A -0.424972  0.276232\n      B  0.567020 -1.087401<\/pre>\n<\/div>\n<\/div>\n<p>You may also stack or unstack more than one level at a time by passing a list of levels, 
in which case the end result is as if each level in the list were processed individually.<\/p>\n<p>These functions are intelligent about handling missing data and do not expect each subgroup within the hierarchical index to have the same set of labels. They also can handle the index being unsorted (but you can make it sorted by calling\u00a0<tt>sortlevel<\/tt>, of course). Here is a more complex example:<\/p>\n<div>\n<div>\n<pre>In [1468]: columns = MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),\n   ......:                                   ('B', 'cat'), ('A', 'dog')],\n   ......:                                  names=['exp', 'animal'])\n   ......:\n\nIn [1469]: df = DataFrame(randn(8, 4), index=index, columns=columns)\n\nIn [1470]: df2 = df.ix[[0, 1, 2, 4, 5, 7]]\n\nIn [1471]: df2\nOut[1471]: \nexp                  A         B                   A\nanimal             cat       dog       cat       dog\nfirst second                                        \nbar   one    -0.370647 -1.157892 -1.344312  0.844885\n      two     1.075770 -0.109050  1.643563 -1.469388\nbaz   one     0.357021 -0.674600 -1.776904 -0.968914\nfoo   one    -0.013960 -0.362543 -0.006154 -0.923061\n      two     0.895717  0.805244 -1.206412  2.565646\nqux   two     0.410835  0.813850  0.132003 -0.827317<\/pre>\n<\/div>\n<\/div>\n<p>As mentioned above,\u00a0<tt>stack<\/tt>\u00a0can be called with a\u00a0<tt>level<\/tt>\u00a0argument to select which level in the columns to stack:<\/p>\n<div>\n<div>\n<pre>In [1472]: df2.stack('exp')\nOut[1472]: \nanimal                 cat       dog\nfirst second exp                    \nbar   one    A   -0.370647  0.844885\n             B   -1.344312 -1.157892\n      two    A    1.075770 -1.469388\n             B    1.643563 -0.109050\nbaz   one    A    0.357021 -0.968914\n             B   -1.776904 -0.674600\nfoo   one    A   -0.013960 -0.923061\n             B   -0.006154 -0.362543\n      two    A    0.895717  2.565646\n             B   -1.206412  
0.805244\nqux   two    A    0.410835 -0.827317\n             B    0.132003  0.813850\n\nIn [1473]: df2.stack('animal')\nOut[1473]: \nexp                         A         B\nfirst second animal                    \nbar   one    cat    -0.370647 -1.344312\n             dog     0.844885 -1.157892\n      two    cat     1.075770  1.643563\n             dog    -1.469388 -0.109050\nbaz   one    cat     0.357021 -1.776904\n             dog    -0.968914 -0.674600\nfoo   one    cat    -0.013960 -0.006154\n             dog    -0.923061 -0.362543\n      two    cat     0.895717 -1.206412\n             dog     2.565646  0.805244\nqux   two    cat     0.410835  0.132003\n             dog    -0.827317  0.813850<\/pre>\n<\/div>\n<\/div>\n<p>Unstacking when the columns are a\u00a0<tt>MultiIndex<\/tt>\u00a0is also careful about doing the right thing:<\/p>\n<div>\n<div>\n<pre>In [1474]: df[:3].unstack(0)\nOut[1474]: \nexp            A                   B                                     A          \nanimal       cat                 dog               cat                 dog          \nfirst        bar       baz       bar     baz       bar       baz       bar       baz\nsecond                                                                              \none    -0.370647  0.357021 -1.157892 -0.6746 -1.344312 -1.776904  0.844885 -0.968914\ntwo     1.075770       NaN -0.109050     NaN  1.643563       NaN -1.469388       NaN\n\nIn [1475]: df2.unstack(1)\nOut[1475]: \nexp            A                   B                                       A          \nanimal       cat                 dog                 cat                 dog          \nsecond       one       two       one       two       one       two       one       two\nfirst                                                                                 \nbar    -0.370647  1.075770 -1.157892 -0.109050 -1.344312  1.643563  0.844885 -1.469388\nbaz     0.357021       NaN -0.674600       NaN -1.776904       NaN -0.968914       
NaN\nfoo    -0.013960  0.895717 -0.362543  0.805244 -0.006154 -1.206412 -0.923061  2.565646\nqux          NaN  0.410835       NaN  0.813850       NaN  0.132003       NaN -0.827317<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"reshaping-by-melt\">\n<h2>Reshaping by Melt<\/h2>\n<p>The\u00a0<tt>melt<\/tt>\u00a0function found in\u00a0<tt>pandas.core.reshape<\/tt>\u00a0is useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are \u201cpivoted\u201d to the row axis, leaving just two non-identifier columns, \u201cvariable\u201d and \u201cvalue\u201d.<\/p>\n<p>For instance,<\/p>\n<div>\n<div>\n<pre>In [1476]: cheese = DataFrame({'first' : ['John', 'Mary'],\n   ......:                     'last' : ['Doe', 'Bo'],\n   ......:                     'height' : [5.5, 6.0],\n   ......:                     'weight' : [130, 150]})\n   ......:\n\nIn [1477]: cheese\nOut[1477]: \n  first  height last  weight\n0  John     5.5  Doe     130\n1  Mary     6.0   Bo     150\n\nIn [1478]: melt(cheese, id_vars=['first', 'last'])\nOut[1478]: \n  first last variable  value\n0  John  Doe   height    5.5\n1  Mary   Bo   height    6.0\n2  John  Doe   weight  130.0\n3  Mary   Bo   weight  150.0<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"combining-with-stats-and-groupby\">\n<h2>Combining with stats and GroupBy<\/h2>\n<p>It should be no shock that combining\u00a0<tt>pivot<\/tt>\u00a0\/\u00a0<tt>stack<\/tt>\u00a0\/\u00a0<tt>unstack<\/tt>\u00a0with GroupBy and the basic Series and DataFrame statistical functions can produce some very expressive and fast data manipulations.<\/p>\n<div>\n<div>\n<pre>In [1479]: df\nOut[1479]: \nexp                  A         B                   A\nanimal             cat       dog       cat       dog\nfirst second                                        \nbar   one    -0.370647 -1.157892 -1.344312  0.844885\n      two     1.075770 -0.109050  1.643563 -1.469388\nbaz   one     
0.357021 -0.674600 -1.776904 -0.968914\n      two    -1.294524  0.413738  0.276662 -0.472035\nfoo   one    -0.013960 -0.362543 -0.006154 -0.923061\n      two     0.895717  0.805244 -1.206412  2.565646\nqux   one     1.431256  1.340309 -1.170299 -0.226169\n      two     0.410835  0.813850  0.132003 -0.827317\n\nIn [1480]: df.stack().mean(1).unstack()\nOut[1480]: \nanimal             cat       dog\nfirst second                    \nbar   one    -0.857479 -0.156504\n      two     1.359666 -0.789219\nbaz   one    -0.709942 -0.821757\n      two    -0.508931 -0.029148\nfoo   one    -0.010057 -0.642802\n      two    -0.155347  1.685445\nqux   one     0.130479  0.557070\n      two     0.271419 -0.006733\n\n# same result, another way\nIn [1481]: df.groupby(level=1, axis=1).mean()\nOut[1481]: \nanimal             cat       dog\nfirst second                    \nbar   one    -0.857479 -0.156504\n      two     1.359666 -0.789219\nbaz   one    -0.709942 -0.821757\n      two    -0.508931 -0.029148\nfoo   one    -0.010057 -0.642802\n      two    -0.155347  1.685445\nqux   one     0.130479  0.557070\n      two     0.271419 -0.006733\n\nIn [1482]: df.stack().groupby(level=1).mean()\nOut[1482]: \nexp            A         B\nsecond                    \none     0.016301 -0.644049\ntwo     0.110588  0.346200\n\nIn [1483]: df.mean().unstack(0)\nOut[1483]: \nexp            A         B\nanimal                    \ncat     0.311433 -0.431481\ndog    -0.184544  0.133632<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"pivot-tables-and-cross-tabulations\">\n<h2>Pivot tables and cross-tabulations<\/h2>\n<p id=\"reshaping-pivot\">The function\u00a0<tt>pandas.pivot_table<\/tt>\u00a0can be used to create spreadsheet-style pivot tables. 
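Note that the transcripts in this section use the 0.x-era `rows=` / `cols=` keywords; in current pandas releases the same grouping arguments are spelled `index=` and `columns=`. A minimal sketch against modern pandas, on data constructed to mirror this section's example:

```python
import numpy as np
import pandas as pd

# data shaped like the In [1484] example in this section
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D': np.random.randn(24),
                   'E': np.random.randn(24)})

# modern spelling of pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
table = pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'],
                       aggfunc='mean')
```

The result has a two-level row index (`A`, `B`) and one column per distinct value of `C`, with `NaN` in cells where a group combination never occurs.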
It takes a number of arguments:<\/p>\n<ul>\n<li><tt>data<\/tt>: a DataFrame object<\/li>\n<li><tt>values<\/tt>: a column or a list of columns to aggregate<\/li>\n<li><tt>rows<\/tt>: list of columns to group by on the table rows<\/li>\n<li><tt>cols<\/tt>: list of columns to group by on the table columns<\/li>\n<li><tt>aggfunc<\/tt>: function to use for aggregation, defaulting to\u00a0<tt>numpy.mean<\/tt><\/li>\n<\/ul>\n<p>Consider a data set like this:<\/p>\n<div>\n<div>\n<pre>In [1484]: df = DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,\n   ......:                 'B' : ['A', 'B', 'C'] * 8,\n   ......:                 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,\n   ......:                 'D' : np.random.randn(24),\n   ......:                 'E' : np.random.randn(24)})\n   ......:\n\nIn [1485]: df\nOut[1485]: \n        A  B    C         D         E\n0     one  A  foo -0.076467  0.959726\n1     one  B  foo -1.187678 -1.110336\n2     two  C  foo  1.130127 -0.619976\n3   three  A  bar -1.436737  0.149748\n4     one  B  bar -1.413681 -0.732339\n5     one  C  bar  1.607920  0.687738\n6     two  A  foo  1.024180  0.176444\n7   three  B  foo  0.569605  0.403310\n8     one  C  foo  0.875906 -0.154951\n9     one  A  bar -2.211372  0.301624\n10    two  B  bar  0.974466 -2.179861\n11  three  C  bar -2.006747 -1.369849\n12    one  A  foo -0.410001 -0.954208\n13    one  B  foo -0.078638  1.462696\n14    two  C  foo  0.545952 -1.743161\n15  three  A  bar -1.219217 -0.826591\n16    one  B  bar -1.226825 -0.345352\n17    one  C  bar  0.769804  1.314232\n18    two  A  foo -1.281247  0.690579\n19  three  B  foo -0.727707  0.995761\n20    one  C  foo -0.121306  2.396780\n21    one  A  bar -0.097883  0.014871\n22    two  B  bar  0.695775  3.357427\n23  three  C  bar  0.341734 -0.317441<\/pre>\n<\/div>\n<\/div>\n<p>We can produce pivot tables from this data very easily:<\/p>\n<div>\n<div>\n<pre>In [1486]: pivot_table(df, values='D', rows=['A', 'B'], 
cols=['C'])\nOut[1486]: \nC             bar       foo\nA     B                    \none   A -1.154627 -0.243234\n      B -1.320253 -0.633158\n      C  1.188862  0.377300\nthree A -1.327977       NaN\n      B       NaN -0.079051\n      C -0.832506       NaN\ntwo   A       NaN -0.128534\n      B  0.835120       NaN\n      C       NaN  0.838040\n\nIn [1487]: pivot_table(df, values='D', rows=['B'], cols=['A', 'C'], aggfunc=np.sum)\nOut[1487]: \nA       one               three                 two          \nC       bar       foo       bar       foo       bar       foo\nB                                                            \nA -2.309255 -0.486468 -2.655954       NaN       NaN -0.257067\nB -2.640506 -1.266315       NaN -0.158102  1.670241       NaN\nC  2.377724  0.754600 -1.665013       NaN       NaN  1.676079\n\nIn [1488]: pivot_table(df, values=['D','E'], rows=['B'], cols=['A', 'C'], aggfunc=np.sum)\nOut[1488]: \n          D                                                           E            \\\nA       one               three                 two                 one             \nC       bar       foo       bar       foo       bar       foo       bar       foo   \nB                                                                                   \nA -2.309255 -0.486468 -2.655954       NaN       NaN -0.257067  0.316495  0.005518   \nB -2.640506 -1.266315       NaN -0.158102  1.670241       NaN -1.077692  0.352360   \nC  2.377724  0.754600 -1.665013       NaN       NaN  1.676079  2.001971  2.241830   \n\nA     three                two            \nC       bar      foo       bar       foo  \nB                                         \nA -0.676843      NaN       NaN  0.867024  \nB       NaN  1.39907  1.177566       NaN  \nC -1.687290      NaN       NaN -2.363137<\/pre>\n<\/div>\n<\/div>\n<p>The result object is a DataFrame having potentially hierarchical indexes on the rows and columns. 
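When the columns are hierarchical, as in Out[1488] above, a single aggregate can be pulled out with a fully-specified tuple key, and a top-level key returns the whole sub-block. A sketch using the modern `index=`/`columns=` spelling on data shaped like this section's example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D': np.random.randn(24),
                   'E': np.random.randn(24)})

# two value columns and two column keys -> three column levels
table = pd.pivot_table(df, values=['D', 'E'], index=['B'],
                       columns=['A', 'C'], aggfunc='sum')

one_column = table[('D', 'one', 'bar')]   # fully-specified tuple -> Series
e_block = table['E']                      # partial top-level key -> sub-DataFrame
```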
If the\u00a0<tt>values<\/tt>\u00a0column name is not given, the pivot table will include all of the data that can be aggregated in an additional level of hierarchy in the columns:<\/p>\n<div>\n<div>\n<pre>In [1489]: pivot_table(df, rows=['A', 'B'], cols=['C'])\nOut[1489]: \n                D                   E          \nC             bar       foo       bar       foo\nA     B                                        \none   A -1.154627 -0.243234  0.158248  0.002759\n      B -1.320253 -0.633158 -0.538846  0.176180\n      C  1.188862  0.377300  1.000985  1.120915\nthree A -1.327977       NaN -0.338421       NaN\n      B       NaN -0.079051       NaN  0.699535\n      C -0.832506       NaN -0.843645       NaN\ntwo   A       NaN -0.128534       NaN  0.433512\n      B  0.835120       NaN  0.588783       NaN\n      C       NaN  0.838040       NaN -1.181568<\/pre>\n<\/div>\n<\/div>\n<p>You can render a nice output of the table omitting the missing values by calling\u00a0<tt>to_string<\/tt>\u00a0if you wish:<\/p>\n<div>\n<div>\n<pre>In [1490]: table = pivot_table(df, rows=['A', 'B'], cols=['C'])\n\nIn [1491]: print table.to_string(na_rep='')\n                D                   E          \nC             bar       foo       bar       foo\nA     B                                        \none   A -1.154627 -0.243234  0.158248  0.002759\n      B -1.320253 -0.633158 -0.538846  0.176180\n      C  1.188862  0.377300  1.000985  1.120915\nthree A -1.327977           -0.338421          \n      B           -0.079051            0.699535\n      C -0.832506           -0.843645          \ntwo   A           -0.128534            0.433512\n      B  0.835120            0.588783          \n      C            0.838040           -1.181568<\/pre>\n<\/div>\n<\/div>\n<p>Note that\u00a0<tt>pivot_table<\/tt>\u00a0is also available as an instance method on DataFrame.<\/p>\n<div id=\"cross-tabulations\">\n<h3>Cross tabulations<\/h3>\n<p>Use the\u00a0<tt>crosstab<\/tt>\u00a0function to compute a 
cross-tabulation of two (or more) factors. By default,\u00a0<tt>crosstab<\/tt>\u00a0computes a frequency table of the factors unless an array of values and an aggregation function are passed.<\/p>\n<p>It takes a number of arguments:<\/p>\n<ul>\n<li><tt>rows<\/tt>: array-like, values to group by in the rows<\/li>\n<li><tt>cols<\/tt>: array-like, values to group by in the columns<\/li>\n<li><tt>values<\/tt>: array-like, optional; array of values to aggregate according to the factors<\/li>\n<li><tt>aggfunc<\/tt>: function, optional; if no values array is passed, computes a frequency table<\/li>\n<li><tt>rownames<\/tt>: sequence, default None; must match number of row arrays passed<\/li>\n<li><tt>colnames<\/tt>: sequence, default None; if passed, must match number of column arrays passed<\/li>\n<li><tt>margins<\/tt>: boolean, default False; add row\/column margins (subtotals)<\/li>\n<\/ul>\n<p>Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.<\/p>\n<p>For example:<\/p>\n<div>\n<div>\n<pre>In [1492]: foo, bar, dull, shiny, one, two = 'foo', 'bar', 'dull', 'shiny', 'one', 'two'\n\nIn [1493]: a = np.array([foo, foo, bar, bar, foo, foo], dtype=object)\n\nIn [1494]: b = np.array([one, one, two, one, two, one], dtype=object)\n\nIn [1495]: c = np.array([dull, dull, shiny, dull, dull, shiny], dtype=object)\n\nIn [1496]: crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])\nOut[1496]: \nb     one          two       \nc    dull  shiny  dull  shiny\na                            \nbar     1      0     0      1\nfoo     2      1     1      0<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"adding-margins-partial-aggregates\">\n<h3>Adding margins (partial aggregates)<\/h3>\n<p>If you pass\u00a0<tt>margins=True<\/tt>\u00a0to\u00a0<tt>pivot_table<\/tt>, special\u00a0<tt>All<\/tt>\u00a0columns and rows will be added with partial group aggregates across the categories on the rows and columns:<\/p>\n<div>\n<div>\n<pre>In 
[1497]: df.pivot_table(rows=['A', 'B'], cols='C', margins=True, aggfunc=np.std)\nOut[1497]: \n                D                             E                    \nC             bar       foo       All       bar       foo       All\nA     B                                                            \none   A  1.494463  0.235844  1.019752  0.202765  1.353355  0.795165\n      B  0.132127  0.784210  0.606779  0.273641  1.819408  1.139647\n      C  0.592638  0.705136  0.708771  0.442998  1.804346  1.074910\nthree A  0.153810       NaN  0.153810  0.690376       NaN  0.690376\n      B       NaN  0.917338  0.917338       NaN  0.418926  0.418926\n      C  1.660627       NaN  1.660627  0.744165       NaN  0.744165\ntwo   A       NaN  1.630183  1.630183       NaN  0.363548  0.363548\n      B  0.197065       NaN  0.197065  3.915454       NaN  3.915454\n      C       NaN  0.413074  0.413074       NaN  0.794212  0.794212\nAll      1.294620  0.824989  1.064129  1.403041  1.188419  1.248988<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"tiling\">\n<h2>Tiling<\/h2>\n<p id=\"reshaping-tile-cut\">The\u00a0<tt>cut<\/tt>\u00a0function computes groupings for the values of the input array and is often used to transform continuous variables to discrete or categorical variables:<\/p>\n<div>\n<div>\n<pre>In [1498]: ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])\n\nIn [1499]: cut(ages, bins=3)\nOut[1499]: \nCategorical: \narray([(9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667],\n       (9.95, 26.667], (9.95, 26.667], (26.667, 43.333], (43.333, 60],\n       (43.333, 60]], dtype=object)\nLevels (3): Index([(9.95, 26.667], (26.667, 43.333], (43.333, 60]], dtype=object)<\/pre>\n<\/div>\n<\/div>\n<p>If the\u00a0<tt>bins<\/tt>\u00a0keyword is an integer, then equal-width bins are formed. 
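In current pandas, `cut` returns interval-valued categories rather than the tuple-style labels shown above, but the behavior is the same; with an integer `bins` the resulting bins are equal-width, up to a slight widening of the first edge so the minimum value is included. A minimal sketch:

```python
import numpy as np
import pandas as pd

ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])

# bins=3 -> three (nearly) equal-width intervals spanning the data's range
binned = pd.cut(ages, bins=3)

# widths of the three intervals; the first bin is slightly wider because
# its left edge is pushed just below the minimum value
widths = [iv.right - iv.left for iv in binned.categories]
```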
Alternatively we can specify custom bin-edges:<\/p>\n<div>\n<div>\n<pre>In [1500]: cut(ages, bins=[0, 18, 35, 70])\nOut[1500]: \nCategorical: \narray([(0, 18], (0, 18], (0, 18], (0, 18], (18, 35], (18, 35], (18, 35],\n       (35, 70], (35, 70]], dtype=object)\nLevels (3): Index([(0, 18], (18, 35], (35, 70]], dtype=object)<\/pre>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Reshaping by pivoting DataFrame objects Data is often stored in CSV files or databases in so-called \u201cstacked\u201d or \u201crecord\u201d format: In [1450]: df Out[1450]: date&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"class_list":["post-141","post","type-post","status-publish","format-standard","hentry","category-python"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/141","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=141"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/141\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=141"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=141"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}