{"id":131,"date":"2013-03-31T23:41:34","date_gmt":"2013-04-01T04:41:34","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=131"},"modified":"2013-03-31T23:41:34","modified_gmt":"2013-04-01T04:41:34","slug":"10-minutes-to-pandas","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/03\/31\/10-minutes-to-pandas\/","title":{"rendered":"10 Minutes to Pandas"},"content":{"rendered":"<p>This is a short introduction to pandas, geared mainly for new users.<\/p>\n<p>Customarily, we import as follows<\/p>\n<div>\n<div>\n<pre>In [1]: import pandas as pd\n\nIn [2]: import numpy as np<\/pre>\n<\/div>\n<\/div>\n<div id=\"object-creation\">\n<h2>Object Creation<\/h2>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/dsintro.html#dsintro\"><em>Data Structure Intro section<\/em><\/a><\/p>\n<p>Creating a\u00a0<tt>Series<\/tt>\u00a0by passing a list of values, letting pandas create a default integer index<\/p>\n<div>\n<div>\n<pre>In [3]: s = pd.Series([1,3,5,np.nan,6,8])\n\nIn [4]: s\nOut[4]: \n0     1\n1     3\n2     5\n3   NaN\n4     6\n5     8\ndtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Creating a\u00a0<tt>DataFrame<\/tt>\u00a0by passing a numpy array, with a datetime index and labeled columns.<\/p>\n<div>\n<div>\n<pre>In [5]: dates = pd.date_range('20130101',periods=6)\n\nIn [6]: dates\nOut[6]: \n&lt;class 'pandas.tseries.index.DatetimeIndex'&gt;\n[2013-01-01 00:00:00, ..., 2013-01-06 00:00:00]\nLength: 6, Freq: D, Timezone: None\n\nIn [7]: df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))\n\nIn [8]: df\nOut[8]: \n                   A         B         C         D\n2013-01-01  0.469112 -0.282863 -1.509059 -1.135632\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860\n2013-01-05 -0.424972  0.567020  0.276232 -1.087401\n2013-01-06 -0.673690  0.113648 -1.478427  
0.524988<\/pre>\n<\/div>\n<\/div>\n<p>Creating a\u00a0<tt>DataFrame<\/tt>\u00a0by passing a dict of objects that can be converted to series-like.<\/p>\n<div>\n<div>\n<pre>In [9]: df2 = pd.DataFrame({ 'A' : 1.,\n   ...:                      'B' : pd.Timestamp('20130102'),\n   ...:                      'C' : pd.Series(1,index=range(4),dtype='float32'),\n   ...:                      'D' : np.array([3] * 4,dtype='int32'),\n   ...:                      'E' : 'foo' })\n   ...:\n\nIn [10]: df2\nOut[10]: \n   A                   B  C  D    E\n0  1 2013-01-02 00:00:00  1  3  foo\n1  1 2013-01-02 00:00:00  1  3  foo\n2  1 2013-01-02 00:00:00  1  3  foo\n3  1 2013-01-02 00:00:00  1  3  foo<\/pre>\n<\/div>\n<\/div>\n<p>Having specific\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/basics.html#basics-dtypes\"><em>dtypes<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [11]: df2.dtypes\nOut[11]: \nA           float64\nB    datetime64[ns]\nC           float32\nD             int32\nE            object\ndtype: object<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"viewing-data\">\n<h2>Viewing Data<\/h2>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/basics.html#basics\"><em>Basics section<\/em><\/a><\/p>\n<p>See the top &amp; bottom rows of the frame<\/p>\n<div>\n<div>\n<pre>In [12]: df.head()\nOut[12]: \n                   A         B         C         D\n2013-01-01  0.469112 -0.282863 -1.509059 -1.135632\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860\n2013-01-05 -0.424972  0.567020  0.276232 -1.087401\n\nIn [13]: df.tail(3)\nOut[13]: \n                   A         B         C         D\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860\n2013-01-05 -0.424972  0.567020  0.276232 -1.087401\n2013-01-06 -0.673690  0.113648 -1.478427  0.524988<\/pre>\n<\/div>\n<\/div>\n<p>Display the index,columns, and the underlying numpy 
data<\/p>\n<div>\n<div>\n<pre>In [14]: df.index\nOut[14]: \n&lt;class 'pandas.tseries.index.DatetimeIndex'&gt;\n[2013-01-01 00:00:00, ..., 2013-01-06 00:00:00]\nLength: 6, Freq: D, Timezone: None\n\nIn [15]: df.columns\nOut[15]: Index([A, B, C, D], dtype=object)\n\nIn [16]: df.values\nOut[16]: \narray([[ 0.4691, -0.2829, -1.5091, -1.1356],\n       [ 1.2121, -0.1732,  0.1192, -1.0442],\n       [-0.8618, -2.1046, -0.4949,  1.0718],\n       [ 0.7216, -0.7068, -1.0396,  0.2719],\n       [-0.425 ,  0.567 ,  0.2762, -1.0874],\n       [-0.6737,  0.1136, -1.4784,  0.525 ]])<\/pre>\n<\/div>\n<\/div>\n<p>Describe shows a quick statistic summary of your data<\/p>\n<div>\n<div>\n<pre>In [17]: df.describe()\nOut[17]: \n              A         B         C         D\ncount  6.000000  6.000000  6.000000  6.000000\nmean   0.073711 -0.431125 -0.687758 -0.233103\nstd    0.843157  0.922818  0.779887  0.973118\nmin   -0.861849 -2.104569 -1.509059 -1.135632\n25%   -0.611510 -0.600794 -1.368714 -1.076610\n50%    0.022070 -0.228039 -0.767252 -0.386188\n75%    0.658444  0.041933 -0.034326  0.461706\nmax    1.212112  0.567020  0.276232  1.071804<\/pre>\n<\/div>\n<\/div>\n<p>Transposing your data<\/p>\n<div>\n<div>\n<pre>In [18]: df.T\nOut[18]: \n   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06\nA    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690\nB   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648\nC   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427\nD   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988<\/pre>\n<\/div>\n<\/div>\n<p>Sorting by an axis<\/p>\n<div>\n<div>\n<pre>In [19]: df.sort_index(axis=1, ascending=False)\nOut[19]: \n                   D         C         B         A\n2013-01-01 -1.135632 -1.509059 -0.282863  0.469112\n2013-01-02 -1.044236  0.119209 -0.173215  1.212112\n2013-01-03  1.071804 -0.494929 -2.104569 -0.861849\n2013-01-04  0.271860 -1.039575 -0.706771  
0.721555\n2013-01-05 -1.087401  0.276232  0.567020 -0.424972\n2013-01-06  0.524988 -1.478427  0.113648 -0.673690<\/pre>\n<\/div>\n<\/div>\n<p>Sorting by values<\/p>\n<div>\n<div>\n<pre>In [20]: df.sort(columns='B')\nOut[20]: \n                   A         B         C         D\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860\n2013-01-01  0.469112 -0.282863 -1.509059 -1.135632\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-06 -0.673690  0.113648 -1.478427  0.524988\n2013-01-05 -0.424972  0.567020  0.276232 -1.087401<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"selection\">\n<h2>Selection<\/h2>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing\"><em>Indexing section<\/em><\/a><\/p>\n<div id=\"getting\">\n<h3>Getting<\/h3>\n<p>Selecting a single column, which yields a\u00a0<tt>Series<\/tt>, equivalent to\u00a0<tt>df.A<\/tt><\/p>\n<div>\n<div>\n<pre>In [21]: df['A']\nOut[21]: \n2013-01-01    0.469112\n2013-01-02    1.212112\n2013-01-03   -0.861849\n2013-01-04    0.721555\n2013-01-05   -0.424972\n2013-01-06   -0.673690\nFreq: D, Name: A, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Selecting via\u00a0<tt>[]<\/tt>, which slices the rows.<\/p>\n<div>\n<div>\n<pre>In [22]: df[0:3]\nOut[22]: \n                   A         B         C         D\n2013-01-01  0.469112 -0.282863 -1.509059 -1.135632\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804\n\nIn [23]: df['20130102':'20130104']\nOut[23]: \n                   A         B         C         D\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"selection-by-label\">\n<h3>Selection by Label<\/h3>\n<p>See more in\u00a0<a 
href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing-label\"><em>Selection by Label<\/em><\/a><\/p>\n<p>For getting a cross section using a label<\/p>\n<div>\n<div>\n<pre>In [24]: df.loc[dates[0]]\nOut[24]: \nA    0.469112\nB   -0.282863\nC   -1.509059\nD   -1.135632\nName: 2013-01-01 00:00:00, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Selecting on a multi-axis by label<\/p>\n<div>\n<div>\n<pre>In [25]: df.loc[:,['A','B']]\nOut[25]: \n                   A         B\n2013-01-01  0.469112 -0.282863\n2013-01-02  1.212112 -0.173215\n2013-01-03 -0.861849 -2.104569\n2013-01-04  0.721555 -0.706771\n2013-01-05 -0.424972  0.567020\n2013-01-06 -0.673690  0.113648<\/pre>\n<\/div>\n<\/div>\n<p>Showing label slicing, both endpoints are\u00a0<em>included<\/em><\/p>\n<div>\n<div>\n<pre>In [26]: df.loc['20130102':'20130104',['A','B']]\nOut[26]: \n                   A         B\n2013-01-02  1.212112 -0.173215\n2013-01-03 -0.861849 -2.104569\n2013-01-04  0.721555 -0.706771<\/pre>\n<\/div>\n<\/div>\n<p>Reduction in the dimensions of the returned object<\/p>\n<div>\n<div>\n<pre>In [27]: df.loc['20130102',['A','B']]\nOut[27]: \nA    1.212112\nB   -0.173215\nName: 2013-01-02 00:00:00, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>For getting a scalar value<\/p>\n<div>\n<div>\n<pre>In [28]: df.loc[dates[0],'A']\nOut[28]: 0.46911229990718628<\/pre>\n<\/div>\n<\/div>\n<p>For getting fast access to a scalar (equiv to the prior method)<\/p>\n<div>\n<div>\n<pre>In [29]: df.at[dates[0],'A']\nOut[29]: 0.46911229990718628<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"selection-by-position\">\n<h3>Selection by Position<\/h3>\n<p>See more in\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing-integer\"><em>Selection by Position<\/em><\/a><\/p>\n<p>Select via the position of the passed integers<\/p>\n<div>\n<div>\n<pre>In [30]: df.iloc[3]\nOut[30]: \nA    0.721555\nB   -0.706771\nC   -1.039575\nD    0.271860\nName: 2013-01-04 00:00:00, dtype: 
float64<\/pre>\n<\/div>\n<\/div>\n<p>By integer slices, acting similar to numpy\/python<\/p>\n<div>\n<div>\n<pre>In [31]: df.iloc[3:5,0:2]\nOut[31]: \n                   A         B\n2013-01-04  0.721555 -0.706771\n2013-01-05 -0.424972  0.567020<\/pre>\n<\/div>\n<\/div>\n<p>By lists of integer position locations, similar to the numpy\/python style<\/p>\n<div>\n<div>\n<pre>In [32]: df.iloc[[1,2,4],[0,2]]\nOut[32]: \n                   A         C\n2013-01-02  1.212112  0.119209\n2013-01-03 -0.861849 -0.494929\n2013-01-05 -0.424972  0.276232<\/pre>\n<\/div>\n<\/div>\n<p>For slicing rows explicitly<\/p>\n<div>\n<div>\n<pre>In [33]: df.iloc[1:3,:]\nOut[33]: \n                   A         B         C         D\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-03 -0.861849 -2.104569 -0.494929  1.071804<\/pre>\n<\/div>\n<\/div>\n<p>For slicing columns explicitly<\/p>\n<div>\n<div>\n<pre>In [34]: df.iloc[:,1:3]\nOut[34]: \n                   B         C\n2013-01-01 -0.282863 -1.509059\n2013-01-02 -0.173215  0.119209\n2013-01-03 -2.104569 -0.494929\n2013-01-04 -0.706771 -1.039575\n2013-01-05  0.567020  0.276232\n2013-01-06  0.113648 -1.478427<\/pre>\n<\/div>\n<\/div>\n<p>For getting a value explicitly<\/p>\n<div>\n<div>\n<pre>In [35]: df.iloc[1,1]\nOut[35]: -0.17321464905330858<\/pre>\n<\/div>\n<\/div>\n<p>For getting fast access to a scalar (equiv to the prior method)<\/p>\n<div>\n<div>\n<pre>In [36]: df.iat[1,1]\nOut[36]: -0.17321464905330858<\/pre>\n<\/div>\n<\/div>\n<p>There is one significant departure from standard python\/numpy slicing semantics. 
Python\/numpy allow slicing past the end of an array without an associated error.<\/p>\n<div>\n<div>\n<pre># these are allowed in python\/numpy.\nIn [37]: x = list('abcdef')\n\nIn [38]: x[4:10]\nOut[38]: ['e', 'f']\n\nIn [39]: x[8:10]\nOut[39]: []<\/pre>\n<\/div>\n<\/div>\n<p>Pandas will detect this and raise\u00a0<tt>IndexError<\/tt>, rather than return an empty structure.<\/p>\n<div>\n<div>\n<pre>&gt;&gt;&gt; df.iloc[:,8:10]\nIndexError: out-of-bounds on slice (end)<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"boolean-indexing\">\n<h3>Boolean Indexing<\/h3>\n<p>Using a single column\u2019s values to select data.<\/p>\n<div>\n<div>\n<pre>In [40]: df[df.A &gt; 0]\nOut[40]: \n                   A         B         C         D\n2013-01-01  0.469112 -0.282863 -1.509059 -1.135632\n2013-01-02  1.212112 -0.173215  0.119209 -1.044236\n2013-01-04  0.721555 -0.706771 -1.039575  0.271860<\/pre>\n<\/div>\n<\/div>\n<p>A\u00a0<tt>where<\/tt>\u00a0operation for getting.<\/p>\n<div>\n<div>\n<pre>In [41]: df[df &gt; 0]\nOut[41]: \n                   A         B         C         D\n2013-01-01  0.469112       NaN       NaN       NaN\n2013-01-02  1.212112       NaN  0.119209       NaN\n2013-01-03       NaN       NaN       NaN  1.071804\n2013-01-04  0.721555       NaN       NaN  0.271860\n2013-01-05       NaN  0.567020  0.276232       NaN\n2013-01-06       NaN  0.113648       NaN  0.524988<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"setting\">\n<h3>Setting<\/h3>\n<p>Setting a new column automatically aligns the data by the indexes<\/p>\n<div>\n<div>\n<pre>In [42]: s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130102',periods=6))\n\nIn [43]: s1\nOut[43]: \n2013-01-02    1\n2013-01-03    2\n2013-01-04    3\n2013-01-05    4\n2013-01-06    5\n2013-01-07    6\nFreq: D, dtype: int64\n\nIn [44]: df['F'] = s1<\/pre>\n<\/div>\n<\/div>\n<p>Setting values by label<\/p>\n<div>\n<div>\n<pre>In [45]: df.at[dates[0],'A'] = 0<\/pre>\n<\/div>\n<\/div>\n<p>Setting values by 
position<\/p>\n<div>\n<div>\n<pre>In [46]: df.iat[0,1] = 0<\/pre>\n<\/div>\n<\/div>\n<p>Setting by assigning with a numpy array<\/p>\n<div>\n<div>\n<pre>In [47]: df.loc[:,'D'] = np.array([5] * len(df))<\/pre>\n<\/div>\n<\/div>\n<p>The result of the prior setting operations<\/p>\n<div>\n<div>\n<pre>In [48]: df\nOut[48]: \n                   A         B         C  D   F\n2013-01-01  0.000000  0.000000 -1.509059  5 NaN\n2013-01-02  1.212112 -0.173215  0.119209  5   1\n2013-01-03 -0.861849 -2.104569 -0.494929  5   2\n2013-01-04  0.721555 -0.706771 -1.039575  5   3\n2013-01-05 -0.424972  0.567020  0.276232  5   4\n2013-01-06 -0.673690  0.113648 -1.478427  5   5<\/pre>\n<\/div>\n<\/div>\n<p>A\u00a0<tt>where<\/tt>\u00a0operation with setting.<\/p>\n<div>\n<div>\n<pre>In [49]: df2 = df.copy()\n\nIn [50]: df2[df2 &gt; 0] = -df2\n\nIn [51]: df2\nOut[51]: \n                   A         B         C  D   F\n2013-01-01  0.000000  0.000000 -1.509059 -5 NaN\n2013-01-02 -1.212112 -0.173215 -0.119209 -5  -1\n2013-01-03 -0.861849 -2.104569 -0.494929 -5  -2\n2013-01-04 -0.721555 -0.706771 -1.039575 -5  -3\n2013-01-05 -0.424972 -0.567020 -0.276232 -5  -4\n2013-01-06 -0.673690 -0.113648 -1.478427 -5  -5<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"missing-data\">\n<h2>Missing Data<\/h2>\n<p>Pandas primarily uses the value\u00a0<tt>np.nan<\/tt>\u00a0to represent missing data. It is by default not included in computations. See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/missing_data.html#missing-data\"><em>Missing Data section<\/em><\/a><\/p>\n<p>Reindexing allows you to change\/add\/delete the index on a specified axis. 
This returns a copy of the data.<\/p>\n<div>\n<div>\n<pre>In [52]: df1 = df.reindex(index=dates[0:4],columns=list(df.columns) + ['E'])\n\nIn [53]: df1.loc[dates[0]:dates[1],'E'] = 1\n\nIn [54]: df1\nOut[54]: \n                   A         B         C  D   F   E\n2013-01-01  0.000000  0.000000 -1.509059  5 NaN   1\n2013-01-02  1.212112 -0.173215  0.119209  5   1   1\n2013-01-03 -0.861849 -2.104569 -0.494929  5   2 NaN\n2013-01-04  0.721555 -0.706771 -1.039575  5   3 NaN<\/pre>\n<\/div>\n<\/div>\n<p>To drop any rows that have missing data.<\/p>\n<div>\n<div>\n<pre>In [55]: df1.dropna(how='any')\nOut[55]: \n                   A         B         C  D  F  E\n2013-01-02  1.212112 -0.173215  0.119209  5  1  1<\/pre>\n<\/div>\n<\/div>\n<p>Filling missing data<\/p>\n<div>\n<div>\n<pre>In [56]: df1.fillna(value=5)\nOut[56]: \n                   A         B         C  D  F  E\n2013-01-01  0.000000  0.000000 -1.509059  5  5  1\n2013-01-02  1.212112 -0.173215  0.119209  5  1  1\n2013-01-03 -0.861849 -2.104569 -0.494929  5  2  5\n2013-01-04  0.721555 -0.706771 -1.039575  5  3  5<\/pre>\n<\/div>\n<\/div>\n<p>To get the boolean mask where values are\u00a0<tt>nan<\/tt><\/p>\n<div>\n<div>\n<pre>In [57]: pd.isnull(df1)\nOut[57]: \n                A      B      C      D      F      E\n2013-01-01  False  False  False  False   True  False\n2013-01-02  False  False  False  False  False  False\n2013-01-03  False  False  False  False  False   True\n2013-01-04  False  False  False  False  False   True<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"operations\">\n<h2>Operations<\/h2>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/basics.html#basics-binop\"><em>Basic section on Binary Ops<\/em><\/a><\/p>\n<div id=\"stats\">\n<h3>Stats<\/h3>\n<p>Operations in general\u00a0<em>exclude<\/em>\u00a0missing data.<\/p>\n<p>Performing a descriptive statistic<\/p>\n<div>\n<div>\n<pre>In [58]: df.mean()\nOut[58]: \nA   -0.004474\nB   -0.383981\nC   -0.687758\nD    5.000000\nF   
 3.000000\ndtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Same operation on the other axis<\/p>\n<div>\n<div>\n<pre>In [59]: df.mean(1)\nOut[59]: \n2013-01-01    0.872735\n2013-01-02    1.431621\n2013-01-03    0.707731\n2013-01-04    1.395042\n2013-01-05    1.883656\n2013-01-06    1.592306\nFreq: D, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.<\/p>\n<div>\n<div>\n<pre>In [60]: s = pd.Series([1,3,5,np.nan,6,8],index=dates).shift(2)\n\nIn [61]: s\nOut[61]: \n2013-01-01   NaN\n2013-01-02   NaN\n2013-01-03     1\n2013-01-04     3\n2013-01-05     5\n2013-01-06   NaN\nFreq: D, dtype: float64\n\nIn [62]: df.sub(s,axis='index')\nOut[62]: \n                   A         B         C   D   F\n2013-01-01       NaN       NaN       NaN NaN NaN\n2013-01-02       NaN       NaN       NaN NaN NaN\n2013-01-03 -1.861849 -3.104569 -1.494929   4   1\n2013-01-04 -2.278445 -3.706771 -4.039575   2   0\n2013-01-05 -5.424972 -4.432980 -4.723768   0  -1\n2013-01-06       NaN       NaN       NaN NaN NaN<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"apply\">\n<h3>Apply<\/h3>\n<p>Applying functions to the data<\/p>\n<div>\n<div>\n<pre>In [63]: df.apply(np.cumsum)\nOut[63]: \n                   A         B         C   D   F\n2013-01-01  0.000000  0.000000 -1.509059   5 NaN\n2013-01-02  1.212112 -0.173215 -1.389850  10   1\n2013-01-03  0.350263 -2.277784 -1.884779  15   3\n2013-01-04  1.071818 -2.984555 -2.924354  20   6\n2013-01-05  0.646846 -2.417535 -2.648122  25  10\n2013-01-06 -0.026844 -2.303886 -4.126549  30  15\n\nIn [64]: df.apply(lambda x: x.max() - x.min())\nOut[64]: \nA    2.073961\nB    2.671590\nC    1.785291\nD    0.000000\nF    4.000000\ndtype: float64<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"histogramming\">\n<h3>Histogramming<\/h3>\n<p>See more at\u00a0<a 
href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/basics.html#basics-discretization\"><em>Histogramming and Discretization<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [65]: s = pd.Series(np.random.randint(0,7,size=10))\n\nIn [66]: s\nOut[66]: \n0    4\n1    2\n2    1\n3    2\n4    6\n5    4\n6    4\n7    6\n8    4\n9    4\ndtype: int64\n\nIn [67]: s.value_counts()\nOut[67]: \n4    5\n6    2\n2    2\n1    1\ndtype: int64<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"string-methods\">\n<h3>String Methods<\/h3>\n<p>See more at\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/basics.html#basics-string-methods\"><em>Vectorized String Methods<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [68]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])\n\nIn [69]: s.str.lower()\nOut[69]: \n0       a\n1       b\n2       c\n3    aaba\n4    baca\n5     NaN\n6    caba\n7     dog\n8     cat\ndtype: object<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"merge\">\n<h2>Merge<\/h2>\n<div id=\"concat\">\n<h3>Concat<\/h3>\n<p>Pandas provides various facilities for easily combining Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join \/ merge-type operations.<\/p>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/merging.html#merging\"><em>Merging section<\/em><\/a><\/p>\n<p>Concatenating pandas objects together<\/p>\n<div>\n<div>\n<pre>In [70]: df = pd.DataFrame(np.random.randn(10, 4))\n\nIn [71]: df\nOut[71]: \n          0         1         2         3\n0 -0.548702  1.467327 -1.015962 -0.483075\n1  1.637550 -1.217659 -0.291519 -1.745505\n2 -0.263952  0.991460 -0.919069  0.266046\n3 -0.709661  1.669052  1.037882 -1.705775\n4 -0.919854 -0.042379  1.247642 -0.009920\n5  0.290213  0.495767  0.362949  1.548106\n6 -1.131345 -0.089329  0.337863 -0.945867\n7 -0.932132  1.956030  0.017587 -0.016692\n8 -0.575247  0.254161 -1.143704  0.215897\n9  
1.193555 -0.077118 -0.408530 -0.862495\n\n# break it into pieces\nIn [72]: pieces = [df[:3], df[3:7], df[7:]]\n\nIn [73]: pd.concat(pieces)\nOut[73]: \n          0         1         2         3\n0 -0.548702  1.467327 -1.015962 -0.483075\n1  1.637550 -1.217659 -0.291519 -1.745505\n2 -0.263952  0.991460 -0.919069  0.266046\n3 -0.709661  1.669052  1.037882 -1.705775\n4 -0.919854 -0.042379  1.247642 -0.009920\n5  0.290213  0.495767  0.362949  1.548106\n6 -1.131345 -0.089329  0.337863 -0.945867\n7 -0.932132  1.956030  0.017587 -0.016692\n8 -0.575247  0.254161 -1.143704  0.215897\n9  1.193555 -0.077118 -0.408530 -0.862495<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"join\">\n<h3>Join<\/h3>\n<p>SQL style merges. See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/merging.html#merging-join\"><em>Database style joining<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [74]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n\nIn [75]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n\nIn [76]: left\nOut[76]: \n   key  lval\n0  foo     1\n1  foo     2\n\nIn [77]: right\nOut[77]: \n   key  rval\n0  foo     4\n1  foo     5\n\nIn [78]: pd.merge(left, right, on='key')\nOut[78]: \n   key  lval  rval\n0  foo     1     4\n1  foo     1     5\n2  foo     2     4\n3  foo     2     5<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"append\">\n<h3>Append<\/h3>\n<p>Append rows to a dataframe. 
See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/merging.html#merging-concatenation\"><em>Appending<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [79]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])\n\nIn [80]: df\nOut[80]: \n          A         B         C         D\n0  1.346061  1.511763  1.627081 -0.990582\n1 -0.441652  1.211526  0.268520  0.024580\n2 -1.577585  0.396823 -0.105381 -0.532532\n3  1.453749  1.208843 -0.080952 -0.264610\n4 -0.727965 -0.589346  0.339969 -0.693205\n5 -0.339355  0.593616  0.884345  1.591431\n6  0.141809  0.220390  0.435589  0.192451\n7 -0.096701  0.803351  1.715071 -0.708758\n\nIn [81]: s = df.iloc[3]\n\nIn [82]: df.append(s, ignore_index=True)\nOut[82]: \n          A         B         C         D\n0  1.346061  1.511763  1.627081 -0.990582\n1 -0.441652  1.211526  0.268520  0.024580\n2 -1.577585  0.396823 -0.105381 -0.532532\n3  1.453749  1.208843 -0.080952 -0.264610\n4 -0.727965 -0.589346  0.339969 -0.693205\n5 -0.339355  0.593616  0.884345  1.591431\n6  0.141809  0.220390  0.435589  0.192451\n7 -0.096701  0.803351  1.715071 -0.708758\n8  1.453749  1.208843 -0.080952 -0.264610\n\nIn [83]: df\nOut[83]: \n          A         B         C         D\n0  1.346061  1.511763  1.627081 -0.990582\n1 -0.441652  1.211526  0.268520  0.024580\n2 -1.577585  0.396823 -0.105381 -0.532532\n3  1.453749  1.208843 -0.080952 -0.264610\n4 -0.727965 -0.589346  0.339969 -0.693205\n5 -0.339355  0.593616  0.884345  1.591431\n6  0.141809  0.220390  0.435589  0.192451\n7 -0.096701  0.803351  1.715071 -0.708758<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"grouping\">\n<h2>Grouping<\/h2>\n<p>By \u201cgroup by\u201d we are referring to a process involving one or more of the following steps<\/p>\n<blockquote>\n<ul>\n<li><strong>Splitting<\/strong>\u00a0the data into groups based on some criteria<\/li>\n<li><strong>Applying<\/strong>\u00a0a function to each group independently<\/li>\n<li><strong>Combining<\/strong>\u00a0the 
results into a data structure<\/li>\n<\/ul>\n<\/blockquote>\n<p>See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/groupby.html#groupby\"><em>Grouping section<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [84]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',\n   ....:                          'foo', 'bar', 'foo', 'foo'],\n   ....:                    'B' : ['one', 'one', 'two', 'three',\n   ....:                          'two', 'two', 'one', 'three'],\n   ....:                    'C' : np.random.randn(8), 'D' : np.random.randn(8)})\n   ....:\n\nIn [85]: df\nOut[85]: \n     A      B         C         D\n0  foo    one -1.202872 -0.055224\n1  bar    one -1.814470  2.395985\n2  foo    two  1.018601  1.552825\n3  bar  three -0.595447  0.166599\n4  foo    two  1.395433  0.047609\n5  bar    two -0.392670 -0.136473\n6  foo    one  0.007207 -0.561757\n7  foo  three  1.928123 -1.623033<\/pre>\n<\/div>\n<\/div>\n<p>Grouping and then applying the\u00a0<tt>sum<\/tt>\u00a0function to the resulting groups.<\/p>\n<div>\n<div>\n<pre>In [86]: df.groupby('A').sum()\nOut[86]: \n            C        D\nA                     \nbar -2.802588  2.42611\nfoo  3.146492 -0.63958<\/pre>\n<\/div>\n<\/div>\n<p>Grouping by multiple columns forms a hierarchical index, to which we then apply the function.<\/p>\n<div>\n<div>\n<pre>In [87]: df.groupby(['A','B']).sum()\nOut[87]: \n                  C         D\nA   B                        \nbar one   -1.814470  2.395985\n    three -0.595447  0.166599\n    two   -0.392670 -0.136473\nfoo one   -1.195665 -0.616981\n    three  1.928123 -1.623033\n    two    2.414034  1.600434<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"reshaping\">\n<h2>Reshaping<\/h2>\n<p>See the section on\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/indexing.html#indexing-hierarchical\"><em>Hierarchical Indexing<\/em><\/a>\u00a0and see the section on\u00a0<a 
href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/reshaping.html#reshaping-stacking\"><em>Reshaping<\/em><\/a>.<\/p>\n<div id=\"stack\">\n<h3>Stack<\/h3>\n<div>\n<div>\n<pre>In [88]: tuples = zip(*[['bar', 'bar', 'baz', 'baz',\n   ....:                 'foo', 'foo', 'qux', 'qux'],\n   ....:                ['one', 'two', 'one', 'two',\n   ....:                 'one', 'two', 'one', 'two']])\n   ....:\n\nIn [89]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])\n\nIn [90]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])\n\nIn [91]: df2 = df[:4]\n\nIn [92]: df2\nOut[92]: \n                     A         B\nfirst second                    \nbar   one     0.029399 -0.542108\n      two     0.282696 -0.087302\nbaz   one    -1.575170  1.771208\n      two     0.816482  1.100230<\/pre>\n<\/div>\n<\/div>\n<p>The\u00a0<tt>stack<\/tt>\u00a0function \u201ccompresses\u201d a level in the DataFrame\u2019s columns.<\/p>\n<div>\n<div>\n<pre>In [93]: stacked = df2.stack()\n\nIn [94]: stacked\nOut[94]: \nfirst  second   \nbar    one     A    0.029399\n               B   -0.542108\n       two     A    0.282696\n               B   -0.087302\nbaz    one     A   -1.575170\n               B    1.771208\n       two     A    0.816482\n               B    1.100230\ndtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>With a \u201cstacked\u201d DataFrame or Series (having a\u00a0<tt>MultiIndex<\/tt>\u00a0as the\u00a0<tt>index<\/tt>), the inverse operation of\u00a0<tt>stack<\/tt>\u00a0is\u00a0<tt>unstack<\/tt>, which by default unstacks the\u00a0<strong>last level<\/strong>:<\/p>\n<div>\n<div>\n<pre>In [95]: stacked.unstack()\nOut[95]: \n                     A         B\nfirst second                    \nbar   one     0.029399 -0.542108\n      two     0.282696 -0.087302\nbaz   one    -1.575170  1.771208\n      two     0.816482  1.100230\n\nIn [96]: stacked.unstack(1)\nOut[96]: \nsecond        one       two\nfirst                      \nbar   A  0.029399  
0.282696\n      B -0.542108 -0.087302\nbaz   A -1.575170  0.816482\n      B  1.771208  1.100230\n\nIn [97]: stacked.unstack(0)\nOut[97]: \nfirst          bar       baz\nsecond                      \none    A  0.029399 -1.575170\n       B -0.542108  1.771208\ntwo    A  0.282696  0.816482\n       B -0.087302  1.100230<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"pivot-tables\">\n<h3>Pivot Tables<\/h3>\n<p>See the section on\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/reshaping.html#reshaping-pivot\"><em>Pivot Tables<\/em><\/a>.<\/p>\n<div>\n<div>\n<pre>In [98]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,\n   ....:                 'B' : ['A', 'B', 'C'] * 4,\n   ....:                 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,\n   ....:                 'D' : np.random.randn(12),\n   ....:                 'E' : np.random.randn(12)})\n   ....:\n\nIn [99]: df\nOut[99]: \n        A  B    C         D         E\n0     one  A  foo  1.418757 -0.179666\n1     one  B  foo -1.879024  1.291836\n2     two  C  foo  0.536826 -0.009614\n3   three  A  bar  1.006160  0.392149\n4     one  B  bar -0.029716  0.264599\n5     one  C  bar -1.146178 -0.057409\n6     two  A  foo  0.100900 -1.425638\n7   three  B  foo -1.035018  1.024098\n8     one  C  foo  0.314665 -0.106062\n9     one  A  bar -0.773723  1.824375\n10    two  B  bar -1.170653  0.595974\n11  three  C  bar  0.648740  1.167115<\/pre>\n<\/div>\n<\/div>\n<p>We can produce pivot tables from this data very easily:<\/p>\n<div>\n<div>\n<pre>In [100]: pd.pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])\nOut[100]: \nC             bar       foo\nA     B                    \none   A -0.773723  1.418757\n      B -0.029716 -1.879024\n      C -1.146178  0.314665\nthree A  1.006160       NaN\n      B       NaN -1.035018\n      C  0.648740       NaN\ntwo   A       NaN  0.100900\n      B -1.170653       NaN\n      C       NaN  0.536826<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div 
id=\"time-series\">\n<h2>Time Series<\/h2>\n<p>Pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting 1-second data into 5-minute data). This is extremely common in, but not limited to, financial applications. See the\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/timeseries.html#timeseries\"><em>Time Series section<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [101]: rng = pd.date_range('1\/1\/2012', periods=100, freq='S')\n\nIn [102]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)\n\nIn [103]: ts.resample('5Min', how='sum')\nOut[103]: \n2012-01-01    25083\nFreq: 5T, dtype: int64<\/pre>\n<\/div>\n<\/div>\n<p>Time zone representation<\/p>\n<div>\n<div>\n<pre>In [104]: rng = pd.date_range('3\/6\/2012 00:00', periods=5, freq='D')\n\nIn [105]: ts = pd.Series(np.random.randn(len(rng)), rng)\n\nIn [106]: ts_utc = ts.tz_localize('UTC')\n\nIn [107]: ts_utc\nOut[107]: \n2012-03-06 00:00:00+00:00    0.464000\n2012-03-07 00:00:00+00:00    0.227371\n2012-03-08 00:00:00+00:00   -0.496922\n2012-03-09 00:00:00+00:00    0.306389\n2012-03-10 00:00:00+00:00   -2.290613\nFreq: D, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Convert to another time zone<\/p>\n<div>\n<div>\n<pre>In [108]: ts_utc.tz_convert('US\/Eastern')\nOut[108]: \n2012-03-05 19:00:00-05:00    0.464000\n2012-03-06 19:00:00-05:00    0.227371\n2012-03-07 19:00:00-05:00   -0.496922\n2012-03-08 19:00:00-05:00    0.306389\n2012-03-09 19:00:00-05:00   -2.290613\nFreq: D, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Converting between time span representations<\/p>\n<div>\n<div>\n<pre>In [109]: rng = pd.date_range('1\/1\/2012', periods=5, freq='M')\n\nIn [110]: ts = pd.Series(np.random.randn(len(rng)), index=rng)\n\nIn [111]: ts\nOut[111]: \n2012-01-31   -1.134623\n2012-02-29   -1.561819\n2012-03-31   -0.260838\n2012-04-30    0.281957\n2012-05-31    1.523962\nFreq: M, dtype: float64\n\nIn [112]: ps = ts.to_period()\n\nIn [113]: ps\nOut[113]: 
\n2012-01   -1.134623\n2012-02   -1.561819\n2012-03   -0.260838\n2012-04    0.281957\n2012-05    1.523962\nFreq: M, dtype: float64\n\nIn [114]: ps.to_timestamp()\nOut[114]: \n2012-01-01   -1.134623\n2012-02-01   -1.561819\n2012-03-01   -0.260838\n2012-04-01    0.281957\n2012-05-01    1.523962\nFreq: MS, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<p>Converting between period and timestamp enables some convenient arithmetic functions to be used. In the following example, we convert a quarterly frequency with year ending in November to 9am of the end of the month following the quarter end:<\/p>\n<div>\n<div>\n<pre>In [115]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')\n\nIn [116]: ts = pd.Series(np.random.randn(len(prng)), prng)\n\nIn [117]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9\n\nIn [118]: ts.head()\nOut[118]: \n1990-03-01 09:00   -0.902937\n1990-06-01 09:00    0.068159\n1990-09-01 09:00   -0.057873\n1990-12-01 09:00   -0.368204\n1991-03-01 09:00   -1.144073\nFreq: H, dtype: float64<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"plotting\">\n<h2>Plotting<\/h2>\n<p><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/visualization.html#visualization\"><em>Plotting<\/em><\/a>\u00a0docs. The examples in this section also assume <tt>import matplotlib.pyplot as plt<\/tt>.<\/p>\n<div>\n<div>\n<pre>In [119]: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1\/1\/2000', periods=1000))\n\nIn [120]: ts = ts.cumsum()\n\nIn [121]: ts.plot()\nOut[121]: &lt;matplotlib.axes.AxesSubplot at 0x43c13d0&gt;<\/pre>\n<\/div>\n<\/div>\n<p><img decoding=\"async\" alt=\"_images\/series_plot_basic.png\" src=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/_images\/series_plot_basic.png\" \/>On DataFrame,\u00a0<tt>plot<\/tt>\u00a0is a convenience to plot all of the columns with labels:<\/p>\n<div>\n<div>\n<pre>In [122]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,\n   .....:                   columns=['A', 'B', 'C', 'D'])\n   .....:\n\nIn [123]: df = df.cumsum()\n\nIn [124]: plt.figure(); df.plot(); plt.legend(loc='best')\nOut[124]: 
&lt;matplotlib.legend.Legend at 0x5857e90&gt;<\/pre>\n<\/div>\n<\/div>\n<p><img decoding=\"async\" alt=\"_images\/frame_plot_basic.png\" src=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/_images\/frame_plot_basic.png\" \/><\/p><\/div>\n<div id=\"getting-data-in-out\">\n<h2>Getting Data In\/Out<\/h2>\n<div id=\"csv\">\n<h3>CSV<\/h3>\n<p><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/io.html#io-store-in-csv\"><em>Writing to a csv file<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [125]: df.to_csv('foo.csv')<\/pre>\n<\/div>\n<\/div>\n<p><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/io.html#io-read-csv-table\"><em>Reading from a csv file<\/em><\/a><\/p>\n<div>\n<div>\n<pre>In [126]: pd.read_csv('foo.csv')\nOut[126]: \n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 1000 entries, 0 to 999\nData columns (total 5 columns):\nUnnamed: 0    1000  non-null values\nA             1000  non-null values\nB             1000  non-null values\nC             1000  non-null values\nD             1000  non-null values\ndtypes: float64(4), object(1)<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"hdf5\">\n<h3>HDF5<\/h3>\n<p>Reading and writing to\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/io.html#io-hdf5\"><em>HDFStores<\/em><\/a><\/p>\n<p>Writing to an HDF5 Store<\/p>\n<div>\n<div>\n<pre>In [127]: store = pd.HDFStore('foo.h5')\n\nIn [128]: store['df'] = df<\/pre>\n<\/div>\n<\/div>\n<p>Reading from an HDF5 Store<\/p>\n<div>\n<div>\n<pre>In [129]: store['df']\nOut[129]: \n&lt;class 'pandas.core.frame.DataFrame'&gt;\nDatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00\nFreq: D\nData columns (total 4 columns):\nA    1000  non-null values\nB    1000  non-null values\nC    1000  non-null values\nD    1000  non-null values\ndtypes: float64(4)<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"excel\">\n<h3>Excel<\/h3>\n<p>Reading and writing to\u00a0<a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/dev\/io.html#io-excel\"><em>MS Excel<\/em><\/a><\/p>\n<p>Writing to an Excel file<\/p>\n<div>\n<div>\n<pre>In [130]: df.to_excel('foo.xlsx', sheet_name='sheet1')<\/pre>\n<\/div>\n<\/div>\n<p>Reading from an Excel file<\/p>\n<div>\n<div>\n<pre>In [131]: xls = pd.ExcelFile('foo.xlsx')\n\nIn [132]: xls.parse('sheet1', index_col=None, na_values=['NA'])\nOut[132]: \n&lt;class 'pandas.core.frame.DataFrame'&gt;\nDatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00\nData columns (total 4 columns):\nA    1000  non-null values\nB    1000  non-null values\nC    1000  non-null values\nD    1000  non-null values\ndtypes: float64(4)<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"class_list":["post-131","post","type-post","status-publish","format-standard","hentry","category-python"]}