{"id":157,"date":"2013-04-01T23:28:45","date_gmt":"2013-04-02T04:28:45","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=157"},"modified":"2013-04-01T23:28:45","modified_gmt":"2013-04-02T04:28:45","slug":"manipulating-excel-files-using-python-part-1-reading-excel-files","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/04\/01\/manipulating-excel-files-using-python-part-1-reading-excel-files\/","title":{"rendered":"Manipulating Excel files using Python part 1: Reading Excel Files"},"content":{"rendered":"<p>It is often the case that the freely available data online are in Excel format. If one has Excel, then one has the ability to do some sort of basic manipulation of the files. But if Excel is not available, or your analysis software does not read Excel files, there is another way: use Python to manipulate Excel files.<\/p>\n<p><img decoding=\"async\" title=\"excelPreview\" alt=\"excelPreview\" src=\"http:\/\/michalisavraam.org\/wp-content\/uploads\/2009\/06\/excelPreview.png\" width=\"461\" \/>Before continuing, let\u2019s discuss the basic idea of Excel. This programs stores data in what is called a<strong>Workbook<\/strong>. Think of this as the file on your computer (the .xls). A Workbook can then have one or more\u00a0<strong>Sheets<\/strong>, the little tabs on the bottom left corner usually. Within each Sheet, you can find an arrangement of\u00a0<strong>Cells<\/strong>\u00a0in a matrix form, which are referenced by Column Name and Row Number (so the top left cell is A1, the one directly to its right is A2, the one below is B2, etc. The image should help shed some light if you are not familiar with the concept of spreadsheets.<\/p>\n<p>In order to\u00a0<strong>read<\/strong>\u00a0Excel sheets in Python without using any Microsoft Office code, we need to use the excellent<a title=\"xlrd Package Website\" href=\"http:\/\/www.lexicon.net\/sjmachin\/xlrd.htm\">xlrd packge<\/a>\u00a0by\u00a0<a title=\"Lingfo Website\" href=\"http:\/\/www.lexicon.net\/sjmachin\/\">Lingfo<\/a>. The basic workflow is as follows:<\/p>\n<ol>\n<li>Open a workbook using\u00a0<em>book = xlrd.open_workbook(\u201cfiletoread.xls\u201d)<\/em><\/li>\n<li>Access a sheet in the workbook using\u00a0<em>sheet = book.sheet_by_index(0)<\/em>\u00a0to open the first sheet, or if you know the name,\u00a0<em>sheet = book.sheet_by_name(\u201cName of Sheet\u201d)<\/em><\/li>\n<li>Retrieve data using any of the multiple methods:\u00a0<em>cellValue =sheet.cell_value(rowx, colx)<\/em>, or to retrieve a whole column use\u00a0<em>colValues = sheet.col_values(colx, start_rowx=0, end_rowx=None)<\/em>\u00a0or the retrieve a whole row use\u00a0<em>rowValues = sheet.row_values(rowx, start_colx=0, end_colx=None)<\/em><\/li>\n<\/ol>\n<p>The basic idea is very simple, and we can easily write a little Python script that will allow us to read Excel files without the need to use Excel itself. The following example will offer a preview of an Excel file\u2019s first sheet using the first 5 rows and columns of a file.<\/p>\n<div>\n<div id=\"highlighter_613572\">\n<div><a href=\"http:\/\/michalisavraam.org\/2009\/06\/manipulating-excel-files-using-python-part-1-reading-files\/#\">?<\/a><\/div>\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>\n<div>1<\/div>\n<div>2<\/div>\n<div>3<\/div>\n<div>4<\/div>\n<div>5<\/div>\n<div>6<\/div>\n<div>7<\/div>\n<div>8<\/div>\n<div>9<\/div>\n<div>10<\/div>\n<\/td>\n<td>\n<div>\n<div><code>import<\/code> <code>xlrd <\/code><code># Import the package<\/code><\/div>\n<div><code>book <\/code><code>=<\/code> <code>xlrd.open_workbook(&amp;amp;amp;quot;sample.xls&amp;amp;amp;quot;) <\/code><code># Open an .xls file<\/code><\/div>\n<div><code>sheet <\/code><code>=<\/code> <code>book.sheet_by_index(<\/code><code>0<\/code><code>) <\/code><code># Get the first sheet<\/code><\/div>\n<div><code>for<\/code> <code>counter <\/code><code>in<\/code> <code>range<\/code><code>(<\/code><code>5<\/code><code>): <\/code><code># Loop for five times<\/code><\/div>\n<div><code># grab the current row<\/code><\/div>\n<div><code>rowValues <\/code><code>=<\/code> <code>sheet.row_values(counter,start_col<\/code><code>=<\/code><code>0<\/code><code>, end_colx<\/code><code>=<\/code><code>4<\/code><code>)<\/code><\/div>\n<div><code># Print the values of the row formatted to 10 characters wide<\/code><\/div>\n<div><code>print<\/code> <code>&amp;amp;amp;quot;<\/code><code>%<\/code><code>-<\/code><code>10s<\/code> <code>| <\/code><code>%<\/code><code>-<\/code><code>10s<\/code> <code>| <\/code><code>%<\/code><code>-<\/code><code>10s<\/code> <code>| <\/code><code>%<\/code><code>-<\/code><code>10s<\/code> <code>| <\/code><code>%<\/code><code>-<\/code><code>10s<\/code><code>&amp;amp;amp;quot; <\/code><code>%<\/code> <code>tuple<\/code><code>(rowValues)<\/code><\/div>\n<div><code># Print row separator<\/code><\/div>\n<div><code>print<\/code> <code>&amp;amp;amp;quot;<\/code><code>-<\/code><code>&amp;amp;amp;quot; <\/code><code>*<\/code><code>62<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>That is all there is to it. Quick, fast and no need for Excel whatsoever. Sample output is presented below:<\/p>\n<p><img decoding=\"async\" title=\"sampleOutput\" alt=\"sampleOutput\" src=\"http:\/\/michalisavraam.org\/wp-content\/uploads\/2009\/06\/sampleOutput.png\" width=\"585\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is often the case that the freely available data online are in Excel format. If one has Excel, then one has the ability to&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"class_list":["post-157","post","type-post","status-publish","format-standard","hentry","category-python"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/157","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=157"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/157\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=157"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=157"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=157"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}