{"id":341,"date":"2013-11-25T14:35:23","date_gmt":"2013-11-25T19:35:23","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=341"},"modified":"2013-11-25T14:35:23","modified_gmt":"2013-11-25T19:35:23","slug":"rmysql-examples","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2013\/11\/25\/rmysql-examples\/","title":{"rendered":"RMySQL Examples"},"content":{"rendered":"<h3>Definitions &#8211; Databases, Queries, Relational Databases, SQL, MySQL<\/h3>\n<p>A database is a structured collection of records or data that is stored in a computer system. A database relies upon software to organize the storage of data.<\/p>\n<p>A query is a line of text that a user sends to a database telling it what you want it to do. It is usually written in SQL (Structured query language).<\/p>\n<p>Example of SQL query:<\/p>\n<pre>select * from books where price &lt; 20.00 order by title;<\/pre>\n<p>The term &#8220;relational database&#8221; comes from &#8220;Relation&#8221;, a mathematical term loosely meaning &#8220;table&#8221;, and thus &#8220;relational&#8221; roughly means &#8220;based on tables&#8221;.<\/p>\n<p>MySQL is a popular database management system within which you can set up and query databases. Many web applications use databases. Youtube, flickr, and wikipedia are all built on MySQL databases. A lot of biological databases (e.g. GO, ensembl) are already available in mysql database tables, or are easily input into a database for easy querying.<\/p>\n<p>For example, lets say I have a database table with yeast genes and some information about them (created by parsing a text file from SGD):<\/p>\n<p>&nbsp;<\/p>\n<pre>Here's the first 5 rows of the table, as an example.\n\nmysql&gt; select id,chrom,start,end,strand,sys_id,gene_name from yeast_genes_07 limit 5;\n+----+-------+-------+------+--------+-----------+-----------+\n| id | chrom | start | end  | strand | sys_id    | gene_name |\n+----+-------+-------+------+--------+-----------+-----------+\n|  1 | 1     |   335 |  649 | W      | YAL069W   | NULL      |\n|  2 | 1     |   538 |  792 | W      | YAL068W-A | NULL      |\n|  3 | 1     |  1807 | 2169 | C      | YAL068C   | NULL      |\n|  4 | 1     |  2480 | 2707 | W      | YAL067W-A | NULL      |\n|  5 | 1     |  7236 | 9017 | C      | YAL067C   | SEO1      |\n+----+-------+-------+------+--------+-----------+-----------+\n5 rows in set (0.00 sec)<\/pre>\n<p>Using a database, I can slice and dice the information in this table using almost-english queries like this:<\/p>\n<p>&nbsp;<\/p>\n<pre>select * from yeast_genes_07 where strand = 'W' and start &lt; 5000 and chrom = 5;<\/pre>\n<p>which means &#8211; show me all the information about genes whose strand is W, start is less than 5000, and are on chromosome 5.<\/p>\n<p>We have a bunch of databases here at stowers. Additionally, MySQL is installed by default on pretty much any linux system I&#8217;ve ever run into.<\/p>\n<p>&nbsp;<\/p>\n<h3>RMySQL<\/h3>\n<p>RMySQL is a little tricky to install, depending on your version of R. Here&#8217;s some places to look for more info if you have trouble:<a href=\"http:\/\/www.phaget4.org\/R\/R_notes.html\">http:\/\/www.phaget4.org\/R\/R_notes.html<\/a><br \/>\n<a href=\"http:\/\/tolstoy.newcastle.edu.au\/R\/e2\/help\/07\/10\/28442.html\">http:\/\/tolstoy.newcastle.edu.au\/R\/e2\/help\/07\/10\/28442.html<\/a><\/p>\n<p>I currently have it installed in R 2.4.0, but couldn&#8217;t get it to work in 2.5.1.<\/p>\n<p>Download from:<br \/>\n<a href=\"http:\/\/cran.r-project.org\/web\/packages\/RMySQL\/index.html\">CRAN (non-windows only)<\/a><br \/>\n<a href=\"http:\/\/www.stats.ox.ac.uk\/pub\/RWin\/bin\/windows\/contrib\/2.6\/\">Brian Ripley, windows versions<\/a><\/p>\n<p>RMySQL allows you to access MySQL databases from within R.<\/p>\n<p>Why would you want to do this?<\/p>\n<p>Because writing SQL queries is intuitive and powerful. And a database is a good place for information that you want to access in different, flexible ways.<\/p>\n<p>&nbsp;<\/p>\n<h2>RMySQL Examples<\/h2>\n<p>This is actually how I most often use RMySQL, just to slurp in a whole database table at once. I do this because I like storing information in databases so it is easy to access from R, perl, php, or within mysql itself &#8211; Once a data set is in the database, I don&#8217;t have to worry about parsing a file anytime I want to look at it, I just grab it from the database without worry about delimiters or column order.<\/p>\n<p>&nbsp;<\/p>\n<pre>library(DBI)\n\nlibrary(RMySQL)\n\nm&lt;-dbDriver(\"MySQL\");\ncon&lt;-dbConnect(m,user='mcm',password='welcome323',host='omega',dbname='yeast');\nres&lt;-dbSendQuery(con, \"select * from yeast_genes_07\")\ngenes&lt;- fetch(res, n = -1)<\/pre>\n<p>You can do whatever mysql query you want, though. You can grab only the parts of the table you&#8217;re interested in based on different column values.<\/p>\n<pre>res&lt;-dbSendQuery(con, \"select * from yeast_genes_07 where description like \\\"%phosphatase%\\\"\")\nphosphatase_genes&lt;- fetch(res, n = -1)\n\nres&lt;-dbSendQuery(con, \"select chrom,start,end,sys_id,strand from yeast_genes_07 where strand = \\\"C\\\" and chrom = 1\")\n\nc1_genes&lt;- fetch(res, n = -1)<\/pre>\n<p>&nbsp;<\/p>\n<h3>RODBC<\/h3>\n<p>You can use the RODBC package to connect to an excel file or access database. Download from:\u00a0<a href=\"http:\/\/cran.r-project.org\/web\/packages\/RODBC\/index.html\">http:\/\/cran.r-project.org\/web\/packages\/RODBC\/index.html<\/a><\/p>\n<p>&nbsp;<\/p>\n<h2>RODBC-Excel example<\/h2>\n<p>Warning &#8211; the excel file has to have a pretty regular format or lots of glitches will pop up.<\/p>\n<pre># efg, 6 August 2007\n# modified by mcm, 4\/14\/2008\n\nlibrary(RODBC)\n\nsetwd(\"U:\/mcm\/presentations\/RDisc\/db\/\");\nfilename &lt;-  \"yeast_genes_07.xls\"\nchannel &lt;- odbcConnectExcel(filename)\nsqlTables(channel)\nd &lt;- sqlFetch(channel,\"yeast_genes_07\", as.is=TRUE) #this is the sheet name\nodbcClose(channel)\nnames(d)\n\n# Any SQL query can be used.  (Note \"$\" on sheet name is needed here)\nchannel &lt;- odbcConnectExcel(filename)\n\n#pick some columns from the table\n\ns &lt;- sqlQuery(channel, \"select chrom,start,end,sys_id from [yeast_genes_07$]\" )\ns\n\n#pick only chromosome 3\ns &lt;- sqlQuery(channel, \"select chrom,start,end,sys_id from [yeast_genes_07$] where chrom = 3\" )\ns\n\n#chromosome 3 and strand W\ns &lt;- sqlQuery(channel, \"select chrom,start,end,sys_id from [yeast_genes_07$] where chrom = 3 and strand = \\'W\\' \" )\ns\n\n#usually, if you want to limit the number of rows you retrieve, you use limit. This should work, but doesn't.\ns &lt;- sqlQuery(channel, \"select chrom,start,end,sys_id from [yeast_genes_07$] where chrom = 3 and strand = \\'W\\' limit 5\")\n\ns\n\n#if you need to limit the number of rows you retrieve with RODBC, use the max argument.\ns &lt;- sqlQuery(channel, \"select chrom,start,end,sys_id from [yeast_genes_07$] where chrom = 3 and strand = \\'W\\'\" ,max=5)\ns\n\nodbcClose(channel)<\/pre>\n<p>&nbsp;<\/p>\n<h2>RODBC-Access example<\/h2>\n<p>&nbsp;<\/p>\n<pre>library(RODBC)\nfilename &lt;- \"S:\/Bioinformatics\/Software\/Affy\/Data\/Mouse430_2\/Mouse430_2.mdb\"\nconnection &lt;- odbcConnectAccess(filename)\n\nsqlTables(connection)\n\n# fetch the whole table (fairly slow)\ncodes  &lt;- sqlFetch(connection,\"Annotation\")\n\n# fetch only ProbeIDs, GeneID and Gene title (much faster) \nquery &lt;- \"select [Probe Set ID], [Gene Title], [Gene Symbol] from Annotation\" \ncodes &lt;- sqlQuery(connection, query)\nodbcClose(connection)\n\nfilename &lt;- \"S:\/Bioinformatics\/Software\/Affy\/Data\/MG_U74Av2\/MG_U74Av2.mdb\"\nconnection &lt;- odbcConnectAccess(filename)\n\nsqlTables(connection)\n\nquery &lt;- \"SELECT [Probe Set ID], [Gene Title], [Gene Symbol] FROM Annotation\" \nd2 &lt;- sqlQuery(connection, query)\ndim(d2)\n\nquery &lt;- \"SELECT [Probe Set ID], [Gene Title], [Gene Symbol] FROM Annotation WHERE [Gene Symbol] = 'Tfrc'\" \nd3 &lt;- sqlQuery(connection, query, na.strings=\"NA\", as.is=TRUE)\nd3\n\n# Use \"%\" as a wildcard, not the \"*\" allowed in Access\nquery &lt;- \"SELECT [Probe Set ID], [Gene Symbol], [RefSeq Transcript ID]\n\n         FROM   Annotation\n         WHERE  ([Gene Symbol] LIKE 'Tf%') AND ([RefSeq Transcript ID] LIKE 'NM_%')\n         ORDER BY [Gene Symbol]\"\nd4 &lt;- sqlQuery(connection, query, na.strings=\"NA\", as.is=TRUE)\nd4\n\nodbcClose(connection)<\/pre>\n<p>&nbsp;<\/p>\n<h2>Earl&#8217;s RODBC notes<\/h2>\n<p>&nbsp;<\/p>\n<pre>DO NOT USE RODBC with Excel files unless the data are quite regular - no holes.  \n\nAny character fields mixed with numeric fileds are likely to be changed to NAs.    \n\nFirst row of Excel spreadsheet MUST have column headings.\n\n[http:\/\/www.omegahat.org\/RDCOMClient\/ RDCOMClient] is a package you can alternatively use to connect to an excel file, but it's not SQL-based. You can also save an excel file as text, tab delimited and then read it into R with the read.table command.<\/pre>\n<p>&nbsp;<\/p>\n<h2>R-database resources<\/h2>\n<p><a href=\"http:\/\/cran.r-project.org\/doc\/manuals\/R-data.pdf\">R data import and export manual<\/a>\u00a0<a href=\"https:\/\/stat.ethz.ch\/mailman\/listinfo\/r-sig-db\">R database mailing list<\/a>\u00a0(or google with term r-sig-db).<\/p>\n<p>&nbsp;<\/p>\n<h2>Other MySQL resources<\/h2>\n<p>Pronunciation: &#8220;MySQL&#8221; is officially pronounced My Ess Q ell, not &#8220;My sequel&#8221;. Just FYI.<br \/>\n<a href=\"http:\/\/www.mysql.com\/\">mysql.com<\/a><br \/>\n<a href=\"http:\/\/dev.mysql.com\/doc\/\">manuals<\/a><br \/>\n<a href=\"http:\/\/www.mysql.com\/products\/tools\/query-browser\/\">MySQL query browser (easy mysql interface)<\/a><br \/>\n<a href=\"http:\/\/www.ilovejackdaniels.com\/mysql_cheat_sheet.pdf\">mysql cheat sheet<\/a><br \/>\n<a href=\"http:\/\/www.nparikh.org\/unix\/mysql.php\">another cheat sheet<\/a><br \/>\n<a href=\"http:\/\/www.pantz.org\/software\/mysql\/mysqlcommands.html\">and some more mysql queries<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Definitions &#8211; Databases, Queries, Relational Databases, SQL, MySQL A database is a structured collection of records or data that is stored in a computer system.&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-341","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=341"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/341\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=341"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}