{"id":760,"date":"2015-02-16T12:53:56","date_gmt":"2015-02-16T19:53:56","guid":{"rendered":"http:\/\/homepages.uc.edu\/~yaozo\/wordpress\/?p=760"},"modified":"2015-02-16T12:53:56","modified_gmt":"2015-02-16T19:53:56","slug":"r-regular-expression","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2015\/02\/16\/r-regular-expression\/","title":{"rendered":"R Regular Expression"},"content":{"rendered":"<p>R has various functions for regular-expression-based matching and replacement. The <code>grep<\/code>, <code>grepl<\/code>, <code>regexpr<\/code> and <code>gregexpr<\/code> functions search for matches, while <a href=\"http:\/\/www.endmemo.com\/program\/R\/sub.php\"><code>sub<\/code><\/a> and <a href=\"http:\/\/www.endmemo.com\/program\/R\/gsub.php\"><code>gsub<\/code><\/a> perform replacement.<\/p>\n<p>&nbsp;<\/p>\n<p>\u2022\u00a0<code>grep(value = FALSE)<\/code> returns an integer vector of the indices of the elements of <code>x<\/code> that yielded a match (or not, for <code>invert = TRUE<\/code>).<\/p>\n<pre class=\"r\">&gt;str &lt;- c(\"Regular\", \"expression\", \"examples of R language\")\n&gt;x &lt;- grep(\"ex\",str,value=FALSE)\n&gt;x\n<\/pre>\n<p>[1] 2 3<\/p>\n<pre class=\"r\">&gt;x &lt;- \"line 4322: He is now 25 years old, and weights 130lbs\"\n&gt;x &lt;- grep(\"\\\\d\",x)\n&gt;x\n<\/pre>\n<pre class=\"b\">[1] 1\n<\/pre>\n<p>&nbsp;<\/p>\n<p>\u2022\u00a0<code>grep(value = TRUE)<\/code> returns a character vector containing the selected elements of <code>x<\/code> (after coercion, preserving names but no other attributes).<\/p>\n<pre class=\"r\">&gt;x &lt;- grep(\"ex\",str,value=TRUE)\n&gt;x\n<\/pre>\n<p>[1] &quot;expression&quot; &quot;examples of R language&quot;<\/p>\n<p>\u2022\u00a0<code>grepl<\/code> returns a logical vector (match or not for each element of <code>x<\/code>).<\/p>\n<pre class=\"r\">&gt;x &lt;- grepl(\"ex\",str)\n&gt;x\n[1] FALSE  TRUE  
TRUE\n<\/pre>\n<p>&nbsp;<\/p>\n<p>\u2022\u00a0<code>sub<\/code> and <code>gsub<\/code> return a character vector of the same length and with the same attributes as <code>x<\/code> (after possible coercion to character). Elements of character vectors <code>x<\/code> which are not substituted are returned unchanged (including any declared encoding). If <code>useBytes = FALSE<\/code>, a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g. if there is a UTF-8 input, and in a multibyte locale unless <code>fixed = TRUE<\/code>).<\/p>\n<pre class=\"r\">&gt;str &lt;- c(\"Regular\", \"expression\", \"examples of R language\")\n&gt;x &lt;- sub(\"x.ress\",\"\",str)\n&gt;x\n<\/pre>\n<p>[1] &quot;Regular&quot; &quot;eion&quot; &quot;examples of R language&quot;<\/p>\n<pre class=\"r\">&gt;x &lt;- sub(\"x.+e\",\"\",str)\n&gt;x\n<\/pre>\n<p>[1] &quot;Regular&quot; &quot;ession&quot; &quot;e&quot;<\/p>\n<pre class=\"r\">&gt;x &lt;- \"line 4322: He is now 25 years old, and weights 130lbs\"\n&gt;x &lt;- gsub(\"[[:digit:]]\",\"\",x)\n&gt;x\n<\/pre>\n<pre class=\"b\">[1] \"line : He is now  years old, and weights lbs\"\n<\/pre>\n<p>&nbsp;<\/p>\n<pre class=\"r\">&gt;x &lt;- \"line 4322: He is now 25 years old, and weights 130lbs\"\n&gt;x &lt;- gsub(\"\\\\d+\",\"\",x)\n&gt;x\n<\/pre>\n<pre class=\"b\">[1] \"line : He is now  years old, and weights lbs\"\n<\/pre>\n<p>&nbsp;<\/p>\n<p>\u2022\u00a0<code>regexpr<\/code> returns an integer vector of the same length as <code>text<\/code> giving the starting position of the first match or <i>-1<\/i> if there is none, with attribute <code>\"match.length\"<\/code>, an integer vector giving the length of the matched text (or <i>-1<\/i> for no match). 
The match positions and lengths are in characters unless <code>useBytes = TRUE<\/code> is used, when they are in bytes.<\/p>\n<pre class=\"r\">&gt;str &lt;- c(\"Regular\", \"expression\", \"examples of R language\")\n&gt;x &lt;- regexpr(\"x*ress\",str)\n&gt;x\n<\/pre>\n<p>[1] -1 4 -1<\/p>\n<p>\u2022\u00a0<code>gregexpr<\/code> returns a list of the same length as <code>text<\/code> each element of which is of the same form as the return value for <code>regexpr<\/code>, except that the starting positions of every (disjoint) match are given.<\/p>\n<pre class=\"r\">&gt;str &lt;- c(\"Regular\", \"expression\", \"examples of R language\")\n&gt;x &lt;- gregexpr(\"x*ress\",str)\n&gt;x\n<\/pre>\n<pre class=\"b\">[[1]]\n[1] -1\nattr(,\"match.length\")\n[1] -1\nattr(,\"useBytes\")\n[1] TRUE\n\n[[2]]\n[1] 4\nattr(,\"match.length\")\n[1] 4\nattr(,\"useBytes\")\n[1] TRUE\n\n[[3]]\n[1] -1\nattr(,\"match.length\")\n[1] -1\nattr(,\"useBytes\")\n[1] TRUE\n<\/pre>\n<p>&nbsp;<\/p>\n<h3>Function Syntax:<\/h3>\n<pre>\ngrep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,\n     fixed = FALSE, useBytes = FALSE, invert = FALSE)\n\ngrepl(pattern, x, ignore.case = FALSE, perl = FALSE,\n      fixed = FALSE, useBytes = FALSE)\n\nsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,\n    fixed = FALSE, useBytes = FALSE)\n\ngsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,\n     fixed = FALSE, useBytes = FALSE)\n\nregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,\n        fixed = FALSE, useBytes = FALSE)\n\ngregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,\n         fixed = FALSE, useBytes = FALSE)\n\n<\/pre>\n<p>&nbsp;<\/p>\n<h3>Regular Expression Syntax:<\/h3>\n<table class=\"countrytable\" width=\"580\">\n<tbody>\n<tr bgcolor=\"#eeeeee\">\n<td width=\"100\">Syntax<\/td>\n<td width=\"480\">Description<\/td>\n<\/tr>\n<tr>\n<td>\\\\d<\/td>\n<td>Digit, 0,1,2 &#8230; 9<\/td>\n<\/tr>\n<tr>\n<td>\\\\D<\/td>\n<td>Not 
Digit<\/td>\n<\/tr>\n<tr>\n<td>\\s<\/td>\n<td>Whitespace character<\/td>\n<\/tr>\n<tr>\n<td>\\S<\/td>\n<td>Non-whitespace character<\/td>\n<\/tr>\n<tr>\n<td>\\w<\/td>\n<td>Word character (letter, digit or underscore)<\/td>\n<\/tr>\n<tr>\n<td>\\W<\/td>\n<td>Non-word character<\/td>\n<\/tr>\n<tr>\n<td>\\t<\/td>\n<td>Tab<\/td>\n<\/tr>\n<tr>\n<td>\\n<\/td>\n<td>New line<\/td>\n<\/tr>\n<tr>\n<td>^<\/td>\n<td>Beginning of the string<\/td>\n<\/tr>\n<tr>\n<td>$<\/td>\n<td>End of the string<\/td>\n<\/tr>\n<tr>\n<td>\\<\/td>\n<td>Escape special characters, e.g. \\\\ is &#8220;\\&#8221;, \\+ is &#8220;+&#8221;<\/td>\n<\/tr>\n<tr>\n<td>|<\/td>\n<td>Alternation, e.g. (e|d)n matches &#8220;en&#8221; and &#8220;dn&#8221;<\/td>\n<\/tr>\n<tr>\n<td>.<\/td>\n<td>Any single character (with <code>perl = TRUE<\/code>, any character except \\n)<\/td>\n<\/tr>\n<tr>\n<td>[ab]<\/td>\n<td>a or b<\/td>\n<\/tr>\n<tr>\n<td>[^ab]<\/td>\n<td>Any character except a and b<\/td>\n<\/tr>\n<tr>\n<td>[0-9]<\/td>\n<td>Any digit<\/td>\n<\/tr>\n<tr>\n<td>[A-Z]<\/td>\n<td>Any uppercase letter A to Z<\/td>\n<\/tr>\n<tr>\n<td>[a-z]<\/td>\n<td>Any lowercase letter a to z<\/td>\n<\/tr>\n<tr>\n<td>[A-Za-z]<\/td>\n<td>Any uppercase or lowercase letter A to Z<\/td>\n<\/tr>\n<tr>\n<td>i+<\/td>\n<td>i at least one time<\/td>\n<\/tr>\n<tr>\n<td>i*<\/td>\n<td>i zero or more times<\/td>\n<\/tr>\n<tr>\n<td>i?<\/td>\n<td>i zero or 1 time<\/td>\n<\/tr>\n<tr>\n<td>i{n}<\/td>\n<td>i occurs n times in sequence<\/td>\n<\/tr>\n<tr>\n<td>i{n1,n2}<\/td>\n<td>i occurs n1 to n2 times in sequence<\/td>\n<\/tr>\n<tr>\n<td>i{n1,n2}?<\/td>\n<td>Non-greedy match: i occurs n1 to n2 times, as few times as possible<\/td>\n<\/tr>\n<tr>\n<td>i{n,}<\/td>\n<td>i occurs &gt;= n times<\/td>\n<\/tr>\n<tr>\n<td>[:alnum:]<\/td>\n<td>Alphanumeric characters: [:alpha:] and [:digit:]<\/td>\n<\/tr>\n<tr>\n<td>[:alpha:]<\/td>\n<td>Alphabetic characters: [:lower:] and [:upper:]<\/td>\n<\/tr>\n<tr>\n<td>[:blank:]<\/td>\n<td>Blank characters: e.g. 
space, tab<\/td>\n<\/tr>\n<tr>\n<td>[:cntrl:]<\/td>\n<td>Control characters<\/td>\n<\/tr>\n<tr>\n<td>[:digit:]<\/td>\n<td>Digits: 0 1 2 3 4 5 6 7 8 9<\/td>\n<\/tr>\n<tr>\n<td>[:graph:]<\/td>\n<td>Graphical characters: [:alnum:] and [:punct:]<\/td>\n<\/tr>\n<tr>\n<td>[:lower:]<\/td>\n<td>Lower-case letters in the current locale<\/td>\n<\/tr>\n<tr>\n<td>[:print:]<\/td>\n<td>Printable characters: [:alnum:], [:punct:] and space<\/td>\n<\/tr>\n<tr>\n<td>[:punct:]<\/td>\n<td>Punctuation characters: ! &quot; # $ % &amp; ' ( ) * + , - . \/ : ; &lt; = &gt; ? @ [ \\ ] ^ _ ` { | } ~<\/td>\n<\/tr>\n<tr>\n<td>[:space:]<\/td>\n<td>Space characters: tab, newline, vertical tab, form feed, carriage return, space<\/td>\n<\/tr>\n<tr>\n<td>[:upper:]<\/td>\n<td>Upper-case letters in the current locale<\/td>\n<\/tr>\n<tr>\n<td>[:xdigit:]<\/td>\n<td>Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>R has various functions for regular-expression-based matching and replacement. 
The grep, grepl, regexpr and gregexpr functions are used for searching for matches, while sub&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-760","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/760","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=760"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/760\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=760"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=760"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=760"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}