Extract substrings defined by regular expressions from a vector of strings
extractSubstring(pattern, x, index, stringsAsFactors = FALSE)
pattern | regular expression in which pairs of opening and closing parentheses enclose the part(s) to be extracted |
---|---|
x | vector of character strings |
index | index(es) of parenthesized subexpression(s) to be extracted. If the length of |
stringsAsFactors | if |
# Define pattern matching a date
pattern <- "([^ ]+), ([0-9]+) of ([^ ]+)"

# Extract single sub expressions from one string
datestring <- "Thursday, 8 of December"
extractSubstring(pattern, datestring, 1) # "Thursday"
#> [1] "Thursday"
extractSubstring(pattern, datestring, 2) # "8"
#> [1] "8"
extractSubstring(pattern, datestring, 3) # "December"
#> [1] "December"

# Extract single sub expressions from a vector of strings
datestrings <- c("Thursday, 8 of December", "Tuesday, 14 of January")
extractSubstring(pattern, datestrings, 1) # "Thursday" "Tuesday"
#> [1] "Thursday" "Tuesday"
extractSubstring(pattern, datestrings, 2) # "8" "14"
#> [1] "8"  "14"
extractSubstring(pattern, datestrings, 3) # "December" "January"
#> [1] "December" "January"

# Extract more than one subexpression at once -> data.frame
extractSubstring(pattern, datestrings, 1:3)
#>   subexp.1 subexp.2 subexp.3
#> 1 Thursday        8 December
#> 2  Tuesday       14  January

# Name the sub expressions by naming their number in index (3rd argument)
extractSubstring(pattern, datestrings, index = c(weekday = 1, 2, month = 3))
#>    weekday subexp.2    month
#> 1 Thursday        8 December
#> 2  Tuesday       14  January
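
For readers who want to see how this kind of extraction can be done without the package, the following is a minimal base-R sketch that approximates the behaviour shown in the examples. It is not the package's implementation: the helper name `extract_sub_sketch` is hypothetical, it omits the `stringsAsFactors` argument, and returning `NA` for strings that do not match the pattern is an assumption of this sketch, not documented behaviour of `extractSubstring`.

```r
# Hypothetical helper (not part of the package): approximates the behaviour
# shown in the examples above using only base R.
extract_sub_sketch <- function(pattern, x, index) {
  # regexec() locates the full match and each parenthesized group;
  # regmatches() turns those positions into substrings (one vector per string).
  matches <- regmatches(x, regexec(pattern, x))

  # Pick group i from each string. Element 1 is the full match, so group i
  # sits at position i + 1. Non-matching strings give NA (sketch assumption).
  pick <- function(i) {
    vapply(
      matches,
      function(m) if (length(m) > i) m[i + 1] else NA_character_,
      character(1),
      USE.NAMES = FALSE
    )
  }

  # A single index yields a character vector
  if (length(index) == 1) {
    return(pick(index))
  }

  # Several indices yield a data frame with one column per group.
  # Unnamed entries of index get the default name "subexp.<i>".
  cols <- lapply(index, pick)
  nms <- names(index)
  if (is.null(nms)) nms <- rep("", length(index))
  unnamed <- nms == ""
  nms[unnamed] <- paste0("subexp.", index)[unnamed]
  names(cols) <- nms
  as.data.frame(cols, stringsAsFactors = FALSE)
}

pattern <- "([^ ]+), ([0-9]+) of ([^ ]+)"
datestrings <- c("Thursday, 8 of December", "Tuesday, 14 of January")

extract_sub_sketch(pattern, datestrings, 2)
extract_sub_sketch(pattern, datestrings, c(weekday = 1, 2, month = 3))
```

The sketch relies on `regexec()` rather than `sub()` with backreferences so that all groups are captured in one pass and non-matching strings can be told apart from matches.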