Extract substrings defined by regular expressions from a vector of strings

extractSubstring(pattern, x, index, stringsAsFactors = FALSE)

Arguments

pattern

regular expression in which each pair of opening and closing parentheses encloses a part (subexpression) to be extracted

x

vector of character strings

index

index(es) of the parenthesized subexpression(s) to be extracted. If more than one index is given, a data frame is returned in which each column contains the substrings matching the subexpression at the corresponding index. If index is named, the names are used as column names; unnamed entries get default names of the form subexp.<index>.

stringsAsFactors

if TRUE (default is FALSE) and a data frame is returned, the columns of the returned data frame are factors; otherwise they are character vectors.
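To illustrate how index relates to the parenthesized subexpressions, here is a minimal sketch in base R. It is not the implementation of extractSubstring; the helper name extract_groups and its handling of non-matching strings (returning NA) are assumptions made for this illustration only.

```r
# Hypothetical helper (NOT part of the documented package): select
# parenthesized subexpressions by index using base R only.
extract_groups <- function(pattern, x, index, stringsAsFactors = FALSE) {
  # For each string, regmatches() returns the full match followed by the
  # text matched by each parenthesized subexpression
  matches <- regmatches(x, regexec(pattern, x))

  # Element 1 is the full match, so subexpression i sits at position i + 1.
  # Assumption: strings that do not match yield NA.
  pick <- function(i) {
    vapply(matches, function(m) {
      if (length(m) > i) m[i + 1] else NA_character_
    }, character(1), USE.NAMES = FALSE)
  }

  # A single index gives a plain character vector
  if (length(index) == 1) {
    return(pick(index))
  }

  # Several indices give one column per subexpression
  columns <- lapply(index, pick)

  # Use the names given in index; unnamed entries get a default name
  default_names <- paste0("subexp.", index)
  given_names <- names(index)
  names(columns) <- if (is.null(given_names)) {
    default_names
  } else {
    ifelse(given_names == "", default_names, given_names)
  }

  as.data.frame(columns, stringsAsFactors = stringsAsFactors)
}

pattern <- "([^ ]+), ([0-9]+) of ([^ ]+)"
datestrings <- c("Thursday, 8 of December", "Tuesday, 14 of January")

extract_groups(pattern, datestrings, 2) # "8" "14"
extract_groups(pattern, datestrings, c(weekday = 1, 2, month = 3))
```

The sketch relies on regexec() reporting the full match first, which is why subexpression i is read from position i + 1 of each match.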

Examples

# Define pattern matching a date
pattern <- "([^ ]+), ([0-9]+) of ([^ ]+)"

# Extract single sub expressions from one string
datestring <- "Thursday, 8 of December"

extractSubstring(pattern, datestring, 1) # "Thursday"
#> [1] "Thursday"
extractSubstring(pattern, datestring, 2) # "8"
#> [1] "8"
extractSubstring(pattern, datestring, 3) # "December"
#> [1] "December"

# Extract single sub expressions from a vector of strings
datestrings <- c("Thursday, 8 of December", "Tuesday, 14 of January")

extractSubstring(pattern, datestrings, 1) # "Thursday" "Tuesday"
#> [1] "Thursday" "Tuesday"
extractSubstring(pattern, datestrings, 2) # "8" "14"
#> [1] "8" "14"
extractSubstring(pattern, datestrings, 3) # "December" "January"
#> [1] "December" "January"

# Extract more than one subexpression at once -> data.frame
extractSubstring(pattern, datestrings, 1:3)
#>   subexp.1 subexp.2 subexp.3
#> 1 Thursday        8 December
#> 2  Tuesday       14  January

#   subexp.1 subexp.2 subexp.3
# 1 Thursday        8 December
# 2  Tuesday       14  January

# Name the sub expressions by naming their number in index (3rd argument)
extractSubstring(pattern, datestrings, index = c(weekday = 1, 2, month = 3))
#>    weekday subexp.2    month
#> 1 Thursday        8 December
#> 2  Tuesday       14  January

#    weekday subexp.2    month
# 1 Thursday        8 December
# 2  Tuesday       14  January