This function finds the indices of elements in a character vector that partially match, or are similar to, a search term. It can be used to find exact or slightly mistyped elements in a string vector.

str_pos(search.string, find.term, maxdist = 2, part.dist.match = 0,
  show.pbar = FALSE)

Arguments

search.string

Character vector with string elements.

find.term

String that should be matched against the elements of search.string.

maxdist

Maximum distance between two string elements at which they are still treated as similar or equal. Smaller values mean less tolerance in matching.

part.dist.match

Activates similar matching (close distance strings) for parts (substrings) of the search.string. The following values are accepted:

  • 0 for no partial distance matching

  • 1 for one-step matching, which means that only substrings of the same length as find.term are extracted from search.string for matching

  • 2 for two-step matching, which means that substrings of the same length as find.term, as well as substrings with a slightly wider range, are extracted from search.string for matching

Default value is 0. See 'Details' for more information.

show.pbar

Logical; if TRUE, the progress bar is displayed when computing the distance matrix. Default is FALSE, hence the bar is hidden.

Value

A numeric vector with the index positions of the elements in search.string that partially match or are similar to find.term. Returns -1 if no match was found.
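
Because -1 is also a valid (negative) subscript in R, a no-match result should be checked before it is used for indexing. The following is a minimal sketch of that check; `idx` is a hand-written stand-in for a return value, not actual output of str_pos.

```r
string <- c("Hello", "Helo", "Hole")

# Stand-in for a "no match" return value (assumption: not produced by a real call)
idx <- -1

# Pitfall: a negative subscript drops that element instead of selecting nothing
dropped <- string[idx]

# Guard: only subset when all returned indices are positive
matched <- if (all(idx > 0)) string[idx] else character(0)
```

Here `dropped` holds every element except the first, whereas `matched` is empty, which is what a caller usually wants when nothing was found.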

Details

For part.dist.match = 1, substrings with the same number of characters as find.term are extracted from each element of search.string, starting at the first character and moving on until the end of the element is reached. Each substring is matched against find.term, and substrings within a maximum distance of maxdist are considered "matching". If part.dist.match = 2, the range of the extracted substring is additionally increased by 2, i.e. substrings that are two characters longer than find.term are extracted and matched as well.
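
The substring extraction described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual implementation: `window_match` and its `extra` argument are hypothetical names, the distance is computed with `utils::adist()` (generalized Levenshtein distance) because this page does not name the metric, and case-insensitive comparison is assumed.

```r
# Hypothetical helper: does any substring of `x` lie within `maxdist` of `term`?
# `extra` widens the window (0 for one-step matching, 2 for the wider second step).
window_match <- function(x, term, maxdist = 2, extra = 0) {
  width <- nchar(term) + extra
  n <- nchar(x)
  # Element not longer than the window: compare the whole element
  if (n <= width) {
    return(utils::adist(x, term, ignore.case = TRUE)[1, 1] <= maxdist)
  }
  # All start positions of a window of `width` characters
  starts <- seq_len(n - width + 1)
  windows <- substring(x, starts, starts + width - 1)
  # Match if at least one window is close enough to the search term
  any(utils::adist(windows, term, ignore.case = TRUE) <= maxdist)
}

x <- c("We are Sex Pistols!", "Old")
hits <- which(vapply(x, window_match, logical(1), term = "postils"))
```

Under these assumptions, the window "Pistols" differs from "postils" by two substitutions, so the first element counts as a match, while the full string on its own is far more than two edits away from the search term.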

Note

This function does not return the position of a matching string inside another string, but the index of the element in the search.string vector where a (partial) match with find.term was found. Thus, searching for "abc" in a string "this is abc" will not return 9 (the start position of the substring), but 1 (the element index, which is always 1 if search.string only has one element).
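
The same distinction exists in base R, which may help to illustrate it: `grep()` reports element indices, while `regexpr()` reports character positions. The snippet uses only base functions and does not call str_pos.

```r
x <- "this is abc"

# Index of the vector element that contains the pattern
elem_idx <- grep("abc", x, fixed = TRUE)

# Character position at which the pattern starts inside that element
char_pos <- as.integer(regexpr("abc", x, fixed = TRUE))
```

`elem_idx` is 1 and `char_pos` is 9; the return value of str_pos corresponds to the former.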

Examples

# NOT RUN {
string <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
str_pos(string, "hel")   # partial match
str_pos(string, "stem")  # partial match
str_pos(string, "R")     # no match
str_pos(string, "saste") # similarity to "System"

# finds two indices, because partial matching now
# also applies to "Systemic"
str_pos(string,
        "sytsme",
        part.dist.match = 1)

# finds nothing
str_pos("We are Sex Pistols!", "postils")
# finds partial matching of similarity
str_pos("We are Sex Pistols!", "postils", part.dist.match = 1)
# }