stri_duplicated: Determine Duplicated Elements
Description
stri_duplicated() determines which strings in a character vector are duplicates of other elements.
stri_duplicated_any() determines if there are any duplicated strings in a character vector.
Usage
stri_duplicated(
  str,
  from_last = FALSE,
  fromLast = from_last,
  ...,
  opts_collator = NULL
)
stri_duplicated_any(
  str,
  from_last = FALSE,
  fromLast = from_last,
  ...,
  opts_collator = NULL
)
Arguments
| str | a character vector |
| from_last | a single logical value; indicates whether search should be performed from the last to the first string |
| fromLast | [DEPRECATED] alias of from_last |
| ... | additional settings for opts_collator |
| opts_collator | a named list with ICU Collator’s options, see stri_opts_collator, NULL for the default collation options |
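As a rough illustration of how the last two arguments relate, the sketch below passes one collator setting through ... and then through an explicit stri_opts_collator() list. It assumes the stringi package is installed; the sample vector x and the choice of strength = 2 are made up for this example.

```r
library(stringi)

# Hypothetical sample data: the same letter in lower and upper case
x <- c("a", "A", "b")

# Collator settings passed directly via `...`
via_dots <- stri_duplicated(x, strength = 2)

# The same setting passed as an explicit options list
via_opts <- stri_duplicated(x, opts_collator = stri_opts_collator(strength = 2))

# At strength 2 case differences are ignored, so "A" counts as a duplicate of "a"
print(via_dots)
print(identical(via_dots, via_opts))
```

Passing settings through ... is shorter for one-off calls, while building the list once with stri_opts_collator() may be more convenient when the same options are reused across several calls.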
Details
Missing values are regarded as equal.
Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
See also stri_unique for extracting unique elements.
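The difference between canonical equivalence and a plain comparison can be sketched with two encodings of the same accented letter. This is a minimal example that assumes stringi is installed and that the session handles UTF-8 string literals; the sample strings are chosen only for illustration.

```r
library(stringi)

# U+00E9 (precomposed) vs. "e" followed by U+0301 (combining acute accent)
x <- c("\u00e9", "e\u0301")

# Canonically equivalent strings are treated as duplicates
dup_stringi <- stri_duplicated(x)

# Base R applies no Unicode normalization, so the two strings differ
dup_base <- duplicated(x)

# Missing values are regarded as equal to each other
dup_na <- stri_duplicated(c(NA, "a", NA))

print(dup_stringi)
print(dup_base)
print(dup_na)
```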
Value
stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str.
stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
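Because stri_duplicated_any() returns an index rather than TRUE or FALSE, its result is usually compared against 0. The following sketch shows one way this might be used; it assumes stringi is installed, and the variable names idx and first_dup are illustrative only.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# 0 means no duplicates; a positive value is a position in x
idx <- stri_duplicated_any(x)

if (idx > 0) {
  message("first duplicate at position ", idx, ": ", x[idx])
} else {
  message("all elements are unique")
}

# With the default from_last = FALSE, the same position can be
# recovered from the logical vector returned by stri_duplicated()
first_dup <- which(stri_duplicated(x))[1]
print(first_dup)
```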
References
Collation - ICU User Guide, http://userguide.icu-project.org/collation
See Also
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Examples
# In the following examples, we have 3 duplicated values,
# 'a' - 2 times, NA - 1 time
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA))
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA), from_last=TRUE)
stri_duplicated_any(c('a', 'b', 'a', NA, 'a', NA))
# compare the results:
stri_duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
stri_duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'))