nzchar()
– returns a logical vector;
determines whether a string is NOT empty
Note that missing values are not handled properly: NA is treated as a non-empty string.
!nzchar(c("", "not empty", NA))
## [1] TRUE FALSE FALSE
Alternatively, you may compare str_length() with zero, which propagates NA; keep in mind that this is slower, especially for UTF-8-encoded strings:
(str_length(c("", "not empty", NA)) == 0)
## [1] TRUE FALSE NA
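If neither stringr nor stringi is available, an NA-aware emptiness test can also be sketched in base R. The helper name is_empty_string below is hypothetical (it is not part of any package); it wraps nzchar() and propagates missing values:

```r
# Hypothetical helper (not from base R, stringr, or stringi):
# TRUE for "", FALSE for non-empty strings, NA for missing values.
is_empty_string <- function(x) {
  x <- as.character(x)
  ifelse(is.na(x), NA, !nzchar(x))
}

is_empty_string(c("", "not empty", NA))
## [1]  TRUE FALSE    NA
```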
stri_isempty()
– handles missing values properly.
stri_isempty(c("", "not empty", NA))
## [1] TRUE FALSE NA
test1 <- rep(c("", "not empty", NA), 100)
microbenchmark(nzchar(test1), str_length(test1) == 0, stri_isempty(test1))
## Unit: nanoseconds
##                    expr   min      lq  median      uq    max neval
##           nzchar(test1)   817   951.5  1103.5  1214.5   3492   100
##  str_length(test1) == 0 54940 56892.5 58188.5 61168.5 174070   100
##     stri_isempty(test1)  2284  2882.5  3547.0  4477.5  21730   100
nchar()
– does not handle NAs properly
nchar(c("ąśćźół", "abc", NA, ""))
## [1] 6 3 2 0
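The result above reflects the R version used at the time of writing. Assuming R 3.3.0 or later, nchar() accepts a keepNA argument that controls how missing values are counted; a minimal sketch:

```r
x <- c("abc", NA, "")

# keepNA = FALSE: NA is counted as a string of length 2
nchar(x, keepNA = FALSE)
## [1] 3 2 0

# keepNA = TRUE: missing values are propagated
nchar(x, keepNA = TRUE)
## [1]  3 NA  0
```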
str_length()
– handles NAs properly
str_length(c("ąśćźół", "abc", NA, ""))
## [1] 6 3 NA 0
stri_length()
– handles NAs properly
stri_length(c("ąśćźół", "abc", NA, ""))
## [1] 6 3 NA 0
Determining the length of a string in an 8-bit encoding is O(1), because the number of characters is the same as the number of bytes. In UTF-8 the two differ, so the characters have to be counted, which has linear time complexity.
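The difference between bytes and characters can be illustrated in base R. The sketch below is not how nchar() or stri_length() are implemented; count_utf8_chars is a hypothetical helper that counts code points by visiting every byte and skipping UTF-8 continuation bytes, which is why the cost grows with the length of the string:

```r
# Hypothetical helper: count UTF-8 code points by scanning every byte
# and skipping continuation bytes (those of the form 10xxxxxx).
count_utf8_chars <- function(s) {
  bytes <- as.integer(charToRaw(enc2utf8(s)))
  sum(bitwAnd(bytes, 0xC0L) != 0x80L)
}

# "ąśćźół" written with \u escapes, so the snippet is encoding-independent
s <- "\u0105\u015b\u0107\u017a\u00f3\u0142"

length(charToRaw(enc2utf8(s)))  # number of bytes
## [1] 12
count_utf8_chars(s)             # number of code points
## [1] 6
count_utf8_chars("abc")         # ASCII: bytes and characters coincide
## [1] 3
```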
test1 <- rep(c("ąśćźół", "abc", NA, ""), 100) # first string is in UTF-8
microbenchmark(nchar(test1), str_length(test1), stri_length(test1))
## Unit: microseconds
##                expr    min      lq  median      uq     max neval
##        nchar(test1) 36.749 37.7550 38.0050 38.2800  77.063   100
##   str_length(test1) 66.064 67.9150 69.3035 70.2555 125.020   100
##  stri_length(test1)  6.322  6.9685  7.4345  8.5595  31.301   100
nchar()
with argument type='bytes'
– does not handle NAs properly
nchar(c("ąśćźół", "abc", NA, ""), type='bytes')
## [1] 12 3 2 0
stri_numbytes()
– handles missing values properly.
stri_numbytes(c("ąśćźół", "abc", NA, ""))
## [1] 12 3 NA 0
stri_length()
.
test1 <- rep(c("ąśćźół", "abc", NA, ""), 100) # first string is in UTF-8
microbenchmark(nchar(test1, type='bytes'), stri_numbytes(test1))
## Unit: microseconds
##                          expr   min     lq median     uq    max neval
##  nchar(test1, type = "bytes") 9.564 9.7995  9.999 10.286 33.683   100
##          stri_numbytes(test1) 2.531 2.7240  2.894  3.117 24.026   100
nchar()
with argument type='width'
– does not handle NAs properly.
Returns the estimated number of columns that the cat() function will use to print the string in a monospaced font, or the number of characters if the width cannot be calculated.
The R manual does not state how the widths are determined.
nchar(c("gryzeldą", "", NA, "持続可能な統計環境"), type='width')
## [1] 8 0 2 18
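To see how the display width differs from the character count, both can be requested for the same vector. This is only a sketch: the reported widths may depend on the R version and platform, so no output is shown. The sample string is an assumption chosen for illustration and is written with \u escapes to keep the snippet encoding-independent.

```r
x <- c("abc", "\u4e2d\u6587")  # ASCII text and two CJK characters

n_chars <- nchar(x, type = "chars")  # number of characters
n_width <- nchar(x, type = "width")  # estimated number of columns

# Side-by-side comparison of the two measures
data.frame(n_chars, n_width)
```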
TO DO: add stri_width().