A new release of the stringi
package is available on CRAN (please wait a few days for Windows and OS X binary builds).
# install.packages("stringi") or update.packages()
library("stringi")
Here’s a list of changes in version 0.4-1. In the current release, we particularly focussed on making the package’s interface more consistent with that of the well-known stringr
package. For a general overview of stringi
’s facilities and base R string processing issues, see e.g. here.
(IMPORTANT CHANGE) n_max
argument in stri_split_*()
has been renamed n
.
(IMPORTANT CHANGE) simplify=FALSE
in stri_extract_all_*()
and stri_split_*()
now calls stri_list2matrix()
with fill=""
. fill=NA_character_
may be obtained by using simplify=NA
.
(IMPORTANT CHANGE, NEW FUNCTIONS) #120: stri_extract_words
has been renamed stri_extract_all_words
and stri_locate_boundaries
- stri_locate_all_boundaries
as well as stri_locate_words
- stri_locate_all_words
. New functions are now available: stri_locate_first_boundaries
, stri_locate_last_boundaries
, stri_locate_first_words
, stri_locate_last_words
, stri_extract_first_words
, stri_extract_last_words
.
# uses ICU's locale-dependent word break iterator
stri_extract_all_words("stringi: THE string processing package for R")
## [[1]]
## [1] "stringi" "THE" "string" "processing" "package"
## [6] "for" "R"
opts_regex
, opts_collator
, opts_fixed
, and opts_brkiter
can now be supplied individually via ...
. In other words, you may now simply call e.g.stri_detect_regex(c("stringi", "STRINGI"), "stringi", case_insensitive=TRUE)
## [1] TRUE TRUE
instead of:
stri_detect_regex(c("stringi", "STRINGI"), "stringi", opts_regex=stri_opts_regex(case_insensitive=TRUE))
## [1] TRUE TRUE
(NEW FEATURE) #110: Fixed pattern search engine’s settings can now be supplied via opts_fixed
argument in stri_*_fixed()
, see stri_opts_fixed()
. A simple (not suitable for natural language processing) yet very fast case_insensitive
pattern matching can be performed now. stri_extract_*_fixed
is again available.
(NEW FEATURE) #23: stri_extract_all_fixed
, stri_count
, and stri_locate_all_fixed
may now also look for overlapping pattern matches, see ?stri_opts_fixed
.
stri_extract_all_fixed("abaBAaba", "ABA", case_insensitive=TRUE, overlap=TRUE)
## [[1]]
## [1] "aba" "aBA" "aba"
(NEW FEATURE) #129: stri_match_*_regex
gained a cg_missing
argument.
(NEW FEATURE) #117: stri_extract_all_*()
, stri_locate_all_*()
, stri_match_all_*()
gained a new argument: omit_no_match
. Setting it to TRUE
makes these functions compatible with their stringr
equivalents.
(NEW FEATURE) #118: stri_wrap()
gained indent
, exdent
, initial
, and prefix
arguments. Moreover Knuth’s dynamic word wrapping algorithm now assumes that the cost of printing the last line is zero, see #128.
cat(stri_wrap(stri_rand_lipsum(1), 40, 2.0), sep="\n")
## Lorem ipsum dolor sit amet, et et diam
## vitae est ut. At tristique, tincidunt
## taciti, ac egestas vestibulum magna.
## Volutpat nisl non sed ultricies nisl
## nibh magna. Nullam rhoncus ut phasellus
## sed. Congue enim libero congue massa
## eget. Ligula, quis est amet velit.
## Accumsan amet nunc ad. Porttitor,
## sed vestibulum diam vestibulum quis
## sed gravida ultrices. Per urna enim.
## Scelerisque interdum sed vestibulum
## rhoncus quis imperdiet pharetra. Sapien
## iaculis, lacinia ac cras ante, sed
## vitae inceptos dis tristique dignissim.
## Venenatis volutpat lectus sodales,
## hac feugiat molestie mollis. A, urna
## pellentesque ante himenaeos ante at
## potenti in.
stri_subset()
gained an omit_na
argument.stri_subset_fixed(c("abc", NA, "def"), "a")
## [1] "abc" NA
stri_subset_fixed(c("abc", NA, "def"), "a", omit_na=TRUE)
## [1] "abc"
(NEW FEATURE) stri_list2matrix()
gained an n_min
argument.
(NEW FEATURE) #126: stri_split()
now is also able to act just like stringr::str_split_fixed()
.
stri_split_regex(c("bab", "babab"), "a", n = 3, simplify=TRUE)
## [,1] [,2] [,3]
## [1,] "b" "b" ""
## [2,] "b" "b" "b"
(NEW FEATURE) #119: stri_split_boundaries()
now have n
, tokens_only
, and simplify
arguments. Additionally, stri_extract_all_words()
is now equipped with simplify
arg.
(NEW FEATURE) #116: stri_paste()
gained a new argument: ignore_null
. Setting it to TRUE
makes this function more compatible with paste()
.
for (test in c(TRUE, FALSE))
print(stri_paste("a", if (test) 1:9, ignore_null=TRUE))
## [1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8" "a9"
## [1] "a"
Enjoy! Any comments and suggestions are welcome.