stringi: THE String Processing Package for R
stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for very fast, portable, correct, consistent, and convenient string/text processing in any locale or character encoding.
—by Marek Gagolewski
Thanks to ICU, stringi fully supports a wide range of Unicode standards (see also this video).
It gives you a multitude of functions for:
string concatenation, padding, wrapping,
substring extraction,
pattern searching (e.g., with ICU Java-like regular expressions),
collation and sorting,
random string generation,
case mapping and folding,
string transliteration,
Unicode normalisation,
date-time formatting and parsing,
and many more.
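As a rough illustration of a few of the feature areas listed above, here is a minimal sketch. It assumes stringi is already installed; the input strings and variable names are made up for this example and are not taken from the package documentation.

```r
library(stringi)

# Concatenation and padding
joined <- stri_join("a", "b", sep = "-")           # "a-b"
padded <- stri_pad_left("7", width = 3, pad = "0") # "007"

# Substring extraction (indices are 1-based and inclusive)
sub <- stri_sub("stringi", from = 1, to = 6)       # "string"

# Pattern searching with ICU regular expressions
starts_a <- stri_detect_regex(c("apple", "kiwi"), "^a")      # TRUE FALSE
digits <- stri_extract_all_regex("a1b22c333", "[0-9]+")[[1]] # "1" "22" "333"

# Collation: compare strings while ignoring case differences
# (strength = 2 is passed on to the collator settings)
same <- stri_cmp_equiv("hello", "HELLO", strength = 2)       # TRUE

# Case mapping
upper <- stri_trans_toupper("stringi")             # "STRINGI"

# Transliteration to plain ASCII
ascii <- stri_trans_general("gro\u00df", "Latin-ASCII")      # "gross"

# Unicode normalisation: compose "a" + combining acute into one code point
decomposed <- "a\u0301"
composed <- stri_trans_nfc(decomposed)
len_before <- stri_length(decomposed)              # 2 code points
len_after <- stri_length(composed)                 # 1 code point
```

Most functions are vectorised over their arguments, so the same calls work unchanged on whole character vectors.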
stringi is among the most frequently downloaded R packages.
You can obtain it from CRAN by calling:
install.packages("stringi")
stringi’s source code is hosted on GitHub. It has been released under the open source BSD-3-clause license.
The package’s API was inspired by that of the early (pre-tidyverse; v0.6.2) version of Hadley Wickham’s stringr package (and since v1.0.0, released in 2015, stringr has been powered by stringi). Moreover, Hadley suggested quite a few new package features. The contributions from Bartłomiej Tartanus and many others are greatly appreciated. Thanks!
Reference Manual
- R Package stringi Reference
- about_arguments: Passing Arguments to Functions in stringi
- about_encoding: Character Encodings and stringi
- about_locale: Locales and stringi
- about_search_boundaries: Text Boundary Analysis in stringi
- about_search_charclass: Character Classes in stringi
- about_search_coll: Locale-Sensitive Text Searching in stringi
- about_search_fixed: Locale-Insensitive Fixed Pattern Matching in stringi
- about_search_regex: Regular Expressions in stringi
- about_search: String Searching
- about_stringi: THE String Processing Package
- operator_add: Concatenate Two Character Vectors
- operator_compare: Compare Strings with or without Collation
- operator_dollar: C-Style Formatting with sprintf as a Binary Operator
- stri_compare: Compare Strings with or without Collation
- stri_count_boundaries: Count the Number of Text Boundaries
- stri_count: Count the Number of Pattern Matches
- stri_datetime_add: Date and Time Arithmetic
- stri_datetime_create: Create a Date-Time Object
- stri_datetime_fields: Get Values for Date and Time Fields
- stri_datetime_format: Date and Time Formatting and Parsing
- stri_datetime_fstr: Convert strptime-Style Format Strings
- stri_datetime_now: Get Current Date and Time
- stri_datetime_symbols: List Localizable Date-Time Formatting Data
- stri_detect: Detect a Pattern Match
- stri_dup: Duplicate Strings
- stri_duplicated: Determine Duplicated Elements
- stri_enc_detect: Detect Character Set and Language
- stri_enc_detect2: [DEPRECATED] Detect Locale-Sensitive Character Encoding
- stri_enc_fromutf32: Convert From UTF-32
- stri_enc_info: Query a Character Encoding
- stri_enc_isascii: Check If a Data Stream Is Possibly in ASCII
- stri_enc_isutf16: Check If a Data Stream Is Possibly in UTF-16 or UTF-32
- stri_enc_isutf8: Check If a Data Stream Is Possibly in UTF-8
- stri_enc_list: List Known Character Encodings
- stri_enc_mark: Get Declared Encodings of Each String
- stri_enc_set: Set or Get Default Character Encoding in stringi
- stri_enc_toascii: Convert To ASCII
- stri_enc_tonative: Convert Strings To Native Encoding
- stri_enc_toutf32: Convert Strings To UTF-32
- stri_enc_toutf8: Convert Strings To UTF-8
- stri_encode: Convert Strings Between Given Encodings
- stri_escape_unicode: Escape Unicode Code Points
- stri_extract_boundaries: Extract Data Between Text Boundaries
- stri_extract: Extract Occurrences of a Pattern
- stri_flatten: Flatten a String
- stri_info: Query Default Settings for stringi
- stri_isempty: Determine if a String is of Length Zero
- stri_join_list: Concatenate Strings in a List
- stri_join: Concatenate Character Vectors
- stri_length: Count the Number of Code Points
- stri_list2matrix: Convert a List to a Character Matrix
- stri_locale_info: Query Given Locale
- stri_locale_list: List Available Locales
- stri_locale_set: Set or Get Default Locale in stringi
- stri_locate_boundaries: Locate Text Boundaries
- stri_locate: Locate Occurrences of a Pattern
- stri_match: Extract Regex Pattern Matches, Together with Capture Groups
- stri_na2empty: Replace NAs with Empty Strings
- stri_numbytes: Count the Number of Bytes
- stri_opts_brkiter: Generate a List with BreakIterator Settings
- stri_opts_collator: Generate a List with Collator Settings
- stri_opts_fixed: Generate a List with Fixed Pattern Search Engine’s Settings
- stri_opts_regex: Generate a List with Regex Matcher Settings
- stri_order: Ordering Permutation
- stri_pad: Pad (Center/Left/Right Align) a String
- stri_rand_lipsum: A Lorem Ipsum Generator
- stri_rand_shuffle: Randomly Shuffle Code Points in Each String
- stri_rand_strings: Generate Random Strings
- stri_rank: Ranking
- stri_read_lines: Read Text Lines from a Text File
- stri_read_raw: Read Text File as Raw
- stri_remove_empty: Remove All Empty Strings from a Character Vector
- stri_replace_na: Replace Missing Values in a Character Vector
- stri_replace: Replace Occurrences of a Pattern
- stri_reverse: Reverse Each String
- stri_sort_key: Sort Keys
- stri_sort: Sorting
- stri_split_boundaries: Split a String at Text Boundaries
- stri_split_lines: Split a String Into Text Lines
- stri_split: Split a String By Pattern Matches
- stri_startsendswith: Determine if the Start or End of a String Matches a Pattern
- stri_stats_general: General Statistics for a Character Vector
- stri_stats_latex: Statistics for a Character Vector Containing LaTeX Commands
- stri_sub_all: Extract or Replace Multiple Substrings
- stri_sub: Extract a Substring From or Replace a Substring In a Character Vector
- stri_subset: Select Elements that Match a Given Pattern
- stri_timezone_info: Query a Given Time Zone
- stri_timezone_list: List Available Time Zone Identifiers
- stri_timezone_set: Set or Get Default Time Zone in stringi
- stri_trans_casemap: Transform Strings with Case Mapping or Folding
- stri_trans_char: Translate Characters
- stri_trans_general: General Text Transforms, Including Transliteration
- stri_trans_list: List Available Text Transforms and Transliterators
- stri_trans_nf: Perform or Check For Unicode Normalization
- stri_trim: Trim Characters from the Left and/or Right Side of a String
- stri_unescape_unicode: Un-escape All Escape Sequences
- stri_unique: Extract Unique Elements
- stri_width: Determine the Width of Code Points
- stri_wrap: Word Wrap Text to Format Paragraphs
- stri_write_lines: Write Text Lines to a Text File
Other
- Source Code (GitHub)
- Bug Tracker and Feature Suggestions
- CRAN Entry
- Author's Homepage
- C++ API — Rcpp Example
- What Is New in stringi
- 1.6.1 (2021-05-05)
- 1.5.3 (2020-09-04)
- 1.4.6 (2020-02-17)
- 1.4.5 (2020-01-11)
- 1.4.4 (2020-01-06)
- 1.4.3 (2019-03-12)
- 1.3.1 (2019-02-10)
- 1.2.4 (2018-07-20)
- 1.2.3 (2018-05-16)
- 1.2.2 (2018-05-01)
- 1.1.7 (2018-03-06)
- 1.1.6 (2017-11-10)
- 1.1.5 (2017-04-07)
- 1.1.3 (2017-03-21)
- 1.1.2 (2016-09-30)
- 1.1.1 (2016-05-25)
- 1.0-1 (2015-10-22)
- 0.5-5 (2015-06-28)
- 0.5-2 (2015-06-21)
- 0.4-1 (2014-12-11)
- 0.3-1 (2014-11-06)
- 0.2-5 (2014-05-16)
- 0.2-4 (2014-05-15)
- 0.2-3 (2014-05-14)
- 0.1-25 (2014-03-12)
- 0.1-24 (2014-03-11)
- 0.1-23 (2014-03-11)
- 0.1-22 (2014-02-20)
- 0.1-21 (2014-02-19)
- 0.1-20 (2014-02-17)
- 0.1-11 (2013-11-16)
- 0.1-10 (2013-11-13)
- 0.1-6 (2013-07-05)
- 0.1-1 (2013-01-05)
- Installing stringi