Unicode


    Alternatively, finer control and additional transformations may be obtained by calling Unicode.normalize(s; keywords...), where any number of the following boolean keyword options (which all default to false except for compose) are specified:

    • compose=false: do not perform canonical composition
    • decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
    • casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
    • newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
    • stripignore=true: strip Unicode’s “default ignorable” characters (e.g. the soft hyphen or the left-to-right marker)
    • stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
    • stable=true: enforce Unicode Versioning Stability

    For example, NFKC corresponds to the options compose=true, compat=true, stable=true.
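    The keyword options above can be illustrated with a short sketch. The sample strings and variable names are chosen here for illustration only and are not part of the original documentation.

```julia
using Unicode

# 'e' followed by U+0301 (combining acute accent): two code points.
decomposed = "e\u0301"

# With no keywords, only the default compose=true applies, so the pair
# is combined into the single precomposed code point U+00E9.
composed = Unicode.normalize(decomposed)

# decompose=true goes the other way and splits U+00E9 apart again.
redecomposed = Unicode.normalize(composed, decompose=true)

# casefold=true folds case, e.g. for case-insensitive comparison.
folded = Unicode.normalize("JuLiA", casefold=true)

# newline2lf=true maps CRLF and bare CR to a single LF.
unix_newlines = Unicode.normalize("a\r\nb\rc", newline2lf=true)

# The keyword form of NFKC, compared with the symbol form, on the
# "fi" ligature U+FB01 (a compatibility character).
ligature = "\ufb01"
nfkc_keywords = Unicode.normalize(ligature, compose=true, compat=true, stable=true)
nfkc_symbol = Unicode.normalize(ligature, :NFKC)
```

    Several keywords may be combined in a single call; for instance, casefold=true together with compat=true is one possible choice when comparing user-entered text loosely.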


    Returns an iterator over substrings of the input string that correspond to its extended graphemes, as defined by Unicode UAX #29. (Roughly, these are what users would perceive as single characters, even though they may contain more than one code point; for example, a letter combined with an accent mark is a single grapheme.)
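
    As a minimal sketch of the difference between code points and graphemes, the following assumes the iterator described here is the graphemes function exported by the Unicode standard library; the sample string is illustrative only.

```julia
using Unicode

# 'e' + U+0301 (combining acute accent), followed by a plain 'a'.
s = "e\u0301a"

# length counts code points, so the accent is counted separately.
n_codepoints = length(s)

# graphemes groups the letter and its accent into one substring.
clusters = collect(graphemes(s))
n_graphemes = length(clusters)
```

    Here clusters holds two substrings, the accented letter and the plain letter, while the string itself contains three code points.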