Unicode

    Returns true if the given char or integer is an assigned Unicode code point, and false otherwise.

    Examples
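    A minimal sketch, assuming this predicate is reached as `Unicode.isassigned` from Julia's `Unicode` standard library. The sample code points are illustrative choices. U+0378 is used because it is a reserved, unassigned slot in the Greek and Coptic block.

```julia
using Unicode

# A Char or an Integer code point is accepted.
a_assigned    = Unicode.isassigned('a')     # U+0061 LATIN SMALL LETTER A
e_assigned    = Unicode.isassigned(101)     # integer 101 is U+0065, 'e'
ctrl_assigned = Unicode.isassigned('\x01')  # control characters are assigned too
gap_assigned  = Unicode.isassigned(0x0378)  # U+0378 is reserved (unassigned)

println((a_assigned, e_assigned, ctrl_assigned, gap_assigned))  # (true, true, true, false)
```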

    Normalize the string s. By default, canonical composition (compose=true) is performed without canonicalizing compatibility equivalents (compat=false) and without ensuring Unicode versioning stability (stable=false). This produces the shortest possible equivalent string but may introduce composition characters not present in earlier Unicode versions.
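    The default behaviour can be sketched as follows, assuming the function documented here is Julia's `Unicode.normalize`. A letter followed by a combining accent is composed into one precomposed code point, which is why the result is shorter.

```julia
using Unicode

decomposed = "e\u0301"                   # 'e' + U+0301 COMBINING ACUTE ACCENT
composed = Unicode.normalize(decomposed) # default keywords: compose=true

# Canonical composition folds the pair into the single code point U+00E9.
println(composed == "\u00e9")                    # true
println((length(decomposed), length(composed)))  # (2, 1)
```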

    Alternatively, one of the four “normal forms” of the Unicode standard can be specified: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize “compatibility equivalents”: they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.
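    A sketch of the four normal forms, assuming the form is passed as a Symbol second argument, as in `Unicode.normalize(s, :NFKC)` from Julia's standard library. U+FB01 (the "fi" ligature) has only a compatibility decomposition, so it shows the split between the canonical forms (C, D) and the compatibility forms (KC, KD).

```julia
using Unicode

lig   = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI (a compatibility character)
acute = "\u00e9"  # precomposed é

nfc  = Unicode.normalize(lig, :NFC)    # canonical forms keep the ligature
nfkc = Unicode.normalize(lig, :NFKC)   # compatibility forms expand it to "fi"
nfd  = Unicode.normalize(acute, :NFD)  # canonical decomposition: 'e' + U+0301
nfkd = Unicode.normalize(lig, :NFKD)   # also "fi"; no composition step applies

println(nfc == lig, " ", nfkc == "fi", " ", nfd == "e\u0301", " ", nfkd == "fi")  # true true true true
```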

    Alternatively, finer control and additional transformations may be obtained by passing keyword options instead of a normal form. Any number of the following boolean keyword options may be specified; all of them default to false except compose, which defaults to true:

    • compose=false: do not perform canonical composition
    • decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
    • casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
    • newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
    • stripignore=true: strip Unicode’s “default ignorable” characters (e.g. the soft hyphen or the left-to-right mark)
    • stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
    • stable=true: enforce Unicode versioning stability (never introduce characters missing from earlier Unicode versions)

    Examples
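    The keyword options can be combined. A minimal sketch, assuming the keyword-argument form `Unicode.normalize(s; ...)` from Julia's `Unicode` standard library. The input strings are illustrative; note that stripcc turns the newline into a space unless a newline-conversion option is also given.

```julia
using Unicode

folded   = Unicode.normalize("JuLiA", casefold=true)                # "julia"
decomp   = Unicode.normalize("\u00e9", decompose=true)              # "e\u0301"
unified  = Unicode.normalize("a\r\nb\rc", newline2lf=true)          # CRLF and CR become LF
stripped = Unicode.normalize("soft\u00adhyphen", stripignore=true)  # drops U+00AD SOFT HYPHEN
spaced   = Unicode.normalize("a\tb\nc", stripcc=true)               # tab and newline become spaces
kept_nl  = Unicode.normalize("a\tb\nc", stripcc=true, newline2lf=true)  # newline kept as LF

# Show each result with escapes visible.
for s in (folded, decomp, unified, stripped, spaced, kept_nl)
    println(repr(s))
end
```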
