Character Set Support

    Parent topic: Greenplum Database Reference Guide

    gpinitsystem defines the default character set for a Greenplum Database system by reading the setting of the ENCODING parameter in the gp_init_config file at initialization time. The default character set is UNICODE or UTF8.

    You can create a database with a different character set besides what is used as the system-wide default. For example:

    Important: Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not what is expected by the locale you have selected. The LC_COLLATE and LC_CTYPE settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding.

    Since these locale settings are frozen by gpinitsystem, the apparent flexibility to use different encodings in different databases is more theoretical than real.

    Character Set Conversion Between Server and Client

    Greenplum Database supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the master pg_conversion system catalog table. Greenplum Database comes with some predefined conversions or you can create a new conversion using the SQL command CREATE CONVERSION.

    Server Character SetAvailable Client Character Sets
    BIG5not supported as a server encoding
    EUC_CNEUC_CN, MULE_INTERNAL, UTF8
    EUC_JPEUC_JP, MULE_INTERNAL, SJIS, UTF8
    EUC_KREUC_KR, MULE_INTERNAL, UTF8
    EUC_TWEUC_TW, BIG5, MULE_INTERNAL, UTF8
    GB18030not supported as a server encoding
    GBKnot supported as a server encoding
    ISO_8859_5ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866, WIN1251
    ISO_8859_6ISO_8859_6, UTF8
    ISO_8859_7ISO_8859_7, UTF8
    ISO_8859_8ISO_8859_8, UTF8
    JOHABJOHAB, UTF8
    KOI8KOI8, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251
    LATIN1LATIN1, MULE_INTERNAL, UTF8
    LATIN2LATIN2, MULE_INTERNAL, UTF8, WIN1250
    LATIN3LATIN3, MULE_INTERNAL, UTF8
    LATIN4LATIN4, MULE_INTERNAL, UTF8
    LATIN5LATIN5, UTF8
    LATIN6LATIN6, UTF8
    LATIN7LATIN7, UTF8
    LATIN8LATIN8, UTF8
    LATIN9LATIN9, UTF8
    LATIN10LATIN10, UTF8
    MULE_INTERNALMULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251
    SJISnot supported as a server encoding
    SQL_ASCIInot supported as a server encoding
    UHCnot supported as a server encoding
    UTF8all supported encodings
    WIN866WIN866
    ISO_8859_5KOI8, MULE_INTERNAL, UTF8, WIN1251
    WIN874WIN874, UTF8
    WIN1250WIN1250, LATIN2, MULE_INTERNAL, UTF8
    WIN1251WIN1251, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866
    WIN1252WIN1252, UTF8
    WIN1253WIN1253, UTF8
    WIN1254WIN1254, UTF8
    WIN1255WIN1255, UTF8
    WIN1256WIN1256, UTF8
    WIN1257WIN1257, UTF8
    WIN1258WIN1258, UTF8

    To enable automatic character set conversion, you have to tell Greenplum Database the character set (encoding) you would like to use in the client. There are several ways to accomplish this:

    • Using the \encoding command in psql, which allows you to change client encoding on the fly.

    • Using the PGCLIENTENCODING environment variable. When PGCLIENTENCODING is defined in the client’s environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)

    If the conversion of a particular character is not possible “ suppose you chose EUC_JP for the server and LATIN1 for the client, then some Japanese characters do not have a representation in LATIN1 “ then an error is reported.

    If the client character set is defined as SQL_ASCII, encoding conversion is deactivated, regardless of the server’s character set. The use of SQL_ASCII is unwise unless you are working with all-ASCII data. SQL_ASCII is not supported as a server encoding.