Character Set Support
Parent topic: Greenplum Database Reference Guide
gpinitsystem defines the default character set for a Greenplum Database system by reading the setting of the ENCODING
parameter in the gp_init_config
file at initialization time. The default character set is UNICODE
or UTF8
.
You can create a database with a different character set besides what is used as the system-wide default. For example:
Important: Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not what is expected by the locale you have selected. The LC_COLLATE
and LC_CTYPE
settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding.
Since these locale settings are frozen by gpinitsystem, the apparent flexibility to use different encodings in different databases is more theoretical than real.
Character Set Conversion Between Server and Client
Greenplum Database supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the master pg_conversion system catalog table. Greenplum Database comes with some predefined conversions or you can create a new conversion using the SQL command CREATE CONVERSION
.
Server Character Set | Available Client Character Sets |
---|---|
BIG5 | not supported as a server encoding |
EUC_CN | EUC_CN, MULE_INTERNAL, UTF8 |
EUC_JP | EUC_JP, MULE_INTERNAL, SJIS, UTF8 |
EUC_KR | EUC_KR, MULE_INTERNAL, UTF8 |
EUC_TW | EUC_TW, BIG5, MULE_INTERNAL, UTF8 |
GB18030 | not supported as a server encoding |
GBK | not supported as a server encoding |
ISO_8859_5 | ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
ISO_8859_6 | ISO_8859_6, UTF8 |
ISO_8859_7 | ISO_8859_7, UTF8 |
ISO_8859_8 | ISO_8859_8, UTF8 |
JOHAB | JOHAB, UTF8 |
KOI8 | KOI8, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
LATIN1 | LATIN1, MULE_INTERNAL, UTF8 |
LATIN2 | LATIN2, MULE_INTERNAL, UTF8, WIN1250 |
LATIN3 | LATIN3, MULE_INTERNAL, UTF8 |
LATIN4 | LATIN4, MULE_INTERNAL, UTF8 |
LATIN5 | LATIN5, UTF8 |
LATIN6 | LATIN6, UTF8 |
LATIN7 | LATIN7, UTF8 |
LATIN8 | LATIN8, UTF8 |
LATIN9 | LATIN9, UTF8 |
LATIN10 | LATIN10, UTF8 |
MULE_INTERNAL | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 |
SJIS | not supported as a server encoding |
SQL_ASCII | not supported as a server encoding |
UHC | not supported as a server encoding |
UTF8 | all supported encodings |
WIN866 | WIN866 |
ISO_8859_5 | KOI8, MULE_INTERNAL, UTF8, WIN1251 |
WIN874 | WIN874, UTF8 |
WIN1250 | WIN1250, LATIN2, MULE_INTERNAL, UTF8 |
WIN1251 | WIN1251, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866 |
WIN1252 | WIN1252, UTF8 |
WIN1253 | WIN1253, UTF8 |
WIN1254 | WIN1254, UTF8 |
WIN1255 | WIN1255, UTF8 |
WIN1256 | WIN1256, UTF8 |
WIN1257 | WIN1257, UTF8 |
WIN1258 | WIN1258, UTF8 |
To enable automatic character set conversion, you have to tell Greenplum Database the character set (encoding) you would like to use in the client. There are several ways to accomplish this:
Using the
\encoding
command inpsql
, which allows you to change client encoding on the fly.Using the
PGCLIENTENCODING
environment variable. WhenPGCLIENTENCODING
is defined in the client’s environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
If the conversion of a particular character is not possible “ suppose you chose EUC_JP
for the server and LATIN1
for the client, then some Japanese characters do not have a representation in LATIN1
“ then an error is reported.
If the client character set is defined as SQL_ASCII
, encoding conversion is deactivated, regardless of the server’s character set. The use of SQL_ASCII
is unwise unless you are working with all-ASCII data. SQL_ASCII
is not supported as a server encoding.