[experimental] MaterializedMySQL

    This is an experimental feature that should not be used in production.

    Creates ClickHouse database with all the tables existing in MySQL, and all the data in those tables.

    ClickHouse server works as MySQL replica. It reads binlog and performs DDL and DML queries.

    Engine Parameters

    • — MySQL server endpoint.
    • database — MySQL database name.
    • user — MySQL user.
    • password — User password.

    Engine Settings

    • max_rows_in_buffer — Maximum number of rows that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 65 505.
    • max_bytes_in_buffer — Maximum number of bytes that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 1 048 576.
    • max_rows_in_buffers — Maximum number of rows that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 65 505.
    • max_bytes_in_buffers — Maximum number of bytes that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 1 048 576.
    • max_flush_data_time — Maximum number of milliseconds that data is allowed to cache in memory (for database and the cache data unable to query). When this time is exceeded, the data will be materialized. Default: 1000.
    • allows_query_when_mysql_lost — Allows to query a materialized table when MySQL is lost. Default: 0 (false).
    1. CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
    2. SETTINGS
    3. allows_query_when_mysql_lost=true,
    4. max_wait_time_when_mysql_unavailable=10000;

    Settings on MySQL-server Side

    For the correct work of MaterializedMySQL, there are few mandatory MySQL-side configuration settings that must be set:

    • default_authentication_plugin = mysql_native_password since MaterializedMySQL can only authorize with this method.
    • gtid_mode = on since GTID based logging is a mandatory for providing correct MaterializedMySQL replication.

    Attention

    While turning on gtid_mode you should also specify .

    When working with the MaterializedMySQL database engine, ReplacingMergeTree tables are used with virtual _sign and _version columns.

    • _version — Transaction counter. Type .
    • _sign — Deletion mark. Type Int8. Possible values:
      • 1 — Row is not deleted,
      • -1 — Row is deleted.

    Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception “Unhandled data type” and stops replication.

    Apart of the data types limitations there are few restrictions comparing to MySQL databases, that should be resolved before replication will be possible:

    • Each table in MySQL should contain PRIMARY KEY.

    • Replication for tables, those are containing rows with ENUM field values out of range (specified in ENUM signature) will not work.

    MySQL DDL queries are converted into the corresponding ClickHouse DDL queries (, CREATE, , RENAME). If ClickHouse cannot parse some DDL query, the query is ignored.

    MaterializedMySQL does not support direct INSERT, DELETE and UPDATE queries. However, they are supported in terms of data replication:

    • MySQL INSERT query is converted into INSERT with _sign=1.

    • MySQL DELETE query is converted into INSERT with _sign=-1.

    • MySQL UPDATE query is converted into INSERT with _sign=-1 and INSERT with _sign=1.

    SELECT query from MaterializedMySQL tables has some specifics:

    • If _sign is not specified in the SELECT query, WHERE _sign=1 is used by default. So the deleted rows are not included into the result set.

    • The result includes columns comments in case they exist in MySQL database tables.

    MySQL PRIMARY KEY and INDEX clauses are converted into ORDER BY tuples in ClickHouse tables.

    ClickHouse has only one physical order, which is determined by ORDER BY clause. To create a new physical order, use .

    Notes

    • Cascade UPDATE/DELETE queries are not supported by the MaterializedMySQL engine.
    • Replication can be easily broken.
    • Manual operations on database and tables are forbidden.
    • MaterializedMySQL is influenced by optimize_on_insert setting. The data is merged in the corresponding table in the MaterializedMySQL database when a table in the MySQL server changes.

    Queries in MySQL:

    1. mysql> CREATE DATABASE db;
    2. mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
    3. mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
    4. mysql> DELETE FROM db.test WHERE a=1;
    5. mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
    6. mysql> UPDATE db.test SET c='Wow!', b=222;
    7. mysql> SELECT * FROM test;

    Database in ClickHouse, exchanging data with the MySQL server:

    The database and the table created:

    1. CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***');
    2. SHOW TABLES FROM mysql;
    1. ┌─name─┐
    2. test
    3. └──────┘

    After inserting data:

    1. ┌─a─┬──b─┐
    2. 1 11
    3. 2 22

    After deleting data, adding the column and updating:

    1. SELECT * FROM mysql.test;