Sequence Column

    To solve this problem, Doris supported a sequence column by allowing the user to specify the sequence column when importing. Under the same key column, columns of the REPLACE aggregate type will be replaced according to the value of the sequence column, larger values can be replaced with smaller values, and vice versa. In this method, the order is determined by the user, and the user controls the replacement order.

    Implemented by adding a hidden column , the type of the column is specified by the user while create the table, determines the specific value of the column on import, and replaces the REPLACE column with that value.

    When you create the Uniq table, a hidden column __DORIS_SEQUENCE_COL__ is automatically added, depending on the type specified by the user

    Import

    When importing, fe sets the value of the hidden column during parsing to the value of the ‘order by’ expression (Broker Load and routine Load), or the value of the ‘function_column.sequence_col’ expression (stream load), and the value column will be replaced according to this value. The value of the hidden column __DORIS_SEQUENCE_COL__ can be set as a column in the source data or in the table structure.

    The request with the value column needs to read the additional column of __DORIS_SEQUENCE_COL__, which is used as a basis for the order of replacement aggregation function replacement under the same key column, with the larger value replacing the smaller value and not the reverse.

    Cumulative Compaction

    Cumulative Compaction works in the same way as the reading process

    Base Compaction works in the same way as the reading process

    Syntax

    Create Table

    When you create the Uniq table, you can specify the sequence column type

    The sequence_type is used to specify the type of the sequence column, which can be integral and time

    stream load

    The syntax of the stream load is to add the mapping of hidden columns corresponding to source_sequence in the ‘function_column.sequence_col’ field in the header, for example

    1. curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load

    broker load

    Set the source_sequence field for the hidden column map at ORDER BY

    1. LOAD LABEL db1.label1
    2. (
    3. DATA INFILE("hdfs://host:port/user/data/*/test.txt")
    4. INTO TABLE `tbl1`
    5. COLUMNS TERMINATED BY ","
    6. (k1,k2,source_sequence,v1,v2)
    7. ORDER BY source_sequence
    8. )
    9. WITH BROKER 'broker'
    10. (
    11. "username"="user",
    12. "password"="pass"
    13. PROPERTIES
    14. (
    15. "timeout" = "3600"
    16. );

    routine load

    The mapping method is the same as above, as shown below

    1. CREATE ROUTINE LOAD example_db.test1 ON example_tbl
    2. COLUMNS(k1, k2, source_sequence, v1, v2),
    3. WHERE k1 > 100 and k2 like "%doris%"
    4. [ORDER BY source_sequence]
    5. PROPERTIES
    6. (
    7. "desired_concurrent_number"="3",
    8. "max_batch_interval" = "20",
    9. "max_batch_rows" = "300000",
    10. "max_batch_size" = "209715200",
    11. "strict_mode" = "false"
    12. )
    13. FROM KAFKA
    14. (
    15. "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
    16. "kafka_topic" = "my_topic",
    17. "kafka_partitions" = "0,1,2,3",
    18. "kafka_offsets" = "101,0,0,200"
    19. );

    If function_column.sequence_type is set when creating a new table, then the sequence column will be supported. For a table that does not support sequence column, use the following statement if you would like to use this feature: ALTER TABLE example_db.my_table ENABLE FEATURE "SEQUENCE_LOAD" WITH PROPERTIES ("function_column.sequence_type" = "Date") to enable. If you want to determine if a table supports sequence column, you can set the session variable to display the hidden column SET show_hidden_columns=true, followed by desc Tablename, if the output contains the column __DORIS_SEQUENCE_COL__, it is supported, if not, it is not supported

    Let’s take the stream Load as an example to show how to use it

    1. Create a table that supports sequence column.
    1. Import data normally:

    Import the following data

    1. 1 2020-02-22 1 2020-03-05 c
    2. 1 2020-02-22 1 2020-02-26 d
    3. 1 2020-02-22 1 2020-02-22 e
    4. 1 2020-02-22 1 2020-02-22 b

    Take the Stream Load as an example here and map the sequence column to the modify_date column

    1. curl --location-trusted -u root: -H "function_column.sequence_col: modify_date" -T testData http://host:port/api/test/test_table/_stream_load

    The results is

    1. MySQL > select * from test_table;
    2. +---------+------------+----------+-------------+---------+
    3. | user_id | date | group_id | modify_date | keyword |
    4. +---------+------------+----------+-------------+---------+
    5. | 1 | 2020-02-22 | 1 | 2020-03-05 | c |
    6. +---------+------------+----------+-------------+---------+

    In this import, the c is eventually retained in the keyword column because the value of the sequence column (the value in modify_date) is the maximum value: ‘2020-03-05’.

    1. Guarantee of substitution order

    After the above steps are completed, import the following data

    Query data

    1. MySQL [test]> select * from test_table;
    2. +---------+------------+----------+-------------+---------+
    3. | user_id | date | group_id | modify_date | keyword |
    4. +---------+------------+----------+-------------+---------+
    5. | 1 | 2020-02-22 | 1 | 2020-03-05 | c |
    6. +---------+------------+----------+-------------+---------+

    Because the sequence column for the newly imported data are all smaller than the values already in the table, they cannot be replaced Try importing the following data again

    1. 1 2020-02-22 1 2020-02-22 a
    2. 1 2020-02-22 1 2020-03-23 w
    1. MySQL [test]> select * from test_table;
    2. +---------+------------+----------+-------------+---------+
    3. | user_id | date | group_id | modify_date | keyword |
    4. +---------+------------+----------+-------------+---------+
    5. | 1 | 2020-02-22 | 1 | 2020-03-23 | w |

    At this point, you can replace the original data in the table