MINI LOAD

    MINI LOAD is imported through HTTP protocol. Users can import without relying on Hadoop or Mysql client. The user describes the import through HTTP protocol, and the data is streamed into Doris in the process of receiving http requests. After the ** import job is completed, the ** returns to the user the imported results.

    • Note: In order to be compatible with the old version of mini load usage habits, users can still view the import results through the ‘SHOW LOAD’ command.

    Grammar: Import:

    curl —location-trusted -u user:passwd -T data.file http://host:port/api/{db}/{table}/\_load?label=xxx

    View import information

    curl -u user:passwd

    HTTP Protocol Specification

    Privilege Authentication Currently Doris uses the Basic mode of HTTP for privilege authentication. So you need to specify a username and password when importing This way is to pass the password in plaintext, and does not support encrypted transmission for the time being.

    Expect Doris needs to send an HTTP request with the ‘Expect’ header information,’100-continue’. Why? Because we need to redirect the request, we have to transfer the data content before. This can avoid causing multiple data transmission, thereby improving efficiency.

    Content-Length Doris needs to send a request with the header ‘Content-Length’. If the content ratio is sent ‘Content-Length’ is less, so Doris believes that if there is a transmission problem, the submission task fails. NOTE: If you send more data than ‘Content-Length’, Doris reads only ‘Content-Length’. Length content and import

    Description of parameters:

    Label: The label used to specify this batch of imports for later job queries, etc. This parameter must be passed in.

    Columns: Used to describe the corresponding column name in the import file. If it is not passed in, the column order in the file is considered to be the same as the order in which the table is built. The specified method is comma-separated, such as columns = k1, k2, k3, K4

    Column_separator: Used to specify the separator between columns, default is’ t’ NOTE: Url encoding is required, for example If you need to specify ‘\t’ as a separator, you should pass in ‘column_separator=% 09’ If you need to specify ‘x01’as a delimiter, you should pass in ‘column_separator=% 01’ If you need to specify’,’as a separator, you should pass in ‘column_separator=% 2c’

    Max_filter_ratio: Used to specify the maximum percentage allowed to filter irregular data, default is 0, not allowed to filter Custom specification should be as follows:’max_filter_ratio = 0.2’, meaning that 20% error rate is allowed.

    Timeout: Specifies the timeout time of the load job in seconds. When the load execution time exceeds this threshold, it is automatically cancelled. The default timeout time is 86400 seconds. It is recommended to specify a timeout time of less than 86400 seconds.

    Hll: Used to specify the corresponding relationship between the HLL columns in the data and the tables, the columns in the tables and the columns specified in the data. (If columns are not specified, the columns of the data column surface can also be other non-HLL columns in the table.) By “partition” Specify multiple HLL columns using “:” splitting, for example:’hll1, cuid: hll2, device’

    NOTE:

    1. This method of importing is currently completed on a single machine, so it is not suitable to import a large amount of data. It is recommended that the amount of data imported should not exceed 1 GB.

    2. Miniload is imported in exactly the same way as streaming. It returns the results synchronously to users after the import of streaming is completed. Although the information of mini load can be found in subsequent queries, it cannot be operated on. The queries are only compatible with the old ways of use.

    ‘35;’35; example

    1. Import the data from the local file ‘testData’ into the table of ‘testTbl’ in the database ‘testDb’(the user is in default_cluster) curl —location-trusted -u root -T testData http://host:port/api/testDb/testTbl/\_load?label=123

    2. Import the data from the local file ‘testData’ into the table of ‘testTbl’ in the database’testDb’(the user is in test_cluster). The timeout time is 3600 seconds. curl —location-trusted -u root@test_cluster:root -T testData

    3. Import the data from the local file ‘testData’ into the table ‘testTbl’ in the database ‘testDb’, allowing a 20% error rate, and specify the column name of the file (the user is in default_cluster) curl —location-trusted -u root -T testData http://host:port/api/testDb/testTbl/\_load?label=123&max\_filter\_ratio=0.2&columns=k1,k2,k3

    4. Import in streaming mode (user is in default_cluster) seq 1 10 | awk ‘{OFS=”\t”}{print $1, $1 * 10}’ | curl —location-trusted -u root -T -

    5. Import tables containing HLL columns, which can be columns in tables or columns in data to generate HLL columns (users are in default_cluster)

      curl —location-trusted -u root -T testData http://host:port/api/testDb/testTbl/\_load?label=123&max\_filter\_ratio=0.2&hll=hll\_column1,k1:hll\_column2,k2 &columns=k1,k2,k3

      curl —location-trusted -u root -T testData &hll=hll_column1,tmp_k4:hll_column2,tmp_k5&columns=k1,k2,k3,tmp_k4,tmp_k5

    6. View imports after submission

    keyword

    MINI, LOAD