Export Query Result
Export simple query results to the file , specifying the export format as CSV.
When exporting to a local file, you need to configure enable_outfile_to_local=true
in fe.conf first
select * from tbl1 limit 10
INTO OUTFILE "file:///home/work/path/result_";
For more usage, see .
Concurrent export
- session variable ‘enable_parallel_outfile’ to enable concurrent export:
set enable_parallel_outfile = true;
- The export method is S3, HDFS instead of using a broker
- The query can meet the needs of concurrent export, for example, the top level does not contain single point nodes such as sort. (I will give an example later, which is a query that does not export the result set concurrently)
If the above three conditions are met, the concurrent export query result set can be triggered. Concurrency = be_instacne_num * parallel_fragment_exec_instance_num
After the user enables concurrent export through the session variable setting, if you want to verify whether the current query can be exported concurrently, you can use the following method.
After explaining the query, Doris will return the plan of the query. If you find that RESULT FILE SINK
appears in PLAN FRAGMENT 1
, it means that the export concurrency has been opened successfully. If RESULT FILE SINK
appears in PLAN FRAGMENT 0
, it means that the current query cannot be exported concurrently (the current query does not satisfy the three conditions of concurrent export at the same time).
Planning example for concurrent export:
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 1:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS:`k1` + `k2` |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1` |
| |
| RESULT FILE SINK |
| FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_ |
| STORAGE TYPE: S3 |
| |
| TABLE: multi_tablet |
+-----------------------------------------------------------------------------+
Return result
The command is a synchronization command. The command returns, which means the operation is over. At the same time, a row of results will be returned to show the exported execution result.
If it exports and returns normally, the result is as follows:
- TotalRows: The number of rows in the result set.
- FileSize: The total size of the exported file. Unit byte.
- URL: If it is exported to a local disk, the Compute Node to which it is exported is displayed here.
If a concurrent export is performed, multiple rows of data will be returned.
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+--------------------------------------------------------------------+
| 1 | 3 | 7 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
| 1 | 2 | 4 | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
2 rows in set (2.218 sec)
- The CSV format does not support exporting binary types, such as BITMAP and HLL types. These types will be output as
\N
, which is null. - If you do not enable concurrent export, the query result is exported by a single BE node in a single thread. Therefore, the export time and the export result set size are positively correlated. Turning on concurrent export can reduce the export time.
- The export command does not check whether the file and file path exist. Whether the path will be automatically created or whether the existing file will be overwritten is entirely determined by the semantics of the remote storage system.
- If an error occurs during the export process, the exported file may remain on the remote storage system. Doris will not clean these files. The user needs to manually clean up.
- The timeout of the export command is the same as the timeout of the query. It can be set by
SET query_timeout = xxx
. - For empty result query, there will be an empty file.
- File spliting will ensure that a row of data is stored in a single file. Therefore, the size of the file is not strictly equal to
max_file_size
. - For functions whose output is invisible characters, such as BITMAP and HLL types, the output is
\N
, which is NULL.
More Help
For more detailed syntax and best practices for using OUTFILE, please refer to the OUTFILE command manual, you can also More help information can be obtained by typing HELP OUTFILE
at the command line of the MySql client.