3 Spark Engine File Import and Export
In addition, users often need to perform joint analyses across local data files, such as CSV and Excel, together with online Hive databases, so these files need to be imported into Hive.
In industries with stricter confidentiality requirements, such as banking, data exports often contain sensitive fields, such as identity card numbers and mobile phone numbers, that must be desensitized on export.
The scheme uses Spark's distributed computing capability and its DataSource support, which connects to multiple data sources.
3.1 Export
The export process is as follows:
The user selects the data source and the table to be exported, e.g. the user order table in a MySQL database;
The user selects the output file format and path, e.g. export the user order table to Excel at the path /home/username/orders.xlsx;
Spark reads the corresponding data based on the user-configured data source, table, and query statement; DataSource supports multiple data storage components such as Hive, MySQL, Oracle, HDFS, HBase, and MongoDB;
The data is then converted into a DataFrame according to the conversion format configured by the user;
The file writer object is obtained according to the file format configured by the user, e.g. an Excel writer for Excel output; writers are provided for multiple file formats such as Excel, CSV, and JSON;
The data is written via the writer to the corresponding destination, e.g. /home/username/orders.xlsx. A minimal sketch of this flow follows.
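As a concrete illustration, the Scala sketch below wires these steps together: it reads a user order table from MySQL through Spark's JDBC DataSource, masks a sensitive column, and writes the result to the configured Excel path. The connection details, column names, and the use of the community spark-excel writer (com.crealytics:spark-excel) are assumptions made for the example, not part of the scheme itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ExportJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("export-user-orders")
      .getOrCreate()

    // Read the user order table from MySQL via the JDBC DataSource,
    // pushing the user-configured query down to the database.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/shop")     // hypothetical connection
      .option("dbtable", "(SELECT * FROM user_orders) t")  // user-configured query
      .option("user", "reader")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Desensitize a sensitive column before export, e.g. mask the
    // middle digits of an 11-digit mobile phone number.
    val masked = orders.withColumn(
      "phone", regexp_replace(col("phone"), "(\\d{3})\\d{4}(\\d{4})", "$1****$2"))

    // Write to Excel at the user-configured path via the community
    // spark-excel DataSource (an assumed stand-in for the scheme's writer).
    masked.write
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .mode("overwrite")
      .save("/home/username/orders.xlsx")

    spark.stop()
  }
}
```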
3.2 Import
The import process is as follows:
A data type inferrer uses the first 10 rows of the incoming data to determine the type of each column: it infers the type of every value row by row, counts how often each type appears per column, and returns the most frequent type to the user (this inference is sketched after the list),
e.g. user: String, orderId: Int;
The user selects the target data source for the import, e.g. MySQL; importing also supports selecting multiple data sources;
The user chooses whether to create a new table, overwrite the data, or append the data, e.g. select the user order table and choose appending;
The user defines the import transformation format and the column information to import, e.g. decrypting user information;
The scheme uses Spark to transform the file into a DataFrame according to the user-supplied transformation and column information; the type inference and the import flow are sketched below.
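The majority-vote type inference described in the first step could look like the following sketch (assuming Scala 2.13 for toIntOption and friends; the column names mirror the user/orderId example above).

```scala
// Infer a column type from sample values by majority vote:
// try Int, then Double, then Boolean, falling back to String,
// and keep the type that occurs most often across the sample rows.
object TypeInference {
  def typeOf(value: String): String =
    if (value.toIntOption.isDefined) "Int"
    else if (value.toDoubleOption.isDefined) "Double"
    else if (value.toBooleanOption.isDefined) "Boolean"
    else "String"

  def inferSchema(header: Seq[String], sampleRows: Seq[Seq[String]]): Map[String, String] =
    header.zipWithIndex.map { case (name, i) =>
      val counts = sampleRows.map(row => typeOf(row(i)))
        .groupBy(identity).view.mapValues(_.size)
      name -> counts.maxBy(_._2)._1 // most frequent type wins
    }.toMap

  def main(args: Array[String]): Unit = {
    val header = Seq("user", "orderId")
    val sample = Seq(Seq("alice", "1"), Seq("bob", "2"), Seq("carol", "3"))
    println(inferSchema(header, sample)) // Map(user -> String, orderId -> Int)
  }
}
```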
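A minimal sketch of the import flow itself follows, under stated assumptions: the input path, table name, and decryption UDF are hypothetical placeholders; the inferred schema is applied when reading the file, and the result is appended to a Hive table per the user's choice.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ImportJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("import-user-orders")
      .enableHiveSupport()
      .getOrCreate()

    // Schema built from the inferred (and user-confirmed) column types.
    val schema = StructType(Seq(
      StructField("user", StringType),
      StructField("orderId", IntegerType)))

    // Read the incoming CSV file into a DataFrame with that schema.
    val df = spark.read
      .schema(schema)
      .option("header", "true")
      .csv("/home/username/orders.csv") // hypothetical input path

    // Apply the user-defined transformation, e.g. decrypt user info.
    // decryptUdf is a placeholder standing in for real decryption logic.
    val decryptUdf = udf((s: String) => s.reverse)
    val transformed = df.withColumn("user", decryptUdf(col("user")))

    // Append to the selected table, per the user's import choice.
    transformed.write.mode(SaveMode.Append).saveAsTable("shop.user_orders")

    spark.stop()
  }
}
```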