Hive Engine Usage

To use the hive engineConn on your Linkis server, make sure the following environment variables are set correctly and are visible to the user who starts the engineConn.

It is strongly recommended that you check these environment variables for the executing user before running hive tasks.
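
One way to perform this check is sketched below. The variable names HADOOP_HOME, HADOOP_CONF_DIR, and HIVE_CONF_DIR are assumptions (typical for Hive deployments), so substitute the entries from Table 1-1. The class EnvCheck is hypothetical and not part of Linkis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Linkis.
class EnvCheck {
    // Assumed variable names; substitute the ones listed in Table 1-1.
    static final List<String> REQUIRED =
            Arrays.asList("HADOOP_HOME", "HADOOP_CONF_DIR", "HIVE_CONF_DIR");

    // Returns the required variables that are unset or blank in the given environment.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Run this as the user that starts the engineConn.
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All required environment variables are set.");
        } else {
            System.out.println("Missing environment variables: " + unset);
        }
    }
}
```

Run it as the same operating-system user that starts the engineConn, since the check only sees that user's environment.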

Table 1-1 Environmental configuration list

Hive 1.x, 2.x, and 3.x are supported; the default supported version is 2.3.3. To change the Hive version, find the linkis-engineConnplugin-hive module, set the \<hive.version> tag to the version you need (for example, 2.3.3), and then compile this module separately.

Hive on MapReduce is supported by default. To switch to Hive on Tez, copy all the jars prefixed with tez-* to the directory: . Other Hive operating modes are similar: copy the corresponding dependencies to the lib directory of the Hive EngineConn.

2.2 hive engineConn deployment and loading

If you have already compiled your hive engineConn plug-in, you need to put the new plug-in in the specified location so that it can be loaded. See the following article for details:

EngineConnPlugin Installation

Linkis can configure the corresponding EngineConn parameters on the management console. If your newly added EngineConn needs this feature, you can refer to the following documents:

EngineConnPlugin Installation > 2.2 Configuration modification of management console (optional)

Preparation for operation, queue setting

Figure 3-1 Queue settings

You can also add the queue value in the StartUpMap of the submission parameter: startupMap.put("wds.linkis.rm.yarnqueue", "dws")
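
A minimal sketch of building that startup map in Java follows. Only the key wds.linkis.rm.yarnqueue and the example queue dws come from this document. The class and method names are hypothetical, and how the map is attached to a job depends on the LinkisClient SDK version you use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Linkis SDK.
class QueueStartupParams {
    // Configuration key taken from this document.
    static final String YARN_QUEUE_KEY = "wds.linkis.rm.yarnqueue";

    // Builds startup parameters naming the YARN queue the hive task should run in.
    static Map<String, Object> withQueue(String queue) {
        if (queue == null || queue.trim().isEmpty()) {
            throw new IllegalArgumentException("queue must not be empty");
        }
        Map<String, Object> startupMap = new HashMap<>();
        startupMap.put(YARN_QUEUE_KEY, queue);
        return startupMap;
    }

    public static void main(String[] args) {
        System.out.println(withQueue("dws"));
    }
}
```

Rejecting an empty queue name up front is a design choice of this sketch, so that a misconfigured submission fails before it reaches the resource manager.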

Linkis provides a client for calling hive tasks through the SDK provided by LinkisClient. Both Java and Scala are supported; for detailed usage, see the JAVA SDK Manual. If you use Hive, you only need to make the following changes:
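
The changes concern the labels attached to the job. The sketch below is an assumption about what they look like for Hive. The plain string keys engineType, userCreator, and codeType stand in for the SDK's label key constants, the value hive-2.3.3 follows the default version stated earlier, and hql and hadoop-IDE are illustrative values. Check the JAVA SDK Manual for the exact constants in your Linkis version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; keys are assumed string forms of the SDK's label constants.
class HiveLabels {
    static Map<String, Object> build(String user, String creator, String hiveVersion) {
        Map<String, Object> labels = new HashMap<>();
        labels.put("engineType", "hive-" + hiveVersion);  // engine type label
        labels.put("userCreator", user + "-" + creator);  // executing user and creator
        labels.put("codeType", "hql");                    // code type for Hive SQL
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(build("hadoop", "IDE", "2.3.3"));
    }
}
```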

3.2 How to use Linkis-cli

Since Linkis 1.0, you can submit tasks through the CLI. You only need to specify the corresponding EngineConn and CodeType label types. For Hive, the usage is as follows:

sh ./bin/linkis-cli -engineType hive-2.3.3 -codeType hql -code "show tables" -submitUser hadoop -proxyUser hadoop

For detailed usage, see the Linkis CLI Manual.

Scriptis is the simplest option: open Scriptis, right-click the directory, create a new hive script, and write your Hive SQL code.

The hive engineConn works by instantiating a Hive Driver. The driver submits the task, then retrieves the result set and displays it.

Figure 3-2 Screenshot of the execution effect of hql

In addition to the above engineConn configuration, users can also make custom settings, including the memory size of the hive Driver process, etc.

Figure 4-1 User-defined configuration management console of hive

By default, the log interface does not display the application_id or the number of completed tasks. Users can enable this output as needed by modifying the log4j2-engineconn.xml/log4j2.xml configuration file in the engine as follows:

1. Add under the appenders component

2. Add under the root component

3. Add under the loggers component

After these modifications, the log includes task progress information in the following style:

2022-04-08 11:06:50.228 INFO  [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Status: Running (Executing on YARN cluster with App id application_1631114297082_432445)
2022-04-08 11:06:50.248 INFO  [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: -/-  Reducer 2: 0/1  
2022-04-08 11:06:55.060 INFO  [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0(+1)/1  Reducer 2: 0/1  
2022-04-08 11:06:57.495 INFO  [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1  Reducer 2: 0(+1)/1