Quick Deployment

Reminder: If you want to experience the full LINKIS suite (DSS + Linkis + Qualitis + Visualis + Azkaban), please refer to the DataSphere Studio deployment documentation.


Lite version:

  1. Please note: the Lite version only allows users to submit Python scripts.

Simple version:

  1. Depends on Python, Hadoop and Hive. Distributed installation mode, including the Python engine and the Hive engine. Requires Hadoop and Hive to be installed in the user's Linux environment first.
  2. The Simple version allows users to submit HiveQL and Python scripts.

Standard version:

  1. Depends on Python, Hadoop, Hive and Spark. Distributed installation mode, including the Python, Hive and Spark engines. Requires Hadoop, Hive and Spark to be installed in the user's Linux environment first. The Linkis machines rely on the cluster's hadoop/hive/spark configuration files; they do not need to be deployed together with the DataNode and NameNode machines and can be deployed on a separate client machine.
  2. The Standard version allows users to submit Spark scripts (including SparkSQL, PySpark and Scala), HiveQL and Python scripts. **Please note: installing the Standard version requires more than 10 GB of memory on the machine.** If the machine does not have enough memory, add or modify the environment variable `export SERVER_HEAP_SIZE="512M"`, as shown in the sketch below.
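A minimal sketch of lowering the heap size, assuming the variable is picked up from the deployment user's environment when the Linkis services are started (where exactly you set it depends on your installation layout):

```bash
# Assumption: SERVER_HEAP_SIZE is read from the deployment user's environment.
echo 'export SERVER_HEAP_SIZE="512M"' >> /home/hadoop/.bashrc
source /home/hadoop/.bashrc
```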

2 Lite Linkis environment preparation

2.1 Basic software installation

  1. The following software must be installed:
  • MySQL (5.5+)
  • JDK (1.8.0_141 or above)
  • Python (both 2.x and 3.x are supported)
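As a quick sanity check, the commands below print the installed versions (a minimal sketch; the exact client binaries available depend on how each component was installed):

```bash
mysql --version    # MySQL client, 5.5+
java -version      # JDK, 1.8.0_141 or above
python --version   # Python 2.x or 3.x
```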

2.2 Create User

  1. For example, assume the **deployment user is the hadoop account**.
  2. Create the deployment user on the deployment machine for the installation:

```bash
sudo useradd hadoop
```

  3. Because the Linkis services use `sudo -u ${linux-user}` to switch to the engine user when performing operations, the deployment user needs password-free sudo permissions:

```bash
vi /etc/sudoers
```

```
hadoop  ALL=(ALL)  NOPASSWD: ALL
```

  4. If you want Python to support plotting, you also need to install the plotting module on the installation node. The command is as follows:

```bash
python -m pip install matplotlib
```
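To confirm that password-free sudo switching works the way Linkis expects, run a quick check as the deployment user (a minimal sketch; `someuser` is a placeholder for any engine user that exists on the machine):

```bash
# Should print "someuser" without prompting for a password.
sudo -u someuser whoami
```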

2.3 Installation package preparation

  1. Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
  2. Decompress the installation package into the installation directory, then modify the configuration of the decompressed files:

```bash
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
```

(1) Modify the basic configuration

```bash
vi conf/config.sh
```

```bash
SSH_PORT=22 # SSH port; can be left unset for a single-machine installation
deployUser=hadoop # deployment user
LINKIS_HOME=/appcom/Install/Linkis # installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # user root directory (workspace), generally used to store the user's script files, log files, etc.
RESULT_SET_ROOT_PATH=file:///tmp/linkis # result set path, used to store the job's result set files
#HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis # this parameter must remain commented out for the Lite installation
```

(2) Modify the database configuration

```bash
vi conf/db.sh
```

```bash
# Set the connection information of the database,
# including IP address, port, user name and database name.
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory.
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
```
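If the target database does not exist yet, a minimal sketch of creating it from the command line (the database name `linkis` and the root login are assumptions; use whatever matches your MySQL setup and the values you put in db.sh):

```bash
# Create an empty database for Linkis metadata (hypothetical name "linkis").
mysql -h <MYSQL_HOST> -P <MYSQL_PORT> -u root -p \
  -e "CREATE DATABASE IF NOT EXISTS linkis DEFAULT CHARACTER SET utf8;"
```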
Once the environment is ready, proceed to [5 Installation and deployment](#5-installation-and-deployment).

3 Simple Linkis environment preparation

3.1 Basic software installation

  1. The following software must be installed:
  • MySQL (5.5+)
  • JDK (1.8.0_141 or above)
  • Python (both 2.x and 3.x are supported)
  • Hadoop (both the community version and CDH versions below 3.0 are supported)
  • Hive (1.2.1; versions 2.0 and above may have compatibility issues)

3.2 Create User

  1. For example, assume the **deployment user is the hadoop account**.
  2. Create the deployment user on all machines that need to be deployed:

```bash
sudo useradd hadoop
```

  3. Because the Linkis services use `sudo -u ${linux-user}` to switch to the engine user when performing operations, the deployment user needs password-free sudo permissions:

```bash
vi /etc/sudoers
```

```
hadoop  ALL=(ALL)  NOPASSWD: ALL
```

  4. Set the following global environment variables on each installation node so that Linkis can use Hadoop and Hive normally. Modify the installation user's .bashrc; the command is as follows:

```bash
vim /home/hadoop/.bashrc
```

  The following is an example of the environment variables:

```bash
#JDK
export JAVA_HOME=/nemo/jdk1.8.0_141
#HADOOP
export HADOOP_HOME=/appcom/Install/hadoop
export HADOOP_CONF_DIR=/appcom/config/hadoop-config
#Hive
export HIVE_HOME=/appcom/Install/hive
export HIVE_CONF_DIR=/appcom/config/hive-config
```
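After editing, a quick way to apply and sanity-check the variables (a minimal sketch; assumes the Hadoop and Hive clients are installed at the paths configured above):

```bash
# Reload the deployment user's shell configuration and verify the clients resolve.
source /home/hadoop/.bashrc
echo "$HADOOP_HOME" "$HIVE_HOME"
hadoop version
hive --version
```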

3.3 SSH password-free configuration (required for distributed mode)

  1. If your Linkis is deployed on multiple servers, then you also need to configure ssh password-free login for these servers.
  2. [How to configure SSH password-free login](https://www.jianshu.com/p/0922095f69f3)
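A minimal sketch of setting this up from the machine that drives the installation, assuming the same hadoop user exists on every node (`node2` is a placeholder hostname):

```bash
# Run as the deployment user (hadoop).
ssh-keygen -t rsa              # accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
ssh-copy-id hadoop@node2       # repeat for every other Linkis node
ssh hadoop@node2 hostname      # should connect without prompting for a password
```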
3.4 Installation package preparation

  1. Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
  2. Decompress the installation package into the installation directory, then modify the configuration of the decompressed files:

```bash
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
```

(1) Modify the basic configuration

```bash
vi conf/config.sh
```

```bash
SSH_PORT=22 # SSH port; can be left unset for a single-machine installation
deployUser=hadoop # deployment user
LINKIS_HOME=/appcom/Install/Linkis # installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # user root directory (workspace), generally used to store the user's script files, log files, etc.
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis # user's HDFS root directory, generally used to store the job's result set files
# If you want to use Scriptis with the CDH version of Hive, you also need to configure the following parameters (the community version of Hive can ignore this configuration)
HIVE_META_URL=jdbc://... # HiveMeta metadata database URL
HIVE_META_USER= # HiveMeta metadata database user
HIVE_META_PASSWORD= # HiveMeta metadata database password
# Configure the hadoop/hive configuration directories
HADOOP_CONF_DIR=/appcom/config/hadoop-config # hadoop's conf directory
HIVE_CONF_DIR=/appcom/config/hive-config # hive's conf directory
```

(2) Modify the database configuration

```bash
vi conf/db.sh
```

```bash
# Set the connection information of the database,
# including IP address, port, user name and database name.
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory.
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
```
Once the environment is ready, proceed to [5 Installation and deployment](#5-installation-and-deployment).

4 Standard Linkis environment preparation

4.1 Basic software installation

  1. The following software must be installed:
  • MySQL (5.5+)
  • JDK (1.8.0_141 or above)
  • Python (both 2.x and 3.x are supported)
  • Hadoop (both the community version and CDH versions below 3.0 are supported)
  • Hive (1.2.1; versions 2.0 and above may have compatibility issues)
  • Spark (starting from Linkis release 0.7.0, all versions above Spark 2.0 are supported)

4.2 Create User

  1. For example, assume the **deployment user is the hadoop account**.
  2. Create the deployment user on all machines that need to be deployed:

```bash
sudo useradd hadoop
```

  3. Because the Linkis services use `sudo -u ${linux-user}` to switch to the engine user when performing operations, the deployment user needs password-free sudo permissions:

```bash
vi /etc/sudoers
```

```
hadoop  ALL=(ALL)  NOPASSWD: ALL
```

  4. Modify the installation user's .bashrc; the command is as follows:

```bash
vim /home/hadoop/.bashrc
```

  The following is an example of the environment variables:

```bash
#JDK
export JAVA_HOME=/nemo/jdk1.8.0_141
#HADOOP
export HADOOP_HOME=/appcom/Install/hadoop
export HADOOP_CONF_DIR=/appcom/config/hadoop-config
#Hive
export HIVE_HOME=/appcom/Install/hive
export HIVE_CONF_DIR=/appcom/config/hive-config
#Spark
export SPARK_HOME=/appcom/Install/spark
export SPARK_CONF_DIR=/appcom/config/spark-config/spark-submit
export PYSPARK_ALLOW_INSECURE_GATEWAY=1 # parameter that must be added for Pyspark
```

  5. If you want Pyspark to support plotting, you also need to install the plotting module on all installation nodes. The command is as follows:

```bash
python -m pip install matplotlib
```

4.3 SSH password-free configuration (required for distributed mode)

  1. If your Linkis is deployed on a single server, this step can be skipped.
  2. If your Linkis is deployed on multiple servers, you also need to configure password-free SSH login between these servers.
  3. [How to configure SSH password-free login](https://www.jianshu.com/p/0922095f69f3)

4.4 Installation package preparation

  1. Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
  2. Decompress the installation package into the installation directory, then modify the configuration of the decompressed files:

```bash
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
```

(1) Modify the basic configuration

```bash
vi conf/config.sh
```

```bash
SSH_PORT=22 # SSH port; can be left unset for a single-machine installation
deployUser=hadoop # deployment user
LINKIS_HOME=/appcom/Install/Linkis # installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # user root directory (workspace), generally used to store the user's script files, log files, etc.
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis # user's HDFS root directory, generally used to store the job's result set files
# If you want to use Scriptis with the CDH version of Hive, you also need to configure the following parameters (the community version of Hive can ignore this configuration)
HIVE_META_URL=jdbc://... # HiveMeta metadata database URL
HIVE_META_USER= # HiveMeta metadata database user
HIVE_META_PASSWORD= # HiveMeta metadata database password
# Configure the hadoop/hive/spark configuration directories
HADOOP_CONF_DIR=/appcom/config/hadoop-config # hadoop's conf directory
HIVE_CONF_DIR=/appcom/config/hive-config # hive's conf directory
SPARK_CONF_DIR=/appcom/config/spark-config # spark's conf directory
```

(2) Modify the database configuration

```bash
vi conf/db.sh
```

```bash
# Set the connection information of the database,
# including IP address, port, user name and database name.
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory.
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
```

5 Installation and deployment

5.1 Execute the installation script

```bash
sh bin/install.sh
```

  • The install.sh script will ask you about the installation mode.

    The installation mode can be Lite, Simple or Standard. Please choose the mode that matches the environment you have prepared.

  • The install.sh script will ask you whether you need to initialize the database and import metadata.

    Because repeatedly executing install.sh could otherwise wipe the user data already stored in the database, install.sh asks every time it runs whether the database should be initialized and the metadata imported.

5.3 Check whether the installation succeeded

  1. Check whether the installation succeeded by viewing the log information printed on the console.
  2. If there is an error message, you can check the logs for the specific cause of the error.

5.4 Quick start Linkis

(1) Start the services

Execute the following command in the installation directory to start all services:

```bash
./bin/start-all.sh > start.log 2>start_error.log
```

(2) Check whether the startup was successful

You can check whether the services started successfully on the Eureka page, as follows:

Open http://${EUREKA_INSTALL_IP}:${EUREKA_INSTALL_PORT} in a browser and check whether the services registered successfully.

If you did not specify EUREKA_INSTALL_IP and EUREKA_INSTALL_PORT in config.sh, the HTTP address is: http://127.0.0.1:20303
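If the machine has no browser, the same check can be done from the command line. A minimal sketch, assuming the default Eureka address above and the standard Eureka REST endpoint:

```bash
# Lists the applications currently registered with Eureka (XML output).
curl http://127.0.0.1:20303/eureka/apps
```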

As shown in the figure below, if the following microservices appear on your Eureka homepage, it means that the services have started successfully and can serve requests normally:

Note: The services marked in red are DSS services; the rest are Linkis services. If you only use Linkis, you can ignore the parts marked in red.

6 Quickly use Linkis

6.1 Overview

  1. Linkis provides a Java client implementation; users can use UJESClient to quickly access the Linkis back-end services.

6.2 Quick run

  1. We provide two test classes for UJESClient under the ujes/client/src/test module:
  • com.webank.wedatasphere.linkis.ujes.client.UJESClientImplTestJ # Java-based test class
  • a corresponding Scala-based test class in the same package
  2. If you cloned the source code of Linkis, you can run these two test classes directly.

6.3 Quick implementation

  1. **The following specifically introduces how to quickly implement code submission and execution with Linkis.**

6.3.1 Maven dependency

```xml
<dependency>
    <groupId>com.webank.wedatasphere.linkis</groupId>
    <artifactId>linkis-ujes-client</artifactId>
    <version>0.11.0</version>
</dependency>
```

6.3.2 Reference Implementation

```java
package com.webank.bdp.dataworkcloud.ujes.client;

import com.webank.wedatasphere.linkis.common.utils.Utils;
import com.webank.wedatasphere.linkis.httpclient.dws.authentication.StaticAuthenticationStrategy;
import com.webank.wedatasphere.linkis.httpclient.dws.config.DWSClientConfig;
import com.webank.wedatasphere.linkis.httpclient.dws.config.DWSClientConfigBuilder;
import com.webank.wedatasphere.linkis.ujes.client.UJESClient;
import com.webank.wedatasphere.linkis.ujes.client.UJESClientImpl;
import com.webank.wedatasphere.linkis.ujes.client.request.JobExecuteAction;
import com.webank.wedatasphere.linkis.ujes.client.request.ResultSetAction;
import com.webank.wedatasphere.linkis.ujes.client.response.JobExecuteResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobInfoResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobProgressResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobStatusResult;
import org.apache.commons.io.IOUtils;

import java.util.concurrent.TimeUnit;

public class UJESClientImplTestJ {
    public static void main(String[] args) {
        // 1. Configure DWSClientBuilder and get a DWSClientConfig from it
        DWSClientConfig clientConfig = ((DWSClientConfigBuilder) (DWSClientConfigBuilder.newBuilder()
                .addUJESServerUrl("http://${ip}:${port}")  // ServerUrl, the address of the Linkis server-side gateway, such as http://{ip}:{port}
                .connectionTimeout(30000)                  // client connection timeout
                .discoveryEnabled(true).discoveryFrequency(1, TimeUnit.MINUTES) // whether to enable registration discovery; if enabled, newly launched Gateways are discovered automatically
                .loadbalancerEnabled(true)                 // whether to enable load balancing; meaningless if registration discovery is not enabled
                .maxConnectionSize(5)                      // maximum number of connections, i.e. the maximum concurrency
                .retryEnabled(false).readTimeout(30000)    // whether to retry when execution fails, and the read timeout
                .setAuthenticationStrategy(new StaticAuthenticationStrategy()) // Linkis authentication method
                .setAuthTokenKey("johnnwang").setAuthTokenValue("Abcd1234")))  // authentication key (generally the user name) and value (generally the password)
                .setDWSVersion("v1").build();              // Linkis backend protocol version, currently v1
        // 2. Get a UJESClient from the DWSClientConfig
        UJESClient client = new UJESClientImpl(clientConfig);
        // 3. Start code execution
        JobExecuteResult jobExecuteResult = client.execute(JobExecuteAction.builder()
                .setCreator("LinkisClient-Test")  // creator, the system name of the Linkis client, used for system-level isolation
                .addExecuteCode("show tables")    // the code to be executed
                .setEngineType(JobExecuteAction.EngineType$.MODULE$.HIVE()) // the Linkis execution engine type to request, such as Spark, Hive, etc.
                .setUser("johnnwang")             // the requesting user, used for user-level multi-tenant isolation
                .build());
        System.out.println("execId: " + jobExecuteResult.getExecID() + ", taskId: " + jobExecuteResult.taskID());
        // 4. Get the execution status of the script
        JobStatusResult status = client.status(jobExecuteResult);
        while (!status.isCompleted()) {
            // 5. Get the execution progress of the script
            JobProgressResult progress = client.progress(jobExecuteResult);
            Utils.sleepQuietly(500);
            status = client.status(jobExecuteResult);
        }
        // 6. Get the job information of the script
        JobInfoResult jobInfo = client.getJobInfo(jobExecuteResult);
        // 7. Get the list of result sets (multiple result sets are generated if the user submits multiple SQL statements at once)
        String resultSet = jobInfo.getResultSetList(client)[0];
        // 8. Get the content of a specific result set through its result set information
        Object fileContents = client.resultSet(ResultSetAction.builder()
                .setPath(resultSet).setUser(jobExecuteResult.getUser()).build()).getFileContent();
        System.out.println("fileContents: " + fileContents);
        IOUtils.closeQuietly(client);
    }
}
```
