Spark Engine


    Before starting the SparkEngineManager microservice, make sure that the environment variables above have been set.
    If they have not been set, set them first in /home/${USER}/.bash_rc or in the linkis.properties configuration file under the linkis-ujes-spark-enginemanager/conf directory.

1.2 Start dependent services

1) Eureka: used for service registration and discovery.
2) Linkis-gateway: used for forwarding user requests.
3) Linkis-publicService: provides basic functions such as persistence and UDF.
4) Linkis-ResourceManager: provides Linkis resource management functions.
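As a rough illustration of checking these dependencies before startup, the sketch below compares a list of registered service names against the services listed above. The exact names under which the services register in Eureka are an assumption, and the function name is hypothetical; in practice the `registered` list would come from the Eureka registry.

```python
# Dependent services from the list above. The registered names are an
# assumption and may differ in a real deployment.
REQUIRED_SERVICES = [
    "linkis-gateway",
    "linkis-publicservice",
    "linkis-resourcemanager",
]


def missing_services(registered, required=REQUIRED_SERVICES):
    """Return the required services that are absent from `registered`.

    The comparison is case-insensitive because registries often
    upper-case application names.
    """
    seen = {name.strip().lower() for name in registered}
    return [name for name in required if name.lower() not in seen]


# Example input only; these names are illustrative.
print(missing_services(["LINKIS-GATEWAY", "LINKIS-PUBLICSERVICE"]))
```

An empty result means every dependent service was found and the Spark microservices can be started.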

Before starting the Spark-related microservices, users can set configuration parameters for the Spark engine.

To give users more freedom in tuning the engine, Linkis provides many configuration parameters.

The following table lists some commonly used parameters. The Spark engine supports additional parameters for better performance. If you need further tuning, see the tuning manual.

Users can configure these parameters in linkis.properties.
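To make the file format concrete, here is a minimal sketch of reading `key=value` pairs from the text of a properties file. The keys in the sample are hypothetical placeholders, not actual Linkis parameter names, and the parser covers only simple `key=value` lines (no line continuations or `:` separators).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or '!' are comments in properties files.
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


# Hypothetical keys, for illustration only.
SAMPLE = """
# Spark engine settings (placeholder names)
example.spark.driver.memory = 2g
example.spark.executor.instances=3
"""

print(parse_properties(SAMPLE))
```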

1.4 Front-end deployment

In addition, Scriptis has a management console for configuring the startup parameters of the Spark engine.

The Scriptis configuration page lets users set the driver memory size, the number of executors, and the executor memory and number of CPU cores. These parameters are read when a Spark engine is started.

Figure 1 Management console configuration interface

1.6 Running examples

Open the Scriptis address in a web browser. In the workspace in the left column, the user can create a new sql, scala or pyspark script. After writing the script in the editing area, click Run to submit the code to Linkis for execution in the background. After submission, the background pushes the log, progress, status and other information to the user in real time through websocket. When execution finishes, the result is shown to the user.

Figure 3 Spark running effect (2)

Figure 4 Spark running effect (3)


    In the EngineManager module, Linkis starts the engine's Java process with the spark-submit command. To do this, Linkis overrides the build method of ProcessEngineBuilder so that the configured Spark startup parameters are combined with spark-submit into a single command that starts the Spark engine, and then that command is executed.
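The idea of merging startup parameters into one spark-submit command can be sketched as follows. This is not the ProcessEngineBuilder code: the function name, the parameter keys, the main class and the jar name are all hypothetical. Only the spark-submit flags themselves are standard options, and `--master yarn --deploy-mode client` corresponds to the yarn-client mode.

```python
import shlex

# Maps illustrative parameter keys (hypothetical) to standard spark-submit flags.
FLAG_FOR = {
    "driver_memory": "--driver-memory",
    "num_executors": "--num-executors",
    "executor_memory": "--executor-memory",
    "executor_cores": "--executor-cores",
}


def build_spark_submit(params, main_class, app_jar):
    """Assemble a spark-submit argument list from startup parameters."""
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "client"]
    for key, flag in FLAG_FOR.items():
        if key in params:
            cmd += [flag, str(params[key])]
    # The application class and jar come last, as spark-submit expects.
    cmd += ["--class", main_class, app_jar]
    return cmd


# Hypothetical values, for illustration only.
command = build_spark_submit(
    {"driver_memory": "2g", "num_executors": 3, "executor_memory": "4g", "executor_cores": 2},
    main_class="com.example.EngineMain",
    app_jar="engine.jar",
)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to a process launcher without shell quoting problems.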

In the Engine module, Linkis starts Spark sessions in yarn-client mode by default. The Spark Driver process runs as a Linkis engine and is owned by the user who started it.

The Spark execution engine currently supports three types of Spark jobs: sparksql, scala and pyspark. The Engine module implements three SparkExecutors, one per job type: SQL is submitted using SparkSession, scala is submitted using Console, and pyspark is submitted using py4j.
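The routing of a script to one of three executors can be pictured with a small dispatch table. The classes below are stand-ins with hypothetical names: they do not call Spark and only record which submission channel the text associates with each job type.

```python
class SqlExecutor:
    def execute(self, code):
        # Stand-in: the real executor would submit through SparkSession.
        return ("SparkSession", code)


class ScalaExecutor:
    def execute(self, code):
        # Stand-in: the real executor would submit through Console.
        return ("Console", code)


class PySparkExecutor:
    def execute(self, code):
        # Stand-in: the real executor would submit through py4j.
        return ("py4j", code)


# One executor per supported job type.
EXECUTORS = {
    "sql": SqlExecutor(),
    "scala": ScalaExecutor(),
    "pyspark": PySparkExecutor(),
}


def run(script_type, code):
    """Route `code` to the executor registered for `script_type`."""
    try:
        executor = EXECUTORS[script_type]
    except KeyError:
        raise ValueError(f"unsupported script type: {script_type!r}") from None
    return executor.execute(code)


print(run("sql", "select 1"))
```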

    The Linkis 0.5.0 and Linkis 0.6.0 releases support only Spark 2.1.0.
    If the Spark version used in your cluster is not compatible with the supported version, you may need to change the spark.version variable in the top-level pom.xml, and then recompile and repackage.
    If you encounter problems during startup or at runtime, you can join a group to consult us.
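For the spark.version change mentioned above, the following sketch shows one way to update the value programmatically. It assumes that spark.version is defined as a Maven property under `<properties>` in the top-level pom.xml; the sample POM and the target version are illustrative only. Note that this approach drops XML comments when rewriting the file, so editing pom.xml by hand is often the simpler choice.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"


def set_spark_version(pom_text, new_version):
    """Return pom_text with the <spark.version> property set to new_version."""
    ET.register_namespace("", POM_NS)  # keep the default namespace on output
    root = ET.fromstring(pom_text)
    props = root.find(f"{{{POM_NS}}}properties")
    if props is None:
        raise ValueError("pom.xml has no <properties> section")
    for child in props:
        if child.tag == f"{{{POM_NS}}}spark.version":
            child.text = new_version
            return ET.tostring(root, encoding="unicode")
    raise ValueError("no <spark.version> property found")


# Minimal illustrative POM, not the actual Linkis pom.xml.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <spark.version>2.1.0</spark.version>
  </properties>
</project>"""

print(set_spark_version(SAMPLE_POM, "2.3.0"))
```

After changing the version, the project still has to be recompiled and repackaged, and the new Spark version may require code changes that this sketch does not address.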