Production Deployment Reference Guide

    1. The deployment plans below are organized by the number of simultaneous users. They assume that users run Spark most often and Hive second most often, and that each server host has 64G of memory or more.
    2. **We generally recommend reserving about 20G of memory on each server where EM is installed for the Linux system, the EM process itself, and other processes. For example, a 128G server keeps 108G after the 20G reservation, and the guide conservatively counts about 100G of that as available for engine processes. If each Spark Driver uses 4G of memory, such a server can start up to 25 Spark engines (100G / 4G).**
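    The reservation arithmetic above can be sketched in Python as follows. The function names are hypothetical helpers for illustration, not part of Linkis; the sketch assumes every engine needs a fixed amount of memory (for example the Spark Driver memory) and ignores cores.

```python
def usable_engine_memory_gb(host_mem_gb: int, reserved_gb: int = 20) -> int:
    """Hypothetical helper: memory left for engines after reserving memory
    for Linux, the EM process itself and other processes (20G by default)."""
    return max(host_mem_gb - reserved_gb, 0)


def max_engines(engine_budget_gb: int, engine_mem_gb: int) -> int:
    """Hypothetical helper: how many fixed-size engines fit in the budget."""
    if engine_mem_gb <= 0:
        raise ValueError("engine memory must be positive")
    return engine_budget_gb // engine_mem_gb


if __name__ == "__main__":
    print(usable_engine_memory_gb(128))  # 108
    # The guide budgets a rounded-down 100G for engines on a 128G host.
    print(max_engines(100, 4))  # 25
```

    Budgeting 100G rather than the full 108G leaves some headroom on top of the 20G reservation, which matches the 100G value suggested for wds.linkis.enginemanager.memory.max on 128G hosts.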

    Estimate the required resources separately for memory and for CPU:

    Required memory = number of simultaneous users * (Spark Driver or Hive client memory)
    Required cores = number of simultaneous users * (cores per engine)

    For example, with 50 simultaneous users, 2G of Spark Driver memory, 2G of Hive client memory, and two cores per engine, the cluster needs 50 * 2G = 100G of memory and 50 * 2 = 100 CPU cores.
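    A minimal sketch of the same estimate, keeping memory and cores as two separate totals. `Capacity` and `required_capacity` are hypothetical names, and the sketch assumes every simultaneous user holds exactly one engine of the given size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical result type: memory and cores are separate requirements."""
    memory_gb: int
    cores: int


def required_capacity(concurrent_users: int, engine_mem_gb: int, engine_cores: int) -> Capacity:
    """Assumes one engine per simultaneous user, each of the same size."""
    return Capacity(
        memory_gb=concurrent_users * engine_mem_gb,
        cores=concurrent_users * engine_cores,
    )


if __name__ == "__main__":
    # 50 users, 2G Driver / Hive client memory, 2 cores per engine.
    print(required_capacity(50, 2, 2))  # Capacity(memory_gb=100, cores=100)
```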

    Conventions for parameter configuration (read before configuring):

    1. Parameters are generally set in linkis.properties in the conf directory of the microservice installation directory, in key=value form, for example wds.linkis.enginemanager.cores.max=20. The only exception is engine microservice parameters, which go in linkis-engine.properties.

    2. After changing a microservice parameter, restart the microservice for the change to take effect. After changing an engine parameter, kill the engine on the engine management page and then restart it for the change to take effect.
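    To check what a key=value configuration file actually contains, a minimal reader like the one below can help. It is a hypothetical helper, not a Linkis tool, and it handles only plain key=value lines with # or ! comments; full Java .properties syntax (":" separators, escapes, line continuations) is not covered.

```python
from typing import Dict


def parse_simple_properties(text: str) -> Dict[str, str]:
    """Hypothetical reader for plain key=value lines.

    Blank lines and lines starting with '#' or '!' are skipped; lines without
    '=' are ignored. A later key overrides an earlier one.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    sample = "# EngineManager limits\nwds.linkis.enginemanager.cores.max=20\n"
    print(parse_simple_properties(sample))
```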

    Reference deployment plans for different numbers of simultaneous users are provided below.

    2.1 Simultaneous users: 10-50

    1). Recommended server configuration: 4 servers, named S1, S2, S3, S4

    2). Minimum server configuration: 2 servers

    3). Parameter configuration

    Configure the following parameters in linkis.properties (engine parameters go in linkis-engine.properties) in the conf directory of the microservice installation directory. They fall into two groups: Entrance and EngineManager.

    a) Entrance microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.rpc.receiver.asyn.queue.size.max | Size of the queue for RPC messages received by the Entrance microservice | 2000 |
    | wds.linkis.rpc.receiver.asyn.consumer.thread.max | Size of the Entrance microservice's RPC consumer thread pool | 100 |

    b) EngineManager microservice

    Note: Linkis defines the concept of protected resources. A certain amount of resources is reserved so that the EM never uses up its configured maximum, which protects the machine.

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.enginemanager.memory.max | Total memory of all engines started by the EM process | 40G (64G host) or 100G (128G host) |
    | wds.linkis.enginemanager.cores.max | Total number of cores of all engines started by the EM process | 20 |
    | wds.linkis.enginemanager.engine.instances.max | Total number of engines started by the EM process | 20 |
    | wds.linkis.enginemanager.protected.memory | Memory held back by the EM process as protection | 2G (with a 40G maximum, up to 38G = 40G - 2G can be used) |
    | wds.linkis.enginemanager.protected.cores.max | Number of cores held back by the EM process as protection | 2 (up to 18 = 20 - 2 cores can be used) |
    | wds.linkis.enginemanager.protected.engine.instances | Number of engines held back by the EM process as protection | 1 (up to 19 = 20 - 1 engines can be started) |
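    The protected-resource rule in the table reduces to "allocatable = maximum - protected". The sketch below applies it and checks whether one more engine fits. The class and function names are hypothetical, and the admission check (every dimension must stay within the allocatable amount) is an assumption about how the EM applies these limits, not its actual code.

```python
from dataclasses import dataclass
from typing import Tuple

Resources = Tuple[int, int, int]  # (memory in GB, cores, engine instances)


@dataclass(frozen=True)
class EMLimits:
    """Hypothetical container mirroring the EngineManager parameters."""
    memory_max_gb: int        # wds.linkis.enginemanager.memory.max
    cores_max: int            # wds.linkis.enginemanager.cores.max
    instances_max: int        # wds.linkis.enginemanager.engine.instances.max
    protected_memory_gb: int  # wds.linkis.enginemanager.protected.memory
    protected_cores: int      # wds.linkis.enginemanager.protected.cores.max
    protected_instances: int  # wds.linkis.enginemanager.protected.engine.instances

    def allocatable(self) -> Resources:
        """Resources the EM may hand out to engines: maximum minus protected."""
        return (
            self.memory_max_gb - self.protected_memory_gb,
            self.cores_max - self.protected_cores,
            self.instances_max - self.protected_instances,
        )


def can_start_engine(limits: EMLimits, used: Resources, request: Resources) -> bool:
    """Assumed admission rule: every dimension must stay within allocatable."""
    return all(u + r <= a for u, r, a in zip(used, request, limits.allocatable()))


if __name__ == "__main__":
    limits = EMLimits(40, 20, 20, 2, 2, 1)
    print(limits.allocatable())                                # (38, 18, 19)
    print(can_start_engine(limits, (36, 16, 18), (2, 2, 1)))  # True
    print(can_start_engine(limits, (37, 16, 18), (2, 2, 1)))  # False
```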

    2.2 Simultaneous users: 50-100

    1). Recommended server configuration: 7 servers, named S1, S2…S7

    | Service name | Deployed on | Description |
    | --- | --- | --- |
    | SparkEngineManager | S1, S2 | |
    | SparkEntrance | S5 | |
    | HiveEngineManager | S3, S4 | |
    | HiveEntrance | S5 | |
    | PythonEngineManager | S4 | |
    | PythonEntrance | S4 | |
    | Eureka, Gateway, RM | S6 | Eureka and RM require high-availability deployment |
    | PublicService, RM, Datasource, Eureka | S7 | Eureka and RM require high-availability deployment |
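    A deployment plan like the one above can be checked mechanically before rollout. This sketch encodes the 50-100 user table as a service-to-host mapping (rows listing several services are expanded per service) and reports any HA service, here Eureka and RM, that sits on fewer than two hosts. The names and the two-host threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical encoding of the 50-100 user plan: service -> hosts.
PLAN_50_100: Dict[str, List[str]] = {
    "SparkEngineManager": ["S1", "S2"],
    "SparkEntrance": ["S5"],
    "HiveEngineManager": ["S3", "S4"],
    "HiveEntrance": ["S5"],
    "PythonEngineManager": ["S4"],
    "PythonEntrance": ["S4"],
    "Eureka": ["S6", "S7"],
    "Gateway": ["S6"],
    "RM": ["S6", "S7"],
    "PublicService": ["S7"],
    "Datasource": ["S7"],
}

HA_SERVICES = ("Eureka", "RM")  # services the guide says need HA deployment


def ha_violations(plan: Dict[str, List[str]], min_hosts: int = 2) -> List[str]:
    """Return HA services deployed on fewer than min_hosts distinct hosts."""
    return [s for s in HA_SERVICES if len(set(plan.get(s, []))) < min_hosts]


def services_per_host(plan: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the plan to see which services share each server."""
    hosts: Dict[str, List[str]] = defaultdict(list)
    for service, service_hosts in plan.items():
        for host in service_hosts:
            hosts[host].append(service)
    return dict(hosts)


if __name__ == "__main__":
    print(ha_violations(PLAN_50_100))           # []
    print(len(services_per_host(PLAN_50_100)))  # 7
```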

    2). Minimum server configuration: 4 servers

    3). Parameter configuration

    a) Entrance microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.rpc.receiver.asyn.queue.size.max | Size of the queue for RPC messages received by the Entrance microservice | 3000 |
    | wds.linkis.rpc.receiver.asyn.consumer.thread.max | Size of the Entrance microservice's RPC consumer thread pool | 120 |

    b) EngineManager microservice

    2.3 Simultaneous users: 100-300

    1). Recommended server configuration: 11 servers, named S1, S2…S11

    | Service name | Deployed on | Description |
    | --- | --- | --- |
    | SparkEngineManager | S1, S2, S3, S4 | |
    | SparkEntrance | S8 | |
    | HiveEngineManager | S5, S6, S7 | |
    | HiveEntrance | S8 | |
    | PythonEngineManager | S9 | |
    | PythonEntrance | S9 | |
    | Eureka, Gateway, RM | S10 | Eureka and RM require high-availability deployment |
    | PublicService, RM, Datasource, Eureka | S11 | Eureka and RM require high-availability deployment |

    2). Minimum server configuration: 6 servers

    3). Parameter configuration

    a) Entrance microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.rpc.receiver.asyn.queue.size.max | Size of the queue for RPC messages received by the Entrance microservice | 4000 |
    | wds.linkis.rpc.receiver.asyn.consumer.thread.max | Size of the Entrance microservice's RPC consumer thread pool | 150 |

    b) EngineManager microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.enginemanager.memory.max | Total memory of all engines started by the EM process | 40G (64G host) or 100G (128G host) |
    | wds.linkis.enginemanager.cores.max | Total number of cores of all engines started by the EM process | 20 |
    | wds.linkis.enginemanager.engine.instances.max | Total number of engines started by the EM process | 20 |
    | wds.linkis.enginemanager.protected.memory | Memory held back by the EM process as protection | 2G (with a 40G maximum, up to 38G = 40G - 2G can be used) |
    | wds.linkis.enginemanager.protected.cores.max | Number of cores held back by the EM process as protection | 2 (up to 18 = 20 - 2 cores can be used) |
    | wds.linkis.enginemanager.protected.engine.instances | Number of engines held back by the EM process as protection | 1 (up to 19 = 20 - 1 engines can be started) |

    2.4 Simultaneous users: 300-500

    1). Recommended server configuration: 15 servers, named S1, S2…S15

    | Service name | Deployed on | Description |
    | --- | --- | --- |
    | SparkEngineManager | S1, S2, S3, S4, S5, S6, S7 | |
    | SparkEntrance | S12 | |
    | HiveEngineManager | S8, S9, S10, S11 | |
    | HiveEntrance | S12 | |
    | PythonEngineManager | S13 | |
    | PythonEntrance | S13 | |
    | Eureka, Gateway, RM | S14 | Eureka and RM require high-availability deployment |
    | PublicService, RM, Datasource, Eureka | S15 | Eureka and RM require high-availability deployment |

    2). Minimum server configuration: 10 servers

    3). Parameter configuration

    a) Entrance microservice

    b) EngineManager microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.enginemanager.memory.max | Total memory of all engines started by the EM process | 40G (64G host) or 100G (128G host) |
    | wds.linkis.enginemanager.cores.max | Total number of cores of all engines started by the EM process | 20 |
    | wds.linkis.enginemanager.engine.instances.max | Total number of engines started by the EM process | 20 |
    | wds.linkis.enginemanager.protected.memory | Memory held back by the EM process as protection | 2G (with a 40G maximum, up to 38G = 40G - 2G can be used) |
    | wds.linkis.enginemanager.protected.cores.max | Number of cores held back by the EM process as protection | 2 (up to 18 = 20 - 2 cores can be used) |
    | wds.linkis.enginemanager.protected.engine.instances | Number of engines held back by the EM process as protection | 1 (up to 19 = 20 - 1 engines can be started) |

    2.5 Simultaneous users: more than 500

    1). Recommended server configuration: 25 servers, named S1, S2…S25

    | Service name | Deployed on | Description |
    | --- | --- | --- |
    | SparkEngineManager | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10 | |
    | SparkEntrance | S17 | |
    | HiveEngineManager | S11, S12, S13, S14, S15, S16 | |
    | HiveEntrance | S17 | |
    | PythonEngineManager | S18, S19 | |
    | PythonEntrance | S20 | |
    | Eureka, RM | S21 | Eureka and RM require high-availability deployment |
    | RM, Eureka | S22 | Eureka and RM require high-availability deployment |
    | Eureka, PublicService | S23 | Eureka and RM require high-availability deployment |
    | Gateway, Datasource | S24 | |

    2). Minimum server configuration: 15 servers

    3). Parameter configuration

    a) Entrance microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.rpc.receiver.asyn.queue.size.max | Size of the queue for RPC messages received by the Entrance microservice | 5000 |
    | wds.linkis.rpc.receiver.asyn.consumer.thread.max | Size of the Entrance microservice's RPC consumer thread pool | 200 |
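    The Entrance values recommended across sections 2.1 to 2.5 can be collected into one lookup that emits linkis.properties lines. The sketch treats each range's upper bound as inclusive (the published ranges overlap at 50, 100 and 300, so this boundary choice is an assumption) and returns None for 300-500 users, where the guide gives no Entrance values. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

QUEUE_KEY = "wds.linkis.rpc.receiver.asyn.queue.size.max"
THREADS_KEY = "wds.linkis.rpc.receiver.asyn.consumer.thread.max"

# (inclusive upper bound of simultaneous users, (queue size, consumer threads)).
# None marks the 300-500 tier, for which the guide lists no Entrance values.
ENTRANCE_TIERS: List[Tuple[int, Optional[Tuple[int, int]]]] = [
    (50, (2000, 100)),
    (100, (3000, 120)),
    (300, (4000, 150)),
    (500, None),
]
ABOVE_500: Tuple[int, int] = (5000, 200)


def entrance_properties(users: int) -> Optional[str]:
    """Hypothetical helper: key=value lines for the tier covering `users`."""
    values: Optional[Tuple[int, int]] = ABOVE_500
    for upper, tier_values in ENTRANCE_TIERS:
        if users <= upper:
            values = tier_values
            break
    if values is None:
        return None  # the guide gives no values for this tier
    queue, threads = values
    return f"{QUEUE_KEY}={queue}\n{THREADS_KEY}={threads}"


if __name__ == "__main__":
    print(entrance_properties(80))
```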

    b) EngineManager microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.enginemanager.memory.max | Total memory of all engines started by the EM process | 40G (64G host) or 100G (128G host) |
    | wds.linkis.enginemanager.cores.max | Total number of cores of all engines started by the EM process | 20 |
    | wds.linkis.enginemanager.engine.instances.max | Total number of engines started by the EM process | 20 |
    | wds.linkis.enginemanager.protected.memory | Memory held back by the EM process as protection | 2G (with a 40G maximum, up to 38G = 40G - 2G can be used) |
    | wds.linkis.enginemanager.protected.cores.max | Number of cores held back by the EM process as protection | 2 (up to 18 = 20 - 2 cores can be used) |
    | wds.linkis.enginemanager.protected.engine.instances | Number of engines held back by the EM process as protection | 1 (up to 19 = 20 - 1 engines can be started) |

    Besides Entrance and EngineManager, other Linkis microservices also have their own configurable parameters.

    3.1 PublicService custom configuration

    The PublicService microservice provides Linkis's auxiliary functions, including editing and saving files and reading result sets.

    3.2 Engine Microservice

    | Parameter name | Parameter function | Suggested value |
    | --- | --- | --- |
    | wds.linkis.engine.max.free.time | How long an engine may stay idle before it is killed | 3h (an engine is killed automatically after three hours without running a task) |
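    The idle-timeout rule can be sketched as below. The duration format (a number followed by s, m, h or d) is an assumption for illustration, not a documented list of units accepted by wds.linkis.engine.max.free.time, and `should_kill` is a hypothetical helper rather than the engine's actual logic.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; the units Linkis actually accepts are not listed in this guide.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(value: str) -> timedelta:
    """Parse a duration such as '3h' or '30m' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def should_kill(last_task_end: datetime, now: datetime, max_free_time: str = "3h") -> bool:
    """Hypothetical check: kill once the engine has been idle longer than the limit."""
    return now - last_task_end > parse_duration(max_free_time)


if __name__ == "__main__":
    idle_since = datetime(2024, 1, 1, 9, 0)
    print(should_kill(idle_since, datetime(2024, 1, 1, 12, 30)))  # True
```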

    The right Linkis deployment plan depends closely on how Linkis is used, and the number of simultaneous users is the biggest factor. To keep the experience comfortable for users while reducing cluster server costs, operations and development staff should experiment and listen to user feedback. If a deployed plan turns out to be unsuitable, change it promptly.